OVERVIEW
MSCC (Memory Safe C Compiler) is a tool to ensure both temporal and spatial
memory safety in C programs through a source-to-source transformation.
MSCC was developed with the following criteria in mind:
* Detect all spatial and temporal memory errors.
* Handle most C programs, with almost no source code changes.
* No changes to the memory allocation model of C. Freed memory is
immediately returned to the heap for reuse.
* Relatively low performance overheads compared to previous techniques.
MSCC detects memory access errors at the level of memory blocks, which
correspond to the smallest units of memory allocation, such as a global or
local variable, or memory returned by a single invocation of malloc. It flags
memory errors only at the point where a pointer is dereferenced. For each
pointer p, metadata describing the temporal and spatial attributes of the
memory block pointed to by p is maintained at runtime and checked before each
dereference of p. Metadata is stored separately from the pointer for better
backwards compatibility with untransformed libraries. MSCC handles most
common forms of pointer arithmetic, as well as type casts that can be
classified as upcasts (casts from a subtype to a supertype) or downcasts
(casts from a supertype to a subtype), which account for most casts in
typical C programs.
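To make the idea of per-pointer metadata concrete, here is a minimal
hand-written sketch of a check that could run before a dereference. The
"ptr_meta" structure, its fields, and the "deref_ok" function are hypothetical
names chosen for illustration; the metadata layout and the code that MSCC
actually generates may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pointer metadata; MSCC's real layout may differ. */
struct ptr_meta {
    const void *base;   /* start of the memory block */
    size_t size;        /* block size in bytes */
    const int *live;    /* flag that is cleared when the block is freed */
};

/* Returns 1 if accessing len bytes at p is safe according to m, else 0. */
int deref_ok(const void *p, size_t len, const struct ptr_meta *m)
{
    uintptr_t addr = (uintptr_t) p;
    uintptr_t base;

    assert(m != NULL);
    base = (uintptr_t) m->base;

    if (m->live == NULL || !*m->live)
        return 0;                           /* temporal error: block is gone */
    if (addr < base || addr - base > m->size)
        return 0;                           /* access starts outside the block */
    return len <= m->size - (addr - base);  /* spatial check on the end */
}
```

In this sketch the metadata lives in its own object next to the pointer, so
the pointer itself keeps its normal size and representation.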
More details of the transformation can be found in our ACM SIGSOFT FSE 2004
paper:
"An efficient and backwards-compatible transformation to ensure memory safety
of C programs" (http://www.seclab.cs.sunysb.edu/seclab/pubs/papers/fse04.pdf)
MSCC is implemented in Objective Caml (http://caml.inria.fr) and uses CIL
(http://manju.cs.berkeley.edu/cil/) as the front end to manipulate C
constructs.
The MSCC package is available at http://www.seclab.cs.sunysb.edu/mscc/.
This work is supported by NSF grants CCR-0098154 and CCR-0208877, and ONR
grant N000140110967.
COPYRIGHT
MSCC is distributed under the terms of the GNU General Public License Version
2.
Some of the source code is borrowed from CCured
(http://manju.cs.berkeley.edu/ccured/). That code is governed by its own
copyright terms.
STATUS
MSCC is alpha software. It is provided ONLY for research and evaluation
purposes.
MSCC has been tested on the Olden benchmark programs, several SPECINT
benchmark programs, and several GNU utilities programs such as bc, gzip,
patch, and tar. These programs range from 400 to 30,000 lines of code.
INSTALLATION
This version of MSCC has been tested with GNU C Compiler 3.2.2 on Red Hat
Linux 9. It requires Objective Caml 3.07 and CIL 1.2.6.
* PREREQUISITES
1. Obtain and install Objective Caml 3.07.
wget http://caml.inria.fr/pub/distrib/ocaml-3.07pl2/ocaml-3.07pl2.tar.gz
tar xvfz ocaml-3.07pl2.tar.gz
cd ocaml-3.07
./configure
make world.opt
make install
OCaml will be installed into /usr/local/bin and /usr/local/lib.
2. Obtain and install CIL 1.2.6.
wget http://manju.cs.berkeley.edu/cil/distrib/cil-1.2.6.tar.gz
tar xvfz cil-1.2.6.tar.gz
cd cil
./configure
make
make install
The CIL library files and scripts will be installed into /usr/local/lib/cil
and /usr/local/share/cil respectively.
* INSTALLING MSCC
tar xvfz mscc-0.2.2.tar.gz
cd mscc-0.2.2
./configure
make
The "configure" script accepts the following options:
--cil-prefix (default: /usr/local)
Set cil-libdir and cil-datadir to
<prefix>/lib/cil and <prefix>/share/cil respectively.
--cil-libdir (default: /usr/local/lib/cil)
Directory where the CIL library files are installed
--cil-datadir (default: /usr/local/share/cil)
Directory where the CIL scripts are installed
Once MSCC is compiled, you can add /path/to/mscc-0.2.2/bin to your PATH.
USING MSCC
MSCC can be used just like a C compiler, and it accepts most GCC options. For
example, the following command transforms "foo.c" and compiles the transformed
source into an executable "foo":
mscc -g -O2 -o foo foo.c
To use MSCC with make, simply set the values of CC and AR to mscc as follows:
make CC="mscc" AR="mscc --mode=AR" RANLIB="echo"
Note that RANLIB is set to "echo" to disable it, because the command "ranlib"
cannot recognize the intermediate files generated by mscc.
COMMON PROBLEMS WHEN USING MSCC
* RESOLVING MERGING ERRORS
MSCC merges all C files of an application into a single file and then applies
the memory-safe transformation to the merged file. However, merging can
sometimes fail. Most merging errors are due to inconsistent declarations of
the same global variable or function in different C source files, and thus
can be easily fixed. For instance, tar 1.12 has a merging error caused by two
different prototypes of xmalloc:
void * xmalloc(); /* in xmalloc.c */
and
char * xmalloc(); /* in xgetcwd.c */
This error can be fixed by changing "char *" to "void *" in the second
xmalloc() declaration.
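One way to keep such declarations consistent is to declare the function once
and cast the result at call sites that need a different pointer type. The
sketch below is illustrative only: "copy_string" is a hypothetical caller, and
the xmalloc body is a minimal stand-in, not the one shipped with tar.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The one prototype every file sees, e.g. via a shared header. */
void *xmalloc(size_t n);

/* A caller that wants a "char *" casts the result at the call site. */
char *copy_string(const char *s)
{
    char *copy;

    assert(s != NULL);
    copy = (char *) xmalloc(strlen(s) + 1);
    strcpy(copy, s);
    return copy;
}

/* Minimal stand-in for xmalloc.c: abort instead of returning NULL. */
void *xmalloc(size_t n)
{
    void *p = malloc(n != 0 ? n : 1);

    if (p == NULL)
        abort();
    return p;
}
```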
* SPECIFYING INTENDED DATA TYPE AT ALLOCATION SITES
MSCC needs to know the intended data type of each heap-allocated memory block
at allocation time in order to generate appropriate code for allocating and
initializing the associated pointer meta-data. If a malloc-allocated memory
block is used for storing pointers, then the intended data type of the block
should be specified as an explicit type cast on the return value of malloc (or
other heap-allocation functions). For example,
void *p = malloc(1024);
...
char **q = (char **)p;
should be changed to
char **p = (char **) malloc(1024);
...
char **q = p;
Because of the explicit type cast "(char **)" in the second code snippet, MSCC
knows that the allocated block is intended to store "char *" pointers, and
hence MSCC will generate code to allocate memory for meta-data associated with
each "char *" pointer stored in the allocated block.
* SPECIFYING MALLOC-LIKE USER-DEFINED MEMORY MANAGEMENT FUNCTIONS
For each call to a memory allocation function, MSCC will generate additional
code to allocate and initialize the meta-data associated with the allocated
block. Therefore, all memory allocation functions should be registered with
MSCC. By default, MSCC knows only about the standard library memory
allocation functions such as malloc/calloc. User-defined allocation
functions that have similar semantics to malloc/calloc can be registered using
the "csafealloc" pragma, e.g.
#pragma csafealloc("xmalloc", nozero, sizein(1))
#pragma csafealloc("xcalloc", zero, sizemul(1,2))
The above two pragmas register "xmalloc" and "xcalloc" as allocation
functions, where xmalloc (similar to malloc) uses its first argument as the
allocation size and does not zero out the allocated block, while xcalloc
(similar to calloc) uses the product of its first two arguments as the
allocation size and initializes the allocated block with zeroes.
Similarly, deallocation functions can be specified using the "csafedealloc"
pragma, e.g.
#pragma csafedealloc("free")
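As a sketch of how the pragmas and the functions they describe might sit
together in one file, consider the following. The function bodies and the
"xfree" wrapper are assumptions made for illustration, and registering a
wrapper such as "xfree" with csafedealloc is assumed to work the same way as
registering "free". A compiler that does not know these pragmas will
typically ignore them, possibly with a warning.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#pragma csafealloc("xmalloc", nozero, sizein(1))
#pragma csafealloc("xcalloc", zero, sizemul(1,2))
#pragma csafedealloc("xfree")

/* Like malloc: size is argument 1, memory is not zeroed. */
void *xmalloc(size_t n)
{
    void *p = malloc(n != 0 ? n : 1);

    if (p == NULL)
        abort();
    return p;
}

/* Like calloc: size is argument 1 times argument 2, memory is zeroed. */
void *xcalloc(size_t count, size_t size)
{
    void *p;

    if (count == 0 || size == 0)
        count = size = 1;
    assert(count <= SIZE_MAX / size);   /* the product must not overflow */
    p = calloc(count, size);
    if (p == NULL)
        abort();
    return p;
}

/* Hypothetical deallocation wrapper with the same semantics as free. */
void xfree(void *p)
{
    free(p);
}
```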
Because user-defined memory management functions are not transformed, any
functions invoked by these functions should also remain untransformed. To
tell MSCC not to transform a function, the attribute "__compat__" can be added
to the function definition, e.g.
static void * (__attribute__((__compat__)) fixup_null_alloc) (size_t n);
KNOWN BUGS AND LIMITATIONS
* USER-DEFINED MEMORY MANAGEMENT FUNCTIONS
Currently MSCC does not support user-defined memory management functions
whose semantics differ from those of malloc/free, e.g. an allocation function
that returns a matrix of objects.
* EXTERNAL FUNCTIONS
There are two major problems related to external functions.
The first is related to function prototypes. MSCC changes the function
prototypes by introducing extra arguments that hold meta-data pertaining to
the original arguments. External functions are automatically marked and their
function prototypes are unchanged. If a user-defined function is called by an
external function as a callback, however, the modified function prototype of
the user-defined function will be incompatible with what the external function
expects. To work around this problem, such user-defined functions can be
manually marked as "__compat__" to avoid being transformed.
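The callback case can be sketched with qsort, an external function that calls
back into user code. "MSCC_BUILD" and "COMPAT" are hypothetical macros used
here so that the file also compiles with an ordinary C compiler; they are not
defined by MSCC. The attribute placement follows the style of the
fixup_null_alloc declaration shown earlier.

```c
#include <assert.h>
#include <stdlib.h>

/* MSCC_BUILD is a hypothetical macro that would be defined by hand
   (e.g. with -DMSCC_BUILD) when compiling with mscc. */
#ifdef MSCC_BUILD
#define COMPAT __attribute__((__compat__))
#else
#define COMPAT
#endif

/* Called by qsort, so its prototype must stay unchanged. */
static int (COMPAT compare_ints)(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Sorts count integers in ascending order. */
void sort_ints(int *values, size_t count)
{
    if (count == 0)
        return;
    assert(values != NULL);
    qsort(values, count, sizeof values[0], compare_ints);
}
```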
The second problem is related to meta-data of external pointers. When an
external function returns a pointer, MSCC assumes that the returned pointer
is always valid and assigns special meta-data to it so that dereferencing
the pointer always succeeds. This avoids the need for wrapper functions in
most cases, but memory errors caused by these pointers won't be detected.
In addition, the current support for external pointers is limited to simple
data structures, such as "char *". If an external function returns a complex
data structure that contains deep-level pointers, MSCC does not generate
meta-data for every pointer contained within the data structure. Runtime
memory errors may then occur undetected. Even worse, the transformed program
may terminate prematurely because the meta-data needed to validate pointer
dereferences is missing. Wrapper functions are required in these cases.
* TYPE CASTS
MSCC supports type casts that follow the upcast/downcast paradigm,
e.g. casting from "char **" to "void *" then to "int **", or casting from
"struct A *" to "struct B *" then to "struct A *", if "struct A" is a bigger
structure and "struct B" is a smaller compatible structure.
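A minimal sketch of the struct case follows. It embeds "struct B" as the
first member of "struct A" so that the casts are well defined in standard C;
that MSCC treats this particular layout as a compatible subtype is an
assumption, and the "kind" tag and helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { KIND_PLAIN = 0, KIND_A = 1 };

struct B { int kind; };                      /* smaller structure */
struct A { struct B base; double radius; };  /* larger, starts with a B */

/* Works on any object viewed as a "struct B". */
int kind_of(const struct B *b)
{
    assert(b != NULL);
    return b->kind;
}

/* Downcast: only valid when b really points into a "struct A". */
double radius_of(struct B *b)
{
    struct A *a;

    assert(b != NULL && b->kind == KIND_A);
    a = (struct A *) b;
    return a->radius;
}
```

A caller would upcast with "(struct B *) &a" and later recover the full
object through radius_of.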
There are two kinds of typecasting operations that MSCC currently does not
support. The first is casting from a pointer to an integer then back to a
pointer. The second is casting between structure pointers in a manner that
violates the subtype criteria.
Bad casts themselves don't cause runtime errors. However, if the resulting
pointer is dereferenced, a runtime memory error will be reported.
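One way to avoid the pointer-to-integer round trip is to keep the value in a
"void *" field, which turns the casts into an ordinary upcast and downcast.
The sketch below is illustrative; "struct entry" and its accessors are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
    const char *key;
    void *value;        /* was an integer field such as "long value" */
};

/* Stores a pointer without converting it to an integer. */
void entry_set(struct entry *e, int *counter)
{
    assert(e != NULL);
    e->value = counter;             /* upcast: "int *" to "void *" */
}

/* Reads the stored counter back through the original pointer type. */
int entry_get(const struct entry *e)
{
    assert(e != NULL && e->value != NULL);
    return *(int *) e->value;       /* downcast back to "int *" */
}
```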
Usually bad casts can be eliminated by modifying the source code. For example,
if the integer in an integer-to-pointer cast was itself obtained from a
pointer value, then the bad cast can be removed by changing the integer type
to "void *".
* POINTER ARITHMETIC
MSCC supports all pointer arithmetic that advances a pointer from one array
element to another array element, whether this is achieved by a trivial
pointer increment operation, or by first casting the pointer to "void *",
then adding carefully calculated offsets and casting back to a pointer of the
desired type. If pointer arithmetic on a pointer p doesn't satisfy the above
condition, dereferencing p is still allowed, but runtime errors will be
reported if pointers contained within *p are accessed.
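The two supported styles can be sketched as follows. The function names are
hypothetical, and the second function does its byte arithmetic through
"char *" after the "void *" cast so that it is standard C; that MSCC handles
this variant the same way as arithmetic directly on "void *" is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Style 1: plain increment from one element to the next. */
long sum_by_increment(const int *a, size_t n)
{
    const int *p;
    long sum = 0;

    assert(a != NULL);
    for (p = a; p != a + n; p++)
        sum += *p;
    return sum;
}

/* Style 2: go through "void *" and add an offset that is an exact
   multiple of the element size, then cast back. */
int element_by_offset(int *a, size_t i)
{
    void *raw;
    int *p;

    assert(a != NULL);
    raw = a;
    p = (int *) ((char *) raw + i * sizeof(int));
    return *p;
}
```

An offset that is not a multiple of sizeof(int) would leave the pointer
between elements, which falls outside the condition described above.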