... a C++ preprocessor library for Python

Tutorial

Contents

  1. Installing CppPre
  2. Basic usage
    1. A simple program
    2. A detailed look
    3. Preprocessing tokens
  3. Additional tools
  4. Named preprocessors
  5. The CppPreprocessor class in detail
    1. The CppPreprocessor constructor
    2. CppPreprocessor methods
  6. Special defines

Installing CppPre

Installing CppPre is simple as all the code is in Python files: download the CppPre package and unpack it to some location on your computer. You should find that the entire source tree for CppPre is contained in a folder named cpppre, with the following structure:

cpppre (folder)
    cpppre_test (folder)
        cpppre test files
    cpppre source files

Now you only need to make that cpppre folder visible to Python. The normal way to do this is to add a .pth file to the site-packages folder of the Python installation. (The path to the site-packages folder should be as follows: folder-where-Python-was-installed-to/Lib/site-packages.) That .pth file should contain just a single line of text: the path to the location of the cpppre folder. The content of my .pth file for CppPre is:

Z:\CODEDRIVE\ECLIPSE

...because the cpppre folder is contained within the ECLIPSE folder. You can give your .pth file any name, as long as it doesn't conflict with another file in that folder and it keeps the .pth extension.

Note that another way to make the cpppre folder visible to Python is to alter the sys.path variable while Python is running. You can see how I do this in the cpppre/cpppre_test/path_to_source.py file. I did this purely so that the CppPre test suite could be run on a given system without having to add any content to site-packages.
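
As a minimal sketch of the sys.path approach (this is not the content of path_to_source.py, which may differ), the following adds a folder to the module search path at run time. The helper name add_to_search_path and the example folder are illustrative assumptions. Note that the folder to add is the one that contains cpppre, not the cpppre folder itself:

```python
import os
import sys


def add_to_search_path(folder):
    """Put folder at the front of sys.path unless it is already listed."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# Hypothetical location: replace with the folder that contains cpppre.
containing_folder = add_to_search_path(os.path.join(os.getcwd(), "ECLIPSE"))
```

Unlike a .pth file, this change lasts only for the lifetime of the running interpreter.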

Basic usage

A simple program

Let's say we wish to preprocess the following C++ source file, called first.cpp:

#define GOAT 33

int main(int argc, char * argv []) {
	int x = GOAT;
	return 0;
}

(I'm not going to win a Google Code Jam with that effort, but it serves a purpose.) Our first Python program will be the following:

from cpppre.preprocessor import *
from cpppre.pp_error import *
from cpppre.pp_tokens import *

if __name__ == "__main__":
  try: 
    preproc = CppPreprocessor('first.cpp')
    for tok in preproc:
      if tok.tag() == u'ID':
        print 'Identifier token:', unicode(tok)
      elif tok.tag() == u'NUM':
        if tok.numberType() is NumberPPToken.Integer:
          label = 'Integer'
        else:
          label = 'Floating point'
        print label, 'number token:', unicode(tok)
      elif tok.tag() == u'CHAR':
        print 'Character token:', unicode(tok)
      elif tok.tag() == u'STR':
        print 'String token:', unicode(tok)
      elif tok.tag() == u'OP':
        print 'Operator token:', unicode(tok)
      print '    location:', tok.file() + \
            ', line #' + `tok.line()`, \
          'column #' + `tok.column()`+'.'
  except FatalParsingErrorException:
    print "A fatal parsing error has occurred!"

Assuming that you have put first.cpp in the same folder as the above program, running it should result in information on each preprocessing token in first.cpp being output.

A detailed look

Let's now look at the above Python program in more detail.

from cpppre.preprocessor import *

The preprocessor class itself is called CppPreprocessor and it can be found in the cpppre.preprocessor module. The above line gives us access to that class.

from cpppre.pp_error import *

If a serious error is found while parsing, a FatalParsingErrorException is thrown to terminate processing; its declaration can be found in cpppre.pp_error. The above line gives us access to it. In our Python program, to illustrate its use, I catch that exception if it is thrown and print a simple error message.

from cpppre.pp_tokens import *

The preprocessor outputs preprocessing tokens. There are a few types of token, and they can all be found in cpppre.pp_tokens. The above line gives us access to them.

preproc = CppPreprocessor('first.cpp')

The above line constructs an instance of the preprocessor. As a minimum, the constructor requires the name of the file to be preprocessed. You can pass a relative, drive-relative (on Windows) or absolute path.

for tok in preproc:

The preprocessor has an iterator interface, so it can be used directly in a for ... in loop. As is the convention in Python, the preprocessor raises a StopIteration exception when there are no more tokens to output; the for loop handles that exception for you.
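
To make the iterator convention concrete, here is a small sketch that does not use CppPre at all. CountingSource is a hypothetical stand-in that hands out three fake tokens; it only illustrates how a for loop relates to repeated next() calls:

```python
class CountingSource(object):
    """Hypothetical stand-in for a preprocessor: yields three fake tokens."""

    def __init__(self):
        self._pending = [u"int", u"x", u";"]

    def __iter__(self):
        return self

    def __next__(self):
        if not self._pending:
            raise StopIteration  # signals "no more tokens"
        return self._pending.pop(0)

    next = __next__  # Python 2 spelling of the same method


# The for loop form...
looped = []
for tok in CountingSource():
    looped.append(tok)

# ...is equivalent to calling next() until StopIteration is raised.
manual = []
source = CountingSource()
while True:
    try:
        manual.append(next(source))
    except StopIteration:
        break
```

Both lists end up with the same three items, which is why you rarely need to call next() yourself.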

Preprocessing tokens

There are five types of preprocessing token that can be output by the preprocessor, and each has a corresponding Python class (as found in the cpppre.pp_tokens module):

Token type                 Python class
identifier                 IdentifierPPToken
number                     NumberPPToken
character literal          CharacterLiteralPPToken
string literal             StringLiteralPPToken
operator or punctuation    OpOrPuncPPToken

Note that all these tokens are types of PPToken (with the PPToken class also being found in the cpppre.pp_tokens module).

There are two ways to test the type of a token:

  • use isinstance, e.g. isinstance(token, IdentifierPPToken)
  • check the string returned by the tag() method that all token types have (as I have done in my Python program).

Each type of token has a particular tag() string, as shown below:

Token type                 tag() string
identifier                 'ID'
number                     'NUM'
character literal          'CHAR'
string literal             'STR'
operator or punctuation    'OP'
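
When the tag string decides what you do with a token, a lookup table can replace a chain of if/elif tests. The following is a minimal sketch; TAG_LABELS and describe are illustrative names, and the function takes the tag string and the token text as plain strings so that it does not depend on CppPre itself:

```python
# Tag strings as listed in the table above, mapped to display labels.
TAG_LABELS = {
    u'ID': u'Identifier',
    u'NUM': u'Number',
    u'CHAR': u'Character literal',
    u'STR': u'String literal',
    u'OP': u'Operator or punctuation',
}


def describe(tag, text):
    """Return a one-line description for a token's tag string and text."""
    label = TAG_LABELS.get(tag, u'Unknown')
    return u'%s token: %s' % (label, text)
```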

Each token also has the following methods, some of which I have made use of in the Python program:

  • unicode(): Returns the text of the token, as a unicode string.
  • file(): Returns the file that the token is from. This will be the absolute path to that file.
  • line(): Returns the line number in that file that the token is from. The first line of a file is numbered 1.
  • column(): Returns the column number in that line that the token starts at. The first character in a line is in column #1.

Some token classes have additional methods. The NumberPPToken class has the following method:

  • numberType(): Returns the class NumberPPToken.Integer if the number token is an integer, or the class NumberPPToken.Floating if the number token is a floating point number.

The CharacterLiteralPPToken and StringLiteralPPToken classes both have the following method:

  • isWide(): Returns True if the text literal token is a wide literal, otherwise False.

Finally, please also note the following important points about the tokens:

  • All the keywords, except for new and delete, are parsed as IdentifierPPToken instances.
  • The keywords new and delete are parsed as OpOrPuncPPToken instances.
  • Certain operators have alternative representations. An example is and, which is the alternative representation for &&, the logical AND operator. An alternative representation is parsed as an OpOrPuncPPToken instance and it is converted into the operator that it represents; the unicode() method for such a token returns the operator representation. The result of this approach is that you do not need to take into account the issue of alternative representations when using CppPre.
  • The values returned by the file() and line() methods of a token can be altered by #line preprocessor directives in the C++ source.
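
For reference, the C++ Standard defines eleven keyword-like alternative tokens (the digraphs are omitted here). The table below is a sketch of the mapping described above, not CppPre's internal data; ALTERNATIVE_TOKENS and primary_spelling are illustrative names:

```python
# Keyword-like alternative tokens and the operators they stand for.
ALTERNATIVE_TOKENS = {
    u'and': u'&&',
    u'and_eq': u'&=',
    u'bitand': u'&',
    u'bitor': u'|',
    u'compl': u'~',
    u'not': u'!',
    u'not_eq': u'!=',
    u'or': u'||',
    u'or_eq': u'|=',
    u'xor': u'^',
    u'xor_eq': u'^=',
}


def primary_spelling(text):
    """Return the primary operator for an alternative token, else text."""
    return ALTERNATIVE_TOKENS.get(text, text)
```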

Additional tools

The module cpppre.pp_tools contains functions that can be used with a preprocessor instance. Currently there is only the reconstruct() function, which takes a preprocessor instance and an output sink, repeatedly calling next() on that instance to output a reconstructed version of the translation unit to the output sink:

from cpppre.preprocessor import *
from cpppre.pp_error import *
from cpppre.pp_tools import *
import sys

if __name__ == "__main__":
  try: 
    preproc = CppPreprocessor('first.cpp')
    reconstruct(preproc, sys.stdout)
  except FatalParsingErrorException:
    print "A fatal parsing error has occurred!"

The sink you pass can be any object that has a write(str) interface, where str will be a unicode string; sys.stdout is such an object, and results in the output being written to standard output.
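
Besides sys.stdout, an in-memory buffer is a convenient sink when you want the reconstructed text as a string. The sketch below uses the standard library's io.StringIO, which accepts unicode text through write(). Because CppPre is not imported here, the write() calls are made by hand as a stand-in for the calls that reconstruct() would make:

```python
import io

sink = io.StringIO()

# Stand-in for reconstruct(preproc, sink): write unicode text to the sink.
for piece in (u"int", u" ", u"x", u" = ", u"33", u";"):
    sink.write(piece)

reconstructed = sink.getvalue()
```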

Named preprocessors

The module cpppre.preprocessor_named contains a few functions that each create an instance of CppPreprocessor that has been set up to mimic the behaviour of a particular preprocessor. The function cppStandard2003() mimics the behaviour of a preprocessor that conforms completely to the C++ Standard 2003. The following fragment of Python code illustrates its use:

from cpppre.preprocessor_named import *
...
preproc = cppStandard2003('first.cpp')

The function microsoftVisualCpp2008() mimics the behaviour of the preprocessor stage of the Microsoft Visual C++ 2008 compiler. It is used in the same way as cppStandard2003().

The CppPreprocessor class in detail

The CppPreprocessor constructor

The constructor for CppPreprocessor, the class found in cpppre.preprocessor, takes a number of arguments:

def __init__(
        self,
        starting_source_file,
        system_core_dirs=None,
        system_user_dirs=None,
        predefined_defines=None,
        inclusion_callback=None,
        define_directive_callback=None
        ):

The starting_source_file parameter is the parameter we have used to date.

The system_core_dirs and system_user_dirs parameters can each be a list of absolute paths to folders. The paths in system_core_dirs are those that a compiler implicitly searches when resolving system includes (those delineated with angle brackets). For the Microsoft Visual C++ 2008 compiler, these are the folders that are specified as the VC++ Directory Include value in the Visual C++ IDE. The paths in system_user_dirs are the project-specific folders; these are usually passed as command-line arguments to a compiler.

The predefined_defines parameter can be a list of object-like macros that will be predefined before preprocessing begins. When using an actual compiler, you would set such defines on the compiler's command line. Each element of the list can be a string or a tuple of two strings. If an element is a string, it is either the identifier of a macro that will be replaced by the number 1, or the identifier of a special macro; special macros are described in the Special defines section. If an element is a tuple, its first element is the identifier of the macro and its second element is the replacement list for that macro, given as a string. That string will be parsed and the resulting tokens will form the replacement list for that define. An example list is the following:

['WIN32', ('GNUT', '(2 + 5)')]

It defines WIN32 as the number 1, and GNUT as the expression (2 + 5).
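
The rules for the elements of predefined_defines can be summarised in a few lines of Python. This is only a sketch of the documented rules, not CppPre's own handling; summarise_defines and SPECIAL_DEFINES are illustrative names, and the set of special names is taken from the Special defines section of this tutorial:

```python
# Special define names listed in the Special defines section.
SPECIAL_DEFINES = frozenset([
    '__COUNTER__', '__BASE_FILE__', '__INCLUDE_LEVEL__', '__TIMESTAMP__',
])


def summarise_defines(predefined_defines):
    """Return (name, replacement) pairs; special defines map to None."""
    pairs = []
    for element in predefined_defines:
        if isinstance(element, tuple):
            name, replacement = element  # raises ValueError unless two items
            pairs.append((name, replacement))
        elif element in SPECIAL_DEFINES:
            pairs.append((element, None))  # content is dynamic
        else:
            pairs.append((element, '1'))  # bare name: replaced by 1
    return pairs


summary = summarise_defines(['WIN32', ('GNUT', '(2 + 5)'), '__COUNTER__'])
```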

As described above, passing a tuple for a predefined define means that the replacement list string needs to be parsed before preprocessing begins; in fact, this parsing occurs in the CppPreprocessor instance. The problem here is that you may make a syntax error in the string that you pass. In that case, the CppPreprocessor constructor will throw a ConstructionError exception; that exception class is found in cpppre.preprocessor.

The inclusion_callback parameter can be a callable object that has the following signature:

callback(include_path, include_type)

This callback will be invoked every time an include directive is encountered when preprocessing. If the callback function returns True, the include directive will be processed normally and the content of that file included in the preprocessing. If the callback function returns False, the include directive will be ignored and the content of that file will not be included. This callback thus allows you to control exactly which files will or will not be included in a translation unit. The include_path passed to the callback is the absolute path to the file that may be included. The include_type parameter will be one of the two classes declared in the cpppre.includes_type module: Local and System. Local includes are those that, in an include directive, are delineated with double quotes (e.g. "this.h"), while System includes are those that are delineated with angle brackets.

An example of a suitable callable object to use as such a callback is the following function, which results in only non-system files being included:

from cpppre.includes_type import *
def myCallback(include_path, include_type):
  if include_type is System:
    return False
  return True
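
A callback can also decide by location rather than by include type. The following sketch accepts only files that live under a given project folder; make_project_filter and the example paths are hypothetical, and include_type is accepted but ignored:

```python
import os


def make_project_filter(project_root):
    """Build an inclusion callback that accepts only files under project_root."""
    root = os.path.normcase(os.path.abspath(project_root))

    def callback(include_path, include_type):
        path = os.path.normcase(os.path.abspath(include_path))
        # include_type is deliberately ignored: the decision is by location.
        return path.startswith(root + os.sep)

    return callback


only_project = make_project_filter(os.path.join(os.sep, 'home', 'me', 'project'))
```

The returned function has the callback(include_path, include_type) signature described above, so it could be passed as the inclusion_callback argument.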

The define_directive_callback parameter can be a callable object that has the following signature:

callback(list_of_tokens)

This callback will be invoked every time a define directive is encountered when preprocessing. The parameter passed to it is the list of preprocessing tokens that forms the define. It allows you to see this content and so make checks on it, but it does not allow you to influence the processing of the define; this callback should not return any value. Also, this callback should not alter the content of list_of_tokens in any way.
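
A minimal sketch of a callable that respects both constraints (no return value, no modification of the list) follows. DefineRecorder is a hypothetical name, and the list of strings passed to it below is only a stand-in: CppPre would pass token objects, whose exact content is not shown here:

```python
class DefineRecorder(object):
    """Callable that keeps a read-only snapshot of each define it is shown."""

    def __init__(self):
        self.defines = []

    def __call__(self, list_of_tokens):
        # Copy into a tuple so the caller's list is never modified.
        self.defines.append(tuple(list_of_tokens))
        # No return statement: the callback returns None.


recorder = DefineRecorder()
tokens = [u'GOAT', u'33']  # stand-in strings; CppPre would pass token objects
result = recorder(tokens)
```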

Finally, note that the functions in the module cpppre.preprocessor_named that create special instances of CppPreprocessor have the same set of parameters as does the CppPreprocessor constructor.

CppPreprocessor methods

The CppPreprocessor class contains a number of methods besides the constructor and next().

setErrorSink(self, sink)

By default, all error and warning messages are output to sys.stderr (the standard error output). You can use the setErrorSink() method to redirect those messages to a different location. The sink can be any object that implements the write(str) interface, or it can be None if you do not want to see the messages. See cpppre.error_handler for StringErrorSink, an example error sink that just stores all messages.
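
Any object with a write(str) method qualifies as an error sink. Here is a minimal sketch of a sink that stores messages, in the spirit of the StringErrorSink mentioned above. It is not that class's source; CollectingSink is a hypothetical name and the message text is made up:

```python
class CollectingSink(object):
    """Stores every message written to it instead of printing it."""

    def __init__(self):
        self.messages = []

    def write(self, text):
        self.messages.append(text)

    def text(self):
        """Return everything written so far as one unicode string."""
        return u''.join(self.messages)


sink = CollectingSink()
sink.write(u'warning: example message\n')  # a preprocessor would call this
```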

getErrorSink(self)

This method allows you to access the currently installed sink.

getErrorCounts(self)

This method returns the number of warnings and errors that have been encountered to date, as a tuple of the form (no. of warnings, no. of errors). Note that fatal errors are counted simply as errors.

setExtension(self, extension_id, new_value)

Although CppPre implements a preprocessor that conforms to the C++ Standard 2003, it also supports a few language extensions that relate to preprocessing. The extension_id can be one of the following identifiers found in the cpppre.extensions module:

dollar_sign_identifiers
integer_bitwidth_suffix
integer_double_l_suffix

For new_value, pass True to switch the specified extension on or pass False to turn it off. Note that all extensions are off by default.

getExtension(self)

This method returns the current value of the specified extension (so either True or False).

close(self)

This method is to be used when next() throws an exception other than StopIteration. It closes any open files.
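
One way to make sure close() is called in that situation is to wrap the token loop in try/except. The sketch below uses FailingSource, a hypothetical stand-in that fails on its second token, and assumes that the exceptions raised during iteration derive from Exception:

```python
class FailingSource(object):
    """Hypothetical stand-in: yields one token, then fails like a bad parse."""

    def __init__(self):
        self.closed = False
        self._count = 0

    def __iter__(self):
        return self

    def __next__(self):
        self._count += 1
        if self._count > 1:
            raise RuntimeError("simulated fatal parsing error")
        return u'int'

    next = __next__  # Python 2 spelling of the same method

    def close(self):
        self.closed = True


def collect_tokens(preproc):
    """Gather all tokens, calling close() if iteration fails part-way."""
    tokens = []
    try:
        for tok in preproc:
            tokens.append(tok)
    except Exception:
        preproc.close()  # release any files still held open
        raise
    return tokens


source = FailingSource()
try:
    collect_tokens(source)
    failed = False
except RuntimeError:
    failed = True
```

StopIteration never reaches the except clause, because the for loop consumes it as the normal end of iteration.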

parsedFiles(self)

This method returns a set of all the files that have been completely parsed at least once. You can call this method at any time, including after preprocessing has completed. Being a set, the returned file names will be in undefined order and will not contain duplicate names.

Special defines

There are a number of defines that can be used which are dynamic: their content can vary. The C++ Standard 2003 specifies the following, which are always defined automatically:

__TIME__
__DATE__
__LINE__
__FILE__

C++ compilers often implement other special defines. The following such defines have been implemented in CppPre:

__COUNTER__
__BASE_FILE__
__INCLUDE_LEVEL__
__TIMESTAMP__

It is very easy to include these special defines in a CppPreprocessor instance: simply pass the appropriate identifier as a string as part of the predefined_defines parameter to the CppPreprocessor constructor (or to one of the functions in the cpppre.preprocessor_named module).