+++++++++++++++++++++++++++++++++++++++++++ Building Hybrid Systems with Boost.Python +++++++++++++++++++++++++++++++++++++++++++ :Author: David Abrahams :Contact: dave@boost-consulting.com :organization: `Boost Consulting`_ :date: 2003-05-14 :Author: Ralf W. Grosse-Kunstleve :copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved .. contents:: Table of Contents .. _`Boost Consulting`: http://www.boost-consulting.com ========== Abstract ========== Boost.Python is an open source C++ library which provides a concise IDL-like interface for binding C++ classes and functions to Python. Leveraging the full power of C++ compile-time introspection and of recently developed metaprogramming techniques, this is achieved entirely in pure C++, without introducing a new syntax. Boost.Python's rich set of features and high-level interface make it possible to engineer packages from the ground up as hybrid systems, giving programmers easy and coherent access to both the efficient compile-time polymorphism of C++ and the extremely convenient run-time polymorphism of Python. ============== Introduction ============== Python and C++ are in many ways as different as two languages could be: while C++ is usually compiled to machine-code, Python is interpreted. Python's dynamic type system is often cited as the foundation of its flexibility, while in C++ static typing is the cornerstone of its efficiency. C++ has an intricate and difficult compile-time meta-language, while in Python, practically everything happens at runtime. Yet for many programmers, these very differences mean that Python and C++ complement one another perfectly. Performance bottlenecks in Python programs can be rewritten in C++ for maximal speed, and authors of powerful C++ libraries choose Python as a middleware language for its flexible system integration capabilities. Furthermore, the surface differences mask some strong similarities: * 'C'-family control structures (if, while, for...) * Support for object-orientation, functional programming, and generic programming (these are both *multi-paradigm* programming languages.) * Comprehensive operator overloading facilities, recognizing the importance of syntactic variability for readability and expressivity. * High-level concepts such as collections and iterators. * High-level encapsulation facilities (C++: namespaces, Python: modules) to support the design of re-usable libraries. * Exception-handling for effective management of error conditions. * C++ idioms in common use, such as handle/body classes and reference-counted smart pointers mirror Python reference semantics. Given Python's rich 'C' interoperability API, it should in principle be possible to expose C++ type and function interfaces to Python with an analogous interface to their C++ counterparts. However, the facilities provided by Python alone for integration with C++ are relatively meager. Compared to C++ and Python, 'C' has only very rudimentary abstraction facilities, and support for exception-handling is completely missing. 'C' extension module writers are required to manually manage Python reference counts, which is both annoyingly tedious and extremely error-prone. Traditional extension modules also tend to contain a great deal of boilerplate code repetition which makes them difficult to maintain, especially when wrapping an evolving API. These limitations have lead to the development of a variety of wrapping systems. SWIG_ is probably the most popular package for the integration of C/C++ and Python. A more recent development is SIP_, which was specifically designed for interfacing Python with the Qt_ graphical user interface library. Both SWIG and SIP introduce their own specialized languages for customizing inter-language bindings. This has certain advantages, but having to deal with three different languages (Python, C/C++ and the interface language) also introduces practical and mental difficulties. The CXX_ package demonstrates an interesting alternative. It shows that at least some parts of Python's 'C' API can be wrapped and presented through a much more user-friendly C++ interface. However, unlike SWIG and SIP, CXX does not include support for wrapping C++ classes as new Python types. The features and goals of Boost.Python_ overlap significantly with many of these other systems. That said, Boost.Python attempts to maximize convenience and flexibility without introducing a separate wrapping language. Instead, it presents the user with a high-level C++ interface for wrapping C++ classes and functions, managing much of the complexity behind-the-scenes with static metaprogramming. Boost.Python also goes beyond the scope of earlier systems by providing: * Support for C++ virtual functions that can be overridden in Python. * Comprehensive lifetime management facilities for low-level C++ pointers and references. * Support for organizing extensions as Python packages, with a central registry for inter-language type conversions. * A safe and convenient mechanism for tying into Python's powerful serialization engine (pickle). * Coherence with the rules for handling C++ lvalues and rvalues that can only come from a deep understanding of both the Python and C++ type systems. The key insight that sparked the development of Boost.Python is that much of the boilerplate code in traditional extension modules could be eliminated using C++ compile-time introspection. Each argument of a wrapped C++ function must be extracted from a Python object using a procedure that depends on the argument type. Similarly the function's return type determines how the return value will be converted from C++ to Python. Of course argument and return types are part of each function's type, and this is exactly the source from which Boost.Python deduces most of the information required. This approach leads to *user guided wrapping*: as much information is extracted directly from the source code to be wrapped as is possible within the framework of pure C++, and some additional information is supplied explicitly by the user. Mostly the guidance is mechanical and little real intervention is required. Because the interface specification is written in the same full-featured language as the code being exposed, the user has unprecedented power available when she does need to take control. .. _Python: http://www.python.org/ .. _SWIG: http://www.swig.org/ .. _SIP: http://www.riverbankcomputing.co.uk/sip/index.php .. _Qt: http://www.trolltech.com/ .. _CXX: http://cxx.sourceforge.net/ .. _Boost.Python: http://www.boost.org/libs/python/doc =========================== Boost.Python Design Goals =========================== The primary goal of Boost.Python is to allow users to expose C++ classes and functions to Python using nothing more than a C++ compiler. In broad strokes, the user experience should be one of directly manipulating C++ objects from Python. However, it's also important not to translate all interfaces *too* literally: the idioms of each language must be respected. For example, though C++ and Python both have an iterator concept, they are expressed very differently. Boost.Python has to be able to bridge the interface gap. It must be possible to insulate Python users from crashes resulting from trivial misuses of C++ interfaces, such as accessing already-deleted objects. By the same token the library should insulate C++ users from low-level Python 'C' API, replacing error-prone 'C' interfaces like manual reference-count management and raw ``PyObject`` pointers with more-robust alternatives. Support for component-based development is crucial, so that C++ types exposed in one extension module can be passed to functions exposed in another without loss of crucial information like C++ inheritance relationships. Finally, all wrapping must be *non-intrusive*, without modifying or even seeing the original C++ source code. Existing C++ libraries have to be wrappable by third parties who only have access to header files and binaries. ========================== Hello Boost.Python World ========================== And now for a preview of Boost.Python, and how it improves on the raw facilities offered by Python. Here's a function we might want to expose:: char const* greet(unsigned x) { static char const* const msgs[] = { "hello", "Boost.Python", "world!" }; if (x > 2) throw std::range_error("greet: index out of range"); return msgs[x]; } To wrap this function in standard C++ using the Python 'C' API, we'd need something like this:: extern "C" // all Python interactions use 'C' linkage and calling convention { // Wrapper to handle argument/result conversion and checking PyObject* greet_wrap(PyObject* args, PyObject * keywords) { int x; if (PyArg_ParseTuple(args, "i", &x)) // extract/check arguments { char const* result = greet(x); // invoke wrapped function return PyString_FromString(result); // convert result to Python } return 0; // error occurred } // Table of wrapped functions to be exposed by the module static PyMethodDef methods[] = { { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" } , { NULL, NULL, 0, NULL } // sentinel }; // module initialization function DL_EXPORT init_hello() { (void) Py_InitModule("hello", methods); // add the methods to the module } } Now here's the wrapping code we'd use to expose it with Boost.Python:: #include using namespace boost::python; BOOST_PYTHON_MODULE(hello) { def("greet", greet, "return one of 3 parts of a greeting"); } and here it is in action:: >>> import hello >>> for x in range(3): ... print hello.greet(x) ... hello Boost.Python world! Aside from the fact that the 'C' API version is much more verbose, it's worth noting a few things that it doesn't handle correctly: * The original function accepts an unsigned integer, and the Python 'C' API only gives us a way of extracting signed integers. The Boost.Python version will raise a Python exception if we try to pass a negative number to ``hello.greet``, but the other one will proceed to do whatever the C++ implementation does when converting an negative integer to unsigned (usually wrapping to some very large number), and pass the incorrect translation on to the wrapped function. * That brings us to the second problem: if the C++ ``greet()`` function is called with a number greater than 2, it will throw an exception. Typically, if a C++ exception propagates across the boundary with code generated by a 'C' compiler, it will cause a crash. As you can see in the first version, there's no C++ scaffolding there to prevent this from happening. Functions wrapped by Boost.Python automatically include an exception-handling layer which protects Python users by translating unhandled C++ exceptions into a corresponding Python exception. * A slightly more-subtle limitation is that the argument conversion used in the Python 'C' API case can only get that integer ``x`` in *one way*. PyArg_ParseTuple can't convert Python ``long`` objects (arbitrary-precision integers) which happen to fit in an ``unsigned int`` but not in a ``signed long``, nor will it ever handle a wrapped C++ class with a user-defined implicit ``operator unsigned int()`` conversion. Boost.Python's dynamic type conversion registry allows users to add arbitrary conversion methods. ================== Library Overview ================== This section outlines some of the library's major features. Except as neccessary to avoid confusion, details of library implementation are omitted. ------------------ Exposing Classes ------------------ C++ classes and structs are exposed with a similarly-terse interface. Given:: struct World { void set(std::string msg) { this->msg = msg; } std::string greet() { return msg; } std::string msg; }; The following code will expose it in our extension module:: #include BOOST_PYTHON_MODULE(hello) { class_("World") .def("greet", &World::greet) .def("set", &World::set) ; } Although this code has a certain pythonic familiarity, people sometimes find the syntax bit confusing because it doesn't look like most of the C++ code they're used to. All the same, this is just standard C++. Because of their flexible syntax and operator overloading, C++ and Python are great for defining domain-specific (sub)languages (DSLs), and that's what we've done in Boost.Python. To break it down:: class_("World") constructs an unnamed object of type ``class_`` and passes ``"World"`` to its constructor. This creates a new-style Python class called ``World`` in the extension module, and associates it with the C++ type ``World`` in the Boost.Python type conversion registry. We might have also written:: class_ w("World"); but that would've been more verbose, since we'd have to name ``w`` again to invoke its ``def()`` member function:: w.def("greet", &World::greet) There's nothing special about the location of the dot for member access in the original example: C++ allows any amount of whitespace on either side of a token, and placing the dot at the beginning of each line allows us to chain as many successive calls to member functions as we like with a uniform syntax. The other key fact that allows chaining is that ``class_<>`` member functions all return a reference to ``*this``. So the example is equivalent to:: class_ w("World"); w.def("greet", &World::greet); w.def("set", &World::set); It's occasionally useful to be able to break down the components of a Boost.Python class wrapper in this way, but the rest of this article will stick to the terse syntax. For completeness, here's the wrapped class in use: :: >>> import hello >>> planet = hello.World() >>> planet.set('howdy') >>> planet.greet() 'howdy' Constructors ============ Since our ``World`` class is just a plain ``struct``, it has an implicit no-argument (nullary) constructor. Boost.Python exposes the nullary constructor by default, which is why we were able to write: :: >>> planet = hello.World() However, well-designed classes in any language may require constructor arguments in order to establish their invariants. Unlike Python, where ``__init__`` is just a specially-named method, In C++ constructors cannot be handled like ordinary member functions. In particular, we can't take their address: ``&World::World`` is an error. The library provides a different interface for specifying constructors. Given:: struct World { World(std::string msg); // added constructor ... we can modify our wrapping code as follows:: class_("World", init()) ... of course, a C++ class may have additional constructors, and we can expose those as well by passing more instances of ``init<...>`` to ``def()``:: class_("World", init()) .def(init()) ... Boost.Python allows wrapped functions, member functions, and constructors to be overloaded to mirror C++ overloading. Data Members and Properties =========================== Any publicly-accessible data members in a C++ class can be easily exposed as either ``readonly`` or ``readwrite`` attributes:: class_("World", init()) .def_readonly("msg", &World::msg) ... and can be used directly in Python: :: >>> planet = hello.World('howdy') >>> planet.msg 'howdy' This does *not* result in adding attributes to the ``World`` instance ``__dict__``, which can result in substantial memory savings when wrapping large data structures. In fact, no instance ``__dict__`` will be created at all unless attributes are explicitly added from Python. Boost.Python owes this capability to the new Python 2.2 type system, in particular the descriptor interface and ``property`` type. In C++, publicly-accessible data members are considered a sign of poor design because they break encapsulation, and style guides usually dictate the use of "getter" and "setter" functions instead. In Python, however, ``__getattr__``, ``__setattr__``, and since 2.2, ``property`` mean that attribute access is just one more well-encapsulated syntactic tool at the programmer's disposal. Boost.Python bridges this idiomatic gap by making Python ``property`` creation directly available to users. If ``msg`` were private, we could still expose it as attribute in Python as follows:: class_("World", init()) .add_property("msg", &World::greet, &World::set) ... The example above mirrors the familiar usage of properties in Python 2.2+: :: >>> class World(object): ... __init__(self, msg): ... self.__msg = msg ... def greet(self): ... return self.__msg ... def set(self, msg): ... self.__msg = msg ... msg = property(greet, set) Operator Overloading ==================== The ability to write arithmetic operators for user-defined types has been a major factor in the success of both languages for numerical computation, and the success of packages like NumPy_ attests to the power of exposing operators in extension modules. Boost.Python provides a concise mechanism for wrapping operator overloads. The example below shows a fragment from a wrapper for the Boost rational number library:: class_ >("rational_int") .def(init()) // constructor, e.g. rational_int(3,4) .def("numerator", &rational::numerator) .def("denominator", &rational::denominator) .def(-self) // __neg__ (unary minus) .def(self + self) // __add__ (homogeneous) .def(self * self) // __mul__ .def(self + int()) // __add__ (heterogenous) .def(int() + self) // __radd__ ... The magic is performed using a simplified application of "expression templates" [VELD1995]_, a technique originally developed for optimization of high-performance matrix algebra expressions. The essence is that instead of performing the computation immediately, operators are overloaded to construct a type *representing* the computation. In matrix algebra, dramatic optimizations are often available when the structure of an entire expression can be taken into account, rather than evaluating each operation "greedily". Boost.Python uses the same technique to build an appropriate Python method object based on expressions involving ``self``. .. _NumPy: http://www.pfdubois.com/numpy/ Inheritance =========== C++ inheritance relationships can be represented to Boost.Python by adding an optional ``bases<...>`` argument to the ``class_<...>`` template parameter list as follows:: class_ >("Derived") ... This has two effects: 1. When the ``class_<...>`` is created, Python type objects corresponding to ``Base1`` and ``Base2`` are looked up in Boost.Python's registry, and are used as bases for the new Python ``Derived`` type object, so methods exposed for the Python ``Base1`` and ``Base2`` types are automatically members of the ``Derived`` type. Because the registry is global, this works correctly even if ``Derived`` is exposed in a different module from either of its bases. 2. C++ conversions from ``Derived`` to its bases are added to the Boost.Python registry. Thus wrapped C++ methods expecting (a pointer or reference to) an object of either base type can be called with an object wrapping a ``Derived`` instance. Wrapped member functions of class ``T`` are treated as though they have an implicit first argument of ``T&``, so these conversions are neccessary to allow the base class methods to be called for derived objects. Of course it's possible to derive new Python classes from wrapped C++ class instances. Because Boost.Python uses the new-style class system, that works very much as for the Python built-in types. There is one significant detail in which it differs: the built-in types generally establish their invariants in their ``__new__`` function, so that derived classes do not need to call ``__init__`` on the base class before invoking its methods : :: >>> class L(list): ... def __init__(self): ... pass ... >>> L().reverse() >>> Because C++ object construction is a one-step operation, C++ instance data cannot be constructed until the arguments are available, in the ``__init__`` function: :: >>> class D(SomeBoostPythonClass): ... def __init__(self): ... pass ... >>> D().some_boost_python_method() Traceback (most recent call last): File "", line 1, in ? TypeError: bad argument type for built-in operation This happened because Boost.Python couldn't find instance data of type ``SomeBoostPythonClass`` within the ``D`` instance; ``D``'s ``__init__`` function masked construction of the base class. It could be corrected by either removing ``D``'s ``__init__`` function or having it call ``SomeBoostPythonClass.__init__(...)`` explicitly. Virtual Functions ================= Deriving new types in Python from extension classes is not very interesting unless they can be used polymorphically from C++. In other words, Python method implementations should appear to override the implementation of C++ virtual functions when called *through base class pointers/references from C++*. Since the only way to alter the behavior of a virtual function is to override it in a derived class, the user must build a special derived class to dispatch a polymorphic class' virtual functions:: // // interface to wrap: // class Base { public: virtual int f(std::string x) { return 42; } virtual ~Base(); }; int calls_f(Base const& b, std::string x) { return b.f(x); } // // Wrapping Code // // Dispatcher class struct BaseWrap : Base { // Store a pointer to the Python object BaseWrap(PyObject* self_) : self(self_) {} PyObject* self; // Default implementation, for when f is not overridden int f_default(std::string x) { return this->Base::f(x); } // Dispatch implementation int f(std::string x) { return call_method(self, "f", x); } }; ... def("calls_f", calls_f); class_("Base") .def("f", &Base::f, &BaseWrap::f_default) ; Now here's some Python code which demonstrates: :: >>> class Derived(Base): ... def f(self, s): ... return len(s) ... >>> calls_f(Base(), 'foo') 42 >>> calls_f(Derived(), 'forty-two') 9 Things to notice about the dispatcher class: * The key element which allows overriding in Python is the ``call_method`` invocation, which uses the same global type conversion registry as the C++ function wrapping does to convert its arguments from C++ to Python and its return type from Python to C++. * Any constructor signatures you wish to wrap must be replicated with an initial ``PyObject*`` argument * The dispatcher must store this argument so that it can be used to invoke ``call_method`` * The ``f_default`` member function is needed when the function being exposed is not pure virtual; there's no other way ``Base::f`` can be called on an object of type ``BaseWrap``, since it overrides ``f``. Deeper Reflection on the Horizon? ================================= Admittedly, this formula is tedious to repeat, especially on a project with many polymorphic classes. That it is neccessary reflects some limitations in C++'s compile-time introspection capabilities: there's no way to enumerate the members of a class and find out which are virtual functions. At least one very promising project has been started to write a front-end which can generate these dispatchers (and other wrapping code) automatically from C++ headers. Pyste_ is being developed by Bruno da Silva de Oliveira. It builds on GCC_XML_, which generates an XML version of GCC's internal program representation. Since GCC is a highly-conformant C++ compiler, this ensures correct handling of the most-sophisticated template code and full access to the underlying type system. In keeping with the Boost.Python philosophy, a Pyste interface description is neither intrusive on the code being wrapped, nor expressed in some unfamiliar language: instead it is a 100% pure Python script. If Pyste is successful it will mark a move away from wrapping everything directly in C++ for many of our users. It will also allow us the choice to shift some of the metaprogram code from C++ to Python. We expect that soon, not only our users but the Boost.Python developers themselves will be "thinking hybrid" about their own code. .. _`GCC_XML`: http://www.gccxml.org/HTML/Index.html .. _`Pyste`: http://www.boost.org/libs/python/pyste --------------- Serialization --------------- *Serialization* is the process of converting objects in memory to a form that can be stored on disk or sent over a network connection. The serialized object (most often a plain string) can be retrieved and converted back to the original object. A good serialization system will automatically convert entire object hierarchies. Python's standard ``pickle`` module is just such a system. It leverages the language's strong runtime introspection facilities for serializing practically arbitrary user-defined objects. With a few simple and unintrusive provisions this powerful machinery can be extended to also work for wrapped C++ objects. Here is an example:: #include struct World { World(std::string a_msg) : msg(a_msg) {} std::string greet() const { return msg; } std::string msg; }; #include using namespace boost::python; struct World_picklers : pickle_suite { static tuple getinitargs(World const& w) { return make_tuple(w.greet()); } }; BOOST_PYTHON_MODULE(hello) { class_("World", init()) .def("greet", &World::greet) .def_pickle(World_picklers()) ; } Now let's create a ``World`` object and put it to rest on disk:: >>> import hello >>> import pickle >>> a_world = hello.World("howdy") >>> pickle.dump(a_world, open("my_world", "w")) In a potentially *different script* on a potentially *different computer* with a potentially *different operating system*:: >>> import pickle >>> resurrected_world = pickle.load(open("my_world", "r")) >>> resurrected_world.greet() 'howdy' Of course the ``cPickle`` module can also be used for faster processing. Boost.Python's ``pickle_suite`` fully supports the ``pickle`` protocol defined in the standard Python documentation. Like a __getinitargs__ function in Python, the pickle_suite's getinitargs() is responsible for creating the argument tuple that will be use to reconstruct the pickled object. The other elements of the Python pickling protocol, __getstate__ and __setstate__ can be optionally provided via C++ getstate and setstate functions. C++'s static type system allows the library to ensure at compile-time that nonsensical combinations of functions (e.g. getstate without setstate) are not used. Enabling serialization of more complex C++ objects requires a little more work than is shown in the example above. Fortunately the ``object`` interface (see next section) greatly helps in keeping the code manageable. ------------------ Object interface ------------------ Experienced 'C' language extension module authors will be familiar with the ubiquitous ``PyObject*``, manual reference-counting, and the need to remember which API calls return "new" (owned) references or "borrowed" (raw) references. These constraints are not just cumbersome but also a major source of errors, especially in the presence of exceptions. Boost.Python provides a class ``object`` which automates reference counting and provides conversion to Python from C++ objects of arbitrary type. This significantly reduces the learning effort for prospective extension module writers. Creating an ``object`` from any other type is extremely simple:: object s("hello, world"); // s manages a Python string ``object`` has templated interactions with all other types, with automatic to-python conversions. It happens so naturally that it's easily overlooked:: object ten_Os = 10 * s[4]; // -> "oooooooooo" In the example above, ``4`` and ``10`` are converted to Python objects before the indexing and multiplication operations are invoked. The ``extract`` class template can be used to convert Python objects to C++ types:: double x = extract(o); If a conversion in either direction cannot be performed, an appropriate exception is thrown at runtime. The ``object`` type is accompanied by a set of derived types that mirror the Python built-in types such as ``list``, ``dict``, ``tuple``, etc. as much as possible. This enables convenient manipulation of these high-level types from C++:: dict d; d["some"] = "thing"; d["lucky_number"] = 13; list l = d.keys(); This almost looks and works like regular Python code, but it is pure C++. Of course we can wrap C++ functions which accept or return ``object`` instances. ================= Thinking hybrid ================= Because of the practical and mental difficulties of combining programming languages, it is common to settle a single language at the outset of any development effort. For many applications, performance considerations dictate the use of a compiled language for the core algorithms. Unfortunately, due to the complexity of the static type system, the price we pay for runtime performance is often a significant increase in development time. Experience shows that writing maintainable C++ code usually takes longer and requires *far* more hard-earned working experience than developing comparable Python code. Even when developers are comfortable working exclusively in compiled languages, they often augment their systems by some type of ad hoc scripting layer for the benefit of their users without ever availing themselves of the same advantages. Boost.Python enables us to *think hybrid*. Python can be used for rapidly prototyping a new application; its ease of use and the large pool of standard libraries give us a head start on the way to a working system. If necessary, the working code can be used to discover rate-limiting hotspots. To maximize performance these can be reimplemented in C++, together with the Boost.Python bindings needed to tie them back into the existing higher-level procedure. Of course, this *top-down* approach is less attractive if it is clear from the start that many algorithms will eventually have to be implemented in C++. Fortunately Boost.Python also enables us to pursue a *bottom-up* approach. We have used this approach very successfully in the development of a toolbox for scientific applications. The toolbox started out mainly as a library of C++ classes with Boost.Python bindings, and for a while the growth was mainly concentrated on the C++ parts. However, as the toolbox is becoming more complete, more and more newly added functionality can be implemented in Python. .. image:: images/python_cpp_mix.png This figure shows the estimated ratio of newly added C++ and Python code over time as new algorithms are implemented. We expect this ratio to level out near 70% Python. Being able to solve new problems mostly in Python rather than a more difficult statically typed language is the return on our investment in Boost.Python. The ability to access all of our code from Python allows a broader group of developers to use it in the rapid development of new applications. ===================== Development history ===================== The first version of Boost.Python was developed in 2000 by Dave Abrahams at Dragon Systems, where he was privileged to have Tim Peters as a guide to "The Zen of Python". One of Dave's jobs was to develop a Python-based natural language processing system. Since it was eventually going to be targeting embedded hardware, it was always assumed that the compute-intensive core would be rewritten in C++ to optimize speed and memory footprint [#proto]_. The project also wanted to test all of its C++ code using Python test scripts [#test]_. The only tool we knew of for binding C++ and Python was SWIG_, and at the time its handling of C++ was weak. It would be false to claim any deep insight into the possible advantages of Boost.Python's approach at this point. Dave's interest and expertise in fancy C++ template tricks had just reached the point where he could do some real damage, and Boost.Python emerged as it did because it filled a need and because it seemed like a cool thing to try. This early version was aimed at many of the same basic goals we've described in this paper, differing most-noticeably by having a slightly more cumbersome syntax and by lack of special support for operator overloading, pickling, and component-based development. These last three features were quickly added by Ullrich Koethe and Ralf Grosse-Kunstleve [#feature]_, and other enthusiastic contributors arrived on the scene to contribute enhancements like support for nested modules and static member functions. By early 2001 development had stabilized and few new features were being added, however a disturbing new fact came to light: Ralf had begun testing Boost.Python on pre-release versions of a compiler using the EDG_ front-end, and the mechanism at the core of Boost.Python responsible for handling conversions between Python and C++ types was failing to compile. As it turned out, we had been exploiting a very common bug in the implementation of all the C++ compilers we had tested. We knew that as C++ compilers rapidly became more standards-compliant, the library would begin failing on more platforms. Unfortunately, because the mechanism was so central to the functioning of the library, fixing the problem looked very difficult. Fortunately, later that year Lawrence Berkeley and later Lawrence Livermore National labs contracted with `Boost Consulting`_ for support and development of Boost.Python, and there was a new opportunity to address fundamental issues and ensure a future for the library. A redesign effort began with the low level type conversion architecture, building in standards-compliance and support for component-based development (in contrast to version 1 where conversions had to be explicitly imported and exported across module boundaries). A new analysis of the relationship between the Python and C++ objects was done, resulting in more intuitive handling for C++ lvalues and rvalues. The emergence of a powerful new type system in Python 2.2 made the choice of whether to maintain compatibility with Python 1.5.2 easy: the opportunity to throw away a great deal of elaborate code for emulating classic Python classes alone was too good to pass up. In addition, Python iterators and descriptors provided crucial and elegant tools for representing similar C++ constructs. The development of the generalized ``object`` interface allowed us to further shield C++ programmers from the dangers and syntactic burdens of the Python 'C' API. A great number of other features including C++ exception translation, improved support for overloaded functions, and most significantly, CallPolicies for handling pointers and references, were added during this period. In October 2002, version 2 of Boost.Python was released. Development since then has concentrated on improved support for C++ runtime polymorphism and smart pointers. Peter Dimov's ingenious ``boost::shared_ptr`` design in particular has allowed us to give the hybrid developer a consistent interface for moving objects back and forth across the language barrier without loss of information. At first, we were concerned that the sophistication and complexity of the Boost.Python v2 implementation might discourage contributors, but the emergence of Pyste_ and several other significant feature contributions have laid those fears to rest. Daily questions on the Python C++-sig and a backlog of desired improvements show that the library is getting used. To us, the future looks bright. .. _`EDG`: http://www.edg.com ============= Conclusions ============= Boost.Python achieves seamless interoperability between two rich and complimentary language environments. Because it leverages template metaprogramming to introspect about types and functions, the user never has to learn a third syntax: the interface definitions are written in concise and maintainable C++. Also, the wrapping system doesn't have to parse C++ headers or represent the type system: the compiler does that work for us. Computationally intensive tasks play to the strengths of C++ and are often impossible to implement efficiently in pure Python, while jobs like serialization that are trivial in Python can be very difficult in pure C++. Given the luxury of building a hybrid software system from the ground up, we can approach design with new confidence and power. =========== Citations =========== .. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report, Vol. 7 No. 5 June 1995, pp. 26-31. http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html =========== Footnotes =========== .. [#proto] In retrospect, it seems that "thinking hybrid" from the ground up might have been better for the NLP system: the natural component boundaries defined by the pure python prototype turned out to be inappropriate for getting the desired performance and memory footprint out of the C++ core, which eventually caused some redesign overhead on the Python side when the core was moved to C++. .. [#test] We also have some reservations about driving all C++ testing through a Python interface, unless that's the only way it will be ultimately used. Any transition across language boundaries with such different object models can inevitably mask bugs. .. [#feature] These features were expressed very differently in v1 of Boost.Python