123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172 |
- [/
- / Copyright (c) 2008 Eric Niebler
- /
- / Distributed under the Boost Software License, Version 1.0. (See accompanying
- / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- /]
- [section Cycle collection with [^tracking_ptr<>]]
- In xpressive, regex objects can refer to each other and themselves by value or by reference.
- In addition, they ref-count their referenced regexes to keep them alive. This creates the possibility
- for cyclic reference counts, and raises the possibility of memory leaks. xpressive avoids leaks
- by using a type called `tracking_ptr<>`. This doc describes at a high level how `tracking_ptr<>`
- works.
- [h2 Constraints]
- Our solution must meet the following design constraints:
- * No dangling references: All objects referred to directly or indirectly must be kept alive
- as long as the references are needed.
- * No leaks: all objects must be freed eventually.
- * No user intervention: The solution must not require users to explicitly invoke some cycle
- collection routine.
- * Clean-up is no-throw: The collection phase will likely be called from a destructor, so it
- must never throw an exception under any circumstance.
-
- [h2 Handle-Body Idiom]
- To use `tracking_ptr<>`, you must separate your type into a handle and a body. In the case of
- xpressive, the handle type is called `basic_regex<>` and the body is called `regex_impl<>`. The
- handle will store a `tracking_ptr<>` to the body.
- The body type must inherit from `enable_reference_tracking<>`. This gives the body the bookkeeping
- data structures that `tracking_ptr<>` will use. In particular, it gives the body:
- # `std::set<shared_ptr<body> > refs_` : collection of bodies to which this body refers, and
- # `std::set<weak_ptr<body> > deps_` : collection of bodies which refer to this body.
- [h2 References and Dependencies]
- We refer to (1) above as the "references" and (2) as the "dependencies". It is crucial to the
- understanding of `tracking_ptr<>` to recognize that the set of references includes both those
- objects that are referred to directly as well as those that are referred to indirectly (that
- is, through another reference). The same is true for the set of dependencies. In other words,
- each body holds a ref-count directly to every other body that it needs.
- Why is this important? Because it means that when a body no longer has a handle referring
- to it, all its references can be released immediately without fear of creating dangling references.
- References and dependencies cross-pollinate. Here's how it works:
- # When one object acquires another as a reference, the second object acquires the first as
- a dependency.
- # In addition, the first object acquires all of the second object's references, and the second
- object acquires all of the first object's dependencies.
- # When an object picks up a new reference, the reference is also added to all dependent objects.
- # When an object picks up a new dependency, the dependency is also added to all referenced
- objects.
- # An object is never allowed to have itself as a dependency. Objects may have themselves as
- references, and often do.
- Consider the following code:
- sregex expr;
- {
- sregex group = '(' >> by_ref(expr) >> ')'; // (1)
- sregex fact = +_d | group; // (2)
- sregex term = fact >> *(('*' >> fact) | ('/' >> fact)); // (3)
- expr = term >> *(('+' >> term) | ('-' >> term)); // (4)
- } // (5)
- Here is how the references and dependencies propagate, line by line:
- [table
- [[Expression][Effects]]
- [[1) `sregex group = '(' >> by_ref(expr) >> ')';`]
- [[^group: cnt=1 refs={expr} deps={}\n
- expr: cnt=2 refs={} deps={group}]]]
- [[2) `sregex fact = +_d | group;`]
- [[^group: cnt=2 refs={expr} deps={fact}\n
- expr: cnt=3 refs={} deps={group,fact}\n
- fact: cnt=1 refs={expr,group} deps={}]]]
- [[3) `sregex term = fact >> *(('*' >> fact) | ('/' >> fact));`]
- [[^group: cnt=3 refs={expr} deps={fact,term}\n
- expr: cnt=4 refs={} deps={group,fact,term}\n
- fact: cnt=2 refs={expr,group} deps={term}\n
- term: cnt=1 refs={expr,group,fact} deps={}]]]
- [[4) `expr = term >> *(('+' >> term) | ('-' >> term));`]
- [[^group: cnt=5 refs={expr,group,fact,term} deps={expr,fact,term}\n
- expr: cnt=5 refs={expr,group,fact,term} deps={group,fact,term}\n
- fact: cnt=5 refs={expr,group,fact,term} deps={expr,group,term}\n
- term: cnt=5 refs={expr,group,fact,term} deps={expr,group,fact}]]]
- [[5) `}`]
- [[^expr: cnt=2 refs={expr,group,fact,term} deps={group,fact,term}]]]
- ]
- This shows how references and dependencies propagate when creating cycles of objects. After
- line (4), which closes the cycle, every object has a ref-count on every other object, even
- to itself. So how does this not leak? Read on.
- [h2 Cycle Breaking]
- Now that the bodies have their sets of references and dependencies, the hard part is done.
- All that remains is to decide when and where to break the cycle. That is the job of `tracking_ptr<>`,
- which is part of the handle. The `tracking_ptr<>` holds 2 `shared_ptr`s. The first, obviously,
- is the `shared_ptr<body>` -- the reference to the body to which this handle refers. The other
- `shared_ptr` is used to break the cycle. It ensures that when all the handles to a body go out
- of scope, the body's set of references is cleared.
- This suggests that more than one handle can refer to a body. In fact, `tracking_ptr<>` gives
- you copy-on-write semantics -- when you copy a handle, the body is shared. That makes copies
- very efficient. Eventually, all the handles to a particular body go out of scope. When that
- happens, the ref count to the body might still be greater than 0 because some other body (or
- this body itself!) might be holding a reference to it. However, we are certain that the cycle-breaker's
- ref-count goes to 0 because the cycle-breaker only lives in handles. No more handles, no more
- cycle-breakers.
- What does the cycle-breaker do? Recall that the body has a set of references of type
- `std::set<shared_ptr<body> >`. Let's call this type "references_type". The cycle-breaker is a
- `shared_ptr<references_type>`. It uses a custom deleter, which is defined as follows:
- template<typename DerivedT>
- struct reference_deleter
- {
- void operator ()(std::set<shared_ptr<DerivedT> > *refs) const
- {
- refs->clear();
- }
- };
- The job of to the cycle breaker is to ensure that when the last handle to a body goes away,
- the body's set of references is cleared. That's it.
- We can clearly see how this guarantees that all bodies are cleaned up eventually. Once every
- handle has gone out of scope, all the bodies' sets of references will be cleared, leaving none
- with a non-zero ref-count. No leaks, guaranteed.
- It's a bit harder to see how this guarantees no dangling references. Imagine that there are 3
- bodies: A, B and C. A refers to B which refers to C. Now all the handles to B go out of scope,
- so B's set of references is cleared. Doesn't this mean that C gets deleted, even though it
- is being used (indirectly) by A? It doesn't. This situation can never occur because we propagated
- the references and dependencies above such that A will be holding a reference directly to C
- in addition to B. When B's set of references is cleared, no bodies get deleted, because they
- are all still in use by A.
- [h2 Future Work]
- All these `std::set`s and `shared_ptr`s and `weak_ptr`s! Very inefficient. I used them because
- they were handy. I could probably do better.
- Also, some objects stick around longer than they need to. Consider:
- sregex b;
- {
- sregex a = _;
- b = by_ref(a);
- b = _;
- }
- // a is still alive here!
- Due to the way references and dependencies are propagated, the `std::set` of references can only
- grow. It never shrinks, even when some references are no longer needed. For xpressive this
- isn't an issue. The graphs of referential objects generally stay small and isolated. If someone
- were to try to use `tracking_ptr<>` as a general ref-count-cycle-collection mechanism, this problem
- would have to be addressed.
- [endsect]
|