123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354 |
- [/
- / Copyright (c) 2008 Marcin Kalicinski (kalita <at> poczta dot onet dot pl)
- / Copyright (c) 2009 Sebastian Redl (sebastian dot redl <at> getdesigned dot at)
- /
- / Distributed under the Boost Software License, Version 1.0. (See accompanying
- / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- /]
- [section XML Parser]
- [def __xml__ [@http://en.wikipedia.org/wiki/XML XML format]]
- [def __xml_parser.hpp__ [headerref boost/property_tree/xml_parser.hpp xml_parser.hpp]]
- [def __RapidXML__ [@http://rapidxml.sourceforge.net/ RapidXML]]
- [def __boost__ [@http://www.boost.org Boost]]
- The __xml__ is an industry standard for storing information in textual
- form. Unfortunately, there is no XML parser in __boost__ as of the
- time of this writing. The library therefore contains the fast and tiny
- __RapidXML__ parser (currently in version 1.13) to provide XML parsing support.
- RapidXML does not fully support the XML standard; it is not capable of parsing
- DTDs and therefore cannot do full entity substitution.
- By default, the parser will preserve most whitespace, but remove element content
- that consists only of whitespace. Encoded whitespaces (e.g.  ) does not
- count as whitespace in this regard. You can pass the trim_whitespace flag if you
- want all leading and trailing whitespace trimmed and all continuous whitespace
- collapsed into a single space.
- Please note that RapidXML does not understand the encoding specification. If
- you pass it a character buffer, it assumes the data is already correctly
- encoded; if you pass it a filename, it will read the file using the character
- conversion of the locale you give it (or the global locale if you give it none).
- This means that, in order to parse a UTF-8-encoded XML file into a wptree, you
- have to supply an alternate locale, either directly or by replacing the global
- one.
- XML / property tree conversion schema (__read_xml__ and __write_xml__):
- * Each XML element corresponds to a property tree node. The child elements
- correspond to the children of the node.
- * The attributes of an XML element are stored in the subkey [^<xmlattr>]. There
- is one child node per attribute in the attribute node. Existence of the
- [^<xmlattr>] node is not guaranteed or necessary when there are no attributes.
- * XML comments are stored in nodes named [^<xmlcomment>], unless comment
- ignoring is enabled via the flags.
- * Text content is stored in one of two ways, depending on the flags. The default
- way concatenates all text nodes and stores them as the data of the element
- node. This way, the entire content can be conveniently read, but the
- relative ordering of text and child elements is lost. The other way stores
- each text content as a separate node, all called [^<xmltext>].
- The XML storage encoding does not round-trip perfectly. A read-write cycle loses
- trimmed whitespace, low-level formatting information, and the distinction
- between normal data and CDATA nodes. Comments are only preserved when enabled.
- A write-read cycle loses trimmed whitespace; that is, if the origin tree has
- string data that starts or ends with whitespace, that whitespace is lost.
- [endsect] [/xml_parser]
|