  1. <html>
  2. <head>
  3. <meta http-equiv="Content-Language" content="en-us">
  4. <meta name="GENERATOR" content="Microsoft FrontPage 5.0">
  5. <meta name="ProgId" content="FrontPage.Editor.Document">
  6. <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  7. <title>Boost Filesystem Library Design</title>
  8. <link href="styles.css" rel="stylesheet">
  9. </head>
  10. <body bgcolor="#FFFFFF">
  11. <h1>
  12. <img border="0" src="../../../boost.png" align="center" width="277" height="86">Filesystem
  13. Library Design</h1>
  14. <p><a href="#Introduction">Introduction</a><br>
  15. <a href="#Requirements">Requirements</a><br>
  16. <a href="#Realities">Realities</a><br>
  17. <a href="#Rationale">Rationale</a><br>
  18. <a href="#Abandoned_Designs">Abandoned Designs</a><br>
  19. <a href="#References">References</a></p>
  20. <h2><a name="Introduction">Introduction</a></h2>
  21. <p>The primary motivation for beginning work on the Filesystem Library was
  22. frustration with Boost administrative tools.&nbsp; Scripts were written in
  23. Python, Perl, Bash, and Windows command languages.&nbsp; There was no single
  24. scripting language familiar and acceptable to all Boost administrators. Yet they
  25. were all skilled C++ programmers - why couldn't C++ be used as the scripting
  26. language?</p>
  27. <p>The key feature C++ lacked for script-like applications was the ability to
  28. perform portable filesystem operations on directories and their contents. The
  29. Filesystem Library was developed to fill that void.</p>
  30. <p>The intent is not to compete with traditional scripting languages, but to
  31. provide a solution for situations where C++ is already the language
  32. of choice.</p>
  33. <h2><a name="Requirements">Requirements</a></h2>
  34. <ul>
  35. <li>Be able to write portable script-style filesystem operations in modern
  36. C++.<br>
  37. <br>
  38. Rationale: This is a common programming need. It is both an
  39. embarrassment and a hardship that this is not possible with either the current
  40. C++ or Boost libraries.&nbsp; The need is particularly acute
  41. when C++ is the only toolset allowed in the tool chain.&nbsp; File system
  42. operations are provided by many languages&nbsp;used on multiple platforms,
  43. such as Perl and Python, as well as by many platform specific scripting
  44. languages. All operating systems provide some form of API for filesystem
  45. operations, and the POSIX bindings are increasingly available even on
  46. operating systems not normally associated with POSIX, such as the Mac, z/OS,
  47. or OS/390.<br>
  48. &nbsp;</li>
  49. <li>Work within the <a href="#Realities">realities</a> described below.<br>
  50. <br>
  51. Rationale: This isn't a research project. The need is for something that works on
  52. today's platforms, including some of the embedded operating systems
  53. with limited file systems. Because of the emphasis on portability, such a
  54. library would be much more useful if standardized. That means being able to
  55. work with a much wider range of platforms than just Unix or Windows and their
  56. clones.<br>
  57. &nbsp;</li>
  58. <li>Avoid dangerous programming practices. In particular, avoid all-too-easy-to-ignore error notifications
  59. and the use of global variables.&nbsp;If a dangerous feature is provided, identify it as such.<br>
  60. <br>
  61. Rationale: Normally this would be covered by &quot;the usual Boost requirements...&quot;,
  62. but it is mentioned explicitly because the equivalent native platform and
  63. scripting language interfaces often depend on all-too-easy-to-ignore error
  64. notifications and global variables like &quot;current
  65. working directory&quot;.<br>
  66. &nbsp;</li>
  67. <li>Structure the library so that it is still useful even if some functionality
  68. does not map well onto a given platform or directory tree. Particularly, much
  69. useful functionality should be portable even to flat
  70. (non-hierarchical) filesystems.<br>
  71. <br>
  72. Rationale: Much functionality which does not
  73. require a hierarchical directory structure is still useful on flat-structure
  74. filesystems.&nbsp; There are many systems, particularly embedded systems,
  75. where even very limited functionality is still useful.</li>
  76. </ul>
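<p>The requirement to avoid all-too-easy-to-ignore error notifications can be
illustrated with a minimal sketch. It assumes a C++17 compiler and uses
<code>std::filesystem</code> as a stand-in for the library; the function names
<code>size_query_throws</code> and <code>size_if_available</code> are
hypothetical and exist only for this example.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;  // assumption: std::filesystem stands in for the library

// Throwing form: a failure cannot be silently ignored, because it
// surfaces as a std::filesystem::filesystem_error exception.
bool size_query_throws(const fs::path& p) {
    try {
        (void)fs::file_size(p);
        return false;
    } catch (const fs::filesystem_error&) {
        return true;
    }
}

// Non-throwing form: the caller asks for an error_code explicitly, so
// handling the failure locally is a visible, deliberate choice.
std::optional<std::uintmax_t> size_if_available(const fs::path& p) {
    std::error_code ec;
    const std::uintmax_t n = fs::file_size(p, ec);
    if (ec) {
        return std::nullopt;  // the query failed; ec says why
    }
    return n;
}
```

<p>In both forms the failure is reported at the call itself, and neither
relies on global state that the caller must remember to inspect.</p>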
  77. <ul>
  78. <li>Interface smoothly with current C++ Standard Library input/output
  79. facilities.&nbsp; For example, paths should be
  80. easy to use in std::basic_fstream constructors.<br>
  81. <br>
  82. Rationale: One of the most common uses of file system functionality is to
  83. manipulate paths for eventual use in input/output operations.&nbsp;
  84. Thus the need to interface smoothly with standard library I/O.<br>
  85. &nbsp;</li>
  86. <li>Suitable for eventual standardization. The implication of this requirement
  87. is that the interface be close to minimal, and that great care be taken
  88. regarding portability.<br>
  89. <br>
  90. Rationale: The lack of file system operations is a serious hole
  91. in the current standard, with no other known candidates to fill that hole.
  92. Libraries with elaborate interfaces and difficult-to-port specifications are much less likely to be accepted for
  93. standardization.<br>
  94. &nbsp;</li>
  95. <li>The usual Boost <a href="http://www.boost.org/more/lib_guide.htm">requirements and
  96. guidelines</a> apply.<br>
  97. &nbsp;</li>
  98. <li>Encourage, but do not require, portability in path names.<br>
  99. <br>
  100. Rationale: For paths which originate from user input it is unreasonable to
  101. require portable path syntax.<br>
  102. &nbsp;</li>
  103. <li>Avoid giving the illusion of portability where portability in fact does not
  104. exist.<br>
  105. <br>
  106. Rationale: Leaving important behavior unspecified or &quot;implementation defined&quot; does a
  107. great disservice to programmers using a library because it makes it appear
  108. that code relying on the behavior is portable, when in fact there is nothing
  109. portable about it. The only case where such under-specification is acceptable is when both users and implementors know from
  110. other sources exactly what behavior is required, yet for some reason it isn't
  111. possible to specify it exactly.</li>
  112. </ul>
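<p>As a sketch of the requirement that paths be easy to use with standard
library I/O, the following assumes C++17, where the stream constructors accept
a <code>std::filesystem::path</code> directly. The function
<code>write_then_read</code> is a hypothetical helper, not part of any
library.</p>

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;  // assumption: std::filesystem stands in for the library

// Writes one line of text to dir/name and reads it back. The path object
// is handed straight to the stream constructors, with no string conversion.
std::string write_then_read(const fs::path& dir,
                            const std::string& name,
                            const std::string& text) {
    const fs::path file = dir / name;  // compose the full path
    {
        std::ofstream out(file);       // stream opened from a path
        out << text << '\n';
    }                                  // closing the scope flushes and closes the file
    std::ifstream in(file);
    std::string line;
    std::getline(in, line);
    return line;
}
```

<p>A call such as <code>write_then_read(std::filesystem::temp_directory_path(),
"example.txt", "hello")</code> composes the path and performs the I/O without
the caller ever handling a raw character string for the file name.</p>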
  113. <h2><a name="Realities">Realities</a></h2>
  114. <ul>
  115. <li>Some operating systems have a single directory tree root, others have
  116. multiple roots.<br>
  117. &nbsp;</li>
  118. <li>Some file systems provide both a long and short form of filenames.<br>
  119. &nbsp;</li>
  120. <li>Some file systems have different syntax for file paths and directory
  121. paths.<br>
  122. &nbsp;</li>
  123. <li>Some file systems have different rules for valid file names and valid
  124. directory names.<br>
  125. &nbsp;</li>
  126. <li>Some file systems (ISO-9660, level 1, for example) use very restricted
  127. (so-called 8.3) file names.<br>
  128. &nbsp;</li>
  129. <li>Some operating systems allow file systems with different
  130. characteristics to be &quot;mounted&quot; within a directory tree.&nbsp; Thus an
  131. ISO-9660 or Windows
  132. file system may end up as a sub-tree of a POSIX directory tree.<br>
  133. &nbsp;</li>
  134. <li>Wide-character versions of directory and file operations are available on some operating
  135. systems, and not available on others.<br>
  136. &nbsp;</li>
  137. <li>There is no law that says directory hierarchies have to be specified in
  138. terms of left-to-right descent from the root.<br>
  139. &nbsp;</li>
  140. <li>Some file systems have a concept of file &quot;version number&quot; or &quot;generation
  141. number&quot;.&nbsp; Some don't.<br>
  142. &nbsp;</li>
  143. <li>Not all operating systems use single character separators in path names.&nbsp; Some use
  144. paired notations. A typical fully-specified OpenVMS filename
  145. might look something like this:<br>
  146. <br>
  147. <code>&nbsp;&nbsp; DISK$SCRATCH:[GEORGE.PROJECT1.DAT]BIG_DATA_FILE.NTP;5<br>
  148. </code><br>
  149. The general OpenVMS format is:<br>
  150. <br>
  151. &nbsp;&nbsp;&nbsp;&nbsp;
  152. <i>Device:[directories.dot.separated]filename.extension;version_number</i><br>
  153. &nbsp;</li>
  154. <li>For common file systems, determining if two descriptors are for the same
  155. entity is extremely difficult or impossible.&nbsp; For example, the concept of
  156. equality can be different for each portion of a path - some portions may be
  157. case or locale sensitive, others not. Case sensitivity is a property of the
  158. pathname itself, and not the platform. Determining collating sequence is even
  159. worse.<br>
  160. &nbsp;</li>
  161. <li>Race conditions may occur. Directory trees, directories, files, and file attributes are in effect shared between all threads, processes, and computers which have access to the
  162. filesystem.&nbsp; That may well include computers on the other side of the
  163. world or in orbit around the world. This implies that file system operations
  164. may fail in unexpected ways.&nbsp;For example:<br>
  165. <br>
  166. <code>&nbsp;&nbsp;&nbsp;&nbsp; assert( exists(&quot;foo&quot;) == exists(&quot;foo&quot;) );
  167. // may fail!<br>
  168. &nbsp;&nbsp;&nbsp;&nbsp; assert( is_directory(&quot;foo&quot;) == is_directory(&quot;foo&quot;) );
  169. // may fail!<br>
  170. </code><br>
  171. In the first example, the file may have been deleted between calls to
  172. exists().&nbsp; In the second example, the file may have been deleted and then
  173. replaced by a directory of the same name between the calls to is_directory().<br>
  174. &nbsp;</li>
  175. <li>Even though an application may be portable, it still will have to traffic
  176. in system specific paths occasionally; user provided input is a common
  177. example.<br>
  178. &nbsp;</li>
  179. <li><a name="symbolic-link-use-case">Symbolic</a> links cause the canonical and
  180. normal forms of some paths to represent different files or directories. For
  181. example, given the directory hierarchy <code>/a/b/c</code>, with a symbolic
  182. link in <code>/a</code> named <code>x</code>&nbsp; pointing to <code>b/c</code>,
  183. then under POSIX Pathname Resolution rules a path of <code>&quot;/a/x/..&quot;</code>
  184. should resolve to <code>&quot;/a/b&quot;</code>. If <code>&quot;/a/x/..&quot;</code> were first
  185. normalized to <code>&quot;/a&quot;</code>, it would resolve incorrectly. (Case supplied
  186. by Walter Landry.)</li>
  187. </ul>
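<p>The symbolic link case in the last item above can be reproduced with a
short sketch. It assumes C++17 and uses <code>std::filesystem</code> as a
stand-in for the library; <code>collapse_lexically</code> and
<code>resolve_through_symlink</code> are hypothetical names, and the second
function returns an empty path on systems where symbolic links cannot be
created.</p>

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;  // assumption: std::filesystem stands in for the library

// Purely lexical normalization: the disk is never consulted, so "x/.."
// is removed without checking whether x is a symbolic link.
fs::path collapse_lexically(const fs::path& p) {
    return p.lexically_normal();
}

// Builds root/a/b/c plus a symbolic link root/a/x -> b/c, then lets the
// operating system resolve root/a/x/.. . Returns an empty path if any
// step fails (for example, when symbolic links are not supported).
fs::path resolve_through_symlink(const fs::path& root) {
    std::error_code ec;
    fs::create_directories(root / "a" / "b" / "c", ec);
    if (ec) return {};
    fs::create_directory_symlink("b/c", root / "a" / "x", ec);
    if (ec) return {};
    const fs::path resolved = fs::canonical(root / "a" / "x" / "..", ec);
    if (ec) return {};
    return resolved;  // ends in .../a/b, not .../a
}
```

<p>Comparing the two results for the same input shows why normalizing before
resolution is unsafe: the lexical form names the directory <code>a</code>,
while resolution through the link arrives at <code>a/b</code>.</p>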
  188. <h2><a name="Rationale">Rationale</a></h2>
  189. <p>The <a href="#Requirements">Requirements</a> and <a href="#Realities">
  190. Realities</a> above drove much of the C++ interface design.&nbsp; In particular,
  191. the desire to make script-like code straightforward caused a great deal of
  192. effort to go into ensuring that apparently simple expressions like <i>exists( &quot;foo&quot;
  193. )</i> work as expected.</p>
  194. <p>See the <a href="faq.htm">FAQ</a> for the rationale behind many detailed
  195. design decisions.</p>
  196. <p>Several key insights went into the <i>path</i> class design:</p>
  197. <ul>
  198. <li>Decoupling of the input formats, internal conceptual (<i>vector&lt;string&gt;</i>
  199. or other sequence)
  200. model, and output formats.</li>
  201. <li>Providing two input formats (generic and O/S specific) broke a major
  202. design deadlock.</li>
  203. <li>Providing several output formats solved another set of previously
  204. intractable problems.</li>
  205. <li>Several non-obvious functions (particularly decomposition and composition)
  206. are required to support portable code. (Peter Dimov, Thomas Witt, Glen
  207. Knowles, others.)</li>
  208. </ul>
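<p>The conceptual model of a path as a sequence of elements, together with
decomposition and composition, can be sketched as follows. The example assumes
C++17 and uses <code>std::filesystem::path</code> as a stand-in for the
library's <i>path</i> class; <code>elements_of</code> and <code>compose</code>
are hypothetical helpers written for this illustration.</p>

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;  // assumption: std::filesystem stands in for the library

// Decomposition: view a path as a sequence of element strings,
// mirroring the conceptual vector<string> model.
std::vector<std::string> elements_of(const fs::path& p) {
    std::vector<std::string> out;
    for (const fs::path& element : p) {
        out.push_back(element.generic_string());  // generic (portable) output format
    }
    return out;
}

// Composition: rebuild a path from its elements. No separator character
// appears in the code; operator/= supplies it.
fs::path compose(const std::vector<std::string>& elements) {
    fs::path p;
    for (const std::string& element : elements) {
        p /= element;
    }
    return p;
}
```

<p>For a relative path such as <code>projects/demo/archive.tar.gz</code>,
<code>elements_of</code> yields three elements and <code>compose</code>
reassembles an equal path. Finer decomposition is available through member
functions such as <code>parent_path()</code>, <code>stem()</code>, and
<code>extension()</code>.</p>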
  209. <p>Error checking was a particularly difficult area. One key insight was that
  210. with file and directory names, portability isn't a universal truth.&nbsp;
  211. Rather, the programmer must think out the question &quot;What operating systems do I
  212. want this path to be portable to?&quot;&nbsp; By providing support for several
  213. answers to that question, the Filesystem Library alerts programmers to the need
  214. to ask it in the first place.</p>
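<p>To make the question &quot;portable to what?&quot; concrete, the sketch
below offers two hand-written predicates, each answering it for a different
target. Both are hypothetical and deliberately simplified; they are not the
library's own checking functions. <code>is_posix_portable_name</code> follows
the POSIX portable filename character set, and <code>fits_8_3</code> checks
only the lengths of an 8.3 name, not its character set.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical check: true if the name uses only the POSIX portable
// filename character set (A-Z a-z 0-9 . _ -) and does not begin with '-'.
bool is_posix_portable_name(const std::string& name) {
    if (name.empty() || name.front() == '-') {
        return false;
    }
    for (const unsigned char c : name) {
        const bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                        (c >= '0' && c <= '9') ||
                        c == '.' || c == '_' || c == '-';
        if (!ok) {
            return false;
        }
    }
    return true;
}

// Hypothetical, simplified 8.3 check: a base of 1 to 8 characters,
// optionally followed by a single dot and at most 3 extension characters.
bool fits_8_3(const std::string& name) {
    const std::string::size_type dot = name.find('.');
    if (dot == std::string::npos) {
        return !name.empty() && name.size() <= 8;
    }
    const std::string base = name.substr(0, dot);
    const std::string ext = name.substr(dot + 1);
    return !base.empty() && base.size() <= 8 && ext.size() <= 3 &&
           ext.find('.') == std::string::npos;
}
```

<p>A name can pass one check and fail the other, which is the point: the
programmer has to decide which targets matter before a name can be called
portable.</p>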
  215. <h2><a name="Abandoned_Designs">Abandoned Designs</a></h2>
  216. <h3>operations.hpp</h3>
  217. <p>Dietmar Kühl's original dir_it design and implementation supported
  218. wide-character file and directory names. It was abandoned after extensive
  219. discussions among Library Working Group members failed to identify portable
  220. semantics for wide-character names on systems not providing native support. See
  221. <a href="faq.htm#wide-character_names">FAQ</a>.</p>
  222. <p>Previous iterations of the interface design used explicitly named functions providing a
  223. large number of convenience operations, with no compile-time or run-time
  224. options. There were so many function names that they were very confusing to use,
  225. and the interface was much larger. Any benefits seemed theoretical rather than
  226. real. </p>
  227. <p>Designs based on compile time (rather than runtime) flag and option selection
  228. (via policy, enum, or int template parameters) became so complicated that they
  229. were abandoned, often after investing quite a bit of time and effort. The need
  230. to qualify attribute or option names with namespaces, even aliases, made use in
  231. template parameters ugly; that wasn't fully appreciated until actually writing
  232. real code.</p>
  233. <p>Yet another set of convenience functions (for example, <i>remove</i> with
  234. permissive, prune, recurse, and other options, plus predicate, and possibly
  235. other, filtering features) were abandoned because the details became both
  236. complex and contentious.</p>
  237. <p>What is left is a toolkit of low-level operations from which the user can
  238. create more complex convenience operations, plus a very small number of
  239. convenience functions which were found to be useful enough to justify inclusion.</p>
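<p>As an illustration of building a convenience operation from low-level
ones, the sketch below composes directory iteration and single-file removal
into a predicate-filtered recursive remove. It assumes C++17 and uses
<code>std::filesystem</code> as a stand-in for the library;
<code>remove_matching_files</code> and <code>touch</code> are hypothetical
names, and <code>touch</code> exists only to set up a demonstration tree.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <functional>
#include <vector>

namespace fs = std::filesystem;  // assumption: std::filesystem stands in for the library

// Demonstration helper: create (or truncate) a small file at p.
void touch(const fs::path& p) {
    std::ofstream out(p);
    out << '\n';
}

// Removes every regular file under root for which matches() returns true
// and reports how many files were removed. Matching paths are collected
// first so that the directory tree is not modified while it is iterated.
std::size_t remove_matching_files(
        const fs::path& root,
        const std::function<bool(const fs::path&)>& matches) {
    std::vector<fs::path> doomed;
    for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file() && matches(entry.path())) {
            doomed.push_back(entry.path());
        }
    }
    std::size_t removed = 0;
    for (const fs::path& p : doomed) {
        if (fs::remove(p)) {  // false if the file vanished in the meantime
            ++removed;
        }
    }
    return removed;
}
```

<p>The options that made a built-in version contentious (what to match,
whether to recurse, how to treat failures) become ordinary decisions in user
code, which is the trade the toolkit approach makes.</p>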
  240. <h3>path.hpp</h3>
  241. <p>There were so many abandoned path designs that I've lost track. Policy-based
  242. class templates in several flavors, constructor supplied runtime policies,
  243. operation specific runtime policies, they were all considered, often
  244. implemented, and ultimately abandoned as far too complicated for any small
  245. benefits observed.</p>
  246. <p>Additional design considerations apply to <a href="v3_design.html">Internationalization</a>. </p>
  247. <h3>error checking</h3>
  248. <p>A number of designs for the error checking machinery were abandoned, some
  249. after experiments with implementations. Totally automatic error checking was
  250. attempted in particular. But automatic error checking tended to make the overall
  251. library design much more complicated.</p>
  252. <p>Some designs associated error checking mechanisms with paths.&nbsp; Some with
  253. operations functions.&nbsp; A policy-based error checking template design was
  254. partially implemented, then abandoned as too complicated for everyday
  255. script-like programs.</p>
  256. <p>The final design, which depends partially on explicit error checking function
  257. calls,&nbsp; is much simpler and more straightforward, although it does depend to
  258. some extent on programmer discipline.&nbsp; But it should allow programmers who
  259. are concerned about portability to be reasonably sure that their programs will
  260. work correctly on their choice of target systems.</p>
  261. <h2><a name="References">References</a></h2>
  262. <table border="0" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
  263. <tr>
  264. <td width="13%" valign="top">[<a name="IBM-01">IBM-01</a>]</td>
  265. <td width="87%">IBM Corporation, <i>z/OS V1R3.0 C/C++ Run-Time
  266. Library Reference</i>, SA22-7821-02, 2001,
  267. <a href="http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/">
  268. www-1.ibm.com/servers/eserver/zseries/zos/bkserv/</a></td>
  269. </tr>
  270. <tr>
  271. <td width="13%" valign="top">[<a name="ISO-9660">ISO-9660</a>]</td>
  272. <td width="87%">International Organization for Standardization, 1988</td>
  273. </tr>
  274. <tr>
  275. <td width="13%" valign="top">[<a name="Kuhn">Kuhn</a>]</td>
  276. <td width="87%">UTF-8 and Unicode FAQ for Unix/Linux,
  277. <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">
  278. www.cl.cam.ac.uk/~mgk25/unicode.html</a></td>
  279. </tr>
  280. <tr>
  281. <td width="13%" valign="top">[<a name="MSDN">MSDN</a>] </td>
  282. <td width="87%">Microsoft Platform SDK for Windows, Storage Start
  283. Page,
  284. <a href="http://msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp">
  285. msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp</a></td>
  286. </tr>
  287. <tr>
  288. <td width="13%" valign="top">[<a name="POSIX-01">POSIX-01</a>]</td>
  289. <td width="87%">IEEE&nbsp;Std&nbsp;1003.1-2001, ISO/IEC 9945:2002, and The Open Group Base Specifications, Issue 6. Also known as The
  290. Single Unix<font face="Times New Roman">® Specification, Version 3.
  291. Available from each of the organizations involved in its creation. For
  292. example, read online or download from
  293. <a href="http://www.unix.org/single_unix_specification/">
  294. www.unix.org/single_unix_specification/</a>.</font> The ISO JTC1/SC22/WG15 - POSIX
  295. homepage is <a href="http://www.open-std.org/jtc1/sc22/WG15/">
  296. www.open-std.org/jtc1/sc22/WG15/</a></td>
  297. </tr>
  298. <tr>
  299. <td width="13%" valign="top">[<a name="URI">URI</a>]</td>
  300. <td width="87%">RFC-2396, Uniform Resource Identifiers (URI): Generic
  301. Syntax, <a href="http://www.ietf.org/rfc/rfc2396.txt">
  302. www.ietf.org/rfc/rfc2396.txt</a></td>
  303. </tr>
  304. <tr>
  305. <td width="13%" valign="top">[<a name="UTF-16">UTF-16</a>]</td>
  306. <td width="87%">Wikipedia, UTF-16,
  307. <a href="http://en.wikipedia.org/wiki/UTF-16">
  308. en.wikipedia.org/wiki/UTF-16</a></td>
  309. </tr>
  310. <tr>
  311. <td width="13%" valign="top">[<a name="Wulf-Shaw-73">Wulf-Shaw-73</a>]</td>
  312. <td width="87%">William Wulf, Mary Shaw, <i>Global
  313. Variable Considered Harmful</i>, ACM SIGPLAN Notices, 8, 2, 1973, pp. 23-34</td>
  314. </tr>
  315. </table>
  316. <hr>
  317. <p>Revised
  318. <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->26 December, 2014<!--webbot bot="Timestamp" endspan i-checksum="38646" --></p>
  319. <p>&copy; Copyright Beman Dawes, 2002</p>
  320. <p> Use, modification, and distribution are subject to the Boost Software
  321. License, Version 1.0. (See accompanying file <a href="../../../LICENSE_1_0.txt">
  322. LICENSE_1_0.txt</a> or copy at <a href="http://www.boost.org/LICENSE_1_0.txt">
  323. www.boost.org/LICENSE_1_0.txt</a>)</p>
  324. </body>
  325. </html>