//
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
// Distributed under the Boost Software License, Version 1.0. (See
// accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//
// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!

\page messages_formatting Messages Formatting (Translation)

- \ref messages_formatting_into
- \ref msg_loading_dictionaries
- \ref message_translation
- \ref indirect_message_translation
- \ref plural_forms
- \ref adding_context_information
- \ref multiple_gettext_domain
- \ref direct_message_translation
- \ref extracting_messages_from_code
- \ref custom_file_system_support
- \ref msg_non_ascii_keys
- \ref msg_qna

\section messages_formatting_into Introduction

Message formatting is probably the most important part of
localization: making your application speak the user's language.

Boost.Locale uses the <a href="http://www.gnu.org/software/gettext/">GNU Gettext</a> localization model.
We recommend you read the general <a href="http://www.gnu.org/software/gettext/manual/gettext.html">documentation</a>
of GNU Gettext, as it is outside the scope of this document.

The model is as follows:
- First, our application \c foo is prepared for localization by calling the \ref boost::locale::translate() "translate" function
for each message used in the user interface.
\n
For example:
\code
cout << "Hello World" << endl;
\endcode
is changed to
\n
\code
cout << translate("Hello World") << endl;
\endcode
- Then all messages are extracted from the source code and a special \c foo.po file is generated that contains all of the
original English strings.
\n
\verbatim
...
msgid "Hello World"
msgstr ""
...
\endverbatim
- The \c foo.po file is translated for each supported locale, producing files such as \c de.po, \c ar.po, \c en_CA.po, and \c he.po.
For example, \c he.po contains:
\n
\verbatim
...
msgid "Hello World"
msgstr "שלום עולם"
\endverbatim
The translated files are then compiled to the binary \c mo format and stored in the following directory structure:
\n
\verbatim
de
de/LC_MESSAGES
de/LC_MESSAGES/foo.mo
en_CA
en_CA/LC_MESSAGES
en_CA/LC_MESSAGES/foo.mo
...
\endverbatim
\n
When the application starts, it loads the required dictionaries. Then, when the \c translate function is called and the message is written
to an output stream, a dictionary lookup is performed and the localized message is written out instead.

\section msg_loading_dictionaries Loading dictionaries

All the dictionaries are loaded by the \ref boost::locale::generator "generator" class.

Using localized strings in an application requires specifying
the following parameters:
-# The search path of the dictionaries
-# The application domain (or name)

This is done by calling the following member functions of the \ref boost::locale::generator "generator" class:
- \ref boost::locale::generator::add_messages_path() "add_messages_path" - add the root path of the dictionaries.
\n
For example, if the dictionary is located at \c /usr/share/locale/ar/LC_MESSAGES/foo.mo, then the path should be \c /usr/share/locale.
\n
- \ref boost::locale::generator::add_messages_domain() "add_messages_domain" - add the domain (name) of the application. In the above case it would be "foo".

\note At least one domain and one path should be specified in order to load dictionaries.
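
The relation between the search path, the locale folder, and the domain can be illustrated with plain string concatenation.
The following is a simplified sketch, not Boost.Locale's loader: \c catalog_path is a hypothetical helper, and the real
library may try several locale folders (for example, with and without the country part) before giving up.

\code
#include <string>

// Hypothetical helper (not part of Boost.Locale): builds the location
//   <root>/<locale folder>/LC_MESSAGES/<domain>.mo
// from the values given to add_messages_path() and add_messages_domain().
std::string catalog_path(std::string const &root,
                         std::string const &locale_folder,
                         std::string const &domain)
{
    std::string result = root;
    // Avoid a doubled separator when the root already ends with '/'
    if(!result.empty() && result[result.size() - 1] != '/')
        result += '/';
    result += locale_folder + "/LC_MESSAGES/" + domain + ".mo";
    return result;
}
\endcode

For example, <tt>catalog_path("/usr/share/locale", "ar", "foo")</tt> yields \c /usr/share/locale/ar/LC_MESSAGES/foo.mo,
the path used in the example above.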

This is an example of our first fully localized program:
\code
#include <boost/locale.hpp>
#include <iostream>

using namespace std;
using namespace boost::locale;

int main()
{
    generator gen;

    // Specify the location of the dictionaries
    gen.add_messages_path(".");
    gen.add_messages_domain("hello");

    // Generate the locale, make it global and imbue it into cout
    locale::global(gen(""));
    cout.imbue(locale());

    // Display a message using the current system locale
    cout << translate("Hello World") << endl;
}
\endcode

\section message_translation Message Translation

There are two ways to translate messages:
- using the \ref boost_locale_translate_family "boost::locale::translate()" family of functions:
\n
These functions create a special proxy object, \ref boost::locale::basic_message "basic_message",
that can be converted to a string according to a given locale, or written to a \c std::ostream,
in which case the message is formatted using the stream's locale.
\n
This is very convenient for working with \c std::ostream objects and for postponing message
translation.
- using the \ref boost_locale_gettext_family "boost::locale::gettext()" family of functions:
\n
These functions are used for direct message translation: they receive an original message
or a key as a parameter and convert it to a \c std::basic_string in a given locale.
\n
These functions have names similar to those used in the GNU Gettext library.

\subsection indirect_message_translation Indirect Message Translation

The basic way to translate a message is the \ref boost_locale_translate_family "boost::locale::translate()" family of functions.
These functions take a character type \c CharType as a template parameter and receive either <tt>CharType const *</tt> or <tt>std::basic_string<CharType></tt> as input.
They receive an original message and return a special proxy
object, \ref boost::locale::basic_message "basic_message<CharType>".
This object holds all the information required for formatting the message.
When this object is written to an output \c ostream, it performs a dictionary lookup of the message according to the locale
imbued in the stream.
If the message is found in the dictionary, its translation is written to the output stream;
otherwise the original string is written.
For example:
\code
// Translate a simple message "Hello World!"
std::cout << boost::locale::translate("Hello World!") << std::endl;
\endcode
This allows the program to postpone translation of the message until the translation is actually needed, even for several
different target locales.
\code
// Several output streams that we write a message to:
// English, Japanese, Hebrew etc.
// Each of them has an imbued std::locale object that represents
// its specific locale
std::ofstream en,ja,he,de,ar;

// Send a single message to multiple streams
void send_to_all(message const &msg)
{
    // In each of the cases below
    // the message is translated to a different
    // language
    en << msg;
    ja << msg;
    he << msg;
    de << msg;
    ar << msg;
}

int main()
{
    ...
    send_to_all(translate("Hello World"));
}
\endcode
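
The postponed lookup and its fallback to the original string can be modeled without Boost at all.
In the toy sketch below, \c toy_message and \c catalog are hypothetical stand-ins for \c basic_message and a loaded
dictionary; the sketch illustrates the behavior described in this section and is not how Boost.Locale implements it.

\code
#include <map>
#include <string>

// Hypothetical stand-in for one loaded dictionary: msgid -> msgstr
typedef std::map<std::string, std::string> catalog;

// Toy proxy object: it stores only the original message, and the
// dictionary lookup is postponed until a target catalog is supplied.
class toy_message {
public:
    explicit toy_message(std::string const &msgid) : msgid_(msgid) {}

    // Look the message up in the given catalog; if it is not found,
    // fall back to the original string.
    std::string str(catalog const &dict) const
    {
        catalog::const_iterator it = dict.find(msgid_);
        return it == dict.end() ? msgid_ : it->second;
    }

private:
    std::string msgid_;
};
\endcode

Resolving the same \c toy_message against a German catalog that contains the key and against a Japanese catalog that
does not yields the German translation and the original English text, respectively.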

\note
- \ref boost::locale::basic_message "basic_message" can be implicitly converted
to an appropriate \c std::basic_string using
the global locale:
\n
\code
std::wstring msg = translate(L"Do you want to open the file?");
\endcode
- \ref boost::locale::basic_message "basic_message" can be explicitly converted
to a string for a specific locale using the \ref boost::locale::basic_message::str() "str()" member function.
\n
\code
std::locale ru_RU = ... ;
std::string msg = translate("Do you want to open the file?").str(ru_RU);
\endcode

\subsection plural_forms Plural Forms

GNU Gettext catalogs have simple, robust, and yet powerful plural form support. We recommend reading the
original GNU documentation <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms">here</a>.

Let's try to solve a simple problem, displaying a message to the user:
\code
if(files == 1)
    cout << translate("You have 1 file in the directory") << endl;
else
    cout << format(translate("You have {1} files in the directory")) % files << endl;
\endcode
This very simple task becomes quite complicated when we deal with languages other than English. Many languages have more
than two plural forms. For example, in Hebrew there are special forms for single, double, plural, and plural above 10.
They can't be distinguished by the simple rule "is n 1 or not".

The correct solution is to let the translator choose the plural form. Thus the \c translate
function can receive two additional parameters, the English plural form and a number: <tt>translate(single,plural,count)</tt>.
For example:
\code
cout << format(translate( "You have {1} file in the directory",
                          "You have {1} files in the directory",
                          files)) % files << endl;
\endcode
A special entry in the dictionary specifies the rule for choosing the correct plural form in the target language.
For example, several Slavic languages, such as Russian, have 3 plural forms that can be chosen using the following expression:
\code
plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
\endcode
This expression is stored in the message catalog itself and is evaluated during translation to select the correct form.
So the code above would display 3 different forms in a Russian locale for the values 1, 3, and 5:
\verbatim
У вас есть 1 файл в каталоге
У вас есть 3 файла в каталоге
У вас есть 5 файлов в каталоге
\endverbatim
For Japanese, which has no plural forms at all, the same message would be displayed
for any numeric value.
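
Written as C++ functions, such rules map a count to the index of the \c msgstr form to use. The Russian function below
transcribes the expression above; the English rule <tt>plural=(n != 1);</tt> and the Japanese rule <tt>plural=0;</tt>
are the conventional GNU Gettext header values for those languages. This is only an illustration of what the expressions
compute: the catalog carries the expression as text, and it is evaluated at run time rather than compiled like this.

\code
// Each function maps a count n to the index of the msgstr[] form to use.

// Russian: transcription of the catalog expression shown above (3 forms)
int russian_plural(unsigned long n)
{
    return (n % 10 == 1 && n % 100 != 11) ? 0
         : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}

// English: "plural=(n != 1);" (2 forms)
int english_plural(unsigned long n)
{
    return n != 1 ? 1 : 0;
}

// Japanese: "plural=0;" (a single form for every count)
int japanese_plural(unsigned long)
{
    return 0;
}
\endcode

For the counts 1, 3, and 5, \c russian_plural returns 0, 1, and 2, selecting the three Russian forms shown above.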
For more detailed information, please refer to the GNU Gettext manual: <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms">11.2.6 Additional functions for plural forms</a>

\subsection adding_context_information Adding Context Information

In many cases it is not sufficient to provide only the original English string to get the correct translation.
Sometimes you need to provide some context information. In German, for example, a button labeled "open" is translated to
"öffnen" in the context of opening a file, or to "aufbauen" in the context of opening an internet connection.
In these cases you must add context information to the original string by passing it as an additional argument:
\code
button->setLabel(translate("File","open"));
\endcode
The context information is provided as the first parameter to the \ref boost::locale::translate() "translate"
function in both the singular and plural forms. The translator sees this context information and is able to translate the
"open" string correctly.
For example, this is how the \c po file would look:
\code
msgctxt "File"
msgid "open"
msgstr "öffnen"

msgctxt "Internet Connection"
msgid "open"
msgstr "aufbauen"
\endcode
\note Context information requires version 0.15 or later of the gettext tools for extracting strings and
formatting message catalogs.
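
A useful mental model: GNU Gettext MO catalogs store a context-qualified entry under a single key made of the context,
the EOT byte 0x04, and the message id. The sketch below mimics that convention with a plain \c std::map; the
\c context_key and \c lookup helpers are hypothetical and are not part of Boost.Locale.

\code
#include <map>
#include <string>

typedef std::map<std::string, std::string> dictionary;

// Key of a context-qualified entry, modeled on the MO convention:
// context, then the EOT byte 0x04, then the msgid.
std::string context_key(std::string const &context, std::string const &msgid)
{
    return context + '\x04' + msgid;
}

// Toy lookup by (context, msgid), falling back to the original string
std::string lookup(dictionary const &dict,
                   std::string const &context,
                   std::string const &msgid)
{
    dictionary::const_iterator it = dict.find(context_key(context, msgid));
    return it == dict.end() ? msgid : it->second;
}
\endcode

Because the two "open" entries differ in their context, they end up under different keys and never collide.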

\subsection multiple_gettext_domain Working with multiple message domains

In some cases it is useful to work with multiple message domains.
For example, if an application consists of several independent modules, it may
have several domains: a separate domain for each module.
When developing a FooBar office suite, we might have:
- a FooBar Word Processor, using the "foobarwriter" domain
- a FooBar Spreadsheet, using the "foobarspreadsheet" domain
- a FooBar Spell Checker, using the "foobarspell" domain
- a FooBar File handler, using the "foobarodt" domain

There are three ways to use non-default domains:
- When working with \c iostream, you can use the parameterized manipulator
\ref boost::locale::as::domain "as::domain(std::string const &)", which allows switching domains in a stream:
\n
\code
cout << as::domain("foo") << translate("Hello") << as::domain("bar") << translate("Hello");
// The first translation is taken from dictionary foo and the second from dictionary bar
\endcode
- You can specify the domain explicitly when converting a \c message object to a string:
\code
std::wstring foo_msg = translate(L"Hello World").str("foo");
std::wstring bar_msg = translate(L"Hello World").str("bar");
\endcode
- You can specify the domain directly using the \ref direct_message_translation "convenience" interface:
\code
MessageBoxA(0, dgettext("gui","Error Occurred").c_str(), 0, MB_OK);
\endcode
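
Conceptually, each domain is a separate dictionary, and the domain only selects which dictionary is consulted.
A toy model of that idea follows; \c domain_catalogs and \c translate_in are hypothetical names, and the fallback to
the original string mirrors the behavior described earlier for missing translations.

\code
#include <map>
#include <string>

typedef std::map<std::string, std::string> dictionary;     // msgid -> msgstr
typedef std::map<std::string, dictionary> domain_catalogs;  // domain -> dictionary

// Look msgid up in the named domain only. An unknown domain and a
// missing message both fall back to the original string.
std::string translate_in(domain_catalogs const &all,
                         std::string const &domain,
                         std::string const &msgid)
{
    domain_catalogs::const_iterator d = all.find(domain);
    if(d == all.end())
        return msgid;
    dictionary::const_iterator m = d->second.find(msgid);
    return m == d->second.end() ? msgid : m->second;
}
\endcode

The same key, "Hello", can therefore be translated differently in the "foo" and "bar" domains, just as in the
\c as::domain example above.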

\subsection direct_message_translation Direct translation (Convenience Interface)

Many applications do not write messages directly to an output stream, or use only one locale in the process, so
calling <tt>translate("Hello World").str()</tt> for every single message would be tedious. Thus Boost.Locale provides
GNU Gettext-like localization functions for direct translation of messages. However, unlike the GNU Gettext functions,
the Boost.Locale translation functions take an additional optional locale parameter and support wide, u16, and u32 strings.

The prototypes of the GNU Gettext-like functions can be found \ref boost_locale_gettext_family "in this section".

All of these functions can have different prefixes for different forms:
- \c d - translation in a specific domain
- \c n - plural form translation
- \c p - translation in a specific context

\code
MessageBoxW(0,pgettext(L"File Dialog",L"Open?").c_str(),gettext(L"Question").c_str(),MB_YESNO);
\endcode
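
One way to read the prefixes is that each shorter name is the most general lookup (domain, context, singular, plural,
count) with some parameters fixed. The toy sketch below models only this naming scheme: \c toy_catalog, its key layout,
its \c default_domain member, and the fixed English plural rule are assumptions made for the illustration, and the
member functions merely borrow the GNU Gettext-style names; they are not the Boost.Locale functions.

\code
#include <map>
#include <string>

// Hypothetical in-memory catalog used only to model the d/n/p naming scheme
struct toy_catalog {
    std::string default_domain;                 // used when there is no d prefix
    std::map<std::string, std::string> entries; // key() -> translated text

    // Key layout chosen for this sketch: domain, context, msgid, form index
    static std::string key(std::string const &domain, std::string const &context,
                           std::string const &msgid, int form)
    {
        return domain + '\x01' + context + '\x01' + msgid + '\x01' + std::to_string(form);
    }

    // The most general lookup: d (domain) + n (plural) + p (context)
    std::string dnpgettext(std::string const &domain, std::string const &context,
                           std::string const &single, std::string const &plural,
                           unsigned long n) const
    {
        int form = n != 1 ? 1 : 0; // English plural rule, assumed for brevity
        std::map<std::string, std::string>::const_iterator it =
            entries.find(key(domain, context, single, form));
        if(it != entries.end())
            return it->second;
        return form == 0 ? single : plural; // fall back to the original text
    }

    // Each shorter name fixes the parameters its missing prefixes stand for
    std::string gettext(std::string const &id) const
    { return dnpgettext(default_domain, "", id, id, 1); }
    std::string dgettext(std::string const &domain, std::string const &id) const
    { return dnpgettext(domain, "", id, id, 1); }
    std::string pgettext(std::string const &context, std::string const &id) const
    { return dnpgettext(default_domain, context, id, id, 1); }
    std::string ngettext(std::string const &single, std::string const &plural,
                         unsigned long n) const
    { return dnpgettext(default_domain, "", single, plural, n); }
};
\endcode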

\section extracting_messages_from_code Extracting messages from the source code

There are many tools to extract messages from the source code into the \c .po file format. The most
popular and "native" tool is \c xgettext, which is installed by default on most Unix systems and is freely downloadable
for Windows (see \ref gettext_for_windows).

For example, suppose we have a source file called \c dir.cpp that prints:
\code
cout << format(translate("Listing of catalog {1}:")) % file_name << endl;
cout << format(translate("Catalog {1} contains 1 file","Catalog {1} contains {2,num} files",files_no))
        % file_name % files_no << endl;
\endcode
Now we run:
\verbatim
xgettext --keyword=translate:1,1t --keyword=translate:1,2,3t dir.cpp
\endverbatim
This creates a file called \c messages.po that looks approximately like this:
\code
#: dir.cpp:1
msgid "Listing of catalog {1}:"
msgstr ""

#: dir.cpp:2
msgid "Catalog {1} contains 1 file"
msgid_plural "Catalog {1} contains {2,num} files"
msgstr[0] ""
msgstr[1] ""
\endcode
This file can be given to translators to adapt it to specific languages.

We used the \c --keyword parameter of \c xgettext to make it suitable for extracting messages from
source code localized with Boost.Locale, searching for <tt>translate()</tt> function calls instead of the default <tt>gettext()</tt>
and <tt>ngettext()</tt> ones.
The first parameter, <tt>--keyword=translate:1,1t</tt>, provides the template for basic messages: a \c translate function that is
called with 1 argument (1t), whose first argument is taken as the key. The second one, <tt>--keyword=translate:1,2,3t</tt>, is used
for plural forms.
It tells \c xgettext to use a <tt>translate()</tt> function call with 3 parameters (3t) and take the 1st and 2nd parameters as the
singular and plural keys. An additional marker \c Nc can be used to mark argument N as context information.

The full set of xgettext parameters suitable for Boost.Locale is:
\code
xgettext --keyword=translate:1,1t --keyword=translate:1c,2,2t \
--keyword=translate:1,2,3t --keyword=translate:1c,2,3,4t \
--keyword=gettext:1 --keyword=pgettext:1c,2 \
--keyword=ngettext:1,2 --keyword=npgettext:1c,2,3 \
source_file_1.cpp ... source_file_N.cpp
\endcode
Of course, if you do not use the "gettext"-like translation functions, you
may omit some of these parameters.
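
The keyword specifications follow a small syntax defined by \c xgettext: a plain number is the position of a message
argument (the singular first, then the plural), a number followed by \c c is the position of the context argument, and
a number followed by \c t is the total number of arguments a call must have for the specification to apply. The
hypothetical \c parse_keyword helper below decodes a specification such as <tt>translate:1c,2,3,4t</tt> to make that
syntax concrete; it covers only the forms used above, not everything \c xgettext accepts.

\code
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Decoded xgettext keyword specification, e.g. "translate:1c,2,3,4t"
struct keyword_spec {
    std::string function;          // function name, e.g. "translate"
    int context_arg = 0;           // position of the context argument, 0 if none
    std::vector<int> message_args; // singular (and plural) argument positions
    int total_args = 0;            // required argument count, 0 if not given
};

keyword_spec parse_keyword(std::string const &spec)
{
    keyword_spec result;
    std::string::size_type colon = spec.find(':');
    result.function = spec.substr(0, colon);
    if(colon == std::string::npos)
        return result; // xgettext would then use argument 1 (not modeled here)

    std::istringstream args(spec.substr(colon + 1));
    std::string item;
    while(std::getline(args, item, ',')) {
        if(item.empty())
            continue;
        int position = std::atoi(item.c_str()); // reads the leading digits
        char suffix = item[item.size() - 1];
        if(suffix == 'c')
            result.context_arg = position;
        else if(suffix == 't')
            result.total_args = position;
        else
            result.message_args.push_back(position);
    }
    return result;
}
\endcode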

\subsection custom_file_system_support Custom Filesystem Support

When access to the actual file system is limited, as in ActiveX controls, or
when the developer wants to ship an all-in-one executable file,
it is useful to be able to load \c gettext catalogs from a custom location,
that is, a custom file system.

Boost.Locale provides an option to install the boost::locale::message_format facet
with customized options provided in the boost::locale::gnu_gettext::messages_info structure.

This structure contains a \c boost::function based
\ref boost::locale::gnu_gettext::messages_info::callback_type "callback"
that allows the user to provide custom functionality for loading message catalog files.

For example:
\code
// Configure all options for the message catalog
namespace blg = boost::locale::gnu_gettext;
blg::messages_info info;
info.language = "he";
info.country = "IL";
info.encoding = "UTF-8";
info.paths.push_back(""); // At least one path is required, even an empty one
info.domains.push_back(blg::messages_info::domain("my_app"));
info.callback = some_file_loader; // Provide a callback

// Create a basic locale without messages support
boost::locale::generator gen;
std::locale base_locale = gen("he_IL.UTF-8");

// Install message catalogs for "char" support to the final locale
// we are going to use
std::locale real_locale(base_locale,blg::create_messages_facet<char>(info));
\endcode
To set up the \ref boost::locale::gnu_gettext::messages_info::language "language", \ref boost::locale::gnu_gettext::messages_info::country "country" and other members, you may use the \ref boost::locale::info facet for convenience:
\code
// Configure all options for the message catalog
namespace blg = boost::locale::gnu_gettext;
blg::messages_info info;
info.paths.push_back(""); // At least one path is required, even an empty one
info.domains.push_back(blg::messages_info::domain("my_app"));
info.callback = some_file_loader; // Provide a callback

// Create an object with the default locale
boost::locale::generator gen;
std::locale base_locale = gen("");

// Use boost::locale::info to configure all parameters
boost::locale::info const &properties = std::use_facet<boost::locale::info>(base_locale);
info.language = properties.language();
info.country = properties.country();
info.encoding = properties.encoding();
info.variant = properties.variant();

// Install message catalogs to the final locale
std::locale real_locale(base_locale,blg::create_messages_facet<char>(info));
\endcode
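
The examples above leave \c some_file_loader undefined. A possible in-memory loader is sketched below. It assumes the
callback signature <tt>std::vector<char>(std::string const &file_name, std::string const &encoding)</tt> and assumes
that returning an empty vector tells Boost.Locale the file does not exist; check both against the
\ref boost::locale::gnu_gettext::messages_info::callback_type "callback_type" reference for your Boost version.
The \c embedded_catalogs table is hypothetical; its keys must match the \c file_name values Boost.Locale actually
requests, which are easiest to discover by logging that argument.

\code
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<char> > file_table;

// Hypothetical table of catalog files compiled into the executable,
// keyed by the file name the callback is asked for.
file_table &embedded_catalogs()
{
    static file_table table;
    return table;
}

// Loader with the assumed callback signature. The second (encoding)
// argument is ignored by this sketch.
std::vector<char> some_file_loader(std::string const &file_name,
                                   std::string const &)
{
    file_table const &table = embedded_catalogs();
    file_table::const_iterator it = table.find(file_name);
    if(it == table.end())
        return std::vector<char>(); // empty vector: "file not found"
    return it->second;
}
\endcode

With Boost.Locale available, the function is installed exactly as above, <tt>info.callback = some_file_loader;</tt>,
after the table has been filled, for example from \c .mo files embedded as byte arrays.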

\section msg_non_ascii_keys Non US-ASCII Keys

Boost.Locale assumes that you use English for original text messages, and the best
practice is to use US-ASCII characters for original keys.

However, in some cases it is useful to insert some Unicode characters in the text, such as
the copyright sign "©".

As long as your narrow character string encoding is UTF-8, nothing further needs to be done.
Boost.Locale assumes that your sources are encoded in UTF-8 and that input narrow
strings use UTF-8, which is the default for most compilers (with the notable
exception of Microsoft Visual C++).

However, if the encoding of narrow strings in your source file is not UTF-8 but some other
encoding like windows-1252, the strings would be misinterpreted.

You can specify the character set of the original strings when you specify the
domain name for the application.
\code
#include <boost/locale.hpp>
#include <iostream>

using namespace std;
using namespace boost::locale;

int main()
{
    generator gen;

    // Specify the location of the dictionaries
    gen.add_messages_path(".");

    // Specify the encoding of the source strings
    gen.add_messages_domain("copyrighted/windows-1255");

    // Generate the locale, make it global and imbue it into cout
    locale::global(gen(""));
    cout.imbue(locale());

    // In windows-1255 the (C) symbol is encoded as 0xA9
    cout << translate("© 2001 All Rights Reserved") << endl;
}
\endcode
Thus, if the program runs in a UTF-8 locale, the copyright symbol is
automatically converted to the appropriate UTF-8 sequence when the
key is missing from the dictionary.
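
The conversion itself is ordinary re-encoding. In windows-1255, byte 0xA9 is the copyright sign, Unicode code point
U+00A9, whose UTF-8 form is the two bytes 0xC2 0xA9. The hypothetical \c to_utf8 function below shows how a code point
is encoded as UTF-8; a complete windows-1255 converter would first map every byte to its code point through a table,
because the upper half of windows-1255 does not coincide with Latin-1 (the Hebrew letters occupy 0xE0 to 0xFA, for instance).

\code
#include <string>

// Encodes one Unicode code point (up to U+10FFFF) as UTF-8.
// No validation of surrogates or out-of-range values is done here.
std::string to_utf8(unsigned long cp)
{
    std::string out;
    if(cp < 0x80) {
        out += static_cast<char>(cp);
    } else if(cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if(cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
\endcode

For example, <tt>to_utf8(0xA9)</tt> returns the two bytes 0xC2 0xA9, the UTF-8 copyright sign.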

\subsection msg_qna Questions and Answers

- Do I need GNU Gettext to use Boost.Locale?
\n
Boost.Locale provides a run-time environment to load and use GNU Gettext message catalogs, but it does
not provide tools for generation, translation, compilation and management of these catalogs.
Boost.Locale only reimplements the GNU Gettext runtime library, libintl.
\n
You would probably need:
\n
-# Boost.Locale itself, for the runtime.
-# A tool for extracting strings from source code and managing them: GNU Gettext provides good tools, but other
implementations are available as well.
-# A good translation program like <a href="http://userbase.kde.org/Lokalize">Lokalize</a>, <a href="http://www.poedit.net/">Poedit</a> or <a href="http://projects.gnome.org/gtranslator/">GTranslator</a>.
- Why doesn't Boost.Locale provide tools for extracting and managing message catalogs? Why should
I use GPL-ed software? Are my programs or message catalogs affected by its license?
\n
-# Boost.Locale does not link to or use any of the GNU Gettext code, so you need not worry about your code, as
the runtime library is fully reimplemented.
-# You may freely use GPL-ed software for extracting and managing catalogs, the same way as you are free to use
a GPL-ed editor. It does not affect your message catalogs or your code.
-# I see no reason to reimplement well-debugged, working tools like \c xgettext, \c msgfmt, and \c msgmerge that
do a very fine job, especially as they are freely available for download and support almost any platform.
All Linux distributions, BSD flavors, Mac OS X and other Unix-like operating systems provide GNU Gettext tools
as a standard package.\n
Windows users can get GNU Gettext utilities via the MinGW project. See \ref gettext_for_windows.
- Is there any reason to prefer the Boost.Locale implementation to the original GNU Gettext runtime library?
In either case I would probably need some of the GNU tools.
\n
There are three important differences between the GNU Gettext runtime library and the Boost.Locale implementation:
\n
-# The GNU Gettext runtime supports only one locale per process. It is not thread-safe to use multiple locales
and encodings in the same process. This is perfectly fine for applications that interact directly with
a single user, like most GUI applications, but is problematic for services and servers.
-# The GNU Gettext API supports only 8-bit encodings, making it unsuitable for environments that natively use
wide strings.
-# The GNU Gettext runtime library is distributed under the LGPL license, which may not be convenient for some users.
*/