simple_cxx11_metaprogramming_2.adoc 32 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934935936937938939940941942943944945946947948949950951952953954955956957958959960961962963964965966967968969970971972973974975976977978979980981982983984
  1. ////
  2. Copyright 2015-2017 Peter Dimov
  3. Distributed under the Boost Software License, Version 1.0.
  4. See accompanying file LICENSE_1_0.txt or copy at
  5. http://www.boost.org/LICENSE_1_0.txt
  6. ////
  7. # Simple {cpp}11 metaprogramming, part 2
  8. Peter Dimov
  9. 2015-06-20
  10. :toc: left
  11. :idprefix:
  12. :docinfo: shared-footer
  13. [.lead]
  14. __Efficient algorithms for membership testing, random access, and retrieval by
  15. key__
  16. NOTE: Being late to the metaprogramming party, I make no claim of having
  17. invented the techniques in this article. A quick look at the implementations
  18. of, for example, Louis Dionne's https://github.com/ldionne/mpl11[mpl11] and
  19. Eric Niebler's https://github.com/ericniebler/meta[meta], shows that most of
  20. these tricks are already known. Dave Abrahams
  21. https://github.com/dabrahams/mpl11[has experimented] along these lines in 2012.
  22. The original inventor of the multiple inheritance trick and the `void*`
  23. arguments trick is probably Richard Smith, who has posted
  24. https://llvm.org/bugs/attachment.cgi?id=8825[two]
  25. https://llvm.org/bugs/attachment.cgi?id=8838[examples] in response to
  26. https://llvm.org/bugs/show_bug.cgi?id=13263[a Clang bug report].
  27. ## Vectors, sets, and maps
  28. <<simple_cxx11_metaprogramming.adoc#,Last time>>, I outlined a style of
  29. metaprogramming that operated on type lists -- variadic class templates:
  30. ```
  31. template<class... T> struct mp_list {};
  32. ```
  33. Classic Lisp uses lists as its only data structure, but operating on a list is
  34. usually linear in the number of its elements.
  35. In addition to `list`, the STL has `vector`, `set`, and `map`. `vector`
  36. supports random access by index; `set` has efficient test for membership; `map`
  37. associates keys with values and has efficient lookup based on key.
  38. Instead of introducing separate data structure such as `mp_vector`, `mp_set`,
  39. `mp_map`, we'll keep our data in a list form, and attempt to provide efficient
  40. algorithms for random access, membership testing, and lookup.
  41. ## mp_contains
  42. Let's starts with sets. A set is just a list with unique elements. To obtain a
  43. set from an arbitrary list, we'll need an algorithm that removes the
  44. duplicates. Let's call it `mp_unique<L>`:
  45. [subs=+quotes]
  46. ```
  47. // mp_if
  48. template<bool C, class T, class E> struct mp_if_c_impl;
  49. template<class T, class E> struct mp_if_c_impl<true, T, E>
  50. {
  51. using type = T;
  52. };
  53. template<class T, class E> struct mp_if_c_impl<false, T, E>
  54. {
  55. using type = E;
  56. };
  57. template<bool C, class T, class E>
  58. using mp_if_c = typename mp_if_c_impl<C, T, E>::type;
  59. template<class C, class T, class E>
  60. using mp_if = typename mp_if_c_impl<C::value != 0, T, E>::type;
  61. // mp_unique
  62. template<class L> struct mp_unique_impl;
  63. template<class L> using mp_unique = typename mp_unique_impl<L>::type;
  64. template<template<class...> class L> struct mp_unique_impl<L<>>
  65. {
  66. using type = L<>;
  67. };
  68. template<template<class...> class L, class T1, class... T>
  69. struct mp_unique_impl<L<T1, T...>>
  70. {
  71. using _rest = mp_unique<L<T...>>;
  72. using type = mp_if<**mp_contains**<_rest, T1>, _rest, mp_push_front<_rest, T1>>;
  73. };
  74. ```
  75. For membership testing, we've introduced an algorithm `mp_contains<L, V>` that
  76. returns `true` when `L` contains `V`. The straightforward recursive
  77. implementation of `mp_contains` is:
  78. ```
  79. template<class L, class V> struct mp_contains_impl;
  80. template<class L, class V> using mp_contains = typename mp_contains_impl<L, V>::type;
  81. template<template<class...> class L, class V> struct mp_contains_impl<L<>, V>
  82. {
  83. using type = std::false_type;
  84. };
  85. template<template<class...> class L, class... T, class V>
  86. struct mp_contains_impl<L<V, T...>, V>
  87. {
  88. using type = std::true_type;
  89. };
  90. template<template<class...> class L, class T1, class... T, class V>
  91. struct mp_contains_impl<L<T1, T...>, V>: mp_contains_impl<L<T...>, V>
  92. {
  93. };
  94. ```
  95. Note that `mp_unique<L>` makes `N` calls to `mp_contains`, where `N` is the
  96. length of the list `L`. This means that `mp_contains` needs to be as fast as
  97. possible, which the above implementation, well, isn't.
  98. Here are the compile times in seconds for invoking `mp_unique` on a list with
  99. `N` (distinct) elements:
  100. |===
  101. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  102. |VC$$++$$ 2013, recursive |2.1 |DNF ||||||
  103. |clang$$++$$ 3.5.1, recursive |0.9 |4.5 |13.2 |30.2 |DNF |||
  104. |g$$++$$ 4.9.2, recursive |0.7 |3.6 |10.4 |23.2 |DNF |||
  105. |===
  106. (Tests done under Windows/Cygwin. All compilers are 32 bit. No optimizations.
  107. DNF stands for "did not finish", which usually means that the compiler ran out
  108. of heap space or crashed.)
  109. We clearly need a better alternative.
  110. I ended the previous article with an implementation of `mp_contains` that
  111. relied on `mp_count`, which in turn relied on `mp_plus`. Let's see how it
  112. fares:
  113. |===
  114. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  115. |VC$$++$$ 2013, mp_count/mp_plus |1.1 |9.8 |50.5 |DNF ||||
  116. |clang$$++$$ 3.5.1, mp_count/mp_plus |0.5 |1.4 |3.1 |6.1 |DNF |||
  117. |g$$++$$ 4.9.2, mp_count/mp_plus |0.5 |1.3 |2.9 |5.8 |9.7 |15.6 |22.4 |32.3
  118. |===
  119. Not _that_ bad, at least if your compiler happens to be `g$$++$$`. Still, there
  120. ought to be room for improvement here.
  121. To do better, we have to somehow leverage the language features, such as pack
  122. expansion, to do more of the work for us. For inspiration, let's turn to
  123. section 14.5.3 paragraph 4 of the {cpp}11 standard, which explains that pack
  124. expansions can occur in the following contexts:
  125. * **In a function parameter pack (8.3.5); the pattern is the
  126. __parameter-declaration__ without the ellipsis.**
  127. * In a template parameter pack that is a pack expansion (14.1):
  128. * **In an __initializer-list__ (8.5); the pattern is an
  129. __initializer-clause__.**
  130. * **In a __base-specifier-list__ (Clause 10); the pattern is a
  131. __base-specifier__.**
  132. * In a __mem-initializer-list__ (12.6.2); the pattern is a
  133. __mem-initializer__.
  134. * In a __template-argument-list__ (14.3); the pattern is a
  135. __template-argument__.
  136. * In a __dynamic-exception-specification__ (15.4); the pattern is a
  137. __type-id__.
  138. * In an __attribute-list__ (7.6.1); the pattern is an __attribute__.
  139. * In an __alignment-specifier__ (7.6.2); the pattern is the
  140. __alignment-specifier__ without the ellipsis.
  141. * In a __capture-list__ (5.1.2); the pattern is a __capture__.
  142. * In a `sizeof$$...$$` expression (5.3.3); the pattern is an __identifier__.
  143. The **emphasis** is mine and indicates possible leads.
  144. Our first option is to expand the parameter pack into arguments for a function
  145. call. Since we're interested in operations that occur at compile time, calling
  146. a function may not appear useful; but {cpp}11 functions can be `constexpr`, and
  147. `constexpr` function "calls" do occur at compile time.
  148. Recall our `mp_count`:
  149. ```
  150. template<class L, class V> struct mp_count_impl;
  151. template<template<class...> class L, class... T, class V>
  152. struct mp_count_impl<L<T...>, V>
  153. {
  154. using type = mp_plus<std::is_same<T, V>...>;
  155. };
  156. template<class L, class V> using mp_count = typename mp_count_impl<L, V>::type;
  157. ```
  158. Instead of using the template alias `mp_plus` to sum the `is_same` expressions,
  159. we can use a `constexpr` function:
  160. ```
  161. constexpr std::size_t cx_plus()
  162. {
  163. return 0;
  164. }
  165. template<class T1, class... T> constexpr std::size_t cx_plus(T1 t1, T... t)
  166. {
  167. return t1 + cx_plus(t...);
  168. }
  169. // mp_size_t
  170. template<std::size_t N> using mp_size_t = std::integral_constant<std::size_t, N>;
  171. // mp_count
  172. template<class L, class V> struct mp_count_impl;
  173. template<template<class...> class L, class... T, class V>
  174. struct mp_count_impl<L<T...>, V>
  175. {
  176. using type = mp_size_t<cx_plus(std::is_same<T, V>::value...)>;
  177. };
  178. template<class L, class V> using mp_count = typename mp_count_impl<L, V>::type;
  179. ```
  180. with the following results:
  181. |===
  182. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  183. |clang$$++$$ 3.5.1, mp_count/cx_plus |0.4 |1.1 |2.5 |5.0 |DNF |||
  184. |g$$++$$ 4.9.2, mp_count/cx_plus |0.4 |0.9 |1.7 |2.9 |4.7 |6.7 |9.2 |11.8
  185. |===
  186. We've improved the times, but lost VC$$++$$ 2013 due to its not implementing
  187. `constexpr`.
  188. Let's try pack expansion into an __initializer-list__. Instead of passing the
  189. `is_same` expressions to a function, we can build a constant array out of them,
  190. then sum the array with a `constexpr` function:
  191. ```
  192. constexpr std::size_t cx_plus2(bool const * first, bool const * last)
  193. {
  194. return first == last? 0: *first + cx_plus2(first + 1, last);
  195. }
  196. // mp_count
  197. template<class L, class V> struct mp_count_impl;
  198. template<template<class...> class L, class... T, class V>
  199. struct mp_count_impl<L<T...>, V>
  200. {
  201. static constexpr bool _v[] = { std::is_same<T, V>::value... };
  202. using type = mp_size_t<cx_plus2(_v, _v + sizeof...(T))>;
  203. };
  204. template<class L, class V> using mp_count = typename mp_count_impl<L, V>::type;
  205. ```
  206. This is a neat trick, but is it fast?
  207. |===
  208. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  209. |clang$$++$$ 3.5.1, mp_count/cx_plus2 |0.4 |0.9 |1.8 |DNF ||||
  210. |g$$++$$ 4.9.2, mp_count/cx_plus2 |0.4 |0.9 |1.9 |3.4 |5.4 |7.8 |11.0 |14.7
  211. |===
  212. That's a bit disappointing. Let's see what can we do with expanding a parameter
  213. pack into a base-specifier-list. We would be able to define a class that
  214. derives from every element of the pack:
  215. ```
  216. struct U: T... {};
  217. ```
  218. We can then use `std::is_base_of<V, U>` to test whether a type `V` is a base of
  219. `U`, that is, whether it's one of the elements of the parameter pack. Which is
  220. exactly what we need.
  221. Arbitrary types such as `void`, `int`, or `void(int)` can't be used as base
  222. classes, but we'll wrap the types in an empty class template, which we'll call
  223. `mp_identity`.
  224. ```
  225. template<class T> struct mp_identity
  226. {
  227. using type = T;
  228. };
  229. template<class L, class V> struct mp_contains_impl;
  230. template<class L, class V> using mp_contains = typename mp_contains_impl<L, V>::type;
  231. template<template<class...> class L, class... T, class V>
  232. struct mp_contains_impl<L<T...>, V>
  233. {
  234. struct U: mp_identity<T>... {};
  235. using type = std::is_base_of<mp_identity<V>, U>;
  236. };
  237. ```
  238. Performance?
  239. |===
  240. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  241. |VC$$++$$ 2013, is_base_of |0.3 |0.6 |1.3 |2.5 |DNF |||
  242. |clang$$++$$ 3.5.1, is_base_of |0.3 |0.4 |0.6 |0.8 |DNF |||
  243. |g$$++$$ 4.9.2, is_base_of |0.3 |0.4 |0.6 |0.9 |1.3 |1.7 |2.3 |3.0
  244. |===
  245. This implementation is a clear winner.
  246. In fairness, we ought to note that the first four implementations of
  247. `mp_contains` do not rely on the list elements being unique. This makes
  248. `mp_contains` an algorithm that supports arbitrary lists, not just sets.
  249. The `is_base_of` implementation, however, does not support lists that contain
  250. duplicates, because it's not possible to inherit directly from the same type
  251. twice. So it does not implement the general `mp_contains`, but something that
  252. should probably be named `mp_set_contains`.
  253. We can avoid the "no duplicates" requirement by modifying the implementation to
  254. inherit from `mp_identity<T>` indirectly, via an intermediate base class:
  255. [subs=+macros]
  256. ```
  257. // indirect_inherit
  258. template<std::size_t I, class T> struct inherit_second: T {};
  259. template<class L, class S> struct indirect_inherit_impl;
  260. template<template<class...> class L, class... T, std::size_t... J>
  261. struct indirect_inherit_impl<L<T...>, http://en.cppreference.com/w/cpp/utility/integer_sequence[integer_sequence]<std::size_t, J...>>:
  262. inherit_second<J, mp_identity<T>>... {};
  263. template<class L> using indirect_inherit =
  264. indirect_inherit_impl<L, http://en.cppreference.com/w/cpp/utility/integer_sequence[make_index_sequence]<mp_size<L>::value>>;
  265. // mp_contains
  266. template<class L, class V> struct mp_contains_impl
  267. {
  268. using U = indirect_inherit<L>;
  269. using type = std::is_base_of<mp_identity<V>, U>;
  270. };
  271. template<class L, class V> using mp_contains = typename mp_contains_impl<L, V>::type;
  272. ```
  273. This, however, pretty much nullifies the spectacular performance gains we've
  274. observed with the original `is_base_of`-based implementation:
  275. |===
  276. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  277. |VC$$++$$ 2013, recursive |2.1 |DNF ||||||
  278. |VC$$++$$ 2013, mp_count/mp_plus |1.1 |9.8 |50.5 |DNF ||||
  279. |VC$$++$$ 2013, is_base_of |0.3 |0.6 |1.3 |2.5 |DNF |||
  280. |VC$$++$$ 2013, is_base_of/indirect |1.0 |9.3 |49.5 |153.8 |DNF |||
  281. |===
  282. |===
  283. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  284. |clang$$++$$ 3.5.1, recursive |0.9 |4.5 |13.2 |30.2 |DNF |||
  285. |clang$$++$$ 3.5.1, mp_count/mp_plus |0.5 |1.4 |3.1 |6.1 |DNF |||
  286. |clang$$++$$ 3.5.1, mp_count/cx_plus |0.4 |1.1 |2.5 |5.0 |DNF |||
  287. |clang$$++$$ 3.5.1, mp_count/cx_plus2 |0.4 |0.9 |1.8 |DNF ||||
  288. |clang$$++$$ 3.5.1, is_base_of |0.3 |0.4 |0.6 |0.8 |DNF |||
  289. |clang$$++$$ 3.5.1, is_base_of/indirect |0.4 |0.9 |1.6 |2.5 |DNF |||
  290. |===
  291. |===
  292. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  293. |g$$++$$ 4.9.2, recursive |0.7 |3.6 |10.4 |23.2 |DNF |||
  294. |g$$++$$ 4.9.2, mp_count/mp_plus |0.5 |1.3 |2.9 |5.8 |9.7 |15.6 |22.4 |32.3
  295. |g$$++$$ 4.9.2, mp_count/cx_plus |0.4 |0.9 |1.7 |2.9 |4.7 |6.7 |9.2 |11.8
  296. |g$$++$$ 4.9.2, mp_count/cx_plus2 |0.4 |0.9 |1.9 |3.4 |5.4 |7.8 |11.0 |14.7
  297. |g$$++$$ 4.9.2, is_base_of |0.3 |0.4 |0.6 |0.9 |1.3 |1.7 |2.3 |3.0
  298. |g$$++$$ 4.9.2, is_base_of/indirect |0.5 |1.1 |2.3 |4.0 |6.6 |9.8 |13.6 |18.2
  299. |===
  300. ## mp_map_find
  301. A map, in the STL sense, is a data structure that associates keys with values
  302. and can efficiently retrieve, given a key, its associated value. For our
  303. purposes, a map will be any list of lists for which the inner lists have at
  304. least one element, the key; the rest of the elements we'll consider to be the
  305. associated value. For example, the list
  306. ```
  307. [[A, B], [C, D, E], [F], [G, H]]
  308. ```
  309. is a map with keys `A`, `C`, `F`, and `G`, with associated values `[B]`,
  310. `[D, E]`, `[]`, and `[H]`, respectively. We'll require unique keys, for reasons
  311. that'll become evident later.
  312. I'll show two other examples of maps, this time using real {cpp} code:
  313. ```
  314. using Map = mp_list<mp_list<int, int*>, mp_list<void, void*>, mp_list<char, char*>>;
  315. ```
  316. ```
  317. using Map2 = std::tuple<std::pair<int, int[2]>, std::pair<char, char[2]>>;
  318. ```
  319. The Lisp name of the algorithm that performs retrieval based on key is `ASSOC`,
  320. but I'll call it `mp_map_find`. `mp_map_find<M, K>` returns the element of `M`
  321. whose first element is `K`. For example, `mp_map_find<Map2, int>` would return
  322. `std::pair<int, int[2]>`. If there's no such key, it returns `void`.
  323. There's almost no need to implement and benchmark the recursive version of
  324. `mp_map_find` -- we can be pretty sure it will perform horribly. Still,
  325. ```
  326. template<class M, class K> struct mp_map_find_impl;
  327. template<class M, class K> using mp_map_find = typename mp_map_find_impl<M, K>::type;
  328. template<template<class...> class M, class K> struct mp_map_find_impl<M<>, K>
  329. {
  330. using type = void;
  331. };
  332. template<template<class...> class M, class T1, class... T, class K>
  333. struct mp_map_find_impl<M<T1, T...>, K>
  334. {
  335. using type = mp_if<std::is_same<mp_front<T1>, K>, T1, mp_map_find<M<T...>, K>>;
  336. };
  337. ```
  338. The compile time, in seconds, for `N` lookups into a map of size `N`, is as
  339. follows:
  340. |===
  341. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  342. |VC$$++$$ 2013, recursive |38.2 |DNF ||||||
  343. |clang$$++$$ 3.5.1, recursive |2.5 |13.7 |DNF |||||
  344. |g$$++$$ 4.9.2, recursive |1.9 |10.2 |28.8 |DNF ||||
  345. |===
  346. I told you there was no point.
  347. But, I hear some of you say, you're evaluating the else branch even if the
  348. condition is true, and that's horribly inefficient!
  349. Well, this would only improve the performance by a factor of approximately two
  350. on average, and only if the element is present, but fine, let's try it. The
  351. element happens to be present in the benchmark, so let's see.
  352. ```
  353. // mp_eval_if
  354. template<bool C, class T, template<class...> class E, class... A>
  355. struct mp_eval_if_c_impl;
  356. template<class T, template<class...> class E, class... A>
  357. struct mp_eval_if_c_impl<true, T, E, A...>
  358. {
  359. using type = T;
  360. };
  361. template<class T, template<class...> class E, class... A>
  362. struct mp_eval_if_c_impl<false, T, E, A...>
  363. {
  364. using type = E<A...>;
  365. };
  366. template<bool C, class T, template<class...> class E, class... A>
  367. using mp_eval_if_c = typename mp_eval_if_c_impl<C, T, E, A...>::type;
  368. template<class C, class T, template<class...> class E, class... A>
  369. using mp_eval_if = typename mp_eval_if_c_impl<C::value != 0, T, E, A...>::type;
  370. // mp_map_find
  371. template<class M, class K> struct mp_map_find_impl;
  372. template<class M, class K> using mp_map_find = typename mp_map_find_impl<M, K>::type;
  373. template<template<class...> class M, class K> struct mp_map_find_impl<M<>, K>
  374. {
  375. using type = void;
  376. };
  377. template<template<class...> class M, class T1, class... T, class K>
  378. struct mp_map_find_impl<M<T1, T...>, K>
  379. {
  380. using type = mp_eval_if<std::is_same<mp_front<T1>, K>, T1, mp_map_find, M<T...>, K>;
  381. };
  382. ```
  383. There you go:
  384. |===
  385. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  386. |VC$$++$$ 2013, recursive |15.6 |DNF ||||||
  387. |clang$$++$$ 3.5.1, recursive |1.8 |9.5 |DNF |||||
  388. |g$$++$$ 4.9.2, recursive |1.4 |7.0 |19.7 |DNF ||||
  389. |===
  390. I told you there was no point.
  391. Point or no, to establish that the recursive implementation is inefficient is
  392. not the same as to come up with an efficient one. There are two things that
  393. make the `mp_contains` techniques inapplicable to our present case: first,
  394. `mp_contains` only had to return true or false, whereas `mp_map_find` returns a
  395. type, and second, in `mp_contains` we knew the exact type of the element for
  396. which we were looking, whereas here, we only know its `mp_front`.
  397. Fortunately, there does exist a language feature that can solve both: {cpp} can
  398. deduce the template parameters of base classes when passed a derived class. In
  399. this example,
  400. ```
  401. struct K1 {};
  402. struct V1 {};
  403. struct X: std::pair<K1, V1> {};
  404. template<class A, class B> void f(std::pair<A, B> const & p);
  405. int main()
  406. {
  407. f(X());
  408. }
  409. ```
  410. the call to `f(X())` deduces `A` as `K1` and `B` as `V1`. If we have more than
  411. one `std::pair` base class, we can fix `A` to be `K1`:
  412. ```
  413. struct K1 {};
  414. struct V1 {};
  415. struct K2 {};
  416. struct V2 {};
  417. struct X: std::pair<K1, V1>, std::pair<K2, V2> {};
  418. template<class B> void f(std::pair<K1, B> const & p);
  419. int main()
  420. {
  421. f(X());
  422. }
  423. ```
  424. and `B` will be deduced as `V1`.
  425. We can retrieve the results of the deduction by returning the type we want:
  426. ```
  427. template<class B> std::pair<K1, B> f(std::pair<K1, B> const & p);
  428. ```
  429. and then using `decltype(f(X()))` to obtain this return type.
  430. What if `X` doesn't have a base of type `std::pair<K1, B>`? The deduction will
  431. fail and we'll get an error that `f(X())` cannot be called. To avoid it, we can
  432. add an overload of `f` that takes anything and returns `void`. But in this
  433. case, what will happen if `X` has two bases of the form that match the first
  434. `f` overload, such as for example `std::pair<K1, Y>` and `std::pair<K1, Z>`?
  435. The deduction will fail, the second overload will again be chosen and we'll get
  436. `void`. This is why we require maps to have unique keys.
  437. Here's an implementation of `mp_map_find` based on this technique:
  438. ```
  439. template<class M, class K> struct mp_map_find_impl;
  440. template<class M, class K>
  441. using mp_map_find = typename mp_map_find_impl<M, K>::type;
  442. template<template<class...> class M, class... T, class K>
  443. struct mp_map_find_impl<M<T...>, K>
  444. {
  445. struct U: mp_identity<T>... {};
  446. template<template<class...> class L, class... U>
  447. static mp_identity<L<K, U...>>
  448. f( mp_identity<L<K, U...>>* );
  449. static mp_identity<void> f( ... );
  450. using V = decltype( f((U*)0) );
  451. using type = typename V::type;
  452. };
  453. ```
  454. and its corresponding compile times:
  455. |===
  456. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  457. |VC$$++$$ 2013, deduction |0.3 |0.7 |1.8 |3.6 |6.4 |10.4 |16.2 |DNF
  458. |clang$$++$$ 3.5.1, deduction |0.3 |0.4 |0.6 |0.9 |1.2 |1.6 |2.2 |2.7
  459. |g$$++$$ 4.9.2, deduction |0.3 |0.5 |0.9 |1.6 |2.3 |3.4 |4.7 |6.3
  460. |===
  461. This looks ready to ship.
  462. The implementation contains one inefficiency though. If we evaluate
  463. `mp_map_find<M, K1>`, then `mp_map_find<M, K2>`, the two nested `U` types are
  464. the same as they only depend on `M`, but the compiler doesn't know that and
  465. will instantiate each one separately. We should move this type outside
  466. `mp_map_find_impl` so that it can be reused:
  467. [subs=+quotes]
  468. ```
  469. template<class... T> struct **mp_inherit**: T... {};
  470. template<class M, class K> struct mp_map_find_impl;
  471. template<class M, class K>
  472. using mp_map_find = typename mp_map_find_impl<M, K>::type;
  473. template<template<class...> class M, class... T, class K>
  474. struct mp_map_find_impl<M<T...>, K>
  475. {
  476. **using U = mp_inherit<mp_identity<T>...>;**
  477. template<template<class...> class L, class... U>
  478. static mp_identity<L<K, U...>>
  479. f( mp_identity<L<K, U...>>* );
  480. static mp_identity<void> f( ... );
  481. using V = decltype( f((U*)0) );
  482. using type = typename V::type;
  483. };
  484. ```
  485. (This same optimization, by the way, applies to our `is_base_of` implementation
  486. of `mp_contains`.)
  487. The improvement in compile times on our benchmark is measurable:
  488. |===
  489. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  490. |VC$$++$$ 2013, deduction+mp_inherit |0.3 |0.6 |1.4 |2.6 |4.5 |7.1 |10.7 |DNF
  491. |clang$$++$$ 3.5.1, deduction+mp_inherit |0.3 |0.4 |0.6 |0.8 |1.0 |1.4 |1.8 |2.2
  492. |g$$++$$ 4.9.2, deduction+mp_inherit |0.3 |0.4 |0.6 |0.9 |1.3 |1.8 |2.3 |2.9
  493. |===
  494. ## mp_at
  495. With sets and maps covered, it's time to tackle vectors. Vectors for us are
  496. just lists, to which we'll need to add the ability to efficiently access an
  497. element based on its index. The customary name for this accessor is
  498. `mp_at<L, I>`, where `L` is a list and `I` is an `integral_constant` that
  499. represents the index. We'll also follow the Boost.MPL convention and add
  500. `mp_at_c<L, I>`, where `I` is the index of type `size_t`.
  501. The recursive implementation of `mp_at` is:
  502. ```
  503. template<class L, std::size_t I> struct mp_at_c_impl;
  504. template<class L, std::size_t I> using mp_at_c = typename mp_at_c_impl<L, I>::type;
  505. template<class L, class I> using mp_at = typename mp_at_c_impl<L, I::value>::type;
  506. template<template<class...> class L, class T1, class... T>
  507. struct mp_at_c_impl<L<T1, T...>, 0>
  508. {
  509. using type = T1;
  510. };
  511. template<template<class...> class L, class T1, class... T, std::size_t I>
  512. struct mp_at_c_impl<L<T1, T...>, I>
  513. {
  514. using type = mp_at_c<L<T...>, I-1>;
  515. };
  516. ```
  517. and the compile times for making `N` calls to `mp_at` with a list of size `N`
  518. as the first argument are:
  519. |===
  520. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  521. |VC$$++$$ 2013, recursive |3.6 |DNF ||||||
  522. |clang$$++$$ 3.5.1, recursive |1.0 |5.1 |15.3 |DNF ||||
  523. |g$$++$$ 4.9.2, recursive |0.9 |4.7 |14.2 |32.4 |DNF |||
  524. |===
  525. To improve upon this appalling result, we'll again exploit pack expansion into a
  526. function call, but in a novel way. Let's suppose that we need to access the
  527. fourth element (`I = 3`). We'll generate the function signature
  528. ```
  529. template<class W> W f( void*, void*, void*, W*, ... );
  530. ```
  531. and then, given a list `L<T1, T2, T3, T4, T5, T6, T7>`, we'll evaluate the
  532. expression
  533. ```
  534. decltype( f( (T1*)0, (T2*)0, (T3*)0, (T4*)0, (T5*)0, (T6*)0, (T7*)0 ) )
  535. ```
  536. The three `void*` parameters will eat the first three elements, `W` will be
  537. deduced as the fourth, and the ellipsis will take care of the rest.
  538. A working implementation based on this technique is shown below:
  539. ```
  540. // mp_repeat_c
  541. template<std::size_t N, class... T> struct mp_repeat_c_impl
  542. {
  543. using _l1 = typename mp_repeat_c_impl<N/2, T...>::type;
  544. using _l2 = typename mp_repeat_c_impl<N%2, T...>::type;
  545. using type = mp_append<_l1, _l1, _l2>;
  546. };
  547. template<class... T> struct mp_repeat_c_impl<0, T...>
  548. {
  549. using type = mp_list<>;
  550. };
  551. template<class... T> struct mp_repeat_c_impl<1, T...>
  552. {
  553. using type = mp_list<T...>;
  554. };
  555. template<std::size_t N, class... T> using mp_repeat_c =
  556. typename mp_repeat_c_impl<N, T...>::type;
  557. // mp_at
  558. template<class L, class L2> struct mp_at_c_impl;
  559. template<template<class...> class L, class... T,
  560. template<class...> class L2, class... U>
  561. struct mp_at_c_impl<L<T...>, L2<U...>>
  562. {
  563. template<class W> static W f( U*..., W*, ... );
  564. using R = decltype( f( (mp_identity<T>*)0 ... ) );
  565. using type = typename R::type;
  566. };
  567. template<class L, std::size_t I> using mp_at_c =
  568. typename mp_at_c_impl<L, mp_repeat_c<I, void>>::type;
  569. template<class L, class I> using mp_at = mp_at_c<L, I::value>;
  570. ```
  571. and the compile times in the following table show it to be good enough for most
  572. practical purposes.
  573. |===
  574. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  575. |VC$$++$$ 2013, void* |0.4 |1.1 |2.4 |4.7 |DNF |||
  576. |clang$$++$$ 3.5.1, void* |0.4 |0.7 |1.2 |1.9 |2.7 |3.8 |5.0 |6.6
  577. |g$$++$$ 4.9.2, void* |0.3 |0.5 |0.9 |1.3 |2.1 |3.0 |4.2 |5.5
  578. |===
  579. Are we done with `mp_at`, then?
  580. Let's try something else -- transform the input list `[T1, T2, T3]` into a map
  581. `[[0, T1], [1, T2], [2, T3]]`, then use `mp_map_find` for the lookup:
  582. [subs=+macros]
  583. ```
  584. // mp_map_from_list
  585. template<class L, class S> struct mp_map_from_list_impl;
  586. template<template<class...> class L, class... T, std::size_t... J>
  587. struct mp_map_from_list_impl<L<T...>, http://en.cppreference.com/w/cpp/utility/integer_sequence[integer_sequence]<std::size_t, J...>>
  588. {
  589. using type = mp_list<mp_list<mp_size_t<J>, T>...>;
  590. };
  591. template<class L> using mp_map_from_list = typename mp_map_from_list_impl<L,
  592. http://en.cppreference.com/w/cpp/utility/integer_sequence[make_index_sequence]<mp_size<L>::value>>::type;
  593. // mp_at
  594. template<class L, std::size_t I> struct mp_at_c_impl
  595. {
  596. using map = mp_map_from_list<L>;
  597. using type = mp_second<mp_map_find<map, mp_size_t<I>>>;
  598. };
  599. template<class L, std::size_t I> using mp_at_c = typename mp_at_c_impl<L, I>::type;
  600. template<class L, class I> using mp_at = typename mp_at_c_impl<L, I::value>::type;
  601. ```
  602. At first sight, this looks ridiculous, but metaprogramming has its own rules.
  603. Let's measure:
  604. |===
  605. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  606. |VC$$++$$ 2013, map |0.3 |0.7 |1.5 |2.9 |5.0 |7.8 |11.9 |DNF
  607. |clang$$++$$ 3.5.1, map |0.3 |0.4 |0.6 |0.8 |1.1 |1.5 |1.8 |2.3
  608. |g$$++$$ 4.9.2, map |0.3 |0.4 |0.7 |1.0 |1.4 |1.9 |2.5 |3.2
  609. |===
  610. Surprise, this is the best implementation.
  611. ## mp_drop
  612. It turned out that we didn't need the `void*` trick for `mp_at`, but I'll show
  613. an example where we do: `mp_drop`. `mp_drop<L, N>` returns the list `L` without
  614. its first `N` elements; or, in other words, it drops the first `N` elements
  615. (presumably on the cutting room floor) and returns what's left.
  616. To implement `mp_drop`, we just need to change
  617. ```
  618. template<class W> W f( void*, void*, void*, W*, ... );
  619. ```
  620. from above to return the rest of the elements, rather than just one:
  621. ```
  622. template<class... W> mp_list<W> f( void*, void*, void*, W*... );
  623. ```
  624. Adding the necessary `mp_identity` seasoning produces the following working
  625. implementation:
  626. ```
  627. template<class L, class L2> struct mp_drop_c_impl;
  628. template<template<class...> class L, class... T,
  629. template<class...> class L2, class... U>
  630. struct mp_drop_c_impl<L<T...>, L2<U...>>
  631. {
  632. template<class... W> static mp_identity<L<W...>> f( U*..., mp_identity<W>*... );
  633. using R = decltype( f( (mp_identity<T>*)0 ... ) );
  634. using type = typename R::type;
  635. };
  636. template<class L, std::size_t N> using mp_drop_c =
  637. typename mp_drop_c_impl<L, mp_repeat_c<N, void>>::type;
  638. template<class L, class N> using mp_drop = mp_drop_c<L, N::value>;
  639. ```
  640. I'll skip the recursive implementation and the performance comparison for this
  641. one. We can pretty much tell who's going to win, and by how much.
  642. ## mp_find_index
  643. The final algorithm that I'll bring to your attention is `mp_find_index`.
  644. `mp_find_index<L, V>` returns an integral constant of type `size_t` with a
  645. value that is the index of the first occurrence of `V` in `L`. If `V` is not in
  646. `L`, the return value is the size of `L`.
  647. We'll start with the recursive implementation, as usual:
  648. ```
  649. template<class L, class V> struct mp_find_index_impl;
  650. template<class L, class V> using mp_find_index = typename mp_find_index_impl<L, V>::type;
  651. template<template<class...> class L, class V> struct mp_find_index_impl<L<>, V>
  652. {
  653. using type = mp_size_t<0>;
  654. };
  655. template<template<class...> class L, class... T, class V>
  656. struct mp_find_index_impl<L<V, T...>, V>
  657. {
  658. using type = mp_size_t<0>;
  659. };
  660. template<template<class...> class L, class T1, class... T, class V>
  661. struct mp_find_index_impl<L<T1, T...>, V>
  662. {
  663. using type = mp_size_t<1 + mp_find_index<L<T...>, V>::value>;
  664. };
  665. ```
  666. and will continue with the compile times for `N` calls to `mp_find_index` on a
  667. list with `N` elements, as usual:
  668. |===
  669. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  670. |VC$$++$$ 2013, recursive |3.5 |DNF ||||||
  671. |clang$$++$$ 3.5.1, recursive |1.1 |5.5 |DNF |||||
  672. |g$$++$$ 4.9.2, recursive |0.8 |4.6 |13.6 |DNF ||||
  673. |===
  674. What can we do here?
  675. Let's go back to `mp_contains` and look at the "mp_count/cx_plus2"
  676. implementation which we rejected in favor of the inheritance-based one. It
  677. built a `constexpr` array of booleans and summed them in a `constexpr`
  678. function. We can do the same here, except instead of summing the array, we can
  679. find the index of the first true value:
  680. ```
  681. template<class L, class V> struct mp_find_index_impl;
  682. template<class L, class V> using mp_find_index = typename mp_find_index_impl<L, V>::type;
  683. template<template<class...> class L, class V> struct mp_find_index_impl<L<>, V>
  684. {
  685. using type = mp_size_t<0>;
  686. };
  687. constexpr std::size_t cx_find_index( bool const * first, bool const * last )
  688. {
  689. return first == last || *first? 0: 1 + cx_find_index( first + 1, last );
  690. }
  691. template<template<class...> class L, class... T, class V>
  692. struct mp_find_index_impl<L<T...>, V>
  693. {
  694. static constexpr bool _v[] = { std::is_same<T, V>::value... };
  695. using type = mp_size_t< cx_find_index( _v, _v + sizeof...(T) ) >;
  696. };
  697. ```
  698. The performance of this version is:
  699. |===
  700. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  701. |clang$$++$$ 3.5.1, constexpr |0.5 |1.3 |2.9 |DNF ||||
  702. |g$$++$$ 4.9.2, constexpr |0.5 |1.4 |3.1 |5.5 |8.7 |13.0 |18.0 |DNF
  703. |===
  704. which, while not ideal, is significantly better than our recursive punching
  705. bag. But if our compiler of choice is VC$$++$$ 2013, we can't use `constexpr`.
  706. We may attempt an implementation along the same lines, but with the `constexpr`
  707. function replaced with ordinary metaprogramming:
  708. ```
  709. template<class L, class V> struct mp_find_index_impl;
  710. template<class L, class V> using mp_find_index = typename mp_find_index_impl<L, V>::type;
  711. template<template<class...> class L, class V> struct mp_find_index_impl<L<>, V>
  712. {
  713. using type = mp_size_t<0>;
  714. };
  715. template<bool...> struct find_index_impl_;
  716. template<> struct find_index_impl_<>
  717. {
  718. static const std::size_t value = 0;
  719. };
  720. template<bool B1, bool... R> struct find_index_impl_<B1, R...>
  721. {
  722. static const std::size_t value = B1? 0: 1 + find_index_impl_<R...>::value;
  723. };
  724. template<bool B1, bool B2, bool B3, bool B4, bool B5,
  725. bool B6, bool B7, bool B8, bool B9, bool B10, bool... R>
  726. struct find_index_impl_<B1, B2, B3, B4, B5, B6, B7, B8, B9, B10, R...>
  727. {
  728. static const std::size_t value = B1? 0: B2? 1: B3? 2: B4? 3: B5? 4:
  729. B6? 5: B7? 6: B8? 7: B9? 8: B10? 9: 10 + find_index_impl_<R...>::value;
  730. };
  731. template<template<class...> class L, class... T, class V>
  732. struct mp_find_index_impl<L<T...>, V>
  733. {
  734. using type = mp_size_t<find_index_impl_<std::is_same<T, V>::value...>::value>;
  735. };
  736. ```
  737. This is still recursive, so we don't expect miracles, but it wouldn't hurt to
  738. measure:
  739. |===
  740. ||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800
  741. |VC$$++$$ 2013, bool... |4.7 |94.5 |488.3 |XFA ||||
  742. |clang$$++$$ 3.5.1, bool... |0.6 |2.2 |5.8 |12.0 |21.7 |35.2 |DNF |
  743. |g$$++$$ 4.9.2, bool... |0.6 |2.4 |6.5 |13.2 |23.8 |39.1 |59.0 |DNF
  744. |===
  745. (where XFA stands for "experimenter fell asleep".)
  746. This is an interesting tradeoff for VC$$++$$ 2013 and Clang. On the one hand,
  747. this implementation is slower; on the other, it doesn't crash the compiler as
  748. easily. Which to prefer is a matter of taste and of stern evaluation of one's
  749. needs to manipulate type lists of length 300.
  750. Note that once we have `mp_drop` and `mp_find_index`, we can derive the
  751. `mp_find<L, V>` algorithm, which returns the suffix of `L` starting with the
  752. first occurrence of `V`, if any, and an empty list otherwise, by using
  753. `mp_drop<L, mp_find_index<L, V>>`.
  754. ## Conclusion
  755. In this article, I have shown efficient algorithms that allow us to treat type
  756. lists as sets, maps and vectors, demonstrating various {cpp}11 implementation
  757. techniques in the process.