1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061071081091101111121131141151161171181191201211221231241251261271281291301311321331341351361371381391401411421431441451461471481491501511521531541551561571581591601611621631641651661671681691701711721731741751761771781791801811821831841851861871881891901911921931941951961971981992002012022032042052062072082092102112122132142152162172182192202212222232242252262272282292302312322332342352362372382392402412422432442452462472482492502512522532542552562572582592602612622632642652662672682692702712722732742752762772782792802812822832842852862872882892902912922932942952962972982993003013023033043053063073083093103113123133143153163173183193203213223233243253263273283293303313323333343353363373383393403413423433443453463473483493503513523533543553563573583593603613623633643653663673683693703713723733743753763773783793803813823833843853863873883893903913923933943953963973983994004014024034044054064074084094104114124134144154164174184194204214224234244254264274284294304314324334344354364374384394404414424434444454464474484494504514524534544554564574584594604614624634644654664674684694704714724734744754764774784794804814824834844854864874884894904914924934944954964974984995005015025035045055065075085095105115125135145155165175185195205215225235245255265275285295305315325335345355365375385395405415425435445455465475485495505515525535545555565575585595605615625635645655665675685695705715725735745755765775785795805815825835845855865875885895905915925935945955965975985996006016026036046056066076086096106116126136146156166176186196206216226236246256266276286296306316326336346356366376386396406416426436446456466476486496506516526536546556566576586596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054 |
- % Copyright Matthew Pulver 2018 - 2019.
- % Distributed under the Boost Software License, Version 1.0.
- % (See accompanying file LICENSE_1_0.txt or copy at
- % https://www.boost.org/LICENSE_1_0.txt)
- \documentclass{article}
- \usepackage{amsmath} %\usepackage{mathtools}
- \usepackage{amssymb} %\mathbb
- \usepackage{array} % m{} column in tabular
- \usepackage{csquotes} % displayquote
- \usepackage{fancyhdr}
- \usepackage{fancyvrb}
- \usepackage[margin=0.75in]{geometry}
- \usepackage{graphicx}
- \usepackage{hyperref}
- %\usepackage{listings}
- \usepackage{multirow}
- \usepackage[super]{nth}
- \usepackage{wrapfig}
- \usepackage{xcolor}
- \hypersetup{%
- colorlinks=false,% hyperlinks will be black
- linkbordercolor=blue,% hyperlink borders will be red
- urlbordercolor=blue,%
- pdfborderstyle={/S/U/W 1}% border style will be underline of width 1pt
- }
- \pagestyle{fancyplain}
- \fancyhf{}
- \renewcommand{\headrulewidth}{0pt}
- \cfoot[]{\thepage\\
- \scriptsize\color{gray} Copyright \textcopyright\/ Matthew Pulver 2018--2019.
- Distributed under the Boost Software License, Version 1.0.\\
- (See accompanying file LICENSE\_1\_0.txt or copy at
- \url{https://www.boost.org/LICENSE\_1\_0.txt})}
- \DeclareMathOperator{\sinc}{sinc}
- \begin{document}
- \title{Autodiff\\
- \large Automatic Differentiation C++ Library}
- \author{Matthew Pulver}
- \maketitle
- %\date{}
- %\begin{abstract}
- %\end{abstract}
- \tableofcontents
- %\section{Synopsis}
- %\begingroup
- %\fontsize{10pt}{10pt}\selectfont
- %\begin{verbatim}
- % example/synopsis.cpp
- %\end{verbatim}
- %\endgroup
- \newpage
- \section{Description}
- Autodiff is a header-only C++ library that facilitates the
- \href{https://en.wikipedia.org/wiki/Automatic_differentiation}{automatic differentiation} (forward mode) of
- mathematical functions of single and multiple variables.
- This implementation is based upon the \href{https://en.wikipedia.org/wiki/Taylor_series}{Taylor series} expansion of
- an analytic function $f$ at the point $x_0$:
- \begin{align*}
- f(x_0+\varepsilon) &= f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots \\
- &= \sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n + O\left(\varepsilon^{N+1}\right).
- \end{align*}
- The essential idea of autodiff is the substitution of numbers with polynomials in the evaluation of $f(x_0)$. By
- substituting the number $x_0$ with the first-order polynomial $x_0+\varepsilon$, and using the same algorithm
- to compute $f(x_0+\varepsilon)$, the resulting polynomial in $\varepsilon$ contains the function's derivatives
- $f'(x_0)$, $f''(x_0)$, $f'''(x_0)$, ... within the coefficients. Each coefficient is equal to the derivative of
- its respective order, divided by the factorial of the order.
- In greater detail, assume one is interested in calculating the first $N$ derivatives of $f$ at $x_0$. Without loss
- of precision to the calculation of the derivatives, all terms $O\left(\varepsilon^{N+1}\right)$ that include powers
- of $\varepsilon$ greater than $N$ can be discarded. (This is due to the fact that each term in a polynomial depends
- only upon equal and lower-order terms under arithmetic operations.) Under these truncation rules, $f$ provides a
- polynomial-to-polynomial transformation:
- \[
- f \qquad : \qquad x_0+\varepsilon \qquad \mapsto \qquad
- \sum_{n=0}^Ny_n\varepsilon^n=\sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n.
- \]
- C++'s ability to overload operators and functions allows for the creation of a class {\tt fvar}
- (\underline{f}orward-mode autodiff \underline{var}iable) that represents polynomials in $\varepsilon$. Thus
- the same algorithm $f$ that calculates the numeric value of $y_0=f(x_0)$, when
- written to accept and return variables of a generic (template) type, is also used to calculate the polynomial
- $\sum_{n=0}^Ny_n\varepsilon^n=f(x_0+\varepsilon)$. The derivatives $f^{(n)}(x_0)$ are then found from the
- product of the respective factorial $n!$ and coefficient $y_n$:
- \[ \frac{d^nf}{dx^n}(x_0)=n!y_n. \]
- \section{Examples}
- \subsection{Example 1: Single-variable derivatives}
- \subsubsection{Calculate derivatives of $f(x)=x^4$ at $x=2$.}
- In this example, {\tt make\_fvar<double, Order>(2.0)} instantiates the polynomial $2+\varepsilon$. The {\tt Order=5}
- means that enough space is allocated (on the stack) to hold a polynomial of up to degree 5 during the proceeding
- computation.
- Internally, this is modeled by a {\tt std::array<double,6>} whose elements {\tt \{2, 1, 0, 0, 0, 0\}} correspond
- to the 6 coefficients of the polynomial upon initialization. Its fourth power, at the end of the computation, is
- a polynomial with coefficients {\tt y = \{16, 32, 24, 8, 1, 0\}}. The derivatives are obtained using the formula
- $f^{(n)}(2)=n!*{\tt y[n]}$.
- \begin{verbatim}
- #include <boost/math/differentiation/autodiff.hpp>
- #include <iostream>
- template <typename T>
- T fourth_power(T const& x) {
- T x4 = x * x; // retval in operator*() uses x4's memory via NRVO.
- x4 *= x4; // No copies of x4 are made within operator*=() even when squaring.
- return x4; // x4 uses y's memory in main() via NRVO.
- }
- int main() {
- using namespace boost::math::differentiation;
- constexpr unsigned Order = 5; // Highest order derivative to be calculated.
- auto const x = make_fvar<double, Order>(2.0); // Find derivatives at x=2.
- auto const y = fourth_power(x);
- for (unsigned i = 0; i <= Order; ++i)
- std::cout << "y.derivative(" << i << ") = " << y.derivative(i) << std::endl;
- return 0;
- }
- /*
- Output:
- y.derivative(0) = 16
- y.derivative(1) = 32
- y.derivative(2) = 48
- y.derivative(3) = 48
- y.derivative(4) = 24
- y.derivative(5) = 0
- */
- \end{verbatim}
- The above calculates
- \begin{alignat*}{3}
- {\tt y.derivative(0)} &=& f(2) =&& \left.x^4\right|_{x=2} &= 16\\
- {\tt y.derivative(1)} &=& f'(2) =&& \left.4\cdot x^3\right|_{x=2} &= 32\\
- {\tt y.derivative(2)} &=& f''(2) =&& \left.4\cdot 3\cdot x^2\right|_{x=2} &= 48\\
- {\tt y.derivative(3)} &=& f'''(2) =&& \left.4\cdot 3\cdot2\cdot x\right|_{x=2} &= 48\\
- {\tt y.derivative(4)} &=& f^{(4)}(2) =&& 4\cdot 3\cdot2\cdot1 &= 24\\
- {\tt y.derivative(5)} &=& f^{(5)}(2) =&& 0 &
- \end{alignat*}
- \subsection{Example 2: Multi-variable mixed partial derivatives with multi-precision data type}\label{multivar}
- \subsubsection{Calculate $\frac{\partial^{12}f}{\partial w^{3}\partial x^{2}\partial y^{4}\partial z^{3}}(11,12,13,14)$
- with a precision of about 50 decimal digits,\\
- where $f(w,x,y,z)=\exp\left(w\sin\left(\frac{x\log(y)}{z}\right)+\sqrt{\frac{wz}{xy}}\right)+\frac{w^2}{\tan(z)}$.}
- In this example, {\tt make\_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14)} returns a {\tt std::tuple} of 4
- independent {\tt fvar} variables, with values of 11, 12, 13, and 14, for which the maximum order derivative to
- be calculated for each are 3, 2, 4, 3, respectively. The order of the variables is important, as it is the same
- order used when calling {\tt v.derivative(Nw, Nx, Ny, Nz)} in the example below.
- \begin{verbatim}
- #include <boost/math/differentiation/autodiff.hpp>
- #include <boost/multiprecision/cpp_bin_float.hpp>
- #include <iostream>
- using namespace boost::math::differentiation;
- template <typename W, typename X, typename Y, typename Z>
- promote<W, X, Y, Z> f(const W& w, const X& x, const Y& y, const Z& z) {
- using namespace std;
- return exp(w * sin(x * log(y) / z) + sqrt(w * z / (x * y))) + w * w / tan(z);
- }
- int main() {
- using float50 = boost::multiprecision::cpp_bin_float_50;
- constexpr unsigned Nw = 3; // Max order of derivative to calculate for w
- constexpr unsigned Nx = 2; // Max order of derivative to calculate for x
- constexpr unsigned Ny = 4; // Max order of derivative to calculate for y
- constexpr unsigned Nz = 3; // Max order of derivative to calculate for z
- // Declare 4 independent variables together into a std::tuple.
- auto const variables = make_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14);
- auto const& w = std::get<0>(variables); // Up to Nw derivatives at w=11
- auto const& x = std::get<1>(variables); // Up to Nx derivatives at x=12
- auto const& y = std::get<2>(variables); // Up to Ny derivatives at y=13
- auto const& z = std::get<3>(variables); // Up to Nz derivatives at z=14
- auto const v = f(w, x, y, z);
- // Calculated from Mathematica symbolic differentiation.
- float50 const answer("1976.319600747797717779881875290418720908121189218755");
- std::cout << std::setprecision(std::numeric_limits<float50>::digits10)
- << "mathematica : " << answer << '\n'
- << "autodiff : " << v.derivative(Nw, Nx, Ny, Nz) << '\n'
- << std::setprecision(3)
- << "relative error: " << (v.derivative(Nw, Nx, Ny, Nz) / answer - 1) << '\n';
- return 0;
- }
- /*
- Output:
- mathematica : 1976.3196007477977177798818752904187209081211892188
- autodiff : 1976.3196007477977177798818752904187209081211892188
- relative error: 2.67e-50
- */
- \end{verbatim}
- \subsection{Example 3: Black-Scholes Option Pricing with Greeks Automatically Calculated}
- \subsubsection{Calculate greeks directly from the Black-Scholes pricing function.}
- Below is the standard Black-Scholes pricing function written as a function template, where the price, volatility
- (sigma), time to expiration (tau) and interest rate are template parameters. This means that any Greek based on
- these 4 variables can be calculated using autodiff. The below example calculates delta and gamma where the variable
- of differentiation is only the price. For examples of more exotic greeks, see {\tt example/black\_scholes.cpp}.
- \begin{verbatim}
- #include <boost/math/differentiation/autodiff.hpp>
- #include <iostream>
- using namespace boost::math::constants;
- using namespace boost::math::differentiation;
- // Equations and function/variable names are from
- // https://en.wikipedia.org/wiki/Greeks_(finance)#Formulas_for_European_option_Greeks
- // Standard normal cumulative distribution function
- template <typename X>
- X Phi(X const& x) {
- return 0.5 * erfc(-one_div_root_two<X>() * x);
- }
- enum class CP { call, put };
- // Assume zero annual dividend yield (q=0).
- template <typename Price, typename Sigma, typename Tau, typename Rate>
- promote<Price, Sigma, Tau, Rate> black_scholes_option_price(CP cp,
- double K,
- Price const& S,
- Sigma const& sigma,
- Tau const& tau,
- Rate const& r) {
- using namespace std;
- auto const d1 = (log(S / K) + (r + sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
- auto const d2 = (log(S / K) + (r - sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
- switch (cp) {
- case CP::call:
- return S * Phi(d1) - exp(-r * tau) * K * Phi(d2);
- case CP::put:
- return exp(-r * tau) * K * Phi(-d2) - S * Phi(-d1);
- }
- }
- int main() {
- double const K = 100.0; // Strike price.
- auto const S = make_fvar<double, 2>(105); // Stock price.
- double const sigma = 5; // Volatility.
- double const tau = 30.0 / 365; // Time to expiration in years. (30 days).
- double const r = 1.25 / 100; // Interest rate.
- auto const call_price = black_scholes_option_price(CP::call, K, S, sigma, tau, r);
- auto const put_price = black_scholes_option_price(CP::put, K, S, sigma, tau, r);
- std::cout << "black-scholes call price = " << call_price.derivative(0) << '\n'
- << "black-scholes put price = " << put_price.derivative(0) << '\n'
- << "call delta = " << call_price.derivative(1) << '\n'
- << "put delta = " << put_price.derivative(1) << '\n'
- << "call gamma = " << call_price.derivative(2) << '\n'
- << "put gamma = " << put_price.derivative(2) << '\n';
- return 0;
- }
- /*
- Output:
- black-scholes call price = 56.5136
- black-scholes put price = 51.4109
- call delta = 0.773818
- put delta = -0.226182
- call gamma = 0.00199852
- put gamma = 0.00199852
- */
- \end{verbatim}
- \section{Advantages of Automatic Differentiation}
- The above examples illustrate some of the advantages of using autodiff:
- \begin{itemize}
- \item Elimination of code redundancy. The existence of $N$ separate functions to calculate derivatives is a form
- of code redundancy, with all the liabilities that come with it:
- \begin{itemize}
- \item Changes to one function require $N$ additional changes to other functions. In the \nth{3} example above,
- consider how much larger and inter-dependent the above code base would be if a separate function were
- written for \href{https://en.wikipedia.org/wiki/Greeks\_(finance)#Formulas\_for\_European\_option\_Greeks}
- {each Greek} value.
- \item Dependencies upon a derivative function for a different purpose will break when changes are made to
- the original function. What doesn't need to exist cannot break.
- \item Code bloat, reducing conceptual integrity. Control over the evolution of code is easier/safer when
- the code base is smaller and able to be intuitively grasped.
- \end{itemize}
- \item Accuracy of derivatives over finite difference methods. Single-iteration finite difference methods always
- include a $\Delta x$ free variable that must be carefully chosen for each application. If $\Delta x$ is too
- small, then numerical errors become large. If $\Delta x$ is too large, then mathematical errors become large.
- With autodiff, there are no free variables to set and the accuracy of the answer is generally superior to finite
- difference methods even with the best choice of $\Delta x$.
- \end{itemize}
- \section{Mathematics}
- In order for the usage of the autodiff library to make sense, a basic understanding of the mathematics will help.
- \subsection{Truncated Taylor Series}
- Basic calculus courses teach that a real \href{https://en.wikipedia.org/wiki/Analytic_function}{analytic function}
- $f : D\rightarrow\mathbb{R}$ is one which can be expressed as a Taylor series at a point
- $x_0\in D\subseteq\mathbb{R}$:
- \[
- f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \frac{f'''(x_0)}{3!}(x-x_0)^3 + \cdots
- \]
- One way of thinking about this form is that given the value of an analytic function $f(x_0)$ and its derivatives
- $f'(x_0), f''(x_0), f'''(x_0), ...$ evaluated at a point $x_0$, then the value of the function
- $f(x)$ can be obtained at any other point $x\in D$ using the above formula.
- Let us make the substitution $x=x_0+\varepsilon$ and rewrite the above equation to get:
- \[
- f(x_0+\varepsilon) = f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots
- \]
- Now consider $\varepsilon$ as {\it an abstract algebraic entity that never acquires a numeric value}, much like
- one does in basic algebra with variables like $x$ or $y$. For example, we can still manipulate entities
- like $xy$ and $(1+2x+3x^2)$ without having to assign specific numbers to them.
- Using this formula, autodiff goes in the other direction. Given a general formula/algorithm for calculating
- $f(x_0+\varepsilon)$, the derivatives are obtained from the coefficients of the powers of $\varepsilon$
- in the resulting computation. The general coefficient for $\varepsilon^n$ is
- \[\frac{f^{(n)}(x_0)}{n!}.\]
- Thus to obtain $f^{(n)}(x_0)$, the coefficient of $\varepsilon^n$ is multiplied by $n!$.
- \subsubsection{Example}
- Apply the above technique to calculate the derivatives of $f(x)=x^4$ at $x_0=2$.
- The first step is to evaluate $f(x_0+\varepsilon)$ and simply go through the calculation/algorithm, treating
- $\varepsilon$ as an abstract algebraic entity:
- \begin{align*}
- f(x_0+\varepsilon) &= f(2+\varepsilon) \\
- &= (2+\varepsilon)^4 \\
- &= \left(4+4\varepsilon+\varepsilon^2\right)^2 \\
- &= 16+32\varepsilon+24\varepsilon^2+8\varepsilon^3+\varepsilon^4.
- \end{align*}
- Equating the powers of $\varepsilon$ from this result with the above $\varepsilon$-taylor expansion
- yields the following equalities:
- \[
- f(2) = 16, \qquad
- f'(2) = 32, \qquad
- \frac{f''(2)}{2!} = 24, \qquad
- \frac{f'''(2)}{3!} = 8, \qquad
- \frac{f^{(4)}(2)}{4!} = 1, \qquad
- \frac{f^{(5)}(2)}{5!} = 0.
- \]
- Multiplying both sides by the respective factorials gives
- \[
- f(2) = 16, \qquad
- f'(2) = 32, \qquad
- f''(2) = 48, \qquad
- f'''(2) = 48, \qquad
- f^{(4)}(2) = 24, \qquad
- f^{(5)}(2) = 0.
- \]
- These values can be directly confirmed by the \href{https://en.wikipedia.org/wiki/Power_rule}{power rule}
- applied to $f(x)=x^4$.
- \subsection{Arithmetic}
- What was essentially done above was to take a formula/algorithm for calculating $f(x_0)$ from a number $x_0$,
- and instead apply the same formula/algorithm to a polynomial $x_0+\varepsilon$. Intermediate steps operate on
- values of the form
- \[
- {\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N
- \]
- and the final return value is of this polynomial form as well. In other words, the normal arithmetic operators
- $+,-,\times,\div$ applied to numbers $x$ are instead applied to polynomials $\bf x$. Through the overloading of C++
- operators and functions, floating point data types are replaced with data types that represent these polynomials. More
- specifically, C++ types such as {\tt double} are replaced with {\tt std::array<double,N+1>}, which hold the above
- $N+1$ coefficients $x_i$, and are wrapped in a {\tt class} that overloads all of the arithmetic operators.
- The logic of these arithmetic operators simply mirror that which is applied to polynomials. We'll look at
- each of the 4 arithmetic operators in detail.
- \subsubsection{Addition}
- The addition of polynomials $\bf x$ and $\bf y$ is done component-wise:
- \begin{align*}
- {\bf z} &= {\bf x} + {\bf y} \\
- &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) + \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
- &= \sum_{i=0}^N(x_i+y_i)\varepsilon^i \\
- z_i &= x_i + y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
- \end{align*}
- \subsubsection{Subtraction}
- Subtraction follows the same form as addition:
- \begin{align*}
- {\bf z} &= {\bf x} - {\bf y} \\
- &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) - \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
- &= \sum_{i=0}^N(x_i-y_i)\varepsilon^i \\
- z_i &= x_i - y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
- \end{align*}
- \subsubsection{Multiplication}
- Multiplication produces higher-order terms:
- \begin{align*}
- {\bf z} &= {\bf x} \times {\bf y} \\
- &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
- &= x_0y_0 + (x_0y_1+x_1y_0)\varepsilon + (x_0y_2+x_1y_1+x_2y_0)\varepsilon^2 + \cdots +
- \left(\sum_{j=0}^Nx_jy_{N-j}\right)\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
- &= \sum_{i=0}^N\sum_{j=0}^ix_jy_{i-j}\varepsilon^i + O\left(\varepsilon^{N+1}\right) \\
- z_i &= \sum_{j=0}^ix_jy_{i-j} \qquad \text{for}\; i\in\{0,1,2,...,N\}.
- \end{align*}
- In the case of multiplication, terms involving powers of $\varepsilon$ greater than $N$, collectively denoted
- by $O\left(\varepsilon^{N+1}\right)$, are simply discarded. Fortunately, the values of $z_i$ for $i\le N$ do not
- depend on any of these discarded terms, so there is no loss of precision in the final answer. The only information
- that is lost are the values of higher order derivatives, which we are not interested in anyway. If we were, then
- we would have simply chosen a larger value of $N$ to begin with.
- \subsubsection{Division}
- Division is not directly calculated as are the others. Instead, to find the components of
- ${\bf z}={\bf x}\div{\bf y}$ we require that ${\bf x}={\bf y}\times{\bf z}$. This yields
- a recursive formula for the components $z_i$:
- \begin{align*}
- x_i &= \sum_{j=0}^iy_jz_{i-j} \\
- &= y_0z_i + \sum_{j=1}^iy_jz_{i-j} \\
- z_i &= \frac{1}{y_0}\left(x_i - \sum_{j=1}^iy_jz_{i-j}\right) \qquad \text{for}\; i\in\{0,1,2,...,N\}.
- \end{align*}
- In the case of division, the values for $z_i$ must be calculated sequentially, since $z_i$
- depends on the previously calculated values $z_0, z_1, ..., z_{i-1}$.
- \subsection{General Functions}
- Calling standard mathematical functions such as {\tt log()}, {\tt cos()}, etc. should return accurate higher
- order derivatives. For example, {\tt exp(x)} may be written internally as a specific \nth{14}-degree polynomial to
- approximate $e^x$ when $0<x<1$. This would mean that the \nth{15} derivative, and all higher order derivatives, would
- be 0, however we know that $\frac{d^{15}}{dx^{15}}e^x=e^x$. How should such functions whose derivatives are known
- be written to provide accurate higher order derivatives? The answer again comes back to the function's Taylor series.
- To simplify notation, for a given polynomial ${\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+
- x_N\varepsilon^N$ define
- \[
- {\bf x}_\varepsilon = x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N = \sum_{i=1}^Nx_i\varepsilon^i.
- \]
- This allows for a concise expression of a general function $f$ of $\bf x$:
- \begin{align*}
- f({\bf x}) &= f(x_0 + {\bf x}_\varepsilon) \\
- & = f(x_0) + f'(x_0){\bf x}_\varepsilon + \frac{f''(x_0)}{2!}{\bf x}_\varepsilon^2 + \frac{f'''(x_0)}{3!}{\bf x}_\varepsilon^3 + \cdots + \frac{f^{(N)}(x_0)}{N!}{\bf x}_\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
- & = \sum_{i=0}^N\frac{f^{(i)}(x_0)}{i!}{\bf x}_\varepsilon^i + O\left(\varepsilon^{N+1}\right)
- \end{align*}
- where $\varepsilon$ has been substituted with ${\bf x}_\varepsilon$ in the $\varepsilon$-taylor series
- for $f(x)$. This form gives a recipe for calculating $f({\bf x})$ in general from regular numeric calculations
- $f(x_0)$, $f'(x_0)$, $f''(x_0)$, ... and successive powers of the epsilon terms ${\bf x}_\varepsilon$.
- For an application in which we are interested in up to $N$ derivatives in $x$ the data structure to hold
- this information is an $(N+1)$-element array {\tt v} whose general element is
- \[ {\tt v[i]} = \frac{f^{(i)}(x_0)}{i!} \qquad \text{for}\; i\in\{0,1,2,...,N\}. \]
- \subsection{Multiple Variables}
- In C++, the generalization to mixed partial derivatives with multiple independent variables is conveniently achieved
- with recursion. To begin to see the recursive pattern, consider a two-variable function $f(x,y)$. Since $x$
- and $y$ are independent, they require their own independent epsilons $\varepsilon_x$ and $\varepsilon_y$,
- respectively.
- Expand $f(x,y)$ for $x=x_0+\varepsilon_x$:
- \begin{align*}
- f(x_0+\varepsilon_x,y) &= f(x_0,y)
- + \frac{\partial f}{\partial x}(x_0,y)\varepsilon_x
- + \frac{1}{2!}\frac{\partial^2 f}{\partial x^2}(x_0,y)\varepsilon_x^2
- + \frac{1}{3!}\frac{\partial^3 f}{\partial x^3}(x_0,y)\varepsilon_x^3
- + \cdots
- + \frac{1}{M!}\frac{\partial^M f}{\partial x^M}(x_0,y)\varepsilon_x^M
- + O\left(\varepsilon_x^{M+1}\right) \\
- &= \sum_{i=0}^M\frac{1}{i!}\frac{\partial^i f}{\partial x^i}(x_0,y)\varepsilon_x^i + O\left(\varepsilon_x^{M+1}\right).
- \end{align*}
- Next, expand $f(x_0+\varepsilon_x,y)$ for $y=y_0+\varepsilon_y$:
- \begin{align*}
- f(x_0+\varepsilon_x,y_0+\varepsilon_y) &= \sum_{j=0}^N\frac{1}{j!}\frac{\partial^j}{\partial y^j}
- \left(\sum_{i=0}^M\varepsilon_x^i\frac{1}{i!}\frac{\partial^if}{\partial x^i}\right)(x_0,y_0)\varepsilon_y^j
- + O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right) \\
- &= \sum_{i=0}^M\sum_{j=0}^N\frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
- \varepsilon_x^i\varepsilon_y^j + O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right).
- \end{align*}
- Similar to the single-variable case, for an application in which we are interested in up to $M$ derivatives in
- $x$ and $N$ derivatives in $y$, the data structure to hold this information is an $(M+1)\times(N+1)$
- array {\tt v} whose element at $(i,j)$ is
- \[
- {\tt v[i][j]} = \frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
- \qquad \text{for}\; (i,j)\in\{0,1,2,...,M\}\times\{0,1,2,...,N\}.
- \]
- The generalization to additional independent variables follows the same pattern.
- \subsubsection{Declaring Multiple Variables}
- Internally, independent variables are represented by vectors within orthogonal vector spaces. Because of this,
- one must be careful when declaring more than one independent variable so that they do not end up in
- parallel vector spaces. This can easily be achieved by following one rule:
- \begin{itemize}
- \item When declaring more than one independent variable, call {\tt make\_ftuple<>()} once and only once.
- \end{itemize}
- The tuple of values returned are independent. Though it is possible to achieve the same result with multiple calls
- to {\tt make\_fvar}, this is an easier and less error-prone method. See Section~\ref{multivar} for example usage.
- %\section{Usage}
- %
- %\subsection{Single Variable}
- %
- %To calculate derivatives of a single variable $x$, at a particular value $x_0$, the following must be
- %specified at compile-time:
- %
- %\begin{enumerate}
- %\item The numeric data type {\tt T} of $x_0$. Examples: {\tt double},
- % {\tt boost::multiprecision::cpp\_bin\_float\_50}, etc.
- %\item The maximum derivative order $M$ that is to be calculated with respect to $x$.
- %\end{enumerate}
- %Note that both of these requirements are entirely analogous to declaring and using a {\tt std::array<T,N>}. {\tt T}
- %and {\tt N} must be set as compile-time, but which elements in the array are accessed can be determined at run-time,
- %just as the choice of what derivatives to query in autodiff can be made during run-time.
- %
- %To declare and initialize $x$:
- %
- %\begin{verbatim}
- % using namespace boost::math::differentiation;
- % autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
- %\end{verbatim}
- %where {\tt x0} is a run-time value of type {\tt T}. Assuming {\tt 0 < M}, this represents the polynomial $x_0 +
- %\varepsilon$. Internally, the member variable of type {\tt std::array<T,M>} is {\tt v = \{ x0, 1, 0, 0, ... \}},
- %consistent with the above mathematical treatise.
- %
- %To find the derivatives $f^{(n)}(x_0)$ for $0\le n\le M$ of a function
- %$f : \mathbb{R}\rightarrow\mathbb{R}$, the function can be represented as a template
- %
- %\begin{verbatim}
- % template<typename T>
- % T f(T x);
- %\end{verbatim}
- %Using a generic type {\tt T} allows for {\tt x} to be of a regular type such as {\tt double}, but also allows for\\
- %{\tt boost::math::differentiation::autodiff\_fvar<>} types.
- %
- %Internal calls to mathematical functions must allow for
- %\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL). Many standard library functions
- %are overloaded in the {\tt boost::math::differentiation} namespace. For example, instead of calling {\tt std::cos(x)}
- %from within {\tt f}, include the line {\tt using std::cos;} and call {\tt cos(x)} without a namespace prefix.
- %
- %Calling $f$ and retrieving the calculated value and derivatives:
- %
- %\begin{verbatim}
- % using namespace boost::math::differentiation;
- % autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
- % autodiff_fvar<T,M> y = f(x);
- % for (int n=0 ; n<=M ; ++n)
- % std::cout << "y.derivative("<<n<<") == " << y.derivative(n) << std::endl;
- %\end{verbatim}
- %{\tt y.derivative(0)} returns the undifferentiated value $f(x_0)$, and {\tt y.derivative(n)} returns $f^{(n)}(x_0)$.
- %Casting {\tt y} to type {\tt T} also gives the undifferentiated value. In other words, the following 3 values
- %are equal:
- %
- %\begin{enumerate}
- %\item {\tt f(x0)}
- %\item {\tt y.derivative(0)}
- %\item {\tt static\_cast<T>(y)}
- %\end{enumerate}
- %
- %\subsection{Multiple Variables}
- %
- %Independent variables are represented in autodiff as independent dimensions within a multi-dimensional array.
- %This is perhaps best illustrated with examples. The {\tt namespace boost::math::differentiation} is assumed.
- %
- %The following instantiates a variable of $x=13$ with up to 3 orders of derivatives:
- %
- %\begin{verbatim}
- % autodiff_fvar<double,3> x = make_fvar<double,3>(13);
- %\end{verbatim}
- %This instantiates {\bf an independent} value of $y=14$ with up to 4 orders of derivatives:
- %
- %\begin{verbatim}
- % autodiff_fvar<double,0,4> y = make_fvar<double,0,4>(14);
- %\end{verbatim}
- %Combining them together {\bf promotes} their data type automatically to the smallest multidimensional array that
- %accommodates both.
- %
- %\begin{verbatim}
- % // z is promoted to autodiff_fvar<double,3,4>
- % auto z = 10*x*x + 50*x*y + 100*y*y;
- %\end{verbatim}
- %The object {\tt z} holds a 2-dimensional array, thus {\tt derivative(...)} is a 2-parameter method:
- %
- %\[
- %{\tt z.derivative(i,j)} = \frac{\partial^{i+j}f}{\partial x^i\partial y^j}(13,14)
- % \qquad \text{for}\; (i,j)\in\{0,1,2,3\}\times\{0,1,2,3,4\}.
- %\]
- %A few values of the result can be confirmed through inspection:
- %
- %\begin{verbatim}
- % z.derivative(2,0) == 20
- % z.derivative(1,1) == 50
- % z.derivative(0,2) == 200
- %\end{verbatim}
- %Note how the position of the parameters in {\tt derivative(...)} match how {\tt x} and {\tt y} were declared.
- %This will be clarified next.
- %
- %\subsubsection{Two Rules of Variable Initialization}
- %
- %In general, there are two rules to keep in mind when dealing with multiple variables:
- %
- %\begin{enumerate}
- %\item Independent variables correspond to parameter position, in both the initialization {\tt make\_fvar<T,...>}
- % and calls to {\tt derivative(...)}.
- %\item The last template position in {\tt make\_fvar<T,...>} determines which variable a derivative will be
- % taken with respect to.
- %\end{enumerate}
- %Both rules are illustrated with an example in which there are 3 independent variables $x,y,z$ and 1 dependent
- %variable $w=f(x,y,z)$, though the following code readily generalizes to any number of independent variables, limited
- %only by the C++ compiler/memory/platform. The maximum derivative order of each variable is {\tt Nx}, {\tt Ny}, and
- %{\tt Nz}, respectively. Then the type for {\tt w} is {\tt boost::math::differentiation::autodiff\_fvar<T,Nx,Ny,Nz>}
- %and all possible mixed partial derivatives are available via
- %
- %\[
- %{\tt w.derivative(nx,ny,nz)} =
- % \frac{\partial^{n_x+n_y+n_z}f}{\partial x^{n_x}\partial y^{n_y}\partial z^{n_z} }(x_0,y_0,z_0)
- %\]
- %for $(n_x,n_y,n_z)\in\{0,1,2,...,N_x\}\times\{0,1,2,...,N_y\}\times\{0,1,2,...,N_z\}$ where $x_0, y_0, z_0$ are
- %the numerical values at which the function $f$ and its derivatives are evaluated.
- %
- %In code:
- %\begin{verbatim}
- % using namespace boost::math::differentiation;
- %
- % using var = autodiff_fvar<double,Nx,Ny,Nz>; // Nx, Ny, Nz are constexpr size_t.
- %
- % var x = make_fvar<double,Nx>(x0); // x0 is of type double
- % var y = make_fvar<double,Nx,Ny>(y0); // y0 is of type double
- % var z = make_fvar<double,Nx,Ny,Nz>(z0); // z0 is of type double
- %
- % var w = f(x,y,z);
- %
- % for (size_t nx=0 ; nx<=Nx ; ++nx)
- % for (size_t ny=0 ; ny<=Ny ; ++ny)
- % for (size_t nz=0 ; nz<=Nz ; ++nz)
- % std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
- % << w.derivative(nx,ny,nz) << std::endl;
- %\end{verbatim}
- %Note how {\tt x}, {\tt y}, and {\tt z} are initialized: the last template parameter determines which variable
- %$x, y,$ or $z$ a derivative is taken with respect to. In terms of the $\varepsilon$-polynomials
- %above, this determines whether to add $\varepsilon_x, \varepsilon_y,$ or $\varepsilon_z$ to
- %$x_0, y_0,$ or $z_0$, respectively.
- %
- %In contrast, the following initialization of {\tt x} would be INCORRECT:
- %
- %\begin{verbatim}
- % var x = make_fvar<T,Nx,0>(x0); // WRONG
- %\end{verbatim}
- %Mathematically, this represents $x_0+\varepsilon_y$, since the last template parameter corresponds to the
- %$y$ variable, and thus the resulting value will be invalid.
- %
- %\subsubsection{Type Promotion}
- %
- %The previous example can be optimized to save some unnecessary computation, by declaring smaller arrays,
- %and relying on autodiff's automatic type-promotion:
- %
- %\begin{verbatim}
- % using namespace boost::math::differentiation;
- %
- % autodiff_fvar<double,Nx> x = make_fvar<double,Nx>(x0);
- % autodiff_fvar<double,0,Ny> y = make_fvar<double,0,Ny>(y0);
- % autodiff_fvar<double,0,0,Nz> z = make_fvar<double,0,0,Nz>(z0);
- %
- % autodiff_fvar<double,Nx,Ny,Nz> w = f(x,y,z);
- %
- % for (size_t nx=0 ; nx<=Nx ; ++nx)
- % for (size_t ny=0 ; ny<=Ny ; ++ny)
- % for (size_t nz=0 ; nz<=Nz ; ++nz)
- % std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
- % << w.derivative(nx,ny,nz) << std::endl;
- %\end{verbatim}
- %For example, if one of the first steps in the computation of $f$ was {\tt z*z}, then a significantly less number of
- %multiplications and additions may occur if {\tt z} is declared as {\tt autodiff\_fvar<double,0,0,Nz>} as opposed to \\
- %{\tt autodiff\_fvar<double,Nx,Ny,Nz>}. There is no loss of precision with the former, since the extra dimensions
- %represent 0 values. Once {\tt z} is combined with {\tt x} and {\tt y} during the computation, the types will be
- %promoted as necessary. This is the recommended way to initialize variables in autodiff.
- \section{Writing Functions for Autodiff Compatibility}\label{compatibility}
- In this section, a general procedure is given for writing new, and transforming existing, C++ mathematical
- functions for compatibility with autodiff.
- There are 3 categories of functions that require different strategies:
- \begin{enumerate}
- \item Piecewise-rational functions. These are simply piecewise quotients of polynomials. All that is needed is to
- turn the function parameters and return value into generic (template) types. This will then allow the function
- to accept and return autodiff's {\tt fvar} types, thereby using autodiff's overloaded arithmetic operators
- which calculate the derivatives automatically.
- \item Functions that call existing autodiff functions. This is the same as the previous, but may also include
- calls to functions that are in the autodiff library. Examples: {\tt exp()}, {\tt log()}, {\tt tgamma()}, etc.
- \item New functions for which the derivatives can be calculated. This is the most general technique, as it
- allows for the development of a function which do not fall into the previous two categories.
- \end{enumerate}
- Functions written in any of these ways may then be added to the autodiff library.
- \subsection{Piecewise-Rational Functions}
- \[
- f(x) = \frac{1}{1+x^2}
- \]
- By simply writing this as a template function, autodiff can calculate derivatives for it:
- \begin{Verbatim}[xleftmargin=2em]
- #include <boost/math/differentiation/autodiff.hpp>
- #include <iostream>
- template <typename T>
- T rational(T const& x) {
- return 1 / (1 + x * x);
- }
- int main() {
- using namespace boost::math::differentiation;
- auto const x = make_fvar<double, 10>(0);
- auto const y = rational(x);
- std::cout << std::setprecision(std::numeric_limits<double>::digits10)
- << "y.derivative(10) = " << y.derivative(10) << std::endl;
- return 0;
- }
- /*
- Output:
- y.derivative(10) = -3628800
- */
- \end{Verbatim}
- As simple as $f(x)$ may seem, the derivatives can get increasingly complex as derivatives are taken.
- For example, the \nth{10} derivative has the form
- \[
- f^{(10)}(x) = -3628800\frac{1 - 55x^2 + 330x^4 - 462x^6 + 165x^8 - 11x^{10}}{(1 + x^2)^{11}}.
- \]
- Derivatives of $f(x)$ are useful, and in fact used, in calculating higher order derivatives for $\arctan(x)$
- for instance, since
- \[
- \arctan^{(n)}(x) = \left(\frac{d}{dx}\right)^{n-1} \frac{1}{1+x^2}\qquad\text{for}\quad 1\le n.
- \]
- \subsection{Functions That Call Existing Autodiff Functions}
- Many of the standard library math function are overloaded in autodiff. It is recommended to use
- \href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL) in order for functions to
- be written in a way that is general enough to accommodate standard types ({\tt double}) as well as autodiff types
- ({\tt fvar}).
- \\
- Example:
- \begin{Verbatim}[xleftmargin=2em]
- #include <boost/math/constants/constants.hpp>
- #include <cmath>
- using namespace boost::math::constants;
- // Standard normal cumulative distribution function
- template <typename T>
- T Phi(T const& x)
- {
- return 0.5 * std::erfc(-one_div_root_two<T>() * x);
- }
- \end{Verbatim}
- Though {\tt Phi(x)} is general enough to handle the various fundamental floating point types, this will
- not work if {\tt x} is an autodiff {\tt fvar} variable, since {\tt std::erfc} does not include a specialization
- for {\tt fvar}. The recommended solution is to remove the namespace prefix {\tt std::} from {\tt erfc}:
- \begin{Verbatim}[xleftmargin=2em]
- #include <boost/math/constants/constants.hpp>
- #include <boost/math/differentiation/autodiff.hpp>
- #include <cmath>
- using namespace boost::math::constants;
- // Standard normal cumulative distribution function
- template <typename T>
- T Phi(T const& x)
- {
- using std::erfc;
- return 0.5 * erfc(-one_div_root_two<T>() * x);
- }
- \end{Verbatim}
- In this form, when {\tt x} is of type {\tt fvar}, the C++ compiler will search for and find a function {\tt erfc}
- within the same namespace as {\tt fvar}, which is in the autodiff library, via ADL. Because of the using-declaration,
- it will also call {\tt std::erfc} when {\tt x} is a fundamental type such as {\tt double}.
- \subsection{New Functions For Which The Derivatives Can Be Calculated}\label{new_functions}
- Mathematical functions which do not fall into the previous two categories can be constructed using autodiff helper
- functions. This requires a separate function for calculating the derivatives. In case you are asking yourself what
- good is an autodiff library if one needs to supply the derivatives, the answer is that the new function will fit
- in with the rest of the autodiff library, thereby allowing for the creation of additional functions via all of
- the arithmetic operators, plus function composition, which was not readily available without the library.
- The example given here is for {\tt cos}:
- \begin{Verbatim}[xleftmargin=2em]
- template <typename RealType, size_t Order>
- fvar<RealType, Order> cos(fvar<RealType, Order> const& cr) {
- using std::cos;
- using std::sin;
- using root_type = typename fvar<RealType, Order>::root_type;
- constexpr size_t order = fvar<RealType, Order>::order_sum;
- root_type const d0 = cos(static_cast<root_type>(cr));
- if constexpr (order == 0)
- return fvar<RealType, Order>(d0);
- else {
- root_type const d1 = -sin(static_cast<root_type>(cr));
- root_type const derivatives[4]{d0, d1, -d0, -d1};
- return cr.apply_derivatives(order,
- [&derivatives](size_t i) { return derivatives[i & 3]; });
- }
- }
- \end{Verbatim}
- This uses the helper function {\tt fvar::apply\_derivatives} which takes two parameters:
- \begin{enumerate}
- \item The highest order derivative to be calculated.
- \item A function that maps derivative order to derivative value.
- \end{enumerate}
- The highest order derivative necessary to be calculated is generally equal to {\tt fvar::order\_sum}. In the case
- of {\tt sin} and {\tt cos}, the derivatives are cyclical with period 4. Thus it is sufficient to store only these
- 4 values into an array, and take the derivative order modulo 4 as the index into this array.
- A second helper function, not shown here, is {\tt apply\_coefficients}. This is used the same as
- {\tt apply\_derivatives} except that the supplied function calculates coefficients instead of derivatives.
- The relationship between a coefficient $C_n$ and derivative $D_n$ for derivative order $n$ is
- \[
- C_n = \frac{D_n}{n!}.
- \]
- Internally, {\tt fvar} holds coefficients rather than derivatives, so in case the coefficient values are more readily
- available than the derivatives, it can save some unnecessary computation to use {\tt apply\_coefficients}.
- See the definition of {\tt atan} for an example.
- Both of these helper functions use Horner's method when calculating the resulting polynomial {\tt fvar}. This works
- well when the derivatives are finite, but in cases where derivatives are infinite, this can quickly result in NaN
- values as the computation progresses. In these cases, one can call non-Horner versions of both function which
- better ``isolate'' infinite values so that they are less likely to evolve into NaN values.
- The four helper functions available for constructing new autodiff functions from known coefficients/derivatives are:
- \begin{enumerate}
- \item {\tt fvar::apply\_coefficients}
- \item {\tt fvar::apply\_coefficients\_nonhorner}
- \item {\tt fvar::apply\_derivatives}
- \item {\tt fvar::apply\_derivatives\_nonhorner}
- \end{enumerate}
- \section{Function Writing Guidelines}
- At a high level there is one fairly simple principle, loosely and intuitively speaking, to writing functions for
- which autodiff can effectively calculate derivatives: \\
- {\bf Autodiff Function Principle (AFP)}
- \begin{displayquote}
- A function whose branches in logic correspond to piecewise analytic calculations over non-singleton intervals,
- with smooth transitions between the intervals, and is free of indeterminate forms in the calculated value and
- higher order derivatives, will work fine with autodiff.
- \end{displayquote}
- Stating this with greater mathematical rigor can be done. However what seems to be more practical, in this
- case, is to give examples and categories of examples of what works, what doesn't, and how to remedy some of the
- common problems that may be encountered. That is the approach taken here.
- \subsection{Example 1: $f(x)=\max(0,x)$}
- One potential implementation of $f(x)=\max(0,x)$ is:
- \begin{verbatim}
- template<typename T>
- T f(const T& x)
- {
- return 0 < x ? x : 0;
- }
- \end{verbatim}
- Though this is consistent with Section~\ref{compatibility}, there are two problems with it:
- \begin{enumerate}
- \item {\tt f(nan) = 0}. This problem is independent of autodiff, but is worth addressing anyway. If there is
- an indeterminate form that arises within a calculation and is input into $f$, then it gets ``covered up'' by
- this implementation leading to an unknowingly incorrect result. Better for functions in general to propagate
- NaN values, so that the user knows something went wrong and doesn't rely on an incorrect result, and likewise
- the developer can track down where the NaN originated from and remedy it.
- \item $f'(0) = 0$ when autodiff is applied. This is because {\tt f} returns 0 as a constant when {\tt x==0}, wiping
- out any of the derivatives (or sensitivities) that {\tt x} was holding as an autodiff variable. Instead, let us
- apply the AFP and identify the two intervals over which $f$ is defined: $(-\infty,0]\cup(0,\infty)$.
- Though the function itself is not analytic at $x=0$, we can attempt somewhat to smooth out this transition
- point by averaging the calculation of $f(x)$ at $x=0$ from each interval. If $x<0$ then the result is simply
- 0, and if $0<x$ then the result is $x$. The average is $\frac{1}{2}(0 + x)$ which will allow autodiff to
- calculate $f'(0)=\frac{1}{2}$. This is a more reasonable answer.
- \end{enumerate}
- A better implementation that resolves both issues is:
- \begin{verbatim}
- template<typename T>
- T f(const T& x)
- {
- if (x < 0)
- return 0;
- else if (x == 0)
- return 0.5*x;
- else
- return x;
- }
- \end{verbatim}
- \subsection{Example 2: $f(x)=\sinc(x)$}
- The definition of $\sinc:\mathbb{R}\rightarrow\mathbb{R}$ is
- \[
- \sinc(x) = \begin{cases}
- 1 &\text{if}\; x = 0 \\
- \frac{\sin(x)}{x} &\text{otherwise.}\end{cases}
- \]
- A potential implementation is:
- \begin{verbatim}
- template<typename T>
- T sinc(const T& x)
- {
- using std::sin;
- return x == 0 ? 1 : sin(x) / x;
- }
- \end{verbatim}
- Though this is again consistent with Section~\ref{compatibility}, and returns correct non-derivative values,
- it returns a constant when {\tt x==0} thereby losing all derivative information contained in {\tt x} and
- contributions from $\sinc$. For example, $\sinc''(0)=-\frac{1}{3}$, however {\tt y.derivative(2) == 0} when
- {\tt y = sinc(make\_fvar<double,2>(0))} using the above incorrect implementation. Applying the AFP, the intervals
- upon which separate branches of logic are applied are $(-\infty,0)\cup[0,0]\cup(0,\infty)$. The violation occurs
- due to the singleton interval $[0,0]$, even though a constant function of 1 is technically analytic. The remedy
- is to define a custom $\sinc$ overload and add it to the autodiff library. This has been done. Mathematically, it
- is well-defined and free of indeterminate forms, as is the \nth{3} expression in the equalities
- \[
- \frac{1}{x}\sin(x) = \frac{1}{x}\sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n+1}
- = \sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n}.
- \]
- The autodiff library contains helper functions to help write function overloads when the derivatives of a function
- are known. This is an advanced feature and documentation for this may be added at a later time.
- For now, it is worth understanding the ways in which indeterminate forms can occur within a mathematical calculation,
- and avoid them when possible by rewriting the function. Table~\ref{3nans} compares 3 types of indeterminate
- forms. Assume the product {\tt a*b} is a positive finite value.
- \begin{table}[h]
- \centering\begin{tabular}{m{7em}||c|c|c}
- & $\displaystyle f(x)=\left(\frac{a}{x}\right)\times(bx^2)$
- & $\displaystyle g(x)=\left(\frac{a}{x}\right)\times(bx)$
- & $\displaystyle h(x)=\left(\frac{a}{x^2}\right)\times(bx)$ \\[0.618em]
- \hline\hline
- Mathematical\newline Limit
- & $\displaystyle\lim_{x\rightarrow0}f(x) = 0$
- & $\displaystyle\lim_{x\rightarrow0}g(x) = ab$
- & $\displaystyle\lim_{x\rightarrow0}h(x) = \infty$ \\
- \hline
- Floating Point\newline Arithmetic
- & {\tt f(0) = inf*0 = nan} & {\tt g(0) = inf*0 = nan} & {\tt h(0) = inf*0 = nan}
- \end{tabular}
- \caption{Automatic differentiation does not compute limits.
- Indeterminate forms must be simplified manually. (These cases are not meant to be exhaustive.)}\label{3nans}
- \end{table}
- Indeterminate forms result in NaN values within a calculation. Mathematically, if they occur at locally isolated
- points, then we generally prefer the mathematical limit as the result, even if it is infinite. As demonstrated in
- Table~\ref{3nans}, depending upon the nature of the indeterminate form, the mathematical limit can be 0 (no matter
- the values of $a$ or $b$), or $ab$, or $\infty$, but these 3 cases cannot be distinguished by the floating point
- result of nan. Floating point arithmetic does not perform limits (directly), and neither does the autodiff library.
- Thus it is up to the diligence of the developer to keep a watchful eye over where indeterminate forms can arise.
- \subsection{Example 3: $f(x)=\sqrt x$ and $f'(0)=\infty$}
- When working with functions that have infinite higher order derivatives, this can very quickly result in nans in
- higher order derivatives as the computation progresses, as {\tt inf-inf}, {\tt inf/inf}, and {\tt 0*inf} result
- in {\tt nan}. See Table~\ref{sqrtnan} for an example.
- \begin{table}[h]
- \centering\begin{tabular}{c||c|c|c|c}
- $f(x)$ & $f(0)$ & $f'(0)$ & $f''(0)$ & $f'''(0)$ \\
- \hline\hline
- {\tt sqrt(x)} & {\tt 0} & {\tt inf} & {\tt -inf} & {\tt inf} \\
- \hline
- {\tt sqr(sqrt(x)+1)} & {\tt 1} & {\tt inf} & {\tt nan} & {\tt nan} \\
- \hline
- {\tt x+2*sqrt(x)+1} & {\tt 1} & {\tt inf} & {\tt -inf}& {\tt inf}
- \end{tabular}
- \caption{Indeterminate forms in higher order derivatives. {\tt sqr(x) == x*x}.}\label{sqrtnan}
- \end{table}
- Calling the autodiff-overloaded implementation of $f(x)=\sqrt x$ at the value {\tt x==0} results in the
- \nth{1} row (after the header row) of Table~\ref{sqrtnan}, as is mathematically correct. The \nth{2} row shows
- $f(x)=(\sqrt{x}+1)^2$ resulting in {\tt nan} values for $f''(0)$ and all higher order derivatives. This is due to
- the internal arithmetic in which {\tt inf} is added to {\tt -inf} during the squaring, resulting in a {\tt nan}
- value for $f''(0)$ and all higher orders. This is typical of {\tt inf} values in autodiff. Where they show up,
- they are correct, however they can quickly introduce {\tt nan} values into the computation upon the addition of
- oppositely signed {\tt inf} values, division by {\tt inf}, or multiplication by {\tt 0}. It is worth noting that
- the infection of {\tt nan} only spreads upward in the order of derivatives, since lower orders do not depend upon
- higher orders (which is also why dropping higher order terms in an autodiff computation does not result in any
- loss of precision for lower order terms.)
- The resolution in this case is to manually perform the squaring in the computation, replacing the \nth{2} row
- with the \nth{3}: $f(x)=x+2\sqrt{x}+1$. Though mathematically equivalent, it allows autodiff to avoid {\tt nan}
- values since $\sqrt x$ is more ``isolated'' in the computation. That is, the {\tt inf} values that unavoidably
- show up in the derivatives of {\tt sqrt(x)} for {\tt x==0} do not have the chance to interact with other {\tt inf}
- values as with the squaring.
- \subsection{Summary}
- The AFP gives a high-level unified guiding principle for writing C++ template functions that autodiff can
- effectively evaluate derivatives for.
- Examples have been given to illustrate some common items to avoid doing:
- \begin{enumerate}
- \item It is not enough for functions to be piecewise continuous. On boundary points between intervals, consider
- returning the average expression of both intervals, rather than just one of them. Example: $\max(0,x)$ at $x=0$.
- In cases where the limits from both sides must match, and they do not, then {\tt nan} may be a more appropriate
- value depending on the application.
- \item Avoid returning individual constant values (e.g. $\sinc(0)=1$.) Values must be computed uniformly along
- with other values in its local interval. If that is not possible, then the function must be overloaded to
- compute the derivatives precisely using the helper functions from Section~\ref{new_functions}.
- \item Avoid intermediate indeterminate values in both the value ($\sinc(x)$ at $x=0$) and derivatives
- ($(\sqrt{x}+1)^2$ at $x=0$). Try to isolate expressions that may contain infinite values/derivatives so
- that they do not introduce NaN values into the computation.
- \end{enumerate}
- \section{Acknowledgments}
- \begin{itemize}
- \item Kedar Bhat --- C++11 compatibility, Boost Special Functions compatibility testing, codecov integration,
- and feedback.
- \item Nick Thompson --- Initial feedback and help with Boost integration.
- \item John Maddock --- Initial feedback and help with Boost integration.
- \end{itemize}
- \begin{thebibliography}{1}
- \bibitem{ad} \url{https://en.wikipedia.org/wiki/Automatic\_differentiation}
- \bibitem{ed} Andreas Griewank, Andrea Walther. \textit{Evaluating Derivatives}. SIAM, 2nd ed. 2008.
- \end{thebibliography}
- \end{document}
|