AutoGrad.jl is an automatic differentiation package for Julia. It is a Julia port of the popular Python autograd package. It can differentiate regular Julia code that includes loops, conditionals, helper functions, closures, etc. by keeping track of the primitive operations and using this execution trace to compute gradients. It uses reverse mode differentiation (a.k.a. backpropagation), so it can efficiently handle functions with array inputs and scalar outputs. It can compute gradients of gradients to handle higher order derivatives. Please see the comments in core.jl for a detailed description of how the code works.

Installation
You can install AutoGrad in Julia by running Pkg.add("AutoGrad").
To use it in your code, start with the statement using AutoGrad.
Here is a linear regression example simplified from housing.jl:
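```julia
using AutoGrad

# w is a pair: w[1] is the weight matrix, w[2] is the bias vector.
function loss(w)
    global xtrn,ytrn
    ypred = w[1]*xtrn .+ w[2]
    sum(abs2(ypred - ytrn)) / size(ypred,2)
end

function train(w; lr=.1, epochs=20)
    gradfun = grad(loss)    # gradfun(w) returns the gradient of loss at w
    for epoch=1:epochs
        g = gradfun(w)
        for i in 1:length(w)
            w[i] -= lr * g[i]
        end
    end
    return w
end
```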
The loss function takes parameters as input and returns the loss to be minimized. The parameter w for this example is a pair: w[1] is a weight matrix, and w[2] is a bias vector. The training data xtrn,ytrn are in global variables. ypred is the predicted output, and the last line computes the quadratic loss. The loss function is implemented in regular Julia.
The train function takes initial parameters and returns optimized parameters. grad is the only AutoGrad function used: it creates a function gradfun that takes the same arguments as loss, but returns the gradient instead. The returned gradient will have the same type and shape as the input argument. The for loop implements gradient descent, where we calculate the gradient and subtract a scaled version of it from the weights.
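To make the update rule concrete, here is a minimal sketch in plain Julia that does not call AutoGrad at all: the gradient that grad(loss) is expected to return is written out by hand for the quadratic loss. The synthetic data, the names manual_grad and train_manual, and the chosen learning rate are assumptions made for illustration; they are not part of the package.

```julia
# Synthetic data (an assumption for illustration): 3 features, 50 examples.
xtrn = [sin(i * j) for i in 1:3, j in 1:50]     # 3x50 input matrix
wtrue = [1.0 -2.0 0.5]                           # 1x3 weights used to generate targets
ytrn = wtrue * xtrn .+ 0.3                       # 1x50 targets, true bias 0.3

# Same quadratic loss as in the example, written with explicit broadcasting.
loss(w) = sum(abs2.(w[1] * xtrn .+ w[2] .- ytrn)) / size(ytrn, 2)

# Hand-derived gradient of loss; it has the same pair structure as w.
function manual_grad(w)
    r = w[1] * xtrn .+ w[2] .- ytrn              # 1x50 residuals
    n = size(ytrn, 2)
    return Any[2 * r * xtrn' / n, [2 * sum(r) / n]]
end

# Gradient descent: subtract a scaled gradient from each parameter.
function train_manual(w; lr=0.1, epochs=20)
    for epoch in 1:epochs
        g = manual_grad(w)
        for i in 1:length(w)
            w[i] -= lr * g[i]
        end
    end
    return w
end

w0 = Any[zeros(1, 3), [0.0]]                     # initial weights and bias
before = loss(w0)
w = train_manual(w0; epochs=200)                 # updates w0 in place and returns it
after = loss(w)
println("loss before: ", before, ", after: ", after)
```

Each entry of the hand-written gradient has the same shape as the matching entry of w, which is what makes the elementwise update w[i] -= lr * g[i] valid. With AutoGrad, manual_grad would be replaced by the function returned from grad(loss).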
AutoGrad can only handle a function if the primitives it uses have known gradients. You can add your own primitives with gradients as described in detail in core.jl or using the @primitive and @zerograd macros in util.jl. Here is an example:
@primitive hypot(x1::Number,x2::Number)::y (dy->dy*x1/y) (dy->dy*x2/y)
The @primitive macro marks the hypot(::Number,::Number) method as a new primitive and the next two expressions define gradient functions with respect to the first and second argument. The gradient expressions can refer to the parameters and the return variable (indicated after the final ::) of the method declaration.
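As a sanity check on the two gradient expressions in the @primitive example, the sketch below evaluates them as ordinary closures and compares them with central finite differences. It uses only Base Julia; the sample point (3, 4), the step size h, and the names grad1, grad2, fd1, and fd2 are illustrative assumptions.

```julia
# Gradient expressions from the @primitive example, as plain closures.
# y is the value returned by hypot(x1, x2); dy is the incoming gradient.
x1, x2 = 3.0, 4.0
y = hypot(x1, x2)
grad1 = dy -> dy * x1 / y     # derivative with respect to the first argument
grad2 = dy -> dy * x2 / y     # derivative with respect to the second argument

# Central finite differences as an independent reference.
h = 1e-6
fd1 = (hypot(x1 + h, x2) - hypot(x1 - h, x2)) / (2h)
fd2 = (hypot(x1, x2 + h) - hypot(x1, x2 - h)) / (2h)

println("analytic: ", (grad1(1.0), grad2(1.0)))
println("numeric:  ", (fd1, fd2))
```

The two printed pairs should agree to several decimal places, which is the kind of check that helps when defining a new primitive by hand.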
Note that Julia supports multiple dispatch, i.e. a function may have multiple methods, each supporting different argument types. For example, hypot(x1::Array,x2::Array) is another hypot method. In AutoGrad.jl each method can independently be defined as a primitive and can have its own specific gradient.
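The following sketch shows multiple dispatch in plain Julia, without AutoGrad. The function myhypot is a hypothetical stand-in, used so that Base.hypot is left untouched; it has one method for numbers and one for arrays, and Julia picks the method from the argument types.

```julia
# myhypot is a hypothetical function with two methods.
myhypot(x1::Number, x2::Number) = sqrt(x1^2 + x2^2)          # scalar method
myhypot(x1::Array, x2::Array) = sqrt.(x1 .^ 2 .+ x2 .^ 2)     # elementwise array method

println(myhypot(3.0, 4.0))                   # dispatches to the Number method
println(myhypot([3.0, 6.0], [4.0, 8.0]))     # dispatches to the Array method
println(length(methods(myhypot)))            # number of methods defined above
```

Because each method is a separate definition, a gradient attached to the Number method says nothing about the Array method, which is why each one is declared as a primitive on its own.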
core.jl implements the main functionality and acts as the main documentation source. util.jl has some support functions to define and test new primitives. interfaces.jl sets up support for common data structures including Arrays, Tuples, and Dictionaries. The numerical gradients are defined in files such as base/math.jl and special/trig.jl that mirror the organization under julia/base.

Current status and future work
The gradient coverage is spotty; I am still adding more gradients to cover the Julia base. Next steps are to make models faster by providing support for GPU operations and overwriting functions (to avoid memory allocation). I should also find out about the efficiency of closures and untyped functions in Julia, which are used extensively in the code.

Acknowledgments and references
AutoGrad.jl was written by Deniz Yuret. Large parts of the code are directly ported from the Python autograd package. I'd like to thank autograd author Dougal Maclaurin for his support. See (Baydin et al. 2015) for a general review of automatic differentiation, the autograd tutorial for some Python examples, and Dougal's PhD thesis for design principles. JuliaDiff has alternative differentiation tools for Julia. I would like to thank my students Ozan Arkan Can and Emre Yolcu for helpful contributions.