Installation
You can install AutoGrad in Julia using:
julia> Pkg.add("AutoGrad")
To use it in your code, start with:
using AutoGrad
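As a quick check (a hypothetical snippet, not taken from the package documentation), grad can differentiate an ordinary scalar function:

using AutoGrad
f(x) = sin(x)     # an ordinary Julia function
g = grad(f)       # g computes the derivative of f with respect to its first argument
g(pi/2)           # ≈ 0.0, i.e. cos(pi/2)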
Example
Here is a linear regression example simplified from housing.jl:
using AutoGrad

# Quadratic loss for linear regression; w = (weight matrix, bias).
function loss(w)
    global xtrn, ytrn
    ypred = w[1]*xtrn .+ w[2]
    sum(abs2.(ypred .- ytrn)) / size(ypred,2)
end

# Plain gradient descent: compute the gradient of the loss and take a step.
function train(w; lr=.1, epochs=20)
    gradfun = grad(loss)
    for epoch=1:epochs
        g = gradfun(w)
        for i in 1:length(w)
            w[i] -= lr * g[i]
        end
    end
    return w
end
The loss function takes parameters as input and returns the loss to be minimized. The parameter w for this example is a pair: w[1] is a weight matrix, and w[2] is a bias vector. The training data xtrn, ytrn are in global variables. ypred is the predicted output, and the last line computes the quadratic loss. The loss function is implemented in regular Julia.
The train function takes initial parameters and returns optimized parameters. grad is the only AutoGrad function used: it creates a function gradfun that takes the same arguments as loss, but returns the gradient instead. The returned gradient will have the same type and shape as the input argument. The for loop implements gradient descent, where we calculate the gradient and subtract a scaled version of it from the weights.
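As a usage sketch (the data and the 13x100 shapes are made up, loosely following the housing example):

xtrn = randn(13, 100)            # 13 features, 100 instances (made-up data)
ytrn = randn(1, 100)             # targets
w = Any[0.1*randn(1,13), 0.0]    # initial weight matrix and bias
w = train(w)                     # 20 epochs of gradient descent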
See the examples directory for more examples, and the extensively documented core.jl for details.
Extending AutoGrad
AutoGrad can only handle a function if the primitives it uses have known gradients. You can add your own primitives with gradients as described in detail in core.jl, or by using the @primitive and @zerograd macros in util.jl. Here is an example:
@primitive hypot(x1::Number,x2::Number)::y (dy->dy*x1/y) (dy->dy*x2/y)
The @primitive macro marks the hypot(::Number,::Number) method as a new primitive, and the next two expressions define gradient functions with respect to the first and second arguments. The gradient expressions can refer to the parameters and the return variable (indicated after the final ::) of the method declaration.
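The @zerograd macro handles the complementary case: methods whose output should not be differentiated through. A minimal sketch, using a hypothetical piecewise-constant helper:

mask(x) = x > 0 ? 1.0 : 0.0   # hypothetical helper, constant except at x == 0
@zerograd mask(x::Number)     # record no gradient: treat the derivative as zero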
Note that Julia supports multiple dispatch, i.e. a function may have multiple methods, each supporting different argument types. For example, hypot(x1::Array,x2::Array) is another hypot method. In AutoGrad.jl each method can be defined independently as a primitive with its own specific gradient.
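For instance, an elementwise Array method could be registered as its own primitive with broadcasting gradients. This is a hedged sketch: the Array method signature and its availability are assumptions, not taken from the package:

@primitive hypot(x1::Array,x2::Array)::y  (dy->dy.*x1./y)  (dy->dy.*x2./y)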
Code structure
core.jl implements the main functionality and acts as the main documentation source.
util.jl has some support functions to define and test new primitives.
interfaces.jl sets up support for common data structures including Arrays, Tuples, and Dictionaries.
The numerical gradients are defined in files such as base/math.jl and special/trig.jl that mirror the organization under julia/base.
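As an illustration of the container support in interfaces.jl, gradients of Dict-valued parameters come back with the same keys and shapes as the input. This is a hedged sketch with made-up variable names and sizes:

using AutoGrad
xtrn, ytrn = randn(13,100), randn(1,100)                   # made-up data
lossd(w) = sum(abs2.(w[:W]*xtrn .+ w[:b] .- ytrn)) / size(ytrn,2)
g = grad(lossd)(Dict(:W => 0.1*randn(1,13), :b => 0.0))
# g[:W] matches the shape of the weight matrix; g[:b] is the bias gradient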
Current status and future work
The gradient coverage is still spotty; I am adding more gradients to cover the Julia base. Next steps are to make models faster by providing support for GPU operations and overwriting functions (to avoid memory allocation). I should also investigate the efficiency of closures and untyped functions in Julia, which are used extensively in the code.
Acknowledgments and references
AutoGrad.jl was written by Deniz Yuret. Large parts of the code are directly ported from the Python autograd package. I'd like to thank the autograd author Dougal Maclaurin for his support. See (Baydin et al. 2015) for a general review of automatic differentiation, the autograd tutorial for some Python examples, and Dougal's PhD thesis for design principles. JuliaDiff has alternative differentiation tools for Julia. I would like to thank my students Ozan Arkan Can and Emre Yolcu for helpful contributions.
Also see: A presentation, A demo.