October 29, 2011

Gamma distribution

The Gamma distribution is often used as a prior for positive random variables, much as the Gaussian is used for real-valued ones. The purpose of this post is to build some intuition about how its two parameters, the shape parameter "a" and the scale parameter "b", affect the behavior of a Gamma random variable. In particular, we will show that with vague Gamma parameters (a<<1) the distribution acts almost like an upper bound on the random variable.

Here is the Gamma PDF:

$f(x) = \frac{1}{\Gamma(a) b} (\frac{x}{b})^{a-1} e^{-x/b} \;\; x\geq 0; a , b>0$
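
Here is a quick numerical sanity check of this density, written as a minimal Python sketch that uses only the standard library. `gamma_pdf` and `moments` are illustrative helpers, not part of any library. The midpoint-rule integration, truncated at x=100, is an assumption chosen for simplicity. The example uses a=3 and b=2 so that the density stays finite at 0, and it should recover a total mass of 1, the mean ab and the variance ab².

```python
import math

def gamma_pdf(x, a, b):
    """Gamma density with shape a and scale b, evaluated for x > 0 via logs."""
    if x <= 0:
        raise ValueError("this sketch only evaluates x > 0")
    u = x / b
    return math.exp((a - 1) * math.log(u) - u - math.lgamma(a)) / b

def moments(a, b, upper, n=20000):
    """Midpoint-rule estimates of total mass, mean and variance on (0, upper]."""
    h = upper / n
    m0 = m1 = m2 = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        w = gamma_pdf(x, a, b) * h
        m0 += w
        m1 += x * w
        m2 += x * x * w
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean ** 2

total, mean, var = moments(3.0, 2.0, upper=100.0)
print(total, mean, var)  # should be close to 1, ab = 6 and ab^2 = 12
```

Working in logs (`math.lgamma` rather than `math.gamma`) keeps the density from overflowing for large a or very small x.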

The mean is ab and the variance is ab². When a=1 the Gamma reduces to the exponential distribution with mean b. More generally, when a is an integer, it is the distribution of the sum of a independent exponentially distributed random variables, each with mean b. For a<1 it is shaped like the exponential but with a spike at 0 (the density diverges as x→0), whereas for a>1 it has a single mode at (a-1)b (see the Wikipedia article).
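
The sum-of-exponentials view can be checked by simulation. This sketch assumes Python's `random` module, where `expovariate` takes a rate (1/b) and `gammavariate(alpha, beta)` takes a shape and a scale, matching a and b here. The seed, the sample size and the helper `mean_var` are arbitrary illustrative choices.

```python
import random
import statistics

def mean_var(xs):
    """Sample mean and (population) variance of a list of floats."""
    m = statistics.fmean(xs)
    return m, statistics.fmean([(x - m) ** 2 for x in xs])

rng = random.Random(0)          # fixed seed so the run is reproducible
a, b, n = 4, 2.5, 100_000       # integer shape, so the sum-of-exponentials view applies

# Sum of a independent exponentials, each with mean b (expovariate takes the rate 1/b).
sums = [sum(rng.expovariate(1 / b) for _ in range(a)) for _ in range(n)]
# random.gammavariate(alpha, beta) uses shape alpha and scale beta, i.e. (a, b) here.
direct = [rng.gammavariate(a, b) for _ in range(n)]

for name, xs in (("sum of exponentials", sums), ("gammavariate", direct)):
    print(name, mean_var(xs))
# both lines should be near mean ab = 10 and variance ab^2 = 25
```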

MacKay suggests representing the positive real variable x in terms of its logarithm z=ln x (Information Theory, Inference, and Learning Algorithms (ITILA), p. 314). This gives a better idea of the order of magnitude of typical values of x in terms of a and b. Since dx/dz = x, the density of z is f(z) = x f(x):

$f(z) = \frac{1}{\Gamma(a)} (\frac{x}{b})^a e^{-x/b}  \;\; z \in \Re; x=e^z; a, b>0$
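
A minimal Python sketch, with hypothetical helper names, checks the identity f(z) = x f(x) pointwise for a=1/10 and b=10 and confirms that f(z) integrates to 1. The integration range and step are assumptions. The range reaches far to the left because, for small a, the left tail of f(z) decays only like $e^{az}$.

```python
import math

def gamma_pdf(x, a, b):
    """Density of x ~ Gamma(shape a, scale b)."""
    u = x / b
    return math.exp((a - 1) * math.log(u) - u - math.lgamma(a)) / b

def log_gamma_pdf(z, a, b):
    """Density f(z) of z = ln x, computed from t = ln(x/b) = z - ln b."""
    t = z - math.log(b)
    return math.exp(a * t - math.exp(t) - math.lgamma(a))

a, b = 0.1, 10.0

# Change of variables: f(z) = x * f(x) because dx/dz = x.
for x in (1e-6, 0.01, 1.0, 10.0, 40.0):
    print(x, log_gamma_pdf(math.log(x), a, b), x * gamma_pdf(x, a, b))

# f(z) should integrate to 1 over the real line (midpoint rule on a wide grid).
lo, hi, h = -200.0, 10.0, 0.005
n = int(round((hi - lo) / h))
total = h * sum(log_gamma_pdf(lo + (i + 0.5) * h, a, b) for i in range(n))
print(total)
```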

We can get an idea of the shape of f(z) by looking at its first two derivatives with respect to z:

$f'(z) = f(z) (a-\frac{x}{b})$
$f''(z) = f(z) (a^2 - (2a+1)\frac{x}{b} + (\frac{x}{b})^2)$
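
Both derivative formulas can be checked against central finite differences. The sketch below uses hypothetical helpers `f_z`, `f_z_d1` and `f_z_d2`, and the step h=1e-4 and the test points are illustrative choices. For a=1/10 and b=10, z=0 is the mode ln(ab), where f'(z) vanishes.

```python
import math

def f_z(z, a, b):
    """Density of z = ln x for x ~ Gamma(shape a, scale b)."""
    t = z - math.log(b)
    return math.exp(a * t - math.exp(t) - math.lgamma(a))

def f_z_d1(z, a, b):
    """Closed-form first derivative: f(z) (a - x/b)."""
    u = math.exp(z) / b
    return f_z(z, a, b) * (a - u)

def f_z_d2(z, a, b):
    """Closed-form second derivative: f(z) (a^2 - (2a+1) x/b + (x/b)^2)."""
    u = math.exp(z) / b
    return f_z(z, a, b) * (a * a - (2 * a + 1) * u + u * u)

a, b, h = 0.1, 10.0, 1e-4
points = (-3.0, -1.0, 0.0, 1.5, 2.5, 3.0)
for z in points:
    # Central finite differences as an independent check of the formulas.
    fd1 = (f_z(z + h, a, b) - f_z(z - h, a, b)) / (2 * h)
    fd2 = (f_z(z + h, a, b) - 2 * f_z(z, a, b) + f_z(z - h, a, b)) / h ** 2
    print(f"z={z:+.2f}  f'={f_z_d1(z, a, b):+.6f} (fd {fd1:+.6f})  "
          f"f''={f_z_d2(z, a, b):+.6f} (fd {fd2:+.6f})")
```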


The graph above shows f(z) and its two derivatives for a=1/10 and b=10. The first derivative tells us that f(z) has a single mode at x=ab. Note that x=ab is the mean of f(x) but the mode (not the mean) of f(z). The curve rises slowly to the left of the mode and falls sharply to the right. The second derivative has two roots, which give the points of maximum slope (the smaller root) and minimum slope (the larger root):

$x = ab + \frac{b}{2} \pm \frac{b}{2} \sqrt{1+4a}$.
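
Plugging in a=1/10 and b=10 (the values used for the graph) gives the two inflection points on either side of the mode x=ab=1. This Python sketch, with a hypothetical helper `inflection_points`, evaluates the formula and confirms that each root zeroes the quadratic factor of f''(z).

```python
import math

def inflection_points(a, b):
    """x values where f''(z) = 0: ab + b/2 -/+ (b/2) sqrt(1 + 4a)."""
    center = a * b + b / 2
    half_width = (b / 2) * math.sqrt(1 + 4 * a)
    return center - half_width, center + half_width

a, b = 0.1, 10.0
x_lo, x_hi = inflection_points(a, b)
print(x_lo, x_hi)   # roughly 0.0839 and 11.916, on either side of the mode ab = 1

# Each root should zero the quadratic factor a^2 - (2a+1) u + u^2 with u = x/b.
for x in (x_lo, x_hi):
    u = x / b
    print(a * a - (2 * a + 1) * u + u * u)
```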

Now let us look at the limit a<<1, which is typically used as a vague prior. The height of the mode at x=ab is $a^a e^{-a}/\Gamma(a)$. For small a, Γ(a) is well approximated by 1/a (because Γ(a) = Γ(1+a)/a and Γ(1+a) → 1), while $a^a$ and $e^{-a}$ both go to 1, so f(z) ≈ a at the mode.
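
The following short Python check compares the exact mode height with a for a few small values of a. The ratio of the two is exactly $a^a e^{-a}/\Gamma(1+a)$, so it tends to 1, although somewhat slowly because $a^a = e^{a \ln a}$.

```python
import math

ratios = []
for a in (0.1, 0.01, 0.001):
    # Height of f(z) at its mode x = ab: a^a e^{-a} / Gamma(a), computed in log space.
    height = math.exp(a * math.log(a) - a - math.lgamma(a))
    ratios.append(height / a)
    print(a, height, height / a)   # the ratio height / a should approach 1 as a shrinks
```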

Next, let's look at the right side (x>ab), where f(z) falls sharply. If we drop the terms involving a (since a<<1), the larger root of the second derivative puts the minimum (most negative) slope at around x=b. The value of f(z) at x=b is 1/(e Γ(a)), which is approximately a/e using Γ(a) ≈ 1/a. The slope there is f(z)(a-1) ≈ -a/e, so a tangent line drawn at that point reaches 0 about one unit to the right in z, that is, at x=eb. Thus for small a, the probability of x>eb can be considered negligible.
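
The right-side approximations can be compared with exact values. In this sketch, the helpers `f_z` and `tangent_at_b` are hypothetical. The tangent line at x=b hits zero at z = ln b + 1/(1-a), that is, at $x = b e^{1/(1-a)}$, which tends to eb as a→0. The printed ratios against a/e, -a/e and eb should all approach 1.

```python
import math

def f_z(z, a, b):
    """Density of z = ln x for x ~ Gamma(shape a, scale b)."""
    t = z - math.log(b)
    return math.exp(a * t - math.exp(t) - math.lgamma(a))

def tangent_at_b(a, b):
    """Value and slope of f(z) at x = b, and the x where the tangent line there hits 0."""
    z0 = math.log(b)
    value = f_z(z0, a, b)          # equals 1 / (e Gamma(a))
    slope = value * (a - 1)        # f'(z) = f(z) (a - x/b) with x = b
    z_cross = z0 - value / slope   # solve value + slope * (z - z0) = 0
    return value, slope, math.exp(z_cross)

b = 10.0
results = []
for a in (0.1, 0.01, 0.001):
    value, slope, x_cross = tangent_at_b(a, b)
    # Each ratio compares the exact quantity with its small-a approximation: a/e, -a/e, eb.
    ratios = (value / (a / math.e), slope / (-a / math.e), x_cross / (math.e * b))
    results.append(ratios)
    print(a, ratios)
```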

Next, let's look at the left side (x<ab), where f(z) is much flatter. The maximum slope occurs at around x=a²b (if we approximate $\sqrt{1+4a}$ by its second-order Taylor expansion 1+2a-2a²). At that point f(z) ≈ a and a-x/b = a-a² ≈ a, so the slope is approximately a², which gives a nearly flat shape for x<ab when a<<1.
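
A similar check can be run for the left side. This sketch, again using a hypothetical `f_z` helper, computes the exact smaller root of f''(z) and the slope at that root, then compares them with a²b and a². The two roots multiply to (ab)², and dividing by the larger root avoids cancellation when a is tiny. This is a numerical convenience, not part of the derivation above.

```python
import math

def f_z(z, a, b):
    """Density of z = ln x for x ~ Gamma(shape a, scale b)."""
    t = z - math.log(b)
    return math.exp(a * t - math.exp(t) - math.lgamma(a))

b = 10.0
results = []
for a in (0.1, 0.01, 0.001):
    x_hi = a * b + b / 2 + (b / 2) * math.sqrt(1 + 4 * a)
    # The two roots multiply to (ab)^2, so dividing avoids cancellation in the "-" root.
    x_lo = (a * b) ** 2 / x_hi
    slope = f_z(math.log(x_lo), a, b) * (a - x_lo / b)   # f'(z) at the steepest rise
    results.append((x_lo / (a * a * b), slope / (a * a)))
    print(a, results[-1])   # both ratios drift toward 1 as a shrinks
```

The slope ratio converges slowly, largely because of the factor $(a^2)^a$ in f(z).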

In summary, when a<<1, f(z) rises slowly for x<ab (with approximate slope a²) and falls sharply for x>ab (with approximate slope -a/e). You are unlikely to see x values larger than eb from such a distribution, but you may see values much smaller than the mean ab. A vague Gamma prior therefore effectively puts an upper bound on your positive variable. The figure below shows how f(z) starts to look like a step function as the shape parameter approaches 0 (with b=1/a, and peak heights matched for comparison).
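
The soft upper bound can also be seen by sampling. This sketch draws from `random.gammavariate` with a=1/10 and b=10 (arbitrary illustrative values, with a fixed seed). It reports the fraction of samples above eb, the fraction below the mean ab, and the fraction below ab/1000. With these settings you should see only a small fraction above eb, while most samples fall below the mean.

```python
import math
import random

rng = random.Random(1)              # fixed seed so the run is reproducible
a, b, n = 0.1, 10.0, 200_000        # vague prior: shape a << 1, mean ab = 1
xs = [rng.gammavariate(a, b) for _ in range(n)]

frac_above = sum(x > math.e * b for x in xs) / n        # beyond the soft upper bound eb
frac_below_mean = sum(x < a * b for x in xs) / n        # below the mean ab
frac_far_below = sum(x < a * b / 1000 for x in xs) / n  # three orders of magnitude below the mean
print(frac_above, frac_below_mean, frac_far_below)
```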


I should also note that in the limit a→0 with ab=1 we get an improper prior: f(z) becomes flat, and the Gamma distribution becomes indifferent to the order of magnitude of the random variable. However, it flattens much faster on the left than on the right.
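
This small sketch illustrates the limit. It keeps ab=1, so the mode of f(z) stays at z=0, and compares f(z) five units to the left and right of the mode with the peak height as a shrinks. The choice of five units (a factor of about 148 in x) and the helper `f_z` are illustrative. Both ratios creep toward 1, but the left one does so much faster.

```python
import math

def f_z(z, a, b):
    """Density of z = ln x for x ~ Gamma(shape a, scale b)."""
    t = z - math.log(b)
    return math.exp(a * t - math.exp(t) - math.lgamma(a))

k = 5.0                          # distance from the mode in z (a factor of e^5 in x)
rows = []
for a in (0.1, 0.01, 0.001):
    b = 1 / a                    # keep ab = 1, so the mode of f(z) stays at z = 0
    peak = f_z(0.0, a, b)
    left = f_z(-k, a, b) / peak  # relative height a factor e^5 below the mean
    right = f_z(k, a, b) / peak  # relative height a factor e^5 above the mean
    rows.append((a, left, right))
    print(a, left, right)
```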
