April 09, 2012

Probabilistic Programming

The probabilistic programming language Church brings together two of my favorite subjects: Scheme and probability. I highly recommend this tutorial to graduate students interested in machine learning and statistical inference. The tutorial explains probabilistic inference through programming, starting from simple generative models with biased coins and dice and leading up to hierarchical, non-parametric, recursive, and nested models. Even at the undergraduate level, I have long thought that probability and statistics should be taught in an integrated manner, rather than almost independently as they are now. One roadblock is that even the simplest statistical inference requires some calculus. For example: three tosses of a coin with an unknown, uniformly distributed weight come up H, H, T; what is the probability that the fourth toss is heads? Using a programming language like Church may allow an instructor to introduce the basic concepts without students getting lost in the details of integration.
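
For reference, here is the calculus that the coin example hides, worked out under the usual reading of the problem: the weight theta is the probability of heads, its prior is uniform on [0, 1], and the tosses are independent given theta. The symbols theta and H_4 (heads on the fourth toss) are my own notation, not taken from the tutorial.

```latex
\begin{align*}
p(\theta) &= 1, \qquad 0 \le \theta \le 1 \\
p(\theta \mid H, H, T) &\propto \theta^{2}(1-\theta) \\
P(H_4 \mid H, H, T)
  &= \frac{\int_0^1 \theta \cdot \theta^{2}(1-\theta)\, d\theta}
          {\int_0^1 \theta^{2}(1-\theta)\, d\theta}
   = \frac{1/4 - 1/5}{1/3 - 1/4}
   = \frac{1/20}{1/12}
   = \frac{3}{5}
\end{align*}
```

The result agrees with Laplace's rule of succession, (h + 1) / (n + 2) with h = 2 heads in n = 3 tosses.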

April 01, 2012

The wonderful xargs command

I finally found a way I like to run a whole bunch of commands N at a time on an N-core machine (or maybe N-1 at a time, to be polite to other users):

1. Say you have a command rprun.pl that takes 4 arguments, and you want to run it with 1000 different argument combinations.

2. You write a script rprun-args.pl that generates all the combinations you need, one per line. Say its output looks like:

10      185364  25      0.166
12      92682   25      0.166
18      65536   32      0.166
12      65536   25      0.7071
14      16384   25      0.166
...

3. Now you can use xargs to run these 24 at a time as follows:

rprun-args.pl | xargs -n4 -P24 rprun.pl > rprun.out

-n4 tells xargs to pass the arguments to the command 4 at a time. So a typical command line will look like:

rprun.pl 14 16384 25 0.166
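
A quick way to check the grouping before launching anything is to put echo in front of the command name, so xargs prints each command line instead of running it. This is only a sketch: the two rows fed in by printf stand in for the output of rprun-args.pl.

```shell
# Stand-in for rprun-args.pl: two of the tab-separated rows shown above.
# With echo in front, xargs prints the command lines it would have run.
cmds=$(printf '10\t185364\t25\t0.166\n12\t92682\t25\t0.166\n' | xargs -n4 echo rprun.pl)
printf '%s\n' "$cmds"
```

xargs splits its input on any whitespace, so tabs and newlines are treated alike and each group of four values becomes one command line.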

-P24 tells xargs to keep up to 24 processes running at a time. If you run ps you will see 24 copies of rprun.pl running together. As soon as the number drops to 23, another child is spawned.

Note that the command above combines the outputs of all runs, in the order they finish, in the same file. So make sure rprun.pl prints its arguments along with its result, ideally on the same output line.
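
Here is a small end-to-end sketch of the whole recipe that can be run without the real scripts. Everything in it is made up for illustration: fake-rprun.sh stands in for rprun.pl and just prints its four arguments next to a placeholder "result", args.txt stands in for the output of rprun-args.pl, and the worker count is taken from getconf _NPROCESSORS_ONLN (which I assume is available; it falls back to 2 otherwise) minus one. The -P option is an extension found in GNU and BSD xargs, not part of POSIX.

```shell
#!/bin/sh
# Sketch only: fake-rprun.sh is a hypothetical stand-in for rprun.pl.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/fake-rprun.sh" <<'EOF'
# Print the four arguments next to a placeholder result on a single line,
# so each line of the combined output identifies the run it came from.
echo "$1 $2 $3 $4 result=$(( $1 + $3 ))"
EOF

# Stand-in for rprun-args.pl: the five rows shown in the post.
cat > "$workdir/args.txt" <<'EOF'
10 185364 25 0.166
12 92682 25 0.166
18 65536 32 0.166
12 65536 25 0.7071
14 16384 25 0.166
EOF

# Use N-1 workers on an N-core machine, but never fewer than one.
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
workers=$(( ncores - 1 ))
[ "$workers" -ge 1 ] || workers=1

# Same shape as the real command: 4 arguments per run, $workers runs at a time.
xargs -n4 -P"$workers" sh "$workdir/fake-rprun.sh" \
    < "$workdir/args.txt" > "$workdir/rprun.out"

# Lines arrive in completion order; sorting gives a stable view of them.
sorted=$(sort -n "$workdir/rprun.out")
lines=$(printf '%s\n' "$sorted" | wc -l | tr -d ' ')
printf '%s\n' "$sorted"
```

Because every output line carries its own arguments, the completion order stops mattering: the file can be sorted or grepped afterwards to find the result of any particular run.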
