## October 21, 2005

### A different representation for integers

[Math_] The fundamental theorem of arithmetic states that every positive integer has a unique factorization into primes. Instead of the usual decimal notation, where each position represents a power of ten, we can represent each integer in a "primal" notation, where each position holds the exponent of a prime in its factorization. Here is what I mean:

2: (1)
3: (1 0)
4: (2)
5: (1 0 0)
6: (1 1)
7: (1 0 0 0)
8: (3)
9: (2 0)
10: (1 0 1)
11: (1 0 0 0 0)
12: (1 2)
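The table above can be generated with a short sketch (the function names here are my own, not standard):

```python
# A sketch of the "primal" notation: each position holds the exponent of a
# prime in n's factorization, with the exponent of 2 in the right-most spot.

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))

def primal(n):
    """Exponent list for n > 1, right-most entry = exponent of 2."""
    exps, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
        p += 1
        while not is_prime(p):
            p += 1
    return exps[::-1]

print(primal(12))  # 12 = 2^2 * 3 -> [1, 2]
```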

The right-most position is the exponent of two, the next one is the exponent of three, and so on. The intuition one gets from such a representation is remarkable. Let's prove a simple bound related to the prime number theorem (i.e., how many primes are there less than a given number?).

Let p(n) be the number of primes less than n, where 2^k <= n < 2^(k+1). Now consider all the numbers smaller than n. Their primal representations are all distinct, have at most p(n) positions, and no exponent can be greater than k (an exponent of k+1 alone would already exceed n). Counting the possible representations gives us the inequality:

(k+1)^p >= n - 1
p lg(k+1) >= lg(n - 1)
p >= lg(n - 1) / lg(k + 1), which is roughly lg(n) / lg(lg(n)) since k is about lg(n)

Not a very tight bound, but it could be used as an alternative to Euclid's proof that there are infinitely many primes :)
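As a quick numeric sanity check (a sketch; the function name is mine), we can compare the actual prime count with this bound:

```python
import math

def count_primes_below(n):
    """Count primes less than n by trial division."""
    return sum(all(m % d for d in range(2, int(m**0.5) + 1))
               for m in range(2, n))

# The actual count p(n) should comfortably exceed lg(n) / lg(lg(n)).
for n in (100, 1000, 10000):
    bound = math.log2(n) / math.log2(math.log2(n))
    print(n, count_primes_below(n), round(bound, 2))
```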

One thing about our primal representation is bothering me: the elements themselves are integers, but we are denoting them with regular decimal digits instead of the primal representation! The primal representation for 1 should be (). Zero is more of a problem. Let's keep it as 0 for now and see what we get:

1: ()
2: (())
3: (() 0)
4: ((()))
5: (() 0 0)
6: (() ())
7: (() 0 0 0)
8: ((() 0))
9: ((()) 0)
10: (() 0 ())
11: (() 0 0 0 0)
12: (() (()))

Much better. This representation gives us a one-to-one mapping between the positive integers and a certain subset of binary trees, or s-expressions for those of you familiar with Lisp jargon. Unfortunately, not all possible binary trees or s-expressions with 0 are meaningful (for example, (0)). There must be an even more elegant mapping, but I haven't been able to think of one so far!
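The fully recursive form can be computed the same way (a sketch with my own function names): 1 maps to the empty tuple, each nonzero exponent is encoded recursively, and zero exponents are kept as the literal 0, as in the table above.

```python
# Recursive "primal" s-expressions: 1 -> (), zero exponents stay 0,
# nonzero exponents are themselves encoded recursively.

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))

def exponents(n):
    """Prime-exponent list of n > 1, right-most entry = exponent of 2."""
    exps, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
        p += 1
        while not is_prime(p):
            p += 1
    return exps[::-1]

def sexpr(n):
    if n == 1:
        return ()
    return tuple(0 if e == 0 else sexpr(e) for e in exponents(n))

print(sexpr(12))  # ((), ((),)) i.e. (() (()))
```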


### Big Bang

[Books_] Simon Singh's latest (and, he claims, last) book tells how the Big Bang theory was painstakingly developed over about half a century. It is well written, and I would put it on the must-read list for people trying to understand or explain what science is and what separates it from non-science (along with "Night Comes to the Cretaceous", my other favorite on this subject). I didn't enjoy Big Bang as much as Simon Singh's previous two books because I already knew most of the Big Bang history. (His previous books, "Fermat's Last Theorem" and "The Code Book", are among my all-time favorites.)

I think these two stories (the Big Bang, the dinosaur-killing comet) and the founding of geology, i.e. the discovery of deep time, would make a great trio for a lecture. On this last subject, there seem to be several books telling the stories of several people:

- The Map That Changed the World (William Smith, 1769-1839)
- The Man Who Found Time (James Hutton, 1726-1797)
- The Seashell on the Mountaintop (Nicolaus Steno, 1638-1686)

### SVD, SVM, EM

[Math_ Learning_] The field of machine learning is full of acronyms, and nobody tells you what each method is good for! There are issues common to every machine learning problem: lack of sufficient data, and the presence of irrelevant and/or redundant attributes. It would be nice to have a review that looked at each ML approach with these in mind.

Here are some observations:

SVD is great because it deals with the sparse data problem in very high dimensional spaces (e.g. histograms). The SVD mapping of samples to a smaller dimensional space makes noisy data points (e.g. histograms based on few data points) align themselves with less noisy ones and allows hidden similarities to surface.
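As a toy illustration of that effect (the data and variable names below are mine, purely for demonstration): take a small term-document count matrix with two hidden topics, and project the documents onto the top singular dimensions.

```python
import numpy as np

# Rows are terms, columns are documents, entries are counts.
# Documents 0-1 share one topic's terms, documents 2-3 the other's.
A = np.array([[2., 1., 0., 0.],
              [1., 2., 0., 0.],
              [0., 0., 1., 2.],
              [0., 0., 2., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # keep only the strongest latent dimensions
docs = (np.diag(s[:k]) @ Vt[:k]).T  # each document as a k-dim vector

# In the reduced space, documents 0 and 1 land close together, as do 2
# and 3, while the two groups stay (near-)orthogonal.
print(docs.round(2))
```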

SVM (or other generalized discriminant methods) is great because when you work with discriminants, irrelevant dimensions hurt you less. A nearest neighbor approach can certainly be misled by adding a number of random dimensions that make similar points look far away. In a discriminant approach, if the data is separable in a lower dimensional space, they will certainly remain separable when you add extra irrelevant dimensions.

EM is great because it has the ability to uncover underlying structure. We are not simply talking about classifying points into positive and negative examples. We are talking about taking surface observations and asking what sort of unobserved mechanism gave rise to them. I think this is little appreciated, as the "hidden" stuff people usually look for consists of something very simple (like HMM states). But I showed that the "hidden" stuff can be something as complex as a parse tree!


## October 18, 2005

### Method of utilizing implicit references to answer a query

Deniz Yuret. United States Patent 6,957,213

Abstract: A method of utilizing implicit references to answer a query includes receiving segments of text, wherein individual segments have elements. Implicit references are inferred between elements of the segments. The inferring operation includes identifying implicit references to antecedent elements. A query is received. In response to the query, one or more segments are identified as relevant to the query based at least in part on the implicit references.