## October 21, 2005

### A different representation for integers

[Math_] The fundamental theorem of arithmetic states that every positive integer has a unique factorization into primes. Instead of the usual decimal notation where each position represents a multiple of powers of ten, we can represent each integer with a "primal" notation where each position is the power of a prime number. Here is what I mean:

2: (1)
3: (1 0)
4: (2)
5: (1 0 0)
6: (1 1)
7: (1 0 0 0)
8: (3)
9: (2 0)
10: (1 0 1)
11: (1 0 0 0 0)
12: (1 2)

The right-most position is power of two's, next to it is power of three's and so on. The intuition one gets from such a representation is remarkable. Let's prove a simple bound related to the prime number theorem (i.e. how many primes are there less than a given number?)

Let p(n) be the number of primes less than n where 2^k <= n < 2^(k+1). Now consider all the numbers smaller than n. Their primal representations are all distinct, can have at most p(n) elements, and none of the elements can be greater than k. This gives us the inequality:

k^p > n
p lg(k) > lg(n)
p > lg(n) / lg(lg(n))

Not a very tight bound, but could be used as an alternative to Euclid's proof of the infinity of the number of primes :)

One thing about our primal representation is bothering me. Namely the elements themselves are integers, but we are denoting them with regular decimal digits instead of the primal representation! The primal representation for 1 should be (). Zero is more of a problem. Let's keep it as 0 for now and see what we get:

1: ()
2: (())
3: (() 0)
4: ((()))
5: (() 0 0)
6: (() ())
7: (() 0 0 0)
8: ((() 0))
9: ((()) 0)
10: (() 0 ())
11: (() 0 0 0 0)
12: (() (()))

Much better. This representation gives us a one-to-one mapping between all integers and a certain subset of binary trees or s-expressions for those of you familiar with lisp jargon. Unfortunately not all possible binary trees or s-expressions with 0 are meaningful (for example (0)). There must be an even more elegant mapping, but I couldn't think of one so far!

Full post...

### Big Bang

[Books_] Simon Singh's latest (and he claims last) book on how the Big Bang theory was painstakingly developed in about a half century. It is well written and I would put it in the must read list for people trying to understand or explain what science is and what seperates it from non-science (along with "Night comes to the Cretaceous", my other favorite on this subject). I didn't enjoy Big Bang as much as I did Simon Singh's previous two books because I knew most of the stuff about the Big Bang history. (His previous books - "Fermat's last theorem", and the "Codebook" are among my all time favorites.)

I think these two stories, (big bang, dinosaur killing comet) and the founding of geology - discovery of deep time would make a great trio for a lecture. On this last subject, there seems to be several books telling the story of several people:

- the map that changed the world (william smith 1769-1839)
- the man who found time (james hutton 1726-1797)
- the seashell on the mountaintop (nicolaus steno 1638-1686)

### SVD, SVM, EM

[Math_ Learning_] The field of machine learning is full of acronyms, and nobody tells you what each method is good for! There are common issues in each machine learning problem: lack of sufficient data, presence of irrelevant and/or redundant attributes. It would be nice to have a review that looked at each ML approach with these in mind.

Here is some observations:

SVD is great because it deals with the sparse data problem in very high dimensional spaces (e.g. histograms). The SVD mapping of samples to a smaller dimensional space makes noisy data points (e.g. histograms based on few data points) align themselves with less noisy ones and allows hidden similarities to surface.

SVM (or other generalized discriminant methods) is great because when you work with discriminants, irrelevant dimensions hurt you less. A nearest neighbor approach can certainly be misled by adding a number of random dimensions that make similar points look far away. In a discriminant approach, if the data is separable in a lower dimensional space, they will certainly remain separable when you add extra irrelevant dimensions.

EM is great because it has the ability to uncover underlying structure. We are not simply talking about classifying points into positive and negative examples. We are talking about taking surface observations and asking what sort of unobserved mechanism gave rise to them. I think this is very little appreciated, as the "hidden" stuff people look for consist of something very simple (like HMM states). But I showed that the "hidden" stuff can be something as complex as a parse tree!

Full post...

## October 18, 2005

### Method of utilizing implicit references to answer a query

Deniz Yuret. United States Patent 6,957,213

Abstract: A method of utilizing implicit references to answer a query includes receiving segments of text, wherein individual segments have elements. Implicit references are inferred between elements of the segments. The inferring operation includes identifying implicit references to antecedent elements. A query is received. In response to the query, one or more segments are identified as relevant to the query based at least in part on the implicit references.

## September 21, 2005

### Volkan Kurt, M.S. 2005

Last position: IT director, Markafoni, Istanbul. (twitter).
M.S. Thesis: Protein Structure Prediction Using Decision Lists. Koç University Department of Computational Sciences and Engineering, September 2005. (Download PDF).

Abstract:
Proteins are building blocks of life. Structure of these building
blocks plays a vital role in their function, and consequently in the
function of living organisms. Although, increasingly effective
methods are developed to determine protein structure, it is still
easier to determine amino acid sequence of a protein than its folded
structure and the gap between number of known structures and known
sequences is increasing in an accelerating manner. Structure
prediction algorithms may help closing this gap.

In this study, we have investigated various aspects of structure
prediction (both secondary and tertiary structure). We have
developed an algorithm (Greedy Decision List learner, or GDL) that
learns a list of pattern based rules for protein structure
prediction. The resulting rule lists are short, human readable and
open to interpretation. The performance of our method in secondary
structure predictions is verified using seven-fold cross validation
on a non-redundant database of 513 protein chains (CB513). The
overall three-state accuracy in secondary structure predictions is
62.5% for single sequence prediction and 69.2% using multiple
sequence alignment. We used GDL to predict tertiary structure of a
protein based on its backbone dihedral angles phi and psi. The
effect of angle representation granularity to the performance of
tertiary structure predictions has been investigated.

Existing structure prediction approaches build increasingly
sophisticated models emphasizing accuracy at the cost of
interpretability. We believe that the simplicity of the GDL models
provides scientific insight into the relationship between local
sequence and structure in proteins.

## September 11, 2005

### Başak Mutlum, M.S. 2005

Current position: Principal Program Manager, Microsoft, Chicago.
M.S. Thesis: Word Sense Disambiguation Based on Sense Similarity and Syntactic Context. Koç University Department of Computer Engineering, September 2005. (Download PDF).

Abstract:
Word Sense Disambiguation (WSD) is the task of determining the
meaning of an ambiguous word within a given context. It is an open
problem that has to be solved effectively in order to meet the needs
of other natural language processing tasks. Supervised and
unsupervised algorithms have been tried throughout the WSD research
history. Up to now, supervised systems achieved the best
accuracies. However, these systems with the first sense heuristic
have come to a natural limit. In order to make improvement in WSD,
benefits of unsupervised systems should be examined.

In this thesis, an unsupervised algorithm based on sense similarity
and syntactic context is presented. The algorithm relies on the
intuition that two different words are likely to have similar
meanings if they occur in similar local contexts. With the help of a
principle-based broad coverage parser, a 100-million-word training
corpus is parsed and local context features are extracted based on
some rules. Similarity values between the ambiguous word and the
words that occurred in a similar local context as the ambiguous word
are evaluated. Based on a similarity maximization algorithm,
polysemous words are disambiguated. The performance of the algorithm
is tested on SENSEVAL-2 and SENSEVAL-3 English all-words task data
and an accuracy of 59% is obtained.

## August 26, 2005

### Singular Value Decomposition Notes

This post is dedicated to figuring out why SVD works when it does. The image below is a Java applet that demonstrates the use of SVD for image compression. Click on the image and use the arrow keys on your keyboard to change the rank of the pixel matrix. Up/down keys change the rank by a factor of two, left/right keys increase and decrease the rank by one. The original image is 256x256. It is still recognizable when we reduce it to 16x16. The comments below are notes I took while I was trying to understand SVD.
Full post...

## August 19, 2005

### Semantic Relations

[Language_] --DRAFT-- Peter Turney's work on analogy of word pairs

## June 26, 2005

[Math_] Links to websites with interesting math problems.

www.cut-the-knot.org
Interactive math miscellany and puzzles. Definitely the best.

rec.puzzles
The rec.puzzles newsgroup has an archive and an FAQ page
which contain a wealth of information about many interesting problems.

sci.math
This is the FAQ page for the sci.math newsgroup which
is another place you can search for answers.

Shack's math problems
This is a similar page that contains
interesting math problems.

IMO
This is the
yearly competition for high school students.

Chance
This is a course developed at Dartmouth aiming to make students more informed, critical readers of current news stories that use probability and statistics.

Geopuzzles
This is a database of geometry problems.

Heard on the Street
This is a collection of questions asked in Wall Street job interviews.

Full post...

### Unsolved Problems

These are the 24 problems from the wonderful book "Old and new unsolved problems in plane geometry and number theory" by Victor Klee and Stan Wagon, MAA, 1991. Some problems are too cryptic to understand without the book. I'll type more explanations some day...

#### Plane Geometry

• Is each reflecting polygonal region illuminable?
(Imagine a room in a 2-D universe. If the walls form a polygonal shape
and they reflect light, can you always illuminate the whole room with
a single light source?)
• A point P inside a closed convex curve C is called equichordal if every chord of C drawn through point P has the same length. Is there a curve with two equichordal points?
(Marek Rychlik kindly informed me of his solution to this problem.  His homepage has references for the published work.)
• When congruent disks are pushed closer together, can the area of their union increase?
• If a convex body C contains a translate of each plane set of unit diameter, how small can C's area be?
• What is the smallest number f(n) such that any set of f(n) points (non-collinear) always contains the vertices of some convex n-gon?
• If n points are not collinear, must one of them lie on at least n/3 connecting lines?
• Is there a polygon that tiles the plane but cannot do so periodically?
• What is the minimum number of colors for painting the plane so that no two points at unit distance receive the same color?
• Can a circle be decomposed into finitely many sets that can be rearranged to form a square?
• Does the plane contain a dense rational set?
• Does every simple closed curve in the plane contain all four vertices of some square?
• Does each nonseparating plane continuum have the fixed-point property?

#### Number Theory

• Do there exist positive integers x, y, and z and an integer n>=3 such that x^n + y^n = z^n?
(Too late for this one :^) Wiles got it in 1993.  Check out http://www.best.com/~cgd/home/flt/flt01.htm for the proof.
• Does there exist a box with integer sides such that the three face diagonals and the main diagonal all have integer lengths?
• Does the greedy algorithm always succeed in expressing a fraction with odd denominator as a sum of unit fractions with odd denominator?
• Is there an odd perfect number? Are there infinitely many even perfect numbers?
(A perfect number is one whose factors add up to itself, e.g. 6, 28)
• Do the nontrivial zeros of the Riemann zeta function all have real part 1/2?
• Is there a polynomial-time algorithm for obtaining a number's prime factorization?
• Is every positive integer eventually taken to the value 1 by the 3n+1 function?
(3n+1 function is defined as f(n) = n/2 (if n even); 3n+1 (if n odd))

• Is there an algorithm to decide whether a polynomial with integer coefficients has a rational root?

#### Interesting Real Numbers

• Are the digits in the decimal expansion of pi devoid of any pattern?
• Are pi and e algebraically independent? Is their ratio rational?
• If an irrational number is real-time computable, is it necessarily transcendental? Is sqrt(2) real-time computable?
(A number is real-time computable if the amount of time spent computing its digits is linear in the number of digits.)
• Is 1 + 1/2^5 + 1/3^5 + ... irrational?

### Mycoplasma genitalium

[Biology_] I was searching the net for any microarray data for mycoplasma genitalium. I found anything but... Here is a list of some useful links.

Cell Cycle Regulated Yeast Genes Spellman's yeast microarray paper and data.

SMD : Microarray Resources : Databases

EBI Databases - ArrayExpress Home Ebi microarray data. Contains some yeast.

ExpressDB Database of E.coli and yeast microarray data.

microarray data List of microarray data sources.

Expasy 191 out of 486 swiss-prot matches to M.G. are "hypothetical" proteins, i.e. we don't have a clue what they are doing...

PDB Query Result Only three M.G. proteins are found in PDB.

Swiss-Prot taxonomic query: Mycoplasma genitalium List of all genes in M.G. with descriptions. Only can get the first 100 for some reason...

the Gene Ontology website: Tries to organize gene functions.

2can Bioinformatics Educational Resource: Home Page great site that includes bioinformatics tutorials and a section on protein function.

Proteomics - Transcription Factors: some info on protein function

Saccharomyces cerevisiae

Genome Browser Need to figure out how to use this.

Genome Sizes:
E. coli 4,639,221 4,377 4,290 of these genes encode proteins; the rest RNAs
Mycoplasma genitalium 580,073 483 three of the smallest true organisms
Ureaplasma urealyticum 751,719 652
Mycoplasma pneumoniae 816,394 680
Saccharomyces cerevisiae 12,495,682 5,770 Budding yeast. A eukaryote.

Mycoplasma genitalium Background

Entrez PubMed: " The minimal gene complement of Mycoplasma genitalium.

Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM, et al.

Institute for Genomic Research, Rockville, MD 20850, USA.

The complete nucleotide sequence (580,070 base pairs) of the Mycoplasma genitalium genome, the smallest known genome of any free-living organism, has been determined by whole-genome random sequencing and assembly. A total of only 470 predicted coding regions were identified that include genes required for DNA replication, transcription and translation, DNA repair, cellular transport, and energy metabolism. Comparison of this genome to that of Haemophilus influenzae suggests that differences in genome content are reflected as profound differences in physiology and metabolic capacity between these two organisms.

PMID: 7569993 [PubMed - indexed for MEDLINE]

"
Mycoplasma genitalium G-37 Information TIGR did the sequencing of MG, this is its homepage.

Bacteria
The DNA sequences of the complete genomes of seven mycoplasmas have been determined, including

* Mycoplasma genitalium has 580,073 base pairs of DNA encoding 517 genes (480 for proteins; the rest for RNAs).
* Mycoplasma urealyticum has 751,719 base pairs of DNA encoding 651 genes (613 for proteins; 39 for RNAs).
* Mycoplasma pneumoniae has 816,394 base pairs of DNA encoding 679 genes.

How many genes does it take to make an organism?
The scientists at The Institute for Genomic Research (TIGR) who determined the Mycoplasma genitalium sequence followed this work by systematically destroying its genes (by mutating them with insertions) to see which ones are essential to life and which are dispensable. Of the 480 protein-encoding genes, they conclude that only 265–350 of them are essential to life.

Genome Sizes Here is a page that has the sizes of the smallest genomes.

NCBI HomePage The site that I found the complete genome of mycoplasma genitalium.

Full post...

## June 16, 2005

### Analysis cheat sheet

[Math_] Notes on analysis... --DRAFT--

• Function spaces
• Metric spaces
• Convergence
• Measure
• Lebesgue measure
• Lebesgue integral
• Riemann integral
• Lebesgue-Stieltjes integral

• Dense subsets
• Separable spaces
• Complete metric spaces
• Compact metric spaces
• Compactness and continuity
• Linear spaces
• Linear functionals
• Norms and semi-norms of linear spaces
• Convergence revisited
• Euclidean spaces
• Orthogonality and bases
• Hilbert spaces
• Delta functions
• Fourier transform
• Functional derivatives
• Expectations
• Law of large numbers

Full post...

### The Way of Analysis

[Books_ Math_] I have been looking for a self-study book for Analysis for a while. All the books I have looked at more or less cover the same content. They all define "compact sets" for example. However I hadn't found one that gives any motivation for why such a weird thing should be defined. Mathematicians typically do not dream up unintuitive definitions just to annoy future students. There is a problem and a historical context that makes the definition necessary. I am sure these things are intuitively obvious to the authors but for those of us not lucky enough to have had a math undergrad degree, the lack of necessary background information in textbooks is a problem. I like "The Way of Analysis" because it is the only book I have seen that tries to motivate all its definitions. Best to illustrate with some examples:

On "Open Sets and Closed Sets" (pp. 86):

"It may seem a trifling matter whether or not the endpoints are included in the interval, but it makes a significant difference for certain questions. In the open interval, every point is surrounded by a sea of other points. This is the qualitative feature we will want when we define the notion of open sets; it is certainly not true of the endpoints of the closed interval. On the other hand, an open interval (a, b) seems to be "missing" its endpoints. Although they are not points in the interval, they can be approached arbitrarily closely from within the interval. It is as if they had been unfairly omitted. The closed interval has all the points it should from this point of view, and this is the closed aspect that we will generalize when we define a closed set."

On "Compact Sets" (pp. 99)

"Infinite sets are more difficult to deal with than finite sets because of the large number of points they contain. Nevertheless, there is a class of infinite sets, called compact sets, that behave in certain limited ways very much like finite sets... In what way can an infinite set behave like a finite set? Consider the infinite pigeon-hole principle: if an infinite number of letters arrive addressed to a finite number of people, then at least one person must receive an infinite number of letters. In more conventional terms, if x1, x2, ... is an infinite sequence of real numbers, and each xj belongs to a finite set A then at least one element of A must be equal to xj for an infinite number of j's. (my note: the sequence contains only a finite number of distinct numbers, so at least one of these numbers must repeat infinite times). Now, if A were an infinite set, this statement is obviously false. However, we could hope for a slightly weaker conclusion: that A contains a limit-point of the sequence. (my note: for any infinite sequence of real numbers x1, x2, ... made up of elements from A, at least one element of A must be arbitrarily close to an infinite number of xj's) ... Let us take this property as the definition of compactness."

Compare this with the usual definition given for compactness: "A topological space T is said to be compact if every open cover of T has a finite subcover." (from Kolmogorov's Introductory Real Analysis).

Note: In Prime Obsession John Derbyshire defines Analysis simply as "the study of limits". I am ashamed to admit that this was news to me after winning a bronze medal at IMO and spending 12 years at MIT. Thus Prime Obsession is also highly recommended if you need some light reading which won't strain your brain and give you some insight into how mathematicians think.

## June 03, 2005

### Amino-acid classification terminology

[Biology_] Meanings of some biochemical terms used to classify amino acids.

Aliphatic - carbon only chain
ala, val, ile, leu, pro?, gly?
Aromatic - carbon ring
phe, tyr, trp
Hydroxyl - carbon chain ends with OH
ser, thr, tyr?
Amide - carbon chain ends with NH2
asn, gln
Basic - carbon chain ends with NHx+
lys, arg, his?
Acidic - carbon chain ends with O-
asp, glu
Sulphur - carbon chain ends with S
cys, met

### Bijective Representations of Trees Using Cons Pairs

[Hacking_ Math_] --DRAFT-- Different ways of representing trees with list structure: We need to distinguish two things: the branching factor of the tree (binary vs variable), and whether internal nodes have any content. We are interested in one-to-one and onto (bijective) representations, any arbitrary list structure should correspond to a unique tree and vice versa.

1. Binary tree, no internal content: trivial representation with the cons pairs. A tree is a pair of two elements. An element is either an atom or another tree.
2. Variable branch, no internal content: simple representation given in the 6.001 tree lecture. A tree is a list of children. A list is a null terminated chain of cons pairs. A child is either an atom or another tree.
3. Binary tree, internal content: typical C representation would use a struct of three elements: left, right, and content. One possibility with cons pairs is to define a node as two pairs: (cons content (cons left right)). (is this bijective?)
4. Variable branch, internal content: SGF defines a format based on DFS walking (pre-order?) of the tree. Here is an EBNF definition:

GameTree = "(" Sequence { GameTree } ")"
Sequence = Node { Node }
Node = ";" { Property }

{...} represents 0 or more repetition. So a tree is a list of nodes followed by a list of trees. Each node is an atom.

A 1-branch tree gives simply a linked list. In terms of cons pairs: there are two types of pairs: node pairs have atom cars and tree pairs have pair cars. The car of a node pair is its content, the cdr may point to any type of pair or nil. The car of a tree pair is a node pair and its cdr is a tree pair or nil. There are never two open parens in a row so this is obviously not bijective.

## May 20, 2005

### Mathematical Go

[Math_ Books_ Games_] --DRAFT-- Berlekamp and Wolfe's Mathematical Go is a book on the wonderful subject of the relation between Combinatorial Game Theory (CGT) and end games in Go. I am a computer scientist interested in game playing programs and it seems the ideas of CGT may help conquer Go, one of the last frontiers of computer game playing (in most other popular games, computers have approached or passed the world championship level). I have tried to read and understand the book a couple of times, each time getting more frustrated. This may have three possible reasons: (i) I am dumb, (ii) the concepts are just too complicated, or (iii) there is something wrong with the exposition.

Now, it is a fact that more people in the world know about Go than
CGT. So it would make sense to use the language of Go to explain
difficult concepts in CGT. But the authors of course (being
mathematicians) have chosen to use the arcane language of CGT to
explain its relation to Go. It is no surprise that mathematicians
prefer a cryptic but exact language to an intuitive but fuzzy one. Yet
I have a suspicion that there could be a better way to explain these
ideas, and this is my attempt. The target audience is people who know
a little bit about Go and a little bit about game search algorithms.

Motivation

A Go game usually revolves around several semi-independent regions on
the 19x19 board. This is unlike chess, where a move in one corner of
the board usually effects the whole game. Thus typical chess programs
perform a (optimized) search on a tree of all possible legal
moves. Such a tree contains around b^n moves to analyze if (i) we want
to look ahead n moves and (ii) the number of legal moves at each turn
(the branching factor) is around b. The optimizations bring the actual
analyzed number to around b^(n/2). For chess, the branching factor is
around 10-20, and 12-15 move look-aheads are routinely performed by
modern programs.

Go starts with a 360+ branching factor. Thus searching a single tree
of all legal moves is out of the question. But note what happens if we
can successfully split the game into k independent regions. Each
region will have b/k legal moves on average. An optimized n move
look-ahead search will need to analyze (b/k)^(n/2) moves for each
region. If we could combine the results of these searches and compute
the best move in constant time using CGT, we will only need to search
k * (b/k)^(n/2) which is much smaller than b^(n/2). For example if b =
100 and we want a 10 move look ahead, the full search will cost us 10
billion moves (about three hours at a million moves per second). If we
could split the board into 5 equally sized independent games, search
each one 10 moves ahead and combine the results, the search would cost
16 million moves (16 seconds at a million moves per second).

Of course it is rarely possible to isolate one part of a Go board from
the rest. The examples in Mathematical Go exclusively deal with
completely isolated regions surrounded by unconditionally live
groups. Such a region has no effect on the rest of the board except
for the timing of the moves. However with further progress, it may be
possible to apply the theory to weakly interacting regions. Spatially
dividing the game seems to me the only option if search is going to
have any chance.

An Empty Corridor

So let's try to understand what CGT is trying to tell us. Here is an
isolated region (assuming the surrounding white and black stones are
alive).

$$Empty corridor$$ O X X X X$$O a b c X$$ O X X X X

The search tree is simple. If it is white's turn, he'll move into (a)
and black will defend at (b) to get 1 point of territory, if it is
black's turn, he'll move into (a) and get 2 points of territory. If
this was the only active region left on the board this is all the
analysis we would need.

However we should consider potential moves in other parts of the
board. If black has a better prospect elsewhere, he may choose not to
defend his corridor, in effect "passing" in this region of the
board. The Go term for such "playing elsewhere" is tenuki. When would
black do such a thing? If it is black's turn we said he can grab 2
points. If he plays tenuki, and white attacks at (a) black can take at
most 1. If he plays tenuki twice, white may not give him any points in
the corridor. So every tenuki costs black one point up to two
points. Black will choose to play elsewhere if such a move is worth
more than a point. He may be indifferent if such a move is worth one
point.

To decide whether to spend our next move in this region rather than
some other part of the board, does the location or the exact shape of
the corridor matter? Obviously not. Does the size of the corridor
matter? We will see that sometimes it does. What is the exact
information we need to make the correct decision? All we need is the
analysis from the last paragraph, i.e. black can play tenuki twice at
a regional cost of one point each. I'd like to write this out as
[1,1,0]. CGT represents it using the number 1+1/4.

The first big idea of the CGT is that we can represent these isolated
regions with such small bits of information and mostly forget about
the details. The second big idea is to use these small bits of
information to calculate (i) which region contains the best move, and
(ii) who is winning by how much assuming ideal play. To illustrate how
this calculation works, we need an example with more than one region.

Lots of empty corridors

Let's assume that our endgame consists of n copies of the above
corridor. If it is white's turn, he can attack the first corridor,
black can defend and get one point, white can attack the second
corridor, etc. and black will have n points at the end.

$$w Bad black strategy$$ O X X X X$$O 1 2 . X$$ O X X X X$$O 1 2 . X$$ O X X X X

But this is not the best possible strategy. Black may choose not to
defend where white attacks and instead may block another corridor to
get 2 points. In fact he can get an extra point using this strategy
if there are four corridors:

$$w Better black strategy$$ O X X X X _ _ O X X X X$$O 1 5 . X _ _ O 1 3 . X$$ O X X X X _ _ O X X X X$$O 2 . . X _ _ O 2 . . X$$ O X X X X _ _ O X X X X$$O 3 6 . X _ _ O 4 . . X$$ O X X X X _ _ O X X X X$$O 4 . . X _ _ O 5 6 . X$$ O X X X X _ _ O X X X X

Verifying the following is left as an exercise for the reader:

* The ideal strategy for both black and white is to play at the edge
of the longest remaining corridor.
* If white attacks first, playing the ideal strategy, black can get
one point for each corridor plus an extra point for every fourth
corridor.
* If black defends first, he gets two points for the first corridor
and refer to the previous rule for the other corridors.

Now the CGT value of 1+1/4 for this position may make a little more
sense. If you add up the values of all the corridors you get a
fractional number of points. The player on move will be able to round
the fraction in his favor.

Draft...

Here is a slightly different example:

This time if it is black's turn, he can get 4 points (one for the prisoner). If he plays elsewhere, and white attacks at "a", he can get 3 points. If he tenukis twice and white keeps attacking, he can still get 2 points. But if he tenukis a third time, white will save his stone and black will get zero points. So the tenuki costs for black this time are [1,1,2]. CGT represents this situation with an infinitesimal (something that is greater in absolute value than zero yet smaller than any positive real number) called "double up star" (written "^^*").

A third example to introduce another member of the CGT menagerie:

In the top corridor black will get 4, 3, 0 points respectively if he tenukis 0, 1, and 2 times. In my notation this is [1,3]. CGT calls it TINY-1. The bottom corridor gives black 5, 4, 0 points to black if he tenukis 0, 1, and 2 times. The costs of the tenukis are [1,4]. According to CGT, this is TINY-2. Tinies are things that are smaller than infinitesimals yet still larger than zero. Opposite of a tiny is called a miny.

So the first big idea of the CGT is that we can represent these isolated regions with small amounts of information and mostly forget about the details. The second big idea is to use these small bits of information to calculate (i) who is winning by how much, and (ii) in which region is the best move.

$$X O O O O O$$ X a b c d O$$X O O O O O$$ X e f O . . X O O O . .

2+1/8, 1/2
wa,be - 3
we,ba,wb - 3
ba,we,bb,wc - 2
be,wa - 3
if g: be,wf,ba,wb - 3 (2+1/8,1+1/2)
ba,wb,be,wf - 3

## May 17, 2005

### 5000 B.C. and Other Philosophical Fantasies

[Books_] This little book by Smullyan is a treat for anybody interested in philosophy. He calls it "philosophical fiction", little stories and dialogues that I think illustrate deep ideas in a more colorful way than dry philosophical ramblings. "Mind's I" by Hofstadter and Dennett is the best collection of phi-fi stories I have read and includes three pieces by Smullyan. "Godel, Escher, Bach" should of course be on top of everybody's reading list. I consider Kurzweil's dialogue in "The age of intelligent machines" one of the prime examples. Maybe there are more recent examples. "Sophie's World" could be considered one, but it is too directly about philosophy rather than its ideas. Some movies like "Matrix" could be in the list, but most are too confused and contain too many inconsistencies. (Except "Dark City" which I consider to be the best in this genre.) Paradoxes are ok, but inconsistencies are not in good taste for philosophy. Nevertheless "philosophical fiction" movies have always been my favorites.

More about Smullyan: I declared him my spiritual leader after reading his "Tao is Silent". (He doesn't know yet.) In fact I was first introduced to Smullyan from his two retrograde chess puzzle books (which are required reading if you like puzzles). At around the same time I also read "Mind's I" but probably did not realize it was the same guy who wrote some of my favorite pieces. I then ran into some of his logical puzzle books. After reading Tao is Silent, I finally started putting the pieces together and realized it was all one and the same person. He has one of these crisp, high-precision minds that I so admire in people like Bertrand Russell and Marvin Minsky, but unlike them, he is genuinely interested in mystical stuff! It turns out he is also a professional magician, piano player, logic professor, family relation to Marvin Minsky, doctoral student of Alonzo Church (are all my favorite people related?), and he loves dogs and builds telescopes. I think that qualifies him as a spiritual leader for anybody, including himself.

Recently I decided to collect all his books, most of which are shamefully out of print. (abebooks.com is a great site for finding out of print books.) I haven't read them all but here is a list:

Spritual Books:
- The Tao is Silent
- This Book Needs No Title (especially the last chapter "Planet without laughter")
- 5000 B.C. and Other Philosophical Fantasies (does philosophy count as spiritual?)
- Who Knows? - A Study of Religious Consciousness (this is my least favorite of the three, probably because I don't find the idea of Hell very interesting.)

- The Chess Mysteries of the Arabian Knights
- The Chess Mysteries of Sherlock Holmes

Logic Puzzles:
- What is the Name of this Book?
- The Lady or the Tiger?
- Alice in Puzzleland
- To Mock a Mockingbird
- Forever Undecided
- Satan, Cantor and Infinity
- The Riddle of Scheherazade and Other Amazing Puzzles

Autobiography:
- Some Interesting Memories

## April 26, 2005

### Are seven primitive lisp operators all necessary?

[Hacking_ Math_] --DRAFT-- Paul Graham talks about the seven primitives of LISP.

## April 25, 2005

### On Lisp

[Hacking_ Books_] Last weekend, I had a chance to browse Paul Graham's website, read his wonderful book "On Lisp", and learn about his new language idea "Arc". This is more of my ramblings inspired by his work rather than commentary on the book.

Lately, most of my programming projects seem to go through
the same cycle. I first write a program in Perl. Eventually I
run into some sort of memory related performance bottleneck. (My
programs typically work with large datasets and/or build large
statistical models.) Then I rewrite the whole program painfully
in C. All the while, I actually want to be using Lisp. What I
need is a Lisp variant that offers the advantages of Perl and C.

Why do I start with Perl? In one word: "conciseness". It takes
me five minutes to perform some task in Perl that would take me an
hour in C. Perl programs are concise. The operator names are
concise. There are concise ways to express things you most need:
hashes, extensible arrays, regular expressions, file I/O. I like
\$a{color} better than (gethash 'color a).

Why do I end up going to C? In one word: "overhead".
Implementors of high level programming languages like Java and
Perl do not worry about memory as much as they worry about speed.
Soon you find out that a short string actually costs 100 extra
bytes and since nobody expected you to create 100,000 hash tables
they crammed all sort of extra information into the data structure
that they could have kept outside. Any speed gain a smart
compiler can give you is worthless once your disk starts
thrashing. C gives you the control to spend as many bytes as you
want on whatever data structure you need.

Why do I want to use Lisp? If you are asking this you should read SICP and On Lisp or take a look at Paul Graham's articles on the subject.

So all I need is a "concise" Lisp with "no overhead"!