(c) Deniz Yuret 2007
glookup reads ngram patterns (possibly containing wildcards) from stdin, finds their counts in a single pass over the Google ngram data, and prints the results.
The input should have a single pattern on each line, consisting of space-separated tokens, with '_' representing the wildcard token that matches any word. The output will have up to three tab-separated counts next to the pattern:
n0: the total count of the ngrams matching a given pattern.
n1: the number of distinct ngrams matching a given pattern. This is only output for patterns with wildcards.
n2: the number of distinct words that appear in the last position of the ngrams matching a given pattern. This is only output for patterns that end with a wildcard and have more than one wildcard. It is needed for Kneser-Ney smoothing.
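
The three counts can be illustrated with a small Python sketch. This is not glookup's implementation: the function names, the in-memory toy table, and the assumption that each ngram appears exactly once in the data are all hypothetical, chosen only to make the definitions of n0, n1 and n2 concrete.

```python
from collections import defaultdict

WILDCARD = "_"

def matches(pattern, ngram):
    """True if the ngram has the pattern's length and agrees on every non-wildcard token."""
    return len(pattern) == len(ngram) and all(
        p == WILDCARD or p == w for p, w in zip(pattern, ngram)
    )

def lookup(patterns, ngram_counts):
    """Return {pattern: (n0[, n1[, n2]])} after one pass over ngram_counts.

    patterns:     list of token tuples, with "_" as the wildcard
    ngram_counts: iterable of (ngram tuple, count); assumed to list each
                  distinct ngram once (a hypothetical stand-in for the real data)
    """
    n0 = defaultdict(int)          # total count of matching ngrams
    n1 = defaultdict(int)          # number of distinct matching ngrams
    last_words = defaultdict(set)  # distinct final words of matching ngrams

    for ngram, count in ngram_counts:  # single pass over the data
        for pat in patterns:
            if matches(pat, ngram):
                n0[pat] += count
                n1[pat] += 1
                last_words[pat].add(ngram[-1])

    results = {}
    for pat in patterns:
        row = [n0[pat]]
        wildcards = pat.count(WILDCARD)
        if wildcards > 0:                           # n1 only for wildcard patterns
            row.append(n1[pat])
        if wildcards > 1 and pat[-1] == WILDCARD:   # n2 only when the pattern ends
            row.append(len(last_words[pat]))        # with one of several wildcards
        results[pat] = tuple(row)
    return results

# Toy data, invented for illustration only.
TOY = [
    (("the", "cat", "sat"), 10),
    (("the", "dog", "sat"), 5),
    (("the", "cat", "ran"), 3),
    (("a", "cat", "sat"), 2),
]

PATTERNS = [
    ("the", "cat", "sat"),
    ("the", "_", "sat"),
    ("the", "_", "_"),
]

RESULTS = lookup(PATTERNS, TOY)
```

On this toy table, "the cat sat" gets only n0, "the _ sat" gets n0 and n1, and "the _ _" gets all three counts, since it ends with a wildcard and has more than one.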
Please see the README file and the user manual for more information.