PARSERS
MINIPAR is a broad-coverage parser for the English
language. It represents its grammar as a network of nodes and links,
where the nodes represent grammatical categories and the links represent
types of dependency relationships.
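To illustrate the idea of a grammar stored as a network, here is a minimal sketch in which nodes are grammatical categories and labeled links are the dependency relationships allowed between them. The category names and relation labels (`subj`, `obj`, `det`, `mod`) are illustrative assumptions, and the `GrammarNetwork` class is hypothetical. It does not reproduce MINIPAR's actual data format or parsing algorithm.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class GrammarNetwork:
    """Toy grammar network (hypothetical, not MINIPAR's format):
    nodes are grammatical categories, labeled links are the
    dependency relationships allowed between them."""

    def __init__(self) -> None:
        # head category -> set of (relation label, dependent category)
        self.links: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_link(self, head: str, relation: str, dependent: str) -> None:
        self.links[head].add((relation, dependent))

    def relations(self, head: str, dependent: str) -> List[str]:
        """Sorted relation labels that may link `head` to `dependent`."""
        return sorted(rel for rel, dep in self.links[head] if dep == dependent)


net = GrammarNetwork()
net.add_link("V", "subj", "N")
net.add_link("V", "obj", "N")
net.add_link("N", "det", "Det")
net.add_link("N", "mod", "A")

print(net.relations("V", "N"))    # ['obj', 'subj']
print(net.relations("N", "Det"))  # ['det']
print(net.relations("Det", "N"))  # []
```

A parser built on such a network only needs to ask which links connect two categories, which keeps the grammar itself declarative.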
The Collins-CFG parser
provides an effective means of dealing with the sparse data problems
inherent in the use of lexicalized context-free grammars. The parse tree
is split into a series of lexicalized CFG rules, and each rule is in turn
split into a sequence of decisions that builds the rule up one pair of
lexicalized non-terminals at a time: the rule's head and one of its
modifiers.
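The decomposition of a rule into decisions can be sketched as follows, assuming a simplified head-driven scheme. The head child is chosen first, and then left and right modifiers are generated outward from the head, with each side ending in a STOP symbol. The sketch omits the distance and subcategorization conditioning of Collins' actual models. The function `decompose_rule` and the example rule are hypothetical illustrations.

```python
from typing import List, Tuple

# A lexicalized non-terminal: (label, head word), e.g. ("NP", "IBM").
LexNT = Tuple[str, str]
Decision = Tuple[str, LexNT, LexNT]

STOP: LexNT = ("STOP", "")


def decompose_rule(parent: LexNT, children: List[LexNT], head_index: int) -> List[Decision]:
    """Split one lexicalized CFG rule into generation decisions
    (simplified head-driven sketch): choose the head child, then
    generate modifiers outward from it on each side, ending with STOP.
    Every modifier decision pairs the head with one lexicalized non-terminal."""
    head = children[head_index]
    decisions: List[Decision] = [("head", parent, head)]
    # Left modifiers, nearest to the head first.
    for mod in reversed(children[:head_index]):
        decisions.append(("left", head, mod))
    decisions.append(("left", head, STOP))
    # Right modifiers, nearest to the head first.
    for mod in children[head_index + 1:]:
        decisions.append(("right", head, mod))
    decisions.append(("right", head, STOP))
    return decisions


# Rule S(bought) -> NP(week) NP(IBM) VP(bought), with VP(bought) as head.
children = [("NP", "week"), ("NP", "IBM"), ("VP", "bought")]
for decision in decompose_rule(("S", "bought"), children, head_index=2):
    print(decision)
```

Each decision involves only two lexicalized non-terminals. Its probability can therefore be estimated from far sparser statistics than a whole lexicalized rule would require, which is the point the description above makes about sparse data.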
This is an extensible, parallel parsing engine that
accommodates many different types of generative, statistical parsing
models (including an emulation of
Mike Collins’ parsing model) and can
easily be extended to new domains and new languages.
POS TAGGERS
Brill's part-of-speech tagger implements a simple
rule-based tagger using transformation-based learning.
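Transformation-based tagging works by first assigning each word its most likely tag and then applying an ordered list of learned rules that patch errors in context. The sketch below shows only the tagging stage, with a single rule template ("change tag A to B when the previous tag is Z"). The lexicon and rule are hypothetical. The learning stage, which greedily picks the transformation that most reduces errors on training data, is omitted.

```python
from typing import Dict, List, Tuple

# A transformation (from_tag, to_tag, prev_tag): "change from_tag to
# to_tag when the preceding word is tagged prev_tag".
Rule = Tuple[str, str, str]


def initial_tags(words: List[str], lexicon: Dict[str, str], default: str = "NN") -> List[str]:
    """Baseline tagging: each word gets its most likely tag from the lexicon."""
    return [lexicon.get(w.lower(), default) for w in words]


def apply_transformations(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply learned transformations, in order, to patch the baseline tags."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        # Match against a snapshot so a rule's own rewrites in this pass
        # do not create new triggering contexts for the same rule.
        snapshot = list(tags)
        for i in range(1, len(tags)):
            if snapshot[i] == from_tag and snapshot[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


lexicon = {"i": "PRP", "want": "VBP", "to": "TO", "race": "NN"}  # hypothetical
rules: List[Rule] = [("NN", "VB", "TO")]  # hypothetical learned rule
words = "I want to race".split()
print(apply_transformations(initial_tags(words, lexicon), rules))
# ['PRP', 'VBP', 'TO', 'VB']
```

Evaluating each rule against a snapshot is a design choice of this sketch. Applying rewrites immediately, left to right, is an alternative that can give different results when contexts overlap.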
MXPOST is a Java (JDK 1.1) implementation of the
part-of-speech tagger described in:
Adwait Ratnaparkhi. A Maximum Entropy
Part-Of-Speech Tagger. In Proceedings of the Empirical Methods in
Natural Language Processing Conference, May 17-18, 1996, University of
Pennsylvania.
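A maximum entropy tagger scores each candidate tag with a log-linear model, p(t | h) = exp(sum of the weights of the active features) / Z(h), where the history h describes the word and its context. The sketch below computes this distribution for one word. The contextual predicates, tag set, and weights are hypothetical stand-ins for what a trained model would learn. It does not reproduce MXPOST's feature set, training procedure, or search over whole tag sequences.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical learned weights, keyed by (contextual predicate, tag).
Weights = Dict[Tuple[str, str], float]


def predicates(word: str, prev_tag: str) -> List[str]:
    """Hypothetical contextual predicates describing the word and its history."""
    preds = [f"w={word.lower()}", f"prev={prev_tag}"]
    if word.lower().endswith("ing"):
        preds.append("suffix=ing")
    if word[:1].isupper():
        preds.append("capitalized")
    return preds


def tag_distribution(word: str, prev_tag: str, tags: List[str], weights: Weights) -> Dict[str, float]:
    """Log-linear (maximum entropy) model:
    p(t | h) = exp(sum of weights of active (predicate, t) features) / Z(h)."""
    preds = predicates(word, prev_tag)
    scores = {t: sum(weights.get((p, t), 0.0) for p in preds) for t in tags}
    z = sum(math.exp(s) for s in scores.values())  # normalizer Z(h)
    return {t: math.exp(s) / z for t, s in scores.items()}


weights: Weights = {
    ("suffix=ing", "VBG"): 2.0,
    ("prev=DT", "NN"): 1.5,
    ("w=running", "VBG"): 0.5,
}
dist = tag_distribution("running", "PRP", ["NN", "VBG"], weights)
print(round(dist["VBG"], 3))  # 0.924
```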
THESAURUS
This is an automatically constructed thesaurus by Dekang
Lin. For each word, the thesaurus lists up to the 200 most similar words
and their similarity scores. The similar words are also clustered
automatically.
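A thesaurus entry of this kind can be sketched as ranking all other words by a similarity score and keeping the top k (here, up to 200). The sketch below assumes a Lin-style measure over dependency features: the weight of the features two words share, divided by the total feature weight of both words. The features `(relation, word)`, their weights, and the function names are hypothetical, and clustering is not shown.

```python
from typing import Dict, List, Tuple

# Each word maps its dependency features (relation, other word) to a
# positive association weight, e.g. a mutual-information score.
Features = Dict[Tuple[str, str], float]


def lin_similarity(f1: Features, f2: Features) -> float:
    """Weight of shared features divided by total feature weight of both words."""
    shared = f1.keys() & f2.keys()
    numer = sum(f1[f] + f2[f] for f in shared)
    denom = sum(f1.values()) + sum(f2.values())
    return numer / denom if denom else 0.0


def most_similar(word: str, vectors: Dict[str, Features], k: int = 200) -> List[Tuple[str, float]]:
    """Up to k other words ranked by similarity, highest first."""
    scored = [(other, lin_similarity(vectors[word], feats))
              for other, feats in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = {  # hypothetical feature weights
    "beer": {("obj-of", "drink"): 2.0, ("mod", "cold"): 1.0, ("obj-of", "brew"): 1.5},
    "wine": {("obj-of", "drink"): 2.0, ("mod", "red"): 1.2, ("obj-of", "brew"): 0.5},
    "car": {("obj-of", "drive"): 2.5, ("mod", "red"): 0.8},
}
print(most_similar("beer", vectors, k=2))
```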
This thesaurus is similar to the one above, but similarity is computed
based only on the linear proximity of words, whereas the above thesaurus
uses dependency relationships extracted from a parsed corpus.
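Proximity-based similarity needs no parser. It only counts which words occur within a fixed window of each other. The sketch below assumes a window of two tokens on each side, raw co-occurrence counts, and cosine similarity. These are common choices swapped in for illustration, and the weighting behind the resource described above is not specified here. The toy corpus is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List


def window_contexts(tokens: List[str], window: int = 2) -> DefaultDict[str, Counter]:
    """For each word, count the words occurring within `window` positions of it."""
    contexts: DefaultDict[str, Counter] = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[word][tokens[j]] += 1
    return contexts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


tokens = "i drink cold beer . i drink red wine . i drive a red car .".split()
ctx = window_contexts(tokens, window=2)
print(round(cosine(ctx["beer"], ctx["wine"]), 3))  # 0.75
print(round(cosine(ctx["beer"], ctx["car"]), 3))   # 0.289
```

Because the contexts are just neighbouring tokens, punctuation and function words count as context too, which is one reason window-based and dependency-based thesauri can rank neighbours differently.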
COMPUTATIONAL LEXICON
WordNet® is an online lexical reference system whose
design is inspired by current psycholinguistic theories of human lexical
memory. English nouns, verbs, adjectives and adverbs are organized into
synonym sets, each representing one underlying lexical concept.
Different relations link the synonym sets.
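The organization described above (synonym sets linked by relations) can be sketched as a small in-memory structure. The entries below are toy data loosely modeled on WordNet's naming style and are not real WordNet content. The `Synset` and `Lexicon` classes are hypothetical and are not the API of any WordNet library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """A set of synonyms expressing one underlying lexical concept."""
    name: str
    pos: str
    lemmas: List[str]
    # relation name (e.g. "hypernym") -> names of related synsets
    relations: Dict[str, List[str]] = field(default_factory=dict)


class Lexicon:
    def __init__(self, synsets: List[Synset]) -> None:
        self.by_name = {s.name: s for s in synsets}

    def synsets_for(self, word: str) -> List[Synset]:
        """All senses (synsets) in which `word` appears as a lemma."""
        return [s for s in self.by_name.values() if word in s.lemmas]

    def hypernym_chain(self, name: str) -> List[str]:
        """Follow the first hypernym link of each synset up to a root."""
        chain = [name]
        while self.by_name[chain[-1]].relations.get("hypernym"):
            chain.append(self.by_name[chain[-1]].relations["hypernym"][0])
        return chain


lex = Lexicon([  # toy entries, not actual WordNet data
    Synset("car.n.01", "n", ["car", "auto", "automobile"], {"hypernym": ["motor_vehicle.n.01"]}),
    Synset("motor_vehicle.n.01", "n", ["motor_vehicle"], {"hypernym": ["vehicle.n.01"]}),
    Synset("vehicle.n.01", "n", ["vehicle"]),
    Synset("car.n.02", "n", ["car", "railcar"], {"hypernym": ["vehicle.n.01"]}),
])
print([s.name for s in lex.synsets_for("car")])  # ['car.n.01', 'car.n.02']
print(lex.hypernym_chain("car.n.01"))  # ['car.n.01', 'motor_vehicle.n.01', 'vehicle.n.01']
```

An ambiguous word such as "car" belongs to several synsets, one per sense. Relations such as hypernymy link synsets, not individual words.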
SIMILARITY PACKAGE
This is a CPAN module that implements a variety of
semantic similarity measures that can be used in conjunction with
WordNet. In particular, it supports the measures of Resnik, Lin,
Jiang-Conrath, Leacock-Chodorow, Hirst-St.Onge, Wu-Palmer,
Banerjee-Pedersen, and Patwardhan-Pedersen.
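The CPAN module itself is written in Perl. As an illustration of one of the listed measures, here is a Python sketch of Wu-Palmer similarity over a toy is-a hierarchy: 2 * depth(LCS) / (depth(c1) + depth(c2)), where LCS is the lowest common subsumer of the two concepts. The hierarchy is hypothetical. This sketch assumes that the root has depth 1, and real implementations may use different depth conventions (for example, a virtual root node), so their scores can differ.

```python
from typing import Dict, List

# Toy is-a hierarchy (child -> parent); hypothetical, not WordNet data.
PARENT: Dict[str, str] = {
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}


def path_to_root(concept: str) -> List[str]:
    """The concept, its parent, grandparent, ..., up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Number of nodes from the root down to the concept (root has depth 1)."""
    return len(path_to_root(concept))


def lowest_common_subsumer(c1: str, c2: str) -> str:
    """Nearest ancestor shared by both concepts."""
    ancestors2 = set(path_to_root(c2))
    for node in path_to_root(c1):  # nearest ancestor first
        if node in ancestors2:
            return node
    raise ValueError("concepts share no common ancestor")


def wu_palmer(c1: str, c2: str) -> float:
    """Wu-Palmer: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    lcs = lowest_common_subsumer(c1, c2)
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


print(wu_palmer("dog", "cat"))  # 2*2/(3+3), about 0.667
print(wu_palmer("dog", "car"))  # 2*1/(3+3), about 0.333
```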