2.1.2 Knowledge Based Approaches
All of the methods
described up to now require considerable amount of work to create a
classifier for each entry in the lexicon. Because of this reason, they were
able to report results on very few lexical items (in general between 4 to 12
items). With the availability of large-scale lexical resources, such as
dictionaries, thesauri and corpora, work on WSD focused on large-scale
disambiguation. In the following subsections WSD based on machine readable
dictionaries, thesauri and computational lexicons will be presented.
2.1.2.1 Machine Readable Dictionaries
Machine readable
dictionaries (MRD) provide a ready-made information source of word senses.
The first attempt to use MRD’s came from Lesk (1986)[38]. He starts from the
simple idea that a word’s dictionary definitions are likely to be good
indicators of the senses they define. By using Oxford Advanced Learner’s
Dictionary (OALD), he counts overlapping content words in the sense
definitions of the ambiguous word and in the definitions of context words
occurring nearby and selects the sense that achieves the maximum number of
overlaps. The accuracy of the method is reported to be 50-70% which is a
good result regarding the fact that a fine set of sense distinctions have
been used such as OALD.
Cowie et al. (1992)[39]
tried to improve Lesk’s approach by optimizing the overlap of all words in a
single sentence simultaneously. However, it was found computationally very
expensive. Therefore, Cowie et al. (1992)[39] used simulated annealing
(Sampson 1986)[40] for the first time in natural language processing. They
evaluated this approach using a corpus consisting of 50 sentences extracted
from Longman Dictionary of Contemporary English (LDOCE). 47% of the words
were reported to be correctly disambiguated to sense level and 72% to more
rough grained senses.
Stevenson and Wilks
(2001)[41] computed the overlap by normalizing the contribution of a word to
the overlap count. Pedersen and Banerjee (2002)[42] described a different
version of the Lesk’s algorithm by employing glosses contained in WordNet
(Fellbaum, 1998; Miller et al., 1990)[19;43], a lexical database for English
which will be described later in detail.
Because of the fact that
dictionaries are created for human use, not for computers, there are some
inconsistencies (Veronis and Ide 1991; Ide and Veronis 1993a,
1993b)[44;45;46]. Although they provide detailed information at the lexical
level, they lack pragmatic information used for sense determination. For
instance, the relation between ash and tobacco, cigarette
or tray is very indirect in a dictionary whereas the word ash
co-occurs very frequently with these words in a corpus (Ide and Veronis,
1998)[47].
2.1.2.2 Thesauri
Thesauri provide
information about relationships among words. Thesaurus based
disambiguation makes use of the semantic categorization provided by a
thesaurus or a dictionary with subject categories. The most frequently
used thesaurus in WSD is Roget’s International Thesaurus (Roget, 1946)
which was put into machine-tractable form in 1950’s. The basic inference
in thesaurus-based disambiguation is that semantic categories of the
words in a context determine the semantic category of that context as a
whole. And this category is then determines the correct senses that are
used.
Walker (1987:254) proposed an algorithm as follows: each word
is assigned to one or more subject categories in the thesaurus. If the
word is assigned to several subjects, then it is assumed that they
correspond to different senses of the word. Black (1988: 187) applied
this approach to five different words and achieved accuracies around
50%.
Yarowsky (1992) proposed
an approach for marking words with their categories from Roget’s
thesaurus. The important aspect of the approach was that he used a
context of 50 words either side so that 100 words were considered in the
training examples for each ambiguous word. This method was tested on 12
ambiguous words and reported to achieve 92% accuracy. Yarowsky notes
that this method is best for extracting topical information, most
successful for nouns.
Similar to machine
readable dictionaries, a thesaurus is a resource for humans, so there is
not enough information about word relations. Therefore, Roget’s or any
other thesauri have not been used so extensively.
2.1.2.3 Computational Lexicons
The usefulness of lexical
relations in linguistic, psycholinguistic and computational research has
led to a number of efforts to create large electronic databases of such
relations. Beginning from the mid-1980’s, construction of semantic
lexicons by hand has emerged. Some examples of these lexicons are
WordNet (Fellbaum 1998; Miller et al. 1990), CyC (Lenat and Guha 1990),
ACQUILEX (Briscoe 1991), and COMPLEX (Grishman, Macleod, and Meyers
1994).
Since WordNet is the most
popular lexicon among the above and since it is used in the work done in
this thesis both for sense evaluation and for similarity measure, this
section presents detailed information about it.
WordNet
WordNet is an online
lexical reference system which was developed at
Princeton University under the direction of
Professor George A. Miller. It combines many features used for WSD
in one system. It includes definitions of word senses as in a
dictionary; it defines “synsets” of synonymous words representing a
single lexical concept like a thesaurus; and it includes word-to-word
relations.
WordNet consists of three
databases: noun database, verb database and one database for adjectives
and adverbs. Each database consists of lexical entries corresponding to
unique orthographic forms. Each form is associated with a set of senses.
|
Overview of noun car
The noun car has 5 senses (first 3 from tagged texts)
1. (598) car, auto, automobile, machine, motorcar -- (4-wheeled
motor vehicle; usually propelled
by an internal combustion
engine; "he needs a car to get to work")
2. (24) car, railcar, railway car, railroad car -- (a wheeled
vehicle adapted to the rails of railroad;
"three cars had
jumped the rails")
3. (1) cable car, car -- (a conveyance for passengers or freight
on a cable railway; "they took a
cable car to
the top of the mountain")
4. car, gondola -- (car suspended from an airship and carrying
personnel and cargo and power
plant)
5. car, elevator car -- (where passengers ride up and down; "the
car was on the top floor") |
Figure-1:
WordNet 2.0 entry for the noun “car”
WordNet’s sense entries
consist of a set of synonyms, a dictionary-style definition, or gloss, and
some example uses as shown in Figure-1. The numbers in front of some senses
are the frequency values obtained from SemCor corpus. This simple listing of
lexical entries looks like an ordinary dictionary. However, WordNet contains
a set of domain-independent lexical relations which hold among WordNet
entries, senses or sets of synonyms. Lexical relations are only available
within the same database. Figure-2, 3 and 4 show a subset of the relations
associated with each of the three databases.
|
Relation |
Definition |
Example |
|
Hypernym
Hyponym
Has-member
Member-of
Has-part
Part-of
Antonym |
From concepts to superordinates
From concepts to subtypes
From groups to their members
From members to their groups
From wholes to parts
From parts to wholes
Opposites |
breakfast → meal
meal → lunch
faculty → professor
pilot → crew
table → leg
course → meal
leader → follower |
Figure-2:
Noun relations in WordNet
|
Relation |
Definition |
Example |
|
Hypernym
Troponym
Entails
Antonym |
From events to superordinate events
From events to their subtypes
From events to the events they entail
Opposites |
fly → travel
walk → stroll
snore → sleep
increase → decrease |
Figure-3:
Verb relations in WordNet
|
Relation |
Definition |
Example |
|
Antonym (adjective)
Antonym (adverb) |
Opposite
Opposite |
heavy → light
quickly → slowly |
Figure-4:
Adjective and adverb
relations in WordNet
Synonym and hyponym
relations are the most useful relations presented in WordNet. Synonym means
different lexemes with the same meaning. Two WordNet entries are considered
to be synonyms if they can be successfully substituted in some context. A
synset is a set of synonyms. Consider the following example of a synset:
{ car, auto, automobile, machine, motorcar }
The gloss of this synset
describes it as “4-wheeled motor vehicle; usually propelled by an internal
combustion engine; "he needs a car to get to work" as shown in Figure-1 for
the first sense of noun car. Instead of representing concepts using
logical terms, WordNet represents them as synsets.
Each synset is related to
its immediately more general and more specific synsets via direct hypernym
and hyponym relations. To find chains of more general or more specific
synsets, one can follow a transitive chain of hypernym and hyponym
relations. As an example, the hypernym chain for the noun valley is
shown in Figure-5. In this figure, successively more general synsets are
shown on successive indented lines. The chain starts from valley and
ends in entity.
|
valley, vale
=> natural depression, depression
=> geological formation, formation
=> natural object
=> object, physical object
=>entity |
Figure-5:
Hyponymy chain for the noun
“valley”
Since WordNet provides the
broadest set of lexical information in one resource, it is used
widespreadly. Another reason for its wide-spread use is, it is the first
broad-coverage lexical resource that is freely available.
The earliest attempts to use
WordNet in WSD were in information retrieval field. Voorhees (1993) and
Richardson and Smeaton (1994) created knowledge bases using WordNet’s
hierarchy. Li et al. (1995) proposed a WordNet-based algorithm for WSD.
Disambiguation was done by semantic similarity between words and heuristic
rules. Heuristic rules were based on the semantic similarity and the WordNet
hierarchy. Leacock et al. (1998) used WordNet to counter data sparseness
problem. Hawkins (1999) built up a WSD system that works with frequency and
contextual information that is based on WordNet. Fellbaum et al. (2001)
proposed a system that made use of syntactic clustering and semantic
distinctions extracted from WordNet.
As seen, there has been
various work regarding WordNet. But WordNet is mostly used to determine
semantic similarity between senses. Resnik (1995a) computed information
content of words which is a measure of the specificity of the concept that
subsumes the words in the WordNet hypernym hierarchy. Agirre and Rigau
(1996) employ WordNet to determine the conceptual distance among concepts
whereas Mihalcea and Moldovan (1999) exploit semantic density and WordNet
glosses in an all words design. Lin (1997; 1998; 2000) described a semantic
similarity measure where similarity between two objects is defined to be the
amount of information contained in the commonality between the objects
divided by the amount of information in the description of the objects. We
used the same measure for the work conducted in this thesis. Detailed
information is given in Section 3.2.3.
Other approaches using
WordNet are Jiang (1997), Agirre and Martinez (2000), Agirre et al. (2001),
Hayres (2001) and Banerjee et al. (2003). A combination of MRDs and WordNet
has also been tried with success (Litkowski, 2000, 2001; Mihalcea and
Moldovan, 1998).
2.2 Information Sources for WSD
There are various
information sources or feature types used in WSD regardless of the type of
the approach. To disambiguate a word, a diversity of information, including
syntactic tags, word frequencies, collocations, semantic context,
role-related expectations, and syntactic restrictions can be considered.
In Agirre and Martinez
(2001), a comparison of WSD systems has been made based on the information
source they used. Some of these sources are as follows:
Frequency of senses:
Frequency
information is used to measure the likelihood of each possible sense
appearing in the text. Therefore this information is generally used in
statistical approaches and it is generally learned from hand-tagged data
such as SemCor corpus. WordNet senses are ordered according to the
frequencies of the senses in the SemCor corpus.
Part of speech (POS):
Part of speech tagging is regarded as the first step of the disambiguation
process if the tokens have the same orthographic forms but different
syntactic classes. It is necessary since it reduces the number of possible
senses a word can belong to. An orthographic form may even be unambiguous in
one syntactic class whereas it has more than one sense in another. For
instance, in WordNet 2.0 handle has 5 senses as a verb, but only one
sense as a noun. The impact of knowledge resources on WSD in examined in
Gaustad (2004). The results show that accurate POS information is beneficial
for WSD and that including the POS of the ambiguous word itself as well as
POS of the context increases the disambiguation accuracy.
Morphology:
It is defined specially as the relation between derived words and their
roots. For instance, the noun agreement has 6 senses, its verbal root
agree 7. A stemmer tries to reduce various forms of a word to a
single stem. Since English is a language with little inflectional
morphology, it is not certain that using morphology will lead to significant
improvements in WSD. With other languages, such as German or Italian,
morphology is of greater influence. Derivational morphology would be useful
only for disambiguating roots.
Collocations:
Collocation is the relationship among any group of words that tend to
co-occur in a predictable configuration. Disambiguation relies heavily on
collocational information. For example, the noun match has 9 senses.
However, it has only one possible sense in “football match”. It is
observed that collocations are strong indicators if they are learned from
hand-tagged corpora. Although they are strong, they should be used with
other sources. They should not be treated as rules for sense-filtering alone
(as in Hirst 1987 or Dahlgren, McDowell, and Stabler 1989).
Semantic word
associations:
These can be classified as follows:
i.
Taxonomial
organization:
This refers to the classification of words in a hierarchy and the
lexical-semantic relationships holding between words such as a dog is
a kind of animal. This kind of information can be extracted from
ontologies like WordNet.
ii.
Situation
and Topic:
Information about the situation or topic enables a WSD system to see the
ambiguous word in a broader context. For example, if the word mouse
is used in an office situation and the topic is computer use, the most
probable sense of the word mouse will be “computer tool”, not
“animal”. Semantic word associations around topic and situation are powerful
when learned from hand-tagged corpora. Associations learned from MRDs can
also be useful.
iii.
Argument-head relations:
These relations provide important clues for disambiguation such as the
relationship between dog and bite in the sentence “the dog
bite the man.” Yarowsky (1993) showed that collocation information can be
captured using argument-head relations. Argument-head relations work as a
strong indicator if they are given as a sense-to-word relation.
Syntactic structure
(subcategorization information):
Subcategorization and dependency relations refer to certain
kinds of relations between words and phrases. For example the verb want
can be followed by an infinitive, as in “I want to fly to Istanbul”,
or a noun phrase, as in “I want a flight to Istanbul”. But the verb
find cannot be followed by an infinitive. Verbs have several possible
patterns of arguments. A particular set of arguments that a verb can appear
with is referred to a subcategorization frame. Subcategorization frames
capture syntactic regularities about complements.
Gaustad (2004) used
dependency relations in his WSD system for Dutch and showed that syntactic
information provides very useful disambiguation clues and leads to a
significant increase in performance. For the work presented in this thesis,
we also used syntactic information and dependency relations, so detailed
information about this kind of knowledge source will be given in the
following sections.
Agirre and Martinez (2001)
made a comparison between the contributions of the above resources to WSD.
According to their observations, if learned from hand-tagged corpora,
collocations and semantic word associations are the most important knowledge
types for WSD, but they also mentioned that syntactic cues are equally
reliable. On the other hand, taxonomical information was found to be very
weak.
2.3 General Approach and Related Work
As can be seen from the
previous sections, most of the study in WSD is concentrated so far on
supervised learning since these algorithms usually achieve the best
performance. This fact is shown by the results of SENSEVAL-2 (Cotton et al.,
1998). However these systems have the drawback that they are only applicable
to the words for which sense tagged data is available, and they are strongly
dependent to the labeled training data available. In addition, almost all of
these systems make use of the picking the first sense heuristic.
Supervised systems and first
sense heuristic has come to a natural bound of approximately 70% precision
(Mihalcea, 2002). In order to improve this value, benefits of
knowledge-based or unsupervised alternatives should be examined.
In this thesis, we present a
disambiguation algorithm that relies on the information gained from
syntactic context and sense similarity. The WSD approach used in the system
cannot be specified as unsupervised since it uses a lexicon, namely WordNet,
for sense disambiguation and for semantic similarity metric; however, it may
also be regarded as unsupervised in the sense that no sense-tagged data is
needed for training. All details about the system will be given in
Chapter-3. In this section, a number of WSD systems that have employed
syntactic context in the past are introduced.
The algorithm used in this
thesis is very similar to the algorithm presented in Lin (1997). Lin
mentions the same problems that supervised algorithms face with: need for a
large sense-tagged training set and data sparseness. In order not to deal
with these problems, Lin proposes an algorithm adopting the intuition that
“two different words are likely to have similar meanings if they occur in
identical local contexts”. This means that the same knowledge sources are
used for all words and that instead of building separate classifiers for
each word, past usages of other words are used to disambiguate the current
word.
Using a broad-coverage
parser, a 25-million-word Wall Street Journal corpus was parsed and
dependency relationships were extracted. Dependency relationships were
stored as dependency triplets which were composed of the type of the
dependency relation, the word related to the ambiguous word via the
dependency relation and information on whether the word is the head or the
modifier in the relationship. A local context database was constructed for
each of these triplets (local contexts). An entry in the database consisted
of twenty words found from the training corpus that were most likely to
occur in that particular local context.
In the disambiguation
process, input is parsed and local contexts are extracted. Local context
database is searched to find the words that occurred in the same local
context as the ambiguous word. Based on a similarity maximization algorithm
between the words, the polysemous word is disambiguated. The algorithm used
in the similarity maximization part of both Lin’s work and our work is very
similar to the algorithm presented in (Resnik, 1995a).
Since Wall Street Journal is
mostly business news, Lin only used the “press reportage” of SemCor corpus
for testing. An accuracy of 56.1% is reported for nouns.
Li, Szpakowicz, and Matwin
(1995), presents an algorithm for automatic word sense disambiguation, based
on lexical knowledge contained in WordNet and on the results of
surface-syntactic analysis. The algorithm was designed to support text
analysis with minimal precoded knowledge. They proposed a set of heuristic
rules that are based on the idea that objects of the same or similar verbs
are similar. Disambiguation is done by computing the semantic similarity
between words and applying the heuristic rules based on this similarity.
Stetina et al. (1998)
achieve good results with syntactic relations as features compared to the
frequency baseline. They use a measure of semantic distance based on WordNet
to find similar features. The features are extracted using a statistical
parser (Collins, 1996), and consist of the head and modifiers of each
phrase.
Martinez et al. (2002)
discuss the contribution of various syntactic features to WSD: grammatical
relations coded as the presence of adjuncts/arguments in isolation or as
subcategorization frames, and instantiated grammatical relations between
words.
The set of syntactic
features was extracted using Minipar (Lin, 1993). Two types of syntactic
relations were taken into account: instantiated grammatical relations (IGR)
and grammatical relations (GR). IGR are triples of [wordsense relation
value] where the value is either the word form or the lemma. For GR, bigrams
[wordsense relation] and n-grams [wordsense relation1 relation2...] are
stored. Three types of n-grams are defined: all subcategorization
information including part of speech given by Minipar, subcategorization
information occurring in the sentence, and all dependencies in the parse
tree. Their results on the Senseval-2 lexical sample data for English
indicate that both types of syntactic relations, IGR and GR, together
provide the best F-score (IGR getting better precision than coverage and GR
lower precision, but higher coverage). They also came to the conclusion that
syntactic features effectively improve WSD precision.
Gaustad (2004) also
investigated whether structural syntactic information is helpful for lexical
semantic disambiguation or not. He used the Alpino dependency parser for
Dutch in order to extract the dependency relations. He reported that using
dependency relations in conjunction with the lemma of the ambiguous word and
its part of speech as features performs better than the model including only
immediately left and right context.
Chapter 3
Word Sense
Disambiguation Based on Sense
Similarity
and Syntactic Context
3.1 Introduction
Most corpus-based WSD
algorithms make use of the previous usages of a polysemous word in order to
disambiguate that word. These algorithms basically rely on the intuition
that “two occurrences of the same word have identical meanings
if they have similar local contexts”. However, this intuition can be
misleading. In order for this method to have the desired consequences; a
polysemous word must be observed several times before making a good
classifier. Knowing that there are nearly 15000 polysemous nouns in WordNet
2.0, the corpus must contain millions of words. If the corpus fails to
contain a polysemous word or all the senses of a word, the algorithm can
lead to false results.
In this thesis, we make
use of a WSD algorithm that relies on the intuition that “two different
words are likely to have similar meanings if they occur in
identical local contexts”. By this approach, a generalization can be
made since it allows using the knowledge learned for one word on other
words; instead of using a separate classifier for each polysemous word.
Hence, the size of the training corpus can be much smaller. Moreover, this
method is able to disambiguate infrequent or unseen words in the training
corpus.
The basic steps of the
algorithm are as follows:
i.
Parse the
training corpus and build up a local context database by extracting local
contexts according to some rules.
ii.
Similarly,
parse the input text and extract local contexts of each ambiguous word.
iii.
For each
ambiguous word w, search the local context database and find words
that are in an identical local context as w. These words are the
similar words set S of w which will be used in sense
selection.
iv.
Select a sense
of w that maximizes the similarity between w and its similar
words set S that was found in the previous step.
Consider the sentence
below:
Jane applied for a job.
The noun “job” has
15 possible meanings in WordNet 2.0. To disambiguate the noun “job”,
we consider the other words that appeared in an identical context as
“job”. According to our system, the local context of “job” is “N:prep-for:apply:head:V”,
that is there is a relation between the verb “apply” and the noun “job”
via the preposition “for”. In Table-1, the words that occurred in the
local context “N:prep-for:apply:head:V” are presented. The numbers in
table indicate the likelihood ratios of the words in that local context. The
meaning of “job” in the example can be determined by choosing one of its 15
senses that is most similar to the meanings of the words in Table-1.
|
N:prep-for:apply:head:V |
|
license 419.46
membership 324.86
job 299.67
approval 241.81
patent 233.65
permission 207.59
amnesty 198.54
permit 166.21
listing 141.75
loan 141.00 |
citizenship 100.85
asylum 92.37
visa 84.21
status 72.32
grant 69.68
exemption 59.04
legalization 58.55
noun.person 54.34
refund 52.63
residence 50.20 |
Table-1:
Words that have a relation with “apply” via “for”
This chapter is structured
as follows: Firstly we present the resources needed for the algorithm. In
section 3.3 and section 3.4, the concepts of local context and local context
database are explained respectively. Later, the disambiguation algorithm is
presented in section 3.5. Finally, we conclude this chapter with the results
obtained from the experiments.
3.2 Resources of the
Algorithm
In order to implement
this algorithm the following resources are needed:
i.
an untagged training corpus,
ii.
a concept hierarchy,
iii.
a similarity measure,
iv. a
broad-coverage parser.
In this section, all of
the above resources are presented one by one.
3.2.1 Training Corpus
For the training corpus,
we used a 100-million-word Wall Street Journal corpus of the periods
presented below:
Source Period
Original Publication(s)
---------------------------------------------------------------
WSJ 1987-9
ACL/DCI and CSR WSJ0
WSJ 1990-2
TIPSTER Vol. 2, version 2
WSJ 1992-4
CSR LM-1 corpus
The WSJ 1992 collection on
TIPSTER covers only the first three months of the year. Data from the
remaining months of 1992 were taken from the LDC, along with all of 1993 and
the first three months of 1994.
3.2.2 Concept Hierarchy (WORDNET 2.0)
As the concept
hierarchy WordNet 2.0 is used. Information about the general WordNet
structure is given in section 2.1.2.3.
WordNet 2.0 is available for Unix and Windows platforms. WordNet 2.0
includes more than 42,000 new links between nouns and verbs that are
morphologically related, a topical organization for many areas that
classifies synsets by category, region, or usage, gloss and synset fixes,
and new terminology.
3.2.2 Concept Hierarchy (WORDNET 2.0)
As the concept
hierarchy WordNet 2.0 is used. Information about the general WordNet
structure is given in section 2.1.2.3.
WordNet 2.0 is available for Unix and Windows platforms. WordNet 2.0
includes more than 42,000 new links between nouns and verbs that are
morphologically related, a topical organization for many areas that
classifies synsets by category, region, or usage, gloss and synset fixes,
and new terminology.
3.2.3 Similarity Measure
In order to find the
similarity between the senses of a polysemous word and another word we need
a similarity measure. There is a strong relationship between a word’s
semantics (or meaning) and its syntactic behavior. As discussed in Weeds
(2002), if two words mean similar things then they will occur in documents
in similar contexts. Accordingly, we can attempt to calculate the similarity
between all senses of two words by finding all of the contexts that each
appears in. If a description of a word is made up of the contexts it appears
in and their associated frequencies then we can mathematically measure the
similarity between these descriptions. For example, the word dog may
occur as the subject of the word eat x number of times - this will
increase the similarity between dog and other words which also occur
as the subject of the word eat e.g. cat. Many measures have
been proposed to calculate the similarity between the descriptions of words;
we used WordNet Similarity Package v.0.15 which implements the semantic
relatedness measures described by Leacock & Chodorow (1998), Jiang & Conrath
(1997), Resnik (1995), Lin (1998), Hirst & St-Onge (1998), Wu & Palmer
(1994), the adapted gloss overlap measure by Banerjee and Pedersen (2002),
and two measures based on context vectors by Patwardhan (2003). This package
is able to calculate the similarity between either two words or two senses
(or synsets). We use it to find the similarity between two senses.
We used Lin’s measure
(1998) in the experiments conducted for this thesis.
Lin’s Measure
This measure is derived
from the following assumptions (Lin, 1998):
i.
The commonality between the words A and B is measured
by I(common(A,B)) where common(A,B) is a proposition that
states the commonalities between A and B and I(s) is
the amount of information contained in the proposition s.
ii.
The differences between the words A and B is measured
by
I(describe(A,B)) – I(common(A,B))
where
describe(A,B) is a proposition that describes what A and B
are.
iii.
The similarity between A and B, sim(A,B), is a
function of their commonality and differences. That is,
sim(A,B)
= f ( I(common(A,B)) , I(describe(A,B)))
where
the domain of function f(x,y) is { (x,y) | x ≥ 0 , y > 0 , y ≥ x }
.
iv.
Similarity is independent of the unit used in the information
measure.
According to the Information Theory (Cover and Thomas, 1991), I(s) = -logbP(s),
where P(s) is the probability of s, and b is
the unit. When b=2, I(s) is the number of bits
needed to encode s. Since logbx =
,
assumption (iv) means that the function f must
satisfy the following condition:

v.
Similarity is additive with respect to commonality.
If
common(A,B) consists of two independent parts, then the sim(A,B)
is the sum of the similarities computed when each part of the
commonality is considered. In other words: f(x1 +
x2, y) = f(x1,y) + f(x2,y). Therefore,
,
which means that when there is no commonality
between A and B, their similarity is 0, no matter
how different they are.
vi.
The similarity between a pair of identical objects is 1.
When A and B
are identical, knowing their commonalities means knowing what they are,
i.e., I(common(A,B)) = I(describe(A,B)). Therefore, the function f
must have the following property:

vii.
The function f(x,y) is continuous.
Similarity Theorem:
The similarity between A and B is measured by the ratio
between the amount of information needed to state the commonality of A
and B and the information needed to fully describe what A and
B are.
sim(A,B)

Proof:
To prove the theorem, we need to show f(x,y) =
.
Since f(x,y) = f(
,y)
(due to assumption (iv)), we only need to show that when
is
a rational number, f(x,y) =
.The
result can be generalized to all real numbers because f is continuous
and for any real number, there are rational numbers that are infinitely
close to it.
Suppose m and
n are positive integers.
f(nx,y) = f((n-1)x,y) + f(x,y) = n
f(x,y) (due to assumption (v))
Thus,

Substituting
for
x in this equation:

Since
is
rational, there exist m and n such that
=
.
Therefore,


For example, Figure-6
is a fragment of the WordNet. The nodes are synsets. The links represent
hypernym relationships. For every synset C, logP(C) can be
defined as the probability that a randomly selected noun refers to an
instance of C. The probabilities are estimated by the frequency of
synsets in SemCor.
If x is a
hill and y is a coast, the commonality between x
and y is that “x is a geological-formation and y is a
geological-formation”. The information contained in this statement is
-2xlogP(geological-formation). The similarity between the synsets
hill and coast is:

Generally speaking,
sim(A,B) =


where
is
the probability that an object belongs to all the maximally specific super
classes (Ci s) of both A and B.


Figure-6: A
fragment of WordNet
3.2.4 Broad-coverage Parser (MINIPAR)
Parsing means taking an input and producing some sort of structure for it. A
parser for a natural language such as English, Turkish, etc. is
a program that constructs derivations or phrase structure trees for
the sentences of that language. It supplies a correct grammatical
analysis for a given sentence by separating it into its constituents
and labeling each of them. It may also offer additional information,
such as the semantic class (e.g., Person, Physical Object) of each
word and the functional class (e.g., Subject, Direct Object) of each
constituent of the sentence.
We chose to use MINIPAR
(Lin, 2001) in order to extract the local contexts from the text
corpus. MINIPAR is a principle-based, broad-coverage parser for
English (Lin, 2001). It represents its grammar as a network of nodes
and links, where the nodes represent grammatical categories and the
links represent types of dependency relationships. In the Table-2
and Table-3 the lists of grammatical categories and dependency
relationships are presented respectively.
GRAMMATICAL CATEGORIES


Det
Determiners
PreDet
Pre-determiners
PostDet
Post-determiners
NUM numbers
C
Clauses
I
Inflectional Phrases
V Verb
and Verb Phrases
N Noun
and Noun Phrases
NN
noun-noun modifiers
P
Preposition and Preposition Phrases
PpSpec Specifiers
of Preposition Phrases
A
Adjective/Adverbs
Have
have
Aux
Auxiliary verbs
Be
Different forms of “be”
COMP
Complementizer
VBE be used
as a linking verb
V_N verbs
with one argument , i.e., intransitive verbs
V_N_N verbs with
two arguments, i.e., transitive verbs
V_N_I verbs
taking small clause as complement
SentAdjunct Sentence
adjunct
U
undefined
Table-2 :
Grammatical Categories of MINIPAR
GRAMMATICAL RELATIONSHIPS


appo
"ACME president,
--appo → P.W. Buckman"
aux
"should ←
aux-- resign"
be
"is ← be--
sleeping"
c
"that ←
c-- John loves Mary"
comp1
first complement
det
"the ← det
-- hat"
gen
"Jane's ←
gen -- uncle"
have
"have ←
have-- disappeared"
i
relationship btw. a C clause and its I clause
inv-aux
inverted
auxiliary: "Will ← inv-aux-- you stop it?
inv-be
inverted be: "Is
← inv-be-- she sleeping"
inv-have
inverted have: "Have
← inv-have-- you slept"
mod
relationship btw.
a word and its adjunct modifier
pnmod
post nominal
modifier
p-spec
specifier of
prepositional phrases
pcomp-c
clausal complement
of prepositions
pcomp-n
nominal complement of
prepositions
post
post determiner
pre
pre determiner
pred
predicate of a
clause
rel
relative
clause
vrel
passive verb
modifier of nouns
wha, whn,
whp wh-elements at C-spec
positions
obj
object of
verbs
obj2
second object of
ditransitive verbs
subj
subject of
verbs
s
surface
subject
Table-3 :
Grammatical Relationships of MINIPAR
MINIPAR achieves about 88%
precision and 80% recall with respect to dependency relationships (Lin,
1998a), evaluated on the SUSANNE corpus (Sampson, 1995), a subset of the
Brown Corpus of American English. In (Alam, 2002), they also performed a
test on 584 sentences. Instead of checking the parse trees, they checked
the frames corresponding to the sentences. They counted all the errors
caused by MINIPAR and its accuracy was found 77.6% in terms of correctly
parsed sentences.
As reported in (Alam,
2002), the majority of MINIPAR errors fall in the following categories:
1. Tagging errors: some
nouns are mistagged as verbs.
2. Attachment errors: some
prepositional phrases (PP) that should be attached to their immediate
preceding nouns are attached to the verbs.
3. Missing lexical
entries: some domain specific words such as download and their
usages are not in the MINIPAR lexicon. This introduces parsing errors
because such words are tagged as nouns by default.
4. Inability to handle
ungrammatical sentences: in a real world application, it is unrealistic
to expect the user to enter only grammatical sentences. Although MINIPAR
still produces a syntactic tree for an ungrammatical sentence, the tree
is ill formed and cannot be used to extract the semantic information
being expressed.
3.3 Local Context
Context is the only
source of information we use to identify the meaning of a polysemous
word. A great deal of work done in WSD uses local context as the primary
information source. However, every system has its own definition of a
local context. It is generally considered as a small window of words
surrounding a target word in a text. In these cases, the local context
is generally regarded as all words or characters falling within some
window without considering any features or any kinds of relations. Below
are some features that are taken into consideration in local context
selection:
Distance:
Examining a context
of a few words around the target word to disambiguate is the mostly used
method in WSD. In this method, there is always a question concerning the
optimal window size N.
Yarowsky (1993, 1994a
and b) examines different windows of local context, including
1-contexts, k-contexts, and words pairs at offsets -1 and -2, -1
and + 1, and +1 and +2, and sorts them using a log-likelihood ratio to
find the most reliable evidence for disambiguation. He makes the
observation that the optimal value of k varies with the kind of
ambiguity: he suggests that local ambiguities need only a window of k
= 3 or 4, while semantic or topic-based ambiguities require a larger
window of 20-50 words. No single best measure is reported, suggesting
that for different ambiguous words, different distance relations are
more efficient.
Collocation:
The term
“collocation” has been used in various senses in WSD work. Yarowsky
(1993) adapts the definition of collocation as “the co-occurrence of two
words in some defined relation.” He determined that in cases of binary
ambiguity, there exists “one sense per collocation,” that is, in a given
collocation a word is used with only one sense with 90-99% probability.
Syntactic relations:
In most
WSD work to date, syntactic information is used. Selectional
restrictions rely on parsing, frames, semantic networks, the application
of selectional preferences, etc.
In this thesis, the
local context of a word is defined in terms of the syntactic
dependencies between the word and the other words in the same sentence.
Dependency grammars generate parses where words in a sentence are
related directly to the word which is its syntactic head. As mentioned
before, we used MINIPAR in order to generate a parse including
dependency relationships between words for a sentence. Some examples of
dependencies generated by MINIPAR can be found in Figure-7.


Figure-7
In Figure-7, the labels
on the arrows indicate the grammatical relationships generated by
MINIPAR which are shown in detail in Table-2. The local context of a
word W1 is defined as a tuple as follows:
POS1 : relation : W2 : position : POS2
where POS1
is the part of speech of word W1; relation
is the grammatical relationship between W1
and W2; W2 is the word related to
W1 via the grammatical relationship; position is
the indicator of W2 whether it is the syntactic
head or the syntactic modifier(mod) in the grammatical
relation and POS2 is the part of speech of word W2.
Part of speech tagging is also done by MINIPAR and the possible part of
speech tags are presented in Table-1.
The following local
contexts are generated for the first sentence (dependency diagram) in
Figure-1:
word local contexts
buy ( V : aux : will : mod : Aux ) (V : subj : I : mod : N)
(V : obj : car : mod : N )
will ( Aux : aux : buy : head : V )
I ( N : subj : buy : head : V )
car ( N : obj : buy : head : V ) ( N : mod : red : mod : A
) ( N : det : a : mod : Det)
red ( A : mod : car : head : N )
a ( Det : det : car : head : N )
3.4 Local Context
Database
In order to construct a
local context database, initially, we have parsed the 100-million-word
training corpus with MINIPAR and extracted local contexts. Later, we
have introduced two types of modifications: function words are removed
and they are bridged with semantic relations. Below, the general rules
for excluding function words from the database are presented:
1.
If one of the words in the dependency relationship is an auxiliary verb,
that is if POS1 or POS2 is equal to
“Aux” and also the relationship is “aux”, then we delete
that dependency relationship from the database. Auxiliary verbs (will,
shall, should, can, etc.) do not provide us any information about the
meaning of a word.
2.
If one of the words in the dependency relationship is a sentence
adjunct, that is if POS1 or POS2 is
equal to “SentAdjunct”, then we delete that dependency
relationship from the database. Sentence adjuncts are always optional
because they do not depend on the verb, and they may occur initially or
finally. For example, “When you receive that card, you should
know that you are not being invited to join the party.”
3.
A complementizer is a conjunction which marks a complement clause. If
one of the words in the dependency relationship is a complementizer,
that is if POS1 or POS2 is equal to
“COMP”, then we delete that dependency relationship from the
database. An example for a complementizer is: “I know
that he is here.”
4.
If the dependency relation is “punc” or if POS1
or POS2 is equal to “U” or “have”, then
these dependency relationships are deleted from the database.
5.
Contractions (`m, `s, `re, etc.) are deleted from the database.
6.
Some words such as “one” are used very frequently in
English. Therefore, since the information gained from these kinds of
words does not contribute to the disambiguation process, dependency
relationships including these words are deleted from the database.
7.
Determiners are
deleted from the database, i.e. a, an, the.
In addition to the
dependency relationships produced by the parser, we created new
dependency relationships by bridging function words. The rules about how
to create these will be presented on example sentences. Consider the
five sentences and related parser outputs below:
(1) I went to the cinema.
go
V:mod:to:mod:Prep
to
Prep:mod:go:head:V
to
Prep:pcomp-n:cinema:mod:N
cinema
N:pcomp-n:to:head:Prep
(2) I am sitting at a table.
sit
V:mod:at:mod:Prep
at
Prep:mod:sit:head:V
at
Prep:pcomp-n:table:mod:N
table
N:pcomp-n:at:head:Prep
(3) The gardener cut the grass.
cut
V:subj:gardener:mod:N
gardener
N:subj:cut:head:V
cut
V:obj:grass:mod:N
grass
N:obj:cut:head:V
(4) The thief who stole her bag is caught.
thief
N:rel:fin:mod:C
fin
C:rel:thief:head:N
fin
C:whn:who:mod:N
who
N:whn:fin:head:C
fin C:i:steal:mod:V
steal V:i:fin:head:C
steal
V:subj:who:mod:N
who
N:subj:steal:head:V
(5) His car is very fast.
be
VBE:pred:fast:mod:A
fast
A:pred:be:head:VBE
fast
A:subj:car:mod:N
car
N:subj:fast:head:A
Rule-1:
In the sentences (1) and
(2), consider the relations between the words go-to-cinema and
sit-at-table. The parser produces the relationship between a word
and its adjunct modifier for word pairs go-to and sit-at
where the prepositions “to” and “at” are the adjunct modifiers for the
verbs “go” and “sit”; and a nominal complement of preposition relations
for to-cinema and at-table word pairs where “cinema” and
“table” are nominal complements for prepositions “to” and “at”. But it
does not produce any relations for go-cinema and sit-table
word pairs since there is no syntactic dependency in these pairs.
However there is a semantic dependency in each of these pairs. So, as a
rule, we introduce new relations for every nominal complement of
prepositions, and the relation is different for each preposition such as
prep-to, prep-at, prep-with, etc. In this case
below tuples are produced:
go V:prep-to:cinema:modifier:N
sit V:prep-at:table:modifier:N
cinema N:prep-to:go:head:V
table N:prep-at:sit:head:V
Rule-2:
For the sentence (3), the
parser produces a subject relation for gardener-cut pair where
“gardener” is the subject and an object relation for cut-grass
pair where “grass” is the object. There is no syntactic relation in the
gardener-grass pair. For these kind of subject and object pairs,
as a rule, we introduce the new “s-ob” relation resulting in tuples:
gardener
N:s-ob:grass:modifier:N
grass
N:s-ob:gardener:head:N
Rule-3:
In the sentence (4), there
exists a relative pronoun “who”, which is the syntactic subject of the
verb “steal”. But, this relative pronoun does not give us any semantic
information. So, as a rule, for the subordinate clauses constructed from
relative pronouns (who, which, that), we assign the main clause’s
subject to be the subject of the subordinate clause’s verb, like in the
example pair thief-steal:
steal V:subj:thief:modifier:N
thief
N:subj:steal:head:V
Rule-4:
From the sentence (5), we
get a subject relation for the pair car-fast where “car” is the
subject. This is syntactically correct but it would be better to give a
different name to the subject relation that is formed by the verb “be”.
Therefore, a new relation is introduced with the name “subj-be” for such
cases:
fast
A:subj-be:car:modifier:N
car
N:subj-be:fast:head:A
Rules that are explained
above are applied to the output of the parser to obtain the local
context tuples for the local context database. An entry in the local
context database is composed of one local context and a set of 20 words
which have top-20 likelihood ratios with that local context. In
Figure-8, different database entries are presented for different local
contexts.
|
N:subj:employ:head:V |
|
plant 947.30
company 380.15
unit 148.10
facility 129.88
operation 126.82
factory 125.74
division 121.38
firm 81.04
business 73.01
mill 70.60 |
noun.person 51.79
venture 41.58
industry 41.37
kitchen 40.80
noun.possession 27.45
enterprise 26.42
manufacturer 26.20
he 23.67
arrowhead 19.91
digital 19.65 |
Figure-8
For the local context
“N:subj:employ:head:V”, the subjects of employ that have the highest
20 likelihood ratios are presented. So, “plant” is the most
probable noun that is supposed to occur as the subject of the verb
“employ” according to our local context database. The most probable
verb to occur in a subject relationship with the noun “student”
is “learn”. We can infer from the database that the thing that is
most applied for is “license” and the noun “girl” is
mostly modified by a quantifier.
There are some expressions
as “noun.quantity” or “noun.person” in the local context
database. These expressions are obtained by exposing the output of the
parser to a named entity recognition process. “noun.person”
includes all proper names recognized as a person, “noun.group”
includes all proper names recognized as organization or corporation
names, “noun.location” are the proper names used for addresses
and geological forms, “noun.time” is the proper name for dates,
“noun.possession” is the proper name regarding money and
“noun.quantity” is used for numbers.
Likelihood Ratio
The likelihood ratio of a
local context and a word is obtained by treating the word and the local
context in a tuple as a bigram and applying the log-likelihood formula
presented in Dunning 1993. The log-likelihood is formulated as follows:

where

also
,
,
and
.
3.5 Disambiguation Algorithm
After explaining the
concept of local context and local context database, the steps of the
algorithm is presented in this section in detail.
Step-1:
Input text is parsed and local contexts of each word are extracted. Let
LCw denote the set of local contexts of word w.
(There can be more than one local context for a word.)
For each word
ambiguous word w in the input text:
Step-2:
Search the local context
database for every local context lc in LCw in
order to find the words that appeared in an identical local
context as w. Let these words be denoted as Selectorsw
:
Selectorsw
= 
where lc
is a local context and C(lc) is a set of
(word
likelihood) pairs
as shown in Figure-8.
Step-3:
Select a sense s of w that maximizes the similarity
between w and Selectorsw.
For the similarity maximization process, suppose that w = W0
,
selectors of W0 are Selectorsw = {W1,W2,…..Wk},
and
the word Wi has ni senses which are
denoted as {
…..,
}.
Step-3.1:
Construct a similarity matrix which is divided into (k+1)
(k+1)
blocks as shown in Figure-9. (One block for each word {W0
, W1 ,W2,…..Wk} .)
This matrix is used to
keep the similarity values between every sense of each word. For
example, suppose that the word W2 has 4 senses and the
word W7 has 8 senses. Then the block B27
will be a 4
8
matrix consisting of the similarity values between every sense
combination of W2 and W7.
The blocks on the
diagonals will be all consisting of 0s. Similarity values lower than 0.2
are considered as noise and are ignored.


where sim(sil
, sjm ) is the similarity value found by the WordNet
Similarity Package using Lin’s measure.
Step-3.2: Let A
be the set of polysemous words in {W0 , W1 ,W2,…..Wk}
:
A = { Wi | ni > 1 , 0 < i ≤ k}
Step-3.3: Find a sense of words in A that gets the highest total support from other
words. That is, find a sense which is most similar to every word. Call
this sense
:
0<i≤k
where sil
is one of the senses of word Wi ( l th
sense of Wi where l
[1,ni
] ) that is in the set A (Wi
A)
and support(sil,Wj) is defined as the
support sil gets from Wj:

Step-3.4: The
word
is
disambiguated and its sense is selected as
.
is
removed from the set of ambiguous words set A.

Step-3.5: Since
is
disambiguated, the similarity values between other senses of
and
senses of other words are removed.
For all l, j, m,
such that l
[1,
]
and
and
and
[1,nj]:

Step-3.6: If the disambiguated word is W0 , that is imax
= 0 , stop. Otherwise repeat from Step3.3.
|
|
senses of W0
.
. .  |
W1
, …, Wk-1
. . . . |
senses of Wk
.
. .  |
|
 |
0 |
|
B0k |
|
 |
B10 |
|
B1k |
|
.
.
. |
|
|
|
|

.
.
.
 |
Bk0
|
|
0 |
Figure-9:
Similarity matrix
Step-4:
The sense found for the word w in Step-3 is assigned to the word.
3.6 Evaluation and
Results
In order to see how well
our system performs, we used the SENSEVAL-2 (Palmer et al., 2001) and
SENSEVAL-3 English all-words task data. Both of these data sets consist
of three texts from the Penn Treebank corpus.
We only worked with
nouns. Because of the similarity measure, it is impossible to do
disambiguation on adverbs and adjectives. Hypernym relationship is not
defined for adjectives and adverbs in WordNet. For the verbs, since they
have more senses in WordNet and the difference between the senses is
very subtle, the results were not promising.
There are 1136 annotated
nouns in SENSEVAL-2 and 951 annotated nouns in SENSEVAL-3. Table-1 shows
precision and recall figures for the results that are obtained by our
system.
|
Data |
Precision |
Recall |
|
SENSEVAL-2 |
59.11 % |
40.80 % |
|
SENSEVAL-3 |
56.80 % |
40.00 % |
Table-4:
Results of Senseval data
In order to make a
comparison, the top-10 performing systems in SENSEVAL-2 English all
words task are presented in Table-5. As seen, most of the systems use
sense-tagged data for training. When the systems that have the same
precision and recall values are observed, it is found that they make use
of the first sense heuristic for the cases their systems are unable to
produce an answer.
There is a variety of
reasons for the recall of our system to be low. The main reason is that
we have eliminated the cases for which the parser produced wrong part of
speech tagging. In order to find tagging errors, parser’s tagging was
compared with the POS tags provided with SENSEVAL data. Secondly, local
contexts of the nouns may sometimes not exist in the local context
database. Thirdly, during the similarity maximization algorithm, there
may be some time that all of the support values become zero, so no sense
could be selected and the algorithm stopped.
System
Name Precision
Recall Supervised
100 SMUaw 0.698
0.698 Yes
100 CNTS-Antwerp
0.645 0.645
Yes
100 Sinequa-LIA – HMM
0.626 0.626
Yes
98.908 UNED - AW-U2
0.583 0.577 No
98.908UNED -
AW-U 0.565 0.559
No
95.552 UCLA -
gchao2 0.485 0.463
Yes
95.552
UCLA - gchao3 0.482
0.461 Yes
108.5 CL
Research – DIMAP 0.423
0.46 No
100 CL Research -
DIMAP (R) 0.46 0.46 No
89.729 UCLA –
gchao 0.508 0.456
Yes
Table-5:
Top-10 performing systems in SENSEVAL-2
In order to make a
comparison with Lin’s measure, we have also used the adapted
lesk measure (Banerjee and Pedersen, 2002). We chose the
lesk measure since it is reported to perform best among the other
measures in WordNet Similarity Package in (McCarthy et al., 2004;
Patwardhan et al., 2003). Lesk measure works by finding overlaps in the
glosses of the two synsets. The similarity value is the sum of the
squares of the overlap lengths. A single word overlap results in a score
of 1 whereas a two-consecutive-words overlap results in a score of
4.
For comparing the
similarity measures lin and lesk, we conducted an
experiment on a subset of SENSEVAL-2 English all-words data consisting
of only polysemous nouns and obtained the results presented on
Table-6. We didn’t use this similarity measure in all our experiments
since it is very inefficient in the sense that it requires a lot of
computational time.
|
Measure |
Precision |
|
Lin |
43.9 % |
|
Lesk |
51.6 % |
Table-6:
Comparison of measures
3.6.1 Weak Contexts of the Algorithm
Even our system obtained
up-to-date results compared to unsupervised systems, there are some weak
contexts. Firstly, in the disambiguation process, all the local contexts
of an ambiguous word are treated equally. However, some local contexts
provide a lot of information while some others provide less or don’t
provide anything. We eliminated some of these local contexts and
introduced new ones but a statistical model can be implemented on the
disambiguation algorithm.
Another weakness is the
idiomatic usages of words. The intuition that similar words occur in
identical contexts does not always hold. For example, consider the
following sentence:
Money talks.
Suppose we want to
disambiguate “money” which has 3 senses. The most frequent
subjects of “talk” according to our local context database are,
noun.person, people, peace, man, leader, executive, noun.group,
president., etc. None of these words is similar to the “wealth”
meaning of “money”.
Chapter 4
CONCLUSION
In the first part of this
thesis an overview of WSD is presented regarding different approaches as
well as different knowledge sources. Most of the work done in the past
25 years is tried to be summarized. In spite of all those work done on
WSD, relative to other fields, little progress seems to have been made.
WSD is still an open problem waiting to be solved. Starting from 1980’s
with the availability of more resources and statistical methods like
Naïve Bayes, there has been an improvement on the earlier results;
however, it appears that the limit of these supervised methods has been
reached currently. The problems of scarce sense-tagged data and data
sparseness still remain to be a barrier in front of WSD.
Another problem of WSD is
the difficulty of determining or even defining word senses. Lately,
WordNet has become a de facto standard for lexical resources however it
also have lots of limitations. Its sense distinctions are very
fine-grained and there are gaps in the WordNet hierarchy regarding the
lexical relations. Actually it is not a hierarchy because of these gaps.
Considering these, in the
second part of the thesis, we presented an algorithm which did not need
sense-tagged corpus. Instead of statistical methods, we tried to obtain
information from syntactic local contexts and sense similarity. With the
help of a principle-based broad coverage parser, we extracted the local
contexts. By removing function words, we built semantic relations by
bridging the words connected by function words. We computed the
similarity values between the words that occurred in the same local
contexts using a similarity metric proposed by Lin (1997). We used
WordNet in order to evaluate our system however because of the
fine-grained senses and some weaknesses in the algorithm, we were able
to achieve a precision about 59%. Lin (1997) also presented a similar
system and evaluated his system with relaxed correctness criterion
instead of evaluating his system according to WordNet senses. He
reported nearly 70% accuracy with the relaxed evaluation criteria which
is successful for an unsupervised system.
Therefore, according to
our observations, the future direction in WSD research should focus on
completely unsupervised methods (without using any ontology such as
WordNet). Going on the other way, trying the methods that have been
examined in the history of WSD, would cause a cycle in the WSD research
and improvement.