August 31, 2016

Onur Kuru, M.S. 2016

Current position: Data Scientist at Searchmetrics, Berlin. (LinkedIn)
M.S. Thesis: Character-level Tagging. Koç University, Department of Computer Engineering. August 2016. (PDF, Presentation, Code)

Abstract:

I describe and evaluate a language-independent character-level tagger for sequence labeling problems: Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and chunking. Instead of words, a sentence is represented as a sequence of characters. The model consists of stacked bidirectional LSTMs that take characters as input and output tag probabilities for each character. These probabilities are then converted to consistent word-level phrase tags using a Viterbi decoder. The model uses only labeled data and does not rely on hand-engineered features or other external resources such as syntactic taggers or gazetteers. The model achieves close to state-of-the-art NER performance in seven languages, performs as well as or better than previous work in four languages for POS tagging, and yields competitive results on the English chunking dataset.
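
As a rough illustration of the decoding step described in the abstract, the sketch below turns per-character tag probabilities into one tag per word. It is a simplified stand-in, not the thesis code: it assumes words are separated by whitespace, that every character in a word must share a single tag, and that an optional word-to-word `transition` matrix of log-scores is available. The function name `decode_word_tags` and all parameter names are hypothetical.

```python
import math


def decode_word_tags(sentence, char_probs, tags, transition=None):
    """Return one tag per whitespace-separated word (illustrative sketch).

    char_probs[i][k] is the probability of tags[k] for character i of sentence.
    transition[j][k] is an assumed log-score for moving from tag j to tag k
    between consecutive words; it defaults to all zeros (no preference).
    """
    n_tags = len(tags)
    if transition is None:
        transition = [[0.0] * n_tags for _ in range(n_tags)]

    # Sum character log-probabilities per word; whitespace is skipped.
    word_scores = []
    current = None
    for ch, probs in zip(sentence, char_probs):
        if ch.isspace():
            current = None
            continue
        if current is None:
            current = [0.0] * n_tags
            word_scores.append(current)
        for k in range(n_tags):
            current[k] += math.log(max(probs[k], 1e-12))
    if not word_scores:
        return []

    # Viterbi over words: keep the best score ending in each tag.
    best = list(word_scores[0])
    back = []
    for scores in word_scores[1:]:
        new_best, pointers = [], []
        for k in range(n_tags):
            j = max(range(n_tags), key=lambda p: best[p] + transition[p][k])
            new_best.append(best[j] + transition[j][k] + scores[k])
            pointers.append(j)
        best = new_best
        back.append(pointers)

    # Follow back-pointers from the best final tag.
    k = max(range(n_tags), key=lambda t: best[t])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [tags[k] for k in path]
```

Because all characters of a word are forced to agree, a single noisy character prediction inside a word is outvoted by its neighbours, which is one way to read the "consistent word-level" property mentioned above.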

