April 29, 2009

Natural Language Processing summer course at Sabanci University

This summer, Kemal Oflazer, Dilek Hakkani-Tur, and Gokhan Tur are offering a course on Statistical Natural Language Processing at Sabanci University. A draft syllabus is included below.

STATISTICAL NLP CLASS:


Kemal Oflazer:
  • Overview of NLP (2 hours)
    • NLP Applications
    • Processing pipeline: the basic steps, how they feed into one another, and how applications use them
  • Morphological Analysis (2 hours; could be shortened or skipped)
  • Introduction to Statistical Models and n-gram Language Modeling (2 hours)
    • Applications to simple sequence problems (tagging English text and/or a Turkish deasciifier)
  • Morphological Disambiguation (applications to Turkish)
  • HMMs: formal treatment (forward-backward and Viterbi algorithms) and applications to tagging (2-3 hours)
  • CFGs and Probabilistic CFGs (3-4 hours)
    • Inside-outside algorithm for training PCFGs
    • Parsing with PCFGs
  • Machine Translation (MT) (3-4 hours)
    • Brief overview of classical symbolic MT
    • Statistical Machine Translation
      • Word-based Models
      • Phrase-based Models
      • Syntax-based Models
    • Dealing with Morphology in SMT


Dilek Hakkani-Tur:
  • Elements of Information Theory / Advanced Language Modeling and Applications
    • Entropy/Perplexity/Mutual Information
    • Noisy Channel Model
      • Sequence classification / HMM
      • Sample classification / Naive Bayes
    • Smoothing
    • Adaptation
  • Named Entity Extraction (NE)
    • Using HMM for NE
    • Using CRF for NE
    • Using Boosting/MaxEnt/SVM for NE
  • Spoken Language Understanding (SLU) as Template Filling
    • HMM approaches (AT&T vs BBN)
    • Hidden Vector State Models
    • Latent Semantic Analysis
    • Sample-classification-based approaches (Boosting/MaxEnt/Decision Trees)
  • Summarization
    • Greedy algorithms, Maximal Marginal Relevance (MMR)
    • TextRank/LexRank
    • Classification-based extractive summarization
    • Global Models for Summarization: Linear Programming approaches
  • Question Answering
  • Spoken Dialog Systems and Dialog Management (DM)
    • Dialog Systems
    • DM
      • Finite State Models
      • Agent Models
      • Reinforcement Learning


Gokhan Tur:
  • Topic Classification
    • Discriminative classification: SVM/Boosting
    • Generative classification: language models, document similarity, vector space model
    • Feature selection/transformation (LDA)
    • Latent semantic indexing
  • SLU as Intent Determination
    • Semantic Role Labeling
    • Robustness to ASR
  • Topic Clustering
    • K-Means
    • Top-down vs. bottom-up
  • Topic Segmentation
    • HMM
    • TextTiling
    • Markov Chains
  • Sentence Segmentation
    • HMM
    • CRF
    • Hybrid
  • Active Learning/Semi-Supervised Learning/Unsupervised Learning/Model Adaptation/Robustness

