CS 295: Statistical NLP Winter 2017

Schedule

January 10

Introduction to Statistical NLP: What is NLP?; Why is it important?; Challenges for NLP; Learning goals and topics; Course logistics and information; [ Slides ]
Homework 1 available

January 12

Text Classification: What is Classification?; Evaluation Metrics: Accuracy and F1; Statistical Significance; Naive Bayes: Model, Estimation, and Problems; Course Project details; [ Slides ]
I swapped micro- and macro-averaging in class by mistake; the slides are fixed (and correct).
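Since the two averages are easy to confuse, here is a minimal sketch of the difference, assuming single-label multiclass classification. The function names (`f1`, `micro_macro_f1`) and the toy labels are illustrative only and are not taken from the slides. Micro-averaging pools the counts over all classes before computing F1; macro-averaging computes F1 per class and then takes the unweighted mean.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1) for parallel lists of gold and predicted labels."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # missed an instance of the gold class g
    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over classes first, so every instance counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy example: a classifier that always predicts the majority class "a".
gold = ["a", "a", "a", "a", "b"]
pred = ["a", "a", "a", "a", "a"]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.800 macro=0.444
```

On this toy data the micro average looks healthy because the frequent class dominates the pooled counts, while the macro average is pulled down by the rare class that is never predicted.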

January 17

Classification Contd. and (Some) Document Representation: Logistic Regression: Model, Estimation, Extensions; Introduction to Neural Networks; Document Vectors: Term-Doc Matrix, Vector Models, Cosine Distance; [ Slides ]
Homework 2 available.

January 19

Vector Space Models: Latent Semantic Analysis; Intro to Vector Space Models; Hierarchical Brown Clustering; Skip-Gram Model (word2vec): Model, Estimation, Applications; [ Slides ]
Project Pitch is due January 23rd

January 24

N-Gram Language Models: Word Embeddings: Negative Sampling, Neural View; Introduction to Language Modeling: Task and Evaluation; Generative Models: Unigram, Bigram, Trigram, and Smoothing; [ Slides ]

January 26

Discriminative Language Models: Discriminative Models: Featurized Language Models; Introduction to Recurrent Neural Networks; Variations: Stacking and Bi-directionality; Language Modeling using NNs; [ Slides ]
Homework 1 is due tonight.

January 31

Sequence Labeling: Introduction to Tagging; Part of Speech Tagging; NB Classification for Sequences; Hidden Markov Models; Viterbi Decoding; EM Algorithm; [ Slides ]

February 2

Sequence Labeling Contd: Forward Backward Algorithm (HMMs); Maximum Entropy Markov Models; Greedy and Beam Search; Conditional Random Fields; Forward Backward and Viterbi for CRFs; Neural Sequence Tagging; [ Slides ]

February 7

Syntactic Parsing: Constituents; Syntactic Parse Trees; Context Free Grammars; Chomsky Normal Form; CKY Algorithm; Evaluation; [ Slides ]
Other links: Collins' Notes on PCFGs;
Project proposal due tonight.

February 9

Syntactic Parsing Contd; Dependency Parsing: Probabilistic Context Free Grammars; Lexical PCFGs; Dependency Grammar; Evaluating Dependency Trees; Transition-based Inference; Graph-based Inference; Eisner Algorithm; [ Slides ]
Homework 2 is due on Monday.

February 14

Semantics: Roles and Relations: Log-linear models; Likelihood Training; Structured Perceptron; Word Senses and Disambiguation; Roles: Thematic and Semantic; Semantic Role Labeling; [ Slides ]
Other links: WordNet; VerbNet; PropBank; FrameNet;

February 16

Logical Forms: Need for logical forms; Mapping language to Logic; Syntax vs Semantics; Lambda-Calculus; Limitations of Lambda-Calculus; Combinatory Categorial Grammars; CCG Types and Combinators; [ Slides ]
First paper summary due Feb 17.

February 21

CCGs Contd, Information Extraction: CCGs and Lambda Calculus; CCG Modeling; Learning CCGs; [ Slides ]; What is Information Extraction; Applications of Information Extraction; Role of NLP in IE; Named Entity Recognition; Features for NER; [ Slides ]

February 23

Relation Extraction: Relation Extraction and Applications; Rule-Based Relation Extraction; Supervised Models of Relation Extraction; Distantly Supervised Relation Extraction; Unsupervised Relation Extraction; [ Slides ]
Homework 3 due on Monday, Feb 27.

February 28

Machine Translation: Intro to Machine Translation; Challenges and Rule-Based MT; Statistical MT; Parallel Corpora; Components of an MT System; MT Evaluation; Word Alignment Models; [ Slides ]
Readings: JM Chapter 21.2-6;
Second paper summary due tonight.

March 2

Machine Translation Contd: EM Training for Word Alignment; Intro to Phrase-based MT; Learning Phrase Lexicons; Monotonic Word Alignment; Stack Decoding; Monotonic Phrase Decoding; [ Slides ]

March 7

Syntax MT; Neural MT: Non-Monotonic Phrase Decoding; Hypothesis Recombination; Multi-Stack Decoding; Overview of Syntax-Based MT; Neural MT: Seq2Seq; RNNs and extensions; GRUs and LSTMs; Google's Neural MT Model; [ Slides ]
Readings: Seq2Seq Learning;
Project Status due tonight.

March 9

Coreference, Entity Linking, and QA: Intro to Coref Resolution, Applications; Winograd Schema; Intro to Pragmatics; Types of References: Names, Pronouns, Nominals; Machine Learning for Coref Resolution; Evaluation of Coref Resolution; Entity Resolution and Linking; Applications of Linking; Evaluating Entity Linking; Intro to QA; Applications of QA; [ Slides ]
Homework 4 due on Monday, March 13.

March 14

Question Answering and Entailment: Factoid Question Answering; Overview of the Watson project; IR-Based Factoid QA: Question Processing, Passage Retrieval, Answer Processing; Answer Type Prediction; Other Extensions: AskMSR, FALCON, Allen AI Challenge; Introduction to Textual Entailment; Applications of Entailment; [ Slides ]
Readings: JM Chapter 28;
Third paper summary due tonight.

March 16

Discourse and Summarization; Wrapup: Introduction to Discourse; Coherence vs Semantics; Coherence Indicators: Connectors, Lexical Chains, Relations; Applications of Coherence; Intro to Summarization; Types: Single vs Multiple Docs, Query-specific vs Generic, Extractive vs Abstractive; Summarization Pipeline; ROUGE Evaluation; Course Wrapup; [ Slides ]