CS 295: Statistical NLP Winter 2017

Paper Summaries

Everyone has to submit three paper summaries from the list of papers. Instructions for writing the summary are here (PDF/HTML), and the paper assignments and due dates are available on Canvas.

Programming Homeworks

HW 1: Semi-supervised Text Classification

Many real-world applications contain a small number of labeled instances but a large number of unlabeled instances. Machine learning algorithms that are able to utilize the information from unlabeled instances are known as semi-supervised approaches. The first programming assignment will require you to implement such an algorithm that benefits from large amounts of unlabeled text.

Due date: January 26, 2017
Description: PDF/HTML
Data: Kaggle (signup link on Canvas)
Source code: Github
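
To make the idea of learning from unlabeled instances concrete, here is a minimal sketch of self-training, which is only one of several semi-supervised approaches and is not necessarily the method the assignment expects. The Naive Bayes classifier, the function names (train_nb, predict_nb, self_train), the confidence threshold, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count words per label for a multinomial Naive Bayes classifier."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
    vocab = {w for tokens in docs for w in tokens}
    return word_counts, label_counts, vocab

def predict_nb(model, tokens):
    """Return (best label, its posterior probability); unseen words are skipped."""
    word_counts, label_counts, vocab = model
    n = sum(label_counts.values())
    log_scores = {}
    for y in label_counts:
        total = sum(word_counts[y].values()) + len(vocab)  # add-one smoothing
        score = math.log(label_counts[y] / n)
        for w in tokens:
            if w in vocab:
                score += math.log((word_counts[y][w] + 1) / total)
        log_scores[y] = score
    top = max(log_scores.values())
    norm = sum(math.exp(s - top) for s in log_scores.values())
    best = max(log_scores, key=log_scores.get)
    return best, math.exp(log_scores[best] - top) / norm

def self_train(labeled, labels, unlabeled, threshold=0.6, rounds=5):
    """Repeatedly add confidently predicted unlabeled documents to the training set."""
    docs, ys, pool = list(labeled), list(labels), list(unlabeled)
    model = train_nb(docs, ys)
    for _ in range(rounds):
        confident, remaining = [], []
        for tokens in pool:
            y, p = predict_nb(model, tokens)
            (confident if p >= threshold else remaining).append((tokens, y))
        if not confident:
            break
        for tokens, y in confident:
            docs.append(tokens)
            ys.append(y)
        pool = [tokens for tokens, _ in remaining]
        model = train_nb(docs, ys)
    return model

# Toy data (hypothetical): two labeled reviews, four unlabeled ones.
labeled = [["great", "movie"], ["terrible", "movie"]]
labels = ["pos", "neg"]
unlabeled = [["great", "fun"], ["terrible", "boring"],
             ["fun", "film"], ["boring", "film"]]

model = self_train(labeled, labels, unlabeled)
fun_label, fun_conf = predict_nb(model, ["fun"])
boring_label, boring_conf = predict_nb(model, ["boring"])
```

In this toy run, the words "fun" and "boring" never appear in the labeled data; the classifier only learns their polarity from the unlabeled reviews it labels for itself. A fixed threshold is a simple choice, and a poorly chosen one can reinforce early mistakes.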

HW 2: Language Modeling

One of the fundamental tasks in natural language processing is probabilistic modeling of language, i.e., how we can differentiate between a random sequence of words and something we might consider an English sentence. Such language models are used in many applications, such as handwriting recognition, speech recognition, machine translation, and text generation. In this second programming assignment, you will perform language modeling of different kinds of text.

Due date: February 13, 2017
Description: PDF/HTML
Data: Canvas (9 MB)
Source code: Github
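
As a rough illustration of how a language model can separate a plausible sentence from a random sequence of words, the sketch below scores two word orders with a bigram model using add-one smoothing. The bigram order, the smoothing method, the function names (train_bigram, log_prob), and the three-sentence corpus are assumptions for illustration, not the models required by the assignment.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over sentences padded with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return contexts, bigrams, vocab

def log_prob(tokens, model):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab = model
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab)))
    return total

# Toy corpus (hypothetical); real corpora are orders of magnitude larger.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
model = train_bigram(corpus)

fluent = log_prob(["the", "cat", "sat"], model)
scrambled = log_prob(["sat", "the", "cat"], model)
```

The same three words receive a higher log-probability in their observed order than in the scrambled order, because the scrambled version relies on bigrams that never occur in the corpus. This sketch does not handle out-of-vocabulary words in a principled way.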

HW 3: Sequence Tagging on Twitter

A number of tasks in natural language processing can be framed as sequence tagging, i.e., predicting a sequence of labels, one for each token in the sentence. These include finer-grained tasks such as tokenization and chunking, as well as coarser-level tasks such as part-of-speech tagging and named entity recognition. In this homework, you will be looking at the latter two on a corpus of tweets, and investigating two challenges in sequence modeling: inference and feature engineering.

Due date: February 27, 2017
Description: PDF/HTML
Data: Canvas (16 MB)
Source code: Github
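
The inference challenge in sequence tagging is finding the highest-scoring label sequence without enumerating all of them. A minimal sketch of Viterbi decoding for a first-order model follows; the score tables, the two-label tag set, and the function name viterbi are hypothetical, and the assignment's own interfaces may differ.

```python
def viterbi(emissions, transitions, start):
    """Return the best label sequence under additive (log-space) scores.

    emissions:   list with one dict per token, mapping label -> score
    transitions: dict mapping (previous label, current label) -> score
    start:       dict mapping label -> score of starting in that label
    """
    labels = list(start)
    best = [{y: start[y] + emissions[0][y] for y in labels}]
    back = []  # back[t - 1][y] is the best previous label for y at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy scores (hypothetical) for the tokens "dogs chase cats".
start = {"N": -0.5, "V": -1.5}
transitions = {("N", "N"): -1.0, ("N", "V"): -0.5,
               ("V", "N"): -0.5, ("V", "V"): -2.0}
emissions = [{"N": -0.2, "V": -2.0},
             {"N": -2.0, "V": -0.3},
             {"N": -0.4, "V": -1.5}]

tags = viterbi(emissions, transitions, start)
```

With these toy scores the decoder returns N, V, N. The run time is linear in the sentence length and quadratic in the number of labels, compared with the exponential cost of scoring every possible sequence.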

HW 4: Phrase-Based Translation

One of the most widespread and public-facing applications of natural language processing is machine translation. It has gained a lot of attention in recent years, both for its infamous inability to capture the nuance of human communication and for the near human-level performance achieved by neural models. In this homework, we will be looking at phrase-based translation from French to English, and implementing stack-based decoders of varying complexity to achieve this.

Due date: March 13, 2017
Description: PDF/HTML
Data: Canvas (140 MB)
Source code: Github
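
To give a sense of what a stack-based decoder does, the sketch below implements a heavily simplified monotone variant: hypotheses are grouped into stacks by the number of source words covered, each stack is pruned to a fixed size, and hypotheses are extended left to right with entries from a phrase table. It is an illustration under strong assumptions: there is no language model and no reordering, and the function name, the phrase table, and its scores are all hypothetical.

```python
def monotone_stack_decode(source, phrase_table, stack_size=10, max_phrase_len=3):
    """Translate left to right, keeping at most stack_size hypotheses per stack.

    phrase_table maps a tuple of source words to a list of
    (target words, log-probability) pairs.
    """
    # stacks[i] holds (score, output) hypotheses covering the first i source words.
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0].append((0.0, ()))
    for i in range(len(source)):
        # Histogram pruning: keep only the best hypotheses in this stack.
        stacks[i] = sorted(stacks[i], reverse=True)[:stack_size]
        for score, output in stacks[i]:
            longest = min(len(source), i + max_phrase_len)
            for j in range(i + 1, longest + 1):
                phrase = tuple(source[i:j])
                for target, logp in phrase_table.get(phrase, []):
                    stacks[j].append((score + logp, output + tuple(target)))
    if not stacks[-1]:
        return None  # no sequence of phrases covers the whole sentence
    return max(stacks[-1])

# Toy phrase table (hypothetical scores).
phrase_table = {
    ("le",): [(["the"], -0.1)],
    ("chat",): [(["cat"], -0.2)],
    ("noir",): [(["black"], -0.3)],
    ("chat", "noir"): [(["black", "cat"], -0.4)],
}

best = monotone_stack_decode(["le", "chat", "noir"], phrase_table)
best_words = best[1]
```

Here the two-word phrase "chat noir" lets the decoder produce "the black cat" even though it never reorders phrases, because that derivation scores higher than translating word by word. A fuller decoder would also track which source words are covered so that phrases can be translated out of order, and would add language model scores to each hypothesis.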