CS 295: Statistical NLP Winter 2018

Paper Summaries

Everyone has to submit three paper summaries from the list of papers. Instructions for writing the summary are here, and the paper assignments and due dates are available on Canvas.

Programming Homeworks

HW 1: Semi-supervised Text Classification

Many real-world applications contain a small number of labeled instances but a large number of unlabeled instances. Machine learning algorithms that are able to utilize the information from unlabeled instances are known as semi-supervised approaches. The first programming assignment will require you to implement such an algorithm that benefits from large amounts of unlabeled text.
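
As a rough illustration of the idea (not the method the assignment requires), the sketch below shows self-training, one common semi-supervised approach: a classifier trained on the labeled documents assigns labels to the unlabeled documents it is confident about, and is then retrained on the enlarged set. The function names, the Naive Bayes classifier, and the threshold value are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict_proba(model, doc):
    """Return a dict mapping each class to its posterior probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in doc.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                # add-one smoothing over the vocabulary
                score += math.log((word_counts[label][tok] + 1)
                                  / (total_words + len(vocab)))
        log_scores[label] = score
    # normalize in log space to avoid underflow
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp_scores.values())
    return {k: v / z for k, v in exp_scores.items()}

def self_train(labeled_docs, labels, unlabeled_docs, threshold=0.9, max_rounds=5):
    """Hypothetical self-training loop; threshold and max_rounds are illustrative."""
    docs, labs = list(labeled_docs), list(labels)
    pool = list(unlabeled_docs)
    model = train_nb(docs, labs)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for doc in pool:
            probs = predict_proba(model, doc)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                # confident prediction: treat it as a (noisy) label
                docs.append(doc)
                labs.append(best)
                added += 1
            else:
                remaining.append(doc)
        if added == 0:
            break  # nothing new was labeled, so retraining would not change the model
        pool = remaining
        model = train_nb(docs, labs)
    return model
```

The threshold controls the trade-off: a low value adds more unlabeled text but also more wrong labels, which the retrained model can then reinforce.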

Due date
January 23, 2018
Description
PDF
Data
Kaggle (signup link on Canvas)
Source code
Github


HW 2: Language Modeling

One of the fundamental tasks in natural language processing is probabilistic modeling of language, i.e., how we can differentiate between a random sequence of words and something we might consider an English sentence. Such language models are used in many applications, such as handwriting recognition, speech recognition, machine translation, and text generation. In this second programming assignment, you will perform language modeling of different kinds of text.
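
To make "probabilistic modeling of language" concrete, here is a minimal sketch of a bigram language model with add-one smoothing, scored by perplexity (lower means the model finds the text more plausible). It is an illustration only, not the model the assignment asks for; the function names, the `<s>`, `</s>`, and `<unk>` symbols, and the smoothing choice are assumptions made for this example.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their contexts over whitespace-tokenized sentences."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = {"</s>", "<unk>"}  # every symbol the model can predict
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, vocab

def bigram_prob(model, prev, word):
    """P(word | prev) with add-one smoothing; unseen words map to <unk>."""
    context_counts, bigram_counts, vocab = model
    if word not in vocab:
        word = "<unk>"
    if prev != "<s>" and prev not in vocab:
        prev = "<unk>"
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

def perplexity(model, sentences):
    """exp of the average negative log-probability per predicted token."""
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log(bigram_prob(model, prev, word))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

model = train_bigram(["the cat sat", "the dog sat"])
fluent = perplexity(model, ["the cat sat"])
scrambled = perplexity(model, ["sat cat the"])
```

With this toy training set, `fluent` comes out lower than `scrambled`, which is the behavior described above: the word order seen in training is judged more sentence-like than a shuffled version of the same words.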

Due date
February 8, 2018
Description
PDF
Data
Canvas (9 MB)
Source code
Github


HW 3: Sequence Tagging on Twitter

A number of tasks in natural language processing can be framed as sequence tagging, i.e., predicting a sequence of labels, one for each token in the sentence. Such tasks include finer-grained ones such as tokenization and chunking, as well as coarser-level ones such as part-of-speech tagging and named entity recognition. In this homework, you will be looking at the latter two for a corpus of tweets, and investigating two challenges in sequence modeling: inference and feature engineering.
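
For the inference side, a standard choice is Viterbi decoding, which finds the highest-scoring tag sequence by dynamic programming. The sketch below is an assumption-laden illustration, not the assignment's reference code: it uses a toy HMM-style model with made-up probabilities, hypothetical tag names `N` and `V`, and a small floor probability `unk` for words a tag has never emitted.

```python
import math

def viterbi(tokens, tags, start, trans, emit, unk=1e-6):
    """Return the most probable tag sequence under an HMM-style model.

    start[t], trans[prev][cur], and emit[t][word] are probabilities;
    all arithmetic is done in log space to avoid underflow.
    """
    if not tokens:
        return []

    def log_emit(tag, token):
        return math.log(emit[tag].get(token, unk))

    # best[i][t]: score of the best path that ends in tag t at position i
    best = [{t: math.log(start[t]) + log_emit(t, tokens[0]) for t in tags}]
    back = []  # back[i][t]: previous tag on that best path
    for token in tokens[1:]:
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + math.log(trans[p][cur]))
            scores[cur] = (best[-1][prev] + math.log(trans[prev][cur])
                           + log_emit(cur, token))
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy parameters, invented purely for illustration.
tags = ["N", "V"]
start = {"N": 0.8, "V": 0.2}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit = {"N": {"dogs": 0.6, "run": 0.1, "fast": 0.3},
        "V": {"dogs": 0.1, "run": 0.7, "fast": 0.2}}

print(viterbi(["dogs", "run"], tags, start, trans, emit))  # ['N', 'V']
```

The cost is linear in the sentence length and quadratic in the number of tags, compared with the exponential cost of scoring every possible tag sequence.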

Due date
February 27, 2018
Description
PDF
Data
Canvas (16 MB)
Source code
Github


HW 4: Neural Translation

One of the most widespread and public-facing applications of natural language processing is machine translation. It has gained a lot of attention in recent years, both infamously for its inability to understand the nuance in human communication, and for the near human-level performance achieved using neural models. In this homework, we will be looking at a translation system between Shakespearean English and Modern English, and implementing a neural sequence-to-sequence model to achieve this.

Due date
March 18, 2018
Description
PDF
Data
Canvas
Source code
Github