CS 295: Statistical NLP, Winter 2017
A computer’s ability to read, learn, and understand language is becoming critically important: we have access to enormous amounts of digitized text (that we can’t possibly read), personal communication is increasingly digital (so we can’t possibly remember all of it), and autonomous agents are becoming a bigger part of our everyday lives (and we need to talk to them). This course will introduce historical and recent approaches to natural language processing, focusing in particular on the computational tasks in NLP and the machine learning techniques that have achieved remarkable success on them.
Tentatively, the course will cover the following topics:
- Introduction: what is NLP? Applications and challenges, review of probability and statistics
- Word and Bag-of-Words Representations: vector space models, word representations, word embeddings, text classification, naive Bayes, discriminative classifiers, logistic regression, feed-forward neural networks, convolutional neural networks
- N-grams and Sequence Modeling: language models, featurized language models, neural language models, sequence modeling, part-of-speech tagging, named entity recognition, hidden Markov models, conditional random fields, recurrent neural networks
- Sentence Structure Modeling: context-free grammars, probabilistic CFGs, PCFG parsing, constituency parsing, dependency parsing, semantic role labeling, recursive neural networks, neural parsing, sequence-to-sequence mapping with LSTMs
- Information Extraction: sentence-level relation extraction, corpus-level relation extraction, within-document coreference, cross-document coreference, entity linking, question answering
- Text Generation and Other Topics: machine translation, text summarization, textual entailment, reading comprehension
- Prerequisites
- At minimum:
  - An undergraduate machine learning course (CS 178 or equivalent); a graduate course such as CS 273 or CS 274 is a plus.
  - An undergraduate artificial intelligence course (CS 171 or equivalent).
  - Working familiarity with Python, which the programming assignments require, as well as with data structures and algorithms.
Contact me if you are concerned about your background for the course.
- Grading Policy
- 4 programming assignments: 40%
- 3 paper summaries: 15%
- Final project: 30%
- Class and online participation: 15%
- Piazza
- We will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from your classmates and me. Rather than emailing me your questions, please post them on Piazza. If you have any problems with Piazza or feedback for its developers, email team@piazza.com.
Find our class page at: https://piazza.com/uci/winter2017/cs295/home
- Academic Honesty
- Academic honesty is a requirement for passing this class. Any student who compromises the academic integrity of this course is subject to a failing grade. The work you submit must be your own. Academic dishonesty includes, but is not limited to, copying answers from another student, allowing another student to copy your answers, communicating exam answers to other students during an exam, attempting to use notes or other aids during an exam, and tampering with an exam after it has been graded and then returning it for more credit. Any of these actions violates the UCI Policies on Academic Honesty (see link). It is your responsibility to read and understand these policies. Note that any instance of academic dishonesty will be reported to the Academic Integrity Administrative Office for disciplinary action and may be cause for a failing grade in the course.