Monday 4-7:50PM via Zoom.
The goal of this course is to teach students fundamental and cutting-edge concepts in Natural Language Processing (NLP), and to provide hands-on experience developing NLP applications through programming assignments in Python. Students are expected to have programming experience and either be familiar with Python or be able to learn it quickly during the first assignment.
As we'll explore in the course, natural language is often ambiguous, and machine learning is crucial to making decisions under uncertainty. Many other tools in basic artificial intelligence (e.g., planning, knowledge representation and reasoning) also play a role in understanding and responding to natural language. However, this class is aimed at students with a general background in computer science (i.e., you do not need to have taken a machine learning or AI course as a prerequisite). We will cover the necessary machine learning and basic AI material in this course.
The topics tentatively planned for this semester are listed below and include speech processing (language modeling, speech recognition, speech synthesis), linguistic foundations (parts of speech, syntax, speech disfluencies, semantics, dialogue, discourse), machine learning, and applications (information retrieval, information extraction, machine translation, natural language generation, dialogue systems, automated grading). There is no required textbook for the course; we will use lectures to cover the material. References to optional reading materials will be provided for each lecture, but there will be no required readings. One source of material for the course is Jurafsky and Martin's book "Speech and Language Processing." The second edition was published in 2009. While the third edition is being written, its draft chapters are available for free, along with the authors' class slides. Notable chapters include:
A variety of resources will be used in this course to facilitate online discussion, grade distribution, and coursework submission. We anticipate using the following:
Each assignment will have a specific rubric, but in general the grade will depend on the performance of your system on unseen test data and on a short technical report describing experiments on the data provided (seen data). Each assignment will also have its own late policy, but in general there will be a penalty for each day late, increasing rapidly to a grade of zero on the assignment.
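As an illustration of the seen/unseen distinction above, here is a minimal sketch of how you might hold out part of the provided data to estimate performance before submitting. It is not the grading procedure for any assignment; the function names, the toy sentiment examples, the 20% split, and the majority-class baseline are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def split_seen_data(examples, dev_fraction=0.2, seed=0):
    """Shuffle the provided (seen) examples and hold out a development portion."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


def train_majority_baseline(train):
    """Hypothetical stand-in for a real system: always predict the most common training label."""
    majority_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority_label


def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


# Toy labeled data, invented for this sketch (not course data).
seen = [
    ("great movie", "pos"), ("loved it", "pos"), ("wonderful acting", "pos"),
    ("really enjoyable", "pos"), ("a delight", "pos"), ("superb", "pos"),
    ("terrible plot", "neg"), ("boring", "neg"), ("awful", "neg"),
    ("waste of time", "neg"),
]

train, dev = split_seen_data(seen)
predict = train_majority_baseline(train)
dev_accuracy = accuracy(predict, dev)
print(f"train size: {len(train)}, dev size: {len(dev)}")
print(f"dev accuracy: {dev_accuracy:.2f}")
```

The held-out portion only approximates how a system will behave on the actual unseen test data, so results of this kind belong in the report as experiments on seen data.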
The first exam is in-class and closed-book: no books, no notes, no calculators, no electronic devices, etc. Any arithmetic required will be simple and straightforward. You should bring a pen or dark pencil: exams will be scanned for grading, so the writing needs to be dark enough to show up on the scan.
The final exam will be open-note, open-book and taken online via Blackboard. It is strictly individual, and no collaboration is allowed. Although the final exam will not directly include problems of a type covered by the first exam, there may be some overlap if similar methods are used in the second half of the course.
The online final exam is due at the end of the final exam period (May 11, 6:30 PM Pacific Time). It will be released 48 hours beforehand; students will have the flexibility to start and finish when they like, as long as they submit before the deadline.
In the lecture slides for the course, we will work through questions and problems similar to those on the exams. The best way to study for exams is to work through the questions and problems in the lecture materials.
NOTE: when all the grades are released, students will be emailed with instructions and the deadline for requesting equal weight to be assigned to the four graded items (i.e., two exams, two assignments).
This schedule and list of topics are TENTATIVE. In particular, lectures marked TBA may need to be moved to accommodate guest speakers.
Date | Speaker | Topic |
Jan. 13 (lecture 1) | Core | Introduction and basic concepts |
Jan. 13 (lecture 2) | Core | Word-level processing |
Jan. 20 | Martin Luther King's Birthday (no class) | |
Jan. 27 (lecture 1) | Core | Text classification (Naive Bayes) |
Jan. 27 (lecture 2) | Core | Syntax (Part 1) |
Feb. 3 (lecture) | Core | Syntax (Part 2) |
Feb. 10 (lecture 1) | Core | Semantic Role Labeling |
Feb. 10 (lecture 2) | Core | Dialogue |
Feb. 17 | Presidents' Day (no class) | |
Feb. 24 (lecture 1) | Kallirroi Georgila (guest lecture) | Statistical dialogue management (Part 1) |
Feb. 24 (lecture 2) | Kallirroi Georgila (guest lecture) | Statistical dialogue management (Part 2) |
Mar. 2 (lecture 1) | Core | Review / Semantics |
Mar. 2 (lecture 2) | Core | Discourse (Part 1) |
Mar. 9 (lecture 1) | exam #1 | |
Mar. 16 | Spring Break (no class) | |
Mar. 23 (lecture) | Core | Hidden Markov Models and discussion of assignment 2 |
Mar. 30 (lecture) | Kallirroi Georgila (guest lecture) | Speech recognition |
Apr. 6 (lecture 1) | Core | Information Retrieval |
Apr. 6 (lecture 2) | Core | Educational applications of NLP |
Apr. 13 (lecture) | Core | Discourse (Part 2) |
Apr. 20 (lecture) | Kallirroi Georgila (guest lecture) | Deep learning |
Apr. 27 | Core | Review and discussion of assignments |