CSCI 544 Applied Natural Language Processing, Spring 2020

Section 30027D

Instructor

Mark Core: core [AT] ict.usc.edu
Office hours: by appointment, or on lecture days (i.e., not holidays), Monday 2:30pm-3:30pm, Zoom

Teaching Assistants

Ming-Chang Chiu, mingchac [AT] usc.edu (office hours: Tuesday 9-11am, Zoom)
Alexander Spangher, spangher [AT] usc.edu (office hours: Wednesday 3-5pm, Zoom)
Karan Singla, singlak [AT] usc.edu (office hours: Thursday 11-12, Zoom)

Lectures

Monday 4-7:50PM via Zoom.

Syllabus

Course summary

The goal of this course is to teach students fundamental and cutting-edge concepts in Natural Language Processing (NLP), and to provide hands-on experience developing NLP applications in the form of programming assignments in Python. Students are expected to have programming experience and either be familiar with Python or be able to learn it quickly during the first assignment.

As we'll explore in the course, natural language is often ambiguous, and machine learning is crucial to making decisions under uncertainty. Many other tools in basic artificial intelligence (e.g., planning, knowledge representation and reasoning) also play a role in understanding and responding to natural language. However, this class is aimed at students with a general background in computer science (i.e., you don't need to take a machine learning or AI course as a prerequisite). We will cover the necessary machine learning and basic AI material in this course.

The topics tentatively planned for this semester are listed below and include speech processing (language modeling, speech recognition, speech synthesis), linguistic foundations (parts of speech, syntax, speech disfluencies, semantics, dialogue, discourse), machine learning, and applications (information retrieval, information extraction, machine translation, natural language generation, dialogue systems, automated grading). There is no required textbook for the course; we will use lectures to cover the material. References for optional reading materials will be provided for each lecture, but there will be no required readings. One source of material for the course is Jurafsky and Martin's book "Speech and Language Processing." The second edition was published in 2009. While the third edition is being written, the draft chapters are available for free, along with the authors' class slides. Notable chapters include:

Chapter 4: Naive Bayes Classification
Chapter 19: Word Senses and WordNet
Chapter 20: Semantic Role Labeling and Argument Structure
Appendix A: Hidden Markov Models

Interested students can continue their study with other courses in USC's computational linguistics curriculum.

Resources

A variety of resources will be used in this course to facilitate online discussion, grade distribution, and coursework submission. We anticipate using the following:

Blackboard: lecture slides, grades
Piazza: discussion board (first place to go to ask questions) and announcements
Vocareum: used for your programming assignments
Zoom: lectures, office hours

Programming assignments

Each assignment will have a specific rubric, but generally the grade will depend on the performance of a system on unseen test data and on a short technical report describing experiments on the data provided (seen data). Each assignment will also have its own late policy, but generally there will be a penalty for each day late, increasing rapidly to a zero grade on the assignment.

Assignment 1: Due February 23rd, 4pm.
- Description (PDF)
- Report template
Assignment 2: Due April 13th, before midnight.
- Description (PDF)
- Report template
Assignment 3: Ungraded.
- Description (PDF).

Exams

The first exam is in-class and closed-book: no books, no notes, no calculators, no electronic devices, etc. All arithmetic is simple and straightforward. You should bring a pen or dark pencil: exams will be scanned for grading, so the writing needs to be dark enough to show up on the scan.

The final exam will be open-note, open-book and taken online via Blackboard. It is strictly individual, and no collaboration is allowed. Although the final exam will not directly include problems of a type covered by the first exam, there may be some overlap if similar methods are used in the second half of the course.

The online final exam is due at the end of the final exam period (May 11, 6:30pm Pacific Time). It will be released 48 hours beforehand; students will have the flexibility to start and finish when they like, as long as they submit before the deadline.

In the lecture slides for the course, we will work through questions and problems similar to those on the exams. The best way to study for exams is to work through the questions and problems in the lecture materials.

Grading Scheme

20% in-class exam #1
20% Assignment 1
20% Assignment 2
40% online, take-home final exam taken during the final exam period

NOTE: when all the grades are released, students will be emailed with instructions and the deadline for requesting equal weight to be assigned to the four graded items (i.e., two exams, two assignments).
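
As a rough illustration of the arithmetic behind the two weightings, the sketch below computes a course score under the default scheme (20/20/20/40) and under equal weighting (25% each). The item names and the example scores are hypothetical, and the sketch assumes each item is scored as a percentage from 0 to 100; how scores map to letter grades is not covered here.

```python
# Weights from the grading scheme above, as fractions of the course score.
DEFAULT_WEIGHTS = {
    "exam1": 0.20,
    "assignment1": 0.20,
    "assignment2": 0.20,
    "final_exam": 0.40,
}

# Equal weighting across the four graded items (the option described in the NOTE).
EQUAL_WEIGHTS = {item: 0.25 for item in DEFAULT_WEIGHTS}


def course_score(scores, weights):
    """Return the weighted sum of per-item percentage scores (0-100)."""
    return sum(weights[item] * scores[item] for item in weights)


# Hypothetical scores, for illustration only.
scores = {"exam1": 90.0, "assignment1": 80.0, "assignment2": 70.0, "final_exam": 60.0}

default_total = course_score(scores, DEFAULT_WEIGHTS)
equal_total = course_score(scores, EQUAL_WEIGHTS)
print(f"default weighting: {default_total:.1f}")
print(f"equal weighting:   {equal_total:.1f}")
```

In this made-up case the final exam is the weakest item, so equal weighting gives the higher total; a student whose final exam is their strongest item would see the opposite.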

Statement on Academic Conduct and Support Systems

Viterbi academic integrity information and graduate student policies

Topics and schedule

This schedule and list of topics are TENTATIVE. In particular, lectures marked TBA may need to be moved to accommodate guest speakers.

Date	Speaker	Topic
Jan. 13 (lecture 1)	Core	Introduction and basic concepts
Jan. 13 (lecture 2)	Core	Word-level processing
Jan. 20		Martin Luther King's Birthday (no class)
Jan. 27 (lecture 1)	Core	Text classification (Naive Bayes)
Jan. 27 (lecture 2)	Core	Syntax (Part 1)
Feb. 3 (lecture)	Core	Syntax (Part 2)
Feb. 10 (lecture 1)	Core	Semantic Role Labeling
Feb. 10 (lecture 2)	Core	Dialogue
Feb. 17		Presidents' Day (no class)
Feb. 24 (lecture 1)	Kallirroi Georgila (guest lecture)	Statistical dialogue management (Part 1)
Feb. 24 (lecture 2)	Kallirroi Georgila (guest lecture)	Statistical dialogue management (Part 2)
Mar. 2 (lecture 1)	Core	Review / Semantics
Mar. 2 (lecture 2)	Core	Discourse (Part 1)
Mar. 9 (lecture 1)		exam #1
Mar. 16		Spring Break (no class)
Mar. 23 (lecture)	Core	Hidden Markov Models and discussion of assignment 2
Mar. 30 (lecture)	Kallirroi Georgila (guest lecture)	Speech recognition
Apr. 6 (lecture 1)	Core	Information Retrieval
Apr. 6 (lecture 2)	Core	Educational applications of NLP
Apr. 13 (lecture)	Core	Discourse (Part 2)
Apr. 20 (lecture)	Kallirroi Georgila (guest lecture)	Deep learning
Apr. 27	Core	Review and discussion of assignments