CSCI 544 Applied NLP: Proposing a Group Project
Due Date: October 14 by 11:59pm. Early submissions are welcome!
For your group project, you will need to form a group of three to four
students and pick an NLP-related task. Your group starts by submitting a proposal through the form here (only ONCE per group):
Online proposal submission.
The form asks you to describe:
- the proposed NLP task
- the data you will use
- any annotation that is part of the project
- how you plan to tackle the problem
- how you plan to evaluate your final product
You will need to list each group member and what tasks they will
perform. Each project member will write an individual report on the
project, and receive an individual grade. In addition, the group is
required to give an in-class presentation. Currently, we have
allocated the last three lectures for the presentations, but may need
to include additional lectures depending on the number of groups. After
you submit your project proposal, your group will receive an email
from either Mark or Kallirroi. The email will either approve the
proposal, or give comments and ask for revisions.
Question: Can I form a group with students from the other section?
Answer: Yes, provided the entire group can attend either the 4pm or
6pm lecture on the day the group is presenting.
Help in picking a topic
Domain adaptation
Domain adaptation is a topic that can be explored in a variety of NLP
tasks. For many of these tasks, there are large corpora and software
tools trained on these corpora. However, the performance of these tools
may drop when they are applied to domains that differ from the training
corpus. The idea of domain adaptation is to annotate a small amount of
training data in the new domain and create a model based on both the
large out-of-domain corpus and the small in-domain corpus. This can be
applied to NLP tasks such as part-of-speech tagging, named entity
recognition, and parsing.
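As a rough illustration of how a model can draw on both corpora, the sketch below uses feature augmentation, one simple and widely used technique: every feature is copied into a shared version and a domain-specific version, so a single classifier trained on the combined data can learn which cues transfer across domains and which do not. The function name `augment`, the domain labels, and the toy features are assumptions made up for this example; they are not a required approach for the project.

```python
def augment(features, domain):
    """Copy each feature into a shared version and a domain-specific version.

    features: dict mapping feature name -> value
    domain:   label such as "source" (out-of-domain) or "target" (in-domain)
    """
    augmented = {}
    for name, value in features.items():
        augmented["shared:" + name] = value      # visible to both domains
        augmented[domain + ":" + name] = value   # visible to one domain only
    return augmented


# Hypothetical toy feature dicts for one token from each corpus.
out_of_domain = {"word=bank": 1.0, "suffix=nk": 1.0}
in_domain = {"word=bank": 1.0, "prev=river": 1.0}

source_example = augment(out_of_domain, "source")
target_example = augment(in_domain, "target")
```

The augmented dictionaries can then be passed to any classifier that accepts sparse features. Weights on the `shared:` copies capture behavior common to both domains, while weights on the `target:` copies let the small in-domain corpus override it.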
Speech recognition / speech synthesis topics
- Build a limited-domain speech recognizer. You may use out-of-the-box acoustic models, or adapt out-of-the-box acoustic models to a particular speaker. In the latter case you will have a speaker-dependent speech recognizer. You can build domain-specific language models using one of the language modeling toolkits mentioned below.
- Build a grapheme-to-phoneme converter. The input will be a word and the output a sequence of phonemes. You may train and test your model using a pronunciation dictionary, e.g., the CMU Pronouncing Dictionary.
- Speech recognition error simulation.
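For the grapheme-to-phoneme topic above, a first step is usually turning the pronunciation dictionary into (word, phoneme sequence) training pairs. The sketch below assumes the classic plain-text layout of the CMU Pronouncing Dictionary: one entry per line, the word followed by its phonemes, comment lines starting with `;;;`, and alternate pronunciations marked with a suffix such as `(1)`. The sample lines are illustrative, and the helper names `parse_cmudict` and `strip_stress` are made up for this example. Check the format of the release you download, since details differ between versions.

```python
import re


def parse_cmudict(lines):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, *phonemes = line.split()
        word = re.sub(r"\(\d+\)$", "", word)  # drop variant markers like "(1)"
        lexicon.setdefault(word, []).append(phonemes)
    return lexicon


def strip_stress(phonemes):
    """Remove the trailing stress digit from vowel phonemes (AE1 -> AE)."""
    return [re.sub(r"\d$", "", p) for p in phonemes]


# Illustrative lines in the assumed format.
sample = [
    ";;; comment line",
    "CAT  K AE1 T",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]
lexicon = parse_cmudict(sample)
```

Keeping every variant pronunciation for a word, rather than only the first, lets you decide later whether to train on all of them or to count a prediction as correct if it matches any one.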
NLU topics
- Recognition of multiword expressions such as compound nouns, proper names, and idioms.
- Word sense disambiguation.
- Unsupervised learning of lexical semantics.
- Learn a probabilistic context-free grammar from a corpus.
- Build a limited-domain NLU system using a parser.
- Semantic role labeling.
- Automatically grade student answers.
- Detection of grammatical errors, e.g., wrong prepositions.
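For the topic of learning a probabilistic context-free grammar from a corpus, the simplest baseline is relative-frequency estimation over a treebank: count how often each rule occurs and divide by the count of its left-hand side. The sketch below assumes trees are given as nested tuples of the form `(label, child, ...)`, with a word as the single string child of its tag. That representation, the toy treebank, and the function names are assumptions for illustration only; a real project would read trees from an annotated corpus and would probably need smoothing.

```python
from collections import Counter


def rules(tree):
    """Yield (lhs, rhs) for every node of a tree given as (label, child, ...)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield label, (children[0],)  # lexical rule: tag -> word
        return
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Hypothetical two-sentence treebank.
treebank = [
    ("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark"))),
    ("S", ("NP", ("N", "cats")), ("VP", ("V", "chase"), ("NP", ("N", "dogs")))),
]
pcfg = estimate_pcfg(treebank)
```

In this toy treebank, `VP` expands once as `V` and once as `V NP`, so each of those rules receives probability 0.5, and the probabilities of all rules sharing a left-hand side sum to one.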
Discourse topics
- Coreference resolution.
- Information extraction.
- Discourse segmentation.
- Discourse parsing.
- Build a simple natural language generation system.
- Build a simple summarization system.
Dialogue topics
- Build a limited-domain goal-oriented dialogue system.
- Build a simple chat-oriented dialogue system.
- Develop dialogue management strategies using reinforcement learning.
Help in finding data
Useful tools and resources for speech recognition
Useful tools and resources for language modeling
Useful tools and resources for speech synthesis
Useful tools for building dialogue systems