CSCI 544 Applied NLP: Proposing a Group Project

Due Date: October 14 by 11:59pm. Early submissions are welcome!

For your group project, you will need to form a group of three to four students and pick an NLP-related task. To start, your group must submit a proposal by filling out the form here (only ONCE per group):

Online proposal submission.

The form asks you to describe:

the proposed NLP task
the data you will use
any annotation that is part of the project
how you plan to tackle the problem
how you plan to evaluate your final product

You will need to list each group member and the tasks they will perform. Each group member will write an individual report on the project and receive an individual grade. In addition, the group is required to give an in-class presentation. Currently, we have allocated the last three lectures for the presentations, but we may need to add more lectures depending on the number of groups.

After you submit your project proposal, your group will receive an email from either Mark or Kallirroi. The email will either approve the proposal or give comments and ask for revisions.

Question: Can I form a group with students from the other section?
Answer: Yes, provided the entire group can attend either the 4pm or the 6pm lecture on the day the group is presenting.

Help in picking a topic

Domain adaptation

Domain adaptation is a topic that can be explored in a variety of NLP tasks. For many of these tasks, there are large corpora and software tools trained on those corpora. However, the performance of these tools may drop when they are used in a domain different from that of the training corpus. The idea of domain adaptation is to annotate a small amount of training data in the new domain and build a model from both the large out-of-domain corpus and the small in-domain corpus. This can be applied to NLP tasks such as part-of-speech tagging, named entity recognition, and parsing.

D. McClosky, E. Charniak and M. Johnson. (2010). Automatic domain adaptation for parsing. NAACL.
H. Daumé III. (2007). Frustratingly easy domain adaptation. ACL.
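
As a rough illustration of how the two corpora can be combined, the sketch below follows the feature augmentation idea from the 2007 ACL paper listed above: every feature is copied into a shared version and a domain-specific version, and any standard supervised learner is then trained on the union of the augmented data. The function name `augment`, the `general`/`source`/`target` prefixes, and the toy feature names are illustrative assumptions, not part of any particular toolkit.

```python
from typing import Dict


def augment(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Return a copy of `features` with a shared and a domain-specific version of each one.

    `domain` is a label such as "source" (large out-of-domain corpus)
    or "target" (small in-domain corpus). The prefixes are hypothetical names.
    """
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value   # copy shared across domains
        augmented[f"{domain}:{name}"] = value  # copy seen only in this domain
    return augmented


# Toy examples: one token from each corpus, with made-up features.
source_example = augment({"word=bank": 1.0, "suffix=nk": 1.0}, "source")
target_example = augment({"word=bank": 1.0}, "target")

# Features the two examples have in common after augmentation.
shared = set(source_example) & set(target_example)
print(sorted(shared))
```

The learner can then put weight on the `general:` copy when a feature behaves the same way in both domains, and on the domain-specific copy when it does not.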

Speech recognition / speech synthesis topics

Build a limited-domain speech recognizer. You may use out-of-the-box acoustic models, or adapt out-of-the-box acoustic models to a particular speaker. In the latter case you will have a speaker-dependent speech recognizer. You can build domain-specific language models using one of the language modeling toolkits mentioned below.
Build a grapheme-to-phoneme converter. The input will be a word and the output a sequence of phonemes. You may train and test your model using a pronunciation dictionary, e.g., the CMU Pronouncing Dictionary.
Speech recognition error simulation.

NLU topics

Recognition of multiword expressions such as compound nouns, proper names, and idioms.
Word sense disambiguation.
Unsupervised learning of lexical semantics.
Learn a probabilistic context-free grammar from a corpus.
Build a limited-domain NLU system using a parser.
Semantic role labeling.
Automatically grade student answers.
Detection of grammatical errors, e.g., wrong prepositions.
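
For the topic of learning a probabilistic context-free grammar from a corpus, a simple starting point is relative-frequency estimation over a treebank: count how often each rule is used and divide by the count of its left-hand side. The sketch below assumes trees are given as nested tuples of the form `(label, child, ...)` with words as plain strings; the tuple format, the function names, and the toy treebank are illustrative assumptions only.

```python
from collections import Counter


def rules(tree):
    """Yield (lhs, rhs) productions from a tree given as (label, child, ...).

    Children are either subtrees (tuples) or words (strings).
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Toy treebank with two hand-built trees (made-up data).
toy_treebank = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "cats"), ("VP", ("V", "chase"), ("NP", "dogs"))),
]

pcfg = estimate_pcfg(toy_treebank)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

A real project would read trees from an annotated corpus and would likely need smoothing for rules that never appear in the training data.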