CSCI 544 Applied NLP: Proposing a Group Project
Due Date: November 19, 2015 (10:59 AM PST)
For proposals submitted before November 6, we will try to send comments
by the following week. Later submissions may take longer to approve.
Failure to submit the proposal by the due date (November 19, 10:59 AM PST)
will result in zero credit for both the project and assignment 4 (because
assignment 4 is closely related to the project).
Online proposal submission.
Help in picking a topic
Domain adaptation
Domain adaptation is a topic that can be explored in a variety of NLP
tasks. For many of these tasks, there are large corpora and software
tools trained on these corpora. However, performance of the tools may
drop when used in domains different from the training corpus. The idea
of domain adaptation is to annotate a small amount of training data in
the new domain and create a model based on both the large
out-of-domain corpus and the small in-domain corpus. This can be
applied to NLP tasks such as part-of-speech tagging, named entity
recognition, and parsing.
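As a rough illustration of combining the two corpora, the sketch below trains a most-frequent-tag baseline in which every in-domain example counts several times as much as an out-of-domain example. This instance-weighting scheme is only one simple way to do domain adaptation and is not a required approach. The function names, the toy sentences, and the weight of 5.0 are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Toy corpora (hypothetical): each sentence is a list of (word, tag) pairs.
OUT_DOMAIN = [
    [("they", "PRP"), ("monitor", "VB"), ("the", "DT"), ("results", "NNS")],
    [("we", "PRP"), ("monitor", "VB"), ("the", "DT"), ("patients", "NNS")],
]
IN_DOMAIN = [
    [("the", "DT"), ("monitor", "NN"), ("flickers", "VBZ")],
]


def train_weighted_tagger(out_domain, in_domain, in_domain_weight=5.0):
    """Return a word -> tag lookup built from weighted tag counts.

    Out-of-domain examples count 1.0 each; in-domain examples count
    in_domain_weight each, so a small in-domain corpus can override
    the large out-of-domain corpus where the two disagree.
    """
    counts = defaultdict(Counter)
    for corpus, weight in ((out_domain, 1.0), (in_domain, in_domain_weight)):
        for sentence in corpus:
            for word, tag in sentence:
                counts[word.lower()][tag] += weight
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, words, default="NN"):
    """Tag each word with its stored tag, or a default tag if unseen."""
    return [(word, model.get(word.lower(), default)) for word in words]


adapted = train_weighted_tagger(OUT_DOMAIN, IN_DOMAIN, in_domain_weight=5.0)
unadapted = train_weighted_tagger(OUT_DOMAIN, IN_DOMAIN, in_domain_weight=1.0)
print(tag(adapted, ["The", "monitor", "flickers"]))
print(tag(unadapted, ["The", "monitor", "flickers"]))
```

In a real project the weight would be tuned on held-out in-domain data, and the lookup table would be replaced by a proper sequence model.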
Speech recognition / speech synthesis topics
- Build a limited-domain speech recognizer. You may use out-of-the-box acoustic models, or adapt out-of-the-box acoustic models to a particular speaker. In the latter case you will have a speaker-dependent speech recognizer. You can build domain-specific language models using one of the language modeling toolkits mentioned below.
- Build a grapheme-to-phoneme converter. The input will be a word and the output a sequence of phonemes. You may train and test your model using a pronunciation dictionary, e.g., the CMU Pronouncing Dictionary.
- Speech recognition error simulation.
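For the grapheme-to-phoneme topic, a first step is usually to turn the pronunciation dictionary into (word, phoneme sequence) pairs and to hold out some words for testing. The sketch below assumes a plain-text layout in the style of the CMU pronouncing dictionary: one entry per line, the word followed by its phonemes, comment lines starting with `;;;`, and alternate pronunciations marked with a suffix such as `(1)`. The sample lines, function names, and split policy are illustrative assumptions; check them against the dictionary file you actually download.

```python
import re

# Illustrative lines written in the style of the CMU pronouncing dictionary.
SAMPLE = """\
;;; comment lines start with three semicolons
CAT  K AE1 T
HELLO  HH AH0 L OW1
READ  R EH1 D
READ(1)  R IY1 D
"""

# Matches the "(1)", "(2)", ... suffix that marks an alternate pronunciation.
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_entries(text, keep_stress=False):
    """Return a list of (word, phonemes) pairs, one per dictionary line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        head, *phones = line.split()
        word = VARIANT_SUFFIX.sub("", head).lower()
        if not keep_stress:
            # Vowel phonemes carry a stress digit (0, 1, or 2); drop it.
            phones = [p.rstrip("012") for p in phones]
        entries.append((word, phones))
    return entries


def split_by_word(entries, test_every=10):
    """Hold out every test_every-th distinct word (in sorted order).

    Splitting by word rather than by line keeps all pronunciations of a
    word on the same side, so no test word is seen during training.
    """
    words = sorted({word for word, _ in entries})
    test_words = set(words[::test_every])
    train = [e for e in entries if e[0] not in test_words]
    test = [e for e in entries if e[0] in test_words]
    return train, test


entries = parse_entries(SAMPLE)
train, test = split_by_word(entries, test_every=3)
print(len(train), "training pairs;", len(test), "test pairs")
```

Whether to keep the stress digits depends on the task: predicting stress makes the problem harder, while dropping it gives a smaller phoneme inventory.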
NLU topics
- Recognition of multiword expressions such as compound nouns, proper names, and idioms.
- Word sense disambiguation.
- Unsupervised learning of lexical semantics.
- Learn a probabilistic context-free grammar from a corpus.
- Build a limited-domain NLU system using a parser.
- Semantic role labeling.
- Automatically grade student answers.
- Detection of grammatical errors, e.g., wrong prepositions.
Discourse topics
- Coreference resolution.
- Information extraction.
- Discourse segmentation.
- Discourse parsing.
- Build a simple natural language generation system.
- Build a simple summarization system.
Dialogue topics
- Build a limited-domain goal-oriented dialogue system.
- Build a simple chat-oriented dialogue system.
- Develop dialogue management strategies using reinforcement learning.
Help in finding data
Useful tools and resources for speech recognition
Useful tools and resources for language modeling
Useful tools and resources for speech synthesis
Useful tools for building dialogue systems