
Candidate topics

Samual Krish Ravichandran edited this page Mar 21, 2016 · 16 revisions
  • Lyric translation from English to other languages that preserves meaningful sentences

  • Document classification/recommendation methods that give users more control

    • For instance, we can split a single Twitter timeline into multiple streams grouped by topic
    • Or we can filter uninteresting tweets out of a given timeline
    • Other features such as account, retweet/share, like, and conversation context could be used
      • Which means we have an opportunity to build a model on top of these features
    • We need to crawl data from Twitter/Facebook/...
  • SIGMORPHON 2016 Shared Task: Morphological Reinflection

    • Well-defined problem
    • Dataset and a baseline implementation are given
  • Transcription system between languages

  • Language Identification

    • Identify which language is used for a given document
    • Well-defined problem with easy-to-find dataset (Wikipedia)
    • Still has room for improvement
  • Predict closed questions on Stack Overflow

  • Is it possible to apply NLP techniques to a programming language implementation?

    • Some topics could benefit from NLP techniques
      • Parsing incomplete/incorrect code
      • Using natural-language information (comments, function/class names, ...) for static analysis
    • Though it is unclear whether a programming language can be considered "a language other than English"
  • Automatic synthesis of a constructed language

    • Given the short time frame, full construction seems infeasible
    • Need to focus on a specific structure, such as phonetics or morphology
    • Can be used to test how well a particular NLP technique works on an unknown language
  • Identifying a person's geographic origin based on their sentence formation

  • Augmenting procedural content generation with NLP techniques (somewhat related to narrative generation)

    • Automatic generation of natural-language dialog under user-given constraints
    • e.g., rule-based quest-line generation is not that hard, but the result usually lacks plausible dialog
  • Develop a scoring algorithm for student-written short-answer responses

    • The problem is that we don't have datasets in languages other than English
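
To make the Language Identification topic above more concrete, here is a minimal sketch of one possible baseline: character trigram profiles compared by cosine similarity. Everything in it is an assumption for illustration only. The names `char_ngrams`, `cosine`, `train`, and `identify` are hypothetical, and the three one-sentence samples are toy placeholders standing in for a real corpus such as Wikipedia text.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams in lowercased, whitespace-normalized text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy placeholder samples (assumption); a real system would train on far more text.
TRAINING_TEXT = {
    "en": "the quick brown fox jumps over the lazy dog and this is "
          "a simple sentence in the english language",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "das ist ein einfacher satz in der deutschen sprache",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esta "
          "es una frase sencilla en el idioma español",
}


def train(samples):
    """Build one trigram profile per language."""
    return {lang: char_ngrams(text) for lang, text in samples.items()}


def identify(text, profiles):
    """Return the language whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


profiles = train(TRAINING_TEXT)
print(identify("this is the language of the dog", profiles))
```

With training data this small the sketch is only reliable on inputs that resemble the samples; it is meant to show the shape of a baseline, not a result.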