
Task Feature Roadmap

rubenwolff edited this page Apr 11, 2013 · 3 revisions
  • Implement a class that takes a sentence and shortens it. One use is to take the most important sentence in the article and shorten it to the maximum summary length.

  • Implement an interface that abstracts the requirements of the different summarizing methods, so that we can easily combine several methods or use different methods for different types of articles.
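
A minimal sketch of what such an interface could look like, written in Python for brevity. All names here (`Summarizer`, `LeadSentenceSummarizer`, `LongestSentenceSummarizer`, `pick_summarizer`) are hypothetical, and the two concrete methods are placeholders rather than the project's actual summary methods.

```python
from abc import ABC, abstractmethod


class Summarizer(ABC):
    """Hypothetical common interface for every summarizing method."""

    @abstractmethod
    def summarize(self, article: str, max_length: int) -> str:
        """Return a summary of `article` of at most `max_length` characters."""


class LeadSentenceSummarizer(Summarizer):
    """Placeholder method: take the first sentence and cut it to fit."""

    def summarize(self, article: str, max_length: int) -> str:
        first = article.split(". ")[0].strip()
        return first[:max_length]


class LongestSentenceSummarizer(Summarizer):
    """Placeholder method: take the longest sentence and cut it to fit."""

    def summarize(self, article: str, max_length: int) -> str:
        sentences = [s.strip() for s in article.split(". ") if s.strip()]
        longest = max(sentences, key=len) if sentences else ""
        return longest[:max_length]


def pick_summarizer(article_type: str, registry: dict, default: Summarizer) -> Summarizer:
    """Choose a method by article type, falling back to `default`."""
    return registry.get(article_type, default)
```

Because callers only depend on `summarize`, swapping or mixing methods per article type becomes a lookup in `registry` instead of a code change.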

  • Download a news article corpus (AP or otherwise) and perform some meaningful data mining task on it. Potential tasks:

    • tf-idf weighting
    • Topic classification for LSI approach
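
As a rough illustration of the tf-idf task, the sketch below weights each term by raw term frequency times log(N / df). This is only one common variant, and the whitespace tokenizer and the function name `tf_idf` are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (count of term in document) * log(N / number of documents with term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights
```

A term that appears in every document gets weight 0, so words such as "the" drop out without a stop-word list.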
  • Use a generative approach to create multiple summaries for each article (optional).

  • Additions to Summary method 2 (freqNE method):

    • Find the Stanford NLP API class that does anaphora resolution, so that we get a more accurate count of the most-mentioned named entities (NEs).
    • Add a new statistic to the feature extraction class that finds which verbs are used between any two given NEs.
    • Add new statistics on how often two NEs occur together in a sentence. Also include how often an NE and a non-NE noun occur together.
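
The co-occurrence statistic could be sketched as below. The example assumes the entities of each sentence have already been extracted (for instance by a named entity recognizer) and are passed in as plain strings. The function name `ne_cooccurrence` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def ne_cooccurrence(sentences_entities):
    """Count how often each unordered pair of entities shares a sentence.

    `sentences_entities` holds one collection of entity strings per sentence.
    Pairs are stored as alphabetically sorted tuples.
    """
    counts = Counter()
    for entities in sentences_entities:
        # set() so an entity repeated within a sentence is counted once.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts
```

The same function would cover the NE and non-NE noun case if the nouns of each sentence were added to the per-sentence collections.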

  • Summary creation method 3:

    • Prerequisite: find a large news article corpus.
    • Hash all the articles and group them into categories using LSI, NMF, or LDA.
    • Find the n-gram frequencies for all the articles in each category and aggregate those counts.
    • Read the test article and find its category.
    • Find the 3 most frequent NEs in the test article.
    • Create the most probable sentence from this category of articles using the 2-gram method from the NLP class, but constrain the solution space with the condition that the solution must contain the top 3 NEs from the test article itself.
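
The last step might be approached as in the sketch below. It assumes each NE is a single token, estimates bigram probabilities by maximum likelihood with no smoothing, and caps the sentence length so that the search terminates. The names `train_bigrams` and `best_sentence` and the `max_len` parameter are hypothetical, and the dynamic-programming search is one possible choice, not necessarily the method from the NLP class.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def train_bigrams(sentences):
    """Estimate P(next | prev) from tokenized sentences (maximum likelihood)."""
    pair_counts = defaultdict(Counter)
    for tokens in sentences:
        padded = [START] + list(tokens) + [END]
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[prev][nxt] += 1
    model = {}
    for prev, nexts in pair_counts.items():
        total = sum(nexts.values())
        model[prev] = {nxt: count / total for nxt, count in nexts.items()}
    return model


def best_sentence(model, required, max_len=12):
    """Most probable sentence of at most `max_len` tokens containing all `required`.

    Dynamic programming over states (last token, required tokens seen so far),
    one layer per sentence length. Returns a token list, or None if no
    sentence under the model satisfies the constraint.
    """
    required = frozenset(required)
    # state -> (log probability, tokens so far)
    frontier = {(START, frozenset()): (0.0, [])}
    best = (-math.inf, None)
    for _ in range(max_len + 1):
        next_frontier = {}
        for (last, seen), (logp, tokens) in frontier.items():
            for nxt, p in model.get(last, {}).items():
                new_logp = logp + math.log(p)
                if nxt == END:
                    # Only accept endings once every required token is present.
                    if seen == required and new_logp > best[0]:
                        best = (new_logp, tokens)
                    continue
                if len(tokens) == max_len:
                    continue
                state = (nxt, seen | ({nxt} & required))
                if state not in next_frontier or new_logp > next_frontier[state][0]:
                    next_frontier[state] = (new_logp, tokens + [nxt])
        frontier = next_frontier
    return best[1]
```

With three required NEs the state space stays small (vocabulary size times 8 subsets per layer), so the search is exact for the given length cap. The result can be a sentence that never appears verbatim in the corpus.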