Task Feature Roadmap
- Implement a class that takes a sentence and shortens it. One use is to take the most important sentence in an article and shorten it to the maximum summary length.
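A minimal sketch of what such a class could look like, assuming the maximum summary length is measured in characters. The name `SentenceShortener` and the two heuristics (drop parenthetical asides, then cut at a word boundary) are illustrative assumptions, not a settled design:

```python
import re


class SentenceShortener:
    """Hypothetical shortener: trims a sentence to at most max_length characters."""

    def __init__(self, max_length: int):
        if max_length < 1:
            raise ValueError("max_length must be positive")
        self.max_length = max_length

    def shorten(self, sentence: str) -> str:
        text = " ".join(sentence.split())  # normalize whitespace
        if len(text) <= self.max_length:
            return text
        # First try dropping parenthetical asides such as "(elected in 2010)".
        text = re.sub(r"\s*\([^)]*\)", "", text)
        if len(text) <= self.max_length:
            return text
        # Fall back to keeping as many whole leading words as fit.
        kept = []
        length = 0
        for word in text.split(" "):
            extra = len(word) + (1 if kept else 0)  # +1 for the joining space
            if length + extra > self.max_length:
                break
            kept.append(word)
            length += extra
        # If even the first word is too long, hard-truncate it.
        return " ".join(kept) if kept else text[: self.max_length]
```

A smarter version would probably prune the parse tree rather than cut on word boundaries, but this is enough to plug into a summarizer and test length limits.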
- Implement an interface that abstracts the requirements of the different summarization methods, so that we can easily combine several methods or use different methods for different types of articles.
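One possible shape for this interface, sketched with Python's `abc` module. All class names here (`Summarizer`, `LeadSentenceSummarizer`, `LongestSentenceSummarizer`, `TypeRoutingSummarizer`) and the naive `". "` sentence split are assumptions made for illustration only:

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Summarizer(ABC):
    """Hypothetical common interface for all summarizing methods."""

    @abstractmethod
    def summarize(self, article: str, max_length: int) -> str:
        """Return a summary of at most max_length characters."""


class LeadSentenceSummarizer(Summarizer):
    """Toy method: use the first sentence of the article."""

    def summarize(self, article: str, max_length: int) -> str:
        return article.split(". ")[0].strip()[:max_length]


class LongestSentenceSummarizer(Summarizer):
    """Toy method: use the longest sentence of the article."""

    def summarize(self, article: str, max_length: int) -> str:
        sentences = [s.strip() for s in article.split(". ")]
        return max(sentences, key=len)[:max_length]


class TypeRoutingSummarizer(Summarizer):
    """Chooses a method by article type, falling back to a default."""

    def __init__(self, routes: Dict[str, Summarizer], default: Summarizer):
        self.routes = dict(routes)
        self.default = default

    def summarize(self, article: str, max_length: int,
                  article_type: Optional[str] = None) -> str:
        method = self.routes.get(article_type, self.default)
        return method.summarize(article, max_length)
```

Because the router is itself a `Summarizer`, it can be nested or swapped in wherever a single method is expected.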
- Download a news article corpus (AP or otherwise) and perform some meaningful data mining task on it. Potential tasks:
  - tf-idf weighting
  - Topic classification for the LSI approach
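For the tf-idf task, a minimal sketch using only the standard library. It assumes articles are already tokenized, and uses the plain variant (relative term frequency times `log(N / df)`); other weighting variants exist, and the function name `tf_idf` is just illustrative:

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that appears in every article gets weight 0, which is the behavior we want for filtering out words that do not discriminate between articles.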
- Use a generative approach to create multiple summaries for each article (optional).
- Additions to summary method 2 (freqNE method):
  - Find the Stanford NLP API class that performs anaphora resolution, so that we get a better count of the most-mentioned named entities (NE).
  - Add a new statistic to the feature extraction class that finds which verbs are used between any two given NEs.
  - Add new statistics on how often two NEs occur together in a sentence. Also include how often an NE and a non-NE noun occur together.
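The co-occurrence statistic could be sketched as follows. This assumes named-entity recognition (and ideally anaphora resolution) has already run upstream, so each sentence arrives as a list of entity strings; the function name `cooccurrence_counts` is hypothetical:

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def cooccurrence_counts(sentence_entities: Iterable[List[str]]) -> Counter:
    """Count how many sentences each unordered pair of entities shares."""
    pair_counts = Counter()
    for entities in sentence_entities:
        # Deduplicate within a sentence and sort so each pair has one key.
        unique = sorted(set(entities))
        pair_counts.update(combinations(unique, 2))
    return pair_counts
```

The same function would cover the NE and non-NE noun case if the nouns are included in each sentence's list, although the two kinds of pair may need to be told apart afterwards.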
- Summary creation method 3:
  - Prerequisite: find a large news article corpus.
  - Hash all the articles and group them into categories using LSI, NMF, or LDA.
  - Find the n-gram frequencies for all the articles in each category and aggregate those counts.
  - Read the test article and find its category.
  - Find the 3 most frequent NEs in the test article.
  - Create the most probable sentence from this category of articles using the 2-gram method from the NLP class, but constrain the solution space so that the solution must contain the top 3 NEs from the test article itself.
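The last step above can be sketched as a beam search over a bigram model. This is only one way to do it and makes several assumptions: each NE is a single token, the category corpus is already tokenized into sentences, probabilities are unsmoothed maximum-likelihood estimates, and the names `train_bigrams` and `generate` are hypothetical. Beam search is approximate, so it is not guaranteed to find the single most probable sentence:

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def train_bigrams(sentences):
    """Aggregate bigram counts over all tokenized sentences of a category."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = [START, *tokens, END]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, required, max_len=12, beam_width=50):
    """Return a high-probability sentence containing every required token,
    or None if no such sentence of at most max_len tokens is found."""
    # Each beam entry: (log probability, tokens so far, required tokens still missing)
    beam = [(0.0, (START,), frozenset(required))]
    best = None
    for _ in range(max_len + 1):
        candidates = []
        for logp, tokens, missing in beam:
            followers = counts.get(tokens[-1], {})
            total = sum(followers.values())
            for nxt, count in followers.items():
                new_logp = logp + math.log(count / total)
                if nxt == END:
                    # Accept a finished sentence only if the constraint holds.
                    if not missing and (best is None or new_logp > best[0]):
                        best = (new_logp, tokens[1:])
                else:
                    candidates.append((new_logp, tokens + (nxt,), missing - {nxt}))
        # Prefer hypotheses that cover more required tokens, then higher probability.
        candidates.sort(key=lambda c: (len(c[2]), -c[0]))
        beam = candidates[:beam_width]
        if not beam:
            break
    return list(best[1]) if best else None
```

A usage example on a toy category corpus:

```python
corpus = [
    ["obama", "met", "putin", "in", "geneva"],
    ["obama", "met", "merkel", "in", "berlin"],
    ["putin", "visited", "geneva"],
]
model = train_bigrams(corpus)
print(generate(model, {"obama", "putin", "geneva"}))
```

Note that the output can be a sentence that never appeared in the corpus, since the model only knows which word pairs are likely.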