Compute TFIDF iteratively #24

minottic · 2022-07-08T15:30:48Z

I think that at the cost of some storage space, the TFIDF score can be computed iteratively, without having to run the whole computation from the beginning when a new document is added.

The idea is:

1. for each (item, term) you store its TF at time T and its TFIDF at time T
2. you store Tc at time T
3. for each term you store T(t) at time T
4. now let's say that at time T+1 you add one new item
5. you increment Tc by 1 and store it
6. for each term you increment T(t) by 1 if the new document contains t, and store it
7. using the updated counts from step 6, you compute the TF and TFIDF of the new document at time T+1 and store them
8. using the updated counts from step 6, you update the TF and TFIDF of the old documents at time T+1
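
The steps above could be sketched roughly as follows. This is only an illustration and not code from this repository: the class name `IncrementalTfidf`, its attribute names, and the unsmoothed formula `idf = log(Tc / T(t))` are all assumptions, with Tc read as the total number of documents and T(t) as the number of documents containing term t. Under this standard definition the TF of an old document does not change when a new one is added, so only the IDF factor has to be reapplied to the stored TF values, and no old document needs to be re-read or re-tokenized.

```python
import math
from collections import Counter, defaultdict


class IncrementalTfidf:
    """Hypothetical sketch: keeps Tc, T(t) and per-document TF between updates."""

    def __init__(self):
        self.doc_count = 0                # Tc: number of documents seen so far
        self.doc_freq = defaultdict(int)  # T(t): number of documents containing t
        self.tf = {}                      # doc_id -> {term: TF}, fixed once computed
        self.tfidf = {}                   # doc_id -> {term: TFIDF}

    def _idf(self, term):
        # Assumed unsmoothed IDF; other variants add smoothing terms.
        return math.log(self.doc_count / self.doc_freq[term])

    def add_document(self, doc_id, tokens):
        if doc_id in self.tf:
            raise ValueError(f"document {doc_id!r} was already added")

        counts = Counter(tokens)
        total = sum(counts.values())

        # Increment Tc, and T(t) for every distinct term of the new document.
        self.doc_count += 1
        for term in counts:
            self.doc_freq[term] += 1

        # TF of the new document: computed once and stored.
        self.tf[doc_id] = {t: c / total for t, c in counts.items()}

        # Tc changed, so every IDF changed: rescale the stored TF values
        # of all documents (old and new) without touching their text.
        for d, tfs in self.tf.items():
            self.tfidf[d] = {t: tf * self._idf(t) for t, tf in tfs.items()}


model = IncrementalTfidf()
model.add_document("a", ["x", "y"])
model.add_document("b", ["x", "z"])
print(model.tfidf["a"])
```

The refresh loop still visits every stored (item, term) pair, because a change in Tc affects the IDF of all terms. The saving is that it works from the stored TF values instead of recomputing everything from the raw documents.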

All this only makes sense if I understood correctly how TFIDF works :)
