Provide API to allow to avoid fulltext index #34

Open
kelson42 opened this issue Mar 31, 2020 · 3 comments

@kelson42
Contributor

Currently, skipping the fulltext index seems to be done implicitly, depending on whether or not a language is given for indexing. This sounds wrong, because the language is also needed for stemming in the Xapian title index.

@kelvinhammond
Collaborator

@kelson42 @mgautierfr Will setting `shouldIndex = false` avoid the fulltext index, or will it also break other things? See Line 112 and Line 132.

I'm not sure what `data->nbIndexArticles++;` on Line 112 does or what it is used for.

It also seems that setting `withIndex` to false would handle this, and that it is probably the proper way to do it.

There is a `setIndexing` function. Currently this is set to false if `fullTextLanguage.empty()`. We could also expose `setIndexing` on the `ZimCreatorWrapper` in JavaScript and TypeScript, allowing a script to customize or disable indexing. However, based on Line 85, a user may need to call it before they start writing, because the creator appears to have been created, and indexing started, by then. Looking at it again, though, it would probably still skip part of the indexing step, since `withIndex` would be false later, so this may work. Please confirm. I can write the code for this if this is the solution.

@kelson42
Contributor Author

kelson42 commented May 5, 2020

> Currently this is set to false if fullTextLanguage.empty()

Yes, this is the part that is wrong. It should probably be fixed as part of #36.

@mgautierfr

Line 112 is only used for logging.

The `shouldIndex` method answers the question "Should we index this article so that the user can search for it?". The answer depends on the "type" of the article, not on whether you want a fulltext index. Usually you want to index HTML articles and not images, CSS or JS files, but that is not always the case: some HTML articles may need to be excluded for some reason, or you may want to let users search for images (although that is not technically possible for now).
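
A minimal sketch of the kind of per-article policy described above, assuming a rule based on the MIME type. `ArticleInfo`, `excludedFromSearch` and this free `shouldIndex` function are hypothetical and only model the behaviour; they are not the libzim `Article::shouldIndex` API.

```cpp
#include <string_view>

// Hypothetical stand-in for what a scraper knows about one article.
// Not libzim API: it only illustrates what Article::shouldIndex decides on.
struct ArticleInfo {
    std::string_view mimeType;  // e.g. "text/html", "image/png"
    bool excludedFromSearch;    // an HTML page that should stay out of search results
};

// Per-article decision: depends on what the article is, not on whether
// fulltext indexing is enabled for the whole ZIM file.
constexpr bool shouldIndex(const ArticleInfo& article) {
    // Default rule assumed here: only HTML pages are searchable;
    // images, CSS and JS are not. The comparison is an exact match, so a
    // MIME type with parameters (e.g. "text/html; charset=utf-8") is not matched.
    return article.mimeType == "text/html" && !article.excludedFromSearch;
}
```

With this rule, `shouldIndex({"text/html", false})` is true while `shouldIndex({"image/png", false})` is false. Letting users search for images later would only mean changing this per-article rule, independently of any fulltext setting.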

If you want to deactivate the fulltext index, you must use `Creator::setIndexing(bool indexing, std::string language)`.
The function name is not ideal here: if `indexing` is false, fulltext indexing is deactivated, whatever `Article::shouldIndex` returns.
Title indexing is always activated (as long as `Article::shouldIndex` returns true and libzim is compiled with Xapian).
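
The rules above can be summarised as a small decision model. This is a hypothetical sketch, not libzim code: `IndexingConfig`, `indexTitle`, `indexFulltext` and `legacyFulltextEnabled` are invented names, and it assumes that the fulltext index, like the title index, is only built when libzim has Xapian support. The last function models the behaviour criticised at the start of this issue, where indexing is switched off whenever no fulltext language is given.

```cpp
#include <string_view>

// Hypothetical model of the creator-level indexing settings (not libzim API).
struct IndexingConfig {
    bool fulltextEnabled;       // the `indexing` argument of Creator::setIndexing
    std::string_view language;  // stemming language, needed by the title index too
    bool builtWithXapian;       // whether libzim was compiled with Xapian
};

// Title index: only Article::shouldIndex and Xapian support matter.
constexpr bool indexTitle(const IndexingConfig& cfg, bool articleShouldIndex) {
    return cfg.builtWithXapian && articleShouldIndex;
}

// Fulltext index: additionally requires the creator-level flag, so when the
// flag is off nothing is fulltext-indexed, whatever shouldIndex returns.
constexpr bool indexFulltext(const IndexingConfig& cfg, bool articleShouldIndex) {
    return cfg.builtWithXapian && cfg.fulltextEnabled && articleShouldIndex;
}

// Coupling criticised in this issue: the flag is derived from the language,
// so asking for "no fulltext" also means giving up the stemming language.
constexpr bool legacyFulltextEnabled(std::string_view fullTextLanguage) {
    return !fullTextLanguage.empty();
}
```

Keeping `fulltextEnabled` and `language` as separate fields is the point of the issue: a title-only ZIM file can then be described as `IndexingConfig{false, "eng", true}` (the language code is only an example), which still carries a language for stemming.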

kelson42 pinned this issue on Jul 6, 2020
kelson42 added this to the 2.5.0 milestone on Oct 28, 2021
kelson42 modified the milestones: 3.0.0, 3.1.0 on May 4, 2023
kelson42 modified the milestones: 3.1.0, 3.2.0, 3.3.0 on Dec 3, 2023
kelson42 modified the milestones: 3.3.0, 3.4.0 on Jun 24, 2024
Projects
None yet
Development

No branches or pull requests

3 participants