Add option to set path to custom tika config file #1367
@iadcode Thanks a lot! It looks very good. Do you think you could add:
? Thanks again!
Will do, thank you!
Add Tika Config Path Tests and Documentation
Fantastic!
Thanks a lot for your PR. I did a first review and left some comments.
Please let me know if it's unclear.
- Changed link to dynamic for Tika Configuration Apache documentation
- Moved Tika configuration file(s)
- Added early fail for Tika config file not found
- Updated exception handling
Review Changes
Thank you David, those notes were all clear and very helpful. I'm sure I've missed some things, though, so please let me know if anything else needs an update!
It looks good to me. Could you solve the remaining conflict with the master branch?
Thanks a lot, @iadcode!
The path to an XML Tika config file can be added to _settings.yaml. If the path is included and the file exists, a custom Tika parser is used.
I looked at #498, but it's for a previous version of fscrawler and I didn't fully understand the design. I hope this is ok as a new pull request.
Thanks!
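To illustrate the behaviour discussed in this thread (an optional config path, with an early failure when the file is missing), here is a minimal sketch in plain Java. It is not the code from this PR: the class name `TikaConfigPathSketch` and the method `resolveTikaConfigPath` are hypothetical, and the sketch only validates the path. A real implementation would presumably hand the resolved path to Tika (for example through its `TikaConfig` class) to build the custom parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not taken from the PR: decides whether a custom
// Tika config file should be used, based on an optional setting value.
public class TikaConfigPathSketch {

    /**
     * Returns the config file path if one was configured and exists.
     * Returns empty when nothing was configured (use the default parser).
     * Fails early when a path was configured but no such file exists.
     */
    public static Optional<Path> resolveTikaConfigPath(String configuredPath) {
        if (configuredPath == null || configuredPath.trim().isEmpty()) {
            // No custom config requested: caller falls back to the default parser.
            return Optional.empty();
        }
        Path path = Paths.get(configuredPath);
        if (!Files.isRegularFile(path)) {
            // Early fail, so a typo in the settings is reported at startup
            // instead of being silently ignored.
            throw new IllegalArgumentException(
                    "Tika config file not found: " + path.toAbsolutePath());
        }
        return Optional.of(path);
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway file to stand in for a real Tika config.
        Path tmp = Files.createTempFile("tika-config", ".xml");
        try {
            System.out.println("unset   -> custom config? "
                    + resolveTikaConfigPath(null).isPresent());
            System.out.println("present -> custom config? "
                    + resolveTikaConfigPath(tmp.toString()).isPresent());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Failing early rather than silently falling back follows the "early fail for Tika config file not found" change listed in the review commit above. Whether the merged code throws this particular exception type is an assumption.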