Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to set path to custom tika config file #1367

Merged
merged 9 commits into from
Mar 17, 2022
Merged

Add option to set path to custom tika config file #1367

merged 9 commits into from
Mar 17, 2022

Conversation

iadcode
Copy link
Contributor

@iadcode iadcode commented Jan 27, 2022

Path to xml tika config file can be added to _settings.yaml. If path is included and file exists, custom tika parser is used.
I looked at #498, but it's for a previous version of fscrawler and I didn't fully understand the design. I hope this is ok as a new pull request.
Thanks!

@dadoonet
Copy link
Owner

@iadcode thanks a lot!

it looks very good.

do you think you could add:

  • tests
  • Integration tests
  • Documentation

?

thanks again!

@dadoonet dadoonet added the new For new features or options label Jan 27, 2022
@dadoonet dadoonet added this to the 2.10 milestone Jan 27, 2022
@iadcode
Copy link
Contributor Author

iadcode commented Jan 27, 2022

Will do, thank you!

@dadoonet dadoonet self-assigned this Feb 1, 2022
@dadoonet dadoonet self-requested a review February 1, 2022 22:07
@iadcode
Copy link
Contributor Author

iadcode commented Feb 14, 2022

Hi @dadoonet,
I believe LGTM JavaScript analysis was already failing (it is marked as failed for #1376, and I do not believe my changes would have impacted any JS code). Do you need any changes from me before this can be reviewed?
Thanks!

Copy link
Owner

@dadoonet dadoonet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic!

Thanks a lot for your PR. I did a first review and left some comments.
Please let me know if it's unclear.

iadcode and others added 3 commits February 25, 2022 15:45
 - Changed link to dynamic for Tika Configuration apache documentation
 - Moved tika configuration file(s)
 - Added early fail for tika config file not found
 - Updated exception handling
@iadcode
Copy link
Contributor Author

iadcode commented Feb 25, 2022

Thank you David, those notes were all clear and very helpful. I'm sure I've missed some things though so please let me know if anything else needs an update!

@iadcode iadcode requested a review from dadoonet March 15, 2022 23:52
Copy link
Owner

@dadoonet dadoonet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. Could you solve the remaining conflict with the master branch?

@mergify mergify bot merged commit c16b4c1 into dadoonet:master Mar 17, 2022
@dadoonet
Copy link
Owner

Thanks a lot @iadcode !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new For new features or options
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants