Text Extracting #1

Open
maoo opened this issue Oct 2, 2014 · 7 comments

Comments

@maoo
Contributor

maoo commented Oct 2, 2014

Hi,

ManifoldCF uses the Solr extracting update handler to handle binary content: the binary content is sent to Solr, and Tika tries to extract the text content and some metadata (MIME type).
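
A minimal Java sketch of the kind of request the extracting handler receives, assuming a Solr core at a hypothetical URL with the ExtractingRequestHandler mapped to `/update/extract`. It only builds a `java.net.http.HttpRequest` (Java 11+) and never sends it; it is an illustration of the mechanism, not ManifoldCF's actual Solr output connector code.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds (but does not send) a request to Solr's extracting update handler. */
public class SolrExtractRequest {

    // Assumption: the Solr core exposes ExtractingRequestHandler at /update/extract.
    public static HttpRequest build(String solrCoreUrl, String docId,
                                    String mimeType, byte[] content) {
        // literal.id sets the document's unique key; Tika runs server-side
        // and extracts the text and metadata from the raw bytes in the body.
        String query = "literal.id=" + URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder()
                .uri(URI.create(solrCoreUrl + "/update/extract?" + query))
                .header("Content-Type", mimeType) // hints the MIME type to Tika
                .POST(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical core URL and a few bytes standing in for a PDF.
        HttpRequest req = build("http://localhost:8983/solr/collection1", "doc 1",
                "application/pdf", new byte[] {0x25, 0x50, 0x44, 0x46});
        System.out.println(req.method() + " " + req.uri());
    }
}
```

With this approach the binary crosses the wire to Solr and all text extraction happens inside Solr, which is exactly what the issue proposes to replace with Alfresco's own transformers.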

For the Alfresco connector, Alfresco itself should be used to convert binary content to text, as the official Solr integration does (by calling NodeContentGet), because Alfresco already knows how to convert documents to text.

But the NodeContentGet webscript is protected by certificate authentication, so you would have to clone this webscript.

(original issue - philipmeadows/alfresco-webscript-manifold-connector#21 by @alexist )

@maoo
Contributor Author

maoo commented Oct 2, 2014

The Manifold Alfresco connector could invoke NodeContentGet (over HTTP or HTTPS; both are available) during Manifold's processDocument; this would imply:

  • Adding the right logic to alfresco-indexer-client
  • Invoking alfresco-indexer-client from the Alfresco Manifold Connector
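
The two steps above could fit into processDocument roughly as follows. This is a hypothetical Java sketch: `AlfrescoTextFetcher` stands in for the new logic in alfresco-indexer-client, and falling back to Solr/Tika extraction when Alfresco returns no text is an assumed design choice, not something decided in this thread.

```java
import java.util.Optional;

/** Sketch of the choice a connector's processDocument step could make. */
public class TextExtractionChoice {

    /** Hypothetical: returns Alfresco's text rendition of a node, if it produced one. */
    public interface AlfrescoTextFetcher {
        Optional<String> fetchText(String nodeRef);
    }

    public enum Route { ALFRESCO_TEXT, SOLR_TIKA_EXTRACT }

    public static final class Decision {
        private final Route route;
        private final String text; // null when Solr/Tika should do the extraction

        Decision(Route route, String text) {
            this.route = route;
            this.text = text;
        }

        public Route route() { return route; }
        public String text() { return text; }
    }

    public static Decision decide(String nodeRef, AlfrescoTextFetcher fetcher) {
        // Prefer Alfresco's own transformers; otherwise (assumed fallback)
        // keep today's behaviour of sending the binary to /update/extract.
        return fetcher.fetchText(nodeRef)
                .map(text -> new Decision(Route.ALFRESCO_TEXT, text))
                .orElseGet(() -> new Decision(Route.SOLR_TIKA_EXTRACT, null));
    }

    public static void main(String[] args) {
        // Stub fetcher standing in for a real alfresco-indexer-client call.
        AlfrescoTextFetcher stub = nodeRef -> Optional.of("extracted by Alfresco");
        System.out.println(decide("workspace://SpacesStore/abc", stub).route());
    }
}
```

Keeping the fetch behind an interface lets the connector be tested without a running Alfresco and leaves the transport (certificate, HTTP, or standard authentication) to the client library.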

@alexist

alexist commented Oct 2, 2014

But NodeContentGet is protected by a Solr-specific authentication mechanism (a certificate). Is there another way to call this webscript over HTTP, without a certificate?

@maoo
Contributor Author

maoo commented Oct 2, 2014

@alexist

alexist commented Oct 2, 2014

When SSL is disabled, the Solr webscripts are accessible without any authentication. I'm not sure that's a good idea, and you would need to protect these webscripts some other way.
Furthermore, you have to patch web.xml in order to disable SSL, which is also not a good idea.

I think exposing this webscript through the standard authentication mechanism would solve these problems.
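
A minimal sketch of what calling such a clone could look like from the connector side, assuming the clone is registered at a hypothetical path `/alfresco/service/custom/textContent`, keeps a `nodeId` query parameter, and accepts HTTP Basic credentials under standard user authentication. The request is only built, not sent, and none of these names come from Alfresco itself.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds a request to a hypothetical clone of NodeContentGet using Basic auth. */
public class ClonedTextContentRequest {

    // Hypothetical path of the cloned webscript (not an Alfresco-provided URL).
    static final String CLONED_PATH = "/alfresco/service/custom/textContent";

    public static HttpRequest build(String alfrescoBaseUrl, String nodeId,
                                    String user, String password) {
        // Standard HTTP Basic credentials: base64("user:password").
        String credentials = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        // Assumption: the clone keeps a nodeId query parameter like the original.
        String query = "nodeId=" + URLEncoder.encode(nodeId, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder()
                .uri(URI.create(alfrescoBaseUrl + CLONED_PATH + "?" + query))
                .header("Authorization", "Basic " + credentials)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("http://localhost:8080", "42", "admin", "admin");
        System.out.println(req.uri());
    }
}
```

Unlike disabling SSL for the Solr webscripts, this keeps the endpoint behind a normal Alfresco login, so web.xml does not need to be patched; the credentials should still travel over HTTPS in production.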

@maoo
Contributor Author

maoo commented Oct 2, 2014

The all-in-one archetype is configured to use HTTP (no SSL) for Alfresco-Solr communication (in both directions):

https://artifacts.alfresco.com/nexus/content/repositories/alfresco-docs/alfresco-lifecycle-aggregator/latest/archetypes/alfresco-allinone-archetype/usage.html

@alexist

alexist commented Oct 2, 2014

The Maven SDK disables SSL during the development phase, not in a production environment...

@maoo
Contributor Author

maoo commented Oct 2, 2014

True, but it shows how to patch the Alfresco web.xml in order to disable SSL.


2 participants