Text Extracting #1

maoo · 2014-10-02T09:26:25Z

Hi,

ManifoldCF uses the Solr extracting update handler to handle binary content. Binary content is sent to Solr, and Tika tries to extract the text content and some metadata (such as the MIME type).

For the Alfresco connector, Alfresco itself should be used to convert binary content to text, as the official Solr integration does (by calling NodeContentGet), because Alfresco already knows how to convert documents to text.

However, the NodeContentGet webscript is protected by a certificate, so you have to clone this webscript.

(original issue - philipmeadows/alfresco-webscript-manifold-connector#21 by @alexist )

maoo · 2014-10-02T09:26:32Z

The Manifold Alfresco connector could invoke NodeContentGet (over HTTP or HTTPS, both are available) during the Manifold processDocument; this would imply:

- Adding the right logic to alfresco-indexer-client
- Invoking alfresco-indexer-client from the Alfresco Manifold connector
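
As a rough illustration of the client-side logic, here is a minimal Java sketch that builds the request URL and fetches the extracted text. The webscript path `/service/api/solr/textContent`, the `nodeId` query parameter, and the class and method names are assumptions made for illustration; none of them come from alfresco-indexer-client, so check them against your Alfresco version before relying on them.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of alfresco-indexer-client.
final class TextContentClient {

    // Assumed path of the webscript backed by NodeContentGet.
    static final String TEXT_CONTENT_PATH = "/service/api/solr/textContent";

    // Builds the URL for one node, tolerating a trailing slash on the base URL.
    static URI textContentUri(String alfrescoBaseUrl, long nodeId) {
        String base = alfrescoBaseUrl.endsWith("/")
                ? alfrescoBaseUrl.substring(0, alfrescoBaseUrl.length() - 1)
                : alfrescoBaseUrl;
        return URI.create(base + TEXT_CONTENT_PATH + "?nodeId=" + nodeId);
    }

    // Performs a plain GET and returns the body as UTF-8 text.
    // Any status other than 200 is reported as an error in this sketch.
    static String fetchText(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(
                request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }
}
```

The connector would call something like `fetchText(HttpClient.newHttpClient(), textContentUri(baseUrl, nodeId))` while processing each document and index the returned text instead of the binary stream.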

alexist · 2014-10-02T09:46:20Z

But NodeContentGet is protected by a Solr-specific authentication mechanism (a certificate). Is there another way to call this webscript over HTTP, without a certificate?

maoo · 2014-10-02T09:48:39Z

You can run without SSL - https://wiki.alfresco.com/wiki/Alfresco_And_SOLR#Running_Without_SSL

alexist · 2014-10-02T10:10:26Z

When SSL is disabled, the Solr webscripts are accessible without any authentication. I'm not sure that's a good idea, and you would need to protect these webscripts another way.
Furthermore, you have to patch web.xml in order to disable SSL, which is also not a good idea.

I think exposing this webscript with the standard authentication mechanism could solve these problems.
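
To make the idea concrete: if a cloned webscript were declared with ordinary user authentication in its descriptor, a client could call it with HTTP Basic credentials instead of a client certificate. The sketch below only builds the `Authorization` header value; the class name is hypothetical, and whether a given deployment accepts Basic authentication on that endpoint is an assumption to verify.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for calling a webscript that uses standard authentication.
final class BasicAuth {

    // Returns the value of the HTTP "Authorization" header for Basic authentication:
    // the literal "Basic " followed by base64("user:password").
    static String header(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

The value would be attached to each request as the `Authorization` header, and the call should go over HTTPS, since Basic credentials are only encoded, not encrypted.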

maoo · 2014-10-02T10:12:16Z

The all-in-one archetype is configured to use HTTP (no SSL) for Alfresco-Solr communication (in both directions):

https://artifacts.alfresco.com/nexus/content/repositories/alfresco-docs/alfresco-lifecycle-aggregator/latest/archetypes/alfresco-allinone-archetype/usage.html

alexist · 2014-10-02T11:56:00Z

The Maven SDK disables SSL during the development phase, not in a production environment ...

maoo · 2014-10-02T12:52:38Z

True, but it shows how you need to patch the Alfresco web.xml in order to disable SSL

maoo mentioned this issue Oct 2, 2014

Text extracting philipmeadows/alfresco-webscript-manifold-connector#21

Closed
