nifi-extracttext-processor

Apache NiFi custom processor for extracting text from files with Apache Tika.

See my article and example here:
https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html

Try this setup:
https://community.hortonworks.com/storage/attachments/56409-tika.xml

Also see:
https://community.hortonworks.com/articles/81694/extracttext-nifi-custom-processor-powered-by-apach.html

For the latest version, see here:
https://community.hortonworks.com/articles/177370/extracting-html-from-pdf-excel-and-word-documents.html