The Apache Tika language guesser is unreliable for PDF, RTF and DOCX documents. Therefore, the LRS first converts these documents to plain text (also using Apache Tika) before running the language guesser. However, the LRS matcher uses the original MIME type of the document to identify applicable tools, and the original document is what gets sent to the selected tool. As a consequence, fewer tools are found to be applicable: a tool that only accepts text/plain is not offered for a PDF, RTF or DOCX upload, even though a plain-text version of the document exists. In the future, users should be able to decide whether the tools work on the original MIME type or on the text/plain version of the document (the latter increasing the set of applicable tools).
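A minimal sketch of how such a user option might change tool matching, written in Python for illustration. Everything in it is hypothetical and not taken from the LRS code base: the `Tool` record, the `applicable_tools` function, the `use_plain_text` flag, the tool names and the language codes. It assumes a tool is applicable when it supports the document's language and either accepts the original MIME type, or (with the option enabled) accepts text/plain for a document that was converted to plain text.

```python
from dataclasses import dataclass

# MIME types that, per the issue, are converted to plain text before
# language guessing (PDF, RTF, DOCX).
CONVERTED_TO_TEXT = frozenset({
    "application/pdf",
    "application/rtf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
})
TEXT_PLAIN = "text/plain"


@dataclass(frozen=True)
class Tool:
    # Hypothetical tool description: name, accepted MIME types, languages.
    name: str
    mimetypes: frozenset
    languages: frozenset


def applicable_tools(tools, original_mime, language, use_plain_text=False):
    """Return (tool name, MIME type of the document to send) pairs.

    Hypothetical matcher: with use_plain_text=False only the original
    MIME type is matched (the current behaviour described in the issue);
    with use_plain_text=True, tools accepting text/plain are also offered
    for documents that were converted to plain text.
    """
    result = []
    for tool in tools:
        if language not in tool.languages:
            continue  # tool does not support the guessed language
        if original_mime in tool.mimetypes:
            # The tool can process the document exactly as uploaded.
            result.append((tool.name, original_mime))
        elif (use_plain_text
              and original_mime in CONVERTED_TO_TEXT
              and TEXT_PLAIN in tool.mimetypes):
            # The tool only accepts plain text: send the extracted text.
            result.append((tool.name, TEXT_PLAIN))
    return result


if __name__ == "__main__":
    tools = [
        Tool("PdfTagger", frozenset({"application/pdf"}), frozenset({"en"})),
        Tool("TextTagger", frozenset({TEXT_PLAIN}), frozenset({"en", "de"})),
    ]
    # Current behaviour: match on the original MIME type only.
    print(applicable_tools(tools, "application/pdf", "en"))
    # Proposed option: also offer text/plain tools for converted documents.
    print(applicable_tools(tools, "application/pdf", "en", use_plain_text=True))
```

Returning the MIME type to send alongside each tool reflects the second half of the problem in the issue: when a tool is matched via text/plain, it is the extracted text, not the original document, that would have to be passed on.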