The seven (7) “deadly” sins of text analytics

John Martin of “BeyondRecognition” posted a couple of interesting articles on LinkedIn concerning the use of Text Analytics or Text Mining to classify files and documents.

Of course, his “catch” is that one needs visual recognition as well as text-based pattern recognition; BeyondRecognition delivers visual recognition technology.
In nearly every article, the “problem” of “image-only” PDFs or TIFFs is mentioned: when there is no text, text mining will not work. However, it is easy to OCR PDFs and TIFFs, and one step further is image recognition within photos. Both technologies give us text and metadata to associate with the files.
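To make the OCR point concrete, here is a minimal Python sketch of that fallback, not a definitive implementation. The function name `text_for_mining`, the `run_ocr` callback and the `min_chars` threshold are all hypothetical; `run_ocr` stands in for whatever OCR engine you actually use.

```python
from typing import Callable, Tuple


def text_for_mining(
    extracted_text: str,
    run_ocr: Callable[[], str],
    min_chars: int = 20,  # assumed threshold for "this file has a usable text layer"
) -> Tuple[str, str]:
    """Return (text, source), where source is "text-layer" or "ocr".

    extracted_text is whatever the file's embedded text layer yielded.
    run_ocr is a hypothetical callback wrapping the OCR engine of your choice;
    it is only called when the text layer is (nearly) empty.
    """
    cleaned = extracted_text.strip()
    if len(cleaned) >= min_chars:
        return cleaned, "text-layer"
    # Image-only PDF or TIFF: fall back to OCR so text mining has input.
    return run_ocr().strip(), "ocr"
```

The OCR step is passed in as a callback so that it only runs for files that really are image-only, which keeps the expensive path off documents that already carry text.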

But still, the articles make some good points that have to be taken into account when using text-based classification solutions:

Parts 5 through 7 are still to come…

 

This entry was written by Edwin Stauthamer, posted on Sunday, December 4, 2016 at 12:12 pm, and filed under Technology.
