Features
Orange Textable offers the following features:
Basic text analysis
Use regular expressions to segment text into letters, words, sentences, etc., or to run full-text queries
Use regexes to extract annotations from many input formats
Import in-line XML markup (e.g. TEI)
Include/exclude segments based on user-defined lists (stoplists)
Filter segments based on frequency
Easily generate random text samples
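The segmentation and filtering features above can be illustrated outside the visual interface. The following is a minimal plain-Python sketch, assuming a segment is modeled as a (start, end, string) tuple. The helper names (`segment`, `exclude_stopwords`, `filter_by_frequency`, `sample`) are hypothetical and do not correspond to Orange Textable's widgets or internal API.

```python
import random
import re
from collections import Counter

# Hypothetical helpers for illustration only; not Orange Textable's API.
# A segment is modeled here as a (start, end, string) tuple.

def segment(text, pattern=r"\w+"):
    """Return a (start, end, string) tuple for every regex match in text."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

def exclude_stopwords(segments, stoplist):
    """Drop segments whose lowercased string is in the user-defined stoplist."""
    stop = {word.lower() for word in stoplist}
    return [s for s in segments if s[2].lower() not in stop]

def filter_by_frequency(segments, min_count=2):
    """Keep only segments whose string occurs at least min_count times."""
    counts = Counter(s[2] for s in segments)
    return [s for s in segments if counts[s[2]] >= min_count]

def sample(segments, k, seed=None):
    """Draw k segments without replacement, returned in original text order."""
    rng = random.Random(seed)
    return sorted(rng.sample(segments, k))

text = "The cat sat on the mat. The dog sat on the cat."
words = segment(text)
content = exclude_stopwords(words, ["the", "on"])
frequent = filter_by_frequency(content, min_count=2)
print([s[2] for s in frequent])  # ['cat', 'sat', 'sat', 'cat']
```

Changing `pattern` changes the unit of analysis: `r"\w"` yields single word characters, and `r"[^.!?]+[.!?]"` yields rough sentences.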
Advanced text analysis
Concordances and collocations, also based on annotations
Segment distribution, document-term matrix, transition matrix, etc.
Co-occurrence tables, also between different types of segments
Lemmatization and POS tagging via TreeTagger
Robust linguistic complexity measures, incl. mean word length, lexical diversity, etc.
Access many advanced data mining algorithms: clustering, classification, factor analyses, etc.
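The counting behind several of the tables and measures above can be sketched in a few lines of plain Python. This is an illustrative sketch, not how Orange Textable computes them: the helper names are hypothetical, tokens are assumed to be pre-segmented strings, and the moving-average type-token ratio (MATTR) is swapped in as one example of a length-robust lexical diversity measure. Textable's own measures may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical helpers for illustration only; not Orange Textable's API.
# Inputs are lists of tokens (strings), one list per document or context.

def concordance(tokens, keyword, width=2):
    """Keyword-in-context lines as (left context, keyword, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def document_term_matrix(docs):
    """Rows are documents, columns are the sorted vocabulary, cells are counts."""
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]

def cooccurrence(contexts):
    """Count unordered pairs of distinct terms that share a context unit."""
    pairs = Counter()
    for ctx in contexts:
        for a, b in combinations(sorted(set(ctx)), 2):
            pairs[(a, b)] += 1
    return pairs

def mean_word_length(tokens):
    """Average number of characters per token."""
    return sum(len(t) for t in tokens) / len(tokens)

def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean TTR over every fixed-size window,
    less dependent on text length than one global TTR. The default window
    size is an arbitrary assumption."""
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['cat', 'dog', 'sat', 'the']
print(dtm)    # [[1, 0, 1, 1], [0, 1, 2, 1]]
```

Passing annotation values instead of word strings to the same functions gives the annotation-based variants; the counting logic does not change.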
Text recoding
Unicode-aware preprocessing functions, e.g. remove accents from Ancient Greek text
Recode and restructure texts using regexes, e.g. rewrite CSV as XML
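Both recoding examples above rest on standard techniques that can be sketched in Python: accent removal through Unicode normalization, and regex substitution that rewrites one line format as another. This is a minimal sketch, not Orange Textable's implementation; the helper names and the two-column `word,tag` CSV layout are assumptions made for illustration.

```python
import re
import unicodedata
from xml.sax.saxutils import escape, quoteattr

# Hypothetical helpers for illustration only; not Orange Textable's API.

def strip_accents(text):
    """Remove diacritics: decompose (NFD), drop nonspacing combining marks
    (Unicode category Mn), then recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", base)

# Matches one "word,tag" line; assumes no quoted fields or embedded commas.
CSV_LINE = re.compile(r"^([^,\n]+),([^,\n]+)$", re.MULTILINE)

def csv_to_xml(csv_text):
    """Rewrite each 'word,tag' line as <w pos="tag">word</w>, escaping XML."""
    def to_element(m):
        return f"<w pos={quoteattr(m.group(2))}>{escape(m.group(1))}</w>"
    return CSV_LINE.sub(to_element, csv_text)

print(strip_accents("λόγος"))  # λογος
print(csv_to_xml("cat,NOUN\nsat,VERB"))
```

Breathing marks, accents, and the iota subscript all decompose into combining marks under NFD, so the same function handles polytonic Greek. A CSV file with quoted fields would call for Python's `csv` module rather than a single regex.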
Extensibility
Handles hundreds of text files
Use Python scripts for custom text processing or to access external tools (NLTK, Pattern, Gensim, etc.)
Interoperability
Import text from keyboard, files, or URLs
Process any kind of raw text format: TXT, HTML, XML, CSV, etc.
Supports many text encodings, incl. Unicode
Export results to text files or via copy-paste
Easy interfacing with Orange’s Text Mining add-on
Ease of access
User-friendly visual interface
Ready-made recipes for a range of frequent use cases
Extensive documentation
Support and community forums