Features

Orange Textable offers the following features:

Basic text analysis

  • Use regular expressions to segment text into letters, words, sentences, etc., or to run full-text queries

  • Use regexes to extract annotations from many input formats

  • Import in-line XML markup (e.g. TEI)

  • Include/exclude segments based on user-defined lists (stoplists)

  • Filter segments based on frequency

  • Easily generate random text samples
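
As a rough illustration of what regex-based segmentation, stoplist filtering, and frequency filtering amount to, here is a minimal sketch in plain Python. It uses only the standard library and is not Textable's own API; the sample text, the stoplist, and the `min_count` threshold are assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration.
text = "The cat sat on the mat. The dog sat on the log."

# Segment the text into words: each run of word characters is one segment.
words = [m.group(0).lower() for m in re.finditer(r"\w+", text)]

# Exclude segments that appear in a user-defined stoplist.
stoplist = {"the", "on"}
content_words = [w for w in words if w not in stoplist]

# Keep only segments whose frequency reaches a minimum threshold.
counts = Counter(content_words)
min_count = 2
frequent = [w for w in content_words if counts[w] >= min_count]

print(words)
print(content_words)
print(frequent)
```

In Textable itself these steps are carried out with visual widgets rather than code; the sketch only shows the underlying logic.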

Advanced text analysis

  • Concordances and collocations, also based on annotations

  • Segment distribution, document-term matrix, transition matrix, etc.

  • Co-occurrence tables, also between different types of segments

  • Lemmatization and POS-tagging via TreeTagger

  • Robust linguistic complexity measures, incl. mean word length, lexical diversity, etc.

  • Access many advanced data mining algorithms: clustering, classification, factor analyses, etc.
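
To make two of the items above more concrete, the following sketch builds a small document-term matrix and computes two simple complexity measures (mean word length and type-token ratio as one possible lexical diversity measure). It is plain standard-library Python under stated assumptions, not the way Textable computes these tables: the two documents, the `\w+` tokenization, and the helper names `tokenize`, `mean_word_length`, and `type_token_ratio` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-corpus, used only for illustration.
docs = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog chased the cat",
}

def tokenize(text):
    """Split text into lowercase word tokens (assumed tokenization rule)."""
    return re.findall(r"\w+", text.lower())

tokens = {name: tokenize(text) for name, text in docs.items()}

# Document-term matrix: one row per document, one column per vocabulary term.
vocabulary = sorted({t for toks in tokens.values() for t in toks})
matrix = {}
for name, toks in tokens.items():
    counts = Counter(toks)
    matrix[name] = [counts[term] for term in vocabulary]

def mean_word_length(toks):
    """Average number of characters per token."""
    return sum(len(t) for t in toks) / len(toks)

def type_token_ratio(toks):
    """Number of distinct tokens divided by total number of tokens."""
    return len(set(toks)) / len(toks)

print(vocabulary)
for name in docs:
    print(name, matrix[name],
          mean_word_length(tokens[name]),
          type_token_ratio(tokens[name]))
```

The type-token ratio is sensitive to text length, so measures of this kind are usually compared on samples of equal size.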

Text recoding

  • Unicode-aware preprocessing functions, e.g. remove accents from Ancient Greek text

  • Recode and restructure texts using regexes, e.g. rewrite CSV as XML
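
The two recoding examples mentioned above can be sketched in plain Python as follows. This is only an illustration with the standard library, not Textable's implementation. The function name `strip_accents`, the sample strings, and the XML element and attribute names (`w`, `pos`) are assumptions; the CSV rule assumes exactly two unquoted fields per line and no characters that need XML escaping.

```python
import re
import unicodedata

def strip_accents(text):
    """Remove all combining marks (accents and breathings) from text."""
    # Decompose each character into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, then recompose what is left.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

# Hypothetical sample of polytonic Greek.
greek = "μῆνιν ἄειδε θεά"
plain = strip_accents(greek)
print(plain)

# Rewrite two-column CSV lines as XML elements with a regular expression.
csv_text = "cat,noun\nrun,verb"
csv_line = re.compile(r"^([^,\n]*),([^,\n]*)$", re.MULTILINE)
xml_text = csv_line.sub(r'<w pos="\2">\1</w>', csv_text)
print(xml_text)
```

Real CSV data with quoted fields or embedded commas would be better handled by a dedicated CSV parser before conversion.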

Extensibility

  • Handles hundreds of text files

  • Use Python scripts for custom text processing or to access external tools: NLTK, Pattern, Gensim, etc.

Interoperability

  • Import text from keyboard, files, or URLs

  • Process any kind of raw text format: TXT, HTML, XML, CSV, etc.

  • Supports many text encodings, incl. Unicode

  • Export results to text files or via copy-paste

  • Easy interfacing with Orange’s Text Mining add-on

Ease of access

  • User-friendly visual interface

  • Ready-made recipes for a range of frequent use cases

  • Extensive documentation

  • Support and community forums