Regular expression list (Segment widget)

The keys (and associated values) used in regular expression lists are the following:

Key                 Type     Default  Value                  Remark
------------------  -------  -------  ---------------------  ----------------------------------------------
mode                string            “Split” or “Tokenize”
regex               string            regular expression     write each backslash as \\ (e.g. "\\w" for \w)
ignore_case         Boolean  false    option -i              cf. Python doc (re.IGNORECASE)
multiline           Boolean  false    option -m              cf. Python doc (re.MULTILINE)
dot_all             Boolean  false    option -s              cf. Python doc (re.DOTALL)
unicode_dependent   Boolean  false    option -u              cf. Python doc (re.UNICODE)
annotation_key      string            annotation key
annotation_value    string            annotation value
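The following Python sketch illustrates how one entry could be interpreted. The JSON string is decoded first, so "\\w+" in the file becomes the regex \w+. Each Boolean key is then OR-ed into the re flag named in its remark. The FLAG_KEYS mapping and the compile_entry function are hypothetical illustrations, not the widget's own code.

```python
import json
import re

# Hypothetical mapping from the Boolean keys to Python re flags
# (an illustration of the table, not the widget's own implementation).
FLAG_KEYS = {
    "ignore_case": re.IGNORECASE,       # option -i
    "multiline": re.MULTILINE,          # option -m
    "dot_all": re.DOTALL,               # option -s
    "unicode_dependent": re.UNICODE,    # option -u
}


def compile_entry(entry):
    """Compile the "regex" of one entry; absent Boolean keys count as false."""
    flags = 0
    for key, flag in FLAG_KEYS.items():
        if entry.get(key, False):
            flags |= flag
    return re.compile(entry["regex"], flags)


# The raw string holds the JSON text exactly as it would appear in a file:
# "\\w+" in JSON decodes to the three characters \w+.
entry = json.loads(r'{"mode": "Tokenize", "regex": "\\w+", "ignore_case": true}')
pattern = compile_entry(entry)
print(pattern.pattern)                 # \w+
print(pattern.findall("Abc de"))       # ['Abc', 'de']

# A single, unescaped backslash is not valid JSON and fails to parse.
try:
    json.loads(r'{"regex": "\w+"}')
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```

Forgetting to double the backslash therefore fails at parse time. It does not silently change the pattern, because JSON only accepts a fixed set of escapes such as \\, \n and \".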

Example:

[
    {
        "mode":              "Tokenize",
        "regex":             ".",
        "dot_all":           true,
        "annotation_key":    "type",
        "annotation_value":  "other"
    },
    {
        "mode":              "Tokenize",
        "regex":             "\\w",
        "ignore_case":       true,
        "unicode_dependent": true,
        "annotation_key":    "type",
        "annotation_value":  "consonant"
    },
    {
        "mode":              "Tokenize",
        "regex":             "[aeiouy]",
        "ignore_case":       true,
        "annotation_key":    "type",
        "annotation_value":  "vowel"
    },
    {
        "mode":              "Tokenize",
        "regex":             "[0-9]",
        "annotation_key":    "type",
        "annotation_value":  "digit"
    }
]
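To show what the example list is meant to produce, here is a minimal Python sketch that applies it in “Tokenize” mode. It assumes that each match becomes a segment, that matches with identical spans are merged into one segment, and that a later entry overwrites an earlier annotation with the same key. This is how the example appears to be intended (every character is "other" unless a later rule refines it), but it is an assumption, not documented widget behaviour. The tokenize function and FLAG_KEYS mapping are hypothetical, and “Split” entries are simply skipped.

```python
import json
import re

# The example list above, compacted; the raw string keeps "\\w" as written in JSON.
EXAMPLE = r'''
[
    {"mode": "Tokenize", "regex": ".", "dot_all": true,
     "annotation_key": "type", "annotation_value": "other"},
    {"mode": "Tokenize", "regex": "\\w", "ignore_case": true,
     "unicode_dependent": true,
     "annotation_key": "type", "annotation_value": "consonant"},
    {"mode": "Tokenize", "regex": "[aeiouy]", "ignore_case": true,
     "annotation_key": "type", "annotation_value": "vowel"},
    {"mode": "Tokenize", "regex": "[0-9]",
     "annotation_key": "type", "annotation_value": "digit"}
]
'''

# Hypothetical key-to-flag mapping, following the re flags named in the table.
FLAG_KEYS = {"ignore_case": re.IGNORECASE, "multiline": re.MULTILINE,
             "dot_all": re.DOTALL, "unicode_dependent": re.UNICODE}


def tokenize(text, entries):
    """Return {(start, end): annotations} for every "Tokenize" entry.

    Assumption: matches with the same span form a single segment, and a
    later entry overwrites an earlier annotation with the same key.
    """
    segments = {}
    for entry in entries:
        if entry["mode"].lower() != "tokenize":
            continue  # "Split" entries are not handled in this sketch
        flags = 0
        for key, flag in FLAG_KEYS.items():
            if entry.get(key, False):
                flags |= flag
        for match in re.finditer(entry["regex"], text, flags):
            annotations = segments.setdefault(match.span(), {})
            if "annotation_key" in entry:
                annotations[entry["annotation_key"]] = entry.get("annotation_value", "")
    return dict(sorted(segments.items()))


entries = json.loads(EXAMPLE)
for span, annotations in tokenize("Ab1 .", entries).items():
    print(span, annotations)
# (0, 1) {'type': 'vowel'}
# (1, 2) {'type': 'consonant'}
# (2, 3) {'type': 'digit'}
# (3, 4) {'type': 'other'}
# (4, 5) {'type': 'other'}
```

Under this assumption the order of the entries matters. "1" matches both \w and [0-9], and it ends up as "digit" only because the digit rule comes last. Likewise, "consonant" effectively means "any word character not claimed by a later rule".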