Regular expression list (Segment widget)
The keys (and associated values) for the regular expression lists are the following:
Key | Type | Default | Value | Remark |
---|---|---|---|---|
mode | string | — | “split” or “tokenize” | — |
regex | string | — | regular expression | backslashes must be escaped in JSON (e.g. "\\w" for \w) |
ignore_case | Boolean | false | option -i | cf. Python doc (re.IGNORECASE) |
multiline | Boolean | false | option -m | cf. Python doc (re.MULTILINE) |
dot_all | Boolean | false | option -s | cf. Python doc (re.DOTALL) |
unicode_dependent | Boolean | false | option -u | cf. Python doc (re.UNICODE) |
annotation_key | string | — | annotation key | — |
annotation_value | string | — | annotation value | — |
Example:
[
    {
        "mode": "Tokenize",
        "regex": ".",
        "dot_all": true,
        "annotation_key": "type",
        "annotation_value": "other"
    },
    {
        "mode": "Tokenize",
        "regex": "\\w",
        "ignore_case": true,
        "unicode_dependent": true,
        "annotation_key": "type",
        "annotation_value": "consonant"
    },
    {
        "mode": "Tokenize",
        "regex": "[aeiouy]",
        "ignore_case": true,
        "annotation_key": "type",
        "annotation_value": "vowel"
    },
    {
        "mode": "Tokenize",
        "regex": "[0-9]",
        "annotation_key": "type",
        "annotation_value": "digit"
    }
]
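
To make the role of each key more concrete, the following Python sketch reads a list in this format and applies each entry to a sample string. It is only an illustration, not the Segment widget's own code: the helpers `compile_entry` and `segment` are hypothetical, the mapping of the Boolean keys to `re` flags follows the remarks in the table, and the mode is compared case-insensitively as an assumption (the table lists "tokenize" while the example above writes "Tokenize"). How the widget combines the segments produced by several entries, and how it attaches annotations, is not covered here.

```python
import json
import re

# Assumed mapping from the Boolean keys in the table to Python's re flags.
FLAG_KEYS = {
    "ignore_case": re.IGNORECASE,
    "multiline": re.MULTILINE,
    "dot_all": re.DOTALL,
    "unicode_dependent": re.UNICODE,
}


def compile_entry(entry):
    """Compile the "regex" of one list entry with the flags it enables."""
    flags = 0
    for key, flag in FLAG_KEYS.items():
        if entry.get(key, False):  # every Boolean key defaults to false
            flags |= flag
    return re.compile(entry["regex"], flags)


def segment(text, entry):
    """Return (start, end) spans for one entry (hypothetical helper)."""
    pattern = compile_entry(entry)
    matches = [m.span() for m in pattern.finditer(text)]
    # Assumption: the mode is matched case-insensitively.
    if entry["mode"].lower() == "tokenize":
        return matches  # each match is a segment
    # "split": keep the stretches of text between matches
    spans, pos = [], 0
    for start, end in matches:
        if start > pos:
            spans.append((pos, start))
        pos = end
    if pos < len(text):
        spans.append((pos, len(text)))
    return spans


# Raw string so that the JSON escape "\\s+" reaches the JSON parser unchanged.
regex_list = json.loads(r'''
[
    {"mode": "tokenize", "regex": "[aeiouy]", "ignore_case": true,
     "annotation_key": "type", "annotation_value": "vowel"},
    {"mode": "split", "regex": "\\s+"}
]
''')

for entry in regex_list:
    print(entry["regex"], segment("Ab 1e", entry))
```

On the sample string "Ab 1e", the first entry yields the spans (0, 1) and (4, 5) for "A" and "e", and the second yields (0, 2) and (3, 5) for "Ab" and "1e". Note that the doubled backslash in the JSON text is decoded to a single backslash before the pattern is compiled, which is why the regex key needs the escape.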