Regular expression list (Segment widget)
The keys (and associated values) for regular expression lists are the following:
| Key | Type | Default | Value | Remark |
|---|---|---|---|---|
| mode | string | — | “split” or “tokenize” | — |
| regex | string | — | regular expression | be careful to escape the backslash |
| ignore_case | Boolean | false | option -i | |
| multiline | Boolean | false | option -m | |
| dot_all | Boolean | false | option -s | |
| unicode_dependent | Boolean | false | option -u | |
| annotation_key | string | — | annotation key | — |
| annotation_value | string | — | annotation value | — |
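The remark about escaping the backslash can be illustrated with a short sketch. It assumes the list is read by a standard JSON parser (Python's `json` module is used here as a stand-in; the names `json_text`, `bad_text` and `rejected` are purely illustrative). Inside a JSON string the backslash must be doubled, so the regular expression `\w+` is written `"\\w+"`:

```python
import json

# JSON source text: inside the JSON string the backslash is doubled.
json_text = r'[{"mode": "tokenize", "regex": "\\w+"}]'
entries = json.loads(json_text)
print(entries[0]["regex"])  # prints \w+

# With a single backslash, "\w" is not a valid JSON escape and parsing fails.
bad_text = r'[{"mode": "tokenize", "regex": "\w+"}]'
try:
    json.loads(bad_text)
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)  # prints True
```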
Example:
[
    {
        "mode": "Tokenize",
        "regex": ".",
        "dot_all": true,
        "annotation_key": "type",
        "annotation_value": "other"
    },
    {
        "mode": "Tokenize",
        "regex": "\\w",
        "ignore_case": true,
        "unicode_dependent": true,
        "annotation_key": "type",
        "annotation_value": "consonant"
    },
    {
        "mode": "Tokenize",
        "regex": "[aeiouy]",
        "ignore_case": true,
        "annotation_key": "type",
        "annotation_value": "vowel"
    },
    {
        "mode": "Tokenize",
        "regex": "[0-9]",
        "annotation_key": "type",
        "annotation_value": "digit"
    }
]
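To make the effect of such a list more concrete, here is a minimal sketch of how it might be interpreted with Python's `re` module. It is not the widget's actual implementation. Several points are assumptions: the Boolean keys are mapped to the `re` flags that correspond to options -i, -m, -s and -u; only the tokenize mode is covered; and when several expressions match the same span, the last one in the list is assumed to win (which the ordering of the example suggests, but which is not stated here). The names `FLAG_KEYS`, `compile_entry` and `annotate` are hypothetical.

```python
import json
import re

# Assumed mapping from the list's Boolean keys to Python `re` flags.
FLAG_KEYS = {
    "ignore_case": re.IGNORECASE,     # option -i
    "multiline": re.MULTILINE,        # option -m
    "dot_all": re.DOTALL,             # option -s
    "unicode_dependent": re.UNICODE,  # option -u (already the default for str in Python 3)
}

# The example list, kept as raw text so that "\\w" stays doubled in the JSON.
EXAMPLE = r'''
[
    {"mode": "Tokenize", "regex": ".", "dot_all": true,
     "annotation_key": "type", "annotation_value": "other"},
    {"mode": "Tokenize", "regex": "\\w", "ignore_case": true,
     "unicode_dependent": true,
     "annotation_key": "type", "annotation_value": "consonant"},
    {"mode": "Tokenize", "regex": "[aeiouy]", "ignore_case": true,
     "annotation_key": "type", "annotation_value": "vowel"},
    {"mode": "Tokenize", "regex": "[0-9]",
     "annotation_key": "type", "annotation_value": "digit"}
]
'''


def compile_entry(entry):
    """Compile one list entry, turning its Boolean keys into `re` flags."""
    flags = 0
    for key, flag in FLAG_KEYS.items():
        if entry.get(key, False):  # every option defaults to false
            flags |= flag
    return re.compile(entry["regex"], flags)


def annotate(text, entries):
    """Hypothetical helper: tokenize `text` and annotate each matched span.

    Assumption: a later entry overrides an earlier one on the same span.
    """
    labels = {}
    for entry in entries:
        if entry["mode"].lower() != "tokenize":
            raise ValueError("this sketch only covers the tokenize mode")
        annotation = {}
        if "annotation_key" in entry:
            annotation[entry["annotation_key"]] = entry.get("annotation_value")
        for match in compile_entry(entry).finditer(text):
            labels[match.span()] = annotation
    return [(text[start:end], labels[(start, end)])
            for start, end in sorted(labels)]


for segment, annotation in annotate("Hi 2", json.loads(EXAMPLE)):
    print(repr(segment), annotation)
```

Under these assumptions, each character of "Hi 2" ends up with the annotation of the last expression that matched it: the catch-all `.` labels everything as "other", `\w` relabels letters and digits as "consonant", and the two more specific expressions then relabel vowels and digits.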