2.4. XML annotation-based selection using a regex

Another common way of exploiting annotations is to use them to select the segments that an instance of Select (see Filtering segmentations using regexes) or Intersect (see Using a segmentation to filter another) will include or exclude. Thus, in the case of the XML data example introduced here (and further developed there), we might insert an instance of Select between those of Extract XML and Count (see figure 1 below) in order to include only “content words”.


Figure 1: Inserting an instance of Select to filter a segmentation.

In this simplified example, the Select instance could be parameterized as shown in figure 2 below, so as to exclude (Mode: Exclude) those segments whose annotation value for key type (Annotation key: type) is DET or PREP (Regex: ^(DET|PREP)$).


Figure 2: Excluding segments based on annotation values with Select.
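
The filtering logic described above can also be sketched in plain Python, outside of the widget interface. This is only an illustration under stated assumptions: the segment data, the dictionary layout and the function name select_by_annotation are hypothetical and are not part of Textable's API. In particular, how Select treats segments that lack the annotation key is not stated here; the sketch simply treats them as non-matching.

```python
import re

# Hypothetical segments (not taken from the actual XML example):
# each one has some content and a dictionary of annotations.
segments = [
    {"content": "the", "annotations": {"type": "DET"}},
    {"content": "cat", "annotations": {"type": "NOUN"}},
    {"content": "on", "annotations": {"type": "PREP"}},
    {"content": "mat", "annotations": {"type": "NOUN"}},
]


def select_by_annotation(segments, key, pattern, mode="include"):
    """Keep or drop segments whose annotation value for `key` matches `pattern`.

    Assumption: a segment without the given key counts as non-matching.
    """
    regex = re.compile(pattern)
    selected = []
    for segment in segments:
        value = segment["annotations"].get(key)
        matches = value is not None and regex.search(value) is not None
        # In "include" mode keep the matches; in "exclude" mode keep the rest.
        if matches == (mode == "include"):
            selected.append(segment)
    return selected


# Same settings as in figure 2: Mode: Exclude, Annotation key: type,
# Regex: ^(DET|PREP)$
content_words = select_by_annotation(
    segments, "type", r"^(DET|PREP)$", mode="exclude"
)
print([segment["content"] for segment in content_words])
```

Note that the anchors ^ and $ make the regex match the whole annotation value: without them, a value that merely contains DET or PREP would also be matched.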

See also