6. XML Annotation-based selection using a regex

Another common way of exploiting annotations is to use them to select the segments that will be included or excluded by an instance of Select (see Partitioning segmentations) or Intersect (see Using a segmentation to filter another). Thus, in the case of the XML data example introduced here (and further developed there), we might insert an instance of Select between those of Extract XML and Count (see figure 1 below) in order to include only “content words”.


Figure 1: Inserting an instance of Select to filter a segmentation.

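In this simplified example, the Select instance could be parameterized as indicated in figure 2 below, so as to exclude (Mode: Exclude) those segments whose annotation value for key type (Annotation key: type) is exactly DET or PREP (Regex: ^(DET|PREP)$). The anchors ^ and $ ensure that the whole annotation value must match, so that a longer value which merely contains DET or PREP is not excluded.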
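The filtering logic described above can be sketched in plain Python. This is only an illustration of the idea, not Textable's own implementation or API: the function name select_by_annotation, the dictionary-based representation of segments, the sample data and the PREDET tag are all assumptions made for this example. The sketch also assumes that a segment lacking the annotation key counts as not matching the regex.

```python
import re

# Hypothetical stand-in for an annotated segmentation: each segment has
# some content and a dictionary of annotation key/value pairs.
segments = [
    {"content": "all", "annotations": {"type": "PREDET"}},
    {"content": "the", "annotations": {"type": "DET"}},
    {"content": "cats", "annotations": {"type": "NOUN"}},
    {"content": "on", "annotations": {"type": "PREP"}},
    {"content": "mats", "annotations": {"type": "NOUN"}},
]


def select_by_annotation(segments, key, pattern, mode="exclude"):
    """Keep or drop segments whose annotation value for `key` matches `pattern`.

    mode="include" keeps the matching segments; mode="exclude" keeps the others.
    Assumption: a segment without the annotation key is treated as non-matching.
    """
    regex = re.compile(pattern)
    selected = []
    for segment in segments:
        value = segment["annotations"].get(key)
        matches = value is not None and regex.search(value) is not None
        # Keep the segment when its match status agrees with the mode.
        if matches == (mode == "include"):
            selected.append(segment)
    return selected


# Same settings as in the text: Mode: Exclude, Annotation key: type,
# Regex: ^(DET|PREP)$
content_words = select_by_annotation(segments, "type", r"^(DET|PREP)$")
print([segment["content"] for segment in content_words])
```

With these sample data, the segments annotated DET and PREP are dropped, while the one annotated PREDET is kept because the anchored regex requires the entire value to be DET or PREP. Without the anchors, the pattern (DET|PREP) would also match PREDET and exclude that segment as well.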


Figure 2: Excluding segments based on annotation values with Select.