2.2. Filtering segmentations using regexes
In section Using a segmentation to filter another, we have seen how to use the Intersect widget to exclude a specified list of words (so-called “stopwords”) from a segmentation. The Select widget is tailored for a closely related task: filtering segments according to a condition rather than an explicit list.
The widget’s interface (see figure 1 below) offers a choice between two modes: Include and Exclude. Depending on this parameter, incoming segments that satisfy a given condition will be either included in or excluded from the output segmentation. By default (i.e. when the Advanced settings box is unchecked), the condition is specified by means of a regex, which is applied to each incoming segment in turn. For now, the option Annotation key can be left at its default setting (none).

In the example of figure 1, the widget is configured to exclude all incoming segments containing no more than 3 letters. Note that without the beginning-of-segment and end-of-segment anchors (^ and $), every word containing at least one sequence of 1 to 3 letters (that is, every word) would be excluded.
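The effect of the anchors can be checked outside the widget with a few lines of Python. This is only a sketch: the pattern `^\w{1,3}$` is an assumption about what figure 1 shows, the word list is invented for illustration, and Python's `re` module stands in for the widget's own regex matching.

```python
import re

# Hypothetical word segments, standing in for an incoming segmentation.
words = ["a", "the", "text", "of", "segmentation", "is"]

# Assumed pattern: 1 to 3 word characters spanning the whole segment.
anchored = re.compile(r"^\w{1,3}$")
# Same pattern without anchors: matches anywhere inside a segment.
unanchored = re.compile(r"\w{1,3}")

# Segments matched by each version of the pattern.
matched_anchored = [w for w in words if anchored.search(w)]
matched_unanchored = [w for w in words if unanchored.search(w)]

print(matched_anchored)    # only the words of 1 to 3 characters
print(matched_unanchored)  # every word, since each contains such a sequence
```

With the anchors, only the short words match and would be excluded; without them, every word matches, so an Exclude configuration would leave the output empty.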
Note that Select automatically emits a second segmentation containing all the segments that have been discarded from the main output segmentation (in the case of figure 1 above, all words shorter than 4 letters). This feature is useful when both the selected and the discarded segments are to be further processed on distinct branches. By default, when Select is connected to another widget, the main segmentation is emitted. To send the segmentation of discarded segments instead, right-click on the outgoing connection and select Reset Signals (see figure 2 below).
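The relationship between the two modes and the two outputs can be sketched in Python as follows. This is not the widget's implementation: the function `select`, its parameters, and the sample words are hypothetical, and plain strings stand in for segments.

```python
import re

def select(segments, pattern, mode="include"):
    """Split segments into (selected, discarded) lists based on a regex.

    Hypothetical helper mimicking the two outputs described above.
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    regex = re.compile(pattern)
    selected, discarded = [], []
    for segment in segments:
        matches = regex.search(segment) is not None
        # Include keeps matching segments; Exclude keeps the others.
        keep = matches if mode == "include" else not matches
        (selected if keep else discarded).append(segment)
    return selected, discarded

# Exclude words of 1 to 3 characters (pattern assumed, as in figure 1).
sample = ["a", "the", "text", "of", "segmentation"]
main_output, discarded_output = select(sample, r"^\w{1,3}$", mode="exclude")
print(main_output)       # segments kept in the main segmentation
print(discarded_output)  # segments sent to the discarded segmentation
```

Every incoming segment ends up in exactly one of the two lists, which is why both branches can be processed further without losing or duplicating data.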

Figure 2: Right-clicking on a connection and requesting to Reset Signals.
This opens the dialog shown in figure 3 below, where the user can “drag-and-drop” from the gray box next to Discarded data to the box next to Segmentation, thus replacing the existing connection. Clicking OK validates the modification and enables the discarded data to flow through the connection.

Figure 3: This dialog allows the user to select a non-default connection between two widgets.