1.5. Using a segmentation to filter another

There are many situations where we might want to select or exclude segments from a segmentation. A typical example is the removal of “stopwords” (i.e. determiners, pronouns, prepositions, etc.) from a text, in order to restrict analyses to content words (cf. an example of a list of stopwords).

The purpose of widget Intersect is precisely to solve that kind of problem. It takes two segmentations as input and lets the user include in, or exclude from, the first (source) segmentation those segments whose content is the same as that of a segment in the second (filter) segmentation. The widget's basic interface is shown in figure 1 below.

Figure 1: Interface of widget Intersect configured for stopword removal.

The Mode option indicates whether the segments whose content appears in the filter segmentation should be removed from the source segmentation (Exclude) or, conversely, retained (Include); in the latter case, all the segments absent from the filter segmentation are removed from the source segmentation. The next step is to specify which incoming segmentation plays the role of the Source segmentation and which plays the role of the Filter segmentation.
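The effect of the two modes can be sketched in plain Python. This is a minimal illustration, assuming that each segment can be represented by its string content alone; the function `intersect` and its arguments are hypothetical and are not part of Textable's API. In this sketch, the order and repetitions of the source segments are preserved, and only the content of the filter segments matters.

```python
from typing import List, Sequence


def intersect(source: Sequence[str], filter_: Sequence[str], mode: str = "Exclude") -> List[str]:
    """Hypothetical helper mimicking the Include/Exclude modes of Intersect.

    Source segments are kept or dropped depending on whether their content
    exactly matches the content of some filter segment.
    """
    # Only the content of filter segments matters, so a set suffices.
    filter_contents = set(filter_)
    if mode == "Include":
        # Keep only source segments that also occur in the filter.
        return [s for s in source if s in filter_contents]
    if mode == "Exclude":
        # Drop every source segment that occurs in the filter.
        return [s for s in source if s not in filter_contents]
    raise ValueError(f"unknown mode: {mode!r}")


source = ["the", "cat", "sat", "on", "the", "mat"]
stopwords = ["the", "on", "a"]
print(intersect(source, stopwords, "Exclude"))  # ['cat', 'sat', 'mat']
print(intersect(source, stopwords, "Include"))  # ['the', 'on', 'the']
```

Using a set for the filter makes each membership test constant-time on average, which matters when the filter is a long stopword list.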

To try out the widget, set up a workflow similar to the one shown in figure 2 below. The first instance of Text Field (UDHR) contains the text to process (for instance the Universal Declaration of Human Rights), while Text Field (1) contains the list of English stopwords mentioned above. Both instances of Segment produce a word segmentation (Segment into words); the only difference in their configuration is the widget label, i.e. Words versus Stopwords. Finally, Intersect is configured as shown in figure 1 above.


Figure 2: Example workflow for removing stopwords using widget Intersect.

The content of the first segments of the resulting segmentation is:

PREAMBLE

Whereas

recognition

inherent

dignity

equal

inalienable

rights

members

human

family

foundation

freedom

justice

peace

world

...
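The whole workflow of figure 2 (segment both texts into words, then exclude the stopwords) can be approximated in a few lines of Python. This is a sketch under stated assumptions: a "word" is taken to be a maximal run of `\w` characters, which may not match Textable's own word segmentation exactly; the stopword list is a tiny hypothetical one rather than the full English list; and the input is only the opening of the UDHR preamble. The helper names `segment_into_words` and `remove_stopwords` are illustrative and not part of Textable.

```python
import re
from typing import List


def segment_into_words(text: str) -> List[str]:
    # Assumption: a word is a maximal run of \w characters, so
    # punctuation such as the comma in "freedom," is discarded.
    return re.findall(r"\w+", text)


def remove_stopwords(words: List[str], stopwords: List[str]) -> List[str]:
    # Equivalent of Intersect in Exclude mode: drop every word whose
    # content exactly matches a stopword (case-sensitive comparison).
    stop = set(stopwords)
    return [w for w in words if w not in stop]


udhr = ("PREAMBLE\n\nWhereas recognition of the inherent dignity and of the equal "
        "and inalienable rights of all members of the human family is the "
        "foundation of freedom, justice and peace in the world,")
# Hypothetical, deliberately tiny stopword list.
stopwords = segment_into_words("of the and all is in")

result = remove_stopwords(segment_into_words(udhr), stopwords)
print(result[:4])  # ['PREAMBLE', 'Whereas', 'recognition', 'inherent']
```

With this small list, the remaining words coincide with the first segments listed above; a real stopword list would of course remove many more word types.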
