.. meta:: :description: Orange Textable documentation, using a segmentation to filter another :keywords: Orange, Textable, documentation, filter, stoplist, stopwords Using a segmentation to filter another ====================================== In some cases, the number of forms to be selectively included in or excluded from a segmentation is too large for using the :ref:`Select` widget. A typical example is the removal of "stopwords" from a text: in English for instance, although the list of such words is finite, it is too long to try to encode it by means of a regex (cf. `an example of such a list `_). The purpose of widget :ref:`Intersect` is precisely to solve that kind of problem. It takes two segmentations in input and lets the user include in or exclude from the first (*source*) segmentation those segments whose content is the same as that of a segment in the second (*filter*) segmentation. The widget's basic interface is shown on :ref:`figure 1 ` below). .. _using_segmentation_filter_another_fig1: .. figure:: figures/intersect_example.png :align: center :alt: Interface of widget Intersect configured for stopword removal Figure 1: Interface of widget :ref:`Intersect` configured for stopword removal. Similarly to widget :ref:`Select`, user must choose between modes **Include** and **Exclude**. The next step is to specify which incoming segmentation plays the role of the **Source** segmentation and the **Filter** segmentation. (Here again, we will ignore the **Annotation key** option for the time being.) In order to try out the widget, set up a schema similar to the one shown on :ref:`figure 2 ` below). The first instance of :ref:`Text Field` contains the text to process (for instance the `Universal Declaration of Human Rights `_) and is labelled as such, while the second instance, *Text Field (1)*, contains the list of English stopwords mentioned above. Both instances of :ref:`Segment` produce a word segmentation with regex ``\w+``; the only difference in their configuration is the Segment Widget label , i.e. *words* for the segmentation of the UDHR and *stopwords* for the segmentation of *Text Field (1)*. Finally, the instance of :ref:`Intersect` is configured as shown on :ref:`figure 1 ` above. .. _using_segmentation_filter_another_fig2: .. figure:: figures/intersect_example_schema.png :align: center :alt: Schema illustrating the use of the Intersect widget for stopword removal :scale: 80 % Figure 2: Example schema for removing stopword using widget :ref:`Intersect` . The content of the first segments of the resulting segmentation is:: PREAMBLE Whereas recognition inherent dignity equal inalienable rights members human family foundation freedom justice peace world ... .. _using_segmentation_filter_another_ex: **Exercise:** Based on an instance of :ref:`Text Field`, produce a segmentation containing all words less than 4 letters long that appear at the beginning of each line, excluding *I, you, he, she, we*. (:ref:`solution `) .. _solution_using_segmentation_filter_another_ex: **Solution:** :ref:`Figure 3 ` below shows a possible solution. The 4 instances in the lower part of the schema (*Text Field (1)*, *Segment (1)*, *Intersect*, and *Display*) are configured as in :ref:`figure 2 ` above--with *Text Field (1)* containing the list of pronouns to exclude. The difference lies in the addition of a :ref:`Segment` instance in the upper branch. In this branch, the first instance (*Segment*) produces a segmentation into lines with regex ``.+`` while *Segment (2)* extracts the first word of each line, provided it is shorter than 4 letters (regex ``^\w{1,3}\b``). *Intersect* eventually takes care of excluding the pronouns listed above. .. _using_segmentation_filter_another_fig3: .. figure:: figures/solution_exercise_intersect.png :align: center :alt: Solution to the exercise illustrating the Intersect widget :scale: 80 % Figure 3: A possible solution. (:ref:`back to the exercise `) See also -------- * :ref:`Reference: Select widget