Exclude segments based on a stoplist

Goal

Filter out segments based on a stoplist.

Prerequisites

Some text has been imported into Orange Textable (see Cookbook: Text input) and segmented into words (see Cookbook: Segment text in smaller units).

Ingredients

Widget        Icon              Quantity
Text Field    textfield_icon    2
Segment       segment_icon      2
Intersect     intersect_icon    1

Procedure

Figure 1: Exclude segments based on a stoplist with instances of Text Field, Segment and Intersect

  1. Create an instance of Text Field and paste into it the stoplist you want to use.

  2. Follow the instructions in Cookbook: Segment text in smaller units to segment the stoplist into words.

  3. Create an instance of Intersect.

  4. Drag and drop from the output (right-hand side) of the widget that emits the segmentation to be filtered, here Segment (words), to the input (left-hand side) of Intersect.

  5. Likewise, connect Segment (stopwords) to Intersect.

  6. Double-click on the icon of Intersect to open its interface.

  7. In the Intersect section, choose Mode: Exclude.

  8. In the Source segmentation field, choose the segmentation to be filtered (here: words); in the Filter segmentation field, choose the segmentation containing the stopwords (here: stopwords).

  9. Click the Send button or tick the Send automatically checkbox.

  10. The filtered segmentation is then available at the output of Intersect; to display or export it, see Cookbook: Text output.
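
For readers who want to see the logic behind the procedure above, the following is a minimal Python sketch of what excluding segments with a stoplist amounts to. It does not use the Orange Textable API: the function names `segment_into_words` and `exclude_stopwords`, the sample text and the sample stoplist are all hypothetical, and the sketch assumes that a segment is excluded only when its content is exactly equal to an entry of the stoplist.

```python
import re


def segment_into_words(text):
    """Split a string into word segments (a rough stand-in for the Segment widget)."""
    return re.findall(r"\w+", text)


def exclude_stopwords(words, stopwords):
    """Return the word segments whose content does not appear in the stoplist."""
    stopset = set(stopwords)  # set membership keeps the filtering step fast
    return [word for word in words if word not in stopset]


# Hypothetical sample data, standing in for the two Text Field instances.
text = "the cat sat on the mat"
stoplist = "the on a an of"

words = segment_into_words(text)          # segmentation to be filtered
stopwords = segment_into_words(stoplist)  # segmentation containing the stopwords
filtered = exclude_stopwords(words, stopwords)

print(filtered)  # ['cat', 'sat', 'mat']
```

Because the comparison in this sketch is an exact string match, "The" would not be removed by a stoplist that contains only "the"; lowercasing both inputs beforehand is one way to handle that case.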

Comment

  • Stopword lists for various languages can be found here.

See also