Filter segments based on their frequency
Goal
Filter out the rarest and/or most frequent segments of a segmentation.
Prerequisites
Some text has been imported in Orange Textable (see Cookbook: Text input) and, in all likelihood, it has been segmented into smaller units (see Cookbook: Segment text in smaller units).
Ingredients
Widget: Select
Icon: (image not reproduced)
Quantity: 1
Procedure

Figure 1: Filtering out low-frequency segments with an instance of Select
Create an instance of Select.
Drag and drop from the output (right-hand side) of the widget that emits the segmentation to be filtered, here Segment (letters), to the input of Select (left-hand side).
Double-click on the icon of Select to open its interface.
Tick the Advanced settings checkbox.
In the Select section, choose Threshold in the Method drop-down menu.
Under Threshold expressed as, choose whether you want to express frequency thresholds in terms of Count (i.e. number of tokens) or of Proportion (i.e. percentage of tokens).
If you want to set a minimum frequency threshold, tick the Min. count (respectively Min. proportion (%)) checkbox and indicate the minimum frequency that a segment type must have in order to be included in the output.
If you want to set a maximum frequency threshold, tick the Max. count (respectively Max. proportion (%)) checkbox and indicate the maximum frequency that a segment type can have in order to be included in the output.
Click the Send button or tick the Send automatically checkbox.
A segmentation containing the selected segments is then available at the output of Select; to display or export it, see Cookbook: Text output.
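The threshold logic described in the steps above can be sketched in plain Python. This is only an illustration of the idea, not the code used by the Select widget: the function name filter_by_frequency and its parameters are hypothetical, and a segmentation is assumed to be a simple list of strings.

```python
from collections import Counter


def filter_by_frequency(segments, min_freq=None, max_freq=None, as_proportion=False):
    """Keep the segments whose type frequency lies within the given bounds.

    Hypothetical helper: frequencies are counts by default, or percentages
    of the total number of tokens when as_proportion is True.
    """
    counts = Counter(segments)  # frequency of each segment type
    total = len(segments)       # total number of tokens
    selected = []
    for segment in segments:
        freq = counts[segment]
        if as_proportion:
            freq = 100.0 * freq / total
        if min_freq is not None and freq < min_freq:
            continue  # too rare
        if max_freq is not None and freq > max_freq:
            continue  # too frequent
        selected.append(segment)
    return selected


letters = list("abracadabra")
# Keep letters that occur at least twice (threshold expressed as a count).
at_least_twice = filter_by_frequency(letters, min_freq=2)
# Keep letters that occur at most twice.
at_most_twice = filter_by_frequency(letters, max_freq=2)
# Keep letters that account for at least 10% of all tokens.
at_least_10_percent = filter_by_frequency(letters, min_freq=10, as_proportion=True)
```

As in the widget, either bound can be left unset, and the bounds are inclusive in this sketch; whether Select treats its thresholds as inclusive is an assumption here.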
Comment
On a second output connection (not selected by default), the Select widget emits a segmentation containing the segments that were not selected (see Filtering segmentations using regexes for instructions on how to access this other output segmentation).
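The relationship between the two outputs can be pictured as a partition of the input: every segment ends up in exactly one of them. The sketch below is a plain Python illustration under the assumption that a segmentation is a list of strings; split_by_min_count is a hypothetical name, not part of Orange Textable.

```python
from collections import Counter


def split_by_min_count(segments, min_count):
    """Partition segments into (selected, discarded) by a minimum count.

    Hypothetical helper illustrating the two outputs of a frequency filter.
    """
    counts = Counter(segments)
    selected = [s for s in segments if counts[s] >= min_count]
    discarded = [s for s in segments if counts[s] < min_count]
    return selected, discarded


selected, discarded = split_by_min_count(list("abracadabra"), min_count=2)
```

Here selected plays the role of the default output and discarded that of the second output.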