3. Merging and segmenting

Computerized text analysis often involves consolidating various text sources into a single corpus. In the framework of Orange Textable, this amounts to grouping segmentations together, which is the purpose of the Merge widget.

To try out this widget, create two instances of Text Field, an instance of Merge and an instance of Display on the canvas (see figure 1 below). Type a different string in each Text Field instance (e.g. "a simple example" and "another example") and assign each a distinct label (e.g. text_string and text_string2). Finally, connect the instances as shown in figure 1.

Schema illustrating the usage of widget Merge

Figure 1: Grouping a simple example with another example using widget Merge.

The interface of widget Merge (see figure 2 below) features four options: two annotation keys, the possibility of copying the annotations of input segments (if any), and the option of fusing segments that have the same address.

Interface of widget Merge

Figure 2: Interface of widget Merge.

We will return later to the purpose of the Import labels with key and Auto-number with key checkboxes. Leave them unchecked for now.

Displaying a merged segmentation

Figure 3: Merged segmentation.

Figure 3 above shows the resulting merged segmentation, as displayed by the Display widget. As can be seen, Merge makes it easy to concatenate several strings into a single segmentation. If the incoming segmentations contained several segments, each of them would appear in the output segmentation, in the order in which their segmentations were connected to the Merge widget.
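The behaviour described above can be pictured with a small Python sketch. This is only a toy model and not Orange Textable's actual API: the Segment and Segmentation classes, the merge function and the merged_data label are all hypothetical names introduced for illustration. It assumes that merging simply appends the segments of each input, in connection order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a segment: a piece of text."""
    content: str


@dataclass
class Segmentation:
    """Hypothetical stand-in for a segmentation: a label and ordered segments."""
    label: str
    segments: list = field(default_factory=list)


def merge(inputs, label="merged_data"):
    """Append the segments of every input segmentation, in input order."""
    merged = Segmentation(label)
    for segmentation in inputs:
        merged.segments.extend(segmentation.segments)
    return merged


# The two Text Field instances of figure 1, each holding a single segment.
first = Segmentation("text_string", [Segment("a simple example")])
second = Segmentation("text_string2", [Segment("another example")])

# The order of the list plays the role of the connection order on the canvas.
merged = merge([first, second])
for number, segment in enumerate(merged.segments, start=1):
    print(number, segment.content)
```

In this model, swapping the two items of the input list would swap the two segments in the output, which mirrors the role of connection order on the canvas.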

Exercise: Can you add a new instance of Merge to the schema illustrated in figure 1 above and modify the connections (but not the configuration of existing widgets) so that the segmentation given in figure 4 below appears in the Display widget? (solution)

3 segments: "a simple example", "another example", "another example"

Figure 4: The segmentation requested in the exercise.

Solution: (back to the exercise)

The new Merge widget takes input from the old one and from a Text Field, and sends its output to Display

Figure 5: Solution to the exercise.
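The chaining used in this solution can be sketched in a few lines of Python. Again, this is a toy model rather than Orange Textable's API: segmentations are represented as plain lists of strings, and the merge function is a hypothetical helper. It assumes that the Text Field connected to the new Merge instance is the one containing "another example", as the segmentation in figure 4 suggests.

```python
def merge(*segmentations):
    """Hypothetical helper: concatenate segmentations in the order given."""
    merged = []
    for segmentation in segmentations:
        merged.extend(segmentation)
    return merged


# Contents of the two Text Field instances.
text_string = ["a simple example"]
text_string2 = ["another example"]

# The existing Merge instance, configured and connected as in figure 1.
old_merge = merge(text_string, text_string2)

# The new Merge instance takes the old one first, then the second Text Field.
new_merge = merge(old_merge, text_string2)

print(new_merge)
```

Because the output of one merge is itself a segmentation, it can be fed to another merge like any other input, which is what allows the second string to appear twice.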