10. Tagging table rows with segments and labels

There are several situations in which annotations attached to a segment can be used in place of this segment’s content. A particularly common case consists in using annotations for tagging the rows of a table built with an instance of Count, Length, Variety, or Category.

Consider the example of the texts in English and French introduced here. Suppose that after having merged them into a single segmentation with an instance of Merge (Widget Merge label: Texts ; Import labels with key: language), we segment these three texts into letters with an instance of Segment (Regex \w, Widget Segment label: letters), as in the schema shown on figure 1 below; both segmentations (texts and letters) can then be sent to an instance of Count for building a table with the frequency of each letter in each text.

Counting letter frequency in three texts

Figure 1: Schema for counting letter frequency in three texts.

Let us suppose, first, that the instance of Count is configured as shown on figure 2 below, so that the definition of contexts–that is, rows of the frequency table–is based on the content of the three texts.

Counting letter frequency in three texts

Figure 2: Counting letter frequency in texts.

Here is the resulting table, disregarding possible variations in row and/or column order:

Table 1: Letter frequency in three texts.
  a t e x i n E g l s h o r u f ç
a text in English 1 2 1 1 2 2 1 1 1 1 1 0 0 0 0 0
another text in English 1 3 2 1 2 3 1 1 1 1 2 1 1 0 0 0
un texte en français 2 2 3 1 1 3 0 0 0 1 0 0 1 1 1 1

As can be seen, the default header of each row is the entire content of each text. While this may not be a problem in a pedagogic example like this one, it is easy to see why it would compromise the table’s readability in a real application, where texts often contain thousand or even millions of characters. To avoid that, it is useful to tag the table’s rows with annotation values attached to segments rather than with these segments’ content. To that effect, the desired annotation key must be selected in the Contexts section of widget Count’s interface.

Tagging contexts with annotation values

Figure 3: Tagging contexts with annotation values.

In the example of figure 3 above key language has been selected, so that the resulting frequency table looks like this:

Table 2: Letter frequency in two text types.
  a t e x i n E g l s h o r u f ç
en 2 5 3 2 4 5 2 2 2 2 3 1 1 0 0 0
fr 2 2 3 1 1 3 0 0 0 1 0 0 1 1 1 1

Besides the substitution of segment content by annotation values in row headers, this example demonstrates an important consequence of this manipulation: contexts associated with the same annotation value are, in effect, collapsed together so that they form a single row. If this behavior is not desired, it can be avoided by assigning distinct annotation values to the contexts that must be kept separated (e.g. en_1 and en_2).