Convert XML tags into Orange Textable annotations

Goal

Convert XML markup into Orange Textable data structures such as segments and their annotations.

Prerequisites

Some text containing XML markup has been imported into Orange Textable (see Cookbook: Text input) and possibly further processed (see Cookbook: Segmentation manipulation).

Ingredients

Widget      Extract XML
Icon        extract_xml_icon
Quantity    1

Procedure

Figure 1: Convert XML tags into Orange Textable annotations with an instance of Extract XML

  1. Create an instance of Extract XML.

  2. Drag and drop from the output (right-hand side) of the widget that emits the data containing XML markup (e.g. Text Field) to the input (left-hand side) of Extract XML.

  3. Double-click on the icon of Extract XML to open its interface.

  4. In the XML Extraction section, enter the name of the XML element to extract (here w).

  5. Click the Send button or tick the Send automatically checkbox.

  6. A segmentation containing one segment for each occurrence of the specified element is then available at the output of Extract XML; to display or export it, see Cookbook: Text output.
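
To make the result of this procedure more concrete, here is a minimal Python sketch of the same idea outside Orange Textable. It is not the widget's implementation or API: it uses the standard library's xml.etree.ElementTree, assumes the input is well-formed XML with a single root element, and the function name extract_elements, the sample text, and the dictionary layout are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET


def extract_elements(xml_text, tag):
    """Return one dict per <tag> element: its text content and its attributes.

    Illustrative only: mimics the idea of "one segment per element,
    attributes become annotations", not Orange Textable's own code.
    """
    root = ET.fromstring(xml_text)  # assumes well-formed XML with one root
    segments = []
    for elem in root.iter(tag):
        segments.append({
            # The tags themselves are dropped; only the content is kept.
            "content": "".join(elem.itertext()),
            # Each attribute of the element becomes an annotation.
            "annotations": dict(elem.attrib),
        })
    return segments


# Hypothetical sample input with two <w> elements.
sample = '<s><w pos="DET">the</w> <w pos="N">cat</w></s>'
segments = extract_elements(sample, "w")
for seg in segments:
    print(seg["content"], seg["annotations"])
```

Each printed line pairs the content of one w element with the annotations derived from its attributes, which is roughly what the output segmentation contains.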

Comment

  • The retrieved XML tags themselves are discarded from the resulting segmentation: only their content is included in the output.

  • The attributes of the XML tags are automatically converted to annotations associated with the created segments.

  • Only instances of a single XML element type can be extracted at a time (here w).

  • However, it is possible to chain several Extract XML instances in order to successively extract instances of different XML elements: for example, a first instance extracts div elements, a second extracts w elements, and so on. In this case, make sure that the Remove markup option is not selected in any instance whose output feeds another Extract XML instance; otherwise the nested tags would no longer be present for the next instance to find.
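
The chaining described in the last comment can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not Orange Textable code: the function extract, its remove_markup parameter, the _wrapper helper element, and the sample text are all hypothetical, and merging the div annotations into the w segments is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET


def extract(xml_text, tag, remove_markup=False):
    """Return (content, annotations) pairs for each <tag> element.

    With remove_markup=False, nested tags are kept in the content so that
    a later extraction step can still find them.
    """
    # Wrap the input so that fragments with several top-level elements parse.
    root = ET.fromstring("<_wrapper>" + xml_text + "</_wrapper>")
    results = []
    for elem in root.iter(tag):
        if remove_markup:
            content = "".join(elem.itertext())
        else:
            # Leading text plus each child serialized with its tags
            # (ET.tostring also includes the text following the child).
            content = (elem.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in elem
            )
        results.append((content, dict(elem.attrib)))
    return results


text = (
    '<div type="poem"><w pos="DET">the</w> <w pos="N">cat</w></div>'
    '<div type="prose"><w pos="V">sat</w></div>'
)

# First step: extract div elements, keeping the nested markup.
# Second step: extract w elements from each div.
words = []
for div_content, div_annotations in extract(text, "div"):
    for w_content, w_annotations in extract(div_content, "w", remove_markup=True):
        words.append((w_content, {**div_annotations, **w_annotations}))

# If markup is removed in the first step, the second step finds nothing.
lost = []
for div_content, _ in extract(text, "div", remove_markup=True):
    lost.extend(extract(div_content, "w"))
```

Here words ends up with one entry per w element, while lost stays empty because the w tags were stripped before the second extraction could see them.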

See also