Convert XML tags into Orange Textable annotations¶
Goal¶
Convert XML markup into Orange Textable data structures such as segments and their annotations.
Prerequisites¶
Some text containing XML markup has been imported in Orange Textable (see Cookbook: Text input) and possibly further processed (see Cookbook: Segmentation manipulation).
Ingredients¶
Widget | Icon | Quantity
Extract XML | (icon) | 1
Procedure¶

Figure 1: Convert XML tags into Orange Textable annotations with an instance of Extract XML¶
Create an instance of Extract XML.
Drag and drop from the output (right-hand side) of the widget that emits the data containing XML markup (e.g. Text Field) to the input (left-hand side) of Extract XML.
Double-click on the icon of Extract XML to open its interface.
In the XML Extraction section, insert the desired XML element (here w).
Click the Send button or tick the Send automatically checkbox.
A segmentation containing one segment for each occurrence of the specified tag is then available at the output of Extract XML; to display or export it, see Cookbook: Text output.
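As a rough illustration of what this procedure produces, the following Python sketch mimics the behaviour described in this recipe outside of Orange Textable: one segment per occurrence of the chosen element, with the tag's attributes kept as annotations. It is not the Extract XML implementation; the function name extract_elements, the sample text and the pos attribute are assumptions made for the example, and it relies on the standard library's XML parser, which requires well-formed input.

```python
import xml.etree.ElementTree as ET


def extract_elements(xml_text, tag):
    """Return one (content, annotations) pair per occurrence of `tag`.

    Hypothetical helper for illustration only; not part of Orange Textable.
    """
    # Wrap the text in a dummy root so that a fragment with several
    # top-level elements can still be parsed.
    root = ET.fromstring(f"<root>{xml_text}</root>")
    segments = []
    for element in root.iter(tag):
        # The tags themselves are dropped; only the text content is kept.
        content = "".join(element.itertext())
        # The attributes of the tag become the segment's annotations.
        segments.append((content, dict(element.attrib)))
    return segments


sample = '<w pos="DET">The</w> <w pos="NOUN">cat</w>'
segments = extract_elements(sample, "w")
for content, annotations in segments:
    print(content, annotations)
```

With this sample, segments holds two pairs: "The" annotated with pos=DET and "cat" annotated with pos=NOUN.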
Comment¶
The XML tags that have been retrieved are actually discarded from the resulting segmentation: only their content is included in the output.
The attributes of the XML tags are automatically converted to annotations associated with the created segments.
Note that it is only possible to extract instances of a single XML element type at a time (here w). However, it is possible to chain several Extract XML instances in order to successively extract instances of different XML elements: for example, a first instance extracts div elements, a second extracts w elements, and so on. In this case, it is important to make sure that the Remove markup option is not selected, so that nested tags remain available to the instances further down the chain.
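The chaining idea, and the reason the Remove markup option matters, can be sketched in Python as follows. This is only an approximation under stated assumptions: the extract function and its remove_markup parameter are hypothetical stand-ins for an Extract XML instance and its option, the sample text is invented, and copying the div annotations onto the w segments is a choice made for this example rather than a documented behaviour of the widget.

```python
import xml.etree.ElementTree as ET


def extract(xml_text, tag, remove_markup=False):
    """Hypothetical stand-in for one Extract XML instance."""
    # A dummy root lets fragments with several top-level elements parse.
    root = ET.fromstring(f"<root>{xml_text}</root>")
    results = []
    for element in root.iter(tag):
        if remove_markup:
            # Keep the text only: nested tags are lost.
            content = "".join(element.itertext())
        else:
            # Keep the inner XML: leading text plus each serialized child
            # (ElementTree serializes a child together with its tail text).
            content = (element.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in element
            )
        results.append((content, dict(element.attrib)))
    return results


text = (
    '<div type="poem"><w pos="DET">The</w> <w pos="NOUN">cat</w></div>'
    '<div type="prose"><w pos="VERB">runs</w></div>'
)

# First instance: extract div elements, keeping the nested markup.
divs = extract(text, "div")

# Second instance: extract w elements from the content of each div.
words = []
for div_content, div_annotations in divs:
    for w_content, w_annotations in extract(div_content, "w", remove_markup=True):
        # Assumption of this sketch: the w segment also carries the div annotations.
        words.append((w_content, {**div_annotations, **w_annotations}))

# If the first instance removes markup, the second one finds nothing.
stripped_divs = extract(text, "div", remove_markup=True)
lost = extract(stripped_divs[0][0], "w")

print(words)
print(lost)
```

In this sketch, words ends up with three segments ("The", "cat", "runs"), each annotated with both type and pos, whereas lost is empty because the w tags were stripped in the first pass.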