encoding.xml

Description

xml is a module to parse XML documents into a tree structure. It also supports validation of XML documents against a DTD.

Note that this is not a streaming XML parser. It reads the entire document into memory and then parses it. This is not a problem for small documents, but it might be a problem for extremely large documents (several hundred megabytes or more).

The public function parse_single_node can be used to parse a single node from an implementation of io.Reader, which can help parse large XML documents on an element-by-element basis. Sample usage is provided in the parser_test.v file.

Usage

Parsing XML Files

There are three different ways to parse an XML document:

  1. Pass the entire XML document as a string to XMLDocument.from_string.
  2. Specify a file path to XMLDocument.from_file.
  3. Use a source that implements io.Reader and pass it to XMLDocument.from_reader.
import encoding.xml

//...
doc := xml.XMLDocument.from_file('test/sample.xml')!
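A minimal sketch of the first option, which parses an in-memory string and reads the name of the root element. The sample document and the root_name helper are illustrative and not part of the module:

```v
import encoding.xml

// Illustrative document; any well-formed XML string works here.
const sample = '<?xml version="1.0" encoding="UTF-8"?>
<catalog>
	<book id="b1">Dune</book>
	<book id="b2">Emma</book>
</catalog>'

// root_name parses `src` and returns the tag name of the root element.
// Parse errors are propagated to the caller through the `!` result type.
fn root_name(src string) !string {
	doc := xml.XMLDocument.from_string(src)!
	return doc.root.name
}

fn main() {
	name := root_name(sample) or {
		eprintln('could not parse: ${err}')
		return
	}
	println('root element: <${name}>')
}
```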

Validating XML Documents

Call validate on the parsed XML document. It returns a new, validated XMLDocument, or an error if the document is not valid.
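A sketch of the parse-then-validate flow. parse_and_validate is a hypothetical helper that chains the two calls and propagates a failure from either step:

```v
import encoding.xml

// parse_and_validate parses `src` and validates the result. Either step
// can fail, and both failures are returned to the caller as errors.
fn parse_and_validate(src string) !xml.XMLDocument {
	doc := xml.XMLDocument.from_string(src)!
	return doc.validate()!
}

fn main() {
	src := '<?xml version="1.0" encoding="UTF-8"?>
<note><to>Ann</to></note>'
	doc := parse_and_validate(src) or {
		eprintln('invalid document: ${err}')
		return
	}
	println('valid document, root is <${doc.root.name}>')
}
```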

Querying

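Use the get_element_by_id, get_elements_by_attribute and get_elements_by_tag methods defined on the XMLDocument struct. The same methods exist on XMLNode for searching only the subtree rooted at a given node.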
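A short sketch of the three query methods against a made-up catalog document. The document contents are illustrative; the method names and signatures are the ones listed in this reference:

```v
import encoding.xml

// Illustrative document with two <book> elements and one <magazine>.
const catalog = '<?xml version="1.0" encoding="UTF-8"?>
<catalog>
	<book id="b1" lang="en">Dune</book>
	<book id="b2" lang="fr">Vendredi</book>
	<magazine id="m1" lang="en">Wired</magazine>
</catalog>'

fn main() {
	doc := xml.XMLDocument.from_string(catalog) or { panic(err) }

	// All <book> elements anywhere in the document.
	books := doc.get_elements_by_tag('book')
	println('books: ${books.len}')

	// Every element whose lang attribute is "en", regardless of tag name.
	english := doc.get_elements_by_attribute('lang', 'en')
	println('english: ${english.len}')

	// get_element_by_id returns an option, so the "not found" case is explicit.
	if node := doc.get_element_by_id('b2') {
		lang := node.attributes['lang']
		println('b2 is a <${node.name}> with lang=${lang}')
	} else {
		println('no element with id b2')
	}
}
```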

Escaping and Un-escaping XML Entities

When the validate method is called, all text nodes in the returned document are un-escaped. This means that the text nodes contain the literal text (for example <) rather than its escaped form (&lt;).

When the XML document is serialized (using str or pretty_str), all text nodes are escaped.

The escaping and un-escaping can also be done manually using the escape_text and unescape_text functions.
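A round-trip sketch using the default entity maps. The escaped form checked in the test follows from default_entities_reverse as listed in the Constants section:

```v
import encoding.xml

fn main() {
	raw := 'Tom & Jerry <3 "cartoons"'
	// Each special character becomes its entity reference (&amp;, &lt;, &quot;).
	escaped := xml.escape_text(raw)
	println(escaped)
	// unescape_text returns a result, so a decoding error has to be handled.
	restored := xml.unescape_text(escaped) or {
		eprintln('could not unescape: ${err}')
		return
	}
	println(restored == raw)
}
```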

Constants

const default_entities = {
	'lt':   '<'
	'gt':   '>'
	'amp':  '&'
	'apos': "'"
	'quot': '"'
}
const default_entities_reverse = {
	'<': 'lt'
	'>': 'gt'
	'&': 'amp'
	"'": 'apos'
	'"': 'quot'
}

fn escape_text

fn escape_text(content string, config EscapeConfig) string

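escape_text replaces every character or string that appears as a key in the reverse-entity map with the corresponding XML entity reference; for example, < becomes &lt; and & becomes &amp;. The map defaults to default_entities_reverse and can be overridden through the reverse_entities field of EscapeConfig.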
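A sketch of overriding the map, assuming you want © written as &copy;. Copying default_entities_reverse first keeps the standard five entities; the copyright mapping and the helper name are illustrative:

```v
import encoding.xml

// escape_with_copyright extends the default reverse map with an
// illustrative '©' -> 'copy' entry, so the symbol is written as &copy;.
fn escape_with_copyright(s string) string {
	mut reverse := xml.default_entities_reverse.clone()
	reverse['©'] = 'copy'
	return xml.escape_text(s, reverse_entities: reverse)
}

fn main() {
	println(escape_with_copyright('© 2024 A&B'))
}
```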

fn parse_single_node

fn parse_single_node(first_char u8, mut reader io.Reader) !XMLNode

parse_single_node parses a single XML node from the reader. The first character of the tag is passed in as the first_char parameter. This function is meant to assist in parsing nested nodes one at a time. Using this function as opposed to the recommended static functions makes it easier to parse smaller nodes in extremely large XML documents without running out of memory.
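A sketch of reading one element through a custom reader, modelled loosely on the usage in parser_test.v. Two things here are assumptions: StringReader is a hypothetical in-memory io.Reader written for this example, and the caller is assumed to consume the opening < itself and pass the first character of the tag name as first_char:

```v
import encoding.xml
import io

// StringReader is a hypothetical in-memory io.Reader for this sketch.
// It signals end of stream with io.Eof.
struct StringReader {
	data []u8
mut:
	pos int
}

fn (mut r StringReader) read(mut buf []u8) !int {
	if r.pos >= r.data.len {
		return io.Eof{}
	}
	n := copy(mut buf, r.data[r.pos..])
	r.pos += n
	return n
}

// first_node_name reads the first element of `src` with parse_single_node.
// Assumption: '<' is consumed here and the next byte is passed as first_char.
fn first_node_name(src string) !string {
	mut reader := StringReader{
		data: src.bytes()
	}
	mut one := []u8{len: 1}
	reader.read(mut one)! // consumes the opening '<'
	reader.read(mut one)! // first character of the tag name
	node := xml.parse_single_node(one[0], mut reader)!
	return node.name
}

fn main() {
	name := first_node_name('<person id="1"><name>Ann</name></person>') or { panic(err) }
	println('parsed <${name}>')
}
```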

fn unescape_text

fn unescape_text(content string, config UnescapeConfig) !string

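unescape_text replaces every entity reference in the given string (such as &lt;) with the character or string it stands for. The lookup table defaults to default_entities and can be overridden through the entities field of UnescapeConfig. The function returns a result (!string), so decoding can fail, for example on an entity that is not in the table.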
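A sketch of adding an extra entity for unescaping. Passing a custom map replaces the default one, so the example clones default_entities first to keep &amp;, &lt; and the rest; the copy entity and the helper name are illustrative:

```v
import encoding.xml

// unescape_with_copyright merges an illustrative 'copy' entity into the
// defaults before unescaping, so both &copy; and &amp; are recognized.
fn unescape_with_copyright(s string) !string {
	mut entities := xml.default_entities.clone()
	entities['copy'] = '©'
	return xml.unescape_text(s, entities: entities)!
}

fn main() {
	text := unescape_with_copyright('&copy; 2024 A&amp;B') or { panic(err) }
	println(text)
}
```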

fn XMLDocument.from_file

fn XMLDocument.from_file(path string) !XMLDocument

XMLDocument.from_file parses an XML document from a file. Note that the file is read in its entirety and then parsed. If the file is too large, try using the XMLDocument.from_reader function instead.
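A self-contained sketch that writes a small document to a temporary file and parses it back. The file name and the root_from_file helper are illustrative:

```v
import encoding.xml
import os

// root_from_file writes `src` to an illustrative temporary path, parses it
// with XMLDocument.from_file and removes the file when the function returns.
fn root_from_file(src string) !string {
	path := os.join_path(os.temp_dir(), 'encoding_xml_example.xml')
	os.write_file(path, src)!
	defer {
		os.rm(path) or {}
	}
	doc := xml.XMLDocument.from_file(path)!
	return doc.root.name
}

fn main() {
	src := '<?xml version="1.0" encoding="UTF-8"?>
<config><debug>true</debug></config>'
	name := root_from_file(src) or { panic(err) }
	println('root element: <${name}>')
}
```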

fn XMLDocument.from_reader

fn XMLDocument.from_reader(mut reader io.Reader) !XMLDocument

XMLDocument.from_reader parses an XML document from a reader. This is the most generic way to parse an XML document from any arbitrary source that implements the io.Reader interface.
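Any type with a read(mut buf []u8) !int method satisfies io.Reader. This sketch uses a hypothetical StringReader over an in-memory byte slice, which returns io.Eof at the end of the data, and passes it to from_reader:

```v
import encoding.xml
import io

// StringReader is a hypothetical in-memory io.Reader used only for this sketch.
struct StringReader {
	data []u8
mut:
	pos int
}

// read copies as many bytes as fit into buf and reports io.Eof when done.
fn (mut r StringReader) read(mut buf []u8) !int {
	if r.pos >= r.data.len {
		return io.Eof{}
	}
	n := copy(mut buf, r.data[r.pos..])
	r.pos += n
	return n
}

// root_from_reader parses a whole document from the reader.
fn root_from_reader(src string) !string {
	mut reader := StringReader{
		data: src.bytes()
	}
	doc := xml.XMLDocument.from_reader(mut reader)!
	return doc.root.name
}

fn main() {
	src := '<?xml version="1.0" encoding="UTF-8"?>
<feed><entry>1</entry></feed>'
	println(root_from_reader(src) or { panic(err) })
}
```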

fn XMLDocument.from_string

fn XMLDocument.from_string(raw_contents string) !XMLDocument

XMLDocument.from_string parses an XML document from a string.

type DTDListItem

type DTDListItem = DTDElement | DTDEntity

type XMLNodeContents

type XMLNodeContents = XMLCData | XMLComment | XMLNode | string
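Because children are stored as this sum type, code that walks a node usually matches on the variant. A sketch with an illustrative document containing one of each kind of child; describe_children and its label format are hypothetical:

```v
import encoding.xml

// describe_children returns one label per child of `node`, using an
// exhaustive match on the XMLNodeContents sum type.
fn describe_children(node xml.XMLNode) []string {
	mut labels := []string{}
	for child in node.children {
		match child {
			xml.XMLNode { labels << 'element <${child.name}>' }
			xml.XMLCData { labels << 'cdata: ${child.text}' }
			xml.XMLComment { labels << 'comment: ${child.text}' }
			string { labels << 'text: ${child}' }
		}
	}
	return labels
}

fn main() {
	src := '<?xml version="1.0" encoding="UTF-8"?>
<note><!-- draft --><to>Ann</to><![CDATA[x < y]]>plain text</note>'
	doc := xml.XMLDocument.from_string(src) or { panic(err) }
	for label in describe_children(doc.root) {
		println(label)
	}
}
```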

struct DTDElement

struct DTDElement {
pub:
	name       string   @[required]
	definition []string @[required]
}

struct DTDEntity

struct DTDEntity {
pub:
	name  string @[required]
	value string @[required]
}

struct DocumentType

struct DocumentType {
pub:
	name string  @[required]
	dtd  DTDInfo
}

struct DocumentTypeDefinition

struct DocumentTypeDefinition {
pub:
	name string
	list []DTDListItem
}

struct EscapeConfig

@[params]
struct EscapeConfig {
pub:
	reverse_entities map[string]string = xml.default_entities_reverse
}

struct UnescapeConfig

@[params]
struct UnescapeConfig {
pub:
	entities map[string]string = xml.default_entities
}

struct XMLCData

struct XMLCData {
pub:
	text string @[required]
}

struct XMLComment

struct XMLComment {
pub:
	text string @[required]
}

struct XMLDocument

struct XMLDocument {
	Prolog
pub:
	root XMLNode @[required]
}

XMLDocument is the struct that represents a single XML document. It contains the prolog and the single root node. The Prolog struct is embedded in XMLDocument, so the prolog fields are accessible directly on an XMLDocument value. Public prolog fields include the version, the encoding, the comments preceding the root node, and the document type definition.
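A sketch of reading prolog fields through the embedded struct. The field names version and encoding follow the description above; comments is an assumed field name for the comments preceding the root node, and prolog_version is a hypothetical helper:

```v
import encoding.xml

// prolog_version parses `src` and reads a field promoted from the embedded
// prolog struct (field name `version` assumed from the description).
fn prolog_version(src string) !string {
	doc := xml.XMLDocument.from_string(src)!
	return doc.version
}

fn main() {
	src := '<?xml version="1.0" encoding="UTF-8"?>
<!-- generated -->
<root><child>x</child></root>'
	doc := xml.XMLDocument.from_string(src) or { panic(err) }
	println('version: ${doc.version}, encoding: ${doc.encoding}')
	// Assumed field name: `comments` holds comments that precede the root.
	println('comments before root: ${doc.comments.len}')
	println('root: ${doc.root.name}')
}
```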

fn (XMLDocument) get_element_by_id

fn (doc XMLDocument) get_element_by_id(id string) ?XMLNode

get_element_by_id returns the first element with the given id, or none if no such element exists.

fn (XMLDocument) get_elements_by_attribute

fn (doc XMLDocument) get_elements_by_attribute(attribute string, value string) []XMLNode

get_elements_by_attribute returns all elements with the given attribute-value pair. If there are no such elements, an empty array is returned.

fn (XMLDocument) get_elements_by_tag

fn (doc XMLDocument) get_elements_by_tag(tag string) []XMLNode

get_elements_by_tag returns all elements with the given tag name. If there are no such elements, an empty array is returned.

fn (XMLDocument) pretty_str

fn (doc XMLDocument) pretty_str(indent string) string

pretty_str returns a pretty-printed version of the XML document. It requires the string used to indent each level of the document.

fn (XMLDocument) str

fn (doc XMLDocument) str() string

str returns a string representation of the XML document. It uses a 2-space indentation to pretty-print the document.
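A sketch contrasting the two serializers. reindent is a hypothetical helper that re-serializes a parsed document with a caller-chosen indent string:

```v
import encoding.xml

// reindent parses `src` and re-serializes it with the given indent string.
fn reindent(src string, indent string) !string {
	doc := xml.XMLDocument.from_string(src)!
	return doc.pretty_str(indent)
}

fn main() {
	src := '<?xml version="1.0" encoding="UTF-8"?>
<list><item>first</item><item>second</item></list>'
	doc := xml.XMLDocument.from_string(src) or { panic(err) }
	// str() uses two spaces per nesting level.
	println(doc.str())
	// pretty_str accepts any indent string, here a tab.
	println(reindent(src, '\t') or { panic(err) })
}
```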

fn (XMLDocument) validate

fn (doc XMLDocument) validate() !XMLDocument

validate checks that the document is well-formed and valid. If validation succeeds, it returns a new document with the parsed entities expanded; otherwise it returns an error.

struct XMLNode

struct XMLNode {
pub:
	name       string            @[required]
	attributes map[string]string
	children   []XMLNodeContents
}

XMLNode represents a single XML node. It contains the node name, a map of attributes, and a list of children. The children can be other XML nodes, CDATA, plain text, or comments.
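Nodes can also be built in code rather than parsed. A sketch of a small hand-made tree; wrapping each child in XMLNodeContents explicitly keeps the children array homogeneous. build_greeting and the node contents are illustrative:

```v
import encoding.xml

// build_greeting assembles a tree by hand: text, a nested element and a comment.
fn build_greeting() xml.XMLNode {
	return xml.XMLNode{
		name: 'greeting'
		attributes: {
			'lang': 'en'
		}
		children: [
			xml.XMLNodeContents('Hello, '),
			xml.XMLNodeContents(xml.XMLNode{
				name: 'b'
				children: [xml.XMLNodeContents('world')]
			}),
			xml.XMLNodeContents(xml.XMLComment{
				text: 'built in code'
			}),
		]
	}
}

fn main() {
	node := build_greeting()
	lang := node.attributes['lang']
	println('<${node.name}> has ${node.children.len} children, lang=${lang}')
}
```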

fn (XMLNode) get_element_by_id

fn (node XMLNode) get_element_by_id(id string) ?XMLNode

get_element_by_id returns the first element with the given id, or none if no such element exists in the subtree rooted at this node.

fn (XMLNode) get_elements_by_attribute

fn (node XMLNode) get_elements_by_attribute(attribute string, value string) []XMLNode

get_elements_by_attribute returns all elements with the given attribute-value pair in the subtree rooted at this node. If there are no such elements, an empty array is returned.

fn (XMLNode) get_elements_by_tag

fn (node XMLNode) get_elements_by_tag(tag string) []XMLNode

get_elements_by_tag returns all elements with the given tag name in the subtree rooted at this node. If there are no such elements, an empty array is returned.
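The XMLNode versions are handy for scoping a search: find a container first, then query only inside it. A sketch with an illustrative library document and a hypothetical books_on_shelf helper:

```v
import encoding.xml

// Illustrative document: two shelves with different numbers of books.
const library = '<?xml version="1.0" encoding="UTF-8"?>
<library>
	<shelf id="fiction"><book>Dune</book><book>Emma</book></shelf>
	<shelf id="poetry"><book>Odes</book></shelf>
</library>'

// books_on_shelf finds a shelf by id, then counts <book> elements only
// within that shelf's subtree.
fn books_on_shelf(src string, shelf_id string) !int {
	doc := xml.XMLDocument.from_string(src)!
	shelf := doc.get_element_by_id(shelf_id) or { return error('no shelf ${shelf_id}') }
	return shelf.get_elements_by_tag('book').len
}

fn main() {
	for id in ['fiction', 'poetry'] {
		count := books_on_shelf(library, id) or { panic(err) }
		println('${id}: ${count} book(s)')
	}
}
```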

fn (XMLNode) pretty_str

fn (node XMLNode) pretty_str(original_indent string, depth int, reverse_entities map[string]string) string

pretty_str returns a pretty-printed version of the XML node. It requires the current indentation the node is at, the depth of the node in the tree, and a map of reverse entities to use when escaping text.
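A sketch of calling pretty_str on a hand-built node, passing default_entities_reverse so the text is escaped on output. It assumes the first argument is the per-level indent string, combined with depth to compute the node's indentation; render is a hypothetical helper:

```v
import encoding.xml

// render serializes a hand-built node at the given depth, escaping its
// text with the default reverse-entity map.
fn render(depth int) string {
	node := xml.XMLNode{
		name: 'item'
		children: [xml.XMLNodeContents('a < b')]
	}
	return node.pretty_str('  ', depth, xml.default_entities_reverse)
}

fn main() {
	println(render(0))
	println(render(2))
}
```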