encoding.xml #
Description
xml is a module to parse XML documents into a tree structure. It also supports validation of XML documents against a DTD.
Note that this is not a streaming XML parser. It reads the entire document into memory and then parses it. This is not a problem for small documents, but it might be a problem for extremely large documents (several hundred megabytes or more).
The public function parse_single_node can be used to parse a single node from an implementation of io.Reader, which can help parse large XML documents on an element-by-element basis. Sample usage is provided in the parser_test.v file.
Usage
Parsing XML Files
There are three different ways to parse an XML document:
- Pass the entire XML document as a string to XMLDocument.from_string.
- Specify a file path to XMLDocument.from_file.
- Use a source that implements io.Reader and pass it to XMLDocument.from_reader.
import encoding.xml
//...
doc := xml.XMLDocument.from_file('test/sample.xml')!
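As a complement to the file-based call above, here is a minimal sketch of parsing from an in-memory string. The sample document and the helper name parse_root_name are made up for illustration; only XMLDocument.from_string and the root and name fields come from this module.

```v
import encoding.xml

// parse_root_name is a hypothetical helper: it parses `source` and
// returns the name of the document's root node.
fn parse_root_name(source string) !string {
	doc := xml.XMLDocument.from_string(source)!
	return doc.root.name
}

fn main() {
	// Made-up sample document, not part of the module's test data.
	source := '<?xml version="1.0" encoding="UTF-8"?><note id="n1"><to>Alice</to><from>Bob</from></note>'
	name := parse_root_name(source) or {
		eprintln('could not parse document: ${err}')
		return
	}
	println(name)
}
```

All three parsing functions return a result type, so the error has to be handled with an or block or propagated with !.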
Validating XML Documents
Simply call validate on the parsed XML document.
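A minimal sketch of the call, assuming a made-up one-element document. Whether validation succeeds depends on the document and its DTD, so the sketch handles both outcomes rather than assuming one.

```v
import encoding.xml

fn main() {
	// Made-up sample document for illustration only.
	doc := xml.XMLDocument.from_string('<greeting>Hello</greeting>') or {
		eprintln('could not parse document: ${err}')
		return
	}
	// validate returns a new document on success and an error otherwise.
	validated := doc.validate() or {
		eprintln('validation failed: ${err}')
		return
	}
	println('validated root: ${validated.root.name}')
}
```

Because validate returns a new XMLDocument instead of modifying the receiver, subsequent work should use the returned value.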
Querying
Check the get_element... methods defined on the XMLDocument struct.
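The following sketch exercises the three query methods on a made-up document. The element names, ids, and the lang attribute are assumptions chosen for the example.

```v
import encoding.xml

fn main() {
	// Made-up sample document for illustration only.
	source := '<library><book id="b1" lang="en"><title>First</title></book><book id="b2" lang="fr"><title>Second</title></book></library>'
	doc := xml.XMLDocument.from_string(source) or {
		eprintln('could not parse document: ${err}')
		return
	}

	// All elements with a given tag name.
	books := doc.get_elements_by_tag('book')
	println('books: ${books.len}')

	// All elements carrying a given attribute-value pair.
	french := doc.get_elements_by_attribute('lang', 'fr')
	println('french books: ${french.len}')

	// The first element with a given id; the result is an option.
	if book := doc.get_element_by_id('b2') {
		lang := book.attributes['lang']
		println('found ${book.name} with lang ${lang}')
	} else {
		println('no element with id b2')
	}
}
```

The same three methods are also defined on XMLNode, where they search only the subtree rooted at that node.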
Escaping and Un-escaping XML Entities
When the validate method is called, the XML document is parsed and all text nodes are un-escaped. This means that the text nodes will contain the actual text and not the escaped version of the text.
When the XML document is serialized (using str or pretty_str), all text nodes are escaped.
The escaping and un-escaping can also be done manually using the escape_text and unescape_text functions.
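A small round-trip sketch of manual escaping with the default entity maps. The config structs are passed explicitly to match the signatures listed on this page; the input string is made up.

```v
import encoding.xml

fn main() {
	raw := '12 < 34'
	// An empty config selects the default entity maps declared in the structs.
	escaped := xml.escape_text(raw, xml.EscapeConfig{})
	println(escaped)

	// unescape_text returns a result, so the error case has to be handled.
	restored := xml.unescape_text(escaped, xml.UnescapeConfig{}) or {
		eprintln('could not unescape: ${err}')
		return
	}
	println(restored == raw)
}
```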
Constants #
const default_entities = {
'lt': '<'
'gt': '>'
'amp': '&'
'apos': "'"
'quot': '"'
}
const default_entities_reverse = {
'<': 'lt'
'>': 'gt'
'&': 'amp'
"'": 'apos'
'"': 'quot'
}
fn escape_text #
fn escape_text(content string, config EscapeConfig) string
escape_text replaces all special characters in the given string with their respective XML entity strings. The mapping defaults to default_entities_reverse and can be overridden through the reverse_entities field of EscapeConfig.
fn parse_single_node #
fn parse_single_node(first_char u8, mut reader io.Reader) !XMLNode
parse_single_node parses a single XML node from the reader. The first character of the tag is passed in as the first_char parameter. This function is meant to assist in parsing nested nodes one at a time. Using this function as opposed to the recommended static functions makes it easier to parse smaller nodes in extremely large XML documents without running out of memory.
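The sketch below shows one way this could be called. It assumes that first_char is the character immediately following the opening `<`, which the caller has already consumed from the reader. ByteReader and parse_first_node are hypothetical names introduced only for this example; the parser_test.v file remains the reference for real usage.

```v
import encoding.xml
import io

// ByteReader is a hypothetical in-memory io.Reader used only for this sketch.
struct ByteReader {
	data []u8
mut:
	pos int
}

// read copies the next unread bytes into buf and returns io.Eof once exhausted.
fn (mut r ByteReader) read(mut buf []u8) !int {
	if r.pos >= r.data.len {
		return io.Eof{}
	}
	n := copy(mut buf, r.data[r.pos..])
	r.pos += n
	return n
}

// parse_first_node is a hypothetical helper. It consumes '<' and the first
// character of the tag name by hand, then lets parse_single_node read the rest.
fn parse_first_node(source string) !xml.XMLNode {
	mut reader := ByteReader{
		data: source.bytes()
	}
	mut head := []u8{len: 2}
	reader.read(mut head)!
	return xml.parse_single_node(head[1], mut reader)!
}

fn main() {
	node := parse_first_node('<item>one</item>') or {
		eprintln('could not parse node: ${err}')
		return
	}
	println(node.name)
}
```

In a large document the same reader would be kept open and the call repeated for each element of interest, so only one node is held in memory at a time.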
fn unescape_text #
fn unescape_text(content string, config UnescapeConfig) !string
unescape_text replaces all entities in the given string with their respective original characters or strings. The mapping defaults to default_entities and can be overridden through the entities field of UnescapeConfig.
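A brief sketch of overriding the entity map. The entity name company and its value are made up. Since the custom map replaces the default value of the field, the predefined entities would have to be listed again if they were needed as well.

```v
import encoding.xml

fn main() {
	// 'company' is a made-up entity used only for illustration.
	config := xml.UnescapeConfig{
		entities: {
			'company': 'ACME'
		}
	}
	text := xml.unescape_text('Made by &company;', config) or {
		eprintln('could not unescape: ${err}')
		return
	}
	println(text)
}
```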
fn XMLDocument.from_file #
fn XMLDocument.from_file(path string) !XMLDocument
XMLDocument.from_file parses an XML document from a file. Note that the file is read in its entirety and then parsed. If the file is too large, try using the XMLDocument.from_reader function instead.
fn XMLDocument.from_reader #
fn XMLDocument.from_reader(mut reader io.Reader) !XMLDocument
XMLDocument.from_reader parses an XML document from a reader. This is the most generic way to parse an XML document from any arbitrary source that implements the io.Reader interface.
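As an illustration, the sketch below passes an open os.File as the reader, on the assumption that os.File satisfies io.Reader through its read method. The temporary file name, the document contents, and the helper root_name_from_file are made up for the example.

```v
import encoding.xml
import os

// root_name_from_file is a hypothetical helper: it opens `path`, parses the
// document through the io.Reader interface, and returns the root node name.
fn root_name_from_file(path string) !string {
	mut file := os.open(path)!
	defer {
		file.close()
	}
	doc := xml.XMLDocument.from_reader(mut file)!
	return doc.root.name
}

fn main() {
	// Write a made-up document to a temporary file so the sketch is self-contained.
	path := os.join_path(os.temp_dir(), 'encoding_xml_reader_example.xml')
	os.write_file(path, '<config><entry key="a">1</entry></config>') or {
		eprintln('could not write sample file: ${err}')
		return
	}
	defer {
		os.rm(path) or {}
	}

	name := root_name_from_file(path) or {
		eprintln('could not parse document: ${err}')
		return
	}
	println(name)
}
```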
fn XMLDocument.from_string #
fn XMLDocument.from_string(raw_contents string) !XMLDocument
XMLDocument.from_string parses an XML document from a string.
type DTDListItem #
type DTDListItem = DTDElement | DTDEntity
type XMLNodeContents #
type XMLNodeContents = XMLCData | XMLComment | XMLNode | string
struct DTDElement #
struct DTDElement {
pub:
name string @[required]
definition []string @[required]
}
struct DTDEntity #
struct DTDEntity {
pub:
name string @[required]
value string @[required]
}
struct DocumentType #
struct DocumentType {
pub:
name string @[required]
dtd DTDInfo
}
struct DocumentTypeDefinition #
struct DocumentTypeDefinition {
pub:
name string
list []DTDListItem
}
struct EscapeConfig #
struct EscapeConfig {
pub:
reverse_entities map[string]string = default_entities_reverse
}
struct UnescapeConfig #
struct UnescapeConfig {
pub:
entities map[string]string = default_entities
}
struct XMLCData #
struct XMLCData {
pub:
text string @[required]
}
struct XMLComment #
struct XMLComment {
pub:
text string @[required]
}
struct XMLDocument #
struct XMLDocument {
Prolog
pub:
root XMLNode @[required]
}
XMLDocument is the struct that represents a single XML document. It contains the prolog and the single root node. The prolog struct is embedded into the XMLDocument struct, so that the prolog fields are accessible directly from this struct. Public prolog fields include version, encoding, comments preceding the root node, and the document type definition.
fn (XMLDocument) get_element_by_id #
fn (doc XMLDocument) get_element_by_id(id string) ?XMLNode
get_element_by_id returns the first element with the given id, or none if no such element exists.
fn (XMLDocument) get_elements_by_attribute #
fn (doc XMLDocument) get_elements_by_attribute(attribute string, value string) []XMLNode
get_elements_by_attribute returns all elements with the given attribute-value pair. If there are no such elements, an empty array is returned.
fn (XMLDocument) get_elements_by_tag #
fn (doc XMLDocument) get_elements_by_tag(tag string) []XMLNode
get_elements_by_tag returns all elements with the given tag name. If there are no such elements, an empty array is returned.
fn (XMLDocument) pretty_str #
fn (doc XMLDocument) pretty_str(indent string) string
pretty_str returns a pretty-printed version of the XML document. It requires the string used to indent each level of the document.
fn (XMLDocument) str #
fn (doc XMLDocument) str() string
str returns a string representation of the XML document. It uses a 2-space indentation to pretty-print the document.
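A short sketch contrasting the two serialization methods on a made-up document. The tab character passed to pretty_str is an arbitrary choice for the example.

```v
import encoding.xml

fn main() {
	// Made-up sample document for illustration only.
	doc := xml.XMLDocument.from_string('<a><b>text</b></a>') or {
		eprintln('could not parse document: ${err}')
		return
	}
	// Default serialization.
	println(doc.str())
	// Same document, indented with a caller-chosen string.
	println(doc.pretty_str('\t'))
}
```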
fn (XMLDocument) validate #
fn (doc XMLDocument) validate() !XMLDocument
validate checks the document is well-formed and valid. It returns a new document with the parsed entities expanded when validation is successful. Otherwise it returns an error.
struct XMLNode #
struct XMLNode {
pub:
name string @[required]
attributes map[string]string
children []XMLNodeContents
}
XMLNode represents a single XML node. It contains the node name, a map of attributes, and a list of children. The children can be other XML nodes, CDATA, plain text, or comments.
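Since children is an array of the XMLNodeContents sum type, each child has to be matched on its variant before use. The sketch below shows one way to do that; the helper describe and the sample document are made up for illustration.

```v
import encoding.xml

// describe is a hypothetical helper that labels one child of an XMLNode
// according to which variant of XMLNodeContents it holds.
fn describe(child xml.XMLNodeContents) string {
	return match child {
		xml.XMLNode { 'element: ${child.name}' }
		xml.XMLCData { 'cdata: ${child.text}' }
		xml.XMLComment { 'comment: ${child.text}' }
		string { 'text: ${child}' }
	}
}

fn main() {
	// Made-up sample document for illustration only.
	doc := xml.XMLDocument.from_string('<p>Hello <b>world</b></p>') or {
		eprintln('could not parse document: ${err}')
		return
	}
	for child in doc.root.children {
		println(describe(child))
	}
}
```

The match must cover all four variants, so adding handling for nested elements usually means recursing in the xml.XMLNode branch.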
fn (XMLNode) get_element_by_id #
fn (node XMLNode) get_element_by_id(id string) ?XMLNode
get_element_by_id returns the first element with the given id, or none if no such element exists in the subtree rooted at this node.
fn (XMLNode) get_elements_by_attribute #
fn (node XMLNode) get_elements_by_attribute(attribute string, value string) []XMLNode
get_elements_by_attribute returns all elements with the given attribute-value pair in the subtree rooted at this node. If there are no such elements, an empty array is returned.
fn (XMLNode) get_elements_by_tag #
fn (node XMLNode) get_elements_by_tag(tag string) []XMLNode
get_elements_by_tag returns all elements with the given tag name in the subtree rooted at this node. If there are no such elements, an empty array is returned.
fn (XMLNode) pretty_str #
fn (node XMLNode) pretty_str(original_indent string, depth int, reverse_entities map[string]string) string
pretty_str returns a pretty-printed version of the XML node. It requires the current indentation the node is at, the depth of the node in the tree, and a map of reverse entities to use when escaping text.