Skip to content

v.token #

Description

v.token is a module providing the basic building blocks of the V syntax - the tokens, as well as utilities for working with them.

KeywordsMatcherTrie

KeywordsMatcherTrie provides a faster way of determining whether a given name is a reserved word (belongs to a given set of previously known words R). It works by exploiting the fact, that the set of reserved words is small, and the words short.

KeywordsMatcherTrie uses an ordered set of tries, one per each word length, that was added, so that rejecting that something is a reserved word, can be done in constant time for words smaller or larger in length than all the reserved ones.

After a word w, is confirmed by this initial check by length n, that it could belong to a trie Tn, responsible for all known reserved words of that length, then Tn is used to further verify or reject the word quickly. In order to do so, Tn prepares in advance an array of all possible continuations (letters), at each index of the words R, after any given prefix, belonging to R.

For example, if we have added the word asm to the trie T3, its tree (its nodes) may look like this (note that the 0 pointers in children, mean that there was no word in R, that had that corresponding letter at that specific index):

TrieNode 0:  a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children:  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | children[`a`] = 1 -> TrieNode 1
|   prefix so far: ''    | value: 0                                  |
|
TrieNode 1:  a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 ... | children[`s`] = 2 -> TrieNode 2
|   prefix so far: 'a'   | value: 0                                  |
|
TrieNode 2:  a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children:  0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | children[`m`] = 3 -> TrieNode 3
|   prefix so far: 'as'  | value: 0                                  | Note: `as` is a keyword with length 2,
|                                                                      but we are searching in T3 trie.
|
TrieNode 3:  a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | all of children are 0
|   prefix so far: 'asm' | value: int(token.Kind.asm)                |

Matching any given word in the trie, after you have prepared it, is then simple: just read each character of the word, and follow the corresponding pointer from the children array (indexed by character). When the pointer is nil, there was NO match, and the word is rejected, which happens very often, and early for most words that are not in the set of the previously added reserved words. One significant benefit compared to just comparing the checked word against a linear list of all known words, is that once you have found that a word is not a match at any given level/trie node, then you know that it is not a match to any of them.

Note: benchmarking shows that it is ~300% to 400% faster, compared to just usingtoken.keywords[name] on average, when there is a match, but it can be 17x faster in the case, where there is a length mismatch. After changes to KeywordsMatcherTrie, please do v -prod run vlib/v/tests/bench/bench_compare_tokens.v to verify, that there is no performance regression.

Constants #

const assign_tokens = [Kind.assign, .decl_assign, .plus_assign, .minus_assign, .mult_assign,
	.div_assign, .xor_assign, .mod_assign, .or_assign, .and_assign, .right_shift_assign,
	.left_shift_assign, .unsigned_right_shift_assign, .boolean_and_assign, .boolean_or_assign]
const valid_at_tokens = ['@VROOT', '@VMODROOT', '@VEXEROOT', '@FN', '@METHOD', '@MOD', '@STRUCT',
	'@VEXE', '@FILE', '@LINE', '@COLUMN', '@VHASH', '@VCURRENTHASH', '@VMOD_FILE', '@VMODHASH',
	'@FILE_LINE', '@LOCATION', '@BUILD_DATE', '@BUILD_TIME', '@BUILD_TIMESTAMP']
const token_str = build_token_str()
const keywords = build_keys()
const scanner_matcher = new_keywords_matcher_trie[Kind](keywords)

fn assign_op_to_infix_op #

fn assign_op_to_infix_op(op Kind) Kind

fn build_precedences #

fn build_precedences() []Precedence

fn is_decl #

fn is_decl(t Kind) bool

fn is_key #

fn is_key(key string) bool

fn kind_from_string #

fn kind_from_string(s string) !Kind

fn kind_to_string #

fn kind_to_string(k Kind) string

fn new_keywords_matcher_from_array_trie #

fn new_keywords_matcher_from_array_trie(names []string) KeywordsMatcherTrie

new_keywords_matcher_from_array_trie creates a new KeywordsMatcherTrie instance from a given array of strings. The values for the strings, that find will return, will be the indexes in that array.

fn new_keywords_matcher_trie #

fn new_keywords_matcher_trie[T](kw_map map[string]T) KeywordsMatcherTrie

new_keywords_matcher_trie creates a new KeywordsMatcherTrie instance from a given map with string keys, and integer or enum values.

fn new_trie_node #

fn new_trie_node() &TrieNode

new_trie_node creates a new TrieNode instance

fn KeywordsMatcherTrie.new #

fn KeywordsMatcherTrie.new(cap int) KeywordsMatcherTrie

enum AtKind #

enum AtKind {
	unknown
	fn_name
	method_name
	mod_name
	struct_name
	vexe_path
	file_path
	line_nr
	column_nr
	vhash
	v_current_hash
	vmod_file
	vmodroot_path
	vmod_hash
	vroot_path // obsolete
	vexeroot_path
	file_path_line_nr
	location
	build_date
	build_time
	build_timestamp
}

@FN => will be substituted with the name of the current V function @METHOD => will be substituted with ReceiverType.MethodName @MOD => will be substituted with the name of the current V module @STRUCT => will be substituted with the name of the current V struct @FILE => will be substituted with the path of the V source file @LINE => will be substituted with the V line number where it appears (as a string). @COLUMN => will be substituted with the column where it appears (as a string). @VHASH => will be substituted with the shortened commit hash of the V compiler (as a string). @VMOD_FILE => will be substituted with the contents of the nearest v.mod file (as a string). @VMODROOT => will be substituted with the folder where the nearest v.mod file is (as a string). @VEXE => will be substituted with the path to the V compiler @VEXEROOT => will be substituted with the folder where the V executable is (as a string). @VROOT => the old name for @VMODROOT; sometimes it was used as @VEXEROOT;

Note: @VROOT is now deprecated, use either @VMODROOT or @VEXEROOT instead.

Note: @VEXEROOT & @VMODROOT are used for compilation options like this: #include "@VMODROOT/include/abc.h" #flag -L@VEXEROOT/thirdparty/libgc

The @XYZ tokens allow for code like this: println( 'file: ' + @FILE + ' | line: ' + @LINE + ' | fn: ' + @MOD + '.' + @FN) ... which is useful while debugging/tracing.

@ is allowed for keyword variable names. E.g. 'type'

enum Kind #

enum Kind {
	unknown
	eof
	name       // user
	number     // 123
	string     // 'foo'
	str_inter  // 'name=$user.name'
	chartoken  // `A` - rune
	plus       // +
	minus      // -
	mul        // *
	div        // /
	mod        // %
	xor        // ^
	pipe       // |
	inc        // ++
	dec        // --
	and        // &&
	logical_or // ||
	not        // !
	bit_not    // ~
	question   // ?
	comma      // ,
	semicolon  // ;
	colon      // :
	arrow      // <-
	amp        // &
	hash       // #
	dollar     // $
	at         // @
	str_dollar
	left_shift                  // <<
	right_shift                 // >>
	unsigned_right_shift        // >>>
	not_in                      // !in
	not_is                      // !is
	assign                      // =
	decl_assign                 // :=
	plus_assign                 // +=
	minus_assign                // -=
	div_assign                  // /=
	mult_assign                 // *=
	xor_assign                  // ^=
	mod_assign                  // %=
	or_assign                   // |=
	and_assign                  // &=
	right_shift_assign          // <<=
	left_shift_assign           // >>=
	unsigned_right_shift_assign // >>>=
	boolean_and_assign          // &&=
	boolean_or_assign           // ||=
	lcbr                        // {
	rcbr                        // }
	lpar                        // (
	rpar                        // )
	lsbr                        // [
	nilsbr                      // #[
	rsbr                        // ]
	eq // ==
	ne // !=
	gt // >
	lt // <
	ge // >=
	le // <=
	comment
	nl
	dot      // .
	dotdot   // ..
	ellipsis // ...
	keyword_beg
	key_as
	key_asm
	key_assert
	key_atomic
	key_break
	key_const
	key_continue
	key_defer
	key_else
	key_enum
	key_false
	key_for
	key_fn
	key_global
	key_go
	key_goto
	key_if
	key_import
	key_in
	key_interface
	key_is
	key_match
	key_module
	key_mut
	key_nil
	key_shared
	key_lock
	key_rlock
	key_none
	key_return
	key_select
	key_like
	key_ilike
	key_sizeof
	key_isreftype
	key_likely
	key_unlikely
	key_offsetof
	key_struct
	key_true
	key_type
	key_typeof
	key_dump
	key_orelse
	key_union
	key_pub
	key_static
	key_volatile
	key_unsafe
	key_spawn
	key_implements
	keyword_end
	_end_
}

fn (Kind) is_assign #

fn (t Kind) is_assign() bool

fn (Kind) str #

fn (t Kind) str() string

Note: used for some code generation, so no quoting

fn (Kind) precedence #

fn (kind Kind) precedence() int

precedence returns the precedence of the given token kind if defined, otherwise 0

fn (Kind) is_relational #

fn (tok Kind) is_relational() bool

fn (Kind) is_start_of_type #

fn (k Kind) is_start_of_type() bool

fn (Kind) is_prefix #

fn (kind Kind) is_prefix() bool

fn (Kind) is_infix #

fn (kind Kind) is_infix() bool

fn (Kind) is_postfix #

fn (kind Kind) is_postfix() bool

enum Precedence #

enum Precedence {
	lowest
	cond // OR or AND
	in_as
	assign // =
	eq     // == or !=
	// less_greater // > or <
	sum     // + - | ^
	product // * / << >> >>> &
	// mod // %
	prefix  // -X or !X; TODO: seems unused
	postfix // ++ or --
	call    // func(X) or foo.method(X)
	index   // array[index], map[key]
	highest
}

Representation of highest and lowest precedence

struct KeywordsMatcherTrie #

@[heap]
struct KeywordsMatcherTrie {
pub mut:
	nodes   []&TrieNode
	min_len int = 999999
	max_len int
}

KeywordsMatcherTrie provides a faster way of determining whether a given name is a reserved word (belongs to a given set of previously known words R). See the module description for more details.

fn (KeywordsMatcherTrie) find #

fn (km &KeywordsMatcherTrie) find(word string) int

find tries to find the given word in the set of all previously added words to the KeywordsMatcherTrie instance. It returns -1 if the word was NOT found there at all. If the word was found, find will return the value (value => 0), associated with the word, when it was added.

fn (KeywordsMatcherTrie) matches #

fn (km &KeywordsMatcherTrie) matches(word string) bool

matches returns true when the word was already added, i.e. when it was found.

fn (KeywordsMatcherTrie) add_word #

fn (mut km KeywordsMatcherTrie) add_word(word string, value int)

add_word adds the given word to the KeywordsMatcherTrie instance. It associates a non negative integer value to it, so later find could return the value, when it succeeds.

struct Pos #

struct Pos {
pub:
	len     int // length of the literal in the source
	line_nr int // the line number in the source where the token occurred
	pos     int // the position of the token in scanner text
	col     int // the column in the source where the token occurred
pub mut:
	last_line int // the line number where the ast object ends (used by vfmt)
}

fn (Pos) free #

unsafe
fn (mut p Pos) free()

fn (Pos) line_str #

fn (p Pos) line_str() string

fn (Pos) extend #

fn (pos Pos) extend(end Pos) Pos

fn (Pos) extend_with_last_line #

fn (pos Pos) extend_with_last_line(end Pos, last_line int) Pos

fn (Pos) update_last_line #

fn (mut pos Pos) update_last_line(last_line int)

struct Token #

@[minify]
struct Token {
pub:
	kind    Kind   // the token number/enum; for quick comparisons
	lit     string // literal representation of the token
	line_nr int    // the line number in the source where the token occurred
	col     int    // the column in the source where the token occurred
	// name_idx int // name table index for O(1) lookup
	pos  int // the position of the token in scanner text
	len  int // length of the literal
	tidx int // the index of the token
}

fn (Token) debug #

fn (t Token) debug() string

fn (Token) is_next_to #

fn (t Token) is_next_to(pre_token Token) bool

fn (Token) is_scalar #

fn (tok Token) is_scalar() bool

is_scalar returns true if the token is a scalar

fn (Token) is_unary #

fn (tok Token) is_unary() bool

is_unary returns true if the token can be in a unary expression

fn (Token) pos #

fn (tok &Token) pos() Pos

fn (Token) precedence #

fn (tok Token) precedence() int

precedence returns a tokens precedence if defined, otherwise 0

fn (Token) str #

fn (t Token) str() string

struct TrieNode #

struct TrieNode {
pub mut:
	children [123]&TrieNode
	value    int = -1 // when != -1, it is a leaf node representing a match
}

TrieNode is a single node from a trie, used by KeywordsMatcherTrie

fn (TrieNode) show #

fn (node &TrieNode) show(level int)

show displays the information in node, in a more compact/readable format (recursively)

fn (TrieNode) add_word #

fn (mut node TrieNode) add_word(word string, value int, word_idx int)

add_word adds another word and value pair into the trie, starting from node (recursively). word_idx is just used as an accumulator, and starts from 0 at the root of the tree.

fn (TrieNode) find #

fn (root &TrieNode) find(word string) int

find tries to find a match for word to the trie (the set of all previously added words). It returns -1 if there is no match, or the value associated with the previously added matching word by add_word.