v.token #
Description
v.token
is a module providing the basic building blocks of the V syntax - the tokens, as well as utilities for working with them.
KeywordsMatcherTrie
KeywordsMatcherTrie provides a faster way of determining whether a given name is a reserved word (belongs to a given set of previously known words R
). It works by exploiting the fact, that the set of reserved words is small, and the words short.
KeywordsMatcherTrie uses an ordered set of tries, one per each word length, that was added, so that rejecting that something is a reserved word, can be done in constant time for words smaller or larger in length than all the reserved ones.
After a word w
, is confirmed by this initial check by length n
, that it could belong to a trie Tn
, responsible for all known reserved words of that length, then Tn
is used to further verify or reject the word quickly. In order to do so, Tn
prepares in advance an array of all possible continuations (letters), at each index of the words R
, after any given prefix, belonging to R
.
For example, if we have added the word asm
to the trie T3, its tree (its nodes) may look like this (note that the 0 pointers in children, mean that there was no word in R
, that had that corresponding letter at that specific index):
TrieNode 0: a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | children[`a`] = 1 -> TrieNode 1
| prefix so far: '' | value: 0 |
|
TrieNode 1: a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 ... | children[`s`] = 2 -> TrieNode 2
| prefix so far: 'a' | value: 0 |
|
TrieNode 2: a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children: 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | children[`m`] = 3 -> TrieNode 3
| prefix so far: 'as' | value: 0 | Note: `as` is a keyword with length 2,
| but we are searching in T3 trie.
|
TrieNode 3: a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | all of children are 0
| prefix so far: 'asm' | value: int(token.Kind.asm) |
Matching any given word
in the trie, after you have prepared it, is then simple: just read each character of the word
, and follow the corresponding pointer from the children
array (indexed by character). When the pointer is nil, there was NO match, and the word is rejected, which happens very often, and early for most words that are not in the set of the previously added reserved words. One significant benefit compared to just comparing the checked word
against a linear list of all known words, is that once you have found that a word is not a match at any given level/trie node, then you know that it is not a match to any of them.
Note: benchmarking shows that it is ~300% to 400% faster, compared to just usingtoken.keywords[name]
on average, when there is a match, but it can be 17x faster in the case, where there is a length mismatch. After changes to KeywordsMatcherTrie, please do v -prod run vlib/v/tests/bench/bench_compare_tokens.v
to verify, that there is no performance regression.
Constants #
const assign_tokens = [Kind.assign, .decl_assign, .plus_assign, .minus_assign, .mult_assign,
.div_assign, .xor_assign, .mod_assign, .or_assign, .and_assign, .right_shift_assign,
.left_shift_assign, .unsigned_right_shift_assign, .boolean_and_assign, .boolean_or_assign]
const valid_at_tokens = ['@VROOT', '@VMODROOT', '@VEXEROOT', '@FN', '@METHOD', '@MOD', '@STRUCT',
'@VEXE', '@FILE', '@LINE', '@COLUMN', '@VHASH', '@VCURRENTHASH', '@VMOD_FILE', '@VMODHASH',
'@FILE_LINE', '@LOCATION', '@BUILD_DATE', '@BUILD_TIME', '@BUILD_TIMESTAMP']
const token_str = build_token_str()
const keywords = build_keys()
const scanner_matcher = new_keywords_matcher_trie[Kind](keywords)
fn assign_op_to_infix_op #
fn assign_op_to_infix_op(op Kind) Kind
fn build_precedences #
fn build_precedences() []Precedence
fn is_decl #
fn is_decl(t Kind) bool
fn is_key #
fn is_key(key string) bool
fn kind_to_string #
fn kind_to_string(k Kind) string
fn new_keywords_matcher_from_array_trie #
fn new_keywords_matcher_from_array_trie(names []string) KeywordsMatcherTrie
new_keywords_matcher_from_array_trie creates a new KeywordsMatcherTrie instance from a given array of strings. The values for the strings, that find
will return, will be the indexes in that array.
fn new_keywords_matcher_trie #
fn new_keywords_matcher_trie[T](kw_map map[string]T) KeywordsMatcherTrie
new_keywords_matcher_trie creates a new KeywordsMatcherTrie instance from a given map with string keys, and integer or enum values.
fn new_trie_node #
fn new_trie_node() &TrieNode
new_trie_node creates a new TrieNode instance
fn KeywordsMatcherTrie.new #
fn KeywordsMatcherTrie.new(cap int) KeywordsMatcherTrie
enum AtKind #
enum AtKind {
unknown
fn_name
method_name
mod_name
struct_name
vexe_path
file_path
line_nr
column_nr
vhash
v_current_hash
vmod_file
vmodroot_path
vmod_hash
vroot_path // obsolete
vexeroot_path
file_path_line_nr
location
build_date
build_time
build_timestamp
}
@FN => will be substituted with the name of the current V function @METHOD => will be substituted with ReceiverType.MethodName @MOD => will be substituted with the name of the current V module @STRUCT => will be substituted with the name of the current V struct @FILE => will be substituted with the path of the V source file @LINE => will be substituted with the V line number where it appears (as a string). @COLUMN => will be substituted with the column where it appears (as a string). @VHASH => will be substituted with the shortened commit hash of the V compiler (as a string). @VMOD_FILE => will be substituted with the contents of the nearest v.mod file (as a string). @VMODROOT => will be substituted with the folder where the nearest v.mod file is (as a string). @VEXE => will be substituted with the path to the V compiler @VEXEROOT => will be substituted with the folder where the V executable is (as a string). @VROOT => the old name for @VMODROOT; sometimes it was used as @VEXEROOT;
Note: @VROOT is now deprecated, use either @VMODROOT or @VEXEROOT instead.
Note: @VEXEROOT & @VMODROOT are used for compilation options like this: #include "@VMODROOT/include/abc.h" #flag -L@VEXEROOT/thirdparty/libgc
The @XYZ tokens allow for code like this: println( 'file: ' + @FILE + ' | line: ' + @LINE + ' | fn: ' + @MOD + '.' + @FN) ... which is useful while debugging/tracing.
@
enum Kind #
enum Kind {
unknown
eof
name // user
number // 123
string // 'foo'
str_inter // 'name=$user.name'
chartoken // `A` - rune
plus // +
minus // -
mul // *
div // /
mod // %
xor // ^
pipe // |
inc // ++
dec // --
and // &&
logical_or // ||
not // !
bit_not // ~
question // ?
comma // ,
semicolon // ;
colon // :
arrow // <-
amp // &
hash // #
dollar // $
at // @
str_dollar
left_shift // <<
right_shift // >>
unsigned_right_shift // >>>
not_in // !in
not_is // !is
assign // =
decl_assign // :=
plus_assign // +=
minus_assign // -=
div_assign // /=
mult_assign // *=
xor_assign // ^=
mod_assign // %=
or_assign // |=
and_assign // &=
right_shift_assign // <<=
left_shift_assign // >>=
unsigned_right_shift_assign // >>>=
boolean_and_assign // &&=
boolean_or_assign // ||=
lcbr // {
rcbr // }
lpar // (
rpar // )
lsbr // [
nilsbr // #[
rsbr // ]
eq // ==
ne // !=
gt // >
lt // <
ge // >=
le // <=
comment
nl
dot // .
dotdot // ..
ellipsis // ...
keyword_beg
key_as
key_asm
key_assert
key_atomic
key_break
key_const
key_continue
key_defer
key_else
key_enum
key_false
key_for
key_fn
key_global
key_go
key_goto
key_if
key_import
key_in
key_interface
key_is
key_match
key_module
key_mut
key_nil
key_shared
key_lock
key_rlock
key_none
key_return
key_select
key_like
key_ilike
key_sizeof
key_isreftype
key_likely
key_unlikely
key_offsetof
key_struct
key_true
key_type
key_typeof
key_dump
key_orelse
key_union
key_pub
key_static
key_volatile
key_unsafe
key_spawn
key_implements
keyword_end
_end_
}
fn (Kind) is_assign #
fn (t Kind) is_assign() bool
fn (Kind) str #
fn (t Kind) str() string
Note: used for some code generation, so no quoting
fn (Kind) precedence #
fn (kind Kind) precedence() int
precedence returns the precedence of the given token kind
if defined, otherwise 0
fn (Kind) is_relational #
fn (tok Kind) is_relational() bool
fn (Kind) is_start_of_type #
fn (k Kind) is_start_of_type() bool
fn (Kind) is_prefix #
fn (kind Kind) is_prefix() bool
fn (Kind) is_infix #
fn (kind Kind) is_infix() bool
fn (Kind) is_postfix #
fn (kind Kind) is_postfix() bool
enum Precedence #
enum Precedence {
lowest
cond // OR or AND
in_as
assign // =
eq // == or !=
// less_greater // > or <
sum // + - | ^
product // * / << >> >>> &
// mod // %
prefix // -X or !X; TODO: seems unused
postfix // ++ or --
call // func(X) or foo.method(X)
index // array[index], map[key]
highest
}
Representation of highest and lowest precedence
struct KeywordsMatcherTrie #
struct KeywordsMatcherTrie {
pub mut:
nodes []&TrieNode
min_len int = 999999
max_len int
}
KeywordsMatcherTrie provides a faster way of determining whether a given name is a reserved word (belongs to a given set of previously known words R
). See the module description for more details.
fn (KeywordsMatcherTrie) find #
fn (km &KeywordsMatcherTrie) find(word string) int
find tries to find the given word
in the set of all previously added words to the KeywordsMatcherTrie instance. It returns -1 if the word was NOT found there at all. If the word was found, find will return the value
(value => 0), associated with the word, when it was added.
fn (KeywordsMatcherTrie) matches #
fn (km &KeywordsMatcherTrie) matches(word string) bool
matches returns true when the word was already added, i.e. when it was found.
fn (KeywordsMatcherTrie) add_word #
fn (mut km KeywordsMatcherTrie) add_word(word string, value int)
add_word adds the given word to the KeywordsMatcherTrie instance. It associates a non negative integer value to it, so later find
could return the value, when it succeeds.
struct Pos #
struct Pos {
pub:
len int // length of the literal in the source
line_nr int // the line number in the source where the token occurred
pos int // the position of the token in scanner text
col int // the column in the source where the token occurred
pub mut:
last_line int // the line number where the ast object ends (used by vfmt)
}
fn (Pos) free #
fn (mut p Pos) free()
fn (Pos) line_str #
fn (p Pos) line_str() string
fn (Pos) extend #
fn (pos Pos) extend(end Pos) Pos
fn (Pos) extend_with_last_line #
fn (pos Pos) extend_with_last_line(end Pos, last_line int) Pos
fn (Pos) update_last_line #
fn (mut pos Pos) update_last_line(last_line int)
struct Token #
struct Token {
pub:
kind Kind // the token number/enum; for quick comparisons
lit string // literal representation of the token
line_nr int // the line number in the source where the token occurred
col int // the column in the source where the token occurred
// name_idx int // name table index for O(1) lookup
pos int // the position of the token in scanner text
len int // length of the literal
tidx int // the index of the token
}
fn (Token) debug #
fn (t Token) debug() string
fn (Token) is_next_to #
fn (t Token) is_next_to(pre_token Token) bool
fn (Token) is_scalar #
fn (tok Token) is_scalar() bool
is_scalar returns true if the token is a scalar
fn (Token) is_unary #
fn (tok Token) is_unary() bool
is_unary returns true if the token can be in a unary expression
fn (Token) pos #
fn (tok &Token) pos() Pos
fn (Token) precedence #
fn (tok Token) precedence() int
precedence returns a tokens precedence if defined, otherwise 0
fn (Token) str #
fn (t Token) str() string
struct TrieNode #
struct TrieNode {
pub mut:
children [123]&TrieNode
value int = -1 // when != -1, it is a leaf node representing a match
}
TrieNode is a single node from a trie, used by KeywordsMatcherTrie
fn (TrieNode) show #
fn (node &TrieNode) show(level int)
show displays the information in node
, in a more compact/readable format (recursively)
fn (TrieNode) add_word #
fn (mut node TrieNode) add_word(word string, value int, word_idx int)
add_word adds another word
and value
pair into the trie, starting from node
(recursively). word_idx
is just used as an accumulator, and starts from 0 at the root of the tree.
fn (TrieNode) find #
fn (root &TrieNode) find(word string) int
find tries to find a match for word
to the trie (the set of all previously added words). It returns -1 if there is no match, or the value associated with the previously added matching word by add_word
.
- README
- Constants
- fn assign_op_to_infix_op
- fn build_precedences
- fn is_decl
- fn is_key
- fn kind_to_string
- fn new_keywords_matcher_from_array_trie
- fn new_keywords_matcher_trie
- fn new_trie_node
- fn KeywordsMatcherTrie.new
- enum AtKind
- enum Kind
- enum Precedence
- struct KeywordsMatcherTrie
- struct Pos
- struct Token
- struct TrieNode