strings #
Description
`strings` provides utilities for efficiently processing large strings.
If you got here looking for methods available on the `string` struct, those methods are found in the `builtin` module.
fn dice_coefficient #
fn dice_coefficient(s1 string, s2 string) f32
dice_coefficient implements the Sørensen–Dice coefficient. It finds the similarity between two strings, and returns a coefficient between 0.0 (not similar) and 1.0 (exact match).
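As a minimal sketch of how the coefficient might be used (the sample words are arbitrary, and only the documented 0.0 to 1.0 range is asserted for the non-identical pair):

```v
import strings

// Identical strings are an exact match, so the coefficient is 1.0.
same := strings.dice_coefficient('night', 'night')
assert same == 1.0

// Different strings land somewhere in the documented 0.0..1.0 range.
partial := strings.dice_coefficient('night', 'nacht')
assert partial >= 0.0 && partial <= 1.0
println('night vs nacht: ${partial}')
```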
fn find_between_pair_rune #
fn find_between_pair_rune(input string, start rune, end rune) string
find_between_pair_rune returns the string found between the pair of marks defined by `start` and `end`. As opposed to the `find_between`, `all_after*`, `all_before*` methods defined on the `string` type, this function can extract content between nested marks in `input`. If `start` and `end` marks are nested in `input`, the characters between the outermost mark pair are returned. It is expected that `start` and `end` marks are balanced, meaning that the number of `start` marks equals the number of `end` marks in the `input`. An empty string is returned otherwise. Using two identical marks as `start` and `end` results in undefined output behavior. In speed, find_between_pair_rune sits between the fastest and the slowest of the find_between_pair_* family of functions.
Examples
assert strings.find_between_pair_rune('(V) (NOT V)',`(`,`)`) == 'V'
assert strings.find_between_pair_rune('s {X{Y}} s',`{`,`}`) == 'X{Y}'
fn find_between_pair_string #
fn find_between_pair_string(input string, start string, end string) string
find_between_pair_string returns the string found between the pair of marks defined by `start` and `end`. As opposed to the `find_between`, `all_after*`, `all_before*` methods defined on the `string` type, this function can extract content between nested marks in `input`. If `start` and `end` marks are nested in `input`, the characters between the outermost mark pair are returned. It is expected that `start` and `end` marks are balanced, meaning that the number of `start` marks equals the number of `end` marks in the `input`. An empty string is returned otherwise. Using two identical marks as `start` and `end` results in undefined output behavior. find_between_pair_string is the slowest in the find_between_pair_* family of functions.
Examples
assert strings.find_between_pair_string('/*V*/ /*NOT V*/','/*','*/') == 'V'
assert strings.find_between_pair_string('s {{X{{Y}}}} s','{{','}}') == 'X{{Y}}'
fn find_between_pair_u8 #
fn find_between_pair_u8(input string, start u8, end u8) string
find_between_pair_u8 returns the string found between the pair of marks defined by `start` and `end`. As opposed to the `find_between`, `all_after*`, `all_before*` methods defined on the `string` type, this function can extract content between nested marks in `input`. If `start` and `end` marks are nested in `input`, the characters between the outermost mark pair are returned. It is expected that `start` and `end` marks are balanced, meaning that the number of `start` marks equals the number of `end` marks in the `input`. An empty string is returned otherwise. Using two identical marks as `start` and `end` results in undefined output behavior. find_between_pair_u8 is the fastest in the find_between_pair_* family of functions.
Examples
assert strings.find_between_pair_u8('(V) (NOT V)',`(`,`)`) == 'V'
assert strings.find_between_pair_u8('s {X{Y}} s',`{`,`}`) == 'X{Y}'
fn hamming_distance #
fn hamming_distance(a string, b string) int
hamming_distance uses the Hamming Distance algorithm to calculate the distance between two strings `a` and `b` (lower is closer).
fn hamming_similarity #
fn hamming_similarity(a string, b string) f32
hamming_similarity uses the Hamming Distance algorithm to calculate the similarity between two strings `a` and `b`. It returns a coefficient between 0.0 (not similar) and 1.0 (exact match).
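A short sketch of both Hamming functions on a classic word pair. The sample words are arbitrary, and for the similarity only the documented range is checked rather than an exact value:

```v
import strings

// 'karolin' and 'kathrin' differ at three positions (r/t, o/h, l/r).
assert strings.hamming_distance('karolin', 'kathrin') == 3
assert strings.hamming_distance('same', 'same') == 0

// The similarity is a coefficient in the documented 0.0..1.0 range.
sim := strings.hamming_similarity('karolin', 'kathrin')
assert sim >= 0.0 && sim <= 1.0
println('similarity: ${sim}')
```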
fn jaro_similarity #
fn jaro_similarity(a string, b string) f64
jaro_similarity uses the Jaro Distance algorithm to calculate the similarity between two strings `a` and `b`. It returns a coefficient between 0.0 (not similar) and 1.0 (exact match).
fn jaro_winkler_similarity #
fn jaro_winkler_similarity(a string, b string) f64
jaro_winkler_similarity uses the Jaro Winkler Distance algorithm to calculate the similarity between two strings `a` and `b`. It returns a coefficient between 0.0 (not similar) and 1.0 (exact match). The scaling factor (`p=0.1`) in Jaro-Winkler gives higher weight to prefix similarities, making it especially effective for cases where slight misspellings or shared prefixes are common.
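To illustrate the prefix weighting, the sketch below compares both coefficients for two strings that share a prefix. The word pair is an arbitrary choice, and the only property assumed is that the prefix bonus never lowers the plain Jaro score:

```v
import strings

j := strings.jaro_similarity('MARTHA', 'MARHTA')
jw := strings.jaro_winkler_similarity('MARTHA', 'MARHTA')
println('jaro: ${j}, jaro-winkler: ${jw}')

// The shared 'MAR' prefix should not make the Jaro-Winkler score lower.
assert jw >= j
assert jw <= 1.0
```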
fn levenshtein_distance #
fn levenshtein_distance(a string, b string) int
levenshtein_distance uses the Levenshtein Distance algorithm to calculate the distance between two strings `a` and `b` (lower is closer).
fn levenshtein_distance_percentage #
fn levenshtein_distance_percentage(a string, b string) f32
levenshtein_distance_percentage uses the Levenshtein Distance algorithm to calculate how similar two strings are as a percentage (higher is closer).
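A small sketch of both Levenshtein functions, using the textbook kitten/sitting pair. The percentage is only printed, since its exact value is not stated here:

```v
import strings

// kitten -> sitten -> sittin -> sitting: three single-character edits.
assert strings.levenshtein_distance('kitten', 'sitting') == 3
assert strings.levenshtein_distance('flaw', 'flaw') == 0

// Higher means more similar.
pct := strings.levenshtein_distance_percentage('kitten', 'sitting')
println(pct)
```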
fn new_builder #
fn new_builder(initial_size int) Builder
new_builder returns a new string builder, with an initial capacity of `initial_size`.
fn repeat #
fn repeat(c u8, n int) string
strings.repeat returns a string filled with `n` repetitions of the character `c`.
fn repeat_string #
fn repeat_string(s string, n int) string
strings.repeat_string returns `n` repetitions of the substring `s`.
Note: strings.repeat, that repeats a single byte, is between 2x and 24x faster than strings.repeat_string called for a 1 char string.
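A brief sketch of the two functions side by side (the characters and counts are arbitrary):

```v
import strings

// Repeat a single byte.
assert strings.repeat(`-`, 5) == '-----'

// Repeat a whole substring.
assert strings.repeat_string('ab', 3) == 'ababab'

// For a one-character input both produce the same string;
// strings.repeat is the faster choice in that case.
assert strings.repeat(`x`, 4) == strings.repeat_string('x', 4)
```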
fn split_capital #
fn split_capital(s string) []string
split_capital returns an array containing the contents of `s` split by capital letters.
Examples
assert strings.split_capital('XYZ') == ['X', 'Y', 'Z']
assert strings.split_capital('XYStar') == ['X', 'Y', 'Star']
type Builder #
type Builder = []u8
strings.Builder is used to efficiently append many strings to a large dynamically growing buffer, then use the resulting large string. Using a string builder is much better for performance and memory usage than repeated string concatenation.
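A minimal sketch of the typical append-then-read cycle. The initial capacity of 64 and the appended text are arbitrary choices for illustration:

```v
import strings

mut sb := strings.new_builder(64) // 64 is an arbitrary initial capacity
for i in 0 .. 3 {
	sb.write_string('item ${i}')
	sb.writeln(';')
}

// str() returns the accumulated content and leaves the builder empty.
result := sb.str()
assert result == 'item 0;\nitem 1;\nitem 2;\n'
assert sb.len == 0

// The same builder can be reused afterwards.
sb.write_string('again')
assert sb.str() == 'again'
```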
fn (Builder) reuse_as_plain_u8_array #
fn (mut b Builder) reuse_as_plain_u8_array() []u8
reuse_as_plain_u8_array allows using the Builder instance as a plain []u8 return value. It is useful, when you have accumulated data in the builder, that you want to pass/access as []u8 later, without copying or freeing the buffer. NB: you should NOT use the string builder instance after calling this method. Use only the return value after calling this method.
fn (Builder) write_ptr #
fn (mut b Builder) write_ptr(ptr &u8, len int)
write_ptr writes `len` bytes from the provided pointer `ptr` to the accumulated buffer.
fn (Builder) write_rune #
fn (mut b Builder) write_rune(r rune)
write_rune appends a single rune to the accumulated buffer
fn (Builder) write_runes #
fn (mut b Builder) write_runes(runes []rune)
write_runes appends all the given runes to the accumulated buffer
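The sketch below appends non-ASCII runes and checks the byte length against the rune count. It assumes the source file is saved as UTF-8; the runes themselves are arbitrary:

```v
import strings

mut sb := strings.new_builder(16)
sb.write_rune(`é`)
sb.write_runes([`v`, `ö`])

s := sb.str()
assert s == 'évö'
assert s.len == 5 // bytes: é and ö take two bytes each in UTF-8
assert s.runes().len == 3
```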
fn (Builder) clear #
fn (mut b Builder) clear()
clear clears the buffer contents
fn (Builder) write_u8 #
fn (mut b Builder) write_u8(data u8)
write_u8 appends a single `data` byte to the accumulated buffer.
fn (Builder) write_byte #
fn (mut b Builder) write_byte(data u8)
write_byte appends a single `data` byte to the accumulated buffer.
fn (Builder) write_decimal #
fn (mut b Builder) write_decimal(n i64)
write_decimal appends a decimal representation of the number `n` into the builder `b`, without dynamic allocation. The higher order digits come first, i.e. 6123 will be written with the digit `6` first, then `1`, then `2` and `3` last.
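A short sketch of write_decimal mixed with regular string writes (the numbers and separator are arbitrary):

```v
import strings

mut sb := strings.new_builder(16)
sb.write_decimal(6123) // digits are appended in the order 6, 1, 2, 3
sb.write_string(', ')
sb.write_decimal(0)

assert sb.str() == '6123, 0'
```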
fn (Builder) write #
fn (mut b Builder) write(data []u8) !int
write implements the io.Writer interface, which is why it returns how many bytes were written to the string builder.
fn (Builder) drain_builder #
fn (mut b Builder) drain_builder(mut other Builder, other_new_cap int)
drain_builder writes all of the `other` builder content, then re-initialises `other`, so that the `other` strings builder is ready to receive new content.
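As an illustration, the sketch below drains a scratch builder into a target builder. The names `target` and `scratch` and the capacities are hypothetical choices for this example:

```v
import strings

mut target := strings.new_builder(32)
mut scratch := strings.new_builder(32)
target.write_string('head:')
scratch.write_string('body')

// Move the scratch content into target; scratch is re-initialised
// with a new capacity of 16 and can receive new content.
target.drain_builder(mut scratch, 16)
assert scratch.len == 0
assert target.str() == 'head:body'
```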
fn (Builder) byte_at #
fn (b &Builder) byte_at(n int) u8
byte_at returns the byte located at the given index `n`.
Note: it can panic, if there are not enough bytes in the strings builder yet.
fn (Builder) write_string #
fn (mut b Builder) write_string(s string)
write_string appends the string `s` to the buffer.
fn (Builder) write_string2 #
fn (mut b Builder) write_string2(s1 string, s2 string)
write_string2 appends the strings `s1` and `s2` to the buffer.
fn (Builder) go_back #
fn (mut b Builder) go_back(n int)
go_back discards the last `n` bytes from the buffer.
fn (Builder) spart #
fn (b &Builder) spart(start_pos int, n int) string
spart returns a part of the buffer as a string
fn (Builder) cut_last #
fn (mut b Builder) cut_last(n int) string
cut_last cuts the last `n` bytes from the buffer and returns them.
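A common use of go_back is trimming a trailing separator after a loop, while cut_last both removes and returns the tail. A minimal sketch with arbitrary sample data:

```v
import strings

mut sb := strings.new_builder(32)
for word in ['a', 'b', 'c'] {
	sb.write_string(word)
	sb.write_string(', ')
}

sb.go_back(2) // discard the trailing ', '
assert sb.len == 7 // buffer now holds 'a, b, c'

tail := sb.cut_last(1) // remove and return the final byte
assert tail == 'c'
assert sb.str() == 'a, b, '
```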
fn (Builder) cut_to #
fn (mut b Builder) cut_to(pos int) string
cut_to cuts the string after `pos` and returns it. If `pos` is greater than the builder length, it returns an empty string and performs no further operations.
fn (Builder) go_back_to #
fn (mut b Builder) go_back_to(pos int)
go_back_to resets the buffer to the given position `pos`.
Note: `pos` should be less than the existing buffer length.
fn (Builder) writeln #
fn (mut b Builder) writeln(s string)
writeln appends the string `s`, and then a newline character.
fn (Builder) writeln2 #
fn (mut b Builder) writeln2(s1 string, s2 string)
writeln2 appends two strings: `s1` + `\n`, and `s2` + `\n`, to the buffer.
fn (Builder) last_n #
fn (b &Builder) last_n(n int) string
last_n(5) returns 'world' when the buffer contains 'hello world'.
fn (Builder) after #
fn (b &Builder) after(n int) string
after(6) returns 'world' when the buffer contains 'hello world'.
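The read-only accessors can be sketched together on the same 'hello world' buffer. For spart, this example assumes the second argument is a byte count, as the parameter name `n` suggests:

```v
import strings

mut sb := strings.new_builder(32)
sb.write_string('hello world')

assert sb.last_n(5) == 'world' // the last 5 bytes
assert sb.after(6) == 'world' // everything after the first 6 bytes
assert sb.spart(0, 5) == 'hello' // 5 bytes starting at position 0 (assumed meaning)

// None of these calls modify the buffer.
assert sb.len == 11
```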
fn (Builder) str #
fn (mut b Builder) str() string
str returns a copy of all of the accumulated buffer content.
Note: after a call to b.str(), the builder b will be empty, and could be used again. The returned string owns its own separate copy of the accumulated data that was in the string builder, before the .str() call.
fn (Builder) ensure_cap #
fn (mut b Builder) ensure_cap(n int)
ensure_cap ensures that the buffer has enough space for at least `n` bytes by growing the buffer if necessary.
fn (Builder) grow_len #
fn (mut b Builder) grow_len(n int)
grow_len grows the length of the buffer by `n` bytes if necessary.
fn (Builder) free #
fn (mut b Builder) free()
free frees the memory block, used for the buffer.
Note: do not use the builder, after a call to free().