strings #

Description

strings provides utilities for efficiently processing large strings.

If you got here looking for methods available on the string struct, those methods are found in the builtin module.

fn dice_coefficient #

fn dice_coefficient(s1 string, s2 string) f32

dice_coefficient implements the Sørensen–Dice coefficient. It finds the similarity between two strings, and returns a coefficient between 0.0 (not similar) and 1.0 (exact match).
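A minimal sketch of the documented range, assuming V's single-file script mode (top-level statements without fn main). The input words are illustrative only.

```v
import strings

// Identical inputs are an exact match.
assert strings.dice_coefficient('night', 'night') == 1.0

// Different inputs land somewhere inside the documented 0.0..1.0 range.
score := strings.dice_coefficient('night', 'nacht')
assert score >= 0.0 && score <= 1.0
println(score)
```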

fn find_between_pair_rune #

fn find_between_pair_rune(input string, start rune, end rune) string

find_between_pair_rune returns the string found between the pair of marks defined by start and end. Unlike the find_between, all_after*, and all_before* methods defined on the string type, this function can extract content between nested marks in input. If start and end marks are nested in input, the characters between the outermost mark pair are returned. The start and end marks are expected to be balanced, meaning that the number of start marks equals the number of end marks in input. An empty string is returned otherwise. Using two identical marks as start and end results in undefined output behavior. find_between_pair_rune is in between the fastest and the slowest in the find_between_pair_* family of functions.

Examples

assert strings.find_between_pair_rune('(V) (NOT V)',`(`,`)`) == 'V'

assert strings.find_between_pair_rune('s {X{Y}} s',`{`,`}`) == 'X{Y}'

fn find_between_pair_string #

fn find_between_pair_string(input string, start string, end string) string

find_between_pair_string returns the string found between the pair of marks defined by start and end. Unlike the find_between, all_after*, and all_before* methods defined on the string type, this function can extract content between nested marks in input. If start and end marks are nested in input, the characters between the outermost mark pair are returned. The start and end marks are expected to be balanced, meaning that the number of start marks equals the number of end marks in input. An empty string is returned otherwise. Using two identical marks as start and end results in undefined output behavior. find_between_pair_string is the slowest in the find_between_pair_* family of functions.

Examples

assert strings.find_between_pair_string('/*V*/ /*NOT V*/','/*','*/') == 'V'

assert strings.find_between_pair_string('s {{X{{Y}}}} s','{{','}}') == 'X{{Y}}'

fn find_between_pair_u8 #

fn find_between_pair_u8(input string, start u8, end u8) string

find_between_pair_u8 returns the string found between the pair of marks defined by start and end. Unlike the find_between, all_after*, and all_before* methods defined on the string type, this function can extract content between nested marks in input. If start and end marks are nested in input, the characters between the outermost mark pair are returned. The start and end marks are expected to be balanced, meaning that the number of start marks equals the number of end marks in input. An empty string is returned otherwise. Using two identical marks as start and end results in undefined output behavior. find_between_pair_u8 is the fastest in the find_between_pair_* family of functions.

Examples

assert strings.find_between_pair_u8('(V) (NOT V)',`(`,`)`) == 'V'

assert strings.find_between_pair_u8('s {X{Y}} s',`{`,`}`) == 'X{Y}'

fn hamming_distance #

fn hamming_distance(a string, b string) int

hamming_distance uses the Hamming Distance algorithm to calculate the distance between two strings a and b (lower is closer).

fn hamming_similarity #

fn hamming_similarity(a string, b string) f32

hamming_similarity uses the Hamming Distance algorithm to calculate how similar two strings a and b are. It returns a coefficient between 0.0 (not similar) and 1.0 (exact match).
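A short sketch of hamming_distance and hamming_similarity side by side, assuming V's script mode (no fn main). The word pair is a common textbook example, not taken from the module's own tests.

```v
import strings

// 'karolin' and 'kathrin' have the same length and differ at three positions.
assert strings.hamming_distance('karolin', 'kathrin') == 3

// Identical strings have distance 0 and are an exact match.
assert strings.hamming_distance('karolin', 'karolin') == 0
assert strings.hamming_similarity('karolin', 'karolin') == 1.0
```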

fn jaro_similarity #

fn jaro_similarity(a string, b string) f64

jaro_similarity uses the Jaro Distance algorithm to calculate how similar two strings a and b are. It returns a coefficient between 0.0 (not similar) and 1.0 (exact match).

fn jaro_winkler_similarity #

fn jaro_winkler_similarity(a string, b string) f64

jaro_winkler_similarity uses the Jaro-Winkler Distance algorithm to calculate how similar two strings a and b are. It returns a coefficient between 0.0 (not similar) and 1.0 (exact match). The scaling factor (p=0.1) in Jaro-Winkler gives higher weight to prefix similarities, making it especially effective where slight misspellings or shared prefixes are common.
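The prefix weighting can be observed by comparing both functions on the same pair. This is a sketch assuming V's script mode; the names are illustrative, and no exact scores are asserted, only the relationship between them.

```v
import strings

jaro := strings.jaro_similarity('MARTHA', 'MARHTA')
jw := strings.jaro_winkler_similarity('MARTHA', 'MARHTA')

// Both scores stay inside the documented range.
assert jaro >= 0.0 && jaro <= 1.0
assert jw >= 0.0 && jw <= 1.0

// The shared 'MAR' prefix should never push the Jaro-Winkler score below the plain Jaro score.
assert jw >= jaro
println('jaro=${jaro} jaro_winkler=${jw}')
```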

fn levenshtein_distance #

fn levenshtein_distance(a string, b string) int

levenshtein_distance uses the Levenshtein Distance algorithm to calculate the distance between two strings a and b (lower is closer).

fn levenshtein_distance_percentage #

fn levenshtein_distance_percentage(a string, b string) f32

levenshtein_distance_percentage uses the Levenshtein Distance algorithm to calculate how similar two strings are as a percentage (higher is closer).
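A brief sketch of both Levenshtein functions, assuming V's script mode. The 'kitten'/'sitting' pair is the usual textbook example (two substitutions and one insertion); the percentage is only checked loosely, because the exact value depends on how the module normalizes it.

```v
import strings

// kitten -> sitten -> sittin -> sitting: three single-character edits.
assert strings.levenshtein_distance('kitten', 'sitting') == 3
assert strings.levenshtein_distance('flaw', 'flaw') == 0

// Partially similar strings are neither a 0% nor a 100% match.
pct := strings.levenshtein_distance_percentage('kitten', 'sitting')
assert pct > 0.0 && pct < 100.0
println(pct)
```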

fn new_builder #

fn new_builder(initial_size int) Builder

new_builder returns a new string builder, with an initial capacity of initial_size.

fn repeat #

fn repeat(c u8, n int) string

repeat returns a string made of n repetitions of the character c.

fn repeat_string #

fn repeat_string(s string, n int) string

repeat_string returns a string made of n repetitions of the substring s.

Note: strings.repeat, which repeats a single byte, is between 2x and 24x faster than strings.repeat_string called with a one-character string.
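The two functions can be compared directly. A minimal sketch, assuming V's script mode; the inputs are illustrative.

```v
import strings

// repeat takes a single byte, written here as a character literal.
assert strings.repeat(`-`, 5) == '-----'

// repeat_string takes a whole substring.
assert strings.repeat_string('ab', 3) == 'ababab'
```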

fn split_capital #

fn split_capital(s string) []string

split_capital returns an array containing the contents of s split by capital letters.

Examples

assert strings.split_capital('XYZ') == ['X', 'Y', 'Z']

assert strings.split_capital('XYStar') == ['X', 'Y', 'Star']

type Builder #

type Builder = []u8

strings.Builder is used to efficiently append many strings to a large, dynamically growing buffer, and then use the resulting large string. Using a string builder is much better for performance and memory usage than repeated string concatenation.
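A typical append-then-read cycle might look like the sketch below. It assumes V's script mode (no fn main); the initial capacity of 32 is an arbitrary choice for illustration.

```v
import strings

mut sb := strings.new_builder(32)
sb.write_string('hello')
sb.write_string(' ')
sb.writeln('world') // appends 'world' followed by a newline

result := sb.str()
assert result == 'hello world\n'
```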

fn (Builder) reuse_as_plain_u8_array #

unsafe

fn (mut b Builder) reuse_as_plain_u8_array() []u8

reuse_as_plain_u8_array allows using the Builder instance as a plain []u8 return value. It is useful when you have accumulated data in the builder that you want to pass or access as []u8 later, without copying or freeing the buffer. Note: do NOT use the string builder instance after calling this method; use only the return value.

fn (Builder) write_ptr #

unsafe

fn (mut b Builder) write_ptr(ptr &u8, len int)

write_ptr writes len bytes from the provided pointer ptr to the accumulated buffer.

fn (Builder) write_rune #

fn (mut b Builder) write_rune(r rune)

write_rune appends a single rune to the accumulated buffer.

fn (Builder) write_runes #

fn (mut b Builder) write_runes(runes []rune)

write_runes appends all the given runes to the accumulated buffer.
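A small sketch of write_rune and write_runes, assuming V's script mode. It also illustrates that a non-ASCII rune occupies more than one byte in the buffer, since the buffer holds UTF-8 encoded bytes.

```v
import strings

mut sb := strings.new_builder(16)
sb.write_rune(`é`) // U+00E9 is two bytes in UTF-8
assert sb.len == 2

sb.write_runes([`a`, `b`])
assert sb.str() == 'éab'
```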

fn (Builder) write_u8 #

fn (mut b Builder) write_u8(data u8)

write_u8 appends a single data byte to the accumulated buffer.

fn (Builder) write_byte #

fn (mut b Builder) write_byte(data u8)

write_byte appends a single data byte to the accumulated buffer.

fn (Builder) write_decimal #

fn (mut b Builder) write_decimal(n i64)

write_decimal appends a decimal representation of the number n into the builder b, without dynamic allocation. The higher order digits come first, i.e. 6123 will be written with the digit 6 first, then 1, then 2 and 3 last.
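As a sketch (assuming V's script mode), the number from the description can be appended after a label without first converting it to a string:

```v
import strings

mut sb := strings.new_builder(16)
sb.write_string('total=')
sb.write_decimal(6123) // digits are written in the order 6, 1, 2, 3
assert sb.str() == 'total=6123'
```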

fn (Builder) write #

fn (mut b Builder) write(data []u8) !int

write implements the io.Writer interface, which is why it returns the number of bytes written to the string builder.

fn (Builder) drain_builder #

fn (mut b Builder) drain_builder(mut other Builder, other_new_cap int)

drain_builder writes all of the other builder content, then re-initialises other, so that the other strings builder is ready to receive new content.
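A sketch of draining one builder into another, assuming V's script mode. The builder names and the new capacity of 16 are illustrative.

```v
import strings

mut target := strings.new_builder(16)
mut source := strings.new_builder(16)
target.write_string('abc')
source.write_string('def')

// Move the content of source into target; source is re-initialised afterwards.
target.drain_builder(mut source, 16)
assert source.len == 0

// source can be reused right away.
source.write_string('new')
assert target.str() == 'abcdef'
assert source.str() == 'new'
```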

fn (Builder) byte_at #

fn (b &Builder) byte_at(n int) u8

byte_at returns the byte located at the given index n.

Note: it can panic, if there are not enough bytes in the strings builder yet.

fn (Builder) write_string #

fn (mut b Builder) write_string(s string)

write_string appends the string s to the buffer.

fn (Builder) write_string2 #

fn (mut b Builder) write_string2(s1 string, s2 string)

write_string2 appends the strings s1 and s2 to the buffer.

fn (Builder) go_back #

fn (mut b Builder) go_back(n int)

go_back discards the last n bytes from the buffer.
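One common use is trimming a trailing separator after a loop. A minimal sketch, assuming V's script mode; the list content is illustrative.

```v
import strings

mut sb := strings.new_builder(16)
for item in ['a', 'b', 'c'] {
	sb.write_string(item)
	sb.write_string(', ')
}
sb.go_back(2) // drop the trailing ', '
assert sb.str() == 'a, b, c'
```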

fn (Builder) spart #

fn (b &Builder) spart(start_pos int, n int) string

spart returns a part of the buffer as a string: the n bytes starting at start_pos.
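A minimal sketch, assuming V's script mode and that both arguments are byte offsets into the buffer:

```v
import strings

mut sb := strings.new_builder(16)
sb.write_string('hello world')

// Take 5 bytes starting at offset 6.
assert sb.spart(6, 5) == 'world'

// The buffer itself is left untouched.
assert sb.len == 11
```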

fn (Builder) cut_last #

fn (mut b Builder) cut_last(n int) string

cut_last cuts the last n bytes from the buffer and returns them.

fn (Builder) cut_to #

fn (mut b Builder) cut_to(pos int) string

cut_to cuts the string after pos and returns it. If pos is greater than the builder length, it returns an empty string and leaves the buffer unchanged.
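cut_last and cut_to both remove a tail from the buffer and hand it back. A sketch assuming V's script mode; the buffer content is illustrative.

```v
import strings

mut sb := strings.new_builder(16)
sb.write_string('hello world')

// cut_last counts bytes from the end.
assert sb.cut_last(5) == 'world'
assert sb.len == 6 // 'hello ' remains

// cut_to takes an absolute position; everything after it is cut and returned.
assert sb.cut_to(5) == ' '
assert sb.str() == 'hello'
```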

fn (Builder) go_back_to #

fn (mut b Builder) go_back_to(pos int)

go_back_to resets the buffer to the given position pos.

Note: pos should be less than the existing buffer length.
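A sketch of rolling back to a remembered position, assuming V's script mode. The checkpoint variable is a hypothetical name used only for this illustration.

```v
import strings

mut sb := strings.new_builder(16)
sb.write_string('hello')
checkpoint := sb.len // remember the current length

sb.write_string(' world')
sb.go_back_to(checkpoint) // discard everything written after the checkpoint
assert sb.str() == 'hello'
```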

fn (Builder) writeln #

fn (mut b Builder) writeln(s string)

writeln appends the string s, and then a newline character.

fn (Builder) writeln2 #

fn (mut b Builder) writeln2(s1 string, s2 string)

writeln2 appends two strings: s1 + \n, and s2 + \n, to the buffer.
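A short sketch comparing writeln and writeln2, assuming V's script mode:

```v
import strings

mut sb := strings.new_builder(16)
sb.writeln('a')
sb.writeln2('b', 'c') // behaves like writeln('b') followed by writeln('c')
assert sb.str() == 'a\nb\nc\n'
```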

fn (Builder) last_n #

fn (b &Builder) last_n(n int) string

last_n returns the last n bytes of the buffer as a string. For example, when the buffer contains 'hello world', last_n(5) returns 'world'.

fn (Builder) after #

fn (b &Builder) after(n int) string

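after returns the content of the buffer after the first n bytes, as a string. For example, when the buffer contains 'hello world', after(6) returns 'world'.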
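Both calls from the descriptions above, written as a runnable sketch (assuming V's script mode):

```v
import strings

mut sb := strings.new_builder(16)
sb.write_string('hello world')

assert sb.last_n(5) == 'world' // counted from the end
assert sb.after(6) == 'world' // counted from the start

// Neither call consumes the buffer.
assert sb.str() == 'hello world'
```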

fn (Builder) str #

fn (mut b Builder) str() string

str returns a copy of all of the accumulated buffer content.

Note: after a call to b.str(), the builder b will be empty, and could be used again. The returned string owns its own separate copy of the accumulated data that was in the string builder, before the .str() call.
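The reuse behaviour described in the note can be sketched as follows, assuming V's script mode:

```v
import strings

mut sb := strings.new_builder(16)
sb.write_string('first')
first := sb.str()
assert sb.len == 0 // the builder is empty again

// Reusing the builder does not affect the string returned earlier.
sb.write_string('second')
second := sb.str()
assert first == 'first'
assert second == 'second'
```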

fn (Builder) ensure_cap #

fn (mut b Builder) ensure_cap(n int)

ensure_cap ensures that the buffer has enough space for at least n bytes by growing the buffer if necessary.

fn (Builder) grow_len #

unsafe

fn (mut b Builder) grow_len(n int)

grow_len grows the length of the buffer by n bytes if necessary.

fn (Builder) free #

unsafe

fn (mut b Builder) free()

free frees the memory block used for the buffer.

Note: do not use the builder after a call to free().