How do I make this function work with UTF-8?
```lua
function string:split(sep)
  local sep, fields = sep or ":", {}
  local pattern = string.format("([^%s]+)", sep)
  self:gsub(pattern, function(c) fields[#fields + 1] = c end)
  return fields
end
```
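The function above builds a character class `[^sep]`, and Lua patterns work on bytes, not characters. With an ASCII separator such as `|` that is harmless: every byte of a multi-byte UTF-8 character is 0x80 or higher, so it never collides with an ASCII byte. With a multi-byte separator it breaks. For example, `[^č]` excludes the bytes 0xC4 and 0x8D individually, so it would also cut "ć" (0xC4 0x87) in half. Below is a minimal sketch of an alternative, not taken from this thread; the name `split_plain` is hypothetical. It searches for the separator as a literal string with `string.find(s, sep, init, true)`, which also sidesteps pattern special characters such as `%`. Like the original, it skips empty fields:

```lua
-- Hypothetical helper: split s on every literal occurrence of sep.
-- Empty fields are skipped, matching the original "[^sep]+" behaviour.
local function split_plain(s, sep)
  sep = sep or ":"
  assert(#sep > 0, "separator must not be empty")
  local fields = {}
  local start = 1
  while true do
    -- plain = true: sep is matched byte-for-byte, with no pattern magic,
    -- so multi-byte UTF-8 separators are found as a whole
    local i, j = string.find(s, sep, start, true)
    local piece
    if i then
      piece = string.sub(s, start, i - 1)
    else
      piece = string.sub(s, start)   -- rest of the string after the last sep
    end
    if piece ~= "" then
      fields[#fields + 1] = piece
    end
    if not i then break end
    start = j + 1
  end
  return fields
end

local parts = split_plain("jezičnim→teškoću", "→")
print(parts[1], parts[2])
```

Because the whole separator must match, `split_plain("aćbčcć", "č")` yields "aćb" and "cć", leaving each "ć" intact.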
I think the line

```lua
self:gsub(pattern, function(c) fields[#fields + 1] = c end)
```

should be replaced with

```lua
self:utf8.gsub(pattern, function(c) fields[#fields + 1] = c end)
```

but that gives me the error "function arguments expected near '.'". Why?
Comments
Also, the `:` in `self:gsub(pattern, function(c) fields[#fields + 1] = c end)` is syntax sugar for `string.gsub(self, pattern, function(c) fields[#fields + 1] = c end)`: the string before the `:` is passed as the first argument.
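A small sketch of that sugar, assuming Lua 5.3+ (the sample string and pattern are made up for illustration). `s:gsub(...)` looks up `gsub` through the string metatable, whose `__index` is the `string` table, and passes `s` as the first argument. After `:` Lua expects exactly one name followed by the call arguments, which is why `self:utf8.gsub(...)` does not compile; `load` reports the same message quoted in the question. Note that the standard Lua 5.3 `utf8` library has no `gsub`, so `utf8.gsub(self, ...)` only works if your environment supplies one (for example through a third-party module).

```lua
local s = "a|b"
local pattern = "([^|]+)"

-- Three spellings of the same call. The extra parentheses keep only the
-- first return value (gsub also returns the number of substitutions).
local r1 = (s:gsub(pattern, "<%1>"))
local r2 = (s.gsub(s, pattern, "<%1>"))
local r3 = (string.gsub(s, pattern, "<%1>"))
print(r1, r2, r3)

-- Compile (without running) the broken form from the question:
-- load returns nil plus the syntax error message.
local chunk, err = load("local s = 'x'; s:utf8.gsub('x', print)")
print(chunk, err)
```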
Try this one:
Example:
```lua
st = {}
a = "jezičnim|teškoću"
st = a:split("|")
print(st[1], st[2])
print(#st[1], #st[2])
```
After running it, the string is split, but there are errors:
the word "jezičnim" has 8 letters, but the `#` operator gives 9;
the word "teškoću" has 7 letters, but `#` also gives 9.
It seems that each of the letters č, š and ć adds an extra empty space at the end (one in the first word, two in the second).
It's the same with `self:gsub(pattern, function(c) fields[#fields + 1] = c end)` and with
`utf8.gsub(self, pattern, function(c) fields[#fields + 1] = c end)`, even though I thought the UTF-8 characters were the problem.
Also, how do you put code in a "window" like in your post?
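There are no errors; this is how UTF-8 strings work: each UTF-8 character is encoded with 1, 2, 3 or 4 bytes, depending on the character itself. For example, č, š and ć each take 2 bytes, which is why both "jezičnim" and "teškoću" are 9 bytes long.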
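A short sketch, assuming Lua 5.3+ for the standard `utf8` library and a source file saved as UTF-8, that walks "teškoću" character by character and prints each character's starting byte position, code point and encoded size. The two 2-byte letters, š and ć, account for the 9 bytes that `#` reports:

```lua
local word = "teškoću"

-- utf8.codes yields (byte position, code point) for every character;
-- utf8.char re-encodes a code point so we can measure its size in bytes.
local sizes = {}
for pos, code in utf8.codes(word) do
  local ch = utf8.char(code)
  sizes[#sizes + 1] = #ch
  print(pos, ch, string.format("U+%04X", code), #ch .. " byte(s)")
end
```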
Lua strings are raw byte sequences in which each byte is an unsigned number in the range 0..255, and the `#` operator on a string gives only its size in bytes. If you need the length in characters, use the `utf8.len` function instead of the `#` operator.
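Applied to the words from the question (again assuming Lua 5.3+ and a UTF-8 source file), `#` and `utf8.len` give the byte and character counts side by side. When a string is not valid UTF-8, `utf8.len` returns `nil` plus the position of the first invalid byte, so it can double as a validity check:

```lua
local words = { "jezičnim", "teškoću" }

for _, w in ipairs(words) do
  -- #w counts bytes; utf8.len(w) counts UTF-8 encoded characters
  print(w, #w, utf8.len(w))
end

-- The byte 0xFF never appears in valid UTF-8, so utf8.len reports
-- failure together with the byte position where decoding stopped.
local n, bad_pos = utf8.len("abc\xff")
print(n, bad_pos)
```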
Select your text and press the "C" button (it's above the text box you type in).