string.len with cyrillic in Gideros

unlying · March 2013

string.len("база") returns 8 in Gideros, and any other Cyrillic string also returns double the expected length. Latin strings work fine. I thought this might be normal for Lua, but in Quick it works fine...
So it is a Gideros bug.

Unknown · March 2013

It's not a bug, unless it is a bug in the standard Lua interpreter (the liblua library).
In short:
> You might want to know how many Unicode characters are in a string. Depending on the encoding used, a single Unicode character may occupy up to four bytes.

For more information, see http://lua-users.org/wiki/LuaUnicode

unlying · March 2013

Maybe it really is a Lua bug. But as I said before, Quick returns the right numbers.

ar2rsawseen · March 2013

By default, string.len in Lua returns the number of bytes the string occupies (useful for sizing a buffer, etc.). And it's doing just that here, so maybe it's a bug in Quick.

I know I'm not being helpful, just stating the facts.

http://stackoverflow.com/questions/10097941/print-number-of-characters-in-utf-8-string

So basically you'd need a specific library to deal with Unicode strings. I just don't know any of them; maybe someone else will.
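
To make the bytes-versus-characters point concrete, here is a minimal sketch. The helper name utf8len is hypothetical (it is not part of Lua or Gideros), and the word is written as explicit UTF-8 byte escapes so the result does not depend on how the source file is saved. The sketch assumes the input is valid UTF-8.

```lua
-- "база" as explicit UTF-8 bytes; each Cyrillic letter here takes two bytes.
local word = "\208\177\208\176\208\183\208\176"

-- Hypothetical helper: count every byte that is NOT a UTF-8 continuation
-- byte (128..191). Each character contributes exactly one such byte.
local function utf8len(s)
    local _, count = string.gsub(s, "[^\128-\191]", "")
    return count
end

print(string.len(word)) -- 8 (bytes)
print(utf8len(word))    -- 4 (characters)
```

string.len itself is unchanged here, so code that needs the byte size (for buffers and the like) keeps working.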

unlying · March 2013

I used this lib in my Rebus game. Looks like it fits this problem too. If anybody needs it, the file is attached to this message.
Maybe Quick has some sort of autodetection... curious...

atilim · March 2013

Gideros always saves the files in UTF-8 format. For Quick, I think there are two possibilities:
1. You've used an encoding other than UTF-8, such as http://en.wikipedia.org/wiki/ISO/IEC_8859-5.
2. Quick has modified string.len (and other string functions) to work with UTF-8 seamlessly.
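
The first possibility can be illustrated with a small sketch. The byte values below are the UTF-8 and ISO/IEC 8859-5 encodings of "база" written out by hand, and bytesOf is a hypothetical helper used only for printing. Whether Quick actually behaves in either of these two ways is not confirmed here.

```lua
-- The same word, "база", in two encodings, written as explicit bytes.
local utf8Word   = "\208\177\208\176\208\183\208\176" -- UTF-8: two bytes per letter
local latin5Word = "\209\208\215\208"                 -- ISO/IEC 8859-5: one byte per letter

-- Hypothetical helper: list the byte values of a (short) string.
local function bytesOf(s)
    return table.concat({ string.byte(s, 1, -1) }, " ")
end

print(string.len(utf8Word), bytesOf(utf8Word))
print(string.len(latin5Word), bytesOf(latin5Word))
```

With a single-byte encoding, string.len already matches the character count, which would explain a result of 4 without any change to the string library.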

atilim · March 2013

And I think the most convenient way to work with different encodings is to use the http://www.gnu.org/software/libiconv/ library. It seems there are Lua bindings too: http://ittner.github.com/lua-iconv/

But it's a big library and its binaries take about 2MB.

MauMau · December 2013

I'm not sure if my issue fits into the same category - I found that string.upper and string.lower do not work as expected.

According to lua.org, the following should work:

string.upper("ação") -- returns: AÇÃO

With Gideros, however, it returns: AçãO

Is Gideros perhaps using an outdated library?
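
For what it's worth, the Lua manual says that what counts as a lowercase letter in string.upper depends on the current locale, and stock Lua converts one byte at a time, so the two-byte UTF-8 sequences for "ç" and "ã" are left alone. The sketch below is one possible workaround, assuming UTF-8 input; utf8upper and upperMap are hypothetical names, and the table covers only the two letters from this example.

```lua
-- Hypothetical mapping from lowercase to uppercase UTF-8 sequences.
-- Only the letters needed for this example; extend as required.
local upperMap = {
    ["\195\167"] = "\195\135", -- ç -> Ç
    ["\195\163"] = "\195\131", -- ã -> Ã
}

-- Hypothetical helper: uppercase ASCII letters, then replace each
-- multi-byte UTF-8 sequence that has an entry in upperMap.
local function utf8upper(s)
    local ascii = string.gsub(s, "[a-z]", string.upper)
    -- A lead byte (194..244) followed by its continuation bytes (128..191).
    -- Sequences missing from upperMap return nil and stay unchanged.
    local result = string.gsub(ascii, "[\194-\244][\128-\191]*", function(ch)
        return upperMap[ch]
    end)
    return result
end

print(utf8upper("a\195\167\195\163o")) -- "ação" given as UTF-8 bytes
```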

uzubari · November 2014

Hello
Here is a solution that works for me for getting the correct Unicode string length:

-- Return the character count of a UTF-8 encoded string
function wordLength(word)
	local length = 0
	for c in string.gmatch(word, ".") do
		local byte = string.byte(c)
		-- Bytes 128..191 are UTF-8 continuation bytes; count every other byte
		if byte < 128 or byte > 191 then
			length = length + 1
		end
	end
	return length
end
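
The same continuation-byte rule can also be used to split a string into whole characters, which helps when you need more than the length. This is only a sketch assuming valid UTF-8 input; utf8chars and utf8left are hypothetical helper names.

```lua
-- Hypothetical helper: split a UTF-8 string into a table of characters.
-- Each match is one non-continuation byte plus the continuation bytes
-- (128..191) that follow it.
local function utf8chars(s)
    local chars = {}
    for ch in string.gmatch(s, "[^\128-\191][\128-\191]*") do
        chars[#chars + 1] = ch
    end
    return chars
end

-- Hypothetical helper: the first n characters (not bytes) of s.
local function utf8left(s, n)
    local chars = utf8chars(s)
    return table.concat(chars, "", 1, math.min(n, #chars))
end

local word = "\208\177\208\176\208\183\208\176" -- "база" as UTF-8 bytes
print(#utf8chars(word))              -- 4
print(string.len(utf8left(word, 2))) -- 4 (two characters, two bytes each)
```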
