string.len with cyrillic in Gideros — Gideros Forum

unlying Guru
edited March 2013 in Bugs and issues
string.len("база") returns 8 in Gideros, and any other Cyrillic string returns double the number of letters. With Latin text it works fine. I thought this might be normal Lua behavior, but in Quick it works fine...
So it is a Gideros bug.
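Printing the individual bytes of the word shows where the 8 comes from. This is a minimal Lua sketch, assuming the source file is saved as UTF-8 (as Gideros does). In that encoding each of these Cyrillic letters is stored as two bytes, the first being 208 (0xD0).

```lua
local word = "база"

-- string.byte(s, 1, -1) returns every byte of s as a separate number.
print(string.byte(word, 1, -1))  --> 208 177 208 176 208 183 208 176

-- string.len and the # operator count those bytes, not the letters.
print(string.len(word), #word)   --> 8 8
```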

Comments

  • It's not a bug, unless it is a bug in the standard Lua interpreter (liblua).
    In short:
    > You might want to know how many Unicode characters are in a string. Depending on the encoding used, a single Unicode character may occupy up to four bytes.

    For more information, see http://lua-users.org/wiki/LuaUnicode
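The quoted point can be made concrete. In UTF-8, the first (lead) byte of a character already tells how many bytes the character occupies. Here is a minimal Lua sketch, assuming well-formed UTF-8 input. The name `utf8CharBytes` is hypothetical.

```lua
-- Hypothetical helper: number of bytes in the UTF-8 sequence that starts
-- with lead byte b (assumes well-formed UTF-8 input).
local function utf8CharBytes(b)
  if b < 0x80 then return 1       -- 0xxxxxxx: ASCII
  elseif b >= 0xF0 then return 4  -- 11110xxx
  elseif b >= 0xE0 then return 3  -- 1110xxxx
  elseif b >= 0xC0 then return 2  -- 110xxxxx
  end
  -- 10xxxxxx (0x80-0xBF) only continues a character, it never starts one.
  error("continuation byte, not a lead byte: " .. b)
end

print(utf8CharBytes(string.byte("b")))  --> 1
print(utf8CharBytes(string.byte("б")))  --> 2
print(utf8CharBytes(string.byte("€")))  --> 3
```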
  • unlying Guru
    edited March 2013
    Maybe it really is a Lua bug. But as I said before, Quick returns the right numbers.
  • ar2rsawseen Maintainer
    edited March 2013
    By default, string.len in Lua returns the number of bytes a string occupies (useful for sizing buffers, etc.), not the number of characters. It's doing exactly that here, so maybe the bug is in Quick :D

    I know I'm not being helpful, just the fact ;)

    http://stackoverflow.com/questions/10097941/print-number-of-characters-in-utf-8-string

    So basically you'd need a specific library to deal with Unicode strings. I just don't know of any; maybe someone else does.
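The standard string library can at least count characters. Here is a minimal sketch of one common idiom, assuming valid UTF-8 input: count every byte that is not a continuation byte (0x80 to 0xBF). The name `utf8Len` is hypothetical.

```lua
-- Hypothetical helper: count the characters in a UTF-8 string.
-- Continuation bytes (0x80-0xBF, written \128-\191 in Lua 5.1 strings)
-- never start a character, so every other byte starts exactly one.
local function utf8Len(s)
  -- string.gsub returns the new string and the number of matches;
  -- only the match count is needed.
  local _, count = string.gsub(s, "[^\128-\191]", "")
  return count
end

print(#"база", utf8Len("база"))  --> 8 4
```

Lua 5.3 and later also provide `utf8.len`. This sketch avoids it so that it runs on Lua 5.1 as well.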
  • unlying Guru
    edited March 2013
    I used this library in my Rebus game, and it looks like it fits this problem too. If anybody needs it, the file is attached to the message.
    Maybe Quick has some sort of autodetection... Curious.

  • atilim Maintainer
    edited March 2013
    Gideros always saves files in UTF-8 format. For Quick, I think there are two possibilities:
    1. You've used an encoding other than UTF-8, such as ISO/IEC 8859-5 (http://en.wikipedia.org/wiki/ISO/IEC_8859-5).
    2. Quick has modified string.len (and other string functions) to work with UTF-8 seamlessly.
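Possibility 1 would explain Quick's numbers. In a single-byte encoding such as ISO/IEC 8859-5, every Cyrillic letter is one byte, so string.len("база") would be 4. Below is a minimal sketch of the conversion. It assumes UTF-8 input and handles only ASCII plus the basic Cyrillic letters А to я (U+0410 to U+044F). ISO/IEC 8859-5 places those letters at bytes 0xB0 to 0xEF, so the byte is the code point minus 0x360. The name `toIso88595` is hypothetical, and any other character raises an error.

```lua
-- Hypothetical converter: UTF-8 -> ISO/IEC 8859-5, limited to ASCII and
-- the Cyrillic letters U+0410..U+044F (А..я); anything else is rejected.
local function toIso88595(s)
  local out = {}
  local i = 1
  while i <= #s do
    local b1 = s:byte(i)
    if b1 < 0x80 then
      out[#out + 1] = string.char(b1)  -- ASCII is identical in both encodings
      i = i + 1
    elseif b1 == 0xD0 or b1 == 0xD1 then
      local b2 = s:byte(i + 1) or error("truncated UTF-8 sequence")
      local cp = (b1 - 0xC0) * 64 + (b2 - 0x80)  -- decode a 2-byte sequence
      if cp < 0x410 or cp > 0x44F then
        error(string.format("U+%04X is not handled by this sketch", cp))
      end
      out[#out + 1] = string.char(cp - 0x360)  -- U+0410 -> 0xB0, U+044F -> 0xEF
      i = i + 2
    else
      error("byte " .. b1 .. " is not handled by this sketch")
    end
  end
  return table.concat(out)
end

print(#toIso88595("база"))  --> 4
```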
  • atilim Maintainer
    I think the most convenient way to work with different encodings is the GNU libiconv library (http://www.gnu.org/software/libiconv/), and there seem to be Lua bindings for it too: http://ittner.github.com/lua-iconv/

    However, it's a big library, and its binaries take about 2 MB.
  • I'm not sure if my issue fits into the same category - I found that string.upper and string.lower do not work as expected.

    According to lua.org, the following should work:
    string.upper("ação") -- returns: AÇÃO
    With Gideros, however, it returns: AçãO

    Is Gideros perhaps using an outdated library?
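Standard Lua's string.upper and string.lower convert one byte at a time, and which bytes count as letters depends on the current C locale. A two-byte UTF-8 character such as ç (bytes 195 167) is therefore normally left unchanged, which matches the AçãO result. Here is a minimal sketch, assuming UTF-8 input. It covers only ASCII a-z plus the Latin-1 Supplement lowercase letters à to þ (U+00E0 to U+00FE, excluding ÷). The name `upperLatin1` is hypothetical. Other scripts, such as Cyrillic, would need their own mappings.

```lua
-- Hypothetical helper: uppercase ASCII and Latin-1 Supplement letters in a
-- UTF-8 string. Every other character is passed through unchanged.
local function upperLatin1(s)
  -- ASCII: a-z (97-122) -> A-Z (65-90), independent of the C locale.
  s = s:gsub("[a-z]", function(c) return string.char(c:byte() - 32) end)
  -- à..þ are encoded as 195 160..195 190. Their capitals À..Þ are
  -- 195 128..195 158, so only the second byte moves down by 32.
  s = s:gsub("\195([\160-\190])", function(c)
    local b = c:byte()
    if b == 183 then return "\195" .. c end  -- ÷ (U+00F7) is not a letter
    return "\195" .. string.char(b - 32)
  end)
  return s
end

print(upperLatin1("ação"))  --> AÇÃO
```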
  • uzubari Member
    edited November 2014
    Hello,
    Here is a solution that works for me for getting the correct Unicode string length:
    -- Return the number of characters in a UTF-8 string.
    -- Only lead bytes are counted: ASCII bytes (< 128) and the first byte
    -- of a multi-byte sequence (> 191); continuation bytes (128-191) are skipped.
    function wordLength(word)
      local wordlength = 0
      for c in string.gmatch(word, ".") do
        local b = string.byte(c)
        if b < 128 or b > 191 then
          wordlength = wordlength + 1
        end
      end
      return wordlength
    end
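Counting is often only the first step. To work on individual letters, a UTF-8 string can be split into one-character substrings. The pattern below matches a lead byte followed by any continuation bytes. This is a minimal sketch compatible with Lua 5.1, assuming valid UTF-8 without NUL bytes. The name `utf8Chars` is hypothetical.

```lua
-- Hypothetical helper: split a UTF-8 string into a table of characters.
-- [\1-\127\194-\244] matches an ASCII byte or a multi-byte lead byte;
-- [\128-\191]* then takes that character's continuation bytes.
local function utf8Chars(s)
  local chars = {}
  for ch in string.gmatch(s, "[\1-\127\194-\244][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return chars
end

local chars = utf8Chars("база")
print(#chars)    --> 4
print(chars[2])  --> а
```

With valid input, `#utf8Chars(word)` gives the same count as `wordLength(word)`. Indexing the table can replace string.sub when positions are meant as characters rather than bytes.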