Inconsistent crash on Windows - really stumped — Gideros Forum

I've been trying to solve a mysterious crash on a Windows game, and I'm running out of ideas...

An app I've had on the Microsoft Store for years suddenly stopped working for at least a couple of users, crashing after a few seconds of showing the splash screen. The Microsoft Store developer dashboard has no details about the crash. One affected user has been a customer of my games for more than 10 years, and he's been working with me to try to resolve it.

Since the app hadn't changed, my first suspicion was that the installation of the app had been corrupted somehow. I published an update as a beta to the one customer, allowing him to update the app without losing his data (records of tens of thousands of fish caught over several years, placing him high on the global leaderboard). That didn't help, so I suspected the app's saved data was corrupt, though in most cases the code is tolerant of corrupt data. I built him another beta that sent his catch history to a script that stored it in a database, and a one-off system to download and restore that data. That allowed him to uninstall the app completely and reinstall, knowing we could restore his progress (I plan to do something like this for all customers in the future). But even after uninstalling and reinstalling, it still crashes on startup for him.

In the app I already had code in place to handle failed startups to help me with troubleshooting. At many points during startup it appends an entry to a log file showing the last step completed. A few seconds after the app is up and running, it deletes that log. So on startup, if the file exists, the previous run didn't start successfully, and the app sends that log to a database on my server, showing how far it got. I've used that to solve some other issues in the past. The crash must have happened somewhere in the code between the last entry logged and the next entry that should have been logged. Normally that's very helpful when I can't get a stack trace.
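For readers who want to try the same technique, here is a minimal sketch of such a startup log in plain Lua. It is not the code from my app: the `StartupLog` table and its function names are made up for illustration, and it assumes the log lives somewhere writable (in Gideros that could be a path such as "|D|startup.log").

```lua
-- Hypothetical helper table; none of these names come from Gideros.
StartupLog = {}

-- Append one step and close the file straight away, so the entry is
-- written out even if the process dies immediately afterwards.
function StartupLog.step(path, message)
  local f = io.open(path, "a")
  if not f then return false end
  f:write(message, "\n")
  f:close()
  return true
end

-- Call first thing on startup. Returns the steps left behind by a previous
-- run that never finished starting, or nil if that run started cleanly.
-- The old log is removed so this run begins with an empty one.
function StartupLog.previousFailure(path)
  local f = io.open(path, "r")
  if not f then return nil end
  local steps = {}
  for line in f:lines() do
    steps[#steps + 1] = line
  end
  f:close()
  os.remove(path)
  return steps
end

-- Call a few seconds after the app is up and running.
function StartupLog.markSuccess(path)
  os.remove(path)
end
```

On startup the app would call `StartupLog.previousFailure(path)` and, if it returns a list, send that list to the server before logging the first step of the new run.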

In the betas, I've also peppered the startup code with many more logged steps like that, to pinpoint exactly what the app is trying to do when it crashes, with each beta adding more to narrow it down until I'm logging steps for individual lines of code in the area where it's been crashing. That should pinpoint the exact point of the crash, but the crashes are not consistent from one run to the next. On one run it might crash seeding the random number generator. On another it might be when it tries to open a local file. In one case it failed between two consecutive log entries, with no other code executed between them. I've built a version that sends a status update to a server at each logged step, rather than only when the previous run failed to start, and the data shows the same thing: it fails very early in the startup, but not consistently at any particular step.

That suggests the problem is happening on some thread other than the one running the Lua code, or is triggered by some other software that's running. The user has checked for updated drivers, etc. I still can't reproduce the problem - both the general release and all the betas run fine on two Windows 10 computers. The specs of the user's computer exceed those of the ones I'm using - ample RAM and drive space, etc. He runs many other games on it, and the only one that crashes is mine.

Unfortunately this is the short version of the story - I've been through many one-off builds trying to solve this over the past few weeks. I originally built with Visual Studio 2015, using the WinRT export and an older Gideros. Since then I've reinstalled Visual Studio and the SDKs from scratch, updated to the UWP export with the latest Gideros and Visual Studio 2017, and, for complicated reasons, switched to building on a different PC. Again, everything works as it should for me, logging all the startup steps of every run to my database, whether I'm running a clean install of the game or one I've run several times.

Is there any .ini file or registry entry left behind after an uninstall of a Windows Store application made from Gideros, something that could be corrupted and still around to cause trouble after uninstalling or reinstalling the app? I'm grasping at straws here. I know that in general, if a game used to work on Windows 10 and suddenly stops, it's a good bet some Windows update changed the environment and some hardware may not play nice until drivers get updated. But if his drivers are up to date, and other games and applications that use the graphics card, etc., work fine, and the application runs fine on other computers with fully updated Windows 10, where does one go from here?

I'd love to hear any and all ideas...

Thanks,

Paul


Likes: antix


Comments

  • antix Member
    edited February 2020
    If the user knows the date the application stopped working, they should check for Windows updates around that time, especially device drivers.

    Maybe they could run System Restore to around that date and test again.

    You could also publish a test case built from an empty project and see whether that fails on his machine too.
  • Thanks for the ideas. I've considered both - asking him to do a system restore to back out of some updates, which might fix it for now, but if the issue is some incompatibility between some of the user's hardware and the latest Windows update, the problem would likely return as soon as he updates again.

    I've also considered publishing a trivial Gideros app as a beta, without network access, plugins, or anything complicated, just to see if whatever the issue is applies to everything built this way. If it runs, I could try adding a module or plugin at a time and see if one particular one seems to be causing the crash. I don't know if or how that might lead to a fix, but it could conceivably shed more light on the cause.

    I might try either of those next. Still scratching my head over this...

    Likes: antix

  • MoKaLux Member
    edited February 2020
    antix said:

    You could just publish a test case where you use an empty project and see if that does not work on his machine also.

    PaulH said:

    adding a module or plugin at a time and see if one particular one seems to be causing the crash. I don't know if or how that might lead to a fix, but it could conceivably shed more light on the cause.

    +1 here.

    Likes: antix

    my growING GIDEROS github repositories: https://github.com/mokalux?tab=repositories
  • Update: Through numerous beta builds, I've found the crashes happen iterating through the results of lfs.dir(). The crashes happen using either of the two methods to iterate through a list of files supported by the Lua File System:

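    -- Method 1: generic for loop
    for file in lfs.dir(path) do
      -- (stuff)
    end

    -- Method 2: explicit directory object
    local iter, lfs_obj = lfs.dir(path)
    local file = lfs_obj:next()
    while file do
      -- (stuff)
      file = lfs_obj:next()
    end
    lfs_obj:close()

    The first result is ".", and retrieving the second result crashes for the affected user.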

    When I built a stripped down app with most of the content removed, and two buttons that would each test one of those two methods, for him the first method doesn't crash but doesn't return any results at all. The second method works.

    Still trying to get this worked out...

    Paul


    Likes: MoKaLux, antix

  • So it's working now or not? I'm currently away from home but have some lfs code for iterating through a folder that works... unless something has changed since I last ran it :D
  • It still works fine on my own computers, but fails for at least a couple of my customers, for whom it used to work. For the customer who's helping me troubleshoot it, it works on his laptop but not on his desktop computer. So something changed in their environment that caused calling lfs.dir(), or at least iterating through the results, to crash.

    If I don't find any other solution, I may have to rework quite a bit of code to avoid ever needing to use lfs.dir() or anything equivalent. In one case my server can deliver a list of possible entries, names of folders where each of about 172 different content packages would be installed. Then I can have the app just check for the presence of each. In another case I'm getting a list of folders containing user created content (fishing flies), but I could maintain a list in a file, adding another entry each time the user creates a new item.
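The first workaround, checking for each candidate folder instead of listing what is there, could look roughly like the sketch below. It assumes every installed package contains a known marker file; the function names and the marker file are hypothetical, not something my app or Gideros defines.

```lua
-- Returns true if the file can be opened for reading.
function fileExists(path)
  local f = io.open(path, "r")
  if f then
    f:close()
    return true
  end
  return false
end

-- baseDir:    folder that holds the packages, ending in "/"
-- candidates: folder names delivered by the server
-- markerName: a file assumed to be present in every installed package
-- exists:     optional probe function, defaults to fileExists
function findInstalledPackages(baseDir, candidates, markerName, exists)
  exists = exists or fileExists
  local installed = {}
  for _, name in ipairs(candidates) do
    if exists(baseDir .. name .. "/" .. markerName) then
      installed[#installed + 1] = name
    end
  end
  return installed
end
```

This never asks the system for a directory listing, at the cost of one open attempt per candidate, which should be acceptable for a list of about 172 names.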

    Paul
  • antix Member
    edited February 2020
    Here is part of my hero manager class that goes through a folder and loads file data... maybe it could be of some use.
    HEROFOLDER @ '|D|heroes/'
     
    HeroManager = Core.class()
     
    function HeroManager:init()
      HEROMANAGER = self
      HERO = nil
      self.hero = nil
     
      self:createHeroList()
    end
     
    -- create hero folder if it does not exist, otherwise create list of saved heroes
    function HeroManager:createHeroList()
      local heroList = {}
      if not self:folderExists(HEROFOLDER) then -- make hero folder if it does not exist
        lfs.mkdir(HEROFOLDER)
        print("HeroManager:init() created folder " .. HEROFOLDER)
      else
        for filename in lfs.dir(HEROFOLDER) do -- populate list of hero names
          if filename ~= "." and filename ~= ".." then
            local hero = self:loadHero(filename)
            local info = filename .. ", Level " .. hero.level .. " " .. hero.class -- info that will be shown in game when selecting a hero
            heroList[filename] = info
            print('loaded hero: ' .. filename .. ', level ' .. hero.level .. ' ' .. hero.class)
          end
        end
      end
      self.heroList = heroList
      HEROLIST = heroList
     
      if self:countHeroes() == 0 then
        print('Hero folder found but it contains no heroes')
      end
    end
     
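    -- returns true if folder exists (tries to chdir into it, then restores the previous directory)
    function HeroManager:folderExists(name)
      local cd = lfs.currentdir()
      local result = lfs.chdir(name) -- use the argument rather than the HEROFOLDER constant
      if cd ~= nil then
        lfs.chdir(cd)
      end
      return result
    end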
     
    -- returns number of elements in table
    function HeroManager:countHeroes()
      local heroList = self.heroList
      local count = 0
      for _ in pairs(heroList) do
        count += 1
      end
      return count
    end

    Likes: MoKaLux

  • I do something like this:
    -- call lfs.dir once, inside pcall, and keep its results
    local ok, iter, dir = pcall(lfs.dir, path)
    if not ok then return nil, iter end -- on failure the second value is the error message
    Perhaps doing it in pcall would help you too.

    Likes: MoKaLux

  • Thanks for the ideas. I really appreciate the input.

    At this point I can confirm that on the few Windows 10 computers (I know of) where the problem exists, the application crashes the second time lfs.dir() iterates, even if the return value of the first iteration works as it should, and all values look appropriate until the second iteration (attempting to access the second result.) So it seems that any approach that uses lfs.dir() will crash in these cases.

    I don't know if third party software, like specific anti-malware programs trying to protect files, or the presence of mounted network drives, or some other aspect of these particular systems might be involved.

    Moments ago I confirmed that if the app avoids calling lfs.dir(), everything else works just fine. My latest beta just maintains a list of files written to folders, and consults those lists rather than asking the system for a list of files present by calling lfs.dir().

    So that's my solution, at least for now: lfs.dir() crashes on some small subset of Windows 10 computers, so I won't call it.

    For backward compatibility, my next general release will make one attempt to use lfs.dir() to build the lists of files already present. Thereafter it will just add to the list when it adds files to folders, so most users will never experience any impact. For those few computers that crash on lfs.dir(), it will crash on that first run when it makes that attempt, but on subsequent runs it will only use the stored file lists.
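A rough sketch of that plan in plain Lua follows. The `FileIndex` name and the one-name-per-line index format are assumptions made for illustration. Writing an empty index before the scan is one possible way to make sure a crash during the scan is never repeated on the next run.

```lua
-- Hypothetical helper table; the index is a text file with one name per line.
FileIndex = {}

-- Returns the stored list, or nil if no index has been written yet.
function FileIndex.load(indexPath)
  local f = io.open(indexPath, "r")
  if not f then return nil end
  local names = {}
  for line in f:lines() do
    if line ~= "" then names[#names + 1] = line end
  end
  f:close()
  return names
end

-- Overwrites the index with the given list.
function FileIndex.save(indexPath, names)
  local f = io.open(indexPath, "w")
  if not f then return false end
  for _, name in ipairs(names) do
    f:write(name, "\n")
  end
  f:close()
  return true
end

-- Call whenever the app writes a new file into the folder.
function FileIndex.add(indexPath, name)
  local f = io.open(indexPath, "a")
  if not f then return false end
  f:write(name, "\n")
  f:close()
  return true
end

-- scanFolder is a function returning an iterator over the folder, for
-- example function() return lfs.dir(folder) end. It is called at most once:
-- the empty index saved first means a crashed scan is not retried.
function FileIndex.loadOrBuild(indexPath, scanFolder)
  local names = FileIndex.load(indexPath)
  if names then return names end
  FileIndex.save(indexPath, {})
  names = {}
  for name in scanFolder() do
    if type(name) == "string" and name ~= "." and name ~= ".." then
      names[#names + 1] = name
    end
  end
  FileIndex.save(indexPath, names)
  return names
end
```

The trade-off is that a machine that crashes during the scan starts with an empty list, so any files already present would need to be rediscovered some other way.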

    Paul

    Likes: MoKaLux, keszegh, talis

  • talis Guru
    edited February 2020
    It seems like this is the same case:
    https://github.com/luapower/lfs/issues/1
    By the way, when I searched Google for "lfs.dir crash on windows", there were many pages about this crash.
    All the crashes are in Lua-based applications, and happen on the second iteration. @PaulH

    Likes: MoKaLux

  • @talis, interesting find, perhaps that may help to find the cause.

    Likes: talis

  • hgy29 Maintainer
    @PaulH and @talis, thanks for your input. It seems to be an lfs issue, but I had a quick look at their code (and our modified version), and I may have spotted the problem: they use the '_findfirst' series of functions under Windows, but expect the result to be a 'long', while it should be an 'intptr_t'. That can cause crashes on win64 platforms, since a long is still 32-bit there but an intptr_t is 64-bit, so the handle can be truncated before it is passed back for the next entry...
    While I am at it, I also spotted another possible issue: they use the ASCII versions of the API, not the Unicode (wide-char) ones, which means that it will have trouble with file names that are not pure ASCII.

    First fix is very easy: just change line 94 of /All Plugins/lfs/source/lfs.c from 'long hFile;' into 'intptr_t hFile;'

    Second fix requires a bit more work.
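To illustrate why the 'long' declaration might only hurt on some machines, here is a small Lua simulation of storing a 64-bit value in a signed 32-bit variable. It is only arithmetic, not lfs code, and the two handle values are invented for the example; whether real handles look like either of them on a given system is an assumption.

```lua
-- Simulates storing a 64-bit value in a signed 32-bit C 'long' and reading
-- it back: only the low 32 bits survive, reinterpreted as a signed number.
function storeInLong32(value)
  local low = value % 2^32
  if low >= 2^31 then
    low = low - 2^32
  end
  return low
end

-- True when the value comes back unchanged after the round trip.
function survivesLong32(value)
  return storeInLong32(value) == value
end

-- Hypothetical handle values, chosen only for illustration.
local smallHandle = 0x00A41F30
local largeHandle = 0x7FF6 * 2^32 + 0x12345678

print(survivesLong32(smallHandle)) -- true
print(survivesLong32(largeHandle)) -- false
```

If a handle happens to fit in 31 bits, the truncation is harmless, which would be consistent with the same build working on one computer and crashing on another.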
  • @hgy29 thanks for the solution. The hard work was done by @PaulH; I just searched Google and am happy to help. o:)

    Likes: MoKaLux

  • Wow, thanks, @talis and @hgy29! I'll make the change to lfs.c and give that a shot.

    I don't know if this is relevant, but at least a few times I've seen an iteration of lfs.dir() return a value of type "function". I didn't dig too far into that - just used code to only deal with returns of type "string", but that still puzzles me.



    Likes: talis

  • Interesting. I'm keen to find out if the fix @hgy29 has implemented will solve the issue for @PaulH

    Likes: MoKaLux
