How to debug replay determinism issues

Discussion in 'Battle Improvements' started by Neckbeard, Feb 27, 2015.

  1. Neckbeard

    Neckbeard Crew

    I don't mean to be insulting, but the replay divergences are pretty bad and don't seem to be getting better. So I thought it might help to give some advice on how to attack the problem. I'm a game developer and I've debugged replay determinism problems in multiple games.

    Here's how I do it:

    1. Add a system to log a massive amount of data about the game state (the positions, velocities, health, and so on of everything in the game) to a file.
    2. Create such a log while playing the game, also recording all user input.
    3. Restart the battle, but this time read your massive log file instead of writing to it, and play back the user input.
    4. If any of the variables you captured differ from what you recorded, assert.
    5. Hopefully, where the assert hits will clue you in to what caused the replay to diverge from the recording. If not, you need to add more logging and return to step 2.
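
    The steps above can be sketched roughly as follows. This is only a toy illustration: the `step` function, the `state` fields, and the in-memory `log` list are hypothetical stand-ins for real game logic and a real log file.

```python
import json
import random


def step(state, user_input, rng):
    """Advance a toy simulation by one frame (hypothetical game logic)."""
    state["x"] += user_input + rng.choice([-1, 0, 1])
    if state["x"] % 7 == 0:
        state["health"] -= 1


def run_battle(inputs, seed, recorded_log=None):
    """Record when recorded_log is None; otherwise verify against it."""
    rng = random.Random(seed)
    state = {"x": 0, "health": 100}
    log = []
    for frame, user_input in enumerate(inputs):
        step(state, user_input, rng)
        # Steps 1-2: capture the whole state every frame.
        snapshot = json.dumps(state, sort_keys=True)
        if recorded_log is not None:
            # Step 4: assert on the very first difference.
            assert snapshot == recorded_log[frame], (
                f"diverged at frame {frame}: "
                f"{snapshot} != {recorded_log[frame]}"
            )
        log.append(snapshot)
    return log


inputs = [1, 0, 2, 1, 0, 3]
recording = run_battle(inputs, seed=42)                        # record
replay = run_battle(inputs, seed=42, recorded_log=recording)   # verify
```

    In a real engine the snapshot would go to disk (or over the network) and cover far more state, but the shape is the same: one code path, two modes, and an assert that names the frame.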

    That's it! It sounds easy when written so succinctly, but it's a bear of a problem. Here are some more tips:

    • Staying deterministic is a never-ending battle. Every time the code changes, someone might have broken it. Depending on the size of your team, assigning a production assistant or setting up a server to keep testing it all the time is a good idea.
    • If the file is too large to fit on the device, transfer it over the network.
    • Record frame delta times (unless you have a fixed-frame engine) as part of user input. During playback you will use the recorded times and not newly measured time deltas. You might have to skip replay frames or interpolate between them if the CPU is running at a different speed than when the recording was made.
    • If you use random numbers, seed the random number generator with the same value when you begin a replay.
    • Multithreading can easily break determinism. You have to make sure the order that objects get processed in either doesn't matter, or is itself deterministic.
    • Another major headache is the OS. You can't predict when an interrupt will change how long it takes for an operation to complete. So all time measurements that feed into game logic need to use an in-game measure of time (accumulated frame time deltas) and not the "wall clock time".
    • It may not seem like it sometimes, but the hardware really is deterministic! It's easy, when you're in the trenches, to start thinking a divergence is due to cosmic rays hitting your CPU. So far I haven't ever been right about that.
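
    A minimal sketch of three of the tips above (recorded frame deltas, a seeded random number generator, and an in-game clock), assuming a variable-frame-rate loop. `GameClock` and `play` are hypothetical names used only for illustration.

```python
import random
import time


class GameClock:
    """In-game time: the sum of frame deltas, never the wall clock."""

    def __init__(self):
        self.now = 0.0

    def advance(self, dt):
        self.now += dt


def play(seed, recorded_dts=None, frames=5):
    """Record frame deltas when recorded_dts is None; otherwise reuse them."""
    rng = random.Random(seed)          # same seed on record and on replay
    clock = GameClock()
    dts, events = [], []
    last = time.perf_counter()
    for frame in range(frames):
        if recorded_dts is None:
            current = time.perf_counter()
            dt = current - last        # measured once, then treated as input
            last = current
        else:
            dt = recorded_dts[frame]   # replay never measures time itself
        dts.append(dt)
        clock.advance(dt)
        # Game logic only ever sees clock.now and rng, both reproducible.
        events.append((clock.now, rng.random()))
    return dts, events


dts, original = play(seed=1234)
_, replayed = play(seed=1234, recorded_dts=dts)
```

    Because the replay feeds back exactly the same deltas in the same order, the accumulated clock values match bit for bit, even though the wall-clock timing of the two runs differs.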
    Fingers crossed this actually gets read by an engineer who can benefit from it!
     
  2. Kelani

    Kelani Commodore

    Nicely put @Neckbeard and I'm glad to see you posting again! :) I was wondering about your take on how the update addressed the poisonous troops upgrades you wrote about previously. Anyway, I agree with everything you said above.

    I've spent quite a bit of time looking at my replays, and while not really scientific, it seems that what breaks my replays the most is a mortar shot, WD position, or a combo of both. Here's an example I see all the time:

    Battle: A mortar hits my gunners, knocking them back. However, they remain within the WD's protection aura. This protects them from the remaining attacks, and I go on to win the battle.

    Replay: Because the gunners and WD take slightly different paths, the mortar shot knocks the gunners OUT of the aura, or kills the WD. Those gunners quickly die, and no longer contribute. The attack falls apart, and the replay veers completely off course.

    The reverse is also true. Sometimes troops which were killed in a battle survive the death shot in the replay, and you wind up with a replay that's better than your actual results.

    In short, replays are being broken by a butterfly effect.
     
  3. Neckbeard

    Neckbeard Crew

    Yep, the butterfly effect is a good way to put it. Even the smallest variation from the original play to the replay can cause major divergence down the road. That's why you have to record absolutely everything into the file I talked about above, and fire an assert at the slightest difference. By the time the divergence is actually visible to the player, the problematic event has long since passed.
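
    One way to catch "the slightest difference" is to compare a per-frame checksum of the exact bit patterns of the state. The sketch below is an assumption about how that could look, not a description of any particular engine; the entity layout `(x, y, health)` and the function names are hypothetical.

```python
import hashlib
import struct


def frame_checksum(entities):
    """Hash the exact bit patterns of every tracked value in one frame."""
    digest = hashlib.sha256()
    for entity_id in sorted(entities):            # fixed iteration order
        x, y, health = entities[entity_id]
        digest.update(struct.pack("<iddi", entity_id, x, y, health))
    return digest.hexdigest()


def first_divergence(recorded, replayed):
    """Return the index of the first frame whose checksum differs, or None."""
    for frame, (a, b) in enumerate(zip(recorded, replayed)):
        if frame_checksum(a) != frame_checksum(b):
            return frame
    return None


recorded_frames = [
    {1: (0.0, 0.0, 100), 2: (5.0, 5.0, 80)},
    {1: (0.1, 0.0, 100), 2: (5.0, 4.9, 80)},
    {1: (0.2, 0.0, 100), 2: (5.0, 4.8, 80)},
]
# Same battle, except entity 2 is off by a tiny amount in frame 1.
replayed_frames = [
    {1: (0.0, 0.0, 100), 2: (5.0, 5.0, 80)},
    {1: (0.1, 0.0, 100), 2: (5.0, 4.9 + 1e-12, 80)},
    {1: (0.2, 0.0, 100), 2: (5.0, 4.8, 80)},
]

divergent_frame = first_divergence(recorded_frames, replayed_frames)
```

    Here `divergent_frame` is 1: an error far too small to see on screen is still flagged on the frame where it first appears.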

    (I'll reply about the poisonous troop upgrades in that thread)
     
  4. Kakashi

    Kakashi Crew

    Totally agree with you, @Neckbeard. But don't you think creating such a vast log file might hinder playability on low-end devices? I think it would also enable users to cheat, like the health hacks in Samurai Siege.
     
  5. Neckbeard

    Neckbeard Crew

    Oh, I meant that none of this logging should ever make it into a production build. Indeed, it will probably have to be turned off for the majority of the development team as well, since it might make the game too slow to play normally. All of it should be behind a #define or something similar, so only the engineer working on the replay system needs to turn it on.
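
    As a rough illustration of keeping the logging switched off by default, here is a sketch that uses an environment variable in place of a compile-time #define. The variable name `REPLAY_DEBUG` and the `make_logger` helper are hypothetical.

```python
import os

# Hypothetical switch; in C or C++ this would be an #ifdef around the logging code.
REPLAY_DEBUG = os.environ.get("REPLAY_DEBUG") == "1"


def make_logger(enabled):
    """Return a recording logger when enabled, otherwise a do-nothing stub."""
    if not enabled:
        return lambda frame, state: None

    records = []

    def log(frame, state):
        records.append((frame, dict(state)))   # copy so later changes don't leak in

    log.records = records
    return log


# Everyone else on the team gets the stub unless they opt in.
log_state = make_logger(REPLAY_DEBUG)
log_state(0, {"x": 1, "health": 100})
```

    A true compile-time switch removes the logging code from the build entirely, which a runtime flag like this one does not; the sketch only shows the opt-in idea.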
     
  6. Kakashi

    Kakashi Crew

    @Neckbeard, I hope the Midoki guys apply your idea; it's great. You seem to know a lot about programming. Are you a dev?
     
  7. Becker Redbeard

    Becker Redbeard First Mate

    Wow. Seems simple ;)
    Good thoughts. To bring up a possibly related side issue, hopefully no one is suffering defeats due to wall clock vs. frame time issues. A few in our guild feel like wins are harder and troops weaker. If true, could it be that the update is taxing older Apple devices (software updates almost always use more computing resources)?
     
  8. Kakashi

    Kakashi Crew

    I think they are, @Becker Redbeard. My old iPad 3 has been rendered unusable for most of the games that use Metal, like PP and Vainglory.
     
  9. Kelani

    Kelani Commodore

    This game is one heck of a battery hog, too. That usually indicates it's taxing the resources pretty nicely.
     
  10. Becker Redbeard

    Becker Redbeard First Mate

    This thread is above my pay grade, but...
    My understanding is that Metal is the shader/graphics engine.
    Would @Neckbeard or @Kakashi be able to elaborate whether this might have some impact on gameplay, success of attacks? I'd think the battle AI would be separate from graphics concerns (CPU vs GPU), but again way outta my league here.
    Should I close all background apps before I go raiding to free up resources? o_O
    Playing on iPad 3 btw.
     
  11. Kakashi

    Kakashi Crew

    The CPU may be handling the AI, but it is the GPU that processes those shaders and triangles into the picture you see on your iPad. So if the GPU isn't doing well, there is no point in having a high-end CPU calculate the AI; you will still have shitty performance. I believe you should clear some resources first.
     
  12. Neckbeard

    Neckbeard Crew

    It is possible you could get different results from one device to another, depending on how the engine works. For example, one common difference between machines is frame rate. If you're not careful with your math, you can end up with different results on a slower machine. A famously hard thing to get consistent in variable frame rate games is jump height.
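
    The jump height problem can be shown with a small sketch. It assumes simple explicit Euler integration; the launch speed, gravity, and `FIXED_DT` values are made up for illustration.

```python
FIXED_DT = 1.0 / 120.0   # hypothetical fixed physics step


def jump_height_variable(dt, v0=10.0, g=-30.0):
    """Peak height with one explicit Euler step per rendered frame."""
    y, v, peak = 0.0, v0, 0.0
    while True:
        y += v * dt
        v += g * dt
        peak = max(peak, y)
        if y <= 0.0 and v < 0.0:      # landed
            return peak


def jump_height_fixed(frame_dt, v0=10.0, g=-30.0):
    """Peak height when physics always advances in FIXED_DT steps."""
    y, v, peak = 0.0, v0, 0.0
    accumulator = 0.0
    while True:
        accumulator += frame_dt       # frame time is banked...
        while accumulator >= FIXED_DT:
            y += v * FIXED_DT         # ...and spent in identical steps
            v += g * FIXED_DT
            peak = max(peak, y)
            accumulator -= FIXED_DT
            if y <= 0.0 and v < 0.0:  # landed
                return peak


variable_60 = jump_height_variable(1.0 / 60.0)
variable_30 = jump_height_variable(1.0 / 30.0)
fixed_60 = jump_height_fixed(1.0 / 60.0)
fixed_30 = jump_height_fixed(1.0 / 30.0)
```

    With these numbers the per-frame version peaks at roughly 1.75 at 60 frames per second and roughly 1.83 at 30, while the fixed-step version gives the same peak at both frame rates, because the sequence of physics steps no longer depends on how fast frames arrive.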

    It's just as possible, though, that a slower machine would bias the results in the user's favor as against it. And I'm sure the developers have tried to design things so that there is no noticeable difference. With the right algorithms (which they need to use anyway to get replay to work...) there should be no difference at all. I think it's more likely that people are deceiving themselves to feel better about their losses.
     
