Memory and File Management – Tales from the Trenches

I’m basically working full time on the game at the moment. I have another business that demands attention, but the momentum of the game keeps growing. There’s a tension between new features and stability that keeps me working at it. I’ll add a new feature or two, we test it and uncover some bugs, I fix enough of them to think the game is stable and add more features. In spite of all the debugging support I’ve added, some crashes have remained a complete mystery.

Recently my dev laptop started failing to reallocate memory. I was writing the entire game’s replay data into one buffer, and inevitably the amount of data outgrew the buffer, so it had to grow too. We are not talking about huge amounts of memory – 8 or 16MB allocations would fail. It was a little surprising given the 2 or so GB of virtual memory available.

There must be some fragmentation of the heap going on – quite likely since I dynamically allocate small amounts of memory for the very dynamic physics, projectile and effects system. My test machine with the same OS did not have the same trouble, but I figured I should bite the bullet and work out a better solution. After all, there’s no telling how long or complicated games will get when it’s out in the wild, and running on all sorts of low-end hardware.

So the new system accumulates replay data in a smallish buffer of 1MB, but a flush is triggered when there is more than 256KB written. The chunk is compressed and then appended to the physical file for the replay. If there’s a crash or desync, or the player quits or goes into instant replay, whatever is in the buffer is also flushed. When a replay is viewed the entire list of chunks is read and decompressed into a single large buffer. This is needed because the user can seek to any point in the replay. It doesn’t seem to be a problem allocating big buffers once at the beginning – although perhaps I’ll have to work out a replay streaming system if it turns out to be a problem for other people.
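To make the flush scheme concrete, here is a rough sketch in Python rather than the game’s own code. The class name, the 4-byte length prefix in front of each chunk and the use of zlib are all assumptions for illustration; only the 256KB threshold and the compress-then-append idea come from the description above.

```python
import struct
import zlib

FLUSH_THRESHOLD = 256 * 1024  # flush once more than 256KB is pending


class ReplayWriter:
    """Hypothetical chunked replay writer; names and file layout are assumptions."""

    def __init__(self, path):
        self.path = path
        self.pending = bytearray()
        open(path, "wb").close()  # start with an empty replay file

    def write(self, data):
        """Accumulate replay data and flush when the threshold is exceeded."""
        self.pending += data
        if len(self.pending) > FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        """Compress whatever is pending and append it to the file as one chunk."""
        if not self.pending:
            return
        chunk = zlib.compress(bytes(self.pending))
        with open(self.path, "ab") as f:
            # Assumed layout: 4-byte little-endian length, then the compressed bytes.
            f.write(struct.pack("<I", len(chunk)))
            f.write(chunk)
        self.pending.clear()


def read_replay(path):
    """Decompress every chunk into one large buffer so playback can seek freely."""
    out = bytearray()
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # end of file
            (size,) = struct.unpack("<I", header)
            out += zlib.decompress(f.read(size))
    return bytes(out)
```

Calling flush() explicitly covers the quit, desync and instant replay cases, where whatever is left in the buffer has to reach the file regardless of how little there is.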

Even with this working, there was the possibility of the buffer running out because I had not yet serialised any projectiles. I put this off because I thought it was unnecessary work and didn’t seem to matter that much, but now I had a good reason to. If there was a continuous fire fight a flush could be put off for a long time. The replay key frames are important for desync recovery, so there must always be one in the current buffer. The other thing was, players can attempt to join any time, and they would have to wait until there were no projectiles to actually enter the game. If the host had paused while any projectiles were airborne the joining client would hang at the loading screen. Not good! So with a little extra work – perhaps a day – I managed to write and read the state of bullets, mortars and guided missiles to and from the key frames. It felt good to have that working, finally! Now the game can make perfectly regular key frames, the buffer use is much more predictable (although individual key frame sizes can vary), and players can truly join at any time.
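For anyone curious what writing projectile state into a key frame can look like, this is a minimal Python sketch. The record layout (a kind byte plus position and velocity), the kind codes and all the names are made up for illustration; the real game stores more state than this, and guided missiles in particular would need their target as well.

```python
import struct
from dataclasses import dataclass

# Assumed record layout: kind (1 byte), then position and velocity as doubles.
_RECORD = struct.Struct("<Bdddd")
_COUNT = struct.Struct("<I")


@dataclass
class Projectile:
    kind: int  # hypothetical codes: 0 = bullet, 1 = mortar, 2 = guided missile
    x: float
    y: float
    vx: float
    vy: float


def write_projectiles(projectiles):
    """Serialise a list of projectiles as a count followed by fixed-size records."""
    out = bytearray(_COUNT.pack(len(projectiles)))
    for p in projectiles:
        out += _RECORD.pack(p.kind, p.x, p.y, p.vx, p.vy)
    return bytes(out)


def read_projectiles(data, offset=0):
    """Read projectiles back; returns the list and the offset just past them."""
    (count,) = _COUNT.unpack_from(data, offset)
    offset += _COUNT.size
    result = []
    for _ in range(count):
        result.append(Projectile(*_RECORD.unpack_from(data, offset)))
        offset += _RECORD.size
    return result, offset
```

Returning the offset lets the rest of the key frame reader carry on from where the projectile block ends.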

I just discovered today that this new system of writing the replay to disk in chunks caused a problem with the way games are restarted. After the game configuration (map, users, AI configuration, etc.), the first thing the game records in the replay file is a snapshot (or key frame) of the game, which allows the source data to change without invalidating the replay. When the player restarted a game, it would seek to the beginning of the replay to reload that key frame and start again from there. But now, with the buffer flushed to disk, that data was no longer in memory unless it was a very short game. So I had to read in the game configuration preamble and the first compressed chunk, which contains that vital first key frame. Anything after that is discarded and the new game data is appended.
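The restart fix boils down to cutting the replay file back to the preamble plus the first chunk. A rough Python sketch, assuming the preamble size is known and that each chunk is stored with a 4-byte length prefix (both are assumptions, as is the function name):

```python
import struct


def rewind_replay_for_restart(path, preamble_size):
    """Keep the preamble and the first chunk, discard the rest.

    Returns the new file size, which is where new game data gets appended.
    """
    with open(path, "r+b") as f:
        f.seek(preamble_size)
        header = f.read(4)
        if len(header) < 4:
            raise ValueError("replay has no first chunk to restart from")
        (chunk_size,) = struct.unpack("<I", header)
        keep = preamble_size + 4 + chunk_size

        # Refuse to "truncate" a file that is shorter than the first chunk claims.
        f.seek(0, 2)
        if f.tell() < keep:
            raise ValueError("first chunk is incomplete")

        f.truncate(keep)
    return keep
```

The first chunk may hold more than just the key frame, so in this sketch the loader would still need to decompress it and stop reading once the key frame has been restored.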

Another thing I did today was prevent the game rolling on indefinitely after a game had ended. This may not be a big issue for players, but I have been pitting two computer players against each other to test stability. Without a forced pause the replay could go on and on with nothing actually happening. Little things like this are tightening up the game, bit by bit. It feels like it’s coming together rapidly, but it’s actually taking a very long time.

Prefab engines such as Unity or Unreal help a lot of game developers reduce the total amount of work, but I really don’t think there is much in this game that is generic. The hardest problems I deal with feel unique to it, such as synchronising target selection for missile launchers via snipers over any number of clients. Not being hamstrung by an existing framework may be an advantage in some cases.

Last night I had a bit of a crisis. I did a regular backup of the Subversion repository to DVD and Dropbox, but something happened to the original files during the process. The files looked sensible if you opened them, but Subversion was failing! Luckily the DVD backup was good and replacing the files with the backup sorted it out. Although there was no danger of losing work on the game, having a complete history of changes is extremely valuable.

It’s amazing how much work all this takes. I wonder how apparent that is to anyone playing the game.