AWS for Games Blog
Troubleshooting success with 1047 Games
Stunning graphics, smooth gameplay and a massive fun factor have earned the first-person shooter Splitgate rave reviews and an impressive 164 million downloads since its beta test just a few short months ago. Described as ‘Halo-meets-Portal’, the fast-paced FPS has players racing around a sci-fi world and jumping through portals to outsmart their opponents. Nevada-based indie studio 1047 Games, the creative force behind Splitgate, raised over $100m in September, valuing the company at $1.5bn.
In July, Splitgate released an open beta across PlayStation, Xbox and Steam, where it quickly became the top trending game. Splitgate gathered more than one million downloads in its first six days, and player numbers exploded from 400 to more than 175,000 within weeks.
It was an amazing triumph for 1047 and its small independent team, but behind the scenes, cracks began to appear. The game’s supporting infrastructure was struggling to cope with the scale and speed of its explosive growth, and the team found itself battling multiple server issues.
Chief Technology Officer and Co-founder Nicholas Bagamian says, “We were ecstatic that the game was so well received, but at the same time, the servers were crashing every day for the first few weeks, so it was pretty stressful.”
With the whole system failing whenever player numbers went over 50,000, the team of only four engineers had to act quickly. One emergency measure was a no-frills throttle system, cobbled together to temporarily restrict player entry. Chief Executive and Co-founder Ian Proulx explains, “It was literally this big red button on our internal dashboard. When you clicked it, that stopped anyone signing in. When you clicked it again, it would start re-admitting people.”
“We didn’t have time to automate it, so I was on ‘button duty’. Every five minutes, the number would reach 50,000 and I’d press to stop anybody coming in. Then, as people logged off and numbers started to drop, I’d re-open the doors. I did that for 24 hours straight!”
Unfortunately, that quick fix ran into problems when the volume of users crashed the internal dashboard. “The throttle button was tied to the dashboard, so every time the dash went down, I was frantically thinking: ‘Did it crash with the doors open, or closed?’” Proulx recalls. “It was so stressful, because every hour or so, the dashboard would crash and I’d be left holding my breath, knowing we were letting in people when we couldn’t afford to, and praying the system would restart quickly so I could close off entry.”
Soon afterwards, the small but nimble team automated the throttle system, optimized its self-built internal dashboard and implemented a ‘first in, first out’ (FIFO) queuing system using a combination of Amazon Elastic Container Service (ECS) and a Redis queue on Amazon ElastiCache.
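The post does not describe how 1047’s queue works internally. Purely as an illustration, the following sketch shows one way an automated throttle and a FIFO login queue can fit together. The `AdmissionQueue` class and its methods are hypothetical, `capacity` plays the role of a fixed player cap like the 50,000 limit above, and a Python `deque` stands in for a Redis list (appending to the tail and popping from the head, as `RPUSH` and `LPOP` would) so the example runs without a Redis server.

```python
from collections import deque


class AdmissionQueue:
    """Automatic login throttle with a first-in, first-out waiting line.

    Hypothetical sketch: `capacity` plays the role of a fixed player cap,
    and a deque stands in for a Redis list (append ~ RPUSH, popleft ~ LPOP)
    so the example runs without a Redis server.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.online: set[str] = set()
        self.waiting: deque[str] = deque()

    def request_login(self, player_id: str) -> bool:
        """Admit at once if a slot is free and nobody is waiting; otherwise queue."""
        if not self.waiting and len(self.online) < self.capacity:
            self.online.add(player_id)
            return True
        self.waiting.append(player_id)  # join the back of the line
        return False

    def logout(self, player_id: str) -> list[str]:
        """Free the player's slot, then admit whoever has waited longest."""
        self.online.discard(player_id)
        return self.admit_waiting()

    def admit_waiting(self) -> list[str]:
        """Fill every free slot from the front of the line, oldest first."""
        admitted: list[str] = []
        while self.waiting and len(self.online) < self.capacity:
            player_id = self.waiting.popleft()  # first in, first out
            self.online.add(player_id)
            admitted.append(player_id)
        return admitted


# Usage: a cap of 2 stands in for the real limit.
q = AdmissionQueue(capacity=2)
print(q.request_login("a"), q.request_login("b"), q.request_login("c"))  # True True False
print(q.logout("a"))  # ['c']
```

Because the admit-or-queue decision lives in backend code rather than behind a dashboard button, a dashboard outage in a design like this would not leave the doors stuck open or closed. In a real deployment, the online set and the waiting list would have to live in shared storage such as Redis so that every backend container sees the same state.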
Another headache was an open-source, bidirectional, event-based communication service used to connect the game client to the backend, with messages passed through Redis. It proved unreliable and placed so much stress on the Redis clusters that they toppled over beyond 50,000 users. 1047 spent a week trying to find a solution, as Bagamian explains: “We tried everything, including scaling up the Redis cluster, but in the end, we had to get rid of Redis.”
Removing Redis and replacing it with Amazon Elastic Compute Cloud (EC2) raised the capacity ceiling and improved stability. This was particularly important because user data had not been reaching players correctly, meaning some players logging in to the game couldn’t see their stats and thought they’d lost all the skins they’d paid for.
Another source of crashes, according to Bagamian, was the matchmaker the 1047 team had built in-house. “It was overwhelmed and would topple over and not be able to place games,” he says. “When the matchmakers crashed, the queue for matchmaking clogged up, got way too big and couldn’t recover. Players were left sitting in the main menu, searching endlessly, and we’d end up having to take the servers offline.” The team stabilized the matchmaker by stripping out various components and optimizing its queries.
To bolster its infrastructure, 1047 worked with its AWS team to enhance its existing solution, evaluating optimizations and adding resources where needed. “After a week of trying to fight fires, we were reaching out to everybody to say: ‘What do we do? We’ve never handled this type of scale before. We need help!’” Bagamian recalls. “Our account team was very helpful. About a week into our massive blow-up, we all hopped on a Discord chat. It’s been great to have someone there.”
Today, 1047 hosts all of its backend services in Amazon ECS, AWS Fargate and Amazon Elastic Container Registry (ECR). Amazon CloudWatch monitors the health of these services, and Amazon Simple Storage Service (S3) stores dynamically updated backend configuration files. To host its dedicated game servers and scale capacity with player demand, 1047 uses Amazon EC2, while Amazon S3 stores its packaged game, game server and logs. Amazon Aurora stores matchmaking tickets, as well as the state and availability of game servers and sessions.
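The post names the building blocks (matchmaking tickets, plus game-server state and availability) but not the matching logic itself. As a hedged illustration only, the following Python sketch pairs queued tickets with free servers. The `Matchmaker` and `GameServer` classes, the `tick` method and its `max_matches` cap are all hypothetical, and in-memory structures stand in for the Aurora tables; no real schema or 1047 code is implied.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GameServer:
    server_id: str
    available: bool = True  # hypothetical availability flag


@dataclass
class Matchmaker:
    """Hypothetical sketch of ticket-based matchmaking.

    In-memory structures stand in for the ticket and server-availability
    records the post says are kept in Aurora; no real schema is implied.
    """
    players_per_match: int
    servers: list[GameServer]
    tickets: deque = field(default_factory=deque)

    def submit(self, player_id: str) -> None:
        """Create a matchmaking ticket at the back of the queue."""
        self.tickets.append(player_id)

    def tick(self, max_matches: int) -> list[tuple[str, list[str]]]:
        """Form at most `max_matches` matches, oldest tickets first.

        Capping the work done per call keeps each pass short even when
        the ticket backlog is large.
        """
        matches: list[tuple[str, list[str]]] = []
        while len(matches) < max_matches and len(self.tickets) >= self.players_per_match:
            server = next((s for s in self.servers if s.available), None)
            if server is None:
                break  # no free server: keep tickets queued instead of dropping them
            players = [self.tickets.popleft() for _ in range(self.players_per_match)]
            server.available = False  # mark the server as hosting a session
            matches.append((server.server_id, players))
        return matches


# Usage: two-player matches and two servers, for brevity.
mm = Matchmaker(players_per_match=2, servers=[GameServer("s1"), GameServer("s2")])
for p in ["p1", "p2", "p3", "p4", "p5"]:
    mm.submit(p)
print(mm.tick(max_matches=1))  # [('s1', ['p1', 'p2'])]
print(mm.tick(max_matches=5))  # [('s2', ['p3', 'p4'])]
```

Bounding how much work each pass does, and leaving tickets queued when no server is free rather than failing, is one common way to keep a matchmaker responsive under load. It relates to the clogged-queue failure Bagamian describes, but it is not a description of the fix 1047 actually applied.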
The events of the past few months all seem a long way from 2017, when Proulx and Bagamian were computer science students and friends at Stanford University. Proulx came up with the concept of an FPS game that included portals, and persuaded Bagamian to help him build the first version of Splitgate. They named their company 1047, after the dorm room where they first met.
After launching a beta version of the game on PC in 2019, they earned a small but dedicated player base and spent the next two years working on the game, building a bigger fanbase as they went.
Fast-forward to the present day: thanks to its transparency and honesty with the community, including keeping fans continuously updated about issues, the 1047 team’s bond with its player base is stronger than ever.
Now, with its servers stabilized, 1047 is looking to the future, hiring more engineers and collaborating with AWS Select Technology Partner AccelByte to help release a stream of new features and other developments, including new ways of interacting with players, which will be unveiled over the coming months.