AWS Open Source Blog
The Wheel
Keeping the AWS cloud operating efficiently worldwide is a big job. The team is growing rapidly, and it’s important that new leaders internalize the high bar we set for service operations, and that our senior leaders are able to periodically inspect what’s happening deep in the stack.
For over ten years, one of our mechanisms for doing so has been a two-hour weekly meeting, attended by senior leaders, all service general managers, and many engineers, to review service metrics, raise and resolve issues, and share best practices. With more than one hundred services now available from AWS, it has become a big meeting.
In the early days, we’d look at the metrics for every service team (such as Amazon S3 and Amazon EC2) every week. But, as we added more and more services, it became impossible to cover them all in a single meeting. A simple roster would have ensured that every team reported regularly, but we wanted every team to be prepared to report every week, and their leaders to know the details (whether or not they were called to present in the big forum). And thus The Wheel was born.
It started out as a handmade “wheel of fortune” – take a spin and take your chances, just like at a county fair.
Our weekly meeting is typically divided into fifteen-minute slots. Some slots are used for deep dives on particular events, but, for most of them, we spin the wheel. The team that’s chosen then walks through their operational dashboard, explaining their operational performance and answering questions posed by experienced operations leaders in the room.
The physical wheel only took us so far: it was difficult to keep updating it as new services were added, and at some point we simply couldn’t fit any more slots on it. We’re engineers: we solved this with technology.
In a re:Invent talk titled “How AWS Runs Our Weekly Operations Meetings,” David Lubell and Kevin Miller publicly debuted the software version of The Wheel and announced that version 1.4 had just been open sourced. This open source release was brought to you by Amit, Dan, Dave, Jeff, Lukasz, Xiujin, and Matt, a few members of the team that builds automation to help AWS services operate well at scale.
The Wheel was implemented using AWS Lambda, Amazon API Gateway, Amazon Cognito, and AWS CloudFormation, with a JavaScript user interface. This new version includes some refinements over the early wheels, such as weighted randomization, which reduces the probability that a recent choice is selected again.
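To make the idea of weighted randomization concrete, here is a minimal Python sketch. It is not The Wheel's actual algorithm: the function names `spin` and `reweight`, the `penalty` parameter, and the rule of sharing the released weight equally among the other participants are all assumptions made for illustration.

```python
import random


def spin(weights, rng=random):
    """Pick one name with probability proportional to its current weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def reweight(weights, chosen, penalty=0.25):
    """Return new weights in which `chosen` keeps only `penalty` of its weight.

    The weight it gives up is shared equally among everyone else, so the
    total stays constant. (Hypothetical rule, chosen for illustration.)
    """
    others = [n for n in weights if n != chosen]
    if not others:
        # A single participant has nobody to hand weight to.
        return dict(weights)
    released = weights[chosen] * (1.0 - penalty)
    share = released / len(others)
    return {
        n: weights[n] * penalty if n == chosen else weights[n] + share
        for n in weights
    }


if __name__ == "__main__":
    # Every team starts with the same chance of being picked.
    weights = {"Amazon S3": 1.0, "Amazon EC2": 1.0, "AWS Lambda": 1.0}
    for week in range(1, 4):
        team = spin(weights)
        weights = reweight(weights, team)
        print(f"week {week}: {team} -> {weights}")
```

Under this scheme a team that just presented is less likely to be picked next week, but never exempt, so every team still has a reason to show up prepared.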
You can also add humor to the selection process.
One improvement we haven’t had time to look into yet ourselves would be a customizable appearance – we’ve talked about a slot machine look, or a fortune-telling robot, or even a 3D version.
For our customers, The Wheel is a peek into how AWS operates our services at scale, and it demonstrates that solutions don’t always need to be complicated to be effective. But, even if you’re not managing a weekly operations meeting, you might find The Wheel useful yourself: perhaps to select a note taker for this week’s team meeting, or who gets to wash the dishes at home.
Ready to try it out and spin up your own version of The Wheel? Get it on GitHub. We’d love to see contributions from the community, and to hear about how you’re using it.
Thanks to Julian Wood for his blog post about Dave Lubell and Kevin Miller’s re:Invent talk.