AWS Open Source Blog
Hygieia and the Capital One DevOps Journey
by Tapabrata Pal
AWS customers have created many interesting and useful open source projects which should reach a wider audience. I’m very happy to be able to promote them on this blog. Here we feature a guest post by Tapabrata (Topo) Pal of Capital One. – Adrian
Capital One is a top digital bank with millions of accounts, but our DNA is different from that of many other financial institutions. As a Capital One Senior Engineering Fellow contributing to teams across multiple lines of business (LOBs), I’ve seen how our commitment to building our own software, to the public cloud, to microservices, to open source, and to DevOps and Continuous Delivery has set us apart.
Since day one, we have been on a journey to transform and distinguish ourselves as a tech company, not just as a digital bank. An important component of this journey has been our approach to DevOps and Continuous Delivery. Three factors have helped us develop our DevOps processes:
- We started early. We started our journey about five years ago. At the time, we were one of the earliest big “non-unicorn” enterprises to adopt DevOps. This has put us ahead in the game, allowing us time to develop and manage our processes organically.
- Leadership pushed the process. Many enterprises struggle with DevOps because their leadership is not fully on board. Our case has been the opposite: our leadership bought into the whole paradigm early in the process, fully supporting our teams from day one.
- Our engineering talent. We have a lot of good engineering practices and software engineers in-house. Our tech teams are mostly insourced, which is a major advantage we have over more outsourced companies.
Continuous Delivery and No Fear Releases
Five years ago, when we started bringing in lots of DevOps practices, our goal was to not only push software, but to push quality software. We embrace Continuous Delivery as an important part of reaching that goal. A major factor that has paved our path to success is our approach to “no fear releases.” There can be a lot of fear around releasing code to production. The devs who wrote the code can feel fearful that if their code breaks, it is on them to fix it. Product managers can feel fearful about delivering code to production that overwhelms and breaks a production server. Senior leaders can feel fearful about disrupting certain mission-critical services during peak hours or days.
At Capital One, there are a lot of things we’ve done to get out of that fearful mindset. Many of these have to do with test automation, blue/green deployment, and other techniques to provide checks and balances in our delivery pipeline. We want delivery teams to deploy to production during the daytime without any fear. We want them to know that if some of these deployments fail, it is okay because we are using an inactive area of the production box and controlling the traffic. This way you can deploy to production whenever you want, and if something fails, that is okay because you can recover fast and without fear.
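To make the blue/green idea above concrete, here is a minimal sketch in Python. It is a toy model, not Capital One's deployment tooling: the `BlueGreenRouter` class, its methods, and the `health_check` callback are all hypothetical names chosen for illustration. The point it shows is that a new version lands on the inactive side first, and traffic moves only after a check passes.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Toy model of a router in front of two production environments.

    Hypothetical illustration only; real setups switch traffic at a
    load balancer or DNS layer rather than in application code.
    """
    versions: dict = field(default_factory=lambda: {"blue": "v1", "green": "v1"})
    active: str = "blue"

    @property
    def inactive(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def deploy(self, version: str, health_check) -> bool:
        """Deploy to the inactive side; switch traffic only if it is healthy."""
        target = self.inactive
        previous = self.versions[target]
        self.versions[target] = version
        if health_check(target, version):
            self.active = target  # cut traffic over to the new version
            return True
        # The deployment failed, but users never saw it: restore and stay put.
        self.versions[target] = previous
        return False

    def rollback(self) -> None:
        """Point traffic back at the other side, which still runs the old version."""
        self.active = self.inactive


router = BlueGreenRouter()
router.deploy("v2", health_check=lambda env, version: True)
```

Because a failed deployment never receives live traffic, and a successful one can be undone with a single switch, a daytime release carries far less risk.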
Hygieia and Continuous Delivery
To build a Continuous Delivery pipeline, you end up using a lot of tools, which can make managing the pipeline difficult. You find yourself asking: “Where is my software in the pipeline?” “Did I build correctly?” “Did I test correctly?” “Which tests passed, which tests failed, which security scans were okay and which were not?” Whether you are a dev, product manager, QA person, or senior leader, you need to know where the code is in the pipeline. To understand the health of the pipeline, you need to know what state it is in, what the problems are, and what the bottlenecks are. And while all these different pieces of information are available in great detail in each individual tool, there is no single viewpoint from which you can view them together.
That single viewpoint is what we tried to create with our open source DevOps dashboard Hygieia. Hygieia collects key metrics from all of your pipeline tools and presents the data via a single dashboard view, helping you determine how healthy – or unhealthy – your pipeline is.
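The “single viewpoint” idea can be sketched in a few lines of Python. This is an illustration of the concept only and does not reflect Hygieia's actual data model or API: `StageStatus`, `summarize`, and the tool names in the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageStatus:
    """One result reported by one pipeline tool (hypothetical shape)."""
    tool: str
    stage: str
    passed: bool
    detail: str = ""


def summarize(statuses):
    """Collapse per-tool results into one pipeline-wide view."""
    failing = [s for s in statuses if not s.passed]
    return {
        "healthy": not failing,
        "stages": {s.stage: ("ok" if s.passed else "failed") for s in statuses},
        "failing_tools": [s.tool for s in failing],
    }


# Sample data standing in for what separate tools would report.
statuses = [
    StageStatus("build-server", "build", True),
    StageStatus("test-runner", "unit-tests", True),
    StageStatus("scanner", "security-scan", False, "2 high findings"),
]
summary = summarize(statuses)
```

Each tool still holds the full detail; the summary only answers the first question a stakeholder asks, namely whether the pipeline is healthy and, if not, where to look.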
Why Did We Build Hygieia?
Early in our DevOps journey, we built a strategy based on three big pillars:
- Automate everything in our pipeline that we could.
- Shift everything left, meaning take all the steps in the later part of our delivery lifecycle and bring them up front, addressing bottlenecks before we ran into them.
- Implement total transparency and a fast feedback loop to provide feedback to stakeholders at every stage of the pipeline as quickly as possible.
We looked at many tools – both commercial and open source – to help develop this capability, but could not find one with all the features we needed. We started building ourselves an internal DevOps dashboard tool, and decided that, in keeping with our open source first policy, we should open source it.
Hygieia is made for large enterprises in which multiple teams build a product using many tools on a pipeline with many possible bottlenecks. This problem is typical of large enterprises, regardless of industry. Capital One was in the right place at the right time to build Hygieia for two reasons:
- We are the target audience. As a large enterprise employing DevOps, we feel and understand the problems in adopting and scaling DevOps. This gives us good insight into how to be successful in the journey.
- We got in the game early and learned things that other organizations are only learning now. We included some of these learnings in the Hygieia dashboard by displaying certain opinionated metrics based on our experiences.
We are seeing an increased adoption of Hygieia across a variety of industries. As more large enterprises join our user community, the demand increases for additional “enterprisey” features. Our next big move will be to work with our user community to create an Executive Dashboard so executives from big enterprises can more easily access their data and use it to push the DevOps practices of their engineering teams.
We remain fully committed to keeping Hygieia open source. Increasing collaboration, bringing more big enterprises in, and creating a governing body for Hygieia are all goals for 2018. We want people to contribute not only code to our DevOps and Executive Dashboards, but ideas and solutions as well.
2018 and Beyond
In 2018, there are still a lot of companies sitting on the sidelines watching and trying to understand if they should jump into the DevOps movement. Additionally, companies that have already been practicing DevOps for years are trying to figure out where they can improve their process and expand it into newer areas such as microservices and serverless.
DevOps is not just about tools and technologies or a one-size-fits-all process. It is about fast feedback, transparency, and innovation. In DevOps, success means that developers can deploy code to production without fear. Because the process never stops and there is no final product to be measured, you could almost say there are no success criteria as such. DevOps and Continuous Delivery are a continuous improvement process; we cannot say, “Hey, we’re successful, so let’s go home and have a party!” This applies equally to our code, to the tools we build to manage that code, and to the concept of DevOps itself.
Topo Pal
Senior Director & Senior Engineering Fellow, Capital One
Tapabrata Pal has more than 20 years of IT experience in various technology roles (developer, operations engineer, and architect) in the retail, healthcare, and finance industries.
Over the last six years, Tapabrata has evangelized and led Capital One’s DevOps initiatives. He is currently Senior Director and Senior Engineering Fellow focused on DevOps and Continuous Delivery at large scale in a regulated environment.
Tapabrata is also the community manager of, and a core contributor to, the Hygieia open source project.
Previously, Tapabrata spent some time in academia doing doctoral and postdoctoral research in the field of solid state physics.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.