Euclid, a fast-growing technology start-up, helps brick-and-mortar retailers optimize marketing, merchandising, and operations performance by measuring foot traffic, store visits, walk-by conversion, bounce rate, visit duration, and customer loyalty. The company has a network of traffic-counting sensors in nearly 400 shopping centers, malls, and street locations around the United States. Euclid provides aggregated, anonymous data covering more than 21 million shopping sessions a month in specialty retail, department stores, shopping malls, and big-box retail. Euclid is located in Palo Alto, CA.
Euclid analyzes customer movement data to correlate traffic with marketing campaigns and to help retailers optimize hours for peak traffic, among other activities. The company processes data at several levels of abstraction to identify and analyze key data characteristics. Euclid’s analysts run SQL queries, create new algorithms, and produce custom reports. This type of work requires a flexible computing environment that can adjust to spikes in utilization. Additionally, as Ken Leung, Co-Founder and CTO, explains, “Our analytics team constantly experiments with new heuristics. If something really works or if there’s a fix that needs to be put into historical data, we might have to re-compute up to 18 months of customer data. That requires a lot of computational power, which spikes traffic. We need resources that can scale up on demand and scale down when we don’t need it.”
Euclid uses technology provided by Heroku, an Advanced Technology Partner in the AWS Partner Network (APN), for its web layer and has always run on Amazon Web Services (AWS). Euclid stores information on Amazon Simple Storage Service (Amazon S3) and processes data in parallel with Amazon Elastic MapReduce (Amazon EMR).
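As a minimal sketch of the storage side of this split, landing sensor readings in Amazon S3 under date-partitioned keys lets downstream Amazon EMR jobs pick up a day's data by prefix; a processing sketch follows the EMR discussion below. The bucket and key names here are hypothetical, not Euclid's actual layout:

```python
import boto3

s3 = boto3.client("s3")

# Land one sensor's readings under a date-partitioned key so that a
# downstream Amazon EMR job can process a full day's data by prefix.
# Bucket and key names are hypothetical, for illustration only.
s3.upload_file(
    "sensor_readings.csv.gz",
    "example-sensor-data",
    "raw/year=2014/month=06/day=01/sensor_0042.csv.gz",
)
```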
Initially, the company ran its data store on MySQL but moved to Amazon Redshift to improve performance for analytic workloads. “Using Amazon Redshift, our analysts can work with large data sets and run SQL queries against our stack quickly,” reports Dexin Wang, Director of Platform Engineering. “We were totally amazed at the speed: a simple count of rows that would take 5 1/2 hours using MySQL took only 30 seconds with Amazon Redshift.” Wang estimates that it took only a few days to port production data to Amazon Redshift and start running analyses on it. “Amazon Redshift is very easy to scale, with minimal management requirements,” he comments. “It’s also cost-effective. We saw a 90 percent cost reduction moving from our previous database system to Amazon Redshift.”
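Because Amazon Redshift speaks the PostgreSQL wire protocol, an analyst can run the kind of query Wang describes from standard tooling. A minimal sketch in Python using psycopg2, with placeholder connection details and a hypothetical table name:

```python
import psycopg2

# Placeholder endpoint and credentials; a real Redshift endpoint looks like
# <cluster-name>.<id>.<region>.redshift.amazonaws.com.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,          # Redshift's default port
    dbname="analytics",
    user="analyst",
    password="...",
)

with conn.cursor() as cur:
    # The simple row count Wang cites: ~5.5 hours on MySQL, ~30 seconds here.
    cur.execute("SELECT COUNT(*) FROM shopping_sessions;")  # hypothetical table
    print(cur.fetchone()[0])

conn.close()
```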
Euclid’s engineers write and test code directly on Amazon EC2 instances, which lets them work from any location, draw on ample bandwidth, and move data around easily at up to 25 MB per second. Euclid uses three Amazon EC2 instances to process data and AWS Elastic Beanstalk for load balancing and auto scaling. Euclid stores up to 30 GB of uncompressed data per day in Amazon S3.
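The auto scaling side of this setup can be adjusted programmatically. The sketch below widens the Auto Scaling group bounds behind an Elastic Beanstalk environment using boto3; the environment name and group sizes are assumptions, not Euclid's actual settings:

```python
import boto3

eb = boto3.client("elasticbeanstalk")

# Widen the Auto Scaling group behind a hypothetical Elastic Beanstalk
# environment so it can absorb a spike in processing load.
eb.update_environment(
    EnvironmentName="euclid-processing",  # hypothetical environment name
    OptionSettings=[
        {"Namespace": "aws:autoscaling:asg", "OptionName": "MinSize", "Value": "3"},
        {"Namespace": "aws:autoscaling:asg", "OptionName": "MaxSize", "Value": "12"},
    ],
)
```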
The analytics team uses Amazon EMR and Hadoop to aggregate and analyze data. “Amazon EMR does most of the heavy lifting,” says Leung. “I used Hadoop in my previous work, and we had to spend time installing and managing the cluster. We don’t have to do that with AWS. We only use the service when we need it, which is a great cost savings.” Figure 1 below shows Euclid’s environment on AWS.
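Leung's point about only paying for the cluster while it is needed maps to EMR's transient-cluster pattern: launch a cluster, run its steps, and let it terminate itself. A sketch of that pattern with boto3, under assumed names, release label, and instance sizes:

```python
import boto3

emr = boto3.client("emr")

# Launch a transient cluster that runs its steps and then shuts itself
# down, so the service is billed only while the job is running.
# All names, sizes, and S3 paths below are hypothetical.
response = emr.run_job_flow(
    Name="nightly-aggregation",
    ReleaseLabel="emr-5.36.0",                # assumed EMR release
    Applications=[{"Name": "Hadoop"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate when steps finish
    },
    Steps=[{
        "Name": "aggregate-visits",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", "s3://example-sensor-data/scripts/mapper.py,"
                          "s3://example-sensor-data/scripts/reducer.py",
                "-input", "s3://example-sensor-data/raw/",
                "-output", "s3://example-sensor-data/aggregated/",
                "-mapper", "mapper.py",
                "-reducer", "reducer.py",
            ],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```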
“From our very early days, AWS allowed Euclid to be agile and move quickly,” says Leung. “When we began, we were two guys and unfunded. Even after getting funding, our resources were limited. Our total spend on AWS was less than $1,000. That allowed us to focus on developing a process, pivot to take advantage of business opportunities, and try different solutions for customers rapidly without committing resources.”
As the company continues to grow, it takes advantage of Amazon Redshift and Amazon EMR to run complex queries on large and growing data sets with improved performance. “We’ve collected 1 to 30 GB of data per day over the last three years,” notes Leung. “By running on AWS and taking advantage of Amazon Redshift, we can scale to provide the computational power to complete a task on our entire data set, tens of terabytes, in a couple of hours—a task that used to take two weeks. Overall, compared to what we would have to spend to build an infrastructure capable of meeting our peak compute load requirements, we’re saving 80 to 90 percent using AWS.”
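One way to provision the computational power for a full-history recompute of the kind Leung describes is to resize the Redshift cluster for the job and shrink it again afterward; this is an illustrative approach, not a documented part of Euclid's workflow, and the identifiers and node counts are hypothetical:

```python
import boto3

redshift = boto3.client("redshift")

# Temporarily grow a hypothetical multi-node cluster before recomputing
# the full data set; issue the reverse call once the job completes.
redshift.modify_cluster(
    ClusterIdentifier="example-analytics-cluster",
    NumberOfNodes=8,  # assumed peak size; scale back down afterward
)
```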
Wang adds, “We didn’t want to worry about infrastructure or scaling. We just want to be able to ask questions and get answers. AWS helps us get answers quickly.”
To learn more about how AWS can help with your big data workloads, visit our Big Data on AWS details page: http://thinkwithwp.com/big-data/.
For more information about how Heroku can help your company run on the AWS Cloud, see Heroku's listing in the AWS Partner Directory.