AWS Partner Network (APN) Blog
Revolutionizing the Aftermarket: Building Data Products and Unlocking Insights with AWS Analytics
By Kalyan Kumar Neelampudi, Partner Solutions Architect – AWS
By Ram G Suri, Lead Consultant – Thoughtworks
By Hariharan Anantharaman, Sr. Consultant – Thoughtworks
In ecommerce, measuring key performance indicators (KPIs) and tracking objectives and key results (OKRs) is vital for evaluating success, as business stakeholders seek ways to enhance operations and boost sales.
Designing and building a data analytics service for an ecommerce system can be a complex and challenging task.
In this post, we’ll discuss how Thoughtworks has leveraged its Amazon Web Services (AWS) expertise to make the data accessible via data products, unlock the hidden insights, and create intelligent AWS dashboards for stakeholders.
Thoughtworks is an AWS Premier Tier Services Partner and AWS Marketplace Seller with AWS Competencies in Machine Learning, Migration, DevOps, and Data and Analytics Consulting, and more.
Challenges Building a Data Analytics Service
Data products are the smallest unit of the architecture. They democratize secure access to ubiquitous data and are oriented around domains such as orders, payments, and products.
Data products abstract the complexity of implementing data pipelines. They also standardize centralized access to a federated set of data products by providing a repeatable way to securely access data, onboard new data, discover data, and govern data.
Building a data analytics service often poses challenges:
- Data integration and management: Ecommerce businesses generate vast amounts of data from various sources such as customer interactions, clickstream analytics, marketing campaigns, supply chain data, and sales transactions. The challenge arises in integrating and consolidating data from multiple sources which may have different data formats, structures, and APIs.
- Data quality and consistency: Ensuring data quality and consistency is crucial for reliable analytics. Data can be incomplete or inconsistent, or contain errors. Overcoming these challenges involves applying data cleansing techniques and data governance processes to harmonize and transform the data into a standardized format.
- Scalability: Ecommerce services generate large volumes of data that grow exponentially over time. Building a data analytics service that can scale to handle this growth is essential, and involves designing a scalable architecture that can handle increasing data volumes, accommodate concurrent users, and support high-performance processing and storage capabilities.
- Real-time data processing: Ecommerce businesses often require real-time analytics to gain immediate insights into customer behavior, product performance, inventory levels, and more. Processing and analyzing real-time data in a timely manner can be challenging due to the high velocity and volume of data.
- Data security and privacy: Ecommerce services handle sensitive customer information, including personal details and financial data. Building a data analytics service that prioritizes data security and privacy is vital, and involves implementing robust security measures, encryption techniques, access controls, and complying with data protection regulations like GDPR or CCPA.
- Data visualization and reporting: The ultimate goal of a data analytics service is to provide actionable insights to stakeholders. Presenting data in a visually appealing and intuitive manner is crucial for effective decision-making. Designing interactive dashboards, reports, and visualizations that cater to different user roles and applying row-level security can be challenging.
- Data governance and compliance: Ecommerce businesses must adhere to data governance policies and follow industry regulations. Establishing data governance frameworks, defining data ownership, ensuring data lineage, and implementing data retention policies can be complex tasks.
- Continuous improvement: An effective data analytics service requires continuous improvement and adaptation to evolving business needs. It requires monitoring and evaluating the service’s performance, gathering user feedback, identifying areas of improvement, and implementing enhancements to provide better analytics capabilities.
How Thoughtworks Builds Data Products
AWS offers a powerful suite of services to handle data analytics, including Amazon Kinesis, Amazon EMR, AWS Glue, Amazon QuickSight, Amazon S3-based data lakes, and Amazon CloudWatch. These services help Thoughtworks transform and process data from multiple sources and create smart dashboards to visualize data effectively.
Thoughtworks’ approach to building a data analytics service is focused on delivering a comprehensive and scalable solution, following a systematic process involving these steps.
Figure 1 – Systematic approach to build data analytics.
- Identify: Understanding business use cases, thorough planning, and robust architecture design are essential. Thoughtworks conducted a detailed analysis of data sources and identified the datasets and their domains within their bounded contexts.
- Acquire: Once the data sources are identified, Thoughtworks builds a framework, defined as infrastructure as code (IaC), that uses AWS Database Migration Service (AWS DMS) to connect to various domain data sources and collect operational and domain data from their source systems. For a comprehensive list of data sources supported by AWS DMS, visit the official documentation.
- Clean, curate, apply transformation, and store: Thoughtworks leveraged Apache Spark on Amazon EMR to read, transform, and harmonize data, and to store the final clean and processed data as a domain-oriented data product. Using the EMR-optimized Spark runtime improves performance and leads to cost savings. Throughout this process, Thoughtworks prioritizes data security and privacy, implementing robust measures such as encryption, access controls, and anonymization techniques to safeguard sensitive information and align with relevant regulations.
- Model: Thoughtworks has built data products tailored to specific consumer needs, such as clickstream analytics, sales analytics, marketing analytics, and financial analytics. Its data products focusing on a particular domain are known as “domain-driven” data products while data products tailored to a specific need are known as “fit for purpose” data products.
- Deploy: Thoughtworks integrates logging, monitoring, and alerting within its data products and deploys using CI/CD pipelines. The products conduct hypothesis testing and perform exploratory data analysis.
- Execute and visualize: Using Amazon QuickSight, Thoughtworks visualizes the insights crucial for data-driven decisions. QuickSight’s intuitive data visualization and intelligent reporting drive aftermarket KPIs and OKRs. These models also support predicting, forecasting, and pattern discovery analysis.
Figure 2 – Sample dashboard built using mock data to reflect best practices in data visualization.
Finally, Thoughtworks’ approach involves continuous testing, iteration, and improvement, all of which is managed through CI/CD pipelines. It uses AWS CodePipeline for streamlined and automated development workflows, and conducts rigorous testing to help verify the accuracy and reliability of the service. Thoughtworks actively gathers feedback from users and stakeholders to enhance product functionality and usability.
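To make the Acquire step more concrete, here is a minimal sketch of how per-domain AWS DMS table mappings could be generated as part of an IaC framework. The JSON follows the selection-rule structure that AWS DMS table mappings use, but the domain names, schema names, and helper functions are hypothetical and are not taken from the Thoughtworks framework.

```python
import json

# Hypothetical mapping of business domains to source schemas.
# The real domains and schema names are not given in this post.
DOMAIN_SCHEMAS = {
    "orders": "orders_db",
    "payments": "payments_db",
    "products": "catalog_db",
}


def build_table_mappings(schema_name: str) -> str:
    """Return a DMS table-mapping document that includes every table in one schema."""
    document = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-" + schema_name,
                "object-locator": {
                    "schema-name": schema_name,
                    "table-name": "%",  # wildcard: all tables in the schema
                },
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(document, indent=2)


def build_all_mappings(domain_schemas: dict) -> dict:
    """Build one table-mapping document per domain, keyed by domain name."""
    return {
        domain: build_table_mappings(schema)
        for domain, schema in domain_schemas.items()
    }


if __name__ == "__main__":
    for domain, mapping in build_all_mappings(DOMAIN_SCHEMAS).items():
        print(domain)
        print(mapping)
```

Keeping one mapping document per domain is one way to keep replication tasks aligned with domain boundaries; the generated JSON would then be passed to whichever IaC tool defines the replication task.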
Solutioning with AWS Data Analytics Services
The overall architecture is modular and has the following components:
- Ingest and load data: This component ingests data from multiple sources and makes it available to downstream consumers or data products. AWS DMS loads the data from multiple data sources into an S3 bucket, applying the domain boundary to the output as it loads. The component then converts the data into a common format (Parquet); the result is called a raw data product. Once the data is loaded into raw, the component cleanses, anonymizes, and harmonizes it; the result is called a native data product. AWS Deequ integration filters out anomalous records to ensure data quality at both the input and output ports.
- Transform, process, and store data as data product: This component reads data from multiple native data products and, depending on the use case, applies complex transformations and data modeling to produce the final output: a model layer, a consumer-aligned data product, or a fit-for-purpose data product. The data product output can be consumed by a human user or a system user, depending on the required privileges. Access is controlled by access control lists (ACLs).
- Analyze and consume insights or consume data product output: This component reads data from multiple data product output ports and unlocks insights from the data product. The output of a data product can be used by any visualization tool; in this case, it is Amazon QuickSight. The visualization dashboard is structured into three parts:
  - Strategic-level KPIs: This is the most distilled measure of a metric’s impact on overall business performance.
  - Current health status: This analyzes top-level health metrics over time, revealing performance and patterns, and providing a comprehensive overview of how business health evolves.
  - Controllers/manipulators: Different angles to analyze and adjust data story focus and context.
- Control unit: This component handles cross-cutting concerns such as logging, monitoring, and alerting for the data product. In case of any technical failures, consumers get notifications via Amazon CloudWatch alerts. If ecommerce KPIs do not meet required thresholds or benchmarks, consumers also get proactive alerts.
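The proactive KPI alerts handled by the control unit can be illustrated with a small sketch. It only shows the threshold comparison; the KPI names, threshold values, and the `KpiThreshold` and `find_breaches` names are hypothetical, and delivering the resulting messages (through Amazon CloudWatch in the solution described here) is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KpiThreshold:
    """A hypothetical lower bound for one KPI."""
    name: str
    minimum: float  # alert when the observed value falls below this


def find_breaches(observed: Dict[str, float],
                  thresholds: List[KpiThreshold]) -> List[str]:
    """Return one alert message per KPI that is missing or below its threshold."""
    messages = []
    for threshold in thresholds:
        value = observed.get(threshold.name)
        if value is None:
            messages.append(f"{threshold.name}: no value reported")
        elif value < threshold.minimum:
            messages.append(
                f"{threshold.name}: {value:.3f} is below threshold "
                f"{threshold.minimum:.3f}"
            )
    return messages


if __name__ == "__main__":
    # Illustrative values only, not real benchmarks.
    thresholds = [
        KpiThreshold("search_conversion_rate", 0.03),
        KpiThreshold("order_fulfillment_rate", 0.95),
    ]
    observed = {"search_conversion_rate": 0.02, "order_fulfillment_rate": 0.97}
    for message in find_breaches(observed, thresholds):
        print(message)
```

Treating a missing KPI value as a breach is a design choice in this sketch: a silent gap in the data is often as important to surface as a low number.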
Figure 3 – Architecture leveraging data-as-a-product principle.
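As an illustration of the raw-to-native step in the ingest component, the sketch below cleanses, anonymizes, and harmonizes a handful of records. It is written in plain Python so it runs anywhere; the solution described in this post performs this work with Apache Spark on Amazon EMR. The field names, the salt, and the use of a salted SHA-256 hash (strictly speaking pseudonymization) are assumptions made for the example.

```python
import hashlib
from typing import Dict, Iterable, List, Optional

# Hypothetical salt; a real deployment would load this from a managed secret.
SALT = "example-salt"


def anonymize(value: str, salt: str = SALT) -> str:
    """Replace an identifier with a salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def to_native(raw: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Cleanse, anonymize, and harmonize one raw record.

    Returns None when the record cannot be repaired.
    """
    order_id = (raw.get("order_id") or "").strip()
    email = (raw.get("email") or "").strip().lower()
    currency = (raw.get("currency") or "").strip().upper()
    if not order_id or not email or not currency:
        return None  # cleansing: drop records missing required fields
    try:
        amount = round(float((raw.get("amount") or "").strip()), 2)
    except ValueError:
        return None  # cleansing: drop records with a non-numeric amount
    return {
        "order_id": order_id,
        "customer_key": anonymize(email),  # the raw email is not carried over
        "amount": amount,                  # harmonized to two decimal places
        "currency": currency,              # harmonized to upper case
    }


def build_native(raw_records: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Convert raw records to native records, skipping unusable ones."""
    native = []
    for raw in raw_records:
        record = to_native(raw)
        if record is not None:
            native.append(record)
    return native


if __name__ == "__main__":
    sample = [
        {"order_id": " A1 ", "email": "Jane@Example.com",
         "amount": "19.999", "currency": "usd"},
        {"order_id": "A2", "email": "x@example.com",
         "amount": "abc", "currency": "eur"},
    ]
    for row in build_native(sample):
        print(row)
```

Normalizing the email before hashing keeps the derived `customer_key` stable across sources that differ only in case or whitespace, which matters when native data products are later joined.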
Key Learnings
- Thoughtworks learned not to overprovision replication instances for managed AWS DMS infrastructure, and instead to test and anticipate data volumes to determine the right sizing. AWS also recently launched AWS Database Migration Service Serverless, which automatically provisions and scales capacity for migration and data replication.
- Develop disaster recovery and high availability strategies for data and service resilience.
- Time to market is key, and Thoughtworks’ framework expedited new data product onboarding.
- Include clear metric definitions in dashboards to avoid manual intervention in understanding KPIs.
- Conduct exploratory data analysis (EDA) and use statistical and graphical tools to uncover patterns and insights.
- Verify that dashboards effectively communicate data insights to both technical and business stakeholders.
- Shifting left in testing enhances data quality in data pipelines. Integrating AWS Deequ helps identify issues early and builds robust data quality into the data product output. AWS Glue Data Quality can be used in place of AWS Deequ in this solution to help maintain overall data quality and prevent bad data, both in data at rest in the catalog and in data in transit within ETL jobs.
- Acquire knowledge of data security practices, including encryption, access control, and compliance with data protection regulations like GDPR.
- Be prepared to adapt the service to evolving business needs and data requirements.
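The shift-left idea from the list above can be sketched as a quality gate that runs before a data product is published. This is a plain-Python stand-in and does not use the AWS Deequ or AWS Glue Data Quality APIs; the check names, column names, and thresholds are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[str]]


def completeness(records: Sequence[Record], column: str) -> float:
    """Fraction of records with a non-empty value in the column."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(column) not in (None, ""))
    return present / len(records)


def has_unique_values(records: Sequence[Record], column: str) -> bool:
    """True when no non-empty value in the column appears more than once."""
    values = [r.get(column) for r in records if r.get(column) not in (None, "")]
    return len(values) == len(set(values))


def quality_gate(records: Sequence[Record],
                 required: Dict[str, float],
                 unique_columns: Sequence[str]) -> List[str]:
    """Return a list of failed checks; an empty list means the batch passes."""
    failures = []
    for column, minimum in required.items():
        score = completeness(records, column)
        if score < minimum:
            failures.append(
                f"completeness({column}) = {score:.2f} < {minimum:.2f}"
            )
    for column in unique_columns:
        if not has_unique_values(records, column):
            failures.append(f"duplicate values in {column}")
    return failures


if __name__ == "__main__":
    batch = [
        {"order_id": "A1", "email": "a@example.com"},
        {"order_id": "A1", "email": None},
    ]
    for failure in quality_gate(batch, {"email": 1.0}, ["order_id"]):
        print(failure)
```

Running a gate like this in the pipeline's test stage, rather than after the dashboard shows odd numbers, is the essence of shifting left: a failing batch never reaches the data product output port.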
Results, Outcomes, and Customer Success
- Organization-level alignment: Thoughtworks established organizational alignment by providing a consistent view of ecommerce metrics and KPIs. This standardized approach enables all stakeholders to access data through the data catalog using common business terminology.
- Identify new opportunities: The web analytics data product has helped businesses identify new growth opportunities, including new markets, channels, and products.
- Increased agility: Decision-making and time to market have significantly improved. Businesses can now make faster, informed decisions thanks to the interactive dashboards, thus eliminating the need to wait for weekly reports in Excel and presentations.
- Higher conversion rate: These business insights have empowered domain teams to take appropriate actions, decreasing the time to successful searches, increasing search conversion rates, reducing cart abandonment, and boosting overall revenue in ecommerce.
- Increased operational efficiency via data products: Integrating data from multiple sources has significantly reduced efforts in reconciliation and manual report building. Automated data pipelines have enhanced data accuracy, providing stakeholders with timely and accurate information.
- Reduced costs: Insights from the order analytics data product help businesses identify cost-saving opportunities, optimize supply chains, and enhance customer service, thereby improving fulfillment rates.
- Improved customer satisfaction: Insights from CSAT and the email marketing data product enable businesses to enhance customer satisfaction by offering better products and services.
Conclusion
Thoughtworks’ adoption of data products as the fundamental building blocks of its data architecture has yielded substantial benefits and transformative outcomes.
Amazon EMR has provided seamless integration, optimization, and customization, maximizing the benefits of data processing and transformations for superior performance and efficiency. AWS Glue is also a good fit, but this specific scenario demanded additional configurability across multiple segments. Apache Spark’s capabilities, combined with Amazon EMR’s flexibility, enabled Thoughtworks to effectively tackle these complexities.
Using Amazon QuickSight has empowered Thoughtworks to harness the full potential of data, enabling the team to derive valuable insights and drive informed decisions. With quick access to actionable insights and less manual effort, the team has seen remarkable improvements in decision-making, customer satisfaction, and revenue generation.
As Thoughtworks continues to evolve, the adoption of data products remains pivotal in its quest for data-driven excellence. This journey has already enhanced data capabilities and empowered the organization to navigate the dynamic ecommerce landscape with greater efficiency and effectiveness.
Thoughtworks – AWS Partner Spotlight
Thoughtworks is an AWS Premier Tier Services Partner with AWS Competencies in Machine Learning, Migration, DevOps, and Data and Analytics Consulting, and more.