AWS Machine Learning Blog

Category: Amazon SageMaker Data Wrangler

Prepare data from Databricks for machine learning using Amazon SageMaker Data Wrangler

Data science and data engineering teams spend a significant portion of their time in the data preparation phase of a machine learning (ML) lifecycle performing data selection, cleaning, and transformation steps. It’s a necessary and important step of any ML workflow in order to generate meaningful insights and predictions, because bad or low-quality data greatly […]

SageMaker Data Wrangler Risk Modeling

Build a mental health machine learning risk model using Amazon SageMaker Data Wrangler

This post is co-written by Shibangi Saha, Data Scientist, and Graciela Kravtzov, Co-Founder and CTO, of Equilibrium Point. Many individuals are experiencing new symptoms of mental illness, such as stress, anxiety, depression, substance use, and post-traumatic stress disorder (PTSD). According to Kaiser Family Foundation, about half of adults (47%) nationwide have reported negative mental health […]

Amazon SageMaker Autopilot now supports time series data

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning (ML) models based on your data, while allowing you to maintain full control and visibility. We have recently announced support for time series data in Autopilot. You can use Autopilot to tackle regression and classification tasks on time series data, or sequence data […]

season-trend decomposition

Prepare time series data with Amazon SageMaker Data Wrangler

Time series data is widely present in our lives. Stock prices, house prices, weather information, and sales data captured over time are just a few examples. As businesses increasingly look for new ways to gain meaningful insights from time-series data, the ability to visualize data and apply desired transformations are fundamental steps. However, time-series data […]

Balance your data for machine learning with Amazon SageMaker Data Wrangler

August 2023: This post was reviewed for accuracy. Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that makes it faster for data scientists and engineers to prepare data for machine learning (ML) applications by using a visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and […]

Launch processing jobs with a few clicks using Amazon SageMaker Data Wrangler

August 2023: This post was reviewed for accuracy. Amazon SageMaker Data Wrangler makes it faster for data scientists and engineers to prepare data for machine learning (ML) applications by using a visual interface. Previously, when you created a Data Wrangler data flow, you could choose different export options to easily integrate that data flow into […]

Prepare and analyze JSON and ORC data with Amazon SageMaker Data Wrangler

Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that makes it faster for data scientists and engineers to prepare data for machine learning (ML) applications via a visual interface. Data preparation is a crucial step of the ML lifecycle, and Data Wrangler provides an end-to-end solution to import, prepare, transform, featurize, and […]

Plan the locations of green car charging stations with an Amazon SageMaker built-in algorithm

While the fuel economy of new gasoline or diesel-powered vehicles improves every year, green vehicles are considered even more environmentally friendly because they’re powered by alternative fuel or electricity. Hybrid electric vehicles (HEVs), battery only electric vehicles (BEVs), fuel cell electric vehicles (FCEVs), hydrogen cars, and solar cars are all considered types of green vehicles. […]

Accelerate data preparation using Amazon SageMaker Data Wrangler for diabetic patient readmission prediction

Patient readmission to hospital after prior visits for the same disease results in an additional burden on healthcare providers, the health system, and patients. Machine learning (ML) models, if built and trained properly, can help understand reasons for readmission, and predict readmission accurately. ML could allow providers to create better treatment plans and care, which […]

Schedule an Amazon SageMaker Data Wrangler flow to process new data periodically using AWS Lambda functions

Data scientists can spend up to 80% of their time preparing data for machine learning (ML) projects. This preparation process is largely undifferentiated and tedious work, and can involve multiple programming APIs and custom libraries. Announced at AWS re:Invent 2020, Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for […]