AWS Big Data Blog
Tag: Amazon EMR
Streaming Analytics with DataTorrent RTS and Amazon EMR
Nick Durkin is a Senior Solution Engineer for DataTorrent. DataTorrent is an AWS Technology Partner. In this blog post, we introduce fast big data and provide context about the DataTorrent RTS streaming analytics platform. In addition, we show you how to implement a real-time, streaming analytics application for capturing social media trends from Twitter using […]
Launching and Running an Amazon EMR Cluster inside a VPC
NOTE: This article contains information and instructions pertinent only to older EMR releases (emr-4.6.0 and earlier) and may no longer be applicable. For the latest information, please refer to the current user guide. Daniel Garrison is a Big Data Support Engineer for Amazon Web Services. With Amazon EC2 now firmly in the VPC-by-default model, it’s […]
Using Amazon EMR and Hunk for Rapid Response Log Analysis and Review
Patrick Shumate is a Solutions Architect for AWS. It is fairly common to collect access and application logs but never interactively review them. Monitoring dashboards, coupled with well-instrumented applications, allow operators to manage day-to-day operations without ever digging into the flood of logs silently stored in Amazon S3. That works until the monitoring dashboard […]
Using IPython Notebook to Analyze Data with Amazon EMR
Manjeet Chayel is a Solutions Architect with AWS. IPython Notebook is a web-based interactive environment that lets you combine code, code execution, mathematical functions, rich documentation, plots, and other elements into a single document. In the background, IPython Notebook stores this information as a JSON document. The main advantage of a notebook when compared to […]
Running Apache Accumulo on Amazon EMR
Manjeet Chayel is a Solutions Architect with Amazon Web Services. This post was co-authored by Matt Yanchyshyn, a Principal Solutions Architect with Amazon Web Services. Apache Accumulo is a sorted, distributed key-value store that is built on top of Apache Hadoop, ZooKeeper, and Thrift. Accumulo was originally modeled after Google’s Bigtable and can scale to […]
Getting Started with Elasticsearch and Kibana on Amazon EMR
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. Hernan Vivani is a Big Data Support Engineer for Amazon Web Services. This post shows you how to install Elasticsearch and Kibana on an Amazon EMR cluster and provides a few simple ways to confirm it is working. (Please also […]
Strategies for Reducing Your Amazon EMR Costs
UPDATE, MAY 2019: We updated the Amazon EC2 Spot pricing model in November 2017. The new pricing model simplifies purchasing without bidding and with fewer interruptions. This is a guest post by Prateek Gupta, a lead engineer at BloomReach. BloomReach has built a personalized discovery […]
Node.js Streaming MapReduce with Amazon EMR
Ian Meyers is a Solutions Architecture Senior Manager with AWS. Node.js is a JavaScript framework for running high-performance server-side applications based on non-blocking I/O and an asynchronous, event-driven processing model. When customers need to process large volumes of complex data, Node.js offers a runtime that natively supports the JSON data structure. Languages such […]
Building and Running a Recommendation Engine at Any Scale
This is a guest post by K Young, co-founder and CEO of Mortar Data. Mortar Data is an AWS Advanced Technology Partner. UPDATE: Mortar Data has transitioned into Datadog and has wound down the public Mortar service. The tutorial below no longer works. To learn more about building a recommendation engine on AWS, see Building a […]
Getting HBase Running on Amazon EMR and Connecting it to Amazon Kinesis
Wangechi Doble is an AWS Solutions Architect. Apache HBase is an open-source, column-oriented, distributed NoSQL database that runs on the Apache Hadoop framework. In the AWS Cloud, you can choose to deploy Apache HBase on Amazon Elastic Compute Cloud (Amazon EC2) and manage it yourself or leverage Apache HBase as a managed service on […]