AWS Messaging & Targeting Blog
Streaming Events from Amazon Pinpoint to Redshift
Note: This post was originally written by Ryan Idrigo-Lam, one of the founding members of the Amazon Pinpoint team.
You can use Amazon Pinpoint to segment, target, and engage with your customers directly from the console. The Pinpoint console also includes a variety of dashboards that you can use to keep track of how your customers use your applications, and measure how likely your customers are to engage with the messages you send them.
Some Pinpoint customers, however, have use cases that require a bit more than what these dashboards have to offer. For example, some customers want to join their Pinpoint data to external data sets, or to keep historical data beyond the six-month window that Pinpoint retains it. To help customers meet these needs, and many more, Amazon Pinpoint includes a feature called Event Streams.
This article provides information about using Event Streams to export your data from Amazon Pinpoint into a high-performance Amazon Redshift database. Once your data is in Redshift, you can run queries against it, join it with other data sets, use it as a data source for analytics and data visualization tools, and much more.
Step 1: Create a Redshift Cluster
The first step in this process involves creating a new Redshift cluster to store your data. You can complete this step in a few clicks by using the Amazon Redshift console. For more information, see Managing Clusters Using the Console in the Amazon Redshift Cluster Management Guide.
When you create the new cluster, make a note of the values you specify for the Cluster Identifier, Database Name, Master User Name, and Master User Password. You’ll use all of these values when you set up Amazon Kinesis Data Firehose in the next section.
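If you prefer to script this step, the AWS CLI can create a small test cluster. The identifiers and password below are placeholder values; note that Kinesis Data Firehose must be able to reach the cluster, which is why this sketch makes it publicly accessible:

```bash
# A minimal single-node cluster for testing; replace the placeholder values.
aws redshift create-cluster \
  --cluster-identifier pinpoint-events \
  --cluster-type single-node \
  --node-type dc2.large \
  --db-name pinpoint \
  --master-username master \
  --master-user-password 'ReplaceWithAStrongPassword1' \
  --publicly-accessible
```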
Step 2: Create a Firehose Delivery Stream with a Redshift Destination
After you create your Redshift cluster, you can create the Amazon Kinesis Data Firehose delivery stream that will deliver your Pinpoint data to the Redshift cluster.
To create the Kinesis Data Firehose delivery stream
- Open the Amazon Kinesis Data Firehose console at https://console.thinkwithwp.com/firehose/home.
- Choose Create delivery stream.
- For Delivery stream name, type a name.
- Under Choose source, for Source, choose Direct PUT or other sources. Choose Next.
- On the Process records page, do the following:
- Under Transform source records with AWS Lambda, choose Enabled if you want to use a Lambda function to transform the data before Firehose loads it into Redshift. Otherwise, choose Disabled.
- Under Convert record format, choose Disabled, and then choose Next.
- On the Choose destination page, do the following:
- For Destination, choose Amazon Redshift.
- Under Amazon Redshift destination, specify the Cluster name, User name, Password, and Database for the Redshift database you created earlier. Also specify a name for the Table.
- Under Intermediate S3 destination, choose an S3 bucket to store data in. Alternatively, choose Create new to create a new bucket. Choose Next.
- On the Configure settings page, do the following:
- Under IAM role, choose an IAM role that Firehose can use to access your S3 bucket and KMS key. Alternatively, you can have the Firehose console create a new role. Choose Next.
- On the Review page, confirm the settings you specified on the previous pages. If the settings are correct, choose Create delivery stream.
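Before moving on, you can verify that the new stream is ready. With the AWS CLI, the following command (using whatever name you gave the stream; my-pinpoint-stream here is a placeholder) should report ACTIVE:

```bash
# Check the delivery stream status; wait until it reports "ACTIVE".
aws firehose describe-delivery-stream \
  --delivery-stream-name my-pinpoint-stream \
  --query 'DeliveryStreamDescription.DeliveryStreamStatus'
```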
Step 3: Create a JSONPaths file
The next step in this process is to create a JSONPaths file and upload it to an Amazon S3 bucket. You use the JSONPaths file to tell Amazon Redshift how to interpret the semi-structured JSON that Amazon Pinpoint provides.
To create a JSONPaths file and upload it to Amazon S3
- In a text editor, create a new file.
- Paste the JSONPaths code into the text file. (A sample file appears after this list.)
- Modify the code example to include the fields that you want to import into Redshift.
Note: You can specify custom attributes or metrics by replacing my_custom_attribute or my_custom_metric in the example with your own custom attributes or metrics, respectively.
- When you finish modifying the code example, remove all whitespace, including spaces and line breaks, from the file. Save the file as json-paths.json.
- Open the Amazon S3 console at https://s3.console.thinkwithwp.com/s3/home.
- Choose the S3 bucket you created when you set up the Firehose stream. Upload json-paths.json into the bucket.
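Here is a minimal sketch of the file. It assumes the standard fields that Amazon Pinpoint includes in streamed events, plus one custom attribute and one custom metric named my_custom_attribute and my_custom_metric; check each path against the events your application actually sends, and remember to strip the whitespace before you upload the file:

```json
{
  "jsonpaths": [
    "$['event_type']",
    "$['event_timestamp']",
    "$['arrival_timestamp']",
    "$['application']['app_id']",
    "$['client']['client_id']",
    "$['session']['session_id']",
    "$['session']['start_timestamp']",
    "$['session']['stop_timestamp']",
    "$['attributes']['my_custom_attribute']",
    "$['metrics']['my_custom_metric']"
  ]
}
```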
Step 4: Configure the table in Redshift
At this point, it’s time to finish setting up your Redshift database. In this section, you’ll create a table in the Redshift cluster you created earlier. The columns in this table mirror the values you specified in the JSONPaths file in the previous section.
- Connect to your Redshift cluster by using a database tool such as SQL Workbench/J. For more information about connecting to a cluster, see Connect to the Cluster in the Amazon Redshift Getting Started Guide.
- Create a new table that contains a column for each field in the JSONPaths file you created in the preceding section. You can use the following example as a template.
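The following is a sketch that matches the sample JSONPaths file from Step 3, with one column per path, in the same order. The table name pinpoint_events is an arbitrary choice, but it must match the Table name you entered when you created the Firehose delivery stream. Pinpoint timestamps arrive as epoch milliseconds, hence the BIGINT columns:

```sql
-- One column per JSONPaths entry, listed in the same order.
CREATE TABLE pinpoint_events (
  event_type          VARCHAR(256),
  event_timestamp     BIGINT,        -- epoch milliseconds
  arrival_timestamp   BIGINT,        -- epoch milliseconds
  app_id              VARCHAR(64),
  client_id           VARCHAR(64),
  session_id          VARCHAR(64),
  session_start       BIGINT,
  session_stop        BIGINT,
  my_custom_attribute VARCHAR(256),
  my_custom_metric    FLOAT8
);
```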
Step 5: Configure the Firehose Stream
You’re getting close! At this point, you’re ready to point the Kinesis Data Firehose stream to your JSONPaths file so that Redshift parses the incoming data properly. You also need to list the columns of the table that your data will be copied into.
To configure the Firehose Stream
- Open the Amazon Kinesis Data Firehose console at https://console.thinkwithwp.com/firehose/home.
- In the list of delivery streams, choose the delivery stream you created earlier.
- On the Details tab, choose Edit.
- Under Amazon Redshift destination, for COPY options, paste the options shown after this list. Replace s3-bucket in that example with the path to the S3 bucket that contains json-paths.json.
- For Columns, list all of the columns that are present in the JSONPaths file you created earlier. Specify the column names in the same order as they’re listed in the json-paths.json file, using commas to separate the column names. When you finish, choose Save.
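Assuming the json-paths.json file name from Step 3, the COPY options look like this (s3-bucket is a placeholder for your bucket name):

```
json 's3://s3-bucket/json-paths.json'
```

If you enabled GZIP compression for the intermediate S3 objects, append gzip to these options so that the COPY command can read the compressed files.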
Step 6: Enable Event Streams in Amazon Pinpoint
The only thing left to do now is to tell Amazon Pinpoint to start sending data to Amazon Kinesis.
To enable Event Streaming in Amazon Pinpoint
- Open the Amazon Pinpoint console at https://console.thinkwithwp.com/pinpoint/home.
- Choose the application or project that you want to enable event streams for.
- In the navigation pane, choose Settings.
- On the Event stream tab, choose Enable streaming of events to Amazon Kinesis.
- Under Stream to Amazon Kinesis, select Send events to an Amazon Kinesis Firehose delivery stream.
- For Amazon Kinesis Firehose delivery stream, choose the stream you created earlier.
- For IAM role, choose an existing role that allows the firehose:PutRecordBatch action (a sample policy appears after this list), or choose Automatically create a role to have Amazon Pinpoint create a role with the appropriate permissions. If you choose to have Amazon Pinpoint create a role for you, type a name for the role. Choose Save.
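If you bring your own role, Amazon Pinpoint must be able to assume it (a trust relationship for pinpoint.amazonaws.com), and its permissions policy needs to allow writes to your delivery stream. The following is a sketch of such a policy; the region, account ID, and stream name are placeholder values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "firehose:PutRecordBatch",
        "firehose:DescribeDeliveryStream"
      ],
      "Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-pinpoint-stream"
    }
  ]
}
```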
That’s it! Once you complete this final step, Amazon Pinpoint starts exporting the data you specified into your Redshift cluster.
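Keep in mind that Kinesis Data Firehose buffers data before delivering it, so the first records can take a few minutes to appear. Once they do, a quick sanity check is to count events by type; this query assumes the pinpoint_events table sketched in Step 4:

```sql
-- Confirm that Pinpoint events are arriving in Redshift.
SELECT event_type, COUNT(*) AS event_count
FROM pinpoint_events
GROUP BY event_type
ORDER BY event_count DESC;
```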
I hope this walkthrough was helpful. If you have any questions, please let us know in the comments or in the Amazon Pinpoint forum.