AWS Big Data Blog

Data load made easy and secure in Amazon Redshift using Query Editor V2

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data efficiently and securely. Users such as data analysts, database developers, and data scientists use SQL to analyze their data in Amazon Redshift data warehouses. Amazon Redshift provides a web-based Query Editor V2 in addition to supporting connectivity via ODBC/JDBC or the Amazon Redshift Data API.

Amazon Redshift Query Editor V2 makes it easy to query your data using SQL and gain insights by visualizing your results using charts and graphs with a few clicks. With Query Editor V2, you can collaborate with team members by easily sharing saved queries, results, and analyses in a secure way.

Analysts performing ad hoc analyses in their workspace need to load sample data into Amazon Redshift by creating a table and loading data from their desktop. They want to join that data with the curated data in their data warehouse. Data engineers and data scientists have test data and want to load it into Amazon Redshift for their machine learning (ML) or analytics use cases.

In this post, we walk through a new feature in Query Editor V2 to easily load data files either from your local desktop or Amazon Simple Storage Service (Amazon S3).

Prerequisites

Complete the following prerequisite steps:

    1. Create an Amazon Redshift provisioned cluster or Serverless endpoint.
    2. Provide access to Query Editor V2 for your end-users. To enable your users to access Query Editor V2 using IAM, as an administrator, you can attach one of the following AWS-managed policies to the AWS Identity and Access Management (IAM) user or role to grant permission:
      • AmazonRedshiftQueryEditorV2FullAccess – Grants full access to the Query Editor V2 operations and resources.
      • AmazonRedshiftQueryEditorV2NoSharing – Grants the ability to work with Query Editor V2 without sharing resources.
      • AmazonRedshiftQueryEditorV2ReadSharing – Grants the ability to work with Query Editor V2 with limited sharing of resources. The granted principal can read the resources shared with its team but can’t update them.
      • AmazonRedshiftQueryEditorV2ReadWriteSharing – Grants the ability to work with Query Editor V2 with sharing of resources. The granted principal can read and update the resources shared with its team.
    3. Provide access to the S3 bucket to load data from a local desktop file.
      • To enable your users to load data from a local desktop using Query Editor V2, as an administrator, you have to specify a common S3 bucket, and the user account must be configured with proper permissions. You can use the following IAM policy as an example to configure your IAM user or role:
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "s3:ListBucket",
                        "s3:GetBucketLocation"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<staging-bucket-name>>"
                    ]
                },
                {
                    "Effect": "Allow",
                    "Action": [
                        "s3:PutObject",
                        "s3:GetObject",
                        "s3:DeleteObject"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<staging-bucket-name>[/<optional-prefix>]/${aws:userid}/*"
                    ]
                }
            ]
        }
        
      • It’s also recommended to have proper separation of data access when loading data files from your local desktop. You can use the following S3 bucket policy as an example to separate data access between users of the staging bucket you configured:
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "userIdPolicy",
                    "Effect": "Deny",
                    "Principal": "*",
                    "Action": [
                        "s3:PutObject",
                        "s3:GetObject",
                        "s3:DeleteObject"
                    ],
                    "NotResource": [
                        "arn:aws:s3:::<staging-bucket-name>[/<optional-prefix>]/${aws:userid}/*"
                    ]
                }
            ]
        }
        

Configure Query Editor V2 for your AWS account

As an admin, you must first configure Query Editor V2 before providing access to your end-users. On the Amazon Redshift console, choose Query editor v2 in the navigation pane.

If you’re accessing Query Editor V2 for the first time, you must configure your account by providing AWS Key Management Service (AWS KMS) encryption and, optionally, an S3 bucket.

By default, an AWS-owned key is used to encrypt resources. Optionally, you can create a symmetric customer managed key to encrypt Query Editor V2 resources such as saved queries and query results using the AWS KMS console or AWS KMS API operations.

The S3 bucket URI is required when loading data from your local desktop. You can provide the S3 URI of the same bucket that you configured earlier as a prerequisite.

Configure-QEv2

If you have previously configured Query Editor V2 with only AWS KMS encryption, you can choose Account Settings after launching the interface to update the S3 URI to support loading from your local desktop.

Configure-QEv2

Load data from your local desktop

Users such as data analysts, database developers, and data scientists can now load local files up to 5 MB in size into Amazon Redshift tables from Query Editor V2, without using the COPY command. The supported data formats are CSV, JSON, DELIMITER, FIXEDWIDTH, SHAPEFILE, AVRO, PARQUET, and ORC. Complete the following steps:

      1. On the Amazon Redshift console, navigate to Query Editor V2.
      2. Choose Load data.
        load data
      3. Choose Load from local file, then choose Browse and select a local file. You can download the student_info.csv file to use as an example.
      4. If your file has column headers as the first row, keep the default selection of Ignore header rows as 1 to ignore the first row.
      5. If your file has date columns, choose Data conversion parameters.
        browse and format file
      6. Select Date format, set it to auto and choose Next.
        date format
      7. Choose Load new table to automatically infer the file schema.
      8. Specify the values for Cluster or workgroup, Database, Schema, and Table (for example, Student_info) to load data to.
      9. Choose Create table.
        create-table

A success message appears that the table was created. Now you can load data into the newly created table from a local file.

      10. Choose Load data.
        table created

A message appears that the data load was successful.

      11. Query the Student_info table to see the data (see the example query below).
        query data
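To verify the load with SQL, you can run a couple of simple queries against the new table. The following is a minimal sketch; it assumes the table was created as Student_info in the public schema of the connected database, and the actual columns depend on the headers in your student_info.csv file.

-- Row count should match the number of data rows in the CSV file
SELECT COUNT(*) FROM public.student_info;

-- Preview the first few rows of the loaded data
SELECT * FROM public.student_info LIMIT 10;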

Load data from Amazon S3

You can easily load data from Amazon S3 into an Amazon Redshift table using Query Editor V2. Complete the following steps:

      1. On the Amazon Redshift console, launch Query Editor V2 and connect to your cluster.
      2. Browse to the database name (for example, dev), the public schema, and expand Tables.
      3. You can automatically infer the schema of an S3 file, similar to the Load from local file option shown earlier. However, for this demo, we also show how to load data into an existing table. Run the following CREATE TABLE script to make a sample table (for this example, public.customer):
CREATE TABLE customer (
    c_custkey int8 NOT NULL,
    c_name varchar(25) NOT NULL,
    c_address varchar(40) NOT NULL,
    c_nationkey int4 NOT NULL,
    c_phone char(15) NOT NULL,
    c_acctbal numeric(12,2) NOT NULL,
    c_mktsegment char(10) NOT NULL,
    c_comment varchar(117) NOT NULL,
    PRIMARY KEY (c_custkey)
) DISTKEY(c_custkey) SORTKEY(c_custkey);
      4. Choose Load data.
        Create-Table
      5. Choose Load from S3 bucket.
      6. For this post, we load data from the TPC-H sample data GitHub repo, so for the S3 URI, enter s3://redshift-downloads/TPC-H/2.18/10GB/customer.tbl.
      7. For S3 file location, choose us-east-1.
      8. For File format, choose Delimiter.
      9. For Delimiter character, enter |.
        Load from S3
      10. Choose Data conversion parameters, then select Time format and Date format as auto.
      11. Choose Back.

Refer to Data conversion parameters for more details.

Date Time Format

      12. Choose Load operations.
      13. Select Automatic update for compression encodings.
      14. Select Stop loading when maximum number of errors has been exceeded and specify a value (for example, 100).
      15. Select Statistics update and ON, then choose Next.

Refer to Data load operations for more details.

Load Operations

      16. Choose Load existing table.
      17. Specify the Cluster or workgroup, Database, Schema (for example, public), and Table name (for example, customer).
      18. For IAM role, choose a suitable IAM role.
      19. Choose Load data.
        S3 Load Data

Query Editor V2 generates the COPY command and runs it on the Amazon Redshift cluster. The results of the COPY command are displayed in the Result section upon completion.

S3 Load Copy
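The exact statement depends on the options you selected in the wizard. As a rough sketch, the COPY command generated for this example could look similar to the following (the IAM role ARN is a placeholder for the role you chose):

-- Hypothetical example of the COPY command Query Editor V2 might generate
-- for the options selected above; substitute your own IAM role ARN.
COPY public.customer
FROM 's3://redshift-downloads/TPC-H/2.18/10GB/customer.tbl'
IAM_ROLE 'arn:aws:iam::<account-id>:role/<your-redshift-role>'
REGION 'us-east-1'
DELIMITER '|'
DATEFORMAT 'auto'
TIMEFORMAT 'auto'
COMPUPDATE ON
STATUPDATE ON
MAXERROR 100;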

Conclusion

In this post, we showed how Amazon Redshift Query Editor V2 simplifies the process of loading data into Amazon Redshift from Amazon S3 or your local desktop, thereby accelerating data analysis. It’s an easy-to-use feature that your teams can start using to load and query datasets. If you have any questions or suggestions, please leave a comment.


About the Authors

Raks Khare is an Analytics Specialist Solutions Architect at AWS based out of Pennsylvania. He helps customers architect data analytics solutions at scale on the AWS platform.

Tahir Aziz is an Analytics Solution Architect at AWS. He has worked with building data warehouses and big data solutions for over 13 years. He loves to help customers design end-to-end analytics solutions on AWS. Outside of work, he enjoys traveling and cooking.

Erol Murtezaoglu, a Technical Product Manager at AWS, is an inquisitive and enthusiastic thinker with a drive for self-improvement and learning. He has a strong and proven technical background in software development and architecture, balanced with a drive to deliver commercially successful products. Erol highly values the process of understanding customer needs and problems, in order to deliver solutions that exceed expectations.

Sapna Maheshwari is a Sr. Solutions Architect at Amazon Web Services. She has over 18 years of experience in data and analytics. She is passionate about telling stories with data and enjoys creating engaging visuals to unearth actionable insights.

Karthik Ramanathan is a Software Engineer with Amazon Redshift and is based in San Francisco. He brings close to two decades of development experience across the networking, data storage and IoT verticals. When not at work he is also a writer and loves to be in the water.

Albert Harkema is a Software Development Engineer at AWS. He is known for his curiosity and deep-seated desire to understand the inner workings of complex systems. His inquisitive nature drives him to develop software solutions that make life easier for others. Albert’s approach to problem-solving emphasizes efficiency, reliability, and long-term stability, ensuring that his work has a tangible impact. Through his professional experiences, he has discovered the potential of technology to improve everyday life.