How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 2

In this series, we talk about Swisscom’s journey of automating Amazon Redshift provisioning as part of the Swisscom One Data Platform (ODP) solution using the AWS Cloud Development Kit (AWS CDK), and we provide code snippets and other useful references.

In Part 1, we did a deep dive on provisioning a secure and compliant Redshift cluster using the AWS CDK and the best practices of secret rotation. We also explained how Swisscom used AWS CDK custom resources to automate the creation of dynamic user groups that are relevant for the AWS Identity and Access Management (IAM) roles matching different job functions.

In this post, we explore using the AWS CDK and some of the key topics for self-service usage of the provisioned Redshift cluster by end-users as well as other managed services and applications. These topics include federation with the Swisscom identity provider (IdP), JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.

Scheduled actions

To optimize cost-efficiency for provisioned Redshift cluster deployments, Swisscom implemented a scheduling mechanism. This functionality is driven by the user configuration of the cluster, as described in Part 1 of this series, wherein the user may enable dynamic pausing and resuming of clusters based on specified cron expressions:

redshift_options:
...
  use_scheduler: true                                         # Whether to use Redshift scheduler
  scheduler_pause_cron: "cron(00 18 ? * MON-FRI *)"           # Cron expression for scheduler pause
  scheduler_resume_cron: "cron(00 08 ? * MON-FRI *)"          # Cron expression for scheduler resume
...

This feature allows Swisscom to reduce operational costs by pausing clusters during off-peak hours and resuming them when they are needed again. The scheduling is implemented with the AWS CDK construct CfnScheduledAction, which maps to the AWS CloudFormation resource AWS::Redshift::ScheduledAction. The following code illustrates how Swisscom implemented this scheduling:

if config.use_scheduler:
    cfn_scheduled_action_pause = aws_redshift.CfnScheduledAction(
        scope, "schedule-pause-action",
        # ...
        schedule=config.scheduler_pause_cron,
        # ...
        target_action=aws_redshift.CfnScheduledAction.ScheduledActionTypeProperty(
                         pause_cluster=aws_redshift.CfnScheduledAction.PauseClusterMessageProperty(
                            cluster_identifier='cluster-identifier'
                         )
                      )
    )

    cfn_scheduled_action_resume = aws_redshift.CfnScheduledAction(
        scope, "schedule-resume-action",
        # ...
        schedule=config.scheduler_resume_cron,
        # ...
        target_action=aws_redshift.CfnScheduledAction.ScheduledActionTypeProperty(
                         resume_cluster=aws_redshift.CfnScheduledAction.ResumeClusterMessageProperty(
                            cluster_identifier='cluster-identifier'
                         )
                      )
    )
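To get a feel for the savings such a schedule can yield, the following sketch estimates how many hours per week a cluster runs under the two cron expressions from the configuration above. It is a rough illustration rather than Swisscom's implementation: the helper names are hypothetical, and the parser only handles the simple `cron(MM HH ? * DAY-DAY *)` form used in this post, assuming that resume and pause fire on the same days.

```python
import re

# Matches only the simple form used in this post: cron(MM HH ? * DAY-DAY *)
SIMPLE_CRON = re.compile(r"^cron\((\d{1,2}) (\d{1,2}) \? \* ([A-Z]{3})-([A-Z]{3}) \*\)$")
WEEKDAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
HOURS_PER_WEEK = 7 * 24


def parse_simple_cron(expression):
    """Return (hour_of_day, first_day, last_day) for a simple weekly cron expression."""
    match = SIMPLE_CRON.match(expression)
    if match is None:
        raise ValueError(f"Unsupported cron expression: {expression}")
    minute, hour, first_day, last_day = match.groups()
    if first_day not in WEEKDAYS or last_day not in WEEKDAYS:
        raise ValueError(f"Unknown weekday in: {expression}")
    return int(hour) + int(minute) / 60, first_day, last_day


def weekly_running_hours(resume_cron, pause_cron):
    """Estimate the hours per week between each resume and the following pause."""
    resume_hour, first_day, last_day = parse_simple_cron(resume_cron)
    pause_hour, pause_first, pause_last = parse_simple_cron(pause_cron)
    if (first_day, last_day) != (pause_first, pause_last) or pause_hour <= resume_hour:
        raise ValueError("This sketch expects resume and pause on the same days, resume first")
    days = (WEEKDAYS.index(last_day) - WEEKDAYS.index(first_day)) % 7 + 1
    return (pause_hour - resume_hour) * days


running = weekly_running_hours("cron(00 08 ? * MON-FRI *)", "cron(00 18 ? * MON-FRI *)")
paused_share = 1 - running / HOURS_PER_WEEK
print(f"running {running:.0f} h/week, paused {paused_share:.0%} of the time")
```

With the example schedule, the cluster runs 50 of the 168 hours in a week, so it is paused roughly 70% of the time. The actual savings depend on the pricing model, and storage is still charged while a cluster is paused.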

JDBC connections

JDBC connectivity for Amazon Redshift clusters is also flexible, adapting to the subnet type and security group that users define in the configuration:

redshift_options:
...
  subnet_type: "routable-private"         # 'routable-private' OR 'non-routable-private'
  security_group_id: "sg-test_redshift"   # Security group ID for Amazon Redshift (the referenced group must exist in the account)
...

As illustrated in the ODP architecture diagram in Part 1 of this series, a considerable part of the extract, transform, and load (ETL) processes is expected to run outside of Amazon Redshift, within the serverless AWS Glue environment. Swisscom therefore needed a mechanism for AWS Glue to connect to Amazon Redshift. This connectivity is provided by creating an AWS Glue connection of type JDBC within the AWS CDK code, which allows ETL processes to interact with the Redshift cluster. The subnet and security group defined in the user configuration determine how the connection is created. If no security group is defined in the configuration, a default one is created. The connection is configured with details of the data product for which the Redshift cluster is being provisioned, such as the ETL user and default database, along with network elements such as the cluster endpoint, security group, and subnet, providing secure and efficient data transfer. The following code snippet demonstrates how this was achieved:

jdbc_connection = glue.Connection(
    scope, "redshift-glue-connection",
    type=ConnectionType("JDBC"),
    connection_name="redshift-glue-connection",
    subnet=connection_subnet,
    security_groups=connection_security_groups,
    properties={
        "JDBC_CONNECTION_URL": f"jdbc:redshift://{cluster_endpoint}/{database_name}",
        "USERNAME": etl_user.username,
        "PASSWORD": etl_user.password.to_string(),
        "redshiftTmpDir": f"s3://{data_product_name}-redshift-work"
    }
)

By doing this, Swisscom made sure that serverless ETL workflows in AWS Glue can securely communicate with the newly provisioned Redshift cluster running within a secured virtual private cloud (VPC).
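The connection properties used for the AWS Glue connection can be assembled and checked in plain Python before they are passed to the construct. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the host and port are assumed to be available separately (the snippet above uses a single `cluster_endpoint` value), and 5439 is the default Redshift port.

```python
DEFAULT_REDSHIFT_PORT = 5439  # default port of a Redshift cluster


def build_redshift_jdbc_url(host, database, port=DEFAULT_REDSHIFT_PORT):
    """Build a JDBC URL of the form jdbc:redshift://host:port/database."""
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"Invalid port: {port}")
    return f"jdbc:redshift://{host}:{port}/{database}"


def build_connection_properties(host, database, username, password, data_product_name):
    """Assemble the property keys used by the Glue connection in this post."""
    return {
        "JDBC_CONNECTION_URL": build_redshift_jdbc_url(host, database),
        "USERNAME": username,
        "PASSWORD": password,
        # Temporary S3 location, following the naming used in the snippet above
        "redshiftTmpDir": f"s3://{data_product_name}-redshift-work",
    }


# Illustrative values only; none of them refer to real resources
properties = build_connection_properties(
    host="example-cluster.example.internal",
    database="dev",
    username="etl_user",
    password="not-a-real-password",
    data_product_name="example-product",
)
print(properties["JDBC_CONNECTION_URL"])
```

In a real stack, the password would come from the rotated secret described in Part 1 rather than from a literal value.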

Identity federation

Identity federation allows a centralized system (the IdP) to be used for authenticating users in order to access a service provider like Amazon Redshift. A more general overview of the topic can be found in Identity Federation in AWS.

Identity federation not only enhances security due to its centralized user lifecycle management and centralized authentication mechanism (for example, supporting multi-factor authentication), but also improves the user experience and reduces the overall complexity of identity and access management and thereby also its governance.

In Swisscom’s setup, Microsoft Active Directory Services are used for identity and access management. At the initial build stages of ODP, Amazon Redshift offered two different options for identity federation:

IAM-based SAML 2.0 IdP federation, as outlined in Federate Amazon Redshift access with Microsoft Azure AD single sign-on. See also Using IAM authentication to generate database user credentials.
Native IdP federation, as outlined in Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client. See also Native identity provider (IdP) federation for Amazon Redshift.

During the initial implementation, Swisscom opted for IAM-based SAML 2.0 IdP federation because it is the more general approach and can also be used for other AWS services, such as Amazon QuickSight (see Setting up IdP federation using IAM and QuickSight).

At AWS re:Invent 2023, AWS announced a new connection option to Amazon Redshift based on AWS IAM Identity Center. IAM Identity Center provides a single place for workforce identities in AWS, allowing the creation of users and groups directly within itself or by federation with standard IdPs like Okta, PingOne, Microsoft Entra ID (Azure AD), or any IdP that supports SAML 2.0 and SCIM. It also provides a single sign-on (SSO) experience for Redshift features and other analytics services such as Amazon Redshift Query Editor V2 (see Integrate Identity Provider (IdP) with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On), QuickSight, and AWS Lake Formation. Moreover, a single IAM Identity Center instance can be shared with multiple Redshift clusters and workgroups with a simple auto-discovery and connect capability. This makes sure all Redshift clusters and workgroups have a consistent view of users, their attributes, and groups. This setup fits well with ODP’s vision of providing self-service analytics across the Swisscom workforce with the necessary security controls in place. At the time of writing, Swisscom is actively working towards using IAM Identity Center as the standard federation solution for ODP. The following diagram illustrates the high-level architecture for the work in progress.

Audit logging

Amazon Redshift audit logging is useful for security auditing, monitoring, and troubleshooting. The logs provide information such as the IP address of the user’s computer, the type of authentication used by the user, or the timestamp of the request. Amazon Redshift logs SQL operations, including connection attempts, queries, and changes, which makes it straightforward to track changes. These logs can be accessed through SQL queries against system tables, saved to a secure Amazon Simple Storage Service (Amazon S3) location, or exported to Amazon CloudWatch.

Amazon Redshift logs information in the following log files:

Connection log – Provides information to monitor users connecting to the database and related connection information like their IP address.
User log – Logs information about changes to database user definitions.
User activity log – Tracks information about the types of queries that both the users and the system perform in the database. It’s useful primarily for troubleshooting purposes.

With the ODP solution, Swisscom wanted to write all the Amazon Redshift logs to CloudWatch. This is currently not directly supported by the AWS CDK, so Swisscom implemented a workaround solution using the AWS CDK custom resources option, which invokes the SDK on the Redshift action enableLogging. See the following code:

    custom_resources.AwsCustomResource(self, f"{self.cluster_identifier}-custom-sdk-logging",
           on_update=custom_resources.AwsSdkCall(
               service="Redshift",
               action="enableLogging",
               parameters={
                   "ClusterIdentifier": self.cluster_identifier,
                   "LogDestinationType": "cloudwatch",
                   "LogExports": ["connectionlog","userlog","useractivitylog"],
               },
               physical_resource_id=custom_resources.PhysicalResourceId.of(
                   f"{self.account}-{self.region}-{self.cluster_identifier}-logging")
           ),
           policy=custom_resources.AwsCustomResourcePolicy.from_sdk_calls(
               resources=[f"arn:aws:redshift:{self.region}:{self.account}:cluster:{self.cluster_identifier}"]
           )
        )
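The parameters passed to the enableLogging action can be built and validated separately from the custom resource, which keeps typos in log type names from surfacing only at deployment time. The following is a small sketch; the helper name is hypothetical, and the three log export names are the ones used in the snippet above.

```python
# The log types exported in the snippet above
VALID_LOG_EXPORTS = ("connectionlog", "userlog", "useractivitylog")


def build_enable_logging_parameters(cluster_identifier, log_exports=VALID_LOG_EXPORTS):
    """Build the parameter dict for the Redshift enableLogging call with CloudWatch as target."""
    if not log_exports:
        raise ValueError("At least one log export is required")
    unknown = sorted(set(log_exports) - set(VALID_LOG_EXPORTS))
    if unknown:
        raise ValueError(f"Unknown log exports: {unknown}")
    return {
        "ClusterIdentifier": cluster_identifier,
        "LogDestinationType": "cloudwatch",
        "LogExports": list(log_exports),
    }


# Illustrative cluster identifier
parameters = build_enable_logging_parameters("example-cluster")
print(parameters)
```

Note that the user activity log is only written when the enable_user_activity_logging parameter is turned on in the cluster's parameter group, so exporting it may require an additional configuration step.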

AWS Config rules and remediation

After a Redshift cluster has been deployed, Swisscom needed to make sure that the cluster continues to meet the defined governance rules at every point in time after creation. For that, Swisscom decided to use AWS Config.

AWS Config provides a detailed view of the configuration of AWS resources in your AWS account. This includes how the resources are related to one another and how they were configured in the past so you can see how the configurations and relationships change over time.

An AWS resource is an entity you can work with in AWS, such as an Amazon Elastic Compute Cloud (Amazon EC2) instance, Amazon Elastic Block Store (Amazon EBS) volume, security group, or Amazon VPC.

The following diagram illustrates the process Swisscom implemented.

If a cluster isn’t compliant with an AWS Config rule, a remediation can be applied. Swisscom defined the pause cluster action as the default for a non-compliant cluster (based on your requirements, other remediation actions are possible). This is covered using an AWS Systems Manager automation document (SSM document).

Automation, a capability of Systems Manager, simplifies common maintenance, deployment, and remediation tasks for AWS services like Amazon EC2, Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon S3, and many more.

The SSM document is based on the AWS document AWSConfigRemediation-DeleteRedshiftCluster. It looks like the following code:

description: | 
  ### Document name - PauseRedshiftCluster-WithCheck 

  ## What does this document do? 
  This document pauses the given Amazon Redshift cluster using the [PauseCluster](https://docs.aws.amazon.com/redshift/latest/APIReference/API_PauseCluster.html) API. 

  ## Input Parameters 
  * AutomationAssumeRole: (Required) The ARN of the role that allows Automation to perform the actions on your behalf. 
  * ClusterIdentifier: (Required) The identifier of the Amazon Redshift Cluster. 

  ## Output Parameters 
  * PauseRedshiftCluster.Response: The standard HTTP response from the PauseCluster API. 
schemaVersion: '0.3' 
assumeRole: '{{ AutomationAssumeRole }}' 
parameters: 
  AutomationAssumeRole: 
    type: String 
    description: (Required) The ARN of the role that allows Automation to perform the actions on your behalf. 
    allowedPattern: '^arn:aws[a-z0-9-]*:iam::\d{12}:role\/[\w-\/.@+=,]{1,1017}$' 
  ClusterIdentifier: 
    type: String 
    description: (Required) The identifier of the Amazon Redshift Cluster. 
    allowedPattern: '[a-z]{1}[a-z0-9_.-]{0,62}' 
mainSteps: 
  - name: GetRedshiftClusterStatus 
    action: 'aws:executeAwsApi' 
    inputs: 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
      Service: redshift 
      Api: DescribeClusters 
    description: |- 
      ## GetRedshiftClusterStatus 
      Gets the status for the given Amazon Redshift Cluster. 
    outputs: 
      - Name: ClusterStatus 
        Selector: '$.Clusters[0].ClusterStatus' 
        Type: String 
    timeoutSeconds: 600 
  - name: Condition 
    action: 'aws:branch' 
    inputs: 
      Choices: 
        - NextStep: PauseRedshiftCluster 
          Variable: '{{ GetRedshiftClusterStatus.ClusterStatus }}' 
          StringEquals: available 
      Default: Finish 
  - name: PauseRedshiftCluster 
    action: 'aws:executeAwsApi' 
    description: | 
      ## PauseRedshiftCluster 
      Makes PauseCluster API call using Amazon Redshift Cluster identifier and pauses the cluster without taking any final snapshot. 
      ## Outputs 
      * Response: The standard HTTP response from the PauseCluster API. 
    timeoutSeconds: 600 
    isEnd: false 
    nextStep: VerifyRedshiftClusterPause 
    inputs: 
      Service: redshift 
      Api: PauseCluster 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
    outputs: 
      - Name: Response 
        Selector: $ 
        Type: StringMap 
  - name: VerifyRedshiftClusterPause 
    action: 'aws:assertAwsResourceProperty' 
    timeoutSeconds: 600 
    isEnd: true 
    description: | 
      ## VerifyRedshiftClusterPause 
      Verifies the given Amazon Redshift Cluster is paused. 
    inputs: 
      Service: redshift 
      Api: DescribeClusters 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
      PropertySelector: '$.Clusters[0].ClusterStatus' 
      DesiredValues: 
        - pausing 
  - name: Finish 
    action: 'aws:sleep' 
    inputs: 
      Duration: PT1S 
    isEnd: true
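The control flow of the automation document can be summarized in a few lines of Python. This is only a sketch of the branching logic for illustration (the function name is hypothetical, and it does not call any AWS API): a cluster is paused only when its status is available, and every other status leads directly to the Finish step.

```python
def planned_steps(cluster_status):
    """Return the step names the automation document runs for a given cluster status."""
    steps = ["GetRedshiftClusterStatus", "Condition"]
    if cluster_status == "available":
        # Only an available cluster is paused and then checked for the 'pausing' status
        steps += ["PauseRedshiftCluster", "VerifyRedshiftClusterPause"]
    else:
        # Any other status (for example, already paused) skips the PauseCluster call
        steps.append("Finish")
    return steps


for status in ("available", "paused"):
    print(status, "->", " -> ".join(planned_steps(status)))
```

The status check keeps the remediation from failing on clusters that are already paused or in a transitional state, where a PauseCluster call would not make sense.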

The SSM automation document is deployed with the AWS CDK:

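import yaml  # PyYAML, assumed to be available in the CDK project
from aws_cdk import aws_ssm as ssm

# Read the YAML automation document into a dict (the file name is illustrative)
with open("pause_redshift_cluster.yaml") as document_file:
    ssm_document_content = yaml.safe_load(document_file)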

document_id = 'automation_id'   
document_name = 'automation_name' 

document = ssm.CfnDocument(scope, id=document_id, content=ssm_document_content,  
                           document_format="YAML", document_type='Automation', name=document_name) 

To run the automation document as a remediation action, AWS Config needs a role with the right permissions that Systems Manager can assume. You can create an IAM role for this purpose:

from aws_cdk import aws_iam as iam

#Create role for the automation 
role_name = 'role-to-pause-redshift'
automation_role = iam.Role(scope, 'role-to-pause-redshift-cluster', 
                           assumed_by=iam.ServicePrincipal('ssm.amazonaws.com'), 
                           role_name=role_name) 

automation_policy = iam.Policy(scope, "policy-to-pause-cluster", 
                               policy_name='policy-to-pause-cluster', 
                               statements=[ 
                                   iam.PolicyStatement( 
                                       effect=iam.Effect.ALLOW, 
                                       actions=['redshift:PauseCluster', 
                                                'redshift:DescribeClusters'], 
                                       resources=['*'] 
                                   ) 
                               ]) 

automation_role.attach_inline_policy(automation_policy)

Swisscom defined the rules to be applied following AWS best practices (see Security Best Practices for Amazon Redshift). These are deployed as AWS Config conformance packs. A conformance pack is a collection of AWS Config rules and remediation actions that can be quickly deployed as a single entity in an AWS account and AWS Region or across an organization in AWS Organizations.

Conformance packs are created by authoring YAML templates that contain the list of AWS Config managed or custom rules and remediation actions. You can also use SSM documents to store your conformance pack templates on AWS and directly deploy conformance packs using SSM document names.

The conformance pack can be deployed using the AWS CDK:

from aws_cdk import aws_config  
  
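# Read the conformance pack template as a string (the file name is illustrative)
with open("conformance_pack.yaml") as template_file:
    conformance_pack_template = template_file.read()

# Substitute the placeholders; `automation_role` and `document_name` are assumed
# to be the role and the document name defined in the previous snippets
conformance_pack_content = (
    conformance_pack_template
    .replace("role_arn_for_substitution", automation_role.role_arn)
    .replace("document_for_substitution", document_name)
)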

conformance_pack_id = 'conformance-pack-id' 
conformance_pack_name = 'conformance-pack-name' 


conformance_pack = aws_config.CfnConformancePack(scope, id=conformance_pack_id, 
                                                 conformance_pack_name=conformance_pack_name, 
                                                 template_body=conformance_pack_content)
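To make the substitution step more concrete, the following sketch shows what a conformance pack template with one managed rule and one remediation configuration could look like, and how the two placeholders could be replaced. The template is a hypothetical example and not Swisscom's actual conformance pack: the rule selection, resource names, retry settings, and the `render_conformance_pack` helper are assumptions made for illustration.

```python
# Hypothetical template: one AWS managed rule plus a remediation that runs the SSM document
CONFORMANCE_PACK_TEMPLATE = """\
Resources:
  RedshiftClusterPublicAccessCheck:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: redshift-cluster-public-access-check
      Scope:
        ComplianceResourceTypes:
          - AWS::Redshift::Cluster
      Source:
        Owner: AWS
        SourceIdentifier: REDSHIFT_CLUSTER_PUBLIC_ACCESS_CHECK
  PauseNonCompliantCluster:
    Type: AWS::Config::RemediationConfiguration
    DependsOn: RedshiftClusterPublicAccessCheck
    Properties:
      ConfigRuleName: redshift-cluster-public-access-check
      TargetType: SSM_DOCUMENT
      TargetId: document_for_substitution
      Automatic: true
      MaximumAutomaticAttempts: 3
      RetryAttemptSeconds: 60
      Parameters:
        AutomationAssumeRole:
          StaticValue:
            Values:
              - role_arn_for_substitution
        ClusterIdentifier:
          ResourceValue:
            Value: RESOURCE_ID
"""


def render_conformance_pack(template, role_arn, document_name):
    """Replace the two placeholders and fail loudly if one of them is missing."""
    for placeholder in ("role_arn_for_substitution", "document_for_substitution"):
        if placeholder not in template:
            raise ValueError(f"Placeholder not found in template: {placeholder}")
    return (
        template
        .replace("role_arn_for_substitution", role_arn)
        .replace("document_for_substitution", document_name)
    )


# Illustrative values; the account ID is a placeholder
rendered = render_conformance_pack(
    CONFORMANCE_PACK_TEMPLATE,
    role_arn="arn:aws:iam::123456789012:role/role-to-pause-redshift",
    document_name="automation_name",
)
print(rendered)
```

The remediation passes the identifier of the non-compliant resource to the ClusterIdentifier parameter of the automation document through RESOURCE_ID, while the role ARN is a static value.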

Conclusion

Swisscom is building its next-generation data-as-a-service platform through a combination of automated provisioning processes, advanced security features, and user-configurable options to cater for diverse data handling and data product needs. The integration of the Amazon Redshift construct in the ODP framework is a significant stride in Swisscom’s journey towards a more connected and data-driven enterprise landscape.

In Part 1 of this series, we demonstrated how to provision a secure and compliant Redshift cluster using the AWS CDK as well as how to deal with the best practices of secret rotation. We also showed how to use AWS CDK custom resources in automating the creation of dynamic user groups that are relevant for the IAM roles matching different job functions.

In this post, we showed, through the usage of the AWS CDK, how to address key Redshift cluster usage topics such as federation with the Swisscom IdP, JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.

The code snippets in this post are provided as is and will need to be adapted to your specific use cases. Before you get started, we highly recommend speaking to an Amazon Redshift specialist.

About the Authors

Asad bin Imtiaz is an Expert Data Engineer at Swisscom, with over 17 years of experience in architecting and implementing enterprise-level data solutions.

Jesús Montelongo Hernández is an Expert Cloud Data Engineer at Swisscom. He has over 20 years of experience in IT systems, data warehousing, and data engineering.

Samuel Bucheli is a Lead Cloud Architect at Zühlke Engineering AG. He has over 20 years of experience in software engineering, software architecture, and cloud architecture.

Srikanth Potu is a Senior Consultant in EMEA, part of the Professional Services organization at Amazon Web Services. He has over 25 years of experience in Enterprise data architecture, databases and data warehousing.

AWS Big Data Blog