AWS Database Blog

Mask PII data using AWS DMS and Amazon Macie during migration

When sensitive personally identifiable information (PII) is involved, large businesses frequently encounter difficulties transferring production data to non-production or lateral environments for testing or other purposes. Compliance regulations require most firms to mask production PII data when it's copied for testing and development.

Currently, data that has been copied or migrated to a lower or lateral environment needs to be manually masked before it can be released to end users. This typically requires significant time, energy, and manual effort. Additionally, because it requires database subject matter experts to review documentation to locate PII data and define processes to protect it, this can be extremely difficult for firms where those experts or that documentation aren't available.

In this post, we present a solution to identify PII data using Amazon Macie, mask it using AWS Database Migration Service (AWS DMS), and migrate it from an Amazon Relational Database Service (Amazon RDS) for Oracle production source database to an RDS for Oracle development target database, before releasing the environment to users. This way, you can save time and make sure that sensitive data is protected.

Solution overview

This post shows how we can identify PII data within a source RDS for Oracle database using Macie managed and custom identifiers, and mask that PII data while it’s being migrated to the target database using AWS DMS.

The following diagram presents an overview of the solution architecture.

The process includes the following steps:

  1. Use AWS DMS to extract data from the source RDS for Oracle DB instance and save it in Parquet format in a folder in an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Enable Macie to scan the Parquet files to discover PII data and store the findings in another folder in the S3 bucket.
  3. Use AWS Glue to create a Data Catalog by crawling the JSONL file created in the previous step, which contains metadata about where PII is stored in the RDS for Oracle data copy.
  4. Use Amazon Athena to query the Data Catalog to extract the metadata information—the schema name, table name, and column name of where PII data is stored in the RDS for Oracle data copy. This information is needed to create transformation rules in the next step.
  5. Create transformation rules in AWS DMS for those tables and columns identified in the previous step.
  6. Use AWS DMS to migrate data to the target RDS for Oracle database with PII data masked as per the transformation rules.

Implementing the solution consists of the following steps:

  1. Populate the source RDS for Oracle database with example PII data.
  2. Configure AWS DMS and Amazon S3 components.
  3. Migrate the PII data from the source RDS for Oracle database to Amazon S3 using AWS DMS.
  4. Configure a Macie job and AWS Glue crawler.
  5. Configure Athena to extract PII metadata.
  6. Run an AWS DMS replication task with transformation rules to migrate the source data with PII data masked to the target RDS for Oracle database.

Prerequisites

Before you get started, you must have the following prerequisites:

  • An AWS account with access to AWS DMS, Amazon S3, Amazon Macie, AWS Glue, and Amazon Athena
  • A source and a target RDS for Oracle DB instance
  • A SQL client that can connect to both databases

Populate the source RDS for Oracle database with example PII data

Complete the following steps to populate the source database:

  1. Connect to the source RDS for Oracle database using your preferred SQL client.
  2. Create a database user named AWS_TEST_USER:
    CREATE USER AWS_TEST_USER identified by <password>;
  3. Grant necessary privileges to the user AWS_TEST_USER:
    GRANT CONNECT, RESOURCE TO AWS_TEST_USER;
  4. Create three tables and insert records:
    CREATE TABLE AWS_TEST_USER.CUSTOMER (
        CUSTID NUMBER(30,0),
        CUSTNAME CHAR(30),
        CR_CARD_EXPIRY DATE,
        HOME_ADD VARCHAR2(30),
        JOIN_DATE DATE,
        COMM_PREF VARCHAR2(20),
        SSN VARCHAR2(30)
    );

    CREATE TABLE AWS_TEST_USER.EMPLOYEE (
        EMP_ID NUMBER(30,0),
        EMP_NM CHAR(30),
        PASSPORT VARCHAR2(20),
        HOME_ADD VARCHAR2(30),
        JOIN_DATE DATE,
        JOB_CODE VARCHAR2(20),
        MANAGER VARCHAR2(30),
        BIRTHDAY DATE,
        SSN VARCHAR2(30)
    );

    CREATE TABLE AWS_TEST_USER.SUPPLIERS (
        SUPPID NUMBER(30),
        SUPPNAME CHAR(30),
        PRDCODE CHAR(30),
        COMM NUMBER(30)
    );

    insert into aws_test_user.customer values
    (100,'Zee','28-OCT-22','Austin,Texas','20-JUL-22','email','690-22-5818');

    insert into aws_test_user.customer values
    (101,'ANC','28-OCT-22','Atlanta,Georgia','20-JUL-22','email','490-22-8513');

    insert into aws_test_user.customer values
    (102,'NBC','28-OCT-22','Denver,CO','20-JUL-22','email','690-22-9090');

    insert into aws_test_user.employee values
    (750,'Geet','C53HL5134','LosAngeles,CA','24-OCT-22','DBA','pk','16-JUL-22','789-62-8145');

    insert into aws_test_user.suppliers values
    (501,'Artel','P10',10);

    commit;

Configure AWS DMS and Amazon S3 components

In this section, we detail the steps to configure your AWS DMS and Amazon S3 components.

Create a replication instance

To create an AWS DMS replication instance, complete the following steps:

  1. On the AWS DMS console, choose Replication instances in the navigation pane.
  2. Choose Create replication instance.
  3. Follow the instructions in Creating a replication instance to create a new replication instance.
  4. Choose the same VPC, Region, and Availability Zone as the RDS for Oracle instances.

The source and target database security groups must allow ingress from the AWS DMS replication instance for a successful connection.
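
If you prefer to script this step, the following boto3 sketch creates a comparable replication instance. The instance identifier, subnet group, security group, and Availability Zone are placeholders I've assumed; substitute values from your environment.

    import boto3

    dms = boto3.client('dms', region_name='<YOUR REGION NAME>')

    response = dms.create_replication_instance(
        ReplicationInstanceIdentifier='pii-masking-ri',              # assumed name
        ReplicationInstanceClass='dms.t3.medium',
        AllocatedStorage=50,
        ReplicationSubnetGroupIdentifier='<YOUR DMS SUBNET GROUP>',  # same VPC as the RDS instances
        VpcSecurityGroupIds=['<YOUR SECURITY GROUP ID>'],
        AvailabilityZone='<YOUR AVAILABILITY ZONE>',                 # same AZ as the RDS instances
        PubliclyAccessible=False,
    )
    print(response['ReplicationInstance']['ReplicationInstanceArn'])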

Create the S3 bucket with a customer managed key

In this step, you create your S3 bucket with a customer managed AWS Key Management Service (AWS KMS) key.

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose Create bucket.
  3. Follow the instructions in Create your first S3 bucket to create a new bucket.
  4. In the section Default encryption, select Server-side encryption with AWS Key Management Service keys (SSE-KMS) and create a KMS key for your bucket.
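
As an alternative to the console, here is a minimal boto3 sketch that creates the bucket and sets SSE-KMS with your customer managed key as its default encryption. The bucket name and key ARN are placeholders.

    import boto3

    s3 = boto3.client('s3')

    bucket = '<YOUR BUCKET NAME>'
    kms_key_arn = '<ARN of YOUR KMS KEY>'

    # Outside us-east-1, also pass CreateBucketConfiguration={'LocationConstraint': '<YOUR REGION NAME>'}
    s3.create_bucket(Bucket=bucket)

    # Default all new objects to SSE-KMS with the customer managed key
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            'Rules': [{
                'ApplyServerSideEncryptionByDefault': {
                    'SSEAlgorithm': 'aws:kms',
                    'KMSMasterKeyID': kms_key_arn,
                }
            }]
        },
    )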

Add an S3 bucket policy

Update the bucket policy to allow Macie to store its findings in the S3 bucket after scanning the RDS for Oracle data stored in Amazon S3.

s3:PutObject is allowed so that Macie can store the JSONL files with its findings in the S3 bucket.

s3:GetBucketLocation is allowed so that Macie can scan the RDS for Oracle data in the S3 bucket to find PII data.

Furthermore, to prevent access to the PII data through S3 Select on the Parquet files, and to prevent downloading those files, the s3:GetObject action is denied for every principal except the Macie service-linked role, which still needs to read the files to find PII data.

Complete the following steps:

  1. Navigate to your bucket on the Amazon S3 console.
  2. Choose the Permissions tab and edit the bucket policy.
  3. Add the following to your bucket policy:
    { "Version": "2012-10-17",
           "Statement": [
            {       "Sid": "Allow Macie to upload objects to the bucket",
                    "Effect": "Allow",
                    "Principal": { "Service": "macie.amazonaws.com" },
                   "Action": "s3:PutObject",
                     "Resource": "arn:aws:s3:::<YOUR BUCKET NAME>/*",
                      "Condition": { "StringEquals": { "aws:SourceAccount": "<aws account #>" },
                      "ArnLike": { "aws:SourceArn": [
                            "arn:aws:macie2:<YOUR REGION NAME>:<aws account #>:export-configuration:*",
                            "arn:aws:macie2:<YOUR REGION NAME>:<aws account #>:classification-job/*"
                        ]}}},
       {
                "Sid": "Allow Macie to use the getBucketLocation operation",
                "Effect": "Allow",           
                "Principal": {    "Service": "macie.amazonaws.com" },
                "Action": "s3:GetBucketLocation",
                "Resource": "arn:aws:s3:::<YOUR BUCKET NAME>",
                "Condition": {"StringEquals": { "aws:SourceAccount": "<aws account #>"  },
                 "ArnLike": { "aws:SourceArn": [
                            "arn:aws:macie2: :<YOUR REGION NAME>:<aws account #>:export-configuration:*",
                            "arn:aws:macie2: :<YOUR REGION NAME>:<aws account #>:classification-job/*"
                    ]}}},
    {
                "Sid": "Deny S3 select and allow Macie to get Object",
                "Effect": "Deny",            
                "Principal": "*",            
                "Action": "s3:GetObject",            
                "Resource": "arn:aws:s3:::<YOUR BUCKET NAME>/orcldata/*",
                "Condition": 
                {"StringNotLike": {  
                 "aws:PrincipalArn": "arn:aws:iam::<aws account #>:role/aws-service-role/macie.amazonaws.com/AWSServiceRoleForAmazonMacie"
       }}}]}
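
You can also attach this policy programmatically. A short boto3 sketch, assuming the policy document above is saved as bucket-policy.json with the placeholders filled in:

    import json
    import boto3

    s3 = boto3.client('s3')

    with open('bucket-policy.json') as f:  # the policy document shown above
        policy = json.load(f)

    s3.put_bucket_policy(Bucket='<YOUR BUCKET NAME>', Policy=json.dumps(policy))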

Create IAM roles and policies for the S3 bucket

In this step, you create IAM roles and policies for the S3 bucket where AWS DMS will store the RDS for Oracle data. First, create the policy.

  1. On the IAM console, choose Policies in the navigation pane.
  2. Choose Create policy.
  3. On the JSON tab, choose Edit and enter the following policy:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:PutObjectTagging"
                ],
                "Resource": ["arn:aws:s3:::<YOUR BUCKET NAME>/*"]
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["arn:aws:s3:::<YOUR BUCKET NAME>"]
            }
        ]
    }
  4. Choose Next.
  5. Enter a name for your policy.
  6. Choose Create policy.
    Now you can create an IAM role with AWS DMS as the trusted entity to access the S3 bucket.
  7. On the IAM console, choose Roles in the navigation pane.
  8. Choose Create role.
  9. For Trusted entity type, choose AWS service.
  10. For Use case, choose DMS, then choose Next.
  11. For Add permission, choose the policy you created, then choose Next.
  12. Enter a name for your role, then choose Next.
    Because the S3 objects are encrypted using a customer managed KMS key (SSE-KMS encryption), permissions on this key have to be granted to Macie so it can decrypt the files with PII data. Therefore, we add statements to the key policy that allow the Macie service-linked role and the Amazon S3 IAM role to use the key to encrypt and decrypt data.
  13. On the AWS KMS console, navigate to your key.
  14. Switch to Policy view.
  15. Edit the policy and add the following code:
    {
        "Version": "2012-10-17",
        "Id": "key-consolepolicy-3",
        "Statement": [
            {
                "Sid": "Enable IAM User Permissions",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::<aws account#>:root"
                },
                "Action": "kms:*",
                "Resource": "*"
            },
            {
                "Sid": "Allow the Macie service-linked role to use the key",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::<aws account#>:role/aws-service-role/macie.amazonaws.com/AWSServiceRoleForAmazonMacie"
                },
                "Action": "kms:Decrypt",
                "Resource": "*"
            },
            {
                "Sid": "Allow the s3-pii-masking-role to use the key",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::<aws account#>:role/<YOUR IAM ROLE NAME>"
                },
                "Action": [
                    "kms:GenerateDataKey",
                    "kms:Encrypt"
                ],
                "Resource": "*"
            },
            {
                "Sid": "Allow Macie to use the key",
                "Effect": "Allow",
                "Principal": {
                    "Service": "macie.amazonaws.com"
                },
                "Action": [
                    "kms:GenerateDataKey",
                    "kms:Encrypt"
                ],
                "Resource": "<ARN of YOUR KMS KEY>",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": "<aws account#>"
                    },
                    "ArnLike": {
                        "aws:SourceArn": [
                            "arn:aws:macie2:<YOUR REGION NAME>:<aws account#>:export-configuration:*",
                            "arn:aws:macie2:<YOUR REGION NAME>:<aws account#>:classification-job/*"
                        ]
                    }
                }
            }
        ]
    }

Create AWS DMS endpoints

In this step, you create endpoints for the source and target databases. For more information, refer to Creating source and target endpoints.

First, you create an endpoint that connects to the source database from which PII data has to be identified.

  1. On the AWS DMS console, create an AWS DMS source endpoint with Oracle as the source engine.

Next, you create an endpoint for the target database where masked PII data will be migrated.

  2. Create a target endpoint for the target Oracle database with Oracle as the target engine.

Next, you create a second target endpoint for Amazon S3. This endpoint connects to the S3 bucket, an intermediate location where the files with PII data are placed for Macie to access them.

  3. Create a second target endpoint with Amazon S3 as the target engine.
    1. For the service access role ARN, enter your IAM role ARN.
    2. For the bucket name, enter the bucket you created.
    3. For the bucket folder, enter the folder you created (for this post, we use orcldata).
    4. For the endpoint settings, select Use endpoint connection attributes.
    5. In the Extra connection attributes section, enter the following:
      compressionType=NONE;csvDelimiter=,;csvRowDelimiter=\n;DataFormat=parquet;datePartitionEnabled=false;EncryptionMode=SSE_KMS;ServerSideEncryptionKmsKeyId=<ARN of YOUR KMS KEY>;
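
The same endpoint settings map onto the S3Settings structure in the AWS DMS API. A boto3 sketch, with the role ARN, bucket name, and key ARN as placeholders:

    import boto3

    dms = boto3.client('dms', region_name='<YOUR REGION NAME>')

    response = dms.create_endpoint(
        EndpointIdentifier='s3-target-pii',  # assumed name
        EndpointType='target',
        EngineName='s3',
        S3Settings={
            'ServiceAccessRoleArn': '<YOUR IAM ROLE ARN>',
            'BucketName': '<YOUR BUCKET NAME>',
            'BucketFolder': 'orcldata',
            'DataFormat': 'parquet',
            'CompressionType': 'none',
            'DatePartitionEnabled': False,
            'EncryptionMode': 'sse-kms',
            'ServerSideEncryptionKmsKeyId': '<ARN of YOUR KMS KEY>',
        },
    )
    print(response['Endpoint']['EndpointArn'])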

Migrate the PII data from the source RDS for Oracle database to Amazon S3 using AWS DMS

To create your database migration task, complete the following steps:

  1. On the AWS DMS console, choose Database migration tasks in the navigation pane.
  2. Choose Create task.
  3. For Task identifier, enter rds-to-s3.
  4. For Replication instance, enter your replication instance.
  5. For Source database endpoint, enter your source Oracle endpoint.
  6. For Target database endpoint, enter your target Amazon S3 endpoint.
  7. For Migration type, choose Migrate existing data.
  8. Select Turn on CloudWatch Logs if monitoring is needed.
    1. Set Target load and Target apply to Debug.

Other detailed debug and log options can be enabled; these are the basic recommended ones that can help troubleshoot most common issues.

  9. For Table mappings, add a new selection rule.
  10. For Schema, enter a schema.
  11. For Source name, enter the source schema in your environment (for this post, we use AWS_TEST_USER).
  12. For Source table name, enter %.
  13. For Action, choose Include.
  14. Choose Create task.
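
If you script your tasks instead, the following boto3 sketch creates and starts an equivalent task; the endpoint and replication instance ARNs are placeholders from the previous steps.

    import json
    import boto3

    dms = boto3.client('dms', region_name='<YOUR REGION NAME>')

    # Selection rule equivalent to the console settings above
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-source-schema",
            "object-locator": {"schema-name": "AWS_TEST_USER", "table-name": "%"},
            "rule-action": "include"
        }]
    }

    task = dms.create_replication_task(
        ReplicationTaskIdentifier='rds-to-s3',
        SourceEndpointArn='<YOUR SOURCE ORACLE ENDPOINT ARN>',
        TargetEndpointArn='<YOUR S3 TARGET ENDPOINT ARN>',
        ReplicationInstanceArn='<YOUR REPLICATION INSTANCE ARN>',
        MigrationType='full-load',  # "Migrate existing data"
        TableMappings=json.dumps(table_mappings),
    )
    task_arn = task['ReplicationTask']['ReplicationTaskArn']

    # Wait until the task finishes creating, then start it
    dms.get_waiter('replication_task_ready').wait(
        Filters=[{'Name': 'replication-task-arn', 'Values': [task_arn]}])
    dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType='start-replication')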

Configure a Macie job and AWS Glue crawler

In this section, we detail the steps to create and run a Macie job and AWS Glue crawler.

Enable Macie to analyze the Parquet files generated by AWS DMS

We enable Macie and set the S3 bucket details so that a jsonl.gz file containing the PII findings is generated. We create a Data Catalog in AWS Glue from this file later.

  1. On the Macie console, choose Discovery results in the navigation pane.
  2. Choose your bucket (optionally, you can create a new bucket if you want to save the PII findings in another location).
  3. For Encryption settings, choose your KMS key.
  4. Choose Save.

KMS key selection is important to make sure you don't encounter a PutClassificationExportConfiguration error that would prevent creating the discovery results.
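
These console actions correspond to two Macie API calls; the following boto3 sketch shows them, with the bucket name and key ARN as placeholders.

    import boto3

    macie = boto3.client('macie2', region_name='<YOUR REGION NAME>')

    # Enable Macie for the account (this call raises an error if Macie is already enabled and can then be skipped)
    macie.enable_macie()

    # Store discovery results (the jsonl.gz findings files) in the bucket, encrypted with the KMS key.
    # Macie writes them under an AWSLogs/ prefix, which the AWS Glue crawler reads later.
    macie.put_classification_export_configuration(
        configuration={
            's3Destination': {
                'bucketName': '<YOUR BUCKET NAME>',
                'kmsKeyArn': '<ARN of YOUR KMS KEY>',
            }
        }
    )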

Configure a Macie job to read Parquet files from Amazon RDS for Oracle

In this step, PII identification is based on managed data identifiers in Macie. If the managed identifiers don't detect all the PII values in your environment, you can create custom identifiers to meet your needs. In this post, we create a custom identifier for passport numbers.

Complete the following steps to create your Macie job:

  1. On the Macie console, choose Jobs in the navigation pane.
  2. Choose Create job.
  3. For Select S3 bucket, enter your bucket, then choose Next.
  4. For Refine the scope, choose One-time job, then choose Next.
  5. Select your data identifiers:
    1. Choose Managed and then Next to use managed identifiers (recommended).
    2. Choose Custom to open a new page to create a custom identifier.
  6. Choose Create.
  7. For Name, enter Passport.
  8. For Regular expression, enter [C]\d{2}[A-Z]{2}\d{4} so that sample data like C53HL5134 is matched.
  9. For Keywords, enter Passport, PASSPORT.
  10. Choose Submit.
  11. Return to the job configuration page and choose Passport, then choose Next.
  12. For Job name, enter a name (for this post, we use rds-pii-macie-findings).
  13. Choose Next.
  14. Review the details, then choose Submit.
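
For reference, a boto3 sketch of the same job setup, creating the Passport custom identifier and a one-time classification job over the bucket. The account ID and bucket name are placeholders; managed data identifiers are applied by default.

    import boto3

    macie = boto3.client('macie2', region_name='<YOUR REGION NAME>')

    # Custom identifier matching sample passport values like C53HL5134
    cdi = macie.create_custom_data_identifier(
        name='Passport',
        regex=r'[C]\d{2}[A-Z]{2}\d{4}',
        keywords=['Passport', 'PASSPORT'],
    )

    macie.create_classification_job(
        jobType='ONE_TIME',
        name='rds-pii-macie-findings',
        customDataIdentifierIds=[cdi['customDataIdentifierId']],
        s3JobDefinition={
            'bucketDefinitions': [{
                'accountId': '<aws account #>',
                'buckets': ['<YOUR BUCKET NAME>'],
            }]
        },
    )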

Create an AWS Glue crawler

We create an AWS Glue crawler to create a Data Catalog for the JSONL file created by Macie.

  1. On the AWS Glue console, expand Data Catalog in the navigation pane and choose Crawlers.
  2. Choose Create crawler.
  3. For Crawler name, enter a name (for this post, we use dms-macie-pii).
  4. Choose Next.
  5. For Data sources and classifiers, choose Add a data source.
  6. For Data source, choose S3.
  7. For S3 path, enter s3://<YOUR BUCKET NAME>/AWSLogs.
  8. Choose Add S3 data source.
  9. In the Configure security settings IAM role section, choose Create new IAM role.
  10. Enter a name for the role (for this post, AWSGlueServiceRole-dms-macie-crawler) and choose Next.
  11. In the Output configuration, Target database section, choose Add database to open a new window.
  12. For Database name, enter the name of your database.
  13. Return to the crawler configuration page, refresh the database names, and choose your database.
  14. For Table name prefix, enter dms-mask-.
  15. Choose Next, then choose Create crawler.
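
The equivalent crawler definition through the AWS Glue API looks like the following sketch; the Glue database name is a placeholder, and you start the crawler only after attaching the KMS decrypt policy described next.

    import boto3

    glue = boto3.client('glue', region_name='<YOUR REGION NAME>')

    glue.create_crawler(
        Name='dms-macie-pii',
        Role='AWSGlueServiceRole-dms-macie-crawler',
        DatabaseName='<YOUR GLUE DATABASE NAME>',
        TablePrefix='dms-mask-',
        Targets={'S3Targets': [{'Path': 's3://<YOUR BUCKET NAME>/AWSLogs'}]},
    )

    # After the inline KMS policy below is attached to the role:
    # glue.start_crawler(Name='dms-macie-pii')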

Add an inline policy to the crawler IAM role

Now you add an inline policy to the IAM role you created (AWSGlueServiceRole-dms-macie-crawler). The policy allows the crawler to decrypt the S3 bucket, which is encrypted by SSE-KMS.

  1. On the IAM console, navigate to the AWSGlueServiceRole-dms-macie-crawler role.
  2. In the Permissions section, choose Add permissions.
  3. Choose Create inline policy.
  4. On the JSON tab, enter the following contents:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": ["<ARN of YOUR KMS KEY>"]
            }
        ]
    }
  5. Choose Review policy.
  6. For Policy name, enter a name (for this post, we use dms-glue-inline-policy).
  7. Choose Create policy.

Run the AWS Glue crawler

To run the crawler, complete the following steps:

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Select the crawler you created (dms-macie-pii).
  3. Choose Run crawler.

Configure Athena to extract PII metadata

Now you’re ready to use Athena to query the PII metadata.

  1. On the Athena console, launch the query editor.
  2. For Data source, choose AwsDataCatalog.
  3. For Database, choose your database.
  4. Before you run your first query, set up the query result location in Amazon S3 as s3://<YOUR BUCKET NAME>/orcldata.
  5. Select Encrypt query results.
  6. For Encryption type, choose SSE_KMS.
  7. Choose your KMS key.
  8. Save your settings.
  9. Create a view to get the subset of columns needed to identify PII:
    create view chk_rds_pii_md_vw as
    with dataset as
    (
      with dataset as
      (
        select
          title,
          severity.description severity,
          "split_part"(resourcesAffected.s3object.key, '/', 2) schema_nm,
          "split_part"(resourcesAffected.s3object.key, '/', 3) table_nm,
          array[classificationdetails.result.sensitiveData[1].detections] as detections
        from "dms-mask-awslogs"
        where (type is not null)
      )
      select severity, title, schema_nm, table_nm, Column_nm
      from dataset
      cross join unnest(detections) as t(Column_nm)
      where table_nm is not null
    )
    select
      schema_nm, table_nm,
      "split_part"(cols.occurrences.records[1].jsonpath, '.', 2) as Column_name,
      severity, title
    from dataset
    cross join unnest(Column_nm) as t(cols);
  10. Query the view that you created from the query editor:
    SELECT * FROM chk_rds_pii_md_vw;

The following screenshot shows our output.

Athena view results
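
If you want to script the query instead of using the console, a boto3 sketch follows; the Glue database name, bucket, and key ARN are placeholders matching the settings above.

    import boto3

    athena = boto3.client('athena', region_name='<YOUR REGION NAME>')

    resp = athena.start_query_execution(
        QueryString='SELECT * FROM chk_rds_pii_md_vw',
        QueryExecutionContext={'Database': '<YOUR GLUE DATABASE NAME>'},
        ResultConfiguration={
            'OutputLocation': 's3://<YOUR BUCKET NAME>/orcldata',
            'EncryptionConfiguration': {
                'EncryptionOption': 'SSE_KMS',
                'KmsKey': '<ARN of YOUR KMS KEY>',
            },
        },
    )
    print(resp['QueryExecutionId'])  # fetch rows later with get_query_results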

Run an AWS DMS replication task to migrate the source data with PII data masked to the target RDS for Oracle database

In this step, we create the final transformation rules in an AWS DMS replication task, using the PII metadata discovered earlier.

  1. On the AWS DMS console, create a new database migration task.
  2. For Task identifier, enter dms-pii-masked-migration.
  3. For Replication instance, enter your replication instance.
  4. Select your source and target Oracle endpoints.
  5. For Migration type, choose Migrate existing data.
  6. Under Task settings, leave the default values or modify them as needed.
  7. Under Table mappings, choose JSON editor and enter the following contents:
    {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "selected",
                "object-locator": {
                    "schema-name": "AWS_TEST_USER",
                    "table-name": "%"
                },
                "rule-action": "include",
                "filters": []
            },
            {
                "rule-type": "transformation",
                "rule-id": "2",
                "rule-name": "dmsssnadd",
                "rule-target": "column",
                "object-locator": {
                    "schema-name": "AWS_TEST_USER",
                    "table-name": "EMPLOYEE"
                },
                "rule-action": "add-column",
                "value": "DMS_SSN",
                "expression": "substr($SSN,9,2)||'6-'||substr($SSN,2,2)||'-'||substr($SSN,1,2)||substr($SSN,9,2)",
                "data-type": {
                    "type": "string",
                    "length": "100",
                    "scale": ""
                }
            },
            {
                "rule-type": "transformation",
                "rule-id": "3",
                "rule-name": "dmsssnadd",
                "rule-target": "column",
                "object-locator": {
                    "schema-name": "AWS_TEST_USER",
                    "table-name": "CUSTOMER"
                },
                "rule-action": "add-column",
                "value": "DMS_SSN",
                "expression": "substr($SSN,9,2)||'6-'||substr($SSN,2,2)||'-'||substr($SSN,1,2)||substr($SSN,9,2)",
                "data-type": {
                    "type": "string",
                    "length": "100",
                    "scale": ""
                }
            },
            {
                "rule-type": "transformation",
                "rule-id": "4",
                "rule-name": "adcolpass",
                "rule-target": "column",
                "object-locator": {
                    "schema-name": "AWS_TEST_USER",
                    "table-name": "EMPLOYEE"
                },
                "rule-action": "add-column",
                "value": "DMS_PASSPORT",
                "expression": "'C'||SUBSTR($PASSPORT,6,2)||SUBSTR($PASSPORT,2,6)",
                "data-type": {
                    "type": "string",
                    "length": 100
                },
                "old-value": null
            },
            {
                "rule-type": "transformation",
                "rule-id": "5",
                "rule-name": "rmcolssn",
                "rule-target": "column",
                "object-locator": {
                    "schema-name": "AWS_TEST_USER",
                    "table-name": "CUSTOMER",
                    "column-name": "SSN"
                },
                "rule-action": "remove-column",
                "value": null,
                "old-value": null
            },
            {
                "rule-type": "transformation",
                "rule-id": "6",
                "rule-name": "rmcolssn",
                "rule-target": "column",
                "object-locator": {
                    "schema-name": "AWS_TEST_USER",
                    "table-name": "EMPLOYEE",
                    "column-name": "SSN"
                },
                "rule-action": "remove-column",
                "value": null,
                "old-value": null
            },
            {
                "rule-type": "transformation",
                "rule-id": "7",
                "rule-name": "rmcolpass",
                "rule-target": "column",
                "object-locator": {
                    "schema-name": "AWS_TEST_USER",
                    "table-name": "EMPLOYEE",
                    "column-name": "PASSPORT"
                },
                "rule-action": "remove-column",
                "value": null,
                "old-value": null
            }
        ]
    }
  8. Create the task and run it.

In the preceding code, the values passed are derived based on the output of the Athena query we ran earlier:

  • schema-name – We use AWS_TEST_USER, which is derived from the value of the schema_nm column.
  • table-name – We use CUSTOMER and EMPLOYEE, which are values of the table_nm column.
  • rule-action – We use remove-column for PASSPORT and SSN, so the original columns that contain PII data aren't included in the target. For this post, we add a rule to remove the passport and SSN columns when migration is complete.
  • rule-action – We use add-column for DMS_PASSPORT and DMS_SSN to add the masked data columns.

The expressions in the transformation rules are examples that show how PII data can be masked in the added columns. Use your own expression logic as needed.
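
To sanity-check the expression logic before running the task, you can replicate Oracle's 1-based SUBSTR semantics locally. A small Python sketch using the sample rows inserted earlier:

    def substr(s, pos, length):
        """Oracle-style SUBSTR with 1-based positions."""
        return s[pos - 1:pos - 1 + length]

    def mask_ssn(ssn):
        # substr($SSN,9,2)||'6-'||substr($SSN,2,2)||'-'||substr($SSN,1,2)||substr($SSN,9,2)
        return substr(ssn, 9, 2) + '6-' + substr(ssn, 2, 2) + '-' + substr(ssn, 1, 2) + substr(ssn, 9, 2)

    def mask_passport(passport):
        # 'C'||SUBSTR($PASSPORT,6,2)||SUBSTR($PASSPORT,2,6)
        return 'C' + substr(passport, 6, 2) + substr(passport, 2, 6)

    print(mask_ssn('690-22-5818'))     # 816-90-6981 -- digits rearranged per the expression
    print(mask_passport('C53HL5134'))  # C5153HL51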

Verify that the task completes without any errors, and verify the result sets of the tables in the target database, where the PII data is masked.

SQL Query Results

Post-migration steps

Now you can complete some post-migration steps.

First, in the target RDS for Oracle database, the new columns with the DMS_ prefix contain the masked PII data. Rename those columns to remove the prefix.

  1. In the SQL client, issue the following DDL commands:
    -- employee table
    Alter table <target_schema>.employee rename column dms_passport to passport;
    Alter table <target_schema>.employee rename column dms_ssn to ssn;

    -- customer table
    Alter table <target_schema>.customer rename column dms_ssn to ssn;

If column order matters for your application, use the invisible option to hide columns and add the columns you need, then revert the invisible columns to visible again, so the column order is preserved.

  2. Reorder the columns with the following steps to preserve the position of the PASSPORT column in the EMPLOYEE table. First, check the column order in the source:
    select COLUMN_NAME, COLUMN_ID from all_tab_cols where table_name = 'EMPLOYEE' and owner = '<source_schema>' order by 2;

The COLUMN_ID for the PASSPORT column, which is 3 in the source table, will have moved to last in the target.

  3. To fix the order back to 3, make the columns from position 3 onwards invisible in the target:
    alter table <target_schema>.employee modify (Home_add invisible, join_date invisible, job_code invisible, manager invisible, birthday invisible, dms_passport invisible, dms_ssn invisible);
  4. Add the passport column to the employee table and verify the column order:
    -- add the column
    alter table <target_schema>.employee add passport varchar2(30);

    -- verify the order
    select COLUMN_NAME, COLUMN_ID from all_tab_cols where table_name = 'EMPLOYEE' and owner = '<target_schema>' order by 2;

Now the COLUMN_ID for the PASSPORT column is 3 when you query all_tab_cols.

  5. Once verified, run the following command to revert the invisible columns back to visible:
    alter table <target_schema>.employee modify (Home_add visible, join_date visible, job_code visible, manager visible, birthday visible, dms_passport visible, dms_ssn visible);
  6. Update the employee table to copy data from the dms_passport column to the newly created passport column, which is in the correct position:
    update <target_schema>.employee e set passport =
    (with s as (select dms_passport, emp_id from <target_schema>.employee) select s.dms_passport
    from s where s.emp_id = e.emp_id);
  7. Remove the dms_passport column after copying its data to the passport column:
    alter table <target_schema>.employee drop column dms_passport;

Clean up

To clean up your resources, delete the database migration tasks so that the masking transformation rules can't be used to reverse engineer the masked PII data:

  1. On the AWS DMS console, choose Database migration tasks in the navigation pane.
  2. Select the task with the PII masking transformation rules (dms-pii-masked-migration).
  3. On the Actions menu, choose Delete.
  4. Select the task rds-to-s3.
  5. On the Actions menu, choose Delete.

You should also delete the other resources you created during this post to avoid incurring further costs.

  1. Delete the AWS DMS replication instance.
  2. Delete the S3 bucket.
  3. Delete the Athena database.
  4. Delete the AWS Glue crawler.
  5. Delete the Macie job.
  6. Delete the associated IAM roles and policies.
  7. Delete the KMS key.

Conclusion

In this post, we showed how to set up Macie, an AWS Glue crawler, and Athena to discover PII data, and how to set up AWS DMS transformation rules that mask the PII data as it's migrated from a source RDS for Oracle database to a target RDS for Oracle database.

For more information on masking data in PostgreSQL, see Data masking using AWS DMS.

As always, AWS welcomes feedback. If you have any comments or questions on this post, please share them in the comments section.


About the Authors

Chithra Krishnamurthy is a Database Consultant with the Professional Services team at Amazon Web Services. She works with enterprise customers to help them achieve their business outcomes by providing technical guidance for database migrations to AWS, including guidance on setting up bi-directional replication.

Mansi Suratwala is a Sr. Database Consultant at Amazon Web Services. She works closely with companies from different domains to provide scalable and secure database solutions in the AWS Cloud. She is passionate about collaborating with customers to achieve their cloud adoption goals.