Amazon AWS Official Blog

Security Inspection and Optimization for Amazon Web Services

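I. Background

Infrastructure protection is the cornerstone of information security and is critical for every enterprise. Its core purpose is to protect the enterprise from threats such as unauthorized access, malicious attacks, and vulnerability exploitation. As digital transformation advances, enterprises rely more and more on cloud computing and network infrastructure, which significantly increases the security risks they face. Customers need to take proactive measures to manage their cloud configurations. The following points highlight why infrastructure protection matters:

  • Preventing data leaks: infrastructure protection keeps sensitive data away from unauthorized parties, protecting the enterprise's trade secrets and its customers' privacy.
  • Guarding against vulnerability exploitation: regular security scans and patch management find and fix vulnerabilities in systems and applications before attackers can use them to break in.
  • Ensuring compliance: many industries are subject to strict data protection and privacy regulations (such as GDPR and HIPAA). Infrastructure protection measures help the enterprise meet these requirements and avoid heavy fines and reputational damage.
  • Strengthening customer trust: a good security record and strong infrastructure protection increase customers' trust in the enterprise, which improves its market competitiveness and brand image.

eCloudrover (伊克罗德) is a services company dedicated to enabling enterprises' digital and intelligent transformation. It provides cloud architecture consulting, project migration, managed hybrid-cloud environments, training, and a variety of cloud adoption solutions. eCloudrover serves thousands of enterprises worldwide in industries including internet, media, gaming, e-commerce and retail, manufacturing, automotive, fintech, and social applications, and is a trusted one-stop cloud solution provider in the global cloud consulting industry.

eCloudrover recognizes the key role that infrastructure protection and secure cloud configuration management play in information security, and has built a comprehensive security protection solution. Using data analysis and advanced security technology, eCloudrover helps customers improve their cloud security and reduce the risk of attacks on their business. Since 2023, eCloudrover has handled multiple security incidents reported by customers, including AK/SK leaks, S3 bucket security issues, and publicly exposed security groups. Most of these issues stem from operations staff lacking cloud security awareness. In response, the eCloudrover after-sales team has drawn up a detailed security inspection and optimization plan to keep customers' cloud environments well protected at all times.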

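II. The eCloudrover AWS Inspection Solution

As an advanced partner of Amazon Web Services, eCloudrover offers an AWS service inspection solution that checks the status and utilization of a customer's AWS resources in a timely manner. The solution collects and analyzes resource utilization and billing data through APIs, and covers commonly used services such as EBS, EC2, VPC, RDS, S3, EKS, ElastiCache, Redshift, and ECS.

Inspection scope

1. Comprehensive security and compliance checks

  • AWS Security Top 10: the inspection is based on the AWS Security Top 10 to verify the security, compliance, and performance of the environment.
  • Identity and access management: IAM user and role permissions are reviewed regularly to enforce the principle of least privilege and reduce the potential attack surface.
  • Logging and monitoring: the CloudTrail configuration is reviewed to confirm that all critical operations are logged and that timely alerting and monitoring are in place.
  • Data protection: the security of EBS data storage, the data protection measures on S3, and the security management of KMS are checked to safeguard data confidentiality and integrity.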
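As an illustration of the least-privilege review mentioned above, the following is a minimal sketch rather than eCloudrover's actual implementation. It assumes an IAM policy document has already been retrieved as a Python dictionary, and it flags only the simplest pattern: an Allow statement whose Action and Resource are both "*". The function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def _as_list(value: Any) -> list:
    """IAM accepts either a single string or a list for Action/Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_overly_broad_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant every action on every resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in _as_list(stmt.get("Action")) and "*" in _as_list(stmt.get("Resource")):
            flagged.append(stmt)
    return flagged


if __name__ == "__main__":
    # Hypothetical policy used only for demonstration.
    sample_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "Admin", "Effect": "Allow", "Action": "*", "Resource": "*"},
            {"Sid": "ReadS3", "Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"},
        ],
    }
    for stmt in find_overly_broad_statements(sample_policy):
        print("Overly broad statement:", stmt.get("Sid", "<no Sid>"))
```

A real review would also need to consider service-level wildcards such as "iam:*", NotAction, and conditions, which this sketch deliberately ignores.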

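2. Resource optimization and cost control

  • Identifying unused resources: regular checks for unused resources (such as idle EBS volumes) help customers optimize their resource configuration and cut unnecessary cost.
  • Billing analysis: Cost Explorer data is used to analyze resource usage, identify resources with abnormal spend, and propose optimizations that avoid waste.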
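The idle EBS volume check above can be sketched as follows. This is an illustrative sketch, not the solution's own code. It assumes the volume list has the shape of the "Volumes" array returned by the EC2 DescribeVolumes API, where an unattached volume has an empty "Attachments" list. The function name and sample IDs are hypothetical.

```python
from __future__ import annotations


def find_unattached_volumes(volumes: list[dict]) -> list[str]:
    """Return the IDs of volumes that have no attachments."""
    return [v["VolumeId"] for v in volumes if not v.get("Attachments")]


if __name__ == "__main__":
    # Hypothetical data shaped like the "Volumes" array of DescribeVolumes.
    sample_volumes = [
        {"VolumeId": "vol-0aaa", "Attachments": [{"InstanceId": "i-0123"}]},
        {"VolumeId": "vol-0bbb", "Attachments": []},
    ]
    for volume_id in find_unattached_volumes(sample_volumes):
        print("Unattached volume:", volume_id)
```

Keeping the check a pure function over already-fetched data makes it easy to run on a schedule against data collected from several accounts and regions.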

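3. Performance monitoring and optimization

  • CPU utilization monitoring: instance CPU utilization is monitored continuously so that users can spot heavily loaded instances and tune performance in time to keep the business running efficiently.
  • Regular inspections and reports: regular inspections and detailed reports give customers a clear view of the health and potential risks of their AWS environment.

4. Customized visual dashboard

  • Intuitive data visualization: with a custom dashboard, customers can view resource usage at a glance. The interface is clean and easy to understand, and it clearly shows how resources change over time.
  • Flexible analysis tools: custom code and charting tools provide a flexible overview of resource usage, helping customers monitor and optimize their AWS resources in real time.

Permission configuration

  1. Using a role: configure a SwitchRole role with ReadOnly permissions;
  2. Using SSO: provide the sso_start_url and a SwitchRole role with ReadOnly permissions;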
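For the role-based option, the following sketch shows one way the trust policy of such a read-only role could be generated. It is an assumption-laden illustration, not the configuration eCloudrover actually requests: the account ID, external ID, and function names are hypothetical. The AWS managed policy ReadOnlyAccess and the sts:AssumeRole action are real, and the partition parameter reflects that AWS China regions use the aws-cn partition.

```python
from __future__ import annotations

import json
from typing import Optional


def managed_read_only_policy_arn(partition: str = "aws") -> str:
    """ARN of the AWS managed ReadOnlyAccess policy in the given partition."""
    return f"arn:{partition}:iam::aws:policy/ReadOnlyAccess"


def build_trust_policy(
    auditor_account_id: str,
    partition: str = "aws",
    external_id: Optional[str] = None,
) -> dict:
    """Build a trust policy that lets the auditing account assume the role."""
    if not (auditor_account_id.isdigit() and len(auditor_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:{partition}:iam::{auditor_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        # An external ID is an optional extra check for cross-account access.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID, not a real auditor account.
    policy = build_trust_policy("111122223333", partition="aws-cn", external_id="example-id")
    print(json.dumps(policy, indent=2))
    print("Attach:", managed_read_only_policy_arn("aws-cn"))
```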

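Eligible customers

All AWS Global and China (CN) region customers

Inspection workflow

Frequency

  • Security configuration checks: at least once every 24 hours
  • Cost anomalies: based on alerts from AWS native cost anomaly detection
  • Idle resources: at least once a week

Benefits

  1. Cloud security configuration is reviewed on a schedule against the AWS Security Top 10
  2. Cloud resources are identified and reviewed to reduce the attack surface
  3. Cloud resource usage is checked regularly to reduce waste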

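III. eCloudrover AWS Inspection Examples

eCloudrover customizes the inspection service to each customer's needs. The following examples show how the service works:

Example 1: Cost optimization

Billing information is retrieved through the API. Services whose cost this month exceeds last month's by more than 10% are identified and displayed by category.

An excerpt of the query is shown below: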

select
  case
    when prev_month.service_name is null then 'arn:' || base_month.partition || ':::' || base_month.account_id || ':cost/' || base_month.service
    else 'arn:' || prev_month.partition || ':::' || prev_month.account_id || ':cost/' || prev_month.service
  end as resource,
  case
    when base_month.cost is null then 'skip'
    when prev_month.cost is null then 'ok'
    -- adjust this value to change threshold for the alarm
    when (prev_month.cost - base_month.cost) between (base_month.cost * 0.1)  and 50 then 'info'
    when (prev_month.cost - base_month.cost) > (base_month.cost * 0.1) and (prev_month.cost - base_month.cost) > $1 then 'alarm'
    else 'ok'
  end as status

Results (figure; detection data from an AWS CN account).
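The month-over-month comparison in Example 1 can also be sketched in Python. This is a simplified illustration and not a translation of the query above: it keeps only the skip, ok, and alarm states, omits the intermediate info tier, and assumes per-service costs have already been collected into dictionaries. All names and sample figures are hypothetical.

```python
from __future__ import annotations


def classify_cost_change(
    earlier_cost: float | None,
    later_cost: float | None,
    ratio: float = 0.10,
    min_increase: float = 0.0,
) -> str:
    """Classify one service's cost change between two months.

    'skip'  : no cost in the earlier month, so there is no baseline.
    'alarm' : the increase exceeds both ratio * earlier_cost and min_increase.
    'ok'    : everything else, including services with no later cost.
    """
    if earlier_cost is None:
        return "skip"
    if later_cost is None:
        return "ok"
    increase = later_cost - earlier_cost
    if increase > earlier_cost * ratio and increase > min_increase:
        return "alarm"
    return "ok"


def classify_services(
    earlier: dict[str, float],
    later: dict[str, float],
    ratio: float = 0.10,
    min_increase: float = 0.0,
) -> dict[str, str]:
    """Apply classify_cost_change to every service seen in either month."""
    services = sorted(set(earlier) | set(later))
    return {
        name: classify_cost_change(earlier.get(name), later.get(name), ratio, min_increase)
        for name in services
    }


if __name__ == "__main__":
    # Hypothetical monthly totals per service.
    last_month = {"Amazon EC2": 100.0, "Amazon S3": 20.0}
    this_month = {"Amazon EC2": 130.0, "Amazon S3": 21.0, "Amazon RDS": 40.0}
    for service, status in classify_services(last_month, this_month).items():
        print(f"{service}: {status}")
```

The min_increase parameter plays the role of an absolute floor, so that a large percentage change on a very small bill does not raise an alarm.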

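Example 2: Graphical output of inspection data

According to the customer's needs, data is imported on a schedule into an internally developed charting tool to generate visual dashboards that give a clear view of resource status. With these charts, customers can analyze AWS resource usage and cost at a glance, quickly identify abnormal resources and potential waste, and improve resource management.

Example 3: The eCloudrover visual monitoring panel

eCloudrover provides a custom dashboard solution based on the customer's AWS resource usage. Code identifies idle resources (such as unused EBS volumes, EIPs, and NAT gateways) and analyzes instance CPU utilization, which helps optimize resource configuration, lower cost, and improve performance. Monitoring CPU utilization lets users spot heavily loaded instances and tune them in time. Charts make the data easy to read and clearly show how resources change.

1. Sample panel

The dashboard has three parts: an overview of basic AWS resources at the top, an analysis section, and a performance and utilization section:

  • Basic resources overview: EBS, EC2, unattached EBS, VPC, RDS, S3.
  • Analysis views: instances by state, region, and account.
  • Performance and utilization: the 10 EC2 instances and the 10 RDS instances with the highest CPU utilization over the past 7 days.

2. Configuration

The example above is a simple dashboard built on basic AWS service resources. Its configuration is as follows: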

mod "local" {
  title = "Insight-mod-ECR"
}
dashboard "dashboard_total_ec2" {
  title = "ECR Dashboard"
  text {
    value = "ECR will use this dashboard to show you account resource usage and help you optimize your observation resource usage!"
  }
  container {
    title = "AWS Basic Resources Overview"

    # Resource count cards
    card {
      query = query.ebs_volume_count
      width = 2
    }
    card {
      query = query.ec2_instance_count
      width = 2
    }
    card {
      query = query.ebs_volume_unattached_count
      width = 2
    }
    card {
      query = query.vpc_count
      width = 2
    }
    card {
      query = query.rds_db_cluster_count
      width = 2
    }
    card {
      query = query.s3_bucket_count
      width = 2
    }
  }
  container {
    title = "Analysis"
    chart {
      title = "Instances by State"
      query = query.ec2_instance_by_state
      type  = "donut"
      width = 4
    }
    chart {
      title = "Instances by Region"
      query = query.ec2_instance_by_region
      type  = "column"
      width = 4
    }
    chart {
      title = "Instances by Account"
      query = query.ec2_instance_by_account
      type  = "column"
      width = 4
    }
  }
  container {
    title = "Performance & Utilization"
    chart {
      title = "EC2 Top 10 CPU - Last 7 days"
      query = query.ec2_top10_cpu_past_week
      type  = "line"
      width = 6
    }
    chart {
      title = "RDS Top 10 CPU - Last 7 days"
      query = query.rds_db_instance_top10_cpu_past_week
      type  = "line"
      width = 6
    }
  }
}

#AWS Basic Resources Overview
query "ebs_volume_count" {
  sql = <<-EOQ
    select
      count(*) as "Volumes"
    from
      aws_ebs_volume;
  EOQ
}
query "ec2_instance_count" {
  sql = <<-EOQ
    select count(*) as "Instances" from aws_ec2_instance
  EOQ
}
query "ebs_volume_unattached_count" {
  sql = <<-EOQ
    select
      count(*) as value,
      'Vol Not In-Use' as label,
      case count(*) when 0 then 'ok' else 'alert' end as "type"
    from
      aws_ebs_volume
    where
      jsonb_array_length(attachments) = 0;
  EOQ
}
query "vpc_count" {
  sql = <<-EOQ
    select count(*) as "VPCs" from aws_vpc;
  EOQ
}
query "rds_db_cluster_count" {
  sql = <<-EOQ
    select count(*) as "DB Clusters" from aws_rds_db_cluster;
  EOQ
}
query "s3_bucket_count" {
  sql = <<-EOQ
    select count(*) as "Buckets" from aws_s3_bucket;
  EOQ
}
#Analysis
query "ec2_instance_by_region" {
  sql = <<-EOQ
    select
      region,
      count(i.*) as total
    from
      aws_ec2_instance as i
    group by
      region
  EOQ
}
query "ec2_instance_by_account" {
  sql = <<-EOQ
    select
      a.title as "Account",
      count(i.*) as "total"
    from
      aws_ec2_instance as i,
      aws_account as a
    where
      a.account_id = i.account_id
    group by
      a.title
    order by
      count(i.*) desc;
  EOQ
}
query "ec2_instance_by_state" {
  sql = <<-EOQ
    select
      instance_state,
      count(instance_state)
    from
      aws_ec2_instance
    group by
      instance_state
  EOQ
}

#Performance & Utilization
query "ec2_top10_cpu_past_week" {
  sql = <<-EOQ
    -- Rank instances by their average daily CPU over the last 7 days
    with top_n as (
      select
        instance_id,
        avg(average) as avg_cpu
      from
        aws_ec2_instance_metric_cpu_utilization_daily
      where
        timestamp >= CURRENT_DATE - INTERVAL '7 day'
      group by
        instance_id
      order by
        avg_cpu desc
      limit 10
    )
    -- Plot the hourly series for those top 10 instances
    select
      timestamp,
      instance_id,
      average
    from
      aws_ec2_instance_metric_cpu_utilization_hourly
    where
      timestamp >= CURRENT_DATE - INTERVAL '7 day'
      and instance_id in (select instance_id from top_n)
    order by
      timestamp;
  EOQ
}
query "rds_db_instance_top10_cpu_past_week" {
  sql = <<-EOQ
    -- Rank DB instances by their average daily CPU over the last 7 days
    with top_n as (
      select
        db_instance_identifier,
        avg(average) as avg_cpu
      from
        aws_rds_db_instance_metric_cpu_utilization_daily
      where
        timestamp >= CURRENT_DATE - INTERVAL '7 day'
      group by
        db_instance_identifier
      order by
        avg_cpu desc
      limit 10
    )
    -- Plot the hourly series for those top 10 DB instances
    select
      timestamp,
      db_instance_identifier,
      average
    from
      aws_rds_db_instance_metric_cpu_utilization_hourly
    where
      timestamp >= CURRENT_DATE - INTERVAL '7 day'
      and db_instance_identifier in (select db_instance_identifier from top_n)
    order by
      timestamp;
  EOQ
}
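The ranking step used by the two top-10 CPU queries above (average each instance's datapoints, then keep the highest N) can be sketched in plain Python. This is only an illustration of the logic. It assumes the metric datapoints are already available as (identifier, value) pairs, and the function name and sample values are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean
from typing import Iterable


def top_n_by_average(datapoints: Iterable[tuple[str, float]], n: int = 10) -> list[str]:
    """Return the identifiers with the highest mean value, best first."""
    values_by_id: dict[str, list[float]] = defaultdict(list)
    for identifier, value in datapoints:
        values_by_id[identifier].append(value)
    averages = {identifier: mean(values) for identifier, values in values_by_id.items()}
    return sorted(averages, key=lambda identifier: averages[identifier], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical daily CPU averages (percent) per instance.
    sample = [("i-a", 10.0), ("i-a", 30.0), ("i-b", 50.0), ("i-c", 5.0)]
    print(top_n_by_average(sample, n=2))
```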

IV. Sample Inspection Checklist

Checks (Security)
1. Accurate account information
Ensure security contact information is registered R
2. Use multi-factor authentication (MFA)
IAM root user MFA should be enabled R
IAM users with console access should have MFA enabled R
IAM administrator users should have MFA enabled R
3. No hard-coding secrets
EC2 auto scaling group launch configurations user data should not have any sensitive data
CloudFormation stacks outputs should not have any secrets R
CodeBuild project plaintext environment variables should not contain sensitive AWS values R
EC2 instances user data should not have secrets R
ECS task definition containers should not have secrets passed as environment variables R
4. Limit security groups
EC2 instances should not be attached to ‘launch wizard’ security groups R
VPC default security group should not allow inbound and outbound traffic R
VPC Security groups should only allow unrestricted incoming traffic for authorized ports R
VPC security groups should restrict ingress from 0.0.0.0/0 or ::/0 to cassandra ports 7199 or 9160 or 8888 R
VPC security groups should restrict ingress from 0.0.0.0/0 or ::/0 to memcached port 11211 R
VPC security groups should restrict ingress from 0.0.0.0/0 or ::/0 to mongoDB ports 27017 and 27018 R
VPC security groups should restrict ingress from 0.0.0.0/0 or ::/0 to oracle ports 1521 or 2483 R
VPC security groups should restrict ingress Kafka port access from 0.0.0.0/0 R
VPC security groups should restrict ingress redis access from 0.0.0.0/0 R
VPC security groups should restrict ingress SSH access from 0.0.0.0/0 R
VPC security groups should restrict ingress TCP and UDP access from 0.0.0.0/0 R
Security groups should not allow unrestricted access to ports with high risk R
5. Intentional data policies
API Gateway REST API endpoint type should be configured to private R
Ensure the S3 bucket CloudTrail logs to is not publicly accessible R
EBS snapshots should not be publicly restorable R
EC2 AMIs should restrict public access R
EC2 instances should not have a public IP address R
ECR repositories should prohibit public access R
EFS file systems should restrict public access R
EKS clusters endpoint should restrict public access R
ELB load balancers should prohibit public access R
EMR public access should be blocked at account level R
KMS CMK policies should prohibit public access R
Lambda functions should restrict public access R
RDS DB instances should prohibit public access R
Redshift clusters should prohibit public access R
S3 bucket policy should prohibit public access R
AWS S3 permissions granted to other AWS accounts in bucket policies should be restricted R
S3 buckets should prohibit public read access R
S3 buckets should prohibit public write access R
S3 public access should be blocked at account level R
S3 public access should be blocked at account and bucket levels R
SNS topic policies should prohibit public access R
SQS queue policies should prohibit public access R
SSM documents should not be public R
6. Centralize CloudTrail logs
At least one multi-region AWS CloudTrail should be present in an account R
At least one trail should be enabled with security best practices R
At least one enabled trail should be present in a region R
7. Validate IAM roles
Ensure that IAM Access analyzer is enabled for all regions R
IAM Access analyzer should be enabled without findings R
IAM roles should not have read only access for external AWS accounts R
IAM roles that have not been used in 60 days should be removed R
IAM role trust policies should prohibit public access R
8. Rotate keys
Ensure there is only one active access key available for any single IAM user R
IAM user access keys should be rotated at least every 90 days R
Checks (Cost)
EC2
Application load balancers having no targets attached should be deleted R
Gateway load balancers having no targets attached should be deleted R
EC2 instances should not use older generation t2, m3, and m4 instance types R
Network load balancers having no targets attached should be deleted R
EBS
EBS volumes attached to stopped instances should be reviewed R
Are there any EBS volumes with low usage? R
Still using gp2 EBS volumes? Should use gp3 instead R
Are there any unattached EBS volumes? R
VPC
Unattached elastic IP addresses (EIPs) should be released R
Unused NAT gateways should be deleted R
RDS
Are there RDS instances using previous gen instance types? R
RDS DB instances with a low number of connections per day should be reviewed R
S3
Buckets should have lifecycle policies R
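To illustrate how one item from the checklist above might be evaluated ("VPC security groups should restrict ingress SSH access from 0.0.0.0/0"), here is a minimal sketch. It is not the inspection tool's implementation. It assumes the input has the shape of the "SecurityGroups" array returned by the EC2 DescribeSecurityGroups API, and the function name and sample group IDs are hypothetical.

```python
from __future__ import annotations

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def groups_open_to_world(security_groups: list[dict], port: int = 22) -> list[str]:
    """Return IDs of groups that allow TCP ingress on `port` from anywhere."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            if not OPEN_CIDRS.intersection(cidrs):
                continue  # this rule is not open to the whole internet
            protocol = perm.get("IpProtocol")
            if protocol == "-1":
                exposed = True  # "-1" means all protocols and all ports
            elif protocol == "tcp":
                exposed = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
            else:
                exposed = False
            if exposed:
                findings.append(group["GroupId"])
                break  # one matching rule is enough to flag the group
    return findings


if __name__ == "__main__":
    # Hypothetical data shaped like the "SecurityGroups" array.
    sample_groups = [
        {"GroupId": "sg-open", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []}]},
        {"GroupId": "sg-office", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "203.0.113.0/24"}], "Ipv6Ranges": []}]},
    ]
    print(groups_open_to_world(sample_groups))
```

Passing a different port (for example 11211 for memcached) covers the other port-specific items in the same part of the checklist.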

V. Sample Inspection Alert Notification

Check name: Ensure MFA is enabled for the root account
Security dimension: Identity and access control
Check ID: check113
Resource type: IAM
Threat description: The root account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. When virtual MFA is used for root accounts, it is recommended that the device used is NOT a personal device but rather a dedicated mobile device (tablet or phone) that is kept charged and secured independently of any individual's personal devices ("non-personal virtual MFA"). This lessens the risk of losing access to the MFA due to device loss or trade-in, or if the individual owning the device is no longer employed at the company.
Mitigation: Using the IAM console, navigate to Dashboard and expand Activate MFA on your root account.
Reference: https://docs.thinkwithwp.com/IAM/latest/UserGuide/id_root-user.html#id_root-user_manage_mfa
Affected resources: us-east-1: MFA is not ENABLED for root account
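The finding in the sample alert above could be derived as in the following sketch. It is an assumption about how such a check might work, not the tool's actual code. It relies on the "SummaryMap" returned by the IAM GetAccountSummary API, whose AccountMFAEnabled entry is 1 when the root user has MFA enabled and 0 otherwise. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def root_mfa_finding(summary_map: dict, region: str = "us-east-1") -> Optional[str]:
    """Return a finding line if root MFA is disabled, otherwise None."""
    if summary_map.get("AccountMFAEnabled", 0) == 1:
        return None
    return f"{region}: MFA is not ENABLED for root account"


if __name__ == "__main__":
    # Hypothetical SummaryMap excerpt for an account without root MFA.
    finding = root_mfa_finding({"AccountMFAEnabled": 0})
    if finding:
        print(finding)
```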

VI. References

https://thinkwithwp.com/cn/blogs/security/top-10-security-items-to-improve-in-your-aws-account

https://docs.thinkwithwp.com/zh_cn/IAM/latest/UserGuide/getting-started-roles.html

https://docs.thinkwithwp.com/zh_cn/singlesignon/latest/userguide/using-the-portal.html

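If you are interested in this solution, you can get in touch as follows:

Contact email: tech-support@ecloudrover.com

About the authors

Jessie Liu

Liu Panpan is an MSP after-sales engineer at eCloudrover, responsible for the operation, support, and maintenance of the company's PLES & MSP customers. She holds the SAP, DAS, ANS, and MLS certifications, among others. She specializes in analyzing customers' spending and account security and providing hardening recommendations, and is committed to using tools to help AWS customers improve efficiency and save cost.

Harris Han

Harris Han is a Partner Technical Account Manager at Amazon Web Services. He is responsible for architecture and cost optimization, technical support, and services for partners' enterprise customers, and works to promote AWS applications and enterprise services in China and globally. He has extensive design and hands-on experience in product deployment, network security, cloud desktops, server virtualization, and enterprise operations management.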