AWS DevOps & Developer Productivity Blog
Tighten your package security with CodeArtifact Package Origin Control toolkit
Introduction
AWS CodeArtifact is a fully managed artifact repository service that makes it easy for organizations to securely store and share software packages used for application development. On July 14, 2022, we introduced a new feature called Package Origin Controls, which allows customers to protect themselves against “dependency substitution” or “dependency confusion” attacks.
This class of supply chain attacks can be carried out when an attacker with knowledge of an organization’s internally published package names (for example: Sample-Package=1.0.0) is able to publish such name(s) in a public repository. Package managers contain dependency resolution logic that pulls the latest version of a package. The attacker abuses this logic by publishing a high version number of a package with the same name as the organization’s package (for example: Sample-Package=99.0.0). The package manager would then resolve any requests for that package by pulling the attacker’s package version with malicious code instead of the internally published dependency.
In order for this type of attack to be successful, the organization must source their package versions from both internal and remote repositories at the same time. For example, your pip installation could be configured with multiple package indexes, both internal and external; or, as a CodeArtifact user you may have both the repository containing your private packages as well as an external connection to PyPI in the upstream graph of your current repository. In either case, the package manager is able to obtain package versions from more than one source. This causes the package manager to resolve the higher version number from the remote repository, instead of the trusted internal version.
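The resolution behavior described above can be illustrated with a minimal sketch. This is not how pip or any real package manager is implemented; `resolve_latest`, `parse_version`, and the in-memory `sources` dictionary are hypothetical names used only to show why mixing sources lets the higher public version win.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version such as '1.0.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def resolve_latest(
    package: str, sources: Dict[str, Dict[str, List[str]]]
) -> Tuple[str, str]:
    """Return (source_name, version) for the highest version in any source.

    Simplified stand-in for a package manager's "pick the latest" logic.
    """
    candidates = [
        (parse_version(version), source_name, version)
        for source_name, index in sources.items()
        for version in index.get(package, [])
    ]
    if not candidates:
        raise LookupError(f"{package} not found in any source")
    _, source_name, version = max(candidates)
    return source_name, version


# Hypothetical indexes: one internal, one public that the attacker controls.
sources = {
    "internal": {"Sample-Package": ["1.0.0"]},
    "public": {"Sample-Package": ["99.0.0"]},
}

# With both sources visible, the attacker's version is selected.
mixed = resolve_latest("Sample-Package", sources)

# With only the internal source visible, the trusted version is selected.
internal_only = resolve_latest(
    "Sample-Package", {"internal": sources["internal"]}
)
```

Restricting the set of sources a package can come from, which is what origin controls do on a per-package basis, removes the attacker's candidate from the comparison entirely.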
A few strategies can be used to mitigate this kind of mixing. A simple one is to instruct the package manager to only source from an internal repository. While effective, this is often not practical, because it either significantly degrades developer experience or requires a lot of effort to set up, maintain, and vet external dependencies. Another mitigation consists of using explicit version pinning, which is also effective, though it might re-introduce the dependency substitution risk if dependencies are upgraded without manual vetting. Some package managers also support namespaces or other types of dependency scoping, which are also helpful in preventing this class of attacks, but even when available they may not always be actionable for existing packages due to the large amount of work required to do the renaming.
CodeArtifact is adding another tool to strengthen your software supply chain by introducing per-package, per-repository controls which allow you to more precisely configure and control how package versions are sourced. For each package in your repository, you are now able to decide whether to allow or block sourcing versions from both upstream sources and direct publishing. These flags enable you to prevent mixed-version scenarios for all the types of packages supported by CodeArtifact without the need for additional package manager configuration.
There are two common scenarios of particular note that are now addressed by CodeArtifact by default:
- If a brand-new package is first created by downloading (i.e., retaining) a package version from an upstream into a repository, then the package has its publish flag set to BLOCK and its upstream flag set to ALLOW in the repository. This way, customers won’t be able to inadvertently publish new versions but will be able to continue getting new versions from upstreams.
- If a brand-new package is first created by publishing a package version into a repository, then the package has its upstream flag set to BLOCK and its publish flag set to ALLOW in the repository. With this, customers won’t be getting new versions from upstreams but will continue to be able to publish new versions of the package into the repository.
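The two default scenarios can be summarized as a small lookup. The sketch below only models the rule as stated in this post; `FirstVersionSource`, `default_restrictions`, and `allows_mixing` are hypothetical names, and the real defaults are applied by the CodeArtifact service, not by client code.

```python
from enum import Enum
from typing import Dict


class FirstVersionSource(Enum):
    """How the very first version of a package arrived in the repository."""

    UPSTREAM = "upstream"  # retained from an upstream repository
    PUBLISH = "publish"    # published directly into the repository


def default_restrictions(first_source: FirstVersionSource) -> Dict[str, str]:
    """Model the default origin configuration described in the post."""
    if first_source is FirstVersionSource.UPSTREAM:
        # External package: keep pulling from upstream, forbid publishing.
        return {"publish": "BLOCK", "upstream": "ALLOW"}
    # Internal package: keep publishing, forbid versions from upstream.
    return {"publish": "ALLOW", "upstream": "BLOCK"}


def allows_mixing(restrictions: Dict[str, str]) -> bool:
    """True when versions can arrive both by publishing and from upstreams."""
    return (
        restrictions["publish"] == "ALLOW"
        and restrictions["upstream"] == "ALLOW"
    )
```

Neither default leaves both flags set to ALLOW, which is the mixed state a dependency confusion attack relies on.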
We began setting these default origin configurations for any packages created since around May 2022. However, we wanted to avoid breaking existing customer workflows as much as possible. We thus set the origin configuration for any packages older than that to continue allowing both acquiring new versions from upstreams, and publishing new versions. Note that this means that such packages aren’t currently protected from dependency confusion attacks.
Should you want to leverage this feature to tighten the security posture of your existing packages, we are releasing a toolkit to make it easier to bulk-set policy values in your repositories. This blog post describes how to use it.
Solution overview
The purpose of the Package Origin Control toolkit is to provide repository administrators with an easy way to set Origin Control policies in bulk on packages that have not received the default protection because they pre-date the feature’s release. This can be achieved by blocking upstream versions for internal packages. In this blog post we will focus on this use case, though the toolkit also supports blocking the publishing of package versions, which avoids a potentially vulnerable mixed state for external packages as well.
The toolkit consists of two scripts: a first one called generate_package_configurations.py for creating a manifest file listing the packages in a domain alongside their proposed origin configuration to apply, and a second one named apply_package_configurations.py that reads the manifest file and applies the configuration within.
generate_package_configurations.py can operate on a whole repository, or on a subset of packages (specified either via filters or through a list), and supports two origin control resolution modes:
- A manual one where you supply the origin configuration you would like to set for all packages in scope. This is a good option if, for instance, you already maintain a list of internal packages, or if they are published in a consistent internal namespace which allows for them to be easily selected.
- An automated one, which tries to identify which packages should have their upstreams blocked by analyzing the upstream repository graph and external connections, looking for evidence that package versions are only available from the repository at hand. In that case, it determines that sourcing of upstream versions can be disabled without risk of breaking builds. This is a good option if you want a quick way to tighten your security posture without having to manually analyze your whole repository.
With the manifest created, apply_package_configurations.py takes it as an input and effects the changes specified in it by calling the new PutPackageOriginConfiguration API. Precisely because it is meant to set these values in bulk, this script supports backup and revert operations by default, as well as dry-run and step-by-step confirmation options. If you identify an issue after applying origin control changes, you will be able to safely revert to the original, working configuration before trying again.
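For readers curious about what a single PutPackageOriginConfiguration call looks like from Python, here is a minimal sketch. It is not the toolkit's code: `block_upstream` and its `dry_run` parameter are hypothetical, and the function assumes it is handed a boto3 CodeArtifact client (for example one created with `boto3.client("codeartifact", region_name="us-west-2")`), whose `put_package_origin_configuration` method maps to this API.

```python
from typing import Any, Dict, Optional


def block_upstream(
    client: Any,
    domain: str,
    repository: str,
    package_format: str,
    package: str,
    namespace: Optional[str] = None,
    dry_run: bool = False,
) -> Dict[str, Any]:
    """Block upstream versions for one package while still allowing publishing.

    `client` is assumed to be a boto3 CodeArtifact client. The request that
    was (or would have been) sent is returned so callers can log it.
    """
    request: Dict[str, Any] = {
        "domain": domain,
        "repository": repository,
        "format": package_format,
        "package": package,
        "restrictions": {"publish": "ALLOW", "upstream": "BLOCK"},
    }
    if namespace:
        request["namespace"] = namespace
    if not dry_run:
        # One API call per package; a bulk tool would loop over its manifest.
        client.put_package_origin_configuration(**request)
    return request
```

Passing the client in as a parameter keeps the function easy to exercise with a stand-in object before pointing it at a real repository, which is the same concern the toolkit's dry-run option addresses.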
In this blog post we will cover how to use these tools:
- To block package versions from upstream sources for all recommended packages in a repository
- To block upstreams for a list of packages you already have
- To revert to the original state in case of an incorrect configuration push
Prerequisites
The following prerequisites are required before you begin:
- Set up the Package Origin Control toolkit as described in the README on GitHub. You will need a working installation of Python 3.6 or later as well as the ability to install dependencies like the Python AWS SDK. The AWS CLI is not required.
- Write permissions on the CodeArtifact repository where you want to add package origin controls (see this link for additional info.)
Procedures
To block package versions from upstream sources for all recommended packages in a repository
Introduction
This procedure should be considered if you have a CodeArtifact repository with a variety of package formats and upstreams, and you want to have the toolkit automatically resolve what packages are safe to block upstreams for. It will block acquisition of new versions from upstreams only if two conditions are met:
- the target repository doesn’t have access to an external connection
- no versions of the package are available via any of its upstream repositories (either because the target repository itself doesn’t have any upstreams or because none of the upstreams have the package).
Therefore, we assume there isn’t an immediate External Connection attached to the target repository for the package format(s) you are trying to run this script against (because in that case the script would fall back on leaving things as-is for all packages).
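The two conditions above amount to a conservative decision rule, sketched below. This is an illustration of the rule as described in this post, not the toolkit's actual implementation; `PackageEvidence`, its fields, and `recommend_restrictions` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PackageEvidence:
    """Facts gathered about one package in the target repository."""

    # True if the target repository can reach an external connection.
    external_connection_reachable: bool
    # Number of versions of this package visible in upstream repositories.
    upstream_version_count: int


def recommend_restrictions(
    evidence: PackageEvidence, current: Dict[str, str]
) -> Dict[str, str]:
    """Block upstreams only when both conditions from the post hold."""
    if evidence.external_connection_reachable:
        return dict(current)  # versions may come from outside: leave as-is
    if evidence.upstream_version_count > 0:
        return dict(current)  # an upstream serves this package: leave as-is
    # Versions are only available locally, so blocking upstreams is safe.
    return {**current, "upstream": "BLOCK"}
```

When either condition fails, the rule changes nothing, which matches the fallback behavior described for repositories with an external connection attached.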
Steps
- Make sure you have completed the required prerequisites described above
- Identify the target repository in your domain you want to automatically apply origin controls for, e.g. myrepo
- Identify the query parameters that define the list of packages you want to target. The script supports the same filters as the ListPackages API.
  - To match all packages in a repo, run:
    python generate_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo
    Please note that you always need to specify the AWS region and CodeArtifact domain alongside the repository.
  - To match only some packages in a repo, for example only Python packages whose name begins with “internal_software_”:
    python generate_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --format pypi --prefix internal_software_*
- If necessary, you can review the produced manifest file, which you will be able to find in the same folder under the name origin-configuration_mydomain_myrepo.csv (unless you have specified a different filename and path via the --output-file option)
- If the manifest file looks correct, you can apply the changes by calling the second stage:
  python apply_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --input origin-configuration_mydomain_myrepo.csv
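Since the filters in the steps above mirror the ListPackages API, it can help to see how that API is typically paged through from Python. The sketch below is an assumption-laden illustration rather than the toolkit's code: `iter_packages` is a hypothetical helper, and `client` is assumed to be a boto3 CodeArtifact client whose `list_packages` method accepts `format` and `packagePrefix` filters and returns `packages` plus an optional `nextToken`.

```python
from typing import Any, Dict, Iterator, Optional


def iter_packages(
    client: Any,
    domain: str,
    repository: str,
    package_format: Optional[str] = None,
    prefix: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield package summaries from ListPackages, following pagination."""
    kwargs: Dict[str, Any] = {"domain": domain, "repository": repository}
    if package_format:
        kwargs["format"] = package_format
    if prefix:
        kwargs["packagePrefix"] = prefix
    while True:
        page = client.list_packages(**kwargs)
        yield from page.get("packages", [])
        token = page.get("nextToken")
        if not token:
            break  # last page reached
        kwargs["nextToken"] = token
```

Comparing the names this yields against the rows of the generated manifest is one way to sanity-check that the filters selected the packages you intended before applying any changes.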
To block package versions from upstream sources for all packages in a repository matching a list you maintain
Introduction
This procedure should be considered if you have a set of packages within your repository you know you want to apply origin control restrictions to. Rather than relying on a query, you can use such a list as an input to create a manifest.
Steps
Create a file containing a list of package names (and package names only). Multiple namespaces and formats are not supported and you will need to re-run this procedure for each. The expected file format is one package name per line. In this example we will want to block upstreams for three packages of the npm format (format is always mandatory when specifying a list of packages). As an example, a small input file is going to look something like this:
requests
numpy
django
(more information about this option can be found in the README)
Generate the manifest by supplying this file to the first stage, alongside the desired origin control configuration. In this case, we want to block upstreams for all packages in the supplied list (for more information about the origin control configuration string, consult the documentation):
python generate_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --format npm --from-list inputfile.csv --set-restrictions publish=ALLOW,upstream=BLOCK
If necessary, you can review the produced manifest file, which you will be able to find in the same folder under the name origin-configuration_mydomain_myrepo.csv (unless you have specified a different filename and path via the --output-file option)
Run the apply_package_configurations.py script to update the package origin controls in your repository:
python apply_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --input origin-configuration_mydomain_myrepo.csv
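Before running the steps above, it can be useful to validate the two inputs locally: the one-name-per-line package list and the restriction string. The sketch below is a standalone illustration and not the toolkit's own parser; `read_package_names` and `parse_restrictions` are hypothetical helpers, and the accepted keys and values are inferred from the publish=ALLOW,upstream=BLOCK example in this post.

```python
from typing import Dict, Iterable, List

VALID_KEYS = {"publish", "upstream"}
VALID_VALUES = {"ALLOW", "BLOCK"}


def read_package_names(lines: Iterable[str]) -> List[str]:
    """Return unique, non-empty package names, one per input line."""
    names: List[str] = []
    for line in lines:
        name = line.strip()
        if name and name not in names:
            names.append(name)
    return names


def parse_restrictions(text: str) -> Dict[str, str]:
    """Parse a string such as 'publish=ALLOW,upstream=BLOCK' into a dict."""
    restrictions: Dict[str, str] = {}
    for pair in text.split(","):
        key, separator, value = pair.strip().partition("=")
        if not separator or key not in VALID_KEYS or value not in VALID_VALUES:
            raise ValueError(f"invalid restriction: {pair!r}")
        restrictions[key] = value
    return restrictions


# Example with in-memory lines; a real file could be read with open(path).
packages = read_package_names(["requests\n", "numpy\n", "\n", "django\n"])
restrictions = parse_restrictions("publish=ALLOW,upstream=BLOCK")
```

Catching a typo such as upstream=BLOCKED at this stage is cheaper than discovering it after a manifest has been generated.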
To revert to the original state in case of an incorrect configuration push
Introduction
Erroneously bulk-changing your origin configuration can lead to broken builds and confusing failure modes for developers. To mitigate this risk, the toolkit backs up the existing configuration before making any changes and lets you easily revert them if need be.
Steps
Identify the manifest containing the configuration you want to revert. This is the file produced by the first stage, which by default takes a name like origin-configuration_[domain]_[repository], for example origin-configuration_mydomain_myrepo.csv. The toolkit automatically creates a backup file for every input manifest you provide.
Run the second stage script in restore mode:
python apply_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --input origin-configuration_mydomain_myrepo.csv --restore
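The backup-then-restore idea can be sketched for a single package as follows. This is only an illustration of the pattern, not how the toolkit stores or replays its backups; `backup_restrictions` and `restore_restrictions` are hypothetical helpers, and `client` is assumed to be a boto3 CodeArtifact client whose `describe_package` response carries the current restrictions under `package.originConfiguration.restrictions`.

```python
from typing import Any, Dict


def backup_restrictions(
    client: Any, domain: str, repository: str, package_format: str, package: str
) -> Dict[str, str]:
    """Read the current origin restrictions so they can be restored later."""
    response = client.describe_package(
        domain=domain,
        repository=repository,
        format=package_format,
        package=package,
    )
    return dict(response["package"]["originConfiguration"]["restrictions"])


def restore_restrictions(
    client: Any,
    domain: str,
    repository: str,
    package_format: str,
    package: str,
    saved: Dict[str, str],
) -> None:
    """Write previously saved restrictions back to the package."""
    client.put_package_origin_configuration(
        domain=domain,
        repository=repository,
        format=package_format,
        package=package,
        restrictions=saved,
    )
```

Taking the snapshot before any write is what makes the revert safe: the saved values reflect the last known working configuration rather than whatever the failed push left behind.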
Conclusion
In this blog post we have explained how to use the Origin Control toolkit to improve package security, focusing on restricting upstream package versions. We have demonstrated both an automated repository-wide application of the toolkit, which tries to minimize the amount of repository administrator work by applying a restriction heuristic, as well as a manual mode where a repository administrator can effect fine-grained origin control changes. Finally, we showed how these changes can be reverted using the built-in backup feature.