Command Injection / Remote Code Execution (RCE) via Insecure Deserialization in _load_ccd_pickle_cached() of chemical_components.py in AlphaFold 3 (v3.0.1) - (github.com/google-deepmind/alphafold3)
- Author: Joshua Provoste
- Website: https://joshuaprovoste.com
- Company: https://wl-inc.cl
AlphaFold 3 is an open-source project from Google DeepMind that predicts the three-dimensional structure of proteins and other biomolecular complexes. Its core is an AI model that infers interactions among proteins, DNA, RNA, and ligands, which marks a technical leap in computational structural biology. A complementary web service, AlphaFold Server, simplifies use without local installation, but the project's real contribution is the release of its code and architecture, which lets the scientific community study, validate, and adapt it.
The chemical_components.py module encapsulates the PDB's "Chemical Components Dictionary" (CCD): it loads a pickle of constants, exposes it as an immutable, cached mapping (Ccd), allows injecting/merging a user CCD in mmCIF format, and provides a structured conversion (mmcif_to_info) to a dataclass (ComponentInfo) with useful fields like name, type, formula, molecular weight, and SMILES. In this way it standardizes access to chemical metadata without relying on repetitive I/O or risking accidental mutation.
It also offers query utilities: component_name_to_info caches lookups by residue name, and type_symbol resolves an atom's elemental symbol given a component and its atom_id, returning '?' when missing. In short, it serves as a constants layer plus helpers to convert and query CCD entries in pipelines that need coherent, performant chemical information.
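The lookup behaviour described above can be illustrated with a minimal sketch over a plain dictionary. The function type_symbol_sketch and the CCD_EXCERPT data are hypothetical stand-ins written for this example; they are not AlphaFold 3's implementation, and they only assume the two per-atom tags named in this report.

```python
from collections.abc import Mapping, Sequence

# Hypothetical excerpt of a CCD entry; real entries carry many more tags.
CCD_EXCERPT: Mapping[str, Mapping[str, Sequence[str]]] = {
    "HOH": {
        "_chem_comp_atom.atom_id": ("O", "H1", "H2"),
        "_chem_comp_atom.type_symbol": ("O", "H", "H"),
    },
}


def type_symbol_sketch(
    ccd: Mapping[str, Mapping[str, Sequence[str]]],
    res_name: str,
    atom_name: str,
) -> str:
    """Returns the element symbol of an atom, or '?' if it cannot be resolved."""
    component = ccd.get(res_name)
    if component is None:
        return "?"
    try:
        # atom_id and type_symbol are parallel sequences: same index, same atom.
        index = list(component["_chem_comp_atom.atom_id"]).index(atom_name)
        return component["_chem_comp_atom.type_symbol"][index]
    except (KeyError, ValueError, IndexError):
        return "?"


print(type_symbol_sketch(CCD_EXCERPT, "HOH", "H1"))  # known atom
print(type_symbol_sketch(CCD_EXCERPT, "HOH", "XX"))  # unknown atom
```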
The vulnerable line, return pickle.loads(f.read()), reads all the bytes from the ccd.pickle file (opened in binary mode) and deserializes them with pickle.loads(...) to reconstruct in memory the CCD constants dictionary (a dict[str, Mapping[str, Sequence[str]]]) used by the module.
Since the loader is decorated with @functools.cache, that object is loaded only once per process and reused on subsequent accesses; functionally it’s equivalent to pickle.load(f), except here it uses the "loads" variant on the full buffer.
@functools.cache
def _load_ccd_pickle_cached(
    path: os.PathLike[str],
) -> dict[str, Mapping[str, Sequence[str]]]:
  """Loads the CCD pickle file and caches it so that it is only loaded once."""
  with open(path, 'rb') as f:
    return pickle.loads(f.read())
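Why this call is sensitive: a pickle stream can name a callable and its arguments, and pickle.loads invokes that callable while reconstructing the object. The following minimal sketch shows the mechanism with a deliberately harmless callable (sorted); the class name Demo is made up for this example and does not come from the project.

```python
import pickle


class Demo:
    """Object whose pickle stream asks the loader to call a function."""

    def __reduce__(self):
        # pickle stores (callable, args); on load it calls callable(*args).
        # sorted() is a harmless stand-in for an arbitrary callable.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Demo())

# No Demo instance comes back: the loader runs sorted([3, 1, 2]) instead
# and returns whatever that call produced.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loader never checks that the result is the expected dictionary before the call has already run, which is why the contents of ccd.pickle must be trusted.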
The binary file ccd.pickle provides the data loaded by the module. Its path is defined through resources.filename(resources.ROOT / 'constants/converters/ccd.pickle'). In the Ccd class, the constructor (__init__) passes this path to _load_ccd_pickle_cached(path), which opens the file in binary mode and returns the deserialized object using pickle.loads(f.read()).
The deserialized value is the dictionary of CCD constants, typed as dict[str, Mapping[str, Sequence[str]]]. It is stored on disk as a binary .pickle file, which pickle.loads reconstructs into the in-memory mapping.
This file must reside under the package's resource hierarchy at constants/converters/ccd.pickle, relative to resources.ROOT. This allows the module to locate it by default and cache it via @functools.cache for efficient reuse in functions such as Ccd, component_name_to_info, and type_symbol. Optionally, users may inject a custom CCD (in mmCIF text format) through the user_ccd parameter to override or extend the default dataset without modifying the pickle itself.
Line 34 of chemical_components.py expects the pickle to deserialize to a dict[str, Mapping[str, Sequence[str]]]: a dictionary whose keys are component IDs (e.g., "ARG", "HOH") and whose values map mmCIF tags to tuples of str. It must include at least the fields read by the module:
_chem_comp.name, _chem_comp.type, _chem_comp.mon_nstd_parent_comp_id, _chem_comp.pdbx_synonyms, _chem_comp.formula, and _chem_comp.formula_weight. To resolve per-atom elements it also needs _chem_comp_atom.atom_id and _chem_comp_atom.type_symbol. SMILES descriptors are optional: _pdbx_chem_comp_descriptor.type, _pdbx_chem_comp_descriptor.descriptor, and _pdbx_chem_comp_descriptor.program.
Minimal, realistic example that is compatible with the loader:
# Object that pickle.loads(f.read()) must reconstruct:
ccd_dict = {
    "ARG": {
        "_chem_comp.name": ("ARGININE",),
        "_chem_comp.type": ("L-PEPTIDE LINKING",),
        "_chem_comp.mon_nstd_parent_comp_id": ("?",),
        "_chem_comp.pdbx_synonyms": ("L-Arginine",),
        "_chem_comp.formula": ("C6 H14 N4 O2",),
        "_chem_comp.formula_weight": ("174.204",),
        # Optional descriptors (they allow pdbx_smiles to be populated):
        "_pdbx_chem_comp_descriptor.type": ("SMILES_CANONICAL",),
        "_pdbx_chem_comp_descriptor.descriptor": ("N[C@@H](CCCNC(=N)N)C(=O)O",),
        "_pdbx_chem_comp_descriptor.program": ("OpenEye OEToolkits",),
        # Required by type_symbol():
        "_chem_comp_atom.atom_id": ("N", "CA", "CB", "CG", "CD", "NE", "C", "O"),
        "_chem_comp_atom.type_symbol": ("N", "C", "C", "C", "C", "N", "C", "O"),
    },
    "HOH": {
        "_chem_comp.name": ("WATER",),
        "_chem_comp.type": ("NON-POLYMER",),
        "_chem_comp.mon_nstd_parent_comp_id": (".",),
        "_chem_comp.pdbx_synonyms": ("H2O",),
        "_chem_comp.formula": ("H2 O1",),
        "_chem_comp.formula_weight": ("18.015",),
        "_chem_comp_atom.atom_id": ("O", "H1", "H2"),
        "_chem_comp_atom.type_symbol": ("O", "H", "H"),
    },
}
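Because the expected object is built only from dictionaries, tuples, and strings, one possible hardening is an unpickler that refuses every global lookup, following the "Restricting Globals" pattern from the Python pickle documentation. This is a sketch under assumptions, not something the project does: NoGlobalsUnpickler, safe_loads, and CallsSorted are hypothetical names, and the approach only works if the real ccd.pickle contains no types beyond these built-ins.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that rejects any stream that references a global."""

    def find_class(self, module, name):
        # Plain dict/tuple/str data never reaches this method; callables
        # and custom classes always do, so refusing here blocks them.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Deserializes data made only of built-in containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Plain CCD-shaped data loads normally.
plain = pickle.dumps({"HOH": {"_chem_comp.name": ("WATER",)}})
print(safe_loads(plain))


class CallsSorted:
    """Harmless stand-in for a stream that asks the loader to call a function."""

    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


# A stream that names a callable is rejected before the callable runs.
try:
    safe_loads(pickle.dumps(CallsSorted()))
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print("rejected:", err)
```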
- After cloning the official repository (https://github.com/google-deepmind/alphafold3), I used the docker/Dockerfile to deploy a persistent container in which the command injection into the operating system can be reproduced.
docker build -t alphafold3 -f docker/Dockerfile .
docker run -d --name af3 -w /workspace alphafold3 sleep infinity
docker exec -it af3 bash
- Now, to verify that we can indeed serialize and deserialize a ccd.pickle file and place it where it is expected by default (/constants/converters/ccd.pickle), we will run ccd_pickle_generator.py inside the Docker container, and then execute the verifier load_and_print_ccd_pickle.py.
python ccd_pickle_generator.py
python load_and_print_ccd_pickle.py
- Finally, to reproduce the command injection into the operating system, we will remove the constants directory (and all its contents) created in the previous step, run ccd_payload_generator.py inside the Docker container, and then run the verifier load_and_print_ccd_pickle.py. The generator creates a ccd.pickle file that performs a curl (HTTP request) when the verifier deserializes it:
python ccd_payload_generator.py
python load_and_print_ccd_pickle.py
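The benign round trip from the second step (write a well-formed ccd.pickle, then read it back the way the loader does) can be sketched as follows. This is not the ccd_pickle_generator.py or load_and_print_ccd_pickle.py used in this report, whose contents are not reproduced here; write_ccd_pickle and read_ccd_pickle are hypothetical helpers, and the file is written to a temporary directory rather than the package path.

```python
import pickle
import tempfile
from pathlib import Path

# Reduced CCD-shaped payload; see the fuller example earlier in this report.
ccd_dict = {
    "HOH": {
        "_chem_comp.name": ("WATER",),
        "_chem_comp.type": ("NON-POLYMER",),
        "_chem_comp_atom.atom_id": ("O", "H1", "H2"),
        "_chem_comp_atom.type_symbol": ("O", "H", "H"),
    },
}


def write_ccd_pickle(ccd: dict, path: Path) -> None:
    """Serializes the dictionary to path, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(ccd, protocol=pickle.HIGHEST_PROTOCOL))


def read_ccd_pickle(path: Path) -> dict:
    """Mirrors the loader: read the whole file, then pickle.loads it."""
    with open(path, "rb") as f:
        return pickle.loads(f.read())


with tempfile.TemporaryDirectory() as tmp:
    # Same relative layout the module expects under its resource root.
    target = Path(tmp) / "constants" / "converters" / "ccd.pickle"
    write_ccd_pickle(ccd_dict, target)
    loaded = read_ccd_pickle(target)

print(loaded == ccd_dict)
```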
Given that I'm working on Windows and inside a Docker container session, I cannot open calc.exe to demonstrate that it is possible to execute arbitrary commands on the host operating system, but it is important to emphasize that ccd_payload_generator.py is customizable to run anything.
That said, I am attaching another PoC to create a marker:
If the default ccd.pickle file, which is deserialized automatically, can be replaced or overwritten by a malicious actor, the vulnerability enables remote code execution in the context of any process that consumes that artifact. In CI/CD environments this translates to compromised build agents, pipelines, and deployment containers. In cloud environments (for example, Google Cloud) it can expose mounted credentials, allow API calls, and enable lateral movement within the project. In the software supply chain, a poisoned artifact can propagate to multiple builds and deployments, introduce persistent backdoors, and evade detection until it is deserialized, greatly expanding the scope and severity of the incident.
Consider the following scenario. An attacker gains access to the CI pipeline (for example via leaked credentials or a compromised dependency) and replaces the ccd.pickle artifact in the artifact repository or in the directory the build uses to package the image. When the pipeline runs, build agents or test containers automatically deserialize that pickle, executing the embedded payload and achieving RCE on the agent with whatever permissions it holds (often service account credentials or temporary Google Cloud mounts). With those credentials the attacker can enumerate projects, retrieve secrets, call Google Cloud APIs to create instances or exfiltrate data, and also replace the same ccd.pickle in published images or artifacts, propagating the compromise downstream into production deployments. In short, a simple file modification becomes a supply-chain vector that enables privilege escalation, persistence, and mass contagion across CI/CD and cloud infrastructure.
ccd.pickle is the file the project loads by default (packaged at constants/converters/ccd.pickle) and is deserialized automatically when Ccd() is initialized. If that default artifact is of poor provenance or quality — i.e., mutable, unsigned, stored in repositories or locations with lax permissions, or not integrity-checked before consumption — the likelihood of a successful attack rises dramatically: an adversary who can replace or overwrite that single file (in the repo, an artifact bucket, or inside an image) causes the normal software load to execute arbitrary code on the agents that consume it, enabling RCE and propagation through CI/CD and deployments. Conversely, if the default ccd.pickle is treated as immutable, signed/verified, protected by strict access controls, and managed as a trusted artifact, the attack vector is effectively mitigated — the difference between a “plastic” file and a hardened binary is the line between theoretical risk and practical exploitation.
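One way to integrity-check the artifact before consumption is to pin its SHA-256 digest and refuse to deserialize anything else. The sketch below is illustrative only: load_pickle_if_digest_matches is a hypothetical helper, the project does not ship such a check as far as this report shows, and the pinned digest is assumed to be distributed through a channel the attacker cannot modify.

```python
import hashlib
import hmac
import pickle
import tempfile
from pathlib import Path


def load_pickle_if_digest_matches(path: Path, expected_sha256: str):
    """Deserializes path only if its SHA-256 equals the pinned digest."""
    data = Path(path).read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"digest mismatch for {path}: got {actual}")
    # Deserialize the exact bytes that were hashed, not a second read of
    # the file, so the file cannot be swapped between check and load.
    return pickle.loads(data)


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "ccd.pickle"
    artifact.write_bytes(pickle.dumps({"HOH": {"_chem_comp.name": ("WATER",)}}))

    # In practice the digest would be recorded at release time.
    pinned = hashlib.sha256(artifact.read_bytes()).hexdigest()
    loaded = load_pickle_if_digest_matches(artifact, pinned)

    # Any modification of the file changes the digest and blocks the load.
    artifact.write_bytes(artifact.read_bytes() + b"tampered")
    try:
        load_pickle_if_digest_matches(artifact, pinned)
        tamper_detected = False
    except ValueError:
        tamper_detected = True

print(loaded, tamper_detected)
```

A digest check only proves that the bytes are the pinned ones; it does not make pickle itself safe, so it complements strict access controls on the artifact rather than replacing them.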
This type of vulnerability is commonly exploited in the context of Software Supply Chain Attacks, where an unknown vulnerability is leveraged to poison or compromise services.
In this regard, it directly applies to the following types of Software Supply Chain Attacks:
- Software Supply Chain Compromise → when the attack modifies legitimate components in repositories, pipelines, or dependencies.
- Malicious Package Injection → when an actor directly uploads malicious code to an ecosystem such as npm, PyPI, or RubyGems.
- CI/CD Pipeline Compromise → when integration and deployment systems are manipulated to insert malicious code into the build process.
For example, this has been observed in well-known cases such as Ultralytics / PyPI (2024–2025), Comm100 (2022), GitHub Actions attack on "tj-actions / changed-files" and "reviewdog / action-setup" (2025), and the GitHub Actions Supply Chain Attack (2025, widespread).
- https://www.wiz.io/blog/shai-hulud-2-0-ongoing-supply-chain-attack
- https://cybersierra.co/blog/prevent-cicd-supply-chain-attacks/
- https://about.gitlab.com/blog/gitlab-discovers-widespread-npm-supply-chain-attack/
- https://www.crowdstrike.com/en-us/blog/new-supply-chain-attack-leverages-comm100-chat-installer/
- https://www.reversinglabs.com/blog/compromised-ultralytics-pypi-package-delivers-crypto-coinminer
- https://unit42.paloaltonetworks.com/github-actions-supply-chain-attack/



