Pickle in the Middle: Google's Vertex AI Let a Stranger Squat Your Bucket and Run Code Inside Google's Cloud

Patrick Duggan
Jun 17

Here is the uncomfortable part of the Vertex AI flaw that Palo Alto Networks Unit 42 disclosed this week: to poison your machine-learning model and run code inside Google's own serving infrastructure, an attacker needed no access to your project, no stolen credentials, and no phishing email. They needed to know your project ID — which is frequently public — and to own any Google Cloud project with a billing account attached. That's the whole cost of entry. Unit 42 named it "Pickle in the Middle," and it is one of the cleanest demonstrations I've seen of a theme we've been hammering all year: the AI pipeline is an attack surface, and the surface is the boring plumbing nobody is watching.

A note on timing, because honesty is the brand: this is a disclosure of an already-patched bug, not a live fire. Google shipped the first fix on March 31 and completed it on April 15. Unit 42 reports no active exploitation. So this is not a "rotate everything tonight" post; it's an "understand the shape, because the shape is going to come back" post. The mechanism here is reusable, and the next vendor to derive an infrastructure name from a guessable value won't necessarily catch it in March.

The bug, in plain terms

Many Python ML models are serialized with pickle or joblib, and pickle is not a data format; it's a program. A crafted __reduce__ method lets the author of a pickle name any callable and its arguments, and joblib.load() invokes that callable the instant the file is loaded, before any type checking happens. Everyone in machine learning has been told this for years. The Vertex AI flaw is interesting because it found a way to feed you a hostile pickle without ever touching your account.
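
To make the "pickle is a program" point concrete, here is a deliberately harmless sketch using only the standard library. The Payload class and the arithmetic expression are illustrative stand-ins, not anything from Unit 42's proof-of-concept; a real payload would name a far less friendly callable.

```python
import pickle


class Payload:
    """Illustrative stand-in for a 'model' object; not from the Unit 42 PoC."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # The callable and its arguments are chosen by whoever wrote the file.
        return (eval, ("6 * 7",))


# Serializing stores the instruction, not a Payload instance.
blob = pickle.dumps(Payload())

# Loading executes the instruction. No Payload object ever comes back.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loader never gets a chance to inspect what it received, because the call has already happened by the time pickle.loads() returns.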

The vulnerable workflow lived in the google-cloud-aiplatform SDK for Python, versions 1.139.0 and 1.140.0. When you called Model.upload() without explicitly naming a staging bucket, the SDK constructed one from a predictable formula: your project ID, the string "-vertex-staging-", and your region. So a project named my-project in us-central1 got a staging bucket named, deterministically, my-project-vertex-staging-us-central1. Then came the fatal line. The SDK checked whether that bucket existed — and if it did, it used it. It never checked who owned it.
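
The naming formula is simple enough to sketch in a few lines. This is a reconstruction from the description above, not the SDK's actual source, and the function names and region list are assumptions made for illustration.

```python
def default_staging_bucket(project_id: str, region: str) -> str:
    """Reconstructs the predictable name described above (not the SDK's code)."""
    return f"{project_id}-vertex-staging-{region}"


def candidate_buckets(project_id: str, regions: list[str]) -> list[str]:
    """Every name an outsider could derive from a known project ID."""
    return [default_staging_bucket(project_id, region) for region in regions]


# Hypothetical inputs: a public project ID and a handful of common regions.
names = candidate_buckets("my-project", ["us-central1", "europe-west4"])
print(names)
```

Nothing in the inputs is secret, which is the whole problem: anyone who knows the project ID can produce the same list.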

That gap is the entire vulnerability. Because the bucket name is derivable from public information, an attacker could create it first, in their own project, before you ever ran an upload.

How the squat becomes code execution

The attack runs in six moves. First, the attacker squats the bucket: they predict your staging bucket name, create it in their own project, and open it up to allAuthenticatedUsers with reader, object-creator, and object-viewer roles so your service agent can read and write to it. Second, they arm it: a Cloud Function triggered on the storage object-finalize event sits waiting to detect the moment a model lands. Third, you upload your model the normal way, with no custom staging bucket specified, exactly as the SDK documentation showed. Fourth, and this is the elegant, ugly part, the attacker's Cloud Function wins a race. Unit 42's proof-of-concept timed it: the victim uploaded a 601-byte model file, the function detected it about 804 milliseconds later, replaced it with a 2,945-byte malicious payload at 1,433 milliseconds, and Google's service agent read the poisoned file at 2,460 milliseconds. The swap landed about a second before that read, comfortably inside the roughly 2.5-second window. Fifth, you deploy your model to an endpoint, unaware anything was swapped. Sixth, the serving container calls joblib.load(), the __reduce__ payload fires inside Google's serving infrastructure, queries the GCE metadata server, and exfiltrates a service account token carrying the cloud-platform scope.
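
The race in the fourth move comes down to simple arithmetic on the reported timestamps. The sketch below uses the figures quoted above and assumes they are all offsets from the victim's upload at time zero; the event names are labels chosen here, not Unit 42's.

```python
# Milliseconds after the victim's upload, as quoted from the proof-of-concept.
timeline_ms = {
    "victim_upload": 0,          # assumption: the upload is the zero point
    "function_detects": 804,
    "payload_written": 1433,
    "service_agent_reads": 2460,
}


def swap_margin_ms(timeline: dict[str, int]) -> int:
    """How long the poisoned file sat in place before the service agent read it."""
    return timeline["service_agent_reads"] - timeline["payload_written"]


def attacker_wins(timeline: dict[str, int]) -> bool:
    """The swap succeeds only if the payload lands before the read."""
    return timeline["payload_written"] < timeline["service_agent_reads"]


reaction_ms = timeline_ms["payload_written"] - timeline_ms["function_detects"]
print(swap_margin_ms(timeline_ms), reaction_ms, attacker_wins(timeline_ms))
```

On these numbers the function needed well under a second to react, and the payload was in place with over a second to spare.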

From there it's cross-tenant. That stolen service-account token let the attacker steal other models deployed in the same project, enumerate BigQuery datasets and read their ACLs, and pull Cloud Logging data that reveals GKE cluster names, container image URIs, and the shape of your internal infrastructure. The poisoned pickle was the foot in the door; the metadata-server credential was the run of the house.

The fix, and the lesson under it

Google's remediation came in two steps. Version 1.144.0 on March 31 inserted a random uuid4 into the staging bucket name, which breaks the attacker's ability to predict it. Version 1.148.0 on April 15 added the real fix: bucket-ownership verification inside Model.upload(), so the SDK now confirms you own the bucket before trusting it. If you are pinned anywhere below 1.144.0, upgrade — and if you can get to 1.148.0 or later, do that, because randomized names are a speed bump and ownership verification is the actual lock.
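
For anyone auditing pinned dependencies, the two thresholds above can be turned into a rough triage helper. This is a minimal sketch that takes the version numbers from this post at face value; the function names and the wording of the labels are assumptions, and the parser only understands plain X.Y.Z prefixes.

```python
import re

# Thresholds as stated in this post.
FIRST_FIX = (1, 144, 0)   # randomized staging bucket name
FULL_FIX = (1, 148, 0)    # bucket-ownership verification


def parse_version(text: str) -> tuple[int, ...]:
    """Extracts the leading X.Y.Z from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def staging_bucket_posture(sdk_version: str) -> str:
    """Classifies an SDK version against the two fixes described above."""
    version = parse_version(sdk_version)
    if version < FIRST_FIX:
        return "upgrade: below the first fix"
    if version < FULL_FIX:
        return "partial: randomized name, no ownership check"
    return "patched: ownership verification present"


print(staging_bucket_posture("1.140.0"))
print(staging_bucket_posture("1.146.2"))
print(staging_bucket_posture("1.148.0"))
```

In a real environment the installed version string can be read with importlib.metadata.version("google-cloud-aiplatform") and passed straight in.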

CVE attribution here is still murky: Unit 42's writeup of the model-upload hijack carried no CVE at publication, while related Vertex AI SDK issues from the same window have been tracked publicly under separate identifiers. We'll update if a clean mapping lands. We cap our certainty at 95%, and this mapping sits in the other 5%.

The durable lesson has nothing to do with Google specifically. Any time a system derives the name of a trust boundary (a bucket, a queue, a registry path, a topic) from a value an outsider can guess, "it exists, so I'll use it" is a vulnerability waiting for someone to create the resource first. Existence is not ownership. We've watched this exact failure mode play out for years in npm scope takeovers, dangling-subdomain takeovers, and S3 bucket squatting; Vertex AI just proved it reaches all the way into a hyperscaler's managed ML serving plane. The defensive instinct is the same one we apply to our own infrastructure: never trust that a named resource is yours because it answers. Verify the owner, every time.
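
The difference between "it exists" and "it is mine" fits in one small function. This is a provider-agnostic sketch, not Google's fix: lookup_owner is a hypothetical callable standing in for whatever metadata call returns the owning project, and the dictionary below is fake data.

```python
from typing import Callable, Optional


def resolve_staging_bucket(
    bucket_name: str,
    expected_owner: str,
    lookup_owner: Callable[[str], Optional[str]],
) -> str:
    """Decides what to do with a derived bucket name.

    lookup_owner is a hypothetical hook: it returns the owning project,
    or None when the bucket does not exist.
    """
    owner = lookup_owner(bucket_name)
    if owner is None:
        return "create"   # nobody holds the name yet: create it ourselves
    if owner != expected_owner:
        return "refuse"   # it answers, but it is someone else's
    return "use"          # exists and is verifiably ours


# Fake registry for illustration: one squatted name, one legitimate one.
registry = {
    "my-project-vertex-staging-us-central1": "attacker-project",
    "my-project-vertex-staging-europe-west4": "my-project",
}

for name in ("my-project-vertex-staging-us-central1",
             "my-project-vertex-staging-europe-west4",
             "my-project-vertex-staging-asia-east1"):
    print(name, resolve_staging_bucket(name, "my-project", registry.get))
```

The vulnerable behavior corresponds to dropping the middle branch, so that any bucket that answers is treated as usable.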
