flake: gen - Install Protoc invalid zip (GitHub Releases partial download) #1202

@flake-investigator

Description

CI Run Link: https://github.com/coder/coder/actions/runs/20274781010
Branch: main
Commit: 089b67761ad7b8f66404a0b3ac61f62b9cec0b74 (author: Mathias Fredriksson) — coder/coder@089b677
Timing: Failures occurred within minutes of the Slack alert (same run/day).

Failure summary

  • Workflow job: gen
  • Step: Install Protoc
  • Root cause classification: Infrastructure (external artifact download from GitHub Releases)

Key evidence (from job logs)

mkdir -p /tmp/proto
pushd /tmp/proto
curl -L -o protoc.zip https://github.com/protocolbuffers/protobuf/releases/download/v23.4/protoc-23.4-linux-x86_64.zip
unzip protoc.zip
...
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    92  100    92    0     0   2885      0 --:--:-- --:--:-- --:--:--  2967
End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive.
Archive:  protoc.zip
unzip:  cannot find zipfile directory in one of protoc.zip or protoc.zip.zip, and cannot find protoc.zip.ZIP, period.
##[error]Process completed with exit code 9.

Related evidence from the same run (corroborating infra flake)

  • test-go-pg (windows-2022) failed to download Arsenal Image Mounter driver files due to repeated 504s from GitHub:
curl: (22) The requested URL returned error: 504
Warning: Problem : HTTP error. Will retry in 5 seconds. 5 retries left.
...
##[error]Process completed with exit code 22.
  • test-go-race-pg failed Terraform provider installation with:
Error while installing coder/coder v2.13.1: ... please try again later: 504 Gateway Timeout returned from github.com

(Tracked in existing issue below.)

Root cause

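  • External artifact fetches (GitHub Releases) intermittently returned invalid content or 504 Gateway Timeout. In the gen job, curl received only 92 bytes for protoc.zip and still exited successfully, so the failure surfaced in unzip (exit code 9) rather than in the download itself.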
  • Not a product code/test flake; no panic/OOM/data race signatures.

Related issues

Ownership / assignment analysis

  • The step lives in .github/workflows/ci.yaml under the gen job (Install Protoc). This is CI infra ownership.
  • Recent substantive CI maintenance has been by @kacpersaw and @ethanndickson.
  • Assigning to @kacpersaw for triage of CI download reliability in gen.

Mitigations to consider

  • Add robust retries and validation to the protoc download step (e.g., curl --retry-all-errors --retry 5 --retry-delay 2 --fail; unzip -t before install; verify checksum and retry on failure).
  • Consider mirroring protoc or using a package manager/cache.
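
A minimal sketch of the first mitigation, assuming the step keeps using bash, curl, and unzip. The helper names is_valid_zip and fetch_zip are hypothetical and do not exist in ci.yaml today; --retry-all-errors requires curl 7.71.0 or newer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: returns 0 only if FILE is a non-empty, readable zip archive.
is_valid_zip() {
  local file="$1"
  # -s: file exists and is non-empty; unzip -tq tests archive integrity quietly.
  [ -s "$file" ] && unzip -tq "$file" >/dev/null 2>&1
}

# Hypothetical helper: download URL to DEST, retrying until the archive validates.
# Usage: fetch_zip URL DEST [ATTEMPTS]
fetch_zip() {
  local url="$1" dest="$2" attempts="${3:-5}" i
  for ((i = 1; i <= attempts; i++)); do
    # --fail makes curl exit non-zero on HTTP errors instead of saving the
    # error body to DEST, which is what let a 92-byte file reach unzip.
    if curl --fail --location --silent --show-error \
         --retry 5 --retry-delay 2 --retry-all-errors \
         --output "$dest" "$url" \
       && is_valid_zip "$dest"; then
      return 0
    fi
    echo "attempt $i/$attempts: download or validation failed" >&2
    rm -f "$dest"
    # Pause between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep 2
    fi
  done
  return 1
}
```

In the Install Protoc step this could be called as fetch_zip https://github.com/protocolbuffers/protobuf/releases/download/v23.4/protoc-23.4-linux-x86_64.zip protoc.zip before running unzip protoc.zip. The outer loop is there because curl's own --retry does not cover a response that succeeds at the HTTP level but carries a body that is not a valid archive.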

Reproduction / next steps

  • Re-running the workflow typically succeeds (the failure is transient).
  • Update the Install Protoc step in ci.yaml to include retries and validation as above.
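
The checksum part of the validation could look roughly like the sketch below. It assumes sha256sum is available on the runner; verify_sha256 is a hypothetical helper, and the expected digest would have to be pinned by whoever updates the step, since this report does not record one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
# Usage: verify_sha256 FILE EXPECTED_HEX_DIGEST
verify_sha256() {
  local file="$1" expected="$2" actual
  # Refuse an empty expectation so a missing pin can never pass by accident.
  [ -n "$expected" ] || return 1
  # sha256sum prints "<digest>  <path>"; keep only the digest field.
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  [ "$actual" = "$expected" ]
}
```

A call such as verify_sha256 protoc.zip "$PROTOC_SHA256" (PROTOC_SHA256 being an assumed variable holding the pinned digest) would sit between the download and unzip, so a truncated or substituted file fails the step with a clear cause instead of an unzip error.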
