Conversation
Signed-off-by: oliver könig <okoenig@nvidia.com>
Greptile OverviewGreptile SummaryThis PR refactors the NGC PyTorch image build workflow to fix several structural issues. The changes include:
Critical Issue Remaining: The The action logic correctly handles the optional use of these parameters (checking if Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant GHA as GitHub Actions
participant PreFlight as pre-flight job
participant BuildWheels as build_wheels job
participant BuildNGC as build_wheels_for_ngc job
participant Action as build-pytorch-wheel action
participant Release as GitHub Release
alt Release Event
User->>GHA: Publish Release
GHA->>PreFlight: Trigger with release event
PreFlight->>PreFlight: Build matrix (release mode)
PreFlight->>PreFlight: Run check_for_ngc_images.sh
PreFlight->>PreFlight: Create ngc-images-matrix with os, release-version, container-image
else Workflow Dispatch
User->>GHA: Manually trigger with inputs
GHA->>PreFlight: Trigger with workflow_dispatch
PreFlight->>PreFlight: Build matrix (inputs mode)
PreFlight->>PreFlight: Create ngc-images-matrix from inputs
end
PreFlight->>BuildWheels: Pass build-wheel-matrix
PreFlight->>BuildNGC: Pass ngc-images-matrix
alt NGC Image Not Provided
BuildWheels->>Action: Call with all params (python-version, cuda-version, etc.)
Action->>Action: Build custom Docker image from scratch
Action->>Action: Build wheel in container
Action->>BuildWheels: Return wheel
BuildWheels->>Release: Upload wheel as release asset
end
alt NGC Image Provided
BuildNGC->>Action: Call with release-version + base-image only
Note over BuildNGC,Action: ⚠️ Missing required params:<br/>python-version, cuda-version,<br/>cudnn-version, torch-version,<br/>cxx11_abi, aarch
Action->>Action: Pull NGC Docker image
Action->>Action: Build wheel in NGC container
Action->>BuildNGC: Return wheel
BuildNGC->>Release: Upload wheel as release asset
end
|
| uses: ./.github/actions/build-pytorch-wheel | ||
| id: build-pytorch-wheel | ||
| with: | ||
| release-version: ${{ inputs.release-version }} |
There was a problem hiding this comment.
In release events, inputs.release-version is undefined. Should use matrix.release-version instead.
| release-version: ${{ inputs.release-version }} | |
| release-version: ${{ matrix.release-version }} |
| id: build-pytorch-wheel | ||
| with: | ||
| release-version: ${{ inputs.release-version }} | ||
| base-image: ${{ matrix.container-image }} |
There was a problem hiding this comment.
Missing required parameters: python-version, cuda-version, cudnn-version, torch-version, cxx11_abi, and aarch. The action at .github/actions/build-pytorch-wheel/action.yml:12-32 requires these inputs when base-image is not provided. Since NGC images may not have these pre-configured, ensure they're either in the matrix or passed explicitly.
Signed-off-by: oliver könig <okoenig@nvidia.com>
| MATRIX=$(echo '{ | ||
| "os": ["ubuntu-22.04", "ubuntu-22.04-arm"], | ||
| "release-version": ["${{ inputs.release-version }}"], | ||
| "container-image": "$(cat ngc_images.json | jq -cr)" |
There was a problem hiding this comment.
JSON syntax error: container-image should be an array. The current command substitution with double quotes will produce invalid JSON.
| "container-image": "$(cat ngc_images.json | jq -cr)" | |
| "container-image": $(cat ngc_images.json | jq -cr) |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
| with: | ||
| release-version: ${{ matrix.release-version }} | ||
| base-image: ${{ matrix.container-image }} |
There was a problem hiding this comment.
Missing required action inputs. The action at .github/actions/build-pytorch-wheel/action.yml:12-32 marks python-version, cuda-version, cudnn-version, torch-version, cxx11_abi, and aarch as required: true. Even though they're not used when base-image is provided, GitHub Actions will fail validation without them.
Either:
- Change the action definition to mark these as
required: false, OR - Pass placeholder values here (e.g., from the matrix or hardcoded defaults)
Description
Please include a brief summary of the changes, relevant motivation and context.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: