Fix data processing workflow artifact handling and build-resources branch commits #507

Closed
Copilot wants to merge 3 commits into master from copilot/fix-data-processing-robustness

Contributor

Copilot AI commented Nov 9, 2025

Description

Fixes artifact fallback and build-resources branch issues in data processing workflows.

Problem: When the data-processing artifact was missing, the deploy workflows fell back to an incomplete inline processing step, so the deployed site was missing data. Commits to the build-resources branch omitted files that are part of the artifact (data/, contributor-analysis/, citation_chart.webp) and did not pick up untracked files.

Solution:

  • deploy.yaml & staging-aggregate.yaml: Replace inline fallback with workflow trigger + retry pattern

    • Trigger data-processing workflow when artifact download fails
    • Poll for completion (15min timeout, 30s interval)
    • Retry artifact download, error if still missing
  • data-processing.yml: Include all artifact files in build-resources commits

    • Copy all 6 artifact paths (previously only 2-3)
    • Use git add -A to track untracked files
    • Force push (--force) to ensure reliable updates
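
The trigger-and-retry flow in the first bullet can be sketched roughly as follows. This is an illustration in Python, not the workflow YAML from this PR: `fetch_artifact_with_retry` and its callables (`download`, `trigger`, `get_run`) are hypothetical names, and the run dictionary with `status` and `conclusion` keys is an assumed shape modelled on what GitHub reports for workflow runs. The 15 minute timeout and 30 second interval are the values listed above; the 10 second initial delay comes from the follow-up commit on this PR.

```python
import time
from typing import Callable, Optional


def fetch_artifact_with_retry(
    download: Callable[[], bool],
    trigger: Callable[[], None],
    get_run: Callable[[], Optional[dict]],
    timeout_s: float = 15 * 60,
    interval_s: float = 30,
    initial_delay_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Download the artifact; on failure trigger processing, wait, retry once."""
    if download():
        return  # artifact was already available, nothing else to do

    trigger()
    sleep(initial_delay_s)  # give the triggered run time to show up

    deadline = clock() + timeout_s
    while True:
        # Assumed shape: {"status": ..., "conclusion": ...}, or None if no run yet
        run = get_run()
        if run is not None and run.get("status") == "completed":
            if run.get("conclusion") != "success":
                raise RuntimeError(
                    f"data processing ended with {run.get('conclusion')!r}"
                )
            break
        if clock() >= deadline:
            raise TimeoutError("data processing did not complete in time")
        sleep(interval_s)

    # Second and final attempt: fail loudly instead of processing inline
    if not download():
        raise RuntimeError("artifact still missing after data processing completed")
```

Passing `sleep` and `clock` as parameters is only a convenience for exercising the loop without real waiting; in a workflow step the same logic would more likely live in a shell loop.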

Fixes #503

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Not tested yet

Checklist for Content Editors and Non-Developers

  • The content is clear, accurate, and follows community guidelines.
  • All updated content has been previewed on the staging site.
  • All links, references, and formatting have been checked for correctness.
  • The change aligns with the overall style and communication goals.
  • No broken links in text/content

Checklist for Developers:

  • I have attempted to stay aligned to related code in this repository rather than reinventing the wheel.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.

Additional Notes

Validation:

  • YAML syntax verified for all modified workflows
  • Edge cases analyzed (timeouts, race conditions, missing files)
  • First few workflow runs should be monitored to verify behavior
Original prompt

This section details the original issue you should resolve

<issue_title>Fix data processing robustness issues</issue_title>
<issue_description>Currently, we have a data processing action that runs weekly / on request and leaves an artifact with the dynamic data (Google Scholar, curated resources, tenzing). That data is then used by deploy actions.

However, if that action fails to upload an artefact, the deploy action has an incomplete fallback which leads to missing data. That redundancy is not ideal. We need to change the deploy action to trigger the data processing if the artefact is missing/cannot be loaded, and then try again to get the artefact. If that still fails, the action should error. Same for the action that consolidates PRs for staging (staging-aggregate).

Also, the data processing action currently tries to commit the created files to a new branch, yet that fails due to an issue with untracked files. That needs to be reviewed so that all files that are included in the artefact are force-pushed into that branch.</issue_description>
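
One way to read the last requirement is sketched below in Python, under stated assumptions: `copy_artifact_paths`, `artifact_dir` and `worktree` are hypothetical names, and the list of paths is supplied by the caller because the workflow's full list is not shown in this PR (the description names only data/, contributor-analysis/ and citation_chart.webp). The function copies each listed file or directory into the branch checkout and reports anything missing, so that a later `git add -A` and force push cover everything that was in the artefact.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_artifact_paths(
    artifact_dir: Path, worktree: Path, rel_paths: Iterable[str]
) -> List[str]:
    """Copy each listed file or directory into the worktree.

    Returns the relative paths that were not found in the artefact,
    so the caller can decide whether to warn or fail.
    """
    missing = []
    for rel in rel_paths:
        src = artifact_dir / rel
        dst = worktree / rel
        if src.is_dir():
            # Merge into an existing directory instead of failing on it
            shutil.copytree(src, dst, dirs_exist_ok=True)
        elif src.is_file():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        else:
            missing.append(rel)
    return missing
```

Returning the missing paths rather than raising is a design choice for the sketch; a stricter variant could raise as soon as an expected path is absent.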

Comments on the Issue (you are @copilot in this section)



Copilot AI self-assigned this Nov 9, 2025
Copilot AI and others added 2 commits November 9, 2025 14:48
- Deploy workflow: Trigger data-processing and retry artifact download on failure
- Staging-aggregate workflow: Apply same robust artifact handling
- Data-processing workflow: Fix build-resources branch to include all artifact files

Co-authored-by: LukasWallrich <60155545+LukasWallrich@users.noreply.github.com>
- Add 10s delay before polling to ensure triggered workflow appears
- Check both status and conclusion for better reporting
- Show workflow status during polling for better visibility

Co-authored-by: LukasWallrich <60155545+LukasWallrich@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix data processing robustness issues Fix data processing workflow artifact handling and build-resources branch commits Nov 9, 2025
Copilot AI requested a review from LukasWallrich November 9, 2025 14:54
@github-actions github-actions bot added the cicd Relevant to GitHub workflows label Nov 19, 2025
@richarddushime richarddushime deleted the copilot/fix-data-processing-robustness branch February 10, 2026 16:18
