
Conversation

@dsikka
Collaborator

@dsikka dsikka commented Dec 17, 2025

SUMMARY:

  • Switching the collator from truncation to default, plus shuffling, seems to address the regression we're seeing in lm-eval
  • Given that the recovery values in these tests were determined using these settings, I think they should be how we evaluate our lm-eval tests for the time being
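The settings change described above could look roughly like this in the e2e test utility (a sketch, not the actual diff; the `oneshot_kwargs` keys come from the PR description, everything else is illustrative):

```python
from transformers import DefaultDataCollator

# Sketch of the fix: instead of a truncation-based collator, pass the plain
# DefaultDataCollator and shuffle the calibration samples, matching the
# settings under which the recovery thresholds were originally determined.
oneshot_kwargs = {
    "data_collator": DefaultDataCollator(),
    "shuffle_calibration_samples": True,
}
```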

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello @dsikka, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a regression encountered in lm-eval by adjusting the data processing configuration. The core change involves setting the data_collator to DefaultDataCollator and enabling the shuffling of calibration samples within the oneshot_kwargs, which collectively addresses the observed issue and ensures correct evaluation behavior.

Highlights

  • Regression Fix: Addresses a regression observed in lm-eval by modifying the data collator setup.
  • Data Collator Configuration: The data_collator in oneshot_kwargs is now explicitly set to DefaultDataCollator().
  • Calibration Sample Shuffling: Enabled shuffling of calibration samples by setting shuffle_calibration_samples to True within oneshot_kwargs.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a regression in lm-eval by changing the data collator used during the oneshot process. The changes update the e2e testing utility to use DefaultDataCollator and to enable shuffling of calibration samples. My review identifies a significant issue: the data_collator is unconditionally overwritten, which would break tests for datasets that use a custom collator. I've provided a suggestion that only sets the default collator if one isn't already present, ensuring that existing specialized test configurations for datasets like 'flickr30k' and 'calibration' continue to function correctly.
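A minimal sketch of the guard this review is suggesting (`build_oneshot_kwargs` and the dict-based config are hypothetical helpers for illustration; only `DefaultDataCollator` and the dataset names come from the thread):

```python
from transformers import DefaultDataCollator

def build_oneshot_kwargs(dataset_config: dict) -> dict:
    """Fill in defaults without clobbering dataset-specific settings."""
    kwargs = dict(dataset_config)
    # setdefault leaves any custom collator (e.g. for 'flickr30k') untouched
    # and only falls back to DefaultDataCollator when none was provided.
    kwargs.setdefault("data_collator", DefaultDataCollator())
    kwargs.setdefault("shuffle_calibration_samples", True)
    return kwargs
```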

@HDCharles
Collaborator

I'm unsure if we should make this change. If the new approach is lower variance and we're confident it's not an actual regression, should we just update the thresholds?

@dsikka
Collaborator Author

dsikka commented Dec 17, 2025

I'm unsure if we should make this change. If the new approach is lower variance and we're confident it's not an actual regression, should we just update the thresholds?

Replied on slack

@dsikka dsikka marked this pull request as ready for review December 17, 2025 14:53
@HDCharles
Collaborator

ok, we can make this change for now and then decide what we want long term later

HDCharles
HDCharles previously approved these changes Dec 17, 2025
Collaborator

@brian-dellabetta brian-dellabetta left a comment


Nice catch! One breakpoint to remove

@dsikka dsikka added the ready When a PR is ready for review label Dec 18, 2025
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Signed-off-by: Dipika Sikka <ds3822@columbia.edu>
@dsikka dsikka merged commit 2427825 into main Dec 19, 2025
11 checks passed
@dsikka dsikka deleted the fix_lmeval_setup branch December 19, 2025 10:19