
While reviewing the ablation study results (as shown in the attached table), I noticed that the configuration combining "Pre-training Data: Biased" with "Fine-tuning Data: Debiased" was not included in the experiments. Were there specific technical considerations or constraints behind this omission?
From my perspective, this particular scenario holds significant practical value for the broader research community. Since most researchers operate under limited computational budgets, it is often impractical to debias a large-scale pre-training dataset and re-train a model on it from scratch. Consequently, the pre-trained weights available to us are typically "biased" by default.
Demonstrating whether (and to what extent) debiasing applied solely during the Supervised Fine-tuning (SFT) stage can mitigate these inherited biases would therefore offer highly valuable guidance for practitioners in this situation.
I would love to hear your thoughts on this, or any preliminary results or observations you may have for this specific setup.
Best regards,