
[Inquiry] Questions regarding code benchmarks (Qwen baselines) and fine-tuning scripts #3

@Qianhao-Ren


Hello! Thank you for the impressive work on LLaDA 2.0. I really enjoyed reading the paper and I am excited to try out the model.

I have a couple of questions regarding the benchmarks and future usage:

  1. Regarding Code Benchmarks:
    I noticed that the paper compares the model against "Qwen3 8b (no think)". However, this baseline's performance on the code benchmarks (specifically BIRD and Spider) appears significantly lower than that of current SOTA coding models, such as Qwen-2.5-Coder-7B-Instruct or Qwen-2.5-7B-Instruct.
    Have you conducted any comparisons against these stronger coding baselines? It would be very helpful to see how LLaDA 2.0 stacks up against models specifically optimized for coding.

  2. Fine-tuning Guidance:
    I am interested in further fine-tuning this model (either LoRA or full-parameter) on my own dataset. Could you please provide some guidance or example training scripts? Knowing the recommended hyperparameters and the training pipeline you used would be greatly appreciated; a rough sketch of what I have in mind follows this list.
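
For context, here is a minimal sketch of the kind of LoRA setup I have in mind, using transformers and PEFT. The model id, target module names, and all hyperparameters below are my own placeholders rather than anything from your paper or repo, and I understand the diffusion training objective may well require a custom trainer instead of this standard wrapper:

```python
# Hypothetical sketch only -- model id, target modules, and hyperparameters
# are placeholders I would replace with whatever your pipeline recommends.
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "inclusionAI/LLaDA-2.0"  # placeholder; actual HF repo id may differ

# The custom modeling code presumably requires trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Standard LoRA adapters on the attention projections; the module names
# here are a guess and would need to match the actual architecture.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Any pointers on whether this direction is even compatible with your training setup would already be very helpful.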

Thank you again for your contribution to the open-source community!

Best regards
