This is the official repository for Robust-R1.
Jiaqi Tang^, Jianmin Chen^, Wei Wei**, Xiaogang Xu, Runtao Liu, Xiangyu Wu, Qipeng Xie, Jiafei Wu, Lei Zhang and Qifeng Chen*
^: Equal contribution. *: Corresponding Author. **: Co-corresponding Author.
- [2025-12-23] 🔥 Online demo is now available at HF Space.
- [2025-12-23] 🔥 We release the Code, Models, and Dataset on HuggingFace.
- [2025-12-22] 📄 Our paper is now available on arXiv.
- [2025-11-08] 🎉 Our paper is accepted by AAAI 2026 (Oral).
- 🚩 Limited Interpretability: lack of explicit mechanisms to diagnose how degradation affects the original semantic information.
- 🚩 Isolated Optimization: neglect of how degradation propagates between the visual encoder and the large language model.
- Clone the repository:

  ```bash
  git clone https://github.com/jqtangust/Robust-R1.git
  cd Robust-R1
  ```

- Create the environment:

  ```bash
  conda create -n robust_r1 python=3.10
  conda activate robust_r1
  bash setup.sh
  ```
The following checkpoints are utilized to run Robust-R1:

| Checkpoint | Link | Note |
| --- | --- | --- |
| Qwen2.5-VL-Base | link | Used as initial weights for training. |
| Robust-R1-SFT | link | Fine-tuned on the Robust-R1 dataset. |
| Robust-R1-RL | link | Fine-tuned with reinforcement learning on the Robust-R1 dataset. |
- Run the command-line demo with a question:

  ```bash
  # if you use local weight
  export MODEL_PATH="your_model_name_or_path"
  python demo.py "What type of vehicles are the people riding?\n0. trucks\n1. wagons\n2. jeeps\n3. cars\n"
  ```
- Set the model path as an environment variable and run the demo:

  ```bash
  # if you use local weight
  export MODEL_PATH="your_model_name_or_path"
  python app.py
  ```

- The demo will be available at http://localhost:7860 by default.
GUI Online Demo.
We employ LLaMA-Factory for supervised fine-tuning of the base model.
- Clone the repository and install the required dependencies:

  ```bash
  git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
  cd LLaMA-Factory
  pip install -e ".[torch,metrics]"
  ```
- Download the base model Qwen2.5-VL-3B-Instruct.
- Prepare the training data and configuration files:
  - Download the Robust images and unzip them.
  - Modify the configuration files in the `LLaMA-Factory/data` directory (a minimal registration sketch follows this list).
- Configure the training YAML file with your local paths (model path, data path, output directory); a hedged excerpt also follows this list.
- Run the training command to train the SFT model:

  ```bash
  llamafactory-cli train examples/train_full/qwen2_5_vl_full_sft.yaml
  ```
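For reference, preparing the LLaMA-Factory side typically involves two edits. First, register the SFT data in `LLaMA-Factory/data/dataset_info.json`. The entry below is a minimal sketch assuming a ShareGPT-style multimodal file; the entry and file names (`robust_r1_sft`, `robust_r1_sft.json`) are placeholders, not the repository's actual ones:

```json
{
  "robust_r1_sft": {
    "file_name": "robust_r1_sft.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    }
  }
}
```

Second, point the training YAML at the model, the registered dataset, and an output directory. This excerpt is likewise a sketch with placeholder paths and hyperparameters, not the repository's tuned settings:

```yaml
### model -- placeholder path, adjust to your local checkpoint
model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full

### dataset -- "robust_r1_sft" is the entry registered in data/dataset_info.json
dataset: robust_r1_sft
template: qwen2_vl

### output -- illustrative values only
output_dir: saves/robust-r1-sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 1.0
bf16: true
```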
- Download the Robust images and unzip them in `Robust-R1/dataset`.
- Prepare the training data file (`train.jsonl`) and organize the image folders (a hypothetical example record follows this list).
- Download the SFT model checkpoint from Robust-R1-SFT or use your own trained SFT model.
- Replace the following part of the `run_scripts/run_grpo_robust.sh` file with your own paths:

  ```bash
  data_paths="Robust-R1/data/train.jsonl"
  image_folders="Robust-R1/data/train_images"
  model_path="your_model_name_or_path"
  ```

- Run the script:

  ```bash
  bash run_scripts/run_grpo_robust.sh
  ```
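For orientation, one record of `train.jsonl` in the VLM-R1-style format might look like the pretty-printed example below (in the actual file each record sits on a single line). The field names and values here are assumptions based on VLM-R1's multimodal GRPO data; verify them against the repository's data loader:

```json
{
  "id": 1,
  "image": "train_images/000001.jpg",
  "conversations": [
    {"from": "human", "value": "<image>What type of vehicles are the people riding?"},
    {"from": "gpt", "value": "wagons"}
  ]
}
```

The `<image>` token marks where the image is injected into the prompt, and the `image` path is presumably resolved relative to `image_folders` from the run script.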
We use VLMEvalKit for anti-degradation evaluation.
- Clone the VLMEvalKit repository and install its dependencies:

  ```bash
  git clone https://github.com/open-compass/VLMEvalKit.git
  cd VLMEvalKit
  pip install -e .
  ```

- Prepare the evaluation datasets according to the VLMEvalKit requirements.
- Image degradation pipeline: generate corrupted images for robustness evaluation.

  We provide an image degradation pipeline for generating corrupted images to evaluate model robustness. Navigate to the degradation pipeline directory and process the images:

  ```bash
  cd add_degradation
  python generate_pipeline_open_source.py --input_dir <input_dir> --output_base_dir <output_base_dir> --dataset_name <dataset_name> --verbose
  ```

  The script generates three output directories with different degradation intensities for each image (a minimal Python sketch of this style of corruption follows this list).

- Configure the model path and evaluation settings in the VLMEvalKit configuration file (a hedged registration sketch also follows this list).
- Run the evaluation command:

  ```bash
  python run.py --model <your_model_name_or_path> --data <dataset_name>
  ```
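To make the three intensity levels concrete, here is a minimal Python sketch of this style of corruption (Gaussian blur, additive noise, and JPEG re-compression at increasing severity). All parameter values are illustrative assumptions; the actual degradations are defined in `add_degradation/generate_pipeline_open_source.py`.

```python
# A minimal sketch (not the repository's actual pipeline) of this style of
# corruption: Gaussian blur, additive Gaussian noise, and JPEG re-compression
# at three increasing severities. Parameter values are illustrative only.
import io

import numpy as np
from PIL import Image, ImageFilter


def degrade(img: Image.Image, severity: int) -> Image.Image:
    """Corrupt an RGB image at severity 1 (mild) to 3 (strong)."""
    # 1) Blur: radius grows with severity.
    img = img.filter(ImageFilter.GaussianBlur(radius=severity))
    # 2) Noise: zero-mean Gaussian, std grows with severity.
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0.0, 5.0 * severity, size=arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # 3) JPEG artifacts: quality drops with severity.
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=60 - 15 * severity)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


if __name__ == "__main__":
    img = Image.open("example.jpg").convert("RGB")
    for s in (1, 2, 3):  # one output per degradation intensity
        degrade(img, s).save(f"degraded_severity_{s}.jpg")
```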
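Registering a local checkpoint with VLMEvalKit is usually a matter of adding an entry to `supported_VLM` in `vlmeval/config.py`. The snippet below is a hedged sketch: `Qwen2VLChat` mirrors the existing Qwen2-VL entries in that file, but the entry name and `model_path` are placeholders, and the exact class and keyword names should be verified against your VLMEvalKit version.

```python
# vlmeval/config.py (excerpt) -- a hedged sketch, not VLMEvalKit's verbatim code.
# Registers a local Robust-R1 checkpoint so that `--model Robust-R1` resolves
# to it; `supported_VLM` already exists in config.py.
from functools import partial

from vlmeval.vlm import Qwen2VLChat  # class used by the existing Qwen2-VL entries

supported_VLM.update({
    # model_path is a placeholder; point it at your downloaded checkpoint
    "Robust-R1": partial(Qwen2VLChat, model_path="/path/to/Robust-R1-RL"),
})
```

After registration, `python run.py --model Robust-R1 --data <dataset_name>` evaluates that checkpoint.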
We use R-Bench to assess model performance under real-world corruptions.

- Clone the R-Bench repository:

  ```bash
  git clone https://github.com/Q-Future/R-Bench.git
  ```

- Evaluate with VLMEvalKit on the R-Bench dataset:

  ```bash
  cd VLMEvalKit
  python run.py --data R-Bench-Dis --model <your_model_name_or_path> --verbose
  ```

- For full-dataset evaluation, follow the pipeline described in the R-Bench repository.
If you find Robust-R1 useful for your research and applications, please cite using this BibTeX:
```bibtex
@inproceedings{tang2025robustr1,
  title={Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding},
  author={Tang, Jiaqi and Chen, Jianmin and Wei, Wei and Xu, Xiaogang and Liu, Runtao and Wu, Xiangyu and Xie, Qipeng and Wu, Jiafei and Zhang, Lei and Chen, Qifeng},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```

The work described in this paper was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Reference Number: AoE/E-601/24-N).
We also thank the authors of VLM-R1, LLaMA-Factory, and R-Bench for their contributions.

