LoRA for Classification #921

UgurKap · 2025-12-05T00:55:42Z

UgurKap
Dec 5, 2025

Hey, I have just implemented the LoRA for classification fine-tuning. In the end, I have noticed the comment about how LoRA is slower, due to the added inference cost but that can be negated on larger models.

My question is, if it makes sense to compare this model against last layer LoRA fine-tuning? Because in classification fine-tuning, we only fine-tuned the layers from the last transformer block onwards and did not touch the first layer weights. I think here, using LoRA only on the last transformer block and the out_head will be fairer.

Here are my results for different experiments (I have a slightly different dataset and training loop, so the numbers differ from the book):

Method	Train	Validation	Test	Training Time
Classification Fine-Tuning (Last Transformer Block Onwards)	99.33%	97.32%	97.33%	0.81 Minutes
LoRA (All Linear Layers)	97.79%	97.32%	94.00%	2.27 Minutes
LoRA (Last Transformer Block Onwards)	96.54%	97.99%	96.00%	1.13 Minutes

I think here, we see the training time cost is not that high when done on the same layers, and also in my experiment with zero hyperparameter tuning, last layer training performed better too (I assumed this might be caused by lora updates to initial layers, causing some "forgetting")

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LoRA for Classification #921

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

LoRA for Classification #921

Uh oh!

UgurKap Dec 5, 2025

Replies: 0 comments

UgurKap
Dec 5, 2025