Yun Li | Academic Homepage

Abstract

This work presents the first integration of Direct Preference Optimization (DPO) into LLM-based autonomous driving. A novel dataset of 74,040 sequences annotated with driving preferences is collected, and memory-efficient fine-tuning (LoRA with 4-bit quantization) is performed on a single RTX 3090 Ti. In closed-loop evaluation on CARLA, PrefDrive reduces traffic light violations by 28.1%, improves route completion by 8.5%, and reduces layout collisions by 63.5%.
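
For readers unfamiliar with DPO, the per-example objective can be sketched as below. This is a minimal illustration of the standard DPO loss, not the PrefDrive implementation: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example. Each argument is the summed log-probability of a whole response (here, a preferred or rejected driving decision) under either the model being fine-tuned or a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair (illustrative sketch).

    beta=0.1 is an assumed value, not one reported in the abstract.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss equals log 2 when the policy and the reference agree, and falls below that as the policy shifts probability toward the preferred response relative to the reference.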

Accepted and presented in Cluj-Napoca, Romania.