Multi-PrefDrive: Optimizing Large Language Models for Autonomous Driving Through Multi-Preference Tuning
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025
Abstract
This work extends binary Direct Preference Optimization (DPO) to multi-dimensional preference tuning via the Plackett–Luce ranking model. The multi-rejected dataset contains 148,080 sequences (592,320 prompt–response pairs, i.e., four per sequence) with risk-categorized alternatives. The proposed PLDPO method outperforms DPO, IPO, and BCO, yielding an 11.0% overall improvement, an 83.6% reduction in infrastructure collisions, and perfect traffic-signal compliance on CARLA Town 04.
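To make the Plackett–Luce extension concrete, here is a minimal sketch of a generic Plackett–Luce loss over DPO-style implicit rewards. It assumes the responses to one prompt are given in preference order (the chosen response first, then the rejected alternatives ranked, for example, by risk category) and that each implicit reward is `beta * (log pi(y|x) - log pi_ref(y|x))`. The function names, the `beta = 0.1` default, and the full-ranking formulation are illustrative assumptions, not the paper's exact objective or weighting.

```python
import math
from typing import List, Sequence


def implicit_rewards(policy_logps: Sequence[float],
                     ref_logps: Sequence[float],
                     beta: float = 0.1) -> List[float]:
    """DPO-style implicit reward r_i = beta * (log pi(y_i|x) - log pi_ref(y_i|x)).

    Hypothetical helper: beta = 0.1 is an assumed default, not taken from the paper.
    """
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def plackett_luce_loss(rewards: Sequence[float]) -> float:
    """Negative log-likelihood of the ranking rewards[0] > rewards[1] > ...

    Under Plackett-Luce, position k is filled by choosing item k from the
    items not yet placed, with probability exp(r_k) / sum_{j>=k} exp(r_j).
    The final position has probability 1, so its term is skipped.
    """
    if len(rewards) < 2:
        raise ValueError("need at least two ranked responses")
    loss = 0.0
    for k in range(len(rewards) - 1):
        loss -= rewards[k] - logsumexp(rewards[k:])
    return loss


if __name__ == "__main__":
    # Illustrative log-probabilities: one chosen response followed by three
    # rejected alternatives, already sorted from most to least preferred.
    policy = [-12.0, -15.0, -14.5, -18.0]
    ref = [-13.0, -14.0, -14.0, -16.0]
    r = implicit_rewards(policy, ref)
    print("PL loss:", plackett_luce_loss(r))
```

With exactly two responses the loss reduces to the binary DPO term `-log sigmoid(r_chosen - r_rejected)`, which is why this formulation is a natural generalization of DPO to several ranked alternatives.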
Accepted and presented in Hangzhou, China.