Abstract

A generalized multimodal large language model (MLLM) framework that maps driving scenes directly to control actions, using a unified prompt-and-image interface across closed-loop benchmarks.

Published in IEEE Robotics and Automation Letters (RA-L), 2024.