Dian Chen 陈典

Hi! I am a machine learning researcher on the AFM team at Apple AI/ML, working on foundation models. Previously, I worked on self-driving at Waabi. I obtained my Ph.D. in Computer Science at UT Austin under the supervision of Prof. Philipp Krähenbühl. During my Ph.D., I interned at Waymo, working on multi-agent behavior prediction.

Before that, I studied Computer Science and Applied Mathematics at UC Berkeley, where I worked on robot manipulation with Prof. Pulkit Agrawal, Prof. Deepak Pathak, Prof. Sergey Levine, Prof. Pieter Abbeel, and Prof. Jitendra Malik.

Email / GitHub / Scholar

Efficient Equivariant Transformer for Self-Driving Agent Modeling
Scott Xu, Dian Chen, Kelvin Wong, Chris Zhang, Kion Fallah, Raquel Urtasun
Conference on Computer Vision and Pattern Recognition (CVPR), 2026
website / arxiv
We propose DriveGATr, a novel architecture for self-driving agent modeling that leverages geometric deep learning to achieve SE(2)-equivariance without the computational cost of existing methods.

MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Ari Seff, Brian Cera, Dian Chen, Aurick Zhou, Nigamaa Nayakanti, Khaled S. Refaat, Rami Al-Rfou, Benjamin Sapp
International Conference on Computer Vision (ICCV), 2023
arxiv
We present MotionLM, a behavior predictor that represents continuous trajectories as sequences of discrete motion tokens. MotionLM casts multi-agent motion prediction as a language modeling task over this domain.

Coopernaut: End-to-End Driving with Cooperative Perception for Networked Vehicles
Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
website / code / arxiv
We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving. Our model encodes LiDAR information into compact point-based representations that can be transmitted as messages between vehicles via realistic wireless channels.

Learning from All Vehicles
Dian Chen, Philipp Krähenbühl
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Winner of the 2021 CARLA AD Challenge
website / code / arxiv
We present LAV, a mapless, learning-based end-to-end driving system. LAV takes multi-modal sensor readings as input and learns from all nearby vehicles in the scene for both perception and planning. At test time, LAV predicts multi-modal future trajectories for all detected vehicles, including the ego-vehicle. Our system outperforms all prior methods on the public CARLA Leaderboard by a wide margin, improving the driving score by 25 points and the route completion rate by 24 points.

Learning to Drive From a World on Rails
Dian Chen, Vladlen Koltun, Philipp Krähenbühl
International Conference on Computer Vision (ICCV), 2021 (Oral Presentation)
website / code / video / arxiv
We present a model-based RL method for autonomous driving and navigation tasks. The world model is factorized into a passively moving environment and a compact ego component. Our method significantly simplifies reinforcement learning. It ranks first on the CARLA leaderboard and outperforms state-of-the-art imitation learning and model-free reinforcement learning methods on driving tasks. It is also an order of magnitude more sample-efficient than model-free RL on the navigation games in the ProcGen benchmark.

Learning by Cheating
Dian Chen, Brady Zhou, Vladlen Koltun, Philipp Krähenbühl
Conference on Robot Learning (CoRL), 2019
website / code / video / arxiv
We present a two-stage imitation learning method for vision-based driving. Our approach achieves a 100% success rate on all tasks in the original CARLA benchmark, sets a new record on the NoCrash benchmark, and reduces the frequency of infractions by an order of magnitude compared to the prior state of the art.

Learning Instance Segmentation by Interaction
Deepak Pathak, Fred Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik (equal contribution)
Robotics Vision Workshop, Conference on Computer Vision and Pattern Recognition (CVPR), 2018
website / arxiv
We present a robotic system that learns to segment its visual observations into individual objects by experimenting with its environment in a completely self-supervised manner. Our system performs on par with a state-of-the-art instance segmentation algorithm trained with strong supervision.

Zero-Shot Visual Imitation
Deepak Pathak, Parsa Mahmoudieh, Michael Luo, Pulkit Agrawal, Dian Chen, Fred Shentu, Evan Shelhamer, Jitendra Malik, Alexei Efros, Trevor Darrell (equal contribution)
International Conference on Learning Representations (ICLR), 2018 (Oral Presentation)
website / arxiv
We present a novel skill policy architecture and dynamics consistency loss that extend visual imitation to more complex environments while improving robustness. Experimental results are shown on a robot knot-tying task and a first-person visual navigation task.

Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation
Ashvin Nair, Dian Chen*, Pulkit Agrawal, Phillip Isola, Jitendra Malik, Pieter Abbeel, Sergey Levine (* equal contribution)
IEEE International Conference on Robotics and Automation (ICRA), 2017
website / arxiv
We present a system in which a robot takes as input a sequence of monocular images of a human manipulating a rope from an initial to a goal configuration, and outputs a sequence of actions that reproduces the human demonstration.

Teaching

CS394D - Deep Learning - Fall 2020
Teaching Assistant

CS395T - Deep Learning Seminar - Fall 2019
Teaching Assistant

CS342 - Neural Networks - Fall 2018
Teaching Assistant

Service

IROS, ICRA, ICLR, NeurIPS, CVPR, ICML, ECCV
Conference Reviewer

RA-L, TPAMI, TIP
Journal Reviewer
