DrivIng: A Large-Scale Multimodal Driving Dataset with Full Digital Twin Integration


IEEE IV 2026
¹Technische Hochschule Ingolstadt  ²Technical University of Munich

In Collaboration With

Technische Hochschule Ingolstadt
AI-Motion Bavaria
Technical University of Munich
CVIMS Research Group
DrivIng dataset and digital twin visualization

This visualization illustrates the core features of DrivIng and its digital twin. The left panel shows a real-world satellite view of the track and its fully geo-referenced digital twin, aligned with a location marker indicating the vehicle’s position. The right panel presents the synchronized sensor suite, including six camera views and a LiDAR frame. The top row displays real-world images, while the bottom row shows the corresponding CARLA simulation with all real-world objects precisely mapped. All images and the LiDAR frame include class-colored 3D bounding boxes for clear object distinction.

Satellite image © Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community.

Abstract

Perception is a cornerstone of autonomous driving, enabling vehicles to understand their surroundings and make safe, reliable decisions. Developing robust perception algorithms requires large-scale, high-quality datasets that cover diverse driving conditions and support thorough evaluation. Existing datasets often lack a high-fidelity digital twin, limiting systematic testing, edge-case simulation, sensor modification, and sim-to-real evaluations. To address this gap, we present DrivIng, a large-scale multimodal dataset with a complete geo-referenced digital twin of a ∼18 km route spanning urban, suburban, and highway segments. Our dataset provides continuous recordings from six RGB cameras, one LiDAR, and high-precision ADMA-based localization, captured across day, dusk, and night. All sequences are annotated at 10 Hz with 3D bounding boxes and track IDs across 12 classes, yielding ∼1.2 million annotated instances. Alongside the benefits of a digital twin, DrivIng allows a 1-to-1 transfer of real traffic into simulation, preserving interactions between agents while enabling realistic and flexible scenario testing. We benchmark DrivIng with state-of-the-art perception models and publicly release the dataset, digital twin, HD map, and codebase to support reproducible research and robust validation.

At a Glance

DrivIng is a real-world autonomous driving dataset with a fully geo-referenced CARLA digital twin, designed for systematic sim-to-real evaluation and reproducible benchmarking across matched real and simulated environments. It provides synchronized multi-sensor streams (six RGB cameras, LiDAR, and ADMA GNSS/IMU), together with 3D bounding-box annotations for 3D detection, tracking, and multimodal perception.


Dataset Composition

DrivIng provides a synchronized and spatially calibrated multimodal dataset paired with a geo-referenced CARLA digital twin. The dataset is organized into three continuous sequences recorded under different illumination conditions (a loading sketch follows the list):

  • Sequences (3): Day, Dusk, and Night continuous runs.
  • Coverage: ~18 km route spanning urban, suburban, and highway segments (unique track length ~16 km).
  • Sensors: 6 RGB cameras (360° coverage), 1 LiDAR, and 1 ADMA GNSS/IMU (geo-referencing and motion).
  • Annotations: 10 Hz 3D bounding boxes with track IDs in the LiDAR point cloud, aligned to the vehicle reference frame.
  • Classes: 12 object categories with class-colored 3D boxes for clear visual distinction.
  • Scale: ~63k annotated frames (~378k camera images, ~63k LiDAR frames) and ~1.2M labeled instances.
  • Environments: Highway, suburban streets, urban roads, and construction zones.
  • Privacy: Faces and license plates are anonymized in camera images.
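
For illustration, the sketch below loads one synchronized LiDAR frame and its annotations. The directory layout, file formats, and names (SEQUENCE_ROOT, load_frame, the .bin/.json conventions) are assumptions for this example rather than the released API; the public codebase defines the actual loaders.

import json
from pathlib import Path

import numpy as np

# Assumed layout: one directory per sequence (Day / Dusk / Night) with
# per-frame LiDAR scans and 10 Hz JSON annotations holding 3D boxes and
# track IDs in the vehicle reference frame. Adjust to the real release.
SEQUENCE_ROOT = Path("driving/day")  # hypothetical path

def load_frame(frame_id: int):
    """Load one synchronized LiDAR scan and its 3D box annotations."""
    # Point cloud assumed stored as N x 4 float32 (x, y, z, intensity),
    # a common KITTI-style convention; the actual format may differ.
    scan = np.fromfile(SEQUENCE_ROOT / "lidar" / f"{frame_id:06d}.bin",
                       dtype=np.float32).reshape(-1, 4)
    with open(SEQUENCE_ROOT / "annotations" / f"{frame_id:06d}.json") as f:
        boxes = json.load(f)  # per box: class, track_id, center, size, yaw
    return scan, boxes

scan, boxes = load_frame(0)
print(f"{scan.shape[0]} points, {len(boxes)} annotated objects")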

Calibration Overview

All sensors in DrivIng are temporally synchronized and spatially calibrated to support frame-accurate multi-sensor fusion and direct comparison between real-world recordings and the CARLA digital twin. Calibration covers per-camera intrinsics; extrinsics between the cameras, the LiDAR, and the ADMA GNSS/IMU; and a consistent alignment to the vehicle reference frame for reliable 3D annotation projection and evaluation.

Calibration overview for DrivIng
Temporal synchronization and spatial calibration overview for DrivIng’s multi-sensor setup (six cameras, LiDAR, and ADMA GNSS/IMU), supporting consistent real-to-sim alignment and reproducible multimodal evaluation.
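
As an illustration of how these calibrations can be used, the sketch below projects LiDAR points into a camera image with a standard pinhole model. The matrix names are assumptions (K for the 3x3 camera intrinsics, T_cam_lidar for the 4x4 LiDAR-to-camera extrinsics); map them to whatever the released calibration files provide.

import numpy as np

def project_lidar_to_image(points_lidar: np.ndarray,
                           K: np.ndarray,
                           T_cam_lidar: np.ndarray) -> np.ndarray:
    """Project N x 3 LiDAR points to pixel coordinates."""
    # Homogeneous coordinates for the rigid LiDAR -> camera transform.
    ones = np.ones((points_lidar.shape[0], 1))
    pts_cam = (T_cam_lidar @ np.hstack([points_lidar, ones]).T).T[:, :3]
    # Drop points behind the image plane before projecting.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Pinhole projection: apply intrinsics, then divide by depth.
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]

The same projection, applied to 3D box corners instead of raw points, yields image-space overlays like the class-colored boxes shown in the visualizations above.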

Coordinate System

All sensor data and 3D annotations are provided in a consistent vehicle-centric reference frame to enable reliable multi-sensor fusion and benchmarking. The dataset includes the necessary rigid transforms to map between the individual sensor frames (cameras, LiDAR, IMU) and the vehicle coordinate frame. The coordinate convention follows a standard automotive setup with a right-handed axis definition.

Vehicle coordinate system and sensor placement
Vehicle coordinate system and sensor placement used in DrivIng. The figure illustrates the six-camera rig, LiDAR position, IMU placement, and the vehicle geometric center, together with the axis convention used for expressing 3D annotations and sensor extrinsics.
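
A minimal sketch of working with these transforms, assuming each is provided as a 4x4 homogeneous matrix T_vehicle_sensor mapping sensor coordinates into the vehicle frame (the naming is an assumption; the figure above and the released files define the authoritative axis convention, e.g. whether x points forward, y left, and z up as in ISO 8855):

import numpy as np

def make_transform(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build a 4x4 rigid transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transform_points(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 rigid transform to an N x 3 point array."""
    return points @ T[:3, :3].T + T[:3, 3]

def lidar_to_camera(T_vehicle_cam: np.ndarray,
                    T_vehicle_lidar: np.ndarray) -> np.ndarray:
    """Chain sensor-to-vehicle transforms through the vehicle frame:
    T_cam_lidar = inv(T_vehicle_cam) @ T_vehicle_lidar."""
    return np.linalg.inv(T_vehicle_cam) @ T_vehicle_lidar

Because all 3D annotations live in the vehicle frame, a single such transform moves them into any sensor frame for projection or evaluation.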

Citation

@misc{rößle2026drivinglargescalemultimodaldriving,
      title={DrivIng: A Large-Scale Multimodal Driving Dataset with Full Digital Twin Integration}, 
      author={Dominik Rößle and Xujun Xie and Adithya Mohan and Venkatesh Thirugnana Sambandham and Daniel Cremers and Torsten Schön},
      year={2026},
      eprint={2601.15260},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.15260}, 
}

Funded by:

Hightech Agenda Bayern
iExodus
Baywiss
