UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception


NeurIPS 2025
*Equal contribution. Authors listed in alphabetical order.
¹Technische Hochschule Ingolstadt  ²Technical University of Munich

In Collaboration With

AI-Motion Bavaria
CARISSMA Research Center
Technische Hochschule Ingolstadt
Technical University of Munich

A visual overview of UrbanIng-V2X, showcasing synchronized multi-view perception across vehicles and infrastructure. Each frame illustrates 3D bounding-box annotations from cameras and LiDAR sensors, highlighting cooperative perception and object tracking at complex urban intersections in Ingolstadt, Germany.

Abstract

Recent cooperative perception datasets have played a crucial role in advancing smart mobility applications by enabling information exchange between intelligent agents, helping to overcome challenges such as occlusions and improving overall scene understanding. While some existing real-world datasets incorporate both vehicle-to-vehicle and vehicle-to-infrastructure interactions, they are typically limited to a single intersection or a single vehicle. A comprehensive perception dataset featuring multiple connected vehicles and infrastructure sensors across several intersections remains unavailable, limiting the benchmarking of algorithms in diverse traffic environments. Consequently, overfitting can occur, and models may demonstrate misleadingly high performance due to similar intersection layouts and traffic participant behavior. To address this gap, we introduce UrbanIng-V2X, the first large-scale, multi-modal dataset supporting cooperative perception involving vehicles and infrastructure sensors deployed across three urban intersections in Ingolstadt, Germany. UrbanIng-V2X consists of 34 temporally aligned and spatially calibrated sensor sequences, each lasting 20 seconds. All sequences contain recordings from one of three intersections, involving two vehicles and up to three infrastructure-mounted sensor poles operating in coordinated scenarios. In total, UrbanIng-V2X provides data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs. All sequences are annotated at a frequency of 10 Hz with 3D bounding boxes spanning 13 object classes, resulting in approximately 712k annotated instances across the dataset. We provide comprehensive evaluations using state-of-the-art cooperative perception methods and publicly release the codebase, dataset, HD map, and a digital twin of the complete data collection environment via https://github.com/thi-ad/UrbanIng-V2X.

At a Glance

UrbanIng-V2X is a large-scale cooperative perception dataset captured at three intelligent urban intersections in Ingolstadt, Germany. It enables research on multi-vehicle perception and on vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) cooperative perception.

  • Scenarios: 34 coordinated sequences (~20 s each)
  • Objects: ~712k 3D annotated bounding boxes
  • Object classes: 13
  • Annotation rate: 10 Hz
  • License: CC BY-NC-ND 4.0 (non-commercial academic use)

Overview of the UrbanIng-V2X setup: 2 vehicles, 3 sensor-equipped intersections, 14 LiDARs, 17 thermal cameras, and 12 RGB cameras.

Dataset Composition

UrbanIng-V2X provides a synchronized and spatially calibrated multi-modal dataset with contributions from both mobile and fixed sensing nodes (a loading sketch follows the list):

  • Vehicles (2): Each equipped with 6 RGB cameras, one 360° LiDAR, and one ADMA GNSS/IMU unit.
  • Infrastructure (up to 3 poles per intersection): Each pole carries 4 LiDARs and multiple thermal cameras for 360° coverage.
  • Collected environments: Three distinct intersection layouts in Ingolstadt.
  • Applications: Cooperative object detection, tracking, trajectory prediction, and fusion evaluation.
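
The exact release layout and devkit API are documented in the linked repository. Purely as a hedged sketch, iterating over one sequence could look like the following; every directory and file name here (seq_0001, vehicle_*, pole_*, labels.json) is a hypothetical stand-in for illustration, not the released format:

```python
# Illustrative only: the directory and file names below are assumptions about
# a plausible release layout, not the actual UrbanIng-V2X format -- consult
# the repository's devkit for the real loading API.
from pathlib import Path


def iter_frames(root: str, sequence_id: str):
    """Yield per-frame sensor file paths for one ~20 s sequence annotated at 10 Hz."""
    seq_dir = Path(root) / sequence_id                # e.g. <root>/seq_0001 (assumed)
    for frame_dir in sorted(p for p in seq_dir.iterdir() if p.is_dir()):
        yield {
            "vehicle_rgb":   sorted(frame_dir.glob("vehicle_*/camera_*/*.png")),  # 12 RGB views
            "vehicle_lidar": sorted(frame_dir.glob("vehicle_*/lidar/*.bin")),     # 2 vehicle LiDARs
            "infra_thermal": sorted(frame_dir.glob("pole_*/thermal_*/*.png")),    # 17 thermal cameras
            "infra_lidar":   sorted(frame_dir.glob("pole_*/lidar_*/*.bin")),      # 12 infra LiDARs
            "labels":        frame_dir / "labels.json",  # 3D boxes, 13 classes (assumed file name)
        }
```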

Calibration Overview

Each sensor within UrbanIng-V2X is temporally synchronized and spatially calibrated. Calibration covers LiDAR-to-camera, LiDAR-to-GNSS/IMU, and infrastructure alignment. The provided calibration files allow reconstruction of precise spatial relationships for fusion or cooperative perception tasks.

Temporal synchronization and spatial calibration overview for vehicle and infrastructure sensors.
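
As an example of how such calibration files are typically consumed, the sketch below projects LiDAR points into a camera image by composing a LiDAR-to-camera extrinsic with the camera intrinsics. The matrix names (T_cam_from_lidar, K) are illustrative assumptions, not field names from the released calibration files:

```python
import numpy as np


def project_lidar_to_camera(points_lidar: np.ndarray,
                            T_cam_from_lidar: np.ndarray,
                            K: np.ndarray) -> np.ndarray:
    """Project (N, 3) LiDAR points into pixel coordinates.

    T_cam_from_lidar: 4x4 homogeneous LiDAR-to-camera extrinsic (assumed name).
    K: 3x3 pinhole intrinsic matrix.
    Returns (M, 2) pixels for the points in front of the camera.
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # (N, 4) homogeneous
    pts_cam = (T_cam_from_lidar @ pts_h.T)[:3]                          # (3, N) in camera frame
    in_front = pts_cam[2] > 0.1                                         # drop points behind the lens
    uvw = K @ pts_cam[:, in_front]                                      # pinhole projection
    return (uvw[:2] / uvw[2]).T                                         # divide by depth -> pixels
```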

Coordinate System

All data are expressed in a unified global coordinate system (GC) anchored to the ADMA GNSS/IMU reference frame. Each vehicle’s local frame (x-forward, y-left, z-up) is registered to this global frame. Infrastructure LiDARs and cameras are similarly aligned using surveyed extrinsics for interoperability.

Coordinate system alignment between vehicles, LiDARs, and infrastructure units.
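
To make the registration concrete, here is a minimal sketch that moves points from a vehicle's local frame into the shared global frame via a 4×4 homogeneous ego pose. The pose values are invented for the example and do not come from the dataset:

```python
import numpy as np


def local_to_global(points_local: np.ndarray, T_global_from_vehicle: np.ndarray) -> np.ndarray:
    """Transform (N, 3) points from the vehicle frame (x-forward, y-left, z-up)
    into the global frame using a 4x4 homogeneous ego pose."""
    pts_h = np.hstack([points_local, np.ones((len(points_local), 1))])
    return (T_global_from_vehicle @ pts_h.T).T[:, :3]


# Invented example pose: vehicle at (100 m, 200 m) in the global frame,
# heading rotated 90 degrees counter-clockwise.
yaw = np.pi / 2
T = np.eye(4)
T[:2, :2] = [[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]]
T[:3, 3] = [100.0, 200.0, 0.0]

# A detection 10 m straight ahead of the vehicle lands at (100, 210) globally.
print(local_to_global(np.array([[10.0, 0.0, 0.0]]), T))
```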

Citation

@misc{urbaningv2x2025,
  title={UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception},
  author={Karthikeyan Chandra Sekaran and Markus Geisler and Dominik Rößle and Adithya Mohan and Daniel Cremers and Wolfgang Utschick and Michael Botsch and Werner Huber and Torsten Schön},
  year={2025},
  eprint={2510.23478},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.23478}
}

Funded by:

Hightech Agenda Bayern
BMDV
StMWK Bayern
