Towards Vision Zero: The Accid3nD Dataset

Visualization of the raw Accid3nD dataset. Accidents are recorded from roadside cameras on a test bed for autonomous driving. The dataset includes scenes with collisions, overturning vehicles, and vehicles catching fire.

Overview

The Accid3nD dataset is the first high-quality, real-world accident dataset for the 3D object detection, segmentation, tracking, trajectory prediction, and accident detection tasks in autonomous driving.

It contains:
  • data from five sensors (cameras and LiDARs) recorded simultaneously from onboard and roadside viewpoints.
  • 111,945 labeled frames of vehicle crashes during high-speed driving.
  • 2,634,233 labeled 2D bounding boxes, instance segmentation masks, and 3D bounding boxes with track IDs.
  • real accidents, including vehicle rollovers, vehicle fires, and collision events.
  • an HD map of the highway.
  • labels in the OpenLABEL standard.
  • a dataset development kit to load, preprocess, visualize, and convert labels, and to evaluate accident detection models (see the loading sketch below).
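Since the labels are provided in the OpenLABEL standard (a JSON-based ASAM format), a minimal loading sketch might look as follows. The nesting used here (openlabel -> frames -> objects -> object_data -> cuboid) follows the common OpenLABEL layout; the released development kit likely exposes a higher-level API, and the file name below is a placeholder.

```python
import json

def load_cuboids(openlabel_path):
    """Sketch: read 3D box labels from an OpenLABEL JSON file.

    Assumes the common ASAM OpenLABEL layout:
    openlabel -> frames -> <frame_id> -> objects -> <track_id>
             -> object_data -> cuboid -> [{"val": [...]}].
    The exact nesting may differ in the released dev kit.
    """
    with open(openlabel_path) as f:
        data = json.load(f)["openlabel"]

    boxes = []
    for frame_id, frame in data.get("frames", {}).items():
        for track_id, obj in frame.get("objects", {}).items():
            for cuboid in obj.get("object_data", {}).get("cuboid", []):
                # "val" holds the position, orientation, and size of the 3D box.
                boxes.append({
                    "frame": int(frame_id),
                    "track_id": track_id,
                    "val": cuboid["val"],
                })
    return boxes

boxes = load_cuboids("sequence_01_labels.json")  # hypothetical file name
```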

Abstract

Even though significant work has been done to increase the safety of transportation networks, accidents still occur regularly. They must be understood as an unavoidable and sporadic outcome of traffic networks. No public dataset contains real-world accidents recorded from roadside sensors. We present the Accid3nD dataset, a collection of real-world highway accidents recorded in different weather and lighting conditions. It contains vehicle crashes during high-speed driving with 2,634,233 labeled 2D bounding boxes, instance masks, and 3D bounding boxes with track IDs. In total, the dataset contains 111,945 labeled frames recorded from four roadside cameras and LiDARs at 25 Hz. The dataset covers six object classes and is provided in the OpenLABEL format. We propose an accident detection model that combines a rule-based approach with a learning-based one. Experiments and ablation studies on our dataset show the robustness of the proposed method. The dataset, model, and code are publicly available on our project website.

Sensor Setup

The following roadside sensors were used:
  • 4× Basler ace acA1920-50gc cameras, 1920×1200 (Sony IMX174), with 16 mm and 50 mm lenses, frame rate: 50 Hz
  • 1× Valeo SCALA B2 LiDAR, 16 vertical layers, 133° horizontal FOV, 0.125° × 0.6° angular resolution, 200 m range (at 80% reflectivity), frame rate: 25 Hz
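Since the cameras record at 50 Hz and the LiDAR at 25 Hz, pairing each LiDAR scan with its temporally closest camera frame is a natural preprocessing step. Below is a minimal nearest-timestamp matching sketch; it is illustrative only and not the dataset's official synchronization routine.

```python
import bisect

def match_nearest(camera_ts, lidar_ts):
    """Pair each 25 Hz LiDAR timestamp with the closest 50 Hz camera
    timestamp (both lists in seconds). Illustrative sketch only."""
    camera_ts = sorted(camera_ts)
    pairs = []
    for t in lidar_ts:
        i = bisect.bisect_left(camera_ts, t)
        # Check the neighbors on both sides of the insertion point.
        candidates = [k for k in (i - 1, i) if 0 <= k < len(camera_ts)]
        best = min(candidates, key=lambda k: abs(camera_ts[k] - t))
        pairs.append((t, camera_ts[best]))
    return pairs

# Example: camera at 50 Hz (0.02 s step), LiDAR at 25 Hz (0.04 s step).
cams = [i * 0.02 for i in range(10)]
lidar = [i * 0.04 for i in range(5)]
print(match_nearest(cams, lidar))
```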

Overview of the test field. The green area on the highway marks the region used to record the data with four roadside cameras, four radars, and one LiDAR.

Visualization of the roadside sensors used to record the Accid3nD dataset from the infrastructure perspective.

Architecture


Accident detection pipeline. We use advanced 3D perception techniques and multi-sensor data fusion to create a real-time digital twin of the traffic. Starting with raw camera images, the framework first performs 3D object detection using MonoDet3D to identify and localize vehicles in three dimensions. Following detection, Poly-MOT tracking is applied to maintain continuity across frames, while sensor data fusion combines inputs from four roadside cameras and four radars. The digital twin is then used in two accident detection modules: 1) The Rule-based Accident Detection module extracts features such as lane IDs, distance matrices, and velocities, identifying potential accidents through predefined maneuver detection rules. 2) The Learning-based Accident Detection module employs a YOLOv8 object detector, trained on a custom dataset, to detect accident events. The final output includes the object’s location, confidence score, class, velocity, and detected scenario or maneuver.
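To make the rule-based module more concrete, here is a toy sketch of threshold rules applied to the extracted features (distance matrix, velocities, lane IDs) for one fused frame of the digital twin. The thresholds and rules are illustrative assumptions, not the exact rules used in the paper.

```python
import numpy as np

def detect_accident_candidates(positions, velocities, lane_ids,
                               dist_thresh=2.0, speed_thresh=5.0):
    """Toy rule-based check on one fused frame of the digital twin.

    positions:  (N, 2) array of vehicle x/y positions in meters
    velocities: (N,)   array of speeds in m/s
    lane_ids:   (N,)   array of lane assignments
    Thresholds are illustrative, not the paper's values.
    """
    # Pairwise distance matrix between all tracked vehicles.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    events = []
    n = len(positions)
    for i in range(n):
        # Rule 1: a near-stopped vehicle on the highway is suspicious.
        if velocities[i] < speed_thresh:
            events.append(("stopped_vehicle", i))
        for j in range(i + 1, n):
            # Rule 2: two vehicles in the same lane closer than the
            # distance threshold are a potential collision.
            if lane_ids[i] == lane_ids[j] and dist[i, j] < dist_thresh:
                events.append(("potential_collision", (i, j)))
    return events
```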

Accid3nD Dataset

Visualization of the labeled Accid3nD dataset with 3D box annotations, instance masks, track IDs, and trajectories. Accidents are recorded from roadside cameras on a test bed for autonomous driving. The dataset includes scenes with collisions, overturning vehicles, and vehicles catching fire.

Statistics


Distribution of labeled object classes in the Accid3nD dataset.


Average and max. track lengths of all labeled object classes.


Histogram of labeling distances.


Histogram of the number of 3D box labels.


Histogram of the track lengths.


Lane distribution of all labeled objects on the highway.

Visualization of speed values for each labeled category. We show the average and maximum speed for all categories; the average speed in the dataset is 107 km/h.


Distribution of accident types.

Heatmap visualization of traffic participant locations. The left lane of the highway in the northbound direction shows a high traffic volume.

Quantitative Evaluation Results


Quantitative evaluation results of the accident detection module (rule-based approach) over a 128-day monitoring period. The statistics include the detected vehicles, maneuvers, scenarios, and accident events.

Qualitative Evaluation Results


Qualitative visualization results of our accident detection framework on the Accid3nD test set. The rule-based approach detected a rear-end collision.


Qualitative visualization results of our accident detection framework on the Accid3nD test set. The learning-based approach detected a car crash with a confidence threshold of 0.8.
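For context, the learning-based module builds on a YOLOv8 detector, so inference at the 0.8 confidence threshold used above can be sketched with the ultralytics API as follows. The weights file and input image names are placeholders, not the released model.

```python
from ultralytics import YOLO

# Hypothetical weights fine-tuned on the accident dataset; the file
# name and class layout are assumptions, not the released model.
model = YOLO("accident_yolov8.pt")

# Run inference with the 0.8 confidence threshold from the example above.
results = model.predict("camera_frame.jpg", conf=0.8)

for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]
        print(cls_name, float(box.conf), box.xyxy.tolist())
```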


Experiments

Event                      Rule-based runtime [s]   Learning-based runtime [s]
Sequence S01, part I       8.63                      484.69
Sequence S01, part II      6.60                      474.69
Sequence S13, part I       5.02                      240.43
Sequence S13, part II      5.33                      248.16
Avg. (with 2 cameras)      5.17                      244.30
Avg. (with 4 cameras)      7.61                      479.69
Average (overall)          6.39                      361.99
Runtime comparison of AccidentDet3D on a 15-minute traffic recording containing 20,000 frames. We compare our rule-based and learning-based accident detection based on runtime (in seconds) and ablate the number of cameras used for detection to show the scalability of our approach.
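As a quick sanity check on these numbers, dividing the average total runtime by the 20,000 frames gives the approximate per-frame cost of each approach:

```python
# Per-frame cost from the averaged totals in the table above
# (20,000 frames per 15-minute recording).
FRAMES = 20_000
for name, total_s in [("rule-based", 6.39), ("learning-based", 361.99)]:
    print(f"{name}: {total_s / FRAMES * 1e3:.2f} ms/frame")
# rule-based: 0.32 ms/frame, learning-based: 18.10 ms/frame
```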

Benchmark

Method                     Precision   Recall   F1-Score   Runtime 2 cameras [s]   Runtime 4 cameras [s]
Rule-based Approach        1.000       0.500    0.667      0.086                    0.127
Learning-based Approach    0.800       1.000    0.889      4.072                    7.995
Accident detection results of AccidentDet3D on the Accid3nD test set, comparing our rule-based and learning-based accident detection approaches.
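For reference, the F1 scores in the table are the standard harmonic mean of precision and recall, which reproduces the values above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1_score(1.0, 0.5))  # 0.667 -> rule-based approach
print(f1_score(0.8, 1.0))  # 0.889 -> learning-based approach
```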