Advanced Autonomous Vehicle Perception System
Real-time perception powered by TensorRT on NVIDIA Jetson
Features • Architecture • Installation • Usage • Nodes
*(Demo video: VisionSense-backend.mp4)*
VisionSense is a comprehensive ROS2-based computer vision system designed for autonomous vehicles running on NVIDIA Jetson platforms with JetPack 6.2. It provides a complete perception pipeline with real-time object detection, lane detection, traffic sign recognition, stereo depth estimation, and driver monitoring capabilities.
| Feature | Description | Model/Method |
|---|---|---|
| Object Detection | Detect vehicles, pedestrians, cyclists, traffic signs/lights | YOLOv8 + TensorRT |
| Multi-Object Tracking | Track objects across frames with unique IDs | BYTE Tracker + Kalman Filter |
| Lane Detection | Segment and detect lane lines | Neural Network + TensorRT |
| Traffic Sign Recognition | Classify 50+ traffic sign types | YOLOv8 Classifier + TensorRT |
| Stereo Depth Estimation | Dense depth maps from stereo camera | LightStereo + TensorRT |
| Driver Monitoring | Face detection and gaze estimation | YOLOv11 + ResNet18 + TensorRT |
| Data Fusion GUI | Real-time visualization of all perception data | OpenCV + X11 |
| Web Dashboard | Remote monitoring interface | HTTP Server |
┌─────────────────────────────────────────────────────────────────────────────┐
│ VisionSense Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Mono Camera │ │Stereo Camera │ │ IMU/GPS │ │
│ │ (CSI/USB) │ │ (Arducam) │ │ Module │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ camera │ │ camera_stereo│ │ imu_gps │ │
│ │ node │ │ node │ │ node │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ├────────┬───────────┘ │
│ ┌──────────────┐ │ │ │
│ │ driver │ ▼ ▼ │
│ │ monitor │ ┌─────────┐ ┌─────────┐ │
│ └──────┬───────┘ │ detect │ │ stereo │ │
│ │ │ node │ │ depth │ │
│ │ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ │ ┌────┴────┐ │ │
│ │ ▼ ▼ │ │
│ │ ┌─────────┐ ┌─────────┐ │ │
│ │ │classify │ │ lanedet │ │ │
│ │ │ node │ │ node │ │ │
│ │ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │
│ │ └─────┬─────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────┐ │ │
│ │ │ adas │ │ │
│ │ │ node │ │ │
│ │ └────┬─────┘ │ │
│ │ │ │ │
│ └───────────────┼──────────────┘ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ GUI │ │ Dashboard │ │
│ │ (Display) │ │ (Web) │ │
│ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Component | Requirement |
|---|---|
| Hardware | NVIDIA Jetson Orin Nano/NX/AGX |
| OS | Ubuntu 22.04 (JetPack 6.2) |
| ROS2 | Humble Hawksbill |
| CUDA | 12.6+ |
| TensorRT | 10.x |
| OpenCV | 4.x with CUDA support |
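As a quick sanity check of the software stack, the versions above can be confirmed from Python. This is a hedged sketch that assumes the standard `tensorrt` Python bindings and a CUDA-enabled OpenCV build are installed:

```python
# Quick environment check for the requirements listed above (illustrative;
# assumes the `tensorrt` Python bindings and a CUDA-enabled OpenCV build).
import cv2
import tensorrt as trt

print("OpenCV:", cv2.__version__)
print("CUDA devices visible to OpenCV:", cv2.cuda.getCudaEnabledDeviceCount())
print("TensorRT:", trt.__version__)
```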
The `camera` node captures video from mono cameras (CSI or USB) for driver monitoring.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `resource` | string | `csi://0` | Camera source URI |
| `width` | int | 1280 | Frame width |
| `height` | int | 720 | Frame height |
Topics Published:
- `/camera/raw` (sensor_msgs/Image) - Raw camera frames

Supported Sources:
- CSI Camera: `csi://0`
- USB Camera: `v4l2:///dev/video0`
- Video File: `file:///path/to/video.mp4`
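A minimal consumer of the published stream might look like the sketch below. This is an illustrative rclpy subscriber, not part of the package; it assumes `cv_bridge` is available:

```python
# Illustrative sketch: subscribe to the mono camera stream and convert
# frames to OpenCV images. Not part of VisionSense; assumes rclpy + cv_bridge.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge


class CameraViewer(Node):
    def __init__(self):
        super().__init__("camera_viewer")
        self.bridge = CvBridge()
        self.create_subscription(Image, "/camera/raw", self.on_image, 10)

    def on_image(self, msg: Image):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        self.get_logger().info(f"frame {frame.shape[1]}x{frame.shape[0]}")


def main():
    rclpy.init()
    rclpy.spin(CameraViewer())


if __name__ == "__main__":
    main()
```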
The `camera_stereo` node handles the Arducam stereo camera with synchronized left/right image capture and CUDA-accelerated rotation.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `resource` | string | `/dev/video1` | V4L2 device path |
| `width` | int | 3840 | Full stereo width (1920×2) |
| `height` | int | 1200 | Stereo height |
| `framerate` | int | 30 | Capture framerate |
| `rotated_lenses` | bool | true | Apply 90° rotation to each eye |
| `flip` | string | `""` | Flip before split: `rotate-180`, `vertical-flip`, `horizontal-flip`, or empty |
Topics Published:
- `/camera_stereo/left/image_raw` (sensor_msgs/Image) - Left camera (1200×1200)
- `/camera_stereo/right/image_raw` (sensor_msgs/Image) - Right camera (1200×1200)
CUDA Kernels:
- Left eye: 90° counter-clockwise rotation
- Right eye: 90° clockwise rotation
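For reference, the split-and-rotate logic that the CUDA kernels accelerate can be expressed in a few lines of CPU-side OpenCV. This is only a sketch of the operation, not the kernels themselves, and it omits the node's final 1200×1200 per-eye crop/scale:

```python
# CPU sketch of the logic the CUDA kernels accelerate: split the side-by-side
# stereo frame into left/right halves and rotate each eye 90° in opposite
# directions (the node's final 1200x1200 crop/scale is omitted here).
import cv2


def split_and_rotate(frame):
    """frame: full side-by-side stereo image, e.g. 3840x1200 (width x height)."""
    h, w = frame.shape[:2]
    left_half, right_half = frame[:, : w // 2], frame[:, w // 2 :]
    left = cv2.rotate(left_half, cv2.ROTATE_90_COUNTERCLOCKWISE)
    right = cv2.rotate(right_half, cv2.ROTATE_90_CLOCKWISE)
    return left, right
```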
The `stereo_depth` node computes dense depth maps using the LightStereo neural network with TensorRT acceleration.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `LightStereo-S-KITTI.engine` | TensorRT engine path |
| `max_disparity` | float | 192.0 | Maximum disparity value |
| `warmup_iterations` | int | 5 | Model warmup runs |
Topics Subscribed:
- `left/image_raw` (sensor_msgs/Image) - Left stereo image
- `right/image_raw` (sensor_msgs/Image) - Right stereo image

Topics Published:
- `/stereo_depth/disparity` (sensor_msgs/Image) - Colorized disparity map
- `/stereo_depth/depth` (sensor_msgs/Image) - Depth image (meters)
Model Specifications:
- Input: 1200×1200 stereo pair
- Output: Dense disparity map
- Architecture: LightStereo-S (KITTI trained)
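Disparity relates to metric depth through the usual stereo relation depth = focal length × baseline / disparity. The sketch below illustrates that conversion with placeholder calibration values; the actual focal length and baseline come from the camera calibration, not from this README:

```python
# Illustrative disparity-to-depth conversion: depth = f * B / d.
# fx_px and baseline_m are placeholders; use your calibrated values.
import numpy as np


def disparity_to_depth(disparity_px: np.ndarray,
                       fx_px: float = 1000.0,      # focal length in pixels (placeholder)
                       baseline_m: float = 0.12):  # stereo baseline in meters (placeholder)
    depth_m = np.zeros_like(disparity_px, dtype=np.float32)
    valid = disparity_px > 0
    depth_m[valid] = fx_px * baseline_m / disparity_px[valid]
    return depth_m
```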
The `detect` node performs real-time object detection using YOLOv8 with TensorRT and multi-object tracking.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `detect.engine` | TensorRT engine path |
| `labels` | string | `labels_detect.txt` | Class labels file |
| `thresholds` | float[] | [0.40, 0.45, ...] | Per-class confidence thresholds |
| `track_frame_rate` | int | 30 | Tracking frame rate |
| `track_buffer` | int | 30 | Lost track buffer size |
Detected Classes:
| ID | Class | Threshold |
|---|---|---|
| 0 | Pedestrian | 0.45 |
| 1 | Cyclist | 0.45 |
| 2 | Vehicle-Car | 0.60 |
| 3 | Vehicle-Bus | 0.45 |
| 4 | Vehicle-Truck | 0.45 |
| 5 | Train | 0.50 |
| 6 | Traffic Light | 0.40 |
| 7 | Traffic Sign | 0.55 |
Topics Subscribed:
- `/detect/image_in` (sensor_msgs/Image) - Input image

Topics Published:
- `/detect/detections` (visionconnect/Detect) - Detection results with tracking
- `/detect/signs` (visionconnect/Signs) - Cropped traffic signs for classification
Tracking Features:
- BYTE tracker with Kalman filter prediction
- Unique ID assignment per tracked object
- ID format: `{ClassName}_{ID}` (e.g., `Car_001`, `Pedestrian_003`)
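Because the tracking label encodes both the class name and a per-object counter, downstream consumers can recover each part with a simple split. This is a minimal sketch; the exact fields of the `visionconnect/Detect` message are defined in the package, not here:

```python
# Minimal sketch: recover class name and numeric ID from a tracking label
# such as "Car_001" or "Pedestrian_003".
def parse_track_label(label: str) -> tuple[str, int]:
    class_name, _, track_id = label.rpartition("_")
    return class_name, int(track_id)


assert parse_track_label("Car_001") == ("Car", 1)
assert parse_track_label("Pedestrian_003") == ("Pedestrian", 3)
```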
The `classify` node classifies detected traffic signs and lights into 50+ categories.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `classify.engine` | TensorRT engine path |
| `labels` | string | `labels_classify.txt` | Class labels file |
| `thresholds` | float[] | [0.30, 0.75] | Traffic light/sign thresholds |
Supported Sign Categories:
- Traffic Lights: Red, Yellow, Green
- Regulatory Signs: Stop, Yield, Speed Limits (15-70 mph), No Entry, No U-Turn, etc.
- Warning Signs: Curve Ahead, Intersection, School Zone, Road Work, etc.
- Guide Signs: Lane Markers, Merge, Highway Signs
Topics Subscribed:
- `/classify/signs_in` (visionconnect/Signs) - Cropped sign images

Topics Published:
- `/classify/signs` (visionconnect/Signs) - Classified signs with labels
The `lanedet` node detects and segments lane lines using neural network inference.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `lane_detect.engine` | TensorRT engine path |
Topics Subscribed:
- `/lanedet/image_in` (sensor_msgs/Image) - Input image

Topics Published:
- `/lanedet/lanes` (visionconnect/Lanes) - Detected lane data
  - `xs`, `ys`: Lane point coordinates
  - `probs`: Lane confidence (4 lanes max)
  - `num_lanes`: Number of detected lanes
  - `laneimg`: Visualization overlay
Output:
- Up to 4 lane lines detected
- Polyline representation with confidence scores
- Segmentation mask overlay
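Consumers of `/lanedet/lanes` typically redraw the polylines from the published point lists. The rough sketch below hedges on the exact field layout, since the `visionconnect/Lanes` definition lives in the package; here the points are assumed to arrive as one (x, y) list per detected lane:

```python
# Rough sketch: draw up to four lane polylines from per-lane x/y point lists.
# The exact layout of visionconnect/Lanes is defined by the package; here the
# points are assumed to arrive as one (x, y) list per detected lane.
import cv2
import numpy as np


def draw_lanes(image, lanes_xy, probs, min_prob=0.5):
    """lanes_xy: list of Nx2 arrays of pixel coordinates, one per lane."""
    for pts, prob in zip(lanes_xy, probs):
        if prob < min_prob:
            continue
        pts = np.asarray(pts, dtype=np.int32).reshape(-1, 1, 2)
        cv2.polylines(image, [pts], isClosed=False, color=(0, 255, 0), thickness=3)
    return image
```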
The `driver_monitor` node provides TensorRT-accelerated driver attention monitoring using face detection and gaze estimation.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `face_engine` | string | `yolov11n_face_fp16.engine` | Face detection model |
| `gaze_engine` | string | `resnet18_gaze_fp16.engine` | Gaze estimation model |
| `camera_topic` | string | `/camera/raw` | Input camera topic |
| `confidence` | float | 0.5 | Face detection threshold |
Driver States:
| State | Condition | Alert |
|---|---|---|
| `ALERT` | Face detected, gaze forward | No |
| `DISTRACTED` | Gaze >30° off-center for 2s | Yes |
| `DROWSY` | Eyes closed (future) | Yes |
| `NO_DRIVER` | No face detected for 1s | Yes |
Topics Subscribed:
- `/camera/raw` (sensor_msgs/Image) - Driver-facing camera

Topics Published:
- `/driver_monitor/image` (sensor_msgs/Image) - Annotated output with gaze arrow
- `/driver_monitor/state` (std_msgs/String) - Current driver state
- `/driver_monitor/alert` (std_msgs/Bool) - Alert flag
Models:
- Face Detection: YOLOv11-nano (640×640 input, 8400 detections)
- Gaze Estimation: ResNet18 (448×448 input, pitch/yaw angles)
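The DISTRACTED condition in the state table above (gaze more than 30° off-center for 2 s) can be pictured as a small debounce over the pitch/yaw output of the gaze model. The sketch below is illustrative only and does not mirror the node's internal code:

```python
# Illustrative debounce for the DISTRACTED state: gaze more than 30° off-center
# sustained for 2 seconds. Not the node's actual implementation.
import math
import time


class DistractionMonitor:
    def __init__(self, angle_limit_deg=30.0, hold_s=2.0):
        self.angle_limit_deg = angle_limit_deg
        self.hold_s = hold_s
        self.off_center_since = None

    def update(self, pitch_deg: float, yaw_deg: float, now=None) -> bool:
        now = time.monotonic() if now is None else now
        off_center = math.hypot(pitch_deg, yaw_deg) > self.angle_limit_deg
        if not off_center:
            self.off_center_since = None
            return False
        if self.off_center_since is None:
            self.off_center_since = now
        return (now - self.off_center_since) >= self.hold_s
```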
The `adas` node raises Advanced Driver Assistance System alerts based on lane and detection data.
Topics Subscribed:
- `/adas/lanes_in` (visionconnect/Lanes) - Lane detection data

Topics Published:
- `/adas/adas_alerts` (visionconnect/ADAS) - ADAS warnings
Alerts:
- Lane departure warning
- Forward collision warning (with depth data)
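As an intuition for the lane departure alert, one common heuristic compares the lateral position of the ego lane boundaries against the image center. The sketch below is a generic illustration of that idea, not the `adas` node's actual rule:

```python
# Generic lane-departure intuition: if the nearest lane boundary drifts too
# close to the image center, flag a departure. Illustrative only.
def lane_departure(left_x_px, right_x_px, image_width_px, margin_frac=0.08):
    """left_x_px / right_x_px: x positions of the ego lane boundaries near the
    bottom of the frame; margin_frac: tolerated drift as a fraction of width."""
    center = image_width_px / 2.0
    margin = margin_frac * image_width_px
    return (center - left_x_px) < margin or (right_x_px - center) < margin
```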
The `imu_gps` node fuses IMU and GPS data (BNO055 + GPS module).
Topics Published:
- `/imu_gps/imu/data` (sensor_msgs/Imu) - IMU orientation and acceleration
- `/imu_gps/gps/fix` (sensor_msgs/NavSatFix) - GPS coordinates
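Both topics use standard ROS2 message types, so they can be consumed without any package-specific definitions. An illustrative rclpy subscriber:

```python
# Illustrative rclpy subscriber for the GPS fix and IMU topics (standard
# sensor_msgs types, so no VisionSense-specific messages are needed).
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Imu, NavSatFix


class PoseLogger(Node):
    def __init__(self):
        super().__init__("pose_logger")
        self.create_subscription(NavSatFix, "/imu_gps/gps/fix", self.on_fix, 10)
        self.create_subscription(Imu, "/imu_gps/imu/data", self.on_imu, 50)

    def on_fix(self, msg: NavSatFix):
        self.get_logger().info(f"lat={msg.latitude:.6f} lon={msg.longitude:.6f}")

    def on_imu(self, msg: Imu):
        a = msg.linear_acceleration
        self.get_logger().info(f"accel=({a.x:.2f}, {a.y:.2f}, {a.z:.2f}) m/s^2")


def main():
    rclpy.init()
    rclpy.spin(PoseLogger())


if __name__ == "__main__":
    main()
```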
The `gui` node provides a real-time data fusion display with a multi-panel layout.
Layout:
┌────────────────────────────┬─────────────────┐
│ │ Driver Monitor │
│ │ (1/3 × 1/3) │
│ Main View ├─────────────────┤
│ (2/3 × Full Height) │ Stereo Depth │
│ │ (1/3 × 1/3) │
│ Object Detection + ├─────────────────┤
│ Lane Overlay + │ Summary │
│ Traffic Signs │ (1/3 × 1/3) │
│ │ Speed/GPS/IMU │
└────────────────────────────┴─────────────────┘
Topics Subscribed:
- `/gui/image_in` - Main camera feed
- `/gui/detect_in` - Detection results
- `/gui/signs_in` - Classified signs
- `/gui/lanes_in` - Lane detection
- `/gui/adas_in` - ADAS alerts
- `/driver_monitor/image` - Driver monitor feed
- `/stereo_depth/disparity` - Depth visualization
- `/imu_gps/imu/data` - IMU data
- `/imu_gps/gps/fix` - GPS coordinates
Web-based monitoring interface accessible via browser.
Access: `http://<jetson-ip>:8080`
Features:
- Live video stream
- Detection statistics
- System status
git clone https://github.com/connected-wise/VisionSense.git
cd VisionSense

Build OpenCV from source with CUDA acceleration (required for Jetson):

sudo bash install_opencv_cuda_orin.sh

Note: This process takes 2-3 hours. The script will:
- Install all OpenCV build dependencies
- Download and compile the latest OpenCV with CUDA 12.6 support
- Configure for Jetson Orin (compute capability 8.7)
Install ROS2 Humble, jetson-inference, and all required libraries:
sudo bash install_all_deps.sh

This script installs:
- ROS2 Humble desktop and vision packages
- Build tools (cmake, colcon, etc.)
- jetson-inference library
- Python dependencies (numpy, pyserial)
- System libraries (Eigen3, Qt5, V4L utilities)
source /opt/ros/humble/setup.bash
colcon build --packages-select visionconnect

Double-click the VisionSense icon on the desktop, or launch from a terminal:
source /opt/ros/humble/setup.bash
cd ~/VisionSense && source install/setup.bash
ros2 launch visionconnect visionsense.launch.py

Individual nodes can also be run on their own:

ros2 run visionconnect camera
ros2 run visionconnect detect
ros2 run visionconnect gui

Edit `config/config.yaml`:
sensors:
  uv_camera: true      # Mono camera for driver monitoring
  zed_camera: true     # Stereo camera
  gps_module: true     # GPS/IMU module

camera:
  ros__parameters:
    resource: "csi://0"
    width: 1280
    height: 720

camera_stereo:
  ros__parameters:
    resource: "/dev/video1"
    width: 3840
    height: 1200
    rotated_lenses: true

detect:
  ros__parameters:
    model: "detect.engine"
    thresholds: [0.40, 0.45, 0.45, 0.6, 0.45, 0.45, 0.5, 0.40, 0.55]

driver_monitor:
  ros__parameters:
    face_engine: "/path/to/yolov11n_face_fp16.engine"
    gaze_engine: "/path/to/resnet18_gaze_fp16.engine"
    confidence: 0.5

| Model | Purpose | Input Size | Format |
|---|---|---|---|
| `detect.engine` | Object Detection | 640×640 | TensorRT FP16 |
| `classify.engine` | Sign Classification | 224×224 | TensorRT FP16 |
| `lane_detect.engine` | Lane Detection | 800×288 | TensorRT FP16 |
| `LightStereo-S-KITTI.engine` | Stereo Depth | 1200×1200 | TensorRT FP16 |
| `yolov11n_face_fp16.engine` | Face Detection | 640×640 | TensorRT FP16 |
| `resnet18_gaze_fp16.engine` | Gaze Estimation | 448×448 | TensorRT FP16 |
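To confirm that a rebuilt engine matches the input sizes listed above, its I/O tensors can be inspected with the TensorRT Python API. This is a sketch for the TensorRT 10.x API; adjust the engine path to your own build:

```python
# Sketch: list the I/O tensors of a TensorRT engine to verify input sizes
# (TensorRT 10.x Python API; replace the path with your own engine file).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("detect.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_mode(name), engine.get_tensor_shape(name))
```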
/camera/raw - Mono camera output
/camera_stereo/left/image_raw - Left stereo image
/camera_stereo/right/image_raw - Right stereo image
/stereo_depth/disparity - Depth visualization
/detect/detections - Object detections with tracking
/detect/signs - Detected traffic signs
/classify/signs - Classified traffic signs
/lanedet/lanes - Lane detection results
/driver_monitor/image - Driver monitoring visualization
/driver_monitor/state - Driver state (ALERT/DISTRACTED/etc)
/adas/adas_alerts - ADAS warnings
/imu_gps/imu/data - IMU sensor data
/imu_gps/gps/fix - GPS coordinates
/gui/fusion - Fused visualization output
# List available cameras
v4l2-ctl --list-devices
# Test stereo camera
gst-launch-1.0 v4l2src device=/dev/video1 ! videoconvert ! autovideosink

# Clean rebuild
rm -rf build install log
colcon build --packages-select visionconnect

Ensure models are built for your specific Jetson platform (engine files are not portable).
VisionSense is licensed for non-commercial research and educational use only.
✅ Allowed: Research, education, testing, developing your own technologies
❌ Not Allowed: Commercial use, integration into products, offering as a service
💼 Commercial License: Contact licensing@connectedwise.com
See LICENSE for full terms.
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-feature`)
- Commit changes (`git commit -m 'feat: add feature'`)
- Push to branch (`git push origin feature/my-feature`)
- Open a Pull Request
VisionSense - Autonomous Vehicle Vision System
© 2025 ConnectedWise