This document details the software stack, algorithms, and logic flows powering the Genesis rover. It covers system architecture, autonomous navigation, AI interaction, and control logic.
- System Architecture
- Autonomous Navigation Algorithms
- AI & Interaction Stack
- Control Logic & Data Flow
- Streaming
## System Architecture

The software is distributed across two onboard processing units:
- Nvidia Jetson Nano: Handles high-level decision making, Computer Vision (RealSense), AI (Voice/Object Detection), and Host Communication.
- Arduino Mega 2560: Handles low-level motor control (PWM generation) and sensor interfacing (IMU, GPS, Magnetometer).
## Autonomous Navigation Algorithms

### Waypoint Navigation

The rover navigates to specific waypoints using a continuous feedback loop:
1. Position Determination:
Latitude ($\varphi_1$) and Longitude ($\lambda_1$) of the current position are read from the GPS module; the target waypoint supplies ($\varphi_2$, $\lambda_2$).
2. Bearing Calculation:
The required heading ($\theta$) from the current position to the waypoint is:
$$\theta = \operatorname{atan2}\left(\sin\Delta\lambda \cos\varphi_2,\ \cos\varphi_1 \sin\varphi_2 - \sin\varphi_1 \cos\varphi_2 \cos\Delta\lambda\right)$$
where $\Delta\lambda = \lambda_2 - \lambda_1$.
3. Distance Calculation (Haversine Formula):
The distance ($d$) to the waypoint is:
$$a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\Delta\lambda}{2}\right)$$
$$d = 2R \cdot \operatorname{atan2}\left(\sqrt{a}, \sqrt{1-a}\right)$$
(Where $\Delta\varphi = \varphi_2 - \varphi_1$ and $R \approx 6{,}371$ km is the Earth's radius.)
4. Compass Alignment: The rover compares the calculated bearing against the heading reported by the HMC5883L magnetometer and rotates the chassis until they align.
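The bearing, distance, and alignment-error steps above can be sketched in Python (function names are illustrative, not the rover's actual code):

```python
import math

EARTH_RADIUS_KM = 6371.0

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from (lat1, lon1) to (lat2, lon2), degrees clockwise from North."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))

def heading_error_deg(target_bearing, compass_heading):
    """Signed turn error in [-180, 180); positive means rotate clockwise."""
    return (target_bearing - compass_heading + 540.0) % 360.0 - 180.0
```

In the feedback loop, the rover rotates until `heading_error_deg` is near zero, drives forward, and stops when `haversine_km` falls below a waypoint-reached threshold.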
### Obstacle Avoidance

Obstacle avoidance uses an Intel RealSense D415 Depth Camera. The processing pipeline is as follows:
- Preprocessing:
- Decimation Filter: Downsamples image to reduce noise.
- Spatial Filter: Smooths data and fills small holes.
- Thresholding: Pixels with depth greater than 1.0 meter are discarded to focus on immediate hazards.
- Canny Edge Detection:
- Gaussian Blur: Reduces noise.
- Gradient Calculation: Finds intensity changes.
- Non-Maximum Suppression: Thins edges.
- Double Thresholding: Classifies strong/weak edges.
- Hysteresis: Connects weak edges to strong edges.
- Avoidance Logic:
The system calculates the "Center of Mass" of detected obstacles to decide the steering direction:
- Left Obstacle: Rover turns Right.
- Right Obstacle: Rover turns Left.
- Clear Path: Rover moves Forward.
## AI & Interaction Stack

### Object Detection

- Model: SSD-MobileNet-v2.
- Dataset: MS COCO (91 Classes including Person, Vehicle, Animal, Household items).
- Optimization: Uses TensorRT for real-time inference on the Jetson Nano.
### Voice Interaction

The voice system enables hands-free control and Q&A.
- ASR (Automatic Speech Recognition):
  - QuartzNet-15x5: For full-sentence transcription.
  - MatchboxNet: For command classification (Keywords: "Stop", "Go", "Left", "Right").
- TTS (Text-to-Speech):
  - FastPitch: Generates mel spectrograms.
  - HiFiGAN: Vocoder that converts spectrograms to audio.
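Downstream of the keyword classifier, each recognized command must map to a drive action. A minimal dispatcher sketch, where the `(left, right)` PWM duty values and the default-to-stop behavior are illustrative assumptions, not the rover's actual tuning:

```python
# Hypothetical mapping from MatchboxNet keyword classes to (left, right) PWM duties.
KEYWORD_ACTIONS = {
    "Stop": (0, 0),
    "Go": (200, 200),
    "Left": (100, 200),    # slow the left side to arc left
    "Right": (200, 100),   # slow the right side to arc right
}

def dispatch(keyword):
    """Return the motor command for a keyword; unknown input fails safe to stop."""
    return KEYWORD_ACTIONS.get(keyword, (0, 0))
```

Defaulting unknown classifications to a stop command is a common fail-safe choice for voice-driven vehicles.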
## Control Logic & Data Flow

- GUI Input: User presses W, A, S, or D on the Base Station.
- Telemetry: The signal is sent via the 3DR 433 MHz Radio.
- Decoding: The Jetson Nano interprets the signal.
- Execution: The Arduino Mega receives the command $\rightarrow$ PCA9685 PWM Driver $\rightarrow$ BTS7960 Motor Driver.
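The Jetson-to-Arduino hop requires a serial framing convention. The rover's actual wire protocol is not specified here, so the start/end bytes and checksum below are purely illustrative:

```python
def encode_command(key: str) -> bytes:
    """Frame a single W/A/S/D keypress for the serial link to the Arduino.

    Hypothetical layout: '<' start byte, the command character,
    a one-byte additive checksum, '>' end byte.
    """
    payload = key.upper().encode("ascii")
    if payload not in (b"W", b"A", b"S", b"D"):
        raise ValueError(f"unknown command key: {key!r}")
    checksum = sum(payload) & 0xFF  # trivial integrity check over the payload
    return b"<" + payload + bytes([checksum]) + b">"
```

On the Arduino side, the parser would wait for `<`, verify the checksum, and only then update the PCA9685 outputs, so a corrupted radio byte cannot trigger a motor command.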
## Streaming

Video is streamed from the onboard webcam/RealSense to the Base Station over Wi-Fi using RTP (Real-time Transport Protocol) for low-latency feedback.
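One common way to realize such an RTP link is a GStreamer pipeline; the sketch below is an assumption about the setup, not the rover's actual command (the device path, receiver IP, and port are placeholders):

```shell
# Sender (Jetson side): capture, H.264-encode with low-latency settings,
# packetize as RTP, and send over UDP to the Base Station.
gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert \
  ! x264enc tune=zerolatency speed-preset=ultrafast \
  ! rtph264pay config-interval=1 pt=96 \
  ! udpsink host=192.168.1.50 port=5000

# Receiver (Base Station side): depacketize, decode, and display.
gst-launch-1.0 udpsrc port=5000 \
  caps="application/x-rtp,media=video,encoding-name=H264,payload=96" \
  ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink
```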