University Program - IISc Bangalore: Edge AI

iSign2Text: Real-Time Portable Sign Language-to-Text Translator

Object detection

About this project

iSign2Text: Real-Time Portable Sign Language-to-Text Translator ✋➡️📱

Welcome to iSign2Text — a real-time, on-device Indian Sign Language (ISL) recognition system built using Edge Impulse and the Arduino Nicla Vision.

The goal of this project is simple but powerful:
👉 Let a tiny edge device watch ISL hand signs
👉 Recognize them using a tiny ML model
👉 Stream the predictions live to any phone or laptop browser — no cloud, no GPU, no internet required (beyond local Wi-Fi).


🌟 Project Overview

iSign2Text is a portable sign-language translator designed to improve accessibility in everyday environments such as classrooms, clinics, or homes.

What it does:

  • Captures live video using the Nicla Vision’s onboard camera
  • Runs a tiny, optimized MobileNet model directly on the microcontroller
  • Detects 15 common ISL hand gestures in real time
  • Overlays gesture labels on the video stream
  • Serves the annotated video over Wi-Fi via an MJPEG server, so you can view it in any browser (phone or laptop)

All of this runs fully offline on a low-power embedded device. No cloud calls. No external servers. Just your Nicla Vision and a Wi-Fi network.


🧩 Use Cases

iSign2Text is a proof-of-concept system that can be extended and adapted for:

  • Demonstrations of Edge AI for accessibility
  • Classroom teaching (AI, embedded ML, human–computer interaction)
  • Assistive tools in low-connectivity environments
  • Rapid prototyping of Sign2Text/Sign2Speech solutions on embedded devices

📂 Dataset

We trained iSign2Text on a custom ISL hand-sign dataset collected directly with the Nicla Vision.

Dataset details:

  • Total images: 3,036
  • Number of classes: 15 ISL gestures
  • Images per class: ~200
  • Participants: 4 users
  • Capture device: Arduino Nicla Vision (OpenMV capture script)
  • Image resolution (raw): 240×240
  • Annotation: Manually labeled using the Edge Impulse Labeling UI

Classes (15 ISL gestures):

agree, angry, bad, come, fine, go, happy, hello, how, hungry, me, please, sorry, thank, you

Each gesture was recorded under different lighting conditions and hand positions to improve robustness in real-world scenarios.
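Before uploading a capture like this to Edge Impulse, it helps to confirm that each class really has its ~200 images. The project's sample thumbnails carry names like `00004_fine` and `00165_agree`, so the sketch below assumes an `<index>_<label>` naming pattern (that pattern is an inference from the samples, not a documented convention; adjust the split if your capture script names files differently):

```python
from collections import Counter
from pathlib import Path

# The 15 ISL gesture classes used in iSign2Text.
LABELS = {
    "agree", "angry", "bad", "come", "fine", "go", "happy", "hello",
    "how", "hungry", "me", "please", "sorry", "thank", "you",
}

def class_counts(filenames):
    """Tally images per gesture class from names like '00004_fine.jpg'.

    Assumes the '<index>_<label>' naming seen in the sample thumbnails;
    files whose label part is not a known class are ignored.
    """
    counts = Counter()
    for name in filenames:
        label = Path(name).stem.split("_", 1)[-1]
        if label in LABELS:
            counts[label] += 1
    return counts
```

Run it over your capture folder, e.g. `class_counts(p.name for p in Path("dataset").glob("*.jpg"))`, and check that every class lands near the ~200 images/class target.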


🧠 Model Information

The core of iSign2Text is a TinyML model optimized for embedded deployment.

Model architecture & settings:

  • Backbone: MobileNetV2 0.35
  • Input size: 96×96
  • Color mode: Grayscale
  • Quantization: INT8 (8-bit integer)

Performance:

  • Validation Accuracy: 93.1%
  • Test Accuracy: 90.17%
  • Precision: 0.96
  • Recall: 0.90
  • F1 Score: 0.93

On-device performance (Nicla Vision):

  • Inference time: ~691 ms per frame
  • Peak RAM usage: 137.8 KB
  • Flash usage: 81.9 KB
  • Model footprint: ~57 KB

This makes the model small enough and fast enough to run live on the Arduino Nicla Vision without additional hardware acceleration.
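As a quick sanity check on "fast enough": the measured ~691 ms inference time bounds the classification rate at roughly 1.4 predictions per second (the effective MJPEG frame rate will be a bit lower once capture and JPEG encoding are added):

```python
# Back-of-envelope throughput from the measured on-device inference time.
INFERENCE_MS = 691

fps = 1000 / INFERENCE_MS  # upper bound on classifications per second
print(f"~{fps:.2f} inferences per second")
```

That rate is comfortably usable for isolated hand signs, where the signer holds a pose for a moment, though it would be tight for continuous signing.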


⚙️ Features & Functionality

Once deployed, iSign2Text provides:

  • Real-time gesture recognition for 15 ISL hand signs
  • On-frame overlays:
    • Bounding box around the detected gesture
    • Gesture label (e.g., hello, thank, hungry)
  • MJPEG video streaming over Wi-Fi:
    • View the live annotated feed from any browser
  • Optional display mode:
    • Attach a small display module to show video + label directly on the device

In short: point the Nicla Vision at the signer → read the sign name on your phone screen.


🛠️ Hardware & Software Requirements

Hardware:

  • Arduino Nicla Vision
  • USB cable (for flashing firmware & power)
  • Optional: small external display (e.g., SPI display) for on-device output
  • A Wi-Fi network (e.g., mobile hotspot)

Software / Tools:

  • Edge Impulse account
  • Edge Impulse CLI (optional, for advanced usage)
  • OpenMV firmware flashed on Nicla Vision
  • Arduino / OpenMV IDE (depending on your workflow)

🚀 Deployment Workflow

Here’s how to go from model design to a live demo:

  1. Model Training (Edge Impulse)

    • Import the collected images into your Edge Impulse project.
    • Configure the impulse:
      • Image preprocessing → 96×96 grayscale
      • MobileNetV2 0.35 classifier
    • Train the model and verify metrics (accuracy, confusion matrix, etc.).
  2. Export Model as OpenMV Library

    • In Edge Impulse, go to Deployment.
    • Select OpenMV Library as the target.
    • Download the generated ZIP with:
      • trained.tflite
      • labels.txt
      • Example OpenMV script
  3. Flash to Nicla Vision

    • Flash the OpenMV firmware onto the Nicla Vision (if not already done).
    • Copy:
      • trained.tflite
      • labels.txt
      • Your modified inference script (e.g., main.py)
    • Save the files to the Nicla Vision (under OpenMV firmware, the board mounts as a USB drive).
  4. Connect to Wi-Fi

    • In your script, set the SSID and password of your hotspot or router.
    • On boot, the Nicla Vision:
      • Connects to Wi-Fi
      • Starts an MJPEG streaming server
  5. View the Live Stream

    • Find the device IP (printed in the serial console or known from your router).
    • In your smartphone/PC browser, visit:
      http://<device-ip>:8080
    • You should now see the live camera feed with gesture labels overlaid.
  6. Interact & Test

    • Show one of the 15 ISL signs to the camera.
    • Watch the recognized gesture label appear on the stream ✨
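If you want to consume the stream programmatically instead of in a browser (say, to log frames or predictions), a minimal client can pull JPEG frames out of the MJPEG stream. The sketch below is a simplification, not the project's code: it splits on the JPEG start/end markers rather than parsing the multipart boundary headers, which works for typical MJPEG servers, and the IP address is a placeholder for the one printed on the serial console.

```python
import urllib.request

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def extract_jpegs(buffer):
    """Return (complete JPEG frames found in buffer, leftover bytes).

    Splitting on SOI/EOI markers is a pragmatic shortcut; a strict
    client would parse the multipart boundary headers instead.
    """
    frames = []
    while True:
        start = buffer.find(SOI)
        end = buffer.find(EOI, start + 2)
        if start == -1 or end == -1:
            break
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
    return frames, buffer

def stream_frames(url="http://192.168.1.50:8080", chunk=4096):
    """Yield JPEG frames from the device's MJPEG stream.

    The IP here is a placeholder; use the address your Nicla Vision
    prints on boot.
    """
    leftover = b""
    with urllib.request.urlopen(url) as resp:
        while True:
            data = resp.read(chunk)
            if not data:
                break
            frames, leftover = extract_jpegs(leftover + data)
            yield from frames
```

Each yielded frame is a complete JPEG you can write to disk or decode with any image library.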

🔍 How It Works (High Level)

  1. Capture
    The Nicla Vision captures frames from its onboard camera.

  2. Preprocess
    Each frame is resized and converted to a 96×96 grayscale image.

  3. Inference
    The MobileNetV2 model (INT8) runs on the microcontroller, outputting a probability distribution over the 15 classes.

  4. Post-process
    The top class is selected and mapped to a human-readable gesture label (from labels.txt).

  5. Render & Stream
    The label is drawn onto the video frame, and the annotated frames are streamed as MJPEG via a simple HTTP server.
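The post-processing in step 4 amounts to an argmax over the 15 class scores, usually gated by a confidence threshold so ambiguous frames don't flicker between labels. A minimal sketch (the 0.60 threshold is an illustrative value, not one tuned in this project):

```python
LABELS = ["agree", "angry", "bad", "come", "fine", "go", "happy", "hello",
          "how", "hungry", "me", "please", "sorry", "thank", "you"]

def top_prediction(scores, labels=LABELS, threshold=0.60):
    """Map a model score vector to a gesture label, or None below threshold.

    The threshold is illustrative; raise it to suppress flickering labels
    on ambiguous frames, lower it to report more tentative matches.
    """
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None
    return labels[best]
```

In the on-device script the same logic runs on each inference result before the label is drawn onto the frame.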


🔭 Future Extensions

This project is just the starting point. Possible future directions:

  • Add more ISL gestures and expand the vocabulary
  • Support two-hand signs and more complex poses
  • Move from isolated signs to continuous sign sequences
  • Add on-device speech synthesis (Sign2Speech)
    • e.g., mapping prediction → audio output
  • Extend to other sign languages (ASL, BSL, etc.) with new datasets
  • Improve latency with:
    • Model pruning / architecture tweaks
    • Better memory and frame-rate optimization

🙌 Contributing & Feedback

Have ideas to improve accuracy, speed, or usability?
Want to add new gestures or support another sign language?

  • Fork the project
  • Experiment with new Edge Impulse models
  • Submit PRs
  • Or just open an issue with suggestions and feedback

Your contributions can help make sign-language translation on the edge more inclusive and widely available. ❤️

[Sample dataset images: 00004_fine, 00165_agree, 00075_come, 00167_fine, 00001_angry, 00036_fine, 00119_please, 00153_angry]

Project info

  • Project ID: 841431
  • License: 3-Clause BSD