University Program - IISc Bangalore: Edge AI

iSign2Text: Real-Time Portable Sign Language-to-Text Translator

Object detection

About this project

iSign2Text: Real-Time Portable Sign Language-to-Text Translator ✋➡️📱

Welcome to iSign2Text — a real-time, on-device Indian Sign Language (ISL) recognition system built using Edge Impulse and the Arduino Nicla Vision.

The goal of this project is simple but powerful:
👉 Let a tiny edge device watch ISL hand signs
👉 Recognize them using a tiny ML model
👉 Stream the predictions live to any phone or laptop browser — no cloud, no GPU, no internet required (beyond local Wi-Fi).


🌟 Project Overview

iSign2Text is a portable sign-language translator designed to improve accessibility in everyday environments such as classrooms, clinics, or homes.

What it does:

  • Captures live video using the Nicla Vision’s onboard camera
  • Runs a tiny, optimized MobileNet model directly on the microcontroller
  • Detects 15 common ISL hand gestures in real time
  • Overlays gesture labels on the video stream
  • Serves the annotated video over Wi-Fi via an MJPEG server, so you can view it in any browser (phone or laptop)

All of this runs fully offline on a low-power embedded device. No cloud calls. No external servers. Just your Nicla Vision and a Wi-Fi network.


🧩 Use Cases

iSign2Text is a proof-of-concept system that can be extended and adapted for:

  • Demonstrations of Edge AI for accessibility
  • Classroom teaching (AI, embedded ML, human–computer interaction)
  • Assistive tools in low-connectivity environments
  • Rapid prototyping of Sign2Text/Sign2Speech solutions on embedded devices

📂 Dataset

We trained iSign2Text on a custom ISL hand-sign dataset collected directly with the Nicla Vision.

Dataset details:

  • Total images: 3,036
  • Number of classes: 15 ISL gestures
  • Images per class: ~200
  • Participants: 4 users
  • Capture device: Arduino Nicla Vision (OpenMV capture script)
  • Image resolution (raw): 240×240
  • Annotation: Manually labeled using the Edge Impulse Labeling UI

Classes (15 ISL gestures):

agree, angry, bad, come, fine, go, happy, hello, how, hungry, me, please, sorry, thank, you

Each gesture was recorded under different lighting conditions and hand positions to improve robustness in real-world scenarios.
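Before uploading a capture like this to Edge Impulse, it helps to confirm that each class really has its ~200 images. The project's sample thumbnails carry names like `00004_fine` and `00165_agree`, so the sketch below assumes an `<index>_<label>` naming pattern (that pattern is an inference from the samples, not a documented convention; adjust the split if your capture script names files differently):

```python
from collections import Counter
from pathlib import Path

# The 15 ISL gesture classes used in iSign2Text.
LABELS = {
    "agree", "angry", "bad", "come", "fine", "go", "happy", "hello",
    "how", "hungry", "me", "please", "sorry", "thank", "you",
}

def class_counts(filenames):
    """Tally images per gesture class from names like '00004_fine.jpg'.

    Assumes the '<index>_<label>' naming seen in the sample thumbnails;
    files whose label part is not a known class are ignored.
    """
    counts = Counter()
    for name in filenames:
        label = Path(name).stem.split("_", 1)[-1]
        if label in LABELS:
            counts[label] += 1
    return counts
```

Run it over your capture folder, e.g. `class_counts(p.name for p in Path("dataset").glob("*.jpg"))`, and check that every class lands near the ~200 images/class target.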


🧠 Model Information

The core of iSign2Text is a TinyML model optimized for embedded deployment.

Model architecture & settings:

  • Backbone: MobileNetV2 0.35
  • Input size: 96×96
  • Color mode: Grayscale
  • Quantization: INT8 (8-bit integer)

Performance:

  • Validation Accuracy: 93.1%
  • Test Accuracy: 90.17%
  • Precision: 0.96
  • Recall: 0.90
  • F1 Score: 0.93

On-device performance (Nicla Vision):

  • Inference time: ~691 ms per frame
  • Peak RAM usage: 137.8 KB
  • Flash usage: 81.9 KB
  • Model footprint: ~57 KB

This makes the model small enough and fast enough to run live on the Arduino Nicla Vision without additional hardware acceleration.
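As a quick sanity check on "fast enough": the measured ~691 ms inference time bounds the classification rate at roughly 1.4 predictions per second (the effective MJPEG frame rate will be a bit lower once capture and JPEG encoding are added):

```python
# Back-of-envelope throughput from the measured on-device inference time.
INFERENCE_MS = 691

fps = 1000 / INFERENCE_MS  # upper bound on classifications per second
print(f"~{fps:.2f} inferences per second")
```

That rate is comfortably usable for isolated hand signs, where the signer holds a pose for a moment, though it would be tight for continuous signing.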


⚙️ Features & Functionality

Once deployed, iSign2Text provides:

  • Real-time gesture recognition for 15 ISL hand signs
  • On-frame overlays:
    • Bounding box around the detected gesture
    • Gesture label (e.g., hello, thank, hungry)
  • MJPEG video streaming over Wi-Fi:
    • View the live annotated feed from any browser
  • Optional display mode:
    • Attach a small display module to show video + label directly on the device

In short: point the Nicla Vision at the signer → read the sign name on your phone screen.


🛠️ Hardware & Software Requirements

Hardware:

  • Arduino Nicla Vision
  • USB cable (for flashing firmware & power)
  • Optional: small external display (e.g., SPI display) for on-device output
  • A Wi-Fi network (e.g., mobile hotspot)

Software / Tools:

  • Edge Impulse account
  • Edge Impulse CLI (optional, for advanced usage)
  • OpenMV firmware flashed on Nicla Vision
  • Arduino / OpenMV IDE (depending on your workflow)

🚀 Deployment Workflow

Here’s how to go from model design to a live demo:

  1. Model Training (Edge Impulse)

    • Import the collected images into your Edge Impulse project.
    • Configure the impulse:
      • Image preprocessing → 96×96 grayscale
      • MobileNetV2 0.35 classifier
    • Train the model and verify metrics (accuracy, confusion matrix, etc.).
  2. Export Model as OpenMV Library

    • In Edge Impulse, go to Deployment.
    • Select OpenMV Library as the target.
    • Download the generated ZIP with:
      • trained.tflite
      • labels.txt
      • Example OpenMV script
  3. Flash to Nicla Vision

    • Flash the OpenMV firmware onto the Nicla Vision (if not already done).
    • Copy:
      • trained.tflite
      • labels.txt
      • Your modified inference script (e.g., main.py)
    • Save the files to the Nicla Vision (under OpenMV firmware, the board mounts as a USB drive).
  4. Connect to Wi-Fi

    • In your script, set the SSID and password of your hotspot or router.
    • On boot, the Nicla Vision:
      • Connects to Wi-Fi
      • Starts an MJPEG streaming server
  5. View the Live Stream

    • Find the device IP (printed in the serial console or known from your router).
    • In your smartphone/PC browser, visit:
      http://<device-ip>:8080
    • You should now see the live camera feed with gesture labels overlaid.
  6. Interact & Test

    • Show one of the 15 ISL signs to the camera.
    • Watch the recognized gesture label appear on the stream ✨
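If you want to consume the stream programmatically instead of in a browser (say, to log frames or predictions), a minimal client can pull JPEG frames out of the MJPEG stream. The sketch below is a simplification, not the project's code: it splits on the JPEG start/end markers rather than parsing the multipart boundary headers, which works for typical MJPEG servers, and the IP address is a placeholder for the one printed on the serial console.

```python
import urllib.request

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def extract_jpegs(buffer):
    """Return (complete JPEG frames found in buffer, leftover bytes).

    Splitting on SOI/EOI markers is a pragmatic shortcut; a strict
    client would parse the multipart boundary headers instead.
    """
    frames = []
    while True:
        start = buffer.find(SOI)
        end = buffer.find(EOI, start + 2)
        if start == -1 or end == -1:
            break
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
    return frames, buffer

def stream_frames(url="http://192.168.1.50:8080", chunk=4096):
    """Yield JPEG frames from the device's MJPEG stream.

    The IP here is a placeholder; use the address your Nicla Vision
    prints on boot.
    """
    leftover = b""
    with urllib.request.urlopen(url) as resp:
        while True:
            data = resp.read(chunk)
            if not data:
                break
            frames, leftover = extract_jpegs(leftover + data)
            yield from frames
```

Each yielded frame is a complete JPEG you can write to disk or decode with any image library.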

🔍 How It Works (High Level)

  1. Capture
    The Nicla Vision captures frames from its onboard camera.

  2. Preprocess
    Each frame is resized and converted to a 96×96 grayscale image.

  3. Inference
    The MobileNetV2 model (INT8) runs on the microcontroller, outputting a probability distribution over the 15 classes.

  4. Post-process
    The top class is selected and mapped to a human-readable gesture label (from labels.txt).

  5. Render & Stream
    The label is drawn onto the video frame, and the annotated frames are streamed as MJPEG via a simple HTTP server.
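The post-processing in step 4 amounts to an argmax over the 15 class scores, usually gated by a confidence threshold so ambiguous frames don't flicker between labels. A minimal sketch (the 0.60 threshold is an illustrative value, not one tuned in this project):

```python
LABELS = ["agree", "angry", "bad", "come", "fine", "go", "happy", "hello",
          "how", "hungry", "me", "please", "sorry", "thank", "you"]

def top_prediction(scores, labels=LABELS, threshold=0.60):
    """Map a model score vector to a gesture label, or None below threshold.

    The threshold is illustrative; raise it to suppress flickering labels
    on ambiguous frames, lower it to report more tentative matches.
    """
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None
    return labels[best]
```

In the on-device script the same logic runs on each inference result before the label is drawn onto the frame.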


🔭 Future Extensions

This project is just the starting point. Possible future directions:

  • Add more ISL gestures and expand the vocabulary
  • Support two-hand signs and more complex poses
  • Move from isolated signs to continuous sign sequences
  • Add on-device speech synthesis (Sign2Speech)
    • e.g., mapping prediction → audio output
  • Extend to other sign languages (ASL, BSL, etc.) with new datasets
  • Improve latency with:
    • Model pruning / architecture tweaks
    • Better memory and frame-rate optimization

🙌 Contributing & Feedback

Have ideas to improve accuracy, speed, or usability?
Want to add new gestures or support another sign language?

  • Fork the project
  • Experiment with new Edge Impulse models
  • Submit PRs
  • Or just open an issue with suggestions and feedback

Your contributions can help make sign-language translation on the edge more inclusive and widely available. ❤️

[Sample dataset images: 00004_fine, 00165_agree, 00075_come, 00167_fine, 00001_angry, 00036_fine, 00119_please, 00153_angry]

Project info

  • Project ID: 841431
  • License: 3-Clause BSD