ThingTank · TotTalk Box

Object detection + TotTalk speech processing

About this project

TotTalk helps toddlers learn to speak by pairing object recognition with real-time pronunciation coaching — everything runs offline on affordable hardware.

Demo

Video demo: main.png

Implementation link: https://github.com/PrinceP/tottalk-box

Overview

TotTalk uses the world around a toddler to introduce vocabulary, reinforce pronunciation, and keep learning fun. When a child presents an everyday object, the box recognizes it, introduces the word, listens to the child's attempt, and provides gentle feedback.

Everything runs locally on a Qualcomm RubikPi 3 so families keep complete control over their data. Vision, audio, UI, and speech feedback all execute on-device without any cloud dependency.

Core Workflow

  1. The child shows any toy or household item to the camera.
  2. TotTalk identifies the object, announces the word, and prompts the child to repeat it.
  3. Whisper-based speech recognition monitors pronunciation, offers retries when needed, and celebrates correct repetitions (a minimal matching sketch follows below).

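As a rough illustration of step 3, the feedback decision can be as simple as comparing the transcribed attempt against the target word. The snippet below is a minimal sketch; the helper name, similarity threshold, and response phrasing are assumptions, not TotTalk's actual logic.

# Minimal sketch of the repeat-after-me check (hypothetical helper, illustrative threshold).
from difflib import SequenceMatcher

def check_attempt(target: str, transcript: str, threshold: float = 0.8) -> str:
    """Compare a transcribed attempt with the target word and choose a gentle response."""
    attempt = transcript.strip().lower()
    if not attempt:
        return "I didn't hear you. Can you try again?"
    score = SequenceMatcher(None, target.lower(), attempt).ratio()
    if score >= threshold:
        return f"Great job! You said {target}!"
    return f"Almost! Let's try saying {target} again."

print(check_attempt("elephant", "ellafant"))  # a close-but-off attempt earns a gentle retry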

Highlights

  • Real-world vocabulary: Builds a personalized dictionary around objects the child already loves.
  • Active engagement: Encourages movement and call-and-response instead of passive screen time.
  • Robust toddler speech handling: Whisper.cpp pipeline tolerates silence, mumbles, and multiple attempts within edge constraints (a minimal invocation sketch follows this list).
  • Low-latency inference: Computer vision, audio processing, and UI run together on the RubikPi 3.
  • Completely offline: No cloud services—vision, transcription, and feedback logic all execute locally for privacy.
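
For the listening step, one way to run Whisper on-device is through the whisper.cpp command-line example. The sketch below calls it from Python; the binary and model paths are assumptions and should point at wherever whisper.cpp is built on the board.

# Hypothetical sketch: transcribe a recorded attempt with the whisper.cpp CLI.
# Paths are assumptions; point them at your whisper.cpp build and ggml model.
import subprocess

WHISPER_BIN = "/home/ubuntu/whisper.cpp/main"
WHISPER_MODEL = "/home/ubuntu/whisper.cpp/models/ggml-base.en.bin"

def transcribe(wav_path: str) -> str:
    """Run whisper.cpp on a 16 kHz mono WAV file and return the transcribed text."""
    result = subprocess.run(
        [WHISPER_BIN, "-m", WHISPER_MODEL, "-f", wav_path, "-nt"],  # -nt: no timestamps
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

print(transcribe("attempt.wav"))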

Edge Impulse Development Workflow

We rely on Edge Impulse to capture data, design the computer vision impulse, and deploy an optimized model to the RubikPi 3.

1. Prepare the Project & Dataset

  • Sign in to Edge Impulse Studio and create a project dedicated to TotTalk object recognition.
  • Connect the device with edge-impulse-linux to stream camera frames directly into the Data acquisition tab.
  • Label each sample so that every object class remains balanced; the studio supports bounding-box and image labeling workflows.
  • Maintain an 85% / 15% train-test split to keep hold-out data for unbiased evaluation.

Reference: Edge Impulse data acquisition and labeling tools.

2. Design the Impulse

  • In the Impulse design view, add an Image processing block to resize incoming frames (e.g., 96×96 RGB) and enable automatic normalization (a rough preprocessing sketch follows the reference below).
  • Add a Transfer Learning learning block; MobileNetV2/ResNet-based backbones usually offer a strong accuracy-to-latency balance on the RubikPi 3.
  • Configure data augmentation (random flip, crop, color shift) so the model generalizes to varied backgrounds and lighting.

Reference: Edge Impulse impulse design.
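
Conceptually, the Image block does something like the following to each frame before it reaches the learning block. This is a rough sketch, not Edge Impulse's actual implementation: it assumes a "squash"-style resize and 0-1 pixel scaling, and the TotTalk impulse itself uses 320×320 (see Impulse Design below).

# Rough, conceptual equivalent of the Image processing block (not Edge Impulse code).
import cv2
import numpy as np

def preprocess(frame_bgr, size=(96, 96)):
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)   # camera frames arrive as BGR
    resized = cv2.resize(rgb, size)                     # assumes "squash"-style resizing
    return resized.astype(np.float32) / 255.0           # scale pixels to the 0-1 range

# Example: preprocess(cv2.imread("sample.jpg"), size=(320, 320))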

3. Train & Evaluate the Model

  • Launch training with appropriate hyperparameters (learning-rate scheduler, 20–50 epochs, early stopping).
  • Use the Model testing tab to validate accuracy on the held-out dataset and inspect the per-class confusion matrix.
  • Iterate on data balance, augmentation, or impulse configuration when misclassifications appear.

Reference: Edge Impulse model training and model testing.

4. Deploy & Integrate

  • Generate a Linux eim package or TensorFlow Lite file from the Deployment tab.
  • Install the Edge Impulse Linux SDK for Python to run inference alongside the TotTalk UI.
  • Stream inference results over HTTP or pass them directly into the TotTalk feedback loop for speech prompts.

Reference: Edge Impulse Linux deployment and Linux SDK for Python.

Dataset Summary

We currently track 256 labeled images across 20 object classes. 85% of the samples train the model and 15% remain in the testing set.

Label Emoji
jcb 🚜
monkey 🐒
bat 🦇
bicycle 🚲
candle 🕯️
car 🚗
chair 🪑
comb 💇
elephant 🐘
glass 🥛
green ball 🟢
guitar 🎸
helmet ⛑️
kiwi 🥝
minion 🤖
octopus 🐙
slippers 🥿
stool 🪑
tiger 🐯
tortoise 🐢

Hardware Setup

Bill of Materials

  • RubikPi 3 (Qualcomm QCS6490 SoC)
  • 7″ LCD display (1024×600)
  • Two 5 W speakers
  • Logitech C270 HD webcam (640×480 with integrated microphone)


Device Preparation

Flash Canonical Ubuntu 24.04 using Qualcomm Launcher

Qualcomm® Launcher (see Thundercomm documentation) streamlines flashing Canonical Ubuntu 24.04 Server onto the RubikPi 3. Follow the official walkthrough to install Renesas USB firmware and replace the stock image.

Upgrade to the Latest Canonical Ubuntu Build

Update the board to the most recent certified packages:

sudo apt update
sudo apt upgrade -y
git clone -b ubuntu_setup --single-branch https://github.com/rubikpi-ai/rubikpi-script.git
cd rubikpi-script
./install_ppa_pkgs.sh

The helper script installs:

gstreamer1.0-plugins-base-apps, gstreamer1.0-qcom-python-examples, gstreamer1.0-qcom-sample-apps,
gstreamer1.0-tools, libqnn-dev, libsnpe-dev, qcom-adreno1, qcom-fastcv-binaries-dev,
qcom-libdmabufheap-dev, qcom-sensors-test-apps, qcom-video-firmware, qnn-tools, snpe-tools,
tensorflow-lite-qcom-apps, weston-autostart, xwayland, Rubikpi3 camera packages, wiringrp, wiringrp_python,
and tooling such as ffmpeg, net-tools, pulseaudio-utils, python3-pip, selinux-utils, unzip, v4l-utils.

Validate the Platform

cat /etc/os-release
uname -a

Expected output confirms Ubuntu 24.04.2 LTS and the Qualcomm-specific kernel (Linux ubuntu 6.8.0-1055-qcom ...).

Install the Edge Impulse CLI

wget https://cdn.edgeimpulse.com/firmware/linux/setup-edge-impulse-qc-linux.sh
sh setup-edge-impulse-qc-linux.sh
edge-impulse-linux

Follow the browser link presented in the terminal to authenticate the device with Edge Impulse Studio.

Software Installation

Install Drivers, AI Engine Direct, and the IM-SDK

  1. Base tooling:
sudo apt update
sudo apt install -y unzip wget curl python3 python3-pip python3-venv software-properties-common
  2. Qualcomm AI Engine Direct SDK and GStreamer components:
if [ ! -f /etc/apt/sources.list.d/ubuntu-qcom-iot-ubuntu-qcom-ppa-noble.list ]; then
    sudo apt-add-repository -y ppa:ubuntu-qcom-iot/qcom-ppa
fi

sudo apt update
sudo apt install -y gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-base \
    gstreamer1.0-plugins-base-apps gstreamer1.0-plugins-qcom-good gstreamer1.0-qcom-sample-apps \
    libqnn1 libsnpe1 libqnn-dev libsnpe-dev
  3. OpenCL GPU drivers:
sudo apt update
sudo apt install -y clinfo qcom-adreno1

if [ ! -f /usr/lib/libOpenCL.so ]; then
    sudo ln -s /lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so
fi

sudo reboot
# After the board comes back up, verify the GPU driver:
clinfo
# Expected:
#   Number of platforms: 1
#   Platform Name: QUALCOMM Snapdragon(TM)
#   Platform Version: OpenCL 3.0 QUALCOMM build: 0808.0.7

Visual References

The screenshots below are accompanied by the steps needed to reproduce them in Edge Impulse Studio.

Data Collection

Data Setup1

Here we can see the devices connected to Edge Impulse Studio; a phone was used to capture the data directly into the studio.

The Data acquisition tab shows the collected data: 256 items, each annotated with a bounding box. The data distribution view gives a quick overview of the dataset and highlights any class imbalance.

The labeling UI is intuitive and easy to use; the screenshot below shows data labeling in action.

Data Setup2

Impulse Design

Design the impulse (signal processing + learning block) that powers detection:

  • Go to Impulse design → Create impulse.
  • Click Add a processing block and choose Image (preprocess + normalize).
  • Click Add a learning block and choose Object Detection (Images).
  • Set image size to 320×320 and click Save impulse.

Impulse

Feature Extraction

  • Navigate to Impulse design → Image.
  • Set Color depth to RGB and click Save parameters.
  • On the next page, click Generate features. This typically takes a few minutes.

RAMusage

Model Training

  • Go to Impulse design → Object Detection.
  • Advanced training settings: No color space augmentation (to preserve colored object cues).
  • Choose the latest YOLO‑Pro model and click Save & train.
  • After training, review the metrics and confusion matrix. In our run we observed ~97% precision on the training set (results vary by dataset).

Testing

Model Deployment

  • Open the Deployment tab.
  • Select Linux (AARCH64 with Qualcomm QNN) to run on RubikPi 3’s Qualcomm AI accelerator.
  • Model optimizations: Quantized (int8), since float32 is not supported for this target.
  • Click Build to compile and download the EIM (Edge Impulse Model) binary.

Deployment

Application

We integrate the Edge Impulse Linux SDK for Python to run inference on webcam frames and feed detections into the TotTalk feedback loop:
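
A minimal sketch of that wiring is shown below. It assumes the SDK is installed (it is published on PyPI as edge_impulse_linux) and that the EIM file built in the Deployment step is on the device; the model path and confidence threshold are placeholders, and the real application lives in class_gallery.py.

# Minimal sketch, not the actual class_gallery.py: run the EIM object detector on
# webcam frames and hand confident detections to the speech-feedback loop.
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

MODEL_PATH = "modelfile.eim"   # placeholder path to the EIM built in the Deployment tab

with ImageImpulseRunner(MODEL_PATH) as runner:
    runner.init()
    cap = cv2.VideoCapture(0)                              # Logitech C270 webcam
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # the SDK expects RGB frames
            features, _cropped = runner.get_features_from_image(rgb)
            result = runner.classify(features)
            for box in result["result"].get("bounding_boxes", []):
                if box["value"] >= 0.6:                    # placeholder confidence threshold
                    print(f"Detected {box['label']} ({box['value']:.2f})")
                    # here the label would be handed to the prompt/listen/feedback loop
    finally:
        cap.release()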

Application Setup

Create an isolated environment and install TotTalk dependencies:

python3 -m venv .venv-totalk-box --system-site-packages
source .venv-totalk-box/bin/activate
pip3 install ai-edge-litert==1.3.0 Pillow
pip3 install opencv-python

Install additional system packages:

sudo apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0
sudo apt install python3-venv python3-full
sudo apt install -y pkg-config cmake libcairo2-dev
sudo apt install libgirepository1.0-dev gir1.2-glib-2.0
sudo apt install build-essential python3-dev python3-pip pkg-config meson
sudo apt install fonts-noto-color-emoji
sudo apt install pulseaudio pulseaudio-utils
sudo apt install espeak-ng   # offline text-to-speech for spoken prompts (see the sketch below)
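
espeak-ng supplies the offline voice prompts. As a small illustration, a prompt could be spoken from Python like this; the voice and speed flags are illustrative choices, not TotTalk's exact settings.

# Hypothetical sketch: speak a prompt offline with espeak-ng through PulseAudio.
import subprocess

def say(text: str) -> None:
    """Synthesize a prompt through the board's speakers (flags are illustrative)."""
    subprocess.run(["espeak-ng", "-v", "en-us", "-s", "140", text], check=True)

say("Can you say elephant?")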

Running the Application

python3 class_gallery.py

The UI launches on the RubikPi 3 display, listens for camera events, and streams inference results into the speech feedback loop.

Troubleshooting

Encountering an issue? Capture logs, hardware info, and reproduction steps, then open an issue in the repository.

FAQ

For common questions, start a discussion or open a question in the repository.

Contributing

Pull requests are welcome! Please fork the project, create a descriptive branch, and submit a PR once your changes are ready.


Project info

Project ID: 794068
License: 3-Clause BSD