TotTalk Box
TotTalk helps toddlers learn to speak by pairing object recognition with real-time pronunciation coaching — everything runs offline on affordable hardware.
DEMO
Table of Contents
- Overview
- Core Workflow
- Highlights
- Edge Impulse Development Workflow
- Dataset Summary
- Hardware Setup
- Visual References
- Device Preparation
- Software Installation
- Application Setup
- Running the Application
- Troubleshooting
- FAQ
- Contributing
- Additional Resources
Overview
TotTalk uses the world around a toddler to introduce vocabulary, reinforce pronunciation, and keep learning fun. When a child presents an everyday object, the box recognizes it, introduces the word, listens to the child's attempt, and provides gentle feedback.
Everything runs locally on a Qualcomm RubikPi 3 so families keep complete control over their data. Vision, audio, UI, and speech feedback all execute on-device without any cloud dependency.
Core Workflow
- The child shows any toy or household item to the camera.
- TotTalk identifies the object, announces the word, and prompts the child to repeat it.
- Whisper-based speech recognition monitors pronunciation, offers retries when needed, and celebrates correct repetitions.
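The loop below is a minimal Python sketch of this workflow. detect_object(), speak(), and transcribe_attempt() are hypothetical placeholders for the repository's actual vision, text-to-speech, and Whisper pipelines, not functions from the codebase.
# Hypothetical sketch of the TotTalk interaction loop; the three helpers
# passed in stand in for the real vision, TTS, and Whisper.cpp pipelines.
import time

MAX_ATTEMPTS = 3  # assumed retry budget per word

def coach_one_word(detect_object, speak, transcribe_attempt):
    label = detect_object()                   # e.g. "elephant" from the camera
    if label is None:
        return                                # nothing is held up to the camera yet
    speak(f"That is a {label}! Can you say {label}?")
    for _ in range(MAX_ATTEMPTS):
        heard = transcribe_attempt()          # the child's attempt, transcribed on-device
        if heard and label.lower() in heard.lower():
            speak(f"Great job! {label}!")     # celebrate a correct repetition
            return
        speak(f"Almost! Let's try again: {label}")
        time.sleep(0.5)                       # brief pause before the next prompt
    speak(f"Good try! We'll practice {label} again later.")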
Highlights
- Real-world vocabulary: Builds a personalized dictionary around objects the child already loves.
- Active engagement: Encourages movement and call-and-response instead of passive screen time.
- Robust toddler speech handling: Whisper.cpp pipeline tolerates silence, mumbles, and multiple attempts within edge constraints (see the transcription sketch after this list).
- Low-latency inference: Computer vision, audio processing, and UI run together on the RubikPi 3.
- Completely offline: No cloud services—vision, transcription, and feedback logic all execute locally for privacy.
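As a rough illustration of the speech side, the sketch below shells out to a whisper.cpp command-line build to transcribe one recorded attempt. The binary name and model path are assumptions; adjust them to match your own whisper.cpp installation.
# Hedged sketch: transcribe one recorded attempt with a whisper.cpp CLI build.
# The binary and model paths are assumptions, not the project's real layout.
import subprocess

def transcribe_attempt(wav_path="attempt.wav",
                       binary="./whisper.cpp/build/bin/whisper-cli",
                       model="./whisper.cpp/models/ggml-base.en.bin"):
    result = subprocess.run(
        [binary, "-m", model, "-f", wav_path, "--no-timestamps"],
        capture_output=True, text=True, check=False)
    return result.stdout.strip()  # empty when nothing intelligible was heard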
Edge Impulse Development Workflow
We rely on Edge Impulse to capture data, design the computer vision impulse, and deploy an optimized model to the RubikPi 3.
1. Prepare the Project & Dataset
- Sign in to Edge Impulse Studio and create a project dedicated to TotTalk object recognition.
- Connect the device with edge-impulse-linux to stream camera frames directly into the Data acquisition tab.
- Label each sample so that every object class remains balanced; the Studio supports bounding-box and image labeling workflows.
- Maintain an 85% / 15% train-test split to keep hold-out data for unbiased evaluation.
Reference: Edge Impulse data acquisition and labeling tools.
2. Design the Impulse
- In the Impulse design view, add an Image processing block to resize incoming frames (e.g., 96×96 RGB) and enable automatic normalization.
- Add a Transfer Learning learning block; MobileNetV2/ResNet-based backbones usually offer a strong accuracy-to-latency balance on the RubikPi 3.
- Configure data augmentation (random flip, crop, color shift) so the model generalizes to varied backgrounds and lighting.
Reference: Edge Impulse impulse design.
3. Train & Evaluate the Model
- Launch training with appropriate hyperparameters (learning-rate scheduler, 20–50 epochs, early stopping).
- Use the Model testing tab to validate accuracy on the held-out dataset and inspect the per-class confusion matrix.
- Iterate on data balance, augmentation, or impulse configuration when misclassifications appear.
Reference: Edge Impulse model training and model testing.
4. Deploy & Integrate
- Generate a Linux .eim package or TensorFlow Lite file from the Deployment tab.
- Install the Edge Impulse Linux SDK for Python to run inference alongside the TotTalk UI.
- Stream inference results over HTTP or pass them directly into the TotTalk feedback loop for speech prompts.
Reference: Edge Impulse Linux deployment and Linux SDK for Python.
Dataset Summary
We currently track 256 labeled images across 20 object classes; 85% of the samples are used for training and 15% are held out for testing.
| Label | Emoji |
|---|---|
| jcb | 🚜 |
| monkey | 🐒 |
| bat | 🦇 |
| bicycle | 🚲 |
| candle | 🕯️ |
| car | 🚗 |
| chair | 🪑 |
| comb | 💇 |
| elephant | 🐘 |
| glass | 🥛 |
| green ball | 🟢 |
| guitar | 🎸 |
| helmet | ⛑️ |
| kiwi | 🥝 |
| minion | 🤖 |
| octopus | 🐙 |
| slippers | 🥿 |
| stool | 🪑 |
| tiger | 🐯 |
| tortoise | 🐢 |
Hardware Setup
Bill of Materials
- RubikPi 3 (Qualcomm QCS6490 SoC)
- 7″ LCD display (1024×600)
- Two 5 W speakers
- Logitech C270 HD webcam (640×480 with integrated microphone)
Device Preparation
Flash Canonical Ubuntu 24.04 using Qualcomm Launcher
Qualcomm® Launcher (see Thundercomm documentation) streamlines flashing Canonical Ubuntu 24.04 Server onto the RubikPi 3. Follow the official walkthrough to install Renesas USB firmware and replace the stock image.
- Complete instructions: https://www.thundercomm.com/rubik-pi-3/en/docs/rubik-pi-3-user-manual/1.0.0-u/Update-Software/3.2.Flash-using-Qualcomm-Launcher
Upgrade to the Latest Canonical Ubuntu Build
Update the board to the most recent certified packages:
sudo apt upgrade -y
git clone -b ubuntu_setup --single-branch https://github.com/rubikpi-ai/rubikpi-script.git
cd rubikpi-script
./install_ppa_pkgs.sh
The helper script installs:
gstreamer1.0-plugins-base-apps, gstreamer1.0-qcom-python-examples, gstreamer1.0-qcom-sample-apps,
gstreamer1.0-tools, libqnn-dev, libsnpe-dev, qcom-adreno1, qcom-fastcv-binaries-dev,
qcom-libdmabufheap-dev, qcom-sensors-test-apps, qcom-video-firmware, qnn-tools, snpe-tools,
tensorflow-lite-qcom-apps, weston-autostart, xwayland, Rubikpi3 camera packages, wiringrp, wiringrp_python,
and tooling such as ffmpeg, net-tools, pulseaudio-utils, python3-pip, selinux-utils, unzip, v4l-utils.
Validate the Platform
cat /etc/os-release
uname -a
Expected output confirms Ubuntu 24.04.2 LTS and the Qualcomm-specific kernel (Linux ubuntu 6.8.0-1055-qcom ...).
Install the Edge Impulse CLI
wget https://cdn.edgeimpulse.com/firmware/linux/setup-edge-impulse-qc-linux.sh
sh setup-edge-impulse-qc-linux.sh
edge-impulse-linux
Follow the browser link presented in the terminal to authenticate the device with Edge Impulse Studio.
Software Installation
Install Drivers, AI Engine Direct, and the IM-SDK
- Base tooling:
sudo apt update
sudo apt install -y unzip wget curl python3 python3-pip python3-venv software-properties-common
- Qualcomm AI Engine Direct SDK and GStreamer components:
if [ ! -f /etc/apt/sources.list.d/ubuntu-qcom-iot-ubuntu-qcom-ppa-noble.list ]; then
sudo apt-add-repository -y ppa:ubuntu-qcom-iot/qcom-ppa
fi
sudo apt update
sudo apt install -y gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-base \
gstreamer1.0-plugins-base-apps gstreamer1.0-plugins-qcom-good gstreamer1.0-qcom-sample-apps \
libqnn1 libsnpe1 libqnn-dev libsnpe-dev
- OpenCL GPU drivers:
sudo apt update
sudo apt install -y clinfo qcom-adreno1
if [ ! -f /usr/lib/libOpenCL.so ]; then
sudo ln -s /lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so
fi
sudo reboot
clinfo
# Expected:
# Number of platforms: 1
# Platform Name: QUALCOMM Snapdragon(TM)
# Platform Version: OpenCL 3.0 QUALCOMM build: 0808.0.7
Visual References
Each screenshot below is described along with the steps to reproduce it in Edge Impulse Studio.
Data Collection
The Devices page lists the hardware connected to Edge Impulse Studio; a phone was used to capture data directly into the Studio.
The Data acquisition tab shows the 256 collected items, each annotated with a bounding box. The data distribution view gives a quick overview of the dataset and highlights any class imbalance.
The labeling queue shows data labeling in action; the UI is intuitive and quick to work with.
Impulse Design
Design the impulse (signal processing + learning block) that powers detection:
- Go to Impulse design → Create impulse.
- Click Add a processing block and choose Image (preprocess + normalize).
- Click Add a learning block and choose Object Detection (Images).
- Set image size to 320×320 and click Save impulse.
Feature Extraction
- Navigate to Impulse design → Image.
- Set Color depth to RGB and click Save parameters.
- On the next page, click Generate features. This typically takes a few minutes.
Model Training
- Go to Impulse design → Object Detection.
- Advanced training settings: No color space augmentation (to preserve colored object cues).
- Choose the latest YOLO‑Pro model and click Save & train.
- After training, review the metrics and confusion matrix. In our run we observed ~97% precision on the training set (results vary by dataset).
Model Deployment
- Open the Deployment tab.
- Select Linux (AARCH64 with Qualcomm QNN) to run on RubikPi 3’s Qualcomm AI accelerator.
- Model optimizations: Quantized (int8), since float32 is not supported for this target.
- Click Build to compile and download the EIM (Edge Impulse Model) binary.
Application
We integrate the Edge Impulse Linux SDK for Python to run inference on webcam frames and feed detections into the TotTalk feedback loop:
- Install the SDK following: https://docs.edgeimpulse.com/tools/libraries/sdks/inference/linux/python
- Integrate detections with the UI/audio pipeline to drive prompts and feedback in real time.
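The snippet below is a minimal sketch following the SDK's documented image example. The model filename, confidence threshold, and handle_detection() hand-off are placeholders for the actual TotTalk integration.
# Sketch of reading detections from the downloaded .eim model with the
# Edge Impulse Linux SDK for Python; names marked as assumed are placeholders.
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

MODEL_PATH = "modelfile.eim"        # the .eim binary built in the Deployment tab

def run_camera_loop(handle_detection):
    cap = cv2.VideoCapture(0)       # Logitech C270, assumed to be camera index 0
    with ImageImpulseRunner(MODEL_PATH) as runner:
        runner.init()
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                features, _ = runner.get_features_from_image(img)
                res = runner.classify(features)
                for bb in res["result"].get("bounding_boxes", []):
                    if bb["value"] > 0.6:              # assumed confidence threshold
                        handle_detection(bb["label"])  # feed the word into the feedback loop
        finally:
            cap.release()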
Application Setup
Create an isolated environment and install TotTalk dependencies:
python3 -m venv .venv-totalk-box --system-site-packages
source .venv-totalk-box/bin/activate
pip3 install ai-edge-litert==1.3.0 Pillow
pip3 install opencv-python
Install additional system packages:
sudo apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0
sudo apt install python3-venv python3-full
sudo apt install -y pkg-config cmake libcairo2-dev
sudo apt install libgirepository1.0-dev gir1.2-glib-2.0
sudo apt install build-essential python3-dev python3-pip pkg-config meson
sudo apt install fonts-noto-color-emoji
sudo apt install pulseaudio pulseaudio-utils
sudo apt install espeak-ng
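With espeak-ng and PulseAudio installed, spoken prompts can be generated directly from Python. Below is a minimal sketch; the voice and speed values are illustrative defaults, not TotTalk's actual settings.
# Minimal sketch of speaking a prompt through espeak-ng from Python;
# voice and speed are illustrative, not the values used by TotTalk.
import subprocess

def speak(text, voice="en", words_per_minute=140):
    subprocess.run(["espeak-ng", "-v", voice, "-s", str(words_per_minute), text],
                   check=False)

speak("Can you say elephant?")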
Running the Application
python3 class_gallery.py
The UI launches on the RubikPi 3 display, listens for camera events, and streams inference results into the speech feedback loop.
Troubleshooting
Encountering an issue? Capture logs, hardware info, and reproduction steps, then open an issue in the repository.
FAQ
For common questions, start a discussion or open a question in the repository.
Contributing
Pull requests are welcome! Please fork the project, create a descriptive branch, and submit a PR once your changes are ready.
Additional Resources
- Edge Impulse data workflow: https://docs.edgeimpulse.com/docs/edge-impulse-studio/data-acquisition
- Edge Impulse impulse design: https://docs.edgeimpulse.com/docs/edge-impulse-studio/impulse-design
- Edge Impulse model training: https://docs.edgeimpulse.com/docs/edge-impulse-studio/model-training
- Edge Impulse Linux SDK (Python): https://docs.edgeimpulse.com/tools/libraries/sdks/inference/linux/python
- Thundercomm RubikPi 3 flashing guide: https://www.thundercomm.com/rubik-pi-3/en/docs/rubik-pi-3-user-manual/1.0.0-u/Update-Software/3.2.Flash-using-Qualcomm-Launcher
- Qualcomm AI Engine Direct setup notes: https://qc-ai-test.gitbook.io/qc-ai-test-docs/device-setup/rubik-pi3