# Tell Me A Story
A local-first system for capturing bedtime stories. Built for my daughter and me to understand our storytelling voices.
> “Tell me a story. Tell me a story.
> Tell me a storrryyyyyy!”
## Origin
My daughter, Arti, shouts this sentence at me 1,000 times a day. Then at bedtime, I'm commanded to tell three more before she'll go to sleep. Folks, it's the end of the night. My brain's melted. I love my four-year-old, but telling "one more story!" feels like a type of tiny torture.
Still, when it's done, I'm always glad we share this ritual. And sometimes, I'm impressed by what we create together.
Pappy, the boy who looks like whatever food he wants to eat, usually a quesadilla. Too good, Arti! A perfectly executed Hero's Journey arc about a talking fork. Who knew it could be done?
Those bedtime stories are told and gone. I decided I wanted to keep them.
Why and for what, I'm not sure yet. One side of my brain complains, "Do we really have to digitize everything?" But a little voice insists, "Build the thing. Do it your way. Keep these stories. See what comes next when we get there." Ok...
## The System
### Guiding Principles

**Immediate Connection**
Removing friction brings creative choices you could not have imagined before.

**Hard Fun**
The best learning happens when you're building something you actually want.

**Calm Technology**
Technology can live on the periphery, there when you need it, and otherwise invisible.

**The Arti Test**
I only work on things for kids that I'd give my child.
## Learning
Part of why I am excited about this project: I get to learn things I actually want to learn.
- AI-assisted development (Claude Code)
- Audio ML pipelines (Whisper, pyannote)
- IoT capture devices (ESP32)
- Edge ML deployment (Jetson, CUDA)
- Local-first architecture
When I realized this bedtime story project gave me a reason to explore NVIDIA's CUDA/Jetson stack, I was unreasonably happy. Robot-brain tech! Positronic! That's exciting. And I do want to have a voice in the rooms where these things that will interact with us (our families, our kids) get built.
## AI & Kids
Once I have transcribed stories, the generative AI applications seem easy and obvious. Extract recurring themes and characters using a sprinkle of local model intelligence? (Seems fine.) Generate Nano Banana illustrations of characters in the style of famed Pixar artist Sanjay Patel? (Um... not cool.) Build an ElevenLabs-powered penguin companion stuffy that tells stories in a voice that sounds exactly like Daddy's? (OH GOD. WHAT HAVE I DONE.)
Where do I go once the system is built? Not sure. I know the point isn't to outsource storytelling, creativity, or imagination to machines. It's to understand our voices as storytellers. And have fun.
I do believe this project will help me clarify my own red lines about AI/technology and kids. I've started talking to child development experts, artists, and writers in my children's media, animation, writing, and edTech networks to help sharpen my thinking.
## The Mahabharata
Right as I started the project, Arti became interested in the Mahabharata. My absolute favorite stories from when I was a kid, but not the easiest to tell to a four-year-old.
The Mahabharata isn't the Ramayana. It's dense. Murky. The good guys aren't totally good. The bad guys aren't totally bad. How do you tell your child about the part where Krishna cuts off Shishupal's head? How do you "soften" Bhima vowing to drink Dushasana's blood? What do I do with the simple fact that the five Pandava brothers share one wife? (Tho maybe this is very Gen Alpha and I am just an old, old man.)
The other fun part of Mahabharata stories, on the technical side: a Manhattan-born preschooler stumbling over Sanskrit names, and a pipeline that still needs to transcribe them properly. How do you do that?
I don't know. But it will be fun figuring it all out.
## Where I'm Coming From
I'm a Sesame Workshop Writers Room fellow. I wrote for Mo Willems. I earned an Emmy nomination for Daniel Tiger's Neighborhood. Most recently, I built tools for the content team at Kibeam Learning, a company that makes interactive products for kids.
## Build Log
### Two Lanes, Zero Overlap
First real-audio run through the full normalization pipeline. Two correction passes — LLM and dictionary — that split the work with no overlap.
| Pass | Catches | Example | Count |
|---|---|---|---|
| LLM (qwen3:8b) | Phonetic mishearings needing story context | "fondos" → "Pandavas" | 18 |
| Dictionary | Known transliteration variants | "Duryodhan" → "Duryodhana" | 12 |
Zero overlap, zero false positives. The dictionary also distinguishes variants from aliases — "Duryodhan" is an incomplete transliteration, correct it; "Partha" is a legitimate alternate name for Arjuna, leave it. Without that split, the pipeline "corrects" names that aren't wrong.
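The variant-vs-alias split can be sketched as two lookup tables. This is my own illustration of the idea, not the pipeline's actual code, and the extra entries in each table are invented examples:

```python
# Known transliteration variants: wrong-but-recognizable spellings we correct.
VARIANTS = {
    "Duryodhan": "Duryodhana",   # incomplete transliteration -> full name
}

# Legitimate alternate names (epithets): recognized, but never "corrected".
ALIASES = {
    "Partha": "Arjuna",    # epithet of Arjuna
    "Keshava": "Krishna",  # epithet of Krishna (invented example)
}

def dictionary_pass(word: str) -> str:
    """Correct incomplete transliterations; leave aliases and plain words alone."""
    if word in ALIASES:
        return word  # a real alternate name, not an error
    return VARIANTS.get(word, word)
```

The point of keeping `ALIASES` as its own table: the dictionary still *knows* "Partha" refers to Arjuna (useful for story-bible search) without ever rewriting the word on the page.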
### The Living Transcript
Where do corrections live? A separate file means two sources of truth. Instead, the transcript itself becomes a living document — each pipeline stage writes corrections inline.
```json
{
  "word": "Duryodhana,",
  "_original": "thoori odin,",
  "_corrections": [
    {"stage": "llm", "from": "thoori odin", "to": "Duryodhan"},
    {"stage": "dictionary", "from": "Duryodhan", "to": "Duryodhana"}
  ]
}
```
One artifact tracks its own history. Extends naturally to human corrections later — same `_corrections` chain, different `_corrected_by`.
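Appending to the chain is mechanical. A minimal sketch, assuming word entries are plain dicts with the field names shown above (the helper itself is hypothetical):

```python
def apply_correction(entry: dict, stage: str, corrected: str) -> dict:
    """Record one pipeline stage's correction inline, preserving history."""
    current = entry["word"]
    if corrected == current:
        return entry  # no change, no history entry
    entry.setdefault("_original", current)        # set once, by the first stage
    entry.setdefault("_corrections", []).append(
        {"stage": stage, "from": current, "to": corrected}
    )
    entry["word"] = corrected
    return entry

entry = {"word": "thoori odin,"}
apply_correction(entry, "llm", "Duryodhan,")
apply_correction(entry, "dictionary", "Duryodhana,")
```

Each stage only ever reads and writes `word`; the `_original` and `_corrections` fields accumulate on the side, so the final transcript is still the single source of truth.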
### Full Transcript Wins
Whisper mangles Sanskrit names: "Pandavas" becomes "fondos," "Kauravas" becomes "goros." Needed a correction step for story bible searchability. Tested a local LLM (qwen3:8b) two ways: segment-by-segment and full transcript.
| Approach | Fixed | False Positives |
|---|---|---|
| Segment-by-segment | 4/5 | 3 ("best" → Bhishma, "dad" → Pandu) |
| Full transcript | 5/5 | 0 |
| What was said | What Whisper heard | LLM correction |
|---|---|---|
| Pandavas | fondos | Pandavas ✓ |
| Pandu | fondo | Pandu ✓ |
| Kauravas | goros | Kauravas ✓ |
| Yudhishthira | yudister | Yudhishthira ✓ |
| Dhritarashtra | dhrashtra | Dhritarashtra ✓ |
Without surrounding context, the model can't distinguish English words from phonetic mishearings. The full conversation is the corrective.
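The only real difference between the two approaches is how much text the model sees per call. A sketch of the full-transcript prompt — the wording and the glossary list are my own illustration, not the prompt the pipeline actually uses:

```python
# Names the correction pass should recognize (illustrative subset).
GLOSSARY = ["Pandavas", "Kauravas", "Pandu", "Yudhishthira", "Dhritarashtra"]

def build_prompt(transcript: str) -> str:
    """One prompt over the whole conversation, so the model has story context."""
    return (
        "This is a transcript of a bedtime story about the Mahabharata.\n"
        f"Known names: {', '.join(GLOSSARY)}.\n"
        "Fix only words that are phonetic mishearings of these names; "
        "leave ordinary English words (like 'best' or 'dad') unchanged.\n\n"
        + transcript
    )
```

Called once per recording, the model sees every mention of "fondos" alongside the surrounding story; called per segment, it sees "dad" in isolation and starts pattern-matching it to "Pandu".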
### Segment-Shaped Problems
Per-word hallucination filtering stripped real speech from good segments: "Dad," vanished because its probability fell below the threshold, and "Why" disappeared from Arti's question.
Switched to segment-level filtering: if even one word in a segment passes, keep the whole segment intact. Drop a segment only when every word fails.
| Unit | "Right." (hallucinated) | "Dad, why do the..." (real) |
|---|---|---|
| Word-level | ✅ Removed | ❌ Stripped "Dad," and "Why" |
| Segment-level | ✅ Removed | ✅ Kept intact |
Hallucinations are entire segments of garbage. Real speech has a mix of confident and uncertain words. The segment is the right unit.
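The rule fits in a few lines. A minimal sketch, assuming Whisper-style segments as dicts of words with per-word probabilities; the threshold value here is illustrative, not the pipeline's tuned cutoff:

```python
THRESHOLD = 0.3  # illustrative cutoff, not the pipeline's actual value

def keep_segment(segment: dict, threshold: float = THRESHOLD) -> bool:
    """Keep a segment if ANY word clears the threshold; drop only if all fail."""
    return any(w["probability"] >= threshold for w in segment["words"])

def filter_segments(segments: list[dict]) -> list[dict]:
    return [s for s in segments if keep_segment(s)]
```

A hallucinated "Right." segment fails wholesale and is dropped; a real "Dad, why do the..." segment survives on the strength of its confident words, low-probability "Dad," included.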