Saurin Choksi
Current Project

Tell Me A Story

A local-first system for capturing bedtime stories. Built for my daughter and me to understand our storytelling voices.

“Tell me a story. Tell me a story.
Tell me a storrryyyyyy!”

My daughter, Arti, shouts this at me 1,000 times a day. Then at bedtime, I'm commanded to tell three more before she'll go to sleep. Folks, it's the end of the night. My brain's melted. I love my four-year-old, but telling "one more story!" feels like a type of tiny torture.

Still, when it's done, I'm always glad we share this ritual. And sometimes, I'm impressed by what we create together.

Pappy, the boy who looks like whatever food he wants to eat, usually a quesadilla. Too good, Arti! A perfectly executed Hero's Journey arc about a talking fork. Who knew it could be done?

Those bedtime stories are told and gone. I decided I wanted to keep them.

Why and for what, I'm not sure yet. One side of my brain complains, "Do we really have to digitize everything?" But a little voice insists, "Build the thing. Do it your way. Keep these stories. See what comes next when we get there." Ok...


Immediate Connection

Removing friction brings creative choices you could not have imagined before.

Hard Fun

The best learning happens when you're building something you actually want.

Calm Technology

Technology can live on the periphery: there when you need it, otherwise invisible.

The Arti Test

"I only work on things for kids that I'd give my child." (Me)

Part of why I am excited about this project: I get to learn things I actually want to learn.

When I realized this bedtime story project gave me a reason to explore NVIDIA's CUDA/Jetson stack, I was unreasonably happy. Robot-brain tech! Positronic! That's exciting. And I do want to have a voice in the rooms where these things that will interact with us - our families, our kids - get built.

Once I have transcribed stories, the generative AI applications seem easy and obvious. Extract recurring themes and characters using a sprinkle of local model intelligence? (Seems fine) Generate Nano Banana illustrations of characters in the style of famed Pixar illustrator Sanjay Patel? (um... not cool) Build an ElevenLabs-powered penguin companion stuffy that tells stories with a voice that sounds exactly like Daddy? (OH GOD. WHAT HAVE I DONE)

Where do I go once the system is built? Not sure. I know the point isn't to outsource storytelling, creativity, or imagination to machines. It's to understand our voices as storytellers. And have fun.

I do believe this project will help me clarify my own red lines about AI/technology and kids. I've started talking to child development experts, artists, and writers across my networks in children's media, animation, writing, and edtech to help sharpen my thinking.

Right as I started the project, Arti became interested in the Mahabharata. My absolute favorite stories from when I was a kid, but not the easiest to tell to a four-year-old.

The Mahabharata isn't the Ramayana. It's dense. Murky. The good guys aren't totally good. The bad guys aren't totally bad. How do you tell your child about the part when Krishna cuts off Shishupal's head? Or "soften" when Bhima vows to drink Dushasana's blood? What do I do with the simple fact that the five Pandava brothers share one wife? (Tho maybe this is very Gen Alpha and I am just an old, old man)

The other fun technical wrinkle of Mahabharata stories: we have a preschooler, born in Manhattan, stumbling over Sanskrit names, and we still need to transcribe them properly. How do you do that?

I don't know. But it will be fun figuring it all out.

I'm a Sesame Workshop Writers Room fellow. I wrote for Mo Willems. I earned an Emmy nomination for Daniel Tiger's Neighborhood. Most recently, I built tools for the content team at Kibeam Learning, a company that makes interactive products for kids.

Two Lanes, Zero Overlap

First real-audio run through the full normalization pipeline. Two correction passes — LLM and dictionary — that split the work with no overlap.

Whisper "thoori odin," → LLM "Duryodhan," → Dictionary "Duryodhana," → Human

Pass             Catches                                      Example                       Count
LLM (qwen3:8b)   Phonetic mishearings needing story context   "fondos" → "Pandavas"         18
Dictionary       Known transliteration variants               "Duryodhan" → "Duryodhana"    12

Zero overlap, zero false positives. The dictionary also distinguishes variants from aliases — "Duryodhan" is an incomplete transliteration, correct it; "Partha" is a legitimate alternate name for Arjuna, leave it. Without that split, the pipeline "corrects" names that aren't wrong.
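The variant/alias split could look something like this minimal sketch. The mappings and the `dictionary_pass` helper are illustrative, not the project's actual data or code; the point is that variants get normalized while aliases pass through untouched.

```python
# Incomplete transliterations that should be normalized.
VARIANTS = {
    "Duryodhan": "Duryodhana",
    "Yudhishthir": "Yudhishthira",
}

# Legitimate alternate names: recognized, but never "corrected".
ALIASES = {
    "Partha": "Arjuna",
}

def dictionary_pass(word: str) -> tuple[str, bool]:
    """Return (possibly corrected word, whether a correction was made)."""
    stripped = word.strip(".,!?")
    if stripped in VARIANTS:
        return word.replace(stripped, VARIANTS[stripped]), True
    # Aliases fall through untouched: "Partha" stays "Partha".
    return word, False

dictionary_pass("Duryodhan,")  # corrected, punctuation preserved
dictionary_pass("Partha")      # left alone
```

Keeping the alias table as a separate structure, rather than folding "Partha" → "Arjuna" into the corrections, is what prevents the false positives.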


The Living Transcript

Where do corrections live? A separate file means two sources of truth. Instead, the transcript itself becomes a living document — each pipeline stage writes corrections inline.

{
  "word": "Duryodhana,",
  "_original": "thoori odin,",
  "_corrections": [
    {"stage": "llm", "from": "thoori odin", "to": "Duryodhan"},
    {"stage": "dictionary", "from": "Duryodhan", "to": "Duryodhana"}
  ]
}

One artifact tracks its own history. Extends naturally to human corrections later — same _corrections chain, different _corrected_by.
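A sketch of how a pipeline stage might append to that inline chain. The field names follow the snippet above; the `apply_stage` helper itself is hypothetical.

```python
def apply_stage(entry: dict, stage: str, new_word: str) -> dict:
    """Record a correction inline: preserve the original, append to the chain."""
    old = entry["word"]
    if new_word == old:
        return entry  # no change, no history entry
    entry.setdefault("_original", old)          # set once, by the first stage
    entry.setdefault("_corrections", []).append(
        {"stage": stage, "from": old.strip(",."), "to": new_word.strip(",.")}
    )
    entry["word"] = new_word
    return entry

entry = {"word": "thoori odin,"}
apply_stage(entry, "llm", "Duryodhan,")
apply_stage(entry, "dictionary", "Duryodhana,")
# entry now matches the JSON shape shown above
```

Because each stage only appends, a later human pass is the same operation with a different stage label.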

Full Transcript Wins

Whisper mangles Sanskrit names: "Pandavas" becomes "fondos," "Kauravas" becomes "goros." Needed a correction step for story bible searchability. Tested a local LLM (qwen3:8b) two ways: segment-by-segment and full transcript.

Approach             Fixed   False positives
Segment-by-segment   4/5     3 ("best" → Bhishma, "dad" → Pandu)
Full transcript      5/5     0

What was said    What Whisper heard   LLM correction
Pandavas         fondos               Pandavas ✓
Pandu            fondo                Pandu ✓
Kauravas         goros                Kauravas ✓
Yudhishthira     yudister             Yudhishthira ✓
Dhritarashtra    dhrashtra            Dhritarashtra ✓

Without surrounding context, the model can't distinguish English words from phonetic mishearings. The full conversation is the corrective.
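The full-transcript approach can be sketched as one request covering the whole conversation. The prompt wording and the Ollama HTTP call below are assumptions for illustration, not the project's actual code; what matters is that the model sees every segment at once.

```python
import json
import urllib.request

def build_prompt(transcript: str) -> str:
    """Wrap the full transcript so the model has the surrounding context."""
    return (
        "Below is a bedtime-story transcript containing Mahabharata names "
        "that speech-to-text may have misheard. Correct only misheard "
        "proper names; leave ordinary English words unchanged.\n\n"
        + transcript
    )

def correct_full_transcript(transcript: str, model: str = "qwen3:8b") -> str:
    """One request for the entire transcript, instead of one per segment."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # local Ollama endpoint
        data=json.dumps(
            {"model": model, "prompt": build_prompt(transcript), "stream": False}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Run segment-by-segment, the same prompt sees "dad" with no story around it and "fixes" it; run whole-transcript, the conversation itself disambiguates.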

Segment-Shaped Problems

Per-word hallucination filtering stripped real speech from good segments. "Dad," vanished because its probability fell below the threshold. "Why" disappeared from Arti's question.

Switched to segment-level: if even one word in a segment passes, keep the whole segment intact. Drop only when every word fails.

Unit            "Right." (hallucinated)   "Dad, why do the..." (real)
Word-level      ✅ Removed                 ❌ Stripped "Dad," and "Why"
Segment-level   ✅ Removed                 ✅ Kept intact

Hallucinations are entire segments of garbage. Real speech has a mix of confident and uncertain words. The segment is the right unit.
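The segment-level rule is small enough to show whole. This is a minimal sketch: the field names mirror Whisper's word-level output (`word`, `probability`), but the threshold value and the sample data are assumptions.

```python
THRESHOLD = 0.4  # illustrative cutoff, not the project's tuned value

def filter_segments(segments: list[dict]) -> list[dict]:
    """Keep a segment if ANY word clears the threshold; drop only if all fail."""
    return [
        seg for seg in segments
        if any(w["probability"] >= THRESHOLD for w in seg["words"])
    ]

segments = [
    # Hallucinated filler: every word is low-confidence, so the segment goes.
    {"text": "Right.", "words": [{"word": "Right.", "probability": 0.1}]},
    # Real speech: "Dad," and "why" are uncertain, but one confident word
    # ("do") saves the whole segment intact.
    {"text": "Dad, why do the...", "words": [
        {"word": "Dad,", "probability": 0.2},
        {"word": "why", "probability": 0.3},
        {"word": "do", "probability": 0.9},
    ]},
]
kept = filter_segments(segments)
```

The word-level version was `w["probability"] >= THRESHOLD` applied per word; moving the same test inside `any(...)` over the segment is the entire fix.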

Full log (15 entries) →