TTS Chunk Loudness: Before / After

Each show is synthesized as several independent Gemini TTS calls. Even on one voice, adjacent chunks drift 2–4 LU in loudness, and that imbalance survives to the final MP3 — whole-track normalization fixes the average level but can't fix the chunk-to-chunk balance. This change (PR #469) measures each chunk's BS.1770 integrated loudness and applies one scalar gain per chunk toward the show median, so the seams stop jumping.
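As a rough illustration of the "one scalar gain per chunk toward the show median" step, the sketch below takes already-measured chunk loudness values and returns the dB offset each chunk would need to reach the median. The names ChunkLoudness, medianOf and gainsTowardMedian are hypothetical, the even-length median rule is an assumption, and the real loudness-normalization.ts may differ. The BS.1770 measurement itself is assumed to happen elsewhere.

```typescript
// Hypothetical shape; the real module may track more per chunk.
interface ChunkLoudness {
  index: number;
  integratedLufs: number; // BS.1770 integrated loudness, measured elsewhere
}

// Median of a non-empty list (assumption: mean of the two middle values
// when the length is even).
function medianOf(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// One scalar gain per chunk: the dB offset that would move the chunk's
// integrated loudness onto the show median. No limits are applied here.
function gainsTowardMedian(chunks: ChunkLoudness[]): number[] {
  const target = medianOf(chunks.map((c) => c.integratedLufs));
  return chunks.map((c) => target - c.integratedLufs);
}

// show3's "before" values from the table further down.
const show3Before: ChunkLoudness[] = [-22.2, -21.0, -20.7, -20.6, -23.6].map(
  (lufs, index) => ({ index, integratedLufs: lufs })
);
const show3RawGains = gainsTowardMedian(show3Before);
```

For show3 the median is -21.0 LUFS, so the quiet outro chunk asks for roughly +2.6 dB before any limiting, close to the +2.55 dB listed in its table (which presumably comes from unrounded measurements).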

Peak safety is deliberately conservative. A boost is capped so a chunk's sample peak stays under −3 dBFS; a chunk that already peaks near 0 dBFS therefore can't be boosted without exceeding that ceiling and is left at its original level (shown in the tables as gain 0, “peak-limited”). That leaves some quiet chunks uncorrected, which is the intended, distortion-free behavior. 10 of the 23 chunks across these shows were peak-limited (nine left at gain 0, one given a partial +0.28 dB boost).
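The peak rule above can be sketched as a small clamp. This is a minimal illustration, not the PR's code: peakSafeGainDb is a hypothetical name, and the choice to let attenuation pass through untouched is an assumption inferred from the tables, where chunks peaking at 0.0 dBFS still receive negative gains.

```typescript
// Limit a desired gain so that a boost never pushes the chunk's sample
// peak above the ceiling. Hypothetical helper for illustration only.
function peakSafeGainDb(
  desiredGainDb: number,
  samplePeakDbfs: number,
  peakCeilingDbfs: number = -3
): number {
  // Attenuation lowers the peak, so the ceiling cannot be violated by it.
  if (desiredGainDb <= 0) return desiredGainDb;
  // Headroom left before the boosted peak would reach the ceiling.
  const headroomDb = peakCeilingDbfs - samplePeakDbfs;
  // No headroom means no boost: the chunk stays at its original level.
  return Math.max(0, Math.min(desiredGainDb, headroomDb));
}

const quietOutro = peakSafeGainDb(2.55, -5.7); // 2.55: fits within 2.7 dB of headroom
const hotChunk = peakSafeGainDb(2.0, 0.0); // 0: already above the ceiling
const loudChunk = peakSafeGainDb(-0.84, 0.0); // -0.84: attenuation is unaffected
```

A chunk with a little headroom gets a partial boost: with a peak of -3.3 dBFS only about 0.3 dB is available, whatever the desired gain.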

Headline metric: BS.1770 integrated chunk jump. The largest integrated-LUFS gap between adjacent chunks, before vs after the per-chunk gain. This is exactly what the algorithm optimizes (a scalar dB gain shifts a chunk's integrated loudness by that many dB). It improves on all 5 shows.
Secondary spot-check: ±10 s rendered seam. The gap between the integrated loudness of the 10 s windows on either side of one seam in the finished MP3. A useful corroboration, but noisier: a 10 s slice straddling the gap/crossfade reflects whichever words are there, not the chunk's characteristic level.
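The headline metric is simple enough to state in code. The sketch below is an illustration with a hypothetical function name (maxAdjacentJumpLu); it takes per-chunk integrated loudness in playback order and returns the largest neighbor-to-neighbor gap.

```typescript
// Largest absolute loudness difference between neighbouring chunks, in LU.
// Returns 0 for fewer than two chunks.
function maxAdjacentJumpLu(chunkLufs: number[]): number {
  let worst = 0;
  for (let i = 1; i < chunkLufs.length; i++) {
    worst = Math.max(worst, Math.abs(chunkLufs[i] - chunkLufs[i - 1]));
  }
  return worst;
}

// show1's per-chunk values from its table, before and after the gains.
const show1Before = [-22.7, -19.9, -21.6, -22.8, -18.4, -19.9];
const show1After = [-22.7, -20.7, -21.6, -22.8, -20.7, -20.7];

console.log(maxAdjacentJumpLu(show1Before).toFixed(1)); // "4.4"
console.log(maxAdjacentJumpLu(show1After).toFixed(1)); // "2.1"
```

Both results match the 4.4 → 2.1 LU figure reported for show1, with the worst gap at the chunk 3→4 seam in each case.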
How this A/B was built. Both sides run the identical production combine pipeline (aformat → apad → acrossfade → loudnorm(I=-19, TP=-2, LRA=10) → libmp3lame 128k); the only difference is the input. “Before” uses the raw TTS chunks; “After” uses the same chunks with the real loudness-normalization.ts gains applied (mode=show_median, maxGainDb=4, peakCeilingDbfs=-3). Pitch alignment is deliberately omitted from both sides: it is a separate, orthogonal fix applied equally in prod, so leaving it out isolates the loudness change. Excerpts are ~5 min windows centered on each show's largest chunk seam.
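For readers reproducing the comparison, the stages whose settings are stated above could be expressed as ffmpeg arguments roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the aformat, apad and acrossfade parameters are not given in this write-up and are therefore left out, and applying the per-chunk gain through ffmpeg's volume filter is one possible approach rather than a description of what loudness-normalization.ts does.

```typescript
// Whole-track normalization settings quoted in the write-up.
interface LoudnormSettings {
  integratedLufs: number; // loudnorm I
  truePeakDbtp: number; // loudnorm TP
  loudnessRangeLu: number; // loudnorm LRA
}

const AB_LOUDNORM: LoudnormSettings = {
  integratedLufs: -19,
  truePeakDbtp: -2,
  loudnessRangeLu: 10,
};

// Renders the loudnorm stage as an ffmpeg filter string.
function loudnormFilter(s: LoudnormSettings): string {
  return `loudnorm=I=${s.integratedLufs}:TP=${s.truePeakDbtp}:LRA=${s.loudnessRangeLu}`;
}

// Encoder arguments for the final MP3.
function mp3EncoderArgs(bitrate: string = "128k"): string[] {
  return ["-c:a", "libmp3lame", "-b:a", bitrate];
}

// Assumption: one way to apply a per-chunk scalar gain before combining.
function chunkGainFilter(gainDb: number): string {
  return `volume=${gainDb.toFixed(2)}dB`;
}

console.log(loudnormFilter(AB_LOUDNORM)); // "loudnorm=I=-19:TP=-2:LRA=10"
console.log(chunkGainFilter(-0.84)); // "volume=-0.84dB"
```

Keeping the settings in one object makes it easy to confirm that the "Before" and "After" renders differ only in their input chunks.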
✓ BS.1770 integrated chunk balance improved on 5/5 shows
±10 s seam spot-check improved on 4/5 (show3 = window artifact; its integrated gap still closed)
# | Title | Type | Spk | Chunks | Integrated chunk jump (before → after) | ±10 s seam spot-check (before → after)
show1 | Unlocking B2B Growth: Why 80% of Social Leads Start on LinkedIn | PODCAST | 2 (multi) | 6 | 4.4 → 2.1 LU | 5.6 → 3.8 LU
show2 | Brutally Honest Business Blunders: Founders’ Post-Mortem Sessions | SHOW | 2 (multi) | 4 | 4.3 → 1.3 LU | 5.3 → 3.0 LU
show3 | Hacker News Daily: The Stories, Debates, and Outrage Shaping Tech | SHOW | 1 | 5 | 3.0 → 1.2 LU | 1.1 → 1.7 LU (window artifact)
show4 | July 2nd Through Time: Independence, Intrigue, and Unforgettable Birthdays | SHOW | 1 | 4 | 2.4 → 1.2 LU | 1.2 → 0.1 LU
show5 | Tech & AI Twitter Pulse: Real-World Deployments and Game-Changing Conversations | SHOW | 1 | 4 | 2.2 → 0.6 LU | 1.3 → 0.2 LU

show1 — Unlocking B2B Growth: Why 80% of Social Leads Start on LinkedIn

PODCAST · 2 speakers (multi) · 6 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview
Chunk-to-chunk loudness jump (BS.1770 integrated): 4.4 LU → 2.1 LU. Largest adjacent-chunk integrated-loudness gap. This is the metric the per-chunk gain targets; smaller is better.
Secondary spot-check (±10 s window either side of the chunk 3→4 seam on the rendered MP3): 5.6 → 3.8 LU ✓ corroborates. A 10 s slice straddling the gap captures whatever words are there, not the chunk's characteristic level, so it is noisier than the integrated figure above.
chunk # | before LUFS | gain dB | after LUFS | peak dBFS | note
0 | -22.7 | 0.00 | -22.7 | 0.0 | peak-limited (uncorrected)
1 | -19.9 | -0.84 | -20.7 | 0.0 |
2 | -21.6 | 0.00 | -21.6 | -0.9 | peak-limited (uncorrected)
3 | -22.8 | 0.00 | -22.8 | -0.5 | peak-limited (uncorrected)
4 | -18.4 | -2.35 | -20.7 | 0.0 |
5 | -19.9 | -0.88 | -20.7 | -0.6 |
[A/B audio: Before (current) | After (this change)]

show2 — Brutally Honest Business Blunders: Founders’ Post-Mortem Sessions

SHOW · 2 speakers (multi) · 4 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview
Chunk-to-chunk loudness jump (BS.1770 integrated): 4.3 LU → 1.3 LU
Secondary spot-check (±10 s window either side of the chunk 2→3 seam on the rendered MP3): 5.3 → 3.0 LU ✓ corroborates
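chunk # | before LUFS | gain dB | after LUFS | peak dBFS | note
0 | -24.2 | +0.28 | -24.0 | -3.3 | peak-limited (boost capped by the −3 dBFS ceiling, partially corrected)
1 | -22.8 | -0.73 | -23.5 | -2.3 |
2 | -20.6 | -2.92 | -23.5 | 0.0 |
3 | -24.9 | 0.00 | -24.9 | -0.4 | peak-limited (uncorrected)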
[A/B audio: Before (current) | After (this change)]

show3 — Hacker News Daily: The Stories, Debates, and Outrage Shaping Tech

SHOW · 1 speaker · 5 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview
Chunk-to-chunk loudness jump (BS.1770 integrated): 3.0 LU → 1.2 LU
Secondary spot-check (±10 s window either side of the chunk 3→4 seam on the rendered MP3): 1.1 → 1.7 LU (window artifact, see note)
Why this show's spot-check ticks up (and why the fix still worked). At the worst seam (chunk 3→4) the BS.1770 integrated gap — the thing the algorithm fixes — closes from 3.0 LU to ~0.0 LU: chunk 4 is a short, quiet outro (-23.6 LUFS integrated, the quietest chunk) and correctly gets +2.55 dB toward the median. The ±10 s spot-window at that boundary ticks up 1.1 → 1.7 LU only because the boosted chunk's first ~10 s happen to be louder than that chunk's own average — a measurement-window artifact of a 10 s slice, not a regression in the integrated balance. Both spot-check values are small (< 2 LU). Shown, not hidden.
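The peak side of this example can be checked numerically. The sketch below is illustrative only, with hypothetical helper names and a toy three-sample signal: it applies a scalar gain to samples and re-measures the sample peak, using chunk 4's figures from the table (peak -5.7 dBFS, gain +2.55 dB) and assuming full scale is an amplitude of 1.0.

```typescript
// Convert between decibels and linear amplitude.
function dbToLinear(db: number): number {
  return Math.pow(10, db / 20);
}
function linearToDb(amplitude: number): number {
  return (20 * Math.log(amplitude)) / Math.LN10;
}

// Multiply every sample by one constant factor (a scalar gain).
function applyGainDb(samples: Float32Array, gainDb: number): Float32Array {
  const factor = dbToLinear(gainDb);
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) out[i] = samples[i] * factor;
  return out;
}

// Sample peak in dBFS, where full scale is an amplitude of 1.0.
function samplePeakDbfs(samples: Float32Array): number {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  return linearToDb(peak);
}

// Toy signal whose loudest sample sits at -5.7 dBFS.
const toyChunk = new Float32Array([0.1, -dbToLinear(-5.7), 0.25]);
const boostedPeak = samplePeakDbfs(applyGainDb(toyChunk, 2.55));
console.log(boostedPeak.toFixed(2)); // "-3.15"
```

The boosted peak lands at about -3.15 dBFS, just under the -3 dBFS ceiling, which is consistent with this chunk receiving its boost in full.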
chunk # | before LUFS | gain dB | after LUFS | peak dBFS | note
0 | -22.2 | 0.00 | -22.2 | 0.0 | peak-limited (uncorrected)
1 | -21.0 | 0.00 | -21.0 | -0.8 |
2 | -20.7 | -0.35 | -21.0 | -0.0 |
3 | -20.6 | -0.43 | -21.0 | -0.1 |
4 | -23.6 | +2.55 | -21.0 | -5.7 |
[A/B audio: Before (current) | After (this change)]

show4 — July 2nd Through Time: Independence, Intrigue, and Unforgettable Birthdays

SHOW · 1 speaker · 4 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview
Chunk-to-chunk loudness jump (BS.1770 integrated): 2.4 LU → 1.2 LU
Secondary spot-check (±10 s window either side of the chunk 0→1 seam on the rendered MP3): 1.2 → 0.1 LU ✓ corroborates
chunk # | before LUFS | gain dB | after LUFS | peak dBFS | note
0 | -18.8 | -1.17 | -20.0 | 0.0 |
1 | -21.2 | 0.00 | -21.2 | 0.0 | peak-limited (uncorrected)
2 | -19.0 | -0.98 | -20.0 | -0.0 |
3 | -21.0 | 0.00 | -21.0 | -0.0 | peak-limited (uncorrected)
[A/B audio: Before (current) | After (this change)]

show5 — Tech & AI Twitter Pulse: Real-World Deployments and Game-Changing Conversations

SHOW · 1 speaker · 4 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview
Chunk-to-chunk loudness jump (BS.1770 integrated): 2.2 LU → 0.6 LU
Secondary spot-check (±10 s window either side of the chunk 2→3 seam on the rendered MP3): 1.3 → 0.2 LU ✓ corroborates
chunk # | before LUFS | gain dB | after LUFS | peak dBFS | note
0 | -17.4 | -0.27 | -17.7 | 0.0 |
1 | -18.0 | 0.00 | -18.0 | 0.0 | peak-limited (uncorrected)
2 | -18.3 | 0.00 | -18.3 | 0.0 | peak-limited (uncorrected)
3 | -16.1 | -1.63 | -17.7 | 0.0 |
[A/B audio: Before (current) | After (this change)]