TTS Chunk Loudness: Before / After

Each show is synthesized as several independent Gemini TTS calls. Even on one voice, adjacent chunks drift 2–4 LU in loudness, and that imbalance survives to the final MP3 — whole-track normalization fixes the average level but can't fix the chunk-to-chunk balance. This change (PR #469) measures each chunk's BS.1770 integrated loudness and applies one scalar gain per chunk toward the show median, so the seams stop jumping.
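As a rough illustration of what "one scalar gain per chunk" means at the sample level, the sketch below scales a chunk's PCM samples by a single constant. It assumes float samples in [-1, 1]. The helper names (dbToLinear, applyGainDb, samplePeakDbfs) are hypothetical and are not taken from loudness-normalization.ts.

```typescript
// Convert a gain in dB to a linear amplitude factor.
function dbToLinear(gainDb: number): number {
  return Math.pow(10, gainDb / 20);
}

// Return a copy of the chunk's samples, all scaled by the same factor.
function applyGainDb(samples: Float32Array, gainDb: number): Float32Array {
  const factor = dbToLinear(gainDb);
  return samples.map((s) => s * factor);
}

// Sample peak in dBFS, where 0 dBFS corresponds to |x| = 1.0.
function samplePeakDbfs(samples: Float32Array): number {
  const peak = samples.reduce((p, s) => Math.max(p, Math.abs(s)), 0);
  return peak > 0 ? 20 * Math.log10(peak) : -Infinity;
}
```

Because the factor is constant across the chunk, the sample peak moves by the same number of dB as the gain, which is why any boost has to be checked against the available peak headroom.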

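Peak safety is deliberately conservative. A boost is capped so that the chunk's sample peak stays under −3 dBFS. A chunk that already peaks above that ceiling (most of these chunks peak near 0 dBFS) therefore cannot be boosted at all and is left at its original level, shown in the tables as gain 0, “peak-limited”. A chunk with only a little headroom receives just the part of the boost that fits. Cuts are unaffected, since attenuation cannot raise the peak. As a result some quiet chunks stay uncorrected, which is the intended, distortion-free behavior: 10 of the 23 chunks across these shows were peak-limited.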
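The capping rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in loudness-normalization.ts: the type and function names are hypothetical, the defaults mirror the maxGainDb=4 and peakCeilingDbfs=-3 settings quoted in this report, and clamping cuts to the same maxGainDb limit is a guess.

```typescript
interface ChunkMeasurement {
  lufs: number;     // BS.1770 integrated loudness of the chunk
  peakDbfs: number; // sample peak of the chunk
}

interface GainDecision {
  gainDb: number;
  peakLimited: boolean;
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Plan one gain per chunk toward the show median (hypothetical helper).
function planGains(
  chunks: ChunkMeasurement[],
  maxGainDb = 4,
  peakCeilingDbfs = -3,
): GainDecision[] {
  if (chunks.length === 0) return [];
  const target = median(chunks.map((c) => c.lufs));
  return chunks.map((c) => {
    // Assumption: both boosts and cuts are clamped to +/- maxGainDb.
    const wanted = Math.max(-maxGainDb, Math.min(maxGainDb, target - c.lufs));
    // A cut (or no change) cannot raise the peak, so it is never capped.
    if (wanted <= 0) return { gainDb: wanted, peakLimited: false };
    // A boost may use only the headroom below the peak ceiling.
    const headroom = Math.max(0, peakCeilingDbfs - c.peakDbfs);
    const gainDb = Math.min(wanted, headroom);
    return { gainDb, peakLimited: gainDb < wanted };
  });
}
```

Fed the rounded show3 figures from the table further down, this sketch leaves chunk 0 at 0 dB (peak-limited, since it peaks at 0.0 dBFS) and gives chunk 4 roughly +2.6 dB, in line with the table's 0.00 and +2.55. Small differences come from the table's rounding.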

Headline metric: BS.1770 integrated chunk jump

The largest integrated-LUFS gap between adjacent chunks, before vs after the per-chunk gain. This is exactly what the algorithm optimizes, because a scalar gain in dB shifts a chunk's integrated loudness by the same number of dB. It improves on all 5 shows.
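The headline metric is simple enough to state in code. The sketch below assumes the per-chunk integrated LUFS values are already measured and listed in playback order; the function name is hypothetical.

```typescript
// Largest absolute loudness difference between neighbouring chunks, in LU.
function largestAdjacentJump(lufs: number[]): number {
  let worst = 0;
  for (let i = 1; i < lufs.length; i++) {
    worst = Math.max(worst, Math.abs(lufs[i] - lufs[i - 1]));
  }
  return worst;
}
```

Applied to the before and after LUFS columns of the show1 table, it returns approximately 4.4 and 2.1, the figures reported for that show.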

Secondary spot-check: ±10 s rendered seam

Integrated loudness of a 10 s window on each side of one seam in the finished MP3. This is a useful corroboration, but a noisier one: a 10 s slice straddling the gap/crossfade reflects whichever words are there, not the chunk's characteristic level.

How this A/B was built. Both sides run the identical production combine pipeline: aformat → apad → acrossfade → loudnorm(I=-19, TP=-2, LRA=10) → libmp3lame 128k. The only difference is the input: “Before” uses the raw TTS chunks, while “After” uses the same chunks with the real loudness-normalization.ts gains applied (mode=show_median, maxGainDb=4, peakCeilingDbfs=-3). Pitch alignment is deliberately omitted from both sides. It is a separate, orthogonal fix that prod applies equally with or without this change, so leaving it out isolates the loudness change. Excerpts are ~5 min windows centered on each show's largest chunk seam.
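For readers who want to reproduce a comparable render, the filter chain named above could be assembled roughly like this. It is only a sketch: the function names are hypothetical, and the sample rate, channel layout, pad length and crossfade length are placeholders, since the report does not state them. Only the loudnorm settings and the libmp3lame 128k encoder come from the description above.

```typescript
interface CombineOptions {
  sampleRate: number;       // assumed value, not stated in the report
  padSeconds: number;       // assumed silence appended to each chunk
  crossfadeSeconds: number; // assumed crossfade length between chunks
}

// Build an ffmpeg -filter_complex string: aformat -> apad -> acrossfade -> loudnorm.
function buildCombineFilter(chunkCount: number, opts: CombineOptions): string {
  if (chunkCount < 1) throw new Error("need at least one chunk");
  const parts: string[] = [];
  for (let i = 0; i < chunkCount; i++) {
    parts.push(
      `[${i}:a]aformat=sample_rates=${opts.sampleRate}:channel_layouts=mono,` +
        `apad=pad_dur=${opts.padSeconds}[c${i}]`,
    );
  }
  // Chain pairwise crossfades: (c0 + c1) -> x1, (x1 + c2) -> x2, ...
  let last = "c0";
  for (let i = 1; i < chunkCount; i++) {
    parts.push(`[${last}][c${i}]acrossfade=d=${opts.crossfadeSeconds}[x${i}]`);
    last = `x${i}`;
  }
  parts.push(`[${last}]loudnorm=I=-19:TP=-2:LRA=10[out]`);
  return parts.join(";");
}

// Full argument list for an ffmpeg invocation (not executed here).
function buildFfmpegArgs(inputs: string[], output: string, opts: CombineOptions): string[] {
  const args: string[] = [];
  for (const path of inputs) args.push("-i", path);
  args.push(
    "-filter_complex", buildCombineFilter(inputs.length, opts),
    "-map", "[out]",
    "-c:a", "libmp3lame",
    "-b:a", "128k",
    output,
  );
  return args;
}
```

Running the same argument list once over the raw chunks and once over the gain-adjusted chunks keeps everything downstream of the input identical, which is the property the A/B relies on.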

✓ BS.1770 integrated chunk balance improved on 5/5 shows
±10 s seam spot-check improved on 4/5 (show3 = window artifact; its integrated gap still closed)

Show	Title	Type	Speakers	Chunks	Largest integrated chunk jump (before → after)	±10 s seam spot-check (before → after)
show1	Unlocking B2B Growth: Why 80% of Social Leads Start on LinkedIn	PODCAST	2 (multi)	6	4.4 → 2.1 LU	5.6 → 3.8 LU ✓
show2	Brutally Honest Business Blunders: Founders’ Post-Mortem Sessions	SHOW	2 (multi)	4	4.3 → 1.3 LU	5.3 → 3.0 LU ✓
show3	Hacker News Daily: The Stories, Debates, and Outrage Shaping Tech	SHOW	1	5	3.0 → 1.2 LU	1.1 → 1.7 LU window artifact
show4	July 2nd Through Time: Independence, Intrigue, and Unforgettable Birthdays	SHOW	1	4	2.4 → 1.2 LU	1.2 → 0.1 LU ✓
show5	Tech & AI Twitter Pulse: Real-World Deployments and Game-Changing Conversations	SHOW	1	4	2.2 → 0.6 LU	1.3 → 0.2 LU ✓

show1 — Unlocking B2B Growth: Why 80% of Social Leads Start on LinkedIn

PODCAST · 2 speakers (multi) · 6 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview

Chunk-to-chunk loudness jump, BS.1770 integrated: 4.4 LU → 2.1 LU
Largest adjacent-chunk integrated-loudness gap. This is the metric the per-chunk gain targets; smaller is better.

Secondary spot-check, ±10 s window either side of the chunk 3→4 seam on the rendered MP3: 5.6 → 3.8 LU ✓ corroborates
A 10 s slice straddling the gap captures whatever words are there, not the chunk's characteristic level, so it is noisier than the integrated figure above.

chunk #	before LUFS	gain dB	after LUFS	peak dBFS	note
0	-22.7	0.00	-22.7	0.0	peak-limited (uncorrected)
1	-19.9	-0.84	-20.7	0.0
2	-21.6	0.00	-21.6	-0.9	peak-limited (uncorrected)
3	-22.8	0.00	-22.8	-0.5	peak-limited (uncorrected)
4	-18.4	-2.35	-20.7	0.0
5	-19.9	-0.88	-20.7	-0.6

Audio excerpts: Before (current) · After (this change)

show2 — Brutally Honest Business Blunders: Founders’ Post-Mortem Sessions

SHOW · 2 speakers (multi) · 4 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview

Chunk-to-chunk loudness jump, BS.1770 integrated: 4.3 LU → 1.3 LU
Largest adjacent-chunk integrated-loudness gap. This is the metric the per-chunk gain targets; smaller is better.

Secondary spot-check, ±10 s window either side of the chunk 2→3 seam on the rendered MP3: 5.3 → 3.0 LU ✓ corroborates
A 10 s slice straddling the gap captures whatever words are there, not the chunk's characteristic level, so it is noisier than the integrated figure above.

chunk #	before LUFS	gain dB	after LUFS	peak dBFS	note
0	-24.2	+0.28	-24.0	-3.3	peak-limited (partially corrected)
1	-22.8	-0.73	-23.5	-2.3
2	-20.6	-2.92	-23.5	0.0
3	-24.9	0.00	-24.9	-0.4	peak-limited (uncorrected)

Audio excerpts: Before (current) · After (this change)

show3 — Hacker News Daily: The Stories, Debates, and Outrage Shaping Tech

SHOW · 1 speaker · 5 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview

Chunk-to-chunk loudness jump, BS.1770 integrated: 3.0 LU → 1.2 LU
Largest adjacent-chunk integrated-loudness gap. This is the metric the per-chunk gain targets; smaller is better.

Secondary spot-check, ±10 s window either side of the chunk 3→4 seam on the rendered MP3: 1.1 → 1.7 LU (window artifact, see note)
A 10 s slice straddling the gap captures whatever words are there, not the chunk's characteristic level, so it is noisier than the integrated figure above.

Why this show's spot-check ticks up (and why the fix still worked). At the seam with the largest pre-fix gap (chunk 3→4), the BS.1770 integrated gap, which is what the algorithm corrects, closes from 3.0 LU to ~0.0 LU: chunk 4 is a short, quiet outro (-23.6 LUFS integrated, the quietest chunk) and correctly gets +2.55 dB toward the median. The show's largest remaining gap, 1.2 LU, sits at the 0→1 seam, where chunk 0 is peak-limited and stays at -22.2 LUFS. The ±10 s spot-window at the 3→4 boundary ticks up 1.1 → 1.7 LU only because the boosted chunk's first ~10 s happen to be louder than that chunk's own average. That is a measurement-window artifact of a 10 s slice, not a regression in the integrated balance. Both spot-check values are small (< 2 LU). Shown, not hidden.

chunk #	before LUFS	gain dB	after LUFS	peak dBFS	note
0	-22.2	0.00	-22.2	0.0	peak-limited (uncorrected)
1	-21.0	0.00	-21.0	-0.8
2	-20.7	-0.35	-21.0	-0.0
3	-20.6	-0.43	-21.0	-0.1
4	-23.6	+2.55	-21.0	-5.7
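The show3 figures can be checked seam by seam with a few lines of arithmetic. The sketch below takes the before LUFS and gain dB columns from the table above and relies on the fact that a gain in dB shifts integrated loudness by the same amount; the function and variable names are illustrative only.

```typescript
// Absolute loudness gap at each seam: index 0 is seam 0→1, index 1 is seam 1→2, ...
function seamGaps(lufs: number[]): number[] {
  return lufs.slice(1).map((value, i) => Math.abs(value - lufs[i]));
}

// Values copied from the show3 table.
const beforeLufs = [-22.2, -21.0, -20.7, -20.6, -23.6];
const gainsDb = [0.0, 0.0, -0.35, -0.43, 2.55];

// Adding the gain in dB gives the expected post-gain integrated loudness.
const afterLufs = beforeLufs.map((lufs, i) => lufs + gainsDb[i]);

const gapsBefore = seamGaps(beforeLufs);
const gapsAfter = seamGaps(afterLufs);
```

Here the 3→4 seam drops from about 3.0 LU to a few hundredths of an LU, while the largest remaining gap, about 1.2 LU, is at the 0→1 seam next to the peak-limited chunk 0.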

Audio excerpts: Before (current) · After (this change)

show4 — July 2nd Through Time: Independence, Intrigue, and Unforgettable Birthdays

SHOW · 1 speaker · 4 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview

Chunk-to-chunk loudness jump, BS.1770 integrated: 2.4 LU → 1.2 LU
Largest adjacent-chunk integrated-loudness gap. This is the metric the per-chunk gain targets; smaller is better.

Secondary spot-check, ±10 s window either side of the chunk 0→1 seam on the rendered MP3: 1.2 → 0.1 LU ✓ corroborates
A 10 s slice straddling the gap captures whatever words are there, not the chunk's characteristic level, so it is noisier than the integrated figure above.

chunk #	before LUFS	gain dB	after LUFS	peak dBFS	note
0	-18.8	-1.17	-20.0	0.0
1	-21.2	0.00	-21.2	0.0	peak-limited (uncorrected)
2	-19.0	-0.98	-20.0	-0.0
3	-21.0	0.00	-21.0	-0.0	peak-limited (uncorrected)

Audio excerpts: Before (current) · After (this change)

show5 — Tech & AI Twitter Pulse: Real-World Deployments and Game-Changing Conversations

SHOW · 1 speaker · 4 TTS chunks · 300 s excerpt · gemini-3.1-flash-tts-preview

Chunk-to-chunk loudness jump, BS.1770 integrated: 2.2 LU → 0.6 LU
Largest adjacent-chunk integrated-loudness gap. This is the metric the per-chunk gain targets; smaller is better.

Secondary spot-check, ±10 s window either side of the chunk 2→3 seam on the rendered MP3: 1.3 → 0.2 LU ✓ corroborates
A 10 s slice straddling the gap captures whatever words are there, not the chunk's characteristic level, so it is noisier than the integrated figure above.

chunk #	before LUFS	gain dB	after LUFS	note
0	-17.4	-0.27	-17.7
1	-18.0	0.00	-18.0	peak-limited (uncorrected)
2	-18.3	0.00	-18.3	peak-limited (uncorrected)
3	-16.1	-1.63	-17.7

Audio excerpts: Before (current) · After (this change)