Recipe #03

Voice on Paper

Hide a voice inside a picture. Print it on paper. Mail it across the city. Let someone hear it.

20 minutes · Intermediate

What You'll Create

A spectrogram cipher turns audio into an image. Not metaphorically. Literally. Record a voice message, run it through SpectroGhost, and what comes out is a PNG you can print, mail, tape to a park bench, or spray-paint on a wall. Anyone with ReSounder can photograph it with their phone and hear what it says.

Here's what makes this powerful for ARG design: the spectrogram is meaningless to anyone who doesn't know what they're looking at. It's noise. Static. Abstract decoration. Until it isn't. You can hide audio in plain sight — framed on a wall, tucked inside an envelope, embedded in a printed flyer — and most people walk right past it. Only players who know to look, and know what tool to use, will hear the message.

This is largely unexplored territory. The mechanics are simple, the tools are free, and it's hard to overstate the moment a player hears a voice come out of a piece of paper. Build it into a treasure hunt, an escape room, a city-wide ARG, or a single unforgettable clue. The medium is the puzzle.

What You'll Learn

  • How to record audio and convert it into a printable spectrogram
  • The key settings that control audio quality and reconstruction clarity
  • How to verify your cipher decodes cleanly before distributing it
  • How to prepare spectrograms for physical (printed) delivery
  • How to design the breadcrumb trail that guides players to the decode

See It In Action

Watch a voice message get recorded, turned into a spectrogram, exported as a PNG, and decoded back into audio — all in the browser, in real time.

What You'll Need

SpectroGhost

Free browser tool for recording audio and generating optimized spectrogram PNGs. No install, no login.

ReSounder

Free browser tool for decoding spectrograms back into audio. This is the key your players need.

Your Message

5 to 10 seconds of clear, deliberate speech. Short messages reconstruct cleanly. Long ones get muddy. Write it out before you record.

A Delivery Plan

Digital PNG? Printed page? Wax-sealed envelope? Telephone pole sticker? Decide how players encounter the cipher — the medium shapes the mystery.

Step by Step

01

Craft Your Message

Before you open SpectroGhost, decide what you want to say. Clarity is everything here. ReSounder uses the Griffin-Lim algorithm to reconstruct audio from your spectrogram — and that algorithm has to estimate phase information that was lost when the sound became an image. It's good. It's not magic. Clear, deliberate speech gives it the best shot.

Speak slowly. Enunciate. Write out your message first and read it aloud a few times before recording. Five to ten seconds is the sweet spot — short enough to reconstruct cleanly, long enough to carry a real message. Think about the moment your player hears it. What do you want them to feel? Write the message backward from that reaction.
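To make the phase-guessing idea concrete, here is a toy Griffin-Lim loop in plain Python. It is a sketch of the general algorithm, not ReSounder's code: the frame size of 32, the Hann window, the naive DFT, and every function name are assumptions chosen so it runs quickly without any libraries. It throws away the phase of a test tone, starts from random phase, and then alternates between "turn it into audio" and "re-analyse that audio, keep its phase, restore the known loudness."

```python
import cmath
import math
import random

N_FFT = 32        # toy frame size so the pure-Python DFT stays fast (the recipe uses 2048)
HOP = N_FFT // 4  # 75% overlap between neighbouring frames
# Periodic Hann window, applied on both analysis and synthesis.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / N_FFT) for n in range(N_FFT)]
# Precomputed DFT and inverse-DFT matrices.
FORWARD = [[cmath.exp(-2j * math.pi * k * n / N_FFT) for n in range(N_FFT)] for k in range(N_FFT)]
INVERSE = [[cmath.exp(2j * math.pi * k * n / N_FFT) for k in range(N_FFT)] for n in range(N_FFT)]


def stft(signal):
    """Return one complex spectrum per overlapping, windowed frame."""
    frames = []
    for start in range(0, len(signal) - N_FFT + 1, HOP):
        chunk = [signal[start + n] * WINDOW[n] for n in range(N_FFT)]
        frames.append([sum(c * t for c, t in zip(chunk, row)) for row in FORWARD])
    return frames


def istft(frames, length):
    """Least-squares inverse: windowed overlap-add divided by the summed squared window."""
    out = [0.0] * length
    norm = [0.0] * length
    for index, spectrum in enumerate(frames):
        start = index * HOP
        for n in range(N_FFT):
            sample = sum(s * t for s, t in zip(spectrum, INVERSE[n])).real / N_FFT
            out[start + n] += sample * WINDOW[n]
            norm[start + n] += WINDOW[n] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def with_phase_of(magnitudes, spectra):
    """Keep the known magnitudes, borrow only the phase of each estimated bin."""
    return [[m * (c / abs(c)) if abs(c) > 1e-12 else complex(m) for m, c in zip(mf, cf)]
            for mf, cf in zip(magnitudes, spectra)]


def griffin_lim(magnitudes, length, iterations=30, seed=0):
    rng = random.Random(seed)
    # Start from random phase: the image stored loudness only.
    spectra = [[m * cmath.exp(2j * math.pi * rng.random()) for m in frame] for frame in magnitudes]
    for _ in range(iterations):
        estimate = istft(spectra, length)                    # back to audio
        spectra = with_phase_of(magnitudes, stft(estimate))  # re-analyse, keep its phase
    return istft(spectra, length)


def spectral_error(signal, magnitudes):
    """Relative mismatch between a signal's spectrogram and the target one."""
    est = stft(signal)
    num = sum((abs(c) - m) ** 2 for ef, mf in zip(est, magnitudes) for c, m in zip(ef, mf))
    den = sum(m ** 2 for mf in magnitudes for m in mf)
    return math.sqrt(num / den)


LENGTH = 256
# Stand-in for a voice: two steady tones (purely illustrative).
voice = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.11 * n)
         for n in range(LENGTH)]
target = [[abs(c) for c in frame] for frame in stft(voice)]  # what the "image" keeps

error_random_phase = spectral_error(griffin_lim(target, LENGTH, iterations=0), target)
error_after = spectral_error(griffin_lim(target, LENGTH, iterations=30), target)
print(f"random phase: {error_random_phase:.3f}  after 30 iterations: {error_after:.3f}")
```

The printed mismatch should shrink as the loop runs, but it rarely reaches zero. That leftover error is the "good, not magic" part, and it is why clean, simple input helps.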

Pro Tip: The spectrogram cipher works best as a single decisive moment — a coordinate, a name, a command. Resist the urge to encode a paragraph. Make every second count.
02

Create the Spectrogram

Open SpectroGhost and hit Record. Say your message. SpectroGhost draws the spectrogram in real time, so you can watch it paint itself as you speak. When you stop, there are three settings to dial in:

  • FFT Size (default: 2048): Sets the tradeoff between frequency and time resolution. Larger values resolve pitch more finely but blur timing; smaller values do the reverse. To the ear, higher values can sound auto-tuned and lower values robotic. For speech, 2048 is almost always right.
  • Maximum Frequency (5000 Hz for voice): Most of what makes speech intelligible sits below 5 kHz. Going lower makes voices muddy. Going higher wastes image space.
  • Overlap (default: 75%): More overlap, smoother reconstruction. The default works. Adjust only if you hear artifacts.
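If you want a feel for what those three settings do to the picture, the arithmetic is short. The sketch below assumes a 48 kHz sample rate, a linear frequency axis, and no padding at the edges of the recording; SpectroGhost's exact framing may differ, and `spectrogram_shape` is a hypothetical helper, not part of either tool.

```python
def spectrogram_shape(duration_s, sample_rate=48_000, fft_size=2048,
                      overlap=0.75, max_freq_hz=5000):
    """Estimate the pixel grid a recording produces under the stated assumptions."""
    hop = round(fft_size * (1 - overlap))        # samples between frame starts
    bin_width_hz = sample_rate / fft_size        # frequency covered by one row
    rows = int(max_freq_hz // bin_width_hz) + 1  # bins from 0 Hz up to the cutoff
    total_samples = int(duration_s * sample_rate)
    columns = 1 + max(0, (total_samples - fft_size) // hop)  # one column per frame
    return {
        "hop_samples": hop,
        "bin_width_hz": bin_width_hz,
        "frame_ms": 1000 * fft_size / sample_rate,
        "rows": rows,
        "columns": columns,
    }


shape = spectrogram_shape(duration_s=10)
print(shape)
```

With the defaults, a 10-second message works out to 214 rows by 934 columns, with each row about 23.4 Hz tall and each frame about 43 ms wide. Doubling FFT Size roughly doubles the rows and halves the columns, which is the resolution tradeoff in pixel form.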

When your settings look right, click Optimize. This recalculates image dimensions so every pixel maps one-to-one to actual frequency data — no wasted space, no interpolation. It also embeds metadata into the PNG so ReSounder auto-configures when players open the file. Then click Export PNG.
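Curious how settings can ride along inside an image file? PNG allows small text chunks (`tEXt`) next to the pixel data. The sketch below writes a tiny grayscale PNG with such chunks and reads them back, using only the standard library. The key names (`fft_size`, `max_freq_hz`, and so on) are made up for illustration; the source does not say which keys or chunk type SpectroGhost actually uses.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind, data):
    """Length + type + data + CRC, the layout every PNG chunk shares."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def write_gray_png(rows, metadata):
    """Build an 8-bit grayscale PNG with one tEXt chunk per metadata entry."""
    height, width = len(rows), len(rows[0])
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)  # bit depth 8, color type 0
    raw = b"".join(b"\x00" + bytes(row) for row in rows)          # filter byte 0 per scanline
    out = PNG_SIGNATURE + png_chunk(b"IHDR", ihdr)
    for key, value in metadata.items():
        text = key.encode("latin-1") + b"\x00" + str(value).encode("latin-1")
        out += png_chunk(b"tEXt", text)
    return out + png_chunk(b"IDAT", zlib.compress(raw)) + png_chunk(b"IEND", b"")


def read_text_chunks(png_bytes):
    """Walk the chunk list and collect every tEXt key/value pair."""
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        kind = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


# Hypothetical settings, named only for this example.
settings = {"sample_rate": 48000, "fft_size": 2048, "max_freq_hz": 5000, "overlap": 0.75}
demo_png = write_gray_png([[0, 128, 255], [255, 128, 0]], settings)
print(read_text_chunks(demo_png))
```

Note what this implies for Step 4 and beyond: the text chunks live in the file, not in the pixels, so a print, a photo, or a screenshot carries none of them.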

Pro Tip: Leave the Colormap on Grayscale. It only needs black ink and survives scanning and photography far better than any color map.
03

Verify the Decode

Before you distribute anything, confirm the spectrogram actually decodes. This takes thirty seconds and saves you from shipping a broken cipher.

In SpectroGhost, press Ctrl+Shift+C to copy the spectrogram to your clipboard. Switch to ReSounder and press Ctrl+V to paste it directly. ReSounder reads the embedded metadata and auto-fills every setting. Turn your speakers down first — you don't know how loud it will come out. Then click Invert Spectrogram.

  • Message is clear: Move on to Step 4.
  • Sounds muddy or robotic: Go back and adjust FFT Size or Overlap, re-record, try again.
  • Wrong pitch or speed: Check that Sample Rate and Max Frequency match between tools.
Bonus: The clipboard shortcut also lets you decode any spectrogram you find online — screenshot it, paste into ReSounder, and hear what it says.
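On the wrong-pitch case in the checklist above: if the decoder believes the top of the image sits at a different frequency than you encoded, every tone is stretched by the same ratio. The sketch below assumes a linear frequency axis, and `pitch_shift_semitones` is a hypothetical helper, not a ReSounder function.

```python
import math


def pitch_shift_semitones(true_max_hz, assumed_max_hz):
    """Pitch error, in semitones, when the decoder assumes the wrong Max Frequency."""
    return 12 * math.log2(assumed_max_hz / true_max_hz)


# Encoded with a 5000 Hz ceiling, decoded as if the ceiling were 10000 Hz.
print(pitch_shift_semitones(5000, 10000))
# Encoded with a 5000 Hz ceiling, decoded as if the ceiling were 4000 Hz.
print(round(pitch_shift_semitones(5000, 4000), 2))
```

Doubling the assumed ceiling lifts the voice a full octave; undershooting it drops the voice by a few semitones. Either way, matching the setting brings it back.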
04

Prepare for Physical Delivery

Sending the PNG digitally? You're already done — embedded metadata handles everything. Players open the file, click once, hear the message.

Physical delivery is where this gets interesting. Printed spectrograms work, but they need a little preparation before they leave your hands:

  • Add a border — Open the spectrogram in any image editor (PowerPoint works fine) and add a thin rectangular border. When players photograph or scan the print, they need a clean edge to crop to. A slightly tilted photo distorts the reconstruction.
  • Print at high resolution — Use your printer's highest quality setting. More detail in the print means better reconstruction on the other end.
  • Test the full loop — Print it. Photograph it with your phone. Open the photo in ReSounder, use the Image Adjustments panel to crop to the border, adjust brightness if needed, then decode. If you can't get clear audio, your players won't either.
Note: ReSounder's Image Adjustments panel has crop, rotation, keystone correction, and auto-levels built in — designed specifically for working with photographed spectrograms.
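For a sense of what an auto-levels pass does to a washed-out phone photo, here is a minimal sketch. It is one common approach (a percentile stretch), not ReSounder's actual implementation, and the percentile defaults are assumptions.

```python
def auto_levels(pixels, low_pct=1.0, high_pct=99.0):
    """Stretch grayscale values so the chosen percentiles land on 0 and 255."""
    flat = sorted(value for row in pixels for value in row)
    low = flat[int((len(flat) - 1) * low_pct / 100)]
    high = flat[int((len(flat) - 1) * high_pct / 100)]
    if high <= low:
        return [row[:] for row in pixels]  # flat image: nothing to stretch
    scale = 255 / (high - low)
    return [[max(0, min(255, round((value - low) * scale))) for value in row]
            for row in pixels]


# A photographed print rarely reaches true black or true white.
photo = [[50, 100], [150, 200]]
print(auto_levels(photo, low_pct=0, high_pct=100))
```

Since pixel brightness stands in for loudness, a photo whose grays are squeezed into the middle of the range decodes as flat, hissy audio. Stretching the range back out restores the contrast between speech and silence.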
05

Design the Player Experience

This is where ARG craft comes in. When players decode an original exported PNG, metadata handles everything automatically. But if your spectrogram has been printed, photographed off a wall, or screenshotted, the metadata is gone — and players will need four things to configure ReSounder manually:

  • Sample Rate: 48 kHz (ReSounder's default — design around it)
  • FFT Size: 2048 (match what you used in SpectroGhost)
  • Max Frequency: 5000 Hz (or whatever you set)
  • Duration: Roughly how long the audio is

How you communicate these is part of the puzzle design. Add them to the border from Step 4 before you print. Encode them in a separate clue. Tell players to use default settings and design your spectrogram around those. Or let ReSounder's built-in troubleshooting guide carry the determined solvers.
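Why those four numbers? Without metadata, a cropped photo is just a grid of pixels, and the decoder needs them to turn rows into frequencies and columns into time. The sketch below shows that mapping under simple assumptions: a linear frequency axis with the top row at Max Frequency, and columns spread evenly over the duration. `DecodeSettings` and the helper functions are hypothetical names for illustration, not ReSounder's internals.

```python
from dataclasses import dataclass


@dataclass
class DecodeSettings:
    sample_rate: int = 48_000
    fft_size: int = 2048
    max_freq_hz: float = 5000.0
    duration_s: float = 8.0


def row_to_hz(row, height_px, s):
    """Top row is max_freq_hz, bottom row is 0 Hz (assumed linear axis)."""
    return s.max_freq_hz * (height_px - 1 - row) / (height_px - 1)


def hz_to_fft_bin(freq_hz, s):
    """Which FFT bin a frequency lands in at this sample rate and FFT size."""
    return round(freq_hz * s.fft_size / s.sample_rate)


def column_to_seconds(col, width_px, s):
    """Where a column falls in time, assuming columns are evenly spaced."""
    return s.duration_s * col / width_px


def hop_in_samples(width_px, s):
    """Implied spacing between frames, in audio samples."""
    return s.duration_s * s.sample_rate / width_px


settings = DecodeSettings()
print(row_to_hz(0, 214, settings), hz_to_fft_bin(5000.0, settings))
print(column_to_seconds(375, 750, settings), hop_in_samples(750, settings))
```

Get any one of the four wrong and the grid lands in the wrong place: a wrong Max Frequency shifts pitch, a wrong Duration changes speed, and a wrong Sample Rate or FFT Size puts the rows in the wrong bins.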

The bigger design question: how do players know to use ReSounder at all? Don't make finding the tool the hard part. Give them a breadcrumb — a QR code, a note that says "this image speaks," a URL in the corner. Let the challenge be the content of the message. The moment they hear a voice come out of printed paper is the payoff. Don't bury that under a tool-discovery puzzle.

Remember: The medium is part of the message. A spectrogram inside a wax-sealed envelope hits differently than one stapled to a telephone pole. Choose your delivery with intention.

Imagine the Possibilities

Anywhere you can put an image, you can now put something people can listen to.

The Treasure Hunt Trail

Each physical location has a printed spectrogram. Players photograph it, hear a voice with coordinates or a riddle, and are directed to the next stop. The cipher is the mechanic. The voice is the payoff.

The Escape Room Voice

A framed piece of abstract art hangs on the wall. Players eventually realize it looks like a spectrogram. They photograph it, load it into ReSounder, and hear someone whisper the combination. The art was always speaking.

The Field Drop

Spectrogram stickers appear at real locations across a city. Each encodes a fragment of a message. Players who know to look find them, decode them, and piece together the full transmission. The city becomes the puzzle board.

Mix It Up

Scale the complexity to match your players, your timeline, and your ARG's tone.

Beginner

Digital-Only Delivery

Send players the exported PNG directly. Embedded metadata auto-configures ReSounder — they open the file, click once, hear the message. No print prep, no parameter hints, no physical logistics. Great for testing the mechanic before committing to a full deployment.

Intermediate

Printed with a Breadcrumb

Print the spectrogram and include a clear pointer to ReSounder — a QR code, a URL, or a note explaining what to do. The physical object creates weight and mystery. The breadcrumb removes friction. Players feel clever for decoding it, not frustrated by the tool hunt.

Advanced

Environmental — No Instructions

Distribute printed spectrograms in real locations with no explanation. Players must identify what they're looking at, find the tool, figure out the settings, and decode. Use this only if tool discovery is an intentional puzzle layer — and only for players who love that kind of challenge.

Ready to Hide a Voice?

The tools are free. The medium is wide open. Record something worth hearing, hide it somewhere worth finding, and give your players a moment they won't forget.