ReSounder


Tips & Troubleshooting

Best tip for future troubleshooting: experiment until you gain an intuition about common issues.

Speech sounds distorted or "robotic"
Symptom: Vowels are smeared, consonants feel metallic or synthetic.
Likely cause: Frequency axis scale does not match original spectrogram.
Fix: Try switching Frequency Scale (e.g., between Linear and Log). Make sure Min/Max Frequency match the frequency range covered by the original signal.
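
To see why a scale mismatch distorts speech so badly, here is a minimal sketch of how an image row might be mapped to a frequency under a linear and a log scale. The function name `row_to_hz`, the convention that row 0 is the top of the image, and the exact formulas are assumptions for illustration, not ReSounder's actual code.

```python
def row_to_hz(row, height, f_min, f_max, scale="linear"):
    """Map an image row (0 = top row) to a frequency in Hz (illustrative only)."""
    # Fraction of the way up the image: 0.0 at the bottom row, 1.0 at the top row.
    t = 1.0 - row / (height - 1)
    if scale == "linear":
        # Equal pixel steps are equal steps in Hz.
        return f_min + t * (f_max - f_min)
    if scale == "log":
        # Equal pixel steps are equal frequency ratios; requires f_min > 0.
        return f_min * (f_max / f_min) ** t
    raise ValueError(f"unknown scale: {scale}")


mid_row = 256  # middle row of a 513-pixel-tall image
print(row_to_hz(mid_row, 513, 20, 20000, "linear"))
print(row_to_hz(mid_row, 513, 20, 20000, "log"))
```

With these example bounds, the same middle row lands near 10 kHz on a linear axis but near 630 Hz on a log axis, so decoding with the wrong scale moves every formant to the wrong place.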

Audio is the wrong pitch (too high or too low)
Symptom: Everything sounds like chipmunks or deep giants.
Likely cause: Frequency bounds do not match the image's real spectral range. The image height is stretched across the Min/Max Frequency range, so wrong bounds shift the pitch of every component.
Fix: Increase Max Frequency if the audio sounds too deep. Decrease Min Frequency if the audio sounds too high.
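
As a rough illustration of the size of the effect, the sketch below assumes a linear frequency axis and computes what pitch is heard when the decoder's bounds differ from the bounds used at export. The helper names `decoded_hz` and `semitone_shift` are hypothetical.

```python
import math


def decoded_hz(true_hz, true_min, true_max, assumed_min, assumed_max):
    """Frequency heard when a linear-scale image is decoded with other bounds."""
    # Relative height of the component in the image (0 = bottom, 1 = top).
    t = (true_hz - true_min) / (true_max - true_min)
    return assumed_min + t * (assumed_max - assumed_min)


def semitone_shift(true_hz, heard_hz):
    """Pitch error in semitones (negative = too deep)."""
    return 12 * math.log2(heard_hz / true_hz)


# Exported as 0-8000 Hz, but decoded with Max Frequency set to 4000 Hz.
heard = decoded_hz(440, 0, 8000, 0, 4000)
print(heard, semitone_shift(440, heard))
```

In this example a 440 Hz tone comes out at about 220 Hz, one octave too deep, which is why raising Max Frequency fixes audio that sounds too low.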

Audio timing seems stretched or compressed
Symptom: Speech rate or tempo sounds wrong.
Likely cause: Horizontal scale mismatch. Time axis scaling changes the actual duration mapping.
Fix: Adjust Assumed Duration (sec) to match the duration of the original export. Re-render the preview with the new value before reconstruction.
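
The relationship between image width and duration can be sketched as follows. This assumes each pixel column covers an equal slice of time; the function names and example numbers are illustrative and not taken from the tool.

```python
def seconds_per_column(width_px, duration_s):
    """Time covered by one pixel column, assuming uniform spacing."""
    return duration_s / width_px


def playback_speed(original_duration_s, assumed_duration_s):
    """Speed factor of the reconstruction; > 1 means faster than the original."""
    return original_duration_s / assumed_duration_s


# A 1200-pixel-wide image exported from 12 s of audio, decoded as 8 s.
print(seconds_per_column(1200, 8))
print(playback_speed(12, 8))
```

Here the reconstruction would play 1.5 times too fast; setting the assumed duration back to 12 s restores the original tempo.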

Audio is just loud impulsive noise
Symptom: Energy bursts appear instead of smooth tonal content.
Likely cause: FFT size too small for the image's time resolution. If the time window is too short, harmonic structure collapses into broadband transients.
Fix: Increase FFT Size to improve frequency resolution. Reduce Assumed Duration (sec) if needed to balance performance. For speech sampled at 44.1-48 kHz, FFT sizes of around 2048 or 4096 usually work best.

Audio sounds "sing-songy"
Symptom: Sustained tones rise and fall unnaturally, creating a melodic lilt that was not present in the original audio.
Likely cause: The FFT size is too large, causing excessive smoothing over time and smearing rapid changes in pitch and articulation.
Fix: Reduce FFT Size to improve time resolution, preserving more natural speech and transient detail. For speech sampled at 44.1-48 kHz, FFT sizes of around 2048 or 4096 usually work best.
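
The trade-off behind this tip and the "loud impulsive noise" tip above comes from standard STFT arithmetic: a larger FFT gives narrower frequency bins but a longer analysis window. A small sketch (function names are illustrative):

```python
def bin_width_hz(sample_rate, fft_size):
    """Spacing between adjacent FFT bins in Hz."""
    return sample_rate / fft_size


def window_ms(sample_rate, fft_size):
    """Length of one analysis window in milliseconds."""
    return 1000.0 * fft_size / sample_rate


for n in (256, 2048, 4096, 16384):
    print(n, round(bin_width_hz(44100, n), 1), "Hz/bin,",
          round(window_ms(44100, n), 1), "ms window")
```

At 44.1 kHz, an FFT size of 2048 gives bins about 21.5 Hz wide and a window of about 46 ms, which is fine enough to separate speech harmonics without blurring syllables together.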

Audio feels muffled or missing detail
Symptom: High- or low-frequency content is dull or absent, or the spectrogram lacks detail.
Likely cause: The maximum frequency is set too low, the minimum frequency is set too high, the spectrogram image is too short, the FFT size is too small, or Noise Floor (dB) is too low. Nonlinear scales also compress or stretch the regions near the top and bottom of the spectrogram.
Fix if you didn't create the spectrogram: Check whether the frequency scale was exported as Log but is being decoded as Linear. Increase Noise Floor (dB).
Fix if you created the spectrogram: Raise the maximum frequency, lower the minimum frequency, increase FFT size, or increase image height. Use Linear scale instead of Log, Mel, or Bark to maximize detail in the spectrogram.

Audio sounds like white noise
Symptom: Audio sounds like white/broadband noise though the waveform appears correct.
Likely cause: The colormap is being interpreted in the wrong orientation.
Fix: Toggle Invert Colors to correct the intensity mapping.
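
A toy example of why a flipped intensity mapping sounds like noise: in a typical spectrogram most pixels are dark background, so inverting them turns the background into full-scale energy. The 8-bit grayscale representation and the `invert` helper are assumptions for illustration.

```python
def invert(pixels):
    """Flip 8-bit grayscale intensities (0 <-> 255)."""
    return [255 - p for p in pixels]


# One image column: nine dark background pixels and a single bright partial.
column = [0] * 9 + [255]

bright_before = sum(1 for p in column if p > 127)
bright_after = sum(1 for p in invert(column) if p > 127)
print(bright_before, bright_after)
```

Before inversion one bin out of ten carries energy; after inversion nine do, which is heard as broadband noise.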

Audio is too quiet or fades into silence
Symptom: Everything is faint though the waveform appears correct.
Likely cause: Too wide a Dynamic Range or incorrect intensity normalization. Very low pixel values correspond to extreme attenuation in dB space.
Fix: Reduce Dynamic Range (dB). Increase Noise Floor (dB). Ensure color inversion (if applied) matches original export. Increase Pre-gain or Post-gain.
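
To illustrate how dynamic range affects loudness, the sketch below uses one common pixel-to-amplitude mapping: the brightest pixel is 0 dB and the darkest is minus the dynamic range. This mapping and the name `pixel_to_amplitude` are assumptions; ReSounder's internal mapping may differ.

```python
def pixel_to_amplitude(value, dynamic_range_db=80.0):
    """Convert a normalized intensity in [0, 1] to a linear amplitude.

    Assumed mapping: value 1.0 -> 0 dB, value 0.0 -> -dynamic_range_db.
    """
    db = -dynamic_range_db * (1.0 - value)
    return 10.0 ** (db / 20.0)


# The same mid-gray pixel under three dynamic range settings.
for dr in (40.0, 80.0, 120.0):
    print(dr, pixel_to_amplitude(0.5, dr))
```

Under this assumption a mid-gray pixel is 20 dB down with a 40 dB range but 60 dB down with a 120 dB range, so a range that is too wide pushes most of the image toward silence.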

Audio has a high-pitched hissing floor
Symptom: Constant noise underlying all playback.
Likely cause: Noise pixels mapped to non-zero magnitude during dB-to-linear conversion. Small values near the noise floor get amplified in reconstruction.
Fix: Decrease Noise Floor (dB). Increase Dynamic Range (dB) slightly.
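
A back-of-the-envelope sketch of why faint background pixels add up to audible hiss: if the noise in each frequency bin is uncorrelated (an assumption), the powers of all bins add. The function name `summed_noise_db` is hypothetical.

```python
import math


def summed_noise_db(bin_db, n_bins):
    """Total level of n_bins uncorrelated bins that each sit at bin_db."""
    # Uncorrelated components add in power, not in amplitude.
    power = n_bins * 10.0 ** (bin_db / 10.0)
    return 10.0 * math.log10(power)


# 1024 background bins, each 60 dB or 80 dB below full scale.
print(round(summed_noise_db(-60.0, 1024), 1))
print(round(summed_noise_db(-80.0, 1024), 1))
```

With 1024 bins, a background that looks negligible at -60 dB per bin sums to roughly -30 dB overall, while pushing it down to -80 dB per bin lowers the total to roughly -50 dB.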

Audio has "underwater" / "phasing" artifacts
Symptom: Warbly, chorus-like sound.
Likely cause: Phase propagation struggles with abrupt changes.
Fix: Ensure FFT Size and Assumed Duration (sec) are close to the original values. Unfortunately, this is the main artifact of phase reconstruction and can't be entirely removed.
