Similar audio files

Finding audio files that sound similar is a useful task, especially when the files are encoded in different formats or at different bit rates.



In this post, we're going to implement a program that tries to find similar audio files by comparing their waveforms.

# Algorithm

The algorithm for comparing two waveforms is simple. In fact, the entire program is simple, because the audio problem is translated into a graphics problem: each waveform is rendered as an image, and the images are compared.

The steps of the algorithm are as follows:
  • convert any audio file to .wav
  • generate the waveform of the WAV file
  • analyze the waveform and create a fingerprint value
  • compare the fingerprints and generate the final report

First, the audio file is converted to WAV using sox, and the result is piped to wav2png, which writes the waveform image of the WAV data to standard output. That output is captured and analyzed with GD, producing a fingerprint of the audio file.
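The piping step can be sketched in a few lines. This is only an illustration in Python, not the program from this post: `run_pipeline` and `sox_to_wav_command` are hypothetical helper names, and the wav2png arguments are deliberately left to the caller, since the exact invocation is not shown here. The only tool-specific part is `sox <input> -t wav -`, which asks sox to write WAV data to standard output.

```python
import subprocess
import sys


def run_pipeline(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` and return the second command's stdout as bytes."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so `first` notices if `second` exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    if first.returncode != 0 or second.returncode != 0:
        raise RuntimeError("pipeline failed")
    return output


def sox_to_wav_command(path):
    """Build a sox command that converts `path` to WAV on standard output."""
    return ["sox", path, "-t", "wav", "-"]
```

The image bytes returned by `run_pipeline(sox_to_wav_command(path), wav2png_cmd)` would then be handed to the image library, where `wav2png_cmd` is whatever command line makes wav2png read from standard input and write to standard output on your system.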

The waveform is processed block by block, pixel by pixel:
  _________________________________________
 |_____|_____|_____|_____|_____|_____|_____|
 |_____|_____|_____|_____|_____|_____|_____|
 |_____|_____|_____|_____|_____|_____|_____|
 |_____|_____|_____|_____|_____|_____|_____|

The white pixels in each block are counted, and the counts are collected into an array, which constitutes the fingerprint of the waveform.
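To make the block counting concrete, here is a minimal sketch in Python. It assumes the waveform image has already been decoded into a list of pixel rows where `1` means white and `0` means anything else. The function name `block_fingerprint`, the block size parameters and the row-by-row block order are illustrative assumptions, not taken from the actual program.

```python
def block_fingerprint(pixels, block_width, block_height, white=1):
    """Count the white pixels in each block, scanning blocks row by row."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    counts = []
    for top in range(0, height, block_height):
        for left in range(0, width, block_width):
            count = 0
            # Clamp the block to the image edges so partial blocks still count.
            for y in range(top, min(top + block_height, height)):
                for x in range(left, min(left + block_width, width)):
                    if pixels[y][x] == white:
                        count += 1
            counts.append(count)
    return counts


image = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(block_fingerprint(image, 2, 2))  # [3, 0, 0, 4]
```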

Next, each block value is compared with the corresponding block value of another fingerprint. If the difference across all blocks is within the allowed tolerance, the audio files are marked as similar.

Finally, the similar files are reported on standard output.
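The comparison and the report can be sketched as follows. This Python snippet assumes that the tolerance applies to each block individually (every pair of corresponding counts may differ by at most `tolerance`), which is one reading of the rule above. The names `is_similar` and `find_similar_pairs` are hypothetical.

```python
from itertools import combinations


def is_similar(fp_a, fp_b, tolerance):
    """Return True if every pair of corresponding block counts is within tolerance."""
    if len(fp_a) != len(fp_b):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(fp_a, fp_b))


def find_similar_pairs(fingerprints, tolerance):
    """Compare every pair of files and return the pairs considered similar."""
    names = sorted(fingerprints)
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if is_similar(fingerprints[a], fingerprints[b], tolerance)
    ]


fingerprints = {
    "a.mp3": [3, 0, 0, 4],
    "a.ogg": [3, 1, 0, 4],
    "b.mp3": [0, 4, 4, 0],
}

# Report the similar files on standard output.
for first, second in find_similar_pairs(fingerprints, tolerance=1):
    print(f"{first} is similar to {second}")
```

Comparing every pair is quadratic in the number of files, which is fine for a small collection but is worth keeping in mind for large ones.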


Because this process is slow, each fingerprint is cached in a database, using the GDBM_File module.
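The caching idea looks roughly like this. The sketch uses Python's standard `dbm` module as a stand-in for GDBM_File, keys the cache by file name and stores the fingerprint as JSON. All three choices are assumptions made for illustration, and `cached_fingerprint` is a hypothetical helper.

```python
import dbm
import json


def cached_fingerprint(db_path, key, compute):
    """Return the fingerprint stored under `key`, computing and storing it on a miss."""
    # Flag "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(db_path, "c") as db:
        raw_key = key.encode("utf-8")
        if raw_key in db:
            return json.loads(db[raw_key].decode("utf-8"))
        fingerprint = compute(key)  # the slow part: convert, render, count pixels
        db[raw_key] = json.dumps(fingerprint).encode("utf-8")
        return fingerprint
```

A cache keyed only by file name goes stale if a file is replaced, so a real program might also include the file size or modification time in the key.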

# Implementation

The first version implements the comparison algorithm described in this post, while the second version uses a more heuristic approach, comparing the entire waveform at once: