Introducing Chromaprint

After several months of reading research papers, learning and weekend coding, I’m very happy to make the half-finished code of my audio fingerprinting library public. :) I’m doing this mostly for selfish reasons, because it will force me to stop thinking in “hacker mode” and hopefully properly finish it, and I also hope to get some help and feedback from other people. There is nothing for regular users yet though, just for developers or people not afraid of the command line.

It all started in February this year, when I got my Google Alerts mail for “musicbrainz picard”, which included a link to this paper (“Waveprint: Efficient wavelet-based audio fingerprinting” (2007) by Shumeet Baluja and Michele Covell). I had never paid much attention to how audio identification systems are actually implemented, but I found it interesting that a paper published by Google researchers cited Picard, especially because Picard doesn’t implement any fingerprinting algorithm, it just uses libofa. Anyway, I read the paper and realized that maybe it’s not that hard to implement such a system. No tough DSP stuff or scary mathematics (of course there is some DSP stuff and mathematics, but mostly basics). The system described in the paper seemed quite straightforward to implement, so I became curious and decided to give it a try. Later on I realized that it’s perhaps not the best system and that even the authors had published new papers describing different approaches. That, combined with the fact that I was still officially a university student and had free access to papers from most organizations like ACM or IEEE, led me to read more and more papers on the topic, learning about the history, how the systems evolved, and so on.

Many ideas were based on a paper by Yan Ke, Derek Hoiem, and Rahul Sukthankar called “Computer Vision for Music Identification” (2005). In fact, even the Last.fm fingerprinter uses the code published by the authors of this paper. This is where I learned that audio identification is more about machine learning than it is about DSP. Many useful methods for extracting interesting features from audio streams are well-known, and the problem is more about how to apply and index them in the best way. The basic idea here is to treat audio as a spectral image and index the content of the image. I’ll explain this, and how Chromaprint uses it, in more detail in a later post.

Another important paper for me was “Pairwise Boosted Audio Fingerprint” (2009) by Dalwon Jang, Chang D. Yoo, Sunil Lee, Sungwoong Kim and Ton Kalker (Ton Kalker is a co-author of a historically important paper “Robust Audio Hashing for Content Identification” (2001) published by Philips Research), which combined previous experiments of the authors with audio identification based on spectral centroid features and an indexing approach similar to the one suggested by Y. Ke, D. Hoiem and R. Sukthankar. For a long time this was the best solution I had, and since it was actually not very hard to implement, I spent most of the time tweaking the configuration to get the best results.

The last major change came after I learned about “chroma” features by reading “Efficient Index-Based Audio Matching” (2008) by Frank Kurth and Meinard Müller. I’ve read more papers about chroma features since, but this was the first and also the most important one for me, and some of its ideas about processing the feature vectors are implemented in Chromaprint. Chroma features are typically used for music identification, as opposed to audio file identification, but I tried to use them with the approach I already had implemented and it nicely improved the quality of the fingerprinting function and actually reduced complexity, which allowed me to use much larger training data sets.
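To give a rough feel for what a chroma feature is, here is a minimal Python sketch. It is not Chromaprint's actual code: the function names, the 440 Hz reference tuning and the sum-to-one normalization are all assumptions made for illustration. It folds the energy of spectrum peaks into 12 bins, one per pitch class, so that the same note in different octaves lands in the same bin.

```python
import math

A4_HZ = 440.0   # assumed reference tuning for the note A4
A4_MIDI = 69    # MIDI note number of A4


def pitch_class(freq_hz):
    """Map a frequency to a pitch class 0..11 (0 = C, 9 = A)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from A4 in semitones, expressed as a MIDI note number.
    midi_note = A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)
    return int(round(midi_note)) % 12


def chroma_vector(spectrum):
    """Fold (frequency_hz, magnitude) pairs into 12 normalized energy bins."""
    bins = [0.0] * 12
    for freq_hz, magnitude in spectrum:
        # Energy is taken as magnitude squared (an illustrative choice).
        bins[pitch_class(freq_hz)] += magnitude * magnitude
    total = sum(bins)
    if total == 0.0:
        return bins  # silence: leave all bins at zero
    return [energy / total for energy in bins]


if __name__ == "__main__":
    # Two A notes an octave apart plus a louder middle C.
    frame = [(440.0, 1.0), (880.0, 1.0), (261.63, 2.0)]
    print(chroma_vector(frame))
```

Because octaves are folded together, the resulting 12-element vector is much smaller than a full spectrum frame, which fits the reduced complexity mentioned above.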

Anyway, this is more or less how I got to this point. As I mentioned, I’ll try to describe in more detail how Chromaprint works and where exactly the ideas come from in another post later. The code is not finished yet, but the core ideas are already implemented and tested. The work that has to be done is mostly about cleaning the code, tweaking the configuration, running the learning algorithm on a better training data set (as I used only random selections of my music collection so far) and building some API that can be used by external applications.

The code is written in C++, but I plan the public API to be in plain C. Except for an FFT library (either FFTW3 or FFmpeg), it has no external dependencies. It’s released under the LGPL 2.1 license, so there should be no problem integrating it into a commercial application, assuming FFmpeg is used for FFT calculations (using FFTW3 would require the binary to be GPL compatible). The project is hosted on Launchpad using Bazaar for development. I’m sorry I didn’t include the complete development history there, but it’s just full of junk commits, so you will not miss much. :)

What I’d really like is to also start working on the fingerprint lookup service, for which I need as many fingerprints as possible. I have a proof of concept written in Java, using PostgreSQL and the intarray extension, which allows me to search the fingerprints using GIN indexes. This works fine on a database with tens of thousands of fingerprints, but I’m not sure if it will scale to much higher numbers. If you would like to help and you are running Debian/Ubuntu Linux, please run these commands and email me the compressed fpcollect.log file:

sudo apt-get install bzr cmake libfftw3-dev libavcodec-dev libavformat-dev libtag1-dev libboost-dev
bzr branch lp:chromaprint
cd chromaprint
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TOOLS=ON .
make
./tools/fpcollect /path/to/your/music/library >fpcollect.log

I’m sure the recipe should be easy enough to modify for other Linux distributions. I’ll try to build a Windows binary in the next few days.
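As for the lookup proof of concept mentioned earlier, searching integer arrays through a GIN index is essentially an inverted-index lookup. The Python sketch below only illustrates that idea in memory; it is not the Java/PostgreSQL prototype, and the class name, method names and the simple overlap-count scoring are assumptions for the sake of the example.

```python
from collections import defaultdict


class FingerprintIndex:
    """Toy inverted index: integer hash -> ids of fingerprints containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, fp_id, hashes):
        """Register a fingerprint given as an iterable of integer hashes."""
        for h in set(hashes):
            self._postings[h].add(fp_id)

    def search(self, query_hashes, min_overlap=1):
        """Return (fp_id, shared_hash_count) pairs, best match first."""
        counts = defaultdict(int)
        for h in set(query_hashes):
            # .get() avoids creating empty posting lists for unknown hashes.
            for fp_id in self._postings.get(h, ()):
                counts[fp_id] += 1
        hits = [(fp_id, n) for fp_id, n in counts.items() if n >= min_overlap]
        # Highest overlap first; ties broken by id for a stable order.
        hits.sort(key=lambda item: (-item[1], item[0]))
        return hits


if __name__ == "__main__":
    index = FingerprintIndex()
    index.add("track-a", [1, 2, 3, 4])
    index.add("track-b", [3, 4, 5, 6])
    index.add("track-c", [7, 8])
    print(index.search([2, 3, 4, 9]))
```

A real service would also need to cope with time offsets and bit errors between fingerprints, which a plain overlap count like this ignores.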

UPDATE: I’ve compiled a Windows version of fpcollect. If you would like to help, please download it, change the fpcollect.bat file to point to your collection and run it. It should produce a file called fpcollect.log like the Linux example above.

So that’s all for now. My plans for the near future are to clean up the library and build a simple GUI application for collecting fingerprints. Once this is done, I can ask non-programmers to help me build a test database of fingerprints (including MusicBrainz IDs) and work primarily on the server component. There are some things that could be improved in the client library from a functional point of view, but I think it’s good enough for now, so the server part seems more important at the moment.

Btw, I always considered “Chromaprint” to be a temporary name for the client library, not meant to be its final name. If you can think of a better name, ideally something that could be used also for the service/server, please let me know!


12 Responses to Introducing Chromaprint

  1. Pingback: Lukáš Lalinský | Acoustid

  2. Pingback: Lukáš Lalinský | Cross-compiling with CMake and Autotools

  3. Pingback: Acoustid updates | Lukáš Lalinský

  4. Adam says:

    Hi Lukas,

    I was trying to download the windows version (from the link in the blog) and it was broken.

    Do you have an updated windows version or a working link?

    Great work, on this. I’m going to be using it to create a pseudo-voice biometric system!

    Adam

  5. I’m not sure it can be used for a biometric system. The design goals were quite different, so I don’t think it’s robust enough for such a task.

    Anyway, I removed the Windows version because it was generating an older version of fingerprints. It only contained a tool for generating fingerprints from audio files. If you are interested in submitting fingerprints to the Acoustid database, please see Acoustid Fingerprinter instead.

    If you would just like to use the code on Windows from your application, then I’m afraid you will have to compile it yourself. At the moment it’s also pretty GCC-specific, so if you are planning to use MSVC, it probably won’t compile.

  6. Adam says:

    Cool.

    Looks like I’ll be developing on the Linux box after all.

    The biometric system is only for fun – maybe something cool can come out of it. In the end it will be used for a song ID system – very basic, but interesting.

    Thanks for the hard work!

  7. Raju KVG says:

    I got the following error when i try running the following command

    :~$ bzr branch lp:chromaprint
    You have not informed bzr of your Launchpad ID, and you must do this to
    write to Launchpad or access private data. See “bzr help launchpad-login”.
    bzr: ERROR: Unknown repository format: ‘Bazaar repository format 2a (needs bzr 1.16 or later)\n’

  8. You need a more recent version of Bazaar to download the development source code.

  9. Raju KVG says:

    Was able to solve the issue. My bzr version was old; once I updated it, it started working fine.

  10. Raju KVG says:

    Hi Lukas,

    Still having issues

    root@raju-ol:/home/raju/chromaprint# cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TOOLS=ON .
    -- Using FFTW3 for FFT calculations
    -- TagLib version too old: version searched :1.6, found 1.5
    CMake Error at cmake/modules/FindTaglib.cmake:132 (message):
    Could not find Taglib
    Call Stack (most recent call first):
    CMakeLists.txt:95 (find_package)

    -- Configuring incomplete, errors occurred!

    ————————————————–

    Downloaded latest version of taglib from this site. And installed it. Still it gives the same issue

    http://ktown.kde.org/~wheeler/taglib.html

    ./configure -> make -> make install

    I do not see any errors while installing the taglib. Not sure how to correct it now.

  11. Raju KVG says:

    Hi Lukas,

    looks like i did not install the taglib in the right path. Installed in /usr. Now it worked fine.

    Now i would like to understand how to use your tool. I have placed bunch of mp3 songs in my library, and when i run the fpcollect it gives me a message saying below (Can I only fp on mpeg songs? not on mp3 songs? or any other format?)

    TagLib: MPEG::Header::parse() -- First byte did not match MPEG synch.

    Regards,
    Raju

  12. Depends on what you want to use it for. The fpcollect program scans the specified directory for audio files, and if it finds a file with a MusicBrainz track ID embedded in its tags, it will calculate a fingerprint for it and print the information to the console. This was useful for contributing fingerprints without having to use the web service, but now I’d just recommend using the GUI tool for that. You can use the fpsubmit.py script to submit the log generated by fpcollect to Acoustid though.

    It can process almost any audio format. You can ignore the messages you see, they are warnings from TagLib saying that it failed to parse some part of the MP3 file.
