After several months of reading research papers, learning and weekend coding, I’m very happy to make the half-finished code of my audio fingerprinting library public. :) I’m doing this mostly for selfish reasons, because it will force me to stop thinking in “hacker mode” and hopefully properly finish it, and I also hope to get some help and feedback from other people. There is nothing for regular users yet though, just for developers or people not afraid of the command line.

It all started in February this year, when I got my Google Alerts mail for “musicbrainz picard”, which included a link to this paper (“Waveprint: Efficient wavelet-based audio fingerprinting” (2007) by Shumeet Baluja and Michele Covell). I had never paid much attention to how audio identification systems are actually implemented, but I found it interesting that a paper published by Google researchers cited Picard, especially because Picard doesn’t implement any fingerprinting algorithm, it just uses libofa. Anyway, I read the paper and realized that maybe it’s not that hard to implement such a system. No tough DSP stuff or scary mathematics (of course there is some DSP stuff and mathematics, but mostly basics). The system described in the paper seemed quite straightforward to implement, so I became curious and decided to give it a try. Later on I realized that it’s perhaps not the best system and that even the authors had published new papers describing different approaches. That, combined with the fact that I was still officially a university student and had free access to papers from most organizations like ACM or IEEE, led me to read more and more papers on the topic, learning about the history, how the systems evolved, and so on.

Many ideas were based on a paper by Yan Ke, Derek Hoiem, and Rahul Sukthankar called “Computer Vision for Music Identification” (2005). In fact, even the Last.fm fingerprinter uses the code published by the authors of this paper. This is where I learned that audio identification is more about machine learning than it is about DSP. Many useful methods for extracting interesting features from audio streams are well-known, and the problem is more about how best to apply and index them. The basic idea here is to treat audio as a spectral image and index the content of the image. I’ll explain this, and how Chromaprint uses it, in more detail in a following post.

Another important paper for me was “Pairwise Boosted Audio Fingerprint” (2009) by Dalwon Jang, Chang D. Yoo, Sunil Lee, Sungwoong Kim and Ton Kalker (Ton Kalker is a co-author of a historically important paper, “Robust Audio Hashing for Content Identification” (2001), published by Philips Research). It combined the authors’ previous experiments with audio identification based on spectral centroid features with an indexing approach similar to the one suggested by Y. Ke, D. Hoiem and R. Sukthankar. For a long time this was the best solution I had, and since it was actually not very hard to implement, I spent most of the time tweaking the configuration to get the best results.

The last major change came after I learned about “chroma” features by reading “Efficient Index-Based Audio Matching” (2008) by Frank Kurth and Meinard Müller. I’ve read more papers about chroma features since, but this was the first and also the most important one for me, and some of its ideas about processing the feature vectors are implemented in Chromaprint. Chroma features are typically used for music identification, as opposed to audio file identification, but I tried to use them with the approach I had already implemented. It nicely improved the quality of the fingerprinting function and actually reduced complexity, which allowed me to use much larger training data sets.
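To make the idea of chroma features a bit more concrete, here is a minimal Python sketch of folding one frame of FFT magnitudes into 12 pitch-class bins. It is not Chromaprint's actual code: the function names, the 55–3520 Hz range and the A = 440 Hz reference are assumptions chosen only for illustration.

```python
import math

NUM_BANDS = 12  # one band per pitch class: A, A#, B, C, ..., G#


def pitch_class(freq_hz, ref_hz=440.0):
    """Map a frequency to a pitch class 0..11, where 0 is the reference note (A)."""
    semitones_from_ref = 12.0 * math.log2(freq_hz / ref_hz)
    return round(semitones_from_ref) % NUM_BANDS


def chroma_vector(magnitudes, sample_rate, min_hz=55.0, max_hz=3520.0):
    """Fold one frame of real-FFT magnitudes into a normalized 12-bin chroma vector.

    magnitudes[k] is assumed to be bin k of a real FFT of size
    2 * (len(magnitudes) - 1). The frequency range is a hypothetical choice.
    """
    fft_size = 2 * (len(magnitudes) - 1)
    chroma = [0.0] * NUM_BANDS
    for k, mag in enumerate(magnitudes):
        freq = k * sample_rate / fft_size
        if min_hz <= freq <= max_hz:
            # Accumulate energy; octaves of the same note land in the same bin.
            chroma[pitch_class(freq)] += mag * mag
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma
```

Because every octave of a note ends up in the same bin, the resulting vector describes which notes are sounding rather than the exact spectral shape, which is what makes it attractive as a compact feature.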

Anyway, this is more or less how I got to this point. As I mentioned, I’ll try to describe in more detail how Chromaprint works and where exactly the ideas come from in another post later. The code is not finished yet, but the core ideas are already implemented and tested. The work that remains is mostly about cleaning up the code, tweaking the configuration, running the learning algorithm on a better training data set (so far I have used only random selections from my music collection) and building an API that can be used by external applications.

The code is written in C++, but I plan the public API to be in plain C. Except for an FFT library (either FFTW3 or FFmpeg), it has no external dependencies. It’s released under the LGPL 2.1 license, so there should be no problem integrating it into a commercial application, assuming FFmpeg is used for FFT calculations (using FFTW3 would require the binary to be GPL-compatible). The project is hosted on Launchpad using Bazaar for development. I’m sorry I didn’t include the complete development history there, but it’s just full of junk commits, so you will not miss much. :)

What I’d really like is to also start working on the fingerprint lookup service, for which I need as many fingerprints as possible. I have a proof of concept written in Java, using PostgreSQL and the intarray extension, which allows me to search the fingerprints using GIN indexes. This works fine on a database with tens of thousands of fingerprints, but I’m not sure if it will scale to much higher numbers. If you would like to help and you are running Debian/Ubuntu Linux, please run these commands and email me the compressed fpcollect.log file:

sudo apt-get install bzr cmake libfftw3-dev libavcodec-dev libavformat-dev libtag1-dev libboost-dev
bzr branch lp:chromaprint
cd chromaprint
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TOOLS=ON .
make
./tools/fpcollect /path/to/your/music/library >fpcollect.log

The recipe should be easy enough to modify for other Linux distributions. I’ll try to build a Windows binary in the next few days.

UPDATE: I’ve compiled a Windows version of fpcollect. If you would like to help, please download it, change the fpcollect.bat file to point to your collection and run it. It should produce a file called fpcollect.log, like the Linux example above.

So that’s all for now. My plans for the near future are to clean up the library and build a simple GUI application for collecting fingerprints. Once this is done, I can ask non-programmers to help me build a test database of fingerprints (including MusicBrainz IDs) and work primarily on the server component. There are some things that could be improved in the client library from a functional point of view, but I think it’s good enough for now, so the server part seems more important at the moment.

Btw, I always considered “Chromaprint” to be a temporary name for the client library, not meant to be its final name. If you can think of a better name, ideally something that could be used also for the service/server, please let me know!

I wanted to write something like this for a long time, but for some reason never did. MusicBrainz has had support for folksonomy tagging since 2007, but the coverage of track tags is still not very good. I try to keep some tags in the “genre” tag in audio files, but even with a one-time import tool, I’m sure I wouldn’t remember to run it on new files. So the idea here is to submit these tags to MusicBrainz as I listen to the files in my music player (Quod Libet). It’s inspired by a Quod Libet plugin called LastFMTagger, which does something similar, but for Last.fm. I had some free time today, so I wrote a plugin that does one-way synchronization of tags from Quod Libet to MusicBrainz. You can install the plugin using the following commands:

mkdir -p ~/.quodlibet/plugins/events/
cd ~/.quodlibet/plugins/events/
wget http://dl.dropbox.com/u/5215054/mbtagsubmit.py

After you enable it and give it your MusicBrainz username and password (Music → Plugins), it will watch the songs you listen to, and if any of them has a “musicbrainz_trackid” and at least one “genre” tag, it will use the MusicBrainz web service to submit them. The submission normally happens only every half an hour, but if you change many files in a short time, it will submit them in batches of 20 tracks (the maximum number allowed by the web service) every 2 minutes.
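The batching rule can be sketched roughly as follows. This is only an illustration of the schedule described above, not the plugin's actual code: the names are hypothetical, and it assumes the 2-minute interval applies whenever more tracks are still waiting after a batch has been sent.

```python
BATCH_SIZE = 20             # maximum tracks per request, as stated above
NORMAL_INTERVAL = 30 * 60   # seconds between submissions when the queue is short
BUSY_INTERVAL = 2 * 60      # seconds between submissions while a backlog remains


def next_batch(pending):
    """Split the pending queue into (batch to submit now, tracks still waiting)."""
    return pending[:BATCH_SIZE], pending[BATCH_SIZE:]


def next_delay(remaining):
    """Assumed rule: hurry up only while tracks are still queued."""
    return BUSY_INTERVAL if remaining else NORMAL_INTERVAL


def drain_schedule(pending):
    """Return a list of (batch size, delay before the next submission) pairs."""
    schedule = []
    while pending:
        batch, pending = next_batch(pending)
        schedule.append((len(batch), next_delay(pending)))
    return schedule
```

With 45 queued tracks, for example, this yields two batches of 20 sent 2 minutes apart, followed by a final batch of 5 and a return to the half-hour interval.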

Working with Oracle is always an adventure. The error messages are usually not very helpful, so you have to guess a lot. What I saw today was extreme, though. Oracle allows you to create a table with a column named “TIMESTAMP” if you quote it:

CREATE TABLE "SOME_TABLE" (
    ...
    "TIMESTAMP" TIMESTAMP WITH TIME ZONE
);

Oracle is rather picky about identifier names, but since it accepted “TIMESTAMP”, I assumed everything was fine. Later I needed to create a trigger for this table, and that’s where the fun started.

CREATE OR REPLACE TRIGGER "SOME_TABLE_TR"
BEFORE INSERT ON "SOME_TABLE"
FOR EACH ROW
BEGIN
    ...
END;

This was failing for some reason though. The only thing I got was this “nice” error message, pointing to the table name in the CREATE TRIGGER statement:

ORA-06552: PL/SQL: Compilation unit analysis terminated
ORA-06553: PLS-320: the declaration of the type of this expression is incomplete or malformed

What type? Do I have a typo somewhere? Did the table somehow get corrupted? You can’t imagine how long it took me to figure out that it doesn’t like the column name, which was not mentioned anywhere in the PL/SQL block. I would have no problem if it told me that I can’t use the name. There are too many restrictions on identifiers anyway. What I don’t understand is why it allows me to create something that’s going to break other core functionality.

I use Dropbox to synchronize my Tomboy notes. This works very well, but there is a problem when setting it up on a new computer. Tomboy has a special “Start Here” note, which is used mainly for organizing other notes. When I tell it to synchronize notes from the Dropbox directory and overwrite the existing default notes, it will do so, but it doesn’t change its “Start Here” note pointer. As a result, Tomboy is not aware that my new Start Here note is the one it should use. As far as I know, there is no way to fix this using Tomboy itself. It can be done only using GConf. Here is an example of how to do it from the command line:

$ grep 'Start Here' -R ~/.local/share/tomboy/ -l
/home/lukas/.local/share/tomboy/Backup/4a47410a-4976-4cb6-8ddc-fd744710dba7.note
/home/lukas/.local/share/tomboy/7d41fff6-6cae-44bc-87b9-6486c809e7ee.note
$ gconftool-2 --set /apps/tomboy/start_note --type string 'note://tomboy/7d41fff6-6cae-44bc-87b9-6486c809e7ee'
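If you prefer a script to grep, the lookup step can be sketched in Python as below. The function name is made up for illustration, and it relies on the same assumptions as the commands above: the note file contains the text “Start Here”, and the note URI is “note://tomboy/” followed by the file name without the “.note” extension.

```python
import os


def find_start_note_uris(notes_dir, title="Start Here"):
    """Return note:// URIs for top-level .note files that contain the given title.

    Only files directly inside notes_dir are checked, so copies in
    subdirectories such as Backup/ are ignored.
    """
    uris = []
    for name in sorted(os.listdir(notes_dir)):
        path = os.path.join(notes_dir, name)
        if not (name.endswith(".note") and os.path.isfile(path)):
            continue
        with open(path, encoding="utf-8", errors="replace") as note_file:
            if title in note_file.read():
                # The URI is built from the file name without its extension.
                uris.append("note://tomboy/" + name[: -len(".note")])
    return uris
```

Calling find_start_note_uris(os.path.expanduser("~/.local/share/tomboy")) should list the candidate notes; the chosen URI can then be passed to the gconftool-2 command shown above.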

TagLib 1.6.3 was released this Monday, but somehow I forgot to post an update here. There aren’t many changes; the main reason for the release was configuration issues with 1.6.2. The 1.6.3 tarball can be downloaded here or here.

Changelog:

  • Fixed definitions of the TAGLIB_WITH_MP4 and TAGLIB_WITH_ASF macros.
  • Fixed upgrading of ID3v2.3 genre frame with ID3v1 code 0 (Blues).
  • New method int String::toInt(bool *ok) which can return whether the conversion to a number was successful.
  • Fixed parsing of incorrectly written lengths in ID3v2 (affects mainly compressed frames). (Bug #231075)