Chromaprint 1.3.1 released

A new version of Chromaprint has been released.

Changes since version 1.3:

  • Fixed fpcalc -length to actually restrict fingerprints to the requested length.
  • Fixed SONAME version for the shared library.

Download:

Chromaprint 1.3 released

A new version of Chromaprint has been released. This is another small release; there are no changes to the core functionality.

Changes since version 1.2:

  • The binary packages have been built with FFmpeg 2.8.6, adding support for DSF files
  • You can use fpcalc -length 0 to get the full fingerprint
  • New function chromaprint_get_fingerprint_hash for calculating a SimHash from the fingerprint data (see the sketch after this list)
  • Added info section to the fpcalc executable on Mac OS X
  • Generate .pc (pkg-config) file on Mac OS X when not building a framework
  • Removed use of some long deprecated FFmpeg APIs
  • Some smaller bug fixes
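
For illustration, here is a rough Python sketch of the classic SimHash construction over the 32-bit fingerprint words. The actual implementation behind chromaprint_get_fingerprint_hash may differ in details such as bit order or thresholds, so treat this as a conceptual sketch rather than the library's exact algorithm:

def simhash32(words):
    # For each of the 32 bit positions, count set bits (+1) against
    # unset bits (-1) across all fingerprint words.
    counts = [0] * 32
    for word in words:
        for bit in range(32):
            counts[bit] += 1 if (word >> bit) & 1 else -1
    # Set a result bit wherever set bits were in the majority; similar
    # fingerprints then produce hashes with a small Hamming distance.
    result = 0
    for bit in range(32):
        if counts[bit] > 0:
            result |= 1 << bit
    return result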

Download:

Let's Encrypt and Nginx

I'm late to the game, but I finally gave Let's Encrypt a try and I love it. The biggest advantage is that issuing and renewing SSL certificates can be completely automated. No more remembering how to renew certificates once a year.

These are mostly just notes for my future use, but maybe it will be useful for somebody. This is how I use Let's Encrypt with Nginx.

Install the letsencrypt client:

cd /opt
git clone https://github.com/letsencrypt/letsencrypt
VENV_PATH=/opt/letsencrypt/env/ /opt/letsencrypt/letsencrypt-auto plugins

Create a directory for the client to use for authorization:

mkdir -p /srv/www/letsencrypt

Then I put this into my nginx site config:

vim /etc/nginx/sites-enabled/example.com
location /.well-known/acme-challenge {
    root /srv/www/letsencrypt;
}
service nginx reload

That allows the letsencrypt client to manage authorization files for my domain. And now I can generate the first certificate:

/opt/letsencrypt/env/bin/letsencrypt certonly --webroot -w /srv/www/letsencrypt/ -d example.com,www.example.com

Hopefully, that generates the certificate files, and I can then reference them in the HTTPS section of my nginx config:

vim /etc/nginx/sites-enabled/example.com
ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
service nginx reload

And for the main benefit, I can now set up a cron job like this one, which will make sure my certificates stay up to date:

10 20 * * * /opt/letsencrypt/env/bin/letsencrypt-renewer >/dev/null && service nginx reload >/dev/null

Five years of AcoustID

It's hard to tell the exact date when the AcoustID project started, but if we go by the first entry in the database, it was October 8, 2010. That means the project turned five this week! I thought it was a good opportunity to gather some statistics from those five years.

Back in 2010, we were starting from scratch. We had an empty database, while the solution that AcoustID was replacing (MusicDNS/PUID) had fingerprints for 4.4 million MusicBrainz recordings (34% of all MusicBrainz recordings at that time). It took about two years to catch up with that number. Today, AcoustID can identify 8.3 million MusicBrainz recordings, which is 54% of all recordings in the MusicBrainz database. That is about twice the coverage, and the fingerprint database is growing faster than MusicBrainz itself, so eventually it might be able to identify most MusicBrainz recordings.

Since early 2011, we have also been accepting fingerprints without links to the MusicBrainz database, and the number of those has grown even faster, so only a small part of the AcoustID fingerprint database is actually linked to MusicBrainz now. The total number of unique fingerprints ("AcoustIDs") in the database is currently 25.5 million.

Here you can see the numbers on a timeline:

Traffic has naturally grown during the five years as well, but similarly to the database size, the growth is mostly linear. This is because of the focus on full audio file tagging and integration with MusicBrainz, which means AcoustID ends up being used mainly in specialized applications.

Unfortunately, the first version, released in 2010, was pretty minimalistic and did not include request statistics, so we only have these numbers starting from August 2011.

MusicBrainz Picard is the biggest source of users, which is not surprising, because AcoustID was created for Picard in the first place. But there are other free applications that use AcoustID -- beets, MusicBee, FileBot, VLC, Clementine, puddletag, Kid3, Quod Libet and many other smaller applications. There are also a few commercial applications that use AcoustID. The number of applications using the service every month is now above 100 and still growing.

It's quite easy to use AcoustID from just about any programming language now. Chromaprint fingerprints can be generated from Python, Ruby, Rust, Go, JavaScript and I'm probably missing a few. There are wrappers for C# and Java, but those are always developed directly inside the apps that use them. There is direct support for generating Chromaprint fingerprints in GStreamer and recently also FFmpeg. And there are also alternative implementations of the Chromaprint algorithm in C# (1, 2) and JavaScript.
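
To give an idea of how little code this takes, here is a minimal Python sketch using the pyacoustid package, one of the Python options mentioned above. The file name and API key are placeholders, and the function names should be verified against the pyacoustid documentation:

import acoustid

API_KEY = 'your-api-key'  # placeholder; register an application at acoustid.org

# Compute the Chromaprint fingerprint of a local audio file...
duration, fingerprint = acoustid.fingerprint_file('example.mp3')

# ...and ask the AcoustID service for matching MusicBrainz recordings.
for score, recording_id, title, artist in acoustid.match(API_KEY, 'example.mp3'):
    print(score, recording_id, title, artist)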

I have not been working on AcoustID very actively lately, and I know there are things that need to be done, but I'm still happy that the project is able to run pretty much on its own with very little support, and that the architecture designed five years ago is still capable of handling today's traffic. I'm not worried that it won't be able to handle the traffic five years from now.

Happy birthday, AcoustID!

Phoenix database adapter for Python

This is a small project I have been working on for a few weeks now. Mainly to get familiar with the Phoenix database, but also to just try something different from what I do at work or my existing open source projects.

Phoenix is an SQL engine built on top of HBase, and as is typical in the Apache ecosystem, all the existing tools expect you to use Java. Using a non-JVM language pretty much makes you a second-class citizen, and Phoenix is no exception.

Fortunately, Phoenix has had an HTTP-based query server since version 4.4, and this server can be used to access the database from other languages. At the time I was looking at it, there were no non-Java client libraries, so I wanted to see how hard it would be to write one in Python.

So, after some digging through the source code and experimenting, I was able to talk to the server, and after more digging, testing, reporting and fixing bugs, I can now release the first version of the Python package, which makes it possible to do this:

import phoenixdb

# Connect to the Phoenix query server; autocommit means each statement
# is committed immediately
database_url = 'http://localhost:8765/'
conn = phoenixdb.connect(database_url, autocommit=True)

cursor = conn.cursor()
cursor.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username VARCHAR)")
# Phoenix uses UPSERT instead of INSERT; the ? placeholders bind parameters
cursor.execute("UPSERT INTO users VALUES (?, ?)", (1, 'admin'))
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())

You can install it from PyPI using pip or easy_install (preferably into a virtualenv):

pip install phoenixdb

Please see the documentation or check the source code for more details on how to use it.
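
The package follows the usual Python DB-API conventions, so querying looks much like any other database driver. Here is a small, hypothetical follow-up to the example above, reusing the users table created there:

import phoenixdb

conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
try:
    cursor = conn.cursor()
    # The same ?-style placeholders work for SELECT queries
    cursor.execute("SELECT username FROM users WHERE id = ?", (1,))
    row = cursor.fetchone()
    if row is not None:
        print(row[0])
finally:
    conn.close()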

To experiment with the code, you will also need HBase with the Phoenix query server running somewhere. This is not completely trivial to set up, but I have a Vagrant-based environment for testing it, which is mentioned in the documentation.

Of course, nothing is perfect, and the query server is a pretty recent addition to Phoenix, so there are problems: with the latest released version, you are pretty much restricted to the most basic data types for numbers and text. Additionally, the remote protocol is still in development, so the library will need to keep its releases synchronized with Phoenix releases. But it was a nice experiment and I'm quite happy with how far I managed to get.

One of my motivations for this was to see whether Python could realistically be used to work with a large Phoenix database. Maybe not as the primary way to talk to the database, because the query server will probably always be a second-class citizen, but it could be useful for quick scripts where reliability or performance are not that important.
