Archive for October, 2009

TagLib 1.6.1 has been released. It’s a minor bug-fix release. The main changes are content-based detection of .oga files, fixed saving of Vorbis Comments to Ogg FLAC files, and support for cover art in MP4 files. The tarball is available here for now (later also on Scott’s page), and the updated API docs are here.

Detailed changelog:

  • Better detection of the audio codec of .oga files in FileRef. (Bug #178602)
  • Fixed saving of Vorbis comments to Ogg FLAC files. TagLib tried to include the Vorbis framing bit, which is only correct for Ogg Vorbis. (Max bug #445970)
  • Public symbols now have their visibility explicitly set to “default” on GCC.
  • Added missing exports for static ID3v1 functions.
  • Fixed a typo in taglib_c.pc.
  • Fixed a failing test on ppc64.
  • Support for binary ‘covr’ atom in MP4 files. TagLib 1.6 treated them as text atoms, which corrupted them in some cases.
  • Fixed ID3v1-style genre to string conversion in MP4 files. (Bug #198238)
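To make the Ogg FLAC fix more concrete, here is a minimal Python sketch of the Vorbis comment layout, based on the Vorbis I and FLAC format specifications rather than on TagLib’s code (the function name `render_vorbis_comment` is hypothetical). The body, a vendor string plus a list of FIELD=value strings with 32-bit little-endian lengths, is the same in both containers; only Ogg Vorbis ends it with the framing bit, which serializes as a single 0x01 byte.

```python
import struct


def render_vorbis_comment(vendor: str, comments: dict[str, list[str]],
                          framing_bit: bool) -> bytes:
    """Serialize a Vorbis comment body (hypothetical helper, illustration only).

    Lengths are 32-bit little-endian, strings are UTF-8. Set framing_bit=True
    for Ogg Vorbis; FLAC's VORBIS_COMMENT block (also used in Ogg FLAC)
    must not carry it.
    """
    out = bytearray()
    vendor_bytes = vendor.encode("utf-8")
    out += struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    # One "FIELD=value" entry per value; a field may repeat.
    fields = [f"{key}={value}".encode("utf-8")
              for key, values in comments.items() for value in values]
    out += struct.pack("<I", len(fields))
    for entry in fields:
        out += struct.pack("<I", len(entry)) + entry
    if framing_bit:
        # The framing bit is one set bit, padded with zeros to a full byte.
        out += b"\x01"
    return bytes(out)
```

In an Ogg Vorbis stream this body is additionally prefixed with the packet type byte 0x03 and the string “vorbis”; in (Ogg) FLAC it sits inside a VORBIS_COMMENT metadata block with nothing appended, which is why writing the framing bit there was wrong.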

Many people dislike the directory-per-branch concept that Bazaar uses. What they don’t realize, though, is that this doesn’t mean you need a working tree for each branch. You can very easily simulate cheap Git-style branches, but with some added flexibility. Checkouts are a fairly well-known feature of Bazaar, but people mostly associate them with the centralized workflow (i.e. checking out remote branches). That is not their only use case.

When I work on larger projects, where I need multiple branches, I usually have a directory structure like this:

  • “project”
    • “branches”
      • “branchA”
      • “branchB”
      • “trunk”
    • “work”

In this example, “project” is a shared repository. It stores the revisions of all the project’s branches in a single place. The repository is created with the --no-trees option, so that working trees are not automatically created for new branches. All the branches I need to work with are located in “project/branches/XXX”. Thanks to the DAG model, they are nothing more than pointers to a “head” revision in the repository, so they are very cheap to create.

My development happens in “project/work”, which is a lightweight checkout of one of the branches. This means that it contains nothing but information about the state of the working tree and a pointer to the branch. For every operation, Bazaar uses the branch it points to instead.
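A toy Python model may help show why this layout is cheap. It is only an illustration under simplifying assumptions: the class and method names are hypothetical and have nothing to do with Bazaar’s real storage format. The shared repository stores every revision once, a branch is just a name mapped to a head revision, and a lightweight checkout only remembers which branch it points to.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Shared repository: every revision is stored once, keyed by its id."""
    # revision id -> parent revision ids (the DAG)
    parents: dict[str, tuple[str, ...]] = field(default_factory=dict)
    # branch name -> head revision id; this is all a "branch" holds here
    branches: dict[str, str] = field(default_factory=dict)

    def add_revision(self, rev_id: str, parent_ids: tuple[str, ...]) -> None:
        self.parents[rev_id] = parent_ids

    def ancestry(self, rev_id: str) -> set[str]:
        """All revisions reachable from rev_id by following parent links."""
        seen: set[str] = set()
        stack = [rev_id]
        while stack:
            rev = stack.pop()
            if rev not in seen:
                seen.add(rev)
                stack.extend(self.parents[rev])
        return seen


@dataclass
class LightweightCheckout:
    """A working tree that only remembers which branch it points to."""
    repo: Repository
    branch: str

    def commit(self, rev_id: str) -> None:
        # The new revision goes into the shared repository and the branch
        # pointer moves; the checkout itself stores no history.
        head = self.repo.branches[self.branch]
        self.repo.add_revision(rev_id, (head,))
        self.repo.branches[self.branch] = rev_id

    def switch(self, branch: str) -> None:
        # In this model only the pointer changes (the real command also
        # updates the files in the working tree).
        self.branch = branch

    def branch_and_switch(self, source: str, new_branch: str) -> None:
        # A new branch is a new pointer; no revisions are copied.
        self.repo.branches[new_branch] = self.repo.branches[source]
        self.branch = new_branch
```

Creating or switching branches here only moves pointers, while all history lives once in the repository, which is the property the directory layout above relies on.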

I’ll use QBzr as an example of how to set this up:

% bzr init-repo --no-trees qbzr
Shared repository (format: 2a)
Location:
  shared repository: qbzr
% cd qbzr
% mkdir branches
% bzr branch lp:qbzr branches/trunk
Branched 1032 revision(s).
% bzr branch lp:qbzr/0.14 branches/0.14
Branched 969 revision(s).
% bzr co --lightweight branches/trunk work
% cd work

After doing this, I can commit, pull and push in the “work” directory as if I were in the “trunk” branch. Nothing exciting. Let’s say I want to fix a bug in the “0.14” branch:

% bzr switch ../branches/0.14

Now I can work as if I were in the “0.14” branch. So I make some changes, commit them, make some more changes and realize that these should actually go to a new feature branch. So instead of committing them, I create the new branch (I use this very often, so I have branch --switch aliased to sbranch):

% bzr branch --switch ../branches/0.14 ../branches/new-feature

At this point the “work” directory points to the “new-feature” branch and the uncommitted changes are still there. So I can commit them, do some more work, merge from other branches, etc. While working on something, I might want to run code from two branches at the same time for comparison. This is where Git doesn’t help you, because you can have only one working tree at a time (unless you make a new clone of the repository). But with this layout in Bazaar, nothing says I can only have one checkout in the repository. In fact, I can have a checkout of one of the branches anywhere on the disk. So I do this:

% cd ..
% bzr co --lightweight branches/trunk tmp

And now I can run both versions from “work” and “tmp” side-by-side. After I’m done, I simply delete the “tmp” directory.

I’m writing this mostly because I’m surprised how few people know about it, and I personally find it a very nice way to work in Bazaar.

(This post is mostly for myself, because I know I’ll forget the exact syntax the next time I need it.)

I’ve recently discovered that instead of sshing to one machine and then sshing from there to another machine on its local network, I can configure ssh to set up the proxy connection automatically. For example:

Host example-local-10
    HostName 192.168.0.10
    ProxyCommand ssh example.com nc %h %p 2> /dev/null

This is especially useful with tools that simply use ssh as their transport, such as scp or rsync.
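Roughly, ssh substitutes a few tokens into ProxyCommand, runs the result via the user’s shell, and then speaks the SSH protocol over that command’s stdin and stdout; nc on example.com just relays those bytes to the internal host. A minimal Python sketch of the substitution step, assuming only the %h, %p and %% tokens matter here (the helper name `expand_proxy_command` is hypothetical, and ssh itself supports more tokens):

```python
def expand_proxy_command(template: str, host: str, port: int) -> str:
    """Expand %h, %p and %% in a ProxyCommand string (illustration only).

    host is the HostName value (not the Host alias), port defaults to 22 in ssh.
    """
    out: list[str] = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            token = template[i + 1]
            if token == "h":
                out.append(host)
            elif token == "p":
                out.append(str(port))
            elif token == "%":
                out.append("%")  # a literal percent sign
            else:
                raise ValueError(f"token %{token} is not handled in this sketch")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For the example-local-10 entry this gives `ssh example.com nc 192.168.0.10 22 2> /dev/null`, i.e. the command the gateway hop actually runs.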

While writing an SQL script to upgrade the MusicBrainz database for the last release, I needed a way to generate new UUIDs from SQL. PostgreSQL has had a native UUID data type and a contrib module for generating UUIDs since version 8.3, but that wouldn’t help me, because I needed the script to work with at least version 8.1. I had the idea to write PL/pgSQL functions to generate UUIDs, so I skimmed over RFC 4122, which documents them, and found out that it isn’t actually that hard.

MusicBrainz uses random-based UUIDs (version 4) for all its new IDs, so my first idea was to implement the same. I knew I couldn’t use this code in the end, because it would need a good pseudo-random number generator, but I couldn’t resist writing it anyway. Messing with bits in high-level languages is always fun :) Here is the result (because it relies on the random() function, don’t use the code for anything serious):

CREATE OR REPLACE FUNCTION generate_uuid_v4() RETURNS uuid
    AS $$
DECLARE
    value VARCHAR(36);
BEGIN
    -- Each line appends one random byte as two hex digits.
    -- floor(random() * 256) gives 0..255 uniformly; ceil(random() * 255)
    -- would almost never produce 0.
    value =          lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || '-';
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || '-';
    -- Byte 6: keep the low nibble, set the version nibble to 4 (0x40).
    value = value || lpad(to_hex((floor(random() * 256)::int & 15) | 64), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || '-';
    -- Byte 8: keep the low 6 bits, set the RFC 4122 variant bits 10xxxxxx (0x80).
    value = value || lpad(to_hex((floor(random() * 256)::int & 63) | 128), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || '-';
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    value = value || lpad(to_hex(floor(random() * 256)::int), 2, '0');
    RETURN value::uuid;
END;
$$ LANGUAGE plpgsql;
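For comparison, here is the same bit manipulation as a Python sketch: the masks `& 0x0F | 0x40` and `& 0x3F | 0x80` are the `& 15 | 64` and `& 63 | 128` from the function above. The one deliberate swap is the random source: `secrets.token_bytes` draws from the operating system’s CSPRNG, which is exactly what the SQL version lacks. The helper names are hypothetical, and Python’s own `uuid.uuid4()` already does the equivalent, so this is only for illustration.

```python
import secrets
import uuid


def uuid4_from_bytes(raw: bytes) -> uuid.UUID:
    """Apply the RFC 4122 version-4 bit layout to 16 random bytes."""
    if len(raw) != 16:
        raise ValueError("need exactly 16 bytes")
    b = bytearray(raw)
    b[6] = (b[6] & 0x0F) | 0x40  # version 4 in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80  # variant bits 10xxxxxx in byte 8
    return uuid.UUID(bytes=bytes(b))


def generate_uuid_v4() -> uuid.UUID:
    # Cryptographically strong randomness instead of PostgreSQL's random().
    return uuid4_from_bytes(secrets.token_bytes(16))
```

Feeding it all-zero bytes yields `00000000-0000-4000-8000-000000000000`, which makes the fixed version and variant bits easy to see.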

It turned out that we needed the script to generate deterministic IDs, so V4 was out of the question. That was good, because we would have needed a better PRNG for the final version.

The next idea was to construct the URL at which each new row would be served and generate name-based UUIDs using the URL namespace. The idea is to concatenate a namespace and a name, calculate a cryptographic hash of the result, and use its bits to generate the UUID. There are two options for hashing: MD5 (version 3) or SHA-1 (version 5). The RFC prefers SHA-1, but PostgreSQL only has a built-in function for MD5, so the decision was easy for us. The code doesn’t depend on any random numbers, so it’s good enough to use in production.

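-- Convert a short hex string such as 'ff' to an integer by building a
-- bit-string literal (x'ff') and casting it to integer. t is pasted into
-- dynamic SQL, so only pass it trusted input like the md5() output below.
CREATE OR REPLACE FUNCTION from_hex(t text) RETURNS integer
    AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN EXECUTE 'SELECT x'''||t||'''::integer AS hex' LOOP
        RETURN r.hex;
    END LOOP;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;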

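-- Name-based UUID, version 3: MD5 over the namespace's 16 bytes followed by
-- the name. namespace must be given as 32 hex digits without dashes.
CREATE OR REPLACE FUNCTION generate_uuid_v3(namespace varchar, name varchar) RETURNS uuid
    AS $$
DECLARE
    value varchar(36);
    bytes varchar;
BEGIN
    -- md5() returns 32 hex digits; byte n is substr(bytes, 1+2*n, 2).
    bytes = md5(decode(namespace, 'hex') || decode(name, 'escape'));
    value = substr(bytes, 1+0, 8);                 -- bytes 0-3
    value = value || '-';
    value = value || substr(bytes, 1+2*4, 4);      -- bytes 4-5
    value = value || '-';
    -- Byte 6: keep the low nibble, set the version nibble to 3 (0x30).
    value = value || lpad(to_hex((from_hex(substr(bytes, 1+2*6, 2)) & 15) | 48), 2, '0');
    value = value || substr(bytes, 1+2*7, 2);
    value = value || '-';
    -- Byte 8: keep the low 6 bits, set the RFC 4122 variant bits 10xxxxxx (0x80).
    value = value || lpad(to_hex((from_hex(substr(bytes, 1+2*8, 2)) & 63) | 128), 2, '0');
    value = value || substr(bytes, 1+2*9, 2);
    value = value || '-';
    value = value || substr(bytes, 1+2*10, 12);    -- bytes 10-15
    RETURN value::uuid;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;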

This code should be easy enough to modify to generate UUIDv5, if you have a way to calculate SHA-1 hashes. To use the function, you need to pass it a namespace and a name. The namespace is itself a UUID (passed as 32 hex digits without dashes); it can be anything, but there are a few well-known options:

  • URL
    '6ba7b8119dad11d180b400c04fd430c8'
  • DNS (fully-qualified domain name)
    '6ba7b8109dad11d180b400c04fd430c8'
  • ISO OID
    '6ba7b8129dad11d180b400c04fd430c8'
  • X.500 DN (in DER or a text output format)
    '6ba7b8149dad11d180b400c04fd430c8'

The URL one is probably the most useful. So, to generate UUIDv3 for http://www.example.com/foo/1, you can use the following:

SELECT generate_uuid_v3('6ba7b8119dad11d180b400c04fd430c8', 'http://www.example.com/foo/1');
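To cross-check the SQL, here is a sketch of the same name-based algorithm in Python, extended to the SHA-1 variant (UUIDv5) mentioned above. The function name `name_based_uuid` is hypothetical; its output can be compared with the standard library’s `uuid.uuid3` and `uuid.uuid5`. For names without backslashes or non-ASCII characters, such as typical URLs, it should give the same result as `generate_uuid_v3` for the same namespace and name.

```python
import hashlib
import uuid


def name_based_uuid(namespace_hex: str, name: str, version: int = 3) -> uuid.UUID:
    """RFC 4122 name-based UUID: hash(namespace bytes + name bytes), then set bits.

    version=3 uses MD5 (as generate_uuid_v3 does); version=5 uses SHA-1.
    Dashes in namespace_hex are tolerated and stripped.
    """
    namespace = bytes.fromhex(namespace_hex.replace("-", ""))
    data = namespace + name.encode("utf-8")
    if version == 3:
        digest = hashlib.md5(data).digest()
    elif version == 5:
        digest = hashlib.sha1(data).digest()[:16]  # SHA-1 gives 20 bytes; keep 16
    else:
        raise ValueError("only versions 3 and 5 are name-based")
    b = bytearray(digest)
    b[6] = (b[6] & 0x0F) | (version << 4)  # 0x30 for v3, 0x50 for v5
    b[8] = (b[8] & 0x3F) | 0x80            # RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(b))
```

`uuid.NAMESPACE_URL.hex` returns the dash-less form `'6ba7b8119dad11d180b400c04fd430c8'` that the SQL function expects.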

One more attempt to have a blog… :)