Archive for December, 2009

My faculty has its own LaTeX thesis style, which makes it very easy to get a decent-looking thesis out of LaTeX without too much effort. The problem is that the style requires you to use ISO-8859-2 in your document, which is something I can’t really live with. :) Here are instructions on how to convert the style to UTF-8. I’m posting them here in the hope that they will help some other student of the Faculty of Informatics, Masaryk University in the future.

I’m doing this on an Ubuntu laptop, so first I’ll need LaTeX:

sudo apt-get install texlive

Then download fithesis:

wget http://www.fi.muni.cz/~xpavlov/fithesis/install.sh
wget http://www.fi.muni.cz/~xpavlov/fithesis/fithesis-0.2.12.tar.gz
tar -zxvf fithesis-0.2.12.tar.gz
cd fithesis-0.2.12

Convert the sources to UTF-8 and change the package options in the file:

recode latin2..utf8 fithesis.dtx
sed -i 's/latin2/utf8/' fithesis.dtx
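
For anyone without recode installed, the same two steps can be sketched in Python. This is only an illustration, not part of the original procedure: the function names are made up, and the assumption that "latin2" appears as a package option on a line like \usepackage[latin2]{inputenc} is mine, not something checked against fithesis.dtx.

```python
def latin2_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-2 bytes as UTF-8 and rename the encoding option."""
    # Step 1 (recode latin2..utf8): interpret the input bytes as ISO-8859-2.
    text = raw.decode("iso8859_2")
    # Step 2 (sed 's/latin2/utf8/'): without the g flag, sed replaces only
    # the first match on each line, so do the same here.
    lines = [line.replace("latin2", "utf8", 1) for line in text.split("\n")]
    return "\n".join(lines).encode("utf-8")


def convert_file(path: str) -> None:
    """Convert a file in place (hypothetical helper, e.g. for fithesis.dtx)."""
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, "wb") as f:
        f.write(latin2_to_utf8(raw))
```

Calling convert_file("fithesis.dtx") inside the unpacked directory should then have roughly the same effect as the two commands above.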

And now we can install it:

cd ..
chmod a+x install.sh
./install.sh 0.2.12 /usr/share

At this point you can use the package in your UTF-8 encoded LaTeX document.

Another post mainly for myself, just so I know where to find the information quickly the next time I need it. If you have swap on an LVM volume, these commands can be used to resize it (in this case, increase it by 100 MB):

swapoff /dev/vg_foo/lv_swap
lvextend -L+100M /dev/vg_foo/lv_swap
mkswap /dev/vg_foo/lv_swap
swapon /dev/vg_foo/lv_swap

That is: disable swapping on the volume, extend it, re-create the swap area and enable swapping again.

I started using OpenID some time ago and I really like it. I was surprised that large companies like Google or Yahoo! are OpenID providers, and that made me try using Google’s OpenID. The first site I logged in to was Stack Overflow, which has nice buttons for major providers, so that was easy. The problem came when I first needed to log in to a site without such buttons. After some searching I found out that the Google OpenID end-point is https://www.google.com/accounts/o8/id, but I had to search for it every single time I needed it. Every time I find it I think it shouldn’t be that hard to remember the URL, but the next time I need it I just can’t remember it.

So I thought about switching to using my own URL as my OpenID, but I didn’t want to run my own provider server. OpenID supports delegation, so normally it would be a matter of adding two lines of HTML code to the header of this blog and I could use Google’s OpenID server with my own URL. The issue is with the way Google handles identities. The main URL https://www.google.com/accounts/o8/id is the same for everyone and Google will generate a unique OpenID for every combination of user and OpenID consumer. This is nice from a privacy point of view, but it makes it impossible to use OpenID delegation, because in the delegation code I have to specify my OpenID. I can’t do that if my OpenID is different for every site I log in to.

I ended up just adding this to the static HTML file I have on http://oxygene.sk/:

<meta http-equiv="X-XRDS-Location" content="https://www.google.com/accounts/o8/id" />

This means that every time I want to log in to an OpenID-enabled site, I can type in http://oxygene.sk/ and it will use the same Google OpenID as before. Not exactly what I wanted initially, but it’s better than having to remember the long URL.

Dealing with file names in a cross-platform application is not easy. A question about using a file name stored in a QString to create a new TagLib file came up on the TagLib development mailing list yesterday. The original problem was not related to Unicode, but after one C++ issue was fixed, the discussion ended up there. So, what was wrong?

QString represents a Unicode string. That is, an array of Unicode code points. The issue is that on most UN*X platforms, filesystems are not aware of Unicode. File names are stored as an array of bytes. The filesystems don’t care how the bytes are interpreted, but if applications want to display non-ASCII characters properly, they need to decode the bytes into Unicode. Since the filesystem itself doesn’t know the encoding, it’s necessary to look for the information somewhere else.

The user’s locale is probably the first place to look. If the user uses some encoding for input/output, it’s reasonable to expect that they use the same encoding for file names. This doesn’t always have to be the case, so GNOME for example uses a special environment variable named G_FILENAME_ENCODING. The problem is that all these solutions work globally for all filesystems. What if the main filesystem uses UTF-8 for everything, but the media player to which I sometimes upload files from Windows uses a different encoding? There is no way to tell applications that they should use CP-1250 for /media/disk-1 and UTF-8 for everything else.
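
To make the guessing concrete, here is a minimal Python sketch of what an application is left to do when it only has the bytes. The function name and the "try UTF-8 first, then fall back" strategy are illustrative assumptions; this is not how Qt or GLib pick the encoding (they rely on the locale or on G_FILENAME_ENCODING).

```python
def decode_name(raw: bytes, fallback: str = "cp1250") -> str:
    """Guess the text of a byte file name (hypothetical helper)."""
    try:
        # Most modern systems use UTF-8, and random legacy-encoded bytes
        # are rarely valid UTF-8, so a successful decode is a decent hint.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise assume the legacy encoding. This is still only a guess:
        # the bytes themselves carry no information about their encoding.
        return raw.decode(fallback)
```

The same name uploaded from a CP-1250 system and from a UTF-8 system arrives as two different byte sequences, and only the fallback makes both readable. With a wrong fallback the result is silently wrong text, which is exactly the problem described above.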

That’s not everything though. Seeing broken characters is not nice, but not a blocking problem either. What if the application can’t even read or write such files? That’s a much larger issue. If the application is using Unicode to store file names, but it can’t properly encode/decode the name, it won’t be able to access the file. The obvious solution is to ignore Unicode and just use byte arrays. This would work fine on UN*X, but new problems will show up if you are trying to write a cross-platform application. To be able to access all files on Windows, you have to do the exact opposite. You have to work with Unicode. On Mac you also have to work with Unicode, but it’s even more interesting, because the filesystem will do Unicode normalization for you. There is no solution that works in all cases on all platforms.

To summarize the situation:

  • File names on UN*X are byte arrays. You don’t know their encoding; you can only guess. It’s safest not to treat them as Unicode. If you want to treat them as Unicode, use functions like QFile::decodeName() or g_filename_to_utf8() to do the guessing for you.
  • File names on Windows are in Unicode. You can work with them using UTF-16.
  • File names on Mac are in normalized Unicode. You can work with them using UTF-8, but you can’t just save any Unicode. The filesystem will normalize it to NFD for you.
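
Two of these points can be illustrated with a short Python sketch. It is only an illustration under my own assumptions, with made-up function names, and it uses Python rather than Qt or GLib. The first function shows one way to carry an undecodable UN*X file name through a Unicode string without losing bytes (the "surrogateescape" error handler, which Python's own os.fsdecode() and os.fsencode() also use on UN*X). The second shows what NFD normalization does to a name.

```python
import unicodedata


def roundtrip(raw: bytes) -> bytes:
    """Pass a byte file name through a str and back without losing data."""
    # Bytes that are not valid UTF-8 become lone surrogate code points,
    # so the string is not displayable text, but nothing is thrown away.
    name = raw.decode("utf-8", errors="surrogateescape")
    # Encoding with the same handler restores the original bytes exactly.
    return name.encode("utf-8", errors="surrogateescape")


def nfd(name: str) -> str:
    """Return the name in decomposed form (NFD)."""
    return unicodedata.normalize("NFD", name)


# "é" as one code point becomes "e" followed by a combining acute accent.
composed = "caf\u00e9.txt"
decomposed = nfd(composed)
```

Here composed and decomposed look identical on screen but compare as different strings, so an application that saves a file under one form and later looks for that exact string in a directory listing can miss it unless it normalizes both sides first.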

It’s sad to say, but I think this is one area where Windows is the nicest platform to deal with.

When I have some free time and I’m bored, I try to help people at Stack Overflow. Recently the owners of Stack Overflow launched a site where you can post your CV, which is linked to your Stack Overflow account, and companies can search it. Nice idea. But the business model behind it makes it horrible. This blog post by Joel Spolsky actually made me write this rant. Stack Overflow is obviously doing very well at making money from ads. People answering questions over there actually make them money, as they increase the value of the site. (They still display ads even to those people, which is something I also don’t get, but with Adblock Plus, I don’t care.) The thing is that they charge job seekers for having their CV searchable within the site. The official reason for that is that they want to ensure that everybody who has their CV listed there is actively looking for a job. If that’s so, why are they raising the price from $29 to $99? $29 should do just as well for filtering out the people who post their CV “just because they can”. I have real trouble imagining that any competent programmer (who actively contributes to Stack Overflow, and therefore makes sure they get their ad revenue) would want to pay to get their CV listed on the site. It’s not about the money though, it’s about the principle. I wouldn’t pay for such a service, just like I wouldn’t send my CV to a recruitment agency.