I spent a while this week looking into how best to automatically hyphenate text on websites, to improve Today’s Guardian. I couldn’t find anything recent that summarised the options, so here’s a quick run-down of what I discovered. If you know any more, or any better, please do correct me.
Hyphenation with CSS
Ideally, we’d want to enable hyphenation at the CSS level, on some unprepared text, and have it hyphenate if needed. We’re not there yet.
CSS3 features the hyphens property, but I don’t think it’s supported much yet, even using the -moz-hyphens and -webkit-hyphens properties for Mozilla- and Webkit-based browsers.
Eric Meyer has a test page, and in an ideal world paragraph #p03 would feature some hyphens at the ends of the lines. This doesn’t work for me in Safari 5.05, Chrome 12.0.742.91, or Firefox 4.0, even in a version of the page with the browser-specific properties set.
But Safari on an iPad with iOS 4.2 does hyphenate paragraph #p03 when -webkit-hyphens is applied. And it looks like Firefox Nightly and Webkit Nightly gained support for hyphenation. I can taste the fut-ure.
However, hyphenation is complicated, and badly-hyphenated text is more annoying to read than non-hyphenated text. The technology needs to understand where to put hyphens, and there are algorithms to help it do this. The rules are also, of course, different for every language. Germans, with their lengthy compound words, are particularly interested in hyphenation.
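To give a feel for what such an algorithm looks like, here’s a minimal Python sketch of one well-known approach: the pattern method Frank Liang devised for TeX. I’m assuming that’s the kind of thing meant here, and the tiny pattern list below is just the handful usually quoted for the word “hyphenation”, not a real dictionary. The function names and the left_min/right_min defaults are my own choices for illustration.

```python
import re


def parse_pattern(pattern):
    """Split a Liang-style pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    # One weight per gap: before each letter, plus one after the last letter.
    weights = [0] * (len(letters) + 1)
    position = 0
    for char in pattern:
        if char.isdigit():
            weights[position] = int(char)
        else:
            position += 1
    return letters, weights


# Assumption: a toy pattern set that only covers the word "hyphenation".
SAMPLE_PATTERNS = dict(
    parse_pattern(p)
    for p in "hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n".split()
)


def hyphenation_points(word, patterns=SAMPLE_PATTERNS, left_min=2, right_min=3):
    """Return the indexes in `word` before which a hyphen may be placed."""
    padded = "." + word.lower() + "."  # dots mark the start and end of the word
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = patterns.get(padded[start:end])
            if weights is not None:
                # Where patterns overlap, the highest weight for a gap wins.
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    # Odd scores allow a break, even scores forbid one.
    # scores[i + 1] is the gap just before word[i] because of the leading dot.
    return [
        i
        for i in range(left_min, len(word) - right_min + 1)
        if scores[i + 1] % 2 == 1
    ]


def hyphenate(word, separator="-"):
    """Join the pieces of `word` with `separator` at each allowed break."""
    pieces, previous = [], 0
    for point in hyphenation_points(word):
        pieces.append(word[previous:point])
        previous = point
    pieces.append(word[previous:])
    return separator.join(pieces)


print(hyphenate("hyphenation"))  # hy-phen-ation
```

A real pattern file has thousands of these patterns per language, which is why the dictionaries matter so much.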
In CSS3, there is a hyphenate-resource property, which allows you to specify a dictionary containing words and their possible hyphenation points. However, I couldn’t find out much about this property: how to use it, when to use it, if it was necessary (will browsers have dictionaries built in?), where to get dictionaries from, what format they should be in, etc.
Hyphenation at the back end
If you really, really want hyphenation, and can’t wait for browsers to catch up, you’ll need to prepare your text by inserting all the possible hyphenation points in the form of soft hyphens (the HTML entity &amp;shy; or the U+00AD Unicode character). These are normally invisible to the reader but tell the browser where to break a word with a hyphen if necessary.
You could do this on the back end and serve up prepared text. I didn’t look into this very deeply, so if you know of a great solution, let me know. I did find these two projects for Python:
PyHyphen is “a wrapper … around the Open Source C library ‘libhyphen’”, used in OpenOffice and Mozilla.
Python-hyphenator is a “pure Python” module.
Both of those require hyphenation dictionaries to provide their rules. Apparently OpenOffice supply some. This page of old dictionaries is as far as I got before hitting broken links, and I’ve no idea if they’re the right kind of thing.
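Whichever library finds the hyphenation points, the preparation step itself is simple. Here’s a rough Python sketch of it, assuming you already have the break positions for each word. The BREAK_POINTS table is a made-up stand-in for whatever a real hyphenation library would return, and the function names are mine. It only handles plain text, so you’d need to keep it away from HTML tags and attributes.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, the character behind the &shy; entity

# Hypothetical lookup of word -> indexes where a break is allowed.
# In practice these would come from a hyphenation library and its dictionary.
BREAK_POINTS = {
    "hyphenation": [2, 6],  # hy|phen|ation
    "browser": [5],         # brows|er
}


def soften_word(word):
    """Insert a soft hyphen at each known break point in `word`."""
    pieces, previous = [], 0
    for point in BREAK_POINTS.get(word.lower(), []):
        pieces.append(word[previous:point])
        previous = point
    pieces.append(word[previous:])
    return SOFT_HYPHEN.join(pieces)


def soften_text(text):
    """Apply soften_word to every run of letters in plain (non-HTML) text."""
    return re.sub(r"[^\W\d_]+", lambda match: soften_word(match.group(0)), text)


def to_entities(text):
    """Swap the raw U+00AD character for the &shy; entity, for HTML output."""
    return text.replace(SOFT_HYPHEN, "&shy;")


print(to_entities(soften_text("Hyphenation in a browser")))
# Hy&shy;phen&shy;ation in a brows&shy;er
```

Words that aren’t in the table pass through untouched, so the worst case is simply text that never hyphenates.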
Hyphenator seems to be the oldest, looks fairly comprehensive and also comes with plenty of hyphenation dictionaries. Here’s their example page. When I tried using it I got a couple of errors thrown, although the example seems OK, so it’s probably just me.
If you use jQuery, the Hyphenator plugin provides a jQuery interface for using Hyphenator itself.