I spent a while this week looking into how best to automatically hyphenate text on websites, to improve Today’s Guardian. I couldn’t find anything recent that summarised the options, so here’s a quick run-down of what I discovered. If you know any more, or any better, please do correct me.
Hyphenation with CSS
Ideally, we’d want to enable hyphenation at the CSS level, on some unprepared text, and have it hyphenate if needed. We’re not there yet.
CSS3 features the hyphens property, but I don’t think it’s supported much yet, even using the -moz-hyphens and -webkit-hyphens properties for Mozilla- and Webkit-based browsers.
Eric Meyer has a test page and, in an ideal world, paragraph #p03 would feature some hyphens at the ends of the lines. This doesn’t work for me in Safari 5.0.5, Chrome 12.0.742.91, or Firefox 4.0, even in a version of the page with the browser-specific properties set.
But Safari on an iPad with iOS 4.2 does hyphenate paragraph #p03 when -webkit-hyphens is applied. And it looks like Firefox Nightly and Webkit Nightly have both gained hyphenation support. I can taste the fut-ure.
However, hyphenation is complicated, and badly hyphenated text is more annoying to read than unhyphenated text. Software needs to know where hyphens can go, and there are algorithms to help it do this. The rules are also, of course, different for every language. Germans, with their lengthy compound words, are particularly interested in hyphenation.
In CSS3, there is a hyphenate-resource property, which allows you to specify a dictionary containing words and their possible hyphenation points. However, I couldn’t find out much about this property: how to use it, when to use it, whether it is necessary (will browsers have dictionaries built in?), where to get dictionaries from, what format they should be in, and so on.
Hyphenation at the back end
If you really, really want hyphenation, and can’t wait for browsers to catch up, you’ll need to prepare your text by inserting all the possible hyphenation points in the form of soft hyphens (the HTML entity &shy; or the U+00AD Unicode character). These are normally invisible to the reader but tell the browser where to break a word with a hyphen if necessary.
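To make the mechanics concrete, here is a minimal Python sketch of inserting soft hyphens by hand. The function name insert_soft_hyphens and the break positions are made up for illustration; a real system would take the positions from a hyphenation dictionary rather than hard-coding them.

```python
SOFT_HYPHEN = "\u00ad"  # the same character that &shy; produces in HTML


def insert_soft_hyphens(word, positions):
    """Return word with a soft hyphen inserted before each index in positions.

    positions are character offsets into the original word. This helper is
    hypothetical and not part of any library.
    """
    pieces = []
    last = 0
    for pos in sorted(positions):
        pieces.append(word[last:pos])
        last = pos
    pieces.append(word[last:])
    return SOFT_HYPHEN.join(pieces)


prepared = insert_soft_hyphens("hyphenation", [2, 6])  # hy|phen|ation

# Two extra characters were added, and stripping them restores the original.
assert len(prepared) == len("hyphenation") + 2
assert prepared.replace(SOFT_HYPHEN, "") == "hyphenation"
```

Because the soft hyphen is an ordinary character in the string, anything that compares or searches the prepared text (search indexes, copy and paste, string lengths) will see it, so it is probably best added as late as possible before output.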
You could do this on the back end and serve up prepared text. I didn’t look into this very deeply, so if you know of a great solution, let me know. I did find these two projects for Python:
PyHyphen is “a wrapper … around the Open Source C library ‘libhyphen’”, used in OpenOffice and Mozilla.
Python-hyphenator is a “pure Python” module.
Both of those require hyphenation dictionaries to provide their rules. Apparently OpenOffice supply some. This page of old dictionaries is as far as I got before hitting broken links, and I’ve no idea if they’re the right kind of thing.
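As a rough illustration of what a back-end preparation step could look like, here is a sketch that works from a tiny hand-written word list. The HYPHENATION_POINTS dictionary and the prepare_text function are hypothetical stand-ins: the two projects above have their own APIs and read proper dictionary files, neither of which this attempts to reproduce.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Hypothetical mini-dictionary: each word written with "-" at its allowed
# break points. A real dictionary would cover a whole language.
HYPHENATION_POINTS = {
    "hyphenation": "hy-phen-ation",
    "dictionary": "dic-tion-ary",
    "browser": "brows-er",
}


def prepare_text(text):
    """Insert soft hyphens into every word found in HYPHENATION_POINTS."""

    def replace(match):
        word = match.group(0)
        pattern = HYPHENATION_POINTS.get(word.lower())
        if pattern is None:
            return word  # unknown word: leave it unhyphenated
        # Walk the pattern, copying letters from the original word so that
        # its capitalisation is preserved.
        letters = iter(word)
        out = []
        for ch in pattern:
            out.append(SOFT_HYPHEN if ch == "-" else next(letters))
        return "".join(out)

    # Only plain ASCII words are matched here, to keep the sketch short.
    return re.sub(r"[A-Za-z]+", replace, text)


prepared = prepare_text("Hyphenation in the browser")
assert prepared.replace(SOFT_HYPHEN, "") == "Hyphenation in the browser"
```

This assumes plain text as input. Run over HTML, it would also touch tag and attribute names, so markup would need to be parsed first and only the text nodes prepared.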
On the front end, the JavaScript library Hyphenator seems to be the oldest option, looks fairly comprehensive and also comes with plenty of hyphenation dictionaries. Here’s their example page. When I tried using it I got a couple of errors thrown, although the example seems OK, so it’s probably just me.
If you use jQuery, the Hyphenator plugin provides a jQuery interface for using Hyphenator itself.
nick s at 10 Jun 2011, 6:53pm.
As Bram Stein points out in this thread, there's a potential halfway house in simply implementing the Knuth & Plass line-breaking algorithm for justification. That is likely to improve text layout even without Frank Liang's hyphenation algorithm on top, which requires dictionaries to function correctly. The TeX algorithms need to be adapted to cope with dynamic layouts, but they're a fantastic starting point for browser-makers or coders. However, given that they're not even implemented in dedicated e-readers, I suspect the wait may test even your patience.
Glyn at 13 Jun 2011, 9:44pm.
Personally, I'd be asking languagehat for the best sites to look at for this. (His real name is Steve.)
Glyn at 13 Jun 2011, 9:45pm.
Personally, I'd be asking languagehat, a regular on your Pepys Diary site, for the best sites to look at for this: www.languagehat.com/
Teoli at 7 Sep 2011, 11:14am.
Just to give some more information.
-moz-hyphens has been supported since Firefox 6. The list of languages that can be hyphenated is growing with each version right now, and the complete and up-to-date list may be found at the end of this page: developer.mozilla.org/…
-webkit-hyphens is supported in Safari 5.1, but I don't know which languages are hyphenated by Safari, nor whether Chrome supports it too.