Hyphenation on the web

I spent a while this week looking into how best to automatically hyphenate text on websites, to improve Today’s Guardian. I couldn’t find anything recent that summarised the options, so here’s a quick run-down of what I discovered. If you know any more, or any better, please do correct me.

Hyphenation with CSS

Ideally, we’d want to enable hyphenation at the CSS level, on some unprepared text, and have it hyphenate if needed. We’re not there yet.

CSS3 features the hyphens property, but I don’t think it’s widely supported yet, even using the vendor-prefixed -moz-hyphens and -webkit-hyphens properties for Mozilla- and WebKit-based browsers.

Eric Meyer has a test page and, in an ideal world, paragraph #p03 would feature some hyphens at the ends of the lines. This doesn’t work for me in Safari 5.0.5, Chrome 12.0.742.91, or Firefox 4.0, even in a version of the page with the browser-specific properties set.

But Safari on an iPad with iOS 4.2 does hyphenate paragraph #p03 when -webkit-hyphens is applied. And it looks like Firefox Nightly and Webkit Nightly gained support for -moz-hyphens and -webkit-hyphens recently.

I can taste the fut-ure.

However, hyphenation is complicated, and badly hyphenated text is more annoying to read than non-hyphenated text. The software needs to know where hyphens are allowed, and there are algorithms to help it work this out. The rules are also, of course, different for every language. Germans, with their lengthy compound words, are particularly interested in hyphenation.

In CSS3, there is a hyphenate-resource property which allows you to specify a dictionary containing words and their possible hyphenation points. However, I couldn’t find out much about this property: how to use it, when to use it, whether it’s necessary (will browsers have dictionaries built in?), where to get dictionaries from, what format they should be in, etc.

Hyphenation at the back end

If you really, really want hyphenation, and can’t wait for browsers to catch up, you’ll need to prepare your text by inserting all the possible hyphenation points in the form of soft hyphens (the HTML entity &shy; or the U+00AD Unicode character). These are normally invisible to the reader but tell the browser where to break a word with a hyphen if necessary.

You could do this on the back end and serve up prepared text. I didn’t look into this very deeply, so if you know of a great solution, let me know. I did find these two projects for Python:

  • PyHyphen is “a wrapper … around the Open Source C library ‘libhyphen’”, used in OpenOffice and Mozilla.

  • Python-hyphenator is a “pure Python” module.

Both of those require hyphenation dictionaries to provide their rules. Apparently OpenOffice supply some. This page of old dictionaries is as far as I got before hitting broken links, and I’ve no idea if they’re the right kind of thing.
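
To make the idea of “prepared text” a bit more concrete, here’s a minimal Python sketch of what the back end would be doing. It doesn’t use either of the libraries above: the BREAKS table is a tiny hand-made stand-in for a real hyphenation dictionary, and the function names are my own invention, so treat it as an illustration of inserting U+00AD rather than a working hyphenator.

```python
import re

SOFT_HYPHEN = "\u00ad"  # the character that &shy; stands for

# Hypothetical, hand-made stand-in for a real hyphenation dictionary:
# each known word maps to the fragments it may be broken into.
BREAKS = {
    "hyphenation": ["hy", "phen", "ation"],
    "dictionary": ["dic", "tio", "nary"],
    "browser": ["brows", "er"],
}

WORD = re.compile(r"[A-Za-z]+")


def soften_word(word):
    """Return the word with soft hyphens at its known break points."""
    fragments = BREAKS.get(word.lower())
    if fragments is None:
        return word  # unknown word: leave it untouched
    # Re-slice the original word so its capitalisation is preserved.
    pieces, start = [], 0
    for fragment in fragments:
        pieces.append(word[start:start + len(fragment)])
        start += len(fragment)
    return SOFT_HYPHEN.join(pieces)


def soften_text(text):
    """Insert soft hyphens into every known word of a plain-text string."""
    return WORD.sub(lambda match: soften_word(match.group(0)), text)


prepared = soften_text("Hyphenation needs a dictionary.")
```

The prepared string looks identical to the original when printed, because the soft hyphens stay invisible until a browser decides to break a line at one of them. A real version would swap the BREAKS lookup for a call into a proper hyphenation library and its dictionaries.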

Hyphenation with JavaScript

Alternatively, you could serve up standard text and process it at the front end with JavaScript. There are a couple of ways to do this.

Hyphenator seems to be the oldest, looks fairly comprehensive and also comes with plenty of hyphenation dictionaries. Here’s their example page. When I tried using it I got a couple of errors thrown, although the example seems OK, so it’s probably just me.

If you use jQuery, the Hyphenator plugin provides a jQuery interface for using Hyphenator itself.

The second distinct JavaScript option is Hypher, which its author claims is faster than Hyphenator. You can use the dictionaries supplied by Hyphenator, and it will return an array of word fragments that you can join with soft hyphens. It gets more complicated to use on text containing HTML, and I haven’t got round to trying that yet.
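
The HTML complication is that you only want to touch the text between tags, never the tag names, attribute values, or the contents of script and style elements. I haven’t tried this with Hypher, so here’s a rough sketch of the idea in Python instead, using the standard library’s html.parser. The hyphenate_word function is a hypothetical stand-in for whatever hyphenator you use (it only knows one word), and soften_html is a name I made up.

```python
import re
from html.parser import HTMLParser

SOFT_HYPHEN = "\u00ad"
WORD = re.compile(r"[A-Za-z]+")


def hyphenate_word(word):
    """Hypothetical stand-in for a real hyphenator: returns word fragments."""
    known = {"hyphenation": ["hy", "phen", "ation"]}
    return known.get(word, [word])


class SoftHyphenInserter(HTMLParser):
    """Re-emits the HTML it is fed, adding soft hyphens to text only."""

    def __init__(self):
        # Keep entities such as &amp; as they are instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # never alter code or stylesheets
            return
        self.out.append(WORD.sub(
            lambda m: SOFT_HYPHEN.join(hyphenate_word(m.group(0))), data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def soften_html(html):
    parser = SoftHyphenInserter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Feeding it a paragraph with class="hyphenation" should leave the attribute alone and only add soft hyphens to the visible word. It ignores doctype declarations and other corners of HTML, so it’s a starting point at best.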

Conclusion

Personally, I’ll probably end up just using CSS3 and waiting for browsers to catch up. The author of the JavaScript Hyphenator has described it as an interim solution until hyphenation can be done with CSS alone. Unless you’re doing something with very narrow columns, and hyphenation will make a huge difference, I’d wait for CSS3. But then I’m very patient.

Comments

  • As Bram Stein points out in this thread--

    typophile.com/node/712…

    --there's a potential halfway house in simply implementing the Knuth & Plass line-breaking algorithm for justification, which is likely to improve text layout even without Frank Liang's hyphenation algorithm on top, which requires dictionaries to function correctly. The TeX algorithms need to be adapted to cope with dynamic layouts, but they're a fantastic starting point for browser-makers or coders; however, given that they're not even implemented in dedicated e-readers, I suspect the wait may test even your patience.

  • Personally, I'd be asking languagehat for the best sites to look at this. (His real name is Steve.)

    Glyn

  • Personally, I'd be asking languagehat, the regular on your Pepys Diary site, for the best sites to look at this: www.languagehat.com/

  • Just to give some more information.
    -moz-hyphens has been supported since Firefox 6. The list of languages that can be hyphenated is growing with each version right now, and the complete and up-to-date list may be found at the end of this page: developer.mozilla.org/…

    -webkit-hyphens is supported in Safari 5.1, but I don't know which languages are hyphenated by Safari. Nor do I know if Chrome supports it too.

