Writing

Parsing a Wikipedia page’s content with Python

A while back I was asking on Twitter and Stack Overflow about how to parse a Wikipedia page’s content using Python. It seemed harder than I expected, given the number of Wikimedia-related tools available. Here’s what I ended up doing.

What I wanted to do:

  • Fetch the content of a particular Wikipedia page.
  • Tweak that content (e.g., hide certain elements).
  • Save the resulting HTML.

Given MediaWiki has an API, I initially thought the best thing would be to grab structured content using that, remove the elements I didn’t want, then render the rest into nice, clean HTML. This seemed more robust than scraping a rendered page’s HTML, parsing it, removing bits, then saving the remainder. Scraping always feels like a last resort. And there was an API!

But MediaWiki content is much more complicated than I first thought and, following the discussion on my Stack Overflow question, it seemed like turning Wikipedia’s raw wikitext into HTML was going to be more trouble than it was worth.

A small step up from scraping standard Wikipedia pages would be to omit all the stuff surrounding the content, which can be done by appending ?action=render to the URL, e.g. https://en.wikipedia.org/wiki/Samuel_Pepys?action=render. Then it would be a case of parsing the HTML, ensuring it’s sane, and stripping out anything I didn’t want.
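Building one of these render URLs by hand is straightforward. Here's a minimal sketch using only the standard library; `render_url` is a hypothetical helper, not part of my actual code (which lets requests add the query string via its `params` argument). The spaces-to-underscores step assumes Wikipedia's usual title convention.

```python
from urllib.parse import quote, urlencode


def render_url(page_name, base='https://en.wikipedia.org/wiki/'):
    """Hypothetical helper: build the ?action=render URL for a page title."""
    # Wikipedia page URLs use underscores in place of spaces.
    title = page_name.replace(' ', '_')
    # Percent-encode anything unusual, but leave slashes, commas and
    # parentheses readable, as they commonly appear in titles.
    path = quote(title, safe="/,()")
    return base + path + '?' + urlencode({'action': 'render'})


print(render_url('Samuel Pepys'))
```

Fetching that URL returns just the article's HTML, without Wikipedia's navigation, sidebars and footer.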

The resulting Python script (on GitHub, tests) is part of my Pepys’ Diary code, in Django, but is fairly standalone.

The process is:

  1. Fetch the HTML page using requests.

  2. Use bleach to ensure the HTML is valid and, whitelisting only the HTML tags and attributes we want, strip out unwanted elements.

  3. Use BeautifulSoup to further strip out HTML elements based on their CSS class names, and to add extra classes to elements with certain existing classes.

  4. Return the new, improved HTML.
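To make the whitelisting in step 2 a bit more concrete, here's a rough standard-library stand-in for what bleach does with `strip=True`: allowed tags and attributes are written back out, and any other tags are dropped while their text is kept. This is only an illustration, not bleach. The `WhitelistFilter` class, the `whitelist` function and the tiny tag lists are hypothetical, and unlike bleach this doesn't repair unclosed or mis-nested tags, or check link protocols such as `javascript:`.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, much shorter whitelists than the ones in the real code.
ALLOWED_TAGS = {'a', 'b', 'i', 'p', 'span'}
ALLOWED_ATTRIBUTES = {'*': {'class', 'id'}, 'a': {'href', 'title'}}


class WhitelistFilter(HTMLParser):
    """Re-emits allowed tags and attributes; drops other tags but keeps their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return
        allowed = ALLOWED_ATTRIBUTES.get('*', set()) | ALLOWED_ATTRIBUTES.get(tag, set())
        kept = ''.join(
            ' %s="%s"' % (name, escape(value or '', quote=True))
            for name, value in attrs
            if name in allowed
        )
        self.parts.append('<%s%s>' % (tag, kept))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        # Text is always kept, re-escaped so it stays valid HTML.
        self.parts.append(escape(data, quote=False))


def whitelist(html):
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


print(whitelist('<p style="color:red">Hi <form>there</form></p>'))
```

The `<form>` tags and the `style` attribute disappear, but the word "there" survives, which matches the "doesn't remove the contents of any disallowed tags" behaviour described in the code's docstring.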

It seems to work alright, resulting in some decent-looking copies of Wikipedia pages.

For completeness, here’s the code at the time of writing, but the GitHub version may be newer:

from bs4 import BeautifulSoup
import bleach
import requests


class WikipediaFetcher(object):

    def fetch(self, page_name):
        """
        Passed a Wikipedia page's URL fragment, like
        'Edward_Montagu,_1st_Earl_of_Sandwich', this will fetch the page's
        main contents, tidy the HTML, strip out any elements we don't want
        and return the final HTML string.

        Returns a dict with two elements:
            'success' is either True or, if we couldn't fetch the page, False.
            'content' is the HTML if success==True, or else an error message.
        """
        result = self._get_html(page_name)

        if result['success']:
            result['content'] = self._tidy_html(result['content'])

        return result

    def _get_html(self, page_name):
        """
        Passed the name of a Wikipedia page (eg, 'Samuel_Pepys'), it fetches
        the HTML content (not the entire HTML page) and returns it.

        Returns a dict with two elements:
            'success' is either True or, if we couldn't fetch the page, False.
            'content' is the HTML if success==True, or else an error message.
        """
        url = 'https://en.wikipedia.org/wiki/%s' % page_name

        try:
            response = requests.get(url, params={'action': 'render'}, timeout=5)
            # Raises an HTTPError for 4xx or 5xx responses:
            response.raise_for_status()
        except requests.exceptions.ConnectionError:
            error_message = "Can't connect to domain."
        except requests.exceptions.Timeout:
            error_message = "Connection timed out."
        except requests.exceptions.TooManyRedirects:
            error_message = "Too many redirects."
        except requests.exceptions.HTTPError as e:
            error_message = "HTTP Error: %s" % e.response.status_code
        except requests.exceptions.RequestException:
            # Anything else requests can raise, e.g. an invalid URL.
            error_message = "Something unusual went wrong."
        else:
            return {'success': True, 'content': response.text}

        return {'success': False, 'content': error_message}

    def _tidy_html(self, html):
        """
        Passed the raw Wikipedia HTML, this returns valid HTML, with all
        disallowed elements stripped out.
        """
        html = self._bleach_html(html)
        html = self._strip_html(html)
        return html

    def _bleach_html(self, html):
        """
        Ensures we have valid HTML; no unclosed or mis-nested tags.
        Removes any tags and attributes we don't want to let through.
        Doesn't remove the contents of any disallowed tags.

        Pass it an HTML string, it'll return the bleached HTML string.
        """

        # Most elements, but no forms or audio/video.
        allowed_tags = [
            'a', 'abbr', 'acronym', 'address', 'area', 'article',
            'b', 'blockquote', 'br',
            'caption', 'cite', 'code', 'col', 'colgroup',
            'dd', 'del', 'dfn', 'div', 'dl', 'dt',
            'em',
            'figcaption', 'figure', 'footer',
            'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr',
            'i', 'img', 'ins',
            'kbd',
            'li',
            'map',
            'nav',
            'ol',
            'p', 'pre',
            'q',
            's', 'samp', 'section', 'small', 'span', 'strong', 'sub', 'sup',
            'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'time', 'tr',
            'ul',
            'var',
        ]

        # Only these attributes are kept on the allowed tags; all others are removed.
        allowed_attributes = {
            '*':        ['class', 'id'],
            'a':        ['href', 'title'],
            'abbr':     ['title'],
            'acronym':  ['title'],
            'img':      ['alt', 'src', 'srcset'],
            # Ugh. Don't know why this page doesn't use .tright like others
            # http://127.0.0.1:8000/encyclopedia/5040/
            'table':    ['align'],
            'td':       ['colspan', 'rowspan'],
            'th':       ['colspan', 'rowspan', 'scope'],
        }

        return bleach.clean(html, tags=allowed_tags,
                            attributes=allowed_attributes, strip=True)

    def _strip_html(self, html):
        """
        Takes out any tags, and their contents, that we don't want at all.
        And adds custom classes to existing tags (so we can apply CSS styles
        without having to multiply our CSS).

        Pass it an HTML string, it returns the stripped HTML string.
        """

        # CSS selectors. Strip these and their contents.
        selectors = [
            'div.hatnote',
            'div.navbar.mini', # Will also match div.mini.navbar
            # Bottom of https://en.wikipedia.org/wiki/Charles_II_of_England :
            'div.topicon',
            'a.mw-headline-anchor',
        ]

        # Strip any element that has one of these classes.
        classes = [
            # "This article may be expanded with text translated from..."
            # https://en.wikipedia.org/wiki/Afonso_VI_of_Portugal
            'ambox-notice',
            'magnify',
            # eg audio on https://en.wikipedia.org/wiki/Bagpipes
            'mediaContainer',
            'navbox',
            'noprint',
        ]

        # Any element that has a class matching a key will have the classes
        # in the value added.
        add_classes = {
            # Give these tables standard Bootstrap styles.
            'infobox':   ['table', 'table-bordered'],
            'ambox':     ['table', 'table-bordered'],
            'wikitable': ['table', 'table-bordered'],
        } 

        # Name the parser explicitly, to avoid BeautifulSoup's warning and to
        # get the same results whichever parsers are installed.
        soup = BeautifulSoup(html, 'html.parser')

        for selector in selectors:
            for tag in soup.select(selector):
                tag.decompose()

        for clss in classes:
            for tag in soup.find_all(class_=clss):
                tag.decompose()

        for clss, new_classes in add_classes.items():
            for tag in soup.find_all(class_=clss):
                tag['class'] = tag.get('class', []) + new_classes

        # Some parsers (e.g. lxml, html5lib) wrap the fragment in
        # <html><body></body></html> tags; if so, only keep what's in <body>.
        if soup.body:
            soup = soup.body

        # Put the content back into a string.
        html = ''.join(str(tag) for tag in soup.contents)

        return html
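One small wrinkle in the `add_classes` loop, as far as I can tell: a table that has both `infobox` and `wikitable` classes is matched twice, so `table` and `table-bordered` get added twice. That's harmless for CSS, but untidy. Here's a sketch of a hypothetical `merge_classes` helper that skips duplicates. It could replace the list concatenation with something like `for tag in soup.find_all(class_=True): tag['class'] = merge_classes(tag['class'], add_classes)`, though that's an assumption about how you'd wire it in, not what the code above does.

```python
def merge_classes(existing, add_classes):
    """Hypothetical helper: return `existing` plus any classes that
    `add_classes` maps to from them, keeping order and skipping duplicates."""
    result = list(existing)
    for cls in existing:
        for new_cls in add_classes.get(cls, []):
            if new_cls not in result:
                result.append(new_cls)
    return result


# The same mapping as in _strip_html().
ADD_CLASSES = {
    'infobox':   ['table', 'table-bordered'],
    'ambox':     ['table', 'table-bordered'],
    'wikitable': ['table', 'table-bordered'],
}

print(merge_classes(['infobox', 'wikitable'], ADD_CLASSES))
```

Because it works on plain lists of class names, this part can be tested without fetching or parsing any HTML.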

In Web Development on 25 March 2015. Permalink

Practical Television, February 1952

A while back I bought a copy of Practical Television from February 1952. It’s a fascinating look at a time when TVs were new technology and there were societies of people interested in what we might now call “hacking” with their TVs.

Cover of Practical Television, February 1952

There are articles full of circuit diagrams like “From VCR97 to Magnetic-2; Converting a Receiver for Standard Tubes” and “Modifying the AN/APR-4”. There are also lengthy tips on getting the most from your Murphy Receiver and how to make a slotted indoor aerial for “fringe area reception without an elaborate outside array.” (There’s a lot of mention of “fringe areas” as presumably signal coverage was patchy.) It also has plenty of adverts for components and tools — valves, face plates, scanning coils, test meters — but none at all for televisions themselves.

For some context, one report mentions that in nearly three years the number of television licences (required to own a TV set) had increased from 285,500 to 1,113,900, which is still less than 5% of today’s number. This was around 16 months before the coronation of Queen Elizabeth II which, in Britain, is seen as the moment TV-watching hit the big time. There was only a single TV channel to watch, the BBC, and it only broadcast for a few hours each day.

I’ve uploaded a scan of the magazine to the Internet Archive, and have also pulled out a few of the most interesting snippets below…

From “Televiews”, the introductory editorial page:

Some of the Sunday newspapers have been criticising a BBC producer because he declines to devote programme time to a talk on breathing as an aid to health. Some time ago this same producer was in trouble with the doctors because of his broadcasts on slimming for women. Perhaps this has made him cautious. We, however, support his decision for we do not believe that a talk on correct breathing would have been of interest since it is taught in every school.

From the “Telenews” section:

Interference from Lights

According to a Midlands electrical contractor, Mr. S. Dagnall, interference on television screens over the Christmas period may have been caused by the on-and-off flashing of fairy-lights on Christmas trees.

Suppressors could be fitted, but only at a cost of between 15s. to £1, and as this was not considered worth while, Mr. Dagnall refused to sell the lights.

Cinema’s Ally

Mr. J. Goodlatte, chief of the ABC cinema circuit, believes that the effect of television on box office takings in this country is only slight.

… Mr. Goodlatte praises the everyday housewife and places her second only to the films themselves as the cinema’s greatest ally. After a hard day at home it is she who most wants to go out for her entertainment.

Television Tape Recordings

It was recently announced in Hollywood, California, that a method of recording television images on a magnetic tape is expected to be ready for commercial use in a few months.

This system is the result of two years of research, financed by Bing Crosby.

Viewing in the Clouds

A British firm are planning a receiver for installing in airliners, to provide entertainment for passengers.

The biggest snag would be interference from the engines.

Appeal from the BBC

The BBC recently appealed for better behaviour from patrons at televised outside events where candid cameras take close-ups of crowds or where commentators hold interviews with people surging in the background.

Some viewers have complained about hand-waving or face-pulling from people who should know that they can be seen at home.

From “Underneath the Dipole”, subtitled “Television Pick-ups and Reflections”:

What a magnificent array of talent took part in the TV Christmas Party. Norman Wisdom, recently returned from America, … proved that his particular line of comedy is well suited to TV. When he was in America, he took a trip to Hollywood for a “look around,” and amongst other things was shown a television set by Stan Laurel which gave a choice of twelve programmes — most of them absolutely first-rate in entertainment value.

From “Here and There”:

Sport

An announcement is expected shortly from The Television Sports Advisory Committee concerning efforts that are being made to reach some agreement over the televising of sporting events.

The committee is expected to favour the broadcasting of all forms of sport and, if necessary, to televise only parts of matches, events or meetings should promoters not favour the transmission of whole programmes — that is, the first or second half only of a football match or one period of an ice-hockey game.

Some sporting associations have had no objection in the past to the televising of complete relays — the Lawn Tennis Association and the Rugby Union, for instance — but most promoting bodies feel that though the idea may have advertising qualities, the relaying of a complete afternoon’s sport would affect attendances and cash losses would be the inevitable result.

If only they could see how much sport broadcasting deals bring in today.

And finally:

“American Menace”

In a cable from California, U.S., where he is staying, “Wee Georgie Wood,” considers television in the United States to be a “menace to the country.”

That is the item in full.

In Television on 15 February 2015. Permalink

Temporary archiving

Perma.cc is (another) way of archiving web pages. This time in an “authoritative”-sounding manner. From the front page:

Perma.cc helps scholars, journals and courts create permanent links to the online sources cited in their work.

Perma.cc is powered by libraries because we’re in the forever business. We’re already looking after printed materials. It’s time we did the same for links.

Links become permanent when they are “vested” by someone affiliated with a vesting organization, such as a journal or court.

Which all sounds good. The more archiving the better if you ask me, given how ephemeral everything is around here.

However, if we leave the big text of the home page and visit the small print of the Terms of service we find these clauses:

9. Termination of Service

We reserve the right at any time to modify, suspend or discontinue the Site or Service, in whole or in part, without notice, and shall have no liability for doing so.

10. Disclaimer of Warranties; Limitations of Liability and Remedies

(a) WHILE WE ASPIRE TO PRESERVE LINKS AND ARCHIVAL COPIES OF CONTENT STORED AT THE DIRECTION OF USERS, WE MAKE NO REPRESENTATIONS, WARRANTIES, OR UNDERTAKINGS AS TO PERMANENCE OR THE DURATION OF PRESERVATION. AS INDICATED ELSEWHERE IN THESE TERMS OF USE, WE RESERVE THE RIGHT TO DELETE OR DISABLE ACCESS TO USER SUBMITTED CONTENT, AND TO TERMINATE ALL OR PART OF THE SERVICE AT ANY TIME. YOU ACKNOWLEDGE THAT STORED LINKS MAY FAIL TO WORK.

Brilliant. An archiving service that explicitly states: it might not keep things forever; it might delete some things; or, even, it might close without notice.

This seems to contradict the front page blurb selling the ability to “create permanent links” and boasting “we’re in the forever business”.

I assume those running the service have the very best of intentions and fully intend (or, at least, hope) to be as permanent and “forever” as humanly possible. But if so, why the ugly bundle of caveats hidden behind-the-scenes? They effectively turn the front page puffery into outright lies.

It’s easy to say you’ll archive copies of anything, forever. It’s much harder, and more interesting, to set up the technical, organisational, financial and legal structures to actually do it. Which, as far as we can tell, are the important things that seem to be missing here.

In Misc on 1 February 2015. Permalink

Reading about dancing about architecture

Ages ago I asked on Twitter if anyone could recommend music blogs to read, because I felt a bit out of touch. A few people suggested sites and I meant to summarise the advice. And here we all are.

  • Pitchfork — I’ve heard Pitchfork referred to off-handedly as if it’s too popular to be credible and so, being afflicted with terrible reverse snobbery, I didn’t even read this one. Although I do find their Spotify app handy for ideas of new albums to try.

  • The Quietus — I have no frame of reference for these things. Is this like a less popular Pitchfork? Should I like it? What does it like? I ended up unsubscribing from it as the RSS feed only has brief summaries of each post.

  • Wondering Sound — Again, I’m not quite sure how this differs from the previous two but I quite like it. The RSS feed contains full articles, the design is nice, and I even found myself enjoying some of the writing, which is more than I hoped for. I’m still subscribed.

  • Popjustice — I used to subscribe to an RSS feed from here which was quite fun but I unsubbed because a lot of the posts assumed too much existing knowledge. It was like overhearing someone else’s in-jokes.

  • No Rock and Roll Fun — I’m still subscribed to this one. Fun, brief, a good old blog like they used to be.

(Apologies to the people who recommended these; I can’t remember who suggested which sites now. But thanks.)


It’s tricky though, this. These sites churn out loads of posts and I wasn’t interested enough to click on most of them to read further. I only want to read the posts about musicians I like, or ones I find interesting, or ones that I don’t know yet but might like.

But there’s no way of doing this except by going through everything. Even with a nice RSS reader, and only subscribed to two of those sites above, this feels more of a tedious task than if I was skimming through a paper magazine. My heart sinks as I see 35 new posts and I have to decide which to try reading and which to mark as read. It’s more of a chore than flipping paper pages until something catches my eye.

I think, also, I was hoping to recapture something from when I last regularly read any music press. But that’s like wishing I could grow back the hair I had at the time, and just as unlikely.

When I was 18 and reading Melody Maker in the library my horizons were narrow (or short? or close?). I loosely felt like I was in some kind of club. It was for me. The music news felt precious and rare. I could get interested in, say, Suede’s apparently amazing debut singles even though I’d never even heard them. I knew too much about the Scene That Celebrates Itself. I liked Mr Angry. I’d read overly long interviews with scruffy guitar bands that went nowhere. I would read reviews of albums and singles by people I hadn’t heard of just in case they sounded interesting.

Decades later my horizons are wider (longer? further?). I’m interested in more types of music, but less passionately. There are many, many more ways of reading about music. I’m not part of a particular scene or club. I can, if I want, easily submerge myself in more music news and reviews than I could ever read, and yet none of it feels like it’s just for me now. So maybe it doesn’t matter if I feel out of touch, and no longer know all of the backstories. So long as I can, somehow, keep finding new music, I can just listen to it, rather than read about it.

In Music on 30 January 2015. Permalink

Tech’s tunnel vision

A couple of days ago I linked to this post by Tim Maly which is full of interesting thoughts sparked by attending the XOXO conference. I wrote then: “Makes me want an at least partly explicit socialist / social democratic tech conference.”

Yesterday a friend asked me what I meant by that, so I had to spend longer thinking about it than the few seconds it had taken me to write the sentence.

While I frequently roll my eyes at the more extreme examples of the tech industry’s Randian selfishness and California Ideology thoughtleading, the homogeneity of the mainstream ideas seems just as alarming. Free market capitalism, to one degree or another, is the default setting and it’s hard to imagine alternatives.

In part, these are the times we’re living in (in “the West” at least). In the UK the three main political parties offer variations on a theme rather than drastic alternatives. I’m in my 40s and have no adult memory of a time before the 1980s, and haven’t lived in any countries that offer an even slightly different society (e.g., maybe the Nordic model). I find it really hard to imagine a different kind of economy and society; my brain has, by now, been wired to accept free-market capitalism, with slight variations on the amount of social safety-net, as the default and only possibility.

The tech industry takes this tunnel vision even further, with its standard economic behaviours being more extreme and showing less variety than in business as a whole. Not all tech companies are funded by venture capital, growing as rapidly as possible, concentrating on growth over profit, and caring little about wider society, but enough of them are that this is the default. You can do things differently, but it almost seems peculiar. You need a very good reason.

None of that should be a revelation of course, but I’m trying to explain why thinking of different ways of doing things is so difficult. The default stories are so strong, so ingrained, that even imagining viable alternatives is hard. But there must be alternatives; there always are. And the tech industry loves alternatives! Let’s disrupt!

Trying to imagine a tech conference that would embody an alternative viewpoint — a more “socialist / social democratic” alternative — actually seems like a good way in. I don’t have to imagine a whole new society and economic model, but only try to imagine what kinds of topics might be talked about at a conference along those lines. Some topics I quickly wrote down:

  • Different models for start-ups. Co-operatives. Employee ownership. Normal, slowly-growing, profit-making businesses.

  • Ruricomp — technology for people who don’t live in cities.

  • Technology for people who don’t live in the first world. (There are a lot of them and they have a lot of technology, but most of us know nothing about it.)

  • What governments can do, should do, and are doing.

  • Websites that make the whole Web better. (To quote Tom Coates (PDF).)

  • New services that work fine on technology that’s been around for years.

  • Innovative ideas for improving genuinely public transportation (rather than private transportation or very expensive “public” transportation).

  • The benefits of unions, and how to start or join one.

  • Services designed for people who have little money.

  • Services designed for people who aren’t fully able.

  • Models for keeping services running over the long-term. (What happens when your company closes, or to your personal projects when you die?)

  • The state of technology and digital services in the NHS.

  • How to treat low-paid workers as humans rather than interchangeable meat robots.

This is a very mixed bag. You may be able to come up with more and better ideas. And I suspect a conference that included some or all of these topics could be utterly unbearable and full of tedious bleating people like me wanting to make the world a better place. I make me sick.

But I’ve realised that I spend a lot of time getting annoyed about things in this industry that annoy me, and I’m worried I increasingly define myself by the things that I don’t believe in. Not all of tech is terrible. There are plenty of decent people doing worthwhile things, whether traditionally “worthy” or not. I need to start noticing the things and ideas I do believe in, that I want to emulate, help or achieve.

I’m still fascinated by new technology and ideas and problems but the frame within which those are set is important. The default worldview of the tech industry feels constraining rather than liberating, and restricts the kinds of technology, ideas and problems that we think about. There are alternative viewpoints, even if they’re hard to imagine.

In Misc on 23 September 2014. Permalink

Booking reference

After the previous post’s long sequence of service design failures, here’s one little thing that should be easy to get right, and which causes incredible frustration at exactly the wrong moment.

When buying a train ticket online in the UK, in advance, you can choose to collect the ticket from a machine at the train station. You receive an email containing all the relevant details.

When you get to the station the machine offers this screen:

Photo of the screen

It asks:

Please enter your booking reference:

You check the email you’ve kept handy on your phone and look for the “booking reference”:

Email screenshot 1

There it is, near the bottom:

Your booking reference is 2106902679.

You type it into the machine, although the input field is the wrong length, which is odd.

The machine rejects your booking reference.

You try again, because you must, but no joy.

In desperation you scroll through the rest of the email because you don’t know what else to do and your train leaves in a few minutes:

Email screenshot 2

Hmm. What’s this?

Collection reference: R2T4KB9C

But the screen wants the “booking reference”, which was the first number.

It’s worth a try…

Ah, this “collection reference” is the right length…

And, yes, the machine starts printing your tickets!


So, what the collection machine calls a “booking reference”, the email calls a “collection reference” or (in the subject line) a “booking confirmation”. And the email contains a “booking reference” number which appears to have no purpose, even though it’s the first reference in the body of the email.

When starting a project, and throughout its life, it’s important to ensure everyone involved calls things by the same names. I’m guessing that in this case different teams, or even companies, built the software that sends the emails and the software that runs the collection machines. Due to a mismatch of internal vocabularies the single piece of information the user needs at the most crucial part of the process has been muddled.

It’s probably a nightmare of laborious and expensive change request processes to fix this simple piece of wording, which has already been wrong for months, if not years. But that’s another issue.

In Misc on 12 August 2014. Permalink

Visit your nearest branch

Last week I spent a frustrating morning trying to open a business bank account. I assumed banks would make it as easy as possible and so I was surprised how hard it was. I’m easily put off by small but easily-avoidable annoyances and I found plenty of those.

I had no idea how to choose a bank and bank account. I don’t have any obscure requirements: let people pay me; let me transfer money; a debit/ATM card; online banking. Charges for business current accounts vary, but not enough to make a wild difference. I had few good reasons to choose one bank over another.

Barclays

A couple of friends recommended Barclays, mainly because it can integrate with FreeAgent without going through a third-party service. It sounded pretty broken — one friend registered for both online and mobile banking, and with “data services”, but still needed to separately request a “phone banking PIN” to enable a feed into FreeAgent. But, once that was navigated, it apparently worked well.

Barclays logo

I wasn’t wild about Barclays. I don’t like their shade of blue, I don’t like their now insipid eagle/shield logo, and I still associate them with closing my first ever bank account in the 1980s over their support of apartheid in South Africa. But these days it’s just another bank, I guess.

You can start an application online but, for a limited company, you will subsequently have to book an appointment in a branch anyway. So, given the website said this:

Part of the Barclays site saying you can visit your local branch to apply for an account

at 9am I was in my local branch, Moorgate. “If they can’t open business accounts in the City of London, there’s no hope!” I thought to myself. They can’t, there wasn’t.

A polite man insisted that I would have to call a central Barclays phone number to book an appointment at the branch in which I was standing. I couldn’t speak to anyone at the branch, not even to arrange when to come in later. But it says “visit your nearest branch”! Why does it say that?! I was amazed and annoyed and walked out.

I still don’t understand. Even if I could have made an appointment in the branch, to see someone in that same branch later, why tell me to visit the branch to do so? Only tell me to visit the branch to apply for an account if I can apply for an account when I get there!

NatWest

In the 1980s, when I closed my Barclays account, I walked up the high street into NatWest and opened one there. So I’m slightly more fond of it than most anonymous international banking corporations. Plus, out of all the banks, I like their logo the most:

NatWest logo

Despite their current credit card advertising campaign which emphasises simplicity, fairness and transparency, finding their business banking charges wasn’t simple or transparent, being buried in a PDF and not listed in any navigation. But still, two years’ free banking was at least fair.

(Incidentally, that URL for NatWest’s “Start-up package” is 780 characters long. Which is an improvement on the 1,126 it had when I looked last week.)

NatWest screenshot

If I read this correctly, applying online means waiting five working days for a further discussion. Or I could call them or visit a branch. Having been bitten by Barclays’ branch-visiting, I called the number helpfully displayed on the right.

But, ha ha, despite being displayed in a box headed “Apply now” on a business banking page, that number is not the number to call if you want to apply for a business bank account. The woman who answered gave me the number to call (she couldn’t transfer me).

I pressed on, and called the new number. Before we could get started on the application process, the next woman had to read some standard stuff out to me in the “I am reading this” voice. The first of these was that in 2016 my account would move to a new bank, Williams & Glyn. Oh. I’ve just made the arbitrary decision to bank with NatWest and now you’re telling me I can’t. Hmm.

Unable to make snap decisions on the phone, I thanked her and hung up. Next!

Metro Bank

A couple of friends said they’d walked into branches of Metro Bank and opened up accounts on the spot. This sounded good and in line with Metro Bank’s aims of shaking up the system, gently.

I routed round an initial hiccup — the link Google displayed for Metro’s business banking was a 404 — and checked that I should be able to open a business account by walking into a branch. Yes! “Please visit your local store to apply for this account.”

I walked to the Cheapside branch around 10am and entered their small, quiet replica of a Reno casino. Unfortunately, a helpful woman told me that the “CSR” was busy at the moment and I’d have to wait for 90 minutes. Oh. This is the downside of inviting people to just walk in: you need the capacity to handle them. There seemed to be more staff than customers but they must have been the wrong kind of “CSR”s. So I left.

The day was far too hot already and I was getting nowhere, with something I thought would be simple. I couldn’t believe it was so difficult.

Falling Down starring Victor Meldrew

Cater Allen

A couple of friends said they were with the Cater Allen private bank, but its business bank account page didn’t really encourage me to simply open an account. There was a lot of text, talk of “an Application Pack for your client”, a lot of documents to download, and a ten-day processing time.

I moved on. I really wanted a bank to sell itself to me, simply, and make it as easy as possible for me to become their customer.

Triodos

Someone suggested Triodos, who I’d forgotten all about, despite having had a savings account with them for a while. They’re an ethical bank, who do good things, and have a business current account. You don’t get a credit or debit card but I was past caring, and figured I could get a credit card elsewhere.

To apply for an account you have to call to make sure your business is suitable, which is fair enough. I called, had a chat (“Well, I’m not planning on making websites for arms dealers, ha ha!”) and was emailed a link to the application form.

For some reason this link, at the end of a long email, wasn’t clickable. But, undeterred, I worked out how to get to the form. This is, however, as far as I got. The form wanted to know “the main activities of your Organisation” and “your main sources of income”. The first field needed at least 50 words (or, as the error message put it, 200 characters), and the second needed at least 25 words (or 150 characters). It’s just not that complicated. I design and make websites, people pay me money to do so. At this point, and given the limitations of the account, I was beyond making up nonsense to please a form’s validation algorithms and closed the tab.

Santander

Belatedly, I remembered another friend saying Santander had been quick to set up their business account. I didn’t like Santander’s stagey, awkward adverts featuring sportspeople, or its renaming of the combined Abbey National and Bradford & Bingley, but by now I just wanted a bank account.

Ignoring the forest of stock photography showing industrious white people, I quickly found their current account for new businesses, for which I could apply online, with a nice clear list of requirements. The form was simple, with a minimum of onerous questions, and it was soon completed.

Within 24 hours I received an email telling me my new account was open! Easy! The email said:

We’re pleased to let you know that we’ve opened a Santander Business Bank Account for you and you can start using your account straightaway.

and:

Your account is now open and ready for use…

Brilliant!

Except there was no account number or sort code, and the details for accessing online, mobile and telephone banking would be sent in two letters within 7 to 10 days.

So the account is only “open and ready for use” in the sense that it exists as an entry in a database somewhere. And I can only “start using your account” in the sense of… I don’t know. That just doesn’t make any sense.

Done

Still, it’s done. Or soon will be.

A lot of the above is going to seem petty. And, yes, it is. I could easily have ignored many of these little difficulties and applied for an account a few hours sooner than I did. But no one should have to encounter these difficulties.

They’re mostly easy things to get right if you want to make it easy for people to use your service. Putting the correct phone number on the page. Giving people the correct instructions. Making it easy to find information and fill in forms. It’s not just about making good websites; it’s about making the whole company and service work the way someone would expect.

Banks seem to be like electricity suppliers or mobile phone companies. They’re desperate to seem special but for many customers they’re simply interchangeable utilities. But we have to use at least one of them. There’s a huge barrier to new companies entering the market and doing things in new ways. Consequently we have to put up with the above petty silliness, clunky online banking, Verified by Visa, and everything else that could be done so much better if only the services focused on the people who use them. Or want to use them.

In Misc on 30 July 2014.