Journal tags: mac


Simon’s rule

I got a nice email from someone regarding my recent posts about performance on The Session. They said:

I hope this message finds you well. First and foremost, I want to express how impressed I am with the overall performance of https://thesession.org/. It’s a fantastic resource for music enthusiasts like me.

How nice! I responded, thanking them for the kind words.

They sent a follow-up clarification:

Awesome, anyway there was an issue in my message.

The line ‘It’s a fantastic resource for music enthusiasts like me.’ added by chatGPT and I didn’t notice.

I imagine this is what it feels like when you’re on a phone call with someone and towards the end of the call you hear a distinct flushing sound.

I wrote back and told them about Simon’s rule:

I will not publish anything that takes someone else longer to read than it took me to write.

That just feels so rude!

I think that’s a good rule.

Automation

I just described prototype code as code to be thrown away. On that topic…

I’ve been observing how people are programming with large language models and I’ve seen a few trends.

The first thing that just about everyone agrees on is that the code produced by a generative tool is not fit for public consumption. At least not straight away. It definitely needs to be checked and tested. If you enjoy debugging and doing code reviews, this might be right up your street.

The other option is to not use these tools for production code at all. Instead use them for throwaway code. That could be prototyping. But it could also be the code for those annoying admin tasks that you don’t do very often.

Take content migration. Say you need to grab a data dump, do some operations on the data to transform it in some way, and then pipe the results into a new content management system.

That’s almost certainly something you’d want to automate with bespoke code. Once the content migration is done, the code can be thrown away.
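Here’s a rough sketch of the kind of throwaway script I mean. The field names and the CMS endpoint are made up for illustration; the real thing would be shaped by whatever data you’re actually dealing with:

<?php
// Throwaway migration script: read the old data dump, massage it,
// pipe it into the new system. Everything here is hypothetical.
$oldItems = json_decode(file_get_contents('data-dump.json'), true);

foreach ($oldItems as $item) {
    // Transform the old structure into whatever the new system expects.
    $newItem = array(
        'title'     => trim($item['headline']),
        'body'      => $item['content'],
        'published' => date('c', strtotime($item['date'])),
    );

    // Send it to the new content management system’s (made-up) API.
    $ch = curl_init('https://new-cms.example.com/api/entries');
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($newItem),
        CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
        CURLOPT_RETURNTRANSFER => true,
    ));
    curl_exec($ch);
    curl_close($ch);
}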

Read Matt’s account of coding up his Braggoscope. The code needed to spider a thousand web pages, extract data from those pages, find similarities, and output the newly-structured data in a different format.

I’ve noticed that these are just the kind of tasks that large language models are pretty good at. In effect you’re training the tool on your own very specific data and getting it to do your drudge work for you.

To me, it feels right that the usefulness happens on your own machine. You don’t put the machine-generated code in front of other humans.

Permission

Back when the web was young, it wasn’t yet clear what the rules were. Like, could you really just link to something without asking permission?

Then came some legal rulings to establish that, yes, on the web you can just link to anything without checking if it’s okay first.

What about search engines and directories? Technically they’re rifling through all the stuff we publish and reposting snippets of it. Is that okay?

Again, through some legal precedents—but mostly common agreement—everyone decided that on balance it was fine. After all, those snippets they publish are helping your site get traffic.

In short order, search came to rule the web. And Google came to rule search.

The mutually beneficial arrangement persisted uneasily. Despite Google’s search results pages getting worse and worse in recent years, the company’s huge market share of search means you generally want to be in their good books.

Google’s business model relies on us publishing web pages so that they can put ads around the search results linking to that content, and we rely on Google to send people to our websites by responding smartly to search queries.

That has now changed. Instead of responding to search queries by linking to the web pages we’ve made, Google is instead generating dodgy summaries rife with hallucina… lies (a psychic hotline, basically).

Google still benefits from us publishing web pages. We no longer benefit from Google slurping up those web pages.

With AI, tech has broken the web’s social contract:

Google has steadily been manoeuvring their search engine results to more and more replace the pages in the results.

As Chris puts it:

Me, I just think it’s fuckin’ rude.

Google is a portal to the web. Google is an amazing tool for finding relevant websites to go to. That was useful when it was made, and it’s nothing but grown in usefulness. Google should be encouraging and fighting for the open web. But now they’re like, actually we’re just going to suck up your website, put it in a blender with all other websites, and spit out word smoothies for people instead of sending them to your website. Instead.

Ben proposes an update to robots.txt that would allow us to specify licensing information:

Robots.txt needs an update for the 2020s. Instead of just saying what content can be indexed, it should also grant rights.

Like crawl my site only to provide search results not train your LLM.

It’s a solid proposal. But Google has absolutely no incentive to implement it. They hold all the power.

Or do they?

There is still the nuclear option in robots.txt:

User-agent: Googlebot
Disallow: /

That’s what Vasilis is doing:

I have been looking for ways to not allow companies to use my stuff without asking, and so far I couldn’t find any. But since this policy change I realised that there is a simple one: block google’s bots from visiting your website.

The general consensus is that this is nuts. “If you don’t appear in Google’s results, you might as well not be on the web!” is the common cry.

I’m not so sure. At least when it comes to personal websites, search isn’t how people get to your site. They get to your site from RSS, newsletters, links shared on social media or on Slack.

And isn’t it an uncomfortable feeling to think that there’s a third party service that you absolutely must appease? It’s the same kind of justification used by people who are still on Twitter even though it’s now a right-wing transphobic cesspit. “If I’m not on Twitter, I might as well not be on the web!”

The situation with Google reminds me of what Robin said about Twitter:

The speed with which Twitter recedes in your mind will shock you. Like a demon from a folktale, the kind that only gains power when you invite it into your home, the platform melts like mist when that invitation is rescinded.

We can rescind our invitation to Google.

Talking about “web3” and “AI”

When I was hosting the DIBI conference in Edinburgh back in May, I moderated an impromptu panel on AI:

On the whole, it stayed quite grounded and mercifully free of hyperbole. Both speakers were treating the current crop of technologies as tools. Everyone agreed we were on the hype cycle, probably the peak of inflated expectations, looking forward to reaching the plateau of productivity.

Something else that happened at that event was that I met Deborah Dawton from the Design Business Association. She must’ve liked the cut of my jib because she invited me to come and speak at their get-together in Brighton on the topic of “AI, Web3 and design.”

The representative from the DBA who contacted me knew what they were letting themselves in for. They wrote:

I’ve read a few of your posts on the subject and it would be great if you could join us to share your perspectives.

How could I say no?

I’ve published a transcript of the short talk I gave.

Nailspotting

I’m sure you’ve heard the law of the instrument: when all you have is a hammer, everything looks like a nail.

There’s another side to it. If you’re selling hammers, you’ll depict a world full of nails.

Recent hammers include cryptobollocks and virtual reality. It wasn’t enough for blockchains and the metaverse to be potentially useful for some situations; they staked their reputations on being utterly transformative, disrupting absolutely every facet of life.

This kind of hype is a terrible strategy in the long-term. But if you can convince enough people in the short term, you can make a killing on the stock market. In truth, the technology itself is superfluous. It’s the hype that matters. And if the hype is over-inflated enough, you can even get your critics to do your work for you, broadcasting their fears about these supposedly world-changing technologies.

You’d think we’d learn. If an industry cries wolf enough times, surely we’d become less trusting of extraordinary claims. But the tech industry continues to cry wolf—or rather, “hammer!”—at regular intervals.

The latest hammer is machine learning, usually—incorrectly—referred to as Artificial Intelligence. What makes this hype cycle particularly infuriating is that there are genuine use cases. There are some nails for this hammer. They’re just not as plentiful as the breathless hype—both positive and negative—would have you believe.

When I was hosting the DiBi conference last week, there was a little section on generative “AI” tools. Matt Garbutt covered the visual side, demoing tools like Midjourney. Scott Salisbury covered the text side, showing how you can generate code. Afterwards we had a panel discussion.

During the panel I asked some fairly straightforward questions that nobody could answer. Who owns the input (the data used by these generative tools)? Who owns the output?

On the whole, it stayed quite grounded and mercifully free of hyperbole. Both speakers were treating the current crop of technologies as tools. Everyone agreed we were on the hype cycle, probably the peak of inflated expectations, looking forward to reaching the plateau of productivity.

Scott explicitly warned people off using generative tools for production code. His advice was to stick to side projects for now.

Matt took a closer look at where these tools could fit into your day-to-day design work. Mostly it was pretty sensible, except when he suggested that there could be any merit to using these tools as a replacement for user testing. That’s a terrible idea. A classic hammer/nail mismatch.

I think I moderated the panel reasonably well, but I have one regret. I wish I had first read Baldur Bjarnason’s new book, The Intelligence Illusion. I started reading it on the train journey back from Edinburgh but it would have been perfect for the panel.

The Intelligence Illusion is very level-headed. It is neither pro- nor anti-AI. Instead it takes a pragmatic look at both the benefits and the risks of using these tools in your business.

It has excellent advice for spotting genuine nails. For example:

Generative AI has impressive capabilities for converting and modifying seemingly unstructured data, such as prose, images, and audio. Using these tools for this purpose has less copyright risk, fewer legal risks, and is less error prone than using it to generate original output.

Think about transcripts of videos or podcasts—an excellent use of this technology. As Baldur puts it:

The safest and, probably, the most productive way to use generative AI is to not use it as generative AI. Instead, use it to explain, convert, or modify.

He also says:

Prefer internal tools over externally-facing chatbots.

That chimes with what I’ve been seeing. The most interesting uses of this technology that I’ve seen involve a constrained dataset. Like the way Luke trained a language model on his own content to create a useful chat interface.

Anyway, The Intelligence Illusion is full of practical down-to-earth advice based on plenty of research backed up with copious citations. I’m only halfway through it and it’s already helped me separate the hype from the reality.

Steam

Picture someone tediously going through a spreadsheet that someone else has filled in by hand and finding yet another error.

“I wish to God these calculations had been executed by steam!” they cry.

The year was 1821 and technically the spreadsheet was a book of logarithmic tables. The frustrated cry came from Charles Babbage, who channeled his frustration into a scheme to create the world’s first computer.

His difference engine didn’t work out. Neither did his analytical engine. He’d spend his later years taking his frustrations out on street musicians, which—as a former busker myself—earns him a hairy eyeball from me.

But we’ve all been there, right? Some tedious task that feels soul-destroying in its monotony. Surely this is exactly what machines should be doing?

I have a hunch that this is where machine learning and large language models might turn out to be most useful. Not in creating breathtaking works of creativity, but in menial tasks that nobody enjoys.

Someone was telling me earlier today about how they took a bunch of haphazard notes in a client meeting. When the meeting was done, they needed to organise those notes into a coherent summary. Boring! But ChatGPT handled it just fine.

I don’t think that use-case is going to appear on the cover of Wired magazine anytime soon but it might be a truer glimpse of the future than any of the breathless claims being eagerly bandied about in Silicon Valley.

You know the way we no longer remember phone numbers, because, well, why would we now that we have machines to remember them for us? I’d be quite happy if machines did that for the annoying little repetitive tasks that nobody enjoys.

I’ll give you an example based on my own experience.

Regular expressions are my kryptonite. I’m rubbish at them. Any time I have to figure one out, the knowledge seeps out of my brain before long. I think that’s because I kind of resent having to internalise that knowledge. It doesn’t feel like something a human should have to know. “I wish to God these regular expressions had been calculated by steam!”

Now I can get a chatbot with a large language model to write the regular expression for me. I still need to describe what I want, so I need to write the instructions clearly. But all the gobbledygook that I’m writing for a machine now gets written by a machine. That seems fair.

Mind you, I wouldn’t blindly trust the output. I’d take that regular expression and run it through a chatbot, maybe a different chatbot running on a different large language model. “Explain what this regular expression does,” would be my prompt. If my input into the first chatbot matches the output of the second, I’d have some confidence in using the regular expression.
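As a sketch (the regular expression below stands in for whatever the first chatbot gave me, and the prompt is the one I’d paste into a second chatbot running on a different model):

<?php
// A sketch of the cross-checking workflow, not a real integration.
// Pretend this pattern came back in answer to “write a regular
// expression that matches an ISO 8601 date like 2023-08-16”.
$regex = '/^\d{4}-\d{2}-\d{2}$/';

// The question for the second chatbot.
echo "Explain what this regular expression does: $regex\n";
// If its explanation matches what I originally asked for,
// I’d have some confidence in using the regular expression.

// Belt and braces: try it on a few inputs I already know the answers to.
var_dump(preg_match($regex, '2023-08-16')); // int(1): it matches
var_dump(preg_match($regex, 'not a date')); // int(0): it doesn’t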

A friend of mine told me about using a large language model to help write SQL statements. He described his database structure to the chatbot, and then described what he wanted to select.

Again, I wouldn’t use that output without checking it first. But again, I might use another chatbot to do that checking. “Explain what this SQL statement does.”

Playing chatbots off against each other like this is kinda how machine learning works under the hood: generative adversarial networks.

Of course, the task of having to validate the output of a chatbot by checking it with another chatbot could get quite tedious. “I wish to God these large language model outputs had been validated by steam!”

Sounds like a job for machines.

Disclosure

You know how when you’re on hold to any customer service line you hear a message that thanks you for calling and claims your call is important to them. The message always includes a disclaimer about calls possibly being recorded “for training purposes.”

Nobody expects that any training is ever actually going to happen—surely we would see some improvement if that kind of iterative feedback loop were actually in place. But we most certainly want to know that a call might be recorded. Recording a call without disclosure would be unethical and illegal.

Consider chatbots.

If you’re having a text-based (or maybe even voice-based) interaction with a customer service representative that doesn’t disclose its output is the result of large language models, that too would be unethical. But, at the present moment in time, it would be perfectly legal.

That needs to change.

I suspect the necessary legislation will pass in Europe first. We’ll see if the USA follows.

In a way, this goes back to my obsession with seamful design. With something as inherently varied as the output of large language models, it’s vital that people have some way of evaluating what they’re told. I believe we should be able to see as much of the plumbing as possible.

The bare minimum amount of transparency is revealing that a machine is in the loop.

This shouldn’t be a controversial take. But I guarantee we’ll see resistance from tech companies trying to sell their “AI” tools as seamless, indistinguishable drop-in replacements for human workers.

Guessing

The last talk at the last dConstruct was by local clever clogs Anil Seth. It was called Your Brain Hallucinates Your Conscious Reality. It’s well worth a listen.

Anil covers a lot of the same ground in his excellent book, Being You. He describes a model of consciousness that inverts our intuitive understanding.

We tend to think of our day-to-day reality in a fairly mechanical cybernetic manner; we receive inputs through our senses and then make decisions about reality informed by those inputs.

As another former dConstruct speaker, Adam Buxton, puts it in his interview with Anil, it feels like that old Beano cartoon, the Numskulls, with little decision-making homunculi inside our head.

But Anil posits that it works the other way around. We make a best guess of what the current state of reality is, and then we receive inputs from our senses, and then we adjust our model accordingly. There’s still a feedback loop, but cause and effect are flipped. First we predict or guess what’s happening, then we receive information. Rinse and repeat.

The book goes further and applies this to our very sense of self. We make a best guess of our sense of self and then adjust that model constantly based on our experiences.

There’s a natural tendency for us to balk at this proposition because it doesn’t seem rational. The rational model would be to make informed calculations based on available data …like computers do.

Maybe that’s what sets us apart from computers. Computers can make decisions based on data. But we can make guesses.

Enter machine learning and large language models. Now, for the first time, it appears that computers can make guesses.

The guess-making is not at all like what our brains do—large language models require enormous amounts of inputs before they can make a single guess—but still, this should be the breakthrough to be shouted from the rooftops: we’ve taught machines how to guess!

And yet. Almost every breathless press release touting some revitalised service that uses AI talks instead about accuracy. It would be far more honest to tout the really exceptional new feature: imagination.

Using AI, we will guess who should get a mortgage.

Using AI, we will guess who should get hired.

Using AI, we will guess who should get a strict prison sentence.

Reframed like that, it’s easy to see why technologists want to bury the lede.

Alas, this means that large language models are being put to use for exactly the wrong kind of scenarios.

(This, by the way, is also true of immersive “virtual reality” environments. Instead of trying to accurately recreate real-world places like meeting rooms, we should be leaning into the hallucinatory power of a technology that can generate dream-like situations where the pleasure comes from relinquishing control.)

Take search engines. They’re based entirely on trust and accuracy. Introducing a chatbot that confidently conflates truth and fiction doesn’t bode well for the long-term reputation of that service.

But what if this is an interface problem?

Currently facts and guesses are presented with equal confidence, hence the accurate descriptions of the outputs as bullshit or mansplaining as a service.

What if the more fanciful guesses were marked as such?

As it is, there’s a “temperature” control that can be adjusted when generating these outputs; the more the dial is cranked, the further the outputs will stray from the safest predictions. What if that could be reflected in the output?

I don’t know what that would look like. It could be typographic—some markers to indicate which bits should be taken with pinches of salt. Or it could be through content design—phrases like “Perhaps…”, “Maybe…” or “It’s possible but unlikely that…”

I’m sure you’ve seen the outputs when people request that ChatGPT write their biography. Perfectly accurate statements are generated side-by-side with complete fabrications. This reinforces our scepticism of these tools. But imagine how differently the fabrications would read if they were preceded by some simple caveats.

A little bit of programmed humility could go a long way.

Right now, these chatbots are attempting to appear seamless. If 80% or 90% of their output is accurate, then blustering through the other 10% or 20% should be fine, right? But I think the experience for the end user would be immensely more empowering if these chatbots were designed seamfully. Expose the wires. Show the workings-out.

Mind you, that only works if there is some way to distinguish between fact and fabrication. If there’s no way to tell how much guessing is happening, then that’s a major problem. If you can’t tell me whether something is 50% true or 75% true or 25% true, then the only rational response is to treat the entire output as suspect.

I think there’s a fundamental misunderstanding behind the design of these chatbots that goes all the way back to the Turing test. There’s this idea that the way to make a chatbot believable and trustworthy is to make it appear human, attempting to hide the gears of the machine. But the real way to gain trust is through honesty.

I want a machine to tell me when it’s guessing. That won’t make me trust it less. Quite the opposite.

After all, to guess is human.

Like

We use metaphors all the time. To quote George Lakoff, we live by them.

We use analogies some of the time. They’re particularly useful when we’re wrapping our heads around something new. By comparing something novel to something familiar, we can make a shortcut to comprehension, or at least, categorisation.

But we need a certain amount of vigilance when it comes to analogies. Just because something is like something else doesn’t mean it’s the same.

With that in mind, here are some ways that people are describing generative machine learning tools. Large language models are like…

Bookshop

Back at the start of the (first) lockdown, I wrote about using my website as an outlet:

While you’re stuck inside, your website is not just a place you can go to, it’s a place you can control, a place you can maintain, a place you can tidy up, a place you can expand. Most of all, it’s a place you can lose yourself in, even if it’s just for a little while.

Last week was eventful and stressful. For everyone. I found myself once again taking refuge in my website, tinkering with its inner workings in the way that someone else would potter about in their shed or take to their garage to strip down the engine of some automotive device.

Colly drew my attention to Bookshop.org, newly launched in the UK. It’s an umbrella website for independent bookshops to sell through. It’s also got an affiliate scheme, much like Amazon. I set up a Bookshop page for myself.

I’ve been tracking the books I’m reading for the past three years here on my own website. I set about reproducing that list on Bookshop.

It was exactly the kind of not-exactly-mindless but definitely-not-challenging task that was perfect for the state of my brain last week. Search for a book; find the ISBN number; paste that number into a form. It’s the kind of task that a real programmer would immediately set about automating but one that I embraced as a kind of menial task to keep me occupied.

I wasn’t able to get a one-to-one match between the list on my site and my reading list on Bookshop. Some titles aren’t available in the online catalogue. For example, the book I’m reading right now—A Paradise Built in Hell by Rebecca Solnit—is nowhere to be found, which seems like an odd omission.

But most of the books I’ve read are there on Bookshop.org, complete with pretty book covers. Then I decided to reverse the process of my menial task. I took all of the ISBN numbers from Bookshop and added them as machine tags to my reading notes here on my own website. Book cover images on Bookshop have predictable URLs that use the ISBN number (well, technically the EAN number, or ISBN-13, but let’s not go down a 927 rabbit hole here). So now I’m using that metadata to pull in images from Bookshop.org to illustrate my reading notes here on adactio.com.

I’m linking to the corresponding book on Bookshop.org using this URL structure:

https://uk.bookshop.org/a/{{ affiliate code }}/{{ ISBN number }}

I realised that I could also link to the corresponding entry on Open Library using this URL structure:

https://openlibrary.org/isbn/{{ ISBN number }}

Here, for example, is my note for The Raven Tower by Ann Leckie. That entry has a tag:

book:ean=9780356506999

With that information I can illustrate my note with this image:

https://images-eu.bookshop.org/product-images/images/9780356506999.jpg

I’m linking off to this URL on Bookshop.org:

https://uk.bookshop.org/a/980/9780356506999

And this URL on Open Library:

https://openlibrary.org/isbn/9780356506999
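Gluing those pieces together is pleasingly trivial. A rough sketch, with the EAN hard-coded for illustration:

<?php
// From a machine tag like book:ean=9780356506999 to an image and two links.
$ean       = '9780356506999';
$affiliate = '980'; // the Bookshop.org affiliate code from the URL above

$image       = "https://images-eu.bookshop.org/product-images/images/{$ean}.jpg";
$bookshop    = "https://uk.bookshop.org/a/{$affiliate}/{$ean}";
$openlibrary = "https://openlibrary.org/isbn/{$ean}";

echo '<a href="' . $bookshop . '"><img src="' . $image . '" alt=""></a>';
echo '<a href="' . $openlibrary . '">Open Library</a>';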

The end result is that my reading list now has more links and pretty pictures.

Oh, I also set up a couple of shorter lists on Bookshop.org:

The books listed in those are drawn from my end of the year round-ups when I try to pick one favourite non-fiction book and one favourite work of fiction (almost always speculative fiction). The books in those two lists are the ones that get two hearty thumbs up from me. If you click through to buy one of them, the price might not be as cheap as on Amazon, but you’ll be supporting an independent bookshop.

Web browsers on iOS

Safari is the only browser on iOS devices.

I don’t mean it’s the only browser that ships with iOS devices. I mean it’s the only browser that can be installed on iOS devices.

You can install something called Chrome. You can install something called Firefox. Those aren’t different web browsers. Under the hood they’re using Safari’s rendering engine. They have to. The app store doesn’t allow other browsers to be listed. The apps called Chrome and Firefox are little more than skinned versions of Safari.

If you’re a web developer, there are two possible reactions to hearing this. One is “Duh! Everyone knows that!”. The other is “What‽ I never knew that!”

If you fall into the first category, I’m guessing you’ve been a web developer for a while. The fact that Safari is the only browser on iOS devices is something you’ve known for years, and something you assume everyone else knows. It’s common knowledge, right?

But if you’re relatively new to web development—heck, if you’ve been doing web development for half a decade—you might fall into the second category. After all, why would anyone tell you that Safari is the only browser on iOS? It’s common knowledge, right?

So that’s the situation. Safari is the only browser that can run on iOS. The obvious follow-on question is: why?

Apple at this point will respond with something about safety and security, which are certainly important priorities. So let me rephrase the question: why on iOS?

Why can I install Chrome or Firefox or Edge on my Macbook running macOS? If there are safety or security reasons for preventing me from installing those browsers on my iOS device, why don’t those same concerns apply to my macOS device?

At one time, the mobile operating system—iOS—was quite different to the desktop operating system—OS X. Over time the gap has narrowed. At this point, the operating systems are converging. That makes sense. An iPhone, an iPad, and a Macbook aren’t all that different apart from the form factor. It makes sense that computing devices from the same company would share an underlying operating system.

As this convergence continues, the browser question is going to have to be decided in one direction or the other. As it is, Apple’s laptops and desktops strongly encourage you to install software from their app store, though it is still possible to install software by other means. Perhaps they’ll decide that their laptops and desktops should only be able to install software from their app store—a decision they could justify with safety and security concerns.

Imagine that situation. You buy a computer. It comes with one web browser pre-installed. You can’t install a different web browser on your computer.

You wouldn’t stand for it! I mean, Microsoft got fined for anti-competitive behaviour when they pre-bundled their web browser with Windows back in the 90s. You could still install other browsers, but just the act of pre-bundling was seen as an abuse of power. Imagine if Windows never allowed you to install Netscape Navigator?

And yet that’s exactly the situation in 2020.

You buy a computing device from Apple. It might be a Macbook. It might be an iPad. It might be an iPhone. But you can only install your choice of web browser on one of those devices. For now.

It is contradictory. It is hypocritical. It is indefensible.

The Machines Stop

The Situation feels like it’s changing. It’s not over, not by a long shot. But it feels like it’s entering a different, looser phase.

Throughout the lockdown, there’s been a strange symmetry between the outside world and the inside of our home. As the outside world slowed to a halt, so too did half the machinery in our flat. Our dishwasher broke shortly before the official lockdown began. So did our washing machine.

We had made plans for repairs and replacements, but as events in the world outside escalated, those plans had to be put on hold. Plumbers and engineers weren’t making any house calls, and rightly so.

We even had the gas to our stovetop cut off for a while—you can read Jessica’s account of that whole affair. All the breakdowns just added to the entropic Ballardian mood.

But the gas stovetop was fixed. And so too was the dishwasher, eventually. Just last week, we got our new washing machine installed. Piece by piece, the machinery of our interior world revived in lockstep with the resuscitation of the world outside.

As of today, pubs will be open. I won’t be crossing their thresholds just yet. We know so much more about the spread of the virus now, and gatherings of people in indoor spaces are pretty much the worst environments for transmission.

I’m feeling more sanguine about outdoor spaces. Yesterday, Jessica and I went into town for Street Diner. It was the first time since March that we walked in that direction—our other excursions have been in the direction of the countryside.

It was perfectly fine. We wore masks, and while we were certainly in the minority, we were not alone. People were generally behaving responsibly.

Brighton hasn’t done too badly throughout The Situation. But still, like I said, I have no plans to head to the pub on a Saturday night. The British drinking culture is very much concentrated on weekends. Stay in all week and then on the weekend, lassen die Sau raus! (“let the pig out”), as the Germans would say.

After months of lockdown, reopening pubs on a Saturday seems like a terrible idea. Over in Ireland, pubs have been open since Monday—a sensible day to soft-launch. With plenty of precautions in place, things are going well there.

I’ve been watching The Situation in Ireland throughout. It’s where my mother lives, so I was understandably concerned. But they’ve handled everything really well. It’s not New Zealand, but it’s also not the disaster that is the UK.

It really has been like watching an A/B test run at the country level. Two very similar populations confronted with exactly the same crisis. Ireland took action early, cancelling the St. Patrick’s Day parade(!) while the UK was still merrily letting Cheltenham go ahead. Ireland had clear guidance. The UK had dilly-dallying and waffling. And when the shit really hit the fan, the Irish taoiseach rolled up his sleeves and returned to medical work. Meanwhile the UK had Dominic Cummings making a complete mockery of the sacrifices that everyone was told to endure.

What’s strange is that people here in the UK don’t seem to realise how the rest of the world, especially other European countries, have watched the response here with shock and horror. The narrative here seems to be that we all faced this thing together, and with our collective effort, we averted the worst. But the numbers tell a very different story. Comparing the numbers here with the numbers in Ireland—or pretty much any other country in Europe—is sobering.

So even though the timelines for reopenings here converge with Ireland’s, The Situation is far from over.

Even without any trips to pubs, restaurants, or other indoor spaces, I’m looking forward to making some more excursions into town. Not that it’s been bad staying at home. I’ve really quite enjoyed staying put, playing music, reading books, and watching television.

I was furloughed from work for a while in June. Normally, my work at this time of year would involve plenty of speaking at conferences. Seeing as that wasn’t happening, it made sense to take advantage of the government scheme to go into work hibernation for a bit.

I was worried I might feel at a bit of a loose end, but I actually really enjoyed it. The weather was good so I spent quite a bit of time just sitting in the back garden, reading (I am very, very grateful to have even a small garden). I listened to music. I watched movies. I surfed the web. Yes, properly surfed the web, going from link to link, getting lost down rabbit holes. I tell you, this World Wide Web thing is pretty remarkable. Some days I used it to read up on science or philosophy. I spent a week immersed in Napoleonic history. I have no idea how or why. But it was great.

I’m back at work now, and have been for a couple of weeks. But I wouldn’t mind getting furloughed again. It felt kind of like being retired. I’m quite okay with the prospect of retirement now, as long as we have music and sunshine and the World Wide Web.

That’s the future. For now, The Situation continues, albeit in looser form.

I’ve really enjoyed reading other people’s accounts throughout. My RSS reader is getting a good workout. I always look forward to weeknotes from Alice, Nat, and Phil (this piece from Phil has really stuck with me). Jessica has written fifteen installments—and counting—of A Journal of the Plague Week. I know I’m biased, but I think it’s some mighty fine writing. Start here.

Voice User Interface Design by Cheryl Platz

Cheryl Platz is speaking at An Event Apart Chicago. Her inaugural An Event Apart presentation is all about voice interfaces, and I’m going to attempt to liveblog it…

Why make a voice interface?

Successful voice interfaces aren’t necessarily solving new problems. They’re used to solve problems that other devices have already solved. Think about kitchen timers. There are lots of ways to set a timer. Your oven might have one. Your phone has one. Why use a $200 device to solve this mundane problem? Same goes for listening to music, news, and weather.

People are using voice interfaces for solving ordinary problems. Why? Context matters. If you’re carrying a toddler, then setting a kitchen timer can be tricky so a voice-activated timer is quite appealing. But why is voice happening now?

Humans have been developing the art of conversation for thousands of years. It’s one of the first skills we learn. It’s deeply instinctual. Most humans use speech instinctively every day. You can’t necessarily say that about using a keyboard or a mouse.

Voice-based user interfaces are not new. Not just the idea—which we’ve seen in Star Trek—but the actual implementation. Bell Labs had Audrey back in 1952. It recognised ten words—the digits zero through nine. Why did it take so long to get to Alexa?

In the late 70s, DARPA issued a challenge to create a voice-activated system. Carnegie Mellon came up with Harpy (with a thousand-word grammar). But none of the solutions could respond in real time. In conversation, we expect a break of no more than 200 or 300 milliseconds.

In the 1980s, computing power couldn’t keep up with voice technology, so progress kind of stopped. Time passed. Things finally started to catch up in the 90s with things like Dragon Naturally Speaking. But that was still about vocabulary, not grammar. By the 2000s, small grammars were starting to show up—starting an X-Box or pausing Netflix. In 2008, Google Voice Search arrived on the iPhone and natural language interaction began to arrive.

What makes natural language interactions so special? It requires minimal training because it uses the conversational muscles we’ve been working for a lifetime. It unlocks the ability to have more forgiving, less robotic conversations with devices. There might be ten different ways to set a timer.

Natural language interactions can also free us from “screen magnetism”—that tendency to stay on a device even when our original task is complete. Voice also enables fast and forgiving searches of huge catalogues without time spent typing or browsing. You can pick a needle straight out of a haystack.

Natural language interactions are excellent for older customers. These interfaces don’t intimidate people without dexterity, vision, or digital experience. Voice input often leads to more inclusive experiences. Many customers with visual or physical disabilities can’t use traditional graphical interfaces. Voice experiences throw open the door of opportunity for some people. However, voice experience can exclude people with speech difficulties.

Making the case for voice interfaces

There’s a misconception that you need to work at Amazon, Google, or Apple to work on a voice interface, or at least that you need to have a big product team. But Cheryl was able to make her first Alexa “skill” in a week. If you’re a web developer, you’re good to go. Your voice “interaction model” is just JSON.

How do you get your product team on board? Find the customers (and situations) you might have excluded with traditional input. Tell the stories of people whose hands are full, or who are vision impaired. You can also point to the adoption rate numbers for smart speakers.

You’ll need to show your scenario in context. Otherwise people will ask, “why can’t we just build an app for this?” Conduct research to demonstrate the appeal of a voice interface. Storyboarding is very useful for visualising the context of use and highlighting existing pain points.

Getting started with voice interfaces

You’ve got to understand how the technology works in order to adapt to how it fails. Here are a few basic concepts.

Utterance. A word, phrase, or sentence spoken by a customer. This is the true form of what the customer provides.

Intent. This is the meaning behind a customer’s request. This is an important distinction because one intent could have thousands of different utterances.

Prompt. The text of a system response that will be provided to a customer. The audio version of a prompt, if needed, is generated separately using text to speech.

Grammar. A finite set of expected utterances. It’s a list. Usually, each entry in a grammar is paired with an intent. Many interfaces start out as being simple grammars before moving on to a machine-learning model later once the concept has been proven.

Here’s the general idea with “artificial intelligence”…

There’s a human with a core intent to do something in the real world, like knowing when the cookies in the oven are done. That gets expressed as an utterance like “set a 15 minute timer.” The utterance is translated into a string, but it hasn’t yet been parsed as language. That string is passed into a natural language understanding system. What comes out is a data structure that represents the customer’s goal e.g. intent=timer; duration=15 minutes. That’s sent to the business logic where a timer is actually set. For a good voice interface, you also want to send back a response e.g. “setting timer for 15 minutes starting now.”
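Sketched out as data (a made-up shape for illustration, not any particular vendor’s schema), the hand-off from the natural language understanding system might look something like this:

{
  "utterance": "set a 15 minute timer",
  "intent": "timer",
  "slots": {
    "duration": "15 minutes"
  },
  "confidence": 0.87
}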

That seems simple enough, right? What’s so hard about designing for voice?

Natural language interfaces are a form of artificial intelligence so it’s not deterministic. There’s a lot of ruling out false positives. Unlike graphical interfaces, voice interfaces are driven by probability.

How do you turn a sound wave into an understandable instruction? It’s a lot like teaching a child. You feed a lot of data into a statistical model. That’s how machine learning works. It’s a probability game. That’s where it gets interesting for design—given a bunch of possible options, we need to use context to zero in on the most correct choice. This is where confidence ratings come in: the system will return the probability that a response is correct. Effectively, the system is telling you how sure or not it is about possible results. If the customer makes a request in an unusual or unexpected way, our system is likely to guess incorrectly. That’s because the system is being given something new.

Designing a conversation is relatively straightforward. But 80% of your voice design time will be spent designing for what happens when things go wrong. In voice recognition, edge cases are front and centre.

Here’s another challenge. Interaction with most voice interfaces is part conversation, part performance. Most interactions are not private.

Humans don’t distinguish digital speech from human speech. That means these devices are intrinsically social. Our brains are wired to try to extract social information, even from digital speech. See, for example, why it’s such a big question as to what gender a voice interface has.

Delivering a voice interface

Storyboards help depict the context of use. Sample dialogues are your new wireframes. These are little scripts that not only cover the happy path, but also your edge cases. Then you reverse engineer from there.

Flow diagrams communicate customer states, but don’t use the actual text in them.

Prompt lists are your final deliverable.

Functional prototypes are really important for voice interfaces. You’ll learn the real way that customers will ask for things.

If you build a working prototype, you’ll be building two things: a natural language interaction model (often a JSON file) and custom business logic (in a programming language).

Eventually voice design will become a core competency, much like mobile, which was once separate.

Ask yourself what tasks your customers complete on your site that feel clunky. Remember that voice design is almost never about new scenarios. Start your journey into voice interfaces by tackling old problems in new, more inclusive ways.

May the voice be with you!

Programming CSS

There’s a worrying tendency for “real” programmers to look down their noses at CSS. It’s just a declarative language, they point out, not a fully-featured programming language. Heck, it isn’t even a scripting language.

That may be true, but that doesn’t mean that CSS isn’t powerful. It’s just powerful in different ways to traditional languages.

Take CSS selectors, for example. At the most basic level, they work like conditional statements. Here’s a standard if statement:

if (condition) {
// code here
}

The condition needs to evaluate to true in order for the code in the curly braces to be executed. Sound familiar?

condition {
// styles here
}

That’s a very simple mapping, but what if the conditional statement is more complicated?

if (condition1 && condition2) {
// code here
}

Well, that’s what the descendant selector does:

condition1 condition2 {
// styles here
}

In fact, we can get even more specific than that by using the child combinator, the sibling combinator, and the adjacent sibling combinator:

  • condition1 > condition2
  • condition1 ~ condition2
  • condition1 + condition2

AND is just one part of Boolean logic. There’s also OR:

if (condition1 || condition2) {
// code here
}

In CSS, we use commas:

condition1, condition2 {
// styles here
}

We’ve even got the :not() pseudo-class to complete the set of Boolean possibilities. Once you add quantity queries into the mix, made possible by :nth-child and its ilk, CSS starts to look Turing complete. I’ve seen people build state machines using the adjacent sibling combinator and the :checked pseudo-class.
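To make those concrete, here’s NOT in the same schematic style as above, followed by a quantity query that only applies its styles when a list has six or more items:

condition1:not(condition2) {
// styles here
}

li:first-child:nth-last-child(n+6),
li:first-child:nth-last-child(n+6) ~ li {
// styles here
}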

Anyway, my point here is that CSS selectors are really powerful. And yet, quite often we deliberately choose not to use that power. The entire raison d’être for OOCSS, BEM, and Smacss is to deliberately limit the power of selectors, restricting them to class selectors only.

On the face of it, this might seem like an odd choice. After all, we wouldn’t deliberately limit ourselves to a subset of a programming language, would we?

We would and we do. That’s what templating languages are for. Whether it’s PHP’s Smarty or Twig, or JavaScript’s Mustache, Nunjucks, or Handlebars, they all work by providing a deliberately small subset of features. Some pride themselves on being logic-less. If you find yourself trying to do something that the templating language doesn’t provide, that’s a good sign that you shouldn’t be trying to do it in the template at all; it should be in the controller.

So templating languages exist to enforce simplicity and ensure that the complexity happens somewhere else. It’s a similar story with BEM et al. If you find you can’t select something in the CSS, that’s a sign that you probably need to add another class name to the HTML. The complexity is confined to the markup in order to keep the CSS more straightforward, modular, and maintainable.

But let’s not forget that that’s a choice. It’s not that CSS is inherently incapable of executing complex conditions. Quite the opposite. It’s precisely because CSS selectors (and the cascade) are so powerful that we choose to put guard rails in place.

Optimise without a face

I’ve been playing around with the newly-released Squoosh, the spiritual successor to Jake’s SVGOMG. You can drag images into the browser window, and eyeball the changes that any optimisations might make.

On a project that Cassie is working on, it worked really well for optimising some JPEGs. But there were a few images that would require a bit more fine-grained control of the optimisations. Specifically, pictures with human faces in them.

I’ve written about this before. If there’s a human face in an image, I open that image in a graphics editing tool like Photoshop, select everything but the face, and add a bit of blur. Because humans are hard-wired to focus on faces, we’ll notice any jaggy artifacts on a face, but we’re far less likely to notice jagginess in background imagery: walls, materials, clothing, etc.

On the face of it (hah!), a browser-based tool like Squoosh wouldn’t be able to optimise for faces, but then Cassie pointed out something really interesting…

When we were both at FFConf on Friday, there was a great talk by Eleanor Haproff on machine learning with JavaScript. It turns out there are plenty of smart toolkits out there, and one of them is facial recognition. So I wonder if it’s possible to build an in-browser tool with this workflow:

  • Drag or upload an image into the browser window,
  • A facial recognition algorithm finds any faces in the image,
  • Those portions of the image remain crisp,
  • The rest of the image gets a slight blur,
  • Download the optimised image.

Maybe the selecting/blurring part would need canvas? I don’t know.

Anyway, I thought this was a brilliant bit of synthesis from Cassie, and now I’ve got two questions:

  1. Does this exist yet? And, if not,
  2. Does anyone want to try building it?

Machine supplying

I wrote a little something recently about some inspiring projects that people are working on. Like Matt’s Machine Supply project. There’s a physical side to that project—a tweeting book-vending machine in London—but there’s also the newsletter, 3 Books Weekly.

I was honoured to be asked by Matt to contribute three book recommendations. That newsletter went out last week. Here’s what I said…

The Victorian Internet by Tom Standage

A book about the history of telegraphy might not sound like the most riveting read, but The Victorian Internet is both fascinating and entertaining. Techno-utopianism, moral panic, entirely new ways of working, and a world that has been utterly transformed: the parallels between the telegraph and the internet are laid bare. In fact, this book made me realise that while the internet has been a great accelerator, the telegraph was one of the few instances where a technology could truly be described as “disruptive.”

Ancillary Justice: 1 (Imperial Radch) by Ann Leckie

After I finished reading the final Iain M. Banks novel I was craving more galaxy-spanning space opera. The premise of Ancillary Justice with its description of “ship minds” led me to believe that this could be picking up the baton from the Culture series. It isn’t. This is an entirely different civilisation, one where song-collecting and tea ceremonies have as much value as weapons and spacecraft. Ancillary Justice probes at the deepest questions of identity, both cultural and personal. As well as being beautifully written, it’s also a rollicking good revenge thriller.

The City & The City by China Miéville

China Miéville’s books are hit-and-miss for me, but this one is a direct hit. The central premise of this noir-ish tale defies easy description, so I won’t even try. In fact, one of the great pleasures of this book is to feel the way your mind is subtly contorted by the author to accept a conceit that should be completely unacceptable. Usually when a book is described as “mind-altering” it’s a way of saying it has drug-like properties, but The City & The City is mind-altering in an entirely different and wholly unique way. If Borges and Calvino teamed up to find The Maltese Falcon, the result would be something like this.

When I sent off my recommendations, I told Matt:

Oh man, it was so hard to narrow this down! So many books I wanted to mention: Station 11, The Peripheral, The Gone-Away World, Glasshouse, Foucault’s Pendulum, Oryx and Crake, The Wind-up Girl …this was so much tougher than I thought it was going to be.

And Matt said:

Tell you what — if you’d be up for writing recommendations for another 3 books, from those ones you mentioned, I’d love to feature those in the machine!

Done!

Station Eleven by Emily St. John Mandel

Station Eleven made me think about the purpose of art and culture. If art, as Brian Eno describes it, is “everything that you don’t have to do”, what happens to art when the civilisational chips are down? There are plenty of post-pandemic stories of societal collapse. But there’s something about this one that sets it apart. It doesn’t assume that humanity will inevitably revert to an existence that is nasty, brutish and short. It’s also a beautifully-written book. The opening chapter completely sucker-punched me.

Glasshouse by Charles Stross

On the face of it, this appears to be another post-Singularity romp in a post-scarcity society. It is, but it’s also a damning critique of gamification. Imagine the Stanford prison experiment if it were run by godlike experimenters. Stross’s Accelerando remains the definitive description of an unfolding Singularity, but Glasshouse is the one that has stayed with me.

The Gone-Away World by Nick Harkaway

This isn’t an easy book to describe, but it’s a very easy book to enjoy. A delightful tale of a terrifying apocalypse, The Gone-Away World has plenty of laughs to balance out the existential dread. Try not to fall in love with the charming childhood world of the narrator—you know it can’t last. But we’ll always have mimes and ninjas.

I must admit, it’s a really lovely feeling to get notified on Twitter when someone buys one of the recommended books.

Small pieces, loosely joined by machine tags

I’ve already described how machine tags on Huffduffer trigger a number of third-party API calls. Tagging something with music:artist=..., book:author=..., film:title=... or any number of similar machine tags will fire off calls to places like Amazon, The New York Times, or Last.fm.

For a while now, I’ve wanted to include Flickr in that list of third-party services but I couldn’t think of an easy way of associating audio files with photos. Then I realised that a mechanism already exists, and it’s another machine tag. Anything on Flickr that’s been tagged with lastfm:event=... will probably be a picture of a musical artist.

So if anything is tagged on Huffduffer with music:artist=..., all I need to do is fire off a call to Last.fm to get a list of that artist’s events using the method artist.getEvents. Once I have the event IDs I can search Flickr for photos that have been machine tagged with those IDs.

There’s just one problem. Last.fm’s API only returns future events for an artist. There’s no method for past events.

Undeterred, I found a RESTful interface that provides the past events of an artist on Last.fm. The format returned isn’t JSON or XML. It’s HTML. It turns out that past events are freely available in the profile for any artist on Last.fm with the identifier last.fm/music/{artist}/+events/{year}. Here, for example, are Salter Cane gigs in 2009: last.fm/music/Salter+Cane/+events/2009.

If only those events were structured in hCalendar! As it is, I have to run through all the links in the document to find the hrefs beginning with the string http://www.last.fm/event/ and then extract the event ID that immediately follows that string.

Once I’ve extracted the event IDs for an artist, I can fire off a search on Flickr using the flickr.photos.search method with a machine_tags parameter (as well as passing the artist name in the text parameter just to be sure).
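Roughly speaking, the whole dance looks something like this (a simplified sketch rather than the actual code running on Huffduffer):

<?php
// 1. Scrape the artist’s past events page on Last.fm and pull out the event IDs.
$artist = 'Salter Cane';
$year   = 2009;
$html   = file_get_contents('http://www.last.fm/music/' . str_replace(' ', '+', $artist) . '/+events/' . $year);

$doc = new DOMDocument();
@$doc->loadHTML($html); // real-world markup: suppress the parser warnings

$eventIDs = array();
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    if (strpos($href, 'http://www.last.fm/event/') === 0) {
        $eventIDs[] = intval(substr($href, strlen('http://www.last.fm/event/')));
    }
}

// 2. For each event ID, search Flickr for photos machine-tagged with that event.
foreach (array_unique($eventIDs) as $id) {
    $query = http_build_query(array(
        'method'         => 'flickr.photos.search',
        'api_key'        => 'YOUR_FLICKR_API_KEY', // placeholder
        'machine_tags'   => 'lastfm:event=' . $id,
        'text'           => $artist,
        'format'         => 'json',
        'nojsoncallback' => 1,
    ));
    $results = json_decode(file_get_contents('https://api.flickr.com/services/rest/?' . $query), true);
    // …do something with $results…
}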

Here’s an example result in the sidebar on Huffduffer: huffduffer.com/tags/music:artist=Bat+for+Lashes

It’s messy but it works. I guess that’s the dictionary definition of a hack.

Machine tag browsing

After I started rewarding machine tagging on Huffduffer with API calls to Amazon and Last.fm, people started using them quite a bit. But when it came to displaying tag clouds, I wasn’t treating machine tags any differently to other tags. Everything was being displayed in one big cloud.

I decided it would be good to separate out machine tags and display them after displaying “regular” tags. That started me thinking about how best to display machine tags.

One of the best machine tag visualisations I’ve seen so far is Paul Mison’s Flickr machine tag browser, somewhat like the list view in OS X’s Finder. Initially, I tried doing something similar for Huffduffer: a table with three columns; namespace, predicate, and values.

That morphed into a two column layout (predicate and values) with the namespace spanning both columns. The values themselves are still displayed as a cloud to indicate usage.

Huffduffer machine tags

This is marked up as a table. The namespace is in a th inside the thead. In the tbody, each tr contains a th for the predicate and td for the values.

<table>
 <thead>
  <tr>
   <th colspan="2"><a href="/tags/book">book</a></th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <th><a href="/tags/book:author">author</a></th>
   <td><a href="/tags/book:author=arthur+c.+clarke">arthur c. clarke</a></td>
  </tr>
 </tbody>
</table>

Table markup allows for some nice :hover styles (in browsers that allow :hover styles on more than links). Whenever you hover over a table cell, you are also hovering over a table row and a table. By setting :hover states on all three elements, wayfinding becomes a bit clearer.

table:hover thead th a
table tbody tr:hover th a
table tbody tr td a:hover
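Fleshed out with some declarations (the colours here are just for illustration), that looks like:

table:hover thead th a {
    background-color: #ffc;
}
table tbody tr:hover th a {
    background-color: #ffc;
}
table tbody tr td a:hover {
    background-color: #ff9;
}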

Huffduffer machine tags on hover

See for yourself. I think it’s a pretty sturdy markup and style pattern that I’ll probably use again.

Machine-tagging Huffduffer some more

After I wrote about the hoops I had to jump through to get Amazon’s API to output JSON (via XSLT), Tom detailed a way of avoiding JSON by using XML-RPC. That’s very kind of him but the truth is that:

  1. I like dealing with JSON and
  2. the XSL transformation is done by Amazon, not me; that wouldn’t be the case if I used XML-RPC.

Anyway, having successfully created a Huffduffer-Amazon bridge using machine tags, I thought I’d do a little more hacking. Instead of restricting the mashup love to Amazon, I figured that Last.fm would be the perfect place to pull in information for anything tagged with the music namespace.

Last.fm has quite a full-featured API and yes, it can output JSON. To start with, I’m using the artist.getInfo method for anything tagged with music:artist=..., music:singer=... or music:band=.... Here are some examples:

I’m pulling a summary of the artist’s bio, a list of similar artists and a picture of the artist in question. For maximum effect, view in Safari, the browser with the finest implementation of .

Nice as Last.fm’s API is, it’s not without its quirks. Like most APIs, the methods are divided into those that require authentication (anything of a sensitive nature) and those that don’t (publicly available information). The method user.getInfo requires authentication. Yet, every piece of information returned by that method is available on the public profile.

So when I wanted to find a Last.fm user’s profile picture—having figured out when someone on Huffduffer has a Last.fm account—it made far more sense for me to parse the microformatted public URL than to use the API method.

Just over two years ago, Drew delivered a superb presentation asking whether your website could also be your API. In some situations, the answer is definitely “Yes.”

Update: It all ties together, as Julian explains on Twitter:

@adactio ha, I went to Drew’s presentation you mentioned on your blog; it made me add microformats to Last.fm in the first place :D

Machine-tagging Huffduffer

Over the weekend I was looking at the latest additions to Huffduffer. I noticed that Xavier Roy was using machine tags to tag a reading by Richard Dawkins. What an excellent idea!

I set aside a little time to do a little hacking with Amazon’s API. Now you can tag stuff on Huffduffer with machine tags like book:author=steven johnson, book:title=the invention of air or music:artist=my morning jacket. Other namespaces are film and movie. Anything matching that pattern will trigger a search on Amazon and display a list of results.

Amazon’s API was one of the first I ever messed about with, first on The Session and later on Adactio Elsewhere. There are things I really like about it and things I really dislike.

I dislike the fact that there’s no option to receive JSON instead of XML. However, one of the things I like is the option to pass the URL of an XSL file to transform the XML (I wish more APIs offered that service). So even though JSON isn’t officially offered, it’s perfectly feasible to generate JSON from the combination of XML + XSL. That’s what I did for the Huffduffer hacking—I find it a lot easier to deal with JSON than XML in PHP5. If you fancy doing something similar, help yourself to my XSL file. It’s very basic but it could make a decent starting point.

But the thing I dislike the most about the Amazon API is the documentation. It’s not that there’s a lack of documentation. Far from it. It’s just not organised very well. I find it very hard to get the information I need, even when I know that the information is there somewhere. Flickr still leads the pack when it comes to API documentation. Amazon would do well to take a leaf out of Flickr’s documentation book (hope you’re listening, Jeff).