Journal tags: machine

17

sparkline

Nailspotting

I’m sure you’ve heard the law of the instrument: when all you have is a hammer, everything looks like a nail.

There’s another side to it. If you’re selling hammers, you’ll depict a world full of nails.

Recent hammers include cryptobollocks and virtual reality. It wasn’t enough for blockchains and the metaverse to be potentially useful for some situations; they staked their reputations on being utterly transformative, disrupting absolutely every facet of life.

This kind of hype is a terrible strategy in the long-term. But if you can convince enough people in the short term, you can make a killing on the stock market. In truth, the technology itself is superfluous. It’s the hype that matters. And if the hype is over-inflated enough, you can even get your critics to do your work for you, broadcasting their fears about these supposedly world-changing technologies.

You’d think we’d learn. If an industry cries wolf enough times, surely we’d become less trusting of extraordinary claims. But the tech industry continues to cry wolf—or rather, “hammer!”—at regular intervals.

The latest hammer is machine learning, usually—incorrectly—referred to as Artificial Intelligence. What makes this hype cycle particularly infuriating is that there are genuine use cases. There are some nails for this hammer. They’re just not as plentiful as the breathless hype—both positive and negative—would have you believe.

When I was hosting the DiBi conference last week, there was a little section on generative “AI” tools. Matt Garbutt covered the visual side, demoing tools like Midjourney. Scott Salisbury covered the text side, showing how you can generate code. Afterwards we had a panel discussion.

During the panel I asked some fairly straightforward questions that nobody could answer. Who owns the input (the data used by these generative tools)? Who owns the output?

On the whole, it stayed quite grounded and mercifully free of hyperbole. Both speakers were treating the current crop of technologies as tools. Everyone agreed we were on the hype cycle, probably the peak of inflated expectations, looking forward to reaching the plateau of productivity.

Scott explicitly warned people off using generative tools for production code. His advice was to stick to side projects for now.

Matt took a closer look at where these tools could fit into your day-to-day design work. Mostly it was pretty sensible, except when he suggested that there could be any merit to using these tools as a replacement for user testing. That’s a terrible idea. A classic hammer/nail mismatch.

I think I moderated the panel reasonably well, but I have one regret. I wish I had first read Baldur Bjarnason’s new book, The Intelligence Illusion. I started reading it on the train journey back from Edinburgh but it would have been perfect for the panel.

The Intelligence Illusion is very level-headed. It is neither pro- nor anti-AI. Instead it takes a pragmatic look at both the benefits and the risks of using these tools in your business.

It has excellent advice for spotting genuine nails. For example:

Generative AI has impressive capabilities for converting and modifying seemingly unstructured data, such as prose, images, and audio. Using these tools for this purpose has less copyright risk, fewer legal risks, and is less error prone than using it to generate original output.

Think about transcripts of videos or podcasts—an excellent use of this technology. As Baldur puts it:

The safest and, probably, the most productive way to use generative AI is to not use it as generative AI. Instead, use it to explain, convert, or modify.

He also says:

Prefer internal tools over externally-facing chatbots.

That chimes with what I’ve been seeing. The most interesting uses of this technology that I’ve seen involve a constrained dataset. Like the way Luke trained a language model on his own content to create a useful chat interface.

Anyway, The Intelligence Illusion is full of practical down-to-earth advice based on plenty of research backed up with copious citations. I’m only halfway through it and it’s already helped me separate the hype from the reality.

Steam

Picture someone tediously going through a spreadsheet that someone else has filled in by hand and finding yet another error.

“I wish to God these calculations had been executed by steam!” they cry.

The year was 1821 and technically the spreadsheet was a book of logarithmic tables. The frustrated cry came from Charles Babbage, who channeled his frustration into a scheme to create the world’s first computer.

His difference engine didn’t work out. Neither did his analytical engine. He’d spend his later years taking his frustrations out on street musicians, which—as a former busker myself—earns him a hairy eyeball from me.

But we’ve all been there, right? Some tedious task that feels soul-destroying in its monotony. Surely this is exactly what machines should be doing?

I have a hunch that this is where machine learning and large language models might turn out to be most useful. Not in creating breathtaking works of creativity, but in menial tasks that nobody enjoys.

Someone was telling me earlier today about how they took a bunch of haphazard notes in a client meeting. When the meeting was done, they needed to organise those notes into a coherent summary. Boring! But ChatGPT handled it just fine.

I don’t think that use-case is going to appear on the cover of Wired magazine anytime soon but it might be a truer glimpse of the future than any of the breathless claims being eagerly bandied about in Silicon Valley.

You know the way we no longer remember phone numbers, because, well, why would we now that we have machines to remember them for us? I’d be quite happy if machines did that for the annoying little repetitive tasks that nobody enjoys.

I’ll give you an example based on my own experience.

Regular expressions are my kryptonite. I’m rubbish at them. Any time I have to figure one out, the knowledge seeps out of my brain before long. I think that’s because I kind of resent having to internalise that knowledge. It doesn’t feel like something a human should have to know. “I wish to God these regular expressions had been calculated by steam!”

Now I can get a chatbot with a large language model to write the regular expression for me. I still need to describe what I want, so I need to write the instructions clearly. But all the gobbledygook that I’m writing for a machine now gets written by a machine. That seems fair.

Mind you, I wouldn’t blindly trust the output. I’d take that regular expression and run it through a chatbot, maybe a different chatbot running on a different large language model. “Explain what this regular expression does,” would be my prompt. If my input into the first chatbot matches the output of the second, I’d have some confidence in using the regular expression.

A friend of mine told me about using a large language model to help write SQL statements. He described his database structure to the chatbot, and then described what he wanted to select.

Again, I wouldn’t use that output without checking it first. But again, I might use another chatbot to do that checking. “Explain what this SQL statement does.”

Playing chatbots off against each other like this is kinda how machine learning works under the hood: generative adverserial networks.

Of course, the task of having to validate the output of a chatbot by checking it with another chatbot could get quite tedious. “I wish to God these large language model outputs had been validated by steam!”

Sounds like a job for machines.

Disclosure

You know how when you’re on hold to any customer service line you hear a message that thanks you for calling and claims your call is important to them. The message always includes a disclaimer about calls possibly being recorded “for training purposes.”

Nobody expects that any training is ever actually going to happen—surely we would see some improvement if that kind of iterative feedback loop were actually in place. But we most certainly want to know that a call might be recorded. Recording a call without disclosure would be unethical and illegal.

Consider chatbots.

If you’re having a text-based (or maybe even voice-based) interaction with a customer service representative that doesn’t disclose its output is the result of large language models, that too would be unethical. But, at the present moment in time, it would be perfectly legal.

That needs to change.

I suspect the necessary legislation will pass in Europe first. We’ll see if the USA follows.

In a way, this goes back to my obsession with seamful design. With something as inherently varied as the output of large language models, it’s vital that people have some way of evaluating what they’re told. I believe we should be able to see as much of the plumbing as possible.

The bare minimum amount of transparency is revealing that a machine is in the loop.

This shouldn’t be a controversial take. But I guarantee we’ll see resistance from tech companies trying to sell their “AI” tools as seamless, indistinguishable drop-in replacements for human workers.

Guessing

The last talk at the last dConstruct was by local clever clogs Anil Seth. It was called Your Brain Hallucinates Your Conscious Reality. It’s well worth a listen.

Anil covers a lot of the same ground in his excellent book, Being You. He describes a model of consciousness that inverts our intuitive understanding.

We tend to think of our day-to-day reality in a fairly mechanical cybernetic manner; we receive inputs through our senses and then make decisions about reality informed by those inputs.

As another former dConstruct speaker, Adam Buxton, puts it in his interview with Anil, it feels like that old Beano cartoon, the Numskulls, with little decision-making homonculi inside our head.

But Anil posits that it works the other way around. We make a best guess of what the current state of reality is, and then we receive inputs from our senses, and then we adjust our model accordingly. There’s still a feedback loop, but cause and effect are flipped. First we predict or guess what’s happening, then we receive information. Rinse and repeat.

The book goes further and applies this to our very sense of self. We make a best guess of our sense of self and then adjust that model constantly based on our experiences.

There’s a natural tendency for us to balk at this proposition because it doesn’t seem rational. The rational model would be to make informed calculations based on available data …like computers do.

Maybe that’s what sets us apart from computers. Computers can make decisions based on data. But we can make guesses.

Enter machine learning and large language models. Now, for the first time, it appears that computers can make guesses.

The guess-making is not at all like what our brains do—large language models require enormous amounts of inputs before they can make a single guess—but still, this should be the breakthrough to be shouted from the rooftops: we’ve taught machines how to guess!

And yet. Almost every breathless press release touting some revitalised service that uses AI talks instead about accuracy. It would be far more honest to tout the really exceptional new feature: imagination.

Using AI, we will guess who should get a mortgage.

Using AI, we will guess who should get hired.

Using AI, we will guess who should get a strict prison sentence.

Reframed like that, it’s easy to see why technologists want to bury the lede.

Alas, this means that large language models are being put to use for exactly the wrong kind of scenarios.

(This, by the way, is also true of immersive “virtual reality” environments. Instead of trying to accurately recreate real-world places like meeting rooms, we should be leaning into the hallucinatory power of a technology that can generate dream-like situations where the pleasure comes from relinquishing control.)

Take search engines. They’re based entirely on trust and accuracy. Introducing a chatbot that confidentally conflates truth and fiction doesn’t bode well for the long-term reputation of that service.

But what if this is an interface problem?

Currently facts and guesses are presented with equal confidence, hence the accurate descriptions of the outputs as bullshit or mansplaining as a service.

What if the more fanciful guesses were marked as such?

As it is, there’s a “temperature” control that can be adjusted when generating these outputs; the more the dial is cranked, the further the outputs will stray from the safest predictions. What if that could be reflected in the output?

I don’t know what that would look like. It could be typographic—some markers to indicate which bits should be taken with pinches of salt. Or it could be through content design—phrases like “Perhaps…”, “Maybe…” or “It’s possible but unlikely that…”

I’m sure you’ve seen the outputs when people request that ChatGPT write their biography. Perfectly accurate statements are generated side-by-side with complete fabrications. This reinforces our scepticism of these tools. But imagine how differently the fabrications would read if they were preceded by some simple caveats.

A little bit of programmed humility could go a long way.

Right now, these chatbots are attempting to appear seamless. If 80% or 90% of their output is accurate, then blustering through the other 10% or 20% should be fine, right? But I think the experience for the end user would be immensely more empowering if these chatbots were designed seamfully. Expose the wires. Show the workings-out.

Mind you, that only works if there is some way to distinguish between fact and fabrication. If there’s no way to tell how much guessing is happening, then that’s a major problem. If you can’t tell me whether something is 50% true or 75% true or 25% true, then the only rational response is to treat the entire output as suspect.

I think there’s a fundamental misunderstanding behind the design of these chatbots that goes all the way back to the Turing test. There’s this idea that the way to make a chatbot believable and trustworthy is to make it appear human, attempting to hide the gears of the machine. But the real way to gain trust is through honesty.

I want a machine to tell me when it’s guessing. That won’t make me trust it less. Quite the opposite.

After all, to guess is human.

Like

We use metaphors all the time. To quote George Lakoff, we live by them.

We use analogies some of the time. They’re particularly useful when we’re wrapping our heads around something new. By comparing something novel to something familiar, we can make a shortcut to comprehension, or at least, categorisation.

But we need a certain amount of vigilance when it comes to analogies. Just because something is like something else doesn’t mean it’s the same.

With that in mind, here are some ways that people are describing generative machine learning tools. Large language models are like…

Bookshop

Back at the start of the (first) lockdown, I wrote about using my website as an outlet:

While you’re stuck inside, your website is not just a place you can go to, it’s a place you can control, a place you can maintain, a place you can tidy up, a place you can expand. Most of all, it’s a place you can lose yourself in, even if it’s just for a little while.

Last week was eventful and stressful. For everyone. I found myself once again taking refuge in my website, tinkering with its inner workings in the way that someone else would potter about in their shed or take to their garage to strip down the engine of some automotive device.

Colly drew my attention to Bookshop.org, newly launched in the UK. It’s an umbrella website for independent bookshops to sell through. It’s also got an affiliate scheme, much like Amazon. I set up a Bookshop page for myself.

I’ve been tracking the books I’m reading for the past three years here on my own website. I set about reproducing that list on Bookshop.

It was exactly the kind of not-exactly-mindless but definitely-not-challenging task that was perfect for the state of my brain last week. Search for a book; find the ISBN number; paste that number into a form. It’s the kind of task that a real programmer would immediately set about automating but one that I embraced as a kind of menial task to keep me occupied.

I wasn’t able to get a one-to-one match between the list on my site and my reading list on Bookshop. Some titles aren’t available in the online catalogue. For example, the book I’m reading right now—A Paradise Built in Hell by Rebecca Solnit—is nowhere to be found, which seems like an odd omission.

But most of the books I’ve read are there on Bookshop.org, complete with pretty book covers. Then I decided to reverse the process of my menial task. I took all of the ISBN numbers from Bookshop and add them as machine tags to my reading notes here on my own website. Book cover images on Bookshop have predictable URLs that use the ISBN number (well, technically the EAN number, or ISBN-13, but let’s not go down a 927 rabbit hole here). So now I’m using that metadata to pull in images from Bookshop.org to illustrate my reading notes here on adactio.com.

I’m linking to the corresponding book on Bookshop.org using this URL structure:

https://uk.bookshop.org/a/{{ affiliate code }}/{{ ISBN number }}

I realised that I could also link to the corresponding entry on Open Library using this URL structure:

https://openlibrary.org/isbn/{{ ISBN number }}

Here, for example, is my note for The Raven Tower by Ann Leckie. That entry has a tag:

book:ean=9780356506999

With that information I can illustrate my note with this image:

https://images-eu.bookshop.org/product-images/images/9780356506999.jpg

I’m linking off to this URL on Bookshop.org:

https://uk.bookshop.org/a/980/9780356506999

And this URL on Open Library:

https://openlibrary.org/isbn/9780356506999

The end result is that my reading list now has more links and pretty pictures.

Oh, I also set up a couple of shorter lists on Bookshop.org:

The books listed in those are drawn from my end of the year round-ups when I try to pick one favourite non-fiction book and one favourite work of fiction (almost always speculative fiction). The books in those two lists are the ones that get two hearty thumbs up from me. If you click through to buy one of them, the price might not be as cheap as on Amazon, but you’ll be supporting an independent bookshop.

The Machines Stop

The Situation feels like it’s changing. It’s not over, not by a long shot. But it feels like it’s entering a different, looser phase.

Throughout the lockdown, there’s been a strange symmetry between the outside world and the inside of our home. As the outside world slowed to a halt, so too did half the machinery in our flat. Our dishwasher broke shortly before the official lockdown began. So did our washing machine.

We had made plans for repairs and replacements, but as events in the world outside escalated, those plans had to be put on hold. Plumbers and engineers weren’t making any house calls, and rightly so.

We even had the gas to our stovetop cut off for a while—you can read Jessica’s account of that whole affair. All the breakdowns just added to the entropic Ballardian mood.

But the gas stovetop was fixed. And so too was the dishwasher, eventually. Just last week, we got our new washing machine installed. Piece by piece, the machinery of our interier world revived in lockstep with the resucitation of the world outside.

As of today, pubs will be open. I won’t be crossing their thresholds just yet. We know so much more about the spread of the virus now, and gatherings of people in indoor spaces are pretty much the worst environments for transmission.

I’m feeling more sanguine about outdoor spaces. Yesterday, Jessica and I went into town for Street Diner. It was the first time since March that we walked in that direction—our other excursions have been in the direction of the countryside.

It was perfectly fine. We wore masks, and while we were certainly in the minority, we were not alone. People were generally behaving responsibly.

Brighton hasn’t done too badly throughout The Situation. But still, like I said, I have no plans to head to the pub on a Saturday night. The British drinking culture is very much concentrated on weekends. Stay in all week and then on the weekend, lassen die Sau raus!, as the Germans would say.

After months of lockdown, reopening pubs on a Saturday seems like a terrible idea. Over in Ireland, pubs have been open since Monday—a sensible day to soft-launch. With plenty of precautions in place, things are going well there.

I’ve been watching The Situation in Ireland throughout. It’s where my mother lives, so I was understandably concerned. But they’ve handled everything really well. It’s not New Zealand, but it’s also not the disaster that is the UK.

It really has been like watching an A/B test run at the country level. Two very similar populations confronted with exactly the same crisis. Ireland took action early, cancelling the St. Patrick’s Day parade(!) while the UK was still merrily letting Cheltenham go ahead. Ireland had clear guidance. The UK had dilly-dallying and waffling. And when the shit really hit the fan, the Irish taoiseach rolled up his sleeves and returned to medical work. Meanwhile the UK had Dominic Cummings making a complete mockery of the sacrifices that everyone was told to endure.

What’s strange is that people here in the UK don’t seem to realise how the rest of the world, especially other European countries, have watched the response here with shock and horror. The narrative here seems to be that we all faced this thing together, and with our collective effort, we averted the worst. But the numbers tell a very different story. Comparing the numbers here with the numbers in Ireland—or pretty much any other country in Europe—is sobering.

So even though the timelines for reopenings here converge with Ireland’s, The Situation is far from over.

Even without any trips to pubs, restaurants, or other indoor spaces, I’m looking forward to making some more excursions into town. Not that it’s been bad staying at home. I’ve really quite enjoyed staying put, playing music, reading books, and watching television.

I was furloughed from work for a while in June. Normally, my work at this time of year would involve plenty of speaking at conferences. Seeing as that wasn’t happening, it made sense to take advantage of the government scheme to go into work hibernation for a bit.

I was worried I might feel at a bit of a loose end, but I actually really enjoyed it. The weather was good so I spent quite a bit of time just sitting in the back garden, reading (I am very, very grateful to have even a small garden). I listened to music. I watched movies. I surfed the web. Yes, properly surfed the web, going from link to link, get lost down rabbit holes. I tell you, this World Wide Web thing is pretty remarkable. Some days I used it to read up on science or philosophy. I spent a week immersed in Napoleonic history. I have no idea how or why. But it was great.

I’m back at work now, and have been for a couple of weeks. But I wouldn’t mind getting furloughed again. It felt kind of like being retired. I’m quite okay with the propsect of retirement now, as long as we have music and sunshine and the World Wide Web.

That’s the future. For now, The Situation continues, albeit in looser form.

I’ve really enjoyed reading other people’s accounts throughout. My RSS reader is getting a good workout. I always look forward to weeknotes from Alice, Nat, and Phil (this piece from Phil has really stuck with me). Jessica has written fifteen installments—and counting—of A Journal of the Plague Week. I know I’m biased, but I think it’s some mighty fine writing. Start here.

Voice User Interface Design by Cheryl Platz

Cheryl Platz is speaking at An Event Apart Chicago. Her inaugural An Event Apart presentation is all about voice interfaces, and I’m going to attempt to liveblog it…

Why make a voice interface?

Successful voice interfaces aren’t necessarily solving new problems. They’re used to solve problems that other devices have already solved. Think about kitchen timers. There are lots of ways to set a timer. Your oven might have one. Your phone has one. Why use a $200 device to solve this mundane problem? Same goes for listening to music, news, and weather.

People are using voice interfaces for solving ordinary problems. Why? Context matters. If you’re carrying a toddler, then setting a kitchen timer can be tricky so a voice-activated timer is quite appealing. But why is voice is happening now?

Humans have been developing the art of conversation for thousands of years. It’s one of the first skills we learn. It’s deeply instinctual. Most humans use speach instinctively every day. You can’t necessarily say that about using a keyboard or a mouse.

Voice-based user interfaces are not new. Not just the idea—which we’ve seen in Star Trek—but the actual implementation. Bell Labs had Audrey back in 1952. It recognised ten words—the digits zero through nine. Why did it take so long to get to Alexa?

In the late 70s, DARPA issued a challenge to create a voice-activated system. Carnagie Mellon came up with Harpy (with a thousand word grammar). But none of the solutions could respond in real time. In conversation, we expect a break of no more than 200 or 300 milliseconds.

In the 1980s, computing power couldn’t keep up with voice technology, so progress kind of stopped. Time passed. Things finally started to catch up in the 90s with things like Dragon Naturally Speaking. But that was still about vocabulary, not grammar. By the 2000s, small grammars were starting to show up—starting an X-Box or pausing Netflix. In 2008, Google Voice Search arrived on the iPhone and natural language interaction began to arrive.

What makes natural language interactions so special? It requires minimal training because it uses the conversational muscles we’ve been working for a lifetime. It unlocks the ability to have more forgiving, less robotic conversations with devices. There might be ten different ways to set a timer.

Natural language interactions can also free us from “screen magnetism”—that tendency to stay on a device even when our original task is complete. Voice also enables fast and forgiving searches of huge catalogues without time spent typing or browsing. You can pick a needle straight out of a haystack.

Natural language interactions are excellent for older customers. These interfaces don’t intimidate people without dexterity, vision, or digital experience. Voice input often leads to more inclusive experiences. Many customers with visual or physical disabilities can’t use traditional graphical interfaces. Voice experiences throw open the door of opportunity for some people. However, voice experience can exclude people with speech difficulties.

Making the case for voice interfaces

There’s a misconception that you need to work at Amazon, Google, or Apple to work on a voice interface, or at least that you need to have a big product team. But Cheryl was able to make her first Alexa “skill” in a week. If you’re a web developer, you’re good to go. Your voice “interaction model” is just JSON.

How do you get your product team on board? Find the customers (and situations) you might have excluded with traditional input. Tell the stories of people whose hands are full, or who are vision impaired. You can also point to the adoption rate numbers for smart speakers.

You’ll need to show your scenario in context. Otherwise people will ask, “why can’t we just build an app for this?” Conduct research to demonstrate the appeal of a voice interface. Storyboarding is very useful for visualising the context of use and highlighting existing pain points.

Getting started with voice interfaces

You’ve got to understand how the technology works in order to adapt to how it fails. Here are a few basic concepts.

Utterance. A word, phrase, or sentence spoken by a customer. This is the true form of what the customer provides.

Intent. This is the meaning behind a customer’s request. This is an important distinction because one intent could have thousands of different utterances.

Prompt. The text of a system response that will be provided to a customer. The audio version of a prompt, if needed, is generated separately using text to speech.

Grammar. A finite set of expected utterances. It’s a list. Usually, each entry in a grammar is paired with an intent. Many interfaces start out as being simple grammars before moving on to a machine-learning model later once the concept has been proven.

Here’s the general idea with “artificial intelligence”…

There’s a human with a core intent to do something in the real world, like knowing when the cookies in the oven are done. This is translated into an intent like, “set a 15 minute timer.” That’s the utterance that’s translated into a string. But it hasn’t yet been parsed as language. That string is passed into a natural language understanding system. What comes is a data structure that represents the customers goal e.g. intent=timer; duration=15 minutes. That’s sent to the business logic where a timer is actually step. For a good voice interface, you also want to send back a response e.g. “setting timer for 15 minutes starting now.”

That seems simple enough, right? What’s so hard about designing for voice?

Natural language interfaces are a form of artifical intelligence so it’s not deterministic. There’s a lot of ruling out false positives. Unlike graphical interfaces, voice interfaces are driven by probability.

How do you turn a sound wave into an understandable instruction? It’s a lot like teaching a child. You feed a lot of data into a statistical model. That’s how machine learning works. It’s a probability game. That’s where it gets interesting for design—given a bunch of possible options, we need to use context to zero in on the most correct choice. This is where confidence ratings come in: the system will return the probability that a response is correct. Effectively, the system is telling you how sure or not it is about possible results. If the customer makes a request in an unusual or unexpected way, our system is likely to guess incorrectly. That’s because the system is being given something new.

Designing a conversation is relatively straightforward. But 80% of your voice design time will be spent designing for what happens when things go wrong. In voice recognition, edge cases are front and centre.

Here’s another challenge. Interaction with most voice interfaces is part conversation, part performance. Most interactions are not private.

Humans don’t distinguish digital speech fom human speech. That means these devices are intrinsically social. Our brains our wired to try to extract social information, even form digital speech. See, for example, why it’s such a big question as to what gender a voice interface has.

Delivering a voice interface

Storyboards help depict the context of use. Sample dialogues are your new wireframes. These are little scripts that not only cover the happy path, but also your edge case. Then you reverse engineer from there.

Flow diagrams communicate customer states, but don’t use the actual text in them.

Prompt lists are your final deliverable.

Functional prototypes are really important for voice interfaces. You’ll learn the real way that customers will ask for things.

If you build a working prototype, you’ll be building two things: a natural language interaction model (often a JSON file) and custom business logic (in a programming language).

Eventually voice design will become a core competency, much like mobile, which was once separate.

Ask yourself what tasks your customers complete on your site that feel clunkly. Remember that voice desing is almost never about new scenarious. Start your journey into voice interfaces by tackling old problems in new, more inclusive ways.

May the voice be with you!

Optimise without a face

I’ve been playing around with the newly-released Squoosh, the spiritual successor to Jake’s SVGOMG. You can drag images into the browser window, and eyeball the changes that any optimisations might make.

On a project that Cassie is working on, it worked really well for optimising some JPEGs. But there were a few images that would require a bit more fine-grained control of the optimisations. Specifically, pictures with human faces in them.

I’ve written about this before. If there’s a human face in image, I open that image in a graphics editing tool like Photoshop, select everything but the face, and add a bit of blur. Because humans are hard-wired to focus on faces, we’ll notice any jaggy artifacts on a face, but we’re far less likely to notice jagginess in background imagery: walls, materials, clothing, etc.

On the face of it (hah!), a browser-based tool like Squoosh wouldn’t be able to optimise for faces, but then Cassie pointed out something really interesting…

When we were both at FFConf on Friday, there was a great talk by Eleanor Haproff on machine learning with JavaScript. It turns out there are plenty of smart toolkits out there, and one of them is facial recognition. So I wonder if it’s possible to build an in-browser tool with this workflow:

  • Drag or upload an image into the browser window,
  • A facial recognition algorithm finds any faces in the image,
  • Those portions of the image remain crisp,
  • The rest of the image gets a slight blur,
  • Download the optimised image.

Maybe the selecting/blurring part would need canvas? I don’t know.

Anyway, I thought this was a brilliant bit of synthesis from Cassie, and now I’ve got two questions:

  1. Does this exist yet? And, if not,
  2. Does anyone want to try building it?

Machine supplying

I wrote a little something recently about some inspiring projects that people are working on. Like Matt’s Machine Supply project. There’s a physical side to that project—a tweeting book-vending machine in London—but there’s also the newsletter, 3 Books Weekly.

I was honoured to be asked by Matt to contribute three book recommendations. That newsletter went out last week. Here’s what I said…

The Victorian Internet by Tom Standage

A book about the history of telegraphy might not sound like the most riveting read, but The Victorian Internet is both fascinating and entertaining. Techno-utopianism, moral panic, entirely new ways of working, and a world that has been utterly transformed: the parallels between the telegraph and the internet are laid bare. In fact, this book made me realise that while the internet has been a great accelerator, the telegraph was one of the few instances where a technology could truly be described as “disruptive.”

Ancillary Justice: 1 (Imperial Radch) by Ann Leckie

After I finished reading the final Iain M. Banks novel I was craving more galaxy-spanning space opera. The premise of Ancillary Justice with its description of “ship minds” led me to believe that this could be picking up the baton from the Culture series. It isn’t. This is an entirely different civilisation, one where song-collecting and tea ceremonies have as much value as weapons and spacecraft. Ancillary Justice probes at the deepest questions of identity, both cultural and personal. As well as being beautifully written, it’s also a rollicking good revenge thriller.

The City & The City by China Miéville

China Miéville’s books are hit-and-miss for me, but this one is a direct hit. The central premise of this noir-ish tale defies easy description, so I won’t even try. In fact, one of the great pleasures of this book is to feel the way your mind is subtly contorted by the author to accept a conceit that should be completely unacceptable. Usually when a book is described as “mind-altering” it’s a way of saying it has drug-like properties, but The City & The City is mind-altering in an entirely different and wholly unique way. If Borges and Calvino teamed up to find The Maltese Falcon, the result would be something like this.

When I sent off my recommendations, I told Matt:

Oh man, it was so hard to narrow this down! So many books I wanted to mention: Station 11, The Peripheral, The Gone-Away World, Glasshouse, Foucault’s Pendulum, Oryx and Crake, The Wind-up Girl …this was so much tougher than I thought it was going to be.

And Matt said:

Tell you what — if you’d be up for writing recommendations for another 3 books, from those ones you mentioned, I’d love to feature those in the machine!

Done!

Station Eleven by Emily St. John Mandel

Station Eleven made think about the purpose of art and culture. If art, as Brian Eno describes it, is “everything that you don’t have to do”, what happens to art when the civilisational chips are down? There are plenty of post-pandemic stories of societal collapse. But there’s something about this one that sets it apart. It doesn’t assume that humanity will inevitably revert to an existence that is nasty, brutish and short. It’s also a beautifully-written book. The opening chapter completely sucker-punched me.

Glasshouse by Charles Stross

On the face of it, this appears to be another post-Singularity romp in a post-scarcity society. It is, but it’s also a damning critique of gamification. Imagine the Stanford prison experiment if it were run by godlike experimenters. Stross’s Accelerando remains the definitive description of an unfolding Singularity, but Glasshouse is the one that has stayed with me.

The Gone-Away World by Nick Harkaway

This isn’t an easy book to describe, but it’s a very easy book to enjoy. A delightful tale of a terrifying apocalypse, The Gone-Away World has plenty of laughs to balance out the existential dread. Try not to fall in love with the charming childhood world of the narrator—you know it can’t last. But we’ll always have mimes and ninjas.

I must admit, it’s a really lovely feeling to get notified on Twitter when someone buys one of the recommended books.

Small pieces, loosely joined by machine tags

I’ve already described how machine tags on Huffduffer trigger a number of third-party API calls. Tagging something with music:artist=..., book:author=..., film:title=... or any number of similar machine tags will fire off calls to places like Amazon, The New York Times, or Last.fm.

For a while now, I’ve wanted to include Flickr in that list of third-party services but I couldn’t think of an easy way of associating audio files with photos. Then I realised that a mechanism already exists, and it’s another machine tag. Anything on Flickr that’s been tagged with lastfm:event=... will probably be a picture of a musical artist.

So if anything is tagged on Huffduffer with music:artist=..., all I need to do is fire off a call to Last.fm to get a list of that artist’s events using the method artist.getEvents. Once I have the event IDs I can search Flickr for photos that have been machine tagged with those IDs.

There’s just one problem. Last.fm’s API only returns future events for an artist. There’s no method for past events.

Undeterred, I found a RESTful interface that provides the past events of an artist on Last.fm. The format returned isn’t JSON or XML. It’s HTML. It turns out that past events are freely available in the profile for any artist on Last.fm with the identifier last.fm/music/{artist}/+events/{year}. Here, for example, are Salter Cane gigs in 2009: last.fm/music/Salter+Cane/+events/2009.

If only those events were structured in hCalendar! As it is, I have to run through all the links in the document to find the hrefs beginning with the string http://www.last.fm/event/ and then extract the event ID that immediately follows that string.

Once I’ve extracted the event IDs for an artist, I can fire off a search on Flickr using the flickr.photos.search method with a machine_tags parameter (as well as passing the artist name in the text parameter just to be sure).

Here’s an example result in the sidebar on Huffduffer: huffduffer.com/tags/music:artist=Bat+for+Lashes

It’s messy but it works. I guess that’s the dictionary definition of a hack.

Machine tag browsing

After I started rewarding machine tagging on Huffduffer with API calls to Amazon and Last.fm, people started using them quite a bit. But when it came to displaying tag clouds, I wasn’t treating machine tags any differently to other tags. Everything was being displayed in one big cloud.

I decided it would be good to separate out machine tags and display them after displaying “regular” tags. That started me thinking about how best to display machine tags.

One of the best machine tag visualisations I’ve seen so far is Paul Mison’s Flickr machine tag browser, somewhat like the list view in OS X’s Finder. Initially, I tried doing something similar for Huffduffer: a table with three columns; namespace, predicate, and values.

That morphed into a two column layout (predicate and values) with the namespace spanning both columns. The values themselves are still displayed as a cloud to indicate usage.

Huffduffer machine tags

This is marked up as a table. The namespace is in a th inside the thead. In the tbody, each tr contains a th for the predicate and td for the values.

<table>
 <thead>
  <tr>
   <th colspan="2"><a href="/tags/book">book</a></th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <th><a href="/tags/book:author">author</a></th>
   <td><a href="/tags/book:author=arthur+c.+clarke">arthur c. clarke</a></td>
  </tr>
 </tbody>
</table>

Table markup allows for some nice :hover styles (in browsers that allow :hover styles on more than links). Whenever you hover over a table cell, you are also hovering over a table row and a table. By setting :hover states on all three elements, wayfinding becomes a bit clearer.

table:hover thead th a
table tbody tr:hover th a
table tbody tr td a:hover

Huffduffer machine tags on hover

See for yourself. I think it’s a pretty sturdy markup and style pattern that I’ll probably use again.

Machine-tagging Huffduffer some more

After I wrote about the hoops I had to jump through to get Amazon’s API to output JSON (via XSLT), Tom detailed a way of avoiding JSON by using XML-RPC. That’s very kind of him but the truth is that:

  1. I like dealing with JSON and
  2. the XSL transformation is done by Amazon, not me; that wouldn’t be the case if I used XML-RPC.

Anyway, having successfully created a Huffduffer-Amazon bridge using machine tags, I thought I’d do a little more hacking. Instead of restricting the mashup love to Amazon, I figured that Last.fm would be the perfect place to pull in information for anything tagged with the music namespace.

Last.fm has quite a full-featured API and yes, it can output JSON. To start with, I’m using the artist.getInfo method for anything tagged with music:artist=..., music:singer=... or music:band=.... Here are some examples:

I’m pulling a summary of the artist’s bio, a list of similar artists and a picture of the artist in question. For maximum effect, view in Safari, the browser with the finest implementation of .

Nice as Last.fm’s API is, it’s not without its quirks. Like most APIs, the methods are divided into those that require (anything of a sensitive nature) and those that don’t (publicly available information). The method user.getInfo requires authentication. Yet, every piece of information returned by that method is available on the public profile.

So when I wanted to find a Last.fm user’s profile picture—having figured out through when someone on Huffduffer has a Last.fm account—it made far more sense for me to use to parse the microformatted public URL than to use the API method.

Just over two years ago, Drew delivered a superb presentation called In some situations, the answer is definitely “Yes.”

Update: It all ties together, as Julian explains on Twitter:

@adactio ha, I went to Drew’s presentation you mentioned on your blog; it made me add microformats to Last.fm in the first place :D

Machine-tagging Huffduffer

Over the weekend I was looking at the latest additions to Huffduffer. I noticed that Xavier Roy was using to tag a reading by Richard Dawkins. What an excellent idea!

I set aside a little time to do a little hacking with Amazon’s API. Now you can tag stuff on Huffduffer with machine tags like book:author=steven johnson, book:title=the invention of air or music:artist=my morning jacket. Other namespaces are film and movie. Anything matching that pattern will trigger a search on Amazon and display a list of results.

Amazon’s API was one of the first I ever messed about with, first on The Session and later on Adactio Elsewhere. There are things I really like about it and things I really dislike.

I dislike the fact that there’s no option to receive JSON instead of XML. However, one of the things I like is the option to pass the URL of an file to transform the XML (I wish more APIs offered that service). So even though JSON isn’t officially offered, it’s perfectly feasible to generate JSON from the combination of XML + XSL. That’s what I did for the Huffduffer hacking—I find it a lot easier to deal with JSON than XML in PHP5. If you fancy doing something similar, help yourself to my XSL file. It’s very basic but it could make a decent starting point.

But the thing I dislike the most about the Amazon API is the documentation. It’s not that there’s a lack of documentation. Far from it. It’s just not organised very well. I find it very hard to get the information I need, even when I know that the information is there somewhere. Flickr still leads the pack when it comes to API documentation. Amazon would do well to take a leaf out of Flickr’s documentation book (hope you’re listening, Jeff).

Welcome to the machine tag

At the same time that Flickr are demonstrating idiocy in the human resources department, they continue to do so some very cool stuff behind the scenes.

Aaron has been walking through some new API methods over on the Flickr code blog, quoting something I said in a chat with Steve Ivy:

something:somethingelse=somethingspecific

…which I don’t even remember saying but the shoe fits.

There’s something about the mix of rigidity and haphazardness in machine tags that appeals to me. While they all share the same structure, everyone is free to invent their own usage. If machine tags were required to go through a specification process we would have event:lastfm=... and event:upcoming=... instead of lastfm:event=... and upcoming:event=... but really, it simply doesn’t matter as long as people are actually doing the tagging.

With the introduction of these new API methods, it looks like there’s room to build more finely-tuned apps to pivot around namespaces, predicates and values.

Paul Mison has written an desktop-like machine tag browser which shows at a glance just how many different machine tag namespaces are out there. Quite a few pictures have been tagged with adactio:post=... since I first introduced the idea.

Machine Tags of Loving Grace

One of the highlights of Refresh Edinburgh for me was listening to Dan Champion give a presentation on his new site, Revish. He talked through the motivation, planning and production of the site. This was an absolute joy to listen to and it was filled with very valuable practical advice.

Revish is a book review site with a heavy dollop of social interaction. Even in its not-quite-finished state, it’s pushing all the right buttons with me:

  • The markup is clean, semantic and valid.
  • The layout is uncluttered and flexible.
  • The URL structure is logical.
  • The data is available through microformats, RSS and an API.

There’s some really smart stuff going on with the sign-up process. If your chosen username matches a Flickr username, it automatically grabs the buddy icon. At the sign-up stage you also have the option of globally disabling any Ajax on the site—an accessibility option that I advocate in my book. Truth be told, there isn’t yet any Ajax on the site but the availability of this option shows a lot of forethought.

Also at the sign-up stage, there’s a quick’n’dirty auto-discovery of contacts wherever there’s overlap with Revish usernames and your Flickr contacts. This is very cool—one small step toward portable social networks.

One of the features dovetails nicely with Richard’s recent discussion about machine tags ISBNs. If you tag a picture of a book on Flickr with book:isbn=[ISBN number], that picture will then show up on the corresponding Revish page. You can see it in action on the page for Bulletproof Ajax.

Oh, and don’t worry about whether a book has any reviews on Revish yet: the site uses Amazon’s API to pull in the basic book info. As long as a book has an ISBN, it has a page on Revish. So the Revish page for a book can effectively become a mashup of Amazon details and Flickr pictures (just take a look at the page for John’s new microformats book).

I like this format for machine tagging information related to books. As pointed out in a comment on Richard’s post, this opens up the way for plenty of other tagging like book:title="[book title]" and book:author="[author name]".

I’ve started to implement this machine tag format here. If you look at my last post—which has a whole list of books—you’ll see that I’ve tagged the post with a bunch of machine tags in the book:isbn format. By making a quick call to Amazon, I can pull in some information on each book. For now I’m just displaying a small cover image with a link through to the Amazon page.

That last entry is a bit of an extreme example; I’m assuming that most of the time I’ll be just adding one book machine tag to a post at most, probably to accompany a review.

Machine tags (or triple tags) is still a relatively young idea. Most of the structures so far have been emergent, like Upcoming and Last.fm’s event tags and my own blog post machine tags. There’s now a site dedicated to standardising on some namespaces—MachineTags.org has a blog, a wiki and a mailing list. Right now, the wiki has pages for existing conventions like geo tagging and drafts for events and book tagging. This will be an interesting space to watch.

Ghost in the Machine Tags

Richard has some very nifty ideas up his sleeve for the next iteration of his site. Some of these are design-related and some are technical. He just gave a peek into the technical side of things by explaining how he’s using tags to tie content together. Not just any old tags, mind: machine tags.

You may remember that Flickr rolled out machine tags a while back. That’s their name for what’s basically tripletags; tags that take the form of namespace:predicate=value. There’s some tight integration between Upcoming and Flickr using the machine tag upcoming:event=[ID]. You can see a looser coupling (one way rather than bi-directional) in the recently-updated events section of Last.fm which uses lastfm:event=[ID]. As an example, take a look at the page for a Low Lows concert I went to and took pictures of.

Richard is making use of machine tagging to associate his Flickr pictures with his blog posts. He’s also planning to use Amazon’s API to associate ISBN numbers with blog posts, raising the question of which namespace to use:

We therefore need a triple-tag version of the ISBN tag, and here’s my suggestion: iso:isbn=0713998393. ISBN is a standard recognised by the International Organisation for Standardization (ISO) so I thought it made a certain sense for ISO to be the namespace. Other standardised entities could be tagged in a similar way, such as iso:issn=15340295.

Seems like a sound idea to me. I might experiment with machine tagging reviews here in that way and then pulling in complementary information from Amazon.

But that’s for another day. For now, I’ve gone ahead and integrated Flickr machine tagging here… but this works from the opposite direction. Instead of tagging my blog posts with flickr:photo=[ID], I’m pulling in any photos on Flickr tagged with adactio:post=[ID].

Now, I’ve already been integrating Flickr pictures with my blog posts using regular “human” tags, but this is a bit different. For a start, to see the associations using the regular tags, you need to click a link (then the Hijax-y goodness takes over and shows any of my tagged photos without a page refresh). Also, this searches specifically for any of my photos that share a tag with my blog post. If I were to run a search on everyone’s photos, the amount of false positives would get really high. That’s not a bug; it’s a feature of the gloriously emergent nature of human tagging.

For the machine tagging, I can be a bit more confident. If a picture is tagged with adactio:post=1245, I can be pretty confident that it should be associated with http://adactio.com/journal/1245. If any matches are found, thumbnails of the photos are shown right after the blog post: no click required.

I’m not restricting the search to just my photos, either. Any photos tagged with adactio:post=[ID] will show up on http://adactio.com/journal/[ID]. In a way, I’m enabling comments on all my posts. But instead of text comments, anyone now has the ability to add photos that they think are related to a blog post of mine. Remember, it doesn’t even need to be your Flickr picture that you’re machine tagging: you can also machine tag photos from your contacts or anyone else who is allowing their pictures to be tagged.

I realise that I’m opening myself up for a whole new kind of spam. But any kind of spam that requires namespaced tagging on a third-party site is pretty dedicated. If someone actually goes to that much effort to put a thumbnail of an inappropriate image at the end of one of my blog posts, I probably wrote something particularly inflammatory in that post—which would make the associated thumbnail a valid comment, I guess.

Here are some examples of posts I’ve been machine tagging on Flickr:

Once again, like Upcoming and Last.fm, these are event-based. But the machine tagging would work equally well for location-based posts. So when I go up to Scotland next week and blog about it, I (or you or anybody) can then go to Flickr, find some nice pictures of Edinburgh and using the adactio namespace, associate the pictures with the blog post.

It’s a strange mixture of RESTful URLs here and taggable objects there.

If nothing else, this will be an interesting experiment. Machine tags don’t have the low barrier to entry of regular tagging but they aren’t as complex as something like RDF. It might be that they hit the sweet spot between accuracy and ease of use.

Oh, and if you find any Flickr pictures related to this blog post, tag them with adactio:post=1274.