Journal tags: learning

40

sparkline

Simon’s rule

I got a nice email from someone regarding my recent posts about performance on The Session. They said:

I hope this message finds you well. First and foremost, I want to express how impressed I am with the overall performance of https://thesession.org/. It’s a fantastic resource for music enthusiasts like me.

How nice! I responded, thanking them for the kind words.

They sent a follow-up clarification:

Awesome, anyway there was an issue in my message.

The line ‘It’s a fantastic resource for music enthusiasts like me.’ added by chatGPT and I didn’t notice.

I imagine this is what it feels like when you’re on a phone call with someone and towards the end of the call you hear a distinct flushing sound.

I wrote back and told them about Simon’s rule:

I will not publish anything that takes someone else longer to read than it took me to write.

That just feels so rude!

I think that’s a good rule.

Automation

I just described prototype code as code to be thrown away. On that topic…

I’ve been observing how people are programming with large language models and I’ve seen a few trends.

The first thing that just about everyone agrees on is that the code produced by a generative tool is not fit for public consumption. At least not straight away. It definitely needs to be checked and tested. If you enjoy debugging and doing code reviews, this might be right up your street.

The other option is to not use these tools for production code at all. Instead use them for throwaway code. That could be prototyping. But it could also be the code for those annoying admin tasks that you don’t do very often.

Take content migration. Say you need to grab a data dump, do some operations on the data to transform it in some way, and then pipe the results into a new content management system.

That’s almost certainly something you’d want to automate with bespoke code. Once the content migration is done, the code can be thrown away.

Read Matt’s account of coding up his Braggoscope. The code needed to spider a thousand web pages, extract data from those pages, find similarities, and output the newly-structured data in a different format.

I’ve noticed that these are just the kind of tasks that large language models are pretty good at. In effect you’re training the tool on your own very specific data and getting it to do your drudge work for you.

To me, it feels right that the usefulness happens on your own machine. You don’t put the machine-generated code in front of other humans.

Permission

Back when the web was young, it wasn’t yet clear what the rules were. Like, could you really just link to something without asking permission?

Then came some legal rulings to establish that, yes, on the web you can just link to anything without checking if it’s okay first.

What about search engines and directories? Technically they’re rifling through all the stuff we publish and reposting snippets of it. Is that okay?

Again, through some legal precedents—but mostly common agreement—everyone decided that on balance it was fine. After all, those snippets they publish are helping your site get traffic.

In short order, search came to rule the web. And Google came to rule search.

The mutually beneficial arrangement persisted uneasily. Despite Google’s search results pages getting worse and worse in recent years, the company’s huge market share of search means you generally want to be in their good books.

Google’s business model relies on us publishing web pages so that they can put ads around the search results linking to that content, and we rely on Google to send people to our websites by responding smartly to search queries.

That has now changed. Instead of responding to search queries by linking to the web pages we’ve made, Google is instead generating dodgy summaries rife with hallucina… lies (a psychic hotline, basically).

Google still benefits from us publishing web pages. We no longer benefit from Google slurping up those web pages.

With AI, tech has broken the web’s social contract:

Google has steadily been manoeuvring their search engine results to more and more replace the pages in the results.

As Chris puts it:

Me, I just think it’s fuckin’ rude.

Google is a portal to the web. Google is an amazing tool for finding relevant websites to go to. That was useful when it was made, and it’s nothing but grown in usefulness. Google should be encouraging and fighting for the open web. But now they’re like, actually we’re just going to suck up your website, put it in a blender with all other websites, and spit out word smoothies for people instead of sending them to your website. Instead.

Ben proposes an update to robots.txt that would allow us to specify licensing information:

Robots.txt needs an update for the 2020s. Instead of just saying what content can be indexed, it should also grant rights.

Like crawl my site only to provide search results not train your LLM.

It’s a solid proposal. But Google has absolutely no incentive to implement it. They hold all the power.

Or do they?

There is still the nuclear option in robots.txt:

User-agent: Googlebot
Disallow: /

That’s what Vasilis is doing:

I have been looking for ways to not allow companies to use my stuff without asking, and so far I coulnd’t find any. But since this policy change I realised that there is a simple one: block google’s bots from visiting your website.

The general consensus is that this is nuts. “If you don’t appear in Google’s results, you might as well not be on the web!” is the common cry.

I’m not so sure. At least when it comes to personal websites, search isn’t how people get to your site. They get to your site from RSS, newsletters, links shared on social media or on Slack.

And isn’t it an uncomfortable feeling to think that there’s a third party service that you absolutely must appease? It’s the same kind of justification used by people who are still on Twitter even though it’s now a right-wing transphobic cesspit. “If I’m not on Twitter, I might as well not be on the web!”

The situation with Google reminds me of what Robin said about Twitter:

The speed with which Twitter recedes in your mind will shock you. Like a demon from a folktale, the kind that only gains power when you invite it into your home, the platform melts like mist when that invitation is rescinded.

We can rescind our invitation to Google.

Talking about “web3” and “AI”

When I was hosting the DIBI conference in Edinburgh back in May, I moderated an impromptu panel on AI:

On the whole, it stayed quite grounded and mercifully free of hyperbole. Both speakers were treating the current crop of technologies as tools. Everyone agreed we were on the hype cycle, probably the peak of inflated expectations, looking forward to reaching the plateau of productivity.

Something else that happened at that event was that I met Deborah Dawton from the Design Business Association. She must’ve liked the cut of my jib because she invited me to come and speak at their get-together in Brighton on the topic of “AI, Web3 and design.”

The representative from the DBA who contacted me knew what they were letting themselves in for. They wrote:

I’ve read a few of your posts on the subject and it would be great if you could join us to share your perspectives.

How could I say no?

I’ve published a transcript of the short talk I gave.

Nailspotting

I’m sure you’ve heard the law of the instrument: when all you have is a hammer, everything looks like a nail.

There’s another side to it. If you’re selling hammers, you’ll depict a world full of nails.

Recent hammers include cryptobollocks and virtual reality. It wasn’t enough for blockchains and the metaverse to be potentially useful for some situations; they staked their reputations on being utterly transformative, disrupting absolutely every facet of life.

This kind of hype is a terrible strategy in the long-term. But if you can convince enough people in the short term, you can make a killing on the stock market. In truth, the technology itself is superfluous. It’s the hype that matters. And if the hype is over-inflated enough, you can even get your critics to do your work for you, broadcasting their fears about these supposedly world-changing technologies.

You’d think we’d learn. If an industry cries wolf enough times, surely we’d become less trusting of extraordinary claims. But the tech industry continues to cry wolf—or rather, “hammer!”—at regular intervals.

The latest hammer is machine learning, usually—incorrectly—referred to as Artificial Intelligence. What makes this hype cycle particularly infuriating is that there are genuine use cases. There are some nails for this hammer. They’re just not as plentiful as the breathless hype—both positive and negative—would have you believe.

When I was hosting the DiBi conference last week, there was a little section on generative “AI” tools. Matt Garbutt covered the visual side, demoing tools like Midjourney. Scott Salisbury covered the text side, showing how you can generate code. Afterwards we had a panel discussion.

During the panel I asked some fairly straightforward questions that nobody could answer. Who owns the input (the data used by these generative tools)? Who owns the output?

On the whole, it stayed quite grounded and mercifully free of hyperbole. Both speakers were treating the current crop of technologies as tools. Everyone agreed we were on the hype cycle, probably the peak of inflated expectations, looking forward to reaching the plateau of productivity.

Scott explicitly warned people off using generative tools for production code. His advice was to stick to side projects for now.

Matt took a closer look at where these tools could fit into your day-to-day design work. Mostly it was pretty sensible, except when he suggested that there could be any merit to using these tools as a replacement for user testing. That’s a terrible idea. A classic hammer/nail mismatch.

I think I moderated the panel reasonably well, but I have one regret. I wish I had first read Baldur Bjarnason’s new book, The Intelligence Illusion. I started reading it on the train journey back from Edinburgh but it would have been perfect for the panel.

The Intelligence Illusion is very level-headed. It is neither pro- nor anti-AI. Instead it takes a pragmatic look at both the benefits and the risks of using these tools in your business.

It has excellent advice for spotting genuine nails. For example:

Generative AI has impressive capabilities for converting and modifying seemingly unstructured data, such as prose, images, and audio. Using these tools for this purpose has less copyright risk, fewer legal risks, and is less error prone than using it to generate original output.

Think about transcripts of videos or podcasts—an excellent use of this technology. As Baldur puts it:

The safest and, probably, the most productive way to use generative AI is to not use it as generative AI. Instead, use it to explain, convert, or modify.

He also says:

Prefer internal tools over externally-facing chatbots.

That chimes with what I’ve been seeing. The most interesting uses of this technology that I’ve seen involve a constrained dataset. Like the way Luke trained a language model on his own content to create a useful chat interface.

Anyway, The Intelligence Illusion is full of practical down-to-earth advice based on plenty of research backed up with copious citations. I’m only halfway through it and it’s already helped me separate the hype from the reality.

Steam

Picture someone tediously going through a spreadsheet that someone else has filled in by hand and finding yet another error.

“I wish to God these calculations had been executed by steam!” they cry.

The year was 1821 and technically the spreadsheet was a book of logarithmic tables. The frustrated cry came from Charles Babbage, who channeled his frustration into a scheme to create the world’s first computer.

His difference engine didn’t work out. Neither did his analytical engine. He’d spend his later years taking his frustrations out on street musicians, which—as a former busker myself—earns him a hairy eyeball from me.

But we’ve all been there, right? Some tedious task that feels soul-destroying in its monotony. Surely this is exactly what machines should be doing?

I have a hunch that this is where machine learning and large language models might turn out to be most useful. Not in creating breathtaking works of creativity, but in menial tasks that nobody enjoys.

Someone was telling me earlier today about how they took a bunch of haphazard notes in a client meeting. When the meeting was done, they needed to organise those notes into a coherent summary. Boring! But ChatGPT handled it just fine.

I don’t think that use-case is going to appear on the cover of Wired magazine anytime soon but it might be a truer glimpse of the future than any of the breathless claims being eagerly bandied about in Silicon Valley.

You know the way we no longer remember phone numbers, because, well, why would we now that we have machines to remember them for us? I’d be quite happy if machines did that for the annoying little repetitive tasks that nobody enjoys.

I’ll give you an example based on my own experience.

Regular expressions are my kryptonite. I’m rubbish at them. Any time I have to figure one out, the knowledge seeps out of my brain before long. I think that’s because I kind of resent having to internalise that knowledge. It doesn’t feel like something a human should have to know. “I wish to God these regular expressions had been calculated by steam!”

Now I can get a chatbot with a large language model to write the regular expression for me. I still need to describe what I want, so I need to write the instructions clearly. But all the gobbledygook that I’m writing for a machine now gets written by a machine. That seems fair.

Mind you, I wouldn’t blindly trust the output. I’d take that regular expression and run it through a chatbot, maybe a different chatbot running on a different large language model. “Explain what this regular expression does,” would be my prompt. If my input into the first chatbot matches the output of the second, I’d have some confidence in using the regular expression.

A friend of mine told me about using a large language model to help write SQL statements. He described his database structure to the chatbot, and then described what he wanted to select.

Again, I wouldn’t use that output without checking it first. But again, I might use another chatbot to do that checking. “Explain what this SQL statement does.”

Playing chatbots off against each other like this is kinda how machine learning works under the hood: generative adverserial networks.

Of course, the task of having to validate the output of a chatbot by checking it with another chatbot could get quite tedious. “I wish to God these large language model outputs had been validated by steam!”

Sounds like a job for machines.

Disclosure

You know how when you’re on hold to any customer service line you hear a message that thanks you for calling and claims your call is important to them. The message always includes a disclaimer about calls possibly being recorded “for training purposes.”

Nobody expects that any training is ever actually going to happen—surely we would see some improvement if that kind of iterative feedback loop were actually in place. But we most certainly want to know that a call might be recorded. Recording a call without disclosure would be unethical and illegal.

Consider chatbots.

If you’re having a text-based (or maybe even voice-based) interaction with a customer service representative that doesn’t disclose its output is the result of large language models, that too would be unethical. But, at the present moment in time, it would be perfectly legal.

That needs to change.

I suspect the necessary legislation will pass in Europe first. We’ll see if the USA follows.

In a way, this goes back to my obsession with seamful design. With something as inherently varied as the output of large language models, it’s vital that people have some way of evaluating what they’re told. I believe we should be able to see as much of the plumbing as possible.

The bare minimum amount of transparency is revealing that a machine is in the loop.

This shouldn’t be a controversial take. But I guarantee we’ll see resistance from tech companies trying to sell their “AI” tools as seamless, indistinguishable drop-in replacements for human workers.

Guessing

The last talk at the last dConstruct was by local clever clogs Anil Seth. It was called Your Brain Hallucinates Your Conscious Reality. It’s well worth a listen.

Anil covers a lot of the same ground in his excellent book, Being You. He describes a model of consciousness that inverts our intuitive understanding.

We tend to think of our day-to-day reality in a fairly mechanical cybernetic manner; we receive inputs through our senses and then make decisions about reality informed by those inputs.

As another former dConstruct speaker, Adam Buxton, puts it in his interview with Anil, it feels like that old Beano cartoon, the Numskulls, with little decision-making homonculi inside our head.

But Anil posits that it works the other way around. We make a best guess of what the current state of reality is, and then we receive inputs from our senses, and then we adjust our model accordingly. There’s still a feedback loop, but cause and effect are flipped. First we predict or guess what’s happening, then we receive information. Rinse and repeat.

The book goes further and applies this to our very sense of self. We make a best guess of our sense of self and then adjust that model constantly based on our experiences.

There’s a natural tendency for us to balk at this proposition because it doesn’t seem rational. The rational model would be to make informed calculations based on available data …like computers do.

Maybe that’s what sets us apart from computers. Computers can make decisions based on data. But we can make guesses.

Enter machine learning and large language models. Now, for the first time, it appears that computers can make guesses.

The guess-making is not at all like what our brains do—large language models require enormous amounts of inputs before they can make a single guess—but still, this should be the breakthrough to be shouted from the rooftops: we’ve taught machines how to guess!

And yet. Almost every breathless press release touting some revitalised service that uses AI talks instead about accuracy. It would be far more honest to tout the really exceptional new feature: imagination.

Using AI, we will guess who should get a mortgage.

Using AI, we will guess who should get hired.

Using AI, we will guess who should get a strict prison sentence.

Reframed like that, it’s easy to see why technologists want to bury the lede.

Alas, this means that large language models are being put to use for exactly the wrong kind of scenarios.

(This, by the way, is also true of immersive “virtual reality” environments. Instead of trying to accurately recreate real-world places like meeting rooms, we should be leaning into the hallucinatory power of a technology that can generate dream-like situations where the pleasure comes from relinquishing control.)

Take search engines. They’re based entirely on trust and accuracy. Introducing a chatbot that confidentally conflates truth and fiction doesn’t bode well for the long-term reputation of that service.

But what if this is an interface problem?

Currently facts and guesses are presented with equal confidence, hence the accurate descriptions of the outputs as bullshit or mansplaining as a service.

What if the more fanciful guesses were marked as such?

As it is, there’s a “temperature” control that can be adjusted when generating these outputs; the more the dial is cranked, the further the outputs will stray from the safest predictions. What if that could be reflected in the output?

I don’t know what that would look like. It could be typographic—some markers to indicate which bits should be taken with pinches of salt. Or it could be through content design—phrases like “Perhaps…”, “Maybe…” or “It’s possible but unlikely that…”

I’m sure you’ve seen the outputs when people request that ChatGPT write their biography. Perfectly accurate statements are generated side-by-side with complete fabrications. This reinforces our scepticism of these tools. But imagine how differently the fabrications would read if they were preceded by some simple caveats.

A little bit of programmed humility could go a long way.

Right now, these chatbots are attempting to appear seamless. If 80% or 90% of their output is accurate, then blustering through the other 10% or 20% should be fine, right? But I think the experience for the end user would be immensely more empowering if these chatbots were designed seamfully. Expose the wires. Show the workings-out.

Mind you, that only works if there is some way to distinguish between fact and fabrication. If there’s no way to tell how much guessing is happening, then that’s a major problem. If you can’t tell me whether something is 50% true or 75% true or 25% true, then the only rational response is to treat the entire output as suspect.

I think there’s a fundamental misunderstanding behind the design of these chatbots that goes all the way back to the Turing test. There’s this idea that the way to make a chatbot believable and trustworthy is to make it appear human, attempting to hide the gears of the machine. But the real way to gain trust is through honesty.

I want a machine to tell me when it’s guessing. That won’t make me trust it less. Quite the opposite.

After all, to guess is human.

Like

We use metaphors all the time. To quote George Lakoff, we live by them.

We use analogies some of the time. They’re particularly useful when we’re wrapping our heads around something new. By comparing something novel to something familiar, we can make a shortcut to comprehension, or at least, categorisation.

But we need a certain amount of vigilance when it comes to analogies. Just because something is like something else doesn’t mean it’s the same.

With that in mind, here are some ways that people are describing generative machine learning tools. Large language models are like…

Media queries with display-mode

It’s said that the best way to learn about something is to teach it. I certainly found that to be true when I was writing the web.dev course on responsive design.

I felt fairly confident about some of the topics, but I felt somewhat out of my depth when it came to some of the newer modern additions to browsers. The last few modules in particular were unexplored areas for me, with topics like screen configurations and media features. I learned a lot about those topics by writing about them.

Best of all, I got to put my new-found knowledge to use! Here’s how…

The Session is a progressive web app. If you add it to the home screen of your mobile device, then when you launch the site by tapping on its icon, it behaves just like a native app.

In the web app manifest file for The Session, the display-mode property is set to “standalone.” That means it will launch without any browser chrome: no address bar and no back button. It’s up to me to provide the functionality that the browser usually takes care of.

So I added a back button in the navigation interface. It only appears on small screens.

Do you see the assumption I made?

I figured that the back button was most necessary in the situation where the site had been added to the home screen. That only happens on mobile devices, right?

Nope. If you’re using Chrome or Edge on a desktop device, you will be actively encourged to “install” The Session. If you do that, then just as on mobile, the site will behave like a standalone native app and launch without any browser chrome.

So desktop users who install the progressive web app don’t get any back button (because in my CSS I declare that the back button in the interface should only appear on small screens).

I was alerted to this issue on The Session:

It downloaded for me but there’s a bug, Jeremy - there doesn’t seem to be a way to go back.

Luckily, this happened as I was writing the module on media features. I knew exactly how to solve this problem because now I knew about the existence of the display-mode media feature. It allows you to write media queries that match the possible values of display-mode in a web app manifest:

.goback {
  display: none;
}
@media (display-mode: standalone) {
  .goback {
    display: inline;
  }
}

Now the back button shows up if you “install” The Session, regardless of whether that’s on mobile or desktop.

Previously I made the mistake of inferring whether or not to show the back button based on screen size. But the display-mode media feature allowed me to test the actual condition I cared about: is this user navigating in standalone mode?

If I hadn’t been writing about media features, I don’t think I would’ve been able to solve the problem. It’s a really good feeling when you’ve just learned something new, and then you immediately find exactly the right use case for it!

Faulty logic

I’m a fan of logical properties in CSS. As I wrote in the responsive design course on web.dev, they’re crucial for internationalisation.

Alaa Abd El-Rahim has written articles on CSS tricks about building multi-directional layouts and controlling layout in a multi-directional website. Not having to write separate stylesheets—or even separate rules—for different writing modes is great!

More than that though, I think understanding logical properties is the best way to truly understand CSS layout tools like grid and flexbox.

It’s like when you’re learning a new language. At some point your brain goes from translating from your mother tongue into the other language, and instead starts thinking in that other language. Likewise with CSS, as some point you want to stop translating “left” and “right” into “inline-start” and “inline-end” and instead start thinking in terms of inline and block dimensions.

As is so often the case with CSS, I think new features like these are easier to pick up if you’re new to the language. I had to unlearn using floats for layout and instead learn flexbox and grid. Someone learning layout from scatch can go straight to flexbox and grid without having to ditch the cognitive baggage of floats. Similarly, it’s going to take time for me to shed the baggage of directional properties and truly grok logical properties, but someone new to CSS can go straight to logical properties without passing through the directional stage.

Except we’re not quite there yet.

In order for logical properties to replace directional properties, they need to be implemented everywhere. Right now you can’t use logical properties inside a media query, for example:

@media (min-inline-size: 40em)

That wont’ work. You have to use the old-fashioned syntax:

@media (min-width: 40em)

Now you could rightly argue that in this instance we’re talking about the physical dimensions of the viewport. So maybe width and height make more sense than inline and block.

But then take a look at how the syntax for container queries is going to work. First you declare the axis that you want to be contained using the syntax from logical properties:

main {
  container-type: inline-size;
}

But then when you go to declare the actual container query, you have to use the corresponding directional property:

@container (min-width: 40em)

This won’t work:

@container (min-inline-size: 40em)

I kind of get why it won’t work: the syntax for container queries should match the syntax for media queries. But now the theory behind disallowing logical properties in media queries doesn’t hold up. When it comes to container queries, the physical layout of the viewport isn’t what matters.

I hope that both media queries and container queries will allow logical properties sooner rather than later. Until they fall in line, it’s impossible to make the jump fully to logical properties.

There are some other spots where logical properties haven’t been fully implemented yet, but I’m assuming that’s a matter of time. For example, in Firefox I can make a wide data table responsive by making its container side-swipeable on narrow screens:

.table-container {
  max-inline-size: 100%;
  overflow-inline: auto;
}

But overflow-inline and overflow-block aren’t supported in any other browsers. So I have to do this:

.table-container {
  max-inline-size: 100%;
  overflow-x: auto;
}

Frankly, mixing and matching logical properties with directional properties feels worse than not using logical properties at all. The inconsistency is icky. This feels old-fashioned but consistent:

.table-container {
  max-width: 100%;
  overflow-x: auto;
}

I don’t think there are any particular technical reasons why browsers haven’t implemented logical properties consistently. I suspect it’s more a matter of priorities. Fully implementing logical properties in a browser may seem like a nice-to-have bit of syntactic sugar while there are other more important web standard fish to fry.

But from the perspective of someone trying to use logical properties, the patchy rollout is frustrating.

User agents

I was on the podcast A Question Of Code recently. It was fun! The podcast is aimed at people who are making a career change into web development, so it’s right up my alley.

I sometimes get asked about what a new starter should learn. On the podcast, I mentioned a post I wrote a while back with links to some great resources and tutorials. As I said then:

For web development, start with HTML, then CSS, then JavaScript (and don’t move on to JavaScript too quickly—really get to grips with HTML and CSS first).

That’s assuming you want to be a good well-rounded web developer. But it might be that you need to get a job as quickly as possible. In that case, my advice would be very different. I would advise you to learn React.

Believe me, I take no pleasure in giving that advice. But given the reality of what recruiters are looking for, knowing React is going to increase your chances of getting a job (something that’s reflected in the curricula of coding schools). And it’s always possible to work backwards from React to the more fundamental web technologies of HTML, CSS, and JavaScript. I hope.

Regardless of your initial route, what’s the next step? How do you go from starting out in web development to being a top-notch web developer?

I don’t consider myself to be a top-notch web developer (far from it), but I am very fortunate in that I’ve had the opportunity to work alongside some tippety-top-notch developers at ClearleftTrys, Cassie, Danielle, Mark, Graham, Charlotte, Andy, and Natalie.

They—and other top-notch developers I’m fortunate to know—have something in common. They prioritise users. Sure, they’ll all have their favourite technologies and specialised areas, but they don’t lose sight of who they’re building for.

When you think about it, there’s quite a power imbalance between users and developers on the web. Users can—ideally—choose which web browser to use, and maybe make some preference changes if they know where to look, but that’s about it. Developers dictate everything else—the technology that a website will use, the sheer amount of code shipped over the network to the user, whether the site will be built in a fragile or a resilient way. Users are dependent on developers, but developers don’t always act in the best interests of users. It’s a classic example of the principal-agent problem:

The principal–agent problem, in political science and economics (also known as agency dilemma or the agency problem) occurs when one person or entity (the “agent”), is able to make decisions and/or take actions on behalf of, or that impact, another person or entity: the “principal”. This dilemma exists in circumstances where agents are motivated to act in their own best interests, which are contrary to those of their principals, and is an example of moral hazard.

A top-notch developer never forgets that they are an agent, and that the user is the principal.

But is it realistic to expect web developers to be so focused on user needs? After all, there’s a whole separate field of user experience design that specialises in this focus. It hardly seems practical to suggest that a top-notch developer needs to first become a good UX designer. There’s already plenty to focus on when it comes to just the technology side of front-end development.

So maybe this is too simplistic a way of defining the principle-agent relationship between users and developers:

user :: developer

There’s something that sits in between, mediating that relationship. It’s a piece of software that in the world of web standards is even referred to as a “user agent”: the web browser.

user :: web browser :: developer

So if making the leap to understanding users seems too much of a stretch, there’s an intermediate step. Get to know how web browsers work. As a web developer, if you know what web browsers “like” and “dislike”, you’re well on the way to making great user experiences. If you understand the pain points for browser when they’re parsing and rendering your code, you’ve got a pretty good proxy for understanding the pain points that your users are experiencing.

Getting started

I got an email recently from a young person looking to get into web development. They wanted to know what languages they should start with, whether they should a Mac or a Windows PC, and what some places to learn from.

I wrote back, saying this about languages:

For web development, start with HTML, then CSS, then JavaScript (and don’t move on to JavaScript too quickly—really get to grips with HTML and CSS first).

And this is what I said about hardware and software:

It doesn’t matter whether you use a Mac or a Windows PC, as long as you’ve got an internet connection, some web browsers (Chrome, Firefox, for example) and a text editor. There are some very good free text editors available for Mac and PC:

For resources, I had a trawl through links I’ve tagged with “learning” and “html” and sent along some links to free online tutorials:

After sending that email, I figured that this list might be useful to anyone else looking to start out in web development. If you know of anyone in that situation, I hope this list might help.

Voice User Interface Design by Cheryl Platz

Cheryl Platz is speaking at An Event Apart Chicago. Her inaugural An Event Apart presentation is all about voice interfaces, and I’m going to attempt to liveblog it…

Why make a voice interface?

Successful voice interfaces aren’t necessarily solving new problems. They’re used to solve problems that other devices have already solved. Think about kitchen timers. There are lots of ways to set a timer. Your oven might have one. Your phone has one. Why use a $200 device to solve this mundane problem? Same goes for listening to music, news, and weather.

People are using voice interfaces for solving ordinary problems. Why? Context matters. If you’re carrying a toddler, then setting a kitchen timer can be tricky so a voice-activated timer is quite appealing. But why is voice is happening now?

Humans have been developing the art of conversation for thousands of years. It’s one of the first skills we learn. It’s deeply instinctual. Most humans use speach instinctively every day. You can’t necessarily say that about using a keyboard or a mouse.

Voice-based user interfaces are not new. Not just the idea—which we’ve seen in Star Trek—but the actual implementation. Bell Labs had Audrey back in 1952. It recognised ten words—the digits zero through nine. Why did it take so long to get to Alexa?

In the late 70s, DARPA issued a challenge to create a voice-activated system. Carnagie Mellon came up with Harpy (with a thousand word grammar). But none of the solutions could respond in real time. In conversation, we expect a break of no more than 200 or 300 milliseconds.

In the 1980s, computing power couldn’t keep up with voice technology, so progress kind of stopped. Time passed. Things finally started to catch up in the 90s with things like Dragon Naturally Speaking. But that was still about vocabulary, not grammar. By the 2000s, small grammars were starting to show up—starting an X-Box or pausing Netflix. In 2008, Google Voice Search arrived on the iPhone and natural language interaction began to arrive.

What makes natural language interactions so special? It requires minimal training because it uses the conversational muscles we’ve been working for a lifetime. It unlocks the ability to have more forgiving, less robotic conversations with devices. There might be ten different ways to set a timer.

Natural language interactions can also free us from “screen magnetism”—that tendency to stay on a device even when our original task is complete. Voice also enables fast and forgiving searches of huge catalogues without time spent typing or browsing. You can pick a needle straight out of a haystack.

Natural language interactions are excellent for older customers. These interfaces don’t intimidate people without dexterity, vision, or digital experience. Voice input often leads to more inclusive experiences. Many customers with visual or physical disabilities can’t use traditional graphical interfaces. Voice experiences throw open the door of opportunity for some people. However, voice experience can exclude people with speech difficulties.

Making the case for voice interfaces

There’s a misconception that you need to work at Amazon, Google, or Apple to work on a voice interface, or at least that you need to have a big product team. But Cheryl was able to make her first Alexa “skill” in a week. If you’re a web developer, you’re good to go. Your voice “interaction model” is just JSON.

How do you get your product team on board? Find the customers (and situations) you might have excluded with traditional input. Tell the stories of people whose hands are full, or who are vision impaired. You can also point to the adoption rate numbers for smart speakers.

You’ll need to show your scenario in context. Otherwise people will ask, “why can’t we just build an app for this?” Conduct research to demonstrate the appeal of a voice interface. Storyboarding is very useful for visualising the context of use and highlighting existing pain points.

Getting started with voice interfaces

You’ve got to understand how the technology works in order to adapt to how it fails. Here are a few basic concepts.

Utterance. A word, phrase, or sentence spoken by a customer. This is the true form of what the customer provides.

Intent. This is the meaning behind a customer’s request. This is an important distinction because one intent could have thousands of different utterances.

Prompt. The text of a system response that will be provided to a customer. The audio version of a prompt, if needed, is generated separately using text to speech.

Grammar. A finite set of expected utterances. It’s a list. Usually, each entry in a grammar is paired with an intent. Many interfaces start out as being simple grammars before moving on to a machine-learning model later once the concept has been proven.

Here’s the general idea with “artificial intelligence”…

There’s a human with a core intent to do something in the real world, like knowing when the cookies in the oven are done. This is translated into an intent like, “set a 15 minute timer.” That’s the utterance that’s translated into a string. But it hasn’t yet been parsed as language. That string is passed into a natural language understanding system. What comes is a data structure that represents the customers goal e.g. intent=timer; duration=15 minutes. That’s sent to the business logic where a timer is actually step. For a good voice interface, you also want to send back a response e.g. “setting timer for 15 minutes starting now.”

That seems simple enough, right? What’s so hard about designing for voice?

Natural language interfaces are a form of artifical intelligence so it’s not deterministic. There’s a lot of ruling out false positives. Unlike graphical interfaces, voice interfaces are driven by probability.

How do you turn a sound wave into an understandable instruction? It’s a lot like teaching a child. You feed a lot of data into a statistical model. That’s how machine learning works. It’s a probability game. That’s where it gets interesting for design—given a bunch of possible options, we need to use context to zero in on the most correct choice. This is where confidence ratings come in: the system will return the probability that a response is correct. Effectively, the system is telling you how sure or not it is about possible results. If the customer makes a request in an unusual or unexpected way, our system is likely to guess incorrectly. That’s because the system is being given something new.

Designing a conversation is relatively straightforward. But 80% of your voice design time will be spent designing for what happens when things go wrong. In voice recognition, edge cases are front and centre.

Here’s another challenge. Interaction with most voice interfaces is part conversation, part performance. Most interactions are not private.

Humans don’t distinguish digital speech fom human speech. That means these devices are intrinsically social. Our brains our wired to try to extract social information, even form digital speech. See, for example, why it’s such a big question as to what gender a voice interface has.

Delivering a voice interface

Storyboards help depict the context of use. Sample dialogues are your new wireframes. These are little scripts that not only cover the happy path, but also your edge case. Then you reverse engineer from there.

Flow diagrams communicate customer states, but don’t use the actual text in them.

Prompt lists are your final deliverable.

Functional prototypes are really important for voice interfaces. You’ll learn the real way that customers will ask for things.

If you build a working prototype, you’ll be building two things: a natural language interaction model (often a JSON file) and custom business logic (in a programming language).

Eventually voice design will become a core competency, much like mobile, which was once separate.

Ask yourself what tasks your customers complete on your site that feel clunkly. Remember that voice desing is almost never about new scenarious. Start your journey into voice interfaces by tackling old problems in new, more inclusive ways.

May the voice be with you!

Going Offline—the talk of the book

I gave a new talk at An Event Apart in Seattle yesterday morning. The talk was called Going Offline, which the eagle-eyed amongst you will recognise as the title of my most recent book, all about service workers.

I was quite nervous about this talk. It’s very different from my usual fare. Usually I have some big sweeping arc of history, and lots of pretentious ideas joined together into some kind of narrative arc. But this talk needed to be more straightforward and practical. I wasn’t sure how well I would manage that brief.

I knew from pretty early on that I was going to show—and explain—some code examples. Those were the parts I sweated over the most. I knew I’d be presenting to a mixed audience of designers, developers, and other web professionals. I couldn’t assume too much existing knowledge. At the same time, I didn’t want to teach anyone to such eggs.

In the end, there was an overarching meta-theme to talk, which was this: logic is more important than code. In other words, figuring out what you’re trying to accomplish (and describing it clearly) is more important than typing curly braces and semi-colons. Programming is an act of translation. Before you can translate something, you need to be able to articulate it clearly in your own language first. By emphasising that point, I hoped to make the code less overwhelming to people unfamilar with it.

I had tested the talk with some of my Clearleft colleagues, and they gave me great feedback. But I never know until I’ve actually given a talk in front of a real conference audience whether the talk is any good or not. Now that I’ve given the talk, and received more feedback, I think I can confidentally say that it’s pretty damn good.

My goal was to explain some fairly gnarly concepts—let’s face it: service workers are downright weird, and not the easiest thing to get your head around—and to leave the audience with two feelings:

  1. This is exciting, and
  2. This is something I can do today.

I deliberately left time for questions, bribing people with free copies of my book. I got some great questions, and I may incorporate some of them into future versions of this talk (conference organisers, if this sounds like the kind of talk you’d like at your event, please get in touch). Some of the points brought up in the questions were:

  • Is there some kind of wizard for creating a typical service worker script for any site? I didn’t have a direct answer to this, but I have attempted to make a minimal viable service worker that could be used for just about any site. Mostly I encouraged the questioner to roll their sleeves up and try writing a bespoke script. I also mentioned the Workbox library, but I gave my opinion that if you’re going to spend the time to learn the library, you may as well spend the time to learn the underlying language.
  • What are some state-of-the-art progressive web apps for offline user experiences? Ooh, this one kind of stumped me. I mean, the obvious poster children for progressive webs apps are things like Twitter, Instagram, and Pinterest. They’re all great but the offline experience is somewhat limited. To be honest, I think there’s more potential for great offline experiences by publishers. I especially love the pattern on personal sites like Una’s and Sara’s where people can choose to save articles offline to read later—like a bespoke Instapaper or Pocket. I’d love so see that pattern adopted by some big publications. I particularly like that gives so much more control directly to the end user. Instead of trying to guess what kind of offline experience they want, we give them the tools to craft their own.
  • Do caches get cleaned up automatically? Great question! And the answer is mostly no—although browsers do have their own heuristics about how much space you get to play with. There’s a whole chapter in my book about being a good citizen and cleaning up your caches, but I didn’t include that in the talk because it isn’t exactly exciting: “Hey everyone! Now we’re going to do some housekeeping—yay!”
  • Isn’t there potential for abuse here? This is related to the previous question, and it’s another great question to ask of any technology. In short, yes. Bad actors could use service workers to fill up caches uneccesarily. I’ve written about back door service workers too, although the real problem there is with iframes rather than service workers—iframes and cookies are technologies that are already being abused by bad actors, and we’re going to see more and more interventions by ethical browser makers (like Mozilla) to clamp down on those technologies …just as browsers had to clamp down on the abuse of pop-up windows in the early days of JavaScript. The cache API could become a tragedy of the commons. I liken the situation to regulation: we should self-regulate, but if we prove ourselves incapable of that, then outside regulation (by browsers) will be imposed upon us.
  • What kind of things are in the future for service workers? Excellent question! If you think about it, a service worker is kind of a conduit that gives you access to different APIs: the Cache API and the Fetch API being the main ones now. A service worker is like an airport and the APIs are like the airlines. There are other APIs that you can access through service workers. Notifications are available now on desktop and on Android, and they’ll be coming to iOS soon. Background Sync is another powerful API accessed through service workers that will get more and more browser support over time. The great thing is that you can start using these APIs today even if they aren’t universally supported. Then, over time, more and more of your users will benefit from those enhancements.

If you attended the talk and want to learn more about about service workers, there’s my book (obvs), but I’ve also written lots of blog posts about service workers and I’ve linked to lots of resources too.

Finally, here’s a list of links to all the books, sites, and articles I referenced in my talk…

Books

Sites

Progressive Web Apps

Optimise without a face

I’ve been playing around with the newly-released Squoosh, the spiritual successor to Jake’s SVGOMG. You can drag images into the browser window, and eyeball the changes that any optimisations might make.

On a project that Cassie is working on, it worked really well for optimising some JPEGs. But there were a few images that would require a bit more fine-grained control of the optimisations. Specifically, pictures with human faces in them.

I’ve written about this before. If there’s a human face in image, I open that image in a graphics editing tool like Photoshop, select everything but the face, and add a bit of blur. Because humans are hard-wired to focus on faces, we’ll notice any jaggy artifacts on a face, but we’re far less likely to notice jagginess in background imagery: walls, materials, clothing, etc.

On the face of it (hah!), a browser-based tool like Squoosh wouldn’t be able to optimise for faces, but then Cassie pointed out something really interesting…

When we were both at FFConf on Friday, there was a great talk by Eleanor Haproff on machine learning with JavaScript. It turns out there are plenty of smart toolkits out there, and one of them is facial recognition. So I wonder if it’s possible to build an in-browser tool with this workflow:

  • Drag or upload an image into the browser window,
  • A facial recognition algorithm finds any faces in the image,
  • Those portions of the image remain crisp,
  • The rest of the image gets a slight blur,
  • Download the optimised image.

Maybe the selecting/blurring part would need canvas? I don’t know.

Anyway, I thought this was a brilliant bit of synthesis from Cassie, and now I’ve got two questions:

  1. Does this exist yet? And, if not,
  2. Does anyone want to try building it?

Praise for Going Offline

I’m very, very happy to see that my new book Going Offline is proving to be accessible and unintimidating to a wide audience—that was very much my goal when writing it.

People have been saying nice things on their blogs, which is very gratifying. It’s even more gratifying to see people use the knowledge gained from reading the book to turn those blogs into progressive web apps!

Sara Soueidan:

It doesn’t matter if you’re a designer, a junior developer or an experienced engineer — this book is perfect for anyone who wants to learn about Service Workers and take their Web application to a whole new level.

I highly recommend it. I read the book over the course of two days, but it can easily be read in half a day. And as someone who rarely ever reads a book cover to cover (I tend to quit halfway through most books), this says a lot about how good it is.

Eric Lawrence:

I was delighted to discover a straightforward, very approachable reference on designing a ServiceWorker-backed application: Going Offline by Jeremy Keith. The book is short (I’m busy), direct (“Here’s a problem, here’s how to solve it“), opinionated in the best way (landmine-avoiding “Do this“), and humorous without being confusing. As anyone who has received unsolicited (or solicited) feedback from me about their book knows, I’m an extremely picky reader, and I have no significant complaints on this one. Highly recommended.

Ben Nadel:

If you’re interested in the “offline first” movement or want to learn more about Service Workers, Going Offline by Jeremy Keith is a really gentle and highly accessible introduction to the topic.

Daniel Koskine:

Jeremy nails it again with this beginner-friendly introduction to Service Workers and Progressive Web Apps.

Donny Truong

Jeremy’s technical writing is as superb as always. Similar to his first book for A Book Apart, which cleared up all my confusions about HTML5, Going Offline helps me put the pieces of the service workers’ puzzle together.

People have been saying nice things on Twitter too…

Aaron Gustafson:

It’s a fantastic read and a simple primer for getting Service Workers up and running on your site.

Ethan Marcotte:

Of course, if you’re looking to take your website offline, you should read @adactio’s wonderful book

Lívia De Paula Labate:

Ok, I’m done reading @adactio’s Going Offline book and as my wife would say, it’s the bomb dot com.

If that all sounds good to you, get yourself a copy of Going Offline in paperbook, or ebook (or both).

Frustration

I had some problems with my bouzouki recently. Now, I know my bouzouki pretty well. I can navigate the strings and frets to make music. But this was a problem with the pickup under the saddle of the bouzouki’s bridge. So it wasn’t so much a musical problem as it was an electronics problem. I know nothing about electronics.

I found it incredibly frustrating. Not only did I have no idea how to fix the problem, but I also had no idea of the scope of the problem. Would it take five minutes or five days? Who knows? Not me.

My solution to a problem like this is to pay someone else to fix it. Even then I have to go through the process of having the problem explained to me by someone who understands and cares about electronics much more than me. I nod my head and try my best to look like I’m taking it all in, even though the truth is I have no particular desire to get to grips with the inner workings of pickups—I just want to make some music.

That feeling of frustration I get from having wiring issues with a musical instrument is the same feeling I get whenever something goes awry with my web server. I know just enough about servers to be dangerous. When something goes wrong, I feel very out of my depth, and again, I have no idea how long it will take the fix the problem: minutes, hours, days, or weeks.

I had a very bad day yesterday. I wanted to make a small change to the Clearleft website—one extra line of CSS. But the build process for the website is quite convoluted (and clever), automatically pulling in components from the site’s pattern library. Something somewhere in the pipeline went wrong—I still haven’t figured out what—and for a while there, the Clearleft website was down, thanks to me. (Luckily for me, Danielle saved the day …again. I’d be lost without her.)

I was feeling pretty down after that stressful day. I felt like an idiot for not knowing or understanding the wiring beneath the site.

But, on the other hand, considering I was only trying to edit a little bit of CSS, maybe the problem didn’t lie entirely with me.

There’s a principle underlying the architecture of the World Wide Web called The Rule of Least Power. It somewhat counterintuitively states that you should:

choose the least powerful language suitable for a given purpose.

Perhaps, given the relative simplicity of the task I was trying to accomplish, the plumbing was over-engineered. That complexity wouldn’t matter if I could circumvent it, but without the build process, there’s no way to change the markup, CSS, or JavaScript for the site.

Still, most of the time, the build process isn’t a hindrance, it’s a help: concatenation, minification, linting and all that good stuff. Most of my frustration when something in the wiring goes wrong is because of how it makes me feel …just like with the pickup in my bouzouki, or the server powering my website. It’s not just that I find this stuff hard, but that I also feel like it’s stuff I’m supposed to know, rather than stuff I want to know.

On that note…

Last week, Paul wrote about getting to grips with JavaScript. On the very same day, Brad wrote about his struggle to learn React.

I think it’s really, really, really great when people share their frustrations and struggles like this. It’s very reassuring for anyone else out there who’s feeling similarly frustrated who’s worried that the problem lies with them. Also, this kind of confessional feedback is absolute gold dust for anyone looking to write explanations or documentation for JavaScript or React while battling the curse of knowledge. As Paul says:

The challenge now is to remember the pain and anguish I endured, and bare that in mind when helping others find their own path through the knotted weeds of JavaScript.

Technical balance

Two technical editors worked with me on Going Offline.

Jake was one of the tech editors. He literally (co-)wrote the spec on service workers. There ain’t nuthin’ he don’t know about the code involved. His job was to catch any technical inaccuracies in my writing.

The other tech editor was Amber. She’s relatively new to web development. While I was writing the book, she had a solid grounding in HTML and CSS, but not much experience in JavaScript. That made her the perfect archetypal reader. Her job was to point out whenever I wasn’t explaining something clearly enough.

My job was to satisfy both of them. I needed to explain service workers and all its associated APIs. I also needed to make it approachable and understandable to people who haven’t dived deeply into JavaScript.

I deliberately didn’t wait until I was an expert in this topic before writing Going Offline. I knew that the more familiar I became with the ins-and-outs of getting a service worker up and running, the harder it would be for me to remember what it was like not to know that stuff. I figured the best way to avoid the curse of knowledge would be not to accrue too much of it. But then once I started researching and writing, I inevitably became more au fait with the topic. I had to try to battle against that, trying to keep a beginner’s mind.

My watchword was this great piece of advice from Codebar:

Assume that anyone you’re teaching has no knowledge but infinite intelligence.

It was tricky. I’m still not sure if I managed to pull off the balancing act, although early reports are very, very encouraging. You’ll be able to judge for yourself soon enough. The book is shipping at the start of next week. Get your order in now.

The audience for Going Offline

My new book, Going Offline, starts with no assumption of JavaScript knowledge, but by the end of the book the reader is armed with enough code to make any website work offline.

I didn’t want to overwhelm the reader with lots of code up front, so I’ve tried to dole it out in manageable chunks. The amount of code ramps up a little bit in each chapter until it peaks in chapter five. After that, it ramps down a bit with each subsequent chapter.

This tweet perfectly encapsulates the audience I had in mind for the book:

Some people have received advance copies of the PDF, and I’m very happy with the feedback I’m getting.

Honestly, that is so, so gratifying to hear!

Words cannot express how delighted I am with Sara’s reaction:

She’s walking the walk too:

That gives me a warm fuzzy glow!

If you’ve been nervous about service workers, but you’ve always wanted to turn your site into a progressive web app, you should get a copy of this book.