Tuesday, November 13th, 2018

Optimise without a face

I’ve been playing around with the newly-released Squoosh, the spiritual successor to Jake’s SVGOMG. You can drag images into the browser window, and eyeball the changes that any optimisations might make.

On a project that Cassie is working on, it worked really well for optimising some JPEGs. But there were a few images that would require a bit more fine-grained control of the optimisations. Specifically, pictures with human faces in them.

I’ve written about this before. If there’s a human face in an image, I open that image in a graphics editing tool like Photoshop, select everything but the face, and add a bit of blur. Because humans are hard-wired to focus on faces, we’ll notice any jaggy artifacts on a face, but we’re far less likely to notice jagginess in background imagery: walls, materials, clothing, etc.

On the face of it (hah!), a browser-based tool like Squoosh wouldn’t be able to optimise for faces, but then Cassie pointed out something really interesting…

When we were both at FFConf on Friday, there was a great talk by Eleanor Haproff on machine learning with JavaScript. It turns out there are plenty of smart toolkits out there, and one of them is facial recognition. So I wonder if it’s possible to build an in-browser tool with this workflow:

  • Drag or upload an image into the browser window,
  • A facial recognition algorithm finds any faces in the image,
  • Those portions of the image remain crisp,
  • The rest of the image gets a slight blur,
  • Download the optimised image.

Maybe the selecting/blurring part would need canvas? I don’t know.
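
Something along those lines might work with face-api.js for the detection step (any toolkit that returns bounding boxes would do) and canvas for the blurring. This is just a sketch; the model path, blur radius, and output quality are placeholder values:

```javascript
// Sketch: keep detected faces crisp, blur everything else, then export.
// Assumes face-api.js is loaded and its models are served from /models.
async function blurAroundFaces(img) {
  await faceapi.nets.tinyFaceDetector.loadFromUri('/models');
  const faces = await faceapi.detectAllFaces(img, new faceapi.TinyFaceDetectorOptions());

  const canvas = document.createElement('canvas');
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;
  const ctx = canvas.getContext('2d');

  // Draw the whole image with a slight blur...
  ctx.filter = 'blur(2px)';
  ctx.drawImage(img, 0, 0);

  // ...then redraw each detected face region crisply on top.
  ctx.filter = 'none';
  for (const { box } of faces) {
    ctx.drawImage(img, box.x, box.y, box.width, box.height,
                       box.x, box.y, box.width, box.height);
  }

  // Hand back a blob, ready to be offered as a download.
  return new Promise(resolve => canvas.toBlob(resolve, 'image/jpeg', 0.8));
}
```

The download step would then just be a matter of pointing an anchor’s download attribute at URL.createObjectURL() of the resulting blob.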

Anyway, I thought this was a brilliant bit of synthesis from Cassie, and now I’ve got two questions:

  1. Does this exist yet? And, if not,
  2. Does anyone want to try building it?

Monday, November 12th, 2018

Squoosh

A handy in-browser image compression tool. Drag, drop, tweak, and export.

Saturday, November 10th, 2018

Webmentions at Indie Web Camp Berlin

I was in Berlin for most of last week, and every day was packed with activity.

By the time I got back to Brighton, my brain was full …just in time for FF Conf.

All of the events were very different, but equally enjoyable. It was also quite nice to just attend events without speaking at them.

Indie Web Camp Berlin was terrific. There was an excellent turnout, and once again, I found that the format was just right: a day of discussions (BarCamp style) followed by a day of doing (coding, designing, hacking). I got very inspired on the first day, so I was raring to go on the second.

What I like to do on the second day is try to complete two tasks; one that’s fairly straightforward, and one that’s a bit tougher. That way, when it comes time to demo at the end of the day, even if I haven’t managed to complete the tougher one, I’ll still be able to demo the simpler one.

In this case, the tougher one was also tricky to demo. It involved a lot of invisible behind-the-scenes plumbing. I was tweaking my webmention endpoint (stop sniggering—tweaking your endpoint is no laughing matter).

Up until now, I could handle straightforward webmentions, and I could handle updates (if I receive more than one webmention from the same link, I check it each time). But I needed to also handle deletions.

The spec is quite clear on this. A 404 isn’t enough to trigger a deletion—that might be a temporary state. But a status of 410 Gone indicates that a resource was once here but has since been deliberately removed. In that situation, any stored webmentions for that link should also be removed.
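
As a sketch (hypothetical code, not my actual endpoint), the deletion check when re-fetching a stored webmention’s source could be as small as this:

```javascript
// Hypothetical sketch: re-check the source URL of a stored webmention.
// Only a deliberate 410 Gone should trigger deletion; a 404 might just be
// a temporary state, so it leaves the stored webmention alone.
async function shouldDeleteWebmention(sourceURL) {
  const response = await fetch(sourceURL);
  return response.status === 410;
}
```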

Anyway, I think I got it working, but it’s tricky to test and even trickier to demo. “Not to worry”, I thought, “I’ve always got my simpler task.”

For that, I chose to add a little map to my homepage showing the last location I published something from. I’ve been geotagging all my content for years (journal entries, notes, links, articles), but not really doing anything with that data. This is a first step to doing something interesting with many years of location data.
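
As an illustration of how small that first step is: with a mapping library like Leaflet, the last geotagged location can be dropped onto a map in a handful of lines (the element ID and coordinates here are placeholders, not my actual markup):

```javascript
// Illustration only: show the last published location on a small Leaflet map.
// The element ID and coordinates are placeholders.
const lastLocation = { lat: 52.52, lon: 13.405 }; // e.g. from the latest geotagged post
const map = L.map('last-location').setView([lastLocation.lat, lastLocation.lon], 12);
L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
  attribution: '© OpenStreetMap contributors'
}).addTo(map);
L.marker([lastLocation.lat, lastLocation.lon]).addTo(map);
```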

I’ve got it working now, but the demo gods really weren’t with me at Indie Web Camp. Both of my demos failed. The webmention demo failed quite embarrassingly.

As well as handling deletions, I also wanted to handle updates where a URL that once linked to a post of mine no longer does. Just to be clear, the URL still exists—it’s not 404 or 410—but it has been updated to remove the original link back to one of my posts. I know this sounds like another very theoretical situation, but I’ve actually got an example of it on my very first webmention test post from five years ago. Believe it or not, there’s an escort agency in Nottingham that’s using webmention as a vector for spam. They post something that does link to my test post, send a webmention, and then remove the link to my test post. I almost admire their dedication.
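
Catching that case means re-fetching the source and checking whether the link to my post is still there. A hypothetical version of that check might look like this (the naive string match is just for illustration; a real endpoint would parse the HTML properly):

```javascript
// Hypothetical sketch: the source URL still resolves (no 404, no 410), but
// does it still actually link to the target? If not, the stored webmention
// should be treated as an update and removed.
async function sourceStillLinksTo(sourceURL, targetURL) {
  const response = await fetch(sourceURL);
  if (!response.ok) return false;
  const html = await response.text();
  // Naive check for illustration; a real endpoint would parse the HTML.
  return html.includes(`href="${targetURL}"`);
}
```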

Still, I wanted to foil this particular situation so I thought I had updated my code to handle it. Alas, when it came time to demo this, I was using someone else’s computer, and in my attempt to right-click and copy the URL of the spam link …I accidentally triggered it. In front of a room full of people. It was mildly NSFW, but more worryingly, a potential Code Of Conduct violation. I’m very sorry about that.

Apart from the humiliating demo, I thoroughly enjoyed Indie Web Camp, and I’m going to keep adjusting my webmention endpoint. There was a terrific discussion around the ethical implications of storing webmentions, led by Sebastian, based on his epic post from earlier this year.

We established early in the discussion that we weren’t going to try to solve legal questions—like GDPR “compliance”, which varies depending on which lawyer you talk to—but rather try to figure out what the right thing to do is.

Earlier that day, during the introductions, I quite happily showed webmentions in action on my site. I pointed out that my last blog post had received a response from another site, and because that response was marked up as an h-entry, I displayed it in full on my site. I thought this was all hunky-dory, but now this discussion around privacy made me question some inferences I was making:

  1. By receiving a webmention in the first place, I was inferring a willingness for the link to be made public. That’s not necessarily true, as someone pointed out: a CMS could be automatically sending webmentions, which the author might be unaware of.
  2. If the linking post is marked up in h-entry, I was inferring a willingness for the content to be republished. Again, not necessarily true.

That second inference of mine—that publishing in a particular format somehow grants permissions—actually has an interesting precedent: Google AMP. Simply by including the Google AMP script on a web page, you are implicitly giving Google permission to store a complete copy of that page and serve it from their servers instead of sending people to your site. No terms and conditions. No checkbox ticked. No “I agree” button pressed.

Just sayin’.

Anyway, when it comes to my own processing of webmentions, I’m going to take some of the suggestions from the discussion on board. There are certain signals I could be looking for in the linking post:

  • Does it include a link to a licence?
  • Is there a restrictive robots.txt file?
  • Are there meta declarations that say noindex?

Each one of these could help to infer whether or not I should be publishing a webmention. I quickly realised that what we’re talking about here is an algorithm.

Despite its current usage to mean “magic”, an algorithm is a recipe. It’s a series of steps that contribute to a decision point. The problem is that, in the case of silos like Facebook or Instagram, the algorithms are secret (which probably contributes to their aura of magical thinking). If I’m going to write an algorithm that handles other people’s information, I don’t want to make that mistake. Whatever steps I end up codifying in my webmention endpoint, I’ll be sure to document them publicly.
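
In that spirit, here’s a very rough first stab at what those steps might look like. Treat it as a sketch for discussion rather than my actual endpoint code; the signals, and how much weight each one carries, are exactly what’s up for debate:

```javascript
// Hypothetical sketch: decide whether to republish a webmention's content
// in full, based on signals in the linking post. The regexes are crude
// stand-ins; a real endpoint would parse the HTML and robots.txt properly.
async function shouldRepublishInFull(sourceURL, sourceHTML) {
  // A noindex meta declaration is a fairly clear "please don't".
  if (/<meta[^>]+name=["']robots["'][^>]*noindex/i.test(sourceHTML)) {
    return false;
  }

  // A blanket Disallow in robots.txt is another signal to back off.
  const origin = new URL(sourceURL).origin;
  const robotsResponse = await fetch(`${origin}/robots.txt`);
  const robotsTxt = robotsResponse.ok ? await robotsResponse.text() : '';
  if (/^Disallow:\s*\/\s*$/im.test(robotsTxt)) {
    return false;
  }

  // A rel="license" link at least suggests that republishing was considered.
  return /rel=["'][^"']*license[^"']*["']/i.test(sourceHTML);
}
```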

Saturday, November 3rd, 2018

2018-11-03, 21:54 - sonniesedge.co.uk

Day one of Indie Web Camp Berlin is done, and it was great! Here’s Charlie’s recap of the sessions she attended.

Wednesday, September 19th, 2018

RevAMP. — Ethan Marcotte

I’m heartened to see that Google’s moving to make the AMP format more open. But this new governance model doesn’t change the underlying, more fundamental issues: specifically, Google’s use of its market dominance to broaden AMP’s adoption, and to influence the direction of a more decentralized and open web.

Sunday, September 9th, 2018

Why the Future of Data Storage is (Still) Magnetic Tape - IEEE Spectrum

It turns out that a whole lot of The So-Called Cloud is relying on magnetic tape for its backups.

Monday, August 20th, 2018

Playing with the Indieweb

A good half-hour presentation by Stephen Rushe on the building blocks of the indie web. You can watch the video or look through the slides.

I’ve recently been exploring the world of the IndieWeb, and owning my own content rather than being reliant on the continued existence of “silos” to maintain it. This has led me to discover the varied eco-system of IndieWeb, such as IndieAuth, Microformats, Micropub, Webmentions, Microsub, POSSE, and PESOS.

Tuesday, July 31st, 2018

abc to SVG | CSS-Tricks

Aw, this is so nice! Chris points to the way that The Session generates sheet music from abc text:

The SVG conversion is made possible entirely in JavaScript by an open source library. That’s the progressive enhancement part. Store and ship the basic format, and let the browser enhance the experience, if it can (it can).
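
For a sense of how small that enhancement step is, this is roughly all it takes, assuming the abcjs library (an assumption on my part; the element IDs are placeholders):

```javascript
// Sketch of the enhancement step, assuming the abcjs library.
// The abc text is already in the page; the SVG is generated in the browser.
const abc = document.getElementById('abc-source').textContent;
ABCJS.renderAbc('sheet-music', abc); // renders SVG into <div id="sheet-music">
```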

Here’s another way of thinking of it: I was contacted by a blind user of The Session who hadn’t come across abc notation before. Once they realised how it worked, they said it was like having alt text for sheet music! 🤯

Thursday, July 26th, 2018

Figures in the Stars

A lovely bit of data visualisation from Nadieh showing the differences and commonalities in constellations across cultures. As always, she’s written up the process too.

Monday, June 18th, 2018

The crooked timber of humanity | 1843

Tom Standage—author of the brilliant book The Victorian Internet—relates a tale of how the Chappe optical telegraph was hacked in 19th century France, thereby making it one of the earliest recorded instances of a cyber attack.

Friday, March 30th, 2018

Putting Civilization in a Box Means Choosing Our Legacy

A run-down of digital preservation technologies for very, very long-term storage …in space.

Tuesday, March 27th, 2018

Compressive Images Revisited - TimKadlec.com

Tim explains why that neat trick of making a really big JPEG with quality set to 0% is no longer necessary, and how the savings you make in bandwidth with that technique are nullified by the expense of the memory footprint needed.

Tuesday, March 20th, 2018

How Fast Is Amp Really? - TimKadlec.com

An excellent, thorough, even-handed analysis of AMP’s performance from Tim. The AMP format doesn’t make that much of a difference, the AMP cache does speed things up (as would any CDN), but it’s the pre-rendering that really delivers the performance boost …as long as you give up your URLs.

But right now, the incentives being placed on AMP content seem to be accomplishing exactly what you would think: they’re incentivizing AMP, not performance.

Thursday, March 8th, 2018

New Dark Age: Technology, Knowledge and the End of the Future by James Bridle

James is writing a book. It sounds like a barrel of laughs.

In his brilliant new work, leading artist and writer James Bridle offers us a warning against the future in which the contemporary promise of a new technologically assisted Enlightenment may just deliver its opposite: an age of complex uncertainty, predictive algorithms, surveillance, and the hollowing out of empathy.

Saturday, February 24th, 2018

On AMP for Email by Jason Rodriguez

Philosophically, I’m completely against Google’s AMP project and AMP for Email, too. I will always side with the open web and the standards that power it, and AMP is actively working against both. I’m all-in on a faster web for everyone, but I just can’t get behind Google’s self-serving method for providing that faster web.

Transparency and the AMP Project · Issue #13597 · ampproject/amphtml

Luke Stevens is trying to untangle the very mixed signals being sent from different parts of Google around AMP’s goals. The response he got—before getting shut down—is very telling in its hubris and arrogance.

I believe the people working on the AMP format are well-intentioned, but I also believe they have conflated the best interests of Google with the best interests of the web.

Wednesday, February 14th, 2018

AMP: the missing controversy – Ferdy Christant

AMP pages aren’t fast because of the AMP format. AMP pages are fast when you visit one via Google search …because of Google’s monopoly on preloading:

Technically, a clever trick. It’s hard to argue with that. Yet I consider it cheating and anti competitive behavior.

Preloading is exclusive to AMP. Google does not preload non-AMP pages. If Google would have a genuine interest in speeding up the whole web on mobile, it could simply preload resources of non-AMP pages as well. Not doing this is a strong hint that another agenda is at work, to say the least.

AMPlified. — Ethan Marcotte

As of this moment, the power dynamics are skewed pretty severely in favor of Google’s proprietary AMP standard, and against those of us who’d ask this question:

What can I do about AMP?

The Two Faces of AMP - TimKadlec.com

So, to recap, the web community has stated over and over again that we’re not comfortable with Google incentivizing the use of AMP with search engine carrots. In response, Google has provided yet another search engine carrot for AMP.

This wouldn’t bother me if AMP was open about what it is: a tool for folks to optimize their search engine placement. But of course, that’s not the claim. The claim is that AMP is “for the open web.”

Spot on, Tim. Spot on.

If AMP is truly for the open web, de-couple it from Google search entirely. It has no business there.

Look, AMP, you’re either a tool for the open web, or you’re a tool for Google search. I don’t mind if you’re the latter, but please stop pretending you’re something else.