Lock up your data

There have been a number of experiments carried out to investigate the effects of video on communication. I recall hearing about one experiment done with mothers and babies. The mothers were placed in one room with a video camera and the babies were placed in another room with a monitor showing a video feed from the mother. The babies interacted just fine with the video representations of their mothers. Then a one-second lag was introduced. The babies freaked out.

I was reminded of this during the closing panel on day two of Fundamentos Web. Tim Berners-Lee dialed in via iChat to join a phalanx of panelists in meatspace. Alas, the signal wasn’t particularly strong. Add to that the problem of simultaneous translation, which isn’t really simultaneous, and you’ve got a gap of quite a few seconds between Asturias and Sir Tim’s secret lair. The resultant communication was, therefore, not really much of a conversation. It was still fascinating though.

Some of the most interesting perspectives came from George and Hannah—the people who are working at the coalface of social media. George asked Sir Tim for advice on the cultural side-effects of open data—how to educate people that publishing on sites like Flickr means that your pictures can and will be viewed in other contexts. Interestingly, Sir Tim’s response indicated that he was more concerned with educating people in how to keep their data private.

This difference in perspective might be an indication of a generation gap. The assumption amongst, say, teenagers is that everything is public except what they explicitly want to keep private. The default assumption amongst older folks (such as my generation) is the exact opposite: data is private except when it is explicitly made public. The first position matches the sensibilities of Flickr and Last.fm. The second position is more in line with Facebook’s walled garden approach.

I was really glad that George raised this issue. It’s something that has been occupying my mind lately, particularly in reference to Flickr.

Flickr provides a range of ways of accessing your photos: the website, RSS, KML, LOL… and of course, the API. It’s a wonderful API, certainly the best one that I’ve played with. I had a blast putting together the Flickr portion of Adactio Elsewhere.

Using the API, I was able to put together my own interface onto my photos and the latest photos from my contacts. There’s nothing particularly remarkable about that—there are literally hundreds, if not thousands, of third-party sites that use the Flickr API to do the same thing. However, a lot of those sites use Flash or non-degrading Ajax. But I use Hijax. That means that, even though I’ve built an Ajax interface, the fundamental interaction is RESTful with good ol’ fashioned URLs. As a result—and this is just one of the benefits of Hijax—the Googlebot can spider all possible states of my application.

You can probably see where this is going. It’s a similar situation to what happened with my pirate-speak page converter. Even though I’m not providing a direct interface onto anyone’s pictures, Google is listing deep links in its search results.

This has resulted in a shitstorm on the Flickr forum. Reading through the reactions on that thread has been illuminating. In a nutshell, I’m getting penalised for having search-engine friendly pages. I, along with some other people on that thread, have tried to explain that Adactio Elsewhere is just one example of public Flickr data appearing beyond the bounds of Flickr’s domain, an issue tangentially related to intellectual property rights.

In this particular situation, I was able to take some steps to soothe the injured parties by creating a PHP array called $stroppy_users. I also added a meta element instructing searchbots not to index Adactio Elsewhere which, I believe, will prevent any future grievances. As I said in the forum:

If a tree falls in the forest and Google doesn’t index it, does it make a noise?
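
For the curious, the two steps mentioned above (an opt-out list and a robots meta element) can be sketched roughly like this. It is a minimal illustration in Python rather than the PHP actually used on the site; the user names, the `owner` field, and the function names are all hypothetical, and the exact markup on Adactio Elsewhere may well differ.

```python
# Hypothetical opt-out list, playing the role of the PHP $stroppy_users array.
# These names are placeholders, not real Flickr accounts.
STROPPY_USERS = {"example_user_a", "example_user_b"}


def visible_photos(photos, blocked=STROPPY_USERS):
    """Return only the photos whose owner has not asked to be left out.

    Each photo is assumed to be a dict with an "owner" key.
    """
    return [photo for photo in photos if photo["owner"] not in blocked]


def robots_meta():
    """Build a meta element asking search engine crawlers not to index the page."""
    return '<meta name="robots" content="noindex">'


if __name__ == "__main__":
    sample = [
        {"id": "1", "owner": "example_user_a"},
        {"id": "2", "owner": "someone_else"},
    ]
    print(visible_photos(sample))
    print(robots_meta())
```

The filter runs before anything is rendered, so an opted-out user's photos never reach the page, whether or not a crawler honours the meta element.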

I think the outburst of moral panic on the Flickr forum is symptomatic of a larger trend that has accompanied the growth of the site’s user base. Two years ago, Flickr was not your father’s photo sharing website. Now, especially with the migration from Yahoo Photos, it is. If you look at some of the frightened reactions to Flickr’s pirate day shenanigans you’ll see even more signs of this growth (Tom has a great in-depth look at the furore).

As sites like Flickr and Last.fm move from a user base of early adopters into the mainstream, this issue becomes more important. What isn’t clear is how the moral responsibility should be distributed. Should Flickr provide clearer rules for API use? Should Google index less? Should the people publishing photos take more care in choosing when to mark photos as public and when to mark photos as private? Should developers (like myself) be more cautious in what we allow our applications to do with the API?

I don’t know the answers but I’m fairly certain that we’re not dealing with a technological issue here; this is a cultural matter.

Responses

pauldwaite

My first reaction is a grumpy IT one. It’s the internet. It’s a planet-wide, open network. If you want your data to be private, don’t put it on there.

I think sites like Flickr need to be clear with their users about how public their data is. I imagine that becomes much harder when your users are normal people.

# Posted by pauldwaite on Friday, October 5th, 2007 at 1:50pm

pauldwaite

Having read the Flickr forum thread, man, some Flickr users get really prickly about this, don’t they?

# Posted by pauldwaite on Friday, October 5th, 2007 at 2:07pm

Nate Klaiber

I have read through the Flickr forum. I don’t feel you are to blame for the situation, and you have explained yourself very well. I think it is a deeper issue that you have explained here.

The ‘social web’ consumes everything. It can be cached, archived, or used via other means all around the web via the methods you explained above.

I think Flickr should explain more about this when people post photos - that the meta information isn’t sent with the API to keep their photos truly protected from use elsewhere.

I don’t think it is google’s fault, they are doing their job of indexing the content.

Sure, they asked you to remove the instances of your displaying of their photos - but, what about the rest that don’t? I don’t think they realize the volume of other sites that do the same thing in a different way.

It is definitely a generational thing. My parents wouldn’t put much information on the WWW, while I am not as worried about the information I post (to a degree). So, is it just ‘if you want your stuff private, never put it on the net?’

Drew McLellan

From a purely technical viewpoint, it’s interesting to consider what it means to have an image on a web page. By making use of the HTML IMG element, your page provides a link to an image resource elsewhere on the web. As most browsers have the capability of displaying images, that resource is requested by the user’s browser and displayed alongside the rest of the document.

Compare this to publishing a regular text hyperlink (using the A element) to another HTML document elsewhere on the web. It’s usually unsatisfactory to display another HTML document inside the linking page (yes, Snap Preview, I’m looking at you), but it can be done should the viewer of the page have software to do that.

So technically what’s happening isn’t at all different. The argument against Adactio Elsewhere would appear to be that it’s ok for me to publish something, but not for you to suggest others dare look at it.

Sander

The one suggestion from the group of "concerned" flickr users which I found to have some merit was to check the license of the photos (I’m taking their word for this being possible with the API), and to limit the way you display photos based on that. Now in some ways that’s still tangential, but on the other hand, it could be seen as just good netizenship - as something that all sites using the API should perhaps do, not because they’re being required to, but because it is likely to most accurately reflect the intent of the original photographers. Doing this would be a social solution rather than a technical solution, but if API users can in general agree with each other to do that, it really would be a solution. (And then it could be written into the various tutorials and examples to let new users of the API pick that up from the start.)

Have you considered going this route? Other than doing this being a bit of extra work when you probably just quickly dashed off the current implementation, is there a reason you don’t? It seems to me to be a more useful solution in the long term than just blocking search engines wholesale, as that’s really just avoidance of the symptoms becoming known…

Generalizing from the above, I’d say the moral responsibility is twofold: API providers should enable ways to accurately reflect the intent of its content providers, and API consumers should honor this intent (at least when using the data in a way not specific to a single user).

P.S. Please, could you please provide a larger default size for the comment textarea? I don’t comment here quite enough to overcome my laziness wrt writing a rule for it in my userContent.css :)

# Posted by Sander on Sunday, October 7th, 2007 at 2:42pm
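
Sander's suggestion of limiting display according to each photo's license could be sketched along these lines. This is only an illustration under stated assumptions: each photo record is assumed to carry a `license` field (the Flickr API can, as far as I know, include one on request), license id "0" is assumed to mean "All Rights Reserved", and the function name is hypothetical.

```python
# Assumption: license id "0" stands for "All Rights Reserved".
ALL_RIGHTS_RESERVED = "0"


def displayable(photos, restricted=frozenset({ALL_RIGHTS_RESERVED})):
    """Keep only photos whose license is not in the restricted set.

    A photo with no "license" field is treated as all rights reserved,
    which is the more cautious reading of the photographer's intent.
    """
    return [
        photo
        for photo in photos
        if str(photo.get("license", ALL_RIGHTS_RESERVED)) not in restricted
    ]


if __name__ == "__main__":
    sample = [
        {"id": "1", "license": "0"},  # all rights reserved: skipped
        {"id": "2", "license": "4"},  # some other license: shown
        {"id": "3"},                  # unknown license: skipped
    ]
    print(displayable(sample))
```

Treating an unknown license as restricted errs on the side of the photographer, which seems closest to the "good netizenship" described in the comment.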

Rory Parle

The problem is the disjunction between the non-technical perception of a web application and the actual workings. From a user’s perspective, you’re displaying their photos against their will and without their permission. Thus it appears very clear cut from a user point of view that you’re in the wrong.

But from a technical perspective you’re simply telling their browser how to access the pictures and instructing it to display them. So it also appears very clear cut, once one understands what’s actually going on, that you’re doing nothing wrong.

I would suggest that Google has no responsibility to prevent this sort of deep linking, and you have no responsibility as an application developer to block it. If Flickr users are bothered by how Flickr distributes their photos or the data (including URLs) about their photos then it is up to either the users to abandon Flickr for something else or to Flickr to allow users to restrict that distribution of data.

That said, were I in your position I think I would do the same as you have done, simply as a courtesy.

# Posted by Rory Parle on Sunday, October 7th, 2007 at 3:59pm
