Tags: data

252

sparkline

Figures in the Stars

A lovely bit of data visualisation from Nadieh showing the differences and commonalities in constellations across cultures. As always, she’s written up the process too.

Astronomical Typography

Typography meets astronomy in 16th century books like the Astronomicum Caesareum.

It is arguably the most typographically impressive scientific manual of the sixteenth century. Owen Gingerich claimed it, “the most spectacular contribution of the book-maker’s art to sixteenth-century science.”

Disturbances #16: Digital Dust

From smart dust and spimes, through to online journaling and social media, to machine learning, big data and digital preservation…

Is the archive where information goes to live forever, or where data goes to die?

Ways to think about machine learning — Benedict Evans

This strikes me as a sensible way of thinking about machine learning: it’s like when we got relational databases—suddenly we could do more, quicker, and easier …but it doesn’t require us to treat the technology like it’s magic.

An important parallel here is that though relational databases had economy of scale effects, there were limited network or ‘winner takes all’ effects. The database being used by company A doesn’t get better if company B buys the same database software from the same vendor: Safeway’s database doesn’t get better if Caterpillar buys the same one. Much the same actually applies to machine learning: machine learning is all about data, but data is highly specific to particular applications. More handwriting data will make a handwriting recognizer better, and more gas turbine data will make a system that predicts failures in gas turbines better, but the one doesn’t help with the other. Data isn’t fungible.

Generative Artistry

Tutorials for recreating classics of generative art with JavaScript and canvas.

Artificial Intelligence for more human interfaces | Christian Heilmann

An even-handed assessment of the benefits and dangers of machine learning.

New Privacy Rules Could Make This Woman One of Tech’s Most Important Regulators - The New York Times

It’s kind of surreal to see a profile in the New York Times of my sister-in-law. Then again, she is Ireland’s data protection commissioner, and what with Facebook, Twitter, and Google all being based in Ireland, and with GDPR looming, her work is more important than ever.

By the way, this article has 26 tracking scripts. I don’t recall providing consent for any of them.

Sandra Rendgen

A blog dedicated to data visualisation, all part of ongoing research for a book on Charles-Joseph Minard.

Data visualisation, interactive media and computational design are one focus of my work, but I also do research in the history of maps and diagrams.

Colour Wheels, Charts, and Tables Through History – The Public Domain Review

These are beautiful!

Featured below is a chronology of various attempts through the last four centuries to visually organise and make sense of colour.

Shipmap.org | Visualisation of Global Cargo Ships | By Kiln and UCL

A beautiful visualisation of shipping routes and cargo. Mesmerising!

You can see movements of the global merchant fleet over the course of 2012, overlaid on a bathymetric map. You can also see a few statistics such as a counter for emitted CO2 (in thousand tonnes) and maximum freight carried by represented vessels (varying units).

Facebook Is Tracking Me Even Though I’m Not on Facebook | American Civil Liberties Union

But while I’ve never “opted in” to Facebook or any of the other big social networks, Facebook still has a detailed profile that can be used to target me. I’ve never consented to having Facebook collect my data, which can be used to draw very detailed inferences about my life, my habits, and my relationships. As we aim to take Facebook to task for its breach of user trust, we need to think about what its capabilities imply for society overall. After all, if you do #deleteFacebook, you’ll find yourself in my shoes: non-consenting, but still subject to Facebook’s globe-spanning surveillance and targeting network.

Facebook’s “shadow profiles” are truly egregious …and if you include social sharing buttons on a website, you’re contributing to the data harvest.

If you administer a website and you include a “Like” button on every page, you’re helping Facebook to build profiles of your visitors, even those who have opted out of the social network.

If you are responsible for running a website, try browsing it with a third-party-blocking extension turned on. Think about how much information you’re requiring your users to send to third parties as a condition for using your site. If you care about being a good steward of your visitors’ data, you can re-design your website to reduce this kind of leakage.

Doc Searls Weblog · Facebook’s Cambridge Analytica problems are nothing compared to what’s coming for all of online publishing

What will happen when the Times, the New Yorker and other pubs own up to the simple fact that they are just as guilty as Facebook of leaking its readers’ data to other parties, for—in many if not most cases—God knows what purposes besides “interest-based” advertising? And what happens when the EU comes down on them too? It’s game-on after 25 May, when the EU can start fining violators of the General Data Protection Regulation (GDPR). Key fact: the GDPR protects the data blood of EU citizens wherever they risk having it sucked in the digital world.

Sessions Map

This is nifty—a map of all the Irish music sessions and events happening around the world, using the data from TheSession.org.

If you’re interested in using data from The Session, there’s a read-only API and regularly-updated data dumps.

Animated SVG Radial Progress Bars - daverupert.com

Using a single path SVG, a smidge of CSS, and ~6 lines of JavaScript.

In this days of monolithic frameworks, I really like seeing modest but powerful patterns like this—small pieces that we can loosely join.

Paul Ford: Facebook Is Why We Need a Digital Protection Agency - Bloomberg

The word “leak” is right. Our sense of control over our own destinies is being challenged by these leaks. Giant internet platforms are poisoning the commons. They’ve automated it.

New Dark Age: Technology, Knowledge and the End of the Future by James Bridle

James is writing a book. It sounds like a barrel of laughs.

In his brilliant new work, leading artist and writer James Bridle offers us a warning against the future in which the contemporary promise of a new technologically assisted Enlightenment may just deliver its opposite: an age of complex uncertainty, predictive algorithms, surveillance, and the hollowing out of empathy.

Be Kind, Design – Nat Dudley – Medium

The transcript of Nat’s superb Webstock talk.

We need to start thinking about inclusivity the same way as we think about mobile design. We realised with mobile, that we have to put that experience at the centre of what we do, otherwise it won’t be successful and we’ll fail mobile users. We realised we have to design mobile-first.

The same is true for inclusivity. Instead of it being an afterthought if it’s thought about at all, it needs to be our first thought. It needs to be central to our strategy, embedded in how we analyse and solve the problems we encounter. Designing inclusive-first is the only way we’ll manage to serve and protect all of the people who use what we build.

Let’s talk about usernames

This post goes into specifics on Django, but the broader points apply no matter what your tech stack. I’m relieved to find out that The Session is using the tripartite identity pattern (although Huffduffer, alas, isn’t):

What we really want in terms of identifying users is some combination of:

  1. System-level identifier, suitable for use as a target of foreign keys in our database
  2. Login identifier, suitable for use in performing a credential check
  3. Public identity, suitable for displaying to other users

Many systems ask the username to fulfill all three of these roles, which is probably wrong.

as days pass by — Collecting user data while protecting user privacy

Really smart thinking from Stuart on how the randomised response technique could be applied to analytics. My only question is who exactly does the implementation.

The key point here is that, if you’re collecting data about a load of users, you’re usually doing so in order to look at it in aggregate; to draw conclusions about the general trends and the general distribution of your user base. And it’s possible to do that data collection in ways that maintain the aggregate properties of it while making it hard or impossible for the company to use it to target individual users. That’s what we want here: some way that the company can still draw correct conclusions from all the data when collected together, while preventing them from targeting individuals or knowing what a specific person said.