Sparklining

I was in Nuremberg last weekend for Indie Web Camp. It was great.

At some point I really should stop being surprised by just how much gets done in one weekend, but once again, I was blown away by the results.

On the first day we had very productive BarCamp-like discussion sessions, and on the second day it was heads-down hacking. But it was hacking with help. Being in the same room as other people who each have their own areas of expertise is so useful. It really turbo-charges the amount that you can get accomplished.

For example, I was helping Tom turn his website into a progressive web app with the addition of a service worker and a manifest file. Meanwhile Tom was helping somebody else get a Wordpress site up and running.

Actually, that was what really blew me away: two people began the second day of Indie Web Camp Nuremberg without websites and by the end of the day, they both had their own sites up and running. For me, that’s the real spirit of the indie web—I know we tend to go on about the technologies like h-card, h-entry, webmentions, micropub, and IndieAuth, but really it’s not about the technologies; it’s about having your own place on the web so that you have control over what you put out in the world.

For my part, I was mostly making some cosmetic changes to my site. There was a really good discussion on the first page about home pages. What’s the purpose of a home page? For some, it’s about conveying information about the person. For others, it’s a stream of activity.

My site used to have a splash-like homepage; just a brief bio and a link to the latest blog post. Then I changed it into a stream a few years ago. But that means that the home page of my site doesn’t feel that different from sections of the site like the journal or the link list.

During the discussion at Indie Web Camp, we started looking at how silos design their profile pages to see what we could learn from them. Looking at my Twitter profile, my Instagram profile, my Untappd profile, or just about any other profile, it’s a mixture of bio and stream, with the addition of stats showing activity on the site—signs of life.

I decided I’d add signs of life to my home page. Once again, I reached for my favourite little data visualisation helper: sparklines

A sparkline is a small intense, simple, word-sized graphic with typographic resolution.

I’ve already got sparklines on Huffduffer and on The Session so I suppose it was only a matter of time before they showed up here.

Small Screen Sparklines Large Screen Sparklines

I’ve been tweaking them ever since I got back from Germany. Now I’ve added in a little h-card bio as well.

Bio and sparklines Bio And Sparklines (large screen)

Initially I was using the fantastic little scripted SVG that Stuart made , the same one that I’m using on Huffduffer and The Session. But Kevin pointed out that a straightforward polyline would be more succinct. And in the case of my own site, there’s only four sparklines so it wouldn’t be a huge overhead to hard-code the values straight into the SVGs.

Yesterday was the first day of Render Conference in Oxford (I’ll be speaking later today). Sara gave a blisteringly great talk on (what else?) SVGs and I got so inspired I started refactoring my code right there and then. I’m pretty happy with how the sparklines are working now, although I’m sure I’ll continue to play around with them some more.

There’s another activity visualisation that I’m eager to play around with. I really like the calendar heatmap on my Github profile. I could imagine using something like that for an archive view on my own site.

Luckily for me, I’ll have a chance to play around with my website a bit more very soon. There’s going to be another Indie Web Camp in Germany very soon.

Indie Web Camp Düsseldorf will take place on May 7th and 8th, right before Beyond Tellerrand. Last year’s event was really inspiring. If there’s any chance you can make it, you should come along. You won’t regret it.

Have you published a response to this? :

Responses

Webrocker

After having a splendit time last weekend helping and participating at the Indiewebcamp Nuremberg (as a result of which I know have a ‚progressive web app‚ where before there was only my website ;), I changed my travel plans around the visit of the Beyond Tellerrand conference, and will attend the next Indiewebcamp in Düsseldorf. Yay! Eager and happy to meet ‚Die drei von der Indiewebtankstelle‘ (Aaron, Tantek and Jeremy) again, and also Joschi and the other attendees in just about two weeks. :-)

Foto @iwontsignuphere / tollwerk

# Posted by Webrocker on Friday, April 22nd, 2016 at 9:10pm

voxpelli.com

Liked: https://adactio.com/journal/10531

# Friday, April 22nd, 2016 at 10:11pm

rem

@adactio the problem I’ve always had with sparklines is that I never understand what the axis is. For your journal is it X=days? :-\

# Posted by rem on Monday, May 30th, 2016 at 7:45pm

Jeremy Keith

@rem Time, yes. Overall time divided by number of pixels on the X axis. So in my case (15 years of posts), one pixel is about 45 days.

https://jeremycherfas.net

As I mentioned a while ago, I had been living in cloud cuckoo land with respect to spammers. I naïvely thought that my little microblog was of no interest to spammers. The truth was, I just wasn’t being notified of the incoming dross. As a result, I’m afraid I might have got onto some lists as a very soft touch. Now that I am getting notified, I am trying to stomp on spam as soon as I can, and I recently found myself wondering whether the flood was increasing, decreasing or staying the same. To begin with, I wasn’t keeping records. About 10 days ago I started to do that, recording the number of spam received per day and also the number of items deleted per say, which is often more because I am still deleting ones that came in while I was blissfully ignorant.

Today, I thought I would use that data to play with sparklines, which many people in the indieweb use to display data. I was inspired most recently by Ton Zylstra and Jeremy Keith, whose code I lifted. That worked fine; you can see the result over there in the sidebar. For now, I’m happy to update by hand once a week or so. At some point, however, I am going to have to go back and try to understand what the code is actually doing. Kevin Marks’ article is where I’ll start.

Francesco Schwarz

Since I started my IndieWeb journey this month I’ve been thinking a lot about the digital content I’m producing and how I can be more in control of my data, avoiding data silos as much as possible.

The first step towards that goal was to put a system in place where I write and publish content on this website and optionally publish it to other social media outlets, such as Twitter and Mastodon. This content strategy is called POSSE. If a Twitter user answers a tweet that got originally published via my website, I will be notified about it automatically (this is called backfeed), and I can choose to answer the tweet again through my website, which would subsequently create a native Twitter response. Puh, that was a long thought. But you get the gist of it, right? The idea behind all of this is to almost exclusively have as the source of my content this very website. This has some really cool benefits, for example you are being enabled to steer conversations freely from one platform to another, which is pretty mind-blowing when you think about it.

These mechanisms are being powered by Webmentions and Bridgy, and once they are set up (I’m not goin’ to lie to you, it was pretty frustrating at times to get this thing goin’), everything works seamlessly. But the real magic happens if you skip social networks altogether and directly communicate from one Webmention endpoint to another. Ben Werdmüller describes it best:

The effect of independence is practical, not just ideological: if you publish on your own site, your words are much more likely to stand the test of time and still be online years later. Social networks come and go, adjust their policies, etc. And there’s a business value to being able to point to a single space online that holds your body of thought and work.

I see the elegance in this approach and am intrigued by it.

Anyway, excuse me for drifting off, let’s return to the topic of this article:

The system I described above now works for articles I write, for links I post and for notes I want to share. Another stream of data I’m producing since roughly three years now is running data, and I wanted to add that stream to my website as well. So I want to walk you through

  • the migration of said data to my website,
  • the automatic synchronization of all future runs to my website, and
  • the basic data visualization of some running metrics.
Data flow for runs

My data flow for runs looks something likes this:

  1. I record a run with my Garmin watch and save it at the end.
  2. The run then gets synchronized to my Garmin account.
    • After that I usually edit some metadata like changing the title, adding a note, make the run visible to my connections, etc.
  3. After the Garmin sync the run gets automatically syndicated to Strava and Runalyze.
    • I do this mainly for backup reasons.
    • Strava has the additional benefit of acting like a social network for athletes and it is not tied to any specific brands.
    • Runalyze is cool because it’s a passionate project by two runners and they present the data in a lot of fascinating ways, e.g. by adding additional interesting metrics like “marathon shape”, “monotony” or “fatigue”.
  4. Additionally (and this is new), runs now get also uploaded to the ProcessWire instance of this website.

When you look at the data flow you can already see that I can’t follow the same approach as I described in the beginning, meaning I can’t do POSSE: I have to first upload the activity through a proprietary brand with proprietary accessories (the Garmin watch) and only after that initial gateway can I backup the data to systems of my choosing. 🤷‍♂️

Data export from Garmin

First I needed to export the data of all my Garmin runs. Luckily, you can export every single data point that Garmin has saved about you on the “Export your data” page in your account. Click the button and after a few minutes you get an e-mail with a link to a ZIP file which contains a ton of data.

You can find the running activities in the file suffixed with summarizedActivities.json. Here’s a screenshot of where you can find it:

Data format of an activity

The JSON file basically consists of a huge array which contains all activities as JSON objects. An activity contains every data point my Garmin watch tracks for a run and it looks something like this (this is the shortened version, I omitted a lot of fields):

{ "activityId": 5108786485, "name": "Munich Running", "activityType": "running", "beginTimestamp": 1592491169000, "eventTypeId": 9, "duration": 4299037.109375, "distance": 1200615.0390625, "elevationGain": 4900, "elevationLoss": 4000, "avgSpeed": 0.27929999828338625, "maxSpeed": 0.32379999160766604, "avgHr": 140, "maxHr": 154, "maxRunCadence": 89, "steps": 6880, "calories": 3712.3577199999995, "aerobicTrainingEffect": 3.299999952316284, "avgVerticalOscillation": 10.130000305175782, "avgGroundContactTime": 289, "avgStrideLength": 105.98000488281251, "vO2MaxValue": 50, "avgVerticalRatio": 9.390000343322754, "anaerobicTrainingEffect": 0, "minTemperature": 23, "maxTemperature": 29, "minElevation": 51659.99755859375, "maxElevation": 53479.998779296875, "avgDoubleCadence": 156.328125, "maxDoubleCadence": 179, "locationName": "Munich", "summarizedDiveInfo": {}, "maxVerticalSpeed": 0.0399993896484375, "lapCount": 13, "waterEstimated": 865, "trainingEffectLabel": "AEROBICBASE", "activityTrainingLoad": 90.9810791015625, "pr": false
}

As I occasionally also record step walks or hikes and those also show up in the activities array, the first step was to filter the array for runs only (the trigger being "activityType": "running").

After that I had to think about which data points are relevant for me. Which metrics do I care about? Which metrics are irrelevant for me? For example the metrics duration, distance, elevationGain and beginTimestamp are pretty important, but things like the estimation of water loss or my user profile ID are not.

For the relevant metrics I then had to understand what the unit value for each metric actually stands for and then find out what calculations to make to transform it into the value unit that I want. For example, the unit for

  • avgVerticalRatio (e.g. 9.3900…) is percentage (no transformation needed, just rounding),
  • calories (e.g. 3712.3577…) is kilojoules, you have to multiply the value by 0.2388 to get kilocalories
  • duration (e.g. 4299037.1093) is milliseconds, you have to multiply the value by 1000 and round it to get seconds
  • maxSpeed (e.g. 0.3237…) is freakin’ meters per decisecond, you have to multiply the value by 36 to get kilometers per hour
  • pr stands for “personal record” and is just a boolean value,
  • and so on…
Migration script for ProcessWire

Next up is parsing the JSON, looping through every activity, apply the appropriate data transformations determined in the previous step and save every activity as a page in the ProcessWire backend.

Before that I already established a template hierarchy in ProcessWire: there is a page with the template runs and it has as its children pages with the template run. Every run page contains the fields that get filled up by the appropriate data of the JSON activity.

Here’s roughly what I came up with:

function saveGarminRunsToProcessWire() { $parent = pages()->get("template=runs"); // get the JSON and decode it to a PHP array // I can loop through $runs = jsondecode(filegetcontents('runs.json')); foreach ($runs as $run) { // check if run already exists if (pages()->get("template=run,activityid={$run->activityId}")->id) return; $p = new Page(); $p->of(false); $p->template = 'run'; $p->parent = $parent; // use the timestamp as the page name aka the URL // of the page $name = date('Y-m-d-H-i-s', $run->beginTimestamp / 1000); $p->name = sanitizer()->pageName($name); // save the page to have the template // fields available $p->save(); // now populate fields $p->of(false); if ($run->avgSpeed) { // transform from meters per deciseconds to kilometers // per hour, and round the value (e.g. 10.1 km/h) $p->speedavg = round($run->avgSpeed * 36, 1); } if ($run->pr == true) { // is_personal_record is a checkbox field in the ProcessWire // backend. Check it, if the JSON value is true $p->ispersonalrecord = 1; } // and so on… // save the page at the end $p->save(); }
}

In total I map around 30 fields from a JSON activity to a ProcessWire page.

After executing the script, all my Garmin runs where successfully mapped to a ProcessWire page in the backend:

And a single run features all data fields I deemed relevant for my use case:

All that was left to do is to present the data somehow in the frontend. The parent page of all runs shows the stream:

And a single run features its data in a simple way:

Syncing new runs

The next big question for me was how to continuously sync new runs to my website. What I needed was a way to use the Garmin API to get the most recent run(s) of my account, and basically execute the exact same function from above with the API response as the input data.

Luckily some great folks already thought long and hard about this and came up with solutions: David Wilcock has written a PHP adapter for interrogating the Garmin Connect API and it fits my needs perfectly (by the way, the documentation of the API functions is superb).

Ok, I have the API to talk to Garmin and get a response of my most recent running activites. What was left to do is to put ProcessWire into the mix and bootstrap the engine into the script. You can use ProcessWire’s API in other PHP apps and shell scripts, and that’s what I did. After changing the signature (and the name) of the “save Garmin JSON run to ProcessWire page” function from above, and installing the dependencies of the php-garmin-connect project with composer install I was ready to go. I came up with this script:

#!/usr/bin/php <?php namespace ProcessWire; // bootstrap ProcessWire engine
include '/path/to/processwire/index.php';
// include function to save from JSON to ProcessWire
include '/path/to/self-written/save-from-json-to-processwire.php';
// include php-garmin-connect project
require_once DIR . '/path/to/php-garmin-connect/vendor/autoload.php'; $credentials = [ 'username' => 'xxx', 'password' => 'xxx',
]; try { $garminConnect = new \dawguk\GarminConnect($credentials); // get the 5 most recent running activities $runs = $garminConnect->getActivityList(0, 5, 'running'); foreach ($runs as $run) { // the ProcessWire function from above saveGarminRunToProcessWirePage($run); }
} catch (Exception $objException) { // some log statement
}

Now whenever I want to sync Garmin runs to my website, all I have to do is execute the above script through my terminal.

To automate the process I set up a cron job in my Uberspace server to execute the script once every other day:

00 20 */2 * * /location/of/the/script/save-garmin-run-to-processwire.sh

Every time I sync a run to Garmin I can be sure that a day or two later the run will show up automatically on my website as well, and that all data aggregations and visualizations are updated with it (more on that below). 🎉

Bummer: Garmin export !== Garmin API

After the cron job ran (no pun intended) for the first time I got some weird values for a few metrics and some fields where completely empty in the newly created ProcessWire page for the run.

Turns out that the data format of an activity from the Garmin export is not really exactly the same as the response you get directly from the Garmin API. Some keys are named differently (e.g. averageRunningCadenceInStepsPerMinute instead of avgDoubleCadence or averageHR instead of avgHr) and also a few unit values are different (e.g. the unit for speeds is not “meters per decisecond” but the very more sensible “meters per second”, and the unit for calories is not “kilojoules” but “kilocalories”). So I adapted my data transformations accordingly. After deleting the ProcessWire page with the faulty data and manually triggering the script again, everything was fine.

In hindsight it would have made more sense to exclusively use the Garmin API as a means of exporting the running data and do the initial index. Afterwards one is always wiser.

The fun stuff: data aggregation and visualization

The cool thing about hosting data on a platform you have complete control over is that your imagination is the only limiting factor of what you can do with it and how to present the data in a way that you want it to be presented.

I started with two simple additional views on top of the (rather lame) stream of single runs:

  1. A table summary of a few metrics.
  2. Charts showcasing specific metrics of the last 100 runs.

You can see the views on the top of the “Runs” page.

Table summary

For the table summary I wanted to add up current weekly, monthly and yearly numbers for metrics like duration and distance. Additionally I showcase the number for the same metric of the corresponding last time period (last week, last month or last year) to get a quick overview of how I’m comparing to past periods.

As the calculations are potentially rather expensive, I hooked into the “saved a page” ProcessWire hook and do the calculations every time a “run” page is being saved in the backend. The values you see in the table are basically just static numbers that get retrieved quickly, the computation for them already happened before you visited the website.

SVG charts

For the charts showcasing metrics of the last 100 runs I opted for simple SVG spark line charts. The spark lines I spotted on Jeremy’s website were the initial motivator for me.

There are cool tools you can use for this, e.g. SVG Sparklines by Kevin Marks. The handling is super easy: include a SVG file, add the data you want to showcase as a query parameter, done. It looks like this: sparkline.svg?2,0,1,1,4,6,9,6,5,0,1,2,5,2. The SVG file has some JavaScript inside that interprets the query parameter and builds the SVG polyline accordingly.

In the end I chose not to use it, because I wanted to not depend upon JavaScript to render such a simple thing as a SVG chart. I do not use any external tool for generating the SVGs, instead I hand-coded it. The important things I wanted to achieve technically:

  • No JavaScript needed for showing a basic version of the SVG charts.
  • Progressively enhance the visuals with JavaScript by adding color scales to the data set to better distinguish between low and high values.
  • Make the charts responsive.

If you visit the “Runs” page with deactivated JavaScript and look at the charts you will see this:

The charts are accessible in that you can get a feeling of the relationship between the single data points. For example the spikes in the “Distance” chart are long runs I did when training for a marathon this year. On the other hand, charts like “Ground contact time” show that there are metrics that do not have much variance between data points.

When you activate JavaScript the charts get progressively enhanced with a neat color scale:

This is powered by chroma.js. You basically give it the minimum and maximum value of a data set and it generates a semantic color scale for every single data point in the set: lighter colors representing low values, darker colors representing high values). Relationships and differences can be spotted much easier and faster now, especially in data sets that don’t have much variance in them like “Ground contact time” or “Avg. vertical oscillation”.

Side note: For different types of color scales the chroma.js Color Palette Helper and ColorBrewer 2.0 were very helpful for me. Chapter 6: Visualizing Geographic Data was crucial for me in understanding how to work with chroma.js.

Moving on, making the charts responsive wasn’t an easy feat. The article setting me in the right thinking spot was “How to Make Charts with SVG” from Robin Rendle. As it is often the case in responsive design, every value you put on the SVG elements must be relative in order for this to work. Here’s a simplified example of the code I’m using:

<svg version='1.1' xmlns='http://www.w3.org/2000/svg'> <rect width='50%' height='100%' y='0%' style='transform: translateX(0);'></rect> <rect width='50%' height='50%' y='50%' style='transform: translateX(50%);'></rect>
</svg>

Let’s walk through the attributes and how they are being calculated:

  • width: divide 1 by the length of the data set, multiply the result by 100.
  • height: the first rect has the highest value in the data set with height: 100%. Divide the actual data value (e.g. in the case of speed this would be something like 10.1 kilometers per hour) by the highest value in the whole data set (also 10.1 in this case), multiply the result by 100. The second rect’s actual data value is therefore 5.05 kilometers per hour.
  • y: subtract 100 by the height value to push the rect to the bottom of the SVG canvas area (otherwise the chart would be “up side down”).
  • style: multiply width by the index of the data point to order the rects side-by-side on the x-axis of the chart.

This mechanism scales beautifully, you could easily have hundreds or thousands of rects in there (not sure if your browser/computer could handle such an amount though 🤭). You can give the wrapping svg element any width or height you want, and the chart elements will expand and contract accordingly.

What’s next?

I want to put more work into the charts, progressively enhancing them even more. I thought about

  • linking every single bar to the URL of the run it represents,
  • add more info to the user on hover
  • show the min. and max. value of each chart by default, accompanied by the unit of the metric,
  • show a median trend line for every chart, and
  • eventually improving the accessibility of the charts.
Conclusion

This was a wild little ride! I learned a lot along the way:

  • The unit meters per deciseconds is the weirdest thing I saw in a long time.
  • Writing SVG code by hand is actually a lot of fun.
  • I’m always fascinated with visualizing data and I feel I should dig deeper into this area.
  • I looove the thought process of starting with something simple and enhancing it step by step by bringing new technologies into the mix. It has definitiely a philosophical dimension to it.
  • I noticed that I’m somehow more motivated to go out for a run because I want to make sure that the cron job has something to do when it fires. Is that weird?

4 Likes

# Liked by Gunnar Bittersmann on Friday, April 22nd, 2016 at 7:33pm

# Liked by Ozan Çağlargil on Monday, May 30th, 2016 at 7:04pm

# Liked by Mark Everitt on Monday, May 30th, 2016 at 8:31pm

# Liked by Jan Skovgaard on Tuesday, May 31st, 2016 at 5:34am

Previously on this day

5 years ago I wrote 100 words 031

Day thirty one.

5 years ago I wrote Complete line-up for Responsive Day Out 3

Lyza is speaking! Squee!

14 years ago I wrote Transcribing podcasts

Making podcasts accessible.

15 years ago I wrote Fight club

There are a lot of ways of tracking up-to-the-minute memes on the Web right now. You can check out what’s hot on Technorati. You can find out what’s popular with Del.icio.us.

17 years ago I wrote Surfin' Safari

All software development should be this efficient and transparent:

18 years ago I wrote Herman Miller: Aeron Chairs

This room is starting to look like a little dot com start-up office.