I read Edward Tufte’s book The Visual Display of Quantitative Information, and afterwards I was burning to visualize some data. I settled on charting the times of day at which I make phone calls.


the graphic

[Image: call graphic]

As I thought about this data, it occurred to me that a simple bar chart or histogram would be inappropriate, because I’d have to choose some arbitrary point at which to start drawing the bars, say midnight, with the result that 11pm would end up on the opposite side of the chart from midnight. That would be wrong, because in reality one comes right after the other. To solve that problem, I decided to plot my calls in a circle. The result is that the angular distance between two data points on the chart corresponds directly to their temporal distance (fancy term, isn’t it?).
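The wrap-around idea can be sketched in a few lines of Python. This is only an illustration with hypothetical names (`hour_to_angle`, `circular_hours_between`), not the code behind the chart, and it assumes midnight sits at angle zero.

```python
import math

HOURS_PER_DAY = 24

def hour_to_angle(hour):
    """Map an hour of the day to an angle in radians (assumption: midnight is at 0)."""
    return 2 * math.pi * (hour % HOURS_PER_DAY) / HOURS_PER_DAY

def circular_hours_between(a, b):
    """Shortest distance in hours between two times of day, going around the clock."""
    diff = abs(a - b) % HOURS_PER_DAY
    return min(diff, HOURS_PER_DAY - diff)

# On a linear axis that starts at midnight, 11pm and midnight are 23 hours apart.
linear_gap = abs(23 - 0)

# Around the circle they are one hour apart, which matches reality.
circular_gap = circular_hours_between(23, 0)

print(linear_gap, circular_gap)  # prints "23 1"
```

Because the angle is a fixed multiple of the hour, equal gaps in time always map to equal gaps in angle, wherever they fall on the clock.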

Similarly, the “small multiple” (Tufte’s term) plots for each weekday are arranged in a flattened clockwise loop to keep temporally adjacent days near to each other.

Initially, I envisioned this plot as a circular histogram, and that’s how I coded it. Then I had the idea of making those small multiple plots for the individual days, and I realized that I’d need to use a consistent scale for comparisons between those plots to be meaningful. So it turned into a circular bar chart.
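A consistent scale just means dividing every plot’s values by the same maximum instead of each plot’s own. Here is a minimal sketch, assuming the per-day counts live in a dictionary; the function name, the `max_length` parameter, and the sample numbers are all made up for illustration.

```python
def scale_to_shared_max(counts_by_day, max_length=100.0):
    """Scale every day's counts by the largest count seen on ANY day.

    counts_by_day: hypothetical mapping of day name -> list of call counts.
    max_length: drawn length (in arbitrary units) given to the largest count.
    """
    global_max = max(max(counts) for counts in counts_by_day.values())
    if global_max == 0:
        # No calls at all: every mark has zero length.
        return {day: [0.0] * len(counts) for day, counts in counts_by_day.items()}
    return {
        day: [max_length * count / global_max for count in counts]
        for day, counts in counts_by_day.items()
    }

# Made-up example data, three time slots per day for brevity.
example = {"Mon": [0, 2, 4], "Tue": [1, 8, 0]}
scaled = scale_to_shared_max(example)

# Monday's busiest slot (4 calls) is drawn at half the length of Tuesday's (8 calls).
print(scaled["Mon"][2], scaled["Tue"][1])  # prints "50.0 100.0"
```

Had each day been scaled by its own maximum, both of those marks would have been drawn at full length, and the comparison between days would be lost.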

I liked the way that looked, but one of Tufte’s points kept nagging me: he says that the amount of visual real estate you give to the representations of your data points should be directly proportional to the magnitude of those data points (I’m paraphrasing, accurately I hope). Looking at my circular bar chart, I realized that I was violating that principle, and that he was right about it. I varied the radius of each circular “bar” in proportion to the number of calls during the corresponding hour, but the area of a pie slice (a sector of a circle) is proportional to the square of the radius of the pie. That means that the circular bar for an hour with 5 calls would have only a quarter of the area of the circular bar representing 10 calls, not half. So I scrapped the bars in favor of lines; they’re one-dimensional, and so are the data they represent.
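The arithmetic is easy to check. The sketch below uses the standard sector-area formula (half the radius squared times the angle) and treats each bar as a full pie slice one hour wide, which is a simplification. The `radius_for_area` helper shows a square-root alternative that would keep area proportional to the count; it is not what I did, since I switched to lines instead.

```python
import math

HOUR_ANGLE = 2 * math.pi / 24  # angular width of one hour on a 24-hour dial

def sector_area(radius, angle):
    """Area of a circular sector: (1/2) * r^2 * angle, with angle in radians."""
    return 0.5 * radius ** 2 * angle

# Radius proportional to the call count: 5 calls -> radius 5, 10 calls -> radius 10.
area_5 = sector_area(5, HOUR_ANGLE)
area_10 = sector_area(10, HOUR_ANGLE)
ratio = area_5 / area_10

print(ratio)  # prints 0.25: half the calls, but only a quarter of the ink

def radius_for_area(count, angle):
    """Hypothetical alternative: pick the radius so that sector AREA equals the count."""
    return math.sqrt(2 * count / angle)
```

With `radius_for_area`, the slice for 10 calls would have exactly twice the area of the slice for 5 calls, at the cost of radii that are harder to compare by eye.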

The lines for daytime hours are colored in shades of yellow, and nighttime ones in shades of blue. Tufte says that most people can distinguish blue from other colors, even if they’re colorblind, and as a bonus, there’s some semantic relationship between those colors and the hours of the day they correspond to (sky color). Unfortunately the yellow is hard to see against a white background. Expletive deleted.

the making of

To get at my phone’s call data, I mangled mrflip’s Perl code for extracting the SQLite database from the iPhone’s backup files (see this thread). Then I put together some Python scripts to digest the call data into a form that’s conducive to charting and send it down the pipeline to a buggy little piece of Processing code that draws the actual graphics. The bugs have to do with the calculation of the low, median, and high values that are labeled on the charts, but I touched up the graphic in Photoshop to mask the issues.
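For the curious, the digesting step boils down to something like the sketch below. It is a reconstruction of the idea, not the actual scripts: it assumes the calls have already been pulled out of the database as Unix timestamps, it bins them in UTC for simplicity (a real version would want local time), and the function names and sample calls are invented.

```python
from collections import Counter
from datetime import datetime, timezone
from statistics import median

def calls_per_hour(timestamps):
    """Count calls in each of the 24 hours of the day; empty hours count as 0.

    Assumption: timestamps are Unix epoch seconds, binned by their UTC hour.
    """
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]

def low_median_high(counts):
    """The three summary values of the kind labeled on a chart."""
    return min(counts), median(counts), max(counts)

# Made-up sample: three calls in the 9 o'clock hour, one in the 17 o'clock hour.
sample = [
    datetime(2009, 1, 1, 9, 5, tzinfo=timezone.utc).timestamp(),
    datetime(2009, 1, 1, 9, 40, tzinfo=timezone.utc).timestamp(),
    datetime(2009, 1, 2, 9, 15, tzinfo=timezone.utc).timestamp(),
    datetime(2009, 1, 1, 17, 30, tzinfo=timezone.utc).timestamp(),
]

hourly = calls_per_hour(sample)
low, mid, high = low_median_high(hourly)
print(hourly[9], hourly[17], low, high)  # prints "3 1 0 3"
```

Leaning on `statistics.median` and the built-in `min` and `max` sidesteps hand-rolled summary math, which is the sort of thing that is easy to get subtly wrong.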

That cover-up was a profound philosophical moment for me. I had the option of tightening my code to handle all cases, pass all unit tests (hah! as if I wrote those), and be prepared for use by total strangers. But as the gears in my mind spun up to design this bulletproof algorithm, the monkeywrench of cool rationality fell in. Taking that code from functional with bugs to bugless would have taken a disproportionate amount of time. I’m ok with mostly bugless, because this code is throwaway; it was just a tool I made as one step in the process of generating the chart.

This is what I like most about being able to code: if I don’t have the tools I need to make something, I can make the tools.