Personal analytics: 7 years of personal email activity

Last month, I posted some analytics on my behavior over the course of 12 years of work emails that I’ve sent. At the time, I had data for just my work email and just for sent messages. I wanted to look at my personal email activity as well, but there were two things preventing me:

  1. I didn’t have the time.
  2. I wanted to do it all in Mathematica, which I am slowly teaching myself from scratch.

Well, I found a little bit of time, and I needed only a little because Paul-Jean Letourneau, a lead developer at WolframAlpha, wrote a post on how to use Mathematica for the kind of email analytics that Stephen Wolfram posted about last month. This was great: the post contained all of the code needed for this kind of analysis, and a few good code examples are all I need to learn a new system quickly. The code worked almost unchanged in my copy of Mathematica on my MacBook. I had to make a few minor changes to get the mailboxes I wanted, and I had to add the following line to increase the Java heap space:

Needs["JLink`"]  (* J/Link supplies ReinstallJava *)
ReinstallJava[CommandLine -> "java", JVMArguments -> "-Xmx3024m"]  (* restart Java with a 3024 MB maximum heap *)

Without that line, the code executed fine for sent mail, but ultimately resulted in an out-of-memory error for incoming mail.

The resulting data is a pretty good look at my personal email use over the last 7 years. We'll start with email that I've sent. This goes back only to 2009, when I switched from Panix to Gmail. The code scanned my Gmail Sent Mail folder for messages sent from my Gmail address. I had years of mail imported from Panix, but those sent messages came from a different address, and I decided not to move things around or change the code to include them. That still leaves 3 years of sent mail, which is good enough for some analysis. Here is the diurnal plot of my sent email, a total of 4,382 messages:

[Figure: diurnal plot of sent mail]

It's a pretty sparse chart, not nearly as dense as the one for my work email, but a few things stand out. I generally don't start sending email until just before 9am, and my sending volume increased at the beginning of 2011. Also of note is the gap in late 2011, when I was on vacation and wasn't doing much email.
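Each dot in a diurnal plot is one message: the date runs along the horizontal axis and the time of day up the vertical axis. The Mathematica code from Letourneau's post isn't reproduced here, so as a rough illustration only, here is a Python sketch of the same two steps (keep messages sent from my own address, then turn each one into a date and a fractional hour). It assumes the Sent Mail folder has been exported to a local mbox file; the function name `diurnal_points` and the optional time-zone argument are hypothetical.

```python
import mailbox
from datetime import date, tzinfo
from email.utils import getaddresses, parsedate_to_datetime
from typing import Optional


def diurnal_points(mbox_path: str, my_address: str,
                   tz: Optional[tzinfo] = None) -> list[tuple[date, float]]:
    """Return (day, fractional hour) for each message in the mbox sent from my_address."""
    target = my_address.strip().lower()
    points = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Keep only messages whose From header carries my own address.
            senders = getaddresses(msg.get_all("From", []))
            if not any(addr.lower() == target for _name, addr in senders):
                continue
            date_header = msg.get("Date")
            if date_header is None:
                continue  # no timestamp, nothing to plot
            when = parsedate_to_datetime(date_header)
            # Optionally shift to a single time zone so the vertical axis is one clock.
            if tz is not None and when.tzinfo is not None:
                when = when.astimezone(tz)
            hour = when.hour + when.minute / 60 + when.second / 3600
            points.append((when.date(), hour))
    finally:
        box.close()
    return points
```

Without `tz`, each time stays in the sender's own UTC offset; passing a time zone puts every point on the same clock, which matters if mail was sent while travelling.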

The picture for my incoming email is very different:

[Figure: diurnal plot of incoming mail]

For a very long time, I used my work email for everything, but in late 2004, when I was experiencing a bit of burnout, I decided to draw a clear line between my work life and personal life. I got a personal email account and started using my work email only for work. From late 2004 to early 2009, I used Panix for my personal email and then switched to Gmail.

One thing is clear: I get a lot of personal email, the volume has only increased over time, and it seems to arrive constantly. A lot of it is automatic notifications of blog comments or Twitter messages, and some of it is subscription-based. Those are easy to spot as the horizontal lines that run across sections of the plot early in the morning. But a fair amount is legitimate mail that I must deal with in one way or another.
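A horizontal line in a diurnal plot is a message that arrives at about the same time of day, day after day. As a hedged illustration (not part of the Mathematica code), the Python sketch below finds such lines by grouping messages into time-of-day slots per sender and counting the distinct days each slot appears on. It assumes each message has already been reduced to a `(sender, local datetime)` pair; the name `recurring_senders`, the 15-minute slot width, and the 30-day threshold are arbitrary choices for illustration.

```python
from collections import defaultdict


def recurring_senders(messages, min_days=30, bucket_minutes=15):
    """Flag (sender, time-slot) pairs that recur on many distinct days.

    messages: iterable of (sender, datetime) pairs in local time.
    Returns sorted (sender, slot start in minutes after midnight, day count)
    tuples for every slot that shows up on at least min_days different days.
    """
    days_seen = defaultdict(set)
    for sender, when in messages:
        # Which bucket_minutes-wide slot of the day this message falls into.
        slot = (when.hour * 60 + when.minute) // bucket_minutes
        days_seen[(sender.lower(), slot)].add(when.date())
    return sorted(
        (sender, slot * bucket_minutes, len(days))
        for (sender, slot), days in days_seen.items()
        if len(days) >= min_days
    )
```

Counting distinct days rather than messages keeps a one-off burst of mail from a friend from looking like a daily notification.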

I really like these diurnal plots, but another way to look at this is volume over time. Here is a plot of my incoming and outgoing email volume over time:

[Figure: incoming and outgoing email volume, averaged by month]

This data is averaged monthly. The darker plot is my sent email; everything else is incoming mail. Clearly, the volume of my incoming mail is increasing. The amount of mail I send is relatively stable, and that is on purpose. I used to send lots and lots of email, but I've tried to get that under control. It's better for everyone.
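The post doesn't say exactly how the monthly averages were computed. One plausible approach, sketched here in Python under that assumption: count messages per calendar day, then divide each month's total by the number of days in the month, so quiet days count as zero. The function names `daily_counts` and `monthly_average_per_day` are hypothetical.

```python
import calendar
from collections import Counter


def daily_counts(days):
    """Number of messages on each calendar day (one date entry per message)."""
    return Counter(days)


def monthly_average_per_day(days):
    """Average messages per day for each (year, month) that has any mail.

    Days with no mail count as zero: each month's total is divided by the
    number of calendar days in that month, not by the days that had mail.
    """
    per_month = Counter((d.year, d.month) for d in days)
    return {
        (year, month): total / calendar.monthrange(year, month)[1]
        for (year, month), total in sorted(per_month.items())
    }
```

With this choice, a partial first or last month in the mailbox is averaged over the whole month and so reads low; dividing by the days actually covered would avoid that.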

Here is the same data for the email I've sent, with daily volume plotted above the monthly averages:

[Figure: daily volume of sent mail]
[Figure: monthly average volume of sent mail]

And here is a similar plot for my incoming email:

[Figure: daily volume of incoming mail]
[Figure: monthly average volume of incoming mail]

Another interesting way to look at this data is to see how much email I send and receive over an average day, from morning to night. Here is my typical pattern for sending email throughout the day:

[Figure: distribution of sent mail over the day]

I clearly do the bulk of my email sending in the morning, and of course, there is a spike at lunchtime. And I’m pretty good about not sending email in the middle of the night.
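A chart like this is essentially a histogram over the hour of the day. As a minimal Python sketch (an assumption about the approach, not the Mathematica code itself), count messages by hour and divide by the number of days covered to get an average day; the name `hourly_profile` is hypothetical.

```python
from collections import Counter


def hourly_profile(timestamps, n_days):
    """Average number of messages in each hour of the day (index 0 = midnight to 1am).

    timestamps: datetimes already converted to local time.
    n_days: number of days the timestamps span, used as the denominator.
    """
    counts = Counter(ts.hour for ts in timestamps)
    # Counter returns 0 for hours with no mail, so all 24 slots are filled.
    return [counts[hour] / n_days for hour in range(24)]
```

For a whole mailbox, `n_days` would be the number of days between the first and last message, so hours that are empty on most days are pulled toward zero.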

Here is my incoming email throughout the day:

[Figure: distribution of incoming mail over the day]

I receive a fair volume of email pretty steadily throughout the day. That spike at about 5am each morning is a daily email from Google Calendar with my agenda for the day. And that spike at 7pm is likely an email I get telling me that my cloud backups for the day have completed successfully.
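One way to confirm what causes a spike like the 7pm one is to list the most common senders in that hour. A short Python sketch of that check, assuming messages have been reduced to `(sender, local datetime)` pairs; `top_senders_in_hour` is a hypothetical helper, not part of the Mathematica code.

```python
from collections import Counter


def top_senders_in_hour(messages, hour, n=3):
    """Most common senders among messages that arrived during the given hour.

    messages: iterable of (sender_address, datetime) pairs in local time.
    Returns up to n (sender, count) pairs, most frequent first.
    """
    senders = Counter(
        sender.lower() for sender, when in messages if when.hour == hour
    )
    return senders.most_common(n)
```

If a single automated sender dominates the list for that hour, the spike is almost certainly that daily message.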

I have some other plots that came from the Mathematica code, but I imagine the charts I've already shown will test the patience of most readers, so I won't bore you with the rest. I am still learning how to perform this analysis in Mathematica and am extending my scripts to pull in data from my Fitbit pedometer, my key logger, and some other interesting sources. You can expect to see some of that data in the future once the scripts are working properly.

2 thoughts on “Personal analytics: 7 years of personal email activity”

  1. Interesting data with a clear graphical presentation, Jamie. Do you have any hypotheses about the effect of e-mail on your productivity? I’ve often wondered whether I was more productive in pre-email days when I dealt with mail only once per day, and I knew that any query would require a week or more for a response. Asimov commented that he always liked to deal with all the day’s mail at once when it arrived. I wonder how he would deal with email. I vow that someday I will do an experiment and treat email as if it were regular mail, checking and answering only once a day and not at all on Sunday, but I suspect that email-on-demand would be a hard addiction to break.

    I’ve wondered whether I would feel better or be more productive if I limited e-mail.

    1. Michael, when Asimov lived in Newton, MA in the 1960s, he went one step further. He discovered that he could go to the local post office as soon as it opened and pick up the day's mail first thing. Then he could work through it and be done with it early!

      Part of the reason I’m interested in personal analytics is to discover how I work and how I can be more productive without overdoing things. Dealing with email only once per day would be an interesting experiment, but it wouldn’t work well for me since many people I work with are in an office across the country and email is our primary means of communication. That said, my email is probably still too interrupt-driven. I get notifications whenever a new email comes in and I feel like I need to look at it right away. A better short-term experiment (for me, anyway) would be to look at the times of day when I seem to get the most email and, instead of reading it as it comes in, break it into chunks. Say, check email once an hour and not in between. Or every two hours. Or four. Whatever I could reasonably get away with. This would certainly give me more “flow” time between emails and fewer interruptions. Whether or not it would work, I don’t know.

      That would change the charts for my outgoing email, but I’m not sure I have much control over the incoming email.
