by Eric Mill, Gabriel Ramírez, Gray Brooks, Leah Bannon, and Shawn Allen
The U.S. federal government now has a public dashboard and dataset for its web traffic, at analytics.usa.gov.
This data comes from a unified Google Analytics profile that is managed by the Digital Analytics Program, which (like 18F) is a team inside of the General Services Administration, an independent federal agency.
18F worked with the Digital Analytics Program, the U.S. Digital Service, and the White House to build and host the dashboard and its public dataset.
You can read more about the project, and some insights from the data, on the White House blog.
In this post, we'll explain how the dashboard works, the engineering choices we made, and the open source work we produced along the way.
A few important notes:
The analytics.usa.gov dashboard is a static website, stored in Amazon S3 and served via Amazon CloudFront. The dashboard loads empty, uses JavaScript to download JSON data, and renders it client-side into tables and charts.
The real-time data is cached from Google every minute, and re-downloaded by the browser every 15 seconds. The rest of the data is cached daily, and only downloaded on page load.
So the big number of people online is made with this HTML:
<section id="realtime"
         data-block="realtime"
         data-source="https://analytics.usa.gov/data/live/realtime.json"
         data-refresh="15">
  <h2 id="current_visitors" class="data">...</h2>
  <div class="chart_subtitle">people on government websites now</div>
</section>
And then we use a whole bunch of D3 to download and render the data.
Each section of the dashboard requires downloading a separate piece of data to populate it. This does mean the dashboard may take some time to load fully over slow connections, but it keeps our code very simple and the relationship between data and display very clear.
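Here's a minimal sketch of that pattern, hypothetical rather than our actual production code: it reads the data-* attributes off the section above, fetches the JSON with the callback-style d3.json from D3 v3, and re-renders on the configured interval. The active_visitors field is an assumption about the report's shape, for illustration only.

// Minimal sketch: wire up the #realtime block from its data-* attributes.
var block = d3.select("#realtime"),
    source = block.attr("data-source"),
    refresh = +block.attr("data-refresh") * 1000;

function update() {
  // d3.json (D3 v3) fetches and parses the file, then calls back
  d3.json(source, function(error, report) {
    if (error) return console.error(error);
    // drop the number into the <h2>, grouped with commas
    block.select("#current_visitors")
        .text(d3.format(",")(+report.data[0].active_visitors));
  });
}

update();                      // render once on page load...
setInterval(update, refresh);  // ...then every data-refresh seconds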
To manage the data reporting process, we made an open source tool called analytics-reporter.
It's a lightweight command line tool, written in Node, that downloads reports from Google Analytics, and transforms the report data into more friendly, provider-agnostic JSON.
You can install it from npm:
npm install -g analytics-reporter
After following the setup instructions to authorize the tool with Google, the tool can produce JSON output for any report defined in reports.json.
A report description looks like this:
{
  "name": "devices",
  "frequency": "daily",
  "query": {
    "dimensions": ["ga:date", "ga:deviceCategory"],
    "metrics": ["ga:sessions"],
    "start-date": "90daysAgo",
    "end-date": "yesterday",
    "sort": "ga:date"
  }
}
And if you ask the included analytics command to run that report by name:
analytics --only devices
Then it will print out something like this:
{
  "name": "devices",
  "data": [
    {
      "date": "2014-10-14",
      "device": "desktop",
      "visits": "11495462"
    },
    {
      "date": "2014-10-14",
      "device": "mobile",
      "visits": "2499586"
    },
    // ...
  ],
  "totals": {
    "devices": {
      "mobile": 213920363,
      "desktop": 755511646,
      "tablet": 81874189
    },
    "start_date": "2014-10-14",
    "end_date": "2015-01-11"
  }
}
The tool comes with a built-in --publish command, so that if you define some Amazon S3 details, it can publish the data to S3 directly.
Running this command:
analytics --only devices --publish
...runs the report and uploads the data directly to a public URL, in this case https://analytics.usa.gov/data/live/devices.json, following the same pattern as the realtime.json URL above.
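Under the hood, publishing a report amounts to a single S3 object upload. Here's a simplified sketch of that step using the aws-sdk Node client; this is illustrative, not analytics-reporter's actual publish code, and the bucket name and report variable are assumptions:

// Hypothetical sketch: upload one report's JSON to S3 as a public,
// cacheable object. Assumes AWS credentials are already set in the
// environment (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
var AWS = require("aws-sdk");
var s3 = new AWS.S3();

s3.putObject({
  Bucket: "analytics.usa.gov",    // assumed bucket name
  Key: "data/live/devices.json",  // matches the URL pattern above
  Body: JSON.stringify(report),   // `report` = the output shown earlier
  ContentType: "application/json",
  ACL: "public-read"
}, function(err) {
  if (err) console.error("upload failed:", err);
});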
Real-time data is downloaded from the Google Analytics Real Time Reporting API, and daily data is downloaded from the Google Analytics Core Reporting API.
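To give a sense of what those requests look like, here's a hypothetical Real Time Reporting API call using the official googleapis Node client. This isn't analytics-reporter's actual code: the profile ID is a placeholder, and oauth2Client stands in for an already-authorized OAuth2 client.

// Hypothetical sketch of a Real Time Reporting API request
// (Google Analytics v3 API, callback style).
var google = require("googleapis");
var analytics = google.analytics("v3");

analytics.data.realtime.get({
  auth: oauth2Client,        // assumed: an authorized OAuth2 client
  ids: "ga:XXXXXXXX",        // placeholder for the unified DAP profile ID
  metrics: "rt:activeUsers"  // visitors on the sites right now
}, function(err, result) {
  if (err) return console.error(err);
  console.log(result.totalsForAllResults["rt:activeUsers"]);
});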
We have a single Ubuntu server running in Amazon EC2 that uses a crontab to run commands like this at appropriate intervals to keep our data fresh.
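The crontab itself can be as simple as something like this. This is a hypothetical schedule, not our exact one; the "realtime" report name is inferred from the realtime.json URL above, and we're assuming that omitting --only runs every report in reports.json.

# Refresh the real-time report every minute.
* * * * * analytics --only realtime --publish

# Refresh the daily reports once a day, shortly after midnight.
30 0 * * * analytics --publish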
There's some pretty clear room for improvement here — the tool doesn't do dynamic queries, reports are hardcoded into version control, and the repo includes an 18F-specific crontab. But it's very simple to use, and a command line interface with environment variables for configuration gives it the flexibility to be deployed in a wide variety of environments.
Our all-static approach has some clear limitations: there's a delay to the live data, and we can't answer dynamic queries. We provide a fixed set of data, and we only provide a snapshot in time that we constantly overwrite.
We went this route because it lets us handle potentially heavy traffic to live data without having to scale a dynamic application server. It also means that we can stay easily within Google Analytics' daily API request limits, because our API requests are only a function of time, not traffic.
All static files are stored in Amazon S3 and served by Amazon CloudFront, so we can lean on CloudFront to absorb all unpredictable load. Our server that runs the cronjobs is not affected by website visitors, and has no appreciable load.
From a maintenance standpoint, this is a dream. And we can always replace this later with a dynamic server if it becomes necessary, by which time we'll have a clearer understanding of what kind of traffic the site can expect and what features people want.
We went to a local civic hacking meetup and conducted a quick usability testing workshop. In line with Paperwork Reduction Act (PRA) guidelines, we interviewed 9 members of the public (keeping the sample under 10, which avoids the PRA's clearance requirements), along with a handful of federal government employees. Any government project can do this, and the feedback was very helpful.
We asked our testers to find specific information we wanted to convey, solicited general feedback, and made a number of changes to the dashboard based on what we heard.
We certainly haven't resolved all the usability issues, so please share your feedback.
Open source: We're an open source team, and built the dashboard in the open from day 1.
All of our work is released under a CC0 public domain dedication.
Open data: All the data we use for the dashboard is available for direct download below the dashboard. Right now, it's just live snapshots, and there's no formal documentation.
Your ideas and bug reports will be very helpful in figuring out what to do next.
Secure connections: 18F uses HTTPS for everything we do, including analytics dashboards.
In the meantime, we hope this dashboard is useful for citizens and for the federal government, and we hope to see you on GitHub.