Using the US Census API(s)
10 November ’16
The other day I was building a model and wanted to layer in some ZIP-level census data. Some quick Googling led me to this .gov site where you can search a database of stock reports and data cuts. However, I wasn't able to find what I was looking for. Not sure if I was searching ineffectively (the interface is a little clunky) or my somewhat-specific request didn't align with any of the stock reports.
In either case, I stumbled upon a post from the data folks at Splitwise (if you're not already a user, I highly recommend) which clued me in to the US Census API(s). An API! It is not the most elegant or well-documented, but it is still very useful once you get the hang of it. Each data source (e.g. the 2010 Decennial Census) has its own API endpoint, and each endpoint has a list of geographies you can pull data for and a list of variables you can pull.
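To make the endpoint / geography / variable structure concrete, here is a minimal sketch of how a request URL can be put together. The helper name build_census_url is made up for illustration, and the dataset path (2010/dec/sf1) and variable (P001001, total population) are my assumptions about the 2010 Decennial endpoint, so check them against the API documentation before relying on them.

```python
from urllib.parse import quote, urlencode

BASE = "https://api.census.gov/data"


def build_census_url(dataset, variables, for_geo, in_geo=None, key=None):
    """Assemble a Census API query URL (hypothetical helper, not an official client).

    dataset   -- path of the data source, e.g. "2010/dec/sf1" (assumed)
    variables -- list of variable codes to request
    for_geo   -- geography to return rows for, e.g. "state:*"
    in_geo    -- optional parent geography, e.g. "state:06"
    key       -- optional API key
    """
    params = {"get": ",".join(variables), "for": for_geo}
    if in_geo is not None:
        params["in"] = in_geo
    if key is not None:
        params["key"] = key
    # Keep ':', '*' and ',' readable; spaces become %20.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return f"{BASE}/{dataset}?{query}"


# One row per state, with total population and the state's name.
print(build_census_url("2010/dec/sf1", ["P001001", "NAME"], "state:*"))
```

Swapping the dataset path points the same helper at a different data source; the geography and variable lists for each one are in that endpoint's documentation.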
A few useful notes:
- You can apply for an API key here
- Here's a link to all the APIs available and their respective documentation
- In some cases, you can't pull data at the level you want in a single API call. For instance, for the 2010 Decennial Census data, you must specify a state when requesting ZIP-level data. So you need to download a list of states, then loop through the states and download ZIP-level information for each one (adding &in=state:XX to each call). Example here. For other APIs like the American Community Survey data, this isn't necessary.
- You may need to assemble ratio metrics by downloading the numerator and the denominator as separate variables. For instance, to get % of households in each ZIP that are renters using the American Community Survey API, I would download the numerator (B07013_003E) and the denominator (B07013_001E) and assemble the metric on the back end. Example here. This is actually kind of nice because it lets me decide what sample size I want to require to populate the metric.
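The last two notes can be sketched roughly as follows. This is not the code from the linked examples; it's a minimal illustration using only the standard library. The function names, the dataset path you pass in, and the min_den cutoff of 100 are all assumptions made for the example. It relies on the API returning a JSON array of arrays whose first row is the header, and on ZIP-level rows being labelled "zip code tabulation area".

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE = "https://api.census.gov/data"
ZCTA = "zip code tabulation area"  # the API's name for ZIP-level geography


def zip_url(dataset, variables, state_fips=None, key=None):
    """Build a ZIP-level query URL, optionally restricted to one state."""
    params = {"get": ",".join(variables), "for": f"{ZCTA}:*"}
    if state_fips is not None:
        params["in"] = f"state:{state_fips}"  # the &in=state:XX part
    if key is not None:
        params["key"] = key
    return f"{BASE}/{dataset}?" + urlencode(params, safe=":*,", quote_via=quote)


def to_records(payload):
    """Turn [header, row, row, ...] into a list of dicts keyed by header."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def fetch_records(url):
    """Download one response and parse it into records."""
    with urlopen(url) as resp:
        return to_records(json.load(resp))


def fetch_zips_by_state(dataset, variables, state_fips_codes, key=None):
    """Loop over states and collect ZIP-level rows from one call per state."""
    records = []
    for fips in state_fips_codes:
        records.extend(fetch_records(zip_url(dataset, variables, fips, key)))
    return records


def renter_share(records, num="B07013_003E", den="B07013_001E", min_den=100):
    """Map ZIP -> numerator / denominator.

    ZIPs with a missing value or a denominator below min_den are left out,
    which is where the sample-size requirement gets applied.
    """
    shares = {}
    for rec in records:
        try:
            n, d = float(rec[num]), float(rec[den])
        except (TypeError, ValueError):
            continue  # null or non-numeric value in the response
        if d > 0 and d >= min_den:
            shares[rec[ZCTA]] = n / d
    return shares
```

For an endpoint that needs the state loop, fetch_zips_by_state takes a list of state FIPS codes; for one that doesn't, a single fetch_records(zip_url(dataset, variables)) call covers every ZIP, and either result can be passed straight to renter_share.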
I posted a few code snippets which pull data by ZIP code on my GH. Cheers!