  • Book Review: Speed & Scale

    John Doerr and Ryan Panchadsaram. Speed & Scale: A Global Action Plan for Solving Our Climate Crisis Now (Penguin Business 2021)


    Read on →

  • Bounding Boxes for all US Counties

    A post from several years back contained the bounding box coordinates of all US states and has been one of the more viewed pages on this site. Unfortunately, if your area of interest (AOI) is below the state level, these bounding boxes may only get you part of the way to your destination. Why waste time expanding a geographic search to areas beyond your narrow AOI?
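
    As a rough illustration of what a bounding box is, the sketch below takes the minimum and maximum longitude and latitude over a polygon's vertices. The function name and the coordinates are made up for the example and are not taken from the post's data.

```python
def bounding_box(vertices):
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) pairs."""
    lons, lats = zip(*vertices)
    return (min(lons), min(lats), max(lons), max(lats))


# Made-up vertices for a hypothetical county outline, as (longitude, latitude).
county_outline = [(-122.5, 37.7), (-122.3, 37.9), (-122.1, 37.8), (-122.4, 37.6)]
bbox = bounding_box(county_outline)
print(bbox)
```

    This simple min/max approach assumes the shape does not cross the antimeridian; a county that does would need special handling.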

    Read on →

  • Tip for Installing Orfeo Toolbox Plugin for QGIS on MacOS

    Read on →

  • Cleaning Berkeley Earth's BEST Gridded Daily Temperature Data

    You may have recently seen air quality maps produced by the Berkeley Earth group, especially in the wake of the horrific Camp Fire, whose death toll now exceeds 80. For example, here’s their real-time visualization of PM2.5 concentrations.

    Read on →

  • Bounding Boxes for All US States

    Read on →

  • A Better ZIP5-County Crosswalk

    I use a healthcare expenditure dataset with observations geographically coded at the 5-digit ZIP code (ZIP5) level, but I’d also like to know which county an observation ‘belongs’ to. Maybe I want to cluster standard errors by county, or control for county-specific trends. You’d imagine this would be straightforward, but I haven’t yet found a government crosswalk that covers all the ZIP5s that appear in my data. What follows is the best solution I’m aware of for matching as many ZIP5s as possible. While this only increases the number of ZIP5-county matches by about 110 over what HUD offers, it’s an improvement of more than 6,000 over the Census crosswalk.

    Read on →

  • Large Stata Datasets and False Errors about 'Duplicates'

    Variable storage types matter more when working with larger datasets and with variables that have more digits. I’m reminded of this because of an error message Stata threw while I was trying to perform a long reshape, claiming duplicate entries of the ID variable. That was obviously not the case, since the ID was created uniquely from _n, and each value visibly corresponded to its row index.
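
    One plausible mechanism, assuming the ID was stored in a 4-byte float (Stata's default numeric type), is that single precision cannot represent every integer above 2^24 = 16,777,216, so distinct row numbers can round to the same stored value. The Python sketch below is only an illustration of that rounding, not the post's own code; the helper names are hypothetical.

```python
import struct


def as_float32(value: int) -> float:
    """Round-trip an integer through 4-byte IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


def first_collision(start: int, stop: int):
    """Return the first consecutive integers in [start, stop) that share
    a single-precision value, or None if every value stays distinct."""
    previous = as_float32(start)
    for n in range(start + 1, stop):
        current = as_float32(n)
        if current == previous:
            return (n - 1, n)
        previous = current
    return None


LIMIT = 2 ** 24  # largest range in which every integer is exact in float32

# Scan a small window around the limit rather than all 16 million IDs.
collision = first_collision(LIMIT - 1000, LIMIT + 1000)
print(collision)
```

    Under this assumption, storing the ID in a wider type (in Stata, long or double) keeps the values distinct.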

    Read on →

  • Identify nth-Degree Neighbors Using R's Simple Features Package, Simply

    You’re more likely to complain about the neighbors upstairs who are making noise after midnight than those in an apartment two buildings away. Proximity matters, and that’s patently obvious, but it often takes a bit of work to identify who is close and who isn’t. While raster data is packaged in a consistent gridded format to which inverse distance weighting schemes can readily be applied, shapefiles with oddly shaped features, like these gerrymandered districts, may present more of a challenge. Fortunately, the simple features library in R can save the day with little sweat on your brow.

    Read on →

  • Local Macros in Stata Using Regular Expressions

    Regular expressions can make your scripting dramatically simpler and more automated, and they let you embed systematically important information in filenames, variables, dictionaries, and paths. As xkcd reminds us, with enough practice regular expressions can also make you a superhero.

    Read on →

  • Converting .TXT/.GRD Climate Data Files to netCDF Format

    Climate data is packaged and distributed in too many file formats. Under ideal circumstances, you could easily convert data from formats you’re not familiar with (and don’t have scripts to handle) to those that you are. This is why analogous tools like Stat/Transfer, built for the statistical databases often used by social scientists, are so helpful. If a stranger on the street gives you SPSS data, you can convert it on the fly to something Stata-readable. That said, the value of software like Stat/Transfer diminishes as more stat packages gain comprehensive built-in conversion tools, like R’s readstata13 and read_stata in pandas. Getting similar functionality with climate data takes a bit more lifting.

    Read on →