Book Review: Speed & Scale
John Doerr and Ryan Panchadsaram. Speed & Scale: A Global Action Plan for Solving Our Climate Crisis Now (Penguin Business, 2021)
Bounding Boxes for All US Counties
A post from several years back listed the bounding-box coordinates of all US states and has been one of the more-viewed pages on this site. Unfortunately, if your area of interest (AOI) is below the state level, those bounding boxes may only get you part of the way to your destination. Why waste time expanding a geographic search to areas beyond your narrow AOI?
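To make the idea concrete, here is a minimal Python sketch of what a bounding box is and how it works as a cheap pre-filter. It is not the code used to produce the county boxes; the helper names (bbox_from_coords, in_bbox) and the county outline are hypothetical, and it assumes plain (longitude, latitude) vertex pairs rather than any particular shapefile library.

```python
from typing import Iterable, Tuple

# A bounding box as (min_lon, min_lat, max_lon, max_lat), i.e. (xmin, ymin, xmax, ymax).
BBox = Tuple[float, float, float, float]


def bbox_from_coords(coords: Iterable[Tuple[float, float]]) -> BBox:
    """Return the smallest axis-aligned box enclosing a set of (lon, lat) vertices."""
    lons, lats = zip(*coords)
    return (min(lons), min(lats), max(lons), max(lats))


def in_bbox(lon: float, lat: float, box: BBox) -> bool:
    """True if a point falls inside (or on the edge of) the box."""
    xmin, ymin, xmax, ymax = box
    return xmin <= lon <= xmax and ymin <= lat <= ymax


# Hypothetical, simplified county outline (NOT real boundary data).
county_vertices = [(-90.5, 38.2), (-90.1, 38.3), (-90.0, 38.7), (-90.4, 38.8)]
box = bbox_from_coords(county_vertices)
print(box)  # (-90.5, 38.2, -90.0, 38.8)

# Screen candidate points cheaply before any exact point-in-polygon test.
points = [(-90.3, 38.5), (-89.5, 38.5)]
print([p for p in points if in_bbox(*p, box)])  # [(-90.3, 38.5)]
```

A box test is only a coarse screen: a point inside the box can still fall outside an irregularly shaped county, so an exact point-in-polygon check should follow. Features that straddle the 180th meridian, such as parts of Alaska's Aleutian Islands, would also need special handling, since a naive min/max over longitudes spans nearly the whole globe.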
Tip for Installing the Orfeo ToolBox Plugin for QGIS on macOS
Cleaning Berkeley Earth's BEST Gridded Daily Temperature Data
You may have recently seen air-quality maps produced by the Berkeley Earth group, especially in the wake of the horrific Camp Fire, whose death toll now exceeds 80. For example, here’s their real-time visualization of PM2.5 concentrations.
Bounding Boxes for All US States
A Better ZIP5-County Crosswalk
I use a healthcare expenditure dataset whose observations are geographically coded at the 5-digit ZIP code (ZIP5) level, but I’d also like to know which county each observation ‘belongs’ to. Maybe I want to cluster standard errors by county, or control for county-specific trends. You’d imagine this would be straightforward, but I haven’t yet found a government crosswalk that covers all the ZIP5s that appear in my data. What follows is the best solution I’m aware of for matching as many ZIP5s as possible. While it increases the number of ZIP5-county matches by only about 110 over what HUD offers, it’s an improvement of more than 6,000 over the Census crosswalk.
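The post's exact procedure isn't reproduced here, but the general idea of layering crosswalks can be sketched in Python. This sketch assumes each crosswalk maps a ZIP5 to one or more (county FIPS, share) pairs, for example the residential-address ratios HUD reports, and it assigns each ZIP to the county with the largest share, consulting a fallback crosswalk only when the primary one has no entry. The function names and the toy ZIP and FIPS codes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each crosswalk maps a ZIP5 to (county_fips, share) pairs, where "share" is
# assumed to be the fraction of the ZIP's addresses that fall in that county.
Crosswalk = Dict[str, List[Tuple[str, float]]]


def best_county(pairs: List[Tuple[str, float]]) -> str:
    """Return the county that holds the largest share of a ZIP's addresses."""
    return max(pairs, key=lambda pair: pair[1])[0]


def assign_counties(zips: List[str], *crosswalks: Crosswalk) -> Dict[str, Optional[str]]:
    """Give each ZIP5 one county, trying the crosswalks in priority order.

    ZIPs missing from every crosswalk map to None so they are easy to audit.
    """
    assigned: Dict[str, Optional[str]] = {}
    for z in zips:
        assigned[z] = None
        for xwalk in crosswalks:
            if xwalk.get(z):  # present and non-empty
                assigned[z] = best_county(xwalk[z])
                break
    return assigned


# Hypothetical toy crosswalks with made-up ZIP and FIPS codes (not real data).
primary_xwalk = {"00001": [("99001", 0.7), ("99002", 0.3)]}
fallback_xwalk = {"00001": [("99002", 1.0)], "00002": [("99003", 1.0)]}

print(assign_counties(["00001", "00002", "00003"], primary_xwalk, fallback_xwalk))
# {'00001': '99001', '00002': '99003', '00003': None}
```

Taking the largest share is a simplification; if an analysis is sensitive to ZIPs split across counties, keeping the shares as weights may be preferable to forcing a single assignment.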
Large Stata Datasets and False Errors about 'Duplicates'
Variable storage types matter more when you work with larger datasets and with variables that have more digits. I’m reminded of this by an error message Stata threw while I was trying to perform a long reshape, claiming there were duplicate entries of the ID variable. That was obviously not the case, since _nid was uniquely created and each of its values visibly corresponded to its row index.
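One plausible mechanism behind errors like this (an assumption, not necessarily the post's diagnosis) is floating-point precision. Stata's default float type is 4-byte single precision, which represents integers exactly only up to 2^24 = 16,777,216; beyond that, consecutive row indices can round to the same stored value and look like duplicates. The Python sketch below mimics this by round-tripping integers through single precision with the standard struct module; as_float32 is a hypothetical helper name.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip a number through 4-byte IEEE 754 single precision,
    the same width as Stata's float storage type."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


# Integers are exactly representable in single precision only up to 2**24 = 16,777,216.
for row in (16_777_215, 16_777_216, 16_777_217, 16_777_218):
    print(row, as_float32(row))
# 16777215 16777215.0
# 16777216 16777216.0
# 16777217 16777216.0   <- collides with the previous row
# 16777218 16777218.0

# A row-index ID stored this way yields fewer distinct values than rows.
ids = [as_float32(i) for i in range(16_777_210, 16_777_230)]
print(len(ids), len(set(ids)))
```

In Stata itself, storing the identifier as a long or double (for example, gen long id = _n) sidesteps the problem.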
Identify nth-Degree Neighbors Using R's Simple Features Package, Simply
You’re more likely to complain about the upstairs neighbors making noise after midnight than about those in an apartment two buildings away. That proximity matters is patently obvious, but it often takes a bit of work to identify who is close and who isn’t. While raster data comes in a consistent gridded format to which inverse distance weighting schemes can readily be applied, shapefiles with oddly shaped features, like these gerrymandered districts, may present more of a challenge. Fortunately, R’s simple features (sf) package can save the day with little sweat on your brow.
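The post works in R with sf, but the underlying logic of nth-degree neighbors is a breadth-first search over a contiguity graph, sketched here in Python. It assumes you already have first-order neighbor lists (in R these might come from something like sf::st_touches()); the function name and the toy adjacency are hypothetical. "Exactly n steps" here means shortest-path distance n, so a feature that is two steps away is not also counted as a third-degree neighbor.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def nth_degree_neighbors(adjacency: Dict[Hashable, List[Hashable]],
                         start: Hashable, n: int) -> Set[Hashable]:
    """Return the features whose shortest contiguity path from `start` has exactly n steps."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == n:
            continue  # nothing beyond degree n is needed
        for nb in adjacency.get(node, []):
            if nb not in dist:  # first visit in BFS = shortest distance
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return {node for node, d in dist.items() if d == n}


# Hypothetical first-order (touching) neighbor lists for five features.
adjacency = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
for n in (1, 2, 3):
    print(n, sorted(nth_degree_neighbors(adjacency, "A", n)))
# 1 ['B']
# 2 ['C', 'E']
# 3 ['D']
```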
Local Macros in Stata Using Regular Expressions
Regular expressions can make your scripting dramatically simpler and more automated, and they let you embed systematically important information in filenames, variables, dictionaries, and paths. With enough practice, as xkcd reminds us, regular expressions can even make you a superhero.
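As an illustration of embedding information in filenames, here is a small Python sketch that parses a hypothetical naming scheme (variable_region_year.csv) with named groups. The post itself uses Stata, where functions such as regexm() and regexs() play the analogous role of testing a match and pulling captured pieces into local macros; the pattern and names below are made up for the example.

```python
import re

# Hypothetical naming scheme: <variable>_<region>_<YYYY>.csv, e.g. "tmax_county_1998.csv".
FILENAME_RE = re.compile(r"^(?P<var>[a-z]+)_(?P<region>[a-z]+)_(?P<year>\d{4})\.csv$")


def parse_filename(name: str) -> dict:
    """Extract the fields encoded in a filename, raising if it doesn't fit the scheme."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    fields = match.groupdict()
    fields["year"] = int(fields["year"])  # convert the captured digits to an integer
    return fields


for f in ["tmax_county_1998.csv", "prcp_state_2005.csv"]:
    print(parse_filename(f))
# {'var': 'tmax', 'region': 'county', 'year': 1998}
# {'var': 'prcp', 'region': 'state', 'year': 2005}
```

Failing loudly on names that don't match keeps a mistyped file from silently dropping out of a processing loop.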
Converting .TXT/.GRD Climate Data Files to netCDF Format
Climate data is packaged and distributed in too many file formats. Ideally, you could easily convert data from formats you’re not familiar with (and don’t have scripts to handle) to those that you are familiar with. This is why tools like Stat/Transfer, which does exactly this for the statistical file formats social scientists often use, are so helpful: if a stranger on the street hands you SPSS data, you can convert it on the fly into something Stata can read. Admittedly, the value of software like Stat/Transfer diminishes as more statistical packages gain comprehensive built-in conversion tools, such as those in R or read_stata in pandas. Getting similar functionality with climate data requires a bit more heavy lifting.
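As a hedged first step toward such a conversion, the Python sketch below reads one common plain-text grid layout, the ESRI ASCII grid (a header of ncols, nrows, xllcorner, yllcorner, cellsize, and an optional NODATA_value, followed by rows of values, northernmost first), into coordinate and value lists. The files discussed in the post may well use a different layout, so treat the format as an assumption; the function name is hypothetical, and headers that use xllcenter/yllcenter instead of corners are not handled. The resulting arrays could then be written to netCDF with a library such as netCDF4 or xarray, a step this sketch does not perform.

```python
from typing import Dict, List, Optional, Tuple

# Header keys of the ESRI ASCII grid layout assumed here (corner-registered variant only).
HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner", "cellsize", "nodata_value"}


def read_ascii_grid(text: str) -> Tuple[List[float], List[float], List[List[Optional[float]]]]:
    """Parse an ESRI ASCII grid into (lats, lons, values).

    Rows of `values` run north to south, matching `lats`; NODATA cells become None.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header: Dict[str, float] = {}
    i = 0
    while i < len(lines) and lines[i].split()[0].lower() in HEADER_KEYS:
        key, value = lines[i].split()[:2]
        header[key.lower()] = float(value)
        i += 1

    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    cell = header["cellsize"]
    nodata = header.get("nodata_value")  # None if the file omits it

    values: List[List[Optional[float]]] = []
    for line in lines[i:i + nrows]:
        row = [float(v) for v in line.split()]
        if len(row) != ncols:
            raise ValueError(f"expected {ncols} values per row, got {len(row)}")
        values.append([None if v == nodata else v for v in row])

    # Cell centres: longitude increases eastward from the lower-left corner,
    # while the first data row is the northernmost one.
    lons = [header["xllcorner"] + cell * (j + 0.5) for j in range(ncols)]
    lats = [header["yllcorner"] + cell * (nrows - r - 0.5) for r in range(nrows)]
    return lats, lons, values


sample = """ncols 3
nrows 2
xllcorner -100.0
yllcorner 30.0
cellsize 0.5
NODATA_value -9999
1.0 2.0 -9999
4.0 5.0 6.0
"""
lats, lons, values = read_ascii_grid(sample)
print(lats)    # [30.75, 30.25]
print(lons)    # [-99.75, -99.25, -98.75]
print(values)  # [[1.0, 2.0, None], [4.0, 5.0, 6.0]]
```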