Bounding Boxes for All US States

Sometimes you come across an API that requires bounding box coordinates to subset your query, but that doesn't offer an interactive map to create one and extract those min/max values. Fear not: here are the extents of each US state and territory in NAD83 coordinates, derived from the 2017 US Census 1:500,000 shapefile. https://gist.github.com/a8dx/2340f9527af64f8ef8439366de981168 To … Continue reading Bounding Boxes for All US States
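For readers unsure what those min/max values are, here is a minimal Python sketch of how a bounding box falls out of a shape's vertices. The function name `bounding_box` and the sample coordinates are hypothetical and are not taken from the gist.

```python
def bounding_box(points):
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) pairs."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    return (min(lons), min(lats), max(lons), max(lats))


# Hypothetical polygon vertices (lon, lat); not real state boundaries.
square = [(-105.0, 39.0), (-104.0, 39.0), (-104.0, 40.0), (-105.0, 40.0)]
print(bounding_box(square))
```

A plain min/max like this assumes the shape does not cross the antimeridian; a shape that spans 180° longitude would need separate handling.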

A Better ZIP5-County Crosswalk

I use a healthcare expenditure dataset with observations geographically coded at the 5-digit ZIP code level, but I'd also like to know which county an observation 'belongs' to. Maybe I want to cluster standard errors by county, or control for county-specific trends. You'd imagine this would be straightforward, but I haven't yet found a government crosswalk … Continue reading A Better ZIP5-County Crosswalk

Large Stata Datasets and False Errors about ‘Duplicates’

Variable storage types matter more when working with larger datasets and with variables that have more digits. I was reminded of this by an error message Stata threw during a long reshape, claiming duplicate entries of the ID variable. That was obviously not the case, since the ID was created uniquely from _n, and … Continue reading Large Stata Datasets and False Errors about ‘Duplicates’
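The excerpt is cut off before it names the cause, so the following is only a sketch of one way such false duplicates can arise, assuming the ID was held in a 4-byte float (Stata's default storage type for newly generated variables). It is written in Python rather than Stata, and the helper `as_float32` is a hypothetical name.

```python
import struct


def as_float32(value):
    """Round-trip a number through a 4-byte IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Three distinct integer IDs just above 2**24 (16,777,216).
ids = [16_777_216, 16_777_217, 16_777_218]
stored = [as_float32(i) for i in ids]

print(stored)
print(len(set(ids)), "unique IDs before,", len(set(stored)), "after")
```

Past 2**24 a 4-byte float can no longer represent every integer, so neighboring IDs collapse onto the same stored value. Storing the ID as a long or a double avoids this for integers of that size.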

Identify nth-Degree Neighbors Using R’s Simple Features Package, Simply

You're more likely to complain about the neighbors upstairs who are making noise after midnight than those in an apartment two buildings away. Proximity matters and that's patently obvious, but oftentimes it takes a bit of work to identify who is close and who isn't. While raster data is packaged in a consistent gridded format for … Continue reading Identify nth-Degree Neighbors Using R’s Simple Features Package, Simply
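The idea of nth-degree neighbors can be sketched independently of any spatial package. The following Python example is not the sf-based approach of the post; it assumes you already have a first-degree adjacency list (the `adjacency` dictionary below is hypothetical) and uses a breadth-first search to find units exactly n steps away.

```python
from collections import deque


def nth_degree_neighbors(adjacency, start, n):
    """Return the set of units whose shortest path from start is exactly n steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        if distance[unit] == n:
            continue  # no need to look beyond degree n
        for neighbor in adjacency.get(unit, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[unit] + 1
                queue.append(neighbor)
    return {unit for unit, d in distance.items() if d == n}


# Hypothetical chain of four adjacent units: A - B - C - D.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(nth_degree_neighbors(adjacency, "A", 2))
```

Because the search records the shortest distance to each unit, a unit reachable in both one and three steps counts only as a first-degree neighbor.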

Local Macros in Stata Using Regular Expressions

Regular expressions can make your scripting dramatically simpler and more automated, and they let you embed systematically important information in filenames, variables, dictionaries, and paths. With enough practice, xkcd reminds us that regexp can also make you a superhero. Stata provides a very nice table of their regular expressions and offers some helpful examples, but these seem … Continue reading Local Macros in Stata Using Regular Expressions
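As an illustration of pulling embedded information out of a filename, here is a small sketch using Python's `re` module rather than Stata's regular expression functions. The naming scheme `variable_year_state.csv` and the function `parse_filename` are assumptions made up for this example.

```python
import re

# Hypothetical naming scheme: <variable>_<4-digit year>_<2-letter state>.csv
FILENAME_PATTERN = re.compile(
    r"^(?P<variable>[a-z]+)_(?P<year>\d{4})_(?P<state>[A-Z]{2})\.csv$"
)


def parse_filename(name):
    """Return the named pieces of a filename, or None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


print(parse_filename("precip_2015_CA.csv"))
print(parse_filename("notes.txt"))
```

The captured pieces play the same role that local macros would in a Stata script: they can be reused to label variables, build output paths, or filter which files to process.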

Converting .TXT/.GRD Climate Data Files to netCDF Format

Climate data is packaged and distributed in too many file formats. Under ideal circumstances, you could easily convert data from formats you're not familiar with (and don't have scripts to handle) to those you are. This is why analogous tools like Stat/Transfer, for the statistical databases often used by social scientists, are so helpful. If a stranger … Continue reading Converting .TXT/.GRD Climate Data Files to netCDF Format

Recent Trends in Women’s Employment in Rural India

That female labor force participation (FLFP) is U-shaped in per capita income is one significant stylized fact at the intersection of development and labor economics. At low levels of development, subsistence requirements render women's work a necessity for household survival. At higher incomes, the nature of employment changes, with the growth of manufacturing jobs that tend … Continue reading Recent Trends in Women’s Employment in Rural India

Effortlessly Merging 1,000s of Raw Data Files with Stata

I frequently have to consolidate hundreds or thousands of raw data files into Stata, files that may be stored in numerous and possibly unknown folders and subfolders, and I have developed a workflow that I think is useful. This approach means I don't need to know file names or paths, and can instead assign search parameters that … Continue reading Effortlessly Merging 1,000s of Raw Data Files with Stata
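The file-discovery step of such a workflow can be sketched in a few lines. This is a Python illustration, not the Stata workflow of the post; the function `find_raw_files` and the default `*.csv` pattern are hypothetical stand-ins for whatever search parameters you would assign.

```python
from pathlib import Path


def find_raw_files(root, pattern="*.csv"):
    """Recursively list files under root whose names match a glob pattern."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


# Search the current directory and every subfolder beneath it.
for path in find_raw_files("."):
    print(path)
```

Searching by pattern from a single root folder means the script never needs a hard-coded list of file names or subfolder paths; the resulting list can then be looped over to import and append each file.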