A Better ZIP5-County Crosswalk

I use a healthcare expenditure dataset with observations geographically coded at the 5-digit ZIP code level, but I'd also like to know which county each observation 'belongs' to. I might want to cluster standard errors by county, or control for county-specific trends. You'd imagine this would be straightforward, but I haven't yet found a government crosswalk …
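
One common way to collapse a many-to-many ZIP-county relationship into a one-to-one crosswalk is to assign each ZIP to the county that holds the largest share of it. That rule is an assumption here, not necessarily the method this post settles on. The minimal Python sketch below takes hypothetical (ZIP5, county FIPS, share) rows, where "share" could be the fraction of a ZIP's addresses or population falling in each county, and keeps the largest. Ties go to the lower FIPS code so the result is deterministic.

```python
def zip_to_county(rows):
    """Map each ZIP5 to the county with the largest share of it.

    rows: iterable of (zip5, county_fips, share) tuples. The share measure
    (addresses, population, area, ...) is a hypothetical input here.
    Ties are broken by the lower county FIPS code.
    """
    best = {}  # zip5 -> (county_fips, share)
    for zip5, county, share in rows:
        if zip5 not in best:
            best[zip5] = (county, share)
            continue
        cur_county, cur_share = best[zip5]
        # Replace the current pick if this county has a larger share,
        # or an equal share and a smaller FIPS code.
        if share > cur_share or (share == cur_share and county < cur_county):
            best[zip5] = (county, share)
    return {zip5: county for zip5, (county, _) in best.items()}


# Hypothetical ZIPs split across two counties each.
rows = [
    ("12345", "36001", 0.70),
    ("12345", "36093", 0.30),
    ("54321", "55025", 0.50),
    ("54321", "55021", 0.50),
]
print(zip_to_county(rows))  # {'12345': '36001', '54321': '55021'}
```

A largest-share rule is simple, but it discards the minority counties entirely, so it is worth checking how many observations sit in ZIPs that are split more evenly.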

Large Stata Datasets and False Errors about ‘Duplicates’

Variable storage types matter more when working with larger datasets and with variables that have more digits. I was reminded of this by an error message Stata threw while I was performing a long reshape, claiming there were duplicate entries of the ID variable. That was obviously not the case, since the ID had been created uniquely from _n, and …
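
A likely culprit, assumed here since the excerpt stops before the diagnosis, is precision: Stata's float storage type is a 4-byte IEEE single-precision number, which represents every integer exactly only up to 16,777,216 (2^24). An ID generated from _n beyond that point gets rounded, so distinct observations can end up sharing a stored ID. This Python sketch mimics the rounding by round-tripping integers through a 32-bit float with the standard struct module.

```python
import struct


def as_float32(x):
    """Round-trip a number through IEEE 754 single precision (4-byte float)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Integers up to 2**24 survive a float32 round trip exactly...
print(as_float32(16_777_216))  # 16777216.0
# ...but just beyond it, the spacing between representable values is 2,
# so neighbouring IDs collapse onto the same stored value.
print(as_float32(16_777_217))  # 16777216.0

ids = range(16_777_210, 16_777_230)
stored = [as_float32(i) for i in ids]
# Distinct IDs before vs. after storage as float32.
print(len(set(ids)), len(set(stored)))  # 20 13
```

In Stata itself, the usual remedy is to create such an ID with an integer or double storage type, for example gen long id = _n.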

Identify nth-Degree Neighbors Using R’s Simple Features Package, Simply

You're more likely to complain about the neighbors upstairs who make noise after midnight than about those in an apartment two buildings away. Proximity matters, and that's patently obvious, but it often takes a bit of work to identify who is close and who isn't. While raster data is packaged in a consistent gridded format for …
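
The post works in R with the sf package. Purely as a language-neutral illustration of the idea, nth-degree neighbors can be read as units whose shortest path in a first-order contiguity graph (polygons that share a border) is exactly n steps, which a breadth-first search finds. The adjacency dictionary below is hypothetical and stands in for whatever contiguity list your spatial tools produce.

```python
from collections import deque


def nth_degree_neighbors(adjacency, start, n):
    """Return the units exactly n contiguity steps away from `start`.

    adjacency: dict mapping each unit to the units it shares a border with.
    Units reachable in fewer than n steps are excluded (they are lower-order
    neighbors).
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        if dist[unit] == n:
            continue  # no need to expand beyond degree n
        for nb in adjacency.get(unit, ()):
            if nb not in dist:
                dist[nb] = dist[unit] + 1
                queue.append(nb)
    return sorted(u for u, d in dist.items() if d == n)


# Hypothetical first-order contiguity: A-B-C-D in a row, with E touching B.
adjacency = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
print(nth_degree_neighbors(adjacency, "A", 1))  # ['B']
print(nth_degree_neighbors(adjacency, "A", 2))  # ['C', 'E']
print(nth_degree_neighbors(adjacency, "A", 3))  # ['D']
```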

Converting .TXT/.GRD Climate Data Files to netCDF Format

Climate data is packaged and distributed in too many file formats. Ideally, you could easily convert data from formats you're not familiar with (and don't have scripts to handle) to ones you already work with. This is why analogous tools like Stat/Transfer, which converts between the statistical data formats social scientists often use, are so helpful. If a stranger …

Effortlessly Merging 1,000s of Raw Data Files with Stata

I frequently have to consolidate 100s or 1,000s of raw data files into Stata, files potentially stored in numerous and often unknown folders and subfolders, and I have developed a workflow that I think is useful. This approach means I don't need to know file names or paths, and can instead assign search parameters that …
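
The post builds this workflow in Stata. As a language-neutral sketch of the same idea, the Python below walks every folder and subfolder under a root, keeps files whose names match a search pattern, and appends their rows, tagging each row with its source path. The "*.csv" pattern and the assumption that all matched files share the same header are illustrative choices, not details from the post.

```python
import csv
import tempfile
from pathlib import Path


def stack_matching_files(root, pattern="*.csv"):
    """Append rows from every file under `root`, at any depth, matching `pattern`."""
    base = Path(root)
    rows = []
    for path in sorted(base.rglob(pattern)):
        if not path.is_file():
            continue
        with path.open(newline="") as f:
            for record in csv.DictReader(f):
                # Remember where each row came from, relative to the root.
                record["source_file"] = str(path.relative_to(base))
                rows.append(record)
    return rows


# Demo on a throwaway folder tree with files at different depths.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2019" / "q1").mkdir(parents=True)
    (root / "2019" / "q1" / "sales.csv").write_text("id,value\n1,10\n2,20\n")
    (root / "2020").mkdir()
    (root / "2020" / "sales.csv").write_text("id,value\n3,30\n")
    (root / "2020" / "readme.txt").write_text("not data\n")  # ignored by the pattern
    stacked = stack_matching_files(root)
    print(len(stacked))  # 3
```

Searching by pattern rather than listing paths means new subfolders are picked up automatically, at the cost of needing a pattern tight enough to skip stray files.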

Towards Closing Gender Data Gaps

In May, the Bill & Melinda Gates Foundation announced a three-year, $80 million investment towards closing the gender data gap, but only today did I come across this great video on the same initiative. A portion of the funds will be directed towards improved data collection, particularly on the time-use patterns of women and girls and on household …

Stata-Latex esttab Regression Table Output Streamlining

Researchers spend an excessive amount of time getting up to speed with a field's chosen tools and methods. It is excessive because there is often a consensus on best practice, yet those best practices are not made common knowledge. I think the CS and statistics communities get this right in pushing for open data, transparency, and reproducibility …

Stata: Reghdfe and factor interactions

If you don't know about the reghdfe command in Stata, you are likely missing out, especially if you run 'high-dimensional fixed effects' models, for example a model with 3+ dimensions of fixed effects, perhaps 2 in time and 1 in space-time. I've been encountering a situation that raises this unhelpful error message: (null assertion) Empty …
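
For readers new to the idea, reghdfe absorbs several sets of fixed effects by iteratively demeaning the data (a method of alternating projections) instead of estimating a dummy for every level. The toy Python sketch below shows only that core idea for two fixed-effect dimensions on a small hypothetical panel; reghdfe's actual implementation is considerably more sophisticated, and the function names here are hypothetical.

```python
def group_demean(values, groups):
    """Subtract each group's mean from its members' values."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def absorb_two_way(values, fe1, fe2, tol=1e-10, max_iter=10_000):
    """Partial out two sets of fixed effects by alternating projections.

    Repeatedly demean by fe1, then by fe2, until the values stop changing.
    """
    resid = list(values)
    for _ in range(max_iter):
        updated = group_demean(group_demean(resid, fe1), fe2)
        change = max(abs(a - b) for a, b in zip(updated, resid))
        resid = updated
        if change < tol:
            break
    return resid


# Hypothetical unbalanced panel: outcome y, unit ids, and time periods.
y = [3.0, 5.0, 4.0, 8.0, 7.0, 1.0, 2.0]
unit = ["a", "a", "a", "b", "b", "c", "c"]
time = [1, 2, 3, 1, 2, 2, 3]
resid = absorb_two_way(y, unit, time)
```

After convergence the residuals average to (numerically) zero within every unit and every period. By the Frisch-Waugh-Lovell theorem, regressing variables demeaned this way on one another gives the same slope coefficients as including all the unit and time dummies.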