Identify nth-Degree Neighbors Using R's Simple Features Package, Simply
You’re more likely to complain about the neighbors upstairs who are making noise after midnight than those in an apartment two buildings away. Proximity matters and that’s patently obvious, but oftentimes it takes a bit of work to identify who is close and who isn’t. While raster data is packaged in a consistent gridded format for which inverse distance weighting schemes readily can be applied, shapefiles with oddly-shaped features, like these gerrymandered districts, may present more of a challenge. Fortunately the simple features library in R can save the day and with little sweat on your brow.
This brief tutorial walks through an example of identifying all first- (all my neighbors who share a common edge with me) and second-degree neighbors (all neighbors of my neighbors) of a given county in North Carolina. Since the map is bundled with the library install, no additional downloading is needed. Everything is generalizable and extending to higher-order neighbors is straightforward, but probably unnecessary. While some great R packages exist to perform this computation in a graph setting, I think sf is superior for spatial data.
A couple years ago my buddy and co-author Eyal Frank published a post outlining a similar ArcGIS workflow, but since then the sf library for R has come online and greatly expands R’s spatial analysis capabilities, to the extent that much of what was previously achievable (and easy) through Arc products can now be run natively in R – plus you don’t have to pray that your session will complete before the program crashes. We can achieve similar ends with less code, and without giving ESRI all of our money. You can instead give it to R Consortium which provides financial support to awesome R developers like Edzer Pebesma and Jeroen Ooms for new creations.
The complete code is here.
Let’s start with the North Carolina county-level map packaged with your sf install, as an example. Let’s select Chatham County, since it’s in the middle of the state and will therefore give us many second-degree neighbors without needing to cross state lines. For now we start with the entire state, and then will drill down to just Chatham.
The heavy lifting is done in the FIRSTdegreeNeighbors function which uses st_touches
to return index values corresponding to all adjacent entries. This is performed for every county in the state and with some finagling, we can produce a data frame of all positive matches.
Since we’re relying on sparse matrices to hold our results, you could supply a large feature collection (e.g., 50,000+) and not worry about crashing R because you’ve exceeded memory caps that dense matrices will blow through. [h/t Aaron]
The output is a symmetric matrix which can be plotted using the county ID values, where count is the number of first degree neighbor matches over some defined hexagonal size. You can control this size, which directly affects total counts per ‘bin.’ You might use this as a sanity check that only nearby features are getting picked up, but why stop there — we should plot the results and make sure it’s doing what we expect.
The code shows how easy it is to iterate the process through subsetting and merging to generate the set of second-degree neighbors. There’s some extra work to ensure that none of our first-degree neighbors appear in the list of second-degree neighbors, but that’s easy enough to do using set functions and subsets.
Now let’s plot. If we use the basic plot
command, we can create a relatively decent looking figure without much work, with Chatham County in purple, its first-degree neighbors in red and its second-degree neighbors in blue.
Alternatively we can convert our sf
objects to the Spatial class, and then layer them onto a Leaflet basemap. This map uses the CartoDB positron tiling, which provides a nice background, especially when visualizing land areas bordering large bodies of water.