Climate data is packaged and distributed in too many file formats. Ideally, you could easily convert data from formats you’re not familiar with (and don’t have scripts to handle) into ones you are. This is why analogous tools such as Stat/Transfer, which converts between the statistical file formats often used by social scientists, are so helpful. If a stranger on the street hands you SPSS data, you can convert it on the fly to something Stata-readable. Admittedly, the value of software like Stat/Transfer diminishes as more statistical packages gain comprehensive built-in conversion tools, such as the readstata13 package in R and read_stata in pandas. Getting similar functionality with climate data takes a bit more effort.

This post is a tutorial, with links to scripts, for converting the .TXT/.GRD file combination used by the India Meteorological Department (IMD) into formats that are more usable for people working with climate data. If you’re trained as an economist rather than in the sciences, figuring out how to use this data often comes with its own frustrations. I hope this helps.

I rely on the Climate Data Operators (CDO) to do the heavy lifting in my climate data workflow and work almost exclusively with the netCDF4 file format. Since IMD provides its climate data in .TXT/.GRD format, extra work is required to turn those files into a more familiar format. Here we’ll convert them to netCDF, which after processing can be exported to a spreadsheet.

Data Setup and Processing Steps

IMD provides daily data for each weather variable (TMAX, TMIN, TAVG) in year-specific .TXT and .GRD files. The .TXT file includes the lat/lon grid boundaries, timestamp, and daily values for each pixel. The .GRD file holds the same data as a binary array stack, one grid per day.

Step 1 – We first generate year-specific .CTL files, which contain the data header, since IMD provides only a single .CTL. I found doing this in R to be relatively straightforward, since a .CTL can be read as a standard text file. Each generated .CTL file designates the source data and the spatial/temporal grid resolution. Here I use a modulo operator to identify leap years and adjust the number of time-steps accordingly. If the data omitted leap days, this wouldn’t be a concern. The following R code is an example of how those .CTLs can be auto-generated for a specified year range.
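
For readers who would rather stay in Python, the same idea can be sketched as below. This is not the R script referred to above, just a minimal analogue. The grid definition (31 x 31 cells at 1 degree, starting at 67.5E/7.5N), the UNDEF value, and the file naming pattern are all assumptions for illustration; copy the real values from the .CTL that IMD supplies.

```python
import calendar
from pathlib import Path

# Assumed header values for illustration only; take the real ones
# from the single .CTL that ships with the IMD data.
CTL_TEMPLATE = """DSET ^{grd_name}
TITLE IMD daily gridded {var}
UNDEF 99.9
XDEF 31 LINEAR 67.5 1.0
YDEF 31 LINEAR 7.5 1.0
ZDEF 1 LINEAR 1 1
TDEF {n_days} LINEAR 1JAN{year} 1DY
VARS 1
T 0 99 {var}
ENDVARS
"""


def days_in_year(year):
    """Number of daily time-steps: 366 in leap years, 365 otherwise."""
    return 366 if calendar.isleap(year) else 365


def make_ctl(var, year):
    """Return the text of a GrADS descriptor for one variable and year."""
    grd_name = f"{var}_{year}.GRD"  # hypothetical naming pattern
    return CTL_TEMPLATE.format(
        grd_name=grd_name, var=var, n_days=days_in_year(year), year=year
    )


def write_ctls(var, first_year, last_year, out_dir="."):
    """Write one .ctl per year in [first_year, last_year]; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for year in range(first_year, last_year + 1):
        path = out / f"{var}_{year}.ctl"
        path.write_text(make_ctl(var, year))
        paths.append(path)
    return paths
```

A plain year % 4 test gives the right answer for every year from 1901 through 2099, but calendar.isleap also handles century years such as 1900 and 2100.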

Step 2 – CDO enables easy conversion between GrADS and netCDF, which we’ll exploit. This could be done at the command line, but for reproducibility we’ll use the Python wrappers, which can be installed via pip. The following script demonstrates how those binaries can be read in, converted to netCDF, and concatenated by temperature variable.
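
To make the underlying CDO calls explicit, here is a minimal sketch that builds the commands and shells out to the cdo binary instead of going through the Python wrapper. It uses the CDO operators import_binary (GrADS descriptor to netCDF) and mergetime (concatenate along time). The file names are hypothetical and assume one .ctl per variable and year sitting next to its .GRD file.

```python
import shutil
import subprocess


def import_commands(var, years):
    """One `cdo import_binary` call per yearly .ctl file."""
    cmds = []
    for year in years:
        ctl = f"{var}_{year}.ctl"  # hypothetical file names
        nc = f"{var}_{year}.nc"
        # -f nc asks CDO to write netCDF output
        cmds.append(["cdo", "-f", "nc", "import_binary", ctl, nc])
    return cmds


def mergetime_command(var, years):
    """Concatenate the yearly netCDF files along the time axis."""
    inputs = [f"{var}_{year}.nc" for year in years]
    return ["cdo", "mergetime", *inputs, f"{var}_all.nc"]


def convert(var, years):
    """Run the import and merge commands; requires cdo on the PATH."""
    if shutil.which("cdo") is None:
        raise RuntimeError("cdo binary not found on PATH")
    for cmd in import_commands(var, years) + [mergetime_command(var, years)]:
        subprocess.run(cmd, check=True)
```

Calling convert("TMAX", range(2000, 2003)) would run three imports followed by one merge. Keeping command construction separate from execution makes it easy to print and inspect the commands before running them.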

Step 3 – Now that the concatenated output file has been saved as a netCDF, you can apply all the standard CDO operators to it. You can also simply read your files into R and run functions like spatial averaging with a minimum of code, as in this example.
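
To show what a spatial average over a lat/lon grid involves, here is a small pure-Python sketch for a single day's grid. It weights each cell by the cosine of its latitude, which approximates cell area on a regular grid, and skips undefined cells. The undef value of 99.9 and the list-of-rows layout are assumptions for illustration. In practice you would let CDO's fldmean operator or your R or Python netCDF library do this.

```python
import math


def spatial_mean(grid, lats, undef=99.9):
    """Cosine-latitude weighted mean of one 2-D grid.

    grid: list of rows, one row per latitude in `lats` (degrees).
    Cells equal to `undef` (or None) are treated as missing.
    Returns None if every cell is missing.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(grid, lats):
        weight = math.cos(math.radians(lat))
        for value in row:
            if value is None or math.isclose(value, undef, abs_tol=1e-3):
                continue  # skip missing cells
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total
```

Applied to each daily grid in turn, this yields one value per day, which is the kind of series that exports cleanly to a spreadsheet.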

As always, happy to field any questions you might have on the code and workflow!