Global hydropower scenarios revisited
/ Sean Turner
I published a paper in 2017 that analyzes monthly-resolution projections of global hydropower production under climate change. Recently some folks have asked me for these data. Unfortunately the formatting is horrible, so I’m providing this tutorial to absolve past sins.
For this tutorial I’m going to download these data and aggregate them to country-scale, monthly time series of projected generation (shout out to Febin Kachirayil for this request).
First, we’re gonna need the IDs and associated countries for each dam of the Global Reservoir and Dam (GRanD) database. You can download this from here. We just wanna grab the data table contained in the shapefile GRanD_dams_v1_1.shp:
# load libraries
library(dplyr) # for data wrangling
library(rgdal) # for reading shapefile data

# get GRanD data table
# (your_GRanD_dir should point to your unzipped GRanD download)
readOGR(paste0(your_GRanD_dir, "/GRanD_dams_v1_1.shp")) %>%
  .@data %>%
  # ^^ extract data table from shape object
  as_tibble() %>%
  select(GRAND_ID, DAM_NAME, COUNTRY) ->
  grand_data
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/turn652/Documents/IM3/WM development/GRanD/GRanD download/GRanD_dams_v1_1.shp", layer: "GRanD_dams_v1_1"
## with 6862 features
## It has 56 fields
## Integer64 fields read as strings: DIS_AVG_LS
grand_data
## # A tibble: 6,862 x 3
## GRAND_ID DAM_NAME COUNTRY
## <int> <chr> <chr>
## 1 1 Terror Lake United States
## 2 2 Mayo Canada
## 3 3 Blue Lake United States
## 4 4 Green Lake United States
## 5 5 Long Lake Dam United States
## 6 6 W.A.C. Bennett Canada
## 7 7 Peace Canyon Canada
## 8 8 Swan Lake United States
## 9 9 Anyox Canada
## 10 10 Upper Silvis United States
## # … with 6,852 more rows
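A side note for anyone trying this today: rgdal was retired from CRAN in 2023. If you can’t install it, the sf package gets you the same table. A minimal sketch, assuming the same your_GRanD_dir as above:

# alternative route using sf instead of the retired rgdal
library(sf)
st_read(file.path(your_GRanD_dir, "GRanD_dams_v1_1.shp")) %>%
  st_drop_geometry() %>% # drop the spatial info, keep the attribute table
  as_tibble() %>%
  select(GRAND_ID, DAM_NAME, COUNTRY) ->
  grand_data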
Now let’s connect these IDs to the simulation data. These are currently stored in a dedicated GitHub repo here.
data_dir <- "https://raw.githubusercontent.com/swd-turner/HydroSimGlobal/master/"
The first file we need is GRanD_ID.csv. This is gonna help us index through the lists of data to extract data for the relevant dams.
library(vroom) # for data reading
index <- vroom(paste0(data_dir, "GRanD_ID.csv"), delim = ",")
# join up the index to the grand data
index %>% left_join(grand_data, by = c("GRanD_ID" = "GRAND_ID")) ->
index_grand
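Worth a quick sanity check that every simulated dam found a match in GRanD; ideally this returns zero rows:

# any simulated dams that failed to match a GRanD record?
index_grand %>% filter(is.na(COUNTRY))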
Now we wanna get the actual simulation data. These are in R list format and are pickled in .RData files (I warned you this wasn’t pretty). There are 1593 elements to these lists, each representing a dam. Each of these elements is itself a list, containing various simulation data for the relevant dam (generation, water storage, reservoir levels, etc.). Compounding matters further, the data are stored as ts (time series) objects, a format that is rarely used today.
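If you’ve never met a ts object, here’s a toy monthly one, just to show the format (the values are random noise, not simulation data):

# toy monthly ts with the same shape as the simulation series
x <- ts(rnorm(24), start = 2001, frequency = 12)
str(x)        # compact summary of the series
time(x)[1:3]  # fractional years: 2001.000, 2001.083, 2001.167 (approx.)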
We can use the base functions url and load to get these data straight into our environment:
# url for GCM forced simulations with CNRM under A2 emissions scenario
data_url <- url(paste0(data_dir, "GCM-forced%20simulations/CNRM/cnrm_A2"))
load(data_url)
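By the way, the repo holds other GCM/scenario combinations under the same folder pattern. Here’s a sketch of a helper for grabbing any of them; the folder and object names for combinations other than CNRM/A2 are assumptions on my part, so check the repo listing first:

# helper to fetch a simulation list for a given GCM and scenario
# (the naming pattern for other GCMs/scenarios is an assumption)
load_sim <- function(gcm, scenario) {
  obj <- paste0(tolower(gcm), "_", scenario)
  con <- url(paste0(data_dir, "GCM-forced%20simulations/", gcm, "/", obj))
  e <- new.env()
  load(con, envir = e) # load the .RData payload into a clean environment
  e[[obj]]             # return the list of per-dam simulations
}

cnrm_A2 <- load_sim("CNRM", "A2")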
Now a list object named cnrm_A2 should appear in your environment. If we look at the first element, we see a list of six monthly time series spanning 2001 to 2100:
str(cnrm_A2[[1]])
## List of 6
## $ Storage_Mm^3 : Time-Series [1:1200] from 2001 to 2101: 74300 73271 72995 71227 67589 ...
## $ Release_Mm^3 : Time-Series [1:1200] from 2001 to 2101: 2542 1525 2796 4575 1271 ...
## $ Evaporation_Mm^3: Time-Series [1:1200] from 2001 to 2101: 2.10e-04 3.97e-01 4.71 1.95e+01 1.28e+02 ...
## $ WaterLevel_m : Time-Series [1:1200] from 2001 to 2101: 184 183 183 182 180 ...
## $ Spill_Mm^3 : Time-Series [1:1200] from 2001 to 2101: 0 0 0 0 0 0 0 0 0 0 ...
## $ Power_MW : Time-Series [1:1200] from 2001 to 2101: 1363 816 1490 2413 666 ...
From index_grand, created earlier, we see that this first element (index = 1) of cnrm_A2 corresponds to GRanD ID 6 (the W.A.C. Bennett dam in Canada). So now we’re gonna cycle through all the countries of that table, pick out all the relevant indices, extract the simulated generation (Power_MW) for those indices, and finally, aggregate to country-level generation (MWh).
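You can check that pairing directly:

# confirm simulation index 1 maps to W.A.C. Bennett (GRanD ID 6)
index_grand %>% filter(Index == 1)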
This will require purrr’s map functions.
library(purrr) # for mapping over lists

index_grand %>%
  split(.$COUNTRY) %>%
  # ^^ one tibble of dams per country
  map_dfr(function(x){
    x %>% pull(COUNTRY) %>% first() -> country
    x %>% pull(Index) %>%
      map_dfr(
        function(dam){
          cnrm_A2[[dam]] %>% .$Power_MW %>%
            tibble(gen = ., time = time(.)) %>%
            mutate(gen = as.double(gen) * 732) %>%
            # ^^ 732 is approx hrs per month to convert MW to MWh
            mutate(year = as.integer(substr(time, 1, 4)),
                   month = factor(rep(month.abb, 100), levels = month.abb),
                   index = dam) %>%
            select(index, year, month, gen_MWh = gen)
        }
      ) %>%
      # tag the country, then aggregate across its dams
      mutate(country = !!country) %>%
      group_by(country, year, month) %>%
      summarise(gen_MWh = sum(gen_MWh)) %>%
      ungroup()
  })
## # A tibble: 128,400 x 4
## country year month gen_MWh
## <chr> <int> <fct> <dbl>
## 1 Afghanistan 2001 Jan 67015.
## 2 Afghanistan 2001 Feb 81655.
## 3 Afghanistan 2001 Mar 81655.
## 4 Afghanistan 2001 Apr 81655.
## 5 Afghanistan 2001 May 81655.
## 6 Afghanistan 2001 Jun 60081.
## 7 Afghanistan 2001 Jul 52977.
## 8 Afghanistan 2001 Aug 29872.
## 9 Afghanistan 2001 Sep 17176.
## 10 Afghanistan 2001 Oct 15815.
## # … with 128,390 more rows
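If you wanna keep the result, assign the pipeline above to a name (e.g. stick -> country_gen after the final })) and write it out to something friendlier than .RData. A sketch using readr; the file name here is just a suggestion:

library(readr) # for writing csv
# assumes the country-level tibble above was assigned to country_gen
write_csv(country_gen, "cnrm_A2_country_gen_MWh.csv")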
Lesson: provide data as NetCDF or plain old CSV in future.