New R tool winfapReader reviewed

Apr 09, 2020 / Sean Turner

Yesterday on Twitter I came across a new R package from Ilaria Prosdocimi. The package makes it super-easy to obtain annual maxima and peak-over-threshold data for UK rivers. Here’s a quick review and demo of the package.

You can install easily using devtools:

devtools::install_github("ilapros/winfapReader")

For this analysis I’m gonna load winfapReader along with a few other libraries.

library(winfapReader)
library(dplyr)
library(purrr)
library(lubridate) # for dates
library(ggplot2)   # for plots

I couldn’t find any functionality in winfapReader for identifying stations of interest, so I browsed the National River Flow Archive data viewer to get a few stations in Renfrewshire, Scotland (where I grew up). The stations are:

84011 Gryfe at Craigend
84017 Black Cart Water at Milliken Park
84012 White Cart Water at Hawkhead

Next, a table with the station IDs and names:

tribble(
  ~Station, ~name,
  84011, "Gryfe at Craigend",
  84017, "Black Cart Water at Milliken Park",
  84012, "White Cart Water at Hawkhead"
) -> stations

The get_amax function in winfapReader returns the time series of annual maxima. The function takes station ids as the argument. You can pump multiple station ids into the function, but I’m gonna set up the vector of station ids and then use purrr::map_dfr to get results a single table (rather than a list), which will be easier to work with.

stations[["Station"]] %>% 
  map_dfr(get_amax) %>% 
  as_tibble() ->
  get_amax_output

get_amax_output

## # A tibble: 138 x 6
##    Station WaterYear Date        Flow Stage Rejected
##      <dbl>     <dbl> <date>     <dbl> <dbl> <lgl>   
##  1   84011      1963 1963-11-24  56.3  2.3  FALSE   
##  2   84011      1964 1964-12-11  66.4  2.42 FALSE   
##  3   84011      1965 1965-10-31  94.2  2.71 FALSE   
##  4   84011      1966 1966-10-05  65.5  2.41 FALSE   
##  5   84011      1967 1968-03-26  77.4  2.54 FALSE   
##  6   84011      1968 1968-10-09  56.3  2.3  FALSE   
##  7   84011      1969 1970-02-01  73.6  2.5  FALSE   
##  8   84011      1970 1970-11-03  64.7  2.4  FALSE   
##  9   84011      1971 1971-10-10  71.8  2.48 FALSE   
## 10   84011      1972 1972-12-11  64.7  2.4  FALSE   
## # … with 128 more rows

(Not sure what the flow units are—I assume cumecs)

Next I’m gonna identify the season of each annual max and neaten up the table:

get_amax_output %>% 
  left_join(stations) %>% 
  mutate(month = month(Date),
         season = case_when(
           month %in% c(12, 1, 2) ~ "winter",
           month %in% 3:5 ~ "spring",
           month %in% 6:8 ~ "summer",
           month %in% 9:11 ~ "autumn"
         )) %>% 
  select(name, WaterYear, Flow, season) -> amax_table

amax_table

## # A tibble: 138 x 4
##    name              WaterYear  Flow season
##    <chr>                 <dbl> <dbl> <chr> 
##  1 Gryfe at Craigend      1963  56.3 autumn
##  2 Gryfe at Craigend      1964  66.4 winter
##  3 Gryfe at Craigend      1965  94.2 autumn
##  4 Gryfe at Craigend      1966  65.5 autumn
##  5 Gryfe at Craigend      1967  77.4 spring
##  6 Gryfe at Craigend      1968  56.3 autumn
##  7 Gryfe at Craigend      1969  73.6 winter
##  8 Gryfe at Craigend      1970  64.7 autumn
##  9 Gryfe at Craigend      1971  71.8 autumn
## 10 Gryfe at Craigend      1972  64.7 winter
## # … with 128 more rows

And finally plot using gglot2:

amax_table %>% 
  ggplot(aes(WaterYear, Flow)) +
  geom_smooth(se = FALSE, col = "lightgrey",
              method = "loess",
              formula = "y ~ x") +
  geom_point(aes(col = season)) +
  facet_wrap(~name, scales = "free_y", ncol = 1) +
  expand_limits(y = 0) +
  theme_classic() +
  theme(strip.background = element_blank())+
  labs(title = "Annual Maximum Flow analysis",
       subtitle = "Three stations in Renfrewshire, Scotland",
       y = "Flow (cumecs)", x = "Water year") +
  geom_vline(xintercept = 1994, alpha = 0.5, linetype = 2)

There we have it. Looks like there was a big flood in the winter of 1994. I must say I don’t recall that particular winter (I was ten). Unsurprisingly the annual max almost always occurs in autumn or winter. No strong trend emerging from any of the stations, although it looks like the White Cart has been regulated somewhat in recent years (maybe new operations at a dam upstream).

Overall winfapReader is a fine wee package that makes analysis of UK floods easier than catching COVID-19 in New York City. A great addition would be an internal vector of the station ids, which would allow one to easily grab all of the metadata straight up (maybe there’s some way of doing this that I missed).

Shout out to Ilaria Prosdocimi and colleagues for providing this tool.