Sunday, February 25, 2024

R Consortium Grants - TidyTuesday 20-02-2024

TidyTuesday 20 Feb 2024

The Data

This week’s data is on grants/financial assistance provided by the R Consortium.

library(pacman)
p_load(readr, tidyverse)
# read in the dataset
# (I had already downloaded it to my local machine)
isc_grants <- read_csv("isc_grants.csv")

# check the variables and their types, then look at the first 5 rows
isc_grants |> 
  glimpse()
## Rows: 85
## Columns: 7
## $ year        <dbl> 2023, 2023, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022…
## $ group       <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1…
## $ title       <chr> "The future of DBI (extension 1)", "Secure TLS Communicati…
## $ funded      <dbl> 10000, 10000, 12265, 3000, 15750, 8000, 8000, 22000, 6000,…
## $ proposed_by <chr> "Kirill Müller", "Charlie Gao", "Kristina Riemer", "Mark P…
## $ summary     <chr> "This proposal mostly focuses on the maintenance and suppo…
## $ website     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
isc_grants |> 
  head(5)
## # A tibble: 5 × 7
##    year group title                           funded proposed_by summary website
##   <dbl> <dbl> <chr>                            <dbl> <chr>       <chr>   <chr>  
## 1  2023     1 The future of DBI (extension 1)  10000 Kirill Mül… "This … <NA>   
## 2  2023     1 Secure TLS Communications for R  10000 Charlie Gao "The p… <NA>   
## 3  2023     1 volcalc: Calculate predicted v…  12265 Kristina R… "This … <NA>   
## 4  2023     1 autotest: Automated testing of…   3000 Mark Padgh… "The p… <NA>   
## 5  2023     1 api2r: An R Package for Auto-G…  15750 Jon Harmon  "This … <NA>

The data has 7 variables with mostly self-explanatory names. Two need a note: funded is the grant amount in dollars, and group is the funding cycle in which the grant was awarded (1 = spring cycle; 2 = fall cycle).
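
Since group is stored as a number, it can help to give it readable labels before summarising. Here is a minimal sketch on the first nine rows shown in the glimpse() output above; the column name cycle and the object names are my own choices for illustration, not part of the dataset.

```r
library(dplyr)

# First nine rows of year/group/funded, copied from the glimpse() output above
grants_head <- tibble(
  year   = c(rep(2023, 5), rep(2022, 4)),
  group  = c(rep(1, 5), rep(2, 4)),
  funded = c(10000, 10000, 12265, 3000, 15750, 8000, 8000, 22000, 6000)
)

# `cycle` is a hypothetical column: group recoded as a labelled factor
grants_labelled <- grants_head |>
  mutate(cycle = factor(group, levels = c(1, 2), labels = c("spring", "fall")))

# number of grants per year and cycle in this small extract
grants_labelled |>
  count(year, cycle)
```

The same mutate() step should work unchanged on the full isc_grants tibble.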

# Let's summarise the data by year to see which year received the most funding
isc_grantsSum <- isc_grants |> 
  group_by(year) |> 
  summarise(Grant = sum(funded, na.rm = TRUE)) |> 
  dplyr::arrange(desc(Grant))
isc_grantsSum
## # A tibble: 8 × 2
##    year  Grant
##   <dbl>  <dbl>
## 1  2018 289972
## 2  2017 192700
## 3  2019 158582
## 4  2020 155788
## 5  2022 111000
## 6  2021 106740
## 7  2016 105600
## 8  2023  51015

The Plot

Now, let’s plot the data

isc_grantsSum |> 
  mutate(year = reorder(year, Grant)) |> # reorder the years by Grant
  # year is used for the fill, so the bars run from light to dark: label the dark bars
  # in white and the light bars in black, which means separating the two sets first
  mutate(Grant1 = case_when(
    year %in% c(2018, 2017) ~ Grant,
    .default = NA
  ),
  Grant2 = case_when(
    year %in% c(2018, 2017) ~ NA,
    .default = Grant
  )) |> 
  # plot starts from here
  ggplot(aes(x = Grant,
             y = year)) +
  geom_col(aes(fill = year))+
  geom_text(aes(label = scales::dollar(Grant1)),# label it with dollar sign
            colour = "white",
            hjust = 1.2)+
  geom_text(aes(label = scales::dollar(Grant2)),# label it with dollar sign
            colour = "black",
            hjust = 1.2)+
  scale_fill_brewer(type = "seq",
                     palette = "Oranges", #Oranje
                     name = "") +
  #let's label them with title & caption
  labs(x = "", 
       y = "",
       title = "Year-Wise Total Grant Received through R Consortium",
       caption = "The data is from the TidyTuesday GitHub page\nThis is how much the R Consortium disbursed in grants")+
  theme_bw() +
  # manual tuning of panel outlook
  theme(panel.border = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.caption = element_text(colour = 'grey40',
                                    hjust = 0),
        plot.title = element_text(face = 'bold'),
        # want the font family as 'serif' - looks better than the default
        text = element_text(family = 'serif'),
        axis.text.y = element_text(size = 14))+
  # let's put lines between the years
  geom_hline(yintercept = seq(0.5, 8.5, 1),
             colour = 'grey50',
             linewidth = 0.2)+
  # to give it a clean look put a vline at zero.
  geom_vline(xintercept = 0,
             colour = "grey50",
             linewidth = 0.5)+
  guides(fill = "none") # don't want legends
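
As an aside, the two-colour labelling can probably also be done with a single geom_text() layer by mapping the label colour from a column. This is only a sketch of an alternative, not what the plot above uses: label_colour and the object names are hypothetical, the yearly totals are copied from the summary table above, and the theme tweaks are left out.

```r
library(dplyr)
library(ggplot2)

# Yearly totals copied from the summary table above
yearly <- tibble(
  year  = c(2018, 2017, 2019, 2020, 2022, 2021, 2016, 2023),
  Grant = c(289972, 192700, 158582, 155788, 111000, 106740, 105600, 51015)
)

labelled <- yearly |>
  mutate(
    year = reorder(factor(year), Grant),
    # `label_colour` is a hypothetical helper column:
    # white text on the two largest (darkest) bars, black on the rest
    label_colour = if_else(rank(-Grant) <= 2, "white", "black")
  )

p <- ggplot(labelled, aes(x = Grant, y = year)) +
  geom_col(aes(fill = year)) +
  geom_text(aes(label = scales::dollar(Grant), colour = label_colour),
            hjust = 1.2) +
  scale_colour_identity() + # use the colour names in the column as-is
  scale_fill_brewer(palette = "Oranges") +
  guides(fill = "none")

p
```

This avoids the NA labels that the two-layer approach passes to geom_text(), at the cost of one extra column.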

Oh dear! 2023 got the least funding. Let's break the totals down by funding cycle.

isc_grants |> 
  group_by(year, group) |>
  # group: whether the grant was awarded in the spring cycle (1) or the fall cycle (2)
  summarise(grantSum = sum(funded, 
                           na.rm = TRUE))
## # A tibble: 15 × 3
## # Groups:   year [8]
##     year group grantSum
##    <dbl> <dbl>    <dbl>
##  1  2016     1    86500
##  2  2016     2    19100
##  3  2017     1   157700
##  4  2017     2    35000
##  5  2018     1   121672
##  6  2018     2   168300
##  7  2019     1    54083
##  8  2019     2   104499
##  9  2020     1    99300
## 10  2020     2    56488
## 11  2021     1    62700
## 12  2021     2    44040
## 13  2022     1    42000
## 14  2022     2    69000
## 15  2023     1    51015

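OK! For 2023 the only group is 1, which means the fall cycle is not yet included in the data. Still, at $51,015 it is well under half of the funds granted in 2018, whether you compare against the full year ($289,972) or the spring cycle alone ($121,672).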
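
To make that comparison like-for-like, one option is to look at the spring cycle only. A minimal sketch, using the per-cycle totals copied from the table above; share_of_2018 and the object names are hypothetical names chosen for illustration.

```r
library(dplyr)

# Per-cycle totals copied from the year/group summary above
cycle_totals <- tibble(
  year     = c(2016, 2016, 2017, 2017, 2018, 2018, 2019, 2019,
               2020, 2020, 2021, 2021, 2022, 2022, 2023),
  group    = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1),
  grantSum = c(86500, 19100, 157700, 35000, 121672, 168300, 54083, 104499,
               99300, 56488, 62700, 44040, 42000, 69000, 51015)
)

# keep the spring cycle only and express each year relative to spring 2018
spring_totals <- cycle_totals |>
  filter(group == 1) |>
  mutate(share_of_2018 = grantSum / grantSum[year == 2018]) |>
  arrange(desc(grantSum))

spring_totals
```

On these numbers the 2023 spring cycle comes to roughly 42% of the 2018 spring cycle, and it is not the smallest spring cycle in the table: 2022's ($42,000) is lower.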

I spent one hour with this dataset.