TidyTuesday 20 Feb 2024
Vikram Ranga
2024-02-25
The Data
This week’s data is on grants/financial assistance provided by the R Consortium.
library(pacman)
p_load(readr, tidyverse)
# Read in the dataset
# (I had already downloaded it to my local machine)
isc_grants <- read_csv("isc_grants.csv")
# check the variables and their types, then look at the first 5 rows
isc_grants |>
glimpse()
## Rows: 85
## Columns: 7
## $ year <dbl> 2023, 2023, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022…
## $ group <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1…
## $ title <chr> "The future of DBI (extension 1)", "Secure TLS Communicati…
## $ funded <dbl> 10000, 10000, 12265, 3000, 15750, 8000, 8000, 22000, 6000,…
## $ proposed_by <chr> "Kirill Müller", "Charlie Gao", "Kristina Riemer", "Mark P…
## $ summary <chr> "This proposal mostly focuses on the maintenance and suppo…
## $ website <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
isc_grants |>
head(5)
## # A tibble: 5 × 7
## year group title funded proposed_by summary website
## <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr>
## 1 2023 1 The future of DBI (extension 1) 10000 Kirill Mül… "This … <NA>
## 2 2023 1 Secure TLS Communications for R 10000 Charlie Gao "The p… <NA>
## 3 2023 1 volcalc: Calculate predicted v… 12265 Kristina R… "This … <NA>
## 4 2023 1 autotest: Automated testing of… 3000 Mark Padgh… "The p… <NA>
## 5 2023 1 api2r: An R Package for Auto-G… 15750 Jon Harmon "This … <NA>
The data has 7 variables, most with self-explanatory names. Two need a note: funded is the grant amount received, in dollars, and group is the time of year the grant was awarded (1 = spring cycle; 2 = fall cycle).
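Since group is a numeric code, it can help to turn it into a labelled factor before summarising. The following is only a minimal sketch in base R: `grants_demo` is a small illustrative data frame built from the first values shown by glimpse() above, and the column name `cycle` is my own choice, not part of the dataset.

```r
# Illustrative rows only: year/group pairs as they appear in the glimpse() output
grants_demo <- data.frame(
  year  = c(2023, 2023, 2022, 2022),
  group = c(1, 1, 2, 2)
)

# Recode the numeric cycle code into a labelled factor
# (`cycle` is a hypothetical column name chosen for this sketch)
grants_demo$cycle <- factor(grants_demo$group,
                            levels = c(1, 2),
                            labels = c("spring", "fall"))

# Count the grants per cycle in the demo data
table(grants_demo$cycle)
```

The same factor() call could be applied to isc_grants$group, so that later tables and plots show "spring" and "fall" instead of 1 and 2.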
# Let's summarise the data year-wise to see which year received the most funding
isc_grantsSum <- isc_grants |>
group_by(year) |>
summarise(Grant = sum(funded, na.rm = TRUE)) |>
dplyr::arrange(desc(Grant))
isc_grantsSum
## # A tibble: 8 × 2
## year Grant
## <dbl> <dbl>
## 1 2018 289972
## 2 2017 192700
## 3 2019 158582
## 4 2020 155788
## 5 2022 111000
## 6 2021 106740
## 7 2016 105600
## 8 2023 51015
The Plot
Now, let’s plot the data
isc_grantsSum |>
mutate(year = reorder(year, Grant)) |> # reorder the years by Grant
# The fill runs from light to dark as Grant increases, so label the dark bars in white and the light bars in black; first split Grant into two columns
mutate(Grant1 = case_when(
year %in% c(2018, 2017) ~ Grant,
.default = NA
),
Grant2 = case_when(
year %in% c(2018, 2017) ~ NA,
.default = Grant
)) |>
# plot starts from here
ggplot(aes(x = Grant,
y = year)) +
geom_col(aes(fill = year))+
geom_text(aes(label = scales::dollar(Grant1)),# label it with dollar sign
colour = "white",
hjust = 1.2)+
geom_text(aes(label = scales::dollar(Grant2)),# label it with dollar sign
colour = "black",
hjust = 1.2)+
scale_fill_brewer(type = "seq",
palette = "Oranges", #Oranje
name = "") +
#let's label them with title & caption
labs(x = "",
y = "",
title = "Year-Wise Total Grant Received through R Consortium",
caption = "The data is from the TidyTuesday GitHub page\nTotal grant amount disbursed by the R Consortium each year")+
theme_bw() +
# manual tuning of panel outlook
theme(panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
plot.caption = element_text(colour = 'grey40',
hjust = 0),
plot.title = element_text(face = 'bold'),
# want the font family as 'serif' - looks better than the default
text = element_text(family = 'serif'),
axis.text.y = element_text(size = 14))+
# let's put lines between the years
geom_hline(yintercept = seq(0.5, 8.5, 1),
colour = 'grey50',
linewidth = 0.2)+
# to give it a clean look put a vline at zero.
geom_vline(xintercept = 0,
colour = "grey50",
linewidth = 0.5)+
guides(fill = "none") # don't want legends (fill = FALSE is deprecated in recent ggplot2)
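The two-column trick above (Grant1 and Grant2) works, but the same white-on-dark and black-on-light labelling can probably be done with a single geom_text() layer. This is a hedged sketch, not the code used for the plot above: the `grants` data frame is typed in from the yearly summary table, `label_colour` is a hypothetical helper column, and the rule "the two largest totals get white labels" is my assumption, mirroring the hard-coded 2018 and 2017.

```r
library(ggplot2)

# Yearly totals copied from the summary table above
grants <- data.frame(
  year  = c(2018, 2017, 2019, 2020, 2022, 2021, 2016, 2023),
  Grant = c(289972, 192700, 158582, 155788, 111000, 106740, 105600, 51015)
)

# Order the years by total so the largest bar ends up at the top
grants$year <- reorder(factor(grants$year), grants$Grant)

# Assumption: the two largest totals get the darkest fills, so they take white labels
grants$label_colour <- ifelse(rank(-grants$Grant) <= 2, "white", "black")

p <- ggplot(grants, aes(x = Grant, y = year)) +
  geom_col(aes(fill = year), show.legend = FALSE) +
  # one text layer; the colour is read per row from label_colour
  geom_text(aes(label = scales::dollar(Grant), colour = label_colour),
            hjust = 1.2) +
  scale_colour_identity() +  # use the colour names stored in the column as-is
  scale_fill_brewer(palette = "Oranges")

p
```

The design choice here is to keep the label colour as data rather than as two separate layers, so changing the cut-off only means changing one line.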
Oh dear! 2023 got the least funding. Let's break the totals down by grant cycle.
isc_grants |>
group_by(year, group) |>
# group records whether the grant was awarded in the spring cycle (1) or the fall cycle (2)
summarise(grantSum = sum(funded,
na.rm = TRUE))
## # A tibble: 15 × 3
## # Groups: year [8]
## year group grantSum
## <dbl> <dbl> <dbl>
## 1 2016 1 86500
## 2 2016 2 19100
## 3 2017 1 157700
## 4 2017 2 35000
## 5 2018 1 121672
## 6 2018 2 168300
## 7 2019 1 54083
## 8 2019 2 104499
## 9 2020 1 99300
## 10 2020 2 56488
## 11 2021 1 62700
## 12 2021 2 44040
## 13 2022 1 42000
## 14 2022 2 69000
## 15 2023 1 51015
OK! For 2023 the only group is 1, which means the fall cycle is not included in the data. Even so, the 2023 total does not reach half of the funds granted in 2018.
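Because 2023 only has a spring cycle, a fairer comparison is spring against spring. The sketch below uses base R and the group 1 rows typed in from the table above; the `spring` data frame and the `share_of_2018` column are names I made up for this illustration.

```r
# Spring-cycle (group 1) totals copied from the year-by-cycle table above
spring <- data.frame(
  year     = 2016:2023,
  grantSum = c(86500, 157700, 121672, 54083, 99300, 62700, 42000, 51015)
)

# Express each spring total as a share of the 2018 spring total
# (`share_of_2018` is a hypothetical column name)
baseline <- spring$grantSum[spring$year == 2018]
spring$share_of_2018 <- round(spring$grantSum / baseline, 2)

# Show the years from the largest to the smallest spring total
spring[order(-spring$grantSum), ]
```

On a spring-only basis 2023 is not the lowest year (2022 is), but it is still under half of the 2018 spring total.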
I spent one hour on this dataset.