Tidy Tuesday 24 Jan 2024
Vikram Ranga
2024-02-03
Tidy Tuesday
Tidy Tuesday is a weekly initiative in which a dataset is released along with a task. The GitHub page is here: https://github.com/rfordatascience/tidytuesday?tab=readme-ov-file. The task for 24 Jan 2024 is to reproduce two graphs. I could only finish one. The two graphs that were to be reproduced are:
I am taking on graph #2.
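# Load the packages used below: dplyr for data wrangling, ggplot2 for plotting
library(dplyr)
library(ggplot2)
# Read the data & have a glimpse of it (adjust the path to where the file is saved)
engEd <- read.csv('D:/R Script/Tidytuesday/english_education.csv')
# head(engEd, 10)
# could also use glimpse(engEd) from dplyr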
Variables of interest
The second graph uses two variables. Let’s check them and make changes where needed.
# There are two variables of interest:
# 1. size_flag -> how big the settlement is
# 2. income_flag -> the income deprivation group of the settlement
# size_flag has more unique values than are shown in the graph
engEd$size_flag |> unique()
## [1] "Small Towns" "Medium Towns" "Large Towns" "City"
## [5] "Inner London BUA" "Outer london BUA" "Other Small BUAs" "Not BUA"
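To see exactly which categories the filter in the next step discards, the values printed above can be compared with the four that the graph uses, using base R's `setdiff()`. This is a small sketch: `all_sizes`, `keep_sizes` and `dropped` are hypothetical names, and the values are copied from the `unique()` output above.

```r
# Values of size_flag, copied from the unique() output above
all_sizes <- c("Small Towns", "Medium Towns", "Large Towns", "City",
               "Inner London BUA", "Outer london BUA", "Other Small BUAs", "Not BUA")

# The four categories that appear in graph #2
keep_sizes <- c("Small Towns", "Medium Towns", "Large Towns", "City")

# Categories present in the data but not in the graph, i.e. the ones the filter drops
dropped <- setdiff(all_sizes, keep_sizes)
print(dropped)
```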
Filter out the values that are not required.
engEdtowns <- engEd |>
  # keep only the four settlement sizes shown in graph #2
  filter(
    size_flag %in% c("Small Towns", "Medium Towns", "Large Towns", "City")
  ) |>
  mutate(size_flag = case_when( # rename "City" to "Cities" to match the graph
    size_flag == "City" ~ "Cities",
    .default = size_flag
  ))
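To make the effect of `filter()` plus `case_when()` visible without the real file, here is a sketch on a tiny hypothetical tibble (`toy` and `toy_towns` are illustrative names, not part of the dataset). It assumes dplyr 1.1.0 or later, where `case_when()` gained the `.default` argument, and R 4.1 or later for the native `|>` pipe.

```r
library(dplyr)

# Hypothetical toy data using the same size_flag labels as the real file
toy <- tibble(
  size_flag = c("Small Towns", "City", "Not BUA", "Large Towns", "Inner London BUA")
)

toy_towns <- toy |>
  # rows outside the four graph categories are removed
  filter(size_flag %in% c("Small Towns", "Medium Towns", "Large Towns", "City")) |>
  mutate(size_flag = case_when(
    size_flag == "City" ~ "Cities",  # the only value that is renamed
    .default = size_flag             # every other value is kept unchanged
  ))

print(toy_towns$size_flag)
```

`filter()` keeps the original row order, so the result is the three surviving rows with "City" replaced by "Cities".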
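A little more data manipulation is still needed: recode income_flag to the labels used in the graph, drop the missing values, and compute the percentage of each income group within each town size.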
engEdtownsDf <- engEdtowns |>
  # recode income_flag to the labels used in graph #2
  mutate(income_flag = case_when(
    income_flag == "Higher deprivation towns" ~ "Higher income deprivation",
    income_flag == "Lower deprivation towns" ~ "Lower income deprivation",
    income_flag == "Mid deprivation towns" ~ "Mid income deprivation",
    income_flag == "Cities" ~ "Higher income deprivation", # cities grouped with higher deprivation
    .default = income_flag
  )) |>
  filter(!is.na(income_flag)) |>  # drop rows without an income group
  group_by(size_flag) |>
  count(income_flag) |>           # number of towns per size and income group
  mutate(perc = n / sum(n) * 100) # percentage within each size_flag
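A quick way to convince yourself that `perc` is computed within each town size: `count()` on grouped data keeps the `size_flag` grouping, so `sum(n)` in the following `mutate()` is the total for that size only. The sketch below checks this on a hypothetical toy tibble and cross-checks it with base R's `prop.table()`; the data values and the names `toy`, `toy_perc`, `totals` and `base_perc` are illustrative assumptions.

```r
library(dplyr)

# Hypothetical toy data: one row per town, already recoded
toy <- tibble(
  size_flag   = c("Small Towns", "Small Towns", "Small Towns", "Small Towns",
                  "Large Towns", "Large Towns"),
  income_flag = c("Lower income deprivation", "Lower income deprivation",
                  "Mid income deprivation", "Higher income deprivation",
                  "Higher income deprivation", "Mid income deprivation")
)

toy_perc <- toy |>
  group_by(size_flag) |>
  count(income_flag) |>           # output is still grouped by size_flag
  mutate(perc = n / sum(n) * 100) # so sum(n) is the total per size_flag

# Each town size should add up to 100 %
totals <- toy_perc |> summarise(total = sum(perc))

# Base R cross-check: row percentages of the two-way table
base_perc <- prop.table(table(toy$size_flag, toy$income_flag), margin = 1) * 100
```

One difference: `table()` keeps zero counts (the toy "Large Towns" has no "Lower income deprivation" towns, so that cell is 0), whereas `count()` simply omits the combination.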
The plot
Now, let’s make the plot.
engEdtownsDf |>
  # fix the order of the income groups; this also sets the stacking and legend order
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |>
  ggplot() +
  geom_col(aes(x = size_flag, y = perc, fill = income_flag),
           width = 0.6,
           position = position_stack(reverse = TRUE))
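The `reverse = TRUE` in `position_stack()` decides which income group sits next to zero. By default ggplot2 stacks so that the first factor level ends up on top (furthest from zero), matching the legend order; `reverse = TRUE` puts the first level ("Higher income deprivation") at the bottom instead. The sketch below checks this with `layer_data()`, which returns the computed bar extents. The toy percentages and the helper `stack_plot()` are hypothetical.

```r
library(ggplot2)

# Hypothetical toy percentages for a single town size
lvls <- c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation")
toy <- data.frame(
  size_flag   = "Small Towns",
  income_flag = factor(lvls, levels = lvls),
  perc        = c(20, 30, 50)
)

# Hypothetical helper: the same bar chart with or without reversed stacking
stack_plot <- function(reverse) {
  ggplot(toy) +
    geom_col(aes(x = size_flag, y = perc, fill = income_flag),
             position = position_stack(reverse = reverse)) +
    scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a")) # one colour per level, in order
}

# layer_data() gives each bar's ymin/ymax and its fill colour
default_order  <- layer_data(stack_plot(FALSE))
reversed_order <- layer_data(stack_plot(TRUE))

# The colour of the segment that starts at zero in each case
bottom_default  <- as.character(default_order$fill[default_order$ymin == 0])
bottom_reversed <- as.character(reversed_order$fill[reversed_order$ymin == 0])
```

With the default order the blue "Lower income deprivation" segment starts at zero; with `reverse = TRUE` the purple "Higher income deprivation" segment does.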
The plot above conveys the information, but several things differ from graph #2: the bars should be horizontal, the colours are different, and there is no title or caption. I will use scale_fill_manual() to set the colours and coord_flip() to turn the vertical bars horizontal. I am also adding the title, subtitle and caption used in graph #2.
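As an aside, ggplot2 3.3.0 and later can also draw horizontal bars without `coord_flip()`: map the numeric variable to `x` and the discrete one to `y`, and the orientation is inferred. A minimal sketch on hypothetical toy data (`toy`, `p` and `bars` are illustrative names); note that the break settings then belong to `scale_x_continuous()` rather than `scale_y_continuous()`.

```r
library(ggplot2)

# Hypothetical toy percentages for two town sizes (each size sums to 100)
lvls <- c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation")
toy <- data.frame(
  size_flag   = rep(c("Small Towns", "Large Towns"), each = 3),
  income_flag = factor(rep(lvls, times = 2), levels = lvls),
  perc        = c(20, 30, 50, 40, 35, 25)
)

# Numeric x and discrete y: ggplot2 infers horizontal bars, so no coord_flip()
p <- ggplot(toy) +
  geom_col(aes(x = perc, y = size_flag, fill = income_flag),
           width = 0.6,
           position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a")) +
  scale_x_continuous(breaks = seq(0, 100, 20)) # percentages are now on the x scale

# Computed bar extents: stacking now happens along x (xmin/xmax)
bars <- layer_data(p)
```

Which approach to use is a matter of taste; without `coord_flip()`, theme elements and scales refer to the axes exactly as they appear on screen.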
engEdtownsDf |>
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |>
  ggplot() +
  geom_col(aes(x = size_flag, y = perc, fill = income_flag),
           width = 0.6,
           position = position_stack(reverse = TRUE)) + # without reverse = TRUE, blue came first and purple last
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a")) + # purple, grey, blue for higher, mid, lower
  guides(fill = guide_legend(override.aes = list(shape = 21, size = 6))) +
  coord_flip() + # turn the vertical bars horizontal
  labs(x = "", y = "",
       title = 'Small towns are less likely to be classed as having high income \ndeprivation',
       subtitle = "Income deprivation group by town size, England",
       caption = "Source: Office for National Statistics analysis using Longitudinal Education Outcomes (LEO) \nfrom the Department for Education (DfE)")
The elements still missing from the graph above are:
- Legend key shape, position and size.
- Font colour of the caption.
- Background colour, major x-axis gridlines, and font size of the axis labels.
- X-axis breaks from 0 to 100 in steps of 20.
Let’s put them in the code.
engEdtownsDf |>
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |>
  ggplot() +
  geom_col(aes(x = size_flag, y = perc, fill = income_flag),
           width = 0.8,
           position = position_stack(reverse = TRUE),
           key_glyph = draw_key_point) + # draw the legend keys as points
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a")) +
  guides(fill = guide_legend(override.aes = list(shape = 21, size = 6))) + # filled circles (shape 21) of size 6
  coord_flip() +
  labs(x = "", y = "",
       title = 'Small towns are less likely to be classed as having high income \ndeprivation',
       subtitle = "Income deprivation group by town size, England",
       caption = "Source: Office for National Statistics analysis using Longitudinal Education Outcomes (LEO) \nfrom the Department for Education (DfE)") +
  # lines at 0 and 100 (vertical after coord_flip()); linewidth replaces size in ggplot2 >= 3.4.0
  geom_hline(yintercept = c(0, 100), linewidth = 0.6, colour = "#d3d3d3") +
  theme_minimal() + # minimal theme: no grey panel background
  theme(legend.position = "top",
        legend.title = element_blank(),
        panel.grid.major.y = element_blank(), # don't need y-axis grids
        plot.title = element_text(colour = 'black',
                                  size = 16,
                                  face = 'bold'),
        plot.caption = element_text(size = 14,
                                    hjust = 0,          # left-align the caption
                                    color = 'grey59'),  # grey caption text
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12)
  ) +
  scale_y_continuous(breaks = seq(0, 100, 20)) # y is drawn horizontally after coord_flip(), so these are the x-axis breaks
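The three plotting pipelines above repeat the same `factor()` call and `geom_col()` layer. Because a ggplot is an object that is extended with `+`, the shared part could be stored once and refined step by step. A minimal sketch, assuming hypothetical names (`income_levels`, `base_plot`, `final_plot`) and toy data standing in for `engEdtownsDf`; `linewidth` assumes ggplot2 3.4.0 or later.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for engEdtownsDf
toy <- data.frame(
  size_flag   = rep(c("Small Towns", "Cities"), each = 3),
  income_flag = rep(c("Higher income deprivation", "Mid income deprivation",
                      "Lower income deprivation"), times = 2),
  perc        = c(20, 30, 50, 60, 30, 10)
)

# Define the factor order once, instead of inside every plotting pipeline
income_levels <- c("Higher income deprivation", "Mid income deprivation",
                   "Lower income deprivation")
toy <- toy |> mutate(income_flag = factor(income_flag, levels = income_levels))

# Shared layers: the stacked bars, the manual colours and the flip
base_plot <- ggplot(toy) +
  geom_col(aes(x = size_flag, y = perc, fill = income_flag),
           width = 0.8, position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a")) +
  coord_flip()

# Each refinement simply adds more layers to the stored object
final_plot <- base_plot +
  geom_hline(yintercept = c(0, 100), linewidth = 0.6, colour = "#d3d3d3") +
  theme_minimal() +
  theme(legend.position = "top", legend.title = element_blank())
```

Printing `base_plot` or `final_plot` draws the corresponding stage, so each intermediate version stays one short line of code.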
So, this is my attempt. Please let me know what you think of it. Cheers!