Saturday, February 3, 2024

Tidy Tuesday 24 Jan 2024


Tidy Tuesday

Tidy Tuesday is a weekly initiative in which a dataset is shared along with a task. The GitHub page is here: https://github.com/rfordatascience/tidytuesday?tab=readme-ov-file. The task for 24 Jan 2024 is to reproduce two graphs, and I could only finish one. The two graphs to be reproduced are:

Graph 1
Graph 2

I am taking on Graph 2.

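# Load the packages used throughout: dplyr for data wrangling, ggplot2 for plotting
library(dplyr)
library(ggplot2)

# Read the data & have a glimpse of it
engEd <- read.csv('D:/R Script/Tidytuesday/english_education.csv')
# head(engEd, 10)
# could also use glimpse(engEd)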

Variables of interest

The second graph needs two variables. Let's check them and, if need be, make some changes to them.

# there are two variables of interest
# 1. size_flag   -> how big the settlement is
# 2. income_flag -> the income deprivation group of the town

# size_flag has more unique values than the graph shows
engEd$size_flag |> unique()
## [1] "Small Towns"      "Medium Towns"     "Large Towns"      "City"            
## [5] "Inner London BUA" "Outer london BUA" "Other Small BUAs" "Not BUA"

Filter out the values that are not required, keeping only the four settlement sizes shown in the graph.

engEdtowns <- engEd |> 
  filter(
    size_flag %in% c("Small Towns", "Medium Towns", "Large Towns", "City")
  ) |> 
  mutate(size_flag = case_when( # rename "City" to "Cities" to match the label in Graph 2
    size_flag == "City" ~ "Cities",
    .default = size_flag
  ))  
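
The .default argument of case_when() used above needs dplyr 1.1.0 or later. On an older dplyr, a final TRUE ~ size_flag condition does the same job. The sketch below checks that both forms give the same result on a small, made-up data frame (the toy values are hypothetical, not taken from english_education.csv):

```r
library(dplyr)

# Hypothetical toy data standing in for english_education.csv
toy <- data.frame(
  size_flag = c("Small Towns", "City", "Not BUA", "Large Towns", "Inner London BUA")
)
keep <- c("Small Towns", "Medium Towns", "Large Towns", "City")

# dplyr >= 1.1.0: values not matched by any condition fall through to .default
new_style <- toy |>
  filter(size_flag %in% keep) |>
  mutate(size_flag = case_when(
    size_flag == "City" ~ "Cities",
    .default = size_flag
  ))

# Older dplyr: a final TRUE ~ ... condition acts as the catch-all
old_style <- toy |>
  filter(size_flag %in% keep) |>
  mutate(size_flag = case_when(
    size_flag == "City" ~ "Cities",
    TRUE ~ size_flag
  ))

print(new_style$size_flag)
print(identical(new_style, old_style))
```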

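The data still needs a little manipulation: the income_flag labels are renamed to match the legend of Graph 2, and the number of towns in each income group is converted to a percentage within each town size.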

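engEdtownsDf <- engEdtowns |> 
  # relabel income_flag to match the legend of Graph 2;
  # towns coded "Cities" are counted as higher income deprivation here
  mutate(income_flag = case_when(
    income_flag == "Higher deprivation towns" ~ "Higher income deprivation",
    income_flag == "Lower deprivation towns" ~ "Lower income deprivation",
    income_flag == "Mid deprivation towns" ~ "Mid income deprivation",
    income_flag == "Cities" ~ "Higher income deprivation",
    .default = income_flag
  )) |>
  filter(!is.na(income_flag)) |>    # drop towns without an income group
  group_by(size_flag) |> 
  count(income_flag) |>             # towns per size x income group; grouping by size_flag is kept
  mutate(perc = n/sum(n)*100)       # percentage within each town size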
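
As a cross-check of the percentage step, base R can compute the same within-group shares with table() and prop.table(). This is a minimal sketch on hypothetical toy data, not the real file: margin = 1 divides each count by its row (town size) total, which mirrors group_by(size_flag) followed by n/sum(n)*100.

```r
# Hypothetical toy data: one row per town, with its size and income group
toy <- data.frame(
  size_flag   = c("Small Towns", "Small Towns", "Small Towns", "Small Towns",
                  "Large Towns", "Large Towns"),
  income_flag = c("Lower income deprivation", "Lower income deprivation",
                  "Mid income deprivation", "Higher income deprivation",
                  "Higher income deprivation", "Mid income deprivation")
)

# Count towns in each size x income cell
counts <- table(toy$size_flag, toy$income_flag)

# margin = 1: divide each cell by its row total, i.e. within each town size
perc <- prop.table(counts, margin = 1) * 100

print(round(perc, 1))
print(rowSums(perc))  # every town size adds up to 100
```

One difference: table() keeps empty combinations as 0, whereas count() simply has no row for a size and income group that contains no towns.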

The plot

Now, let's write the code for a first version of the plot.

engEdtownsDf |> 
  # fix the order of the income groups; ggplot2 uses it for the stacking and the legend
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |> 
  ggplot()+
  geom_col(aes(x = size_flag, y = perc, fill = income_flag), 
           width = 0.6, 
           position = position_stack(reverse = TRUE)) 
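
The factor() call matters because ggplot2 orders the legend and the stacked segments by factor level, and a plain character column ends up in alphabetical order. A small illustration with the same three labels:

```r
groups <- c("Mid income deprivation", "Higher income deprivation",
            "Lower income deprivation", "Mid income deprivation")

# Without explicit levels, factor() sorts the labels alphabetically
alpha_levels <- levels(factor(groups))
print(alpha_levels)  # Higher, Lower, Mid

# With explicit levels, the order follows the deprivation scale instead
ordered_groups <- factor(groups, levels = c("Higher income deprivation",
                                            "Mid income deprivation",
                                            "Lower income deprivation"))
print(levels(ordered_groups))  # Higher, Mid, Lower
```

Any value not listed in levels becomes NA, so a typo in a level name would quietly turn those bars into a missing group.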

This first plot conveys the information, but several things differ from the original: the bars are vertical instead of horizontal, the colours are different, and there are no title or captions. I will use scale_fill_manual() to set the colours and coord_flip() to turn the vertical bars horizontal. I am also adding the title, subtitle and caption used in Graph 2.

engEdtownsDf |> 
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |> 
  ggplot()+
  geom_col(aes(x = size_flag, y = perc, fill = income_flag), 
           width = 0.6, 
           position = position_stack(reverse = TRUE)) + # without reverse = TRUE, the blue group came first and the purple group last
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a"))+
  guides(fill = guide_legend(override.aes = list(shape = 21, size = 6)))+ # shape only takes effect once the legend keys are drawn as points
  coord_flip() +
  labs(x = "", y = "", 
       title = 'Small towns are less likely to be classed as having high income \ndeprivation',
       subtitle = "Income deprivation group by town size, England",
       caption = "Source: Office for National Statistics analysis using Longitudinal Education Outcomes (LEO) \nfrom the Department for Education (DfE)")
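
Two optional variations, sketched on hypothetical toy percentages rather than the real data: naming the colours in scale_fill_manual() ties each colour to a level instead of relying on level order, and, in ggplot2 3.3.0 or later, mapping the town size to y draws horizontal bars directly, as an alternative to coord_flip().

```r
library(ggplot2)

income_levels <- c("Higher income deprivation",
                   "Mid income deprivation",
                   "Lower income deprivation")

# Hypothetical toy percentages in the same shape as engEdtownsDf
toy <- data.frame(
  size_flag   = rep(c("Small Towns", "Large Towns"), each = 3),
  income_flag = factor(rep(income_levels, times = 2), levels = income_levels),
  perc        = c(10, 40, 50, 30, 40, 30)
)

# Named colours: each colour is matched to its level by name, not by position
income_cols <- c("Higher income deprivation" = "#7b0b60",
                 "Mid income deprivation"    = "#d3d3d3",
                 "Lower income deprivation"  = "#40579a")

# Categorical variable on y: ggplot2 (>= 3.3.0) detects the horizontal
# orientation and stacks the bars along x, so no coord_flip() is needed
p <- ggplot(toy) +
  geom_col(aes(x = perc, y = size_flag, fill = income_flag),
           width = 0.6,
           position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = income_cols)

print(p)
```

Without coord_flip(), the percentage scale is x, so the breaks would go in scale_x_continuous() and the reference lines in geom_vline(xintercept = ...), while theme settings such as axis.text.x still refer to the bottom axis.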

The elements still missing compared with Graph 2 are:

  1. Legend key shape, position and size.
  2. Font colour of the caption.
  3. Background colour, major x-axis gridlines, and font size of the axis labels.
  4. X-axis breaks from 0 to 100 in steps of 20.

Let's put them in the code.

engEdtownsDf |> 
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |> 
  ggplot()+
  geom_col(aes(x = size_flag, y = perc, fill = income_flag), 
           width = 0.8, 
           position = position_stack(reverse = TRUE),
           key_glyph = draw_key_point) + # change the shape of legend keys
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a"))+
  guides(fill = guide_legend(override.aes = list(shape = 21, size = 6)))+ # change the shape & size of legend keys
  coord_flip() +
  labs(x = "", y = "", 
       title = 'Small towns are less likely to be classed as having high income \ndeprivation',
       subtitle = "Income deprivation group by town size, England",
       caption = "Source: Office for National Statistics analysis using Longitudinal Education Outcomes (LEO) \nfrom the Department for Education (DfE)")+
  geom_hline(yintercept = c(0,100), linewidth = 0.6, colour="#d3d3d3")+ # grey lines at 0 and 100; linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_minimal() + # drop the grey panel background
  theme(legend.position = "top",
        legend.title = element_blank(),
        panel.grid.major.y = element_blank(), # don't need y-axis grids
        plot.title = element_text(colour = 'black',
                                  size = 16,
                                  face = 'bold'),
        plot.caption = element_text(size = 14, 
                                    hjust = 0,
                                    color = 'grey59'),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12)
        )+
  scale_y_continuous(breaks=seq(0, 100, 20)) # y is drawn horizontally after coord_flip(), so these are the x-axis breaks

So, this is my attempt. Please let me know what you think of it. Cheers!
