Tidy Tuesday 24 Jan 2024

Tidy Tuesday

Tidy Tuesday is a weekly initiative in which a dataset is shared along with a task. The GitHub page is here: https://github.com/rfordatascience/tidytuesday?tab=readme-ov-file. The task for 24 Jan 2024 is to reproduce two graphs. I could only finish one. The two graphs to be reproduced are:

Graph 1

Graph 2

I am taking on graph #2.

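# Load the packages used below (dplyr for data wrangling, ggplot2 for plotting)
library(dplyr)
library(ggplot2)

# Read the data & have a glimpse of it
engEd <- read.csv('D:/R Script/Tidytuesday/english_education.csv')
#head(engEd, 10)
#could also use glimpse(engEd)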

Variables of interest

The variables of interest are the ones needed for the second graph. Let’s check them and, if need be, make some changes to them.

# there are two variables of interest
#1. size_flag -> represents how big the human settlement is
#2. income_flag

# For size_flag, there are more unique values than were shown in the graph
engEd$size_flag |> unique()

## [1] "Small Towns"      "Medium Towns"     "Large Towns"      "City"            
## [5] "Inner London BUA" "Outer london BUA" "Other Small BUAs" "Not BUA"

Filter out the values which are not required.

engEdtowns <- engEd |> 
  filter(
    size_flag %in% c("Small Towns", "Medium Towns", "Large Towns", "City")
  ) |> 
  mutate(size_flag = case_when( # change the value "City" to "Cities"
    size_flag == "City" ~ "Cities",
    .default = size_flag
  ))
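To see what the filter and recode do in isolation, here is a minimal sketch on a toy data frame. The `toy` and `toy_towns` objects are made up for illustration and only mimic the `size_flag` column; note that `.default` needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy data standing in for engEd; only the size_flag column is mimicked.
toy <- data.frame(size_flag = c("Small Towns", "City", "Not BUA", "Large Towns"))

toy_towns <- toy |>
  # keep only the settlement types that appear in the graph
  filter(size_flag %in% c("Small Towns", "Medium Towns", "Large Towns", "City")) |>
  mutate(size_flag = case_when(
    size_flag == "City" ~ "Cities",
    .default = size_flag  # every other value passes through unchanged
  ))

toy_towns$size_flag
```

The "Not BUA" row is dropped by the filter, and "City" becomes "Cities" while the town labels are left alone.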

The data still needs a little manipulation.

engEdtownsDf <- engEdtowns |> 
  mutate(income_flag = case_when(
    income_flag == "Higher deprivation towns" ~ "Higher income deprivation",
    income_flag == "Lower deprivation towns" ~ "Lower income deprivation",
    income_flag == "Mid deprivation towns" ~ "Mid income deprivation",
    income_flag == "Cities" ~ "Higher income deprivation",
    .default = income_flag
  )) |>
  filter(!(is.na(income_flag))) |> 
  group_by(size_flag) |> 
  count(income_flag) |> 
  mutate(perc = n/sum(n)*100)
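The percentage step relies on `count()` keeping the existing `size_flag` grouping, so `sum(n)` is the total within each town size rather than across the whole table. A small sketch on invented data (the `toy` rows and the short "Lower"/"Higher" labels are assumptions for illustration, not values from the dataset):

```r
library(dplyr)

# Invented rows: four small towns and two cities.
toy <- data.frame(
  size_flag   = c("Small Towns", "Small Towns", "Small Towns", "Small Towns",
                  "Cities", "Cities"),
  income_flag = c("Lower", "Lower", "Lower", "Higher", "Higher", "Higher")
)

toy_perc <- toy |>
  group_by(size_flag) |>
  count(income_flag) |>              # the result is still grouped by size_flag
  mutate(perc = n / sum(n) * 100)    # so sum(n) is the per-size_flag total

toy_perc
```

Here the small towns split 75/25 and the cities come out at 100, so each town size adds up to 100 on its own, which is what a stacked percentage bar needs.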

The plot

Now, let’s write the code to make the plot.

engEdtownsDf |> 
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |> 
  ggplot()+
  geom_col(aes(x = size_flag, y = perc, fill = income_flag), 
           width = 0.6, 
           position = position_stack(reverse = TRUE))

The above plot conveys the information, but several things are missing: the bars should be horizontal, the colours are different, and there is no title or caption. I will use scale_fill_manual() to set the colours and coord_flip() to turn the vertical bars horizontal. I am also adding the title, subtitle and caption used in graph #2.

engEdtownsDf |> 
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |> 
  ggplot()+
  geom_col(aes(x = size_flag, y = perc, fill = income_flag), 
           width = 0.6, 
           position = position_stack(reverse = TRUE)) + # reverse = TRUE, otherwise blue comes first and purple last
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a"))+
  guides(fill = guide_legend(override.aes = list(shape = 21, size = 6)))+
  coord_flip() +
  labs(x = "", y = "", 
       title = 'Small towns are less likely to be classed as having high income \ndeprivation',
       subtitle = "Income deprivation group by town size, England",
       caption = "Source: Office for National Statistics analysis using Longitudinal Education Outcomes (LEO) \nfrom the Department for Education (DfE)")
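The unnamed colour vector in scale_fill_manual() is matched to the factor levels by position, so the colours move if the level order changes. One possible alternative, sketched here as an assumption rather than what the original graph's authors did, is a named vector. The `dep_colours` name and the toy percentages are invented for illustration.

```r
library(ggplot2)

# Named palette: the names must match the values of income_flag exactly.
dep_colours <- c(
  "Higher income deprivation" = "#7b0b60",
  "Mid income deprivation"    = "#d3d3d3",
  "Lower income deprivation"  = "#40579a"
)

# Toy percentages, invented for illustration only.
toy <- data.frame(
  size_flag   = rep(c("Small Towns", "Large Towns"), each = 3),
  income_flag = factor(rep(names(dep_colours), times = 2),
                       levels = names(dep_colours)),
  perc        = c(20, 30, 50, 40, 35, 25)
)

p <- ggplot(toy) +
  geom_col(aes(x = size_flag, y = perc, fill = income_flag),
           width = 0.6,
           position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = dep_colours) +  # colours matched by name, not position
  coord_flip()
p
```

With the named vector, "Higher income deprivation" stays purple even if the factor levels are later reordered.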

The elements still missing from my version of graph #2 are:

Legend shape, position and size.
Font colour of caption.
Background colouring, major x axis gridlines, and font size of axes.
X-axis breaks should run from 0 to 100 in increments of 20.

Let’s put them in the code.

engEdtownsDf |> 
  mutate(income_flag = factor(income_flag, levels = c("Higher income deprivation", "Mid income deprivation", "Lower income deprivation"))) |> 
  ggplot()+
  geom_col(aes(x = size_flag, y = perc, fill = income_flag), 
           width = 0.8, 
           position = position_stack(reverse = TRUE),
           key_glyph = draw_key_point) + # change the shape of legend keys
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a"))+
  guides(fill = guide_legend(override.aes = list(shape = 21, size = 6)))+ # change the shape & size of legend keys
  coord_flip() +
  labs(x = "", y = "", 
       title = 'Small towns are less likely to be classed as having high income \ndeprivation',
       subtitle = "Income deprivation group by town size, England",
       caption = "Source: Office for National Statistics analysis using Longitudinal Education Outcomes (LEO) \nfrom the Department for Education (DfE)")+
  geom_hline(yintercept = c(0,100), linewidth = 0.6, colour="#d3d3d3")+ # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_minimal() + # change background colour
  theme(legend.position = "top",
        legend.title = element_blank(),
        panel.grid.major.y = element_blank(), # don't need y-axis grids
        plot.title = element_text(colour = 'black',
                                  size = 16,
                                  face = 'bold'),
        plot.caption = element_text(size = 14, 
                                    hjust = 0,
                                    color = 'grey59'),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12)
        )+
  scale_y_continuous(breaks=seq(0, 100, 20)) # breaks for the horizontal axis (after coord_flip() the y scale is drawn horizontally)
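Because of coord_flip(), the code above has to set the breaks on the y scale and remove the "y" gridlines to affect what ends up on the opposite axis, which can be confusing. An alternative sketch, assuming ggplot2 3.3.0 or later, maps the percentage to x directly so that no flip is needed. The toy data and the `p_horizontal` name are invented for illustration.

```r
library(ggplot2)

# Toy percentages, invented for illustration; each town size sums to 100.
flag_levels <- c("Higher income deprivation", "Mid income deprivation",
                 "Lower income deprivation")
toy <- data.frame(
  size_flag   = rep(c("Small Towns", "Large Towns"), each = 3),
  income_flag = factor(rep(flag_levels, times = 2), levels = flag_levels),
  perc        = c(20, 30, 50, 40, 35, 25)
)

# Percentages go on x and town size on y, so no coord_flip() is needed.
p_horizontal <- ggplot(toy) +
  geom_col(aes(x = perc, y = size_flag, fill = income_flag),
           width = 0.8,
           position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = c("#7b0b60", "#d3d3d3", "#40579a")) +
  scale_x_continuous(breaks = seq(0, 100, 20)) +   # breaks on the axis they describe
  theme_minimal() +
  theme(panel.grid.major.y = element_blank())      # removes the lines between bars
p_horizontal
```

In this form the scale and theme arguments refer to the axes as they appear on screen, at the cost of departing slightly from the coord_flip() approach used in the post.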

So, this is my attempt. Please do let me know what you think of it. Cheers!

Spatial Ideas

Saturday, February 3, 2024

Vikram Ranga

2024-02-03
