Saturday, February 17, 2024

Tidy Tuesday for Valentine's day

Tidy Tuesday for 13-02-2024
library(tidyverse)
library(ggalt) #-> for smooth lines

The Context

The valentine’s day is a no brainer for anyone. I found a funny meme on social media for the occasion:

For English readers translation is: My neighbour aunty said that her daughter got flowers & chocolate for being topper in a coaching class and on 14th Feb they are taking her on tour :-p.

Tidy Tuesday this time put some data related to valentine’s day and one graph. So let’s check out the graph:

The Data

There are 3 data files:

#data1
giftAge <- read_csv('D:/R Script/Tidytuesday/13-02-2024/gifts_age.csv')
cat("Age wise Spending")
## Age wise Spending
giftAge |> 
  head(5)
## # A tibble: 5 × 9
##   Age   SpendingCelebrating Candy Flowers Jewelry GreetingCards EveningOut
##   <chr>               <dbl> <dbl>   <dbl>   <dbl>         <dbl>      <dbl>
## 1 18-24                  51    70      50      33            33         41
## 2 25-34                  40    62      44      34            33         37
## 3 35-44                  31    58      41      29            42         30
## 4 45-54                  19    60      37      20            42         31
## 5 55-64                  18    50      32      13            43         29
## # ℹ 2 more variables: Clothing <dbl>, GiftCards <dbl>
#data2
giftGender <- read_csv('D:/R Script/Tidytuesday/13-02-2024/gifts_gender.csv')
cat("Gender wise Spending")
## Gender wise Spending
giftGender |> 
  head(5)
## # A tibble: 2 × 9
##   Gender SpendingCelebrating Candy Flowers Jewelry GreetingCards EveningOut
##   <chr>                <dbl> <dbl>   <dbl>   <dbl>         <dbl>      <dbl>
## 1 Men                     27    52      56      30            37         33
## 2 Women                   27    59      19      14            43         29
## # ℹ 2 more variables: Clothing <dbl>, GiftCards <dbl>
#data3
hisSpend <- read_csv('D:/R Script/Tidytuesday/13-02-2024/historical_spending.csv')
cat("Historical Spending")
## Historical Spending
hisSpend |> 
  head(5)
## # A tibble: 5 × 10
##    Year PercentCelebrating PerPerson Candy Flowers Jewelry GreetingCards
##   <dbl>              <dbl>     <dbl> <dbl>   <dbl>   <dbl>         <dbl>
## 1  2010                 60      103    8.6    12.3    21.5          5.91
## 2  2011                 58      116.  10.8    12.6    26.2          8.09
## 3  2012                 59      126.  10.8    13.5    29.6          6.93
## 4  2013                 60      131.  11.6    13.5    30.9          8.32
## 5  2014                 54      134.  10.8    15      30.6          7.97
## # ℹ 3 more variables: EveningOut <dbl>, Clothing <dbl>, GiftCards <dbl>

The Graph(s)

Now let’s try to make the graph(s):

hisSpend$Perc <- hisSpend$PercentCelebrating/100
spline_int <- as.data.frame(spline(hisSpend$Year, 
                                   hisSpend$PercentCelebrating))
p <- hisSpend |> 
  ggplot(aes(x = Year, y = Perc)) + 
  geom_hline(yintercept = seq(0.5, 0.64, 0.02),
             colour = 'grey90', 
             linewidth = 0.4) +
  geom_xspline(spline_shape = -0.4, size = 1, color = 'red')+
  geom_point(shape = 20, colour = 'white', size = 4)+
  geom_point(shape = 20, colour = 'red', size = 2)+
  geom_text(aes(x = Year + 0.015, 
                y = Perc + 0.006,
                label = scales::percent(Perc, accuracy = 1, trim = F)))+
  scale_color_continuous(name = "",
                         label = "Percent Celebrating") +
  scale_x_continuous(breaks = seq(2010, 2022, 1))+
  scale_y_continuous(breaks = seq(0.50, 0.65, 1))+
  labs(x = "", 
       y = "",
       caption = "Source: NDR's 2024 Valentine's Day Spending Survey, conducted by Prosper Insights & Analytics")+
  theme_bw() + 
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.y = element_blank(),
        legend.position = 'bottom',
        axis.ticks = element_blank(),
        axis.text.x = element_text(angle = 30),
        panel.border = element_blank(),
        plot.caption = element_text(
          colour = "grey50",
          size = 6,
          hjust = 0
        )
      ) 
  
p

It seems people used to celebrate it more than now. There is a sharp dip in 2014 and sharp rise in 2020 may be because of COVID, people needed a reason to celebrate. Overall, its declining! (The data is truncated, the above graph is from 2007 till 2024 and this one is subset of that)

Now, we are done with this graph lets explore rest two datasets.

The Others

# tidy the data using pivot_longer
giftAgeL <- giftAge |> 
  pivot_longer(
    cols = !Age,
    names_to = "Category",
    values_to = "Number"
  )
# turn numbers into percentages
giftAgeL <- giftAgeL |> 
  group_by(Age) |> 
  mutate(Perc = round((Number/sum(Number)), 2))

#plot
giftAgeL |> 
  ggplot(aes(x = Age, y = Perc, fill = Category, group = Category)) + 
  geom_col(position = 'dodge') +
  # lets label the bars otherwise info. is not clear
  geom_text(aes(label = scales::percent(Perc, accuracy = 1)),
            position = position_dodge(width = 0.9), # this is important because dodge bars we have plotted
            size = 3,
            color = 'black',
            angle = 90, # plot the text vertically
            hjust = 1.2) +
  scale_fill_manual(values = c('#66c2a5','#fc8d62','#8da0cb','#e78ac3','#a6d854','#ffd92f','#e5c494','#b3b3b3'), name = "")+
   theme_bw()+
  theme(legend.position = 'top',
        axis.text.x = element_text(angle = 30), # x-axis label turn them to 30 degree
        panel.grid = element_blank(), # remove grid lines
        panel.border = element_blank(), # remove border
        axis.ticks = element_blank(), # remove ticks
        axis.text.y = element_blank())+ # remove y axis labels coz we have printed y axis value as %
   labs(x = "", 
        y = "")+ # Remove axis labels
  guides(fill = guide_legend(nrow = 1)) # put all the legends in one row

Now, see the graph -> it seems as we grow old importance of greeting cards increases. Candy remains popular at all age group - no wonders there :-)

No comments:

Post a Comment