Saturday, July 20, 2024

Equity Candles using R's ggplot2

Equity Candle plot

Equity Boxplots aka Candles :-p

Boxplots, or box-and-whisker plots, were introduced by John Tukey, who also co-developed the Fast Fourier Transform algorithm. I have used boxplots quite a lot in my research and in many operational projects, just to see how the data varies. They are so powerful that they are an essential part of exploratory data analysis. I have never interpreted or used boxplots for the share market, but I have seen people refer to them as candles. When I first heard someone call them candles, I found it amusing. Anyway, they are, no doubt, full of information, and that is probably why people in the equity market use them to make decisions. Recently, I came to know about a Python library called Altair, and one of its examples draws such a plot. The plot is as below:

[Figure: candle plot; x axis "Date in 2009" with weekly ticks from 05/31 to 07/26, y axis "Price" from 23 to 34]
Altair’s example

Now, I want to recreate the graph using R’s popular ggplot2 package. I first exported the dataset used in Altair’s example to an Excel file:

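import altair as alt
import pandas as pd
from vega_datasets import data

# Assumption: the original snippet left `source` undefined; Altair's
# candlestick example loads the OHLC data this way
source = data.ohlc()
source.to_excel("~/Documents/R Scripts/EquityData/equityData.xlsx")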

The Plot

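Now, I can access the data in R and plot it:

library(dplyr)    # mutate(), case_when()
library(tidyr)    # pivot_longer()
library(ggplot2)  # plotting
equityData <- readxl::read_xlsx("~/Documents/R Scripts/EquityData/equityData.xlsx")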
equityData <- equityData |> 
  mutate(Color = case_when(
    open > close ~ 'r',
    .default = 'g'
  ))
equityData |> 
  pivot_longer(
    cols = c(open, close, high, low),
    names_to = "Var",
    values_to = 'Values'
  ) |> 
  mutate(date = lubridate::date(date)) |> 
  ggplot() +
  geom_boxplot(aes(x = date,
                   y  = Values,
                   group = date,
                   fill = factor(Color)))+
  scale_fill_manual(values = c('g' = "green", 
                                'r' = 'red')
                     )+
  scale_x_date(date_breaks = '1 week', date_labels = "%m/%d")+
  scale_y_continuous(breaks = seq(23, 34, 1))+
  labs(x = "Date in 2009",
       y = "Price")+
  theme_linedraw() +
  theme(
    legend.position = 'none',
    axis.text.x = element_text(size = 8),
    axis.title = element_text(face = 'bold')
  )

Let’s see how this code works:

# first, let's look at the data
head(equityData, 3)
## # A tibble: 3 × 9
##    ...1 date                 open  high   low close signal    ret Color
##   <dbl> <dttm>              <dbl> <dbl> <dbl> <dbl> <chr>   <dbl> <chr>
## 1     0 2009-06-01 00:00:00  28.7  30.0  28.4  30.0 short  -4.89  g    
## 2     1 2009-06-02 00:00:00  30.0  30.1  28.3  29.6 short  -0.323 r    
## 3     2 2009-06-03 00:00:00  29.6  31.8  29.6  31.0 short   3.69  g

We have four variables (open, high, low and close) that describe how the stock price varied during a day. These are the values the boxplot is drawn from. First, we give each day a colour: if the stock opened at a higher price than it closed, we mark the day red, and green otherwise. Then, since I want all the values in one column against a common date, we use pivot_longer.
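To see the colouring and reshaping steps in isolation, here is a minimal sketch on a tiny made-up table. The `toy` data and its prices are hypothetical and are not taken from the dataset; the sketch assumes dplyr 1.1.0 or later (for `.default` in case_when) and R 4.1 or later (for the native pipe).

```r
library(dplyr)
library(tidyr)

# Two made-up trading days (hypothetical values, for illustration only)
toy <- tibble(
  date  = as.Date(c("2009-06-01", "2009-06-02")),
  open  = c(10, 12),
  high  = c(13, 12.5),
  low   = c(9.5, 10),
  close = c(12, 11)
)

toy_long <- toy |>
  # red when the day opened higher than it closed, green otherwise
  mutate(Color = case_when(open > close ~ "r", .default = "g")) |>
  # stack the four price columns into one column of values
  pivot_longer(cols = c(open, close, high, low),
               names_to = "Var", values_to = "Values")

# each date now appears four times, once per price variable
count(toy_long, date, Color)
```

After the reshape, every date carries four rows that share the same Color, which is what lets geom_boxplot group by date and fill by Color.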

equityData |> 
  mutate(Color = case_when(
    open > close ~ 'r',
    .default = 'g'
  )) |> head(3)
## # A tibble: 3 × 9
##    ...1 date                 open  high   low close signal    ret Color
##   <dbl> <dttm>              <dbl> <dbl> <dbl> <dbl> <chr>   <dbl> <chr>
## 1     0 2009-06-01 00:00:00  28.7  30.0  28.4  30.0 short  -4.89  g    
## 2     1 2009-06-02 00:00:00  30.0  30.1  28.3  29.6 short  -0.323 r    
## 3     2 2009-06-03 00:00:00  29.6  31.8  29.6  31.0 short   3.69  g
equityData |> 
  pivot_longer(
    cols = c(open, close, high, low),
    names_to = "Var",
    values_to = 'Values'
  ) |> 
  head(3)
## # A tibble: 3 × 7
##    ...1 date                signal   ret Color Var   Values
##   <dbl> <dttm>              <chr>  <dbl> <chr> <chr>  <dbl>
## 1     0 2009-06-01 00:00:00 short  -4.89 g     open    28.7
## 2     0 2009-06-01 00:00:00 short  -4.89 g     close   30.0
## 3     0 2009-06-01 00:00:00 short  -4.89 g     high    30.0

Now, we can plot them using geom_boxplot.

equityData |> 
  pivot_longer(
    cols = c(open, close, high, low),
    names_to = "Var",
    values_to = 'Values'
  ) |> 
  mutate(date = lubridate::date(date)) |> 
  ggplot() +
  geom_boxplot(aes(x = date,
                   y  = Values,
                   group = date,
                   fill = factor(Color)))+
  scale_fill_manual(values = c('g' = "green", 
                                'r' = 'red')
                     )+
  scale_x_date(date_breaks = '1 week', date_labels = "%m/%d")+
  scale_y_continuous(breaks = seq(23, 34, 1))+
  labs(x = "Date in 2009",
       y = "Price")+
  theme_linedraw() +
  theme(
    legend.position = 'none',
    axis.text.x = element_text(size = 8),
    axis.title = element_text(face = 'bold')
  )

I have used scale_fill_manual to assign the colours manually. The x and y axes were also tweaked: the x axis is a date axis, so scale_x_date prints a label every week in the format %m/%d, and scale_y_continuous places a break at every unit of price from 23 to 34. The rest is straightforward.
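One thing worth checking: geom_boxplot draws its box from the quartiles of the values it receives, so with only four prices per day the box edges need not sit exactly at open and close. The sketch below uses made-up prices (the `day` data frame is hypothetical) to compare the quartiles with the open and close values. It then shows one possible alternative that draws the body from open to close directly with two geom_linerange layers. This is only a sketch, not the method used above, and it assumes ggplot2 3.4.0 or later for the `linewidth` argument.

```r
library(ggplot2)

# Hypothetical one-day prices, chosen only for illustration
day <- data.frame(
  date  = as.Date("2009-06-01"),
  open  = 12, high = 20, low = 10, close = 16
)

# Box edges a boxplot of the four values would use:
# the 25% and 75% quantiles, which differ here from open and close
hinges <- quantile(c(day$open, day$high, day$low, day$close), c(0.25, 0.75))
hinges

# Alternative sketch: thin line from low to high, thick line from open to close
candles <- ggplot(day, aes(x = date)) +
  geom_linerange(aes(ymin = low, ymax = high)) +
  geom_linerange(aes(ymin = pmin(open, close),
                     ymax = pmax(open, close),
                     colour = open > close),
                 linewidth = 3) +
  scale_colour_manual(values = c("TRUE" = "red", "FALSE" = "green"),
                      guide = "none") +
  labs(x = "Date", y = "Price")
```

Printing `candles` draws the plot. For exploratory work the boxplot version is probably close enough; the second form is only worth the extra lines if the body has to match open and close exactly.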