Equity Candle plot
Vikram Ranga
2024-07-20
Equity Boxplots aka Candles :-p
Boxplots, or box-and-whisker plots, were introduced by John Tukey, who also co-developed the Fast Fourier Transform algorithm. I have used boxplots quite a lot in my research and in many operational projects, just to see how the data varies. They are so powerful that they are an essential part of exploratory data analysis. I have never interpreted or used boxplots for the share market, but I have seen people refer to them as candles. When I first heard someone call them candles, I found it amusing. Anyway, they are without doubt full of information, and that is probably why people in the equity market use them to make decisions. Recently I came across a Python library called Altair, and one of its examples draws such a plot. The plot is as below:
Now, I want to recreate the graph using R’s popular ggplot2 package. I have downloaded the dataset used in the Altair example:
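import altair as alt
import pandas as pd
from vega_datasets import data
# Load the OHLC sample data used in Altair's candlestick example
source = data.ohlc()
# Writing .xlsx needs an Excel engine such as openpyxl installed
source.to_excel("~/Documents/R Scripts/EquityData/equityData.xlsx")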
The Plot
Now, I can access the data in R and plot it:
library(dplyr)    # mutate(), case_when()
library(tidyr)    # pivot_longer()
library(ggplot2)  # plotting
equityData <- readxl::read_xlsx("~/Documents/R Scripts/EquityData/equityData.xlsx")
# Mark days that closed below their open as red ('r'), all others as green ('g')
equityData <- equityData |>
  mutate(Color = case_when(
    open > close ~ 'r',
    .default = 'g'
  ))
# Stack the four prices into one column so each date gets one box
equityData |>
  pivot_longer(
    cols = c(open, close, high, low),
    names_to = "Var",
    values_to = 'Values'
  ) |>
  mutate(date = lubridate::date(date)) |>
  ggplot() +
  geom_boxplot(aes(x = date,
                   y = Values,
                   group = date,
                   fill = factor(Color))) +
  scale_fill_manual(values = c('g' = "green",
                               'r' = 'red')) +
  scale_x_date(date_breaks = '1 week', date_labels = "%m/%d") +
  scale_y_continuous(breaks = seq(23, 34, 1)) +
  labs(x = "Date in 2009",
       y = "Price") +
  theme_linedraw() +
  theme(
    legend.position = 'none',
    axis.text.x = element_text(size = 8),
    axis.title = element_text(face = 'bold')
  )
Let’s see how this code works:
# first let's see what is the data
head(equityData, 3)
## # A tibble: 3 × 9
##    ...1 date                 open  high   low close signal    ret Color
##   <dbl> <dttm>              <dbl> <dbl> <dbl> <dbl> <chr>   <dbl> <chr>
## 1     0 2009-06-01 00:00:00  28.7  30.0  28.4  30.0 short  -4.89  g
## 2     1 2009-06-02 00:00:00  30.0  30.1  28.3  29.6 short  -0.323 r
## 3     2 2009-06-03 00:00:00  29.6  31.8  29.6  31.0 short   3.69  g
We have four variables that describe how the stock price varied within a day: open, high, low, and close. These are the values the boxplot will be drawn from. But I want all the values in one column against a common date, so let’s use pivot_longer. We also want to give each day a colour: if the stock opened at a higher price than it closed, we mark the day red, and green otherwise.
equityData |>
  mutate(Color = case_when(
    open > close ~ 'r',
    .default = 'g'
  )) |>
  head(3)
## # A tibble: 3 × 9
##    ...1 date                 open  high   low close signal    ret Color
##   <dbl> <dttm>              <dbl> <dbl> <dbl> <dbl> <chr>   <dbl> <chr>
## 1     0 2009-06-01 00:00:00  28.7  30.0  28.4  30.0 short  -4.89  g
## 2     1 2009-06-02 00:00:00  30.0  30.1  28.3  29.6 short  -0.323 r
## 3     2 2009-06-03 00:00:00  29.6  31.8  29.6  31.0 short   3.69  g
equityData |>
  pivot_longer(
    cols = c(open, close, high, low),
    names_to = "Var",
    values_to = 'Values'
  ) |>
  head(3)
## # A tibble: 3 × 7
##    ...1 date                signal   ret Color Var   Values
##   <dbl> <dttm>              <chr>  <dbl> <chr> <chr>  <dbl>
## 1     0 2009-06-01 00:00:00 short  -4.89 g     open    28.7
## 2     0 2009-06-01 00:00:00 short  -4.89 g     close   30.0
## 3     0 2009-06-01 00:00:00 short  -4.89 g     high    30.0
Now, we can plot them using geom_boxplot.
equityData |>
  pivot_longer(
    cols = c(open, close, high, low),
    names_to = "Var",
    values_to = 'Values'
  ) |>
  mutate(date = lubridate::date(date)) |>
  ggplot() +
  geom_boxplot(aes(x = date,
                   y = Values,
                   group = date,
                   fill = factor(Color))) +
  scale_fill_manual(values = c('g' = "green",
                               'r' = 'red')) +
  scale_x_date(date_breaks = '1 week', date_labels = "%m/%d") +
  scale_y_continuous(breaks = seq(23, 34, 1)) +
  labs(x = "Date in 2009",
       y = "Price") +
  theme_linedraw() +
  theme(
    legend.position = 'none',
    axis.text.x = element_text(size = 8),
    axis.title = element_text(face = 'bold')
  )
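One caveat worth checking is how closely these boxes match real candles. A boxplot draws its box from the first to the third quartile of the values it is given, not from open to close. The sketch below is a minimal check in Python, assuming geom_boxplot uses R’s default quantile rule (type 7, linear interpolation), which Python’s statistics.quantiles reproduces with method="inclusive". The helper name box_edges is hypothetical, and the prices are the rounded values printed in the tibble output for 2009-06-02.

```python
from statistics import quantiles


def box_edges(open_, high, low, close):
    """Return (q1, median, q3) of the four daily prices.

    Assumption: linear-interpolation quartiles (R's default type 7),
    which statistics.quantiles calls method="inclusive".
    """
    q1, median, q3 = quantiles([open_, high, low, close], n=4, method="inclusive")
    return q1, median, q3


# Rounded prices for 2009-06-02, taken from the printed tibble
open_, high, low, close = 30.0, 30.1, 28.3, 29.6
q1, median, q3 = box_edges(open_, high, low, close)

# A candle body runs from the lower to the higher of open and close
body_low, body_high = min(open_, close), max(open_, close)
print(f"candle body: {body_low:.3f} to {body_high:.3f}")
print(f"boxplot box: {q1:.3f} to {q3:.3f}")
```

With these rounded inputs the box spans roughly 29.28 to 30.03, slightly wider than the 29.6 to 30.0 candle body. Under this rule each box edge sits a quarter of the way along the wick, so the plot is a close approximation of a candle rather than an exact one.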
I have used scale_fill_manual to assign the colours manually. The x and y axes were also tweaked: the x axis is a date axis, and scale_x_date prints a label every week in the %m/%d format, while scale_y_continuous puts a break at every unit from 23 to 34. The rest is straightforward.
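To make the %m/%d label format concrete, here is a small Python sketch; date_labels takes the same strftime-style codes. It assumes the breaks start on the first date shown in the data, 2009-06-01, which is a simplification, since ggplot2 decides where the weekly breaks actually fall. The helper name weekly_labels is hypothetical.

```python
from datetime import date, timedelta


def weekly_labels(start, n_breaks, fmt="%m/%d"):
    """Format n_breaks dates spaced one week apart, starting at start."""
    return [(start + timedelta(weeks=i)).strftime(fmt) for i in range(n_breaks)]


# First four weekly labels, assuming the axis starts on 2009-06-01
print(weekly_labels(date(2009, 6, 1), 4))
```

This prints zero-padded month/day strings such as 06/01 and 06/08, which is the style of label that appears on the x axis.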