FinTech & Analytics: Programming Data Visualisation in R

Archie Dolit

Installing R Packages and Importing Data

Install and Launch R Packages

Check whether the ggiraph, plotly, DT and tidyverse packages are installed, install any that are missing, and load them.

packages <- c('DT', 'ggiraph', 'plotly', 'tidyverse')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
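The check step of the loop above can also be run on its own, before anything is installed. A minimal sketch, assuming a hypothetical helper named missing_packages() (not part of the original code) that uses requireNamespace() from base R to report which packages are not yet available:

```r
# Hypothetical helper: return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  is_available <- vapply(
    pkgs,
    function(p) requireNamespace(p, quietly = TRUE),
    logical(1)
  )
  pkgs[!is_available]
}

packages <- c("DT", "ggiraph", "plotly", "tidyverse")
to_install <- missing_packages(packages)
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
}
```

Reporting the missing packages first makes it easier to see what install.packages() is about to download.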

Importing Data

Use read_csv() of readr package to import Exam_data.csv into R

exam_data <- read_csv("data/Exam_data.csv")
glimpse(exam_data)

Rows: 322
Columns: 7
$ ID      <chr> "Student321", "Student305", "Student289", "Student22~
$ CLASS   <chr> "3I", "3I", "3H", "3F", "3I", "3I", "3I", "3I", "3I"~
$ GENDER  <chr> "Male", "Female", "Male", "Male", "Male", "Female", ~
$ RACE    <chr> "Malay", "Malay", "Chinese", "Chinese", "Malay", "Ma~
$ ENGLISH <dbl> 21, 24, 26, 27, 27, 31, 31, 31, 33, 34, 34, 36, 36, ~
$ MATHS   <dbl> 9, 22, 16, 77, 11, 16, 21, 18, 19, 49, 39, 35, 23, 3~
$ SCIENCE <dbl> 15, 16, 16, 31, 25, 16, 25, 27, 15, 37, 42, 22, 32, ~

summary(exam_data)

      ID               CLASS              GENDER         
 Length:322         Length:322         Length:322        
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
     RACE              ENGLISH          MATHS          SCIENCE     
 Length:322         Min.   :21.00   Min.   : 9.00   Min.   :15.00  
 Class :character   1st Qu.:59.00   1st Qu.:58.00   1st Qu.:49.25  
 Mode  :character   Median :70.00   Median :74.00   Median :65.00  
                    Mean   :67.18   Mean   :69.33   Mean   :61.16  
                    3rd Qu.:78.00   3rd Qu.:85.00   3rd Qu.:74.75  
                    Max.   :96.00   Max.   :99.00   Max.   :96.00

The data set contains the year-end examination grades of a cohort of Primary 3 students from a local school.
There are seven attributes in total: four are categorical and the other three are continuous.
- The categorical attributes are: ID, CLASS, GENDER and RACE.
- The continuous attributes are: MATHS, ENGLISH and SCIENCE.
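
glimpse() reports the categorical attributes as character (<chr>) columns. If they are needed as factors, for example to control the order of categories in a plot, a conversion along the following lines is one option. This is a sketch on a small made-up data frame (toy_data and its values are illustrative, not rows from Exam_data.csv). ID is left as character because it is an identifier rather than a grouping variable.

```r
# Toy data with the same column names as exam_data (values are made up).
toy_data <- data.frame(
  ID = c("S1", "S2", "S3"),
  CLASS = c("3A", "3B", "3A"),
  GENDER = c("Male", "Female", "Female"),
  RACE = c("Malay", "Chinese", "Malay"),
  MATHS = c(70, 82, 55),
  stringsAsFactors = FALSE
)

# Convert the grouping columns to factors; leave ID and scores unchanged.
group_cols <- c("CLASS", "GENDER", "RACE")
toy_data[group_cols] <- lapply(toy_data[group_cols], factor)

str(toy_data)
```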

Static Visualisation

Comparing Base R Histogram vs ggplot2

Base R histogram

hist(exam_data$MATHS)

ggplot2 histogram

ggplot(data = exam_data, aes(x=MATHS)) +
  geom_histogram( bins = 10,
                  boundary = 100,
                  color = "black",
                  fill = "grey") +
  ggtitle("Distribution of Maths Score")

Essential Elements in ggplot2

Geometric Objects: geom_bar

Plot a bar chart

ggplot(data = exam_data,
       aes(x=RACE)) +
  geom_bar()

Geometric Objects: geom_dotplot

The width of a dot corresponds to the bin width (or maximum width, depending on the binning algorithm), and dots are stacked, with each dot representing one observation.

ggplot(data = exam_data,
       aes(x=MATHS,
           fill = RACE)) +
  geom_dotplot(binwidth = 2.5,
               dotsize = 0.5) +
  scale_y_continuous(NULL,
                     breaks = NULL)

Geometric Objects: geom_histogram

geom_histogram() is used to create a simple histogram from the values in the MATHS field of exam_data:

- The bins argument was changed to 20 from the default value of 30.
- The color argument, which sets the outline colour, was set to black.
- The fill argument, which shades the bars, was set to light blue.

ggplot(data = exam_data,
       aes(x=MATHS)) + 
  geom_histogram(bins = 20,
                 color = "black",
                 fill = "light blue")
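
To see what the bins argument changes, the binned counts behind a histogram can be inspected with layer_data() of ggplot2. The sketch below uses simulated scores (toy_scores is a made-up stand-in for exam_data) and sets 20 and 30 bins explicitly so the two can be compared.

```r
library(ggplot2)

set.seed(1)
# Simulated scores standing in for exam_data$MATHS (not the real data).
toy_scores <- data.frame(MATHS = round(runif(200, min = 0, max = 100)))

p20 <- ggplot(toy_scores, aes(x = MATHS)) + geom_histogram(bins = 20)
p30 <- ggplot(toy_scores, aes(x = MATHS)) + geom_histogram(bins = 30)

# layer_data() returns one row per bin, including its count and limits.
bins20 <- layer_data(p20)
bins30 <- layer_data(p30)

nrow(bins20)        # number of bins drawn with bins = 20
nrow(bins30)        # number of bins drawn with bins = 30
sum(bins20$count)   # every observation falls in exactly one bin
```

Fewer bins give wider bars and a smoother outline; more bins show finer detail but also more noise.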

Modifying a geometric object by changing aes()

The interior colour of the histogram bars was changed by mapping the fill aesthetic to a sub-group variable (GENDER) inside aes().

ggplot(data = exam_data,
       aes(x=MATHS,
           fill = GENDER)) + 
  geom_histogram(bins = 20,
                 color = "grey30")

Geometric Objects: geom_density

geom_density() computes and plots a kernel density estimate, which is a smoothed version of the histogram.

ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_density()
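
The same kind of estimate can be computed directly with density() from the stats package of base R. A minimal sketch on simulated scores (made up for illustration) also checks that the area under the curve is approximately 1, which is what distinguishes a density from a count histogram:

```r
set.seed(42)
# Simulated scores (illustrative only, not the exam data).
scores <- rnorm(300, mean = 70, sd = 15)

# Gaussian kernel density estimate evaluated on a regular grid.
kde <- density(scores)

# Approximate the area under the curve with the trapezoidal rule.
dx <- diff(kde$x)
area <- sum(dx * (head(kde$y, -1) + tail(kde$y, -1)) / 2)
area  # close to 1
```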

Two kernel density lines are plotted by mapping GENDER to the colour (or fill) aesthetic in aes().

ggplot(data=exam_data, 
       aes(x = MATHS, 
           colour = GENDER)) +
  geom_density()

Geometric Objects: geom_boxplot

Boxplots are drawn using geom_boxplot().

ggplot(data=exam_data, 
       aes(y = MATHS,
           x= GENDER)) +
  geom_boxplot()

Notches are used in box plots to help visually assess whether the medians of distributions differ. If the notches do not overlap, this is evidence that the medians are different.

ggplot(data=exam_data, 
       aes(y = MATHS, 
           x= GENDER)) +
  geom_boxplot(notch=TRUE)
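
According to the ggplot2 documentation for geom_boxplot(), the notches extend 1.58 * IQR / sqrt(n) on either side of the median. The sketch below reproduces that calculation in base R on made-up score vectors. notch_limits() and notches_overlap() are hypothetical helper names introduced here for illustration.

```r
# Hypothetical helper: lower and upper notch limits around the median.
notch_limits <- function(x) {
  x <- x[!is.na(x)]
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width, upper = median(x) + half_width)
}

# Hypothetical helper: TRUE when the two notch intervals intersect.
notches_overlap <- function(a, b) {
  lim_a <- notch_limits(a)
  lim_b <- notch_limits(b)
  unname(lim_a["lower"] <= lim_b["upper"] && lim_b["lower"] <= lim_a["upper"])
}

# Made-up scores for three groups.
group_a <- c(50, 55, 60, 65, 70, 75, 80)
group_b <- c(52, 58, 61, 66, 69, 77, 83)
group_c <- c(90, 91, 92, 93, 94, 95, 96)

notch_limits(group_a)
notches_overlap(group_a, group_b)  # TRUE: medians not clearly different
notches_overlap(group_a, group_c)  # FALSE: medians likely differ
```

The notch comparison is a visual rule of thumb rather than a formal test, so it is best read as a prompt for further analysis.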

Combined geom Objects

Overlay the individual data points on the boxplots by combining geom_boxplot() and geom_point().

ggplot(data = exam_data,
       aes(y = MATHS,
           x = GENDER)) + 
  geom_boxplot() +
  geom_point(position = "jitter",
             size = 0.5)

Interactive Data Visualisation with R - ggiraph methods

Interactive dotplot

Tooltip effect with tooltip aesthetic

Interactivity: hovering over a dot displays the student’s ID.

p <- ggplot(data = exam_data,
            aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(tooltip = ID),
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL,
                     breaks = NULL)
girafe(
    ggobj = p,
    width_svg = 6,
    height_svg = 6*0.618
  )

Hover effect with data_id aesthetic

Interactivity: elements associated with a data_id (i.e. CLASS) are highlighted on mouse-over.

p <- ggplot(data = exam_data,
            aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(data_id = CLASS,
        tooltip = CLASS),
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL,
                     breaks = NULL)
girafe(
    ggobj = p,
    width_svg = 6,
    height_svg = 6*0.618
  )

Styling hover effect

CSS code is used to change the highlighting effect.

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(data_id = CLASS),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +               
  scale_y_continuous(NULL,               
                     breaks = NULL)
girafe(                                  
  ggobj = p,                             
  width_svg = 6,                         
  height_svg = 6*0.618,
  options = list(
    opts_hover(css = "fill: #202020;"),
    opts_hover_inv(css = "opacity:0.2;")
  ))

Click effect with onclick

exam_data$onclick <- sprintf("window.open(\"%s%s\")",
                             "https://www.moe.gov.sg/schoolfinder?journey=Primary%20school",
                             as.character(exam_data$ID))
p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(onclick = onclick),
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +               
  scale_y_continuous(NULL,               
                     breaks = NULL)
girafe(                                  
  ggobj = p,                             
  width_svg = 6,                         
  height_svg = 6*0.618)
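
It can help to print the JavaScript string that sprintf() builds before attaching it to the plot. In the sketch below, base_url and ids are illustrative placeholders (not the URL or IDs from the data). It shows that the two %s slots are filled by plain concatenation, so each ID is appended directly to the end of the URL with no separator.

```r
# Placeholder values for illustration only.
base_url <- "https://example.org/students/"
ids <- c("Student321", "Student305")

# One JavaScript window.open() call per ID.
onclick <- sprintf("window.open(\"%s%s\")", base_url, ids)
cat(onclick, sep = "\n")
# window.open("https://example.org/students/Student321")
# window.open("https://example.org/students/Student305")
```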

Interactive Data Visualisation with R - plotly methods

Interactive scatter plot

plot_ly(data = exam_data, 
             x = ~MATHS, 
             y = ~ENGLISH)

Visual Variable

The color argument is mapped to a qualitative variable (i.e. RACE).

plot_ly(data = exam_data, 
        x = ~ENGLISH, 
        y = ~MATHS, 
        color = ~RACE)

Changing colour palette

The colors argument is used to change the default colour palette to a ColorBrewer colour palette.

plot_ly(data = exam_data, 
        x = ~ENGLISH, 
        y = ~MATHS, 
        color = ~RACE, 
        colors = "Set1")

Customising colour scheme

pal <- c("red", "purple", "blue", "green")
plot_ly(data = exam_data, 
        x = ~ENGLISH, 
        y = ~MATHS, 
        color = ~RACE, 
        colors = pal)

Customising tooltip

The text argument is used to change the default tooltip.

plot_ly(data = exam_data, 
        x = ~ENGLISH, 
        y = ~MATHS,
        text = ~paste("Student ID:", ID,
                      "<br>Class:", CLASS),
        color = ~RACE, 
        colors = "Set1")
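
The tooltip text is an ordinary character vector built by paste(), with <br> acting as an HTML line break inside the plotly tooltip. A quick sketch with made-up values shows the string that is passed to the text argument:

```r
# Made-up values standing in for one row of exam_data.
id <- "Student001"
class_name <- "3A"

tooltip <- paste("Student ID:", id, "<br>Class:", class_name)
tooltip
# [1] "Student ID: Student001 <br>Class: 3A"
```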

Working with layout

The layout() function is used to add a chart title and to set the ranges of the x and y axes.

plot_ly(data = exam_data, 
        x = ~ENGLISH, 
        y = ~MATHS,
        text = ~paste("Student ID:", ID,     
                      "<br>Class:", CLASS),  
        color = ~RACE, 
        colors = "Set1") %>%
  layout(title = 'English Score versus Maths Score',
         xaxis = list(range = c(0, 100)),
         yaxis = list(range = c(0, 100)))

Interactive Data Visualisation with R - ggplotly methods

Interactive scatter plot

The only extra line you need to include in the code chunk is ggplotly().

p <- ggplot(data=exam_data, 
            aes(x = MATHS,
                y = ENGLISH)) +
  geom_point(size = 1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))
ggplotly(p)

Coordinated Multiple Views with plotly

Two scatterplots are created and placed side by side using subplot() of the plotly package.

p1 <- ggplot(data=exam_data, 
              aes(x = MATHS,
                  y = ENGLISH)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))
p2 <- ggplot(data=exam_data, 
            aes(x = MATHS,
                y = SCIENCE)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))
subplot(ggplotly(p1),
        ggplotly(p2))

Coordinated Multiple Views with plotly

To create coordinated scatterplots, highlight_key() of the plotly package is used.

d <- highlight_key(exam_data)
p1 <- ggplot(data=d, 
            aes(x = MATHS,
                y = ENGLISH)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))
p2 <- ggplot(data=d, 
            aes(x = MATHS,
                y = SCIENCE)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))
subplot(ggplotly(p1),
        ggplotly(p2))

Click on a data point in one of the scatterplots and see how the corresponding point in the other scatterplot is selected.

Thing to learn from the code chunk:

highlight_key() simply creates an object of class crosstalk::SharedData.
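
The shared object can be inspected directly to confirm this. The sketch below uses a made-up data frame (toy_data is illustrative, not the exam data) and passes ID as the key, on the assumption that a unique identifier is a sensible way to link rows across views.

```r
library(plotly)

# Made-up data frame for illustration.
toy_data <- data.frame(
  ID = c("S1", "S2", "S3"),
  MATHS = c(70, 82, 55),
  ENGLISH = c(65, 90, 60)
)

# Use ID as the key that links selections across views.
shared <- highlight_key(toy_data, ~ID)

class(shared)             # includes "SharedData"
shared$key()              # the key values used for linking
head(shared$origData())   # the underlying data frame
```

Any plot or widget built from the same shared object uses these key values to decide which marks correspond to one another.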

Interactive Data Table: DT package

DT is a wrapper of the JavaScript library DataTables.
Data objects in R can be rendered as HTML tables using DataTables (typically via R Markdown or Shiny).

DT::datatable(exam_data)

Linked brushing: crosstalk method

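A scatterplot and an interactive data table are placed side by side using bscols() of the crosstalk package. Both are built from the same shared data object, so a selection made in the scatterplot is reflected in the table.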

d <- highlight_key(exam_data)
p <- ggplot(d, aes(ENGLISH, MATHS)) + 
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))
gg <- highlight(ggplotly(p), 
                "plotly_selected")
crosstalk::bscols(gg, DT::datatable(d), widths = 5)

Things to learn from the code chunk:

- highlight() is a function of the plotly package. It sets a variety of options for brushing (i.e. highlighting) multiple plots. These options are primarily designed for linking multiple plotly graphs, and may not behave as expected when linking plotly to another htmlwidget package via crosstalk. In some cases, other htmlwidgets will respect these options, such as persistent selection in leaflet.
- bscols() is a helper function of the crosstalk package. It makes it easy to put HTML elements side by side. It can be called directly from the console but is especially designed to work in an R Markdown document. Warning: this will bring in all of Bootstrap!

Reference

Lesson 7: Programming Data Visualisation in R In-Class Exercise
