---
title: Creating swimmer plots with ease 
author: Jessica Weiss
output:
  html_document:
    mathjax:  default
    fig_caption:  true
    toc: true
    section_numbering: true
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Creating swimmers plots with ease}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(swimplot)
library(ggplot2)
```

# Introduction to swimmers plots

A swimmer plot is a graphical tool used to display individual trajectories over time.

A swimmer plot is able to tell a full story using horizontal bars to represent each subject (or study unit), while lines, points, and arrows are utilized to display additional information.

The "swimmer" package has a variety of functions which add layers to a swimmer plot by implementing ggplot functions. 

This vignette goes through examples to create swimmers plots, and demonstrates converting a dataframe to the required format.

# A working example 

## The Data and research question 

- This is a clinical trial of 36 patients in which patients are randomized to one of two treatment arms, at 5 months patients are intended to switch arms, for each patient the adverse events and response information is recorded. The data is stored in three dataframes, `ClinicalTrial.Arm`, `ClinicalTrial.AE`, and `ClinicalTrial.Response`

\tiny
```{r, echo = TRUE,fig.align='centre'}
knitr::kable(head(ClinicalTrial.Arm,10))
knitr::kable(head(ClinicalTrial.AE,10))
knitr::kable(head(ClinicalTrial.Response,10))
```


##  Basic plot
- The `swimmer_plot()` function creates the base of the swimmer plot
- The required arguments are a dataframe,  an id column name, and the column name of where the bars end
- By default the bars are in increasing order, but any order can be specified
- A column name for the fill, transparency and colour (outline of the bars) can also be included
- Individual bars can change colour/transparency over time
- Other aesthetics can be manipulated using `geom_bar()` arguments (eg. fill,width, alpha)  




```{r, echo = TRUE,fig.align='centre'}
swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',fill='lightblue',width=.85)
```

##  Modifying the order and colours of the bars

The `swimmer_plot()` function includes the option to have the bars change colours. Each section of the bars should be in a different row, where each row includes the time that section ends. By default the bars are plotted in increasing order, a column name can be used in the argument id_order to have the bars sorted first by a column, or string of IDs can be specified to have the bars in a specific  order. Here the bars are ordered by the starting treatment, and follow up time. 
\tiny

```{r, echo = TRUE,fig.align='centre'}
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',name_fill='Arm',
                         id_order='Arm',col="black",alpha=0.75,width=.8)

arm_plot
```

## Stratification 

Plots can be stratified by any variables in the dataframe

```{r, echo = TRUE,fig.align='centre'}
swim_plot_stratify <-swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',name_fill='Arm',
col="black",alpha=0.75,width=.8,base_size = 14,stratify= c('Age','Sex'))

swim_plot_stratify
```


## Adding points 
- Points are added with the `swimmer_points()` function
- The required arguments are a dataframe,  an id column name, and the column name of the point's location 
- The shape, size, fill, stroke, and transparency can all be mapped to columns 
- The argument adj.y can be used to adjust the height position of points withing a bar
- Other aesthetics can be manipulated using `geom_point()` arguments  

```{r, echo = TRUE,fig.align='centre'}
AE_plot <- arm_plot + swimmer_points(df_points=
 ClinicalTrial.AE,id='id',time='time',name_shape =
 'event',size=2.5,fill='white',col='black')
AE_plot
```

Multiple aesthetics can be mapped to different columns 
```{r, echo = TRUE,fig.align='centre'}
arm_plot + swimmer_points(df_points=
 ClinicalTrial.AE,id='id',time='time',name_shape =
 'event',size=2.5,fill='white',name_col = 'Related')

```


## Adding lines 
- Lines are added with the `swimmer_lines()` function
- The required arguments are a dataframe,  an id column name, and the column names of the line's start and end locations
- The linetype, colour, size, and transparency can all be mapped to columns 
- The argument adj.y can be used to adjust the height position of lines withing a bar
- Other aesthetics can be manipulated using `geom_segment()` arguments   

```{r, echo = TRUE,fig.align='centre'}
Response_plot <- arm_plot +
swimmer_lines(df_lines=ClinicalTrial.Response,id='id',start =
'Response_start',end='Response_end',name_col='Response',size=1)

Response_plot
```

## Adding lines and points together
- The function `swimmer_points_from_lines()` adds points to a plot at the start and end of each line
- The required arguments are the same as `swimmer_lines()`
- An additional argument "cont" can be used to specify lines which do not end
- Other aesthetics can be manipulated using `geom_point()` arguments  

```{r, echo = TRUE,fig.align='centre'}
Response_plot_with_points <- Response_plot+
swimmer_points_from_lines(df_lines=ClinicalTrial.Response,id='id',start =
'Response_start',end = 'Response_end', cont =
'Continued_response',name_col='Response',size=2)

Response_plot_with_points
```

## Adding arrows
- Arrows are added at the end of bars with the `swimmer_arrows()` function
- The required arguments are a dataframe,  an id column name, and the column names of the location the arrows begin
- An additional argument "cont" can be used if only some bars will have an arrow 
- The colour can be mapped to a column 
- Features of the arrows which can be modified include the size, length and type of arrow
- Other aesthetics can be manipulated using `geom_segment()` arguments

The example below uses arrows to demonstrate patients remaining on treatment after the end of follow up

```{r, echo = TRUE,fig.align='centre'}
AE_plot+
swimmer_arrows(df_arrows=ClinicalTrial.Arm,id='id',arrow_start='End_trt',
cont = 'Continued_treatment',name_col='Arm',type =
 "open",cex=1)
```


Since none of the patients continue on "Off treatment" the arrow colours do not match the bars, this can be fixed by adding the layer `scale_color_discrete(drop=FALSE)`, the option show.legend = FALSE can also be employed as the arrow legend is not necessary

```{r, echo = TRUE,fig.align='centre'}
AE_plot <- AE_plot+
swimmer_arrows(df_arrows=ClinicalTrial.Arm,id='id',arrow_start='End_trt',
cont = 'Continued_treatment',name_col='Arm',show.legend = FALSE,type =
 "open",cex=1) + scale_color_discrete(drop=FALSE)

AE_plot
```

Another arrow example, here the arrows are also used to demonstrate a continued treatment
```{r, echo = TRUE,fig.align='centre'}
Response_plot_with_points <- Response_plot_with_points+
 swimmer_arrows(df_arrows=ClinicalTrial.Response,id='id',arrow_start='Response_end',
 cont = 'Continued_response',name_col='Response',show.legend = FALSE,type =
 "open",cex=1)

Response_plot_with_points
```


## Making the plots more aesthetically pleasing with ggplot manipulations

### Modifying Colours and shapes
\tiny
```{r col1 , echo=T,warnings=FALSE,message=FALSE}
AE_plot <-  AE_plot +
  scale_fill_manual(name="Treatment",values=c("Arm A" = "#e41a1c", "Arm B"="#377eb8","Off Treatment"='#4daf4a'))+
  scale_color_manual(name="Treatment",values=c("Arm A"="#e41a1c", "Arm B" ="#377eb8","Off Treatment"='#4daf4a')) +
  scale_shape_manual(name="Adverse event",values=c(AE=21,SAE=24,Death=17),breaks=c('AE','SAE','Death'))

AE_plot
``` 

\tiny
```{r col2 , echo=T,warnings=F,warnings=FALSE,warnings=FALSE,message=FALSE}
Response_plot_with_points <- Response_plot_with_points +
  scale_fill_manual(name="Treatment",values=c("Arm A" ="#e41a1c", "Arm B"="#377eb8","Off Treatment"='#4daf4a'))+
  scale_color_manual(name="Response",values=c("grey20","grey80"))+
  scale_shape_manual(name='',values=c(17,15),breaks=c('Response_start','Response_end'),
                     labels=c('Response start','Response end'))

Response_plot_with_points
``` 


### Legends
\tiny

Sometimes there will be points within the fill of the legend, this can be turned off with the layer `guides()`
```{r legend2 , echo=T,warnings=F,message=F,warning=FALSE}

Response_plot_with_points <- Response_plot_with_points+guides(fill = guide_legend(override.aes = list(shape = NA)))
Response_plot_with_points

``` 

### Add arrows to the legend 
\tiny
A work around to add arrows to the legend is using the symbol for an arrow within `annotate()`
```{r legend3 ,echo=T, eval=T,warnings=F,message=F,warning=FALSE}

Response_plot_with_points <- Response_plot_with_points+
  annotate("text", x=3.5, y=20.45, label="Continued response",size=3.25)+
  annotate("text",x=2.5, y=20.25, label=sprintf('\u2192'),size=8.25)+
  coord_flip(clip = 'off', ylim = c(0, 17))
Response_plot_with_points

``` 

### axis 

\tiny
The swimmer plot is a bar plot that has been turned on its side, so to modify the x axis it is actually required to change the y axis. This is also the case for adding axis labels 
```{r axis2 , echo=T,warnings=FALSE,messgages=FALSE}

Response_plot_with_points +  scale_y_continuous(name = "Time since enrollment (months)",breaks = seq(0,18,by=3))

``` 

### Formatting the legend when an aesthetic is mapped in multiple layers 

Sometimes multiple layers of the swimmers plot will include the same aesthetic the plot below uses "fill" with both the points and with the bars. Using guides, and override.aes the legends can be manipulated to divide the layers in the legend

```{r Legend with multiple , echo=T,warnings=FALSE,messgages=FALSE}

#Overriding legends to have colours for the events and no points in the lines
p1 <- arm_plot + swimmer_points(df_points=ClinicalTrial.AE,id='id',time='time',name_shape =
                                       'event',size=2.5,col='black',name_fill = 'event') +
  scale_shape_manual(values=c(21,22,23),breaks=c('AE','SAE','Death'))
  

p1 +scale_fill_manual(name="Treatment",values=c("AE"='grey90',"SAE" ="grey40","Death" =1,"Arm A"="#e41a1c", "Arm B" ="#377eb8","Off Treatment"="#4daf4a"))
```


This plot legend is difficult to follow 

However, by removing the AE fills from the legend, and adding them to the points it is much easier to follow the plot
```{r Legend with multiple2 , echo=T,warnings=FALSE,messgages=FALSE}
#First step is to correct the fill legend 

p2 <- p1 + scale_fill_manual(name="Treatment",values=c("AE"='grey90',"SAE" ="grey40","Death" =1,"Arm A"="#e41a1c", "Arm B" ="#377eb8","Off Treatment"="#4daf4a"),breaks = c("Arm A","Arm B","Off Treatment"))
p2
##Then use guides to add the colours to the 

#Setting the colours of the filled points to match the AE type 
p2 + guides(shape = guide_legend(override.aes = list(fill=c('grey90','grey40',1))),fill = guide_legend(override.aes = list(shape = NA))) 

``` 

# When there gaps between sections in a single bar


There may be situations where you want to include gaps between sections of colours in a single bar, or have bars that do not start  at time zero.
```{r, echo=T,warnings=FALSE,messgages=FALSE}

Gap_data <- data.frame(patient_ID=c('ID:3','ID:1','ID:1','ID:1','ID:2',
                                    'ID:2','ID:2','ID:3','ID:3'),
                       start=c(10,1,2,7,2,10,14,5,0),
                       end=c(20,2,4,10,7,14,22,7,3),
                       treatment=c("A","B","C","A","A","C","A","B","C"))

knitr::kable(Gap_data)

```

When a start and end are specified any spaces in between are filled in with a section of "NA"
```{r, echo=T,warnings=FALSE,messgages=FALSE}

swimmer_plot(df=Gap_data,id='patient_ID',name_fill="treatment",col=1,
id_order = c('ID:1','ID:2','ID:3')) +theme_bw()
```

Additional "NA" information can be added to the end of a bar when the colour variables is NA
```{r, echo=T,warnings=FALSE,messgages=FALSE}

Gap_data <- rbind(Gap_data,data.frame(patient_ID='ID:2',start=22,end=26,treatment=NA))
knitr::kable(Gap_data)
```

scale_fill_manual can be used to have the NA sections filled in transparently with the argument na.value=NA 

```{r, echo=T,warnings=FALSE,messgages=FALSE}
swimmer_plot(df=Gap_data,id='patient_ID',name_fill="treatment",col=1,
id_order = c('ID:1','ID:2','ID:3')) +
ggplot2::theme_bw()+ggplot2::scale_fill_manual(name="Treatment",
 values=c("A"="#e41a1c", "B"="#377eb8","C"="#4daf4a",na.value=NA),breaks=c("A","B","C"))+
  ggplot2::scale_y_continuous(breaks=c(0:26))
```


 

# Formatting data 

For all of the function to run, the data must be in the long format. This means that each event must be on a new row. An event would be a single point, a line segment, or an arrow. If a study unit has multiple events occur they must be recorded over multiple rows. Often times data is given in the long format (eg. One row per patient).

## Long data

Here is an example data.frame in the long format.  
```{r, echo=T,warnings=FALSE,messgages=FALSE}

wide_example <- structure(list(ID = c("ID:001", "ID:002", "ID:003"), Date.begin.Treatment = structure(c(14307, 
14126, 15312), class = "Date"), AE = structure(c(16133, 14491, 
NA), class = "Date"), SAE = structure(c(16316, NA, 16042), class = "Date"), 
    Death.date = structure(c(16499, NA, 17869), class = "Date"), 
    Response1 = c("SD", "SD", NA), Response1.Start = structure(c(14745, 
    14345, NA), class = "Date"), Response1.End = structure(c(15111, 
    14418, NA), class = "Date"), Response2 = c("CR", "PR", NA
    ), Response2.Start = structure(c(15768, 14674, NA), class = "Date"), 
    Response2.End = structure(c(16133, 14856, NA), class = "Date"), 
    Response3 = c(NA, "CR", NA), Response3.Start = structure(c(NA, 
    14856, NA), class = "Date"), Response3.End = structure(c(NA, 
    15587, NA), class = "Date"), Last.follow.up = structure(c(16499, 
    17048, 17869), class = "Date")), class = "data.frame", row.names = c(NA, 
-3L))
```

\tiny
```{r, echo=F,warnings=FALSE,messgages=FALSE, fig.width = 8, fig.height = 4.5}
knitr::kable(wide_example)
```

All of the dates need to be converted to time. For each patient the Date.begin.Treatment is the starting point (Time 0)
```{r, echo=TRUE,warnings=FALSE,messgages=FALSE, fig.width = 8, fig.height = 4.5}
date_cols <- c("Date.begin.Treatment","AE","SAE",'Death.date','Response1.Start', 'Response1.End','Response2.Start', 'Response2.End',
               'Response3.Start' ,'Response3.End' ,'Last.follow.up') # Getting the columns with dates
wide_example[date_cols] <- lapply(wide_example[date_cols], as.numeric) # Converting to numbers 
wide_example[date_cols] <- round((wide_example[date_cols]-wide_example$Date.begin.Treatment)/365.25,1) #Calcuating the time in years since the start of treatment
knitr::kable(wide_example)
```

The wide data can be used to create the bars of the swimmer plot
```{r,echo=T,warnings=FALSE,messgages=FALSE}
plot <- swimmer_plot(df=wide_example,id='ID',end='Last.follow.up',col='black',fill='grey')
plot
```

# When there is one column per event type

The `gather_()` function from the `tidyr` package can be used to change data from the wide to long format. When each event type has its only column with the exact time, the function only needs to be run once
```{r,echo=T,warning=FALSE,messgage=FALSE, fig.width = 8, fig.height = 4.5}
library(tidyr)
data_time_points <- wide_example[,c('ID','AE','SAE','Death.date')]
points_long <- gather_(data=data_time_points,"point", "time", 
                       gather_cols=c('AE','SAE','Death.date'),na.rm=T)
knitr::kable(points_long,align='c',row.names = F)
```

The points can now be added to the plot
```{r,echo=T,warning=FALSE,messgage=FALSE, fig.width = 8, fig.height = 4.5}
plot+ swimmer_points(df=points_long,id='ID',name_shape = 'point',size=8)
```

# When there are multiple column per event type

When there are separate columns for the data, and event type it is more complex. In this data the response start, end, and response types are all stored in different columns, but must be kept together per patient and event. 
```{r,warning=FALSE,messgage=FALSE,echo=T, fig.width = 8, fig.height = 4.5}
long_start <- gather_(data=wide_example[,c('ID','Response1.Start','Response2.Start','Response3.Start')],
                      "response_number", "start_time", gather_cols=c('Response1.Start','Response2.Start',
                                                                'Response3.Start'),na.rm=T)

long_start$response_number <- substring(long_start$response_number,1,9) # Will be used to match to the end and types
```

```{r,warning=FALSE,messgage=FALSE,echo=FALSE, fig.width = 8, fig.height = 4.5}
knitr::kable(long_start,align='c',row.names = F)
```

Separate dataframes are created for the end time, and response, then they are all merged together by the id, and response_number
```{r fig.height=4.5,echo=TRUE, fig.width=8, message=FALSE, warning=FALSE}
long_end <- gather_(data=wide_example[,c('ID','Response1.End','Response2.End','Response3.End')],
                    "response_number", "end_time", gather_cols=c('Response1.End','Response2.End',
                                                            'Response3.End'),na.rm=T)
long_end$response_number <- substring(long_end$response_number,1,9)

long_response <- gather_(data=wide_example[,c('ID','Response1','Response2','Response3')],
                         "response_number", "Response", gather_cols=c('Response1','Response2','Response3'),
                         na.rm=T)

long_response_full <- Reduce(function(...) merge(..., all=TRUE,by=c('ID','response_number')), 
                            list(long_start, long_end, long_response))
```

```{r fig.height=4.5,echo=F, fig.width=8, message=FALSE, warning=FALSE}
knitr::kable(long_response_full,align='c',row.names = F)
```


The lines can then be added to the plot
```{r,echo=TRUE,warnings=FALSE,messgages=FALSE, fig.width = 8, fig.height = 4.5}
plot+ 
  swimmer_points(df=points_long,id='ID',name_shape = 'point',size=8)+
  swimmer_lines(df_lines = long_response_full,id='ID',start = 'start_time',end='end_time',
                name_col='Response',size=25)
```