Overwatch is a team-based multiplayer first-person shooter with a player base of around 10 million players per month. Each match sees a uniquely chosen set of heroes go head to head in a 6 vs 6 battle. Players can choose from a selection of 32 heroes which are divided among 3 main classes: tank, damage and support. Each team is allowed two heroes from each class.
The composition of heroes on a team changes from game to game based on many variables such as personal preference, opposition hero picks, synergy with other heroes and the skill level of players. Different heroes require different skillsets, with some skillsets taking longer to develop than others.
Overwatch is played competitively with a sophisticated skill rating system with 7 divisions ranging from Bronze (lowest) to Grandmaster (highest). Skill rating (SR) has a broad range from 0 to 5000 and determines a player’s division placement. The formula for determining SR is mostly kept secret but generally players lose or gain SR based on whether they win or lose a game. Due to the sophistication of the SR system an individuals SR, and therefore division, are an accurate representation of their ability in the game.
As previously mentioned, how often specific heroes are picked partly depends on the skill level of players. The current study shall set out to use visualisations to explore the relationship between hero pick rate and skill level (division). The visualisations should be able to identify which specific heroes are picked more in lower ranks than higher ranks and vice versa.
The dataset used in this study was manually scraped from Overbuff.com. It includes data from the 3 months before it was scraped (March 24th) on competitive Overwatch played on PC. The dataset includes all 32 heroes and their average pick rate, win rate, tie rate and on fire rates in each of the 7 divisions. Average pick rates in each of the divisions are the only columns of interest for the present study, meaning the data will need cleaning. In this dataset, pick rate refers to the number of times a hero is picked as a percentage of all the heroes. This means that the sum of the percentage pick rate of the 32 heroes in one division will total 100%.
The way the data is presented by Overbuff makes it difficult to compare hero pick rates across divisions. Each division has to be clicked and separately viewed. This would not be difficult to compare across divisions if there were only 5 heroes. However, as there are 32 heroes it makes it very difficult to see trends of heroes picked more often in different divisions.
#Load the data set
df = read.csv("data/raw/hero_stats_3_months.csv")
#Check the data is as expected
head(df, 3)
## hero bronze_pick_rate bronze_win_rate bronze_tie_rate bronze_on_fire
## 1 Ana 0.0750 0.4706 0.0203 0.0766
## 2 Ashe 0.0243 0.4718 0.0167 0.0612
## 3 Baptiste 0.0322 0.4643 0.0234 0.0977
## silver_pick_rate silver_win_rate silver_tie_rate silver_on_fire
## 1 0.0948 0.4911 0.0203 0.0852
## 2 0.0254 0.4947 0.0180 0.0636
## 3 0.0352 0.4776 0.0223 0.1051
## gold_pick_rate gold_win_rate gold_tie_rate gold_on_fire platinum_pick_rate
## 1 0.1171 0.5022 0.0205 0.0951 0.1296
## 2 0.0256 0.5063 0.0181 0.0692 0.0252
## 3 0.0354 0.4837 0.0229 0.1158 0.0360
## platinum_win_rate platinum_tie_rate platinum_on_fire diamond_pick_rate
## 1 0.5151 0.0204 0.1025 0.1263
## 2 0.5220 0.0194 0.0754 0.0238
## 3 0.4981 0.0232 0.1276 0.0370
## diamond_win_rate diamond_tie_rate diamond_on_fire master_pick_rate
## 1 0.5303 0.0197 0.1004 0.0984
## 2 0.5289 0.0189 0.0758 0.0225
## 3 0.5096 0.0217 0.1309 0.0496
## master_win_rate master_tie_rate master_on_fire grandmaster_pick_rate
## 1 0.5285 0.0193 0.0895 0.0419
## 2 0.5192 0.0207 0.0750 0.0286
## 3 0.5088 0.0207 0.1371 0.0565
## grandmaster_win_rate grandmaster_tie_rate grandmaster_on_fire
## 1 0.5465 0.0182 0.0894
## 2 0.5458 0.0192 0.0817
## 3 0.5438 0.0201 0.1722
The table below highlights the variables that shall be used in the following visualisations and provides an explanation of what each variable is. The pick rates for the divisions are given in a decimal form and represent the amount a hero is picked as a percentage. For example, if a heroes bronze_pick_rate value is 0.0750 this means that hero was picked 7.5% of the time in the Bronze division.
Variable Title | Explanation |
---|---|
hero | The name of the hero |
bronze_pick_rate | The percentage a hero is picked in the Bronze division |
silver_pick_rate | The percentage a hero is picked in the Silver division |
gold_pick_rate | The percentage a hero is picked in the Gold division |
platinum_pick_rate | The percentage a hero is picked in the Platinum division |
master_pick_rate | The percentage a hero is picked in the Master division |
grandmaster_pick_rate | The percentage a hero is picked in the Grandmaster division |
Visualisation 1 sets out to use all the data available for a more accurate image of how hero pick rate varies across the 7 divisions. To do this a stacked bar chart will be used.
A stacked bar chart will show a stand out trend of which heroes are picked more across all divisions. A more detailed study of the visualisation will show the differences in pick rates of each hero across the 7 divisions. Using plotly to create the visualisation (instead of ggplot) allows the use of the hover label feature. This feature is especially useful with stacked bar charts as many bars do not have a 0 starting point making it hard to determine the value that the bar represents.
The dataset for visualisation 1 was ordered by total pick rate. Normally this step would not be needed as plotly has a reorder function that can change the order of the x axis. However, adding traces in a loop overrides this function meaning this step was necessary to be performed in the data cleaning process.
#Transform original dataset to include only the heroes and pick rates for each division
dfp1 <- df %>% select(1, 2, 6, 10, 14, 18, 22, 26)
#Create a variable that is the sum of pick rates across all divisions so that visualisaion 1 can be ordered as desired
dfp1$totalpick <- rowSums(dfp1[,c(2:8)])
#Transform dfp1 to arrange heroes by total pick rate to create the desired order in the visualisation
dfp1 <- dfp1 %>% arrange(desc(totalpick))
#Check the data is as expected
head(dfp1, 5)
## hero bronze_pick_rate silver_pick_rate gold_pick_rate platinum_pick_rate
## 1 Ana 0.0750 0.0948 0.1171 0.1296
## 2 Reinhardt 0.0880 0.1027 0.1100 0.1164
## 3 Zarya 0.0549 0.0729 0.0862 0.0957
## 4 Mercy 0.0958 0.0937 0.0910 0.0782
## 5 McCree 0.0471 0.0562 0.0646 0.0709
## diamond_pick_rate master_pick_rate grandmaster_pick_rate totalpick
## 1 0.1263 0.0984 0.0419 0.6831
## 2 0.1032 0.0778 0.0620 0.6601
## 3 0.0962 0.0852 0.0829 0.5740
## 4 0.0673 0.0699 0.0649 0.5608
## 5 0.0765 0.0711 0.0610 0.4474
#Save processed data
write.csv(dfp1, file = ("data/processed/dfp1.csv"))
#Set the plot space and add the first division
p1 <- plot_ly(dfp1, x = ~hero,
y = ~bronze_pick_rate,
name = "Bronze Pick Rate",
type = "bar",
width = 900,
height = 600,
#Set the colour of the first division and create a black line around each bar
marker = list(color = "#E0B5AC",
line = list(color = "#000000, 0.5)",
width = 1))) %>%
#Set the layout specifications of the visualisation
layout(title = "Hero Pick Rate by Division",
#Set the order of the heroes to be determined by the order they appear in the dataset
xaxis = list(categoryarray = ~hero,
categoryorder = "array",
#Set the x axis title and the angle and size of the labels for each hero
title = "Hero",
tickangle =270),
tickfont = list(size = 10),
#Set the y axis title, scale format, range and type of bar chart
yaxis = list(title = "Pick Rate",
tickformat = "%",
range = c(0, 0.70)),
barmode = "stack")
#Create a value containing the 6 divisions not yet plotted for use in a for loop
divplot <- colnames(dfp1[c(3:8)])
#Create a value for use in the add trace loop to set the colours for each division (colours match the exact hex codes that Overbuff assigned to each division)
colours <- c("#CAD6CC", "#EFEB8B", "#4E98BB", "#D9E8F3", "#EDA152", "#FEE072")
#Create a loop to add a trace for each of the remaining divisions
for(i in divplot){
p1 <- p1 %>% add_trace(x = dfp1[["hero"]],
y = dfp1[[i]],
#Replace "_" with " " and capitalise the division titles
name = gsub(x = i, pattern = "\\_", replacement = " ") %>% str_to_title(i),
#Change the colours of the divisions per the "colours" variable previously created
marker = list(color = colours[[match(i, divplot)]]))
}
#Display p1 visualisation
p1
Visualisation 1 allows comparison of the pick rates across all divisions. However, a limitation of the design is that it is hard to see stand out trends due to the overwhelming amount of data that is being shown. In hindsight, the inclusion of all the data may have been more detrimental than useful in respect of the readability of the visualisation.
Upon inspection of visualisation 1 it seems that the relationship between skill level (division) and pick rate is mostly linear, meaning that no heroes were picked a lot in the middle divisions but barely picked in the highest and lowest divisions. As a result of this, it was thought appropriate to omit the data of the middle divisions and just compare the data from the highest and lowest divisions. This simplifies the visualisation but to further simplify it heroes were ranked by their pick rate in each division and directly compared. A bump chart perfectly facilitates this comparison so it was deemed the best type of chart to use.
It was necessary to create a new dataset as the data for this visualisation needed to be in a long format as opposed to a wide format in order to facilitate the ranking of heroes.
#Transform the original data set to only include the heroes and pick rates for the lowest and highest divisions
dfp2 <- df %>% select(1, 2, 26)
#Pivot the data from wide to long to facilitate the ranking of heroes
dfp2 <- gather(dfp2, division, pick_rate, bronze_pick_rate, grandmaster_pick_rate, factor_key = TRUE)
#Create a new variable that ranks heroes based on their pick rate in each division
dfp2 <- dfp2 %>%
group_by(division) %>%
arrange(division, desc(pick_rate), hero) %>%
mutate(rank = row_number()) %>%
ungroup()
#Check the data is as expected
head(dfp2, 5)
## # A tibble: 5 x 4
## hero division pick_rate rank
## <chr> <fct> <dbl> <int>
## 1 Mercy bronze_pick_rate 0.0958 1
## 2 Reinhardt bronze_pick_rate 0.088 2
## 3 Ana bronze_pick_rate 0.075 3
## 4 Moira bronze_pick_rate 0.0643 4
## 5 Zarya bronze_pick_rate 0.0549 5
#Save processed data
write.csv(dfp2, file = ("data/processed/dfp2.csv"))
#Set the plot space
p2 <- ggplot(dfp2, aes(x = division,
y = rank,
group = hero)) +
#Add a line for each hero and make it slightly transparent for better visability of crossing lines
geom_line(aes(color = hero,
alpha = 1),
size = 2) +
#Add a point for each hero to better clarify the start and end point of each line
geom_point(aes(color = hero,
alpha = 1),
size = 4) +
#Add a white dot in the middle of each point
geom_point(color = "#FFFFFF", size = 1) +
#Create geom_text to facilitate adding heroes names by their Bronze pick rate ranking
geom_text(data = dfp2 %>% filter(division == "bronze_pick_rate"),
#Add heroes names by their Bronze pick rate ranking
aes(label = hero, x = 0.8) , hjust = 0.5,
#Make the font bold, black and a specific size
fontface = "bold", color = "#000000", size = 4) +
#Create geom_text to facilitate adding heroes names by their Grandmaster pick rate ranking
geom_text(data = dfp2 %>% filter(division == "grandmaster_pick_rate"),
#Add heroes names by their Grandmaster pick rate ranking
aes(label = hero, x = 2.2) , hjust = 0.5,
#Make the font bold, black and a specific size
fontface = "bold", color = "#000000", size = 4) +
#Specify tick labels on the x axis
scale_x_discrete(labels = c("Bronze Pick Rate Rank", "Grandmaster Pick Rate Rank")) +
#Reverse the y axis scale to have the top heroes as the most picked and specify the scale to go from 1 to 32 in increments of 1
scale_y_reverse(breaks = 1:32) +
#Specify the size of the plot title and make it bold
theme(plot.title = element_text(size=16, face = "bold"),
#Specify the size of the tick labels on the x axis
axis.text.x = element_text(size=12),
#Specify the size of the axis titles and make them bold
axis.title = element_text(size=14,face = "bold"),
#Change the panelling on the plot to make it easy to compare across the visualisation
panel.grid.major.y = element_blank(),
#Remove the legend
legend.position = "none") +
#Specify the labels of the x and y axis
labs(x = "Division",
y = "Hero Rank",
#Specify the title and subtitle
title = "Overwatch Hero Pick Rate",
subtitle = "Heroes ranked by % pick rate in the lowest and highest divisions (Bronze and Grandmaster) over the last 3 months") +
#Specify the colours of the heroes lines to match the colour associated with each hero in Overwatch
scale_color_manual(values = c("#2178E9", "#67121D", "#3EA2CB", "#6E994D", "#F3B23A", "#FF7FD1", "#DDCCAC", "#04ADEF", "#84FE01", "#938848", "#D39308", "#8BEC22", "#C83C3C", "#9ADBF4", "#FFE16C", "#B3365D", "#BCC84A", "#2178E9", "#A49D86", "#AA958E", "#C19477", "#5DAB81", "#5D81EB", "#A07DC8", "#5CECFF", "#FF6200", "#F8911B", "#7878D4", "#8991A6", "#4D7883", "#F571A8", "#C79C00"))
#Display p2 visualisation
p2
Overall, visualisation 2 makes it far easier to spot trends of heroes that are picked significantly more or less in higher or lower divisions. The steeper the slope of the line, the bigger the difference in pick rate there is from the lowest division to the highest division. Identifying specific heroes is made easier by the choice of colour of the lines. Each hero’s line is represented by the colour that is associated with that specific hero.
Although visualisation 1 was deemed too busy to easily pick out trends it was a necessary step to take to arrive at the final visualisation (#2). Visualisation 1 demonstrated that too much data made the visualisation more difficult to interpret. Visualisation 1 also showed that the relationship between skill level (division) and pick rate was mostly linear meaning that the data for the middle divisions was not necessary to be included in the final visualisation.
The final visualisation is more than adequate for the purpose it was created, to identify trends in hero pick rate across skill level. Multiple heroes can easily be identified as being picked significantly more at higher skill levels. These heroes are identified by the steepest lines going up to the right and include heroes such as Zenyatta, Wrecking Ball, Tracer, Bridgette and Widowmaker. Multiple heroes can easily be identified as being picked significantly more at lower skill levels. These heroes can be identified by the steepest lines going down to the right and include heroes such as Moira, Junkrat, Soldier:76, Orisa, Genji and Pharah.
For a future visualisation it may be interesting to make use of another variable in the dataset which is win rate. Examining the win rate of heroes in relation to their pick rate may provide an interesting insight as to why certain heroes are picked more often. Do the heroes picked more often have higher win rates? If not then this would create a further interesting question as to why the heroes with higher win rates are not picked more often.
The repo for this project (including the markdown file, original data and image files) is available online via github: https://github.com/AlexMontgomery-git/PSY6422_Project