In keeping with the theme of my previous data projects (the ones where I was left to my own devices), we are going to look at the Pokémon dataset and uncover trends and biases!
# first as always let's bring in the libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(ggbeeswarm)
## Warning: package 'ggbeeswarm' was built under R version 4.4.3
library(ggridges)
## Warning: package 'ggridges' was built under R version 4.4.3
library(waffle)
## Warning: package 'waffle' was built under R version 4.4.3
library(tidytuesdayR)
## Warning: package 'tidytuesdayR' was built under R version 4.4.3
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(plot3D)
## Warning: package 'plot3D' was built under R version 4.4.3
library(plotly)
## Warning: package 'plotly' was built under R version 4.4.3
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
# and set the theme
theme_set(theme_bw(base_size = 17))
# now we will bring in the dataset itself which we will be getting from the tidy Tuesday website
# use the tidytuesday function
tuesdata <- tidytuesdayR::tt_load('2025-04-01')
## ---- Compiling #TidyTuesday Information for 2025-04-01 ----
## --- There is 1 file available ---
##
##
## ── Downloading files ───────────────────────────────────────────────────────────
##
## 1 of 1: "pokemon_df.csv"
# and this will pull out as a dataframe
pokemon_df <- tuesdata$pokemon_df
# shorten it for fun
pdf <- pokemon_df
# and let's take a lil peek
head(pdf)
## # A tibble: 6 × 22
## id pokemon species_id height weight base_experience type_1 type_2 hp
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
## 1 1 bulbasaur 1 0.7 6.9 64 grass poison 45
## 2 2 ivysaur 2 1 13 142 grass poison 60
## 3 3 venusaur 3 2 100 236 grass poison 80
## 4 4 charmander 4 0.6 8.5 62 fire <NA> 39
## 5 5 charmeleon 5 1.1 19 142 fire <NA> 58
## 6 6 charizard 6 1.7 90.5 240 fire flying 78
## # ℹ 13 more variables: attack <dbl>, defense <dbl>, special_attack <dbl>,
## # special_defense <dbl>, speed <dbl>, color_1 <chr>, color_2 <chr>,
## # color_f <chr>, egg_group_1 <chr>, egg_group_2 <chr>, url_icon <chr>,
## # generation_id <dbl>, url_image <chr>
# and find out the variables we have
colnames(pdf)
## [1] "id" "pokemon" "species_id" "height"
## [5] "weight" "base_experience" "type_1" "type_2"
## [9] "hp" "attack" "defense" "special_attack"
## [13] "special_defense" "speed" "color_1" "color_2"
## [17] "color_f" "egg_group_1" "egg_group_2" "url_icon"
## [21] "generation_id" "url_image"
From here on we’ll run through a couple of visualizations that will help us see the diversity of Pokémon and maybe even some interesting trends.
# I made a color palette set from the official type symbols. let's bring that in!
pokepal <- read.csv(file = "../pokemon_col_palette.csv",header = TRUE)
pokepal <- pokepal %>% arrange(Type)
col <- unique(pokepal$Colorhex)
# and a palette for the OG types
pokepal2 <- read.csv(file = "../pokemon_col_palette_OG.csv",header = TRUE)
pokepal2 <- pokepal2 %>% arrange(Type)
col2 <- unique(pokepal2$Colorhex)
This won’t count toward the figures, but let’s get a quick bar chart of how many Pokémon fall into each type.
# type_1
# geom_bar() counts the rows in each category, which is what we want for a discrete variable
ggplot(data = pdf)+aes(x=type_1,fill = type_1)+geom_bar()+scale_fill_manual(values = col)+theme(axis.text.x = element_blank())
# type_2
ggplot(data = pdf)+aes(x=type_2,fill = type_2)+geom_bar()+scale_fill_manual(values = col)+theme(axis.text.x = element_blank())
Interesting! There are some clear favorites as far as types go: water is the most common primary type, followed by normal. What is also interesting is that very few Pokémon have flying as their primary type, yet it is the most common secondary type!
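To put numbers behind that reading of the plots, the counts can be tabulated directly. This is only a minimal sketch: `toy` is a made-up stand-in for `pdf` (the rows and values are illustrative, not taken from the dataset), and `top_share` is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical stand-in for pdf; the real data has many more rows
toy <- tibble(
  pokemon = c("a", "b", "c", "d", "e", "f"),
  type_1  = c("water", "water", "normal", "water", "normal", "fire"),
  type_2  = c(NA, "flying", "flying", NA, NA, "flying")
)

# Count primary types, most common first
type_1_counts <- toy %>% count(type_1, sort = TRUE)

# Count secondary types, ignoring rows with no second type
type_2_counts <- toy %>% filter(!is.na(type_2)) %>% count(type_2, sort = TRUE)

# Share of all rows held by the most common primary type
top_share <- type_1_counts$n[1] / sum(type_1_counts$n)
```

Swapping `toy` for `pdf` would give the real tallies. A `top_share` well under 0.5 would mean the leading type is the most common one rather than a true majority.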
Alrighty let’s make some hot plots!
First let’s see how the density of types changes over generations!
# first, just to make things a little more even, we are only going to work with the OG types, which are
# Normal, Fire, Water, Electric, Grass, Ice, Fighting, Poison, Ground, Flying, Psychic, Bug, Rock, Ghost, Dragon
# and we are going to exclude flying since so few have it as their primary type
#make a list of the Original types
OG <- c("normal","fire","water","electric","grass","ice","fighting","poison","ground","psychic","bug","rock","ghost","dragon")
#and filter anything out that is not that type
pdf.f <- pdf %>% filter(type_1 %in% OG)
# And for the sake of this we'll only be using the type 1 data
ggplot(data = pdf.f)+aes(x=generation_id,y=type_1,fill = type_1)+ggridges::geom_density_ridges(alpha=.3)+ggridges::theme_ridges()+scale_fill_manual(values = col2)+scale_x_continuous(breaks=seq(1,7, 1))
## Picking joint bandwidth of 0.81
## Warning: Removed 131 rows containing non-finite outside the scale range
## (`stat_density_ridges()`).
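The "Removed 131 rows" warning most likely comes from rows where `generation_id` is missing, but that is an assumption worth checking rather than something the output confirms. A minimal sketch of such a check, run here on a made-up data frame `toy` standing in for `pdf.f`:

```r
# Hypothetical stand-in for pdf.f with made-up rows
toy <- data.frame(
  type_1 = c("fire", "water", "grass", "water"),
  generation_id = c(1, NA, 3, NA)
)

# Number of rows the density layer would have to drop
n_missing <- sum(!is.finite(toy$generation_id))

# Which primary types those dropped rows belong to
missing_by_type <- table(toy$type_1[!is.finite(toy$generation_id)])
```

If the same count on `pdf.f` matched the number in the warning, the ridges would be describing only the Pokémon that have a recorded generation.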
Cool stuff, now we swarm!!!
#let's check attack stats
ggplot(data = pdf.f)+aes(x=type_1,y=attack,color=type_1)+ggbeeswarm::geom_beeswarm(method = "center",size=2)+scale_color_manual(values = col2)
## Warning: In `position_beeswarm`, method `center` discretizes the data axis (a.k.a the
## continuous or non-grouped axis).
## This may result in changes to the position of the points along that axis,
## proportional to the value of `cex`.
## This warning is displayed once per session.
# and defense
ggplot(data = pdf.f)+aes(x=type_1,y=defense,color=type_1)+ggbeeswarm::geom_beeswarm(method = "center",size=2)+scale_color_manual(values = col2)
# and hp because why not
ggplot(data = pdf.f)+aes(x=type_1,y=hp,color=type_1)+ggbeeswarm::geom_beeswarm(method = "center",size=2)+scale_color_manual(values = col2)
So we see there is actually a pretty good spread across all the stats, and
it's relatively even for each of the types. But this makes me wonder: is
there a consistent sweet spot when it comes to attack and defense? Perhaps
a 2D density plot will have the answer!
ggplot(pdf, aes(x=attack, y=defense) ) +
  geom_hex(bins = 25) +
  scale_fill_continuous(type = "viridis") +
  theme_bw()
Perhaps as expected, there seems to be a roughly linear relationship
between the two: the majority of Pokémon are close to balanced,
i.e. defense and attack are close to each other. This might be because
the majority of Pokémon are first evolutions, and when they get stronger
(evolve) they become more specialized, so attack or defense might become
more dominant.
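One way to put a number on "roughly linear" is a correlation coefficient and a simple linear fit. The sketch below uses a made-up data frame `toy` in place of `pdf`, so the values are illustrative only; `balance` is a hypothetical helper column, not something in the dataset.

```r
# Hypothetical stand-in for pdf with made-up stats
toy <- data.frame(
  attack  = c(40, 55, 70, 85, 100, 120),
  defense = c(45, 50, 75, 80, 95, 110)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(toy$attack, toy$defense)

# Slope of a simple linear fit of defense on attack
fit <- lm(defense ~ attack, data = toy)
slope <- unname(coef(fit)["attack"])

# Ratio of attack to defense: values near 1 mean a balanced stat line
balance <- toy$attack / toy$defense
```

On the real data, `cor(pdf$attack, pdf$defense)` would say how tight the relationship is, and a histogram of the ratio would show how many Pokémon sit near 1.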
# let's put it into 3D with all the three primary stats
x <- pdf$attack
y <- pdf$defense
z <- pdf$hp
scatter3D(x, y, z, phi = 0, bty = "g",
pch = 20, cex = 2, ticktype = "detailed")
plot_ly(pdf, x = ~attack, y = ~defense, z = ~hp, color = ~type_1)
## No trace type specified:
## Based on info supplied, a 'scatter3d' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#scatter3d
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
# never mind, that's not very useful
### alright, now for the question we all want answered: numerically, is there a better Pokémon type? Or, if nothing else, which type is best for which stat? For this we will turn to a pie chart and look at the stats
# first we will just make a list of the new school types without flying. Sorry flying
NS <- c("normal","fire","water","electric","grass","ice","fighting","poison","ground","psychic","bug","rock","ghost","dragon","dark","fairy","steel")
#and filter anything out that is not that type
pdf.ns <- pdf %>% filter(type_1 %in% NS)
# now we want just the stat columns and the type
pdf.nsf <- pdf.ns %>% select(id,type_1,hp,attack,defense,special_attack,special_defense,speed)
#and we will melt it so it will work better in ggplot
pdf.m <- melt(data = pdf.nsf,id=c("id","type_1"))
# to the lab Kronk! IT'S TIME TO MUTATE!!!!
pdf.mg <- pdf.m %>% group_by(type_1,variable)
# drop the grouping after summarising so the result is a plain, ungrouped tibble
pdf.mgs <- pdf.mg %>% summarise(mean_stat=mean(value), .groups = "drop")
p <- pdf.mgs
ggplot(p, aes(x=type_1,y=mean_stat,fill=variable)) +
geom_bar(stat="identity",width=2) +
coord_polar(theta='y')+facet_wrap(type_1~.,scales = "free")+scale_fill_manual(values = c("#264653","#2a9d8f","#8ab17d","#e9c46a","#f4a261","#e76f51"))
# that's actually not very helpful. As the usual criticism of pie charts goes, it is pretty hard to judge the areas and compare them
ggplot(p, aes(x=type_1,y=mean_stat,fill = variable))+geom_bar(position="fill", stat="identity")+coord_flip()+scale_fill_manual(values = c("#264653","#2a9d8f","#8ab17d","#e9c46a","#f4a261","#e76f51"))
ggplot(p, aes(x=type_1,y=mean_stat,fill = variable))+geom_bar(position="stack", stat="identity")+coord_flip()+scale_fill_manual(values = c("#264653","#2a9d8f","#8ab17d","#e9c46a","#f4a261","#e76f51"))
AHA! I knew it! When we break the stats down by percent, they appear to be fairly equal across types. However, when we stack the raw means so they add up to a total, there are clear winners in terms of the total number of points allotted across the stats. Dragons are the favored type in that regard: more points are attributed to their stats overall. So, statistically speaking, if your team is entirely dragon type, you have higher base stats on average, which if you ask me is essentially cheating/choosing easy mode. Meanwhile, if you want a challenge, apparently you should run an all-bug team.
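The two bar charts answer different questions, and both can be read off as numbers. This is a minimal sketch assuming a made-up table `toy` shaped like `pdf.mgs` (one row per type and stat); the figures are invented for illustration, and `totals` and `shares` are hypothetical names.

```r
library(dplyr)

# Hypothetical stand-in for pdf.mgs: mean of each stat per type (made-up numbers)
toy <- tibble(
  type_1    = rep(c("dragon", "bug"), each = 3),
  variable  = rep(c("hp", "attack", "defense"), times = 2),
  mean_stat = c(80, 100, 80, 55, 65, 70)
)

# Stacked view: sum of the mean stats per type, highest first
totals <- toy %>%
  group_by(type_1) %>%
  summarise(total_mean = sum(mean_stat), .groups = "drop") %>%
  arrange(desc(total_mean))

# Filled view: each stat as a share of that type's total
shares <- toy %>%
  group_by(type_1) %>%
  mutate(share = mean_stat / sum(mean_stat)) %>%
  ungroup()
```

Running the same two pipelines on `pdf.mgs` would rank the real types by total and show whether the proportions really are as even as the filled chart suggests.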
# one last thing, we