Keeping with the themes of my previous data projects when I was left to my own devices, we are going to look at the Pokémon dataset and uncover trends and biases!

# first as always let's bring in the libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(ggbeeswarm)
## Warning: package 'ggbeeswarm' was built under R version 4.4.3
library(ggridges)
## Warning: package 'ggridges' was built under R version 4.4.3
library(waffle)
## Warning: package 'waffle' was built under R version 4.4.3
library(tidytuesdayR)
## Warning: package 'tidytuesdayR' was built under R version 4.4.3
library(reshape2)
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(plot3D)
## Warning: package 'plot3D' was built under R version 4.4.3
library(plotly)
## Warning: package 'plotly' was built under R version 4.4.3
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
# and set the theme

theme_set(theme_bw(base_size = 17))
# now we will bring in the dataset itself, which we will be getting from the TidyTuesday website

# use the tt_load() function from tidytuesdayR

tuesdata <- tidytuesdayR::tt_load('2025-04-01')
## ---- Compiling #TidyTuesday Information for 2025-04-01 ----
## --- There is 1 file available ---
## 
## 
## ── Downloading files ───────────────────────────────────────────────────────────
## 
##   1 of 1: "pokemon_df.csv"
# and pull the table out as a data frame (a tibble)

pokemon_df <- tuesdata$pokemon_df

# shorten it for fun

pdf <- pokemon_df

# and let's take a lil peek 

head(pdf)
## # A tibble: 6 × 22
##      id pokemon    species_id height weight base_experience type_1 type_2    hp
##   <dbl> <chr>           <dbl>  <dbl>  <dbl>           <dbl> <chr>  <chr>  <dbl>
## 1     1 bulbasaur           1    0.7    6.9              64 grass  poison    45
## 2     2 ivysaur             2    1     13               142 grass  poison    60
## 3     3 venusaur            3    2    100               236 grass  poison    80
## 4     4 charmander          4    0.6    8.5              62 fire   <NA>      39
## 5     5 charmeleon          5    1.1   19               142 fire   <NA>      58
## 6     6 charizard           6    1.7   90.5             240 fire   flying    78
## # ℹ 13 more variables: attack <dbl>, defense <dbl>, special_attack <dbl>,
## #   special_defense <dbl>, speed <dbl>, color_1 <chr>, color_2 <chr>,
## #   color_f <chr>, egg_group_1 <chr>, egg_group_2 <chr>, url_icon <chr>,
## #   generation_id <dbl>, url_image <chr>
# and find out the variables we have

colnames(pdf)
##  [1] "id"              "pokemon"         "species_id"      "height"         
##  [5] "weight"          "base_experience" "type_1"          "type_2"         
##  [9] "hp"              "attack"          "defense"         "special_attack" 
## [13] "special_defense" "speed"           "color_1"         "color_2"        
## [17] "color_f"         "egg_group_1"     "egg_group_2"     "url_icon"       
## [21] "generation_id"   "url_image"

From here on we’ll run through a couple of visualizations that will help us see the diversity of Pokémon, and maybe even some interesting trends.

# I made a color palette from the official type symbols. Let's bring that in!

pokepal <- read.csv(file = "../pokemon_col_palette.csv",header = TRUE)

pokepal <- pokepal %>% arrange(Type)

col <- unique(pokepal$Colorhex)

# and a palette for the OG types

pokepal2 <- read.csv(file = "../pokemon_col_palette_OG.csv",header = TRUE)

pokepal2 <- pokepal2 %>% arrange(Type)

col2 <- unique(pokepal2$Colorhex)
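Because `scale_fill_manual(values = col)` matches colors to types by position, the colors only line up if the palette rows, sorted by `Type`, correspond one-to-one with the types in the data being plotted. A named vector is matched by name instead. Below is a minimal sketch, assuming the CSV's `Type` values match `type_1` once lower-cased; the three palette rows and their hex codes are placeholders, not the contents of `pokemon_col_palette.csv`.

```r
library(ggplot2)

# placeholder palette standing in for pokemon_col_palette.csv (hypothetical rows)
pokepal_demo <- data.frame(
  Type = c("Fire", "Grass", "Water"),
  Colorhex = c("#EE8130", "#7AC74C", "#6390F0")
)

# named vector: names are lower-cased type names so they match the values in type_1
type_cols <- setNames(pokepal_demo$Colorhex, tolower(pokepal_demo$Type))

# toy data in which only two of the three palette types are present
demo <- data.frame(type_1 = c("grass", "grass", "fire"))

# colors are now looked up by type name, so the unused "water" entry cannot shift them
p_demo <- ggplot(demo, aes(x = type_1, fill = type_1)) +
  geom_bar() +
  scale_fill_manual(values = type_cols)
```

Passing a vector built this way from `pokepal` or `pokepal2` to `scale_fill_manual()` should keep each type on its own color even after filtering drops some types.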

This won’t count toward the figures, but let’s get a quick bar chart of the different Pokémon types.

# type_1

ggplot(data = pdf) +
  aes(x = type_1, fill = type_1) +
  geom_bar() +
  scale_fill_manual(values = col) +
  theme(axis.text.x = element_blank())

# type_2

ggplot(data = pdf) +
  aes(x = type_2, fill = type_2) +
  geom_bar() +
  scale_fill_manual(values = col) +
  theme(axis.text.x = element_blank())

Interesting! There are some clear favorites among the types: water is the most common primary type (type_1), followed by normal. What is interesting is that very few Pokémon have flying as their primary type, yet it is the most common secondary type!
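The bar charts can be backed up with exact counts. A minimal sketch using `dplyr::count()`; `demo` holds only the six rows shown by `head(pdf)` above, so on the real data `pdf` would take its place.

```r
library(dplyr)

# the first six rows of pdf, copied from head(pdf) above
demo <- tibble::tibble(
  pokemon = c("bulbasaur", "ivysaur", "venusaur", "charmander", "charmeleon", "charizard"),
  type_1  = c("grass", "grass", "grass", "fire", "fire", "fire"),
  type_2  = c("poison", "poison", "poison", NA, NA, "flying")
)

# how often each type appears as the primary type, most common first
primary_counts <- demo %>% count(type_1, sort = TRUE)

# same for the secondary type; NA means "no second type", so drop it first
secondary_counts <- demo %>%
  filter(!is.na(type_2)) %>%
  count(type_2, sort = TRUE)
```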

Alrighty let’s make some hot plots!

First let’s see how the density of types changes over generations!

# first, just to make things a little more even, we are only going to work with the OG types, which are

# Normal, Fire, Water, Electric, Grass, Ice, Fighting, Poison, Ground, Flying, Psychic, Bug, Rock, Ghost, Dragon

# and we are going to exclude flying, since very few Pokémon have it as their primary type

#make a list of the Original types

OG <- c("normal","fire","water","electric","grass","ice","fighting","poison","ground","psychic","bug","rock","ghost","dragon")

# and filter out anything whose primary type is not one of those

pdf.f <- pdf %>% filter(type_1 %in% OG)
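It can be worth checking what the OG filter actually discards. A sketch on a hypothetical five-row tibble; with `pdf` in place of `demo`, it would list every excluded primary type and how many Pokémon each one loses.

```r
library(dplyr)

OG <- c("normal","fire","water","electric","grass","ice","fighting","poison",
        "ground","psychic","bug","rock","ghost","dragon")

# hypothetical stand-in for pdf, one row per primary type
demo <- tibble::tibble(type_1 = c("water", "steel", "flying", "fire", "fairy"))

# rows that filter(type_1 %in% OG) throws away, counted per primary type
dropped <- demo %>%
  filter(!type_1 %in% OG) %>%
  count(type_1)
```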

# And for the sake of this we'll only be using the type 1 data

ggplot(data = pdf.f)+aes(x=generation_id,y=type_1,fill = type_1)+ggridges::geom_density_ridges(alpha=.3)+ggridges::theme_ridges()+scale_fill_manual(values = col2)+scale_x_continuous(breaks=seq(1,7, 1))
## Picking joint bandwidth of 0.81
## Warning: Removed 131 rows containing non-finite outside the scale range
## (`stat_density_ridges()`).
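The "Removed 131 rows" warning most likely means those rows had no usable `generation_id`, so they were silently left out of the ridges. Assuming they are missing values (the warning itself only says non-finite or out of range), here is a sketch that counts and drops them explicitly; `demo` is a made-up stand-in for `pdf.f`.

```r
library(dplyr)

# made-up stand-in for pdf.f; the NA mimics a row without a generation
demo <- tibble::tibble(
  type_1 = c("fire", "fire", "water"),
  generation_id = c(1, NA, 2)
)

# how many rows stat_density_ridges() would drop (is.finite() is FALSE for NA)
n_missing <- sum(!is.finite(demo$generation_id))

# keep only rows with a usable generation before plotting
demo_clean <- demo %>% filter(is.finite(generation_id))
```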

Cool stuff, now we swarm!!!

#let's check attack stats

ggplot(data = pdf.f)+aes(x=type_1,y=attack,color=type_1)+ggbeeswarm::geom_beeswarm(method = "center",size=2)+scale_color_manual(values = col2)
## Warning: In `position_beeswarm`, method `center` discretizes the data axis (a.k.a the
## continuous or non-grouped axis).
## This may result in changes to the position of the points along that axis,
## proportional to the value of `cex`.
## This warning is displayed once per session.

# and defense

ggplot(data = pdf.f)+aes(x=type_1,y=defense,color=type_1)+ggbeeswarm::geom_beeswarm(method = "center",size=2)+scale_color_manual(values = col2)

# and hp because why not

ggplot(data = pdf.f)+aes(x=type_1,y=hp,color=type_1)+ggbeeswarm::geom_beeswarm(method = "center",size=2)+scale_color_manual(values = col2)

So there is actually a pretty good spread in all of the stats, and it’s relatively similar across the types. But this makes me wonder: is there a consistent sweet spot with regard to attack and defense? Perhaps a 2D density plot will have the answer!
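To put a number on "relatively similar", a per-type summary of the same three stats can sit alongside the beeswarms. A sketch with made-up values; swapping `demo` for `pdf.f` would give the real medians.

```r
library(dplyr)

# made-up stats for illustration only
demo <- tibble::tibble(
  type_1  = c("fire", "fire", "water", "water"),
  attack  = c(50, 70, 40, 60),
  defense = c(40, 60, 50, 70),
  hp      = c(45, 65, 55, 75)
)

# median of each stat per primary type, a numeric companion to the beeswarm plots
type_medians <- demo %>%
  group_by(type_1) %>%
  summarise(across(c(attack, defense, hp), median), .groups = "drop")
```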

ggplot(pdf, aes(x=attack, y=defense) ) +
  geom_hex(bins = 25) +
  scale_fill_continuous(type = "viridis") +
  theme_bw()

Perhaps as expected, there seems to be a roughly linear relationship between the two: most Pokémon are around balanced, i.e., their attack and defense are close to each other. This might be because the majority of Pokémon are first-stage evolutions, and when they get stronger (evolve) they become more specialized, so attack or defense can become more dominant.
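The "roughly linear" impression can be quantified with a correlation coefficient and with the share of Pokémon whose attack and defense are close. A sketch on made-up numbers; the 20-point cutoff for "balanced" is an arbitrary, hypothetical choice.

```r
library(dplyr)

# made-up attack/defense pairs for illustration only
demo <- tibble::tibble(
  attack  = c(50, 60, 70, 120, 40),
  defense = c(45, 65, 70, 50, 100)
)

# Pearson correlation between attack and defense
r <- cor(demo$attack, demo$defense)

# share of rows whose attack and defense differ by at most 20 points (arbitrary cutoff)
balanced_share <- demo %>%
  summarise(share = mean(abs(attack - defense) <= 20)) %>%
  pull(share)
```

On the real data, `cor(pdf$attack, pdf$defense, use = "complete.obs")` would skip any rows with missing stats.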

# let's put it into 3D with all three primary stats
x <- pdf$attack
y <- pdf$defense
z <- pdf$hp

scatter3D(x, y, z, phi = 0, bty = "g",
        pch = 20, cex = 2, ticktype = "detailed")

# set the trace type and mode explicitly, and reuse the type palette instead of the 8-color Set2 default
plot_ly(pdf, x = ~attack, y = ~defense, z = ~hp, color = ~type_1,
        colors = col, type = "scatter3d", mode = "markers")
# never mind, that's not very useful
### Alright, now for the question we all want answered: numerically, is there a better Pokémon type? Or, if nothing else, which type is best for which stat? For this we will turn to pie charts and look at the stats

# first, make a list of the new-school types without flying. Sorry, flying

NS <- c("normal","fire","water","electric","grass","ice","fighting","poison","ground","psychic","bug","rock","ghost","dragon","dark","fairy","steel")

# and filter out anything whose primary type is not one of those

pdf.ns <- pdf %>% filter(type_1 %in% NS)

# now we want just the stat columns and the type

pdf.nsf <- pdf.ns %>% select(id,type_1,hp,attack,defense,special_attack,special_defense,speed)

# and we melt it into long format so it works better with ggplot

pdf.m <- melt(data = pdf.nsf,id=c("id","type_1"))
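reshape2 is retired, and `tidyr::pivot_longer()` is the tidyverse replacement for `melt()`. A sketch of the same reshape on a made-up two-row stand-in for `pdf.nsf`, keeping `melt()`'s `variable` and `value` column names so the downstream `group_by()` and plots would not need to change. The factor step is there because `pivot_longer()` returns `variable` as character, which ggplot2 would otherwise order alphabetically in the legend.

```r
library(dplyr)
library(tidyr)

# made-up stand-in for pdf.nsf
demo <- tibble::tibble(
  id = c(1, 2), type_1 = c("grass", "fire"),
  hp = c(10, 20), attack = c(30, 40), defense = c(50, 60),
  special_attack = c(70, 80), special_defense = c(90, 100), speed = c(110, 120)
)

stat_cols <- c("hp", "attack", "defense", "special_attack", "special_defense", "speed")

demo_long <- demo %>%
  # one row per id x stat, with the stat name in `variable` and its value in `value`
  pivot_longer(cols = all_of(stat_cols), names_to = "variable", values_to = "value") %>%
  # keep the stats in their original column order for the fill legend
  mutate(variable = factor(variable, levels = stat_cols))
```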

# to the lab Kronk! IT'S TIME TO MUTATE!!!! 

pdf.mg <- pdf.m %>% group_by(type_1,variable)

pdf.mgs <- pdf.mg %>% summarise(mean_stat = mean(value), .groups = "drop")
p <- pdf.mgs

ggplot(p, aes(x=type_1,y=mean_stat,fill=variable)) + 
          geom_bar(stat="identity",width=2) + 
          coord_polar(theta='y')+facet_wrap(type_1~.,scales = "free")+scale_fill_manual(values = c("#264653","#2a9d8f","#8ab17d","#e9c46a","#f4a261","#e76f51"))

# that's actually not very helpful. As the usual criticism of pie charts goes, it is pretty hard to judge the areas and compare them

ggplot(p, aes(x=type_1,y=mean_stat,fill = variable))+geom_bar(position="fill", stat="identity")+coord_flip()+scale_fill_manual(values = c("#264653","#2a9d8f","#8ab17d","#e9c46a","#f4a261","#e76f51"))

ggplot(p, aes(x=type_1,y=mean_stat,fill = variable))+geom_bar(position="stack", stat="identity")+coord_flip()+scale_fill_manual(values = c("#264653","#2a9d8f","#8ab17d","#e9c46a","#f4a261","#e76f51"))

AHA! I knew it! When we break down the stats by percent, they appear to be fairly equal. However! When we stack the raw values so they add up to a total, there are clear winners in terms of the total number of points allotted across the stats. Dragon is the favored type in that regard: overall, more points are allotted to their stats. So, statistically speaking, if your team is entirely dragon type, you will on average have a higher base stat total, which if you ask me is essentially cheating/choosing easy mode. Meanwhile, if you want a challenge, apparently you should have an all-bug team.
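The stacked bars rank the types by eye; summing each type's mean stats gives the same ranking as a sorted table. A sketch on made-up numbers standing in for `p`. Note that the sum of per-stat means equals the average base stat total only when every Pokémon in a type has all of its stats recorded.

```r
library(dplyr)

# made-up stand-in for p (one mean per type and stat)
demo <- tibble::tibble(
  type_1    = rep(c("bug", "dragon"), each = 3),
  variable  = rep(c("hp", "attack", "defense"), times = 2),
  mean_stat = c(50, 60, 55, 80, 100, 90)
)

# add up the per-stat means into one total per type, highest first
type_totals <- demo %>%
  group_by(type_1) %>%
  summarise(total = sum(mean_stat), .groups = "drop") %>%
  arrange(desc(total))
```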

# one last thing, we