Determine the Goal

I have heard so much about neural nets that I decided to give one a try. This is my first taste, with the understanding that there is much more to learn and explore.

Acquire Some Data

I downloaded publicly accessible powerlifting competition results from the OpenPowerlifting project (https://data.openpowerlifting.org).

Review and Understand

The download comes with an excellent Readme file. The first order of business was to review the time frame and select the years I wanted to work with.

#load libraries
library(dplyr)
library(tidyr)
library(lubridate)
library(stringr)
library(ggplot2)
library(forecast)
library(fpp3)
library(randomForest)
library(keras)

#load data
lift_data <- read.csv("openpowerlifting-2024-11-09-128b465c.csv")

#review variable names 
names(lift_data)
##  [1] "Name"             "Sex"              "Event"            "Equipment"       
##  [5] "Age"              "AgeClass"         "BirthYearClass"   "Division"        
##  [9] "BodyweightKg"     "WeightClassKg"    "Squat1Kg"         "Squat2Kg"        
## [13] "Squat3Kg"         "Squat4Kg"         "Best3SquatKg"     "Bench1Kg"        
## [17] "Bench2Kg"         "Bench3Kg"         "Bench4Kg"         "Best3BenchKg"    
## [21] "Deadlift1Kg"      "Deadlift2Kg"      "Deadlift3Kg"      "Deadlift4Kg"     
## [25] "Best3DeadliftKg"  "TotalKg"          "Place"            "Dots"            
## [29] "Wilks"            "Glossbrenner"     "Goodlift"         "Tested"          
## [33] "Country"          "State"            "Federation"       "ParentFederation"
## [37] "Date"             "MeetCountry"      "MeetState"        "MeetTown"        
## [41] "MeetName"         "Sanctioned"
#review data
head(lift_data, 10)
##                 Name Sex Event Equipment  Age AgeClass BirthYearClass
## 1     E.S. Denisenko   F     B       Raw 28.5    24-34          24-39
## 2    I.S. Lebetskaya   F     B       Raw 43.5    40-44          40-49
## 3      K. Yakimovich   F     B       Raw 26.5    24-34          24-39
## 4       A.G. Golneva   F     B       Raw 19.5    20-23          19-23
## 5  E.V. Marunevskaya   F     B       Raw 19.5    20-23          19-23
## 6      D. Gimbitskiy   M     B       Raw 30.5    24-34          24-39
## 7          S. Kozlov   M     B       Raw 40.5    40-44          40-49
## 8    V.O. Shevchenko   M     B       Raw 21.5    20-23          19-23
## 9       M.S. Drozdov   M     B       Raw 23.5    24-34          24-39
## 10      Yu.G. Rimsha   M     B       Raw 35.5    35-39          24-39
##         Division BodyweightKg WeightClassKg Squat1Kg Squat2Kg Squat3Kg Squat4Kg
## 1           Open         67.3                     NA       NA       NA       NA
## 2           Open         73.2                     NA       NA       NA       NA
## 3           Open         60.6                     NA       NA       NA       NA
## 4  Juniors 17-21         50.3                     NA       NA       NA       NA
## 5  Juniors 17-21         63.7                     NA       NA       NA       NA
## 6           Open         82.0          82.5       NA       NA       NA       NA
## 7           Open         79.3          82.5       NA       NA       NA       NA
## 8           Open         80.1          82.5       NA       NA       NA       NA
## 9           Open         88.0            90       NA       NA       NA       NA
## 10          Open         89.2            90       NA       NA       NA       NA
##    Best3SquatKg Bench1Kg Bench2Kg Bench3Kg Bench4Kg Best3BenchKg Deadlift1Kg
## 1            NA    -40.0    -45.0    -45.0       NA           NA          NA
## 2            NA     80.0     85.0     90.0       NA         90.0          NA
## 3            NA     40.0     42.5     45.0       NA         45.0          NA
## 4            NA     32.5     35.0    -37.5       NA         35.0          NA
## 5            NA     40.0     42.5    -45.0       NA         42.5          NA
## 6            NA    127.5    132.5       NA       NA        132.5          NA
## 7            NA    150.0    160.0   -167.5       NA        160.0          NA
## 8            NA     87.5     95.0    -97.5       NA         95.0          NA
## 9            NA    165.0    172.5    177.5       NA        177.5          NA
## 10           NA    160.0    162.5    165.0       NA        165.0          NA
##    Deadlift2Kg Deadlift3Kg Deadlift4Kg Best3DeadliftKg TotalKg Place   Dots
## 1           NA          NA          NA              NA      NA    DQ     NA
## 2           NA          NA          NA              NA    90.0     1  88.80
## 3           NA          NA          NA              NA    45.0     2  49.57
## 4           NA          NA          NA              NA    35.0     2  43.67
## 5           NA          NA          NA              NA    42.5     1  45.40
## 6           NA          NA          NA              NA   132.5     2  90.07
## 7           NA          NA          NA              NA   160.0     1 110.90
## 8           NA          NA          NA              NA    95.0     3  65.46
## 9           NA          NA          NA              NA   177.5     1 116.10
## 10          NA          NA          NA              NA   165.0     3 107.17
##     Wilks Glossbrenner Goodlift Tested Country State  Federation
## 1      NA           NA       NA                      GSF-Belarus
## 2   86.89        76.50    70.06                      GSF-Belarus
## 3   49.79        43.98    38.42                      GSF-Belarus
## 4   44.76        39.73    34.55                      GSF-Belarus
## 5   45.26        39.96    35.25                      GSF-Belarus
## 6   89.09        85.74    66.97                      GSF-Belarus
## 7  109.84       105.88    82.28                      GSF-Belarus
## 8   64.81        62.44    48.60                      GSF-Belarus
## 9  114.65       109.99    86.55                      GSF-Belarus
## 10 105.82       101.46    79.91                      GSF-Belarus
##    ParentFederation       Date MeetCountry MeetState MeetTown        MeetName
## 1                   2019-06-22     Belarus           Luninets Bison Power Cup
## 2                   2019-06-22     Belarus           Luninets Bison Power Cup
## 3                   2019-06-22     Belarus           Luninets Bison Power Cup
## 4                   2019-06-22     Belarus           Luninets Bison Power Cup
## 5                   2019-06-22     Belarus           Luninets Bison Power Cup
## 6                   2019-06-22     Belarus           Luninets Bison Power Cup
## 7                   2019-06-22     Belarus           Luninets Bison Power Cup
## 8                   2019-06-22     Belarus           Luninets Bison Power Cup
## 9                   2019-06-22     Belarus           Luninets Bison Power Cup
## 10                  2019-06-22     Belarus           Luninets Bison Power Cup
##    Sanctioned
## 1         Yes
## 2         Yes
## 3         Yes
## 4         Yes
## 5         Yes
## 6         Yes
## 7         Yes
## 8         Yes
## 9         Yes
## 10        Yes
#date range 
lift_data %>% 
  summarise(start= min(Date),
            end = max(Date))
##        start        end
## 1 1964-09-05 2024-11-03
#exploring the date range to determine an appropriate cut off 
lift_data %>% 
  mutate(year = year(ymd(Date))) %>% 
  group_by(year) %>% 
  count() %>% 
  ggplot(aes(x = year, y = n)) +
  geom_col()

Consider the Correlations

Since this data set does not contain many features, I planned to include them all in the neural net. However, I was curious to see how the features related to the target variable and to each other. For this investigation, I selected the deadlift weight as the target variable.

#narrow down lift_data to records with Best3DeadliftKg present
#select variables for correlations and keep only complete records
#narrow to records 2010 and later
deadlift <- lift_data %>% 
  mutate(Year = year(ymd(Date))) %>% 
  select(Name, Sex, Equipment, Age, BodyweightKg, Best3DeadliftKg, Year) %>% 
  filter(complete.cases(.) &
           Year >= 2010) 

#correlation for the continuous variables
cor(deadlift[, c('Age','BodyweightKg', 'Best3DeadliftKg', 'Year')])
##                        Age BodyweightKg Best3DeadliftKg          Year
## Age             1.00000000   0.16422671    0.0259555327  0.0103941884
## BodyweightKg    0.16422671   1.00000000    0.6190932243  0.0150590699
## Best3DeadliftKg 0.02595553   0.61909322    1.0000000000 -0.0009767003
## Year            0.01039419   0.01505907   -0.0009767003  1.0000000000
#correlation for the categorical variables
cat_vars <- c("Sex", "Equipment")

# Loop to run ANOVA for each categorical variable
for (var in cat_vars) {
  formula <- as.formula(paste("Best3DeadliftKg ~", var)) 
  anova_result <- aov(formula, data = deadlift)    
  print(summary(anova_result))                
}
##                  Df    Sum Sq   Mean Sq F value Pr(>F)    
## Sex               2 2.171e+09 1.086e+09  479776 <2e-16 ***
## Residuals   1291622 2.922e+09 2.263e+03                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##                  Df   Sum Sq  Mean Sq F value Pr(>F)    
## Equipment         5 8.38e+07 16760652    4321 <2e-16 ***
## Residuals   1291619 5.01e+09     3879                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I also ran a random forest model to see the importance of each feature in a traditional machine learning approach.

#turn categorical variables into factors
deadlift$Sex <- as.factor(deadlift$Sex)
deadlift$Equipment <- as.factor(deadlift$Equipment)

#build RF model
set.seed(555)  # For reproducibility

library(ranger)

rf_model <- ranger(Best3DeadliftKg ~ Sex + Equipment + Age + BodyweightKg, 
                   data = deadlift,
                   num.trees = 100, 
                   importance = "impurity")
## Growing trees.. Progress: 39%. Estimated remaining time: 48 seconds.
## Growing trees.. Progress: 72%. Estimated remaining time: 24 seconds.
## Growing trees.. Progress: 85%. Estimated remaining time: 16 seconds.
## Growing trees.. Progress: 98%. Estimated remaining time: 2 seconds.
# View the model summary
print(rf_model)
## Ranger result
## 
## Call:
##  ranger(Best3DeadliftKg ~ Sex + Equipment + Age + BodyweightKg,      data = deadlift, num.trees = 100, importance = "impurity") 
## 
## Type:                             Regression 
## Number of trees:                  100 
## Sample size:                      1291625 
## Number of independent variables:  4 
## Mtry:                             2 
## Target node size:                 5 
## Variable importance mode:         impurity 
## Splitrule:                        variance 
## OOB prediction error (MSE):       1252.073 
## R squared (OOB):                  0.6824977
importance(rf_model) 
##          Sex    Equipment          Age BodyweightKg 
##   1639893692     36032490    537424583   1364903507
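The raw impurity numbers are easier to compare as shares of the total. A quick sketch, using the values printed above:

```r
# Relative importance: each feature's impurity as a share of the total
imp <- c(Sex = 1639893692, Equipment = 36032490,
         Age = 537424583, BodyweightKg = 1364903507)
round(100 * imp / sum(imp), 1)
# Sex and BodyweightKg dominate, at roughly 46% and 38%
```

Sex and bodyweight carry most of the signal, which lines up with the correlation and ANOVA results above.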

Get Personal

Instead of just running the training and test sets, I wanted to see what the model would predict for me: what kind of weight should I expect to deadlift if I were a powerlifter, given my current age, sex, and weight?

I loaded my own information and merged it with the main dataset before training so that the record's values would be normalized on the same scale as the rest of the data. Below, you can see that the record is then set aside before the training and test sets are selected.
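To illustrate why the personal record has to be scaled together with the main data, here is a toy sketch (the normalize() helper mirrors the min-max function defined in the data preparation step):

```r
# Min-max scaling depends on a column's min and max, so the same raw
# value lands in a different place depending on which data it is scaled with
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

ages <- c(18, 25, 40, 60)              # toy competitor ages
round(normalize(c(ages, 35))[5], 3)    # 35 scaled against the full range: 0.405
round(normalize(c(30, 35, 40))[2], 3)  # 35 scaled against different data: 0.5
```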

#load personal data points for prediction at the last step
personal_data <- read.csv("personal_data_points.csv", 
                          colClasses = c("character", "character", "character", 
                                         "numeric", "numeric",
                                         "numeric", "numeric"))

deadlift <- deadlift %>% bind_rows(personal_data)

Prepare the Data

To run the neural net, I set up dummy variables for the categorical features, normalized the continuous variables, and split the data into training and test sets.

# setup dummy categorical variables
deadlift2 <- model.matrix(~ Sex + Equipment - 1, data = deadlift) %>%
  cbind(deadlift[, c("Age", "BodyweightKg", "Year", "Best3DeadliftKg")]) 

# setup predictors (X) and target (y)
X <- as.matrix(deadlift2[, -ncol(deadlift2)])  #exclude Best3DeadliftKg
y <- deadlift2[, ncol(deadlift2)]

# Normalize continuous variables
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
X <- apply(X, 2, normalize)

# Pull out personal data stored in the last row for prediction later
X_personal <- X[nrow(X), ]

X <-  X[-nrow(X), ]
y <- y[-length(y)]

# Split data into training and testing
set.seed(777)
# 80% for training
train_index <- sample(1:nrow(X), 0.8 * nrow(X))  
X_train <- X[train_index, ]
y_train <- y[train_index]
X_test <- X[-train_index, ]
y_test <- y[-train_index]

Run the Neural Net

# Build a neural network model
model <- keras_model_sequential() %>%
         layer_dense(units = 16, 
                     activation = "relu", 
                     input_shape = ncol(X_train)) %>% 
         layer_dense(units = 8, 
                     activation = "relu") %>% 
         layer_dense(units = 1, 
                     activation = "linear")  

# Compile the model
model %>% compile(loss = "mse",
                  optimizer = "adam",
                  metrics = c("mae"))

#define cut off for stopping when no improvement is seen
early_stop <- callback_early_stopping(
  monitor = "val_loss",  
  patience = 5,           
  restore_best_weights = TRUE  
)

# Train the model
history <- model %>% fit(
  X_train, y_train,
  epochs = 50,
  batch_size = 16,
  validation_split = 0.2,
  callbacks = list(early_stop)
)
## Epoch 1/50
## 51665/51665 - 207s - loss: 1881.0140 - mae: 30.9314 - val_loss: 1407.5021 - val_mae: 27.4503 - 207s/epoch - 4ms/step
## Epoch 2/50
## 51665/51665 - 203s - loss: 1373.0623 - mae: 27.2040 - val_loss: 1372.6271 - val_mae: 27.1520 - 203s/epoch - 4ms/step
## Epoch 3/50
## 51665/51665 - 199s - loss: 1354.7793 - mae: 27.0113 - val_loss: 1357.4866 - val_mae: 26.9783 - 199s/epoch - 4ms/step
## Epoch 4/50
## 51665/51665 - 203s - loss: 1347.4330 - mae: 26.9290 - val_loss: 1354.7152 - val_mae: 26.9279 - 203s/epoch - 4ms/step
## Epoch 5/50
## 51665/51665 - 194s - loss: 1340.3407 - mae: 26.8480 - val_loss: 1360.6180 - val_mae: 27.0675 - 194s/epoch - 4ms/step
## Epoch 6/50
## 51665/51665 - 200s - loss: 1335.5311 - mae: 26.7908 - val_loss: 1347.6490 - val_mae: 26.8313 - 200s/epoch - 4ms/step
## Epoch 7/50
## 51665/51665 - 181s - loss: 1333.6255 - mae: 26.7690 - val_loss: 1340.9502 - val_mae: 26.7594 - 181s/epoch - 4ms/step
## Epoch 8/50
## 51665/51665 - 204s - loss: 1332.5142 - mae: 26.7540 - val_loss: 1339.5787 - val_mae: 26.7634 - 204s/epoch - 4ms/step
## Epoch 9/50
## 51665/51665 - 131s - loss: 1330.6134 - mae: 26.7368 - val_loss: 1338.2775 - val_mae: 26.7411 - 131s/epoch - 3ms/step
## Epoch 10/50
## 51665/51665 - 1539s - loss: 1329.3612 - mae: 26.7171 - val_loss: 1352.1714 - val_mae: 26.8365 - 1539s/epoch - 30ms/step
## Epoch 11/50
## 51665/51665 - 130s - loss: 1327.8395 - mae: 26.7003 - val_loss: 1338.3467 - val_mae: 26.7630 - 130s/epoch - 3ms/step
## Epoch 12/50
## 51665/51665 - 89s - loss: 1326.8043 - mae: 26.6915 - val_loss: 1336.8727 - val_mae: 26.7471 - 89s/epoch - 2ms/step
## Epoch 13/50
## 51665/51665 - 977s - loss: 1324.9185 - mae: 26.6687 - val_loss: 1340.1594 - val_mae: 26.7304 - 977s/epoch - 19ms/step
## Epoch 14/50
## 51665/51665 - 92s - loss: 1322.2379 - mae: 26.6428 - val_loss: 1327.9929 - val_mae: 26.6395 - 92s/epoch - 2ms/step
## Epoch 15/50
## 51665/51665 - 45s - loss: 1318.8793 - mae: 26.5994 - val_loss: 1325.5049 - val_mae: 26.6110 - 45s/epoch - 875us/step
## Epoch 16/50
## 51665/51665 - 44s - loss: 1315.6921 - mae: 26.5631 - val_loss: 1324.1324 - val_mae: 26.6131 - 44s/epoch - 846us/step
## Epoch 17/50
## 51665/51665 - 44s - loss: 1313.2931 - mae: 26.5400 - val_loss: 1321.6257 - val_mae: 26.5682 - 44s/epoch - 849us/step
## Epoch 18/50
## 51665/51665 - 45s - loss: 1311.8528 - mae: 26.5231 - val_loss: 1329.7285 - val_mae: 26.6173 - 45s/epoch - 862us/step
## Epoch 19/50
## 51665/51665 - 5874s - loss: 1311.5126 - mae: 26.5182 - val_loss: 1320.6340 - val_mae: 26.5262 - 5874s/epoch - 114ms/step
## Epoch 20/50
## 51665/51665 - 48s - loss: 1310.8038 - mae: 26.5076 - val_loss: 1317.8607 - val_mae: 26.4978 - 48s/epoch - 936us/step
## Epoch 21/50
## 51665/51665 - 44s - loss: 1310.7500 - mae: 26.5073 - val_loss: 1317.4490 - val_mae: 26.4951 - 44s/epoch - 855us/step
## Epoch 22/50
## 51665/51665 - 44s - loss: 1310.4778 - mae: 26.5079 - val_loss: 1327.6125 - val_mae: 26.6148 - 44s/epoch - 843us/step
## Epoch 23/50
## 51665/51665 - 44s - loss: 1310.0178 - mae: 26.5047 - val_loss: 1322.5145 - val_mae: 26.5446 - 44s/epoch - 843us/step
## Epoch 24/50
## 51665/51665 - 44s - loss: 1310.0438 - mae: 26.5037 - val_loss: 1327.4176 - val_mae: 26.6708 - 44s/epoch - 843us/step
## Epoch 25/50
## 51665/51665 - 44s - loss: 1309.9100 - mae: 26.5019 - val_loss: 1317.3850 - val_mae: 26.5187 - 44s/epoch - 848us/step
## Epoch 26/50
## 51665/51665 - 44s - loss: 1309.9210 - mae: 26.4994 - val_loss: 1317.8644 - val_mae: 26.4880 - 44s/epoch - 858us/step
## Epoch 27/50
## 51665/51665 - 941s - loss: 1309.0642 - mae: 26.4951 - val_loss: 1316.9810 - val_mae: 26.5234 - 941s/epoch - 18ms/step
## Epoch 28/50
## 51665/51665 - 44s - loss: 1307.6307 - mae: 26.4755 - val_loss: 1314.1721 - val_mae: 26.4884 - 44s/epoch - 851us/step
## Epoch 29/50
## 51665/51665 - 44s - loss: 1304.6138 - mae: 26.4362 - val_loss: 1311.2686 - val_mae: 26.4342 - 44s/epoch - 845us/step
## Epoch 30/50
## 51665/51665 - 43s - loss: 1301.9065 - mae: 26.4020 - val_loss: 1309.2960 - val_mae: 26.3983 - 43s/epoch - 839us/step
## Epoch 31/50
## 51665/51665 - 44s - loss: 1301.0229 - mae: 26.3874 - val_loss: 1315.3098 - val_mae: 26.4624 - 44s/epoch - 852us/step
## Epoch 32/50
## 51665/51665 - 43s - loss: 1300.4795 - mae: 26.3868 - val_loss: 1316.8231 - val_mae: 26.4737 - 43s/epoch - 841us/step
## Epoch 33/50
## 51665/51665 - 44s - loss: 1299.4963 - mae: 26.3729 - val_loss: 1310.7443 - val_mae: 26.3930 - 44s/epoch - 856us/step
## Epoch 34/50
## 51665/51665 - 442s - loss: 1298.4257 - mae: 26.3625 - val_loss: 1307.4773 - val_mae: 26.4081 - 442s/epoch - 9ms/step
## Epoch 35/50
## 51665/51665 - 44s - loss: 1297.8495 - mae: 26.3520 - val_loss: 1304.9491 - val_mae: 26.3703 - 44s/epoch - 851us/step
## Epoch 36/50
## 51665/51665 - 44s - loss: 1296.9695 - mae: 26.3485 - val_loss: 1306.3848 - val_mae: 26.3793 - 44s/epoch - 842us/step
## Epoch 37/50
## 51665/51665 - 43s - loss: 1296.6132 - mae: 26.3390 - val_loss: 1306.8502 - val_mae: 26.4174 - 43s/epoch - 836us/step
## Epoch 38/50
## 51665/51665 - 44s - loss: 1296.2799 - mae: 26.3364 - val_loss: 1309.5557 - val_mae: 26.3905 - 44s/epoch - 856us/step
## Epoch 39/50
## 51665/51665 - 44s - loss: 1296.2225 - mae: 26.3359 - val_loss: 1313.7152 - val_mae: 26.4102 - 44s/epoch - 847us/step
## Epoch 40/50
## 51665/51665 - 44s - loss: 1295.6417 - mae: 26.3292 - val_loss: 1306.5245 - val_mae: 26.3719 - 44s/epoch - 843us/step
plot(history)

# Evaluate the model
score <- model %>% evaluate(X_test, y_test)
## 8073/8073 - 4s - loss: 1287.4337 - mae: 26.2446 - 4s/epoch - 459us/step
print(score)
##       loss        mae 
## 1287.43372   26.24462

Predict my Deadlift

X_personal_matrix <- t(as.matrix(X_personal))

my_deadlift <- model %>% predict(X_personal_matrix)
## 1/1 - 0s - 65ms/epoch - 65ms/step
print(paste(round(my_deadlift,0), "kg"))
## [1] "148 kg"
print(paste(round(my_deadlift*2.20462262185,0), "lbs"))
## [1] "326 lbs"

326 lbs is no joke. I will clearly need to hit the gym a bit more if I ever want to compete, but it is kind of cool to see what is even remotely possible.

Note: This page uses data from the OpenPowerlifting project, https://www.openpowerlifting.org. You may download a copy of the data at https://data.openpowerlifting.org.