I have heard so much about neural nets that I decided to give one a try. This is my initial taste, knowing there is much more to learn and explore.
I downloaded the publicly accessible powerlifting competition results from the OpenPowerlifting project (https://data.openpowerlifting.org).
The download comes with an excellent Readme file. The first order of business was to review the time frame and select the years I wanted to work with.
#load libraries
library(dplyr)
library(tidyr)
library(lubridate)
library(stringr)
library(ggplot2)
library(forecast)
library(fpp3)
library(randomForest)
library(keras)
#load data
lift_data <- read.csv("openpowerlifting-2024-11-09-128b465c.csv")
#review variable names
names(lift_data)
## [1] "Name" "Sex" "Event" "Equipment"
## [5] "Age" "AgeClass" "BirthYearClass" "Division"
## [9] "BodyweightKg" "WeightClassKg" "Squat1Kg" "Squat2Kg"
## [13] "Squat3Kg" "Squat4Kg" "Best3SquatKg" "Bench1Kg"
## [17] "Bench2Kg" "Bench3Kg" "Bench4Kg" "Best3BenchKg"
## [21] "Deadlift1Kg" "Deadlift2Kg" "Deadlift3Kg" "Deadlift4Kg"
## [25] "Best3DeadliftKg" "TotalKg" "Place" "Dots"
## [29] "Wilks" "Glossbrenner" "Goodlift" "Tested"
## [33] "Country" "State" "Federation" "ParentFederation"
## [37] "Date" "MeetCountry" "MeetState" "MeetTown"
## [41] "MeetName" "Sanctioned"
#review data
head(lift_data, 10)
## Name Sex Event Equipment Age AgeClass BirthYearClass
## 1 E.S. Denisenko F B Raw 28.5 24-34 24-39
## 2 I.S. Lebetskaya F B Raw 43.5 40-44 40-49
## 3 K. Yakimovich F B Raw 26.5 24-34 24-39
## 4 A.G. Golneva F B Raw 19.5 20-23 19-23
## 5 E.V. Marunevskaya F B Raw 19.5 20-23 19-23
## 6 D. Gimbitskiy M B Raw 30.5 24-34 24-39
## 7 S. Kozlov M B Raw 40.5 40-44 40-49
## 8 V.O. Shevchenko M B Raw 21.5 20-23 19-23
## 9 M.S. Drozdov M B Raw 23.5 24-34 24-39
## 10 Yu.G. Rimsha M B Raw 35.5 35-39 24-39
## Division BodyweightKg WeightClassKg Squat1Kg Squat2Kg Squat3Kg Squat4Kg
## 1 Open 67.3 NA NA NA NA
## 2 Open 73.2 NA NA NA NA
## 3 Open 60.6 NA NA NA NA
## 4 Juniors 17-21 50.3 NA NA NA NA
## 5 Juniors 17-21 63.7 NA NA NA NA
## 6 Open 82.0 82.5 NA NA NA NA
## 7 Open 79.3 82.5 NA NA NA NA
## 8 Open 80.1 82.5 NA NA NA NA
## 9 Open 88.0 90 NA NA NA NA
## 10 Open 89.2 90 NA NA NA NA
## Best3SquatKg Bench1Kg Bench2Kg Bench3Kg Bench4Kg Best3BenchKg Deadlift1Kg
## 1 NA -40.0 -45.0 -45.0 NA NA NA
## 2 NA 80.0 85.0 90.0 NA 90.0 NA
## 3 NA 40.0 42.5 45.0 NA 45.0 NA
## 4 NA 32.5 35.0 -37.5 NA 35.0 NA
## 5 NA 40.0 42.5 -45.0 NA 42.5 NA
## 6 NA 127.5 132.5 NA NA 132.5 NA
## 7 NA 150.0 160.0 -167.5 NA 160.0 NA
## 8 NA 87.5 95.0 -97.5 NA 95.0 NA
## 9 NA 165.0 172.5 177.5 NA 177.5 NA
## 10 NA 160.0 162.5 165.0 NA 165.0 NA
## Deadlift2Kg Deadlift3Kg Deadlift4Kg Best3DeadliftKg TotalKg Place Dots
## 1 NA NA NA NA NA DQ NA
## 2 NA NA NA NA 90.0 1 88.80
## 3 NA NA NA NA 45.0 2 49.57
## 4 NA NA NA NA 35.0 2 43.67
## 5 NA NA NA NA 42.5 1 45.40
## 6 NA NA NA NA 132.5 2 90.07
## 7 NA NA NA NA 160.0 1 110.90
## 8 NA NA NA NA 95.0 3 65.46
## 9 NA NA NA NA 177.5 1 116.10
## 10 NA NA NA NA 165.0 3 107.17
## Wilks Glossbrenner Goodlift Tested Country State Federation
## 1 NA NA NA GSF-Belarus
## 2 86.89 76.50 70.06 GSF-Belarus
## 3 49.79 43.98 38.42 GSF-Belarus
## 4 44.76 39.73 34.55 GSF-Belarus
## 5 45.26 39.96 35.25 GSF-Belarus
## 6 89.09 85.74 66.97 GSF-Belarus
## 7 109.84 105.88 82.28 GSF-Belarus
## 8 64.81 62.44 48.60 GSF-Belarus
## 9 114.65 109.99 86.55 GSF-Belarus
## 10 105.82 101.46 79.91 GSF-Belarus
## ParentFederation Date MeetCountry MeetState MeetTown MeetName
## 1 2019-06-22 Belarus Luninets Bison Power Cup
## 2 2019-06-22 Belarus Luninets Bison Power Cup
## 3 2019-06-22 Belarus Luninets Bison Power Cup
## 4 2019-06-22 Belarus Luninets Bison Power Cup
## 5 2019-06-22 Belarus Luninets Bison Power Cup
## 6 2019-06-22 Belarus Luninets Bison Power Cup
## 7 2019-06-22 Belarus Luninets Bison Power Cup
## 8 2019-06-22 Belarus Luninets Bison Power Cup
## 9 2019-06-22 Belarus Luninets Bison Power Cup
## 10 2019-06-22 Belarus Luninets Bison Power Cup
## Sanctioned
## 1 Yes
## 2 Yes
## 3 Yes
## 4 Yes
## 5 Yes
## 6 Yes
## 7 Yes
## 8 Yes
## 9 Yes
## 10 Yes
#date range
lift_data %>%
  summarise(start = min(Date),
            end   = max(Date))
## start end
## 1 1964-09-05 2024-11-03
#exploring the date range to determine an appropriate cutoff
lift_data %>%
  mutate(year = year(Date)) %>%
  count(year) %>%
  ggplot(aes(x = year, y = n)) +
  geom_col()
Since this data set does not contain many features, I planned to include them all in the neural net. However, I was curious to see how the features related to the target variable and to each other. For this investigation, I selected the deadlift weight as the target variable.
#narrow down lift_data to records with Best3DeadliftKg present
#select variables for correlations and keep only complete records
#narrow to records 2010 and later
deadlift <- lift_data %>%
  mutate(Year = year(ymd(Date))) %>%
  select(Name, Sex, Equipment, Age, BodyweightKg, Best3DeadliftKg, Year) %>%
  filter(complete.cases(.),
         Year >= 2010)
#correlation for the continuous variables
cor(deadlift[, c('Age','BodyweightKg', 'Best3DeadliftKg', 'Year')])
## Age BodyweightKg Best3DeadliftKg Year
## Age 1.00000000 0.16422671 0.0259555327 0.0103941884
## BodyweightKg 0.16422671 1.00000000 0.6190932243 0.0150590699
## Best3DeadliftKg 0.02595553 0.61909322 1.0000000000 -0.0009767003
## Year 0.01039419 0.01505907 -0.0009767003 1.0000000000
#association between the categorical variables and the target (ANOVA)
cat_vars <- c("Sex", "Equipment")
# Loop to run ANOVA for each categorical variable
for (var in cat_vars) {
  formula <- as.formula(paste("Best3DeadliftKg ~", var))
  anova_result <- aov(formula, data = deadlift)
  print(summary(anova_result))
}
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 2.171e+09 1.086e+09 479776 <2e-16 ***
## Residuals 1291622 2.922e+09 2.263e+03
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Equipment 5 8.38e+07 16760652 4321 <2e-16 ***
## Residuals 1291619 5.01e+09 3879
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
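With roughly 1.3 million rows, almost any effect will clear the p < 0.05 bar, so an effect-size check can be more informative than the p-values alone. As a rough sketch, eta-squared can be computed directly from the sums of squares printed in the tables above:

```r
# eta-squared = SS_effect / (SS_effect + SS_residual), using the ANOVA output above
eta_sq <- function(ss_effect, ss_resid) ss_effect / (ss_effect + ss_resid)

eta_sex       <- eta_sq(2.171e9, 2.922e9)  # ~0.43: Sex explains a large share of variance
eta_equipment <- eta_sq(8.38e7,  5.01e9)   # ~0.02: Equipment explains very little

round(c(Sex = eta_sex, Equipment = eta_equipment), 3)
```

Both predictors are "significant," but Sex carries far more explanatory weight than Equipment.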
I also ran a random forest model to see the importance of each feature in a traditional machine learning approach.
#turn categorical variables into factors
deadlift$Sex <- as.factor(deadlift$Sex)
deadlift$Equipment <- as.factor(deadlift$Equipment)
#build RF model
set.seed(555) # For reproducibility
library(ranger)
rf_model <- ranger(Best3DeadliftKg ~ Sex + Equipment + Age + BodyweightKg,
                   data = deadlift,
                   num.trees = 100,
                   importance = "impurity")
## Growing trees.. Progress: 39%. Estimated remaining time: 48 seconds.
## Growing trees.. Progress: 72%. Estimated remaining time: 24 seconds.
## Growing trees.. Progress: 85%. Estimated remaining time: 16 seconds.
## Growing trees.. Progress: 98%. Estimated remaining time: 2 seconds.
# View the model summary
print(rf_model)
## Ranger result
##
## Call:
## ranger(Best3DeadliftKg ~ Sex + Equipment + Age + BodyweightKg, data = deadlift, num.trees = 100, importance = "impurity")
##
## Type: Regression
## Number of trees: 100
## Sample size: 1291625
## Number of independent variables: 4
## Mtry: 2
## Target node size: 5
## Variable importance mode: impurity
## Splitrule: variance
## OOB prediction error (MSE): 1252.073
## R squared (OOB): 0.6824977
importance(rf_model)
## Sex Equipment Age BodyweightKg
## 1639893692 36032490 537424583 1364903507
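The raw impurity importances are easier to compare as shares of the total. A small sketch normalizing the values copied from the `importance()` output above:

```r
# importance scores copied from the ranger output above
imp <- c(Sex = 1639893692, Equipment = 36032490,
         Age = 537424583, BodyweightKg = 1364903507)

# express each as a percentage of total importance and sort
imp_pct <- sort(100 * imp / sum(imp), decreasing = TRUE)
round(imp_pct, 1)
# Sex (~46%) and BodyweightKg (~38%) dominate; Equipment contributes ~1%
```

This agrees with the ANOVA and correlation results: sex and bodyweight drive the deadlift numbers.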
Instead of just running the training and test sets, I wanted to see what the model would predict for me: what kind of weight could I expect to deadlift if I were a powerlifter, given my current age, sex, and weight?
I appended my own information to the main dataset before training so that the record's values would be normalized along with the rest of the data. Below, you can see that the record is set aside before the training and test sets are selected.
#load personal data point to predict on at the last step
personal_data <- read.csv("personal_data_points.csv",
                          colClasses = c("character", "character", "character",
                                         "numeric", "numeric",
                                         "numeric", "numeric"))
deadlift <- deadlift %>% bind_rows(personal_data)
To run the neural net, I set up dummy variables for the categorical features, normalized the continuous variables, and split the data into training and test sets.
# setup dummy categorical variables
deadlift2 <- model.matrix(~ Sex + Equipment - 1, data = deadlift) %>%
cbind(deadlift[, c("Age", "BodyweightKg", "Year", "Best3DeadliftKg")])
# setup predictors (X) and target (y)
X <- as.matrix(deadlift2[, -ncol(deadlift2)]) #exclude Best3DeadliftKg
y <- deadlift2[, ncol(deadlift2)]
# Normalize continuous variables
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
X <- apply(X, 2, normalize)
# Pull out personal data stored in the last row for prediction later
X_personal <- X[nrow(X), ]
X <- X[-nrow(X), ]
y <- y[-length(y)]
# Split data into training and testing
set.seed(777)
# 80% for training
train_index <- sample(1:nrow(X), 0.8 * nrow(X))
X_train <- X[train_index, ]
y_train <- y[train_index]
X_test <- X[-train_index, ]
y_test <- y[-train_index]
# Build a neural network model
model <- keras_model_sequential() %>%
  layer_dense(units = 16,
              activation = "relu",
              input_shape = ncol(X_train)) %>%
  layer_dense(units = 8,
              activation = "relu") %>%
  layer_dense(units = 1,
              activation = "linear")
# Compile the model
model %>% compile(loss = "mse",
                  optimizer = "adam",
                  metrics = c("mae"))
#define early stopping: halt training when validation loss stops improving
early_stop <- callback_early_stopping(
  monitor = "val_loss",
  patience = 5,
  restore_best_weights = TRUE
)
# Train the model
history <- model %>% fit(
  X_train, y_train,
  epochs = 50,
  batch_size = 16,
  validation_split = 0.2,
  callbacks = list(early_stop)
)
## Epoch 1/50
## 51665/51665 - 207s - loss: 1881.0140 - mae: 30.9314 - val_loss: 1407.5021 - val_mae: 27.4503 - 207s/epoch - 4ms/step
## Epoch 2/50
## 51665/51665 - 203s - loss: 1373.0623 - mae: 27.2040 - val_loss: 1372.6271 - val_mae: 27.1520 - 203s/epoch - 4ms/step
## Epoch 3/50
## 51665/51665 - 199s - loss: 1354.7793 - mae: 27.0113 - val_loss: 1357.4866 - val_mae: 26.9783 - 199s/epoch - 4ms/step
## Epoch 4/50
## 51665/51665 - 203s - loss: 1347.4330 - mae: 26.9290 - val_loss: 1354.7152 - val_mae: 26.9279 - 203s/epoch - 4ms/step
## Epoch 5/50
## 51665/51665 - 194s - loss: 1340.3407 - mae: 26.8480 - val_loss: 1360.6180 - val_mae: 27.0675 - 194s/epoch - 4ms/step
## Epoch 6/50
## 51665/51665 - 200s - loss: 1335.5311 - mae: 26.7908 - val_loss: 1347.6490 - val_mae: 26.8313 - 200s/epoch - 4ms/step
## Epoch 7/50
## 51665/51665 - 181s - loss: 1333.6255 - mae: 26.7690 - val_loss: 1340.9502 - val_mae: 26.7594 - 181s/epoch - 4ms/step
## Epoch 8/50
## 51665/51665 - 204s - loss: 1332.5142 - mae: 26.7540 - val_loss: 1339.5787 - val_mae: 26.7634 - 204s/epoch - 4ms/step
## Epoch 9/50
## 51665/51665 - 131s - loss: 1330.6134 - mae: 26.7368 - val_loss: 1338.2775 - val_mae: 26.7411 - 131s/epoch - 3ms/step
## Epoch 10/50
## 51665/51665 - 1539s - loss: 1329.3612 - mae: 26.7171 - val_loss: 1352.1714 - val_mae: 26.8365 - 1539s/epoch - 30ms/step
## Epoch 11/50
## 51665/51665 - 130s - loss: 1327.8395 - mae: 26.7003 - val_loss: 1338.3467 - val_mae: 26.7630 - 130s/epoch - 3ms/step
## Epoch 12/50
## 51665/51665 - 89s - loss: 1326.8043 - mae: 26.6915 - val_loss: 1336.8727 - val_mae: 26.7471 - 89s/epoch - 2ms/step
## Epoch 13/50
## 51665/51665 - 977s - loss: 1324.9185 - mae: 26.6687 - val_loss: 1340.1594 - val_mae: 26.7304 - 977s/epoch - 19ms/step
## Epoch 14/50
## 51665/51665 - 92s - loss: 1322.2379 - mae: 26.6428 - val_loss: 1327.9929 - val_mae: 26.6395 - 92s/epoch - 2ms/step
## Epoch 15/50
## 51665/51665 - 45s - loss: 1318.8793 - mae: 26.5994 - val_loss: 1325.5049 - val_mae: 26.6110 - 45s/epoch - 875us/step
## Epoch 16/50
## 51665/51665 - 44s - loss: 1315.6921 - mae: 26.5631 - val_loss: 1324.1324 - val_mae: 26.6131 - 44s/epoch - 846us/step
## Epoch 17/50
## 51665/51665 - 44s - loss: 1313.2931 - mae: 26.5400 - val_loss: 1321.6257 - val_mae: 26.5682 - 44s/epoch - 849us/step
## Epoch 18/50
## 51665/51665 - 45s - loss: 1311.8528 - mae: 26.5231 - val_loss: 1329.7285 - val_mae: 26.6173 - 45s/epoch - 862us/step
## Epoch 19/50
## 51665/51665 - 5874s - loss: 1311.5126 - mae: 26.5182 - val_loss: 1320.6340 - val_mae: 26.5262 - 5874s/epoch - 114ms/step
## Epoch 20/50
## 51665/51665 - 48s - loss: 1310.8038 - mae: 26.5076 - val_loss: 1317.8607 - val_mae: 26.4978 - 48s/epoch - 936us/step
## Epoch 21/50
## 51665/51665 - 44s - loss: 1310.7500 - mae: 26.5073 - val_loss: 1317.4490 - val_mae: 26.4951 - 44s/epoch - 855us/step
## Epoch 22/50
## 51665/51665 - 44s - loss: 1310.4778 - mae: 26.5079 - val_loss: 1327.6125 - val_mae: 26.6148 - 44s/epoch - 843us/step
## Epoch 23/50
## 51665/51665 - 44s - loss: 1310.0178 - mae: 26.5047 - val_loss: 1322.5145 - val_mae: 26.5446 - 44s/epoch - 843us/step
## Epoch 24/50
## 51665/51665 - 44s - loss: 1310.0438 - mae: 26.5037 - val_loss: 1327.4176 - val_mae: 26.6708 - 44s/epoch - 843us/step
## Epoch 25/50
## 51665/51665 - 44s - loss: 1309.9100 - mae: 26.5019 - val_loss: 1317.3850 - val_mae: 26.5187 - 44s/epoch - 848us/step
## Epoch 26/50
## 51665/51665 - 44s - loss: 1309.9210 - mae: 26.4994 - val_loss: 1317.8644 - val_mae: 26.4880 - 44s/epoch - 858us/step
## Epoch 27/50
## 51665/51665 - 941s - loss: 1309.0642 - mae: 26.4951 - val_loss: 1316.9810 - val_mae: 26.5234 - 941s/epoch - 18ms/step
## Epoch 28/50
## 51665/51665 - 44s - loss: 1307.6307 - mae: 26.4755 - val_loss: 1314.1721 - val_mae: 26.4884 - 44s/epoch - 851us/step
## Epoch 29/50
## 51665/51665 - 44s - loss: 1304.6138 - mae: 26.4362 - val_loss: 1311.2686 - val_mae: 26.4342 - 44s/epoch - 845us/step
## Epoch 30/50
## 51665/51665 - 43s - loss: 1301.9065 - mae: 26.4020 - val_loss: 1309.2960 - val_mae: 26.3983 - 43s/epoch - 839us/step
## Epoch 31/50
## 51665/51665 - 44s - loss: 1301.0229 - mae: 26.3874 - val_loss: 1315.3098 - val_mae: 26.4624 - 44s/epoch - 852us/step
## Epoch 32/50
## 51665/51665 - 43s - loss: 1300.4795 - mae: 26.3868 - val_loss: 1316.8231 - val_mae: 26.4737 - 43s/epoch - 841us/step
## Epoch 33/50
## 51665/51665 - 44s - loss: 1299.4963 - mae: 26.3729 - val_loss: 1310.7443 - val_mae: 26.3930 - 44s/epoch - 856us/step
## Epoch 34/50
## 51665/51665 - 442s - loss: 1298.4257 - mae: 26.3625 - val_loss: 1307.4773 - val_mae: 26.4081 - 442s/epoch - 9ms/step
## Epoch 35/50
## 51665/51665 - 44s - loss: 1297.8495 - mae: 26.3520 - val_loss: 1304.9491 - val_mae: 26.3703 - 44s/epoch - 851us/step
## Epoch 36/50
## 51665/51665 - 44s - loss: 1296.9695 - mae: 26.3485 - val_loss: 1306.3848 - val_mae: 26.3793 - 44s/epoch - 842us/step
## Epoch 37/50
## 51665/51665 - 43s - loss: 1296.6132 - mae: 26.3390 - val_loss: 1306.8502 - val_mae: 26.4174 - 43s/epoch - 836us/step
## Epoch 38/50
## 51665/51665 - 44s - loss: 1296.2799 - mae: 26.3364 - val_loss: 1309.5557 - val_mae: 26.3905 - 44s/epoch - 856us/step
## Epoch 39/50
## 51665/51665 - 44s - loss: 1296.2225 - mae: 26.3359 - val_loss: 1313.7152 - val_mae: 26.4102 - 44s/epoch - 847us/step
## Epoch 40/50
## 51665/51665 - 44s - loss: 1295.6417 - mae: 26.3292 - val_loss: 1306.5245 - val_mae: 26.3719 - 44s/epoch - 843us/step
plot(history)
# Evaluate the model
score <- model %>% evaluate(X_test, y_test)
## 8073/8073 - 4s - loss: 1287.4337 - mae: 26.2446 - 4s/epoch - 459us/step
print(score)
## loss mae
## 1287.43372 26.24462
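The loss is mean squared error in squared kilograms, which is hard to read directly. Taking the square root puts the test error back into kilograms (value copied from the evaluation above):

```r
# convert the reported test MSE to RMSE in kg
test_mse  <- 1287.43372
test_rmse <- sqrt(test_mse)
round(test_rmse, 1)   # ~35.9 kg typical error, alongside the reported MAE of ~26.2 kg
```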
X_personal_matrix <- t(as.matrix(X_personal))
my_deadlift <- model %>% predict(X_personal_matrix)
## 1/1 - 0s - 65ms/epoch - 65ms/step
print(paste(round(my_deadlift,0), "kg"))
## [1] "148 kg"
print(paste(round(my_deadlift*2.20462262185,0), "lbs"))
## [1] "326 lbs"
326 lbs is no joke. I will clearly need to hit the gym a bit more if I ever want to compete, but it is kind of cool to see what is even remotely possible.
Note: This page uses data from the OpenPowerlifting project, https://www.openpowerlifting.org. You may download a copy of the data at https://data.openpowerlifting.org.