March Madness 2026 Predictor

RStudio • Linear Regression • XGBoost • Random Forest

Overview

The NCAA Tournament is one of the most unpredictable events in sports. As the head of the Men's Basketball Sports Analytics Club at Notre Dame, I wanted to tackle that unpredictability with machine learning and analytics. My team and I treated the project as a research and testing opportunity: we tried three machine learning models, a linear model (LM), XGBoost (XGB), and a random forest (RF), and reported on the results. We used a three-step process to clean the data, develop the models, and report our findings.

1. Data Cleaning

We pulled the last 10 years of college basketball data from hoopR and filtered it down to 19 metrics for training the models. The final data set expresses each metric as a differential between the two teams in a game, which tells the models how the teams match up with each other.
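
The differential idea can be sketched in a few lines. The project itself was built in RStudio, so the Python below is only an illustration; the team names, metric names, and numbers are invented placeholders, not the 19 metrics used in the project.

```python
# Hypothetical season averages for two teams (invented values, for illustration only).
TEAM_STATS = {
    "Team A": {"off_rating": 118.0, "def_rating": 95.0, "turnover_pct": 15.0},
    "Team B": {"off_rating": 110.0, "def_rating": 99.0, "turnover_pct": 17.5},
}


def stat_differentials(team, opponent, stats=TEAM_STATS):
    """Return team-minus-opponent differences for every metric both teams have."""
    team_stats = stats[team]
    opp_stats = stats[opponent]
    return {
        f"{metric}_diff": team_stats[metric] - opp_stats[metric]
        for metric in team_stats
        if metric in opp_stats
    }


# One training row describing the matchup from Team A's point of view.
row = stat_differentials("Team A", "Team B")
print(row)
```

Swapping the two teams flips the sign of every differential, so each game can be described from either side.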

2. Model Testing

We trained Linear Regression, Random Forest, and XGBoost models on the cleaned data, then tested them on the 2025 NCAA Tournament games. Games from the last 3 years and tournament games were weighted more heavily during training.
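
A minimal sketch of how that weighting could work, in Python rather than the R used for the project. The multipliers (2.0 for recent seasons, 2.0 for tournament games), the `latest_season` default, and the single-feature weighted least-squares fit are all assumptions made for illustration; the actual weights and model code are not given here.

```python
def sample_weight(season, is_tournament, latest_season=2024,
                  recent_bonus=2.0, tourney_bonus=2.0):
    """Assumed scheme: boost games from the 3 most recent seasons and tournament games."""
    weight = 1.0
    if latest_season - season < 3:
        weight *= recent_bonus
    if is_tournament:
        weight *= tourney_bonus
    return weight


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for one feature: returns (slope, intercept)."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Invented games: (season, is_tournament, rating_diff, actual_margin)
games = [
    (2016, False, 4.0, 6.0),
    (2019, True, -3.0, -1.0),
    (2023, False, 8.0, 12.0),
    (2024, True, 2.0, 5.0),
]
weights = [sample_weight(season, tourney) for season, tourney, _, _ in games]
slope, intercept = weighted_linear_fit(
    [g[2] for g in games], [g[3] for g in games], weights
)
print(weights, slope, intercept)
```

A recent tournament game ends up with four times the influence of an older regular-season game under these assumed multipliers. Many model libraries accept per-row weights in a similar way.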

3. Performance Evaluation

Each model's bracket was entered into NCAA's Bracket Challenge, which scored the brackets as the tournament went on. The models were also used to predict the margin of all 64 games, and those predictions were compared with the actual results.
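
One way to compare predicted margins with real outcomes is sketched below in Python. The two metrics (mean absolute error and the share of games where the predicted winner was correct) are assumptions chosen for illustration, not necessarily the ones used in the project, and the margins are invented.

```python
def evaluate_margins(predicted, actual):
    """Compare predicted and actual margins (positive means the first team won)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and the same length")
    n = len(predicted)
    # Average size of the miss, in points.
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
    # A pick is correct when the predicted and actual margins share a sign.
    correct = sum(1 for p, a in zip(predicted, actual) if p * a > 0)
    return {"mae": mae, "winner_accuracy": correct / n}


# Invented example: four games, margins in points.
predicted = [5.5, -2.0, 10.0, 1.5]
actual = [3, 4, 12, -1]
result = evaluate_margins(predicted, actual)
print(result)
```

Margin error and pick accuracy measure different things: a model can call the right winner while missing the margin badly, so it helps to look at both.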

The four brackets (one per model, plus an ensemble) and the documentation explaining the project and research in more detail are listed below.

Brackets

Linear Model Bracket
Gradient-Boosted Bracket
Random Forest Bracket
Ensemble Bracket

Documents

March Madness 2026 Project Documentation

Google Doc • 12 Pages
