
# College Football Conference Realignment — Regression | by Giovanni Malloy | Aug, 2023

Welcome to part 2 of my series on conference realignment! Last summer, when conference realignment was in full swing, Tony Altimore published a study on Twitter that inspired me to do my own conference realignment analysis. This series is organized into four parts (the full motivation for it is found in part 1):

1. College Football Conference Realignment — Exploratory Data Analysis in Python
2. College Football Conference Realignment — Regression
3. College Football Conference Realignment — Clustering
4. College Football Conference Realignment — node2vec

Hopefully, each part of the series provides you with a fresh perspective on the future of the beloved game of college football. For those of you who did not read part 1, a quick synopsis is that I created my own data set compiled from sources across the web. These data include basic information about each FBS program, a non-canonical approximation of all college football rivalries, stadium size, historical performance, frequency of appearances in AP top 25 polls, whether the school is an AAU or R1 institution (historically important for membership in the Big Ten and Pac 12), the number of NFL draft picks, data on program revenue from 2017–2019, and a recent estimate of the size of college football fan bases. As it turns out, stadium capacity, 2019 revenue, and historical AP poll success correlate strongly with the estimated fan base size in Tony Altimore's analysis.

Supervised Learning

So, this got me thinking: can we create a simple regression model to estimate fan base size?

Broadly, we can divide machine learning into supervised and unsupervised learning. In supervised learning, the goal is to predict a pre-defined discrete class or continuous variable. In unsupervised learning, the goal is to discover trends in the data that are non-obvious. Regression is a type of supervised learning where the prediction target is a continuous variable. An excellent reference guide and resource was put together by Shervine and Afshine Amidi. (It has been translated into…