In Part I and Part II of this series, we showed how random effects can be used for modeling high-cardinality categorical variables in machine learning models, and we gave an introduction to the GPBoost library, which implements the GPBoost algorithm combining tree-boosting with random effects. In this article, we demonstrate how the Python and R packages of the GPBoost library can be used for longitudinal data (aka repeated measures or panel data). You might want to first read Part II of this series since it gives a basic introduction to the GPBoost library. GPBoost version 1.2.1 is used in this demo.
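To reproduce the results below with the same version, you can pin the package when installing it (assuming the package name `gpboost` on PyPI, as documented by the library):

```shell
pip install gpboost==1.2.1
```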
Table of contents
∘ 1 Data: description, loading, and sample split
∘ 2 Modeling options for longitudinal data in GPBoost
· · 2.1 Subject grouped random effects
· · 2.2 Fixed effects only
· · 2.3 Subject and time grouped random effects
· · 2.4 Subject random effects with temporal random slopes
· · 2.5 Subject-specific AR(1) / Gaussian process models
· · 2.6 Subject grouped random effects and a joint AR(1) model
∘ 3 Training a GPBoost model
∘ 4 Choosing tuning parameters
∘ 5 Prediction
∘ 6 Conclusion and references
The data used in this demo is the wages data which was already used in Part II. It can be downloaded from here. The data set contains a total of 28’013 samples for 4’711 persons for whom data was measured over several years. Such data is called longitudinal data, or panel data, since for every subject (person ID = idcode), data was collected repeatedly over time (years = t). In other words, the samples for every level of the categorical variable idcode are repeated measurements over time. The response variable is the logarithmic real wage (ln_wage), and the data includes several predictor variables such as age, total work…
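To make the longitudinal structure concrete, here is a small sketch with a synthetic stand-in for the wages data (the values and the two- or three-observation subjects are made up; only the column names idcode, t, and ln_wage come from the article). It shows what "repeated measurements per level of idcode" means in tabular form:

```python
import pandas as pd

# Hypothetical mini version of the wages data: every subject (idcode)
# is observed in several years (t), i.e., rows repeat per subject.
df = pd.DataFrame({
    "idcode": [1, 1, 1, 2, 2, 3],
    "t": [70, 71, 72, 70, 71, 70],
    "ln_wage": [1.45, 1.52, 1.60, 1.30, 1.35, 1.80],
    "age": [18, 19, 20, 22, 23, 25],
})

# Number of repeated measurements per subject
n_obs_per_subject = df.groupby("idcode").size()
print(n_obs_per_subject.to_dict())  # → {1: 3, 2: 2, 3: 1}
```

In the real data set, this count would be taken over 4’711 subjects and 28’013 rows in total.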