Understanding Bayesian Marketing Mix Modeling: A Deep Dive into Prior Specifications | by Slava Kisilevich

Exploring model specification with Google’s LightweightMMM

Photo by Pawel Czerwinski on Unsplash

Bayesian marketing mix modeling has been receiving more and more attention, especially with the recent releases of open-source tools like LightweightMMM (Google) or PyMC Marketing (PyMC Labs). Although these frameworks simplify the complexities of Bayesian modeling, it is still crucial for the user to understand fundamental Bayesian concepts and to be able to read the model specification.

In this article, I take Google’s LightweightMMM as a practical example and show the intuition and meaning behind the prior specifications of this framework. I demonstrate how to simulate prior samples using Python and the scipy library.

I use the data made available by Robyn under the MIT License.

The dataset consists of 208 weeks of revenue (from 2015-11-23 to 2019-11-11) with:

5 media spend channels: tv_S, ooh_S, print_S, facebook_S, search_S
2 media channels that also have exposure information (impressions, clicks): facebook_I, search_clicks_P
Organic media without spend: newsletter
Control variables: events, holidays, competitor sales (competitor_sales_B)

The specification of the LightweightMMM model is defined as follows:

LMMM Model Specification (image by the author)

This specification represents an additive linear regression model that explains the value of a response (target variable) at a specific time point t.

Let’s break down each component of the equation:

α: This component represents the intercept, or the baseline value of the response. It is the expected value of the response when all other components are zero.
trend: This component captures the increasing or decreasing trend of the response over time.
seasonality: This component represents periodic fluctuations in the response.
media_channels: This component accounts for the influence of media channels (TV, radio, online ads) on the response.
other_factors: This component encompasses any other variables that influence the response, such as weather, economic indicators or competitor activities.
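Written out, the additive structure and the component forms discussed in the rest of this article can be sketched as below. This is a reconstruction from the text rather than a copy of the specification image; y_t (the response), D (number of seasonal harmonics), N, M and Z_{t,i} (control variables) are notation assumed here.

```latex
\begin{aligned}
y_t &= \alpha + \text{trend}_t + \text{seasonality}_t + \text{media\_channels}_t + \text{other\_factors}_t \\
\text{trend}_t &= \mu\, t^{\kappa} \\
\text{seasonality}_t &= \sum_{d=1}^{D} \left( \gamma_{1,d} \cos\frac{2\pi d t}{52} + \gamma_{2,d} \sin\frac{2\pi d t}{52} \right) \\
\text{other\_factors}_t &= \sum_{i=1}^{N} \lambda_i Z_{t,i} \\
\text{media\_channels}_t &= \sum_{m=1}^{M} \beta_m \, \text{Hill}\!\left(x^{*}_{t,m} \mid K_m, S_m\right),
\qquad \text{Hill}(x \mid K, S) = \frac{1}{1 + (x / K)^{-S}} \\
x^{*}_{t,m} &= x_{t,m} + \lambda_m \, x^{*}_{t-1,m}
\end{aligned}
```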

Below, I go through each of the components in detail and explain how to interpret the prior specifications. As a reminder, a prior distribution is the distribution assumed for a parameter before any of the observed data is taken into account.

Intercept

The intercept is defined to follow a half-normal distribution with a scale of 2. A half-normal distribution is a continuous probability distribution that resembles a normal distribution folded at zero, so it is restricted to non-negative values. The distribution is characterized by a single parameter, the scale, which is the standard deviation of the underlying normal distribution (not of the half-normal itself). A half-normal prior therefore implies that the intercept can take only positive values.

The following code generates samples from the prior distribution of the intercept and visualizes the probability density function (PDF) of a half-normal distribution with a scale of 2. For visualizations of the other components, please refer to the accompanying source code in the GitHub repo.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Prior for the intercept: alpha ~ HalfNormal(scale=2)
scale = 2
halfnormal_dist = stats.halfnorm(scale=scale)
samples = halfnormal_dist.rvs(size=1000)

x = np.linspace(0, 6, 100)
plt.figure(figsize=(20, 6))
# Histogram of the prior samples, normalized to a density
sns.histplot(samples, bins=50, kde=False, stat='density', alpha=0.5)
# Analytical PDF for comparison
sns.lineplot(x=x, y=halfnormal_dist.pdf(x), color='r')
plt.title(f"Half-Normal Distribution with scale={scale}")
plt.xlabel('x')
plt.ylabel('Density')
plt.show()

Half-Normal Distribution (image by the author)

Trend

The trend is defined as a power-law relationship between time t and the trend value, trend_t = μ · t^κ. The parameter μ represents the amplitude or magnitude of the trend, while κ controls the steepness or curvature of the trend.

The parameter μ is drawn from a normal distribution with a mean of 0 and a standard deviation of 1, i.e. a standard normal distribution centered around 0. The normal distribution allows both positive and negative values of μ, representing upward and downward trends, respectively.

The parameter κ is drawn from a uniform distribution between 0.5 and 1.5. This range keeps the curvature of the trend reasonable: κ < 1 gives a decelerating (concave) trend, κ = 1 a linear trend, and κ > 1 an accelerating (convex) trend.
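The trend prior can be simulated directly with scipy. The following is a minimal sketch, assuming a time index t = 0, 1, ..., 207 for the 208 weekly observations and unscaled time; LightweightMMM’s own indexing and scaling of t may differ.

```python
import numpy as np
from scipy import stats

n_weeks = 208           # number of weekly observations in the dataset
n_draws = 5             # number of prior samples to draw
t = np.arange(n_weeks)  # assumed time index 0, 1, ..., 207
rng = np.random.default_rng(42)

# mu ~ Normal(0, 1): amplitude and direction of the trend
mu = stats.norm(loc=0, scale=1).rvs(size=n_draws, random_state=rng)
# kappa ~ Uniform(0.5, 1.5); scipy's uniform covers [loc, loc + scale]
kappa = stats.uniform(loc=0.5, scale=1.0).rvs(size=n_draws, random_state=rng)

# trend_t = mu * t ** kappa, one row per prior draw
trend = mu[:, None] * t[None, :] ** kappa[:, None]
print(trend.shape)  # (5, 208)
```

Plotting the rows of `trend` shows flattening curves for draws with κ < 1 and accelerating ones for draws with κ > 1, pointing up or down depending on the sign of μ.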

The plot below depicts separate components obtained from the prior distributions: a sample of the intercept and of the trend, each represented separately.

Trend and Intercept (image by the author)

Seasonality

Each seasonality coefficient γ is drawn from a normal distribution with a mean of 0 and a standard deviation of 1.

By combining cosine and sine functions weighted by different γ, cyclic patterns can be modeled to capture the seasonality present in the data. The cosine and sine functions represent the oscillating behavior observed over a period of 52 units (weeks).
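A minimal sketch of one seasonality prior draw, assuming two harmonics (d = 1, 2) and the same time index as the trend; the number of harmonics is an assumption of this sketch, not something stated in the article.

```python
import numpy as np
from scipy import stats

n_weeks = 208
period = 52      # weekly data, yearly cycle
n_degrees = 2    # assumed number of Fourier harmonics
t = np.arange(n_weeks)
rng = np.random.default_rng(0)

# gamma[0, d] weights the cosine and gamma[1, d] the sine of harmonic d + 1; all ~ Normal(0, 1)
gamma = stats.norm(loc=0, scale=1).rvs(size=(2, n_degrees), random_state=rng)

d = np.arange(1, n_degrees + 1)
angles = 2 * np.pi * d[:, None] * t[None, :] / period  # shape (n_degrees, n_weeks)
seasonality = (gamma[0][:, None] * np.cos(angles)
               + gamma[1][:, None] * np.sin(angles)).sum(axis=0)
```

Because every harmonic completes a whole number of cycles in 52 weeks, the sampled curve repeats exactly every year.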

The plot below illustrates a sample of the seasonality, intercept and trend obtained from the prior distributions.

Seasonality, Trend and Intercept (image by the author)

Other factors (control variables)

Other Factors Specification (image by the author)

Each factor coefficient λ is drawn from a normal distribution with a mean of 0 and a standard deviation of 1, which means that λ can take positive or negative values, representing the direction and magnitude of the influence each factor has on the outcome.
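A minimal sketch of the other-factors term, other_factors_t = Σ_i λ_i · Z_{t,i}. The control matrix `Z` below is a random, hypothetical stand-in for the scaled control variables; with the real data it would hold the four control columns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_weeks, n_factors = 208, 4  # e.g. competitor_sales_B, newsletter, holidays, events

# Hypothetical placeholder for the scaled control variables (replace with real data)
Z = rng.uniform(0, 1, size=(n_weeks, n_factors))

# lambda_i ~ Normal(0, 1): one signed coefficient per control variable
lam = stats.norm(loc=0, scale=1).rvs(size=n_factors, random_state=rng)

# Weighted sum over the factors for every week
other_factors = Z @ lam
```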

The plot below depicts separate components obtained from the prior distributions: a sample of the intercept, trend, seasonality and control variables (competitor_sales_B, newsletter, holidays and events), each represented separately.

Other factors (combined) (image by the author)

Media Channels

The prior for the β coefficient of media channel m is a half-normal distribution whose scale parameter v is set to the total cost of media channel m, i.e. the sum of its spend. The total cost reflects the investment or resources allocated to that particular media channel, so channels with more spend are allowed larger effects a priori.
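A minimal sketch of this prior, using hypothetical total (scaled) costs for the five spend channels; in practice these would be computed from the data, for example by summing each channel’s spend column. Since the mean of HalfNormal(v) is v·√(2/π), the expected effect of each channel grows in proportion to its total cost.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
channels = ["tv_S", "ooh_S", "print_S", "facebook_S", "search_S"]
# Hypothetical total (scaled) costs per channel; replace with the real column sums
total_costs = np.array([2.0, 1.0, 0.5, 1.5, 0.8])

# beta_m ~ HalfNormal(v_m), with v_m = total cost of channel m
beta_prior = stats.halfnorm(scale=total_costs)
beta_samples = beta_prior.rvs(size=(1000, len(channels)), random_state=rng)

# Analytical prior mean of each beta_m: v_m * sqrt(2 / pi)
prior_means = total_costs * np.sqrt(2 / np.pi)
for name, v, sample_mean, exact_mean in zip(channels, total_costs,
                                            beta_samples.mean(axis=0), prior_means):
    print(f"{name}: v={v:.1f}, sample mean={sample_mean:.2f}, analytical mean={exact_mean:.2f}")
```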

Media Transformations

Adstock and Hill Saturation Specification (image by the author)

In these equations, we model the behavior of the media channels using a sequence of transformations: adstock followed by Hill saturation.

The variable media_channels represents the transformed media channels at time point t. For each channel, the raw media value x is first adstocked into x* (see below), then passed through the Hill transformation and multiplied by the coefficient β. The Hill transformation is controlled by the parameters K, the half-saturation point (0 < K ≤ 1), and the shape S, which controls the steepness of the curve (S > 0).
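The Hill transformation can be written as Hill(x | K, S) = 1 / (1 + (x / K)^(−S)). A minimal NumPy sketch (the function name `hill` is illustrative) uses the algebraically equivalent form x^S / (x^S + K^S), which avoids dividing by zero at x = 0:

```python
import numpy as np

def hill(x, K, S):
    """Hill saturation with half-saturation point K and shape S."""
    x = np.asarray(x, dtype=float)
    return x**S / (x**S + K**S)

y = hill([0.0, 0.5, 1.0], K=0.5, S=2.0)  # -> 0.0, 0.5 (half saturation at x = K), 0.8
```

At x = K the output is exactly 0.5, which is why K is called the half-saturation point; a larger S makes the transition around K sharper.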

The variable x* represents the transformed media channel value at time t after the adstock transformation: x*_t = x_t + λ · x*_{t−1}. In words, it is the current raw media channel value plus the previous transformed value multiplied by the adstock decay parameter λ.
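The adstock recursion maps directly onto a loop. A minimal sketch (the name `geometric_adstock` is illustrative), assuming x*_0 = x_0; it omits any normalization that LightweightMMM may apply to the adstocked series.

```python
import numpy as np

def geometric_adstock(x, lam):
    """Carryover x*_t = x_t + lam * x*_{t-1}, starting from x*_0 = x_0."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    carry = 0.0
    for t, value in enumerate(x):
        carry = value + lam * carry
        out[t] = carry
    return out

# A single unit of spend in week 0 decays by a factor lam each following week
decayed = geometric_adstock([1.0, 0.0, 0.0], lam=0.5)  # -> [1.0, 0.5, 0.25]
```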

The parameters K and S follow gamma distributions with shape and scale parameters both set to 1 (equivalently, an exponential distribution with rate 1), while λ follows a beta distribution with shape parameters 2 and 1.

The probability density function of the Hill saturation parameters K and S, which share the same prior, is illustrated in the plot below:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Prior for the Hill parameters: K_m, S_m ~ Gamma(shape=1, scale=1)
shape = 1
scale = 1
gamma_dist = stats.gamma(a=shape, scale=scale)
samples = gamma_dist.rvs(size=1000)

x = np.linspace(0, 6, 100)
plt.figure(figsize=(20, 6))
sns.histplot(samples, bins=50, kde=False, stat='density', alpha=0.5)
sns.lineplot(x=x, y=gamma_dist.pdf(x), color='r')
plt.title(f"Gamma Distribution for $K_m$ and $S_m$ with shape={shape} and scale={scale}")
plt.xlabel('x')
plt.ylabel('Density')
# Show the plot
plt.show()

Gamma distribution (image by the author)

The probability density function of the adstock parameter λ is shown in the plot below:

Beta distribution (image by the author)

A Note on the specification of the adstock parameter λ:

The probability density function of the Beta(α = 2, β = 1) distribution increases linearly (f(λ) = 2λ), so higher values of λ have a higher probability density. In media analysis, different industries and media activities may exhibit different decay rates, with most media channels typically exhibiting small decay rates. For instance, Robyn suggests the following ranges of λ decay for common media channels: TV (0.3–0.8), OOH/Print/Radio (0.1–0.4), and digital (0–0.3).

Under the Beta(α = 2, β = 1) distribution, higher probabilities are assigned to λ values closer to 1 and lower probabilities to values closer to 0. Consequently, values near the upper end of the interval [0, 1] are more likely a priori than values near the lower end, which favors stronger carryover than the ranges Robyn suggests for most channels.
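The mismatch can be quantified. For Beta(2, 1) the CDF is simply λ², so the prior mass inside each of Robyn’s ranges is easy to compute; the sketch below does it with scipy:

```python
from scipy import stats

decay_prior = stats.beta(a=2, b=1)  # adstock decay prior: lambda ~ Beta(2, 1)

# Robyn's suggested decay ranges quoted above
robyn_ranges = {"TV": (0.3, 0.8), "OOH/Print/Radio": (0.1, 0.4), "digital": (0.0, 0.3)}

probs = {
    channel: decay_prior.cdf(high) - decay_prior.cdf(low)
    for channel, (low, high) in robyn_ranges.items()
}
for channel, p in probs.items():
    print(f"{channel}: {p:.2f}")
# TV: 0.55
# OOH/Print/Radio: 0.15
# digital: 0.09
```

By the same formula, P(λ > 0.5) = 1 − 0.5² = 0.75 under this prior.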

On the other hand, in Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects, the decay parameter is defined as Beta(α = 3, β = 3). This distribution is symmetric around 0.5: values near 0 and values near 1 are equally (and comparatively) unlikely, while most of the probability mass lies near the center of the interval [0, 1].
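Evaluating both densities at a few points makes the difference concrete. Beta(3, 3) has density 30·λ²(1 − λ)², so it peaks at 0.5 and is low and symmetric at both ends, while Beta(2, 1) has density 2λ:

```python
import numpy as np
from scipy import stats

grid = np.array([0.1, 0.5, 0.9])
pdf_lmmm = stats.beta(a=2, b=1).pdf(grid)   # LightweightMMM decay prior -> 0.2, 1.0, 1.8
pdf_paper = stats.beta(a=3, b=3).pdf(grid)  # Beta(3, 3) from the paper  -> 0.243, 1.875, 0.243
```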

The plot below depicts separate components obtained from the prior distributions: a sample of the intercept, trend, seasonality, control variables and media channels, each represented separately.

All model components (image by the author)

Combining all components

As mentioned earlier, LightweightMMM is an additive linear regression: the intercept, trend, seasonality, media channels and other factors, each sampled from its prior distribution, are summed to obtain the predicted response. The plot below visualizes the true response and a predicted response sampled from the prior predictive distribution.
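To make the combination concrete, here is a minimal, self-contained sketch that draws one response series from the priors described above. It is not LightweightMMM’s implementation: it assumes scaled inputs, a time index starting at 0, two seasonal harmonics, v_m equal to the column sum of each channel’s spend, and an un-normalized geometric adstock; the function names are hypothetical, and any observation noise term of the full model is left out.

```python
import numpy as np
from scipy import stats


def hill(x, K, S):
    # Hill saturation 1 / (1 + (x / K) ** -S), written in a form that is safe at x = 0
    return x**S / (x**S + K**S)


def adstock(x, lam):
    # Geometric carryover: x*_t = x_t + lam * x*_{t-1}
    out = np.empty_like(x, dtype=float)
    carry = 0.0
    for t, value in enumerate(x):
        carry = value + lam * carry
        out[t] = carry
    return out


def sample_prior_response(media, controls, period=52, n_degrees=2, rng=None):
    """Draw one response series from the priors.

    media: (n_weeks, n_channels) scaled media spend; controls: (n_weeks, n_factors).
    """
    rng = np.random.default_rng(rng)
    n_weeks, n_channels = media.shape
    t = np.arange(n_weeks)

    alpha = stats.halfnorm(scale=2).rvs(random_state=rng)           # intercept
    mu = stats.norm(0, 1).rvs(random_state=rng)                      # trend amplitude
    kappa = stats.uniform(0.5, 1.0).rvs(random_state=rng)            # trend exponent
    trend = mu * t**kappa

    gamma = stats.norm(0, 1).rvs(size=(2, n_degrees), random_state=rng)
    angles = 2 * np.pi * np.arange(1, n_degrees + 1)[:, None] * t / period
    seasonality = (gamma[0][:, None] * np.cos(angles)
                   + gamma[1][:, None] * np.sin(angles)).sum(axis=0)

    lam_factors = stats.norm(0, 1).rvs(size=controls.shape[1], random_state=rng)
    other_factors = controls @ lam_factors

    beta = stats.halfnorm(scale=media.sum(axis=0)).rvs(random_state=rng)
    K = stats.gamma(a=1, scale=1).rvs(size=n_channels, random_state=rng)
    S = stats.gamma(a=1, scale=1).rvs(size=n_channels, random_state=rng)
    lam_decay = stats.beta(2, 1).rvs(size=n_channels, random_state=rng)
    media_channels = sum(
        beta[m] * hill(adstock(media[:, m], lam_decay[m]), K[m], S[m])
        for m in range(n_channels)
    )

    return alpha + trend + seasonality + media_channels + other_factors
```

Passing an integer seed as `rng` makes a draw reproducible, which is convenient when comparing a single prior sample against the observed revenue.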

Visualizing a single sample against the true response lets us examine how the model’s prediction compares to the actual outcome for one specific set of parameter values. It gives an intuitive sense of how the model behaves in that particular instance.

Revenue: True vs. Prior (image by the author)

Prior predictive check

To get more robust insights, it is generally recommended to sample many times from the prior predictive distribution and measure the uncertainty. The prior predictive check helps assess the adequacy of the chosen model and evaluate whether its predictions align with our expectations before the model is fitted to any actual data.

The plot below visualizes the prior predictive distribution by showing the predicted revenue (mean) at each time point, together with measures of uncertainty. We can see that the true revenue falls within one standard deviation of the mean, indicating that the prior specification generates plausible values for the observed data.
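Summarizing many prior predictive draws can be sketched as follows (the helper name is hypothetical). It expects an array of draws with shape (n_draws, n_weeks), for example from repeatedly sampling the prior predictive distribution, and reports the per-week mean, the standard deviation, and the fraction of weeks in which the true value lies within one standard deviation of the mean:

```python
import numpy as np

def prior_predictive_summary(draws, y_true):
    """Per-week mean and std of prior predictive draws, plus coverage of y_true."""
    draws = np.asarray(draws, dtype=float)   # shape (n_draws, n_weeks)
    y_true = np.asarray(y_true, dtype=float)  # shape (n_weeks,)
    mean = draws.mean(axis=0)
    std = draws.std(axis=0)
    inside = np.abs(y_true - mean) <= std     # week-by-week check against the band
    return mean, std, inside.mean()

# Toy example: two draws over two weeks
mean, std, coverage = prior_predictive_summary([[1.0, 2.0], [3.0, 4.0]], [2.0, 10.0])
# mean -> [2.0, 3.0], std -> [1.0, 1.0], coverage -> 0.5
```

The one-standard-deviation band mirrors the description of the plot; a wider interval (for example, empirical quantiles of the draws) is an equally reasonable choice.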

Prior predictive check (image by the author)

Bayesian marketing mix modeling can take considerable time to master. I hope this article helped you deepen your understanding of prior distributions and Bayesian marketing mix model specifications.

The complete code can be downloaded from my GitHub repo.

Thanks for reading!