Dynamic Pricing with Multi-Armed Bandit: Studying by Doing | by Massimiliano Costacurta

Making use of Reinforcement Studying methods to real-world use instances, particularly in dynamic pricing, can reveal many surprises

Within the huge world of decision-making issues, one dilemma is especially owned by Reinforcement Studying methods: exploration versus exploitation. Think about strolling right into a on line casino with rows of slot machines (often known as “one-armed bandits”) the place every machine pays out a distinct, unknown reward. Do you discover and play every machine to find which one has the best payout, or do you stick to 1 machine, hoping it’s the jackpot? This metaphorical situation underpins the idea of the Multi-armed Bandit (MAB) downside. The target is to discover a technique that maximizes the rewards over a sequence of performs. Whereas exploration gives new insights, exploitation leverages the data you already possess.

Now, transpose this precept to dynamic pricing in a retail situation. Suppose you’re an e-commerce retailer proprietor with a brand new product. You aren’t sure about its optimum promoting value. How do you set a value that maximizes your income? Do you have to discover completely different costs to grasp buyer willingness to pay, or must you exploit a value that has been performing effectively traditionally? Dynamic pricing is actually a MAB downside in disguise. At every time step, each candidate value level could be seen as an “arm” of a slot machine and the income generated from that value is its “reward.” One other approach to see that is that the target of dynamic pricing is to swiftly and precisely measure how a buyer base’s demand reacts to various value factors. In less complicated phrases, the purpose is to pinpoint the demand curve that finest mirrors buyer conduct.

On this article, we’ll discover 4 Multi-armed Bandit algorithms to judge their efficacy towards a well-defined (although not simple) demand curve. We’ll then dissect the first strengths and limitations of every algorithm and delve into the important thing metrics which can be instrumental in gauging their efficiency.

Historically, demand curves in economics describe the connection between the value of a product and the amount of the product that customers are prepared to purchase. They often slope downwards, representing the widespread remark that as value rises, demand sometimes falls, and vice-versa. Consider widespread merchandise akin to smartphones or live performance tickets. If costs are lowered, extra individuals have a tendency to purchase, but when costs skyrocket, even the ardent followers would possibly assume twice.

But in our context, we’ll mannequin the demand curve barely in another way: we’re placing value towards chance. Why? As a result of in dynamic pricing eventualities, particularly digital items or companies, it’s typically extra significant to assume by way of the chance of a sale at a given value than to take a position on precise portions. In such environments, every pricing try could be seen as an exploration of the chance of success (or buy), which could be simply modeled as a Bernoulli random variable with a chance p relying on a given take a look at value.

Right here’s the place it will get significantly fascinating: whereas intuitively one would possibly assume the duty of our Multi-armed Bandit algorithms is to unearth that very best value the place the chance of buy is highest, it’s not fairly so simple. In reality, our final objective is to maximise the income (or the margin). This implies we’re not trying to find the value that will get the most individuals to click on ‘purchase’ — we’re trying to find the value that, when multiplied by its related buy chance, offers the best anticipated return. Think about setting a excessive value that fewer individuals purchase, however every sale generates important income. On the flip facet, a really low value would possibly entice extra consumers, however the complete income would possibly nonetheless be decrease than the excessive value situation. So, in our context, speaking concerning the ‘demand curve’ is considerably unconventional, as our goal curve will primarily characterize the chance of buy reasonably than the demand immediately.

Now, attending to the maths, let’s begin by saying that shopper conduct, particularly when coping with value sensitivity, isn’t all the time linear. A linear mannequin would possibly recommend that for each incremental improve in value, there’s a continuing decrement in demand. In actuality, this relationship is commonly extra complicated and nonlinear. One approach to mannequin this conduct is through the use of logistic features, which might seize this nuanced relationship extra successfully. Our chosen mannequin for the demand curve is then:

Right here, a denotes the utmost achievable chance of buy, whereas b modulates the sensitivity of the demand curve towards value modifications. The next worth of b means a steeper curve, approaching extra quickly to decrease buy possibilities as the value will increase.

4 examples of demand curves with completely different mixtures of parameters a and b

For any given value level, we’ll be then capable of get hold of an related buy chance, p. We are able to then enter p right into a Bernoulli random variable generator to simulate the response of a buyer to a selected value proposal. In different phrases, given a value, we are able to simply emulate our reward operate.

Subsequent, we are able to multiply this operate by the value with a purpose to get the anticipated income for a given value level:

Unsurprisingly, this operate doesn’t attain its most in correspondence with the best chance. Additionally, the value related to the utmost doesn’t rely upon the worth of the parameter a, whereas the utmost anticipated return does.

Anticipated income curves with associated maxima

With some recollection from calculus, we are able to additionally derive the formulation for the spinoff (you’ll want to make use of a mix of each the product and the chain rule). It’s not precisely a soothing train, nevertheless it’s nothing too difficult. Right here is the analytical expression of the spinoff of the anticipated income:

This spinoff permits us to seek out the precise value that maximizes our anticipated income curve. In different phrases, through the use of this particular formulation in tandem with some numerical algorithms, we are able to simply decide the value that units it to 0. This, in flip, is the value that maximizes the anticipated income.

And that is precisely what we’d like, since by fixing the values of a and b, we’ll instantly know the goal value that our bandits must discover. Coding this in Python is a matter of some traces of code:

Dynamic Pricing with Multi-Armed Bandit: Studying by Doing | by Massimiliano Costacurta | Aug, 2023

Making use of Reinforcement Studying methods to real-world use instances, particularly in dynamic pricing, can reveal many surprises

New Technology Revolutionizes Insect Research

Open Source AI Has Founders—and the FTC—Buzzing

You Don't Understand AI Until You Watch THIS

Think Deepfakes Aren’t a Risk? Check Out This AI Video of Biden Flinging Slurs at His Enemies

Leak Shows That Google-Funded AI Video Generator Runway Was Trained on Stolen YouTube Content, Pirated Films

Study Finds That AI Is Adding to Employees’ Workload and Burning Them Out

New Technology Revolutionizes Insect Research

Open Source AI Has Founders—and the FTC—Buzzing

Think Deepfakes Aren’t a Risk? Check Out This AI Video of Biden Flinging Slurs at His Enemies

Leak Shows That Google-Funded AI Video Generator Runway Was Trained on Stolen YouTube Content, Pirated Films

Study Finds That AI Is Adding to Employees’ Workload and Burning Them Out

When AI Is Trained With AI-Generated Data, It Starts Spouting Gibberish

Bind AI Copilot (www.getbind.co)

Forensic Analysis Finds Overwhelming Similarities Between OpenAI’s Voice and Scarlett Johansson

WriteText.ai for WooCommerce (writetext.ai)

World’s Largest Radiology AI Marketplace CARPL Raises $6 Million to Accelerate the Adoption of AI in Clinical Workflows

Google for Startups Accelerator: AI First MENA-T

Effectively Serving Open Supply LLMs | by Ryan Shrott | Aug, 2023

LLaMA in R with Keras and TensorFlow

Making use of Reinforcement Studying methods to real-world use instances, particularly in dynamic pricing, can reveal many surprises

Log In

With social network:

Or with username:

Sign In

Forgot password?

Your password reset link appears to be invalid or expired.

Log in

Privacy Policy

Add to Collection

No Collections