Scaling laws for reward model overoptimization

In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because the reward model is an imperfect proxy, optimizing its value too much can hinder ground truth performance, in accordance with Goodhart’s law. This effect has been frequently observed, but not carefully measured due to the expense of collecting human preference data. In this work, we use a synthetic setup in which a fixed “gold-standard” reward model plays the role of humans, providing labels used to train a proxy reward model. We study how the gold reward model score changes as we optimize against the proxy reward model using either reinforcement learning or best-of-n sampling. We find that this relationship follows a different functional form depending on the method of optimization, and that in both cases its coefficients scale smoothly with the number of reward model parameters. We also study the effect on this relationship of the size of the reward model dataset, the number of reward model and policy parameters, and the coefficient of the KL penalty added to the reward in the reinforcement learning setup. We explore the implications of these empirical results for theoretical considerations in AI alignment.
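The best-of-n half of this setup can be illustrated with a toy simulation. It is a minimal sketch under explicit assumptions and is not the paper's experimental code. The "policy" here is a standard normal over one-dimensional outputs. The hypothetical `gold_reward` and `proxy_reward` are hand-written functions standing in for learned reward models. The proxy agrees with the gold reward near typical outputs but keeps rewarding larger outputs without limit. Best-of-n draws n samples and keeps the one with the highest proxy score. Its KL divergence from the base policy is given by the commonly used analytic expression log n − (n − 1)/n, which assumes continuous scores with no ties. Under these assumptions, the average gold score first rises with n and then falls, which is the overoptimization pattern described above.

```python
import math
import random
from statistics import mean


def gold_reward(x: float) -> float:
    """Hypothetical 'gold-standard' reward: improves with x up to a point, then declines."""
    return x - 0.3 * x * x


def proxy_reward(x: float) -> float:
    """Hypothetical proxy reward: tracks the gold reward near x = 0 but keeps
    rewarding larger x forever, so extreme samples exploit it."""
    return x


def best_of_n(n: int, rng: random.Random) -> float:
    """Draw n samples from the base policy (standard normal here) and return
    the sample the proxy reward scores highest."""
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return max(samples, key=proxy_reward)


def bon_kl(n: int) -> float:
    """Analytic KL(best-of-n || base policy) in nats, assuming no ties in the proxy score."""
    return math.log(n) - (n - 1) / n


def gold_score_curve(ns, trials: int = 500, seed: int = 0) -> dict:
    """Average gold reward of the best-of-n (by proxy) sample, for each n in ns."""
    rng = random.Random(seed)
    return {
        n: mean(gold_reward(best_of_n(n, rng)) for _ in range(trials))
        for n in ns
    }


if __name__ == "__main__":
    ns = [1, 4, 16, 64, 256, 1024]
    curve = gold_score_curve(ns)
    for n in ns:
        # Proxy optimization pressure grows with n (measured by KL), but the gold score peaks and then drops.
        print(f"n={n:5d}  KL={bon_kl(n):.3f} nats  mean gold={curve[n]:+.3f}")
```

In this toy, the gold score peaks at moderate n (roughly n ≈ 16) and declines for larger n. This happens because the proxy pushes samples into a region the gold reward penalizes. The specific reward functions, sample distribution, and trial counts are illustrative choices only.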
