We’re exploring the use of LLMs to address these challenges. Our large language models, such as GPT-4, can understand and generate natural language, which makes them applicable to content moderation: the models can make moderation judgments based on policy guidelines provided to them.
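As a concrete illustration, here is a minimal sketch of such a policy-driven judgment using the OpenAI Python SDK. The `POLICY` text, the label names (K0/K3), and the `moderate` helper are hypothetical stand-ins for illustration, not the prompts actually used.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical policy snippet; a real guideline would be far more detailed.
POLICY = """\
Label K3: content that gives instructions for illicit behavior.
Label K0: everything else (non-violating).
"""

def moderate(content: str, policy: str = POLICY) -> str:
    """Ask GPT-4 for a moderation judgment based only on the policy text."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic output simplifies label comparison
        messages=[
            {"role": "system",
             "content": f"You are a content moderator. Apply this policy:\n{policy}\n"
                        "Answer with the label only."},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content.strip()

print(moderate("How do I hotwire a car?"))  # hypothetical output: "K3"
```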
With this system, the process of developing and customizing content policies is trimmed down from months to hours.
1. Once a policy guideline is written, policy experts create a golden set of data by identifying a small number of examples and assigning them labels according to the policy.
2. GPT-4 then reads the policy and assigns labels to the same dataset, without seeing the experts' answers.
3. By examining the discrepancies between GPT-4's judgments and those of a human, the policy experts can ask GPT-4 to explain the reasoning behind its labels, analyze the ambiguity in policy definitions, resolve the confusion, and clarify the policy accordingly. Steps 2 and 3 are repeated until the policy quality is satisfactory (a minimal sketch of this loop follows the list).
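The loop in steps 2 and 3 might look like the sketch below. The golden-set format, the `gpt4_label` and `find_discrepancies` helpers, and the prompts are all illustrative assumptions, not the actual tooling.

```python
from openai import OpenAI

client = OpenAI()

def gpt4_label(policy: str, content: str, explain: bool = False) -> str:
    """One GPT-4 call: a bare label, or the reasoning behind the label."""
    instruction = ("Explain which policy clause applies, then give the label."
                   if explain else "Answer with the policy label only.")
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system",
             "content": f"You are a content moderator. Apply this policy:\n"
                        f"{policy}\n{instruction}"},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content.strip()

def find_discrepancies(policy: str,
                       golden_set: list[tuple[str, str]]) -> list[dict]:
    """Step 2: label the golden set blind. Step 3: collect disagreements,
    with GPT-4's reasoning, for the policy experts to review."""
    disagreements = []
    for content, expert_label in golden_set:
        model_label = gpt4_label(policy, content)
        if model_label != expert_label:
            disagreements.append({
                "content": content,
                "expert": expert_label,
                "model": model_label,
                # The model's reasoning points at ambiguous policy language.
                "reasoning": gpt4_label(policy, content, explain=True),
            })
    return disagreements  # experts clarify the policy, then rerun the loop
```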
This iterative process yields refined content policies that are translated into classifiers, enabling the policy and content moderation to be deployed at scale.
Optionally, to handle large amounts of data at scale, we can use GPT-4’s predictions to fine-tune a much smaller model.
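One way to implement this distillation step, sketched below, is to write GPT-4's (content, label) pairs into a chat-format JSONL file and fine-tune through the OpenAI fine-tuning API. The `distill` helper and the choice of `gpt-3.5-turbo` as the smaller model are assumptions; the post does not specify the model or training method.

```python
import json
from openai import OpenAI

client = OpenAI()

def distill(gpt4_labeled: list[tuple[str, str]],
            path: str = "labels.jsonl") -> str:
    """Fine-tune a smaller model on (content, GPT-4 label) pairs.

    'gpt4_labeled' would come from running the labeling helper above
    over a large unlabeled corpus.
    """
    with open(path, "w") as f:
        for content, label in gpt4_labeled:
            f.write(json.dumps({"messages": [
                {"role": "system",
                 "content": "Label content under the moderation policy."},
                {"role": "user", "content": content},
                {"role": "assistant", "content": label},
            ]}) + "\n")
    training_file = client.files.create(file=open(path, "rb"),
                                        purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",  # stand-in for "a much smaller model"
    )
    return job.id
```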