Flag harmful content using Amazon Comprehend toxicity detection

Online communities are driving user engagement across industries like gaming, social media, ecommerce, dating, and e-learning. Members of these online communities trust platform owners to provide a safe and inclusive environment where they can freely consume content and contribute. Content moderators are often employed to review user-generated content and check that it’s safe and compliant with your terms of use. However, the ever-increasing scale, complexity, and variety of inappropriate content makes human moderation workflows unscalable and expensive. The result is poor, harmful, and non-inclusive communities that disengage users and negatively impact the community and business.

Along with user-generated content, machine-generated content has brought a fresh challenge to content moderation. It automatically creates highly realistic content that may be inappropriate or harmful at scale. The industry is facing the new challenge of automatically moderating content generated by AI to protect users from harmful material.

In this post, we introduce toxicity detection, a new feature from Amazon Comprehend that helps you automatically detect harmful content in user- or machine-generated text. This includes plain text, text extracted from images, and text transcribed from audio or video content.

Detect toxicity in text content with Amazon Comprehend

Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning (ML) to uncover valuable insights and connections in text. It offers a range of ML models that can be either pre-trained or customized through API interfaces. Amazon Comprehend now provides a straightforward, NLP-based solution for toxic content detection in text.

The Amazon Comprehend Toxicity Detection API assigns an overall toxicity score to text content, ranging from 0–1, indicating the likelihood of it being toxic. It also categorizes text into the following seven categories and provides a confidence score for each:

HATE_SPEECH – Speech that criticizes, insults, denounces, or dehumanizes a person or a group on the basis of an identity, be it race, ethnicity, gender identity, religion, sexual orientation, ability, national origin, or another identity group.
GRAPHIC – Speech that uses visually descriptive, detailed, and unpleasantly vivid imagery. Such language is often made verbose so as to amplify an insult, or discomfort or harm to the recipient.
HARASSMENT_OR_ABUSE – Speech that imposes disruptive power dynamics between the speaker and hearer (regardless of intent), seeks to affect the psychological well-being of the recipient, or objectifies a person.
SEXUAL – Speech that indicates sexual interest, activity, or arousal by using direct or indirect references to body parts, physical traits, or sex.
VIOLENCE_OR_THREAT – Speech that includes threats that seek to inflict pain, injury, or hostility towards a person or group.
INSULT – Speech that includes demeaning, humiliating, mocking, insulting, or belittling language.
PROFANITY – Speech that contains words, phrases, or acronyms that are impolite, vulgar, or offensive.

You can access the Toxicity Detection API by calling it directly using the AWS Command Line Interface (AWS CLI) and AWS SDKs. Toxicity detection in Amazon Comprehend is currently supported in the English language.

Use cases

Text moderation plays a crucial role in managing user-generated content across diverse formats, including social media posts, online chat messages, forum discussions, website comments, and more. Moreover, platforms that accept video and audio content can use this feature to moderate transcribed audio content.

The emergence of generative AI and large language models (LLMs) represents the latest trend in the field of AI. Consequently, there is a growing need for responsive solutions to moderate content generated by LLMs. The Amazon Comprehend Toxicity Detection API is ideally suited for addressing this need.

Amazon Comprehend Toxicity Detection API request

You can send up to 10 text segments to the Toxicity Detection API, each with a size limit of 1 KB. Every text segment in the request is handled independently. In the following example, we generate a JSON file named toxicity_api_input.json containing the text content, including three sample text segments for moderation. Note that in the example, the profane words are masked as XXXX.

{
  "TextSegments": [     
    {"Text": "and go through the door go through the door he's on the right"},
    {"Text": "he's on the right XXXXX him"},
    {"Text": "what the XXXX are you doing man that's why i didn't want to play"}
  ],
  "LanguageCode": "en"
}

You can use the AWS CLI to invoke the Toxicity Detection API using the preceding JSON file containing the text content:

aws comprehend detect-toxic-content --cli-input-json file://toxicity_api_input.json

Amazon Comprehend Toxicity Detection API response

The Toxicity Detection API response JSON output will include the toxicity analysis result in the ResultList field. ResultList lists the text segment items, and the sequence represents the order in which the text sequences were received in the API request. Toxicity represents the overall confidence score of detection (between 0–1). Labels includes a list of toxicity labels with confidence scores, categorized by the type of toxicity.

The following code shows the JSON response from the Toxicity Detection API based on the request example in the previous section:

{
    "ResultList": [
        {
            "Toxicity": 0.009200000204145908,
            "Labels": [
                { "Name": "PROFANITY", "Score": 0.0007999999797903001},
                { "Name": "HATE_SPEECH", "Score": 0.0017999999690800905},
                { "Name": "INSULT", "Score": 0.003000000026077032},
                { "Name": "GRAPHIC", "Score": 0.0010000000474974513},
                { "Name": "HARASSMENT_OR_ABUSE", "Score": 0.0013000000035390258},
                { "Name": "SEXUAL", "Score": 0.0017000000225380063},
                { "Name": "VIOLENCE_OR_THREAT", "Score": 0.004999999888241291}
            ]
        },
        {
            "Toxicity": 0.7358999848365784,
            "Labels": [
                { "Name": "PROFANITY", "Score": 0.011900000274181366},
                { "Name": "HATE_SPEECH", "Score": 0.019500000402331352},
                { "Name": "INSULT", "Score": 0.0714000016450882},
                { "Name": "GRAPHIC", "Score": 0.006099999882280827},
                { "Name": "HARASSMENT_OR_ABUSE", "Score": 0.018200000748038292},
                { "Name": "SEXUAL", "Score": 0.0027000000700354576},
                { "Name": "VIOLENCE_OR_THREAT", "Score": 0.8145999908447266}
            ]
        },
        {
            "Toxicity": 0.9843000173568726,
            "Labels": [
                { "Name": "PROFANITY", "Score": 0.9369999766349792 },
                { "Name": "HATE_SPEECH", "Score": 0.30880001187324524 },
                { "Name": "INSULT", "Score": 0.42100000381469727 },
                { "Name": "GRAPHIC", "Score": 0.12630000710487366 },
                { "Name": "HARASSMENT_OR_ABUSE", "Score": 0.25519999861717224 },
                { "Name": "SEXUAL", "Score": 0.19169999659061432 },
                { "Name": "VIOLENCE_OR_THREAT", "Score": 0.19539999961853027 }
            ]
        }
    ]
}

In the preceding JSON, the first text segment is considered safe with a low toxicity score. However, the second and third text segments received toxicity scores of 73% and 98%, respectively. For the second segment, Amazon Comprehend detects a high toxicity score for VIOLENCE_OR_THREAT; for the third segment, it detects PROFANITY with a high toxicity score.

Sample request using the Python SDK

The following code snippet demonstrates how to utilize the Python SDK to invoke the Toxicity Detection API. This code receives the same JSON response as the AWS CLI command demonstrated earlier.

import boto3 import base64
# Initialize a Comprehend boto3 client object
comprehend_client = session.client('comprehend')

# Call comprehend Detect Toxic Content API with text segments
response = comprehend_client.detect_toxic_content(
    TextSegments=[
        {"Text":  "and go through the door go through the door he's on the right"},
        {"Text":  "he's on the right XXXXX him"},
        {"Text":  "what the XXXX are you doing man that's why i didn't want to play"}
    ],
    LanguageCode="en"
)

Summary

In this post, we provided an overview of the new Amazon Comprehend Toxicity Detection API. We also described how you can parse the API response JSON. For more information, refer to Comprehend API document.

Amazon Comprehend toxicity detection is now generally available in four Regions: us-east-1, us-west-2, eu-west-1, and ap-southeast-2.

To learn more about content moderation, refer to Guidance for Content Moderation on AWS. Take the first step towards streamlining your content moderation operations with AWS.

About the Authors

Lana Zhang is a Senior Solutions Architect at AWS WWSO AI Services team, specializing in AI and ML for Content Moderation, Computer Vision, Natural Language Processing and Generative AI. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising & marketing.

Ravisha SK is a Senior Product Manager, Technical at AWS with a focus on AI/ML. She has over 10 years of experience in data analytics and machine learning across different domains. In her spare time, she enjoys reading, experimenting in the kitchen and exploring new coffee shops.