Introducing a context-based framework for comprehensively evaluating the social and ethical risks of AI systems
Generative AI systems are already being used to write books, create graphic designs, and assist medical practitioners, and they are becoming increasingly capable. Ensuring these systems are developed and deployed responsibly requires carefully evaluating the potential ethical and social risks they may pose.
In our new paper, we propose a three-layered framework for evaluating the social and ethical risks of AI systems. This framework includes evaluations of AI system capability, human interaction, and systemic impacts.
We also map the current state of safety evaluations and find three main gaps: context, risk-specific evaluations, and multimodality. To help close these gaps, we call for repurposing existing evaluation methods for generative AI and for implementing a comprehensive approach to evaluation, as in our case study on misinformation. This approach combines findings such as how likely the AI system is to provide factually incorrect information with insights into how people use that system, and in what context. Multi-layered evaluations can draw conclusions beyond model capability and indicate whether harm (in this case, misinformation) actually occurs and spreads.
To make any technology work as intended, both social and technical challenges must be solved. To better assess AI system safety, these different layers of context must therefore be taken into account. Here, we build upon earlier research identifying the potential risks of large-scale language models, such as privacy leaks, job automation, and misinformation, and introduce a way of comprehensively evaluating these risks going forward.
Context is critical for evaluating AI risks
Capabilities of AI systems are an important indicator of the types of wider risks that may arise. For example, AI systems that are more likely to produce factually inaccurate or misleading outputs may be more prone to creating risks of misinformation, leading to issues such as a lack of public trust.
Measuring these capabilities is core to AI safety assessments, but these assessments alone cannot ensure that AI systems are safe. Whether downstream harm manifests — for example, whether people come to hold false beliefs based on inaccurate model output — depends on context. More specifically, who uses the AI system and with what goal? Does the AI system function as intended? Does it create unexpected externalities? All these questions inform an overall evaluation of the safety of an AI system.
Extending beyond capability evaluation, we propose evaluation that can assess two additional points where downstream risks manifest: human interaction at the point of use, and systemic impact as an AI system is embedded in broader systems and widely deployed. Integrating evaluations of a given risk of harm across these layers provides a comprehensive evaluation of the safety of an AI system.
Human interaction evaluation centres the experience of people using an AI system. How do people use the AI system? Does the system perform as intended at the point of use, and how do experiences differ between demographics and user groups? Can we observe unexpected side effects from using this technology or being exposed to its outputs?
Systemic impact evaluation focuses on the broader structures into which an AI system is embedded, such as social institutions, labour markets, and the natural environment. Evaluation at this layer can shed light on risks of harm that become visible only once an AI system is adopted at scale.
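One way to picture how findings for a single risk are integrated across the three layers is as a small record-keeping structure. The sketch below is illustrative only: the names `Layer`, `Finding`, and `missing_layers` are hypothetical and do not come from the paper, and the example findings are placeholders rather than real results.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The three evaluation layers named in the framework."""
    CAPABILITY = "capability"
    HUMAN_INTERACTION = "human interaction"
    SYSTEMIC_IMPACT = "systemic impact"


@dataclass(frozen=True)
class Finding:
    """One evaluation result (hypothetical record format)."""
    risk: str      # e.g. "misinformation"
    layer: Layer   # the layer at which the evaluation was run
    summary: str   # short human-readable description of the result


def missing_layers(findings, risk):
    """Return the layers for which no finding about `risk` is recorded."""
    covered = {f.layer for f in findings if f.risk == risk}
    return [layer for layer in Layer if layer not in covered]


# Placeholder findings for illustration; not real evaluation data.
findings = [
    Finding("misinformation", Layer.CAPABILITY,
            "how often the system gives factually incorrect answers"),
    Finding("misinformation", Layer.HUMAN_INTERACTION,
            "whether users come to hold false beliefs from its output"),
]

print([layer.value for layer in missing_layers(findings, "misinformation")])
# prints ['systemic impact']
```

Here the misinformation risk has been assessed at the capability and human interaction layers but not yet at the systemic impact layer, so the evaluation would not count as comprehensive in the sense described above.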
Safety evaluations are a shared responsibility
AI developers need to ensure that their technologies are developed and released responsibly. Public actors, such as governments, are tasked with upholding public safety. As generative AI systems become more widely used and deployed, ensuring their safety is a responsibility shared between multiple actors:
- AI developers are well-placed to interrogate the capabilities of the systems they produce.
- Application developers and designated public authorities are positioned to assess the functionality of different features and applications, and possible externalities to different user groups.
- Broader public stakeholders are uniquely positioned to forecast and assess societal, economic, and environmental implications of novel technologies, such as generative AI.
The three layers of evaluation in our proposed framework are a matter of degree, rather than being neatly divided. While none of them is entirely the responsibility of a single actor, the primary responsibility depends on who’s best placed to perform evaluations at each layer.
Gaps in current safety evaluations of generative multimodal AI
Because this additional context matters so much for evaluating the safety of AI systems, we need to know which such tests are available. To better understand the broader landscape, we made a wide-ranging effort to collate, as comprehensively as possible, the evaluations that have been applied to generative AI systems.
By mapping the current state of safety evaluations for generative AI, we found three main safety evaluation gaps:
- Context: Most safety assessments consider generative AI system capabilities in isolation. Comparatively little work has been done to assess potential risks at the point of human interaction or of systemic impact.
- Risk-specific evaluations: Capability evaluations of generative AI systems are limited in the risk areas that they cover. For many risk areas, few evaluations exist. Where they do exist, evaluations often operationalise harm in narrow ways. For example, representation harms are typically defined as stereotypical associations of occupation to different genders, leaving other instances of harm and risk areas undetected.
- Multimodality: The vast majority of existing safety evaluations of generative AI systems focus solely on text output; big gaps remain for evaluating risks of harm in image, audio, or video modalities. This gap is only widening with the introduction of multiple modalities in a single model, such as AI systems that can take images as inputs or produce outputs that interweave audio, text, and video. While some text-based evaluations can be applied to other modalities, new modalities introduce new ways in which risks can manifest. For example, a description of an animal is not harmful in itself, but the same description applied to an image of a person can be.
We’re making a list of links to publications that detail safety evaluations of generative AI systems openly accessible via this repository. If you would like to contribute, please add evaluations by filling out this form.
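The mapping exercise described above can be pictured as a tally of catalogued evaluations by layer and output modality, where empty cells mark gaps. The following is a minimal sketch under stated assumptions: the entry format, the `coverage` and `gaps` helpers, and the sample entries are all hypothetical and do not reflect the actual repository's schema or contents.

```python
from collections import Counter
from itertools import product

LAYERS = ("capability", "human interaction", "systemic impact")
MODALITIES = ("text", "image", "audio", "video")

# Placeholder entries for illustration only; not real catalogue data.
catalogue = [
    {"layer": "capability", "modality": "text"},
    {"layer": "capability", "modality": "text"},
    {"layer": "capability", "modality": "image"},
    {"layer": "human interaction", "modality": "text"},
]


def coverage(entries):
    """Count evaluations in every (layer, modality) cell, including empty ones."""
    counts = Counter((e["layer"], e["modality"]) for e in entries)
    return {cell: counts[cell] for cell in product(LAYERS, MODALITIES)}


def gaps(entries):
    """Return the (layer, modality) cells that have no evaluations at all."""
    return [cell for cell, n in coverage(entries).items() if n == 0]


for layer, modality in gaps(catalogue):
    print(f"no evaluations found: {layer} / {modality}")
```

A fuller version would also need a risk-area dimension to surface the second gap, since an evaluation can exist for a layer and modality while covering only a narrow slice of the relevant harms.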
Putting more comprehensive evaluations into practice
Generative AI systems are powering a wave of new applications and innovations. To make sure that potential risks from these systems are understood and mitigated, we urgently need rigorous and comprehensive evaluations of AI system safety that take into account how these systems may be used and embedded in society.
A practical first step is repurposing existing evaluations and leveraging large models themselves for evaluation, though this has important limitations. For more comprehensive evaluation, we also need to develop approaches that evaluate AI systems at the point of human interaction and assess their systemic impacts. For example, while spreading misinformation through generative AI is a recent issue, we show there are many existing methods of evaluating public trust and credibility that could be repurposed.
Ensuring the safety of widely used generative AI systems is a shared responsibility and priority. AI developers, public actors, and other parties must collaborate and collectively build a thriving and robust evaluation ecosystem for safe AI systems.