AI and Equality


Event Series: AI & Equality Community | Events

Testing AI Safety: Why Current Guardrails Fail to Stop Social Bias with Anna-Maria Gueorguieva | AI & Equality Pub-Talk

June 25 @ 3:00 pm - 4:00 pm

Access paper: https://arxiv.org/abs/2512.19238

How do large language models understand the lived experiences of stigmatized groups, and when does this understanding diverge from the human perspective? Can this divergence lead to bias, and if so, do our existing safety tools help mitigate it? This work investigated open-source language models for bias against 93 stigmatized groups, finding that certain stigmatized identities (especially those humans deem 'threatening', such as having HIV or a criminal record) elicit significantly more bias than others. To attempt to remedy this, we test guardrail models: models from leading technology companies designed to identify discriminatory or bias-eliciting inputs and mitigate harmful outputs. This talk will report on our findings, identifying where existing guardrail models fail and discussing technical and legal solutions.

About the speaker:
Anna-Maria Gueorguieva is a PhD student at the University of Washington Information School and holds a B.A. in Data Science and Legal Studies from UC Berkeley. Her research focuses on evaluating AI systems for social and political impacts and on AI regulation. Her work combines empirical methods for investigating AI usage and behavior with the regulations needed to limit and remedy harm.

Register here via our community on Circle
