By Jimmy O’Hara

Ph.D. students in Penn’s Brachio Lab have been assessing the safety of machine learning systems for real-world uses like economic forecasting, cultural politeness, and legal reasoning. In their work, a new focus area has emerged: the extent to which large language models (LLMs) are capable of cyberbullying behavior, and how to debug that behavior where needed.

Penn student-led research into LLM cyberbullying is a pressing priority given the widespread availability of these models. People increasingly use LLMs for personal, educational, and commercial purposes, and there are growing concerns about the safety of AI for public use. This timely work reflects Penn’s strategic framework, In Principle and Practice, which highlights how the University is leading on great challenges, including AI research.

“The benefit of language models is that literally everybody can use them—and this is the thing that worries me, that literally everyone can use them,” says Eric Wong, an assistant professor in the Department of Computer and Information Science (CIS) in Penn Engineering and faculty lead for the Brachio Lab, which seeks to improve machine learning for societal benefit.

This area of AI research is especially pertinent, Wong continues, as cases emerge of LLMs encouraging self-harm.

“Every so often, there is someone that has these conversations with a language model, and then it sort of spirals into a darker area,” Wong says, “and eventually it culminates in the language model telling the user to start harming themselves.”

Evaluating the safety and effectiveness of LLMs can help researchers patch the faults that yield cyberbullying outcomes, and help prevent such behavior from occurring in the first place.

“To make sure that these don’t hurt people or bully people,” Wong explains, “we need to be able to actually profile each individual model, which has different strengths and weaknesses—and that’s what this research does.”

Read more at Penn Today