How do large language models (LLMs), which power chatbots like ChatGPT, actually work?
Few are better qualified to answer this question than Surbhi Goel, Magerman Term Assistant Professor in Computer and Information Science (CIS) at Penn Engineering.
In graduate school, Goel completed her doctoral dissertation on understanding how neural networks, the broader family of models to which LLMs belong, actually work. Before coming to Penn, she spent time as a postdoctoral researcher at Microsoft Research, giving her a behind-the-scenes look at the company that partners most closely with OpenAI.
Goel now teaches CIS 5200: Machine Learning, a foundational course in the field, each spring. Last fall, she designed a new graduate-level seminar, CIS 7000: Foundations of Modern Machine Learning, which examines the complexities and mysteries at the heart of AI from a mathematical perspective.
In the 2024-2025 academic year, Goel will co-organize the Special Year on Large Language Models, a series of workshops sponsored by the Simons Institute for the Theory of Computing and devoted to the study of LLMs. (Michael Kearns, National Center Chair Professor in CIS, will also participate.)
“You can find hundreds and hundreds of articles about large language models,” Goel told the audience at this year’s Women in Data Science (WiDS) @ Penn Conference in February. “They’ve entered our classrooms, they’re being used in all these real-world applications.”
But, as Goel noted, these models are also something of a "black box," even to the researchers who design them. Starting with the simplest version of a language model and building up to the revolutionary "transformer" architecture behind tools like ChatGPT, Goel demystifies how some of the AI systems reshaping society actually work.
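The article does not reproduce Goel's slides, but the "simplest version of a language model" is, at its core, just a next-word predictor. A minimal sketch of that idea in Python, assuming a bigram model (counting which word tends to follow which) as the starting point; the toy corpus and function names here are illustrative, not taken from the talk:

```python
# A minimal bigram language model: predict the next word by counting
# how often each word followed the previous one in a small corpus.
import random
from collections import defaultdict

# Toy corpus (illustrative only).
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count word-to-next-word transitions.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    followers = counts[prev]
    words = list(followers)
    weights = [followers[w] for w in words]
    return random.choices(words, weights=weights)[0]

# Generate text one word at a time, the way an LLM emits tokens.
word = "the"
sentence = [word]
for _ in range(8):
    if not counts[word]:  # no observed continuation; stop generating
        break
    word = next_word(word)
    sentence.append(word)
print(" ".join(sentence))
```

A transformer performs the same basic task, predicting the next token, but replaces these raw counts with a learned neural network that can condition on far longer stretches of context.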