Pascal Poupart

Reinforcement Learning for Improved Text and Image Generation

  • Title: Reinforcement Learning for Improved Text and Image Generation
  • Speaker: Dr. Pascal Poupart | David R. Cheriton School of Computer Science | University of Waterloo
  • Date: Thursday, Oct 23, 2025
  • Time: 4:00pm (EST)
  • Room: Mathematics Boardroom | LH3058

Summary

Reinforcement learning (RL) has become a key tool to train large language models (LLMs). In this talk, I will explain how RL from human feedback can improve the alignment of LLMs. I will also discuss recent advances in reward-guided text generation. Finally, I will explain how to leverage reinforcement learning to orchestrate an ensemble of diffusion models to follow complex instructions to generate detailed images.

Biography

Pascal Poupart is a Professor in the David R. Cheriton School of Computer Science at the University of Waterloo (Canada). He is also a Canada CIFAR AI Chair at the Vector Institute and a member of the Waterloo AI Institute. He serves on the advisory board of the NSF AI Institute for Advances in Optimization (2022-present) at Georgia Tech, UC Berkeley and the University of Southern California. He served as Research Director and Principal Research Scientist at the Waterloo Borealis AI Research Lab at Royal Bank of Canada (2018-2020). He also served as scientific advisor for ProNavigator (2017-2019), ElementAI (2017-2018) and DialPad (2017-2018). His research focuses on the development of algorithms for Machine Learning with applications to Natural Language Processing and Material Discovery. He is best known for his contributions to the development of Reinforcement Learning algorithms. Notable projects that his research team is currently working on include democratizing large language models, inverse constraint learning, mean-field RL, RL for generalist agents, Bayesian federated learning, uncertainty quantification, probabilistic deep learning, conversational agents, transcription error correction, sports analytics, and material discovery for CO2 recycling.