Cornell Tech Frontiers of AI Symposium

May 28–29, 2026

Tata Innovation Center, Cornell Tech

The Cornell Tech Frontiers of AI Symposium is a new annual event that brings together leading academic researchers around selected topics, with the goal of fostering collaboration, catalyzing new research directions, and disseminating knowledge. The symposium is organized by Yoav Artzi and Kilian Weinberger.

The symposium is preceded by the Cornell Tech Frontiers of AI Summit on May 27, 2026, a public-facing event that brings together leading researchers, practitioners, and entrepreneurs, featuring research talks, startup spotlights, and panels.

Due to limited space, attendance is by invitation only.

The Frontiers of AI Symposium is made possible with generous support from the Secunda Family Foundation.

For directions and travel information, see Visit Cornell Tech.

Speakers

May 28: NextGen AI Models

Aditi Raghunathan (CMU) / Carl Vondrick (Columbia) / John Langford (MSR) / Sewon Min (Berkeley) / Sherry Yang (NYU) / Surbhi Goel (UPenn) / Tatsunori Hashimoto (Stanford) / Yoav Artzi (Cornell)

May 29: AI Reasoning and Scientific Discoveries

Berthy Feng (MIT) / Fei Sha (Meta) / Jacob Gardner (UPenn) / Kyunghyun Cho (NYU) / Peter Frazier (Cornell) / Volodymyr Kuleshov (Cornell) / Yisong Yue (Caltech)

Schedule

Day 1 — May 28: NextGen AI Models

8:45–9:30	Light breakfast
9:30–10:15	John Langford (MSR) / Next-Latent Prediction Transformers Learn Compact World Models Abstract. Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad hoc lookups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules. This often leads to learning solutions that generalize poorly. We introduce Next-Latent Prediction (NextLat), which extends standard next-token training with self-supervised predictions in the latent space. Specifically, NextLat trains a transformer to learn latent representations that are predictive of its next latent state given the next output token. Theoretically, we show that these latents provably converge to belief states, compressed information of the history necessary to predict the future. This simple auxiliary objective injects a recurrent inductive bias into transformers, while leaving their architecture, parallel training, and inference unchanged. NextLat effectively encourages the transformer to form compact internal world models with its own belief states and transition dynamics — a crucial property absent in standard next-token prediction transformers. Empirically, across benchmarks in world modeling, reasoning, planning, and language modeling, NextLat demonstrates significant gains over standard next-token training in downstream accuracy, representation compression, and lookahead planning. Furthermore, NextLat enables variable-length self-speculative decoding, accelerating inference by up to 3.3× in the language domain. NextLat stands as a simple and efficient paradigm for shaping transformer representations toward stronger generalization. Bio. See: https://en.wikipedia.org/wiki/John_Langford_(computer_scientist)
10:15–10:20	Micro break
10:20–11:05	Tatsunori Hashimoto (Stanford) / New directions in synthetic data Abstract. Synthetic data has been an effective, if boring, set of techniques: prompt some language model to restructure your corpus to match some downstream task, occasionally with some distillation. In this talk, we will take a more expansive view of synthetic data as a general algorithmic tool for generative modeling, arguing that the design space and possibilities of synthetic data are much bigger than they might seem. Through a few recent works, we will show that synthetic data has major benefits beyond transforming the data — improving in-domain perplexities, and enabling unique algorithmic primitives, such as neighborhood smoothing and concatenated ‘mega’ documents. With this broader view, we will point towards a nascent but interesting possibility of treating data itself as an algorithmic object to be engineered and optimized end-to-end. Bio. Tatsunori Hashimoto is an Assistant Professor in the Computer Science Department at Stanford University. Work from his group spans many aspects of statistical machine learning and language models. He received his Ph.D. at MIT under the supervision of Tommi Jaakkola and David Gifford, and is the recipient of several awards, including the Sloan and NSF CAREER awards, and his work has been recognized with paper awards at ICML, ICLR, and CHI.
11:05–11:25	Break
11:25–12:10	Lightning Talks Surbhi Goel (UPenn) / Calibrated Uncertainty for Multi-Agent Collaboration Abstract. AI systems increasingly operate in settings where no single model or person sees the full problem. Each agent or human may hold only part of the relevant information, yet current systems often fail to combine these pieces into a better joint decision. Since sharing everything is often infeasible, and may not help even when possible, the core question is what low-bandwidth messages can support effective collaboration. In this talk, I will describe my work on calibrated predictions as an interface for decision-making under partial information. Rooted in agreement theory, the idea is that agents can exchange forecasts rather than raw observations. When these forecasts are calibrated, simple protocols can quickly reach agreement, improve on individual decisions, and under suitable conditions match the decision quality achievable if all information were combined. I will connect these guarantees to experiments on collaborative maze solving with LLM agents, where calibrated uncertainty substantially improves performance over other forms of communication. Bio. Surbhi Goel is the Magerman Term Assistant Professor of Computer and Information Science at the University of Pennsylvania, affiliated with the Theory group, the ASSET Center, and the Warren Center. Her research develops theoretical foundations for safe, reliable, and trustworthy AI. She is the recipient of the Schmidt Sciences AI2050 Early Career Fellowship, the Alfred P. Sloan Research Fellowship, an NSF CAREER award, and an Amazon Research award. She is also the co-founder of the Learning Theory Alliance (LeT-All), a community-building and mentorship initiative. Carl Vondrick (Columbia) / Advances in Multimodal Perception Abstract. Animals use a variety of senses to understand the world, combining sight, touch, sound, and more. 
While there have been significant advances in machine vision, I will discuss another modality that we have been exploring over the last year: olfaction. I will talk about the progress we have made so far to build machines that can smell, and discuss where this technology could go in the future. Paper. https://arxiv.org/abs/2511.20544 Bio. Carl Vondrick is the YM Associate Professor of Computer Science at Columbia University. Previously, he was a Research Scientist at Google, and he received his PhD from MIT. His research interests are in computer vision, machine learning, and their applications. He is the recipient of the PAMI Young Researcher Award and the NSF CAREER award. His research is supported by the NSF, DARPA, Amazon, Google, and Toyota. For more information, please visit his website at https://www.cs.columbia.edu/~vondrick/. Sherry Yang (NYU) / From Pretraining World Models to Post-Training Physical Agents Abstract. While deep neural networks have achieved superhuman performance in domains with low-cost simulations, from AlphaGo to LLMs for code generation, their application to the physical world is bottlenecked by a fundamental challenge: high-cost interactions from robots. This talk outlines strategies for pretraining world models as high-fidelity simulators for robotics, and discusses RL post-training for physical agents in a learned world model. Bio. Sherry Yang is an Assistant Professor of Computer Science at NYU Courant and a Staff Research Scientist at Google DeepMind. Her research is in machine learning, with a focus on reinforcement learning and generative modeling. Her current research interests include learning world models and agents, and their applications in robotics and AI for science. Her research has been recognized by the Best Paper award at ICLR and various media outlets such as VentureBeat and TWIML. She has organized tutorials and workshops, and served as an Area Chair at major conferences (NeurIPS, ICLR, ICML, CVPR). 
Prior to her current role, she was a post-doc at Stanford working with Percy Liang. She received her Ph.D. from UC Berkeley, advised by Pieter Abbeel, and her Master’s and Bachelor’s degrees from MIT.
12:10–1:30	Lunch (catered)
1:30–2:15	Sewon Min (Berkeley) / Rethinking Modularity and Abstraction in LLMs Abstract. Today's LLMs are powerful, but I argue in this talk that they are still flawed in two ways. First, they are deployed as monolithic systems: even narrowly scoped tasks require a massive full model. Second, they are not native enough: in fact, text abstractions themselves may be unnecessary. In this talk, I will present two recent works that address these issues. First, we focus on mixture-of-experts (MoE) models, a dominant architecture in LLMs. While MoEs appear to be modular, we show that in practice they are not: restricting inference to a subset of experts causes severe degradation, and this is intrinsic to how they are trained. We show, however, that it is possible to train an MoE such that modularity emerges naturally, without imposing human priors. Our model, EMO, enables selective use of expert subsets — down to 12.5% with minimal performance loss — while naturally organizing experts by domain. In the second part, I argue for removing text abstractions altogether: humans perceive the world visually, and models should operate directly in pixel space. While ambitious, recent advances in VLMs make this increasingly feasible. I will present PixelRAG, a retrieval-augmented generation model that retrieves web information directly in pixel space. By eliminating complex and lossy HTML parsing, PixelRAG simplifies the pipeline while outperforming text-based RAG, even on text-centric benchmarks like SimpleQA and NQ, and also introduces a new efficiency lever through image compression. Bio. Sewon Min is an Assistant Professor in EECS at UC Berkeley, affiliated with Berkeley AI Research (BAIR), and a Research Scientist at the Allen Institute for AI. Her research focuses on understanding and advancing large language models (LLMs), with the goal of improving their performance, flexibility, adaptability, factuality, and reasoning through new architectures and training methods. 
She also develops tools and infrastructure for data and model auditing. Her work has received multiple best paper awards, dissertation awards from ACM, ACL, and AAAI, and several fellowships. She earned her Ph.D. from the University of Washington and has held research positions at Meta AI, Google, and Salesforce.
2:15–3:00	Yoav Artzi (Cornell) / Sparks of New Pre-Training Abstract. This talk covers new pre-training techniques. First, we introduce the hypothesis that state and prediction representations, which are entangled in transformers, are better separated. We design a simple architectural modification that effectively separates them, and provides 2.6x token efficiency during pre-training. The second technique is focused on externalizing knowledge by pre-training an LLM to rely on an external knowledge base, while inducing this KB from the pre-training data. We re-model the Limited Memory Language Model (LMLM) paradigm we introduced in prior work, with a new expressive continuous query mechanism. This dramatically increases the expressivity of the LMLM paradigm, and allows scaling to general web text. Our Co-LMLM model outperforms a model of similar size that is trained on 40x the amount of tokens, while presenting all the advantages of the LMLM class — knowledge control, provenance, editing, and factuality. Together, these techniques demonstrate that there are many avenues to bring about fundamental change in LLMs through new pre-training paradigms. Bio. Yoav Artzi is an Associate Professor in the Department of Computer Science and Cornell Tech at Cornell University, a visiting faculty researcher at Google DeepMind, and arXiv's associate faculty director. His research focuses on language modeling and learning in interactive and situated scenarios. His work was acknowledged by awards and honorable mentions at ACL, EMNLP, NAACL, and IROS, as well as a TACL test-of-time award. Yoav holds a B.Sc. from Tel Aviv University and a Ph.D. from the University of Washington.
3:00–3:30	Coffee break
3:30–4:15	Aditi Raghunathan (CMU) / Next-Gen Pretraining for Downstream Flexibility Abstract. Pretraining LLMs at scale is reaching its limits — not in raw benchmark performance, but in the flexibility of what we can do with the resulting model. In this talk, I will argue that the path forward requires rethinking pretraining itself, including the optimizer, the architecture, and the objective. First, I will present a surprising finding: more pretraining can make models worse downstream, harder to finetune and more fragile under quantization. We trace this catastrophic overtraining to a simple culprit: sensitivity to perturbation, which grows steadily over the course of pretraining. Targeting sensitivity directly, through interventions like data curriculum and sharpness-aware optimization, yields up to 40% better downstream performance across both finetuning and quantization. Next, I will turn to unlearning, which has emerged as one of the central problems for privacy, copyright, and safe deployment of LLMs, and has proven remarkably resistant to post-hoc fixes. I will show why: standard training entangles memorization with generalization in the same neurons. We introduce Memorization Sinks, which exploit learning dynamics to disentangle the two by design. The result is the first natively unlearnable language models, where each of millions of training sources can be cleanly removed by deactivating a small set of neurons, matching a model retrained from scratch without that source. Finally, I will discuss how to enable creativity in tasks that require a far-sighted leap of thought, like scientific discovery, and argue that next-token prediction is the wrong default for the generative flexibility these tasks demand. Bio. Aditi Raghunathan is an Assistant Professor of Computer Science at Carnegie Mellon University. Her work advances trustworthy AI by translating insights from the scientific study of frontier model failures into methods that make them robust and safe. 
She is a recipient of the Sloan Research Fellowship, NSF CAREER Award, Okawa Research Award, Schmidt AI2050 Early Career Fellowship, Google Research Scholar Award, Forbes 30 Under 30 recognition, Arthur Samuel Best Thesis Award at Stanford, and multiple PhD fellowships. Her work has also been recognized with an Outstanding Paper Award at ICML 2025 and several workshop paper awards.
4:30–5:15	Panel: Aditi Raghunathan, John Langford, Sewon Min, Yoav Artzi / Moderator: Tatsunori Hashimoto

Day 2 — May 29: AI Reasoning and Scientific Discoveries

8:45–9:30	Light breakfast
9:30–10:15	Kyunghyun Cho (NYU) / Generalists vs. Specialists: Learning to Search Abstract. Although large language models (LLMs) have shown promise in biomolecule optimization problems, they incur heavy computational costs and struggle to satisfy precise constraints. On the other hand, specialized solvers like LaMBO-2 offer efficiency and fine-grained control but require more domain expertise. Comparing these approaches is challenging due to expensive laboratory validation and inadequate synthetic benchmarks. We address this by introducing Ehrlich functions, a synthetic test suite that captures the geometric structure of biophysical sequence optimization problems. With prompting alone, off-the-shelf LLMs struggle to optimize Ehrlich functions. In response, we propose LLOME (Language Model Optimization with Margin Expectation), a bilevel optimization routine for online black-box optimization. When combined with a novel preference learning loss, we find LLOME can not only learn to solve some Ehrlich functions, but can even perform as well as or better than LaMBO-2 on moderately difficult Ehrlich variants. However, LLMs also exhibit some likelihood-reward miscalibration and struggle without explicit rewards. Our results indicate LLMs can occasionally provide significant benefits, but specialized solvers are still competitive and incur less overhead. Paper. https://proceedings.mlr.press/v267/chen25bg.html Bio. Kyunghyun Cho is the Glen de Vries Professor of Health Statistics and a professor of computer science and data science at New York University. He is also a CIFAR Fellow of Learning in Machines & Brains and an Associate Member of the National Academy of Engineering of Korea. In early 2021, he co-founded Prescient Design, which was acquired by Genentech in late 2021. He then served as an Executive Director of Frontier Research and a Senior Fellow at Genentech until January 2026. 
He served as a (co-)Program Chair of ICLR 2020, NeurIPS 2022 and ICML 2022 and also on the boards of ICML and ICLR. He was one of the three founding Editors-in-Chief of the Transactions on Machine Learning Research (TMLR) until 2024. He was a research scientist at Facebook AI Research from June 2017 to May 2020 and a postdoctoral fellow at University of Montreal until Summer 2015 under the supervision of Prof. Yoshua Bengio, after receiving MSc and PhD degrees from Aalto University in April 2011 and April 2014, respectively, under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko and Dr. Alexander Ilin. He received the Samsung Ho-Am Prize in Engineering in 2021. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so.
10:15–10:20	Micro break
10:20–11:05	Peter Frazier (Cornell) / Learning from Evolution's Winners Abstract. Over billions of years, evolution has discovered proteins with remarkable capabilities that are valuable for medicine and materials science. Frogs produce peptides that kill bacteria, fish produce antifreeze proteins that prevent ice crystals from damaging cell membranes in freezing water, and mussels produce proteins that form strong underwater adhesives. Modern genomic sequencing has created massive datasets describing these naturally evolved molecules, creating new opportunities for AI systems to learn relationships between protein sequence and function and to help design proteins with capabilities beyond those found in nature. But these datasets are biased. Evolution only shows us the winners: protein sequences that survived selection and spread through populations. We do not observe the many sequences that were explored but failed to function. Similar selection effects also arise in experimental protein engineering methods such as biopanning. This creates a major challenge for AI systems trained on natural biological sequence data. This talk presents a framework for protein function prediction that explicitly models this evolutionary selection process. The key insight is that missing sequences are informative. If a protein sequence could easily have arisen through mutation but is never observed in nature, this provides evidence against functionality. By modeling mutational accessibility and evolutionary reach, our approach distinguishes sequences that are missing because they are nonfunctional from sequences that are missing simply because evolution never reached them. We demonstrate this approach on viral protein prediction tasks involving membrane fusion, receptor binding, and immune evasion, where it improves prediction of previously unseen functional variants relative to existing protein modeling and positive–unlabeled learning methods. 
More broadly, this work suggests that future AI systems for biology may benefit not only from scaling models and datasets, but also from modeling the scientific processes that determine which data become observable in the first place. Paper. https://arxiv.org/pdf/2605.06879 Bio. Peter Frazier is the Eleanor and Howard Morgan Professor of Operations Research and Information Engineering at Cornell University. His research spans AI for science and decision-making, including Bayesian optimization, large language models for decision support and preference learning, and multi-armed bandits. During the pandemic, he led Cornell's COVID-19 Mathematical Modeling Team, which helped design the Ithaca campus's asymptomatic testing program and provided university leadership with science-based decision support. From 2015–2024, he worked as a scientist at Uber, where he designed and optimized pricing systems. He recently founded Saddlepoint Labs, a startup using computer vision and sensor data to improve the reproducibility of wet lab experiments. He is the winner of best paper awards from the ACM Conference on Economics and Computation, the INFORMS Section on Auctions and Market Design, the INFORMS Applied Probability Society, the INFORMS Computing Society, and the Winter Simulation Conference.
11:05–11:25	Break
11:25–12:10	Lightning Talks Jacob Gardner (UPenn) / Self Driving Datasets: From 20 Million Papers to Structured Biomedical Knowledge Abstract. Manually curated biomedical repositories — spanning bioactivity, genomics, and chemistry — are expensive to maintain, lag the primary literature, and often discard the experimental nuance that determines whether measurements from different studies are comparable. We show that PubMed itself can be turned into structured datasets — autonomously and cost-effectively — that are larger, more nuanced, and more accurate than the curated databases they would replace. I will discuss three coupled contributions: (1) an LLM-based entity-tagging pipeline grounded in nine biomedical ontologies that tags 4.5 billion entities across 19 categories in a 22.5M-paper, 2.5-trillion-token PubMed corpus; (2) hybrid sparse–dense retrieval infrastructure supporting surgical entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, autonomously designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich supporting passages. Applied to six tasks — blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene–disease associations, protein subcellular localization, and chemical reactions — Starling produces ~5 million records (per-task scale ranges from 131K to 3M); several of these are the largest public datasets we are aware of for the property in question. Frontier-model disagreement on our kept extractions is 0.6–4.7% across tasks, which is, surprisingly, substantially below the error rates we measure on the widely used manually curated counterparts (e.g., 16.5% on TDC BBB, 7% on TDC Oral Bioavailability). 
Beyond scale and accuracy, the attached supporting passages carry nuance that tabular databases discard: for example, oral bioavailability of a molecule might depend on whether the patient is fed or fasting. Together, the corpus, retrieval layer, and agent establish a foundation for multimodal predictive and generative models in AI-driven therapeutic design. Bio. Jake is an assistant professor at the University of Pennsylvania in the Computer and Information Science department. His group does research focusing on applications of AI and machine learning to medicine and chemistry including the development of new antibiotics, vaccines, antibodies, materials and more. Volodymyr Kuleshov (Cornell) / Discrete Diffusion Generative Models: The Next Frontier of Language and Biological Sequence Modeling Abstract. Generative modeling of discrete data such as text or biological sequences is dominated today by autoregressive (AR) approaches. Our work introduces discrete diffusion models, which generate entire sequences in parallel, starting from noise (e.g., a random sequence) and iteratively refining it until it looks like data. Diffusion is not constrained to generate data sequentially, and can thus iteratively revise its own mistakes, leverage bidirectional context, output many tokens at once for faster sampling, and support powerful guidance mechanisms. Specifically, we introduce masked diffusion language models (MDLMs), which close the quality gap with AR models and serve as the basis of most of today's open source diffusion models for language. Combined with remasking and novel extensions of classifier-free and classifier-based guidance, MDLMs are also substantially more controllable than their AR counterparts. 
The framework extends naturally beyond language to the sciences, where it underpins a new generation of Nucleotide Transformer foundation models: our largest 10B models achieve state-of-the-art results in genome annotation while also enabling effective generation of regulatory sequences. Together, these results suggest that discrete diffusion models are a promising path forward for generative modeling and its applications in language understanding and scientific discovery. Bio. Volodymyr Kuleshov is the Joan Eliasoph, M.D. Assistant Professor at the Jacobs Technion-Cornell Institute at Cornell Tech and in the Computer Science Department at Cornell University. He obtained his Ph.D. in Computer Science from Stanford University, where he was the recipient of the Arthur Samuel Best Thesis Award. Kuleshov’s research interests are in the field of generative modeling and its applications in scientific discovery and health. His work has been featured in Nature Biotechnology, Nature Medicine, Nature Communications, and has been recognized with an NSF CAREER award, NIH MIRA award, as well as multiple industry awards. Kuleshov is also a co-founder of Inception AI, a startup developing the world's first diffusion language models. Berthy Feng (MIT) / Imaging at the Edge of Science: Integrating Scientific Knowledge and AI to Recover Hidden Structure Abstract. Images play a central role in scientific discovery. Whether it’s astronomical, biological, or materials systems, bringing complex phenomena into view enables scientists to probe, model, and fundamentally understand them. However, many of the most important scientific questions lie at the edge of what can be directly observed. We can accomplish extreme imaging through computational methods, bringing the invisible into view by supplementing limited observable data with human-imposed assumptions, or priors. When imaging for science, the challenge is imposing just enough known assumptions to infer the unknown. 
I create principled methods for bringing advanced priors, such as scientific knowledge and AI, into computational imaging. Using astrophysics as a running example, this talk presents my vision for a framework in which scientists systematically explore different priors, understand their effects on imaging, and extract scientific insights. The talk is organized in three parts: 1. First, we understand the importance of priors in extreme scientific imaging. I present my work on leveraging generative AI to flexibly tune a knob between different priors and understand their effects on imaging. Applied to black-hole imaging, my approach lets us infer physical features of a real black hole by identifying image features that are robust to prior assumptions. 2. Second, we carefully balance scientific assumptions to solve an extreme imaging problem in astrophysics. I present Physics-informed Dynamic Emission Fields (PI-DEF), a method for imaging the dynamic 3D gas near a black hole. PI-DEF strikes a balance between known/unknown physics, imposing known physics as hard constraints on the solution while leaving room for learning unknown physics, such as the velocity field near the black hole. 3. Third, we open an efficient route for bringing in known physics across imaging problems. I present Neural Approximate Mirror Maps (NAMMs), which learn to automatically impose any desired physics constraint onto any image. With NAMMs, we can easily incorporate known constraints (e.g., conservation laws) into generated and reconstructed images. The ideas of my talk naturally extend to many scientific domains, including biology, chemistry, and materials science. Papers. https://arxiv.org/abs/2406.02785 https://arxiv.org/abs/2406.12816 https://arxiv.org/abs/2602.08029 Bio. Berthy Feng is a postdoctoral researcher at MIT CSAIL and a fellow at the NSF Institute for AI and Fundamental Interactions (IAIFI), working with Prof. Bill Freeman. 
She received her PhD in Computational and Mathematical Sciences at Caltech, working with Prof. Katie Bouman. During her PhD, she was supported by the NSF GRFP and Kortschak Scholarship. Before that, she received her Bachelor’s degree in Computer Science at Princeton University. She builds computational imaging algorithms that integrate physics knowledge and AI to push the limits of what we can see.
12:10–1:30	Lunch (catered)
1:30–2:15	Yisong Yue (Caltech) / The Dark Knowledge of Science Abstract. Scientific discovery depends on knowledge that is rarely observed directly. Some of it lives in human experts: tacit knowledge and experience that often never make it into papers. Other knowledge is embedded in high-dimensional observations: images, spectra, time series, video, and instrument outputs whose scientific meaning is not immediately visible. In both cases, the challenge is to surface hidden structure and turn it into knowledge that can guide future discovery. This talk will cover progress toward turning this hidden, or “dark,” knowledge into reusable artifacts. From high-dimensional observations, scientific foundation models can learn representations that reveal structure not apparent in the raw data, such as fine-grained behavior in video or latent variables behind indirect measurements. From human experts, AI systems can begin to recover the tacit expertise behind scientific judgment: what researchers try, what they reject, and how they revise their understanding. Taken together, these efforts point toward a broader ambition: AI systems that make the dark knowledge of science more visible, computable, and useful for discovery. Bio. Yisong Yue is a Professor of Computing and Mathematical Sciences at the California Institute of Technology. His research centers on machine learning and artificial intelligence, with a focus on making AI work in high-stakes and high-expertise domains. His agenda spans both fundamental and applied work, from novel learning frameworks to deployment in autonomous driving on public roads. He was previously a research scientist at Disney Research and a postdoctoral researcher in the Machine Learning Department and iLab at Carnegie Mellon University. He received his Ph.D. from Cornell University and his B.S. from the University of Illinois at Urbana-Champaign. 
He previously served as Senior Program Chair of ICLR 2024 and General Chair of ICLR 2025, and currently serves on the ICLR board. His work has received multiple paper awards and nominations across robotics, computer vision, sports analytics, machine learning for health, and information retrieval; during his time in industry, he worked on machine learning for behavior modeling and motion planning in autonomous driving.
2:15–3:00	Fei Sha (Meta) / Advances in Probabilistic Generative Modeling for Scientific Machine Learning Abstract. Leveraging large-scale data and computing accelerator systems, statistical learning has led to significant paradigm shifts in many scientific disciplines. Grand challenges in science have been tackled with exciting synergy between disciplinary science, physics-based simulations via high-performance computing, and powerful learning methods. In this talk, I will describe several vignettes of our research on modeling complex dynamical systems characterized by partial differential equations with turbulent solutions. I will also demonstrate how machine-learning technologies, especially advances in generative AI, are effectively applied to address the computational and modeling challenges in such systems, exemplified by their successful applications to weather forecasting and climate projection. I will also discuss the new challenges and opportunities that future machine-learning research faces. Papers. https://www.pnas.org/doi/full/10.1073/pnas.2420288122 https://arxiv.org/abs/2409.18359 https://arxiv.org/abs/2412.08079 Bio. Fei Sha is an AI Research Scientist at Meta. He is broadly interested in probabilistic modeling, uncertainty quantification, dynamical systems, and probabilistic reasoning in LLMs. Before joining Meta, he led a team of scientists and engineers at Google Research, working on various topics, including basic methods and technology for LLMs, probabilistic generative modeling and their applications to dynamical systems (such as weather and climate). Before joining Google Research, he was a Professor of Computer Science and the Zohrab A. Kaprielian Fellow in Engineering at the University of Southern California (USC). He has been recognized with numerous awards and accolades for his innovative work, including being selected as an Alfred P. Sloan Research Fellow in 2013 and receiving an Army Research Office Young Investigator Award in 2012. 
He has a PhD in Computer and Information Science from the University of Pennsylvania and BSc and MSc degrees from Southeast University (Nanjing, China).
3:00–3:30	Coffee break
3:30–4:15	Panel: Fei Sha, Kyunghyun Cho, Yisong Yue / Moderator: Peter Frazier