Abstract. Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad hoc lookups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules. This often leads to learning solutions that generalize poorly. We introduce Next-Latent Prediction (NextLat), which extends standard next-token training with self-supervised predictions in the latent space. Specifically, NextLat trains a transformer to learn latent representations that are predictive of its next latent state given the next output token. Theoretically, we show that these latents provably converge to belief states, compressed information of the history necessary to predict the future. This simple auxiliary objective injects a recurrent inductive bias into transformers, while leaving their architecture, parallel training, and inference unchanged. NextLat effectively encourages the transformer to form compact internal world models with its own belief states and transition dynamics, a crucial property absent in standard next-token prediction transformers. Empirically, across benchmarks in world modeling, reasoning, planning, and language modeling, NextLat demonstrates significant gains over standard next-token training in downstream accuracy, representation compression, and lookahead planning. Furthermore, NextLat enables variable-length self-speculative decoding, accelerating inference by up to 3.3× in the language domain. NextLat stands as a simple and efficient paradigm for shaping transformer representations toward stronger generalization.
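As a rough illustration of the kind of objective this abstract describes, the sketch below adds a latent-prediction term to a next-token loss. It is not taken from the talk or the underlying work: the linear predictor `W`, the mean-squared-error penalty, the weight `lam`, and every function name are assumptions made for illustration, and the code only evaluates the loss on given numbers (no training, no gradients).

```python
import math

def softmax_cross_entropy(logits, target):
    """Next-token loss at one position: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def predict_next_latent(h_t, tok_emb, W):
    """Hypothetical linear predictor: h_hat = W @ concat(h_t, tok_emb)."""
    x = h_t + tok_emb  # list concatenation of latent and token embedding
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def nextlat_style_loss(latents, logits, tokens, emb, W, lam=1.0):
    """Mean next-token loss plus lam times mean latent-prediction MSE.

    latents[t]: latent state after reading tokens[0..t]
    logits[t]:  scores over the vocabulary for tokens[t + 1]
    emb[k]:     embedding of vocabulary item k
    """
    n = len(tokens) - 1  # number of (t, t + 1) transitions
    token_loss = 0.0
    latent_loss = 0.0
    for t in range(n):
        nxt = tokens[t + 1]
        token_loss += softmax_cross_entropy(logits[t], nxt)
        # Predict the next latent from the current latent and the next token.
        h_hat = predict_next_latent(latents[t], emb[nxt], W)
        sq_err = sum((a - b) ** 2 for a, b in zip(h_hat, latents[t + 1]))
        latent_loss += sq_err / len(h_hat)
    return token_loss / n + lam * latent_loss / n
```

With `lam=0` the function reduces to the ordinary next-token loss, so the extra term can be read as a penalty on latents whose successor cannot be predicted from the current latent and the next token.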
Cornell Tech Frontiers of AI Symposium
The Cornell Tech Frontiers of AI Symposium is a new annual event that brings together leading academic researchers around selected topics, with the goal of fostering collaboration, catalyzing new research directions, and disseminating knowledge. The symposium is organized by Yoav Artzi and Kilian Weinberger.
The symposium is preceded by the Cornell Tech Frontiers of AI Summit on May 27, 2026, a public-facing event that brings together leading researchers, practitioners, and entrepreneurs, featuring research talks, startup spotlights, and panels.
Due to limited space, attendance is by invitation only.
The Frontiers of AI Symposium is made possible with generous support from the Secunda Family Foundation.
For directions and travel information, see Visit Cornell Tech.
Speakers
May 28: NextGen AI Models
Aditi Raghunathan (CMU) / Carl Vondrick (Columbia) / John Langford (MSR) / Sewon Min (Berkeley) / Sherry Yang (NYU) / Surbhi Goel (UPenn) / Tatsunori Hashimoto (Stanford) / Yoav Artzi (Cornell)
May 29: AI Reasoning and Scientific Discoveries
Berthy Feng (MIT) / Fei Sha (Meta) / Jacob Gardner (UPenn) / Kyunghyun Cho (NYU) / Peter Frazier (Cornell) / Shirley Ho (Flatiron Institute) / Volodymyr Kuleshov (Cornell) / Yisong Yue (Caltech)
Schedule
Day 1 — May 28: NextGen AI Models
| 8:45–9:30 | Light breakfast |
| 9:30–10:15 |
John Langford (MSR) / Next-Latent Prediction Transformers Learn Compact World Models |
| 10:15–10:20 | Micro break |
| 10:20–11:05 |
Tatsunori Hashimoto (Stanford) / New directions in synthetic data
Abstract. Synthetic data has been an effective, if boring, set of techniques: prompt some language model to restructure your corpus to match some downstream task, occasionally with some distillation. In this talk, we will take a more expansive view of synthetic data as a general algorithmic tool for generative modeling, arguing that the design space and possibilities of synthetic data are much bigger than they might seem. Through a few recent works, we will show that synthetic data has major benefits beyond transforming the data: improving in-domain perplexities and enabling unique algorithmic primitives, such as neighborhood smoothing and concatenated ‘mega’ documents. With this broader view, we will point towards a nascent but interesting possibility of treating data itself as an algorithmic object to be engineered and optimized end-to-end.
Bio. Tatsunori Hashimoto is an Assistant Professor in the Computer Science Department at Stanford University. Work from his group spans many aspects of statistical machine learning and language models. He received his Ph.D. at MIT under the supervision of Tommi Jaakkola and David Gifford, and is the recipient of several awards, including the Sloan and NSF CAREER awards. His work has been recognized with paper awards at ICML, ICLR, and CHI. |
| 11:05–11:25 | Break |
| 11:25–12:10 |
Lightning Talks
Surbhi Goel (UPenn)
Carl Vondrick (Columbia) / Advances in Multimodal Perception
Abstract. Animals use a variety of senses to understand the world, combining sight, touch, sound, and more. While there have been significant advances in machine vision, I will discuss another modality that we have been exploring over the last year: olfaction. I will talk about the progress we have made so far in building machines that can smell, and discuss where this technology could go in the future.
Bio. Carl Vondrick is the YM Associate Professor of Computer Science at Columbia University. Previously, he was a Research Scientist at Google, and he received his PhD from MIT. His research interests are in computer vision, machine learning, and their applications. He is the recipient of the PAMI Young Researcher Award and the NSF CAREER award. His research is supported by the NSF, DARPA, Amazon, Google, and Toyota. For more information, please visit his website at https://www.cs.columbia.edu/~vondrick/.
Sherry Yang (NYU) / From Pretraining World Models to Post-Training Physical Agents
Abstract. While deep neural networks have achieved superhuman performance in domains with low-cost simulations, from AlphaGo to LLMs for code generation, their application to the physical world is bottlenecked by a fundamental challenge: high-cost interactions from robots. This talk outlines strategies for pretraining world models as high-fidelity simulators for robotics, and discusses RL post-training for physical agents in a learned world model.
Bio. Sherry Yang is an Assistant Professor of Computer Science at NYU Courant and a Staff Research Scientist at Google DeepMind. Her research is in machine learning, with a focus on reinforcement learning and generative modeling. Her current research interests include learning world models and agents, and their applications in robotics and AI for science. Her research has been recognized by the Best Paper award at ICLR and covered by media outlets such as VentureBeat and TWIML. She has organized tutorials and workshops, and served as an Area Chair at major conferences (NeurIPS, ICLR, ICML, CVPR). Prior to her current role, she was a postdoc at Stanford working with Percy Liang. She received her Ph.D. from UC Berkeley, advised by Pieter Abbeel, and her Master's and Bachelor's degrees from MIT. |
| 12:10–1:30 | Lunch (catered) |
| 1:30–2:15 |
Sewon Min (Berkeley) / Rethinking Modularity and Abstraction in LLMs
Abstract. Today's LLMs are powerful, but I argue in this talk that they are still flawed in two ways. First, they are deployed as monolithic systems: even narrowly scoped tasks require a massive full model. Second, they are not native enough: in fact, text abstractions themselves may be unnecessary. In this talk, I will present two recent works that address these issues. First, we focus on mixture-of-experts (MoE) models, a dominant architecture in LLMs. While MoEs appear to be modular, we show that in practice they are not: restricting inference to a subset of experts causes severe degradation, and this is intrinsic to how they are trained. We show, however, that it is possible to train an MoE such that modularity emerges naturally, without imposing human priors. Our model, EMO, enables selective use of expert subsets (down to 12.5% with minimal performance loss) while naturally organizing experts by domain. In the second part, I argue for removing text abstractions altogether: humans perceive the world visually, and models should operate directly in pixel space. While ambitious, recent advances in VLMs make this increasingly feasible. I will present PixelRAG, a retrieval-augmented generation model that retrieves web information directly in pixel space. By eliminating complex and lossy HTML parsing, PixelRAG simplifies the pipeline while outperforming text-based RAG, even on text-centric benchmarks like SimpleQA and NQ, and also introduces a new efficiency lever through image compression.
Bio. Sewon Min is an Assistant Professor in EECS at UC Berkeley, affiliated with Berkeley AI Research (BAIR), and a Research Scientist at the Allen Institute for AI. Her research focuses on understanding and advancing large language models (LLMs), with the goal of improving their performance, flexibility, adaptability, factuality, and reasoning through new architectures and training methods.
She also develops tools and infrastructure for data and model auditing. Her work has received multiple best paper awards, dissertation awards from ACM, ACL, and AAAI, and several fellowships. She earned her Ph.D. from the University of Washington and has held research positions at Meta AI, Google, and Salesforce. |
| 2:15–3:00 |
Yoav Artzi (Cornell)
|
| 3:00–3:30 | Coffee break |
| 3:30–4:15 |
Aditi Raghunathan (CMU) / Next-Gen Pretraining for Downstream Flexibility
Abstract. Pretraining LLMs at scale is reaching its limits, not in raw benchmark performance but in the flexibility of what we can do with the resulting model. In this talk, I will argue that the path forward requires rethinking pretraining itself, including the optimizer, the architecture, and the objective. First, I will present a surprising finding: more pretraining can make models worse downstream, harder to finetune and more fragile under quantization. We trace this catastrophic overtraining to a simple culprit: sensitivity to perturbation, which grows steadily over the course of pretraining. Targeting sensitivity directly, through interventions like data curriculum and sharpness-aware optimization, yields up to 40% better downstream performance across both finetuning and quantization. Next, I will turn to unlearning, which has emerged as one of the central problems for privacy, copyright, and safe deployment of LLMs, and has proven remarkably resistant to post-hoc fixes. I will show why: standard training entangles memorization with generalization in the same neurons. We introduce Memorization Sinks, which exploit learning dynamics to disentangle the two by design. The result is the first natively unlearnable language models, where each of millions of training sources can be cleanly removed by deactivating a small set of neurons, matching a model retrained from scratch without that source. Finally, I will discuss how to enable creativity in tasks that require a far-sighted leap of thought, like scientific discovery, and argue that next-token prediction is the wrong default for the generative flexibility these tasks demand.
Bio. Aditi Raghunathan is an Assistant Professor of Computer Science at Carnegie Mellon University. Her work advances trustworthy AI by translating insights from the scientific study of frontier model failures into methods that make them robust and safe.
She is a recipient of the Sloan Research Fellowship, NSF CAREER Award, Okawa Research Award, Schmidt AI2050 Early Career Fellowship, Google Research Scholar Award, Forbes 30 Under 30 recognition, Arthur Samuel Best Thesis Award at Stanford, and multiple PhD fellowships. Her work has also been recognized with an Outstanding Paper Award at ICML 2025 and several workshop paper awards. |
| 4:30–5:15 | Panel: Aditi Raghunathan, John Langford, Sewon Min, Yoav Artzi / Moderator: Tatsunori Hashimoto |
Day 2 — May 29: AI Reasoning and Scientific Discoveries
| 8:45–9:30 | Light breakfast |
| 9:30–10:15 |
Kyunghyun Cho (NYU) / Generalists vs. Specialists: Learning to Search
Abstract. Although large language models (LLMs) have shown promise in biomolecule optimization problems, they incur heavy computational costs and struggle to satisfy precise constraints. On the other hand, specialized solvers like LaMBO-2 offer efficiency and fine-grained control but require more domain expertise. Comparing these approaches is challenging due to expensive laboratory validation and inadequate synthetic benchmarks. We address this by introducing Ehrlich functions, a synthetic test suite that captures the geometric structure of biophysical sequence optimization problems. With prompting alone, off-the-shelf LLMs struggle to optimize Ehrlich functions. In response, we propose LLOME (Language Model Optimization with Margin Expectation), a bilevel optimization routine for online black-box optimization. When combined with a novel preference learning loss, we find LLOME can not only learn to solve some Ehrlich functions, but can even perform as well as or better than LaMBO-2 on moderately difficult Ehrlich variants. However, LLMs also exhibit some likelihood-reward miscalibration and struggle without explicit rewards. Our results indicate LLMs can occasionally provide significant benefits, but specialized solvers are still competitive and incur less overhead.
Bio. Kyunghyun Cho is the Glen de Vries Professor of Health Statistics and a professor of computer science and data science at New York University. He is also a CIFAR Fellow of Learning in Machines & Brains and an Associate Member of the National Academy of Engineering of Korea. In early 2021, he co-founded Prescient Design, which was acquired by Genentech in late 2021. From then until January 2026, he served as an Executive Director of Frontier Research and a Senior Fellow at Genentech. He served as a (co-)Program Chair of ICLR 2020, NeurIPS 2022, and ICML 2022, and also on the boards of ICML and ICLR. He was one of the three founding Editors-in-Chief of the Transactions on Machine Learning Research (TMLR) until 2024. He was a research scientist at Facebook AI Research from June 2017 to May 2020 and a postdoctoral fellow at the University of Montreal until summer 2015 under the supervision of Prof. Yoshua Bengio, after receiving MSc and PhD degrees from Aalto University in April 2011 and April 2014, respectively, under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko, and Dr. Alexander Ilin. He received the Samsung Ho-Am Prize in Engineering in 2021. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so. |
| 10:15–10:20 | Micro break |
| 10:20–11:05 |
Shirley Ho (Flatiron Institute)
|
| 11:05–11:25 | Break |
| 11:25–12:10 |
Lightning Talks
Jacob Gardner (UPenn) / Self Driving Datasets: From 20 Million Papers to Structured Biomedical Knowledge
Abstract. Manually curated biomedical repositories (spanning bioactivity, genomics, and chemistry) are expensive to maintain, lag the primary literature, and often discard the experimental nuance that determines whether measurements from different studies are comparable. We show that PubMed itself can be turned into structured datasets, autonomously and cost-effectively, that are larger, more nuanced, and more accurate than the curated databases they would replace. I will discuss three coupled contributions: (1) an LLM-based entity-tagging pipeline grounded in nine biomedical ontologies that tags 4.5 billion entities across 19 categories in a 22.5M-paper, 2.5-trillion-token PubMed corpus; (2) hybrid sparse–dense retrieval infrastructure supporting surgical entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, autonomously designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich supporting passages. Applied to six tasks (blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene–disease associations, protein subcellular localization, and chemical reactions), Starling produces ~5 million records (per-task scale ranges from 131K to 3M); several of these are the largest public datasets we are aware of for the property in question. Frontier-model disagreement on our kept extractions is 0.6–4.7% across tasks, which is, surprisingly, substantially below the error rates we measure on the widely used manually curated counterparts (e.g., 16.5% on TDC BBB, 7% on TDC Oral Bioavailability). Beyond scale and accuracy, the attached supporting passages carry nuance that tabular databases discard: for example, oral bioavailability of a molecule might depend on whether the patient is fed or fasting. Together, the corpus, retrieval layer, and agent establish a foundation for multimodal predictive and generative models in AI-driven therapeutic design.
Bio. Jake is an assistant professor at the University of Pennsylvania in the Computer and Information Science department. His group's research focuses on applications of AI and machine learning to medicine and chemistry, including the development of new antibiotics, vaccines, antibodies, materials, and more.
Volodymyr Kuleshov (Cornell) / Discrete Diffusion Generative Models: The Next Frontier of Language and Biological Sequence Modeling
Abstract. Generative modeling of discrete data such as text or biological sequences is dominated today by autoregressive (AR) approaches. Our work introduces discrete diffusion models, which generate entire sequences in parallel, starting from noise (e.g., a random sequence) and iteratively refining it until it looks like data. Diffusion is not constrained to generate data sequentially, and can thus iteratively revise its own mistakes, leverage bidirectional context, output many tokens at once for faster sampling, and support powerful guidance mechanisms. Specifically, we introduce masked diffusion language models (MDLMs), which close the quality gap with AR models and serve as the basis of most of today's open source diffusion models for language. Combined with remasking and novel extensions of classifier-free and classifier-based guidance, MDLMs are also substantially more controllable than their AR counterparts.
The framework extends naturally beyond language to the sciences, where it underpins a new generation of Nucleotide Transformer foundation models: our largest 10B models achieve state-of-the-art results in genome annotation while also enabling effective generation of regulatory sequences. Together, these results suggest that discrete diffusion models are a promising path forward for generative modeling and its applications in language understanding and scientific discovery.
Bio. Volodymyr Kuleshov is the Joan Eliasoph, M.D. Assistant Professor at the Jacobs Technion-Cornell Institute at Cornell Tech and in the Computer Science Department at Cornell University. He obtained his Ph.D. in Computer Science from Stanford University, where he was the recipient of the Arthur Samuel Best Thesis Award. Kuleshov’s research interests are in the field of generative modeling and its applications in scientific discovery and health. His work has been featured in Nature Biotechnology, Nature Medicine, and Nature Communications, and has been recognized with an NSF CAREER award, an NIH MIRA award, and multiple industry awards. Kuleshov is also a co-founder of Inception AI, a startup developing the world's first diffusion language models.
Berthy Feng (MIT) / Imaging at the Edge of Science: Integrating Scientific Knowledge and AI to Recover Hidden Structure
Abstract. Images play a central role in scientific discovery. Whether the system is astronomical, biological, or material, bringing complex phenomena into view enables scientists to probe, model, and fundamentally understand them. However, many of the most important scientific questions lie at the edge of what can be directly observed. We can accomplish extreme imaging through computational methods, bringing the invisible into view by supplementing limited observable data with human-imposed assumptions, or priors. When imaging for science, the challenge is imposing just enough known assumptions to infer the unknown. I create principled methods for bringing advanced priors, such as scientific knowledge and AI, into computational imaging. Using astrophysics as a running example, this talk presents my vision for a framework in which scientists systematically explore different priors, understand their effects on imaging, and extract scientific insights. The talk is organized in three parts:
1. First, we understand the importance of priors in extreme scientific imaging. I present my work on leveraging generative AI to flexibly tune a knob between different priors and understand their effects on imaging. Applied to black-hole imaging, my approach lets us infer physical features of a real black hole by identifying image features that are robust to prior assumptions.
2. Second, we carefully balance scientific assumptions to solve an extreme imaging problem in astrophysics. I present Physics-informed Dynamic Emission Fields (PI-DEF), a method for imaging the dynamic 3D gas near a black hole. PI-DEF strikes a balance between known and unknown physics, imposing known physics as hard constraints on the solution while leaving room for learning unknown physics, such as the velocity field near the black hole.
3. Third, we open an efficient route for bringing in known physics across imaging problems. I present Neural Approximate Mirror Maps (NAMMs), which learn to automatically impose any desired physics constraint onto any image. With NAMMs, we can easily incorporate known constraints (e.g., conservation laws) into generated and reconstructed images.
The ideas of my talk naturally extend to many scientific domains, including biology, chemistry, and materials science.
Bio. Berthy Feng is a postdoctoral researcher at MIT CSAIL and a fellow at the NSF Institute for AI and Fundamental Interactions (IAIFI), working with Prof. Bill Freeman. She received her PhD in Computational and Mathematical Sciences at Caltech, working with Prof. Katie Bouman. During her PhD, she was supported by the NSF GRFP and Kortschak Scholarship. Before that, she received her Bachelor’s degree in Computer Science at Princeton University. She builds computational imaging algorithms that integrate physics knowledge and AI to push the limits of what we can see. |
| 12:10–1:30 | Lunch (catered) |
| 1:30–2:15 |
Yisong Yue (Caltech) / The Dark Knowledge of Science
Abstract. Scientific discovery depends on knowledge that is rarely observed directly. Some of it lives in human experts: tacit knowledge and experience that often never make it into papers. Other knowledge is embedded in high-dimensional observations: images, spectra, time series, video, and instrument outputs whose scientific meaning is not immediately visible. In both cases, the challenge is to surface hidden structure and turn it into knowledge that can guide future discovery. This talk will cover progress toward turning this hidden, or “dark,” knowledge into reusable artifacts. From high-dimensional observations, scientific foundation models can learn representations that reveal structure not apparent in the raw data, such as fine-grained behavior in video or latent variables behind indirect measurements. From human experts, AI systems can begin to recover the tacit expertise behind scientific judgment: what researchers try, what they reject, and how they revise their understanding. Taken together, these efforts point toward a broader ambition: AI systems that make the dark knowledge of science more visible, computable, and useful for discovery.
Bio. Yisong Yue is a Professor of Computing and Mathematical Sciences at the California Institute of Technology. His research centers on machine learning and artificial intelligence, with a focus on making AI work in high-stakes and high-expertise domains. His agenda spans both fundamental and applied work, from novel learning frameworks to deployment in autonomous driving on public roads. He was previously a research scientist at Disney Research and a postdoctoral researcher in the Machine Learning Department and iLab at Carnegie Mellon University. He received his Ph.D. from Cornell University and his B.S. from the University of Illinois at Urbana-Champaign.
He previously served as Senior Program Chair of ICLR 2024 and General Chair of ICLR 2025, and currently serves on the ICLR board. His work has received multiple paper awards and nominations across robotics, computer vision, sports analytics, machine learning for health, and information retrieval; during his time in industry, he worked on machine learning for behavior modeling and motion planning in autonomous driving. |
| 2:15–3:00 |
Fei Sha (Meta) / Advances in Probabilistic Generative Modeling for Scientific Machine Learning
Abstract. Leveraging large-scale data and computing accelerator systems, statistical learning has led to significant paradigm shifts in many scientific disciplines. Grand challenges in science have been tackled with exciting synergy between disciplinary science, physics-based simulations via high-performance computing, and powerful learning methods. In this talk, I will describe several vignettes of our research on modeling complex dynamical systems characterized by partial differential equations with turbulent solutions. I will also demonstrate how machine-learning technologies, especially advances in generative AI, are effectively applied to address the computational and modeling challenges in such systems, exemplified by their successful applications to weather forecasting and climate projection. I will also discuss the new challenges and opportunities that future machine-learning research faces.
Bio. Fei Sha is an AI Research Scientist at Meta. He is broadly interested in probabilistic modeling, uncertainty quantification, dynamical systems, and probabilistic reasoning in LLMs. Before joining Meta, he led a team of scientists and engineers at Google Research, working on various topics, including basic methods and technology for LLMs, probabilistic generative modeling, and their applications to dynamical systems (such as weather and climate). Before joining Google Research, he was a Professor of Computer Science and the Zohrab A. Kaprielian Fellow in Engineering at the University of Southern California (USC). He has been recognized with numerous awards and accolades for his innovative work, including being selected as an Alfred P. Sloan Research Fellow in 2013 and receiving an Army Research Office Young Investigator Award in 2012.
He has a PhD in Computer and Information Science from the University of Pennsylvania and BSc and MSc degrees from Southeast University (Nanjing, China). |
| 3:00–3:30 | Coffee break |
| 3:30–4:15 |
Peter Frazier (Cornell)
|
| 4:30–5:15 | Panel: Fei Sha, Kyunghyun Cho, Shirley Ho, Yisong Yue / Moderator: Peter Frazier |