We’re thrilled to share that the third annual INTERSECT Research Software Engineering Bootcamp, held July 14-18, 2025 at Princeton University, concluded with great success! This immersive 4.5-day event brought together a vibrant cohort of intermediate research software developers from diverse domains, many of whom lacked formal computer science training.
Funded by a National Science Foundation (NSF) grant and organized in collaboration with Dr. Jeff Carver from the University of Alabama, the bootcamp focused on core Research Software Engineering (RSE) practices. Led by volunteer instructors from the broader RSE community, participants engaged in hands-on sessions covering:
Software Design
Collaborative Git & Pull Requests
Code Review
Licensing & Documentation
Testing & CI/CD
Packaging & Distribution
The energy and enthusiasm throughout the week were inspiring. Attendees not only sharpened their technical skills but also built lasting connections across institutions and disciplines. We’re proud to support the growth of the RSE community and grateful to everyone who made this event possible.
We’re excited to announce the publication of a new paper, Ten Simple Rules for Catalyzing Collaborations and Building Bridges between Research Software Engineers (RSEs) and Software Engineering Researchers (SERs), authored by Nasir Eisty, Jeffrey Carver, Johanna Cohoon, Ian Cosden, Carole Goble, and Samuel Grayson.
Published in IEEE Computing in Science & Engineering (CiSE), this work emerged from discussions at a Dagstuhl Seminar and addresses a critical but often overlooked opportunity in the research software ecosystem: fostering collaboration between RSEs and SERs.
While both communities share a passion for improving software in research, they often operate in distinct environments, with different vocabularies, incentives, and expectations. This paper offers ten actionable rules designed to bridge those gaps, encouraging meaningful, sustained partnerships that combine practical experience with theoretical insight.
By working together, RSEs and SERs can drive innovation in tools, practices, and infrastructure, ultimately advancing the quality and impact of scientific research.
ASPIRE is an open-source Python package for 3D molecular structure determination from cryo-EM data, with many submodules solving complex equations that could be accelerated on GPUs. ASPIRE developers have participated in GPU hackathons for the last four years, but new group members Josh and Chris attended for the first time this year. The problem Josh and Chris selected was spectral decomposition using the power method to determine the leading eigenvalue/eigenvector, a limiting step in ASPIRE’s computation pipeline. The problem was ideal: well scoped, already tested, and available in a range of problem sizes for scaling experiments on a GPU. The team primarily worked by pair programming, frequently meeting to share paired results and determine next steps.
Resolving Python Anti-Patterns
The initial implementation had a few “fast Python” anti-patterns. On the first day, we replaced for-loops and Python list comprehensions with NumPy operations for large speedups, even before moving to a GPU. We used Python’s standard library profiler, cProfile, to get an idea of which functions were slowing down the code, and visualized the profiler output using SnakeViz.
Figure 1: Python cProfile of the original implementation visualized with SnakeViz.
Moving to CuPy
We initially used drop-in replacements of NumPy calls with CuPy. This actually worsened performance compared to a single CPU core, likely because optimizations for the CPU do not necessarily translate to GPU speedups. We used NVIDIA’s Nsight Systems to generate a profile of the kernel launches generated by CuPy and to understand the memory transfers from host to device. We noticed that CuPy was generating hundreds of launches of a kernel called “gemmSN_NN_kernel”. Although each call performed a small operation, the launch overhead was as costly as the operation itself. In aggregate, these launches added up significantly, and, worse, the issue would scale with problem size. Further analysis showed that the large number of kernel launches generated by CuPy was due primarily to two operations. First, a 4D matrix multiply was decomposed into a series of 3D multiplications whose last dimensions were small. Flattening the matrix into 3D greatly reduced the number of kernel launches and increased the size of each operation. Second, a matrix conjugation was expressed as a direct translation of the underlying mathematical theory, as two matrix multiply operations. Careful inspection of the final result revealed that the costly multiplications could be replaced with a single element-wise multiplication, further reducing the number of kernel launches performed by CuPy.
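Both fixes can be sketched in NumPy with illustrative shapes and names (the same calls work under CuPy by swapping numpy for cupy; these are not ASPIRE's actual matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fix 1: a 4D multiply done as a loop of 3D multiplies with small last
# dimensions launches many tiny kernels under CuPy; flattening the batch
# dimensions yields one large batched multiply instead.
B1, B2, n, k = 8, 16, 32, 4
A = rng.random((B1, B2, n, k))
X = rng.random((B1, B2, k, k))
out_loop = np.stack([A[i] @ X[i] for i in range(B1)])
out_flat = (A.reshape(B1 * B2, n, k) @ X.reshape(B1 * B2, k, k)).reshape(B1, B2, n, k)
assert np.allclose(out_loop, out_flat)

# Fix 2: a conjugation by a diagonal matrix, J @ M @ J with J = diag(s),
# equals an element-wise multiply by the outer product s s^T, replacing
# two matrix multiplies with one element-wise product.
s = np.array([1.0, 1.0, -1.0])
M = rng.random((3, 3))
assert np.allclose(np.diag(s) @ M @ np.diag(s), np.outer(s, s) * M)
```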
Figure 2: Nvidia Nsight Systems breakdown of naive drop-in CuPy replacement showing costly matrix multiply kernel launches.
Randomizing Batch Operation
As part of refactoring, we had modified the index generator to produce subproblems in batches instead of individually. The batch size was derived from the problem size, and the indices were generated in sorted order. This caused a memory bottleneck during a final sum-reduce operation that was expressed as a weighted histogram function. The sorted indices were causing collisions in bins, leading to inefficient memory access due to the atomicAdd required when accumulating into one bin. We addressed this slowdown by randomizing the indices that are batched together to decrease the frequency of stalls during the reduce operation. We also identified that we could cache the indices for each iteration, further accelerating the code.
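The effect can be sketched with NumPy's np.add.at, the CPU analogue of the GPU's atomic accumulation (sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_items = 64, 100_000

# Sorted indices: each batch hammers a few neighboring bins, so on the GPU
# the atomicAdd updates collide and serialize.
indices = np.sort(rng.integers(0, n_bins, n_items))
weights = rng.random(n_items)

hist_sorted = np.zeros(n_bins)
np.add.at(hist_sorted, indices, weights)  # weighted histogram reduce

# Randomizing which indices are batched together spreads updates across
# bins, reducing collisions without changing the result.
perm = rng.permutation(n_items)
hist_random = np.zeros(n_bins)
np.add.at(hist_random, indices[perm], weights[perm])

assert np.allclose(hist_sorted, hist_random)
```

On the CPU the two orderings take similar time; the payoff of the permutation shows up in the GPU reduce, where fewer threads contend for the same bin.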
Figure 3: Histogram of hit counts for each iteration in the loop. (a) Indices generated in sorted order show each loop iteration updating a small number of positions, causing collisions. (b) Randomizing the indices that are batched together decreases the frequency of stalls during the reduce operation.
Host Memory Bug
As we ramped up the problem size, we noticed that we were running out of memory on the host machine after a few iterations. Since the code was performing all operations in place, the accumulating host memory usage was initially mysterious.
Figure 4: Increasing memory utilization in the host machine suggesting a memory leak.
We tracked the issue down to a temporary variable overwriting a variable used outside the loop scope in the outer iteration of the power method. In the code below, “vec” is initialized before the loop. On each iteration, “vec” is replaced with “vec_new”, and the old value of “vec” can be safely discarded. However, since “vec” is used after the loop, CuPy retained a copy of “vec” in host memory, causing copies of “vec” to accumulate.
# Power method iterations
for itr in range(max_iters):
    vec_new = signs_times_v(vijs, vec, conjugate, edge_signs)
    vec_new /= norm(vec_new)
    residual = norm(vec_new - vec)
    vec = vec_new
    if residual < epsilon:
        print(f'Converged after {itr} iterations of the power method.')
        break
else:
    print('max iterations')

# We need only the signs of the eigenvector
J_sync = np.sign(vec)
The simple solution was to initialize vec_new before the loop and use its value after the loop exits so that CuPy knows “vec” can be safely garbage collected in each iteration.
# Initialize vec_new to prevent blocking garbage collection of vec
vec_new = vec

# Power method iterations
for itr in range(max_iters):
    triplets_iter = all_triplets_batch(triplets, batch_size)
    vec_new = signs_times_v(vijs, vec, conjugate, edge_signs, triplets_iter)
    vec_new /= norm(vec_new)
    residual = norm(vec_new - vec)
    vec = vec_new
    if residual < epsilon:
        print(f'Converged after {itr} iterations of the power method.')
        break
else:
    print('max iterations')

# We need only the signs of the eigenvector
J_sync = np.sign(vec_new)
Final Speedup
We achieved significant speedup on both the CPU and the GPU. Our initial implementation would time out after 4 hours while processing a problem size of N=400. After all optimizations, we process a realistic problem size of N=1000 in 2 hours on the CPU and in just over 1 minute on the GPU. Problem sizes that previously needed to be partitioned can now be executed directly, and the implementation is expected to scale to larger problem sizes as well.
Figure 5: Performance scaling chart showing improvement in both the serial and GPU versions. The serial version is estimated to have a 30x speedup over the original version, and the GPU-accelerated version has a further 120x speedup over the serial version.
Takeaways
There were some key points that we feel were essential to achieving these improvements within the constraints of a 4-day hackathon.
Preparation
It was crucial that the target algorithm could be run and tested in isolation, away from the rest of the codebase. Our algorithm is an intermediate step within a large computational pipeline. It would be difficult to quickly test modifications and pinpoint optimizations in the full problem context, so the algorithm was broken out of the main code and put into a standalone script. We also wrote a script to generate sample inputs of any problem size, which could be created and saved to disk outside of the code to be optimized. The key here is to identify what external information your code needs to run in a realistic setting, and try to eliminate everything else. In addition, as a sanity check, we saved *outputs* of the original code for comparison, in case our modifications unintentionally broke the code.
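A sketch of that harness, where the input generator and the algorithm are stand-ins for ASPIRE's real ones:

```python
import numpy as np

def generate_inputs(n, seed=0):
    # Stand-in for the script that builds a problem of size n
    rng = np.random.default_rng(seed)
    return rng.random((n, n))

def algorithm(x):
    # Stand-in for the isolated step being optimized
    return x @ x.T

# Before the hackathon: save the original code's output as a reference.
x = generate_inputs(100)
np.save("reference_output.npy", algorithm(x))

# After each modification: verify nothing broke.
assert np.allclose(algorithm(x), np.load("reference_output.npy"))
```

The saved inputs let any team member reproduce a run of any size, and the saved reference output turns every optimization attempt into a quick regression check.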
Don’t prematurely optimize
A lot of code is not initially written to be maximally efficient; there is often a trade-off between readability and efficiency. Correctness and communicating the algorithm are vital when code is first written. Look for common improvements before attempting to profile the code on the GPU. For example, using cProfile we found bloated Python code that was easy to streamline without resorting to fancy solutions. Removing unnecessary list comprehensions and directly calculating indices instead of searching for them led to large speedups without sacrificing readability. Next, we optimized the code to run on just the CPU by translating verbose, multi-step calculations into terser, single calls to NumPy routines. As a bonus, code that primarily uses NumPy is easy to translate into CuPy. Finally, we moved on to the GPU, confident that we were entering this stage of the hackathon with robust, vetted code. There are options for when code starts to become cryptic. Commenting code that has become too opaque with the original, verbose version is an easy way to retain documentation. Moving the old version into a unit test is a more active way to ensure new versions continue to match legacy code while providing some documentation.
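A toy version of the "calculate indices instead of searching for them" fix (the names and data layout here are hypothetical, not ASPIRE's):

```python
import numpy as np

# Values laid out with a known, regular structure
items = np.arange(1000, 2000)
wanted = [1005, 1500, 1999]

# Verbose original: search the array for each value's position.
found = [int(np.where(items == w)[0][0]) for w in wanted]

# Because items is a contiguous range, each index can be computed
# directly in O(1) instead of scanned for in O(n).
direct = [w - 1000 for w in wanted]

assert found == direct
```

When the data has exploitable structure, replacing a search with arithmetic removes whole loops without making the code any harder to read.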
Working as a Group
Having two pair programming teams tackling a complicated piece of numerical code requires coordination. In addition to isolating all the code and data we needed to experiment, we put all of this into a git repository. With each pair working on an individual feature branch, along with asynchronously communicating via Slack, collisions and confusion were avoided. As mentioned above, optimization can make code more difficult to read, so verbal briefings of how and why changes were made kept everyone on the same page.
Intel Advisor is an optimization tool that helps developers identify hot spots and performance issues, and provides recommendations for performance improvement. It is installed on most of the Princeton Research Computing systems. Intel Advisor was previously part of the licensed Parallel Studio XE (PSXE) releases; it is now included in the Intel oneAPI Base Toolkit, which is free to download. In this article, we walk you through the process of collecting performance data remotely on the Princeton Research Computing clusters using the Intel Advisor command line interface (CLI) and displaying the results on a local macOS system using the Intel Advisor graphical user interface (GUI).
Preparing Applications for Performance Analysis
For C/C++ and Fortran code (on Linux), it is recommended to set up the following compiler flags before running the performance analysis:
Request full debug information (compiler and linker): -g
Request moderate optimization: -O2 or higher
Disable interprocedural optimization, which may prevent the profiler from collecting performance data: -no-ipo
Using Intel Advisor on Princeton Research Computing Clusters
Previously, we recommended using the CLI to collect data via batch jobs on compute nodes and then viewing the results with the GUI on a login node. Now that the Intel Advisor GUI is freely available on macOS, we recommend copying the collected data from the remote system to your local macOS machine for viewing. Note that Intel Advisor does not support data collection on macOS; you can only use macOS to display data collected on a Windows or Linux system.
Collecting Data at Remote System
Once on a remote system (e.g., TigerCPU, Adroit, etc.), start by loading the module, e.g., module load intel-advisor. Then you can collect the data using the Intel Advisor CLI, which is launched with the advisor command. You can use advisor --help to search for the command for a specific action. For example, after issuing the advisor --help command, you will see
Intel(R) Advisor Command Line Tool
Copyright (C) 2009-2020 Intel Corporation. All rights reserved.
Usage: advisor <--action> [--action-option] [--global-option] [[--] <target>
[target options]]
<action> is one of the following:
collect Run the specified analysis and collect data. Tip: Specify the search-dir when collecting data.
command Issue a command to a running collection.
create-project Create an empty project, if it does not already exist.
help Explain command line actions with corresponding options.
import-dir Import and finalize data collected on an MPI cluster.
mark-up-loops After running a Survey analysis and identifying loops of interest, select loops (by file and line number or criteria) for deeper analysis.
report Generate a report from data collected during a previous analysis.
snapshot Create a result snapshot.
version Display product version information.
workflow Explain typical Intel Advisor user scenarios, with corresponding command lines.
For help on a specific action, type: advisor --help <action>
Examples:
Perform a Survey analysis.
advisor --collect=survey --project-dir=./advi --search-dir src:r=./src -- ./bin/myApplication
Generate a Survey report.
advisor --report=survey --project-dir=./advi --search-dir src:r=./src
Display help for the collect action.
advisor --help collect
advisor --help collect shows you the command to perform a specific analysis. For example, to perform a Survey analysis to determine hotspots, we use a command like:
advisor --collect=survey --project-dir=./advi --search-dir src:r=./src -- ./bin/myApplication
To collect the roofline, you can run a Trip Counts analysis on top of the Survey analysis above, for example:
advisor --collect=tripcounts --flop --project-dir=./advi -- ./bin/myApplication
Note that the project directory needs to be the same for both analyses.
It is also helpful to use the GUI to find out the command. For example, you can:
Log in to a remote head node with ssh -Y username@adroit.princeton.edu
Load the module with module load intel-advisor
Launch the Intel Advisor GUI with advisor-gui
Create a project
Set up the project properties
Choose the appropriate analysis type
Click the Get Command Line button on the Workflow tab under the desired analysis
Copy the command line to clipboard to paste to the script for remote runs
To view the results, you can copy the whole project directory to your local macOS machine. It is also recommended to first pack the analysis results into a snapshot and then copy the packed *.advixeexpz file. For example:
advisor --snapshot --project-dir=./advi --pack --cache-sources --cache-binaries -- ./advi_snapshot
You can download Intel Advisor for macOS from the oneAPI Base Toolkit. After launching the Intel Advisor GUI, go to File > Open > Project/Result and navigate to the copied project directory or snapshot.
Intel Advisor GUI
This article covers the following new features in Intel Advisor version 2021.1:
Intel Advisor is included as part of the Intel oneAPI Base Toolkit.
The executables are renamed: advixe-cl is now advisor, and advixe-gui is now advisor-gui.
The roofline analysis is provided as a single command. In earlier versions, a roofline analysis was done by first running a Survey analysis followed by a Trip Counts analysis; now the roofline can be run in a single step using the --collect=roofline option.
As a researcher, inquiries about previously published research probably evoke one of two feelings: panic-filled regret or calm authority. Often the difference is time; it’s easier to talk about the project you worked on last week than last decade. Talking about old protocols or software is a lot like someone critically examining the finger painting you did as a child. You know it’s not perfect and you would do several things differently in hindsight, but it is the method in the public record. The rate of change seems faster with software development, where new technologies redefine best practices and standards at dizzying rates.
Perhaps the most challenging problem is when researchers outside of your institution fail to reproduce results. How can you troubleshoot the software on every system or determine what missing piece is required to get things working? What was that magic bash command you wrote 5 years ago?
Blocklint is a simple command line utility for finding non-inclusive wording with an emphasis on source code. If you’ve used a modern IDE, you know the importance of immediate feedback for compilation errors or even stylistic slip-ups. Knowing all variables should be declared or that lines must be less than 80 characters long is good, but adhering to those rules takes a back seat when in the flow of writing code. A linter brings these issues back into your consciousness by highlighting the problematic lines of code. Over time, the enforced style becomes more intuitive but the linter is always there to nudge you if you slip.
I started working as a research software engineer for the Princeton Neuroscience Institute (PNI) in May 2017. At the end of my first week I received an email from Professor Mala Murthy and post-doc David Deutsch of the MurthyLab, asking if I would be interested in working on a project involving a “virtual reality environment for neural recording experiments”. The kid in me got very excited at the prospect of making video games. At the time I did not know the project was to develop a virtual reality simulation for flies!
The FlyVR setup. Projection arena for visual stimuli not shown as it surrounds parts of the setup and occludes components. Projector for visual stimuli is directed at a curved mirror to project onto half dome surrounding fly.
Virtual reality experiments have a long history in neuroscience. They allow researchers to restrict the movement of animal subjects so that they can use advanced microscopy to image their brains in “naturalistic” environments. In the Murthy lab’s VR setup, the fly is fixed to the objective of a two-photon microscope and suspended above a small sphere floating on a column of air. This small sphere is used as a sort of omni-directional treadmill. While the fly cannot actually move, it can move its legs, which in turn move the freely rotating sphere. The movement of the sphere is tracked with computer vision algorithms, and thus a fictive path for the fly in a virtual world can be reconstructed. This setup allows a “moving” fly’s brain to be imaged with techniques that require it to be stationary. The two-photon imaging system then provides a very flexible and powerful tool for studying changes in the fly’s brain activity over the course of the experiment. Different spatial and temporal resolutions are available depending on the needs of the experimenter.
As I started using Snakemake, I had hundreds of jobs that I wanted performance information about. seff gives the efficiency information I wanted, but only for a single job at a time. sacct handles multiple jobs, but can’t give the efficiency. With the current Python implementation of reportseff, all job information is obtained from a single sacct call, and with click the output is colored so you can quickly see how things are running.
The foundational histogramming package for Python, boost-histogram, was just updated to version 0.6! This is a major update to the new Boost.Histogram bindings. Version 0.6.1 is based on the Histogram package from the recently released Boost C++ Libraries version 1.72.
This Python library is part of a larger picture in the Scikit-HEP ecosystem of tools for Particle Physics and is funded by DIANA/HEP and IRIS-HEP. It is the core library for making and manipulating histograms. Other packages are under development to provide a complete set of tools to work with and visualize histograms. The Aghast package is designed to convert between popular histogram formats, and the Hist package will be designed to make common analysis tasks simple, like plotting via tools such as the mplhep package. Hist and Aghast will initially be driven by HEP (High Energy and Particle Physics) needs, but outside issues and contributions are welcome and encouraged.
As with any substantial application package, the ASPIRE project needed a convenient way to specify configuration settings pertaining to different parts of the computational pipeline.
What follows below are some outlines from our attempts to tackle this configuration issue. Where a supplementary (and hopefully useful) nugget is provided, or a caveat discussed, I shall append a linked numeral, like so: (n)
A brief background of ASPIRE
ASPIRE is a Python (3.6) package under development which ingests micrographs, the output of cryo-electron microscopy (images that closely resemble television static), and produces a 3D reconstruction of the molecule. Read the excellent writeup on the ASPIRE page for a more comprehensive review of the package.