Hardware-Efficient Attention for Fast Decoding

By Ted Zadouri, Hubert Strauss, and Tri Dao

LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decoding limits parallelism. We analyze the interplay among arithmetic intensity, parallelization, and model quality and question whether current architectures fully exploit modern hardware. This work redesigns attention to perform more computation per byte loaded from memory to maximize hardware efficiency without trading off parallel scalability. We first propose Grouped-Tied Attention (GTA), a simple variant that combines and reuses key and value states, reducing memory transfers without compromising model quality. We then introduce Grouped Latent Attention (GLA), a parallel-friendly latent attention paired with low-level optimizations for fast decoding while maintaining high model quality. Experiments show that GTA matches Grouped-Query Attention (GQA) quality while using roughly half the KV cache and that GLA matches Multi-head Latent Attention (MLA) and is easier to shard. Our optimized GLA kernel is up to 2x faster than FlashMLA, for example, in a speculative decoding setting when the query length exceeds one. Furthermore, by fetching a smaller KV cache per device, GLA reduces end-to-end latency and increases throughput in online serving benchmarks by up to 2x.
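The memory-transfer argument can be made concrete with a back-of-the-envelope sketch (illustrative only; the head counts, head dimension, and fp16 dtype below are assumptions, not the paper's configuration). Tying the key and value into a single shared state, as GTA does, halves the bytes fetched per decoded token relative to storing separate K and V:

```python
def kv_cache_bytes_per_token(n_kv_heads, head_dim, tied=False, dtype_bytes=2):
    """Bytes of KV cache loaded per decoded token for one layer.

    tied=False: separate key and value states (MHA/GQA-style).
    tied=True:  one state reused as both key and value (GTA-style).
    """
    n_states = 1 if tied else 2
    return n_states * n_kv_heads * head_dim * dtype_bytes

# Hypothetical configuration: 8 KV heads, head_dim 128, fp16 cache.
gqa_bytes = kv_cache_bytes_per_token(8, 128, tied=False)  # 4096 bytes
gta_bytes = kv_cache_bytes_per_token(8, 128, tied=True)   # 2048 bytes, ~half of GQA
```

Since decode latency at large batch/context is dominated by streaming this cache from high-bandwidth memory, halving the bytes per token directly raises arithmetic intensity (FLOPs per byte loaded).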

Read the paper: https://arxiv.org/abs/2505.21487

Posted in Uncategorized

Genome-wide mapping of mesoscale neuronal RNA organization and condensation

By Lindsay A. Becker, Sofia A. Quinodoz, Troy J. Comi, Ofer Kimchi, David A. Knowles, and Clifford P. Brangwynne

Subcellular RNA organization can affect critical cellular functions. However, our understanding of RNA microenvironments, particularly biomolecular condensates, remains limited, largely due to a lack of technologies to comprehensively interrogate mesoscale RNA organization. Here, we adapt Split-Pool Recognition of Interactions by Tag Extension to map micron-scale RNA-RNA spatial proximity genome-wide across cell regions (RNA-SPRITE). Deploying RNA-SPRITE, we find extensive, conserved organization of mature mRNAs, with increased colocalization between mRNAs that share RNA-binding protein (RBP) motifs or encode functionally related proteins. Both effects are especially strong in dendrites and axons, suggesting prevalent mRNA co-regulation. Moreover, mRNAs with less compact folding, lower translation efficiency, and specific RBP motifs are more likely to be in RNA-rich condensates. However, perturbations that broadly dissolve or enhance condensation reveal that RBP motif and encoded protein-mediated colocalizations largely remain intact, independent of condensation. These results demonstrate the power of RNA-SPRITE in revealing critical aspects of RNA’s functional organization.

In Brief Unbiased, genome-wide maps of RNA-RNA mesoscale spatial proximity uncover extensive subcellular organization and its governing principles.

Highlights

  • RNA-SPRITE reveals micron-scale RNA colocalization genome-wide across cell regions
  • mRNA colocalization specificity is driven by shared motifs and encoded protein function
  • mRNAs with less compact folding, lower translation efficiency, and distinct protein-binding motifs are more likely to be in condensates
  • Neurites have a particularly high degree of sequence and function-dependent mRNA organization

Read the paper: https://www.biorxiv.org/content/10.1101/2025.04.19.649570v1


Published in Journal of Data Mining & Digital Humanities: Machine transliteration of long text with error detection and correction

By Mohamed Abdellatif, Joel U. Bretheim, and Marina Rustow

Different writing systems have been used, historically and contemporarily, to write the same language. This is typically done by substituting letters (or symbols, in the case of non-alphanumeric systems). However, depending on the language and the writing systems involved, the process may not be purely deterministic. Quoting Becker and Becker [2000]:

    even such basic acts as transliteration involve interpretation – to the extent that there is meaning in the medium itself. […] In transliteration itself there is exuberance (that is, meaning is added) and deficiency (meaning is lost).

This gives significance to the problem of Machine Transliteration at the intersection of Digital Humanities and Natural Language Understanding. Transformer-based models have achieved success in modeling human languages. However, many of them are limited to inputs of at most 512 tokens. To reuse a pre-trained model with this limitation for downstream tasks (e.g., Machine Transliteration) on sequences longer than 512 tokens, we propose a method that segments the input into overlapping (not mutually exclusive) pieces, invokes the model piecewise, and assembles the result. To consolidate the result, we propose a method to detect and correct potential (duplication and elimination) errors, reducing Word Error Rate from 0.0985 to 0.0.
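The segmentation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `overlap` width is an assumption, and the duplication/elimination error correction applied during consolidation is not reproduced here.

```python
def segment_overlapping(tokens, max_len=512, overlap=64):
    """Split a token sequence into overlapping pieces of at most max_len,
    so a model limited to 512-token inputs can process each piece.
    Consecutive pieces share `overlap` tokens, giving the consolidation
    step common context for stitching the piecewise outputs together."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    return [tokens[i:i + max_len] for i in range(0, len(tokens), step)]

pieces = segment_overlapping(list(range(1000)))
# Consecutive pieces share the overlap region: pieces[0][-64:] == pieces[1][:64]
```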

Read the paper: https://zenodo.org/records/14982300


Published in Genome Biology: Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2

By Henri Schmidt, Minsi Zhang, Dimitar Chakarov, Vineet Bansal, Haralambos Mourelatos, Francisco J. Sánchez-Rivera, Scott W. Lowe, Andrea Ventura, Christina S. Leslie & Yuri Pritykin

We present GuideScan2 for memory-efficient, parallelizable construction of high-specificity CRISPR guide RNA (gRNA) databases and user-friendly design and analysis of individual gRNAs and gRNA libraries for targeting coding and non-coding regions in custom genomes. GuideScan2 analysis identifies widespread confounding effects of low-specificity gRNAs in published CRISPR screens and enables construction of a gRNA library that reduces off-target effects in a gene essentiality screen. GuideScan2 also enables the design and experimental validation of allele-specific gRNAs in a hybrid mouse genome. GuideScan2 will facilitate CRISPR experiments across a wide range of applications.

Read the paper: https://doi.org/10.1186/s13059-025-03488-8


Landscape of human protein-coding somatic mutations across tissues and individuals

By Huixin Xu, Rob Bierman, Dayna Akey, Cooper Koers, Troy Comi, Claire McWhite, and Joshua M. Akey

Although somatic mutations are fundamentally important to human biology, disease, and aging, many outstanding questions remain about their rates, spectrum, and determinants in apparently healthy tissues. Here, we performed high-coverage exome sequencing on 265 samples from 14 GTEx donors sampled for a median of 17.5 tissues per donor (spanning 46 total tissues). Using a novel probabilistic method tailored to the unique structure of our data, we identified 8,470 somatic variants. We leverage our compendium of somatic mutations to quantify the burden of deleterious somatic variants among tissues and individuals, identify molecular features such as chromatin accessibility that exhibit significantly elevated somatic mutation rates, provide novel biological insights into mutational mechanisms, and infer developmental trajectories based on patterns of multi-tissue somatic mosaicism. Our data provides a high-resolution portrait of somatic mutations across genes, tissues, and individuals.

Read the paper: https://www.biorxiv.org/content/10.1101/2025.01.07.631808v1


Radially patterned morphogenesis of murine hair follicle placodes ensures robust epithelial budding

By Leybova L, Biswas A, Sharan R, Trejo BM, Kim K, Soto-Muniz Y, Jones RA, Phillips BK, Devenport D.

The bending of simple cellular sheets into complex three-dimensional (3D) forms requires developmental patterning cues to specify where deformations occur, but how positional information directs morphological change is poorly understood. Here, we investigate how morphogen signaling and cell fate diversification contribute to the morphogenesis of murine hair placodes, in which collective cell movements transform radially symmetric primordia into bilaterally symmetric tubes. Through live imaging and 3D volumetric reconstructions, we demonstrate that Wnt and Shh establish radial patterns of cell fate, cell morphology, and movement within developing placodes. Cell fate diversity at different radial positions provides unique and essential contributions to placode morphogenesis. Further, we show that downstream of radial patterning, gradients of classical cadherin expression are required for efficient epithelial rearrangements. Given that the transformation of epithelial discs into 3D tubes is a common morphological motif used to shape diverse organ primordia, mechanisms of radially patterned morphogenesis are likely highly conserved across evolution.

Read the paper: https://doi.org/10.1016/j.devcel.2024.09.022


Nuclear instance segmentation and tracking for preimplantation mouse embryos

By Hayden Nunley, Binglun Shao, David Denberg, Prateek Grover, Jaspreet Singh, Maria Avdeeva, Bradley Joyce, Rebecca Kim-Yip, Abraham Kohrman, Abhishek Biswas, Aaron Watters, Zsombor Gal, Alison Kickuth, Madeleine Chalifoux, Stanislav Y. Shvartsman, Lisa M. Brown, Eszter Posfai

For investigations into fate specification and morphogenesis in time-lapse images of preimplantation embryos, automated 3D instance segmentation and tracking of nuclei are invaluable. Low signal-to-noise ratio, high voxel anisotropy, high nuclear density, and variable nuclear shapes can limit the performance of segmentation methods, while tracking is complicated by cell divisions, low frame rates, and sample movements. Supervised machine learning approaches can radically improve segmentation accuracy and enable easier tracking, but they often require large amounts of annotated 3D data. Here, we first report a new mouse line expressing the near-infrared nuclear reporter H2B-miRFP720. We then generate a dataset (termed BlastoSPIM) of 3D images of H2B-miRFP720-expressing embryos with ground truth for nuclear instances. Using BlastoSPIM, we benchmark seven convolutional neural networks and identify Stardist-3D as the most accurate instance segmentation method. With our BlastoSPIM-trained Stardist-3D models, we construct a complete pipeline for nuclear instance segmentation and lineage tracking from the eight-cell stage to the end of preimplantation development (>100 nuclei). Finally, we demonstrate the usefulness of BlastoSPIM as pre-training data for related problems, both for a different imaging modality and for different model systems.

Read the paper: https://doi.org/10.1242/dev.202817


US-RSE’24

The RSE Group was excited to send several members to US-RSE’24, the second annual US-RSE conference, held this year in Albuquerque. The conference theme was “Yesterday, Today, Tomorrow: A celebration of all that RSEs have done for computing in the past, in the present, and in the future.” Princeton University contributions included:

  1. Exploring the Potential Impact of Advancements in Artificial Intelligence on the RSE Profession – Sujay Suresh Kumar and David Luet (BoF)
  2. Navigating the Remote Landscape: Working Effectively with Stakeholders – Troy Comi (BoF)
  3. RSEs in domain-specific ecosystems – Julia Damerow, Rebecca S. Koeser, Laure Thompson, and Jeri E. Wieringa
  4. INnovative Training Enabled by a Research Software Engineering Community of Trainers (INTERSECT) – Jeffrey C. Carver and Ian A. Cosden (poster)
  5. Getting Scientist buy-in on best practices: A Case Study – Bob Caddy (RAM)
  6. The quirks of leading technical staff – Curt Hillegas (RAM)
  7. The Creation of an RSE Career Path at Princeton University – Ian A. Cosden, Joel Bretheim, David Luet, and Beth Holtz (talk)
  8. Establishing RSE Programs – From early stage formalization to mature models – Ian Cosden, Sandra Gesing and Adam Rubens (workshop)

It was wonderful spending time together out in the desert celebrating RSE accomplishments and getting inspired for the year ahead to continue our efforts in building robust and sustainable research software. Next year’s conference will be much closer to home… Philadelphia!


Published in eLife: Scale matters: Large language models with billions (rather than millions) of parameters better match neural representations of natural language

By Zhuoqiao Hong, Haocheng Wang, Zaid Zada, Harshvardhan Gazula, David Turner, Bobbi Abrey, Leonard Niekerken, Werner Doyle, Sasha Devore, Patricia Dugan, Daniel Friedman, Orin Devinsky, Adeen Flinker, Uri Hasson, Samuel A. Nastase, Ariel Goldstein

Recent research has used large language models (LLMs) to study the neural basis of naturalistic language processing in the human brain. LLMs have rapidly grown in complexity, leading to improved language processing capabilities. However, neuroscience research has not kept pace with this rapid progress in LLM development. Here, we utilized several families of transformer-based LLMs to investigate the relationship between model size and their ability to capture linguistic information in the human brain. Crucially, a subset of LLMs was trained on a fixed training set, enabling us to dissociate model size from architecture and training set size. We used electrocorticography (ECoG) to measure neural activity in epilepsy patients while they listened to a 30-minute naturalistic audio story. We fit electrode-wise encoding models using contextual embeddings extracted from each hidden layer of the LLMs to predict word-level neural signals. In line with prior work, we found that larger LLMs better capture the structure of natural language and better predict neural activity. We also found a log-linear relationship where the encoding performance peaks in relatively earlier layers as model size increases. We also observed variations in the best-performing layer across different brain regions, corresponding to an organized language processing hierarchy.
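Schematically, an encoding model of this kind fits a linear map from embedding features to each electrode's signal. The sketch below is a deliberately minimal single-feature version for illustration; the study itself uses full contextual embedding vectors and cross-validated regression, which this does not reproduce.

```python
def fit_encoding_model(x, y):
    """Ordinary least squares for one embedding feature x predicting one
    electrode's word-level signal y; returns (weight, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    w = sxy / sxx
    return w, my - w * mx

# Toy data where the signal is exactly linear in the feature: y = 2x + 1.
w, b = fit_encoding_model([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

In practice one such model is fit per electrode, and encoding performance is the correlation between predicted and held-out neural signals, evaluated layer by layer.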

Read the paper: https://doi.org/10.7554/eLife.101204.1


Published in Journal of Open-Source Software: SubsetTools: A Python package to subset data to build and run ParFlow hydrologic models

By Amanda K. Triplett, Georgios Artavanis, William M. Hasling, Reed M. Maxwell, Amy Defnet, Amy M. Johnson, William Lytle, Andrew Bennett, Elena Leonarduzzi, Lisa K. Gallagher, Laura E. Condon 

Hydrologic models are an integral part of understanding and managing water supply. There are countless hydrologic models available that differ in their complexity, scale and focus on different parts of the hydrologic cycle. ParFlow is a fully integrated, physics-based model that simulates surface and subsurface flow simultaneously (Ashby & Falgout, 1996; Jones & Woodward, 2001; Kollet & Maxwell, 2006; Maxwell, 2013). ParFlow is also coupled with a land surface model which allows it to simulate the full terrestrial hydrologic cycle from bedrock to treetops (Kollet & Maxwell, 2008; Maxwell & Miller, 2005). It has been applied to a myriad of watersheds across the US and around the world to answer questions of water supply and groundwater–surface water interactions.

ParFlow is a scientifically rigorous hydrologic model; however, its application by the broader community has been limited to a degree by its technical complexity which creates a high barrier to entry for new users. Intensive training and hydrologic expertise is required to appropriately build a ParFlow model from scratch.

SubsetTools is a Python package that seeks to lower the barrier to entry by allowing a user to subset published and verified ParFlow inputs and model configurations to build their own watershed models. These tools allow a user to set up and run a model in a matter of minutes, rather than weeks or months. SubsetTools is designed to interface with two domains covering the contiguous United States (CONUS), CONUS1 (Maxwell et al., 2015; O’Neill et al., 2021) and CONUS2 (Yang et al., 2023). These domains determine the structure and attributes of the hydrogeologic inputs used to build the ParFlow model. SubsetTools is the first package of its kind to fetch and process all necessary inputs and create a functional ParFlow model, all in a single workflow.
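At its core, subsetting means cutting a watershed-sized window out of a national gridded domain, which reduces to computing index bounds on the grid. The sketch below is illustrative only: the grid origin, cell size, and function name are assumptions for the example and are not the SubsetTools API.

```python
import math

def subset_window(grid_origin, cell_size, bbox):
    """Return (i0, i1, j0, j1) index bounds selecting the grid cells that
    cover a bounding box (xmin, ymin, xmax, ymax) given in the grid's
    projected coordinates; floor/ceil expand outward to full cells."""
    x0, y0 = grid_origin
    xmin, ymin, xmax, ymax = bbox
    i0 = math.floor((xmin - x0) / cell_size)
    i1 = math.ceil((xmax - x0) / cell_size)
    j0 = math.floor((ymin - y0) / cell_size)
    j1 = math.ceil((ymax - y0) / cell_size)
    return i0, i1, j0, j1

# 1 km cells with origin (0, 0): a 25 x 10 km box starting 5 km east, 2 km north.
window = subset_window((0.0, 0.0), 1000.0, (5000.0, 2000.0, 30000.0, 12000.0))
```

The same index window is then applied to every static and forcing input so the subset model stays consistent with the parent CONUS domain.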

Read the paper: https://doi.org/10.21105/joss.06752
