Published in Genome Biology: Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2

By Henri Schmidt, Minsi Zhang, Dimitar Chakarov, Vineet Bansal, Haralambos Mourelatos, Francisco J. Sánchez-Rivera, Scott W. Lowe, Andrea Ventura, Christina S. Leslie & Yuri Pritykin

We present GuideScan2 for memory-efficient, parallelizable construction of high-specificity CRISPR guide RNA (gRNA) databases and user-friendly design and analysis of individual gRNAs and gRNA libraries for targeting coding and non-coding regions in custom genomes. GuideScan2 analysis identifies widespread confounding effects of low-specificity gRNAs in published CRISPR screens and enables construction of a gRNA library that reduces off-target effects in a gene essentiality screen. GuideScan2 also enables the design and experimental validation of allele-specific gRNAs in a hybrid mouse genome. GuideScan2 will facilitate CRISPR experiments across a wide range of applications.

Read the paper: https://doi.org/10.1186/s13059-025-03488-8

Posted in Uncategorized

Landscape of human protein-coding somatic mutations across tissues and individuals

By Huixin Xu, Rob Bierman, Dayna Akey, Cooper Koers, Troy Comi, Claire McWhite, and Joshua M. Akey

Although somatic mutations are fundamentally important to human biology, disease, and aging, many outstanding questions remain about their rates, spectrum, and determinants in apparently healthy tissues. Here, we performed high-coverage exome sequencing on 265 samples from 14 GTEx donors sampled for a median of 17.5 tissues per donor (spanning 46 total tissues). Using a novel probabilistic method tailored to the unique structure of our data, we identified 8,470 somatic variants. We leverage our compendium of somatic mutations to quantify the burden of deleterious somatic variants among tissues and individuals, identify molecular features such as chromatin accessibility that exhibit significantly elevated somatic mutation rates, provide novel biological insights into mutational mechanisms, and infer developmental trajectories based on patterns of multi-tissue somatic mosaicism. Our data provides a high-resolution portrait of somatic mutations across genes, tissues, and individuals.

Read the paper: https://www.biorxiv.org/content/10.1101/2025.01.07.631808v1

Radially patterned morphogenesis of murine hair follicle placodes ensures robust epithelial budding

By Leybova L, Biswas A, Sharan R, Trejo BM, Kim K, Soto-Muniz Y, Jones RA, Phillips BK, Devenport D.

The bending of simple cellular sheets into complex three-dimensional (3D) forms requires developmental patterning cues to specify where deformations occur, but how positional information directs morphological change is poorly understood. Here, we investigate how morphogen signaling and cell fate diversification contribute to the morphogenesis of murine hair placodes, in which collective cell movements transform radially symmetric primordia into bilaterally symmetric tubes. Through live imaging and 3D volumetric reconstructions, we demonstrate that Wnt and Shh establish radial patterns of cell fate, cell morphology, and movement within developing placodes. Cell fate diversity at different radial positions provides unique and essential contributions to placode morphogenesis. Further, we show that downstream of radial patterning, gradients of classical cadherin expression are required for efficient epithelial rearrangements. Given that the transformation of epithelial discs into 3D tubes is a common morphological motif used to shape diverse organ primordia, mechanisms of radially patterned morphogenesis are likely highly conserved across evolution.

Read the paper: https://doi.org/10.1016/j.devcel.2024.09.022

Nuclear instance segmentation and tracking for preimplantation mouse embryos

By Hayden Nunley, Binglun Shao, David Denberg, Prateek Grover, Jaspreet Singh, Maria Avdeeva, Bradley Joyce, Rebecca Kim-Yip, Abraham Kohrman, Abhishek Biswas, Aaron Watters, Zsombor Gal, Alison Kickuth, Madeleine Chalifoux, Stanislav Y. Shvartsman, Lisa M. Brown, Eszter Posfai

For investigations into fate specification and morphogenesis in time-lapse images of preimplantation embryos, automated 3D instance segmentation and tracking of nuclei are invaluable. Low signal-to-noise ratio, high voxel anisotropy, high nuclear density, and variable nuclear shapes can limit the performance of segmentation methods, while tracking is complicated by cell divisions, low frame rates, and sample movements. Supervised machine learning approaches can radically improve segmentation accuracy and enable easier tracking, but they often require large amounts of annotated 3D data. Here, we first report a new mouse line expressing the near-infrared nuclear reporter H2B-miRFP720. We then generate a dataset (termed BlastoSPIM) of 3D images of H2B-miRFP720-expressing embryos with ground truth for nuclear instances. Using BlastoSPIM, we benchmark seven convolutional neural networks and identify Stardist-3D as the most accurate instance segmentation method. With our BlastoSPIM-trained Stardist-3D models, we construct a complete pipeline for nuclear instance segmentation and lineage tracking from the eight-cell stage to the end of preimplantation development (>100 nuclei). Finally, we demonstrate the usefulness of BlastoSPIM as pre-training data for related problems, both for a different imaging modality and for different model systems.

Read the paper: https://doi.org/10.1242/dev.202817

Published in eLife: Scale matters: Large language models with billions (rather than millions) of parameters better match neural representations of natural language

By Zhuoqiao Hong, Haocheng Wang, Zaid Zada, Harshvardhan Gazula, David Turner, Bobbi Abrey, Leonard Niekerken, Werner Doyle, Sasha Devore, Patricia Dugan, Daniel Friedman, Orin Devinsky, Adeen Flinker, Uri Hasson, Samuel A. Nastase, Ariel Goldstein

Recent research has used large language models (LLMs) to study the neural basis of naturalistic language processing in the human brain. LLMs have rapidly grown in complexity, leading to improved language processing capabilities. However, neuroscience researchers haven’t kept up with the quick progress in LLM development. Here, we utilized several families of transformer-based LLMs to investigate the relationship between model size and their ability to capture linguistic information in the human brain. Crucially, a subset of LLMs were trained on a fixed training set, enabling us to dissociate model size from architecture and training set size. We used electrocorticography (ECoG) to measure neural activity in epilepsy patients while they listened to a 30-minute naturalistic audio story. We fit electrode-wise encoding models using contextual embeddings extracted from each hidden layer of the LLMs to predict word-level neural signals. In line with prior work, we found that larger LLMs better capture the structure of natural language and better predict neural activity. We also found a log-linear relationship in which encoding performance peaks in relatively earlier layers as model size increases. Finally, we observed variations in the best-performing layer across different brain regions, corresponding to an organized language processing hierarchy.
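To make the idea of an electrode-wise encoding model concrete, here is a minimal sketch on synthetic data. It is not the authors' pipeline: the ridge penalty, the contiguous five-fold split, and the names `ridge_fit` and `encoding_scores` are all assumptions chosen for illustration. Each column of `neural` stands in for one electrode's word-level signal, and the score is the cross-validated correlation between predicted and observed values.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def encoding_scores(embeddings, neural, n_folds=5, alpha=1.0):
    """Cross-validated Pearson r between predicted and observed signal,
    computed separately for each electrode (column of `neural`).

    The fold scheme and penalty are illustrative assumptions.
    """
    n_words = embeddings.shape[0]
    folds = np.array_split(np.arange(n_words), n_folds)
    preds = np.zeros_like(neural, dtype=float)
    for test_idx in folds:
        train_mask = np.ones(n_words, dtype=bool)
        train_mask[test_idx] = False
        X_train, Y_train = embeddings[train_mask], neural[train_mask]
        # Center using training data only, so no test information leaks in.
        mu_x, mu_y = X_train.mean(axis=0), Y_train.mean(axis=0)
        W = ridge_fit(X_train - mu_x, Y_train - mu_y, alpha)
        preds[test_idx] = (embeddings[test_idx] - mu_x) @ W + mu_y
    # Pearson correlation per electrode over all held-out predictions.
    p = preds - preds.mean(axis=0)
    y = neural - neural.mean(axis=0)
    return (p * y).sum(axis=0) / np.sqrt((p**2).sum(axis=0) * (y**2).sum(axis=0))


# Synthetic example: electrode 0 is linearly driven by the embeddings,
# electrode 1 is pure noise.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((500, 10))
weights = rng.standard_normal(10)
signal = embeddings @ weights + 0.1 * rng.standard_normal(500)
noise = rng.standard_normal(500)
neural = np.column_stack([signal, noise])

scores = encoding_scores(embeddings, neural)
print(scores)
```

Under this setup, repeating the fit with embeddings from each hidden layer and comparing the resulting scores is one way to locate a best-performing layer per electrode.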

Read the paper: https://doi.org/10.7554/eLife.101204.1

Published in Journal of Open-Source Software: SubsetTools: A Python package to subset data to build and run ParFlow hydrologic models

By Amanda K. Triplett, Georgios Artavanis, William M. Hasling, Reed M. Maxwell, Amy Defnet, Amy M. Johnson, William Lytle, Andrew Bennett, Elena Leonarduzzi, Lisa K. Gallagher, Laura E. Condon 

Hydrologic models are an integral part of understanding and managing water supply. There are countless hydrologic models available that differ in their complexity, scale and focus on different parts of the hydrologic cycle. ParFlow is a fully integrated, physics-based model that simulates surface and subsurface flow simultaneously (Ashby & Falgout, 1996; Jones & Woodward, 2001; Kollet & Maxwell, 2006; Maxwell, 2013). ParFlow is also coupled with a land surface model which allows it to simulate the full terrestrial hydrologic cycle from bedrock to treetops (Kollet & Maxwell, 2008; Maxwell & Miller, 2005). It has been applied to a myriad of watersheds across the US and around the world to answer questions of water supply and groundwater–surface water interactions.

ParFlow is a scientifically rigorous hydrologic model; however, its application by the broader community has been limited to a degree by its technical complexity, which creates a high barrier to entry for new users. Intensive training and hydrologic expertise are required to appropriately build a ParFlow model from scratch.

SubsetTools is a Python package that seeks to lower the barrier to entry by allowing a user to subset published and verified ParFlow inputs and model configurations to build their own watershed models. These tools allow a user to set up and run a model in a matter of minutes, rather than weeks or months. SubsetTools is designed to interface with two domains covering the contiguous United States (CONUS), CONUS1 (Maxwell et al., 2015, 2015; O’Neill et al., 2021) and CONUS2 (Yang et al., 2023). These domains determine the structure and attributes of the hydrogeologic inputs used to build the ParFlow model. SubsetTools is the first package of its kind to fetch and process all necessary inputs and create a functional ParFlow model, all in a single workflow.

Read the paper: https://doi.org/10.21105/joss.06752

Published in Science: Recurrent gene flow between Neanderthals and modern humans over the past 200,000 years

By Liming Li, Troy J. Comi, Rob F. Bierman, and Joshua M. Akey

INTRODUCTION

For much of modern human history, we were only one of several different groups of hominins that existed. Studies of ancient and modern DNA have shown that admixture occurred multiple times among different hominin lineages, including between the ancestors of modern humans and Neanderthals. A number of methods have been developed to identify Neanderthal-introgressed sequences in the DNA of modern humans, which have provided insight into how admixture with Neanderthals shaped the biology and evolution of modern human genomes. Although gene flow from an early modern human population to Neanderthals has been described, the consequences of admixture on the Neanderthal genome have received comparatively less attention.

RATIONALE

A better understanding of how admixture with modern humans influenced patterns of Neanderthal genomic variation may provide insights into hominin evolutionary history. For example, DNA sequences inherited from modern human ancestors in Neanderthals can be used to test hypotheses on the frequency, magnitude, and timing of admixture and the population genetics characteristics of Neanderthals. Introgressed modern human sequences in Neanderthals can also be used to refine estimates of Neanderthal ancestry in contemporary individuals. We developed a simple framework to investigate introgressed human sequences in Neanderthals that is predicated on the expectation that sequences inherited from modern human ancestors would be, on average, more genetically diverse and would result in local increases in heterozygosity across the Neanderthal genome.
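The expectation of local increases in heterozygosity can be illustrated with a small sketch. This is not the authors' framework: the window size, the fold threshold, and the function names `windowed_heterozygosity` and `flag_elevated` are hypothetical, and the input is simply a list of 1-based positions of heterozygous sites on one chromosome.

```python
from collections import Counter


def windowed_heterozygosity(het_positions, chrom_length, window=50_000):
    """Fraction of heterozygous sites in consecutive fixed-size windows.

    Positions are 1-based; the final window may be shorter than `window`.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter((pos - 1) // window for pos in het_positions)
    rates = []
    for i in range(n_windows):
        start = i * window
        end = min(start + window, chrom_length)
        rates.append(counts.get(i, 0) / (end - start))
    return rates


def flag_elevated(het_rates, fold=3.0):
    """Indices of windows whose rate exceeds `fold` times the mean rate.

    The fold threshold is an illustrative assumption, not a published cutoff.
    """
    mean = sum(het_rates) / len(het_rates)
    if mean == 0:
        return []
    return [i for i, rate in enumerate(het_rates) if rate > fold * mean]


# Toy example: four heterozygous sites cluster in the first 100 bp window.
rates = windowed_heterozygosity([10, 20, 30, 40, 150], chrom_length=300, window=100)
elevated = flag_elevated(rates, fold=2.0)
print(rates, elevated)
```

Windows flagged this way would only be candidates; distinguishing introgression from other sources of elevated diversity requires the kind of statistical testing the paper describes.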

RESULTS

We first used a method we previously created called IBDmix to identify introgressed Neanderthal sequences in 2000 modern humans sequenced by the 1000 Genomes Project. We found that sequences identified by IBDmix as Neanderthal in African individuals from the 1000 Genomes Project are significantly enriched in regions of high heterozygosity in the Neanderthal genome, whereas no such enrichment is observed with sequences detected as introgressed in non-African individuals. We show that these patterns are caused by gene flow from modern humans to Neanderthals and estimate that the Vindija and Altai Neanderthal genomes have 53.9 Mb (2.5%) and 80.0 Mb (3.7%) of human-introgressed sequences, respectively. We leverage human-introgressed sequences in Neanderthals to revise estimates of the amount of Neanderthal-introgressed sequences in modern humans. Additionally, we show that human-introgressed sequences cause Neanderthal population size to be overestimated and that accounting for their effects decreases estimates of Neanderthal population size by ~20%. Finally, we found evidence for two distinct epochs of human gene flow into Neanderthals.

CONCLUSION

Our results provide insights into the history of admixture between modern humans and Neanderthals, show that gene flow had substantial impacts on patterns of modern human and Neanderthal genomic variation, and show that accounting for human-introgressed sequences in Neanderthals enables more-accurate inferences of admixture and its consequences in both Neanderthals and modern humans. More generally, the smaller estimated population size and inferred admixture dynamics are consistent with a Neanderthal population that was decreasing in size over time and was ultimately being absorbed into the modern human gene pool.

Read the paper: https://doi.org/10.1126/science.adi1768

Liver-specific Mettl14 deletion induces nuclear heterotypia and dysregulates RNA export machinery

By Berggren KA, Sinha S, Lin AE, Schwoerer MP, Maya S, Biswas A, Cafiero TR, Liu Y, Gertje HP, Suzuki S, Berneshawi AR, Carver S, Heller B, Hassan N, Ali Q, Beard D, Wang D, Cullen JM, Kleiner RE, Crossland NA, Schwartz RE, Ploss A.

Modification of RNA with N6-methyladenosine (m6A) has gained attention in recent years as a general mechanism of gene regulation. In the liver, m6A, along with its associated machinery, has been studied as a potential biomarker of disease and cancer, with impacts on metabolism, cell cycle regulation, and pro-cancer state signaling. However, these observational data have yet to be causally examined in vivo. For example, neither perturbation of the key m6A writers Mettl3 and Mettl14 nor perturbation of the m6A readers Ythdf1 and Ythdf2 has been characterized mechanistically in vivo as thoroughly as in vitro. To understand the functions of these machineries, we developed mouse models and found that deleting Mettl14 led to progressive liver injury characterized by nuclear heterotypia, with changes in mRNA splicing, processing and export leading to increases in mRNA surveillance and recycling.

Read the paper: https://doi.org/10.1101/2024.06.17.599413

Published in Genome Biology: HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data

By Matthew A. Myers, Brian J. Arnold, Vineet Bansal, Metin Balaban, Katelyn M. Mullen, Simone Zaccaria & Benjamin J. Raphael

Bulk DNA sequencing of multiple samples from the same tumor is becoming common, yet most methods to infer copy-number aberrations (CNAs) from this data analyze individual samples independently. We introduce HATCHet2, an algorithm to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 extends the earlier HATCHet method by improving identification of focal CNAs and introducing a novel statistic, the minor haplotype B-allele frequency (mhBAF), that enables identification of mirrored-subclonal CNAs. We demonstrate HATCHet2’s improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 10 prostate cancer patients reveals previously unreported mirrored-subclonal CNAs affecting cancer genes.

Read the paper: https://doi.org/10.1186/s13059-024-03267-x

Published in Journal of Open-Source Software: hf_hydrodata: A Python package for accessing hydrologic simulations and observations across the United States

By Amy Defnet, William Hasling, Laura Condon, Amy Johnson, Georgios Artavanis, Amanda Triplett, William Lytle, and Reed Maxwell

The field of hydrologic modeling, or modeling of the terrestrial hydrologic cycle, is very data intensive. Models require many inputs to define topography, geology and atmospheric conditions. Additionally, in situ observations such as streamflow rate and depth to groundwater can be used to evaluate model outputs and calibrate input parameters. There are many public organizations and research groups in the United States which produce and make freely available parts of this required data. However, the data have a wide range of spatiotemporal resolutions, file types, and methods of access. This makes finding and accessing all the data required for analysis a very time-consuming part of most hydrologic studies. The hf_hydrodata package is designed to simplify this data acquisition process by providing access to a broad array of variables, all of which have been pre-processed for consistency.

Read the paper: https://doi.org/10.21105/joss.06623
