Nuclear instance segmentation and tracking for preimplantation mouse embryos

By Hayden Nunley, Binglun Shao, David Denberg, Prateek Grover, Jaspreet Singh, Maria Avdeeva, Bradley Joyce, Rebecca Kim-Yip, Abraham Kohrman, Abhishek Biswas, Aaron Watters, Zsombor Gal, Alison Kickuth, Madeleine Chalifoux, Stanislav Y. Shvartsman, Lisa M. Brown, Eszter Posfai

For investigations into fate specification and morphogenesis in time-lapse images of preimplantation embryos, automated 3D instance segmentation and tracking of nuclei are invaluable. Low signal-to-noise ratio, high voxel anisotropy, high nuclear density, and variable nuclear shapes can limit the performance of segmentation methods, while tracking is complicated by cell divisions, low frame rates, and sample movements. Supervised machine learning approaches can radically improve segmentation accuracy and enable easier tracking, but they often require large amounts of annotated 3D data. Here, we first report a previously unreported mouse line expressing near-infrared nuclear reporter H2B-miRFP720. We then generate a dataset (termed BlastoSPIM) of 3D images of H2B-miRFP720-expressing embryos with ground truth for nuclear instances. Using BlastoSPIM, we benchmark seven convolutional neural networks and identify Stardist-3D as the most accurate instance segmentation method. With our BlastoSPIM-trained Stardist-3D models, we construct a complete pipeline for nuclear instance segmentation and lineage tracking from the eight-cell stage to the end of preimplantation development (>100 nuclei). Finally, we demonstrate the usefulness of BlastoSPIM as pre-train data for related problems, both for a different imaging modality and for different model systems.
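As a rough illustration of how predicted nuclear instances can be compared against ground truth when benchmarking segmentation methods, the sketch below computes intersection-over-union (IoU) between labeled 3D masks. It is not the paper's evaluation code: the function names, the toy volume, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import numpy as np

def instance_iou(gt, pred):
    """IoU between every ground-truth and predicted instance (label 0 = background)."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for a, i in enumerate(gt_ids):
        gt_mask = gt == i
        for b, j in enumerate(pred_ids):
            pred_mask = pred == j
            inter = np.logical_and(gt_mask, pred_mask).sum()
            union = np.logical_or(gt_mask, pred_mask).sum()
            iou[a, b] = inter / union
    return iou

def count_matches(gt, pred, threshold=0.5):
    """Count ground-truth nuclei whose best IoU exceeds the (assumed) threshold."""
    iou = instance_iou(gt, pred)
    if iou.size == 0:
        return 0
    return int((iou.max(axis=1) > threshold).sum())

# Toy 4x4x4 volume with two ground-truth "nuclei" of 8 voxels each.
gt = np.zeros((4, 4, 4), dtype=int)
gt[0:2, 0:2, 0:2] = 1
gt[2:4, 2:4, 2:4] = 2

# Prediction: first nucleus recovered exactly, second only half recovered.
pred = np.zeros((4, 4, 4), dtype=int)
pred[0:2, 0:2, 0:2] = 1
pred[2:4, 2:4, 3:4] = 2

iou = instance_iou(gt, pred)
n_matched = count_matches(gt, pred)
```

A strict threshold above 0.5 means each ground-truth nucleus can be matched by at most one prediction, which keeps the count unambiguous in this simple setting.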

Read the paper: https://doi.org/10.1242/dev.202817

Published in eLife: Scale matters: Large language models with billions (rather than millions) of parameters better match neural representations of natural language

By Zhuoqiao Hong, Haocheng Wang, Zaid Zada, Harshvardhan Gazula, David Turner, Bobbi Abrey, Leonard Niekerken, Werner Doyle, Sasha Devore, Patricia Dugan, Daniel Friedman, Orin Devinsky, Adeen Flinker, Uri Hasson, Samuel A. Nastase, Ariel Goldstein

Recent research has used large language models (LLMs) to study the neural basis of naturalistic language processing in the human brain. LLMs have rapidly grown in complexity, leading to improved language processing capabilities. However, neuroscience researchers haven’t kept up with the quick progress in LLM development. Here, we utilized several families of transformer-based LLMs to investigate the relationship between model size and their ability to capture linguistic information in the human brain. Crucially, a subset of LLMs were trained on a fixed training set, enabling us to dissociate model size from architecture and training set size. We used electrocorticography (ECoG) to measure neural activity in epilepsy patients while they listened to a 30-minute naturalistic audio story. We fit electrode-wise encoding models using contextual embeddings extracted from each hidden layer of the LLMs to predict word-level neural signals. In line with prior work, we found that larger LLMs better capture the structure of natural language and better predict neural activity. We also found a log-linear relationship where the encoding performance peaks in relatively earlier layers as model size increases. We also observed variations in the best-performing layer across different brain regions, corresponding to an organized language processing hierarchy.
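To make the idea of an electrode-wise encoding model concrete, here is a minimal sketch on synthetic data. It assumes ridge regression and Pearson correlation as the fit and score, which may differ from the paper's actual procedure. The array sizes, the `alpha` value, and the function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression weights, one column per electrode."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def encoding_score(X_train, y_train, X_test, y_test, alpha=1.0):
    """Fit on training words, return per-electrode correlation on held-out words."""
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean(axis=0)
    W = ridge_fit(X_train - mu_x, y_train - mu_y, alpha)
    pred = (X_test - mu_x) @ W + mu_y
    pred_c = pred - pred.mean(axis=0)
    y_c = y_test - y_test.mean(axis=0)
    return (pred_c * y_c).sum(axis=0) / np.sqrt(
        (pred_c ** 2).sum(axis=0) * (y_c ** 2).sum(axis=0)
    )

# Synthetic stand-ins: 200 "words", 16-dimensional embeddings, 3 "electrodes".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))          # contextual embeddings (one row per word)
W_true = rng.normal(size=(16, 3))       # hypothetical embedding-to-signal mapping
y = X @ W_true + 0.1 * rng.normal(size=(200, 3))  # word-level neural signals

scores = encoding_score(X[:150], y[:150], X[150:], y[150:])
```

Repeating this fit for the embeddings of each hidden layer, and comparing the resulting scores, is one way to ask which layer best predicts a given electrode.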

Read the paper: https://doi.org/10.7554/eLife.101204.1

Published in Journal of Open-Source Software: SubsetTools: A Python package to subset data to build and run ParFlow hydrologic models

By Amanda K. Triplett, Georgios Artavanis, William M. Hasling, Reed M. Maxwell, Amy Defnet, Amy M. Johnson, William Lytle, Andrew Bennett, Elena Leonarduzzi, Lisa K. Gallagher, Laura E. Condon 

Hydrologic models are an integral part of understanding and managing water supply. There are countless hydrologic models available that differ in their complexity, scale and focus on different parts of the hydrologic cycle. ParFlow is a fully integrated, physics-based model that simulates surface and subsurface flow simultaneously (Ashby & Falgout, 1996; Jones & Woodward, 2001; Kollet & Maxwell, 2006; Maxwell, 2013). ParFlow is also coupled with a land surface model which allows it to simulate the full terrestrial hydrologic cycle from bedrock to treetops (Kollet & Maxwell, 2008; Maxwell & Miller, 2005). It has been applied to a myriad of watersheds across the US and around the world to answer questions of water supply and groundwater–surface water interactions.

ParFlow is a scientifically rigorous hydrologic model; however, its application by the broader community has been limited to a degree by its technical complexity, which creates a high barrier to entry for new users. Intensive training and hydrologic expertise are required to appropriately build a ParFlow model from scratch.

SubsetTools is a Python package that seeks to lower the barrier to entry by allowing a user to subset published and verified ParFlow inputs and model configurations to build their own watershed models. These tools allow a user to set up and run a model in a matter of minutes, rather than weeks or months. SubsetTools is designed to interface with two domains covering the contiguous United States (CONUS), CONUS1 (Maxwell et al., 2015, 2015; O’Neill et al., 2021) and CONUS2 (Yang et al., 2023). These domains determine the structure and attributes of the hydrogeologic inputs used to build the ParFlow model. SubsetTools is the first package of its kind to fetch and process all necessary inputs and create a functional ParFlow model, all in a single workflow.

Read the paper: https://doi.org/10.21105/joss.06752

Published in Science: Recurrent gene flow between Neanderthals and modern humans over the past 200,000 years

By Liming Li, Troy J. Comi, Rob F. Bierman, and Joshua M. Akey

INTRODUCTION

For much of modern human history, we were only one of several different groups of hominins that existed. Studies of ancient and modern DNA have shown that admixture occurred multiple times among different hominin lineages, including between the ancestors of modern humans and Neanderthals. A number of methods have been developed to identify Neanderthal-introgressed sequences in the DNA of modern humans, which have provided insight into how admixture with Neanderthals shaped the biology and evolution of modern human genomes. Although gene flow from an early modern human population to Neanderthals has been described, the consequences of admixture on the Neanderthal genome have received comparatively less attention.

RATIONALE

A better understanding of how admixture with modern humans influenced patterns of Neanderthal genomic variation may provide insights into hominin evolutionary history. For example, DNA sequences inherited from modern human ancestors in Neanderthals can be used to test hypotheses on the frequency, magnitude, and timing of admixture and the population genetics characteristics of Neanderthals. Introgressed modern human sequences in Neanderthals can also be used to refine estimates of Neanderthal ancestry in contemporary individuals. We developed a simple framework to investigate introgressed human sequences in Neanderthals that is predicated on the expectation that sequences inherited from modern human ancestors would be, on average, more genetically diverse and would result in local increases in heterozygosity across the Neanderthal genome.
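The expectation of local increases in heterozygosity can be illustrated with a small windowed calculation. This is only a sketch of the general idea, not the authors' framework: the window size, the toy variant positions, and the mean-based cutoff are assumptions made for the example.

```python
import numpy as np

def windowed_heterozygosity(positions, is_het, window_size, seq_length):
    """Heterozygous sites per base pair in fixed, non-overlapping windows."""
    n_windows = int(np.ceil(seq_length / window_size))
    window_index = np.asarray(positions) // window_size
    het_counts = np.bincount(
        window_index,
        weights=np.asarray(is_het, dtype=float),
        minlength=n_windows,
    )
    # Simplification: every window is treated as full length.
    return het_counts / window_size

# Toy data: five called sites on a 300 bp sequence, four of them heterozygous.
positions = [10, 120, 130, 150, 260]
is_het = [1, 1, 1, 0, 1]

het = windowed_heterozygosity(positions, is_het, window_size=100, seq_length=300)

# Flag windows whose heterozygosity exceeds the sequence-wide mean (assumed cutoff).
elevated = het > het.mean()
```

In this toy case only the middle window is flagged, which is the kind of local excess the framework described above looks for at genome scale.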

RESULTS

We first used a method referred to as IBDmix to identify introgressed Neanderthal sequences in 2000 modern humans sequenced by the 1000 Genomes Project. We found that sequences identified by IBDmix as Neanderthal in African individuals from the 1000 Genomes Project are significantly enriched in regions of high heterozygosity in the Neanderthal genome, whereas no such enrichment is observed with sequences detected as introgressed in non-African individuals. We show that these patterns are caused by gene flow from modern humans to Neanderthals and estimate that the Vindija and Altai Neanderthal genomes have 53.9 Mb (2.5%) and 80.0 Mb (3.7%) of human-introgressed sequences, respectively. We leverage human-introgressed sequences in Neanderthals to revise estimates of the amount of Neanderthal-introgressed sequences in modern humans. Additionally, we show that human-introgressed sequences cause Neanderthal population size to be overestimated and that accounting for their effects decreases estimates of Neanderthal population size by ~20%. Finally, we found evidence for two distinct epochs of human gene flow into Neanderthals.

CONCLUSION

Our results provide insights into the history of admixture between modern humans and Neanderthals, show that gene flow had substantial impacts on patterns of modern human and Neanderthal genomic variation, and show that accounting for human-introgressed sequences in Neanderthals enables more-accurate inferences of admixture and its consequences in both Neanderthals and modern humans. More generally, the smaller estimated population size and inferred admixture dynamics are consistent with a Neanderthal population that was decreasing in size over time and was ultimately being absorbed into the modern human gene pool.

Read the paper: https://doi.org/10.1126/science.adi1768

Liver-specific Mettl14 deletion induces nuclear heterotypia and dysregulates RNA export machinery

By Berggren KA, Sinha S, Lin AE, Schwoerer MP, Maya S, Biswas A, Cafiero TR, Liu Y, Gertje HP, Suzuki S, Berneshawi AR, Carver S, Heller B, Hassan N, Ali Q, Beard D, Wang D, Cullen JM, Kleiner RE, Crossland NA, Schwartz RE, Ploss A.

Modification of RNA with N6-methyladenosine (m6A) has gained attention in recent years as a general mechanism of gene regulation. In the liver, m6A, along with its associated machinery, has been studied as a potential biomarker of disease and cancer, with impacts on metabolism, cell cycle regulation, and pro-cancer state signaling. However, these observational data have yet to be causally examined in vivo. For example, perturbations of the key m6A writers Mettl3 and Mettl14 and of the m6A readers Ythdf1 and Ythdf2 have not been characterized mechanistically in vivo as thoroughly as they have been in vitro. To understand the functions of these machineries, we developed mouse models and found that deleting Mettl14 led to progressive liver injury characterized by nuclear heterotypia, with changes in mRNA splicing, processing and export leading to increases in mRNA surveillance and recycling.

Read the paper: https://doi.org/10.1101/2024.06.17.599413

Published in Genome Biology: HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data

By Matthew A. Myers, Brian J. Arnold, Vineet Bansal, Metin Balaban, Katelyn M. Mullen, Simone Zaccaria & Benjamin J. Raphael

Bulk DNA sequencing of multiple samples from the same tumor is becoming common, yet most methods to infer copy-number aberrations (CNAs) from this data analyze individual samples independently. We introduce HATCHet2, an algorithm to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 extends the earlier HATCHet method by improving identification of focal CNAs and introducing a novel statistic, the minor haplotype B-allele frequency (mhBAF), that enables identification of mirrored-subclonal CNAs. We demonstrate HATCHet2’s improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 10 prostate cancer patients reveals previously unreported mirrored-subclonal CNAs affecting cancer genes.

Read the paper: https://doi.org/10.1186/s13059-024-03267-x

Published in Journal of Open-Source Software: hf_hydrodata: A Python package for accessing hydrologic simulations and observations across the United States

By Amy Defnet, William Hasling, Laura Condon, Amy Johnson, Georgios Artavanis, Amanda Triplett, William Lytle, and Reed Maxwell

The field of hydrologic modeling, or modeling of the terrestrial hydrologic cycle, is very data intensive. Models require many inputs to define topography, geology and atmospheric conditions. Additionally, in situ observations such as streamflow rate and depth to groundwater can be used to evaluate model outputs and calibrate input parameters. There are many public organizations and research groups in the United States which produce and make freely available parts of this required data. However, the data have a wide range of spatiotemporal resolutions, file types, and methods of access. This makes finding and accessing all the data required for analysis a very time-consuming part of most hydrologic studies. The hf_hydrodata package is designed to simplify this data acquisition process by providing access to a broad array of variables, all of which have been pre-processed for consistency.

Read the paper: https://doi.org/10.21105/joss.06623

Published in IEEE Transactions on Power Electronics: How MagNet: Machine Learning Framework for Modeling Power Magnetic Material Characteristics

By Haoran Li, Diego Serrano, Thomas Guillod, Shukai Wang, Evan Dogariu, Andrew Nadler, Min Luo, Vineet Bansal, Niraj K. Jha, Yuxin Chen, Charles R. Sullivan, and Minjie Chen

This article applies machine learning to power magnetics modeling. We first introduce an open-source database—MagNet—which hosts a large amount of experimentally measured excitation data for many materials across a variety of operating conditions, consisting of more than 500 000 data points in its current state. The processes for data acquisition and data quality control are explained. We then demonstrate a few neural network-based power magnetics modeling tools for modeling the core losses and B–H loops. The neural network allows multiple factors that may influence the magnetic characteristics to be modeled in a unified framework, where the nonlinear behaviors are captured with high accuracy and high generality. Neural network models are found to be effective in compressing the measurement data and predicting the material characteristics, paving the way for “neural networks as datasheets” to assist power magnetics design. Transfer learning is applied to the training of neural network models to further reduce the data size requirement while maintaining sufficient model accuracy.

Read the paper: https://doi.org/10.1109/TPEL.2023.3309232

Published in PEARC ’23: Jobstats: A Slurm-Compatible Job Monitoring Platform for CPU and GPU Clusters

By Josko Plazonic, Jonathan Halverson, and Troy Comi

Job monitoring on high-performance computing clusters is important for evaluating hardware performance, troubleshooting failed jobs, identifying inefficient jobs and more. The combination of the Prometheus monitoring framework and the Grafana visualization toolkit has proven successful in recent years. This work shows how four Prometheus exporters can be configured for a Slurm cluster to provide detailed job-level information on CPU/GPU efficiencies and CPU/GPU memory usage as well as node-level Network File System (NFS) statistics and cluster-level General Parallel File System (GPFS) activity. A novel approach was devised to efficiently store a summary of this data in the Slurm database for each completed job. The open-source job monitoring platform introduced here can be used for batch, interactive and Open OnDemand jobs. Several tools are presented that use the Prometheus and Slurm databases to create dashboards, utilization reports and alerts.

Read the paper: https://doi.org/10.1145/3569951.3604396

Region-specific reversal of epidermal planar polarity in the rosette fancy mouse

By Maureen Cetera, Rishabh Sharan, Gabriela Hayward-Lara, Brooke Phillips, Abhishek Biswas, Madalene Halley, Evalyn Beall, Bridgett vonHoldt, Danelle Devenport

The planar cell polarity (PCP) pathway collectively orients cells with respect to a body axis. Hair follicles of the murine epidermis provide a striking readout of PCP activity in their uniform alignment across the skin. Here, we characterize, from the molecular to tissue-scale, PCP establishment in the rosette fancy mouse, a natural variant with posterior-specific whorls in its fur, to understand how epidermal polarity is coordinated across the tissue. We find that rosette hair follicles emerge with reversed orientations specifically in the posterior region, creating a mirror image of epidermal polarity. The rosette trait is associated with a missense mutation in the core PCP gene Fzd6, which alters a consensus site for N-linked glycosylation, inhibiting its membrane localization. Unexpectedly, the Fzd6 trafficking defect does not block asymmetric localization of the other PCP proteins. Rather, the normally uniform axis of PCP asymmetry rotates where the PCP-directed cell movements that orient follicles are reversed, suggesting the PCP axis rotates 180°. Collectively, our multiscale analysis of epidermal polarity reveals PCP patterning can be regionally decoupled to produce posterior whorls in the rosette fancy mouse.

Read the paper: https://doi.org/10.1242/dev.202078
