Publications

The following publications provide scientific background on, and new results arising from, the Recon4IMD project.

De novo and inherited dominant variants in U4 and U6 snRNAs cause retinitis pigmentosa

The U4 small nuclear RNA (snRNA) forms a duplex with the U6 snRNA and, together with U5 and ∼30 proteins, is part of the U4/U6.U5 tri-snRNP complex, located at the core of the major spliceosome. Recently, recurrent de novo variants in the U4 RNA, transcribed from the RNU4-2 gene, and in at least two other RNU genes were discovered to cause a neurodevelopmental disorder. We detected inherited and de novo heterozygous variants in RNU4-2 (n.18_19insA and n.56T>C) and in four of the five RNU6 paralogues (n.55_56insG and n.56_57insG) in 135 individuals from 62 families with non-syndromic retinitis pigmentosa (RP), a rare form of hereditary blindness. We show that these variants are recurrent among RP families and invariably cluster in close proximity within the three-way junction (between stem-I, the 5’ stem-loop and stem-II) of the U4/U6 duplex, affecting its natural conformation. Interestingly, this region binds numerous splicing factors of the tri-snRNP complex, including PRPF3, PRPF8 and PRPF31, which have also been associated with RP. The U4 and U6 variants identified seem to affect snRNP biogenesis, specifically formation of the U4/U6 di-snRNP, an assembly intermediate of the tri-snRNP.

Bi-allelic KICS2 mutations impair KICSTOR complex-mediated mTORC1 regulation, causing intellectual disability and epilepsy

Nutrient-dependent mTORC1 regulation upon amino acid deprivation is mediated by the KICSTOR complex, comprising SZT2, KPTN, ITFG2 and KICS2, which recruits GATOR1 to lysosomes. Previously, pathogenic SZT2 and KPTN variants have been associated with autosomal recessive intellectual disability and epileptic encephalopathy. We identified bi-allelic KICS2 variants in eleven affected individuals presenting with intellectual disability and epilepsy. These variants partly affected KICS2 stability, compromised KICSTOR complex formation, and had a deleterious impact on nutrient-dependent mTORC1 regulation of 4EBP1 and S6K. Phosphoproteome analyses extended these findings to show that KICS2 variants changed the mTORC1 proteome, affecting proteins that function in translation, splicing and ciliogenesis. Depletion of Kics2 in zebrafish resulted in ciliary dysfunction consistent with a role of mTORC1 in cilia biology. These in vitro and in vivo functional studies confirmed the pathogenicity of the identified KICS2 variants. Our genetic and experimental data provide evidence that KICS2 variants contribute to intellectual disability because KICS2 dysfunction impairs mTORC1 regulation and cilia biology.

Genomic reanalysis of a pan-European rare-disease resource yields new diagnoses

Genetic diagnosis of rare diseases requires accurate identification and interpretation of genomic variants. Clinical and molecular scientists from 37 expert centers across Europe created the Solve-Rare Diseases Consortium (Solve-RD) resource, encompassing clinical, pedigree and genomic rare-disease data (94.5% exomes, 5.5% genomes), and performed systematic reanalysis for 6,447 individuals (3,592 male, 2,855 female) with previously undiagnosed rare diseases from 6,004 families. We established a collaborative, two-level expert review infrastructure that allowed a genetic diagnosis in 506 (8.4%) families. Of 552 disease-causing variants identified, 464 (84.1%) were single-nucleotide variants or short insertions/deletions. 

Dominant variants in major spliceosome U4 and U5 small nuclear RNA genes cause neurodevelopmental disorders through splicing disruption

The major spliceosome contains five small nuclear RNAs (snRNAs; U1, U2, U4, U5 and U6) essential for splicing. Variants in RNU4-2, encoding U4, cause a neurodevelopmental disorder called ReNU syndrome. We investigated de novo variants in 50 snRNA-encoding genes in a French cohort of 23,649 individuals with rare disorders and gathered additional cases through international collaborations. Altogether, we identified 145 previously unreported probands with (likely) pathogenic variants in RNU4-2 and 21 individuals with de novo and/or recurrent variants in RNU5B-1 and RNU5A-1, encoding U5. Pathogenic variants typically arise de novo on the maternal allele and cluster in regions critical for splicing.

Long-Read Sequencing Identifies Mosaic Sequence Variations in Friedreich’s Ataxia-GAA Repeats

Friedreich’s ataxia (FRDA) is an autosomal recessive neurodegenerative disorder characterized by ataxia, sensory loss and pyramidal signs. While most FRDA cases are caused by biallelic GAA trinucleotide repeat expansions in intron 1 of FXN, a subset of patients are compound heterozygous for a GAA repeat expansion and a pathogenic small variant. We report on the diagnostic journey of a 21-year-old patient who was clinically suspected of having FRDA at the age of 12 years. Genetic testing included fragment analysis, gene panel analysis and exome sequencing, which detected only one pathogenic heterozygous missense variant (c.389G>T, p.Gly130Val) in FXN. Although conventional repeat analyses failed to detect GAA expansions in our patient, subsequent short-read genome sequencing (GS) indicated a potential GAA repeat expansion. This finding was confirmed by long-read GS, which additionally revealed a complex pattern of interruptions. Both large and small GAA expansions with divergent interruptions containing G, A, GA, GAG and/or GAAG sequences were present within one allele, indicating mosaic sequence variations.
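
The sketch below makes "interrupted repeats" concrete. It splits a repeat tract into runs of consecutive GAA units and the non-GAA segments between them. It is an illustrative assumption, not the authors' analysis. The function `segment_gaa_tract`, the minimum run length and the example read are all hypothetical. Real long-read repeat analysis must also handle sequencing errors, alignment and strand orientation.

```python
import re


def segment_gaa_tract(seq: str, min_units: int = 2) -> list[tuple[str, str]]:
    """Split a repeat tract into ("GAA", run) and ("interruption", seq) segments.

    Hypothetical helper: a run of at least `min_units` consecutive GAA units is
    reported as a repeat segment; anything between such runs is an interruption.
    """
    seq = seq.upper()
    pattern = re.compile(f"(?:GAA){{{min_units},}}")  # e.g. "(?:GAA){2,}"
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in pattern.finditer(seq):
        if match.start() > pos:
            segments.append(("interruption", seq[pos:match.start()]))
        segments.append(("GAA", match.group()))
        pos = match.end()
    if pos < len(seq):
        segments.append(("interruption", seq[pos:]))
    return segments


if __name__ == "__main__":
    for kind, piece in segment_gaa_tract("GAAGAAGAAGGAAGAAGAGGAAGAAGAA"):
        print(kind, piece)
```

For the hypothetical read `GAAGAAGAAGGAAGAAGAGGAAGAAGAA`, the function returns three GAA runs separated by the interruptions `G` and `GAG`. The regular expression matches from the left. For an ambiguous stretch, the reading frame it picks is therefore only one of several possible phasings.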

Constraint-based modeling of bioenergetic differences between synaptic and non-synaptic components of dopaminergic neurons in Parkinson’s disease

Emerging evidence suggests that different metabolic characteristics, particularly bioenergetic differences, between the synaptic terminal and soma may contribute to the selective vulnerability of dopaminergic neurons in patients with Parkinson’s disease (PD).

To investigate the metabolic differences, we generated four thermodynamically flux-consistent metabolic models representing the synaptic and non-synaptic (somatic) components under both control and PD conditions. Differences in bioenergetic features and metabolite exchanges were analyzed between these models to explore potential mechanisms underlying the selective vulnerability of dopaminergic neurons. Bioenergetic rescue analyses were performed to identify potential therapeutic targets for mitigating observed energy failure and metabolic dysfunction in PD models.

All models predicted that oxidative phosphorylation plays a significant role under lower energy demand, while glycolysis predominates when energy demand exceeds mitochondrial constraints. 
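
The demand-dependent switch can be illustrated with a deliberately tiny linear program that is solvable by hand. It has two hypothetical ATP-producing routes per glucose and a capacity cap on the mitochondrial route. The objective is to minimise glucose uptake for a given ATP demand. The yields (2 and 32 ATP per glucose) and the capacity value are illustrative assumptions. The authors' models are thermodynamically flux-consistent network models, and this sketch does not attempt to reproduce them.

```python
from dataclasses import dataclass

# Illustrative ATP yields per glucose; assumptions, not model parameters.
ATP_GLYCOLYSIS = 2.0   # glucose -> lactate
ATP_OXIDATIVE = 32.0   # complete oxidation through OXPHOS


@dataclass
class GlucoseSplit:
    glycolytic: float  # glucose uptake routed to lactate
    oxidative: float   # glucose uptake fully oxidised

    def oxidative_atp_share(self) -> float:
        """Fraction of the ATP supply that comes from the oxidative route."""
        atp_ox = self.oxidative * ATP_OXIDATIVE
        atp_gly = self.glycolytic * ATP_GLYCOLYSIS
        total = atp_ox + atp_gly
        return atp_ox / total if total > 0 else 0.0


def min_glucose_split(atp_demand: float, mito_capacity: float) -> GlucoseSplit:
    """Minimise g_gly + g_ox subject to
    ATP_GLYCOLYSIS * g_gly + ATP_OXIDATIVE * g_ox = atp_demand,
    0 <= g_ox <= mito_capacity and g_gly >= 0.

    Substituting g_gly from the demand constraint shows that total uptake
    falls as g_ox rises (oxidation yields more ATP per glucose). The optimum
    therefore uses the oxidative route up to its capacity and covers the
    remainder by glycolysis.
    """
    if atp_demand < 0 or mito_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    g_ox = min(mito_capacity, atp_demand / ATP_OXIDATIVE)
    g_gly = (atp_demand - g_ox * ATP_OXIDATIVE) / ATP_GLYCOLYSIS
    return GlucoseSplit(glycolytic=g_gly, oxidative=g_ox)


if __name__ == "__main__":
    for demand in (16.0, 32.0, 64.0, 128.0):
        split = min_glucose_split(demand, mito_capacity=1.0)
        print(demand, split.oxidative_atp_share())
```

With a capacity of 1.0, the oxidative share of ATP is 1.0 at demands of 16 and 32. It drops to 0.5 at 64 and 0.25 at 128, which qualitatively mirrors the switch described above. A network-scale model poses the same kind of question as a linear program over the full stoichiometric network instead of solving it by hand.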

EEFSEC deficiency: A selenopathy with early-onset neurodegeneration

Inborn errors of selenoprotein expression arise from deleterious variants in genes encoding selenoproteins or selenoprotein biosynthetic factors, some of which are associated with neurodegenerative disorders. This study shows that bi-allelic selenocysteine tRNA-specific eukaryotic elongation factor (EEFSEC) variants cause selenoprotein deficiency, leading to progressive neurodegeneration. EEFSEC deficiency, an autosomal recessive disorder, manifests with global developmental delay, progressive spasticity, ataxia, and seizures. Cerebral MRI primarily demonstrated a cerebellar pathology, including hypoplasia and progressive atrophy. Exome or genome sequencing identified six different bi-allelic EEFSEC variants in nine individuals from eight unrelated families. 

Coupling metabolomics and exome sequencing reveals graded effects of rare damaging heterozygous variants on gene function and human traits

Genetic studies of the metabolome can uncover enzymatic and transport processes shaping human metabolism. Using rare variant aggregation testing based on whole-exome sequencing data to detect genes associated with levels of 1,294 plasma and 1,396 urine metabolites, we discovered 235 gene–metabolite associations, many previously unreported. Complementary genetic, computational (in silico gene knockouts in whole-body models of human metabolism) and experimental (one proof of principle) approaches provided orthogonal evidence that studies of rare, damaging variants in the heterozygous state permit inferences concordant with those from inborn errors of metabolism.

Dynamic whole-body models for infant metabolism

Comprehensive, sex-specific whole-body models (WBMs) accounting for organ-specific metabolism have been developed to allow for the simulation of adult and infant metabolism. These WBMs are evaluated over a one-day time frame, giving insights into the metabolic flux changes that occur in one day of an infant’s or adult’s life. However, for medical applications, such as metabolic diseases and their treatment, evaluations and concentration predictions on a shorter time scale would be beneficial. Therefore, we developed a dynamic infant-WBM that couples metabolite dynamics over short time frames, through physiology-based pharmacokinetic models, with the existing infant whole-body models. We then tailored the dynamic infant-WBM to predict isovalerylcarnitine (C5), a clinical biomarker for the inherited metabolic disease isovaleric aciduria (IVA). As expected, the predicted C5 concentrations exceeded the newborn screening thresholds in the IVA models, but not in models simulating healthy infants, during the period (36-72 hours) in which newborn screening blood samples are taken. We also demonstrate how the dynamic infant-WBMs can be used to test the effect of changes in dietary intake on the biomarker. Since the dynamic infant-WBMs were parametrised with literature-derived experimental or estimated values, we show how uncertainty quantification can be applied to quantify the parameter uncertainties. We found that the fraction unbound in plasma needed to be estimated correctly, as this parameter strongly impacted the C5 concentration predictions of the dynamic infant-WBMs.
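
Why the unbound fraction matters can be illustrated with a deliberately simple one-compartment model. This model is an assumption and is far simpler than the physiology-based pharmacokinetic models coupled to the infant-WBMs. A hypothetical metabolite is produced at a constant rate and eliminated only in its unbound form, so clearance scales with the unbound fraction `fu`. The model is integrated with explicit Euler steps, and `fu` is sampled uniformly to show how its uncertainty propagates to the concentration at 48 h. All function names and parameter values are placeholders.

```python
import random


def simulate_concentration(fu: float, production_rate: float = 1.0,
                           intrinsic_clearance: float = 2.0, volume: float = 1.0,
                           hours: float = 48.0, dt: float = 0.05) -> float:
    """Explicit-Euler solution of dC/dt = (k_in - fu * CL_int * C) / V at t = hours.

    Hypothetical one-compartment model starting from C = 0: the metabolite is
    produced at a constant rate and only its unbound fraction `fu` is cleared.
    Units are arbitrary placeholders.
    """
    if not 0.0 < fu <= 1.0:
        raise ValueError("fu must lie in (0, 1]")
    c = 0.0
    for _ in range(int(round(hours / dt))):
        dcdt = (production_rate - fu * intrinsic_clearance * c) / volume
        c += dt * dcdt
    return c


def propagate_fu_uncertainty(fu_low: float, fu_high: float,
                             n_samples: int = 200, seed: int = 0) -> list[float]:
    """Monte Carlo: sample fu uniformly in [fu_low, fu_high] and simulate each."""
    rng = random.Random(seed)
    return [simulate_concentration(rng.uniform(fu_low, fu_high))
            for _ in range(n_samples)]


if __name__ == "__main__":
    samples = sorted(propagate_fu_uncertainty(0.2, 0.8))
    print(f"C(48 h) ranges from {samples[0]:.3f} to {samples[-1]:.3f}")
```

In this toy model the steady-state level is k_in / (fu · CL_int). Halving `fu` therefore doubles the predicted concentration, so an inaccurate unbound fraction propagates directly into the concentration prediction.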

Personalised metabolic whole-body models for newborns and infants predict growth and biomarkers of inherited metabolic diseases

Extensive whole-body models (WBMs) accounting for organ-specific dynamics have been developed to simulate adult metabolism. However, there is currently a lack of models representing infant metabolism that take into consideration its special requirements in energy balance, nutrition and growth. Here, we present a resource of organ-resolved, sex-specific, anatomically accurate models of newborn and infant metabolism, referred to as infant-whole-body models (infant-WBMs), spanning the first 180 days of life. These infant-WBMs were parameterised to represent the distinct metabolic characteristics of newborns and infants accurately. In particular, we accounted for changes in organ weights, the energy requirements of brain development, heart function and thermoregulation, as well as dietary requirements and energy requirements for physical activity. Subsequently, we validated the accuracy of the infant-WBMs by showing that the predicted neonatal and infant growth was consistent with the growth recommended by the World Health Organization. We assessed the infant-WBMs’ reliability and capabilities for personalisation by simulating 10,000 newborn models, personalised with blood concentration measurements from newborn screening and birth weight. Moreover, we demonstrate that the models can accurately predict changes over time in known blood biomarkers of inherited metabolic diseases.

EnzChemRED, a rich enzyme chemistry relation extraction dataset

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI. We show that fine-tuning language models with EnzChemRED significantly boosts their ability to identify proteins and chemicals in text (86.30% F1 score) and to extract the chemical conversions (86.66% F1 score) and the enzymes that catalyze those conversions (83.79% F1 score). We apply our methods to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.
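
The F1 scores above are the harmonic mean of precision and recall. The sketch below computes all three from sets of predicted and gold annotations. It assumes exact-match evaluation of (substrate, product) conversion pairs, which may differ from the paper's matching criteria. The identifiers in the example are placeholders, not real ChEBI entries.

```python
Conversion = tuple[str, str]  # (substrate identifier, product identifier)


def precision_recall_f1(predicted: set[Conversion],
                        gold: set[Conversion]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 between predicted and gold sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("CHEBI:A", "CHEBI:B"), ("CHEBI:B", "CHEBI:C"), ("CHEBI:C", "CHEBI:D")}
    predicted = {("CHEBI:A", "CHEBI:B"), ("CHEBI:B", "CHEBI:C"), ("CHEBI:X", "CHEBI:Y")}
    print(precision_recall_f1(predicted, gold))
```

In the toy example, two of three predictions are correct and two of three gold conversions are found. Precision, recall and F1 are therefore all 2/3.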

Integration of proteomic data with genome-scale metabolic models: A methodological overview

The integration of proteomics data with constraint-based reconstruction and analysis (COBRA) models plays a pivotal role in understanding the relationship between genotype and phenotype and bridges the gap between genome-level phenomena and functional adaptations. Integrating a generic genome-scale model with information on proteins enables the generation of a context-specific metabolic model, which improves the accuracy of model predictions. This review explores methodologies for incorporating proteomics data into genome-scale models. Available methods are grouped into four distinct categories based on how they integrate proteomics data and on their depth of modeling. Within each category, methods are introduced in chronological order of publication to show how the field has progressed. Furthermore, challenges and potential solutions to further progress are outlined, including the limited availability of appropriate in vitro data and experimental enzyme turnover rates, and the trade-off between model accuracy, computational tractability and data scarcity.
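
A simple way to turn protein abundances into model constraints is to cap each reaction's flux at its enzyme capacity, |v| ≤ kcat · [E]. This is used here purely as an illustration, not as a summary of any specific reviewed method. The sketch below applies that cap to existing reaction bounds. The reaction names, kcat values, unit conventions and the function `apply_enzyme_caps` are hypothetical. Published methods, for example those that handle isoenzymes, complexes or a shared enzyme budget, are considerably more involved.

```python
SECONDS_PER_HOUR = 3600.0

Bounds = dict[str, tuple[float, float]]


def apply_enzyme_caps(bounds: Bounds, kcat_per_s: dict[str, float],
                      enzyme_mmol_per_gdw: dict[str, float]) -> Bounds:
    """Tighten flux bounds (mmol/gDW/h) with v_max = kcat * [E].

    kcat is assumed to be in 1/s and enzyme abundance in mmol/gDW, so kcat is
    converted to 1/h. Reactions lacking a kcat or an abundance keep their bounds.
    """
    capped: Bounds = {}
    for rxn, (lb, ub) in bounds.items():
        if rxn in kcat_per_s and rxn in enzyme_mmol_per_gdw:
            vmax = kcat_per_s[rxn] * SECONDS_PER_HOUR * enzyme_mmol_per_gdw[rxn]
            # The cap applies in both directions; an irreversible lb of 0 stays 0.
            capped[rxn] = (max(lb, -vmax), min(ub, vmax))
        else:
            capped[rxn] = (lb, ub)
    return capped


if __name__ == "__main__":
    bounds = {"R1": (-1000.0, 1000.0), "R2": (0.0, 1000.0)}
    print(apply_enzyme_caps(bounds, {"R1": 10.0}, {"R1": 1e-4}))
```

Keeping the original bounds where data are missing reflects a common practical constraint: proteomics rarely covers every enzyme in a genome-scale model.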

fluxTrAM: Integration of tracer-based metabolomics data into atomically resolved genome-scale metabolic networks for metabolic flux analysis

Quantitative inference of intracellular reaction rates is essential for characterising metabolic phenotypes. The classical experimental method for measuring metabolic fluxes uses stable-isotope tracing of metabolites through the metabolic network, followed by mass spectrometry analysis. The most common approach, 13C-based metabolic flux analysis, requires multidisciplinary knowledge in analytical chemistry, cell biology and mathematical modelling, as well as multiple independent tools for handling mass spectrometry data. Moreover, flux analysis is usually carried out within a small network to validate a specific biological hypothesis. To overcome interdisciplinary barriers and extend flux interpretation to the genome scale, we developed fluxTrAM, a semi-automated pipeline that processes tracer-based metabolomics data and integrates them with atomically resolved genome-scale metabolic networks to enable genome-scale flux predictions.
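
Tracer data usually enter flux analysis as mass isotopomer distributions (MIDs): the fractions of a metabolite pool carrying 0, 1, ..., n heavy atoms. A common summary is the mean fractional 13C enrichment, the sum of i · M_i divided by n, for a metabolite with n carbons. The sketch below computes it, assuming the MID has already been corrected for natural isotope abundance. It is not part of fluxTrAM, whose interfaces are not described here, and the function name is hypothetical.

```python
def fractional_enrichment(mid: list[float]) -> float:
    """Mean fractional 13C enrichment from a corrected MID [M0, M1, ..., Mn].

    Assumes len(mid) == n + 1 for a metabolite with n carbon atoms and that
    the fractions are already corrected for natural isotope abundance.
    """
    n_carbons = len(mid) - 1
    if n_carbons < 1:
        raise ValueError("MID must contain at least M0 and M1")
    total = sum(mid)
    if total <= 0:
        raise ValueError("MID fractions must sum to a positive value")
    normalised = [m / total for m in mid]  # tolerate MIDs not summing to 1
    return sum(i * m for i, m in enumerate(normalised)) / n_carbons


if __name__ == "__main__":
    # Hypothetical three-carbon metabolite: half unlabelled, half M+2.
    print(fractional_enrichment([0.5, 0.0, 0.5, 0.0]))
```

For the hypothetical MID above, the enrichment is (2 · 0.5) / 3 = 1/3. A fully labelled pool gives 1.0 and an unlabelled pool gives 0.0.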

DNA-binding affinity and specificity determine the phenotypic diversity in BCL11B-related disorders

BCL11B is a DNA-binding transcription factor containing Cys2-His2 zinc-finger (C2H2-ZnF) domains, with established roles in the development of various organs and tissues, primarily the immune and nervous systems. BCL11B germline variants have been associated with a variety of developmental syndromes. However, genotype-phenotype correlations and the pathophysiologic mechanisms of selected variants remain largely elusive. To dissect these, we performed genotype-phenotype correlation analyses in 92 affected individuals harboring a pathogenic or likely pathogenic BCL11B variant, followed by immune phenotyping, analysis of chromatin immunoprecipitation DNA-sequencing data, dual-luciferase reporter assays and molecular modeling. These integrative analyses enabled us to define three clinical subtypes of BCL11B-related disorders. It is likely that gene-disruptive BCL11B variants and missense variants affecting zinc-binding cysteine and histidine residues cause mild to moderate neurodevelopmental delay with increased propensity for behavioral and dental anomalies, allergies and asthma, and reduced type 2 innate lymphoid cells.

Integrative omics approaches to advance rare disease diagnostics

Over the past decade, high-throughput DNA sequencing approaches, namely whole-exome and whole-genome sequencing, have become standard procedures in Mendelian disease diagnostics. Implementing these technologies greatly facilitated diagnostics and shifted the analysis paradigm from variant identification to variant prioritisation and evaluation. Diagnostic rates vary widely with cohort size, heterogeneity and disease, ranging from around 30% to 50% and leaving the majority of patients undiagnosed. Advances in omics technologies and computational analysis offer an opportunity to improve these rates by providing evidence for the validation and prioritisation of disease-causing variants. This review aims to provide an overview of the current application of several omics technologies, including RNA sequencing, proteomics, metabolomics and DNA methylation profiling, to the diagnosis of rare genetic diseases in general and inborn errors of metabolism in particular.

Personalized whole‐body models integrate metabolism, physiology, and the gut microbiome

Comprehensive molecular‐level models of human metabolism have been generated on a cellular level. However, models of whole‐body metabolism have not been established as they require new methodological approaches to integrate molecular and physiological data. We developed a new metabolic network reconstruction approach that used organ‐specific information from literature and omics data to generate two sex‐specific whole‐body metabolic (WBM) reconstructions. These reconstructions capture the metabolism of 26 organs and six blood cell types. Each WBM reconstruction represents whole‐body organ‐resolved metabolism with over 80,000 biochemical reactions in an anatomically and physiologically consistent manner. We parameterized the WBM reconstructions with physiological, dietary, and metabolomic data. The resulting WBM models could recapitulate known inter‐organ metabolic cycles and energy use. We also illustrate that the WBM models can predict known biomarkers of inherited metabolic diseases in different biofluids. 
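
Whole-body metabolic models of this kind are constraint-based models. The textbook flux balance analysis problem is shown below as general background, not as the paper's exact formulation. S is the stoichiometric matrix (metabolites by reactions), v the flux vector, l and u the lower and upper flux bounds, and c the objective weights. The organ-resolved WBMs add further constraints and physiological parameterisation that this formulation does not show.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \quad \text{(steady-state mass balance)}, \\
& l \le v \le u \quad \text{(reaction bounds)}.
\end{aligned}
```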

Software Quality Indicators: extraction, categorisation and recommendations from canonical sources

Research software plays a central role in modern science, and its quality is increasingly recognized as essential for reproducibility, sustainability, and trust. Numerous initiatives have proposed indicators to guide quality assessment, yet these indicators are dispersed across domains and vary in scope, terminology, and practical use. This work presents a curated catalogue of software quality indicators tailored to the needs of research software.

Developed during BioHackathon Europe 2024 and refined in collaboration with the ELIXIR Tools Platform and the EVERSE project, the catalogue consolidates and structures indicators from a range of authoritative sources.

Over 300 indicators were gathered and systematically reviewed for relevance, clarity, and implementation feasibility. Each was classified into thematic categories such as Documentation, Security, Usability, and Sustainability and annotated with target applicability, ease of evaluation, and recommended actions. Redundant, overly abstract, or narrowly scoped indicators were excluded or flagged, while additional tags highlighted cross-cutting concerns such as licensing, testing, and community practices.
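
The annotation scheme described above maps naturally onto a small record type. The following sketch is a hypothetical representation, not the catalogue's actual schema. The field names, the two-level ease-of-evaluation scale, the helper functions and the example indicators are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Ease(Enum):
    """Hypothetical scale for how easily an indicator can be evaluated."""
    AUTOMATABLE = "automatable"
    MANUAL = "manual"


@dataclass(frozen=True)
class QualityIndicator:
    name: str
    category: str            # thematic category, e.g. "Documentation"
    applicability: str       # which kinds of research software it targets
    ease: Ease
    recommended_action: str
    tags: frozenset[str] = field(default_factory=frozenset)  # e.g. {"licensing"}
    excluded: bool = False   # flagged as redundant, too abstract or too narrow


def by_category(indicators: list[QualityIndicator],
                category: str) -> list[QualityIndicator]:
    """Return the non-excluded indicators in one thematic category."""
    return [i for i in indicators if i.category == category and not i.excluded]


def with_tag(indicators: list[QualityIndicator],
             tag: str) -> list[QualityIndicator]:
    """Return the non-excluded indicators carrying a cross-cutting tag."""
    return [i for i in indicators if tag in i.tags and not i.excluded]
```

A frozen dataclass keeps each catalogue entry immutable and hashable, so entries can be deduplicated with a set when merging sources. The `excluded` flag records the "excluded or flagged" decision without deleting the entry.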

Recon4IMD is co-funded by the European Union's Horizon Europe Framework Programme (101080997), the Swiss State Secretariat for Education, Research and Innovation (23.00232), and by United Kingdom Research and Innovation (10083717 & 10080153).
