

Publications
The following publications provide scientific background on, and new results arising from, the Recon4IMD project.

De novo and inherited dominant variants in U4 and U6 snRNAs cause retinitis pigmentosa
The U4 small nuclear RNA (snRNA) forms a duplex with the U6 snRNA and, together with U5 and ∼30 proteins, is part of the U4/U6.U5 tri-snRNP complex, located at the core of the major spliceosome. Recently, recurrent de novo variants in the U4 RNA, transcribed from the RNU4-2 gene, and in at least two other RNU genes were discovered to cause neurodevelopmental disorders. We detected inherited and de novo heterozygous variants in RNU4-2 (n.18_19insA and n.56T>C) and in four out of the five RNU6 paralogues (n.55_56insG and n.56_57insG) in 135 individuals from 62 families with non-syndromic retinitis pigmentosa (RP), a rare form of hereditary blindness. We show that these variants are recurrent among RP families and invariably cluster in close proximity within the three-way junction (between stem-I, the 5′ stem-loop and stem-II) of the U4/U6 duplex, affecting its natural conformation. Interestingly, this region binds to numerous splicing factors of the tri-snRNP complex including PRPF3, PRPF8 and PRPF31, previously associated with RP as well. The U4 and U6 variants identified seem to affect snRNP biogenesis, namely the U4/U6 di-snRNP, which is an assembly intermediate of the tri-snRNP.
Bi-allelic KICS2 mutations impair KICSTOR complex-mediated mTORC1 regulation, causing intellectual disability and epilepsy
Nutrient-dependent mTORC1 regulation upon amino acid deprivation is mediated by the KICSTOR complex, comprising SZT2, KPTN, ITFG2, and KICS2, recruiting GATOR1 to lysosomes. Previously, pathogenic SZT2 and KPTN variants have been associated with autosomal recessive intellectual disability and epileptic encephalopathy. We identified bi-allelic KICS2 variants in eleven affected individuals presenting with intellectual disability and epilepsy. These variants partly affected KICS2 stability, compromised KICSTOR complex formation, and demonstrated a deleterious impact on nutrient-dependent mTORC1 regulation of 4EBP1 and S6K. Phosphoproteome analyses extended these findings to show that KICS2 variants changed the mTORC1 proteome, affecting proteins that function in translation, splicing, and ciliogenesis. Depletion of Kics2 in zebrafish resulted in ciliary dysfunction consistent with a role of mTORC1 in cilia biology. These in vitro and in vivo functional studies confirmed the pathogenicity of identified KICS2 variants. Our genetic and experimental data provide evidence that variants in KICS2 are a factor involved in intellectual disability due to its dysfunction impacting mTORC1 regulation and cilia biology.

Genomic reanalysis of a pan-European rare-disease resource yields new diagnoses
Genetic diagnosis of rare diseases requires accurate identification and interpretation of genomic variants. Clinical and molecular scientists from 37 expert centers across Europe created the Solve-Rare Diseases Consortium (Solve-RD) resource, encompassing clinical, pedigree and genomic rare-disease data (94.5% exomes, 5.5% genomes), and performed systematic reanalysis for 6,447 individuals (3,592 male, 2,855 female) with previously undiagnosed rare diseases from 6,004 families. We established a collaborative, two-level expert review infrastructure that allowed a genetic diagnosis in 506 (8.4%) families. Of 552 disease-causing variants identified, 464 (84.1%) were single-nucleotide variants or short insertions/deletions. These variants were either located in recently published novel disease genes (n = 67), recently reclassified in ClinVar (n = 187) or reclassified by consensus expert decision within Solve-RD (n = 210). Bespoke bioinformatics analyses identified the remaining 15.9% of causative variants (n = 88).
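The percentages in the abstract above follow directly from the counts it reports. A minimal sketch of that arithmetic is given below; all numbers are copied from the abstract, while the variable names and the `percent` helper are illustrative assumptions and not part of the Solve-RD analysis code.

```python
# Counts quoted in the abstract above; names are illustrative only.
individuals_male = 3592
individuals_female = 2855
families_total = 6004
families_diagnosed = 506
variants_total = 552
snv_indel_variants = 464
bespoke_variants = 88
snv_indel_breakdown = {
    "recently published novel disease genes": 67,
    "recently reclassified in ClinVar": 187,
    "reclassified by Solve-RD expert consensus": 210,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


individuals_total = individuals_male + individuals_female
diagnostic_yield = percent(families_diagnosed, families_total)
snv_indel_share = percent(snv_indel_variants, variants_total)
bespoke_share = percent(bespoke_variants, variants_total)

print(f"Individuals reanalysed: {individuals_total}")
print(f"Families diagnosed: {diagnostic_yield}%")
print(f"SNVs / short indels: {snv_indel_share}% of causative variants")
print(f"Bespoke analyses: {bespoke_share}% of causative variants")
```

The three SNV/indel subcategories sum to 464, and 464 plus the 88 variants found by bespoke analyses gives the 552 disease-causing variants reported.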
Dominant variants in major spliceosome U4 and U5 small nuclear RNA genes cause neurodevelopmental disorders through splicing disruption
The major spliceosome contains five small nuclear RNAs (snRNAs; U1, U2, U4, U5 and U6) essential for splicing. Variants in RNU4-2, encoding U4, cause a neurodevelopmental disorder called ReNU syndrome. We investigated de novo variants in 50 snRNA-encoding genes in a French cohort of 23,649 individuals with rare disorders and gathered additional cases through international collaborations. Altogether, we identified 145 previously unreported probands with (likely) pathogenic variants in RNU4-2 and 21 individuals with de novo and/or recurrent variants in RNU5B-1 and RNU5A-1, encoding U5. Pathogenic variants typically arose de novo on the maternal allele and cluster in regions critical for splicing. RNU4-2 variants mainly localize to two structures, the stem III and T-loop/quasi-pseudoknot, which position the U6 ACAGAGA box for 5′ splice site recognition and associate with different phenotypic severity. RNU4-2 variants result in specific defects in alternative 5′ splice site usage and methylation patterns (episignatures) that correlate with variant location and clinical severity.

Long-Read Sequencing Identifies Mosaic Sequence Variations in Friedreich’s Ataxia-GAA Repeats
Friedreich’s ataxia (FRDA) is an autosomal recessive neurodegenerative disorder characterized by ataxia, sensory loss and pyramidal signs. While the majority of FRDA cases are caused by biallelic GAA trinucleotide repeat expansions in intron 1 of FXN, there is a subset of patients harboring a heterozygous pathogenic small variant compound-heterozygous with a GAA repeat expansion. We report on the diagnostic journey of a 21-year-old patient who was clinically suspected of having FRDA at the age of 12 years. Genetic testing included fragment analysis, gene panel analysis and exome sequencing, which only detected one pathogenic heterozygous missense variant (c.389G>T, p.Gly130Val) in FXN. Although conventional repeat analyses failed to detect GAA expansions in our patient, subsequent short-read genome sequencing (GS) indicated a potential GAA repeat expansion. This finding was confirmed by long-read GS, which in addition revealed a complex pattern of interruptions. Both large and small GAA expansions with divergent interruptions containing G, A, GA, GAG and/or GAAG sequences were present within one allele, indicating mosaic sequence variations. Our findings underscore the complexity of repeat expansions which can exhibit both interruptions and somatic instability. We also highlight the utility of long-read GS in unraveling intricate genetic profiles, ultimately contributing to more accurate diagnoses in clinical practice.
Constraint-based modeling of bioenergetic differences between synaptic and non-synaptic components of dopaminergic neurons in Parkinson’s disease
Emerging evidence suggests that different metabolic characteristics, particularly bioenergetic differences, between the synaptic terminal and soma may contribute to the selective vulnerability of dopaminergic neurons in patients with Parkinson’s disease (PD).
To investigate the metabolic differences, we generated four thermodynamically flux-consistent metabolic models representing the synaptic and non-synaptic (somatic) components under both control and PD conditions. Differences in bioenergetic features and metabolite exchanges were analyzed between these models to explore potential mechanisms underlying the selective vulnerability of dopaminergic neurons. Bioenergetic rescue analyses were performed to identify potential therapeutic targets for mitigating observed energy failure and metabolic dysfunction in PD models.
All models predicted that oxidative phosphorylation plays a significant role under lower energy demand, while glycolysis predominates when energy demand exceeds mitochondrial constraints.
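The qualitative prediction above can be illustrated with a deliberately simple toy calculation. The sketch below is not the authors' thermodynamically flux-consistent models: it assumes a single hypothetical cap on mitochondrial ATP production (`oxphos_capacity`, in arbitrary units) and assumes that oxidative phosphorylation is used first because it yields more ATP per glucose, with glycolysis covering whatever demand remains.

```python
def atp_supply(atp_demand, oxphos_capacity):
    """Split an ATP demand between oxidative phosphorylation and glycolysis.

    Toy rule (an assumption for illustration): oxidative phosphorylation is
    used up to a fixed mitochondrial capacity, and glycolysis supplies any
    demand beyond that cap. Units are arbitrary.
    """
    if atp_demand < 0 or oxphos_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    oxphos_atp = min(atp_demand, oxphos_capacity)
    glycolysis_atp = atp_demand - oxphos_atp
    return oxphos_atp, glycolysis_atp


# Hypothetical capacity and demand levels, chosen only to show the switch.
for demand in (10, 30, 100):
    oxphos, glycolysis = atp_supply(demand, oxphos_capacity=30)
    print(f"demand={demand}: oxphos={oxphos}, glycolysis={glycolysis}")
```

With these hypothetical numbers, glycolysis contributes nothing until demand reaches the mitochondrial cap and supplies most of the ATP once demand is far above it.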

EEFSEC deficiency: A selenopathy with early-onset neurodegeneration
Inborn errors of selenoprotein expression arise from deleterious variants in genes encoding selenoproteins or selenoprotein biosynthetic factors, some of which are associated with neurodegenerative disorders. This study shows that bi-allelic selenocysteine tRNA-specific eukaryotic elongation factor (EEFSEC) variants cause selenoprotein deficiency, leading to progressive neurodegeneration. EEFSEC deficiency, an autosomal recessive disorder, manifests with global developmental delay, progressive spasticity, ataxia, and seizures. Cerebral MRI primarily demonstrated a cerebellar pathology, including hypoplasia and progressive atrophy. Exome or genome sequencing identified six different bi-allelic EEFSEC variants in nine individuals from eight unrelated families. These variants showed reduced EEFSEC function in vitro, leading to lower levels of selenoproteins in fibroblasts. In line with the clinical phenotype, an eEFSec-RNAi Drosophila model displays progressive impairment of motor function, which is reflected in the synaptic defects in this model organism. This study identifies EEFSEC deficiency as an inborn error of selenocysteine metabolism. It reveals the pathophysiological mechanisms of neurodegeneration linked to selenoprotein metabolism, suggesting potential targeted therapies.
Coupling metabolomics and exome sequencing reveals graded effects of rare damaging heterozygous variants on gene function and human traits
Genetic studies of the metabolome can uncover enzymatic and transport processes shaping human metabolism. Using rare variant aggregation testing based on whole-exome sequencing data to detect genes associated with levels of 1,294 plasma and 1,396 urine metabolites, we discovered 235 gene–metabolite associations, many previously unreported. Complementary approaches (genetic, computational (in silico gene knockouts in whole-body models of human metabolism) and one experimental proof of principle) provided orthogonal evidence that studies of rare, damaging variants in the heterozygous state permit inferences concordant with those from inborn errors of metabolism. Allelic series of functional variants in transporters responsible for transcellular sulfate reabsorption (SLC13A1, SLC26A1) exhibited graded effects on plasma sulfate and human height and pinpointed alleles associated with increased odds of diverse musculoskeletal traits and diseases in the population. This integrative approach can identify new players in incompletely characterized human metabolic reactions and reveal metabolic readouts informative of human traits and diseases.

Dynamic whole-body models for infant metabolism
Comprehensive, sex-specific whole-body models (WBMs) accounting for organ-specific metabolism have been developed to allow for the simulation of adult and infant metabolism. These WBMs are evaluated daily, giving insights into metabolic flux changes that occur in one day of an infant’s or adult’s life. However, for medical applications, such as in metabolic diseases and their treatment, an evaluation and concentration predictions on a shorter time scale would be beneficial. Therefore, we developed a dynamic infant-WBM that couples metabolite dynamics in short time frames through physiology-based pharmacokinetic models with the existing infant whole-body models. We then tailored the dynamic infant-WBM to enable the prediction of isovalerylcarnitine (C5), a clinical biomarker used for the inherited metabolic disease isovaleric aciduria (IVA). Our results show that, as expected, the predicted C5 concentrations exceeded the newborn screening thresholds in the IVA models, but not in models simulating healthy infants, during the window (36–72 hours) in which newborn screening blood samples are taken. We also demonstrate how the dynamic infant-WBMs can be used to test the effect that changes in dietary intake have on the biomarker. Since the dynamic infant-WBMs were parametrised with literature-derived experimental or estimated values, we show how uncertainty quantification can be applied to quantify the parameter uncertainties. We found that the fractional unbound plasma needed to be estimated correctly, as this parameter strongly impacted C5 concentration predictions of the dynamic infant-WBMs. Overall, the dynamic infant-WBMs hold promise for personalised medicine, as they enable personalised biomarker concentration predictions of healthy and diseased infant metabolism in various time intervals.
Personalised metabolic whole-body models for newborns and infants predict growth and biomarkers of inherited metabolic diseases
Extensive whole-body models (WBMs) accounting for organ-specific dynamics have been developed to simulate adult metabolism. However, there is currently a lack of models representing infant metabolism taking into consideration its special requirements in energy balance, nutrition, and growth. Here, we present a resource of organ-resolved, sex-specific, anatomically accurate models of newborn and infant metabolism, referred to as infant-whole-body models (infant-WBMs), spanning the first 180 days of life. These infant-WBMs were parameterised to represent the distinct metabolic characteristics of newborns and infants accurately. In particular, we adjusted the changes in organ weights, the energy requirements of brain development, heart function, and thermoregulation, as well as dietary requirements and energy requirements for physical activity. Subsequently, we validated the accuracy of the infant-WBMs by showing that the predicted neonatal and infant growth was consistent with the recommended growth by the World Health Organisation. We assessed the infant-WBMs’ reliability and capabilities for personalisation by simulating 10,000 newborn models, personalised with blood concentration measurements from newborn screening and birth weight. Moreover, we demonstrate that the models can accurately predict changes over time in known blood biomarkers in inherited metabolic diseases. By this, the infant-WBM resource can provide valuable insights into infant metabolism on an organ-resolved level and enable a holistic view of the metabolic processes occurring in infants, considering the unique energy and dietary requirements as well as growth patterns specific to this population. As such, the infant-WBM resource holds promise for personalised medicine, as the infant-WBMs could be a first step to digital metabolic twins for newborn and infant metabolism for personalised systematic simulations and treatment planning.

EnzChemRED, a rich enzyme chemistry relation extraction dataset
Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI. We show that fine-tuning language models with EnzChemRED significantly boosts their ability to identify proteins and chemicals in text (86.30% F1 score) and to extract the chemical conversions (86.66% F1 score) and the enzymes that catalyze those conversions (83.79% F1 score). We apply our methods to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.
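The F1 scores quoted above are the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the function name and the example counts are hypothetical and are not taken from the EnzChemRED evaluation.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score (harmonic mean of precision and recall).

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct extractions, 2 spurious, 2 missed.
score = f1_score(true_positives=8, false_positives=2, false_negatives=2)
print(f"F1 = {100 * score:.2f}%")
```

Because F1 penalises both spurious and missed extractions, a model can only score highly if its precision and recall are both high.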
Integration of proteomic data with genome-scale metabolic models: A methodological overview
The integration of proteomics data with constraint-based reconstruction and analysis (COBRA) models plays a pivotal role in understanding the relationship between genotype and phenotype and bridges the gap between genome-level phenomena and functional adaptations. Integrating a generic genome-scale model with information on proteins enables generation of a context-specific metabolic model which improves the accuracy of model prediction. This review explores methodologies for incorporating proteomics data into genome-scale models. Available methods are grouped into four distinct categories based on their approach to integrate proteomics data and their depth of modeling. Within each category section various methods are introduced in chronological order of publication demonstrating the progress of this field. Furthermore, challenges and potential solutions to further progress are outlined, including the limited availability of appropriate in vitro data, experimental enzyme turnover rates, and the trade-off between model accuracy, computational tractability, and data scarcity. In conclusion, methods employing simpler approaches demand fewer kinetic and omics data, consequently leading to a less complex mathematical problem and reduced computational expenses. On the other hand, approaches that delve deeper into cellular mechanisms and aim to create detailed mathematical models necessitate more extensive kinetic and omics data, resulting in a more complex and computationally demanding problem. However, in some cases, this increased cost can be justified by the potential for more precise predictions.

fluxTrAM: Integration of tracer-based metabolomics data into atomically resolved genome-scale metabolic networks for metabolic flux analysis
Quantitative inference of intracellular reaction rates is essential for characterising metabolic phenotypes. The classical experimental method for measuring metabolic fluxes makes use of stable-isotope tracing of metabolites through the metabolic network, followed by mass spectrometry analysis. The most common 13C-based metabolic flux analysis requires multidisciplinary knowledge in analytical chemistry, cell biology, and mathematical modelling, as well as the use of multiple independent tools for handling mass spectrometry data. Besides, flux analysis is usually carried out within a small network to validate a specific biological hypothesis. To overcome interdisciplinary barriers and extend flux interpretation towards a genome-scale level, we developed fluxTrAM, a semi-automated pipeline for processing tracer-based metabolomics data and integrating it with atomically resolved genome-scale metabolic networks to enable flux predictions at genome-scale. fluxTrAM integrates different software packages inside and outside of the COBRA Toolbox v3.4 for the generation of metabolite structure and reaction databases for a genome-scale model, labelled mass spectrometry data processing into standardised mass isotopologue distribution data (MID), and metabolic flux analysis.
DNA-binding affinity and specificity determine the phenotypic diversity in BCL11B-related disorders
BCL11B is a Cys2-His2 zinc-finger (C2H2-ZnF) domain-containing, DNA-binding, transcription factor with established roles in the development of various organs and tissues, primarily the immune and nervous systems. BCL11B germline variants have been associated with a variety of developmental syndromes. However, genotype-phenotype correlations along with pathophysiologic mechanisms of selected variants mostly remain elusive. To dissect these, we performed genotype-phenotype correlations of 92 affected individuals harboring a pathogenic or likely pathogenic BCL11B variant, followed by immune phenotyping, analysis of chromatin immunoprecipitation DNA-sequencing data, dual-luciferase reporter assays, and molecular modeling. These integrative analyses enabled us to define three clinical subtypes of BCL11B-related disorders. It is likely that gene-disruptive BCL11B variants and missense variants affecting zinc-binding cysteine and histidine residues cause mild to moderate neurodevelopmental delay with increased propensity for behavioral and dental anomalies, allergies and asthma, and reduced type 2 innate lymphoid cells. Missense variants within C2H2-ZnF DNA-contacting α helices cause highly variable clinical presentations ranging from multisystem anomalies with demise in the first years of life to late-onset, hyperkinetic movement disorder with poor fine motor skills. Those not in direct DNA contact cause a milder phenotype through reduced, target-specific transcriptional activity.

Integrative omics approaches to advance rare disease diagnostics
Over the past decade, high-throughput DNA sequencing approaches, namely whole exome and whole genome sequencing, became a standard procedure in Mendelian disease diagnostics. Implementation of these technologies greatly facilitated diagnostics and shifted the analysis paradigm from variant identification to prioritisation and evaluation. The diagnostic rates vary widely depending on the cohort size, heterogeneity and disease, and range from around 30% to 50%, leaving the majority of patients undiagnosed. Advances in omics technologies and computational analysis provide an opportunity to increase these unfavourable rates by providing evidence for disease-causing variant validation and prioritisation. This review aims to provide an overview of the current application of several omics technologies including RNA-sequencing, proteomics, metabolomics and DNA-methylation profiling for diagnostics of rare genetic diseases in general and inborn errors of metabolism in particular.
Personalized whole‐body models integrate metabolism, physiology, and the gut microbiome
Comprehensive molecular‐level models of human metabolism have been generated on a cellular level. However, models of whole‐body metabolism have not been established as they require new methodological approaches to integrate molecular and physiological data. We developed a new metabolic network reconstruction approach that used organ‐specific information from literature and omics data to generate two sex‐specific whole‐body metabolic (WBM) reconstructions. These reconstructions capture the metabolism of 26 organs and six blood cell types. Each WBM reconstruction represents whole‐body organ‐resolved metabolism with over 80,000 biochemical reactions in an anatomically and physiologically consistent manner. We parameterized the WBM reconstructions with physiological, dietary, and metabolomic data. The resulting WBM models could recapitulate known inter‐organ metabolic cycles and energy use. We also illustrate that the WBM models can predict known biomarkers of inherited metabolic diseases in different biofluids. Predictions of basal metabolic rates, by WBM models personalized with physiological data, outperformed current phenomenological models. Finally, integrating microbiome data allowed the exploration of host–microbiome co‐metabolism. Overall, the WBM reconstructions, and their derived computational models, represent an important step toward virtual physiological humans.

Integration of proteomics with genomics and transcriptomics increases the diagnostic rate of Mendelian disorders
Owing to a lack of functional evidence, genome-based diagnostic rates cap at approximately 50% across diverse Mendelian diseases. Here we demonstrate the effectiveness of combining genomics, transcriptomics, and, for the first time, proteomics and phenotypic descriptors, in a systematic diagnostic approach to discover the genetic cause of mitochondrial diseases. On fibroblast cell lines from 145 individuals, tandem mass tag labelled proteomics detected approximately 8,000 proteins per sample and covered over 50% of all Mendelian disease-associated genes. By providing independent functional evidence, aberrant protein expression analysis allowed validation of candidate protein-destabilising variants and of variants leading to aberrant RNA expression. Overall, our integrative computational workflow led to genetic resolution for 21% of 121 genetically unsolved cases and to the discovery of two novel disease genes. With increasing democratization of high-throughput omics assays, our approach and code provide a blueprint for implementing multi-omics based Mendelian disease diagnostics in routine clinical practice.
Personalized metabolic whole-body models for newborns and infants predict growth and biomarkers of inherited metabolic diseases
Comprehensive whole-body models (WBMs) accounting for organ-specific dynamics have been developed to simulate adult metabolism, but such models do not exist for infants. Here, we present a resource of 360 organ-resolved, sex-specific models of newborn and infant metabolism (infant-WBMs) spanning the first 180 days of life. These infant-WBMs were parameterized to represent the distinct metabolic characteristics of newborns and infants, including nutrition, energy requirements, and thermoregulation. We demonstrate that the predicted infant growth was consistent with the recommendation by the World Health Organization. We assessed the infant-WBMs’ reliability and capabilities for personalization by simulating 10,000 newborns based on their blood metabolome and birth weight. Furthermore, the infant-WBMs accurately predicted changes in known biomarkers over time and metabolic responses to treatment strategies for inherited metabolic diseases. The infant-WBM resource holds promise for personalized medicine, as the infant-WBMs could be a first step to digital metabolic twins for newborn and infant metabolism.