

Publications
The following publications provide scientific background on, and new results arising from, the Recon4IMD project.

EEFSEC deficiency: A selenopathy with early-onset neurodegeneration
Inborn errors of selenoprotein expression arise from deleterious variants in genes encoding selenoproteins or selenoprotein biosynthetic factors, some of which are associated with neurodegenerative disorders. This study shows that bi-allelic selenocysteine tRNA-specific eukaryotic elongation factor (EEFSEC) variants cause selenoprotein deficiency, leading to progressive neurodegeneration. EEFSEC deficiency, an autosomal recessive disorder, manifests with global developmental delay, progressive spasticity, ataxia, and seizures. Cerebral MRI primarily demonstrated a cerebellar pathology, including hypoplasia and progressive atrophy. Exome or genome sequencing identified six different bi-allelic EEFSEC variants in nine individuals from eight unrelated families. These variants showed reduced EEFSEC function in vitro, leading to lower levels of selenoproteins in fibroblasts. In line with the clinical phenotype, an eEFSec-RNAi Drosophila model displays progressive impairment of motor function, which is reflected in the synaptic defects in this model organism. This study identifies EEFSEC deficiency as an inborn error of selenocysteine metabolism. It reveals the pathophysiological mechanisms of neurodegeneration linked to selenoprotein metabolism, suggesting potential targeted therapies.
Coupling metabolomics and exome sequencing reveals graded effects of rare damaging heterozygous variants on gene function and human traits
Genetic studies of the metabolome can uncover enzymatic and transport processes shaping human metabolism. Using rare variant aggregation testing based on whole-exome sequencing data to detect genes associated with levels of 1,294 plasma and 1,396 urine metabolites, we discovered 235 gene–metabolite associations, many previously unreported. Complementary approaches (genetic, computational (in silico gene knockouts in whole-body models of human metabolism) and one experimental proof of principle) provided orthogonal evidence that studies of rare, damaging variants in the heterozygous state permit inferences concordant with those from inborn errors of metabolism. Allelic series of functional variants in transporters responsible for transcellular sulfate reabsorption (SLC13A1, SLC26A1) exhibited graded effects on plasma sulfate and human height and pinpointed alleles associated with increased odds of diverse musculoskeletal traits and diseases in the population. This integrative approach can identify new players in incompletely characterized human metabolic reactions and reveal metabolic readouts informative of human traits and diseases.

Dynamic whole-body models for infant metabolism
Comprehensive, sex-specific whole-body models (WBMs) accounting for organ-specific metabolism have been developed to allow for the simulation of adult and infant metabolism. These WBMs are evaluated daily, giving insights into metabolic flux changes that occur in one day of an infant’s or adult’s life. However, for medical applications, such as in metabolic diseases and their treatment, evaluation and concentration predictions on a shorter time scale would be beneficial. Therefore, we developed a dynamic infant-WBM that couples metabolite dynamics in short time frames through physiology-based pharmacokinetic models with the existing infant whole-body models. We then tailored the dynamic infant-WBM to enable the prediction of isovalerylcarnitine (C5), a clinical biomarker used for the inherited metabolic disease isovaleric aciduria (IVA). Our results show that, as expected, in the IVA models but not in models simulating healthy infants, the predicted C5 concentrations exceeded the newborn screening thresholds during the window (36 to 72 hours) in which newborn screening blood samples are taken. We also demonstrate how the dynamic infant-WBMs can be used to test the effect that changes in dietary intake have on the biomarker. Since the dynamic infant-WBMs were parametrised with literature-derived experimental or estimated values, we show how uncertainty quantification can be applied to quantify the parameter uncertainties. We found that the fractional unbound plasma needed to be estimated correctly, as this parameter strongly impacted C5 concentration predictions of the dynamic infant-WBMs. Overall, the dynamic infant-WBMs hold promise for personalised medicine, as they enable personalised biomarker concentration predictions of healthy and diseased infant metabolism in various time intervals.
Personalised metabolic whole-body models for newborns and infants predict growth and biomarkers of inherited metabolic diseases
Extensive whole-body models (WBMs) accounting for organ-specific dynamics have been developed to simulate adult metabolism. However, there is currently a lack of models that represent infant metabolism and take into consideration its special requirements in energy balance, nutrition, and growth. Here, we present a resource of organ-resolved, sex-specific, anatomically accurate models of newborn and infant metabolism, referred to as infant-whole-body models (infant-WBMs), spanning the first 180 days of life. These infant-WBMs were parameterised to represent the distinct metabolic characteristics of newborns and infants accurately. In particular, we adjusted the changes in organ weights, the energy requirements of brain development, heart function, and thermoregulation, as well as dietary requirements and energy requirements for physical activity. Subsequently, we validated the accuracy of the infant-WBMs by showing that the predicted neonatal and infant growth was consistent with the growth recommended by the World Health Organisation. We assessed the infant-WBMs’ reliability and capabilities for personalisation by simulating 10,000 newborn models, personalised with blood concentration measurements from newborn screening and birth weight. Moreover, we demonstrate that the models can accurately predict changes over time in known blood biomarkers in inherited metabolic diseases. In this way, the infant-WBM resource can provide valuable insights into infant metabolism on an organ-resolved level and enable a holistic view of the metabolic processes occurring in infants, considering the unique energy and dietary requirements as well as growth patterns specific to this population. As such, the infant-WBM resource holds promise for personalised medicine, as the infant-WBMs could be a first step to digital metabolic twins for newborn and infant metabolism for personalised systematic simulations and treatment planning.

EnzChemRED, a rich enzyme chemistry relation extraction dataset
Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI. We show that fine-tuning language models with EnzChemRED significantly boosts their ability to identify proteins and chemicals in text (86.30% F1 score) and to extract the chemical conversions (86.66% F1 score) and the enzymes that catalyze those conversions (83.79% F1 score). We apply our methods to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.
Integration of proteomic data with genome-scale metabolic models: A methodological overview
The integration of proteomics data with constraint-based reconstruction and analysis (COBRA) models plays a pivotal role in understanding the relationship between genotype and phenotype and bridges the gap between genome-level phenomena and functional adaptations. Integrating a generic genome-scale model with information on proteins enables generation of a context-specific metabolic model which improves the accuracy of model prediction. This review explores methodologies for incorporating proteomics data into genome-scale models. Available methods are grouped into four distinct categories based on their approach to integrate proteomics data and their depth of modeling. Within each category section various methods are introduced in chronological order of publication demonstrating the progress of this field. Furthermore, challenges and potential solutions to further progress are outlined, including the limited availability of appropriate in vitro data, experimental enzyme turnover rates, and the trade-off between model accuracy, computational tractability, and data scarcity. In conclusion, methods employing simpler approaches demand fewer kinetic and omics data, consequently leading to a less complex mathematical problem and reduced computational expenses. On the other hand, approaches that delve deeper into cellular mechanisms and aim to create detailed mathematical models necessitate more extensive kinetic and omics data, resulting in a more complex and computationally demanding problem. However, in some cases, this increased cost can be justified by the potential for more precise predictions.

fluxTrAM: Integration of tracer-based metabolomics data into atomically resolved genome-scale metabolic networks for metabolic flux analysis
Quantitative inference of intracellular reaction rates is essential for characterising metabolic phenotypes. The classical experimental method for measuring metabolic fluxes makes use of stable-isotope tracing of metabolites through the metabolic network, followed by mass spectrometry analysis. The most common 13C-based metabolic flux analysis requires multidisciplinary knowledge in analytical chemistry, cell biology, and mathematical modelling, as well as the use of multiple independent tools for handling mass spectrometry data. Besides, flux analysis is usually carried out within a small network to validate a specific biological hypothesis. To overcome interdisciplinary barriers and extend flux interpretation towards a genome-scale level, we developed fluxTrAM, a semi-automated pipeline for processing tracer-based metabolomics data and integrating it with atomically resolved genome-scale metabolic networks to enable flux predictions at genome-scale. fluxTrAM integrates different software packages inside and outside of the COBRA Toolbox v3.4 for the generation of metabolite structure and reaction databases for a genome-scale model, labelled mass spectrometry data processing into standardised mass isotopologue distribution data (MID), and metabolic flux analysis. To demonstrate the utility of this pipeline, we generated 13C-labeled metabolomics data on an in vitro human induced pluripotent stem cell (iPSC)-derived dopaminergic neuronal culture and processed 13C-labeled MID datasets. In parallel, we generated a cheminformatic database of standardised and context-specific metabolite structures, and atom-mapped reactions for a genome-scale dopaminergic neuronal metabolic model. MID data could be exported into established flux inference software for conventional flux inference on a core model scale. It could also be integrated into the atomically resolved metabolic model for flux inference at genome-scale using the moiety fluxomics method.
The core model flux solution and moiety flux solution were then compared to two additional flux solutions predicted via flux balance analysis and entropic flux balance analysis. The extensive computational flux analysis and comparison helped to better evaluate the obtained flux feasibility of the neuron-specific genome-scale model and suggested new tracer-based metabolomics experiments with novel labeling configurations, such as labelling a moiety within the thymidine metabolite. Overall, fluxTrAM enables the automation of labelled liquid chromatography (LC)-mass spectrometry (MS) data processing into MID datasets and atom mapping for any given genome-scale metabolic model. It contributes to the standardisation and high throughput of metabolic flux analysis at genome-scale.

Integrative omics approaches to advance rare disease diagnostics
Over the past decade, high-throughput DNA sequencing approaches, namely whole exome and whole genome sequencing, have become a standard procedure in Mendelian disease diagnostics. Implementation of these technologies greatly facilitated diagnostics and shifted the analysis paradigm from variant identification to prioritisation and evaluation. The diagnostic rates vary widely depending on the cohort size, heterogeneity and disease, and range from around 30% to 50%, leaving the majority of patients undiagnosed. Advances in omics technologies and computational analysis provide an opportunity to increase these unfavourable rates by providing evidence for disease-causing variant validation and prioritisation. This review aims to provide an overview of the current application of several omics technologies including RNA-sequencing, proteomics, metabolomics and DNA-methylation profiling for diagnostics of rare genetic diseases in general and inborn errors of metabolism in particular.
Personalized whole‐body models integrate metabolism, physiology, and the gut microbiome
Comprehensive molecular‐level models of human metabolism have been generated on a cellular level. However, models of whole‐body metabolism have not been established as they require new methodological approaches to integrate molecular and physiological data. We developed a new metabolic network reconstruction approach that used organ‐specific information from literature and omics data to generate two sex‐specific whole‐body metabolic (WBM) reconstructions. These reconstructions capture the metabolism of 26 organs and six blood cell types. Each WBM reconstruction represents whole‐body organ‐resolved metabolism with over 80,000 biochemical reactions in an anatomically and physiologically consistent manner. We parameterized the WBM reconstructions with physiological, dietary, and metabolomic data. The resulting WBM models could recapitulate known inter‐organ metabolic cycles and energy use. We also illustrate that the WBM models can predict known biomarkers of inherited metabolic diseases in different biofluids. Predictions of basal metabolic rates, by WBM models personalized with physiological data, outperformed current phenomenological models. Finally, integrating microbiome data allowed the exploration of host–microbiome co‐metabolism. Overall, the WBM reconstructions, and their derived computational models, represent an important step toward virtual physiological humans.

Integration of proteomics with genomics and transcriptomics increases the diagnostic rate of Mendelian disorders
Owing to a lack of functional evidence, genome-based diagnostic rates cap at approximately 50% across diverse Mendelian diseases. Here we demonstrate the effectiveness of combining genomics, transcriptomics, and, for the first time, proteomics and phenotypic descriptors, in a systematic diagnostic approach to discover the genetic cause of mitochondrial diseases. On fibroblast cell lines from 145 individuals, tandem mass tag labelled proteomics detected approximately 8,000 proteins per sample and covered over 50% of all Mendelian disease-associated genes. By providing independent functional evidence, aberrant protein expression analysis allowed validation of candidate protein-destabilising variants and of variants leading to aberrant RNA expression. Overall, our integrative computational workflow led to genetic resolution for 21% of 121 genetically unsolved cases and to the discovery of two novel disease genes. With increasing democratization of high-throughput omics assays, our approach and code provide a blueprint for implementing multi-omics based Mendelian disease diagnostics in routine clinical practice.
Personalized metabolic whole-body models for newborns and infants predict growth and biomarkers of inherited metabolic diseases
Comprehensive whole-body models (WBMs) accounting for organ-specific dynamics have been developed to simulate adult metabolism, but such models do not exist for infants. Here, we present a resource of 360 organ-resolved, sex-specific models of newborn and infant metabolism (infant-WBMs) spanning the first 180 days of life. These infant-WBMs were parameterized to represent the distinct metabolic characteristics of newborns and infants, including nutrition, energy requirements, and thermoregulation. We demonstrate that the predicted infant growth was consistent with the recommendation by the World Health Organization. We assessed the infant-WBMs’ reliability and capabilities for personalization by simulating 10,000 newborns based on their blood metabolome and birth weight. Furthermore, the infant-WBMs accurately predicted changes in known biomarkers over time and metabolic responses to treatment strategies for inherited metabolic diseases. The infant-WBM resource holds promise for personalized medicine, as the infant-WBMs could be a first step to digital metabolic twins for newborn and infant metabolism.