Abstract

Missing and inconsistent nutrient values in food composition databases (FCDBs) hinder comparative nutrition research. We present NutriMatch, a scalable harmonization method that embeds food descriptions with a large language model, aligns nutritionally equivalent items, and imputes missing nutrients to enrich FCDB coverage. Applied to the Israeli Human Phenotype Project (HPP), NutriMatch expanded logged nutrient profiles from 21 to 151 nutrients; in the Australian PREDICT cohort, coverage rose from 43 to the same 151 nutrients using a validated external FCDB. Enriched nutrient sets improved 2-year obesity prediction (AUC 0.63 → 0.67) and increased median R² for body-fat percentage, waist circumference, continuous glucose monitoring, and several blood biomarkers. Models trained in HPP transferred to PREDICT without retraining, achieving Pearson r = 0.32 for visceral fat mass versus 0.24 with baseline nutrients. NutriMatch thus delivers rapid, reproducible harmonization of FCDBs and phenotype-informative nutrient enrichment, and enables robust cross-cohort studies.

Introduction

The study of diet and its relationship with health outcomes is fundamental in epidemiological research. Evidence consistently shows that adults adhering to a healthy diet enjoy longer lifespans and reduced risks of chronic diseases [1,2,3]. Traditionally, dietary data have been captured using methods like Food Frequency Questionnaires or 24-h dietary recalls. However, these approaches are prone to underreporting, as they rely on memory and self-estimation, which can lead to inaccuracies in portion size assessment and recall bias [4].

In recent years, dietary assessment has increasingly turned to technology, particularly diet-tracking mobile apps, which have gained traction in both academic and clinical settings [5,6]. These tools allow for real-time diet logging and have demonstrated promising results in improving health outcomes [7].
Quality-focused diet logging has been linked to cardiovascular benefits and the reliable collection of dietary data for analysis against energy expenditures [5,7].

These methods rely on connecting a person's food intake to nutritional information. Food composition databases (FCDBs) are a common tool leveraged for this purpose, providing detailed nutrient profiles [8,9]. These databases provide standardized nutrient values for thousands of food items, enabling researchers to analyze dietary patterns with greater precision. Intake-logging platforms such as ASA24 automate the step from self-reported 24-h recalls to nutrient totals by mapping each food entry onto several reference FCDBs, streamlining large-scale dietary assessment for researchers [10]. Similarly, FoodRepo integrates barcoded food logging with FCDBs for large-scale digital nutrition studies [11].

However, a major hurdle in this integration is the incompleteness of FCDBs [12]. Many of these databases lack comprehensive nutrient data, requiring researchers…
NutriMatch: harmonizing food composition databases with large language models for enhanced nutritional prediction
- by stefan