Leveraging Explainable Artificial Intelligence for Genotype-to-Phenotype Prediction: A Case Study in Arabidopsis thaliana

Zoani C.
2025-01-01

Abstract

Predicting phenotypes from genomic data can significantly advance agriculture. Genomic selection, which uses genome-wide DNA markers to identify individuals with high genetic value, enhances the accuracy of breeding programs. While linear models are routinely used for genomic selection (GS), machine learning (ML) models offer complementary potential. In this study, robust ML-based models were developed to predict five phenotypic traits—three related to flowering time and two to leaf number—in Arabidopsis thaliana, a model plant with a fully sequenced genome. Using explainable artificial intelligence (XAI), specifically SHapley Additive exPlanations (SHAP) values, we identified SNPs that contributed most to trait prediction. Many of these SNPs were located in or near genes known to regulate flowering and stem elongation, such as DOG1 and VIN3, supporting the biological plausibility of the model. SHAP also enabled local interpretability at the single-plant level, revealing the genotypic basis of individual predictions. Our results indicate that integrating ML with XAI improves model interpretability and provides predictive performance comparable to traditional methods. This approach confirms known genotype–phenotype relationships and highlights new candidate loci, paving the way for functional validation. The proposed methodology offers promising applications in precision breeding and translation of insights from Arabidopsis to crop species.
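To make the SHAP idea in the abstract concrete, the following is a minimal sketch of exact Shapley values for a single plant, computed by enumerating every SNP coalition. It is not the authors' pipeline: the trait model `predict`, the three SNPs, their effect sizes, and the background genotypes are all hypothetical, and the SHAP library approximates this calculation for real models with many markers rather than enumerating coalitions as done here.

```python
from itertools import combinations
from math import factorial


def predict(genotype):
    """Hypothetical trait model: days to flowering from three SNP allele counts (0, 1 or 2)."""
    snp_a, snp_b, snp_c = genotype
    # Additive effects plus one interaction term, so the model is not purely linear.
    return 20.0 + 3.0 * snp_a - 1.5 * snp_b + 0.5 * snp_a * snp_c


def coalition_value(model, plant, background, known):
    """Average prediction when only the SNPs in `known` take the plant's values.

    The remaining SNPs are filled in from each background plant in turn.
    """
    total = 0.0
    for ref in background:
        mixed = tuple(plant[i] if i in known else ref[i] for i in range(len(plant)))
        total += model(mixed)
    return total / len(background)


def shapley_values(model, plant, background):
    """Exact Shapley value of each SNP for one plant's prediction."""
    n = len(plant)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                gain = (coalition_value(model, plant, background, known | {i})
                        - coalition_value(model, plant, background, known))
                contribution += weight * gain
        phi.append(contribution)
    return phi


# Hypothetical reference population and the single plant to explain.
background = [(0, 0, 0), (1, 2, 0), (2, 1, 2), (0, 1, 1)]
plant = (2, 0, 1)

phi = shapley_values(predict, plant, background)
baseline = sum(predict(b) for b in background) / len(background)

for name, value in zip(("snp_a", "snp_b", "snp_c"), phi):
    print(f"{name}: {value:+.3f}")
print(f"baseline + sum(phi) = {baseline + sum(phi):.3f}, prediction = {predict(plant):.3f}")
```

The per-SNP values sum to the difference between this plant's prediction and the average prediction over the background set, which is what allows an individual prediction to be decomposed into contributions from specific genotypes.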
Keywords: Arabidopsis thaliana; explainable artificial intelligence; genotype-to-phenotype prediction; machine learning; regression analysis; SHAP
Files in this item:
Leveraging Explainable.pdf (open access; publisher's version, PDF; Creative Commons license; 9.54 MB, Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12079/86268