Leveraging Explainable Artificial Intelligence for Genotype-to-Phenotype Prediction: A Case Study in Arabidopsis thaliana

Novielli P.; Nazzicari N.; Pavan S.; Delvento C.; Diacono D.; Zoani C.; Bellotti R.; Tangaro S.

2025-01-01

Abstract

Predicting phenotypes from genomic data can significantly advance agriculture. Genomic selection, which uses genome-wide DNA markers to identify individuals with high genetic value, enhances the accuracy of breeding programs. While linear models are routinely used for genomic selection (GS), machine learning (ML) models offer complementary potential. In this study, robust ML-based models were developed to predict five phenotypic traits—three related to flowering time and two to leaf number—in Arabidopsis thaliana, a model plant with a fully sequenced genome. Using explainable artificial intelligence (XAI), specifically SHapley Additive exPlanations (SHAP) values, we identified SNPs that contributed most to trait prediction. Many of these SNPs were located in or near genes known to regulate flowering and stem elongation, such as DOG1 and VIN3, supporting the biological plausibility of the model. SHAP also enabled local interpretability at the single-plant level, revealing the genotypic basis of individual predictions. Our results indicate that integrating ML with XAI improves model interpretability and provides predictive performance comparable to traditional methods. This approach confirms known genotype–phenotype relationships and highlights new candidate loci, paving the way for functional validation. The proposed methodology offers promising applications in precision breeding and translation of insights from Arabidopsis to crop species.
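
The abstract credits SHAP values with both ranking SNPs and explaining single-plant predictions. As an illustration only (this is not the authors' pipeline, and it does not use the `shap` library), the sketch below computes exact Shapley attributions for one plant by brute-force enumeration of SNP coalitions. The toy predictor `toy_model`, the 0/2 genotype coding, the four-line background panel, and all function names are hypothetical assumptions chosen to keep the example small. Enumeration is only feasible for a handful of features; real SNP panels would need an approximate explainer.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, background):
    """Exact Shapley values for one sample against a background panel.

    SNPs outside a coalition take their values from each background row
    in turn, and the resulting predictions are averaged.
    """
    n = len(sample)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [sample[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(genotype):
    # Hypothetical flowering-time predictor (days): two additive SNP
    # effects plus one SNP-by-SNP interaction; the fourth SNP is unused.
    return 50.0 + 6.0 * genotype[0] - 3.0 * genotype[1] + 2.0 * genotype[0] * genotype[2]


# Hypothetical panel: four SNPs, homozygous lines coded 0 or 2.
background = [[0, 0, 0, 0], [2, 0, 2, 0], [0, 2, 0, 2], [2, 2, 2, 2]]
plant = [2, 0, 2, 0]

phi = shapley_values(toy_model, plant, background)
baseline = sum(toy_model(row) for row in background) / len(background)

# Global-style ranking for this plant: SNP indices by absolute contribution.
ranked = sorted(range(len(phi)), key=lambda i: -abs(phi[i]))
print("baseline:", baseline)
print("per-SNP contributions:", phi)
print("SNPs ranked by |contribution|:", ranked)
```

Two properties make such attributions useful for the local interpretation mentioned in the abstract: the contributions add up to the gap between the plant's prediction and the background average, and a SNP the model never uses receives a contribution of zero.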

Year: 2025

Keywords: Arabidopsis thaliana; explainable artificial intelligence; genotype-to-phenotype prediction; machine learning; regression analysis; SHAP

Appears in types: 1.1 Journal article

Files in this record:

File	Type	License	Size	Format
Leveraging Explainable.pdf (open access)	Publisher's version (PDF)	Creative Commons	9.54 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12079/86268
