Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP)

Gebreyesus Y.; Dalton D.; Nixon S.; De Chiara D.; Chinnici M.

2023-01-01

Abstract

The need for artificial intelligence (AI) and machine learning (ML) models to optimize data center (DC) operations is increasing as the volume of operations management data grows rapidly. Such models can help operators better understand their DC operations and make informed decisions in advance to maintain service reliability and availability. Applications include developing models that optimize energy efficiency, identifying inefficient resource utilization and scheduling policies, and predicting outages. In addition to model hyperparameter tuning, feature subset selection (FSS) is critical for identifying the features relevant to modeling DC operations effectively: it provides insight into the data, improves model performance, and reduces computational expense. Hence, this paper introduces the Shapley Additive exPlanation (SHAP) values method, a class of additive feature attribution values for identifying relevant features that is rarely discussed in the literature. We compared its effectiveness with that of several commonly used importance-based feature selection methods. The methods were tested on real DC operations data streams obtained from the ENEA CRESCO6 cluster, which has 20,832 cores. To demonstrate the effectiveness of SHAP relative to the other methods, we selected the ten most important features identified by each method, retrained the predictive models, and evaluated their performance using the mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). The results demonstrate that the predictive models trained on features selected with the SHAP-assisted method performed well, with lower error and a reasonable execution time compared to the other methods.
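The selection procedure summarized in the abstract (rank features by Shapley-based importance, keep the top k, then score predictions with MAE, RMSE, and MAPE) can be illustrated with a minimal sketch. This is not the authors' implementation: it computes exact Shapley values by enumerating coalitions, which is only feasible for a handful of features, and it assumes that features outside a coalition are replaced by their dataset mean. The names toy_model, rows, shapley_values, and rank_features, as well as the four-feature toy data, are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial, sqrt


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Assumption: a feature outside the coalition takes its baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(model, rows, k):
    """Indices of the k features with the largest mean |Shapley value|."""
    n = len(rows[0])
    baseline = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    importance = [0.0] * n
    for r in rows:
        for j, p in enumerate(shapley_values(model, r, baseline)):
            importance[j] += abs(p) / len(rows)
    return sorted(range(n), key=lambda j: importance[j], reverse=True)[:k]


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    # Percentage error; assumes no true value is zero.
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_model(x):
    # Hypothetical stand-in for a trained predictor; feature 3 is irrelevant.
    return 3.0 * x[0] + 0.5 * x[1] * x[2] + 0.0 * x[3]


rows = [
    [1.0, 2.0, 1.0, 5.0],
    [2.0, 1.0, 3.0, 9.0],
    [3.0, 4.0, 2.0, 1.0],
    [4.0, 3.0, 4.0, 7.0],
]

top_features = rank_features(toy_model, rows, k=3)
```

In this toy setting the irrelevant feature receives zero attribution and is excluded from the top three. In practice one would retrain the model on the selected columns and compare the three error metrics before and after selection. Exact enumeration costs on the order of 2^n model evaluations per sample, so realistic feature counts require an approximate or model-specific SHAP estimator.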

Year: 2023

Keywords: artificial intelligence; data center; feature selection; game theory; machine learning; SHAP

Publication type: 1.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
Machine Learning for Data Center Optimizations_ Feature Selection Using Shapley Additive exPlanation (SHAP).pdf	Open access	Publisher's version (PDF)	Creative Commons	2.57 MB	Adobe PDF

Documents in IRIS are protected by copyright, with all rights reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12079/75148
