Estimating the power yield of stand-alone and grid-connected photovoltaic (PV) plants is crucial for technical and economic feasibility analyses at the design stage. The main goal is to overcome the unpredictability of renewable sources by properly estimating power production and suitably balancing generation and consumption. In this context, many methods can be applied to forecast renewable energy production. This paper presents a comparative analysis of three methods for estimating the power production of an existing PV plant, which is installed at the ENEA Research Centre in Portici (Southern Italy) and integrated in a Micro Grid (MG) configuration. Specifically, a phenomenological model proposed by Sandia National Laboratories is compared with two statistical learning models: a Multi-Layer Perceptron (MLP) neural network and a regression approach. These models also differ substantially in the input data and parameters they require. The phenomenological model requires design parameters and technical device specifications, whereas the statistical machine learning models require previously acquired datasets of the input variables. The a-Si/μc-Si PV plant installed at Portici is a suitable case study for comparing the three models, as both design data and acquired data are available: the plant was designed at the ENEA Research Centre, so its design parameters are known, and, being part of the MG, its data are continuously acquired and transmitted to other network devices. The results show that the statistical machine learning approaches yield more accurate power predictions. The main novelty of the paper is the optimization of the considered models through the identification of the minimum, most representative training dataset.
The authors show that using thousands of samples is unnecessary when the dataset size and the samples themselves are suitably selected by means of a Genetic Algorithm. The effectiveness of the optimization strategy is verified by comparing the prediction performance obtained with the optimal dataset against that obtained with a randomly chosen dataset. In this scenario, the Genetic Algorithm strategy proves to be a successful approach for identifying suitable datasets for statistical models. © 2016 Elsevier Ltd.
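
The dataset selection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic irradiance/power data, the simple linear regression model, the bit-mask encoding, the size penalty and all function names and GA settings (`select_subset`, `size_penalty`, population size, mutation rate) are assumptions chosen only for illustration. Each individual is a mask over the training samples, and its fitness is the validation error of a model fitted on the selected samples plus a penalty that favours small datasets.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, my  # degenerate case: all x values identical
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def rmse(model, points):
    a, b = model
    return (sum((a * x + b - y) ** 2 for x, y in points) / len(points)) ** 0.5

def fitness(mask, train, valid, size_penalty):
    """Validation RMSE of a model trained on the masked samples,
    plus an assumed linear penalty on the number of samples kept."""
    chosen = [p for p, keep in zip(train, mask) if keep]
    if len(chosen) < 2:
        return float("inf")
    xs, ys = zip(*chosen)
    return rmse(fit_line(xs, ys), valid) + size_penalty * len(chosen)

def select_subset(train, valid, pop_size=30, generations=40,
                  size_penalty=0.01, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    n = len(train)
    score = lambda m: fitness(m, train, valid, size_penalty)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score)
        history.append(score(pop[0]))
        elite = pop[: pop_size // 5]          # elitism: best masks survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            pa, pb = rng.sample(pop[: pop_size // 2], 2)   # parents from the better half
            cut = rng.randrange(1, n)                      # one-point crossover
            child = pa[:cut] + pb[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = elite + children
    best = min(pop, key=score)
    history.append(score(best))
    return best, history

def make_data(n, rng):
    """Synthetic (irradiance, power) pairs; purely illustrative values."""
    return [(g, 0.15 * g + rng.gauss(0.0, 5.0))
            for g in (rng.uniform(0.0, 1000.0) for _ in range(n))]

data_rng = random.Random(42)
train = make_data(60, data_rng)
valid = make_data(40, data_rng)

best_mask, history = select_subset(train, valid)
n_selected = sum(best_mask)

# Baseline: a random subset of the same size, as in the comparison described above.
random_subset = data_rng.sample(train, n_selected)
rx, ry = zip(*random_subset)
gx, gy = zip(*[p for p, keep in zip(train, best_mask) if keep])
print("samples kept:", n_selected, "of", len(train))
print("GA subset RMSE:    ", rmse(fit_line(gx, gy), valid))
print("random subset RMSE:", rmse(fit_line(rx, ry), valid))
```

Because the fittest masks are carried over unchanged, the best fitness never worsens from one generation to the next. In a real study the validation set used for selection would be kept separate from the test set used to report the final prediction error.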