CognitiveNet: Enriching Foundation Models with Emotions and Awareness
Chinnici M.
2023-01-01
Abstract
Foundation models are attracting considerable interest for their capacity to solve many downstream tasks without fine-tuning parameters on task-specific datasets. The same models can connect visual and linguistic representations through image-text contrastive learning. These abilities allow an artificial agent to act similarly to a human, but significant cognitive processes have yet to be introduced into the learning process. The present study proposes a step toward more human-like artificial intelligence by introducing CognitiveNet, a learnable architecture that integrates foundation models. Starting from recent studies in the field of Artificial Consciousness, a hierarchy of cognitive layers was modeled and pre-trained to estimate the emotional content of images. With CLIP as the backbone model, the architecture produced significant concordant emotional activity. Furthermore, after supervised optimization the proposed model surpasses the accuracy of CLIP in classifying the CIFAR-10 and CIFAR-100 datasets, suggesting CognitiveNet as a promising solution for classification tasks through online meta-learning.