Split-word Architecture in Recurrent Neural Networks POS-Tagging

Di Gennaro, Giovanni; Ospedale, Armando; Di Girolamo, Antonio; Buonanno, Amedeo; Palmieri, Francesco A. N.; Fedele, Gianfranco

2022-01-01

Abstract

We analyze Recurrent Neural Network (RNN) architectures for Part-of-Speech (POS) tagging. When linguistic rules are inserted ad hoc into the decision algorithm, it becomes difficult to separate the role of prior information from that of learning. This paper demonstrates the real potential of recurrent networks on the Italian language with a purely data-driven approach, which reaches the state of the art on the UD_Italian-ISDT (Italian Stanford Dependency Treebank) dataset in comparison to TINT. We propose a methodology for splitting words, which are then mapped to embedding spaces and fed to forward-backward networks.
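
The abstract does not say how words are split or how the networks are configured, so the following is only an illustrative sketch and not the authors' implementation. It assumes a fixed-length suffix cut (the hypothetical `split_word`), one embedding table per word part, and reads "forward-backward networks" as a pair of plain tanh recurrences run in opposite directions. All names, sizes, and the tag set are placeholders, and the weights are random and untrained.

```python
import math
import random

random.seed(0)
EMB = 4                          # embedding size per word part (assumed)
HID = 3                          # hidden size per direction (assumed)
TAGS = ["DET", "NOUN", "VERB"]   # toy tag set, for illustration only

def split_word(word, suffix_len=3):
    """Split a word into (root, suffix). The fixed-length cut is an assumption."""
    word = word.lower()
    if len(word) <= suffix_len:
        return word, ""
    return word[:-suffix_len], word[-suffix_len:]

class Embedding:
    """Lazy lookup table: each unseen key gets a random vector."""
    def __init__(self, dim):
        self.dim = dim
        self.table = {}

    def __call__(self, key):
        if key not in self.table:
            self.table[key] = [random.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return self.table[key]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_pass(inputs, w_in, w_rec):
    """Plain tanh recurrence over a sequence; returns one hidden state per step."""
    h = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

def tag_sentence(words, params):
    root_emb, suf_emb, fwd, bwd, w_out = params
    # Embed each part separately and concatenate the two vectors per word.
    xs = [root_emb(r) + suf_emb(s) for r, s in map(split_word, words)]
    h_f = rnn_pass(xs, *fwd)               # left-to-right pass
    h_b = rnn_pass(xs[::-1], *bwd)[::-1]   # right-to-left pass, re-aligned
    tags = []
    for f, b in zip(h_f, h_b):
        scores = matvec(w_out, f + b)      # linear read-out on both directions
        tags.append(TAGS[scores.index(max(scores))])
    return tags

params = (
    Embedding(EMB), Embedding(EMB),
    (rand_matrix(HID, 2 * EMB), rand_matrix(HID, HID)),
    (rand_matrix(HID, 2 * EMB), rand_matrix(HID, HID)),
    rand_matrix(len(TAGS), 2 * HID),
)
sentence = ["Il", "gatto", "dorme"]
print([split_word(w) for w in sentence])
print(tag_sentence(sentence, params))  # untrained weights: tags are arbitrary
```

In a real tagger the embeddings and recurrent weights would be trained on a treebank, and the splitting rule would follow whatever scheme the paper actually defines.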

Year: 2022
ISBN: 978-1-7281-8671-9
Keywords: Machine learning; Natural language processing; POS-tagger; RNN
Appears in types: 4.1 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12079/73470
