Health Misinformation Detection: A Chunking Strategy Integrated to Retrieval-Augmented Generation (short paper)

Taib, Walid; Saadallah, Idriss; Ichou, Abdelali; Hamdi, Abderrahmene; Bruno, Alessandro; Mazzeo, Pier Luigi; Chetouani, Aladine; Tliba, Marouane; Kerkouri, Mohamed Amine

Generative AI (GenAI) and Natural Language Processing (NLP) have advanced significantly in recent years, achieving breakthroughs and raising the bar for accuracy in text mining. These advances have had cascading effects across many application domains, including text analysis, question answering, classification, and the generation of new textual content. Text generation, in particular, has led many end-users to perceive AI as a ready-to-go solution for optimising their daily workflows. However, textual content generation has both a bright and a dark side, since trustworthy and unverified content alike can be generated effortlessly. This has fuelled a significant challenge for our society: fake news. Although fake news has existed for a long time, it remains an unsolved problem, and Generative AI has taken it to a new level by enabling the automated production of large volumes of high-quality, individually targeted fake content. Our work is part of the HeReFaNMi (Health-Related Fake News Mitigation) project, which addresses health-related fake news using NLP, language models, and a Retrieval-Augmented Generation (RAG) system. We propose a new chunking mechanism that streamlines the overall RAG pipeline. We compare BERT and BERT+RAG on health-related fake news classification using a dataset of 2,000 health-related articles split equally into two classes ('fake' and 'credible'). Preliminary experimental results show improvements in accuracy, recall, and F1-score.
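The abstract does not describe how the proposed chunking mechanism works. For orientation only, the following is a minimal sketch of a common generic baseline for chunking documents before retrieval in a RAG pipeline: fixed-size word windows with overlap between consecutive chunks. It is NOT the authors' strategy; the function name `chunk_words` and the parameters `chunk_size` and `overlap` are hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into windows of at most `chunk_size` words.

    Consecutive full windows share `overlap` words, so text near a chunk
    boundary appears in both neighbouring chunks.
    Generic baseline only; NOT the chunking strategy proposed in the paper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once this window reaches the end of the text, so the tail
        # is not emitted again as a shorter, redundant chunk.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(doc, chunk_size=4, overlap=1):
        print(chunk)
```

With ten words, `chunk_size=4` and `overlap=1`, the windows start at words 0, 3 and 6, so each chunk repeats the last word of the previous one. In a RAG setting each chunk would then be embedded and indexed for retrieval; the choice of window size and overlap trades retrieval granularity against lost context at boundaries.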

Health Misinformation Detection: A Chunking Strategy Integrated to Retrieval-Augmented Generation (short paper), 2024.


2024-01-01



Language(s): English
Publication date of the proceedings or presentation: 2024
URL: https://ceur-ws.org/Vol-3923/Paper_5.pdf
Conference name: AIxPAC 2024 (Artificial Intelligence for Perception and Artificial Consciousness 2024)
Conference location: Bolzano
Conference year: 2024
Conference scope: international
Presentation type: contributed paper
Proceedings title: Proceedings of the 2nd Workshop on Artificial Intelligence for Perception and Artificial Consciousness (AIxPAC 2024) co-located with the 22nd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2024), Bolzano, Italy, November 28, 2024
Proceedings editors: Alessandro Bruno, Arianna Pipitone, Riccardo Manzotti, Agnese Augello, Pier Luigi Mazzeo, Filippo Vella, Giuseppe Mazzola
First page: 41
Last page: 48
Country of publication: Germany
Publisher: CEUR-WS.org
Reviewers: anonymous experts
Format: Online
Scientific-disciplinary sectors (valid until 24/06/2024): INF/01 - Computer Science; ING-INF/05 - Information Processing Systems
Scientific-disciplinary sectors (valid from 09/05/2024): INFO-01/A - Computer Science; IINF-05/A - Information Processing Systems
Number of authors: 9
Appears in item types: 4.01 Contribution in conference proceedings (published)

Files in this item:

File	Size	Format
Paper_5.pdf (Open Access, post-print)	224.98 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10808/69989

