Health Misinformation Detection: A Chunking Strategy Integrated to Retrieval-Augmented Generation (short paper), 2024.
Health Misinformation Detection: A Chunking Strategy Integrated to Retrieval-Augmented Generation (short paper)
Alessandro Bruno
2024-01-01
Abstract
Generative AI (GenAI) and Natural Language Processing (NLP) have advanced significantly in recent years, delivering breakthroughs and raising the bar for accuracy in text mining. Cascade effects have been observed in many application domains, spanning text analysis, question answering, classification, and the generation of new textual content. The latter has led many end-users to perceive AI as a ready-to-go solution for optimising their daily workflow. However, textual content generation has both bright and dark sides, as both trustworthy and unverified content can be generated effortlessly. This has fuelled a significant challenge in our society: fake news. Although fake news has existed for a long time, it remains an unsolved issue. Generative AI has brought it to a new level by enabling the automated production of large volumes of high-quality, individually targeted fake content. Our work is part of the HeReFaNMi (Health-Related Fake News Mitigation) project, which focuses on mitigating health-related fake news using NLP, Language Models, and a Retrieval-Augmented Generation (RAG) system. We propose a new chunking mechanism that streamlines the overall RAG pipeline. BERT and BERT+RAG have been compared on the health-related fake news classification task, using a dataset of 2000 health-related articles split equally into two categories ('fake' and 'credible'). Preliminary experimental results show improvements in Accuracy, Recall, and F1-score.
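
The abstract does not detail the proposed chunking mechanism, so the snippet below is only a minimal illustrative sketch of a generic sentence-aware, overlapping word-window chunker of the kind commonly used to prepare documents for a RAG retriever; the function name `chunk_text` and the `max_words`/`overlap` parameters are assumptions for illustration, not the paper's actual method.

```python
# Illustrative sketch only: the paper does not publish its chunking mechanism,
# so the sentence splitting, window size, and overlap below are assumptions.
import re


def chunk_text(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    """Split a document into overlapping word-window chunks along sentence
    boundaries, keeping each chunk small enough for a retriever/encoder."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, fresh = [], [], 0
    for sentence in sentences:
        words = sentence.split()
        current.extend(words)
        fresh += len(words)
        if len(current) >= max_words:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # carry a short tail so context overlaps
            fresh = 0
    if fresh:  # flush only if new words were added since the last chunk
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    article = (
        "A viral post claims vitamin C cures all infections. "
        "Health authorities have found no evidence supporting this claim. "
        "Experts recommend verifying sources before sharing such content."
    )
    for i, chunk in enumerate(chunk_text(article, max_words=15, overlap=5)):
        print(i, chunk)
```

In a RAG setup such as the one described, chunks like these would typically be embedded and indexed so that, at classification time, the most relevant passages can be retrieved to ground the model's decision.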



