CSR & Sentiment Analysis: a new customized dictionary, 2023.
CSR & Sentiment Analysis: a new customized dictionary
Emma Zavarrone; Alessia Forciniti
2023-01-01
Abstract
Communication about the pillars of corporate social responsibility (CSR) is key to sustainable corporate development. Sentiment analysis (SA) is a sub-area of natural language processing that studies communication by classifying opinions as negative or positive. Measuring sentiment involves pitfalls related to: a) the context, since polarity classification depends on the domain; b) the method, whether lexicon-based, machine learning, or a combination of the two; and c) the language, since the literature shows a lack of resources for languages other than English. For CSR-based strategic communication, no domain-specific resources exist for investigating sentiment, either in English or in other languages. Our contribution lies within the methodological setting of SA for the sustainability framework. We combined lexicon-based and machine-learning methods to build a customized lexicon for analyzing CSR. The innovation concerns: 1) a domain corpus-based approach to improving a general pre-constructed dictionary; 2) the application to Italian; and 3) performance assessment through machine learning. We developed a multi-stage algorithm that combines text analysis with network analysis and captures semantic concordances through an index of keyword content in the text. To validate our model from a machine learning perspective, we divided our data collection into five random samples: one was used as a training set, or baseline, for implementing the lexicon, and four were used as test sets. The study showed a notable increase in performance metrics across all samples, demonstrating the effectiveness of our proposal for building a customized lexicon for analyzing CSR in the Italian context.

File | Size | Format |
---|---|---|
DeLTA_2023_54_CR.pdf (not accessible) | 755.29 kB | Adobe PDF |
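The validation design described in the abstract (five random samples, one used to build the lexicon and four held out as test sets) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `GENERAL_LEXICON`, the function names, and in particular the `customize` step, where a simple label-frequency rule stands in for the authors' multi-stage text and network analysis algorithm, which this record does not detail.

```python
import random
from collections import Counter

# Toy general-purpose Italian lexicon (hypothetical, not the authors' dictionary).
GENERAL_LEXICON = {"buono": 1, "ottimo": 1, "cattivo": -1, "pessimo": -1}


def split_into_samples(docs, k=5, seed=0):
    """Shuffle the documents and deal them into k near-equal random samples."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def polarity(text, lexicon):
    """Sum the lexicon scores of the tokens; the sign gives the class."""
    score = sum(lexicon.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def customize(lexicon, train):
    """Extend a copy of the lexicon with domain tokens from labelled docs.

    Stand-in rule (an assumption): a token missing from the general lexicon
    gets the polarity of the class in which it appears more often.
    """
    balance = Counter()
    for text, label in train:
        step = {"positive": 1, "negative": -1}.get(label, 0)
        for tok in set(text.lower().split()):
            if tok not in lexicon:
                balance[tok] += step
    custom = dict(lexicon)
    for tok, value in balance.items():
        if value != 0:
            custom[tok] = 1 if value > 0 else -1
    return custom


def accuracy(sample, lexicon):
    """Share of (text, label) pairs whose predicted polarity matches the label."""
    if not sample:
        return 0.0
    hits = sum(polarity(text, lexicon) == label for text, label in sample)
    return hits / len(sample)
```

In use, one would call `split_into_samples` on a labelled corpus, pass the first sample to `customize`, and compare `accuracy` under the general and customized lexicons on each of the remaining four samples. Accuracy is only one possible metric; the abstract does not say which performance metrics were reported.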
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.