

Original Versus Zero-Shot Prompted AI Visuals: Examining Human Feedback to Corporate Sustainability Reporting

Alessandro Bruno; Chintan Bhatt; Federica Ricceri
2025-01-01

Abstract

Aligned with the broad field of Affective Computing, this work examines the capacity of Multimodal Large Language Models to elicit attentional patterns and cognitive states in viewers as they engage with visuals from a specific domain: corporate sustainability reporting. Sustainability reports are documents officially released by companies and organisations to communicate their actions towards sustainability goals. They are often lengthy, which poses challenges in capturing, orienting and retaining stakeholder attention. In this context, visual elements are commonly used not only to complement textual content but also to act as entry points that guide the reader’s initial focus and/or to convey information. This study examines the eye movements of 41 observers during a webcam-based eye-tracking session as they view pictures from sustainability reports alongside AI-generated versions produced with zero-shot prompting. Both visuals (original and AI-generated) are presented as A/B tests. First, images are sourced from publicly available sustainability reports and captioned using Contrastive Language–Image Pretraining (CLIP), a vision-language model trained on image-text pairs, integrated within GPT-4o. The captions then serve as zero-shot prompts to DALL·E 3 for text-to-image generation. Perceptual features recorded include Time to First Fixation, Time to First Gaze, average gaze duration, number of fixations, total gaze count, and the K-coefficient, computed over two time windows: [0–3] and [0–10] seconds. Finally, Wilcoxon signed-rank and paired t-tests are used to assess the statistical significance of similarities and divergences in attention and cognitive dynamics between the two conditions.
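For reference, the K-coefficient listed among the recorded features contrasts focal and ambient viewing (Krejtz et al., 2016): fixation durations and following-saccade amplitudes are z-scored, and K averages their difference. A minimal sketch, with hypothetical inputs and the time-window selection reduced to a boolean mask (the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def k_coefficient(fix_dur, sacc_amp, window_mask):
    """K-coefficient of ambient/focal attention (Krejtz et al., 2016).

    fix_dur[i]  : duration of fixation i (e.g. ms)
    sacc_amp[i] : amplitude of the saccade following fixation i (e.g. deg)
    window_mask : boolean mask selecting the fixations inside the
                  analysed time window (e.g. [0-3] s or [0-10] s)

    Both sequences are z-scored over the whole recording; K is the mean
    of (z_duration - z_amplitude) within the window. K > 0 suggests
    focal viewing, K < 0 ambient scanning.
    """
    d = np.asarray(fix_dur, dtype=float)
    a = np.asarray(sacc_amp, dtype=float)
    zd = (d - d.mean()) / d.std(ddof=1)   # z-scored fixation durations
    za = (a - a.mean()) / a.std(ddof=1)   # z-scored saccade amplitudes
    return float((zd - za)[np.asarray(window_mask)].mean())
```

Long fixations followed by short saccades inside the window push K above zero (focal processing); the opposite pattern pushes it below zero (ambient scanning).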
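The paired comparison the abstract describes can be sketched as follows; the per-observer values below are synthetic placeholders (the metric, means, and spreads are assumptions), with only the sample size of 41 observers taken from the study:

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

rng = np.random.default_rng(42)

# Hypothetical paired metric (e.g. Time to First Fixation, in ms)
# for 41 observers under the two conditions; synthetic, not study data.
ttff_original = rng.normal(loc=420, scale=60, size=41)
ttff_ai = ttff_original + rng.normal(loc=15, scale=40, size=41)

# Paired t-test: assumes roughly normal per-observer differences.
t_stat, t_p = ttest_rel(ttff_ai, ttff_original)

# Wilcoxon signed-rank test: non-parametric check on the same pairs.
w_stat, w_p = wilcoxon(ttff_ai, ttff_original)

print(f"paired t-test: t={t_stat:.2f}, p={t_p:.3f}")
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.3f}")
```

Running both tests on the same pairs, as the abstract reports, guards the conclusions against the normality assumption underlying the t-test.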
English
2025
4th Automatic Affect Analysis and Synthesis Workshop, 15 September 2025, held in conjunction with the 23rd International Conference on Image Analysis and Processing (ICIAP)
Rome
International
Contribution
ICIAP 2025 Proceedings, Part I, LNCS 16169
Switzerland
Springer Link
Anonymous expert referees
Online
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
Sector INFO-01/A - Computer Science
Sector IINF-05/A - Information Processing Systems
3
Files in this item:
File: 638824_1_En_1_Chapter_Author (1).pdf
Type: Pre-print document
Size: 2.29 MB
Format: Adobe PDF
Access: restricted to the IULM internal network

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10808/70028
Citations
  • PMC: not available
  • Scopus: not available
  • Web of Science: not available