

Original Versus Zero-Shot Prompted AI Visuals: Examining Human Feedback to Corporate Sustainability Reporting

Alessandro Bruno; Chintan Bhatt; Federica Ricceri
2025-01-01

Abstract

Aligned with the broad field of Affective Computing, this work examines the capacity of Multimodal Large Language Models to elicit attentional patterns and cognitive states in viewers as they engage with visuals from a specific domain: corporate sustainability reporting. Sustainability reports are documents officially released by companies and organisations to communicate their actions towards sustainability goals. They are often lengthy, which poses challenges in capturing, orienting and retaining stakeholder attention. In this context, visual elements are commonly used not only to complement textual content but also to act as entry points that guide the reader’s initial focus and/or to convey information. This study examines the eye movements of 41 observers during a webcam-based eye-tracking session as they view pictures from sustainability reports alongside AI-generated versions produced with zero-shot prompting. Both visuals (original and AI-generated) are presented as A/B tests. First, images are sourced from publicly available sustainability reports and captioned using Contrastive Language–Image Pretraining (CLIP), a vision-language model trained on image-text pairs, integrated within GPT-4o. The captions then serve as zero-shot prompts to DALL·E 3 for text-to-image generation. Perceptual features recorded include Time to First Fixation, Time to First Gaze, average gaze duration, number of fixations, total gaze count, and the K-coefficient, computed over two time windows: [0–3] and [0–10] seconds. Finally, Wilcoxon signed-rank and paired t-tests are used to assess the statistical significance of similarities and divergences in attention and cognitive dynamics between the two conditions.
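For reference, the K-coefficient listed among the recorded features contrasts focal and ambient viewing (Krejtz et al., 2016): fixation durations and following-saccade amplitudes are z-scored, and K averages their difference. A minimal sketch, with hypothetical inputs and the time-window selection reduced to a boolean mask (the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def k_coefficient(fix_dur, sacc_amp, window_mask):
    """K-coefficient of ambient/focal attention (Krejtz et al., 2016).

    fix_dur[i]  : duration of fixation i (e.g. ms)
    sacc_amp[i] : amplitude of the saccade following fixation i (e.g. deg)
    window_mask : boolean mask selecting the fixations inside the
                  analysed time window (e.g. [0-3] s or [0-10] s)

    Both sequences are z-scored over the whole recording; K is the mean
    of (z_duration - z_amplitude) within the window. K > 0 suggests
    focal viewing, K < 0 ambient scanning.
    """
    d = np.asarray(fix_dur, dtype=float)
    a = np.asarray(sacc_amp, dtype=float)
    zd = (d - d.mean()) / d.std(ddof=1)   # z-scored fixation durations
    za = (a - a.mean()) / a.std(ddof=1)   # z-scored saccade amplitudes
    return float((zd - za)[np.asarray(window_mask)].mean())
```

Long fixations followed by short saccades inside the window push K above zero (focal processing); the opposite pattern pushes it below zero (ambient scanning).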
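The paired comparison the abstract describes can be sketched as follows; the per-observer values below are synthetic placeholders (the metric, means, and spreads are assumptions), with only the sample size of 41 observers taken from the study:

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

rng = np.random.default_rng(42)

# Hypothetical paired metric (e.g. Time to First Fixation, in ms)
# for 41 observers under the two conditions; synthetic, not study data.
ttff_original = rng.normal(loc=420, scale=60, size=41)
ttff_ai = ttff_original + rng.normal(loc=15, scale=40, size=41)

# Paired t-test: assumes roughly normal per-observer differences.
t_stat, t_p = ttest_rel(ttff_ai, ttff_original)

# Wilcoxon signed-rank test: non-parametric check on the same pairs.
w_stat, w_p = wilcoxon(ttff_ai, ttff_original)

print(f"paired t-test: t={t_stat:.2f}, p={t_p:.3f}")
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.3f}")
```

Running both tests on the same pairs, as the abstract reports, guards the conclusions against the normality assumption underlying the t-test.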
English
2025
4th Automatic Affect Analysis and Synthesis Workshop, 15 September 2025, held in conjunction with the 23rd International Conference on Image Analysis and Processing (ICIAP)
Rome
International
Contribution
ICIAP 2025 Proceedings, Part I, LNCS 16169
Switzerland
Springer Link
Anonymous expert referees
Online
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
Sector INFO-01/A - Computer Science
Sector IINF-05/A - Information Processing Systems
3
Files in this item:
File: 638824_1_En_1_Chapter_Author (1).pdf
Type: Pre-print document
Size: 2.29 MB
Format: Adobe PDF
Access: restricted to the IULM internal network

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10808/70028
Citations
  • PMC: not available
  • Scopus: not available
  • Web of Science: not available