The (theoretical) elephant in the room. Overlooked assumptions in computer vision analysis of art images

Balbi, Camilla; Calise, Anna

doi:10.4000/signata.4757

Contemporary computer vision software represents an incredible opportunity for both art history researchers and museum practitioners: it is a tool through which images can be described, organized, studied and shared. When computer vision software operates over a database of art history images, however, a variety of dynamics are at play. They have to do with theoretical assumptions, historical categories, technological constraints and ideological stances: a set of premises which calls for a closer methodological survey of the process. We propose an account which uses art theory and visual culture studies to scrutinize the different steps and activities which constitute computer vision analysis: after all, the study of images has historically been a prerogative of art historians. Our intuition is that art image databases provide a “protected environment” in which to observe how old problems, inherent to the discipline, interact with new problems created by the way we consume and design software. The three levels at which we will try to detect biased stances answer three different questions. Which images are we talking about? Which research questions are we asking? Which linguistic and political logics are at play? To address them, we will begin the discussion by debunking the myth of a simple parallelism between these new forms of conceptualizing the real and traditional ones, challenging Manovich’s (1999) use of Panofsky’s symbolic form (1927) as a hermeneutic of the database. We will show instead how the art-database logic adheres to the traditional art historical narrative, while at the same time producing new kinds of biases. Second, we will focus on how this technology actually works, and which kind of art historical thought lies behind the algorithm. Our guess is that the praxis of this software is closer to connoisseurship than to art historical research.
Third, we will analyze the labeling process through which computer vision software creates descriptive metadata for the images in question, using Mitchell’s (1994) account of critical iconology to problematize the strong ideological and political stance behind the image-text relationship. Throughout the discussion, and especially in the final paragraph, we will address the transparency and evaluation standards which need to be defined in order to allow a strict methodological approach to guard and guide the process; such standards are at times lacking both in the cultural sector and in the wider visual field. What will emerge is an account of computer vision software and processes which appear to be far from ‘neutral’ or ‘objective’ in their extremely layered functioning, built in the midst of diverse stakeholders’ interests and procedural false steps. Granted that these technologies are nonetheless contributing to building the visual culture of our time, we detect a series of overlooked assumptions along the way through the lens of art theory, hoping to contribute to the design of a clearer view.

The (theoretical) elephant in the room. Overlooked assumptions in computer vision analysis of art images, 2023.