Large Language Models for Inclusive Image Captioning

Rostyslav Zatserkovnyi; Roksoliana Zatserkovna; Zoriana Novosad

dc.contributor.author	Rostyslav Zatserkovnyi
dc.contributor.author	Roksoliana Zatserkovna
dc.contributor.author	Zoriana Novosad
dc.date.accessioned	2026-04-06T15:12:00Z
dc.date.issued	2025-03-21
dc.description	29th International Conference on Information Technology (IT) Žabljak, 19 – 22 February 2025
dc.description.abstract	The rapid recent development of artificial intelligence (AI) has opened up many new possibilities for making digital content more accessible and inclusive. One of the most promising advances in this area is the use of large language models (LLMs) for image captioning. Trained on vast amounts of text data, these models can, among other capabilities, generate detailed descriptions of images. This can make visual content on the Web, which often has missing or unreliable captions, more understandable for individuals with visual impairments. In this article, we investigate the possibility of using LLMs to improve the captioning of visual web content. We discuss the current capabilities of this new class of ML models and compare several free and open-source LLMs that can be used for the task. Finally, we propose the architecture of a novel system that visually impaired web users can use to automatically caption visual content on the websites they visit.
dc.identifier.citation	R. Zatserkovnyi, Z. Novosad and R. Zatserkovna, "Large Language Models for Inclusive Image Captioning," 2025 29th International Conference on Information Technology (IT), Zabljak, Montenegro, 2025, pp. 1-5
dc.identifier.issn	2836-3736 (Print)
dc.identifier.issn	2836-3744 (Electronic)
dc.identifier.other	https://doi.org/10.1109/IT64745.2025.10930278
dc.identifier.uri	https://ieeexplore.ieee.org/document/10930278/
dc.identifier.uri	https://dspace.lute.lviv.ua/handle/123456789/2375
dc.language.iso	en
dc.publisher	IEEE
dc.subject	Visualization
dc.subject	Large language models
dc.subject	Visual impairment
dc.subject	Data models
dc.subject	Service-oriented architecture
dc.subject	Information technology
dc.title	Large Language Models for Inclusive Image Captioning
dc.type	Article

Files

Original bundle

Name: LLM-Inclusive-Image-Captioning-Camera-Ready.pdf
Size: 376.15 KB
Format: Adobe Portable Document Format

Collections

Scopus