10–13 Sept 2024
Holimo Hotel
Europe/Warsaw timezone

Textual explanations for image classification using multimodal LLM

11 Sept 2024, 11:20
20m
Holimo Hotel

Holimo Hotel

Stara Morawa 11a, 57-550 Stronie Śląskie
Presentation at the conference Machine Learning Session 1 - Machine Learning

Speakers

Bartosz Sawicki (Warsaw University of Technology) Tomasz Leś (Warsaw University of Technology)

Description

In recent years, models based on deep neural networks have demonstrated exceptional capabilities in recognizing and classifying objects in images. Nevertheless, the issue remains with users' trust in the results of these models. In our research, we utilized the multimodal GPT-4o model, which not only classifies objects but also generates explanations for its decisions in natural language. Experiments on artificially generated images showed that both the classifications and explanations were convincing to humans. In the medical context, the model tested on images of skin lesions using the ABCDE method provided justifications that increased the credibility of the responses. These results confirm that textual explanations help users better assess the correctness of the model's answers. Consequently, trust in the model's operation significantly increases. Multimodal models like GPT-4o thus offer not only high accuracy in classification but also decision transparency. This approach is particularly important in critical applications such as medical diagnostics. We conclude that the ability to generate natural language justifications is crucial for the acceptance and trust of users in AI systems.

Authors

Bartosz Sawicki (Warsaw University of Technology) Tomasz Leś (Warsaw University of Technology)

Presentation materials

There are no materials yet.

Peer reviewing

Paper