Description
In recent years, models based on deep neural networks have demonstrated exceptional capabilities in recognizing and classifying objects in images. Nevertheless, users' trust in the results of these models remains an open issue. In our research, we used the multimodal GPT-4o model, which not only classifies objects but also generates natural-language explanations for its decisions. Experiments on artificially generated images showed that both the classifications and the explanations were convincing to human evaluators. In a medical context, the model, tested on images of skin lesions using the ABCDE method, provided justifications that increased the credibility of its responses. These results confirm that textual explanations help users better assess the correctness of the model's answers and, consequently, significantly increase trust in its operation. Multimodal models such as GPT-4o thus offer not only high classification accuracy but also decision transparency, which is particularly important in critical applications such as medical diagnostics. We conclude that the ability to generate natural-language justifications is crucial for user acceptance of and trust in AI systems.
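As a rough illustration of the kind of query involved, the sketch below asks a multimodal model to classify a skin-lesion image and justify the answer along the ABCDE criteria. It assumes the OpenAI Python client and the `gpt-4o` chat completions endpoint; the prompt wording, the `classify_with_explanation` helper, and the image path are illustrative placeholders, not the exact prompts or pipeline used in the study.

```python
# Minimal sketch (assumptions noted above): send an image plus a prompt to a
# multimodal model and get back a classification with a textual justification.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_with_explanation(image_path: str) -> str:
    # Encode the local image so it can be sent inline as a data URL.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Classify the skin lesion in this image and justify "
                            "your answer using the ABCDE criteria (Asymmetry, "
                            "Border, Color, Diameter, Evolving)."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
    )
    # The reply contains both the classification and its natural-language justification.
    return response.choices[0].message.content


print(classify_with_explanation("lesion.jpg"))  # file name is a placeholder
```

In this setup, the explanation arrives in the same response as the classification, which is what allows users to judge whether the stated reasoning actually supports the model's answer.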