Benchmarking Generative AI for Chest Radiograph Interpretation: Comparing General-Purpose, Domain-Specific, and Agentic Radiology Models

Summary

Generative artificial intelligence (AI) systems are increasingly explored as tools to assist radiologists in image interpretation and automated report generation. The study summarized here systematically evaluates the performance of several contemporary generative AI models for chest radiograph interpretation, highlighting differences in clinical reliability, diagnostic performance, and hallucination behavior across model architectures.

In this retrospective study, investigators analyzed 212 consecutive frontal chest radiographs obtained from a teleradiology service covering secondary hospitals in the United States between September and November 2023. The cohort included 212 unique patients (124 men and 88 women; mean age 48.9 years). Each radiograph was submitted as an image-only input, without accompanying clinical history, to four generative AI models representing different design paradigms: a general-purpose multimodal model (Gemini 2.5), a domain-specific radiology model (MAIRA-2), and two agentic models designed to orchestrate multiple AI tools (Rad.1 and MedRAX).

Three radiologists independently evaluated the generated reports. They assessed whether each report could be accepted without modification, rated overall report quality on a five-point scale, and identified their preferred report among the four outputs. A separate thoracic radiologist evaluated each report for hallucinations, defined as fabricated statements unsupported by the input radiograph. In addition, model performance was assessed across 13 radiographic abnormalities, enabling calculation of pooled sensitivity and specificity (a computational sketch of such pooled metrics follows below).

The results demonstrated substantial variability in performance across models. The agentic model Rad.1 consistently achieved the highest report acceptability and quality scores. For example, report acceptability reached 75.5% for Reader 1, compared with 57.1% for MAIRA-2 and 35.8% for Gemini 2.5. Median report quality scores were similarly highest for Rad.1 (median score of 4 across readers), while the general-purpose model Gemini 2.5 received the lowest ratings (median score of 2). Rad.1 was also most frequently selected as the preferred report by all readers.

Hallucination rates differed markedly among models. Rad.1 demonstrated the lowest hallucination rate (5.7%), whereas MedRAX showed the highest (53.8%), underscoring the potential risks associated with complex multi-tool pipelines when reliability mechanisms are insufficient. Diagnostic accuracy metrics further highlighted these differences: Rad.1 achieved the highest pooled sensitivity (66.0%) along with strong specificity (94.6%), whereas MAIRA-2 exhibited lower sensitivity but higher specificity (96.0%).

Illustrative examples in the study reveal substantial variation in AI-generated reports for the same chest radiograph. In one case of pneumothorax, for instance, some models failed to identify the abnormality entirely, while others provided partially correct or misleading interpretations, demonstrating the current inconsistency of generative models in clinical imaging interpretation.
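For readers unfamiliar with how pooled diagnostic metrics are derived, the sketch below shows one common approach: micro-averaging confusion counts across findings before computing sensitivity and specificity. The finding names, counts, and the pooled_sensitivity_specificity helper are illustrative assumptions, not taken from the study, and the paper may have pooled its 13-abnormality results differently.

# Minimal sketch (not the study's code): micro-averaged "pooled" sensitivity and
# specificity across multiple radiographic findings. All counts are invented
# purely for illustration.
from typing import Dict, Tuple

# Hypothetical per-finding confusion counts: (TP, FN, TN, FP)
Counts = Tuple[int, int, int, int]

example_counts: Dict[str, Counts] = {
    "pneumothorax":     (12, 6, 180, 14),
    "pleural_effusion": (30, 10, 160, 12),
    "consolidation":    (25, 15, 165, 7),
}

def pooled_sensitivity_specificity(counts: Dict[str, Counts]) -> Tuple[float, float]:
    """Sum counts across findings (micro-averaging), then compute
    sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP)."""
    tp = sum(c[0] for c in counts.values())
    fn = sum(c[1] for c in counts.values())
    tn = sum(c[2] for c in counts.values())
    fp = sum(c[3] for c in counts.values())
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

if __name__ == "__main__":
    sens, spec = pooled_sensitivity_specificity(example_counts)
    print(f"Pooled sensitivity: {sens:.1%}, pooled specificity: {spec:.1%}")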
Despite promising results, the authors acknowledge several limitations. Only four models, each in a single version, were evaluated, and the sample size was relatively small. The study also relied on image-only inputs without clinical context or prior examinations, which differs from real-world radiology workflows.

Overall, the findings highlight that generative AI systems for radiology are highly heterogeneous in performance and reliability. Agentic architectures may offer advantages in integrating specialized tools and reasoning pipelines, but careful benchmarking remains essential before clinical implementation. The study emphasizes the need for multidimensional evaluation frameworks, including diagnostic accuracy, hallucination monitoring, and reader preference, to ensure safe integration of generative AI into radiology practice.

This work contributes to the growing literature examining the clinical viability of generative AI in medical imaging and reinforces the importance of rigorous, task-specific benchmarking before deploying such systems in clinical workflows.

APA (7th edition) Citation
Hong et al. (2026). Generative artificial intelligence models for chest radiograph interpretation: Comparison of a general-purpose model, domain-specific model, and two agentic models. American Journal of Roentgenology.

Hashtags
#MedicalAI #Radiology #GenerativeAI #HealthTech #AIHallucinations #AgenticAI #MachineLearning #FutureOfHealthcare #RadiologyAI #ChestRadiograph #MedicalImaging #AIinRadiology #ClinicalAI #RadiologyResearch #AIReportGeneration #HealthcareAI

© 2025 AI Chavelle™ by Jeffrey Chen / SmartRad AI. All rights reserved.