Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 박상준 | - |
dc.date.accessioned | 2024-05-23T03:05:39Z | - |
dc.date.available | 2024-05-23T03:05:39Z | - |
dc.date.issued | 2024-01 | - |
dc.identifier.issn | 1361-8415 | - |
dc.identifier.uri | https://ir.ymlib.yonsei.ac.kr/handle/22282913/199156 | - |
dc.description.abstract | The escalating demand for artificial intelligence (AI) systems that can monitor and supervise human errors and abnormalities in healthcare presents unique challenges. Recent advances in vision-language models reveal the potential of monitoring AI by understanding both visual and textual concepts and their semantic correspondences. However, there has been limited success in applying vision-language models to the medical domain: current models and learning strategies for photographic images and captions call for a web-scale corpus of image-text pairs, which is often not feasible in the medical domain. To address this, we present the medical cross-attention vision-language model (Medical X-VL), which leverages key components tailored to the medical domain: self-supervised unimodal models for the medical domain and a fusion encoder to bridge them, momentum distillation, sentence-wise contrastive learning for medical reports, and sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for monitoring AI, ranging from zero-shot classification to zero-shot error correction. Our model outperformed current state-of-the-art models on two medical image datasets, suggesting a novel clinical application of our monitoring AI model to alleviate human errors. Our method demonstrates a more specialized capacity for fine-grained understanding, a distinct advantage that is particularly applicable to the medical domain. | - |
dc.description.statementOfResponsibility | restriction | - |
dc.language | English | - |
dc.publisher | Elsevier | - |
dc.relation.isPartOf | MEDICAL IMAGE ANALYSIS | - |
dc.rights | CC BY-NC-ND 2.0 KR | - |
dc.subject.MESH | Artificial Intelligence* | - |
dc.subject.MESH | Humans | - |
dc.subject.MESH | Language | - |
dc.subject.MESH | Learning | - |
dc.subject.MESH | Radiography | - |
dc.subject.MESH | Radiology* | - |
dc.title | Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology | - |
dc.type | Article | - |
dc.contributor.college | College of Medicine (의과대학) | - |
dc.contributor.department | Dept. of Radiation Oncology (방사선종양학교실) | - |
dc.contributor.googleauthor | Sangjoon Park | - |
dc.contributor.googleauthor | Eun Sun Lee | - |
dc.contributor.googleauthor | Kyung Sook Shin | - |
dc.contributor.googleauthor | Jeong Eun Lee | - |
dc.contributor.googleauthor | Jong Chul Ye | - |
dc.identifier.doi | 10.1016/j.media.2023.103021 | - |
dc.contributor.localId | A06513 | - |
dc.relation.journalcode | J02201 | - |
dc.identifier.eissn | 1361-8423 | - |
dc.identifier.pmid | 37952385 | - |
dc.identifier.url | https://www.sciencedirect.com/science/article/pii/S1361841523002815 | - |
dc.subject.keyword | Error detection | - |
dc.subject.keyword | Monitoring AI | - |
dc.subject.keyword | Radiograph | - |
dc.subject.keyword | Vision-language model | - |
dc.contributor.alternativeName | Park, Sang Joon | - |
dc.contributor.affiliatedAuthor | 박상준 | - |
dc.citation.volume | 91 | - |
dc.citation.startPage | 103021 | - |
dc.identifier.bibliographicCitation | MEDICAL IMAGE ANALYSIS, Vol.91 : 103021, 2024-01 | - |
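The abstract above names two report-specific training components: sentence-wise contrastive learning and sentence similarity-adjusted hard negative mining, both motivated by the fact that radiology reports reuse near-identical sentences across patients. The following PyTorch sketch illustrates how a sentence-level image-text contrastive loss of this flavor could be set up; it is not the paper's implementation, and the function name, the soft-target scheme (treating near-duplicate sentences as extra positives rather than negatives), and the `sim_thresh` value are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of sentence-wise image-text
# contrastive learning in which negatives whose report sentences closely
# resemble the positive sentence are treated as additional positives,
# so paraphrases common in radiology reports are not punished as negatives.
import torch
import torch.nn.functional as F

def sentencewise_contrastive_loss(img_emb, sent_emb, sent_sim,
                                  tau=0.07, sim_thresh=0.9):
    """
    img_emb:  (N, D) image embeddings, one per image
    sent_emb: (N, D) embeddings of the report sentences paired with the images
    sent_sim: (N, N) symmetric sentence-sentence similarity (e.g., cosine
              similarity from a frozen text encoder, 1.0 on the diagonal);
              entries above sim_thresh are counted as positives
    """
    img_emb = F.normalize(img_emb, dim=-1)
    sent_emb = F.normalize(sent_emb, dim=-1)
    logits = img_emb @ sent_emb.t() / tau  # (N, N) image-to-sentence scores

    # Soft targets: near-duplicate sentences become extra positives,
    # normalized so each row forms a distribution.
    targets = (sent_sim >= sim_thresh).float()
    targets = targets / targets.sum(dim=1, keepdim=True)

    # Symmetric InfoNCE-style loss over both retrieval directions.
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)

# Toy usage with random embeddings (batch of 8 image-sentence pairs, D=256):
N, D = 8, 256
img, sent = torch.randn(N, D), torch.randn(N, D)
sim = F.normalize(sent, dim=-1) @ F.normalize(sent, dim=-1).t()  # crude stand-in
print(sentencewise_contrastive_loss(img, sent, sim))
```

At inference time, the same image-sentence scoring supports zero-shot use in the general way such models are applied: embed candidate sentences (e.g., "No pneumothorax." vs. "Right-sided pneumothorax.") and classify the image by the highest-scoring sentence, which is the broad mechanism behind the zero-shot classification and error-detection tasks the abstract describes.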