Inflation of Type Ⅰ Error of Log-Rank Test with Inappropriately Generated Censoring Data


DC Field	Value	Language
dc.contributor.author	김용훈	-
dc.date.accessioned	2026-02-05T06:08:38Z	-
dc.date.available	2026-02-05T06:08:38Z	-
dc.date.issued	2025-02	-
dc.identifier.uri	https://ir.ymlib.yonsei.ac.kr/handle/22282913/210699	-
dc.description.abstract	When simulating survival data, some data-generation methods lead to erroneous results. In the appropriate method, called random generation, the event time (T) and the censoring time (C) are generated separately and independently from assumed distributions. When the censoring proportion is fixed in advance, however, the parameter of the censoring distribution may have no closed form; for the Weibull distribution, for example, there is no closed-form expression for the scale parameter of the censoring distribution when the shape parameters differ. This often leads researchers to use inappropriate data-generation methods to simplify the process. In this study, we investigated the problems caused by inappropriate data generation through simulations and mathematical validation. Specifically, we evaluated the Type I error rate of the log-rank test in a two-sample setting and examined the correlation between T and C. In inappropriate generating method Ⅰ, event times are generated from assumed Exponential and Weibull distributions, and a censoring indicator is then generated from a Bernoulli distribution; when censoring occurs, the censoring time is set to the generated event time. In inappropriate generating method Ⅱ, the censoring time is generated from a Uniform(0, T_i) distribution, which introduces dependence between T and C. With random generation, the Type I error of the log-rank test was well controlled whether or not the predefined censoring proportions were equal between groups. With inappropriate generation, it was inflated when the censoring proportions were unequal but appeared to be well controlled when they were equal.
This apparent control is likely an artifact: under dependent censoring, the conditional distribution of T given C differs from the marginal distribution of T, which can distort the actual differences between groups and create the illusion of a well-controlled Type I error rate. Additional simulations were conducted to investigate this issue, in which one group was generated by random generation and the other by inappropriate generating method Ⅱ. The Type I error of the log-rank test was inflated even when the censoring proportions were equal between the groups. This finding suggests that the inflation is due not to unequal censoring proportions but to the inappropriate data-generating process, which induces dependent censoring. In addition, the Spearman correlation between event time and censoring time confirmed that inappropriate data generation introduces dependence between them.	-
dc.description.statementOfResponsibility	open	-
dc.publisher	Graduate School, Yonsei University (연세대학교 대학원)	-
dc.rights	CC BY-NC-ND 2.0 KR	-
dc.title	Inflation of Type Ⅰ Error of Log-Rank Test with Inappropriately Generated Censoring Data	-
dc.title.alternative	부적절한 중도절단 데이터 생성 시 로그-순위 검정의 제 1종 오류 증가	-
dc.type	Thesis	-
dc.contributor.college	College of Medicine (의과대학)	-
dc.contributor.department	Others	-
dc.description.degree	Master's (석사)	-
dc.contributor.alternativeName	Kim, Yong Hoon	-
dc.type.local	Thesis	-
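
The generating methods described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis code: the function names, the restriction to exponential event times, the rate and sample-size values, and the hand-written Spearman helper are all assumptions made here. The censoring rate in `exp_censoring_rate` uses the standard identity P(C < T) = rate_C / (rate_C + rate_T) for independent exponential variables. The abstract does not say how censored subjects are selected in method Ⅱ, so the sketch simply draws C for every subject in order to show the induced dependence.

```python
import random


def exp_censoring_rate(event_rate, censor_prop):
    """Censoring rate giving P(C < T) = censor_prop for independent exponentials.

    Uses the identity P(C < T) = rate_C / (rate_C + rate_T).
    """
    return event_rate * censor_prop / (1.0 - censor_prop)


def random_generation(n, event_rate, censor_prop, rng):
    """Appropriate method: T and C are drawn independently of each other."""
    censor_rate = exp_censoring_rate(event_rate, censor_prop)
    t = [rng.expovariate(event_rate) for _ in range(n)]
    c = [rng.expovariate(censor_rate) for _ in range(n)]
    return t, c


def inappropriate_method_1(n, event_rate, censor_prop, rng):
    """Method I: Bernoulli censoring flag; censored subjects keep time T."""
    t = [rng.expovariate(event_rate) for _ in range(n)]
    censored = [rng.random() < censor_prop for _ in range(n)]
    return t, censored  # observed time is T for every subject


def inappropriate_method_2(n, event_rate, rng):
    """Method II: C_i ~ Uniform(0, T_i), so C depends on T by construction.

    C is drawn for every subject here (an assumption) so that the
    T-C correlation can be inspected directly.
    """
    t = [rng.expovariate(event_rate) for _ in range(n)]
    c = [rng.uniform(0.0, ti) for ti in t]
    return t, c


def spearman(x, y):
    """Spearman rank correlation, assuming no ties (continuous draws)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    rng = random.Random(2025)
    t, c = random_generation(5000, 1.0, 0.3, rng)
    print("random generation  rho(T, C):", round(spearman(t, c), 3))
    t, c = inappropriate_method_2(5000, 1.0, rng)
    print("method II          rho(T, C):", round(spearman(t, c), 3))
```

Under random generation the rank correlation between T and C should be close to zero, while under method Ⅱ it should be clearly positive. The sketch does not run the log-rank test itself, so it illustrates only the dependence between T and C and not the Type I error results reported in the abstract.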

Appears in Collections: 1. College of Medicine (의과대학) > Others (기타) > 2. Thesis
