Estimation of postoperative vowel of benign vocal fold lesions using nonlinear speech production modeling

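Other Titles: 비선형 음성 모델링을 이용한 양성 후두 질환의 수술 후 모음에 대한 예측 (Korean title: Prediction of postoperative vowels in benign laryngeal disease using nonlinear speech modeling)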

Authors: 장승진

Issue Date: 2007

Description: Dept. of Biomedical Engineering/Ph.D.

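Abstract: [Korean] In pathological voices, perceptual aperiodicity arises mainly from perturbation factors such as variation in the fundamental period (jitter), fluctuation in intensity (shimmer), and noise. These factors are influenced mainly by loss of control over glottal vibration, mass lesions on the glottis, and noise generated during radiation and breathing. The hypothesis of this study is that removing these perturbation factors from a pathological voice can yield an improvement similar to the postoperative voice.

In this study, based on acoustic and electroglottographic (EGG) analysis of preoperative and postoperative vowels, a model for estimating the postoperative vowel in benign laryngeal disease was designed and implemented through nonlinear speech modeling based on the nonlinear autoregressive with exogenous input (NARX) method.

First, a robust pitch detection algorithm for pathological voices is proposed for accurate acoustic analysis. Unlike other existing pitch detection algorithms, the proposed algorithm, based on fast orthogonal search, considerably reduces gross pitch errors, especially pitch halving errors.

Next, various acoustic and EGG measures were obtained twice, before and after surgery, from 42 patients with benign laryngeal disease. Mean pitch in the male group decreased by about 12-15 %, whereas that of the female group did not change significantly. Formant frequencies remained constant before and after surgery. Most jitter measures changed with statistical significance, whereas only some shimmer measures differed after surgery. Noise-estimation measures such as harmonic-to-noise ratio (HNR), normalized noise energy (NNE), degree of hoarseness (DH), and normalized first harmonic energy (NFHE) showed significant differences only for some phonations, depending on sex. Among the EGG measures, open quotient (OQ) and speed quotient (SQ) showed no change; notably, however, when the subjects were split into two groups by the mean SQ value, both groups were found to regress into the normal range.

Based on these results, the preoperative vowels were modified to raise their perceptual quality toward that of a normal voice. The degree of modification was adjusted according to statistical results on the differences between preoperative and postoperative voices. Pitch period, intensity, and aspiration noise were modified using pitch synchronous overlap and add (PSOLA), an intensity modifier, and wavelet threshold shrinkage, together with removal of baseline wander from the EGG signal. The modified speech and glottal signals are then used as input signals for NARX nonlinear speech modeling based on least squares support vector regression (SVR).

Finally, the preoperative vowels modified on the basis of acoustic and EGG analysis closely resembled the postoperative vowels in the spectral and dynamic domains. In addition, nonlinear speech modeling using SVR-based NARX outperformed LPC in the perceptual quality of the vowels; this is presumed to be because natural jitter, shimmer, and noise are preserved, whereas LPC produces artificial-sounding speech that lacks naturalness.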
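As a rough illustration of the jitter and shimmer measures named in the abstract, the following Python sketch computes the common "local" variants: the mean absolute difference between consecutive cycle periods (or cycle peak amplitudes), divided by the mean. The abstract does not say which jitter and shimmer definitions the thesis uses, so the function names, the choice of the local variant, and the sample values below are assumptions made for illustration only.

```python
import numpy as np


def local_jitter(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes, relative to the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)


# Hypothetical cycle-by-cycle measurements (not data from the thesis).
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]
peak_amplitudes = [1.00, 0.95, 1.05, 0.98, 1.02]

print(f"local jitter:  {100 * local_jitter(periods_ms):.2f} %")
print(f"local shimmer: {100 * local_shimmer(peak_amplitudes):.2f} %")
```

Both functions return 0 for a perfectly periodic, constant-amplitude signal, which matches the idea in the abstract that reducing these perturbations moves a voice toward perceptual periodicity.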

[English] In pathological voices, perceptual aperiodicity is caused mainly by perturbation factors such as jitter, shimmer, and noise. These factors are affected mainly by lack of control of vocal fold vibration, mass lesions of the vocal cords, and the presence of noise at emission and breathiness. Our hypothesis is that reducing these perturbation factors in a pathological voice can produce an enhancement similar to the postoperative voice.

For benign vocal fold lesions, the design and implementation of postoperative vowel estimation is studied using nonlinear speech modeling based on a nonlinear autoregressive model with exogenous input (NARX), according to acoustic and electroglottographic analysis of preoperative and postoperative sustained vowels.

First, a robust pitch detection algorithm (PDA) for pathological voice is proposed for accurate acoustic analysis. Compared with other established PDAs, the proposed PDA, based on fast orthogonal search, considerably reduces gross pitch errors, especially pitch halving errors.

Next, various acoustic and electroglottographic measurements were taken twice, before and after laryngeal surgery, for 42 subjects with benign vocal fold lesions. Mean pitch of the male group decreased by about 12-15 % of the preoperative pitch, whereas that of the female group did not change significantly. Formant frequencies remained constant before and after surgery. Most jitter measures changed significantly, but only some shimmer measures differed after surgery. Among the noise-estimation measures, such as harmonic-to-noise ratio (HNR), normalized noise energy (NNE), degree of hoarseness (DH), and normalized first harmonic energy (NFHE), only some phonations showed significant differences, depending on sex. Among the electroglottography (EGG) measures, open quotient (OQ) and speed quotient (SQ) showed no change, but when the subjects were divided into two groups by the mean SQ value, the SQ of each group regressed into the normal range.

According to the above results, we modify the preoperative voiced sounds to raise their perceptual quality toward that of a normal voice. Enhancement rates are adjusted by statistical results based on the difference between preoperative and postoperative speech sounds. Pitch period, intensity, and aspiration noise are modified by pitch synchronous overlap and add (PSOLA), an intensity modifier, and wavelet threshold shrinkage, and baseline wander of the EGG signal is removed using empirical mode decomposition (EMD). The modified speech and EGG signals are used as input signals for nonlinear speech modeling with NARX based on least squares support vector regression (SVR).

Finally, preoperative vowels modified on the basis of acoustic and electroglottographic analysis largely resemble the postoperative vowels in the spectral and dynamic domains. Nonlinear speech modeling using SVR-based NARX also performed better than LPC in the perceptual quality of voiced sounds; this result is presumed to arise because natural jitter, shimmer, and noise are conserved, whereas LPC produces artificial sounds that lack naturalness.
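The NARX model with least squares support vector regression mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the model orders `p` and `q`, the RBF kernel, the hyperparameters `gamma` and `sigma`, and the toy system standing in for the speech and EGG signals are all hypothetical. The sketch predicts `y[n]` from past outputs `y[n-1..n-p]` and exogenous inputs `u[n..n-q]`, and fits the regression by solving the standard LS-SVM linear system.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def narx_regressors(y, u, p, q):
    """Build rows [y[n-1..n-p], u[n..n-q]] and the matching targets y[n]."""
    start = max(p, q)
    rows = [np.concatenate((y[n - p:n][::-1], u[n - q:n + 1][::-1]))
            for n in range(start, len(y))]
    return np.array(rows), y[start:]


def lssvr_fit(X, t, gamma, sigma):
    """Solve the LS-SVM system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; t]."""
    n = len(t)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    solution = np.linalg.solve(A, np.concatenate(([0.0], t)))
    return solution[0], solution[1:]  # bias b, dual weights alpha


def lssvr_predict(X_train, b, alpha, X_new, sigma):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


def simulate_toy_system(n_samples):
    """Hypothetical nonlinear system: u is the exogenous input, y the output."""
    n = np.arange(n_samples)
    u = np.sin(2 * np.pi * 0.03 * n) + 0.5 * np.sin(2 * np.pi * 0.071 * n)
    y = np.zeros(n_samples)
    for k in range(2, n_samples):
        y[k] = 0.5 * y[k - 1] - 0.3 * y[k - 2] + np.tanh(u[k]) + 0.2 * u[k - 1]
    return y, u


def demo():
    """Fit on the first 300 regressors; return the relative one-step RMSE on the rest."""
    y, u = simulate_toy_system(400)
    X, t = narx_regressors(y, u, p=2, q=1)
    n_train = 300
    b, alpha = lssvr_fit(X[:n_train], t[:n_train], gamma=1e3, sigma=1.0)
    pred = lssvr_predict(X[:n_train], b, alpha, X[n_train:], sigma=1.0)
    rmse = np.sqrt(np.mean((pred - t[n_train:]) ** 2))
    return rmse / np.std(t[n_train:])


print(f"relative one-step prediction RMSE: {demo():.4f}")
```

In the setting of the abstract, `u` would presumably correspond to the modified EGG signal and `y` to the modified speech signal, but the abstract does not spell out that mapping, so treat it as an assumption. Unlike a linear predictor such as LPC, the kernel regression can represent a nonlinear dependence of the output on its own past and on the input.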

Files in This Item: TA01051.pdf

Appears in Collections: 1. College of Medicine > Others > 3. Dissertation

URI: https://ir.ymlib.yonsei.ac.kr/handle/22282913/136086
