Speech processing technology is a multidisciplinary research area that combines several academic fields, including computer science, linguistics, and mathematics. Since studying computer science in my master's and doctoral programs, I have taken a great interest in, and conducted research on, speech technologies such as speech recognition, speech emotion recognition, and speech synthesis. My research goal is to achieve a comfortable and reliable speech-driven user interface that can be deployed on various smart devices and humanoid robots.
Highest Degree
Ph.D. in Computer Science, Korea Advanced Institute of Science and Technology, Republic of Korea
Areas of Specialization
- Speech Processing Technology
- Machine Learning
- Artificial Intelligence
Major Research Projects
- 2020-2025: Personalized virtual assistant prototype based on user characterized learning of deep neural network. Supported by the National Research Foundation (NRF) of Korea.
- 2014-2019: Personalized speech emotion recognition for human-robot affective interaction. Supported by the National Research Foundation (NRF) of Korea.
- 2016-2018: Study for delivering UAV mission command using natural language. Supported by the Agency for Defense Development (ADD).
- 2012-2014: Real-time emotion recognition from naturally verbalized emotional speech. Supported by the Ministry of Education, Science and Technology.
- 2012-2013: Identification of abnormal speech for surveillance system. Supported by the Korea Research Institute of Standards and Science (KRISS).
Major Courses
Major Papers and Books
[Speech recognition]
- (2023) End-to-end emotional speech recognition using acoustic model adaptation based on knowledge distillation. Multimedia Tools and Applications, 1-18.
- (2021) Accented speech recognition based on end-to-end domain adversarial training of neural networks. Applied Sciences, 11(18): 1-13.
- (2020) Front-end of vehicle-embedded speech recognition for voice-driven multi-UAVs control. Applied Sciences, 10(19): 1-27.
- (2020) Automatic language identification using speech rhythm features for multi-lingual speech recognition. Applied Sciences, 10(7): 1-18.
- (2020) Noise cancellation based on voice activity detection using spectral variation for speech recognition in smart home devices. Intelligent Automation and Soft Computing, 26(1): 149-159.
- (2019) Sound learning–based event detection for acoustic surveillance sensors. Multimedia Tools and Applications, 79: 16127-16139.
- (2017) Noise reduction based on robust speech and non-speech detection in vehicular environments. IJMPERD, 7(3): 105-112.
- (2016) Unsupervised noise reduction scheme for voice-based information retrieval in mobile environments. Multimedia Tools and Applications, 75(9): 4981-4996.
- (2015) Unsupervised rapid speaker adaptation based on selective eigenvoice merging for user-specific voice interaction. Engineering Applications of Artificial Intelligence, 40: 95-102.
- (2013) Acoustic interference cancellation for a voice-driven interface in smart TVs. IEEE Transactions on Consumer Electronics, 59(1): 244-249.
[Speech emotion recognition]
- (2016) Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition. Engineering Applications of Artificial Intelligence, 52: 126-134.
- (2015) Emotional information processing based on feature vector enhancement and selection for human-computer interaction via speech. Telecommunication Systems, 60(2): 201-213.
- (2012) Speaker-characterized emotion recognition using online and iterative speaker adaptation. Cognitive Computation, 4(4): 398-408.
[Spoken content retrieval]
- (2012) Online speaker diarization for multimedia data retrieval on mobile devices. International Journal of Pattern Recognition and Artificial Intelligence, 26(8).
- (2012) Multistage utterance verification for keyword recognition-based online spoken content retrieval. IEEE Transactions on Consumer Electronics, 58(3): 1000-1005.
- (2010) GMM adaptation based online speaker segmentation for spoken document retrieval. IEEE Transactions on Consumer Electronics, 56(2): 1123-1129.
Research Homepage (Google Scholar): http://scholar.google.com/citations?hl=en&user=KBFDG64AAAAJ