Kang Zhang

I am a doctoral student in the integrated Master's/Ph.D. program in the School of Electrical Engineering at KAIST, where I have been studying since 2021. I am a member of the Multimodal AI Lab, advised by Professor Joon Son Chung. I received my B.S. in Electronics and Information Engineering from Harbin Institute of Technology (HIT), where I worked with Professor Shengchang Lan on mmWave radar-based hand gesture recognition.

Email  /  Scholar  /  GitHub


Research

I'm interested in computer vision, deep learning, generative AI, and audio processing. An asterisk (*) denotes equal contribution.

Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
Kang Zhang*, Trung X Pham*, Suyeon Lee, Axi Niu, Arda Senocak, Joon Son Chung
NeurIPS, 2025

We present MGAudio, a novel flow-based framework for open-domain video-to-audio generation, which introduces model-guided dual-role alignment as a central design principle. Unlike prior approaches that rely on classifier-based or classifier-free guidance, MGAudio enables the generative model to guide itself through a dedicated training objective designed for video-conditioned audio generation.

Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision
Chenshuang Zhang, Kang Zhang, Joon Son Chung, In So Kweon, Junmo Kim, Chengzhi Mao
NeurIPS, 2025

We propose a self-supervised tracker that leverages the motion representations pre-trained video diffusion models learn during the early denoising steps. These representations robustly distinguish visually similar objects, yielding improvements of up to 6 points over prior methods on existing benchmarks and new tests.

Cross-view Masked Diffusion Transformers for Person Image Synthesis
Trung X Pham*, Kang Zhang*, Chang D Yoo
ICML, 2024

code

We present X-MDPT, a novel diffusion model for pose-guided human image generation. X-MDPT employs masked diffusion transformers that operate on latent patches, departing from the U-Net architectures commonly used in existing work.

Physics Informed Distillation for Diffusion Models
Joshua Tian Jin Tee*, Kang Zhang*, Hee Suk Yoon, Dhananjaya Nagaraja Gowda, Chanwoo Kim, Chang D Yoo
TMLR, 2024

code

We introduce Physics Informed Distillation (PID), which uses a student model to represent the solution of the ODE system corresponding to the teacher diffusion model, following the same principle as physics-informed neural networks (PINNs).

BI-MDRG: Bridging image history in multimodal dialogue response generation
Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Kang Zhang, Yu-Jung Heo, Du-Seong Chang, Chang D Yoo
ECCV, 2024

We propose BI-MDRG, which bridges the response generation path so that image history information is used to improve both the relevance of text responses to the image content and the consistency of objects across sequential image responses.

ACDMSR: Accelerated conditional diffusion models for single image super-resolution
Axi Niu, Trung X Pham, Kang Zhang, Jinqiu Sun, Yu Zhu, Qingsen Yan, In So Kweon, Yanning Zhang
IEEE Transactions on Broadcasting, 2024

To speed up inference and further improve performance, we revisit diffusion models for image super-resolution and propose ACDMSR, a straightforward yet effective diffusion-model-based super-resolution method.

Learning from multi-perception features for real-world image super-resolution
Axi Niu, Kang Zhang, Trung X Pham, Pei Wang, Jinqiu Sun, In So Kweon, Yanning Zhang
IEEE Transactions on Circuits and Systems for Video Technology, 2024

Real-world image super-resolution remains a formidable challenge due to the intricate degradations present in images. Two dominant methodologies have emerged to address this problem, among them degradation-estimation-based approaches.

Applying mmWave radar sensors to vocabulary-level dynamic Chinese sign language recognition for the community with deafness and hearing loss
Shengchang Lan, Linting Ye, Kang Zhang
IEEE Sensors Journal, 2023

To facilitate human-computer interaction (HCI) for the community with deafness and hearing loss (D&HL), this article explores the feasibility of recognizing a vocabulary of dynamic Chinese sign language (CSL) using millimeter-wave (mmWave) radar sensors, within the scope of data science.

CDPMSR: Conditional diffusion probabilistic models for single image super-resolution
Axi Niu, Kang Zhang, Trung X Pham, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang
ICIP, 2023

To further improve performance and simplify current DPM-based super-resolution methods, we propose cDPMSR, a simple but non-trivial DPM-based super-resolution post-processing framework.

Decoupled adversarial contrastive learning for self-supervised adversarial robustness
Chaoning Zhang*, Kang Zhang*, Chenshuang Zhang, Axi Niu, Jiu Feng, Chang D Yoo, In So Kweon
ECCV, 2022

code

Rather than directly introducing adversarial training (AT) into self-supervised learning (SSL) frameworks as in prior practice, this work proposes a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL).

Dual temperature helps contrastive learning without many negative samples: Towards understanding and simplifying MoCo
Chaoning Zhang*, Kang Zhang*, Trung X Pham, Axi Niu, Zhinan Qiao, Chang D Yoo, In So Kweon
CVPR, 2022

supplement / code

We point out that the InfoNCE loss used in MoCo implicitly attracts anchors to their corresponding positive samples with penalties of varying strength, and we identify this inter-anchor hardness-awareness property as a major reason why a large dictionary is necessary. Our findings motivate us to simplify MoCo v2 by removing both its dictionary and its momentum.

How does SimSiam avoid collapse without negative samples? Towards a unified understanding of progress in SSL
Chaoning Zhang*, Kang Zhang*, Chenshuang Zhang, Trung X Pham, Chang D Yoo, In So Kweon
ICLR, 2022

We introduce a vector decomposition for analyzing collapse, based on a gradient analysis of the L2-normalized representation vector.

On the effect of training convolution neural network for millimeter-wave radar-based hand gesture recognition
Kang Zhang, Shengchang Lan, Guiyuan Zhang
Sensors, 2021

This paper investigates the effect of training a state-of-the-art convolutional neural network (CNN) for millimeter-wave radar-based hand gesture recognition (MR-HGR).

Temporal-range-Doppler features interpretation and recognition of hand gestures using mmW FMCW radar sensors
Guiyuan Zhang, Shengchang Lan, Kang Zhang, Linting Ye
European Conference on Antennas and Propagation (EuCAP), 2020

Two types of deep neural networks, a 3D-CNN and a CNN-LSTM, are constructed to reveal the temporal gesture motion signatures encoded in multiple adjacent radar chirps.

Mining spatio-temporal features from mmW radar echoes for hand gesture recognition
Kang Zhang, Shengchang Lan, Guiyuan Zhang
IEEE Asia-Pacific Microwave Conference (APMC), 2019

We use a 77 GHz millimeter-wave radar to extract the time-varying characteristics of the gesture's Doppler frequency.

Implementation of C4.5 decision tree in Human Gesture Recognition based on Doppler radars
Guiyuan Zhang, Kang Zhang, Yihan Yun, Gang Lu, Shengchang Lan
International Symposium on Antennas and Propagation (ISAP), 2019

We investigate the feasibility of using a three-dimensional 24 GHz Doppler radar array to recognize human gestures, with a model consisting of ten classical gestures.

A Modified Vivaldi Antenna with Low Self-reflectivity for Bone Health Detection
Gang Lu, Shengchang Lan, Kang Zhang, Lijia Chen, Weichu Chen
IEEE International Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2019

A modified Vivaldi antenna with low self-reflectivity operating at 1-4 GHz is designed; its radiation gain is improved by opening rectangular grooves in, and loading parasitic patches on, the surface of the radiating patch.

A Real-time Hand Gesture Recognition System using 24 GHz Radar Array
Guiyuan Zhang, Kang Zhang, Shengchang Lan, Yuanxun Liu, Lijia Chen
USNC-URSI Radio Science Meeting (Joint with AP-S Symposium), 2019

This paper describes a real-time hand gesture recognition system based on a 24 GHz radar array.


This website is based on the template by Jon Barron.