Yu-Chu Yu https://chuyu.org r09922104@ntu.edu.tw

Statement of Purpose

My research goal is to develop machine learning (ML) and computer vision (CV) algorithms that are highly applicable to real-world scenarios. My research journey began during my graduate studies at National Taiwan University, where I was fortunate to be advised by Prof. Hsuan-Tien Lin. More recently, I have had the pleasure of collaborating with Dr. Fu-En Yang and Prof. Yu-Chiang Frank Wang. I have contributed to two first-author publications [, ], one of which was presented at CVPR 2023 [] while the other is under submission. In the following sections, I provide a brief overview of my past research experience and outline the directions I plan to pursue in the future.

Past Research Experience

Domain Adaptation

Domain Adaptation (DA) was the first research topic to capture my interest, owing to its practical assumption: the training and test data may be drawn from different distributions. To tackle the resulting domain-shift issue, I developed my first research project under the supervision of Prof. Hsuan-Tien Lin, and we presented our work at CVPR'23 []. We started from an intuitive observation:

The best model for the target domain need not fit the source data perfectly; thus, source labels might not be the best ground truth for learning a strong target model.
With this hypothesis, we framed DA as a noisy-label learning problem: a massive amount of data with noisy labels (the source labels) alongside plentiful unlabeled data. We introduced a framework, Source Label Adaptation (SLA), that dynamically adapts noisy source labels to better suit the target domain. This source-to-target paradigm is entirely different from traditional target-to-source DA algorithms, which allows SLA to be combined with previous state-of-the-art methods, leading to significant improvements. The core idea is sketched below.
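To make the idea concrete, here is a minimal sketch of the label-adaptation step, assuming features from a shared backbone and class prototypes estimated from pseudo-labeled target data. The function name, the mixing weight alpha, and the simple convex-combination rule are illustrative assumptions, not the exact formulation in the paper.

    import torch
    import torch.nn.functional as F

    def adapt_source_labels(src_labels_onehot, src_feats, target_prototypes,
                            alpha=0.3, temperature=1.0):
        # Predict a "target view" of each source sample by comparing its
        # feature to class prototypes built from pseudo-labeled target data.
        sim = F.normalize(src_feats, dim=1) @ F.normalize(target_prototypes, dim=1).T
        target_view = F.softmax(sim / temperature, dim=1)
        # Convex combination: move the noisy source labels toward labels
        # that better suit the target domain.
        return (1.0 - alpha) * src_labels_onehot + alpha * target_view

Training then proceeds on these adapted labels in place of the raw source labels, which is precisely what lets SLA sit on top of existing target-to-source DA methods.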

Continual Learning for Vision-Language Models

After finishing my first project, I was impressed by the remarkable zero-shot generalization ability of large-scale pre-trained vision-language models (VLMs). To enhance the versatility of VLMs, I began exploring continual learning for VLMs in collaboration with Prof. Yu-Chiang Frank Wang. The major challenge in continually fine-tuning a VLM is that it can not only forget previously fine-tuned knowledge but also lose its original zero-shot transferability. To address this issue, we proposed a Selective Dual-Teacher Knowledge Transfer mechanism [] that distills knowledge from both the most recently fine-tuned model and the original pre-trained model. With access to a public reference dataset, we dynamically select the appropriate teacher to preserve either previously learned knowledge or pre-trained knowledge: if a reference image aligns with the distribution of previously learned data, we distill from the most recently fine-tuned model; conversely, if the image lies far from that distribution, we select the pre-trained model to retain its original pre-trained knowledge. This dual-teacher selection mechanism ensures the robustness of VLMs, allowing us to fine-tune on a sequence of datasets without losing either previously learned knowledge or pre-trained knowledge, and offers a more practical strategy for real-world applications. A simplified sketch of the selection rule follows.
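Below is a minimal sketch of this routing idea, assuming CLIP-style models. The encode_image interface, the cosine-similarity threshold used to decide distributional alignment, and all names are illustrative assumptions rather than the paper's exact criterion.

    import torch
    import torch.nn.functional as F

    def dual_teacher_loss(student, recent_teacher, pretrained_teacher,
                          ref_images, prior_feats, threshold=0.5, T=2.0):
        with torch.no_grad():
            # Route each reference image by how close it lies to the
            # distribution of previously learned data (max cosine
            # similarity to stored features is an illustrative criterion).
            feats = F.normalize(student.encode_image(ref_images), dim=-1)
            scores = (feats @ F.normalize(prior_feats, dim=-1).T).amax(dim=-1)
            in_dist = scores > threshold
            t_logits = torch.where(
                in_dist.unsqueeze(-1),
                recent_teacher(ref_images),      # preserve fine-tuned knowledge
                pretrained_teacher(ref_images),  # preserve zero-shot knowledge
            )
        # Standard temperature-scaled distillation toward the chosen teacher.
        s_log_prob = F.log_softmax(student(ref_images) / T, dim=-1)
        return F.kl_div(s_log_prob, F.softmax(t_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)

In this simplified view, each reference image independently picks its own teacher, so a single batch can simultaneously preserve fine-tuned and pre-trained knowledge.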

Future Research Directions

In my future academic career, I am passionate about deploying machine learning algorithms in practical scenarios. Beyond image recognition, I am particularly drawn to generative tasks such as multi-modal question answering and generation, which offer more direct applications in real-world settings. Within these exciting research domains, I believe continual learning will continue to play a crucial role in developing agents that can act like human beings. Specifically, several recent works, such as Continual Diffusion [] and continual unlearning/forgetting [] on diffusion models, show significant promise. Moreover, I am also interested in improving the inference speed of diffusion models to make them more practical for real-world use. Recently, several works (CM [], LCM [], LCM-LoRA [], CTM [], DMD []) have achieved one-step generation by distilling knowledge from a pre-trained diffusion model. Studying the fast-generation capabilities of diffusion models for applications like text-to-image/video, image-to-image/video, and video-to-video holds great potential and would contribute substantially to the community.

References