Current position: Research Engineer at aiXplain (San Jose, California) (LinkedIn, Email)

MS Thesis: HyperGAN-CLIP: A Versatile Framework for CLIP-Guided Image Synthesis and Editing using Hypernetworks. July 2023. (PDF, Presentation)
Generative Adversarial Networks, particularly StyleGAN and its variants, have shown exceptional capability in generating highly realistic images. However, training these models typically requires large datasets, which makes them difficult to apply in domains where data is scarce. In this thesis, we introduce a versatile framework that extends a pre-trained StyleGAN to various tasks, including domain adaptation, reference-guided image synthesis, and text-guided image manipulation, even when only a small number of training samples are available. We achieve this by integrating the CLIP space into the StyleGAN generator using hypernetworks. These hypernetworks introduce dynamic adaptability, enabling the pre-trained StyleGAN to be effectively applied to specific domains described by either a reference image or a textual description. To further improve the alignment between the synthesized images and the target domain, we introduce a CLIP-guided discriminator that encourages the generation of high-quality images. Notably, our approach shows remarkable flexibility and scalability, enabling text-guided image manipulation with text-free training and seamless style transfer between two images. Through extensive qualitative and quantitative experiments, we validate the robustness and effectiveness of our approach, which surpasses existing methods in performance.
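The core idea of conditioning a frozen generator via hypernetworks can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: the dimensions, the single linear hypernetwork layer, and all variable names (`W_base`, `W_hyper`, `clip_emb`) are assumptions chosen for clarity, and random arrays stand in for real pre-trained weights and real CLIP embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the thesis): CLIP embedding size,
# and one generator layer of shape (out_ch, in_ch).
clip_dim, out_ch, in_ch = 512, 64, 32

# Frozen pre-trained generator weights for one layer (random stand-in).
W_base = rng.standard_normal((out_ch, in_ch))

# Hypernetwork parameters: a single linear map for illustration; the
# actual architecture would be learned during training.
W_hyper = rng.standard_normal((out_ch * in_ch, clip_dim)) * 0.01

def hypernetwork(clip_emb):
    """Map a CLIP embedding to a per-layer weight offset (delta)."""
    return (W_hyper @ clip_emb).reshape(out_ch, in_ch)

# CLIP embedding of a reference image or text prompt (random stand-in).
clip_emb = rng.standard_normal(clip_dim)

# Adapted weights: the frozen base weights modulated by the predicted
# delta, so the same generator can serve many target domains.
W_adapted = W_base + hypernetwork(clip_emb)

print(W_adapted.shape)  # (64, 32)
```

Because the base generator stays frozen and only the hypernetwork is trained, a different CLIP embedding (from a new reference image or text prompt) yields a different weight modulation without retraining the generator itself.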