August 09, 2022

Barış Batuhan Topal, M.S. 2022


Current position: ML Research Engineer at PixerLabs (LinkedIn)
MS Thesis: Domain-adaptive Self-supervised Pre-training for Face and Body Detection in Drawings. August 2022. (PDF, Presentation, Code).

Thesis Abstract:

Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers.

In this work, I show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. My setup exploits large amounts of unlabeled data from the target domain when labels are available for only a small subset of it. I further demonstrate that style transfer can be incorporated into my learning pipeline to bootstrap detectors from a vast amount of labeled out-of-domain natural images (i.e., images from the real world). My combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance with minimal annotation effort.
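For readers unfamiliar with teacher-student training, the sketch below shows one common generic ingredient of such setups: the teacher's weights track the student's through an exponential moving average (EMA). This is an illustration only. It is not the thesis's modified student update design, and the function name `ema_update`, the dictionary-of-floats weight representation, and the momentum values are assumptions made for the example.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as an EMA of the student weights.

    `teacher` and `student` are hypothetical dicts mapping parameter
    names to floats; real detectors would hold tensors instead.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Toy weights: the teacher moves a small step toward the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, momentum=0.9)
```

A momentum close to 1 keeps the teacher stable, so the targets it produces on unlabeled drawings change slowly while the student is trained against them.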

Using this detector architecture, I accomplish two additional tasks. First, I extract a large set of facial drawing images (∼1.2 million instances) from unlabeled data and train SOTA generative adversarial network (GAN) models to generate faces and a SOTA GAN inversion model to reconstruct them. When trained on this detector-extracted data, these generative models successfully learn diverse stylistic features. Second, I implement an annotation tool to enlarge the existing set of annotated data. This tool lets users annotate bounding boxes of panels, speech bubbles, narrations, faces, and bodies; associate text boxes with faces and bodies; transcribe the text; and match instances of the same character in the image.

