Showing posts with label Publications. Show all posts
Showing posts with label Publications. Show all posts

August 29, 2023

CLIP-guided StyleGAN Inversion for Text-driven Real Image Editing

Ahmet Canberk Baykal, Abdul Basit Anees, Duygu Ceylan, Erkut Erdem, Aykut Erdem and Deniz Yuret. Aug 29, 2023. ACM Transactions On Graphics (TOG), vol 42, issue 5, article no:172, pp 1--18. Presented at ACM SIGGRAPH Asia 2023 in Sydney, Australia, Dec 12-15, 2023. (PDF, arXiv:2307.08397, Demo video).

Abstract: Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the CLIP embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.


Full post...

August 15, 2023

Domain-Adaptive Self-Supervised Face & Body Detection in Drawings

Barış Batuhan Topal, Deniz Yuret, Tevfik Metin Sezgin. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI 2023) Main Track. Pages 1432-1439. August 2023. (PDF, arXiv:2211.10641).

Abstract: Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code can be accessed from https://github.com/barisbatuhan/DASS_Detector.


Full post...

September 19, 2022

Self-Supervised Learning with an Information Maximization Criterion

Serdar Ozsoy, Shadi Hamdan, Sercan Ö. Arik, Deniz Yuret, Alper T. Erdogan. In NeurIPS, Dec 2022. (PDF, PDF, arXiv:2209.07999, Poster)

Abstract: Self-supervised learning allows AI systems to learn effective representations from large amounts of data using tasks that do not require costly labeling. Mode collapse, i.e., the model producing identical representations for all inputs, is a central problem to many self-supervised learning approaches, making self-supervised tasks, such as matching distorted variants of the inputs, ineffective. In this article, we argue that a straightforward application of information maximization among alternative latent representations of the same input naturally solves the collapse problem and achieves competitive empirical results. We propose a self-supervised learning method, CorInfoMax, that uses a second-order statistics-based mutual information measure that reflects the level of correlation among its arguments. Maximizing this correlative information measure between alternative representations of the same input serves two purposes: (1) it avoids the collapse problem by generating feature vectors with non-degenerate covariances; (2) it establishes relevance among alternative representations by increasing the linear dependence among them. An approximation of the proposed information maximization objective simplifies to a Euclidean distance-based objective function regularized by the log-determinant of the feature covariance matrix. The regularization term acts as a natural barrier against feature space degeneracy. Consequently, beyond avoiding complete output collapse to a single point, the proposed approach also prevents dimensional collapse by encouraging the spread of information across the whole feature space. Numerical experiments demonstrate that CorInfoMax achieves better or competitive performance results relative to the state-of-the-art SSL approaches.


Full post...

June 20, 2022

Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters

İlker Kesen, Ozan Arkan Can, Erkut Erdem, Aykut Erdem, Deniz Yuret. June 20, 2022. Best paper at the 5th Multimodal Learning and Applications Workshop (MULA 2022) in conjunction with CVPR 2022. (PDF, arXiv:2003.12739, presentation video).

Abstract: How to best integrate linguistic and perceptual processing in multi-modal tasks that involve language and vision is an important open problem. In this work, we argue that the common practice of using language in a top-down manner, to direct visual attention over high-level visual features, may not be optimal. We hypothesize that the use of language to also condition the bottom-up processing from pixels to high-level features can provide benefits to the overall performance. To support our claim, we propose a model for language-vision problems involving dense prediction, and perform experiments on two different multi-modal tasks: image segmentation from referring expressions and language-guided image colorization. We compare results where either one or both of the top-down and bottom-up visual branches are conditioned on language. Our experiments reveal that using language to control the filters for bottom-up visual processing in addition to top-down attention leads to better results on both tasks and achieves state-of-the-art performance. Our analysis of different word types in input expressions suggest that the bottom-up conditioning is especially helpful in the presence of low level visual concepts like color.


Full post...

June 09, 2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models (BIG-bench)

Srivastava et al. (442 authors). March 2022. arXiv:2206.04615 [cs.CL]. (github).

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.


Full post...

May 25, 2022

CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions

Tayfun Ates, M. Ateşoğlu, Çağatay Yiğit, Ilker Kesen, Mert Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, Deniz Yuret. May 2022. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2602–2627, Dublin, Ireland. Association for Computational Linguistics. (PDF, openreview, arXiv:2012.04293, poster).

Abstract: Humans are able to perceive, understand and reason about causal events. Developing models with similar physical and causal understanding capabilities is a long-standing goal of artificial intelligence. As a step towards this direction, we introduce CRAFT, a new video question answering dataset that requires causal reasoning about physical forces and object interactions. It contains 58K video and question pairs that are generated from 10K videos from 20 different virtual environments, containing various objects in motion that interact with each other and the scene. Two question categories in CRAFT include previously studied descriptive and counterfactual questions. Additionally, inspired by the Force Dynamics Theory in cognitive linguistics, we introduce a new causal question category that involves understanding the causal interactions between objects through notions like cause, enable, and prevent. Our results show that even though the questions in CRAFT are easy for humans, the tested baseline models, including existing state-of-the-art methods, do not yet deal with the challenges posed in our benchmark.


Full post...

Mukayese: Turkish NLP Strikes Back

Ali Safaya, Emirhan Kurtuluş, Arda Göktoğan, Deniz Yuret. May 2022. In Findings of the Association for Computational Linguistics: ACL 2022, pages 846–863, Dublin, Ireland. Association for Computational Linguistics. (PDF, openreview, arXiv:2203.01215, poster).

Abstract: Having sufficient resources for language X lifts it from the under-resourced languages class, but not necessarily from the under-researched class. In this paper, we address the problem of the absence of organized benchmarks in the Turkish language. We demonstrate that languages such as Turkish are left behind the state-of-the-art in NLP applications. As a solution, we present Mukayese, a set of NLP benchmarks for the Turkish language that contains several NLP tasks. We work on one or more datasets for each benchmark and present two or more baselines. Moreover, we present four new benchmarking datasets in Turkish for language modeling, sentence segmentation, and spell checking. All datasets and baselines are available under: https://github.com/alisafaya/mukayese.


Full post...

March 14, 2022

Machine learning in and out of equilibrium

Michael Hinczewski, Shishir Adhikari, Alkan Kabakcioglu, Alexander Strang, and Deniz Yuret. March 2022. Bulletin of the American Physical Society.

Abstract:The algorithms used to train neural networks, like stochastic gradient descent (SGD), have close parallels to natural processes that navigate a high-dimensional parameter space—for example protein folding or evolution. Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels in a single, unified framework. We focus in particular on the stationary state of the system in the long-time limit. In contrast to its biophysical analogues, conventional SGD leads to a nonequilibrium stationary state exhibiting persistent currents in the space of network parameters. The effective loss landscape that determines the shape of this stationary distribution sensitively depends on training details, i.e. the choice to minibatch with or without replacement. We also demonstrate that the state satisfies the integral fluctuation theorem, a nonequilibrium generalization of the second law of thermodynamics. Finally, we introduce an alternative ``thermalized'' SGD procedure, designed to achieve an equilibrium stationary state. Deployed as a secondary training step, after conventional SGD has converged, thermalization is an efficient method to implement Bayesian machine learning, allowing us to estimate the posterior distribution of network predictions.


Full post...

December 12, 2020

CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions

Tayfun Ates, Muhammed Samil Atesoglu, Cagatay Yigit, Ilker Kesen, Mert Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, Deniz Yuret. Shared Visual Representations in Human and Machine Intelligence (SVRHM 2020). NeurIPS Workshop. (PDF, Presentation)

Abstract: Recent advances in Artificial Intelligence and deep learning have revived the interest in studying the gap between the reasoning capabilities of humans and machines. In this ongoing work, we introduce CRAFT, a new visual question answering dataset that requires causal reasoning about physical forces and object interactions. It contains 38K video and question pairs that are generated from 3K videos from 10 different virtual environments, containing different number of objects in motion that interact with each other. Two question categories from CRAFT include previously studied descriptive and counterfactual questions. Besides, inspired by the theory of force dynamics from the field of human cognitive psychology, we introduce new question categories that involve understanding the intentions of objects through the notions of cause, enable, and prevent. Our preliminary results demonstrate that even though these tasks are very intuitive for humans, the implemented baselines could not cope with the underlying challenges.


Full post...

September 25, 2019

Morphological analysis using a sequence decoder

Ekin Akyürek, Erenay Dayanık, Deniz Yuret (2019). Transactions Of The Association For Computational Linguistics, 7, 567-579. (PDF, arXiv)

Abstract: We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed size vector representation and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and outperform whole-tag models. In addition, generating morphological features as a sequence rather than e.g. an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the art results in nine languages of different morphological complexity under low-resource, high-resource and transfer learning settings. We also introduce TrMor2018, a new high accuracy Turkish morphology dataset. Our Morse implementation and the TrMor2018 dataset are available online to support future research.

See https://github.com/ai-ku/Morse.jl for a Morse implementation in Julia/Knet and https://github.com/ai-ku/TrMor2018 for the new Turkish dataset.


Full post...

September 09, 2019

Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting

Hürriyetoğlu, Ali and Yörük, Erdem and Yuret, Deniz and Yoltar, C. and Gürel, B. and Duruşan, F. and Mutlu, O. and Akdemir, A. In CLEF 2019 Working Notes. September, 2019. (PDF, Proceedings).

Abstract: We present an overview of the CLEF-2019 Lab ProtestNews on Extracting Protests from News in the context of generalizable natural language processing. The lab consists of document, sentence, and token level information classification and extraction tasks that were referred as task 1, task 2, and task 3 respectively in the scope of this lab. The tasks required the participants to identify protest relevant information from English local news at one or more aforementioned levels in a cross-context setting, which is cross-country in the scope of this lab. The training and development data were collected from India and test data was collected from India and China. The lab attracted 58 teams to participate in the lab. 12 and 9 of these teams submitted results and working notes respectively. We have observed neural networks yield the best results and the performance drops significantly for majority of the submissions in the cross-country setting, which is China.


Full post...

August 01, 2019

KU_AI at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI

Cengiz, Cemil and Sert, Ulaş and Yuret, Deniz. In Proceedings of the 18th BioNLP Workshop and Shared Task. August, 2019. Florence, Italy. (PDF)

Abstract: In this paper, we describe our system and results submitted for the Natural Language Inference (NLI) track of the MEDIQA 2019 Shared Task. As KU_AI team, we used BERT as our baseline model and pre-processed the MedNLI dataset to mitigate the negative impact of de-identification artifacts. Moreover, we investigated different pre-training and transfer learning approaches to improve the performance. We show that pre-training the language model on rich biomedical corpora has a significant effect in teaching the model domain-specific language. In addition, training the model on large NLI datasets such as MultiNLI and SNLI helps in learning task-specific reasoning. Finally, we ensembled our highest-performing models, and achieved 84.7% accuracy on the unseen test dataset and ranked 10th out of 17 teams in the official results.


Full post...

June 06, 2019

Team Howard Beale at SemEval-2019 Task 4: Hyperpartisan News Detection with BERT

Osman Mutlu, Ozan Arkan Can and Erenay Dayanık. 2019. In International Workshop on Semantic Evaluation (SemEval-2019 at NAACL-HLT-2019). (paper, proceedings)

Abstract: This paper describes our system for SemEval-2019 Task 4: Hyperpartisan News Detection (Kiesel et al., 2019). We use pretrained BERT (Devlin et al., 2018) architecture and investigate the effect of different fine tuning regimes on the final classification task. We show that additional pretraining on news domain improves the performance on the Hyperpartisan News Detection task. Our system ranked 8th out of 42 teams with 78.3% accuracy on the held-out test dataset.


Full post...

Learning from Implicit Information in Natural Language Instructions for Robotic Manipulations

Ozan Arkan Can, Pedro Zuidberg Dos Martires , Andreas Persson , Julian Gaal , Amy Loutfi , Luc De Raedt , Deniz Yuret and Alessandro Saffiotti. 2019. In Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) & Grounded Communication for Robotics (RoboNLP) at NAACL-HLT-2019. (abstract, paper, poster, proceedings)

Abstract: Human-robot interaction often occurs in the form of instructions given from a human to a robot. For a robot to successfully follow instructions, a common representation of the world and objects in it should be shared between humans and the robot so that the instructions can be grounded. Achieving this representation can be done via learning, where both the world representation and the language grounding are learned simultaneously. However, in robotics this can be a difficult task due to the cost and scarcity of data. In this paper, we tackle the problem by separately learning the world representation of the robot and the language grounding. While this approach can address the challenges in getting sufficient data, it may give rise to inconsistencies between both learned components. Therefore, we further propose Bayesian learning to resolve such inconsistencies between the natural language grounding and a robot’s world representation by exploiting spatio-relational information that is implicitly present in instructions given by a human. Moreover, we demonstrate the feasibility of our approach on a scenario involving a robotic arm in the physical world.


Full post...

April 07, 2019

A Task Set Proposal for Automatic Event Information Collection across Multiple Countries

Ali Hürriyetoğlu, Erdem Yörük, Deniz Yuret, Çağrı Yoltar, Burak Güler, Fırat Duruşan and Osman Mutlu. 2019. In ECIR (CLEF Organizers Lab Track), Germany. (paper, proceedings)

Abstract: We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing. The tasks are news article classification, event sentence detection, and event extraction. Having tools for collecting event information from data produced in multiple countries enables comparative sociology and politics studies. We have annotated news articles in English from a source and a target country in order to be able to measure the performance of the tools developed using data from one country on data from a different country. Our preliminary experiments have shown that the performance of the tools developed using English texts from India drops to a level that are not usable when they are applied on English texts from China. We think our setting addresses the challenge of building generalizable NLP tools that perform well independent of the source of the text and will accelerate progress in line of developing generalizable NLP systems.


Full post...

November 06, 2018

Towards Generalizable Place Name Recognition Systems: Analysis and Enhancement of NER Systems on English News from India

Arda Akdemir, Ali Hürriyetoglu, Erdem Yörük, Burak Gürel, Çagri Yoltar and Deniz Yuret. 2018. In Proceedings of the 12th Workshop on Geographic Information Retrieval (GIR'18). ACM. (pdf, url, proceedings)

Abstract: Place name recognition is one of the key tasks in Information Extraction. In this paper, we tackle this task in English News from India. We first analyze the results obtained by using available tools and corpora and then train our own models to obtain better results. Most of the previous work done on entity recognition for English makes use of similar corpora for both training and testing. Yet we observe that the performance drops significantly when we test the models on different datasets. For this reason, we have trained various models using combinations of several corpora. Our results show that training models using combinations of several corpora improves the relative performance of these models but still more research on this area is necessary to obtain place name recognizers that generalize to any given dataset.


Full post...

October 31, 2018

Tree-stack LSTM in Transition Based Dependency Parsing

Ömer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. In Proceedings of the {CoNLL} 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. (paper, code, proceedings, 2017 version, Ömer's MS thesis)

Abstract: We introduce tree-stack LSTM to model state of a transition based parser with recurrent neural networks. Tree-stack LSTM does not use any parse tree based or hand-crafted features, yet performs better than models with these features. We also develop new set of embeddings from raw features to enhance the performance. There are 4 main components of this model: stack’s σ-LSTM, buffer’s βLSTM, actions’ LSTM and tree-RNN. All LSTMs use continuous dense feature vectors (embeddings) as an input. Tree-RNN updates these embeddings based on transitions. We show that our model improves performance with low resource languages compared with its predecessors. We participate in CoNLL 2018 UD Shared Task as the ”KParse” team and ranked 16th in LAS, 15th in BLAS and BLEX metrics, of 27 participants parsing 82 test sets from 57 languages.


Full post...

SParse: Koç University Graph-Based Parsing System for the CoNLL 2018 Shared Task

Berkay Furkan Önder, Can Gümeli and Deniz Yuret. 2018. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. (paper, code, proceedings)

Abstract: We present SParse, our Graph-Based Parsing model submitted for the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (Zeman et al., 2018). Our model extends the state-of-the-art biaffine parser (Dozat and Manning, 2016) with a structural meta-learning module, SMeta, that combines local and global label predictions. Our parser has been trained and run on Universal Dependencies datasets (Nivre et al., 2016, 2018) and has 87.48% LAS, 78.63% MLAS, 78.69% BLEX and 81.76% CLAS (Nivre and Fang, 2017) score on the Italian-ISDT dataset and has 72.78% LAS, 59.10% MLAS, 61.38% BLEX and 61.72% CLAS score on the Japanese-GSD dataset in our official submission. All other corpora are evaluated after the submission deadline, for whom we present our unofficial test results.


Full post...

July 01, 2018

Morphological Disambiguation for Turkish

Dilek Zeynep Hakkani-Tür, Murat Saraçlar, Gökhan Tür, Kemal Oflazer and Deniz Yuret. 2018. In Turkish Natural Language Processing, Kemal Oflazer and Murat Saraçlar (Eds.), Ch.3, pp.53-68. Springer. (URL)

Abstract: Morphological disambiguation is the task of determining the contextually correct morphological parses of tokens in a sentence. A morphological disambiguator takes in sets of morphological parses for each token, generated by a morphological analyzer, and then selects a morphological parse for each, considering statistical and/or linguistic contextual information. This task can be seen as a generalization of the part-of-speech (POS) tagging problem for morphologically rich languages. The disambiguated morphological analysis is usually crucial for further processing steps such as dependency parsing. In this chapter, we review morphological disambiguation problem for Turkish and discuss approaches for solving this problem as they have evolved from manually crafted constraint-based rule systems to systems employing machine learning.


Full post...

May 27, 2018

Fast multidimensional reduction and broadcast operations on GPU for machine learning

Doğa Dikbayır, Enis Berk Çoban, İlker Kesen, Deniz Yuret, Didem Unat. Concurrency and Computation: Practice and Experience. 2018. (PDF). Abstract: Reduction and broadcast operations are commonly used in machine learning algorithms for different purposes. They widely appear in the calculation of the gradient values of a loss function, which are one of the core structures of neural networks. Both operations are implemented naively in many libraries usually for scalar reduction or broadcast; however, to our knowledge, there are no optimized multidimensional implementations available. This fact limits the performance of machine learning models requiring these operations to be performed on tensors. In this work, we address the problem and propose two new strategies that extend the existing implementations to perform on tensors. We introduce formal definitions of both operations using tensor notations, investigate their mathematical properties, and exploit these properties to provide an efficient solution for each. We implement our parallel strategies and test them on a CUDA enabled Tesla K40 m GPU accelerator. Our performant implementations achieve up to 75% of the peak device memory bandwidth on different tensor sizes and dimensions. Significant speedups against the implementations available in the Knet Deep Learning framework are also achieved for both operations.
Full post...