Evangelos (Vangelis) Kazakos

PhD in Computer Science, currently a postdoctoral researcher at CIIRC @ CTU


Office: B-637

Building: Czech Institute of Informatics, Robotics and Cybernetics (CIIRC @ CTU)

Address: Jugoslávských partyzánů 1580/3, 160 00

Email: evangelos [dot] kazakos [at] cvut [dot] cz

Welcome to my webpage! I’m Vangelis. I hold a PhD from the University of Bristol, where my thesis focused on Audio-Visual Egocentric Action Recognition, highlighting the essential role of audio in understanding egocentric actions captured through wearable sensors. I was also a key contributor to the EPIC-KITCHENS dataset, which has since become a cornerstone in the field of egocentric vision research.

Currently, I am a postdoctoral researcher at the Czech Institute of Informatics, Robotics and Cybernetics (CIIRC) at CTU in Prague. My research focuses on multimodal learning and architectures, with a particular emphasis on spatio-temporal grounding using video and language data.

news

Sep 02, 2025 We’ve released grove-transformers, a lightweight, inference-only interface for our GROVE model, implemented with 🤗 Transformers. [🤗 link] [GitHub link]
Aug 21, 2025 We’ve released datasets (HowToGround1M and iGround), checkpoints and code for our work “Large-scale Pre-training for Grounded Video Caption Generation”. [Project webpage] [Code, checkpoints, data]
Jul 22, 2025 I’ve been invited to present our work “Large-scale Pre-training for Grounded Video Caption Generation” at the EuroHPC User Days event, in the Artificial Intelligence for Science parallel session, scheduled for 1 October 2025 at 11:30. [Event link]
Jun 26, 2025 Our work “Large-scale Pre-training for Grounded Video Caption Generation” has been accepted to ICCV 2025!
May 15, 2025 I was nominated as Outstanding Reviewer for CVPR 2025. [link]
Apr 18, 2025 I presented our work “Large-scale Pre-training for Grounded Video Caption Generation” at the weekly webinar of TwelveLabs. [YouTube link]
Mar 13, 2025 Our paper “Large-scale Pre-training for Grounded Video Caption Generation” is now on arXiv. [arXiv] [Project webpage] [Code] (available soon, stay tuned!)
Feb 28, 2025 I received the 2024 IJCV Outstanding Reviewer Award. [Announcement]
Nov 01, 2024 I started a new role as a Postdoctoral Researcher at the Czech Institute of Informatics, Robotics and Cybernetics (CIIRC) at CTU in Prague. My research will focus on multimodal understanding using video and language.
Feb 27, 2024 Our paper “TIM: A Time Interval Machine for Audio-Visual Action Recognition” has been accepted at CVPR 2024. [paper] [project page]
Jan 18, 2024 Our paper “Graph Guided Question Answer Generation for Procedural Question-Answering” has been accepted at EACL 2024. [paper]
Feb 17, 2023 Our paper “EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound” has been accepted at ICASSP 2023. [paper] [project page]
May 30, 2022 I joined the Samsung AI Center in Cambridge as a Research Scientist.
Apr 27, 2022 I successfully defended my PhD dissertation, titled “Audio-Visual Egocentric Action Recognition”. [link]