Evangelos (Vangelis) Kazakos
PhD in Computer Science, currently a postdoctoral researcher at CIIRC @ CTU

Welcome to my webpage! I’m Vangelis. I hold a PhD from the University of Bristol, where my thesis focused on Audio-Visual Egocentric Action Recognition, highlighting the essential role of audio in understanding egocentric actions captured through wearable sensors. I was also a key contributor to the EPIC-KITCHENS dataset, which has since become a cornerstone in the field of egocentric vision research.
Currently, I am a postdoctoral researcher at the Czech Institute of Informatics, Robotics, and Cybernetics (CIIRC) at CTU Prague. My research is focused on multimodal learning and architectures, with a particular emphasis on spatio-temporal grounding using video and language data.
news
Apr 18, 2025 | I presented our work "Large-scale Pre-training for Grounded Video Caption Generation" at the weekly webinar of TwelveLabs. [YouTube link]
Mar 13, 2025 | Our paper "Large-scale Pre-training for Grounded Video Caption Generation" is now on arXiv. [arXiv] [Project webpage] [Code] (available soon, stay tuned!)
Feb 28, 2025 | I received the 2024 IJCV Outstanding Reviewer Award. [Announcement]
Nov 01, 2024 | I started a new role as a postdoctoral researcher at the Czech Institute of Informatics, Robotics and Cybernetics (CIIRC) at CTU in Prague. My research will focus on multimodal understanding using video and language.
Feb 27, 2024 | Our paper "TIM: A Time Interval Machine for Audio-Visual Action Recognition" has been accepted at CVPR 2024. [paper] [project page]
Jan 18, 2024 | Our paper "Graph Guided Question Answer Generation for Procedural Question-Answering" has been accepted at EACL 2024. [paper]
Feb 17, 2023 | Our paper "Epic-sounds: A large-scale dataset of actions that sound" has been accepted at ICASSP 2023. [paper] [project page]
May 30, 2022 | I joined the Samsung AI Center in Cambridge as a Research Scientist.
Apr 27, 2022 | I successfully defended my PhD dissertation, titled "Audio-Visual Egocentric Action Recognition". [link]