Evangelos Kazakos (Vangelis)

PhD in Computer Science, currently a postdoctoral researcher at CIIRC @ CTU

Welcome! I’m Vangelis, a postdoctoral researcher at CIIRC, CTU Prague. My research focuses on multimodal learning and spatio-temporal understanding in images and videos. I’m part of the IMPACT group, where I work closely with the robotics team to bridge vision and robotics by developing methods useful for training vision-language-action (VLA) models and world models. I’ve also been a core contributor to several community datasets: EPIC-KITCHENS-55 and EPIC-KITCHENS-100, EPIC-SOUNDS, and more recently HowToGround1M and iGround.

Previously, I did my PhD at the University of Bristol, where my thesis on Audio-Visual Egocentric Action Recognition highlighted the essential role of audio in understanding actions captured through wearable sensors.

news

Dec 23, 2025

Our paper, REALM, a real-to-sim validated benchmark for robot manipulation, is on arXiv. [arXiv] [Project webpage] [Code]

Dec 16, 2025

I’m co-organising the first AI for Peace workshop @ ICLR 2026. [Workshop webpage]

Sep 2, 2025

We’ve released grove-transformers, a lightweight, inference-only interface for our GROVE model, implemented with 🤗 Transformers. [🤗 link] [GitHub link]

Aug 21, 2025

We’ve released datasets (HowToGround1M and iGround), checkpoints and code for our work “Large-scale Pre-training for Grounded Video Caption Generation”. [Project webpage] [Code, checkpoints, data]

Jul 22, 2025

I’ve been invited to present our work “Large-scale Pre-training for Grounded Video Caption Generation” at the EuroHPC User Days event in the Artificial Intelligence for Science parallel session, scheduled for 1 October 2025 at 11:30. [Event link]

Jun 26, 2025

Our work “Large-scale Pre-training for Grounded Video Caption Generation” has been accepted to ICCV 2025!

May 15, 2025

I was named an Outstanding Reviewer for CVPR 2025. [link]

Apr 18, 2025

I presented our work “Large-scale Pre-training for Grounded Video Caption Generation” at the weekly webinar of TwelveLabs. [YouTube link]

Mar 13, 2025

Our paper “Large-scale Pre-training for Grounded Video Caption Generation” is now on arXiv. [arXiv] [Project webpage] [Code] (available soon, stay tuned!)

Feb 28, 2025

I received the 2024 IJCV Outstanding Reviewer Award. [Announcement]

Nov 1, 2024

I started a new role as a Postdoctoral Researcher at the Czech Institute of Informatics, Robotics and Cybernetics (CIIRC) at CTU in Prague. My research will focus on multimodal understanding using video and language.

Feb 27, 2024

Our paper “TIM: A Time Interval Machine for Audio-Visual Action Recognition” has been accepted at CVPR 2024. [paper] [project page]

Jan 18, 2024

Our paper “Graph Guided Question Answer Generation for Procedural Question-Answering” has been accepted at EACL 2024. [paper]

Feb 17, 2023

Our paper “EPIC-SOUNDS: A Large-Scale Dataset of Actions That Sound” has been accepted at ICASSP 2023. [paper] [project page]

May 30, 2022

I joined the Samsung AI Center in Cambridge as a Research Scientist.

Apr 27, 2022

I successfully defended my PhD dissertation, “Audio-Visual Egocentric Action Recognition”. [link]

publications

datasets

EPIC-KITCHENS-100

IJCV 2022

Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100

100 hours · 20M frames · 90K actions · 700 videos · 45 environments

EPIC-SOUNDS

TPAMI 2025

EPIC-SOUNDS: A Large-Scale Dataset of Actions That Sound

78.4K categorised audio segments · 44 classes · 39.2K non-categorised segments

HowToGround1M

ICCV 2025

Large-scale Pre-training for Grounded Video Caption Generation

Automatically annotated large-scale pre-training dataset for grounded video captioning (derived from HowTo100M)

iGround

ICCV 2025

Large-scale Pre-training for Grounded Video Caption Generation

3,513 videos · manually annotated captions · dense spatio-temporally grounded bounding boxes

EPIC-KITCHENS-55

ECCV 2018

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

55 hours · 11.5M frames · 39.6K action segments · 454.3K object bounding boxes · 32 participants

awards

CVPR 2025
Outstanding Reviewer, CVPR 2025. link
IJCV 2024
IJCV Outstanding Reviewer Award. announcement
CVPR 2024
Distinguished Paper Award, EgoVis workshop @ CVPR 2024, for "EPIC-SOUNDS: A Large-Scale Dataset of Actions That Sound". link
CVPR 2024
Outstanding Reviewer, CVPR 2024. link
CVPR 2023
Outstanding Reviewer, CVPR 2023. link
ICASSP 2021
ICASSP 2021 Outstanding Paper Award for "Slow-Fast Auditory Streams for Audio Recognition". certificate