news

Sep 02, 2025 We’ve released grove-transformers, a lightweight, inference-only interface for our GROVE model, implemented with 🤗 Transformers. [🤗 link] [GitHub link]
Aug 21, 2025 We’ve released datasets (HowToGround1M and iGround), checkpoints and code for our work “Large-scale Pre-training for Grounded Video Caption Generation”. [Project webpage] [Code, checkpoints, data]
Jul 22, 2025 I’ve been invited to present our work “Large-scale Pre-training for Grounded Video Caption Generation” in the Artificial Intelligence for Science parallel session at the EuroHPC User Days event, scheduled for 1 October 2025 at 11:30. [Event link]
Jun 26, 2025 Our work “Large-scale Pre-training for Grounded Video Caption Generation” has been accepted to ICCV 2025!
May 15, 2025 I was named an Outstanding Reviewer for CVPR 2025. [link]
Apr 18, 2025 I presented our work “Large-scale Pre-training for Grounded Video Caption Generation” at TwelveLabs’ weekly webinar. [YouTube link]
Mar 13, 2025 Our paper “Large-scale Pre-training for Grounded Video Caption Generation” is now on arXiv. [arXiv] [Project webpage] [Code] (available soon, stay tuned!)
Feb 28, 2025 I received the 2024 IJCV Outstanding Reviewer Award. [Announcement]
Nov 01, 2024 I started a new role as a Postdoctoral Researcher at the Czech Institute of Informatics, Robotics and Cybernetics (CIIRC), CTU in Prague. My research will focus on multimodal understanding using video and language.
Feb 27, 2024 Our paper “TIM: A Time Interval Machine for Audio-Visual Action Recognition” has been accepted at CVPR 2024. [paper] [project page]
Jan 18, 2024 Our paper “Graph Guided Question Answer Generation for Procedural Question-Answering” has been accepted at EACL 2024. [paper]
Feb 17, 2023 Our paper “EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound” has been accepted at ICASSP 2023. [paper] [project page]
May 30, 2022 I joined the Samsung AI Center in Cambridge as a Research Scientist.
Apr 27, 2022 I successfully defended my PhD dissertation, titled “Audio-Visual Egocentric Action Recognition”. [link]