Talk

Autonomous Video Hunter: AI Agents for Real-Time OSINT

Recon Village @ DEF CON 33 • 8th, 9th and 10th August 2025

Abstract

Imagine discovering critical intelligence hidden inside live video streams faster than any human analyst could. We'll begin with a compelling hypothetical scenario: a breaking-news livestream unintentionally captures crucial clues about a missing person's location, but overwhelmed human investigators miss the moment. This scenario, inspired by the real-world challenges investigators face daily, motivated us to build Autonomous Video Hunter (AVH), a system of AI-powered agents that scour video content in real time to extract actionable OSINT.

Technical core:

We'll showcase how AVH combines open-source AI models for image recognition and audio transcription, orchestrated by custom Python-based agents. These agents autonomously analyze video streams, detect critical visuals, logos, and spoken keywords, and quickly cross-reference these clues against online databases and OSINT repositories.
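
As a rough illustration of the agent step described above, the following Python sketch shows one way to turn raw model outputs (visual detections plus transcript segments) into time-ordered clues and look them up against a reference index. This is not AVH's code. The Detection, TranscriptSegment, and Clue schemas, the extract_clues and cross_reference helpers, the confidence threshold, and the plain dictionary standing in for online OSINT sources are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical schema for one image-recognition result."""
    label: str          # e.g. a recognized logo name
    confidence: float   # model score in [0, 1]
    timestamp: float    # seconds into the stream


@dataclass(frozen=True)
class TranscriptSegment:
    """Hypothetical schema for one chunk of speech-to-text output."""
    text: str
    start: float        # segment start time in seconds


@dataclass(frozen=True)
class Clue:
    kind: str           # "visual" or "speech"
    value: str
    timestamp: float


def extract_clues(detections, segments, keywords, min_confidence=0.8):
    """Keep confident visual detections and whole-word keyword hits, sorted by time."""
    clues = [
        Clue("visual", d.label, d.timestamp)
        for d in detections
        if d.confidence >= min_confidence
    ]
    if keywords:  # an empty alternation would match the empty string everywhere
        pattern = re.compile(
            r"\b(" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
        )
        for seg in segments:
            for match in pattern.finditer(seg.text):
                # Speech clues are lowercased so index lookups are case-insensitive.
                clues.append(Clue("speech", match.group(1).lower(), seg.start))
    return sorted(clues, key=lambda c: c.timestamp)


def cross_reference(clues, index):
    """Return index entries for clues found in a stand-in for external OSINT sources."""
    return {c.value: index[c.value] for c in clues if c.value in index}
```

In a real deployment the dictionary lookup would presumably be replaced by calls to whatever search services or databases the agents query; keeping it a plain mapping here makes the data flow easy to follow and test.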

Live demo:

Experience AVH live as it identifies a target logo and relevant context (e.g., social media profiles and geolocation clues) from a random video clip in mere seconds. We'll also address practical challenges, from reducing false positives to scaling efficiently across multiple simultaneous streams.
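
The two practical challenges mentioned above can be sketched together. One common way to cut false positives (assumed here, not confirmed as AVH's approach) is temporal persistence: a label is only reported once it appears in at least min_hits of the last window frames. Scaling across simultaneous streams is sketched with one asyncio task per stream. The PersistenceFilter class, the watch_stream and watch_all helpers, the default window sizes, and the use of plain lists of per-frame label sets in place of live feeds are all hypothetical.

```python
import asyncio
from collections import Counter, deque


class PersistenceFilter:
    """Confirm a label after it appears in at least `min_hits` of the last `window` frames."""

    def __init__(self, window=5, min_hits=3):
        if not 1 <= min_hits <= window:
            raise ValueError("require 1 <= min_hits <= window")
        self.min_hits = min_hits
        self.history = deque(maxlen=window)  # one set of labels per frame
        self.confirmed = set()               # labels already reported once

    def update(self, frame_labels):
        """Add one frame's labels and return labels confirmed for the first time."""
        self.history.append(set(frame_labels))
        counts = Counter(label for labels in self.history for label in labels)
        newly = {lbl for lbl, n in counts.items() if n >= self.min_hits} - self.confirmed
        self.confirmed |= newly
        return newly


async def watch_stream(name, frames, alerts, window=5, min_hits=3):
    """Consume one stream's per-frame labels (a list standing in for a live feed)."""
    filt = PersistenceFilter(window, min_hits)
    for labels in frames:
        for label in sorted(filt.update(labels)):
            await alerts.put((name, label))
        await asyncio.sleep(0)  # yield so other stream tasks make progress


async def watch_all(streams):
    """Run one watcher per stream concurrently and collect confirmed alerts."""
    alerts = asyncio.Queue()
    await asyncio.gather(*(watch_stream(n, f, alerts) for n, f in streams.items()))
    results = []
    while not alerts.empty():
        results.append(alerts.get_nowait())
    return results
```

A label seen in a single stray frame never reaches min_hits and is dropped, at the cost of a short confirmation delay; raising window or min_hits trades latency for precision.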

By the end of this lightning talk, attendees will understand how autonomous agents can rapidly turn overwhelming volumes of video data into actionable OSINT. We'll also share a lightweight open-source AVH tool for the OSINT community to use and build upon.

Speaker

Kevin Dela Rosa

Co-founder & CTO, Cloudglue

Kevin Dela Rosa is the CTO of Cloudglue (formerly Aviary Inc), building AI video understanding platforms that transform audiovisual content into structured data for LLM and agentic retrieval use cases. With 14+ years in multimodal AI, he previously led engineering teams at Snapchat developing billion-scale visual search systems and generative AI products. His work has been featured at technical conferences including CVPR, NeurIPS, AAAI, ISMIR, AWS re:Invent, and KubeCon, and at cultural and entertainment venues ranging from Cannes and Art Basel to the Super Bowl and The Late Late Show. At Cloudglue, he leads research and development of technologies that enable AI systems to comprehend complex audiovisual content, focusing on systems that allow AI agents to see, hear, and understand the visual world at scale.
