A production intelligence system I designed and built from scratch -- and a blueprint for what it could look like inside your organization.
Creative agencies and production teams are drowning in media but starving for efficiency. Valuable footage stays buried because manual cataloging is subjective and exhausting.
I designed and implemented this architecture end-to-end. Growl bypasses internet bottlenecks by separating local proxy rendering from cloud-scale AI analysis.
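To make the local half of that split concrete, here is a minimal sketch of how a proxy transcode command could be assembled. It is not Growl's actual implementation: the function name `build_proxy_command`, the `_proxy.mp4` naming, the 720p height, and the CRF and preset values are all assumptions chosen for illustration. It also assumes the source format is one that FFmpeg can decode, which is not true of every camera RAW format.

```python
from pathlib import Path


def build_proxy_command(source: Path, proxy_dir: Path,
                        height: int = 720, crf: int = 23) -> list[str]:
    """Build an ffmpeg argument list for an H.264 review proxy.

    Hypothetical helper: the naming scheme and encoder settings are
    illustrative assumptions, not values taken from Growl.
    """
    target = proxy_dir / f"{source.stem}_proxy.mp4"
    return [
        "ffmpeg", "-y",
        "-i", str(source),
        # Scale to the target height; -2 keeps the aspect ratio with an even width.
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264", "-preset", "fast", "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "128k",
        # Move the index to the front of the file so playback can start early.
        "-movflags", "+faststart",
        str(target),
    ]


if __name__ == "__main__":
    cmd = build_proxy_command(Path("/footage/A001.mov"), Path("/proxies"))
    print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed. Only the resulting proxy file would then be uploaded, while the original stays on the local drive.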
Lightweight macOS desktop app probes RAW clips and transcodes clean H.264 review proxies locally.
0% CLOUD UPLOAD COST
Only the small review proxies are sent to Cloudflare R2. Original RAW source files remain secure on local drives.
ULTRA-FAST TRANSFERS
Serverless GPU workers ingest proxies and execute heavy deep-learning computer vision and audio modeling.
100% AUTOMATED
FFmpeg scdet scene detection, black frame registers, and optical-flow camera movement analysis.
YOLOv8 object detection and pose estimation. Face verification, spatial coordinates, and bounding-box coverage.
openai/clip-vit-base semantic vector embeddings mapped to each keyframe for similarity searches.
Gemini Vision multimodal descriptions, narrative summaries, and shot classifications.
Word-level audio transcripts, speaker diarization, and OCR of on-screen text (chyrons and slides).
Librosa tempo and beat tracking, RMS energy per speaker segment, and music/speech/silence filters.
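As an illustration of the RMS energy and silence filtering mentioned in the audio card, the sketch below computes frame-wise RMS and flags quiet frames using only the standard library. It is a stand-in for the idea, not Growl's code. In a Librosa-based pipeline the equivalent measurement would normally come from `librosa.feature.rms`. The function names, the frame sizes, and the -40 dB threshold are assumptions.

```python
import math


def frame_rms(samples, frame_length=2048, hop_length=512):
    """Return the RMS energy of each frame of a mono signal (floats in [-1, 1])."""
    rms = []
    last_start = max(len(samples) - frame_length, 0)
    for start in range(0, last_start + 1, hop_length):
        frame = samples[start:start + frame_length]
        if not frame:
            break
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms


def silent_frames(rms, threshold_db=-40.0):
    """Flag frames whose level falls below threshold_db (an assumed cutoff)."""
    flags = []
    for value in rms:
        level_db = 20.0 * math.log10(max(value, 1e-10))  # avoid log(0)
        flags.append(level_db < threshold_db)
    return flags


if __name__ == "__main__":
    sample_rate = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(4096)]
    signal = [0.0] * 4096 + tone  # silence followed by a 440 Hz tone
    energy = frame_rms(signal, frame_length=1024, hop_length=1024)
    print(silent_frames(energy))
```

A fixed decibel threshold is the simplest possible filter. Separating music from speech needs additional features and is outside the scope of this sketch.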
I built a working semantic search layer that runs queries locally at zero API cost, combining dialogue-text matching with CLIP vector similarity over keyframe embeddings. Here's a live demo.
02.14s - 05.80s 94% Match
45.10s - 49.30s 89% Match
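Results like the ones above could come from a ranking step along the following lines. This is a minimal sketch under stated assumptions, not the search layer itself: the `Shot` record, the word-overlap text score, and the 0.7 visual weight are all hypothetical, and the three-dimensional vectors stand in for real CLIP embeddings, which would be produced by the CLIP image and text encoders.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    start: float            # seconds
    end: float              # seconds
    transcript: str         # dialogue spoken during the shot
    embedding: list[float]  # keyframe embedding (toy vector in this sketch)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def text_overlap(query, transcript):
    """Fraction of query words that appear in the transcript (a crude text score)."""
    query_words = set(query.lower().split())
    transcript_words = set(transcript.lower().split())
    return len(query_words & transcript_words) / len(query_words) if query_words else 0.0


def search(query_text, query_embedding, shots, visual_weight=0.7):
    """Rank shots by a weighted blend of visual and dialogue similarity."""
    scored = []
    for shot in shots:
        visual = cosine(query_embedding, shot.embedding)
        textual = text_overlap(query_text, shot.transcript)
        scored.append((visual_weight * visual + (1 - visual_weight) * textual, shot))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    shots = [
        Shot(2.14, 5.80, "the dog runs on the beach", [1.0, 0.0, 0.0]),
        Shot(45.10, 49.30, "an interview in the studio", [0.0, 1.0, 0.0]),
    ]
    for score, shot in search("dog beach", [0.9, 0.1, 0.0], shots):
        print(f"{shot.start:05.2f}s - {shot.end:05.2f}s  score={score:.2f}")
```

Because both scores are computed against locally stored transcripts and embeddings, a query of this kind needs no external API call once the embeddings exist.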
Most AI developers don't understand production workflows. I do -- as a filmmaker who has felt this pain directly. Growl is a proof of concept, a technical portfolio, and a system ready to be adapted to your infrastructure.
Filmmaker and AI systems builder. I built Growl to solve a problem I lived firsthand -- and I'm looking for the right organization to deploy it at scale.