Overview
- A new arXiv paper from the University of Washington and Google Research treats the passage of time as a learnable visual concept in video.
- The authors train models to detect speed changes and estimate playback-speed multipliers using cues like audio pitch and motion patterns.
- Those estimators are used to mine what the team describes as the largest slow-motion dataset assembled from web video.
- Built on that data, the work includes speed-conditioned video generation and a temporal super-resolution model that turns low‑FPS, blurry clips into smoother high‑FPS sequences.
- Community summaries highlight limits, including weaker results on silent or static footage, hallucinated details in synthesized frames, high diffusion compute costs, and no documented public releases or peer review yet.
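
To make the audio-pitch cue concrete: playing a clip at a multiplier of s scales every audio frequency by s, which is a pitch shift of 12·log2(s) semitones. The toy sketch below illustrates only that relationship and is not the authors' method. It assumes a clean reference tone is available for comparison, which a real estimator working on web video would not have, and the function names (`dominant_frequency`, `change_speed`, `estimate_speed`) and the 16 kHz sample rate are hypothetical choices for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz; an assumed value for this toy example


def dominant_frequency(signal, sample_rate=SAMPLE_RATE):
    """Return the frequency (Hz) of the strongest FFT bin of a windowed signal."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]


def change_speed(signal, factor):
    """Simulate playback at `factor`x by resampling with linear interpolation."""
    n_out = int(len(signal) / factor)
    positions = np.arange(n_out) * factor
    return np.interp(positions, np.arange(len(signal)), signal)


def estimate_speed(reference, observed, sample_rate=SAMPLE_RATE):
    """Estimate the playback multiplier as the ratio of dominant frequencies.

    Assumes `reference` is the same sound at normal speed (a simplification).
    """
    return dominant_frequency(observed, sample_rate) / dominant_frequency(
        reference, sample_rate
    )


# One second of a 440 Hz tone, then the same tone slowed to half speed.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440.0 * t)
slowed = change_speed(tone, 0.5)

estimated = estimate_speed(tone, slowed)
semitones = 12 * np.log2(estimated)
print(f"estimated multiplier: {estimated:.2f}, pitch shift: {semitones:.1f} semitones")
```

Halving the speed drops the tone by about an octave, so the frequency ratio recovers the multiplier. The same reasoning suggests why silent footage is harder: without audio there is no pitch to compare, and the estimate has to rest on motion cues alone.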