Instead of taking an image from the video every 5 seconds and embedding it, you could detect when enough has changed between frames and only then decide to embed. One frame, one scene, one vector.
For instance, FFmpeg can do that with the filter `select=gt(scene,0.3)`. It selects the frames whose scene change detection score is greater than 0.3 (the scores are values between 0 and 1).
https://ffmpeg.org//ffmpeg-filters.html#select_002c-aselect
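A rough sketch of how that filter could be wired up from Python, in case it helps. The function name `scene_frames_command`, the output pattern and the default threshold are made up for illustration. `-fps_mode vfr` assumes a reasonably recent FFmpeg build (older ones use `-vsync vfr` instead).

```python
def scene_frames_command(video_path, out_pattern="scene_%04d.jpg", threshold=0.3):
    """Build an ffmpeg command that writes one image per selected frame.

    Hypothetical helper: only the `select=gt(scene,...)` filter comes from
    the comment above, everything else is an assumption.
    """
    return [
        "ffmpeg",
        "-i", video_path,
        # Keep only frames whose scene change score exceeds the threshold.
        # The single quotes protect the comma inside the filter expression.
        "-vf", f"select='gt(scene,{threshold})'",
        # Variable frame rate output, so dropped frames are not re-duplicated.
        # Assumption: recent ffmpeg; older builds take "-vsync", "vfr" here.
        "-fps_mode", "vfr",
        out_pattern,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`; each written image would then be one candidate frame to embed.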
Otherwise you’d select frames with 0.3, 0.7, 1.0, 0.7, 0.3 - selecting 5 frames instead of 1?
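One way to avoid that, assuming you already have the per-frame scores as a list: collapse each run of consecutive above-threshold frames to its single highest-scoring frame. This is only a sketch, and `pick_scene_peaks` is a hypothetical helper, not something FFmpeg provides.

```python
def pick_scene_peaks(scores, threshold=0.3):
    """Return one frame index per run of consecutive scores above threshold."""
    peaks = []
    best = None  # index of the highest score in the current run, if any
    for i, score in enumerate(scores):
        if score > threshold:
            # Inside a run: remember the strongest frame seen so far.
            if best is None or score > scores[best]:
                best = i
        elif best is not None:
            # The run just ended: keep only its peak.
            peaks.append(best)
            best = None
    if best is not None:
        # The video ended in the middle of a run.
        peaks.append(best)
    return peaks
```

With scores like 0.3, 0.7, 1.0, 0.7, 0.3 and a threshold just under 0.3, this keeps only the frame that scored 1.0.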
Two pass with sobel filter comes to mind.
which uses https://github.com/Breakthrough/PySceneDetect
under the hood i'm sure it's the same ffmpeg method ;)