A Video Compression Scheme Using Audio Frequency Analysis, or Voice Recognition
Nathan Hemenway
Abstract
A variety of pattern-matching methods now exist whereby
physical, psycho-acoustic events can be detected in any
given audio source.
These methods include Blind Source Separation, multivariate
Gaussian mixture models, Hidden Markov Models, and simpler
approaches such as spectral density analysis or volume-level
change detection. A plethora of digital signal processing
techniques exists for this task.
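
Of these, volume-level change detection is the simplest to
illustrate. The following is a minimal sketch in Python, assuming
mono PCM input; the function name, 20 ms frame size, and 12 dB
onset threshold are illustrative choices, not parameters of the
scheme described in this paper.

    import numpy as np

    def detect_audio_events(samples, sample_rate, frame_ms=20, jump_db=12.0):
        """Flag moments where the short-time volume level jumps sharply.

        samples: mono PCM as a float array in [-1, 1].
        Returns a list of event times in seconds.
        """
        frame_len = int(sample_rate * frame_ms / 1000)
        n_frames = len(samples) // frame_len
        frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)

        # Short-time RMS energy per frame, converted to decibels.
        rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)
        level_db = 20.0 * np.log10(rms)

        # An "event" is a frame whose level exceeds its predecessor's by
        # more than jump_db: a sudden onset such as speech or an impact.
        jumps = np.diff(level_db) > jump_db
        return [(i + 1) * frame_len / sample_rate for i in np.flatnonzero(jumps)]
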
This paper describes a video compression scheme that uses
audio frequency analysis, or voice recognition, as the first
layer of an algorithm with demonstrated improvement in
psycho-acoustic performance.
Background
Video compression algorithms have historically used
sophisticated analysis methods to reduce the bit rate
and throughput of each frame of video. The reason is
simple: reduce the amount of data transferred per frame,
and you reduce the total size of the data transferred,
as well as the per-frame overhead of rendering. The result
is an efficient mechanism for delivering video over
bandwidth-constrained media.
These techniques almost always rely on complex image
analysis whereby each video frame is reduced to the lowest
common denominator of color, motion, and detail.
Given any video segment, certain frames are chosen for
use in these analyses. These frames, or key frames, are
typically chosen at constant intervals, or at moments in
the sequence where motion or change is detected, as in the
sketch below.
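
The following is a minimal sketch of this conventional selection
strategy, assuming grayscale frames supplied as numpy arrays; the
30-frame interval and the change threshold are illustrative values,
not parameters of any particular codec.

    import numpy as np

    def select_key_frames(frames, interval=30, change_threshold=25.0):
        """Conventional key-frame selection: a frame becomes a key frame
        at a fixed interval, or when it differs sharply from its
        predecessor.

        frames: sequence of grayscale frames as 2-D numpy arrays.
        Returns the indices of the selected key frames.
        """
        keys = [0]
        for i in range(1, len(frames)):
            # Mean absolute pixel difference as a crude change measure.
            change = np.mean(np.abs(frames[i].astype(float)
                                    - frames[i - 1].astype(float)))
            if i - keys[-1] >= interval or change > change_threshold:
                keys.append(i)
        return keys
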
Too often, no key frame is selected at the crucial moments
where audio events occur. The frequent result is artifacts
we recognize as video and audio falling out of sync, or
being perceived as mismatched.
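
The remedy this paper pursues can be sketched by extending the
conventional selector above to force a key frame wherever an audio
event is detected. The sketch below is an illustrative reconstruction
of that idea only, not the paper's actual algorithm; the audio event
times might come from a detector such as the one sketched in the
abstract.

    import numpy as np

    def select_key_frames_audio_aware(frames, fps, audio_event_times,
                                      interval=30, change_threshold=25.0):
        """Key-frame selection that also forces a key frame at each
        audio event.

        audio_event_times: event times in seconds, e.g. from
        detect_audio_events().
        """
        # Frames that must be key frames because an audio event
        # lands on them.
        forced = {min(int(round(t * fps)), len(frames) - 1)
                  for t in audio_event_times}

        keys = [0]
        for i in range(1, len(frames)):
            change = np.mean(np.abs(frames[i].astype(float)
                                    - frames[i - 1].astype(float)))
            if i in forced or i - keys[-1] >= interval or change > change_threshold:
                keys.append(i)
        return keys

Aligning key frames with audio events in this way gives the decoder
a clean frame at exactly the moments viewers attend to most, which
is the source of the perceived synchronization improvement.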