The DCT is really neat, but the actual compression magic comes from a combination of side effects that occur after you apply it:
1. The DCT (II) packs lower frequency coefficients into the top-left corner of the block.
2. Quantization helps to zero out many higher frequency coefficients (toward bottom-right corner). This is where your information loss occurs.
3. Clever zig-zag scanning of the quantized coefficients means that you wind up with long runs of zeroes.
4. Zig-zag scanned blocks are RLE coded. This is the first form of actual compression.
5. RLE coded blocks are sent through Huffman or arithmetic coding. This is the final form of actual compression (for intra-frame-only/JPEG considerations). Additional compression occurs in MPEG et al. with interframe techniques. (A toy sketch of steps 2-5 follows below.)
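For anyone who wants to poke at this, here's a minimal Python sketch of steps 2-5 (numpy/scipy assumed for the DCT; the quantization table is made up for illustration rather than taken from the JPEG spec, and the run-length step is a plain (zero-run, value) scheme instead of JPEG's (run, size) + Huffman symbols):

    import numpy as np
    from scipy.fft import dctn

    def zigzag_order(n=8):
        # (row, col) pairs in the standard zig-zag scan order
        order = []
        for d in range(2 * n - 1):
            diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
            order.extend(diag if d % 2 else reversed(diag))
        return order

    def rle(coeffs):
        # simplified run-length coding: (zeros_before_value, value) pairs, then end-of-block
        out, run = [], 0
        for c in coeffs:
            if c == 0:
                run += 1
            else:
                out.append((run, int(c)))
                run = 0
        out.append("EOB")
        return out

    block = np.add.outer(np.arange(8), np.arange(8)).astype(float)   # a smooth 8x8 ramp
    coeffs = dctn(block, norm="ortho")                       # step 1: 2-D DCT-II
    q_table = 1 + np.add.outer(np.arange(8), np.arange(8))   # toy table, coarser at high frequencies
    quantized = np.round(coeffs / q_table)                   # step 2: quantization (the lossy part)
    scanned = [quantized[i, j] for i, j in zigzag_order()]   # step 3: zig-zag scan
    print(rle(scanned))                                      # steps 4-5, minus the entropy coder

Run it on a block of noise instead of the ramp and the long runs of zeros mostly disappear, which is why noisy images compress so much worse.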
pornel 148 days ago [-]
The "actual compression magic" had been used before the DCT in other codecs, but applied directly to pixels it gave lousy results.
You can also look at '90s software video codecs developed when the DCT was still too expensive for video. They had all kinds of approaches to quantization and entropy coding, and they were all a pixelated mess.
DCT is the key ingredient that enabled compression of photographic content.
HarHarVeryFunny 148 days ago [-]
What's so special about DCT for image compression?
The main idea of lossy image compression is throwing away fine detail, which means converting to the frequency domain and throwing away the high-frequency coefficients. Conceptually the FFT would work fine for this, so the use of the DCT instead seems more like an optimization than a key component.
Sesse__ 147 days ago [-]
The DCT, just like the DFT, is a discrete version of the (analog) Fourier transform. (FFT is just a clever, fast implementation of the DFT, just like when people say “DCT” they usually mean a similarly fast implementation of the DCT. Call it FCT if you wish.) Where they differ is the assumed boundary conditions; DFT works like Fourier-transforming a signal that is repeated and then loops forever, while DCT is like Fourier-transforming a signal that is _reflected_ and then loops forever. (There is also a related transform called DST, where it assumes the signal is _inverted_ and reflected. It's occasionally useful in solving certain kinds of differential equations; only rarely so in compression.)
This matches much better with what happens when you cut out small pieces of a signal, so it gives less noise and thus better energy isolation. Say your little pixel block (let's make it 8x1 for simplicity) is a simple gradient, so it goes 1, 2, 3, 4, 5, 6, 7, 8. What do you think has the least amount of high-frequency content you'd need to deal with; 123456788765432112345678… or 123456781234567812345678…?
(There's also a DCT version that goes more like 1234567876543212345678 etc., but that's a different story)
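A quick numerical check of this, for anyone who wants to see it (sketch assuming scipy; normalization conventions differ between the two transforms, so look at how fast the magnitudes fall off rather than at the absolute values):

    import numpy as np
    from scipy.fft import fft, dct

    x = np.arange(1, 9, dtype=float)                 # the 1..8 gradient block
    dft_mag = np.abs(fft(x))                         # implicit extension: ...6 7 8 | 1 2 ... 8 | 1 2 3...
    dct_mag = np.abs(dct(x, type=2, norm="ortho"))   # implicit extension: ...3 2 1 | 1 2 ... 8 | 8 7 6...

    for k in range(8):
        print(f"k={k}  |DFT|={dft_mag[k]:7.3f}  |DCT|={dct_mag[k]:7.3f}")

The DFT magnitudes stay substantial all the way up, while the DCT ones collapse after the first couple of coefficients (for a pure ramp the even-indexed AC coefficients come out as essentially zero).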
derf_ 147 days ago [-]
A DST is optimal when the variance of the signal you are compressing grows linearly (as opposed to the DCT, which is optimal for uniform variance). That happens, for example, when you have a prediction on one side (e.g., intra prediction in image compression) and you are coding the prediction residual. The farther you get from the source of your prediction, the less accurate it is likely to be. Block-based motion compensation prediction residuals in video coding are also not uniform: the prediction error is higher at block edges than the block center, so a DST sometimes works better (e.g., when using a transform size smaller than the motion compensation block size).
So still useful for compression, just in more specialized circumstances.
Sesse__ 147 days ago [-]
Yes, I think “only rarely useful” covers it :-) I know there are some codecs that support it, e.g. VP9?
derf_ 146 days ago [-]
Probably every video coding standard published after c. 2012 has some version of it in its toolbox (including VP9).
HarHarVeryFunny 147 days ago [-]
Thanks!
pizza 147 days ago [-]
In practice there is a difference: the FFT would produce more edge artifacts at block boundaries, lowering visual quality, while the DCT has better energy compaction into the lower frequencies, meaning longer runs of zeroed-out higher-frequency coefficients after quantization and therefore better compression. Another plus is that the DCT only needs real numbers.
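To see the edge-artifact part concretely, here's a small sketch (scipy assumed): keep the same number of real degrees of freedom from both transforms of an 8-sample ramp, throw the rest away, and invert. The leftover error from the DFT version tends to pile up right at the block edges, which is what shows up as blocking in images.

    import numpy as np
    from scipy.fft import fft, ifft, dct, idct

    x = np.arange(1, 9, dtype=float)

    # DFT: keep DC plus the first +/- harmonic (3 real parameters), zero the rest
    X = fft(x)
    X[2:7] = 0
    x_dft = ifft(X).real

    # DCT: keep the first 3 coefficients (also 3 real parameters)
    C = dct(x, type=2, norm="ortho")
    C[3:] = 0
    x_dct = idct(C, type=2, norm="ortho")

    print("DFT error:", np.round(np.abs(x - x_dft), 2))
    print("DCT error:", np.round(np.abs(x - x_dct), 2))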
disqard 147 days ago [-]
There's at least one specific optimization in H.264 (aka "MPEG-4 Part 10") to smooth out these artifacts at block boundaries, called "deblocking".
https://en.wikipedia.org/wiki/Deblocking_filter
And then there's the current maintainer of libjpeg, who switched the down- and up-scaling algorithms for chroma subsampling to use DCT scaling because that's mathematically more beautiful, which does introduce some additional artifacts at the block boundaries of each upsampled chroma block.
kappi 148 days ago [-]
DCT is now replaced by the Hadamard Transform, which can be implemented with additions/subtractions and doesn't have the drift problem of the DCT. HT was considered before DCT, but at the time DCT was picked because of better perceptual quality. Later, during H.264 standardization, HT replaced DCT and is now used in all video codecs instead of DCT.
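The additions/subtractions point is easy to see in code: a length-2^k Walsh-Hadamard transform is just a butterfly of sums and differences (unnormalized sketch, natural/Hadamard ordering):

    def fwht(x):
        # fast Walsh-Hadamard transform; len(x) must be a power of two
        x = list(x)
        h = 1
        while h < len(x):
            for start in range(0, len(x), 2 * h):
                for i in range(start, start + h):
                    a, b = x[i], x[i + h]
                    x[i], x[i + h] = a + b, a - b   # only additions and subtractions
            h *= 2
        return x

    print(fwht([1, 2, 3, 4, 5, 6, 7, 8]))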
teleforce 147 days ago [-]
Thanks for the info. I've been looking into Hadamard matrices recently for wireless ECC, and I was oblivious to the fact that they are used in compression algorithms.
What is interesting is that the techniques used in compression, communication and signal processing in general involve orthogonality, and Nasir's master's and PhD research theses were in the area of orthogonal transforms for digital signal processing.
The Hadamard transform can provide orthogonality, but unlike the DCT and DFT/FFT, which are limited to real and complex numbers respectively, the Hadamard transform is very versatile and can be used with real, complex, quaternion and also octonion number systems; the latter are probably more suited to higher-dimensional data and signal processing.
Hadamard orthogonal codes have also been used as ECC for reliable space communication in both the Mariner and Voyager missions, for example [1].
[1] On some applications of Hadamard matrices [PDF]: https://documents.uow.edu.au/~jennie/WEB/WEB05-10/2005_12.pd...
Interestingly enough, JPEG XR used a form of the Hadamard Transformation, but JPEG XL (which is newer) uses DCT and Haar transforms.
[edit]
Combined with the information from sibling comments, it seems that the Hadamard transform was something used in standards developed in the '00s but not since.
lifthrasiir 147 days ago [-]
WHT is essentially a cheap implementation of the multi-dimensional DCT, so it approximates but doesn't actually replace the DCT in all scenarios. It seems that the DCT is a better fit for photographic content than the WHT but was more expensive until FP multiplication became much cheaper, so the WHT was briefly considered as an alternative.
adgjlsfhk1 148 days ago [-]
probably because multiplication got really fast
edflsafoiewq 147 days ago [-]
JPEG XL supports lossless JPEG transcode, so it has to offer a superset of the features in JPEG.
aidenn0 147 days ago [-]
Yes, but variable-blocksize DCT is definitely used in the lossy modes (not sure about lossless).
lifthrasiir 147 days ago [-]
The squeeze transform used for extra progressive decoding is effectively a modified Haar transform.
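For anyone unfamiliar with it, the plain one-level Haar step that squeeze builds on looks like this: split the signal into per-pair averages (a half-resolution version) and per-pair differences (the detail needed to reconstruct). This is just the textbook Haar step, not JPEG XL's actual squeeze transform, which differs in the details:

    def haar_step(x):
        # assumes even length; returns (coarse half-resolution signal, detail)
        avg = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
        diff = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
        return avg, diff

    def haar_unstep(avg, diff):
        out = []
        for a, d in zip(avg, diff):
            out += [a + d, a - d]     # exact reconstruction
        return out

    coarse, detail = haar_step([10, 12, 14, 20, 20, 20, 8, 6])
    print(coarse, detail)
    print(haar_unstep(coarse, detail))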
edflsafoiewq 147 days ago [-]
Yeah, JPEG uses the DCT, so JPEG XL needs it too to be a superset.
aidenn0 147 days ago [-]
Yes, but if the DCT were purely vestigial, then:
1. It wouldn't have support for DCTs not in JPEG.
2. It wouldn't use the DCT in its lossy-compression of photographic content if another transform was considered significantly better.
Perhaps one could argue that they didn't want to add extra transforms, but they do use a modified Haar transform for e.g. synthetic content and alpha channels.
Technically, AV1 also uses Hadamard in the lossless modes.
X265/HEVC https://en.m.wikipedia.org/wiki/High_Efficiency_Video_Coding
Also not true for X266/VVC.
https://fgiesen.wordpress.com/2013/11/04/bink-2-2-integer-dc... has a ton of technical information if you want to dive into it.
aidenn0 147 days ago [-]
Thanks for the correction, I didn't know that.
kappi 148 days ago [-]
correct, it is integer DCT. A lot of techniques were adopted from the integer transform of H.264. That's what I meant, not the floating-point DCT proposed in the 70s.
Sesse__ 147 days ago [-]
The big change is basically that we now typically specify exactly which integer approximation to the (real-valued, “ideal”) DCT to use; this means the decoder and encoder are much less likely to fall out of sync. As a bonus, this means we can use a slightly worse but much faster approximation without catastrophes happening, and possibly also make it exactly invertible.
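As a concrete example (if I remember the matrix right), the 4x4 core transform in H.264 is specified as the exact integer matrix below, with the per-coefficient scaling folded into quantization, so every conforming encoder and decoder computes identical integers and nothing drifts. Sketch only; real implementations use adds and shifts instead of a general matrix multiply:

    import numpy as np

    Cf = np.array([[1,  1,  1,  1],
                   [2,  1, -1, -2],
                   [1, -1, -1,  1],
                   [1, -2,  2, -1]])

    block = np.arange(16).reshape(4, 4)   # any 4x4 residual block
    coeffs = Cf @ block @ Cf.T            # exact integer result, no rounding anywhere
    print(coeffs)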
That is true for advanced techniques, but for simple compression you can simply throw away high-frequency coefficients. The simplicity of the DCT makes it so impressive.
kleiba 148 days ago [-]
Wikipedia writes: "Ahmed developed a practical DCT algorithm with his PhD students T. Raj Natarajan, Wills Dietrich, and Jeremy Fries, and his friend Dr. K. R. Rao at the University of Texas at Arlington in 1973." [1]
So perhaps it would be fair to give due credit to the co-workers as well.
[1] https://en.wikipedia.org/wiki/Discrete_cosine_transform
> (subtitle) His digital-compression breakthrough helped make JPEGs and MPEGs possible
Technically, the DCT isn't restricted to only digital compression. The DCT performs a matrix multiplication on a real vector, giving a real vector as output. You can perform a DCT on a finite sequence of analog values if you really wanted to, by performing a specific weighted sum of the values to yield a new sequence of analog values.
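To make that explicit, here is the N=8 orthonormal DCT-II written out as a plain matrix and checked against a library implementation (scipy assumed); nothing about it requires the input to be "digital" beyond being a finite list of real numbers:

    import numpy as np
    from scipy.fft import dct

    N = 8
    idx = np.arange(N)                       # rows index frequency k, columns index sample n
    C = np.sqrt(2 / N) * np.cos(np.pi * np.outer(idx, 2 * idx + 1) / (2 * N))
    C[0] /= np.sqrt(2)                       # the k=0 (DC) row gets the smaller scale

    x = np.random.default_rng(0).standard_normal(N)
    print(np.allclose(C @ x, dct(x, type=2, norm="ortho")))  # matches the library DCT
    print(np.allclose(C @ C.T, np.eye(N)))                   # rows are orthonormal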
dilippkumar 148 days ago [-]
Hey! Nice to see this here.
My graduate thesis advisor was a coinventor of the DCT [0]. I miss my grad school days - he was a great advisor.
[0]: https://en.wikipedia.org/wiki/K._R._Rao
I really like the book he co-authored with P. Yip [0]. Grabbed a copy on AbeBooks a few years ago while working on a custom codec. Excellent coverage of the transform from many angles, including reference diagrams of how to implement the various transforms in software/hardware and ~200 pages worth of discussion around applications.
[0]: https://dl.acm.org/doi/10.5555/96810
It was nice back then, but the one thing that completely boggles my mind, after decades of keeping and backing up .jpg files (family pictures), is that I can now compress these very same files to JPEG XL and deterministically get back, bit for bit, the original jpg if I want. A "lossless" 20% to 30% gain (I know jpg is lossy, but JPEG XL doesn't lose any additional detail).
Having files around which, for twenty years, couldn't be compressed losslessly and that now suddenly can is just wild.
And even though I didn't look that much into it JPEG XL is, basically... More DCT!?
lifthrasiir 147 days ago [-]
If you transcode a JPEG into JPEG XL, the underlying DCT is limited to the usual 8x8 block (otherwise it'd be much harder to reconstruct the original bitstream), and all the improvement comes solely from better entropy coding and prediction.
cameldrv 148 days ago [-]
Maybe someone can chime in with a good explanation: I've never really understood why the DCT is better than the discrete Fourier transform for compression. I once read it had something to do with it not needing a window function and working better for small block/window sizes.
ack_complete 148 days ago [-]
The DCT copes better than the FFT with the piecewise blocking done by image codecs.
Consider a shallow 1D gradient, which is just a ramp. The DFT's interpretation of this as a periodic signal turns it into a sawtooth, which takes lots of high-frequency components to reduce ringing and keep the edge sharp enough. The DCT is equivalent to the DFT on the mirrored signal, which instead turns this into a triangle wave, which takes fewer high-frequency components to represent reasonably.
This varies for different types of data; my understanding is that audio codecs tend to prefer the modified DCT (MDCT) instead due to the different characteristics of audio signals.
westcoast49 147 days ago [-]
The cosine function itself is symmetrical around the Y-axis. Supposedly, the main advantage of the DCT is that it's better at representing symmetrical features in data sets, which is a good fit for both image and audio data.
trhway 148 days ago [-]
The first layer of the visual cortex (and what the input-layer convolutional kernels in visual NNs converge to) consists of those Gabor kernels: a cosine multiplied by an exponentially decreasing amplitude, thus de facto limiting the spatial attention of the given neuron to a spot.
andai 148 days ago [-]
This man is the reason I want to study electrical engineering.
Is that appropriate? Would another discipline give me a better grounding in not just these techniques, but the mental foundations that made their discovery possible?
stagger87 147 days ago [-]
EE and CS are both going to be where the "rubber meets the road", or the application of these concepts, especially at the BS/MS level. Specifically in classes covering things like communication codecs, video/image processing, signal processing, and compression. If you're interested more in the foundations of these ideas, you really need to look more towards pure math. For instance, the beginning of every coding book I own starts with a review of abstract algebra, and a lot of signal processing ideas are built on top of complex analysis.
andai 145 days ago [-]
Thanks. That's very encouraging, I'm about to begin a bachelor's in pure math!
Could you recommend some of the books you mentioned?
proteal 147 days ago [-]
You wouldn’t go wrong with electrical engineering if this is the stuff you like. However, I think most engineering and engineering-adjacent disciplines (basically STEM) will give you a similar set of tools to approach any problem. If what you’re really after is the pioneering aspects of his work, consider a double degree in business/engineering. The problems businesses face are really just engineering problems in disguise. Since most people who have the desire and capability to be an engineer become engineers instead of businesspeople, there’s a dearth of engineering talent in most non-engineer roles. In my last role at a Fortune 500, my nickname was “The Wizard” because I was so good at translating business needs to computer workflows that it seemed like magic to my coworkers. When I’d regale my engineer friends with my successes they’d just laugh. At my org, I was 1 of 1 who could solve these problems. At their firms, my friends were on teams of 20+ who could all do what I did in their sleep. They worked in a more competitive domain where magic was an everyday occurrence, so their work product felt lackluster when compared to their peers.
diroussel 147 days ago [-]
Most engineering (certainly the engineering I know) is based on linear algebra. There is of course a lot to learn in that field, but covering the basics can help you understand a lot of engineering maths.
Equations, differentiation, integration, partial differential equations, complex numbers, matrices, eigenvectors, correlation, Fourier transforms, Laplace transforms. And probably others.
max_ 148 days ago [-]
One thing I recommend people do is study compression algorithms like JPEG.
I find the relationship between compression algos & cognitive science very interesting.
But this is an example https://archive.is/KShWY#9
drunkspider 148 days ago [-]
What's the relationship between compression algorithms and cognitive science?
"Lossless" compression is based on information that can be discarded without negative consequences because it cannot be perceived by humans. The data is real and there, you just can't see it or hear it. If you can quantify what information humans can't perceive, you can discard it, leaving less data and possibly more amenable data for a subsequent lossless compression phase. MP3, JPEG, MPEG all benefit from this understanding of the human perceptual system.
hnlmorg 148 days ago [-]
You have it backwards there. You’re describing lossy compression.
Lossless is formats like Flac and zip. Lossless compression basically stores the same data in more efficient (from a file size perspective) states rather than discarding stuff that isn’t perceived.
The clue is in the name of the term: “lossy” means you lose data. “Lossless” means you don’t lose data. So if a zip file was lossy, you’d never be able to decompress it. Whereas you cannot restore data you’ve lost from an MP3.
omneity 148 days ago [-]
You're talking about lossy compression. Specifically perceptual lossy compression[0].
Lossless compression is entirely reversible. Nothing is lost and nothing is discarded, perceived or not, like zip.
[0]: https://arxiv.org/abs/2106.02782
>"Only a few image-compression standards not using DCT exist today.”
I am only aware of JPEG, actually. Can anyone help me? PNG uses deflate and not DCT, TIFF supports sort of everything (JPEG is uncommon but possible) but generally no DCT is used, GIF uses some RLE but also not with a DCT, J2K also does not use a DCT, EXR can use Wavelet as well but no DCT I'm aware of.
Retr0id 147 days ago [-]
Most of the video-codec-derived formats (HEIC, HEIF) use DCT. Webp also uses DCT (and arguably webp is video-codec-derived too).
JPEG-XL also uses DCT.
As an aside, while PNG uses deflate for entropy coding, the conceptual analog to DCT in the context of PNG would be its row filters. JPEG's entropy coding isn't all that different to PNG's (aside from the arithmetic coding option which isn't widely used).
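To make the analogy concrete: PNG's row filters are little predictors, and it's the prediction residuals that deflate actually compresses, much as it's the quantized DCT coefficients that JPEG's entropy coder sees. A minimal sketch of two of PNG's five filter types (Sub and Up), assuming one byte per pixel for simplicity (the real Sub filter subtracts the byte bpp positions back):

    def filter_sub(row):
        # predict each byte from its left neighbour (PNG filter type 1), residual mod 256
        return [(row[i] - (row[i - 1] if i > 0 else 0)) % 256 for i in range(len(row))]

    def filter_up(row, prev_row):
        # predict each byte from the byte directly above it (PNG filter type 2)
        return [(cur - up) % 256 for cur, up in zip(row, prev_row)]

    row = [10, 12, 13, 13, 14, 15, 15, 16]   # a smooth gradient row
    print(filter_sub(row))                   # small residuals, mostly 0s and 1s: easy to compress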
fp64 147 days ago [-]
Yeah, I thought about the video codecs, but I wasn't too sure about even the majority there. There, at least, the quote might be much closer to reality.
And yes, I'm sorry for mixing up DCT with entropy coding; I noticed it already while writing my comment but didn't find a better way to put it, and I see you understood what I meant.
selimthegrim 148 days ago [-]
His Signals and Systems book is still around and not bad either.
duped 148 days ago [-]
This is why it's important to pay attention in linear algebra class as a CS undergrad!
laidoffamazon 148 days ago [-]
Extremely impressive, done while doing research at Kansas State University with a PhD from the University of New Mexico. I don't know if any new major advancements have come from people from state schools today.
mkoubaa 148 days ago [-]
Is this sarcastic?
laidoffamazon 147 days ago [-]
No, I'm aware of how people think about people that don't go to top schools.
sgerenser 147 days ago [-]
Yeah, nothing but losers from Berkeley, UMich, UW, etc. /s