FLAC v1.4.x Performance Tests

Re: FLAC v1.4.x Performance Tests

Reply #200 – 2022-10-22 14:35:17

flac 1.4.2 has arrived. Here is my attempt with GCC 12.2.0 and the same flags as before. One build is with disable-asm-optimizations, for faster 16-bit encoding, for use in CUETools for example.
By the way, configure spits out "unrecognized options: --enable-sse". Is it meant to be that way, so that the compiler decides? In that case, could different CPU versions benefit more from the compiler's choices?

Re: FLAC v1.4.x Performance Tests

Reply #201 – 2022-10-22 15:46:05

Quote from: Wombat on 2022-10-22 14:35:17

btw. configure spits out "unrecognized options: --enable-sse" Is it meant to be that way and the compiler decides? In that case different CPU versions could benefit more from the compiler choices?

--enable-sse was a misnomer. It was actually 'force sse2'. This option has been removed. See here and the changelog.

It didn't do anything for 64-bit compiles anyway, only for 32-bit compiles.

Re: FLAC v1.4.x Performance Tests

Reply #202 – 2022-10-22 15:49:58

Thanks. Now Clang 15.0.3 (MSYS2) creates faster binaries for the first time. Attached some.

Re: FLAC v1.4.x Performance Tests

Reply #203 – 2022-10-22 16:32:24

Looks like -q is quite predictable for high bitrate CDDA transcodes. Here are some files from different lossless formats (ape, flac, tak, tta, wv) transcoded to flac, sorted by their original bitrates.

1000-1177kbps compressed, 119 files, 12h55m50s

-8 -q8
6,305,740,006 bytes

-8 -q9
6,305,212,370 bytes

-8 -q10
6,305,587,831 bytes

-8
6,307,265,332 bytes

950-999kbps compressed, 142 files, 10h8m32s

-8 -q9
4,458,162,567 bytes

-8 -q10
4,457,902,583 bytes

-8 -q11
4,458,321,772 bytes

-8
4,459,051,184 bytes

Lower-bitrate files are much harder to predict; for those, -p will make more sense.

Re: FLAC v1.4.x Performance Tests

Reply #204 – 2022-10-22 18:15:57

Quote from: bennetng on 2022-10-21 16:34:33

Quote from: bennetng on 2022-10-21 16:07:52
Deathblow with -p

24/48, 8 wav files, multi-thread, -8p

flac1021wombat-noasm
Total encoding time: 1:12.704, 176.12x realtime

flac1021wombat
Total encoding time: 0:59.812, 214.08x realtime

Same files but 16/48

flac1021wombat-noasm
Total encoding time: 0:29.735, 430.62x realtime

flac1021wombat
Total encoding time: 0:29.656, 431.77x realtime
flac1021znver2john33
24/48
Total encoding time: 1:03.859, 200.51x realtime
16/48
Total encoding time: 0:29.985, 427.03x realtime

My i3-12100 must be a remarked Ryzen

flac 1.4.1 Case GCC 12.2.0
24/48: 1:02.718, 204.16x realtime
16/48: 0:32.359, 395.70x realtime

flac 1.4.2 Wombat Clang 15.0.3
24/48: 1:01.844, 207.04x realtime
16/48: 0:30.000, 426.82x realtime

Re: FLAC v1.4.x Performance Tests

Reply #205 – 2022-10-22 18:51:21

Nice. For 16-44.1, GCC 12.2.0 with asm disabled is the fastest. Clang does badly with it disabled. It will be interesting to see how fast Case's build from his clean environment is. I depend on MSYS2.

Re: FLAC v1.4.x Performance Tests

Reply #206 – 2022-10-23 02:40:54

Picked favorites out of the packages.

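Ryzen 5900x

Clang
-8 -p single file 24-96: 28.75x realtime
-8 -p single file 16-44.1: 112.03x realtime
metaflac Replaygain, 18,6GB Hibitrate files: 2:20 minutes

GCC
-8 -p single file 24-96: 27.97x realtime
-8 -p single file 16-44.1: 106.54x realtime
metaflac Replaygain, 18,6GB Hibitrate files: 2:09 minutes

GCC disable-asm-optimizations
-8 -p single file 24-96: 21.48x realtime
-8 -p single file 16-44.1: 132.45x realtime
metaflac Replaygain, 18,6GB Hibitrate files: 2:09 minutes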

Re: FLAC v1.4.x Performance Tests

Reply #207 – 2022-10-23 12:13:26

Quote from: bennetng on 2022-10-22 16:32:24

Looks like -q is quite predictable for high bitrate CDDA transcodes. Here are some files from different lossless formats (ape, flac, tak, tta, wv) transcoded to flac, sorted by their original bitrates.
950-999kbps compressed, 142 files, 10h8m32s

-8 -q9
4,458,162,567 bytes

-8 -q10
4,457,902,583 bytes

-8 -q11
4,458,321,772 bytes

-8
4,459,051,184 bytes

Lower bitrate files are much harder to predict, -p will make more sense.

All flac 1.4.2, multi-thread. All builds produce the same file sizes.

Upper line of each pair: -8p
4455579306 bytes

Lower line of each pair: -8 -q10 -b2880
4456433345 bytes

Wombat GCC 12.2.0
Total encoding time: 1:19.875, 457.10x realtime
Total encoding time: 0:28.890, 1263.81x realtime

Wombat GCC 12.2.0 noasm
Total encoding time: 1:23.219, 438.74x realtime
Total encoding time: 0:29.890, 1221.53x realtime

Wombat Clang 15.0.3
Total encoding time: 1:24.375, 432.72x realtime
Total encoding time: 0:29.797, 1225.34x realtime

Xiph
Total encoding time: 1:26.812, 420.58x realtime
Total encoding time: 0:31.203, 1170.13x realtime

Finally different from a real Ryzen.
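
For anyone who wants to sanity-check the "x realtime" figures: they are simply audio duration divided by wall-clock encoding time. A minimal sketch, assuming the 10h8m32s duration quoted for the 142-file set (rounded to whole seconds) and hypothetical helper names that are not part of any tool used here:

```python
def parse_clock(text):
    """Convert 'H:MM:SS.mmm', 'M:SS.mmm' or 'SS.mmm' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def realtime_factor(audio_seconds, encode_clock):
    """Audio duration divided by encoding time."""
    return audio_seconds / parse_clock(encode_clock)


# 142 files, 10h8m32s of audio (as quoted, so only accurate to about a second)
audio_seconds = 10 * 3600 + 8 * 60 + 32

# Encoding times of the Wombat GCC 12.2.0 build from the listing above
for label, clock in [("-8p", "1:19.875"), ("-8 -q10 -b2880", "0:28.890")]:
    print(f"{label}: {realtime_factor(audio_seconds, clock):.2f}x realtime")
```

The results land within a small fraction of a percent of the reported 457.10x and 1263.81x; the remaining gap is consistent with the duration being rounded to whole seconds.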

Re: FLAC v1.4.x Performance Tests

Reply #208 – 2022-10-23 12:53:19

24-bit, 88.2-352.8kHz, 14 files, multi-thread, -8p

Wombat GCC 12.2.0
Total encoding time: 1:40.266, 40.59x realtime
1916812600 bytes

Wombat GCC 12.2.0 noasm
Total encoding time: 1:54.453, 35.56x realtime
1916812600 bytes

Wombat Clang 15.0.3
Total encoding time: 1:39.859, 40.76x realtime
1916812631 bytes

Xiph
Total encoding time: 1:41.250, 40.20x realtime
1916812621 bytes

Re: FLAC v1.4.x Performance Tests

Reply #209 – 2022-10-23 14:07:37

I'll have a go too.

Here are the results for testing the 'Xiph' Win64 binary

Re: FLAC v1.4.x Performance Tests

Reply #210 – 2022-10-23 19:20:41

High resolution coming up. Prepare to be impressed if you haven't already seen what 1.4.x can do to high resolution.

* No classical music in this corpus - that behaves differently (much smaller benefits from going above -7), so this is arguably a bit 1.4-friendly
* All stereo. Vast majority 96/24; a little bit of it is 88.2, and one track is 96/16.
* 178 files. Nearly all my non-classical high resolution stereo downloads. (DSD test files excluded, but who the hell uses those for anything but WavPack worship?)
* File sizes with tags removed. Which I forgot to do about the FLACs, so I removed afterwards, saw a decrease of 22 670 684, and adjusted. I think this means padding is removed too. Maybe a bit unfair to the APEv2 tagged formats, which don't need padding, but more interesting to discussing the codecs themselves. (But, I used MD5 ... as if that matters much.)
* Everything that isn't stated as a different codec or as 1.3 is 1.4.1 or 1.4.2

64.286%	ALAC (refalac 1.75)
63.589%	FLAC 1.3 at -5
62.647%	TAK -p0
62.058%	FLAC 1.3 at -8e
61.799%	-3 comfortably beats old -8pe too, not only -8e
61.477%	TTA
60.746%	Monkey's Normal
60.745%	Monkey's High
60.798%	WavPack -hx
59.920%	-5
59.868%	Monkey's Insane
59.742%	Monkey's Extra High
59.571%	TAK -p1
59.313%	MPEG-4 ALS at default
58.979%	-7
59.047%	TAK -p2
58.673%	-8e is faster and compresses better than -8p. but there are better options
58.649%	avoid this -8pe. it takes ten times -8e
58.560%	-7r7 -A "subdivide_tukey(6)"
58.525%	-7r7 -A "subdivide_tukey(6)" -l 13
58.525%	-7r7 -A "subdivide_tukey(6)" -l 14 (yes slightly smaller than -l 13)
58.489%	-7r7 -A "subdivide_tukey(6)" -l 15
58.477%	about -8e speed: -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14
58.475%	-7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14 -b 8192
58.454%	-7r7 -A "subdivide_tukey(7);tukey(7e-2)" -l 16 -b 8192
58.059%	WavPack -hx4
57.879%	TAK -p3
57.753%	TAK -p4m
57.412%	OptimFROG --preset 2 (default setting)

Evidence from this and some other non-rigorous testing on a part of these 178 files:
* This is damn good, although it's not gonna touch TAK nor WavPack -hx4 (where for the WavPack I tried only those two settings, -hx and -hx4 this time).
Actually, if I take the high-rez part of ktf's comparison and "manually" imagine a FLAC improvement like this, it doesn't seem to beat TAK -p1, so this is likely "more FLAC-friendly material" - relatively at least.
* That "all the sevens" thing: -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" - with or without -l 14 (which is a "seven" good enough to remember) - were at first arbitrarily chosen to see "what does it take to get around -8e time". As you see, it is much better ... urhmh ... to the extent anything around there is "much".
* -8e beats -8p at both time and size. Maybe surprising - to those who haven't already tested it. ( @bennetng , you just tested hi-rez -8p: how does it work with your material?)
* -l 13 to -l 15 have something to them, but careful: It does not seem to be the case directly off -7 or -8. Say -8 -l 13 is not good, but -8 -A [something slow] -l 13 is. A bit of testing indicates that -l 13 starts saving space at -A subdivide_tukey(5) and -l 14 at (6).
With high-res classical music, -l 13 is the setting that improves over -7.
* -b 8192 also needs "-A [something slow]", it seems also to do harm when applied to -7 or -8 plain. But it doesn't help much here.
* -r7 is a good thing, but not at my high-resolution classical music; there, the sixth and seventh order are seldom used at all. (But at worst I found, -r7 at classical makes for 0.2 parts per million in size and only costs a bit of time, so ... if you want one monster slow setting, include -r7. -r8 doesn't improve much over -r7 it seems.)
* Monkey's wasn't developed for high resolution. Well, we knew that - none of the codecs were. However, on the Planet of the Apes there is nothing to do about it within a given compression mode: a given signal yields a given encoded bitstream, with no room for improving anything but speed.

Re: FLAC v1.4.x Performance Tests

Reply #211 – 2022-10-23 19:35:24

What are logical values for the x_tukey options? Seems I can put anything and the encoder accepts it. I see above you're using (7e-1), I've seen (3/2e-1). I tried something random like (9/7z-1) and it works. I've had good results using whole numbers, but what about these other values?

Re: FLAC v1.4.x Performance Tests

Reply #212 – 2022-10-23 20:01:38

Quote from: Replica9000 on 2022-10-23 19:35:24

I tried something random like (9/7z-1) and it works.

It is a bit tricky that the encoder accepts anything and silently drops stuff it doesn't understand.

If you want to know what it does, read the explanation at the bottom of this page: https://xiph.org/flac/documentation_tools_flac.html

TL;DR: for starters, just use whole numbers like subdivide_tukey(5) or something. If you feel like it, you can specify a second value, a fraction between 0 and 1, like subdivide_tukey(5/0.2). The second value is locale-specific, so it is subdivide_tukey(5/0,2) on many non-English PCs. Using scientific notation (2e-1) is a way around that. For other apodizations, see the linked document.

Re: FLAC v1.4.x Performance Tests

Reply #213 – 2022-10-23 20:08:14

Quote from: Porcus on 2022-10-23 19:20:41

* Monkey's wasn't developed for high resolution. Well we knew that - none of the codecs were. But however, on the Planet of the Apes there is nothing to do about it within a given compression mode: then a given signal yields a given encoded bitstream with no room for improving anything but speed.

If FLAC hadn't been changed in a non-backwards-compatible way back in 2007, FLAC would have done terribly at 24-bit too. Luckily that change went in before Josh left. Now it is way too late to make such a change.

Re: FLAC v1.4.x Performance Tests

Reply #214 – 2022-10-23 20:30:17

Yeah, maybe I should not ask more questions on that document by now, but ... C3, you set the limit at 16 bits. How about 20?
Asking because the hdcd.exe utility appeared around 2007. So quite soon there would indeed be quite a few CD-sourced 24 bit (with at least four wasted) files.
And so if that were a problem, it would likely have manifested itself - unless reference FLAC would use Rice-4 on those signals then.

Re: FLAC v1.4.x Performance Tests

Reply #215 – 2022-10-23 20:56:41

Quote from: Replica9000 on 2022-10-23 19:35:24

I've seen (3/2e-1)

Just a point: "/" is not a division slash, it is a separator between arguments, where the first is mandatory.
tukey(P) takes only one argument in, and that is a number between 0 and 1. The subdivide_tukey can be specified as subdivide_tukey(N) and optionally subdivide_tukey(N/P) - but then again, the "/P" has nothing to do with division. As ktf says, for starters stick to N and remember that higher N will slow down.
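
To make the "separator, not division" point concrete, here is a small sketch of how such an -A string splits up. It is only an illustration of the syntax described above, not flac's actual parser: the function name is made up, and unlike flac it raises an error on junk instead of silently dropping it.

```python
def parse_apodization(spec):
    """Split e.g. 'tukey(7e-1);subdivide_tukey(5/2e-1)' into (name, args) pairs."""
    parsed = []
    for item in spec.split(";"):          # ";" separates whole functions
        item = item.strip()
        if not item:
            continue
        name, _, rest = item.partition("(")
        # "/" separates arguments; it never divides anything
        args = rest.rstrip(")").split("/") if rest else []
        # Python's float() always expects "." or exponent notation, regardless of locale
        parsed.append((name, [float(a) for a in args]))
    return parsed


print(parse_apodization("subdivide_tukey(3/2e-1)"))
print(parse_apodization("flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)"))
```

So (3/2e-1) reads as N = 3 and P = 0.2, not as 3 divided by 0.2. Something like (9/7z-1) makes this sketch raise a ValueError, which is deliberately stricter than the encoder's behaviour described earlier.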

What does this tukey function do? For the block of the signal - 4096 samples, typically - it keeps the middle 1-P fraction and it downweights the beginning and end according to a cosine function. Turns out, it typically gives a much better predictor than not applying any weight - that would be "rectangle".
The subdivide_tukey "generates more functions" in a way that recycles lots of calculations. It takes time to try them all, but it doesn't make for more complicated decoding. They are several simple attempts, and the encoder picks the one that happens to fit best. The decoder doesn't know how hard the encoder tried.
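
A rough sketch of such a window, using the textbook tapered-cosine definition. This is an assumption for illustration: libFLAC's own window code may place the ramps at slightly different sample positions, and the function name is made up.

```python
import math


def tukey_window(n, p):
    """Tukey (tapered cosine) window of length n; p is the tapered fraction, 0..1."""
    if p <= 0:
        return [1.0] * n              # no taper at all: the "rectangle" case
    taper = p * (n - 1) / 2.0         # width of each cosine ramp, in samples
    window = []
    for i in range(n):
        d = min(i, n - 1 - i)         # distance to the nearest block edge
        if d < taper:
            # cosine ramp rising from 0 at the edge to 1 at the end of the taper
            window.append(0.5 * (1.0 - math.cos(math.pi * d / taper)))
        else:
            window.append(1.0)        # untouched middle part of the block
    return window


w = tukey_window(4096, 0.5)
flat = sum(1 for x in w if x == 1.0)
print(f"{flat} of {len(w)} samples keep full weight")
```

With P = 0.5 about half of the block keeps full weight and a quarter at each end is faded. The window only shapes the data used to find the predictor; the samples that actually get encoded are not modified, which is why decoding is unaffected.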

Re: FLAC v1.4.x Performance Tests

Reply #216 – 2022-10-23 21:05:30

Quote from: Porcus on 2022-10-23 19:20:41

@bennetng , you just tested hi-rez -8p: how does it work with your material?

Files in Reply #208 contain classical, jazz and pop stuff but in general -b is much more important than -p. Acoustic / unplugged materials as usual may benefit from higher -b, for 88.2k and above I would try 6144 to 16384 for these genres. Amplified / electronic / loudness war hi-res files still prefer lower -b. When -b is wrong, windowing also works poorly.

I also found something about decoded HDCD, I mean the "real" tracks which really make use of transient filter and peak extension. flac seems pretty good at dealing with HDCD. I used flac -a and see many wasted bits in a decoded HDCD image.

I investigated wasted bits with a normal 16-bit file, using foo_dsp_utility > Scale. 24-bit output with 0.5 scale is essentially a bit shift, so the bitrate remains almost the same, but then I tried 0.75, 0.625 and 0.875 and flac can still get a lot of wasted bits. However, with dither or something like -0.1dB gain flac can no longer see wasted bits. foo_dsp_utility > Add Noise kills wasted bits as well, like 0.000001 noise with 16 to 24-bit conversion.
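
Why those particular scale factors still leave wasted bits can be sketched with a bit of integer arithmetic. The model below is an assumption, not how foo_dsp_utility is known to work internally: it treats the conversion as "multiply the 16-bit sample by the scale, then by 256 to fill 24 bits", and the helper names are hypothetical.

```python
def wasted_bits(samples):
    """Number of low-order zero bits shared by every sample in a block."""
    combined = 0
    for s in samples:
        combined |= s                 # any set low bit in any sample survives the OR
    if combined == 0:
        return 0                      # all-zero block: nothing meaningful to shift out
    lowest_set_bit = combined & -combined
    return lowest_set_bit.bit_length() - 1


def scale_16_to_24(samples, scale):
    """Assumed model of scaling a 16-bit signal into a 24-bit container."""
    return [int(s * scale * 256) for s in samples]


block = [1, -3, 1000, 32767, -32768]  # made-up 16-bit sample values
for scale in (0.5, 0.75, 0.625, 0.875):
    print(scale, wasted_bits(scale_16_to_24(block, scale)))

# One least-significant bit of noise on a single sample is enough to remove them
noisy = scale_16_to_24(block, 0.75)
noisy[0] += 1
print("with noise:", wasted_bits(noisy))
```

Under this model 0.75 × 256 = 192 = 3 × 64, so every sample stays a multiple of 64 and six low bits are always zero; 0.625 and 0.875 leave five each, and 0.5 leaves seven. A gain like -0.1 dB or any dither breaks the exact multiple, so the shared zero bits disappear.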

Re: FLAC v1.4.x Performance Tests

Reply #217 – 2022-10-23 21:22:41

Quote from: bennetng on 2022-10-23 21:05:30

Files in Reply #208 contain classical, jazz and pop stuff but in general -b is much more important than -p.

Comparing -8e to -8p?
For CDDA, -8p is better, and -8e is usually (though not always) outdone by so much that you wouldn't use it.
For high resolution, -e cannot be so easily written off, at least it is better than -p - in my tests, that is.

Re: FLAC v1.4.x Performance Tests

Reply #218 – 2022-10-24 02:12:16

Quote from: Porcus on 2022-10-23 21:22:41

Comparing -8e to -8p?

Only one of the 20 24-96 albums I use for the Replaygain benchmark comes out smaller with -8 -e vs -8 -p.
The size difference is small, but the speed difference is not.
The GCC version is much faster than the Clang version in multithreading with these, if anybody wants to know.

-8 -p
233.22x realtime
19.713.844.242 Bytes

-8 -e
292.17x realtime
19.721.033.993 Bytes

Re: FLAC v1.4.x Performance Tests

Reply #219 – 2022-10-24 06:25:41

Quote from: Porcus on 2022-10-23 20:30:17

Yeah, maybe I should not ask more questions on that document by now, but ... C3, you set the limit at 16 bits. How about 20?

That bit is directly what libFLAC does too. Seems silly to change it when there are already so many versions out that do this at 16 bit. Also, it would probably hurt compression a bit and ffmpeg doesn't care anyway, it even uses 5-bit Rice parameters for 16-bit audio in extreme cases.

Re: FLAC v1.4.x Performance Tests

Reply #220 – 2022-10-24 07:56:10

Here is what I can do with -e: resample a CDDA album to 24/88.2 with different resamplers. Care was taken to avoid clipping.

1052063534 SoX best 8p.flac
1050564778 SoX best 8e.flac

1214873052 RetroArch normal 8p.flac
1215123562 RetroArch normal 8e.flac

1335883144 RetroArch lower 8p.flac
1336339394 RetroArch lower 8e.flac

Re: FLAC v1.4.x Performance Tests

Reply #221 – 2022-10-24 08:06:48

This is slightly puzzling. -e being helpful in particular when there is no high frequency content suggests that here the guesstimation procedure isn't very good. Yet it is precisely in those cases where 1.4 makes for the big improvements.

Re: FLAC v1.4.x Performance Tests

Reply #222 – 2022-10-24 08:22:49

No contradiction because the difference is night and day when compared to 1.3.x. No -p or -e.

1052844799 SoX best flac 142 -8.flac
1257694431 SoX best flac 134 -8.flac

Re: FLAC v1.4.x Performance Tests

Reply #223 – 2022-10-24 09:39:54

Don't want to crash the hires party, but here are some test results with v1.4.2 binaries floating around here.
As always, the CPU is an Intel Core i7-8700 @ 3.20GHz and the test corpus is mostly classic rock CDDA material.

FLAC Binary: xiph-142\flac.exe (299520 bytes)
FLAC Option: -7
- Average time =  27.940 seconds (5 rounds), Encoding speed = 386.97x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-gcc1220-Ofast+manyflags-noasm-wombat_2022-10-23.exe (665600 bytes)
FLAC Option: -7
- Average time =  22.560 seconds (5 rounds), Encoding speed = 479.25x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-gcc1220-Ofast+manyflags-wombat_2022-10-23.exe (737280 bytes)
FLAC Option: -7
- Average time =  25.519 seconds (5 rounds), Encoding speed = 423.68x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-Clang1503-Ofast-wombat_2022-10-23.exe (613376 bytes)
FLAC Option: -7
- Average time =  24.699 seconds (5 rounds), Encoding speed = 437.75x
- FLAC size = 1.167.014.372 bytes (= 61,188% of WAV size, ~863 kbps)

So here wombat's gcc build w/noasm is the fastest, on-par with his 141-fastmath build from 2022-10-14.
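
The percentage, bitrate and speed figures in that listing can be cross-checked with a few lines. A minimal sketch, assuming CDDA input (16-bit stereo at 44.1 kHz), kbps meaning 1000 bits per second, and ignoring the WAV header; the function names are just for illustration:

```python
CDDA_KBPS = 44100 * 16 * 2 / 1000     # 1411.2 kbps of raw PCM


def flac_kbps(percent_of_wav):
    """Average FLAC bitrate implied by the compressed size as a percentage of WAV."""
    return CDDA_KBPS * percent_of_wav / 100


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


print(round(flac_kbps(61.188)), "kbps")                        # the listing says ~863 kbps
print(f"{speedup(27.940, 22.560):.2f}x vs the Xiph binary")    # gcc noasm build at -7
```

So the noasm GCC build needs roughly a fifth less time than the Xiph binary at -7 on this CPU, for byte-identical output size.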

Re: FLAC v1.4.x Performance Tests

Reply #224 – 2022-10-24 10:26:56

Quote from: Porcus on 2022-10-24 08:06:48

This is slightly puzzling. -e being helpful in particular when there is no high frequency content suggests that here the guesstimation procedure isn't very good. Yet it is precisely in those cases where 1.4 makes for the big improvements.

The improvements in 1.4.0 weren't related to the guesstimation. 1.4.0 improved the accuracy with which predictors are formed; the guesstimation (which -e circumvents by brute-forcing) is about which order to pick.

FLAC's LPC encoding works by first calculating autocorrelation. These calculated autocorrelation numbers are then crunched to form a set of predictors, one for each prospective LPC order (for preset 8 that is order 1 through 12). These predictors have become more accurate with the release of 1.4.0. However, the encoder still has to guess which order will result in the smallest representation. This guesstimation remains unchanged.
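
The steps above can be sketched roughly as follows. This is a textbook autocorrelation plus Levinson-Durbin recursion, given as an illustration rather than libFLAC's code; in particular the order-guessing cost model (including the `bits_per_coeff` default) is a made-up stand-in, labelled as such, for whatever estimate the encoder really uses.

```python
import math


def autocorrelation(x, max_lag):
    """r[lag] = sum of x[i] * x[i - lag] over the block."""
    return [sum(x[i] * x[i - lag] for i in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, max_order):
    """Return [(coefficients, residual_energy)] for orders 1..max_order."""
    results, coeffs, error = [], [], r[0]
    for order in range(1, max_order + 1):
        if error <= 0:
            break                      # signal already predicted perfectly
        acc = r[order] - sum(coeffs[j] * r[order - 1 - j] for j in range(order - 1))
        k = acc / error                # reflection coefficient for this order
        coeffs = [coeffs[j] - k * coeffs[order - 2 - j] for j in range(order - 1)] + [k]
        error *= 1.0 - k * k           # predicted residual energy shrinks (or stays)
        results.append((list(coeffs), error))
    return results


def guess_order(results, n_samples, bits_per_coeff=15):
    """HYPOTHETICAL cost model: estimated residual bits plus coefficient overhead."""
    def cost(order, err):
        per_sample = 0.5 * math.log2(max(err / n_samples, 1e-12))
        return max(per_sample, 0.0) * n_samples + order * bits_per_coeff
    return min(range(1, len(results) + 1), key=lambda o: cost(o, results[o - 1][1]))


# Toy block: a decaying signal that an order-1 predictor already fits well
x = [0.9 ** n for n in range(200)]
candidates = levinson_durbin(autocorrelation(x, 4), 4)
print("guessed order:", guess_order(candidates, len(x)))
```

One recursion yields a predictor for every order up to the maximum, which is why forming them is cheap. The guess is made from the predicted residual energy alone; -e instead actually encodes with each order and keeps the smallest result, which is where the extra time goes.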