author David Oberhollenzer <david.oberhollenzer@sigma-star.at> 2020-05-23 02:29:49 +0200
committer David Oberhollenzer <david.oberhollenzer@sigma-star.at> 2020-05-23 02:29:49 +0200
commit f2c487470dbfe3cf56f20b9899d5586ebcbefcc7
tree 12b6e695311035b5c72a8ecb284c1afae9b8caf1
parent 66f29d5ecdfcc0ff2455241fdb9a229f58d24dcf
Update benchmark numbers for zstd, now that it uses correct parameters
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
-rw-r--r-- doc/benchmark.ods bin 53962 -> 53760 bytes
-rw-r--r-- doc/parallelism.txt 58
2 files changed, 29 insertions, 29 deletions
diff --git a/doc/benchmark.ods b/doc/benchmark.ods
index 6ea8871..62ee480 100644
--- a/doc/benchmark.ods
+++ b/doc/benchmark.ods
Binary files differ
diff --git a/doc/parallelism.txt b/doc/parallelism.txt
index 3c3afb1..315a631 100644
--- a/doc/parallelism.txt
+++ b/doc/parallelism.txt
@@ -150,7 +150,7 @@
- As uncompressed tarball: ~6.5GiB (7,008,118,272)
- As LZ4 compressed SquashFS image: ~3.1GiB (3,381,751,808)
- As LZO compressed SquashFS image: ~2.5GiB (2,732,015,616)
- - As zstd compressed SquashFS image: ~2.4GiB (2,536,910,848)
+ - As zstd compressed SquashFS image: ~2.1GiB (2,295,017,472)
- As gzip compressed SquashFS image: ~2.3GiB (2,471,276,544)
- As lzma compressed SquashFS image: ~2.0GiB (2,102,169,600)
- As XZ compressed SquashFS image: ~2.0GiB (2,098,466,816)
@@ -164,7 +164,7 @@
AMD Ryzen 7 3700X
32GiB DDR4 RAM
- Fedora 31 with Linux 5.5.17
+ Fedora 31
2.4) Results
@@ -172,23 +172,23 @@
The raw timing results are as follows:
Jobs XZ lzma gzip LZO LZ4 zstd
- serial 17m39.613s 16m10.710s 9m56.606s 13m22.337s 12.159s 28.493s
- 1 17m38.050s 15m49.753s 9m46.948s 13m06.705s 11.908s 28.926s
- 2 9m26.712s 8m24.706s 5m08.152s 6m53.872s 7.395s 16.381s
- 3 6m29.733s 5m47.422s 3m33.235s 4m44.407s 6.069s 11.949s
- 4 5m02.993s 4m30.361s 2m43.447s 3m39.825s 5.864s 9.917s
- 5 4m07.959s 3m40.860s 2m13.454s 2m59.395s 5.749s 8.803s
- 6 3m30.514s 3m07.816s 1m53.641s 2m32.461s 5.926s 8.359s
- 7 3m04.009s 2m43.765s 1m39.742s 2m12.536s 6.281s 8.264s
- 8 2m45.050s 2m26.996s 1m28.776s 1m58.253s 6.395s 7.844s
- 9 2m34.993s 2m18.868s 1m21.668s 1m50.461s 6.890s 7.915s
- 10 2m27.399s 2m11.214s 1m15.461s 1m44.060s 7.225s 8.157s
- 11 2m20.068s 2m04.592s 1m10.286s 1m37.749s 7.557s 8.448s
- 12 2m13.131s 1m58.710s 1m05.957s 1m32.596s 8.127s 8.652s
- 13 2m07.472s 1m53.481s 1m02.041s 1m27.982s 8.704s 9.210s
- 14 2m02.365s 1m48.773s 1m00.337s 1m24.444s 9.494s 10.547s
- 15 1m58.298s 1m45.079s 58.348s 1m21.445s 10.192s 11.427s
- 16 1m55.940s 1m42.176s 56.615s 1m19.030s 10.964s 12.889s
+ serial 17m39.613s 16m10.710s 9m56.606s 13m22.337s 12.159s 9m33.600s
+ 1 17m38.050s 15m49.753s 9m46.948s 13m06.705s 11.908s 9m23.445s
+ 2 9m26.712s 8m24.706s 5m08.152s 6m53.872s 7.395s 5m 1.734s
+ 3 6m29.733s 5m47.422s 3m33.235s 4m44.407s 6.069s 3m30.708s
+ 4 5m02.993s 4m30.361s 2m43.447s 3m39.825s 5.864s 2m44.418s
+ 5 4m07.959s 3m40.860s 2m13.454s 2m59.395s 5.749s 2m16.745s
+ 6 3m30.514s 3m07.816s 1m53.641s 2m32.461s 5.926s 1m57.607s
+ 7 3m04.009s 2m43.765s 1m39.742s 2m12.536s 6.281s 1m43.734s
+ 8 2m45.050s 2m26.996s 1m28.776s 1m58.253s 6.395s 1m34.500s
+ 9 2m34.993s 2m18.868s 1m21.668s 1m50.461s 6.890s 1m29.820s
+ 10 2m27.399s 2m11.214s 1m15.461s 1m44.060s 7.225s 1m26.176s
+ 11 2m20.068s 2m04.592s 1m10.286s 1m37.749s 7.557s 1m22.566s
+ 12 2m13.131s 1m58.710s 1m05.957s 1m32.596s 8.127s 1m18.883s
+ 13 2m07.472s 1m53.481s 1m02.041s 1m27.982s 8.704s 1m16.218s
+ 14 2m02.365s 1m48.773s 1m00.337s 1m24.444s 9.494s 1m14.175s
+ 15 1m58.298s 1m45.079s 58.348s 1m21.445s 10.192s 1m12.134s
+ 16 1m55.940s 1m42.176s 56.615s 1m19.030s 10.964s 1m11.049s
The file "benchmark.ods" contains those values, values derived from this and
charts depicting the results.
@@ -196,15 +196,15 @@
2.5) Discussion
- Most obviously, the results indicate that LZ4 and zstd compression are clearly
- I/O bound and not CPU bound. They don't benefit from parallelization beyond
- 2-4 worker threads and even that benefit is marginal with efficiency
+ Most obviously, the results indicate that LZ4, unlike the other compressors,
+ is clearly I/O bound and not CPU bound and doesn't benefit from parallelization
+ beyond 2-4 worker threads and even that benefit is marginal with efficiency
plummetting immediately.
- The other compressors (XZ, lzma, gzip, lzo) are clearly CPU bound. Speedup
- increases linearly until about 8 cores, but with a slope < 1, as evident by
- efficiency linearly decreasing and reaching 80% for 8 cores.
+ The other compressors are clearly CPU bound. Speedup increases linearly until
+ about 8 cores, but with a slope < 1, as evident by efficiency linearly
+ decreasing and reaching 80% for 8 cores.
A reason for this sub-linear scaling may be the choke point introduced by the
creation of fragment blocks, that *requires* a synchronization. To test this
@@ -230,10 +230,10 @@
As a side effect, this benchmark also produces some insights into the
compression ratio and throughput of the supported compressors. Indicating that
for the Debian live image, XZ clearly provides the highest data density, while
- LZ4 is clearly the fastest compressor available, directly followed by zstd
- which has a much better compression ratio than LZ4, comparable to the gzip
- compressor, while being almost 50 times faster. The throughput of the zstd
- compressor is truly impressive, considering the compression ratio it achieves.
+ LZ4 is clearly the fastest compressor available.
+
+ The throughput of the zstd compressor is comparable to gzip, while the
+ resulting compression ratio is closer to LZMA.
Repeating the benchmark without tail-end-packing and with fragments completely
disabled would also show the effectiveness of tail-end-packing and fragment
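
The speedup and efficiency figures quoted in the discussion can be checked
against the raw timing table. The following is a minimal sketch, not the
method used to build benchmark.ods: it assumes that speedup is the serial
time divided by the parallel time and that efficiency is speedup divided by
the job count. The helper names parse_time and speedup_and_efficiency are
made up for this illustration.

```python
import re


def parse_time(text):
    """Convert a duration like '17m39.613s', '5m 1.734s' or '12.159s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def speedup_and_efficiency(baseline, parallel, jobs):
    """Return (speedup, efficiency) for one table cell.

    Assumption: speedup = baseline time / parallel time and
    efficiency = speedup / jobs. Which baseline the spreadsheet uses
    (the "serial" row or the "1" row) is not stated in this diff.
    """
    speedup = parse_time(baseline) / parse_time(parallel)
    return speedup, speedup / jobs


# XZ column of the table: "serial" row versus the row for 8 jobs.
xz_speedup, xz_efficiency = speedup_and_efficiency("17m39.613s", "2m45.050s", 8)
print(f"XZ, 8 jobs: speedup {xz_speedup:.2f}, efficiency {xz_efficiency:.1%}")

# LZ4 column, same rows, for comparison with the I/O bound case.
lz4_speedup, lz4_efficiency = speedup_and_efficiency("12.159s", "6.395s", 8)
print(f"LZ4, 8 jobs: speedup {lz4_speedup:.2f}, efficiency {lz4_efficiency:.1%}")
```

Under these assumptions the XZ column gives an efficiency of roughly 80% at
8 jobs, in line with the discussion, while LZ4 falls far below that.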