Commit 02b0cd8

Jim DeFabia committed
HPCC-33601 Document the new lz4s and lz4shc index compression and options
Signed-off-by: Jim DeFabia <[email protected]>
1 parent b04fa2a commit 02b0cd8

2 files changed (+210, -49 lines)

docs/EN_US/ECLLanguageReference/ECLR_mods/BltInFunc-BUILD.xml (+109 -30)
@@ -39,9 +39,9 @@
 
 <para><informaltable colsep="1" frame="all" rowsep="1">
 <tgroup cols="2">
-<colspec colwidth="78.50pt" />
+<colspec colwidth="78.50pt"/>
 
-<colspec />
+<colspec/>
 
 <tbody>
 <row>
@@ -241,9 +241,9 @@
 
 <para><informaltable colsep="1" frame="all" rowsep="1">
 <tgroup cols="2">
-<colspec colwidth="125pt" />
+<colspec colwidth="125pt"/>
 
-<colspec />
+<colspec/>
 
 <tbody>
 <row>
@@ -256,8 +256,8 @@
 written to disk is always determined by the number of nodes in
 the cluster on which the workunit executes, regardless of the
 number of nodes on the target cluster(s) unless the WIDTH option
-is also specified. Use this option for bare-metal deployments.
-</entry>
+is also specified. Use this option for bare-metal
+deployments.</entry>
 </row>
 
 <row>
@@ -292,7 +292,7 @@
 names of the plane(s) to write the
 <emphasis>indexfile</emphasis> to. The
 <emphasis>targetPlane</emphasis> names must be listed as they
-are defined in the deployment. </entry>
+are defined in the deployment.</entry>
 </row>
 
 <row>
@@ -856,17 +856,17 @@ BUILD(FilterDsLib1);
 
 <informaltable colsep="1" frame="all" rowsep="1">
 <tgroup cols="2">
-<colspec align="left" colwidth="122.40pt" />
+<colspec align="left" colwidth="188*"/>
 
-<colspec />
+<colspec colwidth="812*"/>
 
 <tbody>
 <row>
 <entry><emphasis role="bold">LZW</emphasis></entry>
 
-<entry>The default compression. It is a variant of the
-Lempel-Ziv-Welch algorithm. It remains the default for backward
-compatibility.</entry>
+<entry>A variant of the Lempel-Ziv-Welch algorithm. This was
+the default compression prior to versions 9.6.90, 9.8.66, and
+9.10.12.</entry>
 </row>
 
 <row>
@@ -894,34 +894,113 @@ BUILD(FilterDsLib1);
 compression on the payload. The resulting index can be smaller
 than using lz4.</entry>
 </row>
+
+<row>
+<entry><emphasis role="bold"><emphasis
+role="bold">'inplace:lz4s'</emphasis> </emphasis></entry>
+
+<entry>Causes inplace compression on the key fields and lz4s
+compression on the payload. This uses the stream LZ4 API to avoid
+recompressing the data, which reduces index build time.</entry>
+</row>
+
+<row>
+<entry><emphasis role="bold"><emphasis
+role="bold">'inplace:lz4shc'</emphasis> </emphasis></entry>
+
+<entry>The default compression for inplace indexes in versions
+9.6.90, 9.8.66, 9.10.12, and later. Causes inplace
+compression on the key fields and lz4shc compression on the
+payload. This uses the stream LZ4 API to avoid recompressing the
+data, which reduces index build time.</entry>
+</row>
 </tbody>
 </tgroup>
 </informaltable>
 
-<para>The inplace index compression format (introduced in version 9.2.0)
-improves compression of keyed fields and allows them to be searched
-without decompression. The original index compression implementation
-decompresses the rows when they are read from disk.</para>
+<para>The lz4s and lz4shc inplace index compression formats (introduced in
+versions 9.6.90, 9.8.66, and 9.10.12) improve compression and reduce
+build time. These formats require an engine that supports them.
+In other words, <emphasis role="bold">if you build an index using the lz4s
+or lz4shc formats, you must use platform version 9.6.90, 9.8.66, 9.10.12,
+or later to read those indexes.</emphasis></para>
+
+<para>If you attempt to read an index with the inplace compression format
+on a system that does not support it, you will receive an error
+message.</para>
 
 <para>Because the branch nodes can be searched without decompression more
 branch nodes fit into memory which can improve search performance. The lz4
 compression used for the payload is significantly faster at decompressing
-leaf pages than the previous LZW compression.</para>
+leaf pages than the previous LZW compression. Whether performance is
+better with lz4hc (a high-compression variant of lz4) on the payload
+fields depends on the access characteristics of the data and how much of
+the index is cached in memory.</para>
 
-<para>Whether performance is better with lz4hc (a high-compression variant
-of lz4) on the payload fields depends on the access characteristics of the
-data and how much of the index is cached in memory.</para>
+<para><emphasis role="bold">Compression Options:</emphasis></para>
 
-<para>If you attempt to read an index with the inplace compression format
-on a system that does not support them, you will receive an error
-message.</para>
+<informaltable colsep="1" frame="all" rowsep="1">
+<tgroup cols="2">
+<colspec align="left" colwidth="240*"/>
+
+<colspec colwidth="836*"/>
+
+<tbody>
+<row>
+<entry><emphasis role="bold">hclevel</emphasis></entry>
+
+<entry>An integer between 2 and 12 that specifies the level of
+compression. The default is 3. Higher levels increase compression
+but also increase compression time. A higher level may be
+cost-effective depending on how long the data is stored and on
+the storage costs compared to the compute cost of building the
+index.</entry>
+</row>

919-
<para>See Also: <link linkend="INDEX_record_structure">INDEX</link>, <link
920-
linkend="JOIN">JOIN</link>, <link linkend="FETCH">FETCH</link>, <link
921-
linkend="MODULE_Structure">MODULE</link>, <link
922-
linkend="INTERFACE_Structure">INTERFACE</link>, <link
923-
linkend="LIBRARY">LIBRARY</link>, <link
924-
linkend="DISTRIBUTE">DISTRIBUTE</link>, <link
925-
linkend="_WORKUNIT">#WORKUNIT</link></para>
960+
<row>
961+
<entry><emphasis role="bold">maxcompression</emphasis></entry>
962+
963+
<entry>The maximum desired compression ratio. This avoids the leaf
964+
nodes getting too large when expanded, but increases the size of
965+
some indexes. The default is 20.</entry>
966+
</row>
967+
968+
<row>
969+
<entry><emphasis role="bold">maxrecompress</emphasis></entry>
970+
971+
<entry>Specifies the number of times the entire input dataset
972+
should be recompressed to free up space. Increasing the number
973+
decreases the size of the indexes, and will probably decrease the
974+
decompress time slightly (because there are fewer stream blocks),
975+
but will increase the build time. The default is 1.</entry>
976+
</row>
977+
</tbody>
978+
</tgroup>
979+
</informaltable>
+
+<para/>
+
+<para>Example:</para>
+
+<programlisting>Vehicles := DATASET('vehicles',
+{STRING2 st,STRING20 city,STRING20 lname},FLAT);
+
+SearchTerms := RECORD
+Vehicles.st;
+Vehicles.city;
+END;
+Payload := RECORD
+Vehicles.lname;
+END;
+VehicleKey := INDEX(Vehicles,SearchTerms,Payload,'vkey::st.city',
+COMPRESSED('inplace:lz4shc,compressopt(hclevel=9,
+maxcompression=25,
+maxrecompress=4)'));
+BUILD(VehicleKey);</programlisting>
+
+<para>See Also: <link linkend="DATASET">DATASET</link>, <link
+linkend="BUILD">BUILDINDEX</link>, <link linkend="JOIN">JOIN</link>, <link
+linkend="FETCH">FETCH</link>, <link
+linkend="KEYED-WILD">KEYED/WILD</link></para>
 </sect2>
 </sect1>
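The 'inplace:lz4s' and 'inplace:lz4shc' entries say the stream LZ4 API lets the index builder avoid recompressing data. The Python sketch below illustrates that idea only under stated assumptions. Python's standard library has no LZ4 binding, so zlib's streaming compressor stands in for the LZ4 stream API. The fixed `PAGE_SIZE` leaf node, the function names, and the row format are all hypothetical, and this is not how HPCC Systems lays out index nodes. The block-style builder recompresses every accepted row each time it tries to add one. The stream-style builder feeds each row to the compressor once and rolls back to a snapshot when a row would overflow the node.

```python
import zlib

# Hypothetical leaf-node size in bytes; not HPCC's actual node size.
PAGE_SIZE = 4096
# Headroom for the bytes Z_FINISH appends (end-of-stream block plus Adler-32).
FINISH_RESERVE = 16


def fill_node_block(rows, page_size=PAGE_SIZE):
    """Block style: recompress all accepted rows every time one is added."""
    accepted = []
    node = zlib.compress(b"")
    bytes_fed = 0  # total input bytes handed to the compressor
    for row in rows:
        candidate = b"".join(accepted) + row
        bytes_fed += len(candidate)
        compressed = zlib.compress(candidate)
        if len(compressed) > page_size:
            break  # this row would start the next node
        accepted.append(row)
        node = compressed
    return node, accepted, bytes_fed


def fill_node_stream(rows, page_size=PAGE_SIZE):
    """Stream style: each row is handed to the compressor only once."""
    comp = zlib.compressobj()
    out = bytearray()
    accepted = []
    bytes_fed = 0  # total input bytes handed to the compressor
    for row in rows:
        checkpoint = comp.copy()  # snapshot so an overflowing row can be undone
        chunk = comp.compress(row) + comp.flush(zlib.Z_SYNC_FLUSH)
        bytes_fed += len(row)
        if len(out) + len(chunk) > page_size - FINISH_RESERVE:
            comp = checkpoint  # roll back; this row would start the next node
            break
        out += chunk
        accepted.append(row)
    out += comp.flush()  # default mode is Z_FINISH, which closes the stream
    return bytes(out), accepted, bytes_fed


if __name__ == "__main__":
    rows = [f"{i:06d},{i * 7919 % 100003:06d},SMITH\n".encode() for i in range(5000)]
    for name, fill in (("block", fill_node_block), ("stream", fill_node_stream)):
        node, accepted, fed = fill(rows)
        print(f"{name}: {len(accepted)} rows, {len(node)} bytes, {fed} input bytes compressed")
```

In the sketch, the block-style builder hands the compressor far more input than the stream-style one, because its work grows with the square of the number of rows. Each per-row `Z_SYNC_FLUSH` costs a few bytes, so the stream-style builder fits fewer rows per node in this toy. The sketch shows the difference in compression work, not in node density.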

docs/EN_US/ECLLanguageReference/ECLR_mods/Recrd-Index.xml (+101 -19)
@@ -49,9 +49,9 @@
 
 <informaltable colsep="1" frame="all" rowsep="1">
 <tgroup cols="2">
-<colspec align="left" colwidth="122.40pt" />
+<colspec align="left" colwidth="122.40pt"/>
 
-<colspec />
+<colspec/>
 
 <tbody>
 <row>
@@ -266,7 +266,7 @@
 
 <para>All STRINGs must be fixed length.</para>
 
-<para></para>
+<para/>
 </listitem>
 </itemizedlist></para>
 
@@ -365,17 +365,17 @@ BUILD(VehicleKey3);
 
 <informaltable colsep="1" frame="all" rowsep="1">
 <tgroup cols="2">
-<colspec align="left" colwidth="122.40pt" />
+<colspec align="left" colwidth="188*"/>
 
-<colspec />
+<colspec colwidth="836*"/>
 
 <tbody>
 <row>
 <entry><emphasis role="bold">LZW</emphasis></entry>
 
-<entry>The default compression. It is a variant of the
-Lempel-Ziv-Welch algorithm. It remains the default for backward
-compatibility.</entry>
+<entry>A variant of the Lempel-Ziv-Welch algorithm. This was
+the default compression prior to versions 9.6.90, 9.8.66, and
+9.10.12.</entry>
 </row>
 
 <row>
@@ -403,27 +403,109 @@ BUILD(VehicleKey3);
 compression on the payload. The resulting index can be smaller
 than using lz4.</entry>
 </row>
+
+<row>
+<entry><emphasis role="bold"><emphasis
+role="bold">'inplace:lz4s'</emphasis> </emphasis></entry>
+
+<entry>Causes inplace compression on the key fields and lz4s
+compression on the payload. This uses the stream LZ4 API to avoid
+recompressing the data, which reduces index build time.</entry>
+</row>
+
+<row>
+<entry><emphasis role="bold"><emphasis
+role="bold">'inplace:lz4shc'</emphasis> </emphasis></entry>
+
+<entry>The default compression for inplace indexes in versions
+9.6.90, 9.8.66, 9.10.12, and later. Causes inplace
+compression on the key fields and lz4shc compression on the
+payload. This uses the stream LZ4 API to avoid recompressing the
+data, which reduces index build time.</entry>
+</row>
 </tbody>
 </tgroup>
 </informaltable>
 
-<para>The inplace index compression format (introduced in version 9.2.0)
-improves compression of keyed fields and allows them to be searched
-without decompression. The original index compression implementation
-decompresses the rows when they are read from disk.</para>
+<para>The lz4s and lz4shc inplace index compression formats (introduced in
+versions 9.6.90, 9.8.66, and 9.10.12) improve compression and reduce
+build time. These formats require an engine that supports them.
+In other words, <emphasis role="bold">if you build an index using the lz4s
+or lz4shc formats, you must use platform version 9.6.90, 9.8.66, 9.10.12,
+or later to read those indexes.</emphasis></para>
+
+<para>If you attempt to read an index with the inplace compression format
+on a system that does not support it, you will receive an error
+message.</para>
 
 <para>Because the branch nodes can be searched without decompression more
 branch nodes fit into memory which can improve search performance. The lz4
 compression used for the payload is significantly faster at decompressing
-leaf pages than the previous LZW compression.</para>
+leaf pages than the previous LZW compression. Whether performance is
+better with lz4hc (a high-compression variant of lz4) on the payload
+fields depends on the access characteristics of the data and how much of
+the index is cached in memory.</para>
 
-<para>Whether performance is better with lz4hc (a high-compression variant
-of lz4) on the payload fields depends on the access characteristics of the
-data and how much of the index is cached in memory.</para>
+<para><emphasis role="bold">Compression Options:</emphasis></para>
 
-<para>If you attempt to read an index with the inplace compression format
-on a system that does not support them, you will receive an error
-message.</para>
+<informaltable colsep="1" frame="all" rowsep="1">
+<tgroup cols="2">
+<colspec align="left" colwidth="240*"/>
+
+<colspec colwidth="733*"/>
+
+<tbody>
+<row>
+<entry><emphasis role="bold">hclevel</emphasis></entry>
+
+<entry>An integer between 2 and 12 that specifies the level of
+compression. The default is 3. Higher levels increase compression
+but also increase compression time. A higher level may be
+cost-effective depending on how long the data is stored and on
+the storage costs compared to the compute cost of building the
+index.</entry>
+</row>
+
+<row>
+<entry><emphasis role="bold">maxcompression</emphasis></entry>
+
+<entry>The maximum desired compression ratio. This prevents leaf
+nodes from becoming too large when expanded, but increases the
+size of some indexes. The default is 20.</entry>
+</row>
+
+<row>
+<entry><emphasis role="bold">maxrecompress</emphasis></entry>
+
+<entry>Specifies the number of times the entire input dataset
+should be recompressed to free up space. Increasing this value
+reduces index size and will probably reduce decompression time
+slightly (because there are fewer stream blocks), but increases
+build time. The default is 1.</entry>
+</row>
+</tbody>
+</tgroup>
+</informaltable>
+
+<para/>
+
+<para>Example:</para>
+
+<programlisting>Vehicles := DATASET('vehicles',
+{STRING2 st,STRING20 city,STRING20 lname},FLAT);
+
+SearchTerms := RECORD
+Vehicles.st;
+Vehicles.city;
+END;
+Payload := RECORD
+Vehicles.lname;
+END;
+VehicleKey := INDEX(Vehicles,SearchTerms,Payload,'vkey::st.city',
+COMPRESSED('inplace:lz4shc,compressopt(hclevel=9,
+maxcompression=25,
+maxrecompress=4)'));
+BUILD(VehicleKey);</programlisting>
 
 <para>See Also: <link linkend="DATASET">DATASET</link>, <link
 linkend="BUILD">BUILDINDEX</link>, <link linkend="JOIN">JOIN</link>, <link
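The option table in this commit documents defaults for `hclevel` (3), `maxcompression` (20), and `maxrecompress` (1). The ECL examples pass these values inside the COMPRESSED string as `compressopt(...)`. The Python sketch below shows how such a string breaks down into a method name and option values. It is a hypothetical parser, not the one the HPCC Systems platform uses. The `CompressOpts` and `parse_compressed` names are invented for this sketch, and the parser treats the documented 2 to 12 range for `hclevel` as inclusive, which is an assumption. It does not check whether an option is meaningful for the chosen method, for example `hclevel` with `inplace:lz4s`.

```python
import re
from dataclasses import dataclass


@dataclass
class CompressOpts:
    """Hypothetical container; defaults are the documented values."""
    method: str
    hclevel: int = 3
    maxcompression: int = 20
    maxrecompress: int = 1


# method, then an optional ",compressopt(...)" suffix; newlines are allowed inside
_SPEC = re.compile(r"^(?P<method>[^,]+?)\s*(?:,\s*compressopt\((?P<opts>[^)]*)\))?\s*$")
_KNOWN = ("hclevel", "maxcompression", "maxrecompress")


def parse_compressed(spec: str) -> CompressOpts:
    """Split a COMPRESSED(...) string into a method name and compressopt values."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"unrecognized COMPRESSED string: {spec!r}")
    result = CompressOpts(method=m.group("method").strip())
    if m.group("opts"):
        for item in m.group("opts").split(","):
            key, _, value = item.partition("=")
            key = key.strip()
            if key not in _KNOWN:
                raise ValueError(f"unknown compressopt option: {key!r}")
            setattr(result, key, int(value))  # int() tolerates surrounding whitespace
    if not 2 <= result.hclevel <= 12:  # inclusive bounds are an assumption
        raise ValueError("hclevel must be between 2 and 12")
    return result


if __name__ == "__main__":
    print(parse_compressed("inplace:lz4shc,compressopt(hclevel=9,\n"
                           "maxcompression=25,\nmaxrecompress=4)"))
    print(parse_compressed("inplace:lz4s"))
```

Options that are omitted from the string keep their documented defaults. This is why a bare 'inplace:lz4s' yields hclevel 3, maxcompression 20, and maxrecompress 1 in the sketch.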
