diff --git a/LICENSE b/LICENSE index 06d01f6..d9a10c0 100644 --- a/LICENSE +++ b/LICENSE @@ -1,19 +1,176 @@ -Copyright (c) 2020 Ritchie Vink - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. 
You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. 
Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS diff --git a/README.md b/README.md index ef0b77d..71fe299 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,29 @@ -polars-tpch -=========== +# Polars Decision Support (PDS) benchmarks -This repo contains the code used for performance evaluation of polars. The benchmarks are TPC-standardised queries and data designed to test the performance of "real" workflows. 
+## Disclaimer + +Polars Decision Support (PDS) benchmarks are derived from the TPC-H Benchmarks and as such any results obtained using PDS are not comparable to published TPC-H Benchmark results, as the results obtained from using PDS do not comply with the TPC-H Benchmarks. + +These benchmarks are our adaptation of an industry-standard decision support benchmark often used in the DataFrame library community. PDS consists of the same 22 queries as the industry standard benchmark TPC-H, but has modified parts for dataset generation and execution scripts. From the [TPC website](https://www.tpc.org/tpch/): > TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. -## Generating TPC-H Data +## License + +PDS is licensed under Apache License, Version 2.0. + +Additionally, certain files in PDS are licensed subject to the accompanying [TPC EULA](TPC%20EULA.txt) (also available at ). Files subject to the TPC EULA are identified as such within the files. + +You may not use PDS except in compliance with the Apache License, Version 2.0 and the TPC EULA. + +## Generating PDS Benchmarking Data ### Project setup ```shell # clone this repository -git clone https://github.com/pola-rs/tpch.git +git clone https://github.com/pola-rs/pdsh.git cd tpch/tpch-dbgen # build tpch-dbgen diff --git a/TPC EULA.txt b/TPC EULA.txt new file mode 100644 index 0000000..92af24a --- /dev/null +++ b/TPC EULA.txt @@ -0,0 +1,320 @@ +END USER LICENSE AGREEMENT +VERSION 2.2 + +READ THE TERMS AND CONDITIONS OF THIS AGREEMENT ("AGREEMENT") CAREFULLY +BEFORE INSTALLING OR USING THE ACCOMPANYING SOFTWARE. 
BY INSTALLING OR +USING THE SOFTWARE OR RELATED DOCUMENTATION, YOU AGREE TO BE BOUND BY +THE TERMS OF THIS AGREEMENT. IF YOU DO NOT AGREE TO THE TERMS OF THIS +AGREEMENT, DO NOT INSTALL OR USE THE SOFTWARE. IF YOU ARE ACCESSING THE +SOFTWARE ON BEHALF OF YOUR ORGANIZATION, YOU REPRESENT AND WARRANT THAT +YOU HAVE SUFFICIENT AUTHORITY TO BIND YOUR ORGANIZATION TO THIS +AGREEMENT. + +USE AND RE-EXPORT OF THE SOFTWARE IS SUBJECT TO THE UNITED STATES EXPORT +CONTROL ADMINISTRATION REGULATIONS. THE SOFTWARE MAY NOT BE USED BY +UNLICENSED PERSONS OR ENTITIES, AND MAY NOT BE RE- EXPORTED TO ANOTHER +COUNTRY. SEE EXPORT ASSURANCE (CLAUSE 13) OF THIS LICENSE. + +This is a legal agreement between you (or, if you are accessing the +software on behalf of your organization, your organization) ("You" or +"User") and the Transaction Processing Performance Council ("TPC"). This +Agreement states the terms and conditions upon which TPC offers to +license the Software, including, but not limited to, the source code, +scripts, executable programs, drivers, libraries and data files +associated with such programs, and modifications thereof (the +"Software"), and online, electronic or printed documentation +("Documentation," together with the Software, "Materials"). + +LICENSE + +1. Definitions + +"Executive Summary" shall mean a short summary of a TPC Benchmark Result +that shows the configuration, primary metrics, performance data, and +pricing details. The exact requirements for the Executive Summary are +defined in each TPC Benchmark Standard. +"Full Disclosure Report (FDR)" shall mean a document that describes The +TPC Benchmark Result in sufficient detail such that the Result could be +recreated. The exact requirements for the FDR are defined in each TPC +Benchmark Standard. +"TPC Benchmark Result (Result)" shall mean a performance test submitted +to the TPC attested to meet the requirements of a TPC Benchmark Standard +at the time of submission. 
A Result is documented by an Executive +Summary and, if required, a FDR. +"TPC Benchmark Standard" shall mean a TPC Benchmark Specification and +any associated code or binaries approved by the TPC. The various TPC +Benchmark Standards can be found at +http://www.tpc.org/information/current_specifications.asp. +"TPC Policies" shall mean the guiding principles for how the TPC +conducts its operations and business. The current TPC Policies can be +found at http://www.tpc.org/information/current_specifications.asp. + +2. Ownership. The Materials are licensed, not sold, to You for use only +under the terms of this Agreement. As between You and TPC (and, to the +extent applicable, its licensors), TPC retains all rights, title and +interest to and ownership of the Materials and reserves all rights not +expressly granted to You. + +3. License Grant. Subject to Your compliance in all material respects +with the terms and conditions of this Agreement, TPC grants You a +restricted, non-exclusive, revocable license to install and use the +Materials, but only as expressly permitted herein. You may only use the +Software on computer systems under Your direct control. You may download +multiple copies of the Materials and make verbatim copies of the +original of the Software so long as Your use of such copies complies +with the terms of this Agreement. +a. Use by Individual. If You are accessing the Materials as an +individual, only You (as an individual) may access and use the +Materials. +b. Use by Organization. If You are accessing the Materials on behalf of +Your organization, only You and those within Your organization may use +the Materials. Your organization must identify a contact person to TPC +and conduct communications with TPC through that contact person. + +4. Restrictions. The following restrictions apply to all use of the +Materials by You. +a. 
General: You may not: +(1) use, copy, print, modify, adapt, create derivative works from, +market, deliver, rent, lease, sublicense, make, have made, assign, +pledge, transfer, sell, offer to sell, import, reproduce, distribute, +publicly perform, publicly display or otherwise grant rights to the +Materials, or any copy thereof, in whole or in part, except as expressly +permitted under this Agreement; or +(2) use the Materials in any way that does not comply with all +applicable laws and regulations. +b. Modification: You may modify the Software. +c. Public Disclosure: You may not publicly disclose any performance +results produced while using the Software except in the following +circumstances: +(1) as part of a TPC Benchmark Result. For purposes of this Agreement, a +"TPC Benchmark Result" is a performance test submitted to the TPC, +documented by a Full Disclosure Report and Executive Summary, claiming +to meet the requirements of an official TPC Benchmark Standard. You +agree that TPC Benchmark Results may only be published in accordance +with the TPC Policies. viewable at http: //www.tpc.org +(2) as part of an academic or research effort that does not imply or +state a marketing position +(3) any other use of the Software, provided that any performance results +must be clearly identified as not being comparable to TPC Benchmark +Results unless specifically authorized by TPC. + +5. License Modification. Requests for modification of this license shall +be addressed to info@tpc.org. You may not remove or modify this license +without permission. + +6. Copyright. The Materials are owned by TPC and/or its licensors, and +are protected by United States copyright laws and international treaty +provisions. You may not remove the copyright notice from the original or +any copy of the Materials, and You must apply the notice if You extract +part of the Materials not bearing a notice. + +7. Use of Name. 
You acknowledge and agree that TPC owns all trademark +and trade name rights in the names, trademarks and logos used by TPC in +the Materials. User shall preserve any notices regarding such ownership. +User may only use such names, trademarks and logos in accordance with +the usage guidelines specified by the TPC Policies. + +8. Merger or Integration. Any portion of the Materials merged into or +integrated with other software or documentation will continue to be +subject to the terms and conditions of this Agreement. + +9. Limited Grants of Sublicense. You may distribute the Software as +provided or as modified as permitted under clause 4 b. of this +Agreement, provided You comply with all of the terms of this Agreement +and the following conditions: + +a. If You distribute any portion of the Software in its original form +You may do so only under this Agreement by including a complete copy of +this Agreement with Your distribution, and if You distribute the +Software in modified form, You may only do so under a license that at a +minimum provides all of the protections and conditions of use contained +within this Agreement; + +b. You must include on each copy of the Software that You distribute the +following legend in all caps, at the top of the label and license, and +in a font not less than 12 point and no less prominent than any other +printing: "THE TPC SOFTWARE IS AVAILABLE WITHOUT CHARGE FROM TPC."; + +c. You must retain all copyright, patent, trademark, and attribution +notices that are present in the Software; and + +d. You may not charge a fee for the distribution of this Software, +including any modifications permitted under clause 4.b. + +10. Term and Termination. +a. Term. The license granted to You is effective until terminated. +b. Termination. +(1) By You. 
You may terminate this Agreement at any time by returning +the Materials (including any portions or copies thereof) to TPC or +providing written notice to the TPC that all copies of the Materials +within Your custody or control have been deleted or destroyed. +(2) By TPC. In the event You materially fail to comply with any term or +condition of this Agreement, and You fail to remedy such non-compliance +within 30 days after the receipt of notice to that effect, then TPC +shall have the right to terminate this Agreement immediately upon +written notice at the end of such 30-day period. +c. Effect of Termination. Termination of this Agreement in accordance +with this clause 10 will not terminate the rights of end users +sublicensed by You pursuant to this Agreement. Moreover, upon +termination and at TPC's written request, You agree to either (1) return +the Materials (including any portions or copies thereof) to TPC or (2) +immediately destroy all copies of the Materials within Your custody or +control and inform the TPC of the destruction of the Materials. Upon +termination, TPC may also enforce any rights provided by law. The +provisions of this Agreement that protect the proprietary rights of TPC +and its Licensors will continue in force after termination. + +11. No Warranty; Materials Provided "As Is". TO THE MAXIMUM EXTENT +PERMITTED BY APPLICABLE LAW, THE MATERIALS ARE PROVIDED "AS IS" AND WITH +ALL FAULTS, AND TPC (AND ITS LICENSORS) AND THE AUTHORS AND DEVELOPERS +OF THE MATERIALS HEREBY DISCLAIM ALL WARRANTIES, REPRESENTATIONS AND +CONDITIONS, EITHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING, BUT NOT +LIMITED TO, ANY IMPLIED WARRANTIES, DUTIES OR CONDITIONS RELATING TO +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, ACCURACY OR +COMPLETENESS OF RESPONSES, RESULTS, WORKMANLIKE EFFORT, LACK OF VIRUSES, +LACK OF NEGLIGENCE, TITLE, QUIET ENJOYMENT, QUIET POSSESSION, +CORRESPONDENCE TO DESCRIPTION OR NONINFRINGEMENT. 
USER RECOGNIZES THAT +THE MATERIALS ARE THE RESULT OF A COOPERATIVE, NON-PROFIT EFFORT AND +THAT TPC DOES NOT CONDUCT A TYPICAL BUSINESS. USER ACCEPTS THE MATERIALS +"AS IS" AND WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED. + +Without limitation, TPC (and its licensors) do not warrant that the +functions contained in the Software or Materials will meet Your +requirements or that the operation of the Software will be +uninterrupted, error-free or free from malicious code. For purposes of +this paragraph, "malicious code" means any program code designed to +contaminate other computer programs or computer data, consume computer +resources, modify, destroy, record, or transmit data, or in some other +fashion usurp the normal operation of the computer, computer system, or +computer network, including viruses, Trojan horses, droppers, worms, +logic bombs, and the like. TPC (and its licensors) shall not be liable +for the accuracy of any information provided by TPC or third-party +technical support personnel, or any damages caused, either directly or +indirectly, by acts taken or omissions made by You as a result of such +technical support. + +You assume full responsibility for the selection of the Materials to +achieve Your intended results, and for the installation, use and results +obtained from the Materials. You also assume the entire risk as it +applies to the quality and performance of the Materials. Should the +Materials prove defective, You (and not TPC) assume the entire liability +of any and all necessary servicing, repair or correction. Some +countries/states do not allow the exclusion of implied warranties, so +the above exclusion may not apply to You. TPC (and its licensors) +further disclaims all warranties of any kind if the Materials were +customized, repackaged or altered in any way by any party other than TPC +(or its licensors). + +12. Disclaimer of Liability. 
TPC (and its licensors) assumes no +liability with respect to the Materials, including liability for +infringement of intellectual property rights, negligence, or any other +liability. TPC is not aware of any infringement of copyright or patent +that may result from its grant of rights to User of the Materials. If +User receives any notice of infringement, such notice shall be +immediately communicated to TPC who will have sole discretion to take +action to evaluate the claim and, if practicable, modify the Materials +as necessary to avoid infringement. In the event that TPC determines +that the Materials cannot be modified to avoid such infringement (or any +other infringement claim communicated to TPC), TPC may terminate this +Agreement immediately. User shall suspend use of the Materials until +modifications to avoid claims of infringement have been completed. User +waives any claim against TPC in the event of such infringement claims by +others. + +13. Export Assurance. Use and re-export of the Materials and related +technical information is subject to the Export Administration +Regulations (EAR) of the United States Department of Commerce. User +hereby agrees that User (a) assumes responsibility for compliance with +the EAR in its use of the Materials and technical information, and (b) +will not export, re-export, or otherwise disclose directly or +indirectly, the Materials, technical data, or any direct product of the +Materials or technical data in violation of the EAR. + +14. Limitation of Remedies And Damages. IN NO EVENT WILL TPC OR ITS +LICENSORS OR LICENSEE BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR +CONSEQUENTIAL DAMAGES OR FOR ANY LOST PROFITS, LOST SAVINGS, LOST +REVENUES OR LOST DATA ARISING FROM OR RELATING TO THE MATERIALS OR THIS +AGREEMENT, EVEN IF TPC OR ITS LICENSORS OR LICENSEE HAVE BEEN ADVISED OF +THE POSSIBILITY OF SUCH DAMAGES. 
IN NO EVENT WILL TPC'S OR ITS +LICENSORS' LIABILITY OR DAMAGES TO YOU OR ANY OTHER PERSON EVER EXCEED +U.S. ONE HUNDRED DOLLARS (US $100), REGARDLESS OF THE FORM OF THE CLAIM. +IN NO EVENT WILL LICENSEE'S LIABILITY OR DAMAGES TO TPC OR ANY OTHER +PERSON EVER EXCEED $1,000,000, REGARDLESS OF THE FORM OF THE CLAIM. Some +countries/states do not allow the limitation or exclusion of liability +for incidental or consequential damages, so the above limitation or +exclusion may not apply to You. + +15. U.S. Government Restricted Rights. All Software and related +documentation are provided with restricted rights. Use, duplication or +disclosure by the U.S. Government is subject to restrictions as set +forth in subdivision (b)(3)(ii) of the Rights in Technical Data and +Computer Software Clause at 252.227-7013. If You are using the Software +outside of the United States, You will comply with the applicable local +laws of Your country, U.S. export control law, and the English version +of this Agreement. + +16. Contractor/Manufacturer. The Contractor/Manufacturer for the +Software is: + +Transaction Processing Performance Council +572B Ruger Street, P.O. Box 29920 +San Francisco, CA 94129 + +17. General. This Agreement is binding on You as well as Your employees, +employers, contractors and agents, and on any successors and assignees. +This Agreement is governed by the laws of the State of California +(except to the extent federal law governs copyrights and trademarks) +without respect to any provisions of California law that would cause +application of the law of another state or country. The parties agree +that the United Nations Convention on Contracts for the International +Sale of Goods will not govern this Agreement. This Agreement is the +entire agreement between us regarding the subject matter hereof and +supersedes any other understandings or agreements with respect to the +Materials or the subject matter hereof. 
If any provision of this +Agreement is deemed invalid or unenforceable by any court having +jurisdiction, that particular provision will be deemed modified to the +extent necessary to make the provision valid and enforceable, and the +remaining provisions will remain in full force and effect. + +SPECIAL PROVISIONS APPLICABLE TO THE EUROPEAN UNION + +If You acquired the Materials in the European Union (EU), the following +provisions also apply to You. If there is any inconsistency between the +terms of the Software License Agreement set out earlier and the +following provisions, the following provisions shall take precedence. + +1. Distribution. You may sublicense modifications of the Software +covered in this Agreement if they meet the requirements of clause 9 +above. + +2. Limited Warranty. EXCEPT AS STATED EARLIER IN THIS AGREEMENT, AND AS +PROVIDED UNDER THE HEADING "STATUTORY RIGHTS", THE SOFTWARE IS PROVIDED +AS-IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, +INCLUDING, BUT NOT LIMITED TO, ANY IMPLIED WARRANTIES, NONINFRINGEMENT, +OR CONDITIONS OF MERCHANTABILITY, QUALITY AND FITNESS FOR A PARTICULAR +PURPOSE. + +3. Limitation of Remedy and Damages. THE LIMITATIONS OF REMEDIES AND +DAMAGES IN THE SOFTWARE LICENSE AGREEMENT SHALL NOT APPLY TO PERSONAL +INJURY (INCLUDING DEATH) TO ANY PERSON CAUSED BY TPC'S NEGLIGENCE AND +ARE SUBJECT TO THE PROVISION SET OUT UNDER THE HEADING "STATUTORY +RIGHTS". + +4. Statutory Rights: Irish law provides that certain conditions and +warranties may be implied in contracts for the sale of goods and in +contracts for the supply of services. Such conditions and warranties are +hereby excluded, to the extent such exclusion, in the context of this +transaction, is lawful under Irish law. Conversely, such conditions and +warranties, insofar as they may not be lawfully excluded, shall apply. 
+Accordingly nothing in this Agreement shall prejudice any rights that +You may enjoy by virtue of Sections 12, 13, 14 or 15 of the Irish Sale +of Goods Act 1893 (as amended). + +5. General. This Agreement is governed by the laws of the Republic of +Ireland. The local language version of this agreement shall apply to +Materials acquired in the EU. This Agreement is the entire agreement +between us with respect to the subject matter hereof and You agree that +TPC will not have any liability for any untrue statement or +representation made by it, its agents or anyone else (whether innocently +or negligently) upon which You relied upon entering this Agreement, +unless such untrue statement or representation was made fraudulently. \ No newline at end of file diff --git a/scripts/prepare_data.py b/scripts/prepare_data.py index a5aca1a..19a7e7a 100644 --- a/scripts/prepare_data.py +++ b/scripts/prepare_data.py @@ -1,9 +1,28 @@ +# Certain portions of the contents of this file are derived from TPC-H version 3.0.1 +# (retrieved from +# http://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp). +# Such portions are subject to copyrights held by +# Transaction Processing Performance Council (“TPC”) +# and licensed under the TPC EULA (a copy of which accompanies this +# file as “TPC EULA” and is also available at +# http://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp) +# (the “TPC EULA”). +# +# You may not use this file except in compliance with the TPC EULA. +# DISCLAIMER: Portions of this file is derived from the TPC-H +# Benchmark and as such any result obtained using this file are not +# comparable to published TPC-H Benchmark results, as the results +# obtained from using this file do not comply with the TPC-H Benchmark. + + import polars as pl from settings import Settings settings = Settings() +# Source tables contained in the schema for TPC-H. 
For more information, check - +# https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf table_columns = { "customer": [ diff --git a/tpch-dbgen/README b/tpch-dbgen/README index 90bb2e9..2220af9 100644 --- a/tpch-dbgen/README +++ b/tpch-dbgen/README @@ -1,7 +1,18 @@ -# @(#)README 2.4.0 + +# Disclaimer + +Certain portions of the contents of this folder are derived from TPC-H version 3.2.0 (retrieved from ). Such portions are subject to copyrights held by Transaction Processing Performance Council (“TPC”) and licensed under the TPC EULA (a copy of which accompanies this file as “TPC EULA” and is also available at ) (the “TPC EULA”). + +You may not use files in this folder except in compliance with the TPC EULA. +DISCLAIMER: Portions of this folder is derived from the TPC-H Benchmark and as such any results +obtained using this file are not comparable to published TPC-H Benchmark results, as the results obtained from using this file do not comply with the TPC-H Benchmark. + +# @(#)README 2.4.0 Table of Contents + =================== + 0. What is this document? 1. What is DBGEN? 2. What will DBGEN create? @@ -20,48 +31,48 @@ Table of Contents 15. Version Numbering in DBGEN and QGEN 16. Validated Platforms -0. What is this document? + 0. What is this document? This is the general README file for DBGEN and QGEN, the data- -base population and executable query text generation programs -used in the TPC-H benchmark. It covers the proper use -of DBGEN and QGEN. For information on porting the utility to your +base population and executable query text generation programs +used in the TPC-H benchmark. It covers the proper use +of DBGEN and QGEN. For information on porting the utility to your particular platform see Porting.Notes. 1. What is DBGEN? DBGEN is a database population program for use with the TPC-H benchmark. -It is written in ANSI 'C' for portability, and has -been successfully ported to over a dozen different systems. 
While the -TPC-H specification allow an implementor to use any utility -to populate the benchmark database, the resultant population must exactly -match the output of DBGEN. The source code has been provided to make the +It is written in ANSI 'C' for portability, and has +been successfully ported to over a dozen different systems. While the +TPC-H specification allow an implementor to use any utility +to populate the benchmark database, the resultant population must exactly +match the output of DBGEN. The source code has been provided to make the process of building a compliant database population as simple as possible. 2. What will DBGEN create? Without any command line options, DBGEN will generate 8 separate ascii files. Each file will contain pipe-delimited load data for one of the -tables defined in the TPC-H database schema. The default tables -will contain the load data required for a scale factor 1 database. By -default the file will be created in the current directory and be -named <table>.tbl. As an example, customer.tbl will contain the +tables defined in the TPC-H database schema. The default tables +will contain the load data required for a scale factor 1 database. By +default the file will be created in the current directory and be +named <table>.tbl. As an example, customer.tbl will contain the load data for the customer table.
-When invoked with the '-U' flag, DBGEN will create the data sets to be -used in the update functions and the SQL syntax required to delete the -data sets. The update files will be created in the same directory as -the load data files and will be named "u_<table>.set". The delete -syntax will be written to "delete.set". For instance, the data set to -be used in the third query set to update the lineitem table will be -named "u_lineitem.tbl.3", and the SQL to remove those rows will be -found in "delete.3". The size of the update files can be controlled
+When invoked with the '-U' flag, DBGEN will create the data sets to be +used in the update functions and the SQL syntax required to delete the +data sets. The update files will be created in the same directory as +the load data files and will be named "u_<table>.set". The delete +syntax will be written to "delete.set". For instance, the data set to +be used in the third query set to update the lineitem table will be +named "u_lineitem.tbl.3", and the SQL to remove those rows will be +found in "delete.3". The size of the update files can be controlled with the '-r' flag.
3. How is DBGEN built? -Create an appropriate makefile, using makefile.suite as a basis, -and type make. Refer to Porting.Notes for more details and for +Create an appropriate makefile, using makefile.suite as a basis, +and type make. Refer to Porting.Notes for more details and for suggested compile time options. 4. Command Line Options for DBGEN @@ -87,96 +98,95 @@ option argument default action 1.0 represents ~1 GB of data -T <table>
Generate the data for a particular table - ONLY. Arguments: p -- part/partuspp, - c -- customer, s -- supplier, + ONLY. Arguments: p -- part/partuspp, + c -- customer, s -- supplier, o -- orders/lineitem, n -- nation, r -- region, l -- code (same as n and r), - O -- orders, L -- lineitem, P -- part, + O -- orders, L -- lineitem, P -- part, S -- partsupp --O d Generate SQL for delete function +-O d Generate SQL for delete function instead of key ranges --O f Allow over-ride of default output file +-O f Allow over-ride of default output file names -O h Generate headers in flat ascii files. - hd_XXX routines must be defined in + hd_XXX routines must be defined in load_stub.c -O m Flat files generate fixed length records --O r Generate key ranges for the UF2 update +-O r Generate key ranges for the UF2 update function -O v Verify data set without generating it. --r 10 Scale each udpate file to the given +-r 10 Scale each udpate file to the given percentage (expressed in basis points) of the data set --v none Verbose. Progress messages are +-v none Verbose. Progress messages are displayed as data is generated. -n Use database for in-line load --C Use separate processes to +-C Use separate processes to generate data -S Generate the th part of a multi-part load or update set -U Create a specified number of data sets - in flat files for the update/delete + in flat files for the update/delete functions --i Split the inserted rows in an refresh pair - between files +-i Split the inserted rows in an refresh pair + between files -d Split the deleted rows in an refresh pair - between files + between files 5. DBGEN limitations and compliant usage -DBGEN is meant to be a robust population generator for use with the -TPC-H benchmark. It is hoped that DBGEN will make it easier -to experiment with and become proficient in the execution of TPC decision -support benchmarks. 
As a result, it includes a number of command line -options which are not, strictly speaking, necessary to generate a compliant -data set for a TPC-D run. In addition, some command line options will accept -arguments which result in the generation of NON-COMPLIANT data sets. Options +DBGEN is meant to be a robust population generator for use with the +TPC-H benchmark. It is hoped that DBGEN will make it easier +to experiment with and become proficient in the execution of TPC decision +support benchmarks. As a result, it includes a number of command line +options which are not, strictly speaking, necessary to generate a compliant +data set for a TPC-D run. In addition, some command line options will accept +arguments which result in the generation of NON-COMPLIANT data sets. Options which should be used with care include: --s -- scale factor. TPC-H runs are only compliant when run against SF's +-s -- scale factor. TPC-H runs are only compliant when run against SF's of 1, 10, 100, 300, 1000, 3000, 10000, 30000, 100000 --r -- refresh percentage. TPC-H runs are only compliant when run with +-r -- refresh percentage. TPC-H runs are only compliant when run with -r 10, the default. 6. Sample DBGEN executions DBGEN has been built to allow as much flexibility as possible, but is -fundementally intended to generate two things: a database population -against which the queries in TPC-H can be run, and the updates -that are used during the update functions in TPC-H. Here are +fundementally intended to generate two things: a database population +against which the queries in TPC-H can be run, and the updates +that are used during the update functions in TPC-H. Here are some sample uses of DBGEN. 1. To generate the database population for the qualification database - dbgen -s 1 + dbgen -s 1 2. To generate the lineitem table only, for a scale factor 10 database, and over-write any existing flat files: - dbgen -s 10 -f -T L - 4. 
To geterate a 100GB data set in 1GB pieces, generate only the part and + dbgen -s 10 -f -T L + 4. To geterate a 100GB data set in 1GB pieces, generate only the part and partsupplier tables, and include some progress reports along the way: - dbgen -s 100 -S 1 -C 100 -T p -v (to generate the first 1GB file) - dbgen -s 100 -S 2 -C 100 -T p -v (to generate the second 1GB file) + dbgen -s 100 -S 1 -C 100 -T p -v (to generate the first 1GB file) + dbgen -s 100 -S 2 -C 100 -T p -v (to generate the second 1GB file) (and so on, incrementing the argument to -S each time) 5. To generate the update files needed for a 4 stream run of the throughput - test at 100 GB, using an existing set of seed files from an 8 process + test at 100 GB, using an existing set of seed files from an 8 process load: - dbgen -s 100 -U 4 -C 8 - + dbgen -s 100 -U 4 -C 8 -7. What is QGEN? + 7. What is QGEN? QGEN is a query generation program for use with the TPC-H benchmark. It is written in ANSI 'C' for portability, and has been successfully @@ -215,8 +225,8 @@ select l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, - sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, - sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, + sum(l_extendedprice *(1 - l_discount)) as sum_disc_price, + sum(l_extendedprice* (1 - l_discount) *(1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, @@ -236,8 +246,8 @@ select l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, - sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, - sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, + sum(l_extendedprice *(1 - l_discount)) as sum_disc_price, + sum(l_extendedprice* (1 - l_discount) *(1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, @@ -294,15 +304,15 @@ option argument default action 
-r Seed the rnadom number generator with --s Set scale to for parameter +-s Set scale to for parameter substitutions. --t Use contents of to complete a query +-t Use contents of to complete a query stream -T none Use time table format for date substitution --v none Verbose. Progress messages are +-v none Verbose. Progress messages are displayed as data is generated. -x none Generate a query plan as part of query @@ -312,8 +322,8 @@ option argument default action QGEN is a simple ASCII text filter, meant to translate query generalized query syntax("query template") into the executable query text(EQT) re- -quired by the benchmarks. It provides a number of shorthands and syntactic -extensions that allow the automatic generation of query parameters and some +quired by the benchmarks. It provides a number of shorthands and syntactic +extensions that allow the automatic generation of query parameters and some control over the operation of the benchmark implementation. QGEN first strips all comments from the query template, recognizing both @@ -326,29 +336,29 @@ will not be expanded. Tag Converted To Based on === ============ ======== -:c database ;(1) -n from the command line +:c database ;(1) -n from the command line :x set explain on;(1) -x from the command line : paremeter :s stream number :o output to outpath/qnum.stream;(1) - -o from command line, -s from + -o from command line, -s from command line :b BEGIN WORK;(1) -a from comand line :e COMMIT WORK(1) -a from command line :q query number -:n sets rowcount to be returned +:n sets rowcount to be returned to , unless -N appears on the command line Notes: (1) This is Informix-specific syntax. Refer to Porting.Notes for tailoring the generated text to your database environment. - + 12. Sample QGEN executions and Query Templates -QGEN translates generic query templates into valid SQL. 
In addition, it -allows conditional inclusion of the commands necessary to connect to a +QGEN translates generic query templates into valid SQL. In addition, it +allows conditional inclusion of the commands necessary to connect to a database, produce diagnostic output, etc. Here are some sample of QGEN -usage, and the way that command line parameters and the query templates +usage, and the way that command line parameters and the query templates interact to produce valid SQL. Template, in $DSS_QUERY/1.sql: @@ -361,15 +371,15 @@ interact to produce valid SQL. 1. "qgen 1", would produce: select count(*) from foo; - select count(*) from lineitem - where l_orderdate < '1997-01-01'; + select count(*) from lineitem + where l_orderdate < '1997-01-01'; Assuming that 1 January 1997 was a valid substitution for parameter 1. 2. "qgen -d -c dss1 1, would produce: database dss1; select count(*) from foo; - select count(*) from lineitem - where l_orderdate < '1995-07-18'; + select count(*) from lineitem + where l_orderdate < '1995-07-18'; Assuming that 18 July 1995 was the default substitution for parameter 1, and using Informix syntax. @@ -378,15 +388,14 @@ interact to produce valid SQL. output to "somepath/1.0" select count(*) from foo; set explain on; - select count(*) from lineitem - where l_orderdate < '1995-07-18'; + select count(*) from lineitem + where l_orderdate < '1995-07-18'; Assuming that 18 July 1995 was the default substitution for parameter 1, and using Informix syntax. - -13. Environment Variables + 13. Environment Variables -Enviroment variables are used to control features of DBGEN and QGEN +Enviroment variables are used to control features of DBGEN and QGEN which are unlikely to change from one execution to another. Variable Default Action @@ -406,19 +415,19 @@ available with the '-h' option. 
A version number is of the form: | | | | | | | | | | | | - | | | -- modification: alphabetic, incremented for any trivial changes + | | | -- modification: alphabetic, incremented for any trivial changes | | | to the source (e.g, porting ifdef's) | | ---- patch level: numeric, incremented for any minor bug fix | | (e.g, qgen parameter range) | ------- release: numeric, incremented for each minor revision of the | specification - |-------- version: numeric, incremented for each major revision of the + |-------- version: numeric, incremented for each major revision of the specification -An implementation of TPC-H is valid only if it conforms to the +An implementation of TPC-H is valid only if it conforms to the following version usage rules: - -- The Version of DBGEN and QGEN must match the integer portion of the + -- The Version of DBGEN and QGEN must match the integer portion of the current specification revision 15. The current revisions are: @@ -426,11 +435,11 @@ following version usage rules: QGEN: 2.4.0 16. Validated Platforms - The following platforms have been validated to produce the reference + The following platforms have been validated to produce the reference data set for TPC-H 2.4.0 - Processor Operating System (version) Compiler (version) Compiler Flags + Processor Operating System (version) Compiler (version) Compiler Flags ---------------------------------------------------------------------------- - POWER5 AIX 64-bit (5.3) C for AIX Compiler, v7 -q64 (no -g) - IA-64 HPUX 64-bit () icc - Linux 32-bit () gcc + POWER5 AIX 64-bit (5.3) C for AIX Compiler, v7 -q64 (no -g) + IA-64 HPUX 64-bit () icc + Linux 32-bit () gcc
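Reviewer note, not part of the change set: section 2 of the README above says DBGEN writes one pipe-delimited .tbl file per table. The following is a minimal sketch of how such a file could be read in Python using only the standard library. It assumes each record ends with a trailing `|`, which is the usual DBGEN output but should be checked against your build. The helper name `read_tbl` and the two sample records are hypothetical; only the region column names come from the TPC-H schema.

```python
import csv
import io


def read_tbl(stream, columns):
    """Yield one dict per record from a pipe-delimited, dbgen-style stream."""
    reader = csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            # Skip blank lines rather than treating them as bad records.
            continue
        # Assumption: a trailing "|" ends each record, which the csv module
        # reports as one extra empty field. Drop it when present.
        if len(row) == len(columns) + 1 and row[-1] == "":
            row = row[:-1]
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
        yield dict(zip(columns, row))


# Hypothetical sample in the shape of the region table.
sample = io.StringIO("0|AFRICA|first comment|\n1|AMERICA|second comment|\n")
rows = list(read_tbl(sample, ["r_regionkey", "r_name", "r_comment"]))
print(rows[0]["r_name"])
```

All values are returned as strings. Casting to integer, decimal, or date types is left to the caller, since the right types depend on the table being loaded.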