Here we show how to use FlexLLMGen for the data wrangling tasks including entity match (EM), data imputation (DI) and error detection (ED). The implementation follows the fm_data_tasks repo from HazyResearch.
- Run our install.sh scripts to install additional python library and download the desired datasets.
-
To check the outcome and verify the result of a data imputation task (e.g., Restaurant on OPT-6.7B), run:
bash test_single_query_case.sh
-
To test the throughput of FlexLLMGen for a data imputation task (e.g., Restaurant on OPT-6.7B), run:
bash test_batch_query_case.sh
-
To run the complete tests of all tasks on OPT-6.7B:
bash test_batch_query_all_opt6.7b.sh
-
To run the complete tests of all tasks on OPT-30B:
bash test_batch_query_all_opt30b.sh
-
To run the complete tests of all tasks on OPT-175B:
bash test_batch_query_all_opt175b.sh
-
Notice that in this data wrangling tasks, such as entity match (EM), data imputation (DI) and error detection (ED), the input sequences length is very long (from 123 to 1274), but the output length is very short (e.g., 3, 5, or 10). Most of the inference time is spent on prefill phase, so here we report the throughput that includes both input and output tokens as our measurement.
-
We run the experiments on the same setting as the HELM benchmark with a single T4 (16GB) GPU, 200GB of DRAM, and 1.5TB SSD connected by NVMe.
Task | Tested Samples | Input Length | Output Length | Time (s) | Input + Output Throughput (token/s) |
---|---|---|---|---|---|
EM: Fodors-Zagats | 189 | 744 | 3 | 109.556 | 1281.871 |
EM: Beer | 91 | 592 | 3 | 42.087 | 1272.360 |
EM: iTunes-Amazon | 109 | 529 | 3 | 59.467 | 966.178 |
EM: Walmart-Amazon | 200 | 748 | 3 | 126.538 | 1186.992 |
EM: Amazon-Google | 200 | 876 | 3 | 144.593 | 1215.828 |
EM: DBLP-ACM | 200 | 1274 | 3 | 207.513 | 1230.767 |
EM: DBLP-GoogleScholar | 200 | 1209 | 3 | 232.65 | 1097.78 |
DI: Restaurant | 86 | 123 | 5 | 10.397 | 984.865 |
DI: Buy | 65 | 488 | 10 | 43.077 | 739.876 |
ED: Hospital | 200 | 200 | 3 | 30.137 | 1347.203 |
Task | Tested Samples | Input Length | Output Length | Time (s) | Input + Output Throughput (token/s) |
---|---|---|---|---|---|
EM: Fodors-Zagats | 189 | 744 | 3 | 541.550 | 248.287 |
EM: Beer | 91 | 592 | 3 | 238.58 | 224.450 |
EM: iTunes-Amazon | 109 | 529 | 3 | 267.639 | 198.775 |
EM: Walmart-Amazon | 200 | 748 | 3 | 682.635 | 220.030 |
EM: Amazon-Google | 200 | 876 | 3 | 799.514 | 219.884 |
EM: DBLP-ACM | 200 | 1274 | 3 | 1119.272 | 228.184 |
EM: DBLP-GoogleScholar | 200 | 1209 | 3 | 1271.534 | 190.636 |
DI: Restaurant | 86 | 123 | 5 | 60.310 | 169.790 |
DI: Buy | 65 | 488 | 10 | 185.882 | 160.747 |
ED: Hospital | 200 | 200 | 3 | 158.329 | 256.429 |
Task | Tested Samples | Input Length | Output Length | Time (s) | Input + Output Throughput (token/s) |
---|---|---|---|---|---|
EM: Fodors-Zagats | 189 | 744 | 3 | 3928.310 | 34.228 |
EM: Beer | 91 | 592 | 3 | 1356.786 | 35.083 |
EM: iTunes-Amazon | 109 | 529 | 3 | 1569.062 | 33.906 |
EM: Walmart-Amazon | 200 | 748 | 3 | 4171.319 | 36.008 |
EM: Amazon-Google | 200 | 876 | 3 | 4893.572 | 35.925 |
EM: DBLP-ACM | 200 | 1274 | 3 | 7624.726 | 33.496 |
EM: DBLP-GoogleScholar | 200 | 1209 | 3 | 8275.828 | 29.290 |
DI: Restaurant | 86 | 123 | 5 | 648.762 | 16.968 |
DI: Buy | 65 | 488 | 10 | 2086.961 | 14.317 |
ED: Hospital | 200 | 200 | 3 | 1154.133 | 35.178 |