Update BEIR 2CR docs; other minor touch-ups (#1857)
+ Updated BEIR 2CR docs to provide refs and more detailed execution instructions.
+ Fixed broken test cases.
+ Tweaked 2CR regressions; minor refactoring.
lintool authored Apr 8, 2024
1 parent 30441ea commit c13cd63
Showing 5 changed files with 204 additions and 27 deletions.
98 changes: 92 additions & 6 deletions docs/2cr/beir.html
@@ -40,6 +40,14 @@
padding-left: 15px;
}

blockquote.mycode2 {
border-left: 3px solid #ccc;
margin-left: 25px;
padding-top: 10px;
padding-bottom: 10px;
padding-left: 15px;
}

tr th.headertop {
border-bottom: none;
padding-bottom: 0rem
@@ -132,14 +140,17 @@ <h1 class="mb-3">BEIR</h1>

<div class="container my-4">

<p>The two-click<a href="#" data-mdb-toggle="tooltip" title="What are the two clicks, you ask? Copy and paste!"><sup>*</sup></a> reproduction matrix below provides commands for reproducing the experimental results on this page.
Instructions for programmatic execution are shown at the bottom of the page (scroll down).</p>

<p>Key:</p>

<ul>
<li>BM25 Flat: BM25 "flat" baseline</li>
<li>BM25 Multifield: BM25 "multifield" baseline</li>
<li>SPLADE: SPLADE++ (CoCondenser-EnsembleDistil)</li>
<li>Contriever-msmarco: Contriever FT MS MARCO</li>
<li>BGE-base: BGE-base-en-v1.5</li>
<li>BM25 Flat: BM25 "flat" baseline [1]</li>
<li>BM25 Multifield: BM25 "multifield" baseline [1]</li>
<li>SPLADE: SPLADE++ (CoCondenser-EnsembleDistil) [2]</li>
<li>Contriever-msmarco: Contriever FT MS MARCO [3]</li>
<li>BGE-base: BGE-base-en-v1.5 [4]</li>
</ul>
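
<p>As a rough illustration of the kind of command the matrix contains for the BM25 "flat" condition, a single-dataset run might look like the following. This is only a sketch, not taken from this commit: the prebuilt index name <tt>beir-v1.0.0-scifact.flat</tt>, the topics key <tt>beir-v1.0.0-scifact-test</tt>, and the output filename are assumed for illustration.</p>

<blockquote class="mycode2"><tt>
python -m pyserini.search.lucene --index beir-v1.0.0-scifact.flat --topics beir-v1.0.0-scifact-test --output run.beir.bm25-flat.scifact.txt --bm25 --hits 1000
</tt></blockquote>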

<div class="table-responsive">
@@ -150,7 +161,7 @@ <h1 class="mb-3">BEIR</h1>
<th class="headertop"></th>
<th class="headertop" colspan="3"><b>BM25 Flat</b></th>
<th class="headertop" colspan="3"><b>BM25 Multifield</b></th>
<th class="headertop" colspan="3"><b>SPLADE</b></th>
<th class="headertop" colspan="3"><b>SPLADE++ ED</b></th>
<th class="headertop" colspan="3"><b>Contriever MSMARCO</b></th>
<th class="headertop" colspan="3"><b>BGE-base</b></th>
</tr>
@@ -6009,6 +6020,81 @@ <h1 class="mb-3">BEIR</h1>
</table>
</div>

<ul style="list-style-type:none; padding-top: 25px">

<li><p>[1] Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, and Jimmy Lin.
<a href="https://arxiv.org/abs/2306.07471">Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard.</a>
<i>arXiv:2306.07471</i>, June 2023.</p></li>

<li><p>[2] Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant.
<a href="https://dl.acm.org/doi/10.1145/3477495.3531857">From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.</a>
<i>Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</i>, pages 2353–2359, July 2022.</p></li>

<li><p>[3] Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave.
<a href="https://arxiv.org/abs/2112.09118">Towards Unsupervised Dense Information Retrieval with Contrastive Learning.</a>
<i>arXiv:2112.09118</i>, December 2021.</p></li>

<li><p>[4] Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff.
<a href="https://arxiv.org/abs/2309.07597">C-Pack: Packaged Resources To Advance General Chinese Embedding.</a>
<i>arXiv:2309.07597</i>, December 2023.</p></li>

</ul>

<div style="padding-top: 20px"/>

<h4>Programmatic Execution</h4>

<p>All experimental runs shown in the above table can be programmatically executed based on the instructions below.
To list all the experimental conditions:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --list-conditions
</tt></blockquote>

<p>These conditions correspond to the table rows above.</p>

<p>For all conditions, just show the commands in a "dry run":</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands --dry-run
</tt></blockquote>

<p>To actually run all the experimental conditions:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands
</tt></blockquote>

<p>With the above command, run files will be placed in the current directory.
Use the option <tt>--directory runs/</tt> to place the runs in a sub-directory.</p>
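
<p>For example, combining the options above, the following usage sketch runs every condition and collects the run files under <tt>runs/</tt>:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands --directory runs/
</tt></blockquote>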

<p>To show the commands for a specific condition:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands --dry-run
</tt></blockquote>

<p>This will generate exactly the commands for a specific condition above (corresponding to a row in the table).</p>

<p>To actually run a specific condition:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands
</tt></blockquote>

<p>Again, with the above command, run files will be placed in the current directory.
Use the option <tt>--directory runs/</tt> to place the runs in a sub-directory.</p>
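
<p>Likewise, the same option applies per condition; for example (again, a usage sketch combining the documented options), to run only the BM25 "flat" condition with its run files collected under <tt>runs/</tt>:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands --directory runs/
</tt></blockquote>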

<p>Finally, to generate this page:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --generate-report --output beir.html
</tt></blockquote>

<p>The output file <tt>beir.html</tt> should be identical to this page.</p>
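
<p>One simple way to check this (a suggested verification step, not part of the documented workflow) is to diff the freshly generated file against the copy tracked in the repository at <tt>docs/2cr/beir.html</tt>:</p>

<blockquote class="mycode2"><tt>
diff beir.html docs/2cr/beir.html
</tt></blockquote>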

<div style="padding-top: 50px"/>

</div>


98 changes: 92 additions & 6 deletions pyserini/2cr/beir_html.template
@@ -40,6 +40,14 @@ blockquote.mycode {
padding-left: 15px;
}

blockquote.mycode2 {
border-left: 3px solid #ccc;
margin-left: 25px;
padding-top: 10px;
padding-bottom: 10px;
padding-left: 15px;
}

tr th.headertop {
border-bottom: none;
padding-bottom: 0rem
@@ -132,14 +140,17 @@ pre[class*="prettyprint"] {

<div class="container my-4">

<p>The two-click<a href="#" data-mdb-toggle="tooltip" title="What are the two clicks, you ask? Copy and paste!"><sup>*</sup></a> reproduction matrix below provides commands for reproducing the experimental results on this page.
Instructions for programmatic execution are shown at the bottom of the page (scroll down).</p>

<p>Key:</p>

<ul>
<li>BM25 Flat: BM25 "flat" baseline</li>
<li>BM25 Multifield: BM25 "multifield" baseline</li>
<li>SPLADE: SPLADE++ (CoCondenser-EnsembleDistil)</li>
<li>Contriever-msmarco: Contriever FT MS MARCO</li>
<li>BGE-base: BGE-base-en-v1.5</li>
<li>BM25 Flat: BM25 "flat" baseline [1]</li>
<li>BM25 Multifield: BM25 "multifield" baseline [1]</li>
<li>SPLADE: SPLADE++ (CoCondenser-EnsembleDistil) [2]</li>
<li>Contriever-msmarco: Contriever FT MS MARCO [3]</li>
<li>BGE-base: BGE-base-en-v1.5 [4]</li>
</ul>

<div class="table-responsive">
@@ -150,7 +161,7 @@ pre[class*="prettyprint"] {
<th class="headertop"></th>
<th class="headertop" colspan="3"><b>BM25 Flat</b></th>
<th class="headertop" colspan="3"><b>BM25 Multifield</b></th>
<th class="headertop" colspan="3"><b>SPLADE</b></th>
<th class="headertop" colspan="3"><b>SPLADE++ ED</b></th>
<th class="headertop" colspan="3"><b>Contriever MSMARCO</b></th>
<th class="headertop" colspan="3"><b>BGE-base</b></th>
</tr>
@@ -181,6 +192,81 @@ $rows
</table>
</div>

<ul style="list-style-type:none; padding-top: 25px">

<li><p>[1] Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, and Jimmy Lin.
<a href="https://arxiv.org/abs/2306.07471">Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard.</a>
<i>arXiv:2306.07471</i>, June 2023.</p></li>

<li><p>[2] Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant.
<a href="https://dl.acm.org/doi/10.1145/3477495.3531857">From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.</a>
<i>Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</i>, pages 2353–2359, July 2022.</p></li>

<li><p>[3] Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave.
<a href="https://arxiv.org/abs/2112.09118">Towards Unsupervised Dense Information Retrieval with Contrastive Learning.</a>
<i>arXiv:2112.09118</i>, December 2021.</p></li>

<li><p>[4] Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff.
<a href="https://arxiv.org/abs/2309.07597">C-Pack: Packaged Resources To Advance General Chinese Embedding.</a>
<i>arXiv:2309.07597</i>, December 2023.</p></li>

</ul>

<div style="padding-top: 20px"/>

<h4>Programmatic Execution</h4>

<p>All experimental runs shown in the above table can be programmatically executed based on the instructions below.
To list all the experimental conditions:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --list-conditions
</tt></blockquote>

<p>These conditions correspond to the table rows above.</p>

<p>For all conditions, just show the commands in a "dry run":</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands --dry-run
</tt></blockquote>

<p>To actually run all the experimental conditions:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands
</tt></blockquote>

<p>With the above command, run files will be placed in the current directory.
Use the option <tt>--directory runs/</tt> to place the runs in a sub-directory.</p>

<p>To show the commands for a specific condition:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands --dry-run
</tt></blockquote>

<p>This will generate exactly the commands for a specific condition above (corresponding to a row in the table).</p>

<p>To actually run a specific condition:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands
</tt></blockquote>

<p>Again, with the above command, run files will be placed in the current directory.
Use the option <tt>--directory runs/</tt> to place the runs in a sub-directory.</p>

<p>Finally, to generate this page:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --generate-report --output beir.html
</tt></blockquote>

<p>The output file <tt>beir.html</tt> should be identical to this page.</p>

<div style="padding-top: 50px"/>

</div>


30 changes: 18 additions & 12 deletions pyserini/2cr/ciral.py
@@ -19,9 +19,9 @@
import os
import sys
import time
import subprocess
import importlib.resources
from collections import defaultdict, OrderedDict
from datetime import datetime
from string import Template

import yaml
@@ -96,6 +96,7 @@ def list_conditions():
for language in languages:
print(language[1])


def print_results(table, metric, split):
print(f'Metric = {metric}, Split = {split}')
print(' ' * 32, end='')
@@ -110,6 +111,7 @@ def print_results(table, metric, split):
print('')
print('')


def generate_table_rows(table, row_template, commands, eval_commands, table_id, split, metric):
row_cnt = 1
html_rows = []
@@ -153,6 +155,7 @@ def generate_table_rows(table, row_template, commands, eval_commands, table_id,

return html_rows


def extract_topic_fn_from_cmd(cmd):
cmd = cmd.split()
topic_idx = cmd.index('--topics')
@@ -189,12 +192,9 @@ def generate_report(args):
afriberta_dpr_output=afriberta_dpr_output, fusion_tag=fusion_tag)
else:
expected_args = dict(split=display_split, output=runfile,
sparse_threads=sparse_threads, sparse_batch_size=sparse_batch_size,
dense_threads=dense_threads, dense_batch_size=dense_batch_size)
sparse_threads=sparse_threads, sparse_batch_size=sparse_batch_size,
dense_threads=dense_threads, dense_batch_size=dense_batch_size)

# cmd = Template(cmd_template).substitute(split=display_split, output=runfile,
# sparse_threads=sparse_threads, sparse_batch_size=sparse_batch_size,
# dense_threads=dense_threads, dense_batch_size=dense_batch_size)
cmd = Template(cmd_template).substitute(**expected_args)
commands[name] = format_run_command(cmd)

@@ -289,7 +289,7 @@ def run_conditions(args):
if not os.path.exists(runfile):
continue
score = float(run_eval_and_return_metric(metric, f'{eval_key}-{split}',
trec_eval_metric_definitions[metric], runfile))
trec_eval_metric_definitions[metric], runfile))
if math.isclose(score, float(expected[metric])):
result_str = ok_str
else:
Expand All @@ -306,18 +306,24 @@ def run_conditions(args):
print_results(table, metric, split)

end = time.time()
print(f'Total elapsed time: {end - start:.0f}s')

start_str = datetime.utcfromtimestamp(start).strftime('%Y-%m-%d %H:%M:%S')
end_str = datetime.utcfromtimestamp(end).strftime('%Y-%m-%d %H:%M:%S')

print('\n')
print(f'Start time: {start_str}')
print(f'End time: {end_str}')
print(f'Total elapsed time: {end - start:.0f}s ~{(end - start)/3600:.1f}hr')


if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Generate regression matrix for CIRAL.')
parser.add_argument('--condition', type=str,
help='Condition to run', required=False)
parser.add_argument('--condition', type=str, help='Condition to run', required=False)
# To list all conditions
parser.add_argument('--list-conditions', action='store_true', default=False, help='List available conditions.')
# For generating reports
parser.add_argument('--generate-report', action='store_true', default=False, help='Generate report.')
parser.add_argument('--display-split', type=str, help='Split to generate report on.',
default='test-b', required=False)
parser.add_argument('--display-split', type=str, help='Split to generate report on.', default='test-b', required=False)
parser.add_argument('--output', type=str, help='File to store report.', required=False)
# For actually running the experimental conditions
parser.add_argument('--all', action='store_true', default=False, help='Run using all languages.')
2 changes: 1 addition & 1 deletion scripts/jobs.integrations-all.txt
@@ -1,4 +1,4 @@
python -m unittest discover -s integrations/dense > logs/log.dense 2>&1
python -m unittest discover -s integrations/sparse > logs/log.sparse 2>&1
python -m unittest discover -s integrations/dense > logs/log.dense 2>&1
python -m unittest discover -s integrations/clprf > logs/log.clprf 2>&1
python -m unittest discover -s integrations/papers > logs/log.papers 2>&1
3 changes: 1 addition & 2 deletions tests/test_prebuilt_index.py
@@ -83,8 +83,7 @@ def test_impact_beir(self):
urls.append(url)

# 29 from SPLADE-distill CoCodenser-medium
# 29 from SPLADE++ (CoCondenser-EnsembleDistil)
self.assertEqual(cnt, 58)
self.assertEqual(cnt, 29)
self._test_urls(urls)

def test_impact_mrtydi(self):
