Update BEIR 2CR docs; other minor touch-ups (#1857)
+ Updated BEIR 2CR docs to provide refs and more detailed execution instructions.
+ Fixed broken test cases.
+ Tweaked 2CR regressions; minor refactoring.
lintool authored Apr 8, 2024
1 parent 30441ea commit c13cd63
Showing 5 changed files with 204 additions and 27 deletions.
98 changes: 92 additions & 6 deletions docs/2cr/beir.html
@@ -40,6 +40,14 @@
padding-left: 15px;
}

blockquote.mycode2 {
border-left: 3px solid #ccc;
margin-left: 25px;
padding-top: 10px;
padding-bottom: 10px;
padding-left: 15px;
}

tr th.headertop {
border-bottom: none;
padding-bottom: 0rem
@@ -132,14 +140,17 @@ <h1 class="mb-3">BEIR</h1>

<div class="container my-4">

<p>The two-click<a href="#" data-mdb-toggle="tooltip" title="What are the two clicks, you ask? Copy and paste!"><sup>*</sup></a> reproduction matrix below provides commands for reproducing the experimental results on this page.
Instructions for programmatic execution are shown at the bottom of the page (scroll down).</p>

<p>Key:</p>

<ul>
<li>BM25 Flat: BM25 "flat" baseline</li>
<li>BM25 Multifield: BM25 "multifield" baseline</li>
<li>SPLADE: SPLADE++ (CoCondenser-EnsembleDistil)</li>
<li>Contriever-msmarco: Contriever FT MS MARCO</li>
<li>BGE-base: BGE-base-en-v1.5</li>
<li>BM25 Flat: BM25 "flat" baseline [1]</li>
<li>BM25 Multifield: BM25 "multifield" baseline [1]</li>
<li>SPLADE: SPLADE++ (CoCondenser-EnsembleDistil) [2]</li>
<li>Contriever-msmarco: Contriever FT MS MARCO [3]</li>
<li>BGE-base: BGE-base-en-v1.5 [4]</li>
</ul>
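
<p>As a rough illustration of the kind of command the matrix contains for the BM25 "flat" condition, a single-dataset run might look like the following. This is only a sketch, not taken from this commit: the prebuilt index name <tt>beir-v1.0.0-scifact.flat</tt>, the topics key <tt>beir-v1.0.0-scifact-test</tt>, and the output filename are assumed for illustration.</p>

<blockquote class="mycode2"><tt>
python -m pyserini.search.lucene --index beir-v1.0.0-scifact.flat --topics beir-v1.0.0-scifact-test --output run.beir.bm25-flat.scifact.txt --bm25 --hits 1000
</tt></blockquote>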

<div class="table-responsive">
@@ -150,7 +161,7 @@ <h1 class="mb-3">BEIR</h1>
<th class="headertop"></th>
<th class="headertop" colspan="3"><b>BM25 Flat</b></th>
<th class="headertop" colspan="3"><b>BM25 Multifield</b></th>
<th class="headertop" colspan="3"><b>SPLADE</b></th>
<th class="headertop" colspan="3"><b>SPLADE++ ED</b></th>
<th class="headertop" colspan="3"><b>Contriever MSMARCO</b></th>
<th class="headertop" colspan="3"><b>BGE-base</b></th>
</tr>
@@ -6009,6 +6020,81 @@ <h1 class="mb-3">BEIR</h1>
</table>
</div>

<ul style="list-style-type:none; padding-top: 25px">

<li><p>[1] Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, and Jimmy Lin.
<a href="https://arxiv.org/abs/2306.07471">Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard.</a>
<i>arXiv:2306.07471</i>, June 2023.</p></li>

<li><p>[2] Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant.
<a href="https://dl.acm.org/doi/10.1145/3477495.3531857">From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.</a>
<i>Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</i>, pages 2353–2359, July 2022.</p></li>

<li><p>[3] Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave.
<a href="https://arxiv.org/abs/2112.09118">Towards Unsupervised Dense Information Retrieval with Contrastive Learning.</a>
<i>arXiv:2112.09118</i>, December 2021.</p></li>

<li><p>[4] Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff.
<a href="https://arxiv.org/abs/2309.07597">C-Pack: Packaged Resources To Advance General Chinese Embedding.</a>
<i>arXiv:2309.07597</i>, December 2023.</p></li>

</ul>

<div style="padding-top: 20px"/>

<h4>Programmatic Execution</h4>

<p>All experimental runs shown in the above table can be programmatically executed based on the instructions below.
To list all the experimental conditions:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --list-conditions
</tt></blockquote>

<p>These conditions correspond to the table rows above.</p>

<p>For all conditions, just show the commands in a "dry run":</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands --dry-run
</tt></blockquote>

<p>To actually run all the experimental conditions:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands
</tt></blockquote>

<p>With the above command, run files will be placed in the current directory.
Use the option <tt>--directory runs/</tt> to place the runs in a sub-directory.</p>
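
<p>For example, combining the options above, the following usage sketch runs every condition and collects the run files under <tt>runs/</tt>:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands --directory runs/
</tt></blockquote>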

<p>To show the commands for a specific condition:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands --dry-run
</tt></blockquote>

<p>This will generate exactly the commands for a specific condition above (corresponding to a row in the table).</p>

<p>To actually run a specific condition:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands
</tt></blockquote>

<p>Again, with the above command, run files will be placed in the current directory.
Use the option <tt>--directory runs/</tt> to place the runs in a sub-directory.</p>
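
<p>Likewise, the same option applies per condition; for example (again, a usage sketch combining the documented options), to run only the BM25 "flat" condition with its run files collected under <tt>runs/</tt>:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands --directory runs/
</tt></blockquote>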

<p>Finally, to generate this page:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --generate-report --output beir.html
</tt></blockquote>

<p>The output file <tt>beir.html</tt> should be identical to this page.</p>
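
<p>One simple way to check this (a suggested verification step, not part of the documented workflow) is to diff the freshly generated file against the copy tracked in the repository at <tt>docs/2cr/beir.html</tt>:</p>

<blockquote class="mycode2"><tt>
diff beir.html docs/2cr/beir.html
</tt></blockquote>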

<div style="padding-top: 50px"/>

</div>


98 changes: 92 additions & 6 deletions pyserini/2cr/beir_html.template
@@ -40,6 +40,14 @@ blockquote.mycode {
padding-left: 15px;
}

blockquote.mycode2 {
border-left: 3px solid #ccc;
margin-left: 25px;
padding-top: 10px;
padding-bottom: 10px;
padding-left: 15px;
}

tr th.headertop {
border-bottom: none;
padding-bottom: 0rem
@@ -132,14 +140,17 @@ pre[class*="prettyprint"] {

<div class="container my-4">

<p>The two-click<a href="#" data-mdb-toggle="tooltip" title="What are the two clicks, you ask? Copy and paste!"><sup>*</sup></a> reproduction matrix below provides commands for reproducing the experimental results on this page.
Instructions for programmatic execution are shown at the bottom of the page (scroll down).</p>

<p>Key:</p>

<ul>
<li>BM25 Flat: BM25 "flat" baseline</li>
<li>BM25 Multifield: BM25 "multifield" baseline</li>
<li>SPLADE: SPLADE++ (CoCondenser-EnsembleDistil)</li>
<li>Contriever-msmarco: Contriever FT MS MARCO</li>
<li>BGE-base: BGE-base-en-v1.5</li>
<li>BM25 Flat: BM25 "flat" baseline [1]</li>
<li>BM25 Multifield: BM25 "multifield" baseline [1]</li>
<li>SPLADE: SPLADE++ (CoCondenser-EnsembleDistil) [2]</li>
<li>Contriever-msmarco: Contriever FT MS MARCO [3]</li>
<li>BGE-base: BGE-base-en-v1.5 [4]</li>
</ul>

<div class="table-responsive">
@@ -150,7 +161,7 @@ pre[class*="prettyprint"] {
<th class="headertop"></th>
<th class="headertop" colspan="3"><b>BM25 Flat</b></th>
<th class="headertop" colspan="3"><b>BM25 Multifield</b></th>
<th class="headertop" colspan="3"><b>SPLADE</b></th>
<th class="headertop" colspan="3"><b>SPLADE++ ED</b></th>
<th class="headertop" colspan="3"><b>Contriever MSMARCO</b></th>
<th class="headertop" colspan="3"><b>BGE-base</b></th>
</tr>
@@ -181,6 +192,81 @@ $rows
</table>
</div>

<ul style="list-style-type:none; padding-top: 25px">

<li><p>[1] Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, and Jimmy Lin.
<a href="https://arxiv.org/abs/2306.07471">Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard.</a>
<i>arXiv:2306.07471</i>, June 2023.</p></li>

<li><p>[2] Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant.
<a href="https://dl.acm.org/doi/10.1145/3477495.3531857">From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective.</a>
<i>Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</i>, pages 2353–2359, July 2022.</p></li>

<li><p>[3] Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave.
<a href="https://arxiv.org/abs/2112.09118">Towards Unsupervised Dense Information Retrieval with Contrastive Learning.</a>
<i>arXiv:2112.09118</i>, December 2021.</p></li>

<li><p>[4] Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff.
<a href="https://arxiv.org/abs/2309.07597">C-Pack: Packaged Resources To Advance General Chinese Embedding.</a>
<i>arXiv:2309.07597</i>, December 2023.</p></li>

</ul>

<div style="padding-top: 20px"/>

<h4>Programmatic Execution</h4>

<p>All experimental runs shown in the above table can be programmatically executed based on the instructions below.
To list all the experimental conditions:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --list-conditions
</tt></blockquote>

<p>These conditions correspond to the table rows above.</p>

<p>For all conditions, just show the commands in a "dry run":</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands --dry-run
</tt></blockquote>

<p>To actually run all the experimental conditions:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --all --display-commands
</tt></blockquote>

<p>With the above command, run files will be placed in the current directory.
Use the option <tt>--directory runs/</tt> to place the runs in a sub-directory.</p>

<p>To show the commands for a specific condition:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands --dry-run
</tt></blockquote>

<p>This will generate exactly the commands for a specific condition above (corresponding to a row in the table).</p>

<p>To actually run a specific condition:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --condition bm25-flat --display-commands
</tt></blockquote>

<p>Again, with the above command, run files will be placed in the current directory.
Use the option <tt>--directory runs/</tt> to place the runs in a sub-directory.</p>

<p>Finally, to generate this page:</p>

<blockquote class="mycode2"><tt>
python -m pyserini.2cr.beir --generate-report --output beir.html
</tt></blockquote>

<p>The output file <tt>beir.html</tt> should be identical to this page.</p>

<div style="padding-top: 50px"/>

</div>


30 changes: 18 additions & 12 deletions pyserini/2cr/ciral.py
@@ -19,9 +19,9 @@
import os
import sys
import time
import subprocess
import importlib.resources
from collections import defaultdict, OrderedDict
from datetime import datetime
from string import Template

import yaml
@@ -96,6 +96,7 @@ def list_conditions():
for language in languages:
print(language[1])


def print_results(table, metric, split):
print(f'Metric = {metric}, Split = {split}')
print(' ' * 32, end='')
@@ -110,6 +111,7 @@ def print_results(table, metric, split):
print('')
print('')


def generate_table_rows(table, row_template, commands, eval_commands, table_id, split, metric):
row_cnt = 1
html_rows = []
@@ -153,6 +155,7 @@ def generate_table_rows(table, row_template, commands, eval_commands, table_id,

return html_rows


def extract_topic_fn_from_cmd(cmd):
cmd = cmd.split()
topic_idx = cmd.index('--topics')
@@ -189,12 +192,9 @@ def generate_report(args):
afriberta_dpr_output=afriberta_dpr_output, fusion_tag=fusion_tag)
else:
expected_args = dict(split=display_split, output=runfile,
sparse_threads=sparse_threads, sparse_batch_size=sparse_batch_size,
dense_threads=dense_threads, dense_batch_size=dense_batch_size)
sparse_threads=sparse_threads, sparse_batch_size=sparse_batch_size,
dense_threads=dense_threads, dense_batch_size=dense_batch_size)

# cmd = Template(cmd_template).substitute(split=display_split, output=runfile,
# sparse_threads=sparse_threads, sparse_batch_size=sparse_batch_size,
# dense_threads=dense_threads, dense_batch_size=dense_batch_size)
cmd = Template(cmd_template).substitute(**expected_args)
commands[name] = format_run_command(cmd)

@@ -289,7 +289,7 @@ def run_conditions(args):
if not os.path.exists(runfile):
continue
score = float(run_eval_and_return_metric(metric, f'{eval_key}-{split}',
trec_eval_metric_definitions[metric], runfile))
trec_eval_metric_definitions[metric], runfile))
if math.isclose(score, float(expected[metric])):
result_str = ok_str
else:
Expand All @@ -306,18 +306,24 @@ def run_conditions(args):
print_results(table, metric, split)

end = time.time()
print(f'Total elapsed time: {end - start:.0f}s')

start_str = datetime.utcfromtimestamp(start).strftime('%Y-%m-%d %H:%M:%S')
end_str = datetime.utcfromtimestamp(end).strftime('%Y-%m-%d %H:%M:%S')

print('\n')
print(f'Start time: {start_str}')
print(f'End time: {end_str}')
print(f'Total elapsed time: {end - start:.0f}s ~{(end - start)/3600:.1f}hr')


if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Generate regression matrix for CIRAL.')
parser.add_argument('--condition', type=str,
help='Condition to run', required=False)
parser.add_argument('--condition', type=str, help='Condition to run', required=False)
# To list all conditions
parser.add_argument('--list-conditions', action='store_true', default=False, help='List available conditions.')
# For generating reports
parser.add_argument('--generate-report', action='store_true', default=False, help='Generate report.')
parser.add_argument('--display-split', type=str, help='Split to generate report on.',
default='test-b', required=False)
parser.add_argument('--display-split', type=str, help='Split to generate report on.', default='test-b', required=False)
parser.add_argument('--output', type=str, help='File to store report.', required=False)
# For actually running the experimental conditions
parser.add_argument('--all', action='store_true', default=False, help='Run using all languages.')
2 changes: 1 addition & 1 deletion scripts/jobs.integrations-all.txt
@@ -1,4 +1,4 @@
python -m unittest discover -s integrations/dense > logs/log.dense 2>&1
python -m unittest discover -s integrations/sparse > logs/log.sparse 2>&1
python -m unittest discover -s integrations/dense > logs/log.dense 2>&1
python -m unittest discover -s integrations/clprf > logs/log.clprf 2>&1
python -m unittest discover -s integrations/papers > logs/log.papers 2>&1
3 changes: 1 addition & 2 deletions tests/test_prebuilt_index.py
@@ -83,8 +83,7 @@ def test_impact_beir(self):
urls.append(url)

# 29 from SPLADE-distill CoCodenser-medium
# 29 from SPLADE++ (CoCondenser-EnsembleDistil)
self.assertEqual(cnt, 58)
self.assertEqual(cnt, 29)
self._test_urls(urls)

def test_impact_mrtydi(self):
