Evaluation fails when trying to extract keywords from a specific sentence #430

Open
edoust opened this issue Oct 20, 2023 · 2 comments

edoust commented Oct 20, 2023

I am trying to extract keywords from sentences using the all-MiniLM-L6-v2 model.

When using this specific sentence (either alone or in combination with other sentences), the keyword extraction fails:
Up 3 Up 4 Down 2 Up 7 Up 2 Down 4 Down 4 Up 6 Up 1 Down 1 Down 3

I know this may not be a meaningful sentence, but it should not prevent all the other sentences from being evaluated.

Is there a way to fix this, or to know which sentences would fail during evaluation?

This is my repro sample: crash-repro-keywords.zip

It contains this code snippet for evaluating the sentence:

```rust
use rust_bert::pipelines::keywords_extraction::{
    KeywordExtractionConfig, KeywordExtractionModel, KeywordScorerType,
};
use rust_bert::pipelines::sentence_embeddings::{
    SentenceEmbeddingsConfig, SentenceEmbeddingsModelType,
};

let input_strings = vec!["Up 3 Up 4 Down 2 Up 7 Up 2 Down 4 Down 4 Up 6 Up 1 Down 1 Down 3"];

let keyword_extraction_config = KeywordExtractionConfig {
    sentence_embeddings_config: SentenceEmbeddingsConfig::from(SentenceEmbeddingsModelType::AllMiniLmL6V2),
    max_sum_candidates: Some(20),
    diversity: Some(0.3),
    scorer_type: KeywordScorerType::MaximalMarginRelevance,
    ngram_range: (1, 1),
    num_keywords: 6,
    ..Default::default()
};

let keyword_extraction_model = KeywordExtractionModel::new(keyword_extraction_config).unwrap();

// This call panics with the error shown below.
let output = keyword_extraction_model.predict(&input_strings).unwrap();
```

This is the error that is printed:

```text
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
Torch("stack expects a non-empty TensorList
Exception raised from stack at C:\\actions-runner\\_work\\pytorch\\pytorch\\builder\\windows\\pytorch\\aten\\src\\ATen\\native\\TensorShape.cpp:2659 (most recent call first):
00007FFCBC57D24200007FFCBC57D1E0 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFCBC57CE1A00007FFCBC57CDC0 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>]
00007FFC83A8296400007FFC83A82900 torch_cpu.dll!at::native::stack [<unknown file> @ <unknown line number>]
00007FFC845E262B00007FFC845DE0B0 torch_cpu.dll!at::compositeexplicitautograd::view_copy_symint_outf [<unknown file> @ <unknown line number>]
00007FFC845BD46100007FFC84578730 torch_cpu.dll!at::compositeexplicitautograd::bucketize_outf [<unknown file> @ <unknown line number>]
00007FFC83FA845600007FFC83FA82B0 torch_cpu.dll!at::_ops::stack::call [<unknown file> @ <unknown line number>]
00007FF68E5A4C8E00007FF68E5A4C50 crash-repro-keywords.exe!at::stack [C:\\temp\\libtorch\\include\\ATen\\ops\\stack.h @ 27]
00007FF68E544D1700007FF68E544CA0 crash-repro-keywords.exe!atg_stack [C:\\Users\\usr1\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\torch-sys-0.13.0\\libtch\\torch_api_generated.cpp @ 16438]
00007FF68E37827A00007FF68E378180 crash-repro-keywords.exe!tch::wrappers::tensor::Tensor::f_stack<tch::wrappers::tensor::Tensor> [C:\\Users\\usr1\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\tch-0.13.0\\src\\wrappers\\tensor_fallible_generated.rs @ 33246]
00007FF68E37A65600007FF68E37A630 crash-repro-keywords.exe!tch::wrappers::tensor::Tensor::stack<tch::wrappers::tensor::Tensor> [C:\\Users\\usr1\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\tch-0.13.0\\src\\wrappers\\tensor_generated.rs @ 16878]
00007FF68D760C3500007FF68D760B80 crash-repro-keywords.exe!rust_bert::pipelines::sentence_embeddings::pipeline::SentenceEmbeddingsModel::encode_as_tensor<ref$<enum2$<alloc::borrow::Cow<str$> > > > [C:\\Users\\usr1\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\rust-bert-0.21.0\\src\\pipelines\\sentence_embeddings\\pipeline.rs @ 353]
00007FF68D72141C00007FF68D7211F0 crash-repro-keywords.exe!rust_bert::pipelines::keywords_extraction::pipeline::KeywordExtractionModel::predict<ref$<str$> > [C:\\Users\\usr1\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\rust-bert-0.21.0\\src\\pipelines\\keywords_extraction\\pipeline.rs @ 230]
00007FF68D7A891F00007FF68D7A8610 crash-repro-keywords.exe!crash_repro_keywords::get_keywords [E:\\_local\\crash-repro-keywords\\src\\main.rs @ 36]
00007FF68D7A856F00007FF68D7A8520 crash-repro-keywords.exe!crash_repro_keywords::main [E:\\_local\\crash-repro-keywords\\src\\main.rs @ 9]
00007FF68D7DB90B00007FF68D7DB900 crash-repro-keywords.exe!core::ops::function::FnOnce::call_once<void (*)(),tuple$<> > [/rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\\library\\core\\src\\ops\\function.rs @ 250]
00007FF68D7A280E00007FF68D7A2800 crash-repro-keywords.exe!std::sys_common::backtrace::__rust_begin_short_backtrace<void (*)(),tuple$<> > [/rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\\library\\std\\src\\sys_common\\backtrace.rs @ 138]
00007FF68D7BE6E100007FF68D7BE6D0 crash-repro-keywords.exe!std::rt::lang_start::closure$0<tuple$<> > [/rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\\library\\std\\src\\rt.rs @ 166]
00007FF68E47F4A800007FF68E47F3F0 crash-repro-keywords.exe!std::rt::lang_start_internal [/rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\\std\\src\\rt.rs @ 148]
00007FF68D7BE6BA00007FF68D7BE680 crash-repro-keywords.exe!std::rt::lang_start<tuple$<> > [/rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\\library\\std\\src\\rt.rs @ 165]
00007FF68D7A8FB900007FF68D7A8FA0 crash-repro-keywords.exe!main [<unknown file> @ <unknown line number>]
00007FF68E6B41CC00007FF68E6B40C0 crash-repro-keywords.exe!__scrt_common_main_seh [D:\\a\\_work\\1\\s\\src\\vctools\\crt\\vcstartup\\src\\startup\\exe_common.inl @ 288]
00007FFDE131257D00007FFDE1312560 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFDE2EAAA7800007FFDE2EAAA50 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]
")', C:\Users\usr1\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tch-0.13.0\src\wrappers\tensor_generated.rs:16878:39
```

This is the stack trace:

```text
stack backtrace:
   0:     0x7ff68e48b9cc - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\sys_common\backtrace.rs:44
   1:     0x7ff68e4a942b - core::fmt::rt::Argument::fmt
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\core\src\fmt\rt.rs:138
   2:     0x7ff68e4a942b - core::fmt::write
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\core\src\fmt\mod.rs:1094
   3:     0x7ff68e48657f - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\io\mod.rs:1714
   4:     0x7ff68e48b77b - std::sys_common::backtrace::_print
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\sys_common\backtrace.rs:47
   5:     0x7ff68e48b77b - std::sys_common::backtrace::print
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\sys_common\backtrace.rs:34
   6:     0x7ff68e48df7a - std::panicking::default_hook::closure$1
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\panicking.rs:269
   7:     0x7ff68e48dbcf - std::panicking::default_hook
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\panicking.rs:288
   8:     0x7ff68e48e62e - std::panicking::rust_panic_with_hook
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\panicking.rs:705
   9:     0x7ff68e48e51d - std::panicking::begin_panic_handler::closure$0
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\panicking.rs:597
  10:     0x7ff68e48c349 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0,never$>
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\sys_common\backtrace.rs:151
  11:     0x7ff68e48e220 - std::panicking::begin_panic_handler
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\panicking.rs:593
  12:     0x7ff68e6b6a85 - core::panicking::panic_fmt
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\core\src\panicking.rs:67
  13:     0x7ff68e6b7093 - core::result::unwrap_failed
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\core\src\result.rs:1651
  14:     0x7ff68e38507b - enum2$<core::result::Result<tch::wrappers::tensor::Tensor,enum2$<tch::error::TchError> > >::unwrap<tch::wrappers::tensor::Tensor,enum2$<tch::error::TchError> >
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\library\core\src\result.rs:1076
  15:     0x7ff68e37a667 - tch::wrappers::tensor::Tensor::stack<tch::wrappers::tensor::Tensor>
                               at C:\Users\usr1\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tch-0.13.0\src\wrappers\tensor_generated.rs:16878
  16:     0x7ff68d760c35 - rust_bert::pipelines::sentence_embeddings::pipeline::SentenceEmbeddingsModel::encode_as_tensor<ref$<enum2$<alloc::borrow::Cow<str$> > > >
                               at C:\Users\usr1\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rust-bert-0.21.0\src\pipelines\sentence_embeddings\pipeline.rs:353
  17:     0x7ff68d72141c - rust_bert::pipelines::keywords_extraction::pipeline::KeywordExtractionModel::predict<ref$<str$> >
                               at C:\Users\usr1\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rust-bert-0.21.0\src\pipelines\keywords_extraction\pipeline.rs:230
  18:     0x7ff68d7a891f - crash_repro_keywords::get_keywords
                               at E:\_local\crash-repro-keywords\src\main.rs:36
  19:     0x7ff68d7a856f - crash_repro_keywords::main
                               at E:\_local\crash-repro-keywords\src\main.rs:9
  20:     0x7ff68d7db90b - core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\library\core\src\ops\function.rs:250
  21:     0x7ff68d7a280e - std::sys_common::backtrace::__rust_begin_short_backtrace<void (*)(),tuple$<> >
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\library\std\src\sys_common\backtrace.rs:135
  22:     0x7ff68d7a280e - std::sys_common::backtrace::__rust_begin_short_backtrace<void (*)(),tuple$<> >
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\library\std\src\sys_common\backtrace.rs:135
  23:     0x7ff68d7be6e1 - std::rt::lang_start::closure$0<tuple$<> >
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\library\std\src\rt.rs:166
  24:     0x7ff68e47f4a8 - std::rt::lang_start_internal::closure$2
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\rt.rs:148
  25:     0x7ff68e47f4a8 - std::panicking::try::do_call
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\panicking.rs:500
  26:     0x7ff68e47f4a8 - std::panicking::try
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\panicking.rs:464
  27:     0x7ff68e47f4a8 - std::panic::catch_unwind
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\panic.rs:142
  28:     0x7ff68e47f4a8 - std::rt::lang_start_internal
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library\std\src\rt.rs:148
  29:     0x7ff68d7be6ba - std::rt::lang_start<tuple$<> >
                               at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3\library\std\src\rt.rs:165
  30:     0x7ff68d7a8fb9 - main
  31:     0x7ff68e6b41cc - invoke_main
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
  32:     0x7ff68e6b41cc - __scrt_common_main_seh
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
  33:     0x7ffde131257d - BaseThreadInitThunk
  34:     0x7ffde2eaaa78 - RtlUserThreadStart
```
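The trace narrows the failure down: `KeywordExtractionModel::predict` (pipeline.rs:230) calls `SentenceEmbeddingsModel::encode_as_tensor` with a list of `Cow<str>` values, presumably the candidate keywords, and libtorch rejects the resulting `Tensor::stack` call because that list is empty. The panic is raised by `tch`'s own `unwrap()` inside `Tensor::stack` (tensor_generated.rs:16878), so `predict` never returns an `Err` and the caller's `.unwrap()` is not what fails. Until that case is guarded in the library, one way to find out which sentences fail is to run them one at a time and catch the panic. The sketch below is std-only: `mock_extract` is a hypothetical stand-in for the model, and its candidate rule (drop the stop words "up"/"down" and single-character tokens) is an assumption chosen only to reproduce an empty candidate list, not rust-bert's actual tokenizer.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for the keyword extractor. It panics when a sentence
/// leaves no candidate words, mimicking the empty `Tensor::stack` call.
/// The candidate rule is an assumption for illustration, not rust-bert's.
fn mock_extract(sentence: &str) -> Vec<String> {
    const STOPWORDS: [&str; 2] = ["up", "down"];
    let candidates: Vec<String> = sentence
        .split_whitespace()
        .map(|w| w.to_lowercase())
        // Assumed rule: keep tokens of 2+ characters that are not stop words.
        .filter(|w| w.chars().count() >= 2 && !STOPWORDS.contains(&w.as_str()))
        .collect();
    assert!(!candidates.is_empty(), "stack expects a non-empty TensorList");
    candidates
}

/// Runs `extract` on each sentence separately, turning a panic into an `Err`
/// holding the panic message, so one bad sentence no longer aborts the batch.
fn run_isolated<T, F>(sentences: &[&str], extract: F) -> Vec<Result<T, String>>
where
    F: Fn(&str) -> T,
{
    sentences
        .iter()
        .map(|&s| {
            panic::catch_unwind(AssertUnwindSafe(|| extract(s))).map_err(|payload| {
                // Panic payloads are usually a `&'static str` or a `String`.
                payload
                    .downcast_ref::<&str>()
                    .map(|m| m.to_string())
                    .or_else(|| payload.downcast_ref::<String>().cloned())
                    .unwrap_or_else(|| "non-string panic payload".to_string())
            })
        })
        .collect()
}

fn main() {
    let sentences = [
        "Rust is a memory-safe systems programming language",
        "Up 3 Up 4 Down 2 Up 7 Up 2 Down 4 Down 4 Up 6 Up 1 Down 1 Down 3",
    ];
    for (sentence, result) in sentences.iter().zip(run_isolated(&sentences, mock_extract)) {
        match result {
            Ok(keywords) => println!("ok:     {sentence:?} -> {keywords:?}"),
            Err(message) => println!("failed: {sentence:?} ({message})"),
        }
    }
}
```

With rust-bert, the closure would instead call `keyword_extraction_model.predict(&[s])` for a single sentence. This costs one model call per sentence, and it only works with the default `panic = "unwind"` strategy; under `panic = "abort"`, `catch_unwind` cannot intercept the panic. The default panic hook still prints each caught panic to stderr.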
guillaume-be (Owner) commented

Hello @edoust,

I am unable to reproduce this; running the code shared above gives:

```text
[[Keyword { text: "4", score: 0.4896546, offsets: [Offset { begin: 8, end: 9 }, Offset { begin: 32, end: 33 }, Offset { begin: 39, end: 40 }] },
  Keyword { text: "6", score: 0.43537995, offsets: [Offset { begin: 44, end: 45 }] },
  Keyword { text: "3", score: 0.42764783, offsets: [Offset { begin: 3, end: 4 }, Offset { begin: 63, end: 64 }] },
  Keyword { text: "7", score: 0.4275967, offsets: [Offset { begin: 20, end: 21 }] },
  Keyword { text: "1", score: 0.38711178, offsets: [Offset { begin: 49, end: 50 }, Offset { begin: 56, end: 57 }] },
  Keyword { text: "2", score: 0.34410587, offsets: [Offset { begin: 15, end: 16 }, Offset { begin: 25, end: 26 }] }]]
```
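Each `Offset` in this output points back into the input sentence, and each keyword lists every position where it occurs: "4", for example, is reported at 8, 32 and 39, which are exactly its three occurrences. A small std-only check of that mapping follows; `Offset` here is a local illustrative struct rather than rust-bert's type, `span` is a hypothetical helper, and it assumes `begin`/`end` are character positions with an exclusive `end` (the same as byte positions for this ASCII sentence).

```rust
/// Illustrative mirror of the `Offset { begin, end }` values in the output
/// above (a local struct, not rust-bert's own type).
#[derive(Debug, Clone, Copy)]
struct Offset {
    begin: u32,
    end: u32,
}

/// Returns the part of `text` covered by `offset`, assuming `begin`/`end` are
/// character positions with an exclusive `end`.
fn span(text: &str, offset: Offset) -> String {
    text.chars()
        .skip(offset.begin as usize)
        .take((offset.end - offset.begin) as usize)
        .collect()
}

fn main() {
    let sentence = "Up 3 Up 4 Down 2 Up 7 Up 2 Down 4 Down 4 Up 6 Up 1 Down 1 Down 3";
    // Offsets reported for the keyword "4" in the output above.
    let offsets = [
        Offset { begin: 8, end: 9 },
        Offset { begin: 32, end: 33 },
        Offset { begin: 39, end: 40 },
    ];
    for offset in offsets {
        // Each iteration prints the span "4".
        println!("{offset:?} -> {:?}", span(sentence, offset));
    }
}
```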

edoust (Author) commented Oct 21, 2023

Hey @guillaume-be,

I noticed that this does not happen when I check out this repo and run the sentence above through the keyword extraction example.

However, when I run the code in the example project I provided, it fails with the error mentioned above.

Did you test it with my project? Also, I was running it on Windows.
