Optimum Intel OpenVino fails with segmentation fault #3066

yifanmai · 2024-10-16T22:26:11Z

Recently, the Optimum Intel OpenVino tests have been failing intermittently, apparently because of a race condition caused by multiple concurrent inference calls. The run then exits with a segmentation fault. Could you take a look?

Example logs from this run:

Executor.execute {
      Parallelizing computation on 10 items over 4 threads {
        Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
        Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
        Loading hf-internal-testing/tiny-random-MistralForCausalLM (kwargs={'openvino': True}) for HELM model hf-internal-testing/tiny-random-MistralForCausalLM with Hugging Face Transformers {
          Hugging Face device set to "cpu" because CUDA is unavailable.
          Loading Hugging Face model hf-internal-testing/tiny-random-MistralForCausalLM {
            Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
            Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
            Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/cache_utils.py:447: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  or len(self.key_cache[layer_idx]) == 0  # the layer has no cache
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:281: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  elif sliding_window is None or key_value_length < sliding_window:
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/cache_utils.py:432: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  elif len(self.key_cache[layer_idx]) == 0:  # fills previously skipped layers; checking for tensor causes errors
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
          } [7.965s]
        } [7.966s]
        HuggingFace error: Infer Request is busy
        Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
        HuggingFace error: Infer Request is busy
        Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
        HuggingFace error: Infer Request is busy
        Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
/home/runner/work/_temp/3b3f1c68-38a5-4e0d-ba66-80ecc08f0297.sh: line 1:  2069 Segmentation fault      (core dumped) helm-run --run-entries boolq:model=hf-internal-testing/tiny-random-MistralForCausalLM --enable-huggingface-models hf-internal-testing/tiny-random-MistralForCausalLM --suite v1 --max-eval-instances 10 --openvino
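
For illustration, here is a minimal sketch of the kind of failure the logs suggest, and one possible way to avoid it by serializing inference calls with a lock. This is not HELM or OpenVINO code: `FakeInferRequest`, `BusyError`, `SerializedModel`, and `run_parallel` are hypothetical names, and the fake request only mimics the "Infer Request is busy" behaviour under the assumption that the underlying request object cannot be used by two threads at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BusyError(RuntimeError):
    """Hypothetical stand-in for the 'Infer Request is busy' error."""


class FakeInferRequest:
    """Hypothetical stand-in for a single, non-reentrant inference request."""

    def __init__(self):
        self._busy = threading.Lock()

    def infer(self, prompt):
        # Fail immediately if another thread is already inside infer().
        if not self._busy.acquire(blocking=False):
            raise BusyError("Infer Request is busy")
        try:
            time.sleep(0.001)  # simulate some inference work
            return prompt.upper()
        finally:
            self._busy.release()


class SerializedModel:
    """Wraps a request so that only one thread at a time can call infer()."""

    def __init__(self, request):
        self._request = request
        self._lock = threading.Lock()

    def infer(self, prompt):
        # Callers block here instead of hitting the busy request.
        with self._lock:
            return self._request.infer(prompt)


def run_parallel(model, prompts, num_threads=4):
    """Run model.infer over prompts on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(model.infer, prompts))


if __name__ == "__main__":
    # Mirrors the "10 items over 4 threads" shape of the logs above.
    prompts = [f"item {i}" for i in range(10)]
    print(run_parallel(SerializedModel(FakeInferRequest()), prompts))
```

Calling `run_parallel` with a bare `FakeInferRequest` instead of the wrapper can raise `BusyError` when two threads overlap. The lock trades parallel throughput for safety, so it is only a workaround sketch, not a fix for the upstream segfault.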


yifanmai · 2024-10-28T21:59:05Z

Hi @NoushNabi, have you had time to take a look at the segfault issue in the OpenVino codepath?

yifanmai · 2024-11-06T18:57:36Z

Hi @NoushNabi, let me know if you have time to look at the segfault issue in the OpenVino codepath. If not, my plan is to remove the OpenVino codepath temporarily until the issue is resolved upstream.

yifanmai added the bug and models labels Oct 16, 2024

yifanmai mentioned this issue Nov 12, 2024

Remove OpenVino support #3153

Merged

yifanmai closed this as completed in #3153 Nov 12, 2024
