
record-stream-uploader periodically throws error "You did not provide the number of bytes specified by the Content-Length HTTP header.", OOMKilled #1131

Open

alex-kuzmin-hg opened this issue Jan 8, 2025 · 1 comment

Labels: Bug, P1, P2

Comments

@alex-kuzmin-hg (Contributor)

Describe the bug

During a long load test, record-stream-uploader may be OOM-killed after throwing a stream of "You did not provide the number of bytes specified by the Content-Length HTTP header." errors like the ones below:

kubectl -n solo-alex-kuzmin-n2 logs network-node1-0 -c record-stream-uploader --previous

Cloud Copy Initiated [ service = 'S3', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_48_16.053066207Z.rcd_sig' ] 
Cloud Copy Initiated [ service = 'S3', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_48_16.053066207Z.rcd.gz' ] 
Cloud Copy Complete [ service = 'S3', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_47_50.040672600Z.rcd_sig' ] 
Cloud Copy Timing [ duration_ms = '30506.100', upload_duration_ms = '30316.840', service = 'S3', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_47_50.040672600Z.rcd_sig' ] 
Cloud Copy Complete [ service = 'S3', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_47_58.000914340Z.rcd_sig' ] 
Cloud Copy Error [ service = 'S3', watch_directory = '/opt/hgcapp/recordStreams', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_47_48.038146733Z.rcd_sig' ]: Failed to upload /opt/hgcapp/recordStreams/2025-01-08T02_47_48.038146733Z.rcd_sig to solo-streams/recordstreams/record0.0.3/2025-01-08T02_47_48.038146733Z.rcd_sig: An error occurred (IncompleteBody) when calling the PutObject operation: You did not provide the number of bytes specified by the Content-Length HTTP header. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/boto3/s3/transfer.py", line 279, in upload_file
    future.result()
  File "/usr/local/lib/python3.7/dist-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/usr/local/lib/python3.7/dist-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/usr/local/lib/python3.7/dist-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/usr/local/lib/python3.7/dist-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/usr/local/lib/python3.7/dist-packages/s3transfer/upload.py", line 692, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/usr/local/lib/python3.7/dist-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (IncompleteBody) when calling the PutObject operation: You did not provide the number of bytes specified by the Content-Length HTTP header.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/mirror.py", line 516, in cloud_copy
    storage.put(local_path, key)
  File "/usr/local/bin/mirror.py", line 1555, in put
    self.__bucket.upload_file(local_path, key, Config=self.__transfer_config)
  File "/usr/local/lib/python3.7/dist-packages/boto3/s3/inject.py", line 209, in bucket_upload_file
    ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
  File "/usr/local/lib/python3.7/dist-packages/boto3/s3/inject.py", line 131, in upload_file
    extra_args=ExtraArgs, callback=Callback)
  File "/usr/local/lib/python3.7/dist-packages/boto3/s3/transfer.py", line 287, in upload_file
    filename, '/'.join([bucket, key]), e))
boto3.exceptions.S3UploadFailedError: Failed to upload /opt/hgcapp/recordStreams/2025-01-08T02_47_48.038146733Z.rcd_sig to solo-streams/recordstreams/record0.0.3/2025-01-08T02_47_48.038146733Z.rcd_sig: An error occurred (IncompleteBody) when calling the PutObject operation: You did not provide the number of bytes specified by the Content-Length HTTP header.
Cloud Copy Complete [ service = 'S3', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_48_04.057291549Z.rcd_sig' ] 
Cloud Copy Complete [ service = 'S3', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_47_56.048000973Z.rcd_sig' ] 
Cloud Copy Complete [ service = 'S3', bucket = 'solo-streams', remote_path = 'recordstreams/record0.0.3', filename = '2025-01-08T02_48_06.033271868Z.rcd_sig' ] 
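
For context, IncompleteBody from PutObject means the request body ended before the number of bytes declared in the Content-Length header was sent; one common cause is the source file shrinking or being rotated between the size check and the read. Below is a minimal defensive sketch, not the mirror.py implementation (the helper name, bucket/key, and retry values are placeholders; only the endpoint is taken from the pod spec further down), that waits for the file size to settle and retries on that specific error code:

import os
import time

import boto3
from botocore.exceptions import ClientError

# Endpoint copied from the uploader pod spec in the comment below; bucket/key are placeholders.
s3 = boto3.client("s3", endpoint_url="http://minio-hl:9000")

def upload_when_stable(path, bucket, key, retries=3, settle_seconds=1.0):
    """Upload `path` only once its size has stopped changing; retry on IncompleteBody."""
    for attempt in range(1, retries + 1):
        size = os.path.getsize(path)
        time.sleep(settle_seconds)
        if os.path.getsize(path) != size:
            continue  # file is still being written, check again on the next attempt
        try:
            with open(path, "rb") as body:
                s3.put_object(Bucket=bucket, Key=key, Body=body, ContentLength=size)
            return
        except ClientError as err:
            if err.response["Error"]["Code"] != "IncompleteBody" or attempt == retries:
                raise
    raise RuntimeError(f"{path} did not reach a stable size after {retries} attempts")

If the IncompleteBody failures line up with files still being written by the node, a settle-and-retry loop like this (or simply retrying the failed upload) should clear them; it does not by itself explain the memory growth reported below.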

Describe the expected behavior

The uploader should not throw any errors and should not be OOM-killed.

To Reproduce

Run a 6-hour NFT load test on Latitude.

Additional Context

No response

@alex-kuzmin-hg added the Bug and Pending Triage labels on Jan 8, 2025
@jeromy-cannon added the P1 and P2 labels and removed the Pending Triage label on Jan 9, 2025
@alex-kuzmin-hg (Contributor, Author)

It causes a periodic OOMKill, e.g.:

  blockstream-uploader:
    Container ID:  containerd://4eed3acbade28cdefedd9de3a4b05774a88c5b09ab461f6c9ec1eed89d4b8988
    Image:         gcr.io/hedera-registry/uploader-mirror:2.0.0
    Image ID:      gcr.io/hedera-registry/uploader-mirror@sha256:b42e67720ad060fc516fc32e04c4e6e5a1fbf9761f016ec7902c3f570ab9a12f
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      /usr/bin/env python3.7 /app/application.py \
        --linux \
        --watch-directory /opt/hgcapp/blockStreams/block-0.0.11 \
        --s3-endpoint http://minio-hl:9000 \
      
    State:          Running
      Started:      Fri, 24 Jan 2025 00:31:39 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 23 Jan 2025 20:58:33 +0000
      Finished:     Fri, 24 Jan 2025 00:31:38 +0000
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     150m
      memory:  400Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment Variables from:
      uploader-mirror-secrets  Secret  Optional: false
    Environment:
      DEBUG:                   true
      REAPER_ENABLE:           true
      REAPER_MIN_KEEP:         1
      REAPER_INTERVAL:         1
      REAPER_DEFAULT_BACKOFF:  1
      STREAM_FILE_EXTENSION:   blk.gz
      STREAM_EXTENSION:        blk.gz
      SIG_REQUIRE:             false
      SIG_PRIORITIZE:          false
      BUCKET_PATH:             blockStreams/block-0.0.11
      BUCKET_NAME:             solo-streams
      S3_ENABLE:               true
      GCS_ENABLE:              false
    Mounts:
      /opt/hgcapp/blockStreams from hgcapp-blockstream (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-29wgt (ro)
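
The traceback above shows mirror.py passing Config=self.__transfer_config to upload_file, so the s3transfer settings are one knob for bounding per-upload memory. A sketch of a conservative configuration follows; the values are illustrative only and not a confirmed fix for this OOMKill:

from boto3.s3.transfer import TransferConfig

# Illustrative values only: fewer worker threads and a smaller IO queue reduce
# how much file data s3transfer buffers in memory at any one time.
transfer_config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # single PUT for small stream files
    multipart_chunksize=8 * 1024 * 1024,
    max_concurrency=2,
    max_io_queue=10,
    use_threads=True,
)

# bucket.upload_file(local_path, key, Config=transfer_config)

Whether the growth comes from s3transfer buffering or from failed uploads accumulating elsewhere still needs to be confirmed from a heap profile of the container.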
