Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-8484: HashJoinPOP memory leak is caused by an oom exception wh… #2889

Merged
merged 1 commit into from
Mar 29, 2024

Conversation

shfshihuafeng
Copy link
Contributor

@shfshihuafeng shfshihuafeng commented Mar 18, 2024

…en read data from Stream with container

DRILL-8484: HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container

Description

when traversing fieldList druing read data from Stream with container , if the intermediate process throw exception,we can not release previously constructed vectors. it result in memory leak

Documentation

(Please describe user-visible changes similar to what should appear in the Drill documentation.)

Testing

You can add debugging code to reproduce this scenario as following
or
test tpch like DRILL-8483
(1) debug code

  public void readFromStreamWithContainer(VectorContainer myContainer, InputStream input) throws IOException {
    final VectorContainer container = new VectorContainer();
    final UserBitShared.RecordBatchDef batchDef = UserBitShared.RecordBatchDef.parseDelimitedFrom(input);
    recordCount = batchDef.getRecordCount();
    if (batchDef.hasCarriesTwoByteSelectionVector() && batchDef.getCarriesTwoByteSelectionVector()) {

      if (sv2 == null) {
        sv2 = new SelectionVector2(allocator);
      }
      sv2.allocateNew(recordCount * SelectionVector2.RECORD_SIZE);
      sv2.getBuffer().setBytes(0, input, recordCount * SelectionVector2.RECORD_SIZE);
      svMode = BatchSchema.SelectionVectorMode.TWO_BYTE;
    }
    final List<ValueVector> vectorList = Lists.newArrayList();
    final List<SerializedField> fieldList = batchDef.getFieldList();
    int i = 0;
    for (SerializedField metaData : fieldList) {
      i++;
      final int dataLength = metaData.getBufferLength();
      final MaterializedField field = MaterializedField.create(metaData);
      final DrillBuf buf = allocator.buffer(dataLength);
      ValueVector vector = null;
      try {
        buf.writeBytes(input, dataLength);
        vector = TypeHelper.getNewVector(field, allocator);
      //DEBUG for leak
        if (i == 3) {
          logger.warn("shf test memory except");
          throw new OutOfMemoryException("test memory except");
        }
        vector.load(metaData, buf);
      } catch (Exception e) {
        if (vectorList.size() > 0 ) {
          for (ValueVector valueVector : vectorList) {
            DrillBuf[] buffers = valueVector.getBuffers(false);
            logger.warn("shf leak buffers " + Arrays.asList(buffers));
            // valueVector.clear();
          }
        }
        throw e;
      } finally {
        buf.release();
      }
      vectorList.add(vector);
    }

(2) run following sql (tpch8)

select
o_year,
sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as mkt_share
from (
select
extract(year from o_orderdate) as o_year,
l_extendedprice * 1.0 as volume,
n2.n_name as nation
from hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, hive.tpch1s.nation n2, hive.tpch1s.region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'ASIA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01'
and date '1996-12-31'
and p_type = 'LARGE BRUSHED BRASS') as all_nations
group by o_year
order by o_year;

(3) you find memory leak ,but there is no sql

image

@cgivre cgivre added bug backport-to-stable This bug fix is applicable to the latest stable release and should be considered for inclusion there labels Mar 18, 2024
for (ValueVector valueVector : vectorList) {
valueVector.clear();
}
throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what would cause an error like this? If so what would the user need to do to fix this?

Copy link
Contributor Author

@shfshihuafeng shfshihuafeng Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we prepare to allocator memory using "allocator.buffer(dataLength)" for hashjoinPop allocator, if actual memory > maxAllocation(The parameter is calculated by call computeOperatorMemory) ,then it throw exception, like following my test。
user can adjust directMemory parameters (DRILL_MAX_DIRECT_MEMORY) or reduce concurrency based on actual conditions.

throw exception code

public DrillBuf buffer(final int initialRequestSize, BufferManager manager) {
    assertOpen();

    AllocationOutcome outcome = allocateBytes(actualRequestSize);
    if (!outcome.isOk()) {
      throw new OutOfMemoryException(createErrorMsg(this, actualRequestSize, initialRequestSize));
    }

my test scenario

Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer of size 16384 (rounded from 14359) due to memory limit (41943040). Current allocation: 22583616
        at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
        at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
        at org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)

@cgivre
Copy link
Contributor

cgivre commented Mar 26, 2024

@shfshihuafeng Can you please resolve merge conflicts.

@shfshihuafeng
Copy link
Contributor Author

@shfshihuafeng Can you please resolve merge conflicts.

it is done

Copy link
Contributor

@cgivre cgivre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1
Thanks for this.

@cgivre cgivre merged commit 95a6c93 into apache:master Mar 29, 2024
8 checks passed
jnturton pushed a commit to jnturton/drill that referenced this pull request Apr 12, 2024
jnturton pushed a commit that referenced this pull request Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-stable This bug fix is applicable to the latest stable release and should be considered for inclusion there bug minor-update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants