bug: unable to merge branch with large data change #4448

Open

wvandeun opened this issue Sep 24, 2024 · 0 comments
Labels:
  • priority/2: This issue stalls work on the project or its dependents, it's a blocker for a release
  • state/planned: This issue is planned to be worked on in an upcoming release
  • type/bug: Something isn't working as expected

@wvandeun (Contributor)

Component

API Server / GraphQL

Infrahub version

0.16.0

Current Behavior

When merging a branch with a large data change (40 devices, 4000 interfaces), the operation fails with the following error:

{
  "name": "ApolloError",
  "graphQLErrors": [
    {
      "message": "Unable to connect to the database",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "path": [
        "BranchMerge"
      ]
    }
  ],
  "protocolErrors": [],
  "clientErrors": [],
  "networkError": null,
  "message": "Unable to connect to the database"
}

The error only appears after some time, and it looks like the database server has run out of memory:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "req-rsp-timeout-task"
Exception in thread "HTTP-Dispatcher" java.lang.OutOfMemoryError: Java heap space
Exception in thread "neo4j.ThroughputMonitor-1" java.lang.OutOfMemoryError: Java heap space
Uncaught error from thread [cc-discovery-actor-system-akka.io.pinned-dispatcher-8]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[cc-discovery-actor-system]
java.lang.OutOfMemoryError: Java heap space
Uncaught error from thread [cc-discovery-actor-system-scheduler-1]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[cc-discovery-actor-system]
java.lang.OutOfMemoryError: Java heap space
Uncaught error from thread [cc-discovery-actor-system-akka.actor.internal-dispatcher-33]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[cc-discovery-actor-system]
java.lang.OutOfMemoryError: Java heap space
2024-09-24 15:46:47.127+0000 INFO  Neo4j Server shutdown initiated by request
2024-09-24 15:46:47.138+0000 INFO  Stopping...
ERROR StatusConsoleListener An exception occurred processing Appender rotatingWriter.neo4j.database.neo4j.db.query.execution.pipelined.failure.csv
 org.apache.logging.log4j.core.appender.AppenderLoggingException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:165)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
        at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
        at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:560)
        at org.apache.logging.log4j.core.config.DefaultReliabilityStrategy.log(DefaultReliabilityStrategy.java:63)
        at org.apache.logging.log4j.core.Logger.log(Logger.java:163)
        at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2168)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2122)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2105)
        at org.apache.logging.log4j.spi.AbstractLogger.printf(AbstractLogger.java:2095)
        at org.neo4j.logging.log4j.RotatingLogFileWriter.printf(RotatingLogFileWriter.java:71)
        at com.neo4j.metrics.output.RotatableCsvReporter.report(RotatableCsvReporter.java:234)
        at com.neo4j.metrics.output.RotatableCsvReporter.reportMeter(RotatableCsvReporter.java:180)
        at com.neo4j.metrics.output.RotatableCsvReporter.report(RotatableCsvReporter.java:144)
        at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:277)
        at com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:206)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.OutOfMemoryError: Java heap space

After this happens, the database seemingly can no longer be restarted, and therefore neither can the application server.

This happens on hardware that meets the recommended requirements (in this case 12 CPUs, 16 GB RAM).
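For reference, the same failure can be reproduced outside the web UI by sending the BranchMerge mutation (the path reported in the ApolloError above) directly to the GraphQL API. Below is a minimal sketch using httpx; the /graphql endpoint path, the X-INFRAHUB-KEY header, and the mutation's input shape are assumptions based on a default demo deployment, so adjust them as needed.

# Sketch only: endpoint path, auth header, and BranchMerge input shape are assumptions.
import httpx

INFRAHUB_URL = "http://localhost:8000"  # default demo address (assumption)
API_TOKEN = "<api-token>"               # placeholder, use a real token

MERGE_MUTATION = """
mutation {
  BranchMerge(data: { name: "test" }) {
    ok
  }
}
"""

response = httpx.post(
    f"{INFRAHUB_URL}/graphql",
    json={"query": MERGE_MUTATION},
    headers={"X-INFRAHUB-KEY": API_TOKEN},
    timeout=None,  # the merge runs for a while before Neo4j falls over
)
print(response.json())  # surfaces the same "Unable to connect to the database" error once Neo4j is out of heap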

Expected Behavior

The merge operation should succeed!

Steps to Reproduce

  • load an instance of Infrahub with the demo schema: inv dev.start demo.load-infra-schema
  • create a branch: infrahubctl branch create test
  • run the following script to load the dataset into the branch test (40 devices, 100 interfaces each): infrahubctl run <script.py> num_devices=40 --branch test
import logging
from infrahub_sdk import InfrahubClient


async def run(client: InfrahubClient, log: logging.Logger, branch: str, num_devices: int=50) -> None:
    site = await client.create("LocationSite", name="atl1")
    await site.save(allow_upsert=True)

    num_devices = int(num_devices)

    device_batch = await client.create_batch()
    interface_batch = await client.create_batch()

    for i in range(num_devices):
        device = await client.create("InfraDevice", name=f"atl1-test{i}", site=site, type="testing")
        device_batch.add(task=device.save, node=device, allow_upsert=True)
        log.info(f"Added device {device.name.value}")
        
    async for node, result in device_batch.execute():
        print(f"device {node.name.value} was created in Infrahub succesfully")
        client.store.set(key=node.name.value, node=node)

    for i in range(num_devices):
        for j in range(100):
            interface = await client.create("InfraInterfaceL2", name=f"Ethernet{j}", l2_mode="Access", speed=10000, device=client.store.get(key=f"atl1-test{i}"))
            interface_batch.add(task=interface.save, node=interface, allow_upsert=True)
            log.info(f"  Added interface {interface.name.value} for device {interface.device.peer.name.value}")

    async for node, result in interface_batch.execute():
        print(f"interface {node.name.value} {node.device.peer.name.value} was created in Infrahub succesfully")
  • go to the branch detail page for the test branch
  • merge the branch
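
The last step above merges the branch through the web UI. To make the reproduction fully scriptable, the merge can also be triggered from a small infrahubctl script in the same style as the loader above. This is a sketch only: it assumes the SDK client exposes a branch.merge() coroutine taking branch_name; if that helper differs in your SDK version, the BranchMerge GraphQL mutation shown under Current Behavior is the underlying call either way.

# merge.py: run with "infrahubctl run merge.py"
# Sketch only: assumes client.branch.merge(branch_name=...) exists in this SDK version.
import logging

from infrahub_sdk import InfrahubClient


async def run(client: InfrahubClient, log: logging.Logger, branch: str) -> None:
    # Merge the "test" branch created above; this is the call that eventually fails
    # with "Unable to connect to the database" once Neo4j exhausts its heap.
    await client.branch.merge(branch_name="test")
    log.info("branch test merged")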

Additional Information

No response

@wvandeun wvandeun added type/bug Something isn't working as expected group/backend Issue related to the backend (API Server, Git Agent) labels Sep 24, 2024
@exalate-issue-sync exalate-issue-sync bot added priority/2 This issue stalls work on the project or its dependents, it's a blocker for a release and removed group/backend Issue related to the backend (API Server, Git Agent) labels Sep 24, 2024
@exalate-issue-sync exalate-issue-sync bot added this to the Infrahub - 0.16.2 milestone Sep 24, 2024
@exalate-issue-sync exalate-issue-sync bot added the state/planned This issue is planned to be worked on in an upcoming release. label Oct 1, 2024