Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling OpenTelemetry Results in High Network Traffic to Mysql RDS Database over port 3306 #7660

Open
tbruketa opened this issue Sep 20, 2024 · 0 comments

Comments

@tbruketa
Copy link

tbruketa commented Sep 20, 2024

Configuration:

  • ECS Fargate Deployment
    2GB, 1vCPU task
    Running datadog-agent as a sidecar, version LATEST

  • Java service details
    dd-trace-java version:1.39.0
    jre- openjdk17
    Service running quartz scheduled jobs, quartz DB running RDS mysql node.

Problem:
With otel enabled (-Ddd.trace.otel.enabled=True), the service started reporting malformed responses from the quartz db. To debug, I ECS exec'd into container and ran packet capture. The packet capture revealed that datadog trace headers were being sent over port 3306 every microsecond.

Excerpt from that capture:

No.	Time	Source	Destination	Protocol	Length	Info
770	2024-09-17 11:17:01.915264	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=277980 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
771	2024-09-17 11:17:01.915265	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=279428 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
772	2024-09-17 11:17:01.915266	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=280876 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
773	2024-09-17 11:17:01.915267	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=282324 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
774	2024-09-17 11:17:01.915268	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=283772 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
775	2024-09-17 11:17:01.915270	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=285220 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
776	2024-09-17 11:17:01.915271	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=286668 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
777	2024-09-17 11:17:01.915272	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=288116 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
778	2024-09-17 11:17:01.915273	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=289564 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
779	2024-09-17 11:17:01.915275	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=291012 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
780	2024-09-17 11:17:01.915276	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=292460 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
781	2024-09-17 11:17:01.915277	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=293908 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
782	2024-09-17 11:17:01.915278	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=295356 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
783	2024-09-17 11:17:01.915280	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [PSH, ACK] Seq=296804 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
784	2024-09-17 11:17:01.915292	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=298252 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
785	2024-09-17 11:17:01.915294	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=299700 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]
786	2024-09-17 11:17:01.915295	10.186.19.151	10.186.20.145	TCP	1514	60524 → 3306 [ACK] Seq=301148 Ack=2297 Win=443 Len=1448 TSval=1394847712 TSecr=3229442889 [TCP segment of a reassembled PDU]

Packet contents:

................L...........x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags...traceparent..
tracestate.7413813011058402355.7833730518518681419.1&_dd.p.dm=-1,_dd.p.tid=66eb0fc100000000700-66eb0fc10000000066e327a74b4e2433-6cb70066d6714f4b-018dd=s:1;p:6cb70066d6714f4b;t.dm:-1;t.tid:66eb0fc100000000SET autocommit=0...........X...........x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags...traceparent..
tracestate.3979447959892378912.8344628055981575419.1&_dd.p.dm=-1,_dd.p.tid=66eb0fc100000000700-66eb0fc1000000003739d6d5e389c520-73ce1301538a4cfb-018dd=s:1;p:15457d911075c71e;t.dm:-1;t.tid:66eb0fc100000000SET sql_select_limit=DEFAULT...........3....j...................x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags..
tracestate...x-datadog-trace-id...x-datadog-parent-id...x-datadog-sampling-priority...x-datadog-tags...traceparent..
tracestate.427945611997644003.6576361562975893455.0._dd.p.tid=66eb0e09000000000dd=s:0;p:5b43edff9bb5f7cf;t.tid:66eb0e0900000000.7191470921818366357.1798006128438116530.0._dd.p.tid=66eb0e1d000000000dd=s:0;p:18f3cd0c2d8780b2;t.tid:66eb0e1d00000000.4938570726832957048.2133814586677695057.1&_dd.p.dm=-1,_dd.p.tid=66eb0e31000000008dd=s:1;p:1d9cd50fa7789251;t.dm:-1;t.tid:66eb0e3100000000.9155331581490082968.3911665580619875894.1&_dd.p.dm=-1,_dd.p.tid=66eb0e81000000008dd=s:1;p:3649071ee4f84636;t.dm:-1;t.tid:66eb0e8100000000.1975753461226688178.3144906682718224722.1&_dd.p.dm=-1,_dd.p.tid=66eb0e95000000008dd=s:1;p:2ba4f3f72b2c0152;t.dm:-1;t.tid:66eb0e9500000000.5175087155385900368.4802731818572389321.1&_dd.p.dm=-1,_dd.p.tid=66eb0ea9000000008dd=s:1;p:42a6bb249d791bc9;t.dm:-1;t.tid:66eb0ea900000000.1533228886724140827.3566816573080556763.1&_dd.p.dm=-1,_dd.p.tid=66eb0ebd000000008dd=s:1;p:317fe0c6817154db;t.dm:-1;t.tid:66eb0ebd00000000.1624115546942736786.5178820832051581466.1&_dd.p.dm=-1,_dd.p.tid=66eb0ed1000000008dd=s:1;p:47dede1bb1d3d21a;t.dm:-1;t.tid:66eb0ed100000000.4993432826554890093.2400333781412563728.1&_dd.p.dm=-1,_dd.p.tid=66eb0ee5000000008dd=s:1;p:214fb2db4f774b10;t.dm:-1;t.tid:66eb0ee500000000.7866833069448814619.8677466693365174305.1&_dd.p.dm=-1,_dd.p.tid=66eb0ef9000000008dd=s:1;p:786c8dfb80509021;t.dm:-1;t.tid:66eb0ef900000000.8968719544231293593.5565443653474602534.1&_dd.p.dm=-1,_dd.p.tid=66eb0f0d000000008dd=s:1;p:4d3c6d840ece4e26;t.dm:-1;t.tid:66eb0f0d00000000.8415550386615210273.9203827178546084430.1&_dd.p.dm=-1,_dd.p.tid=66eb0f21000000008dd=s:1;p:7fba900e0f533a4e;t.dm:-1;t.tid:66eb0f2100000000.1682975621898768200.3829553622689324388.1&_dd.p.dm=-1,_dd.p.tid=66eb0f35000000008dd=s:1;p:35254ebae0b51d64;t.dm:-1;t.tid:66eb0f3500000000.9153844829678083355.1732589296857550282.1&_dd.p.dm=-1,_dd.p.tid=66eb0f35000000008dd=s:1;p:180b64c92cd68dca;t.dm:-1;t.tid:66eb0f3500000000.2013899293032892014.8803289733122481974.1&_dd.p.dm=-1,_dd.p.tid=66eb0f49000000008dd=s:1;p:7a2b915ed59bdf36;t.dm:-1;t.tid:66eb0f4900000000.3807943517730481606.2708123002834288482.1&_dd.p.dm=-1,_dd.p.tid=66eb0f5d000000008dd=s:1;p:25952f85d8007f62;t.dm:-1;t.tid:66eb0f5d00000000.6167940868843967854.5981020964776354031.1&_dd.p.dm=-1,_dd.p.tid=66eb0f71000000008dd=s:1;p:5300dae0efa8a0ef;t.dm:-1;t.tid:66eb0f7100000000.7165434073750645354.7393658531980571513.1&_dd.p.dm=-1,_dd.p.tid=66eb0f85000000008dd=s:1;p:669b8d42f53cf379;t.dm:-1;t.tid:66eb0f8500000000.2725717486794457750.6901007443369765802.0._dd.p.tid=66eb0f99000000000dd=s:0;p:5fc54db4e661abaa;t.tid:66eb0f9900000000.1184497231463834434.8596551529907830465.1&_dd.p.dm=-1,_dd.p.tid=66eb0fad000000008dd=s:1;p:774d1611e97f3ec1;t.dm:-1;t.tid:66eb0faad00000000.3979447959892378912.1532769310189405982.1&_dd.p.dm=-1,_dd.p.tid=66eb0fc100000000700-66eb0fc1000000003739d6d5e389c520-15457d911075c71e-018dd=s:1;p:15457d911075c71e;t.dm:-1;t.tid:66eb0fc100000000SELECT * FROM QRTZ_SCHEDULER_STATE WHERE SCHED_NAME = 'DatamartEtlServiceScheduler'.....\....def

Other than the malformed response, this also resulted in a 10 fold increase in network traffic and the service frequently pinning the cpu. There is also some evidence that this was causing nic saturation and causing problem with other tcp traffic; capture noted TCP Window Full and ZeroWindow'ing senders. The FullWindow/ZeroWindow and malformed packet were all coincident leading to the conclusion that this increased network load caused the fatal malformed response from RDS (service unable to get quartz jobs).

We turned off otel as we were not relying on too many metrics, but I am interested in if we misconfigured it or if there is, in fact, an issue. I would also liked to understand more about activity over port 3306; reserved port restricted for mysql. In general, is this context propagation or metrics retrieval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant