Aggregated Throughput Anomaly Detection #184
Conversation
Codecov Report
@@ Coverage Diff @@
## main #184 +/- ##
==========================================
- Coverage 66.54% 66.25% -0.30%
==========================================
Files 38 38
Lines 4783 5049 +266
==========================================
+ Hits 3183 3345 +162
- Misses 1453 1548 +95
- Partials 147 156 +9
		algoCalc,
		anomaly
	FROM tadetector WHERE id = (?);`,
	aggtadpodQuery: `
	SELECT
code formatter missing in .go files?
no, make fmt does cover the rest files:

tathgurt@tathgurtFLVDL theia % find . -type d -name '.cache' -prune -o -type f -name '*.go' -print | grep rest
./pkg/apiserver/registry/intelligence/throughputanomalydetector/rest.go
./pkg/apiserver/registry/intelligence/throughputanomalydetector/rest_test.go
./pkg/apiserver/registry/intelligence/networkpolicyrecommendation/rest.go
./pkg/apiserver/registry/intelligence/networkpolicyrecommendation/rest_test.go
./pkg/apiserver/registry/system/supportbundle/rest.go
./pkg/apiserver/registry/system/supportbundle/rest_test.go
./pkg/apiserver/registry/stats/clickhouse/rest.go
./pkg/apiserver/registry/stats/clickhouse/rest_test.go
/theia-test-e2e
	algoType String,
	algoCalc Float64,
	throughput Float64,
	anomaly String,
	id String
) engine=ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
-ORDER BY (flowStartSeconds);
+ORDER BY (flowEndSeconds);
Just out of curiosity, why do we change this from flowStartSeconds to flowEndSeconds?
That is because we are reusing the same table for the aggregated flow output, which will not have a flowStartSeconds column; the column common to both outputs is flowEndSeconds.
SELECT
	id,
	sourcePodNamespace,
	sourcePodLabels,
	flowEndSeconds,
	throughput,
I think it's the mix of spaces and tabs that makes the indentation look strange here. Could you unify them?
aggtadpodQuery
aggtadpod2podQuery
aggtadpod2svcQuery
Keep the camel-case naming to make them aggTadPodQuery, aggTadPod2PodQuery, aggTadPod2SvcQuery
)
}

func deleteTADid(cmd *cobra.Command, tadName string) error {
Suggested change:
-func deleteTADid(cmd *cobra.Command, tadName string) error {
+func deleteTADId(cmd *cobra.Command, tadName string) error {
case "e2e":
	fmt.Fprintf(w, "id\tsourceIP\tsourceTransportPort\tdestinationIP\tdestinationTransportPort\tflowStartSeconds\tflowEndSeconds\tthroughput\taggType\talgoType\talgoCalc\tanomaly\n")
	for _, p := range tad.Stats {
		fmt.Fprintf(w, "%v\t%v\t%v\t%v\t%v\t%v\t%v\t%v\t%v\t%v\t%v\t%v\n", p.Id, p.SourceIP, p.SourceTransportPort, p.DestinationIP, p.DestinationTransportPort, p.FlowStartSeconds, p.FlowEndSeconds, p.Throughput, p.AggType, p.AlgoType, p.AlgoCalc, p.Anomaly)
	}
I feel this is not quite readable. Is it possible to construct a matrix and reuse the TableOutput function here?
throughputAnomalyDetectionAlgoCmd.Flags().String(
	"agg-flow",
	"",
	`Specifies which aggregated flow to perform anomaly detection on, options are pods/pod2pod/pod2svc`,
)
throughputAnomalyDetectionAlgoCmd.Flags().String(
	"p2p-label",
	"",
	`On choosing agg-flow as pod2pod, user need to mention labels for inbound/outbound throughput`,
For the newly added functionality, could you include it in the documentation?
Thanks for the comment, I have added it in the other PR #192.
    f.col("new.anomaly").alias("anomaly"))

def filter_df_with_true_anomalies(spark, plotDF, algo_type, agg_flow=None):
    if agg_flow:
        plotDF = plotDF.withColumn(
I recommend merging the shared operations found in lines 317-343. It looks like the different branches only select different columns in 322-324 and 336-338.
 f.collect_list("flowEndSeconds").alias("flowEndSeconds"),
 f.stddev_samp("max(throughput)").alias("throughputStandardDeviation"),
-f.collect_list(f.struct(["Diff_Secs", "max(throughput)"])).alias(
-    "Diff_Secs, Throughput"))
+f.collect_list(f.struct(["max(throughput)"])).alias("Throughput"))
/theia-test-e2e
I think the main issue is that the implementation has deviated from our initial design. If you prefer, we can discuss this further offline.
	result = append(result, []string{p.Id, p.SourcePodNamespace, p.SourcePodLabels, p.DestinationPodNamespace, p.DestinationPodLabels, p.FlowEndSeconds, p.Throughput, p.AggType, p.AlgoType, p.AlgoCalc, p.Anomaly})
	}
case "pod_to_svc":
	result = append(result, []string{"id", "sourcePodNamespace", "sourcePodLabels", "destinationServicePortName", "flowEndSeconds", "throughput", "aggType", "algoType", "algoCalc", "anomaly"})
I don't think the 'pod_to_svc' type aligns with our design. Our goal was to monitor the aggregated throughput of all traffic directed to a Service, so the 'sourcePodNamespace' and 'sourcePodLabels' parameters are not relevant in this case.
	for _, p := range tad.Stats {
		result = append(result, []string{p.Id, p.SourceIP, p.SourceTransportPort, p.DestinationIP, p.DestinationTransportPort, p.FlowStartSeconds, p.FlowEndSeconds, p.Throughput, p.AggType, p.AlgoType, p.AlgoCalc, p.Anomaly})
	}
case "pod_to_external":
Same concern for the 'pod_to_external' type. I think our goal was to monitor the aggregated throughput of all traffic directed to an external IP.
	for _, p := range tad.Stats {
		result = append(result, []string{p.Id, p.SourcePodNamespace, p.SourcePodLabels, p.FlowEndSeconds, p.Throughput, p.AggType, p.AlgoType, p.AlgoCalc, p.Anomaly})
	}
case "pod_to_pod":
"pod_to_pod" seems a fair use case here; just want to point out that our initial goal was to monitor the aggregated inbound/outbound throughput of a set of specific pod labels.
	throughputAnomalyDetection.AggregatedFlow = aggregatedFlow
	throughputAnomalyDetection.ExternalIP = externalIp
case "svc":
	throughputAnomalyDetection.AggregatedFlow = aggregatedFlow
Why do we require users to input Pod labels and an IP for the "pod" and "external" cases, but not require a service name for the "svc" case?
I feel Pod labels and external IP should be optional inputs: if the user doesn't provide this info, we consider all possible pod labels and external IPs, like what we have done in the "svc" case.
Sure, I can add them as optional parameters. I just wanted to confirm: if the user doesn't provide labels or an external IP, should we check whether the corresponding columns are empty, or should we even include those columns in the DF?
For example, if the user chooses the "external" agg type and doesn't provide any input IP, we will consider all toExternal flows, group by the destinationIP, aggregate the throughput for each destinationIP, and find anomalies if any.
Hope this example answered your question.
For the "external" case it may work, as we have a flow type to tell whether the traffic is external; but there could be pods with no labels as well as pods with any type of label. Should we add both of them? They would each need a different kind of query.
I think we should handle both the empty and non-empty cases for labels. We can also add the service name as an argument so the user can provide a specific service; but for the service agg_type, we should only consider the non-empty cases.
I see, we can consider only the non-empty case for now if the empty case needs big changes.
Maybe you can add another parameter "podname" in the pod agg case; it would help cover the cases of pods with no labels.
Currently, we should not consider the empty cases, as we look for pods based on labels; if there are no labels, it would be a little misleading to still collect them based on their names.
What I meant was to have a user input parameter 'podname' that would cover cases where the user knows the pods they're interested in don't have labels.
Just a nice-to-have.
@@ -198,3 +198,17 @@ ALTER TABLE flows
 ALTER TABLE flows_local
 DROP COLUMN egressName,
 DROP COLUMN egressIP;
+ALTER TABLE tadetector
+DROP COLUMN podNamespace;
It should be a comma instead of a semicolon between the DROP COLUMNs; please check my previous comment: #184 (comment)
@@ -137,3 +137,17 @@ ALTER TABLE flows
 ALTER TABLE flows_local
 ADD COLUMN egressName String,
 ADD COLUMN egressIP String;
+ALTER TABLE tadetector
+ADD COLUMN podNamespace;
Same as above: use a comma and add the data type in the ADD COLUMNs; please check my previous comment: #184 (comment)
/theia-test-e2e
Signed-off-by: Tushar Tathgur <[email protected]>
This PR does the following: