FRR 10.0.1 route maps filtering prefixes that should be sent #16367

fedepaol · 2024-07-11T15:14:45Z

Description

This is happening in metallb/frr-k8s#166 , where I am bumping the version used in frr-k8s from 9.1 to 10.0.1. The FRR-K8s CI is quite stable, while after the bump I am seeing a few flakes.

172.18.0.2 tries to announce 192.168.0.0/24 and 172.18.0.5 is not receiving it

I am going to attach the full configuration, but adding here the relevant snippets:

Neighbor route map:

address-family ipv4 unicast
  network 192.168.0.0/24
  network 192.168.1.0/24
  network 192.168.2.0/24
  network 192.168.3.0/24
  neighbor 172.18.0.5 activate
  neighbor 172.18.0.5 route-map 172.18.0.5-in in
  neighbor 172.18.0.5 route-map 172.18.0.5-out out

Prefix lists:

ip prefix-list 172.18.0.5-100-ipv4-localpref-prefixes seq 1 permit 192.168.0.0/24
ip prefix-list 172.18.0.5-100-ipv4-localpref-prefixes seq 2 permit 192.168.1.0/24
ip prefix-list 172.18.0.5-pl-ipv4 seq 1 permit 192.168.0.0/24
ip prefix-list 172.18.0.5-pl-ipv4 seq 2 permit 192.168.1.0/24
ip prefix-list 172.18.0.5-pl-ipv4 seq 3 permit 192.168.2.0/24
ip prefix-list 172.18.0.5-pl-ipv4 seq 4 permit 192.168.3.0/24
ip prefix-list 172.18.0.5-150-ipv4-localpref-prefixes seq 1 permit 192.168.2.0/24
ip prefix-list 172.18.0.5-inpl-ipv4 seq 1 deny any
ip prefix-list 172.18.0.5-denied-inpl-ipv4 seq 1 permit 192.167.9.0/24 le 32

Route maps:


route-map 172.18.0.5-out permit 1
 match ip address prefix-list 172.18.0.5-100-ipv4-localpref-prefixes
 on-match next
 set local-preference 100
exit
!
route-map 172.18.0.5-out permit 2
 match ip address prefix-list 172.18.0.5-100-ipv4-localpref-prefixes
 on-match next
 set local-preference 100
exit
!
route-map 172.18.0.5-out permit 3
 match ip address prefix-list 172.18.0.5-150-ipv4-localpref-prefixes
 on-match next
 set local-preference 150
exit
!
route-map 172.18.0.5-out permit 4
 match ip address prefix-list 172.18.0.5-pl-ipv4
exit
!
route-map 172.18.0.5-out permit 5
 match ipv6 address prefix-list 172.18.0.5-pl-ipv4
exit
!

Show bgp ipv4 on 172.18.0.2

Instance default:
BGP table version is 4, local router ID is 172.18.0.2, vrf id 0
Default local pref 100, local AS 64512
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *> 192.168.0.0/24   0.0.0.0                  0         32768 i
 *> 192.168.1.0/24   0.0.0.0                  0         32768 i
 *> 192.168.2.0/24   0.0.0.0                  0         32768 i
 *> 192.168.3.0/24   0.0.0.0                  0         32768 i

Show bgp ipv4 on 172.18.0.5:

Dumping show bgp ip for ebgp-single-hop, local addresses: ipv4 - 172.18.0.5, ipv6 - fc00:f853:ccd:e793::5

Instance default:
BGP table version is 20, local router ID is 172.18.0.5, vrf id 0
Default local pref 100, local AS 4200000000
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *= 192.168.0.0/24   172.18.0.4               0             0 64512 i
 *>                  172.18.0.3               0             0 64512 i
 *= 192.168.1.0/24   172.18.0.4               0             0 64512 i
 *>                  172.18.0.3               0             0 64512 i
 *= 192.168.2.0/24   172.18.0.4               0             0 64512 i
 *>                  172.18.0.3               0             0 64512 i
 *= 192.168.3.0/24   172.18.0.4               0             0 64512 i
 *>                  172.18.0.3               0             0 64512 i

(as we can see some other instances were able to push the routes.

The logs say the prefix is being filtered by the route-map:


2024/07/11 11:05:26.147 BGP: [WJC21-WPGZF] u171:s171 announcing routes to 172.18.0.5, combined into 1 peers
2024/07/11 11:05:26.147 BGP: [T5JFA-13199] subgroup_process_announce_selected: p=192.168.0.0/24, selected=0x7fb35fffd0b0
2024/07/11 11:05:26.147 BGP: [VERY4-P6JC8] 172.18.0.5(Unknown) [Update:SEND] 192.168.0.0/24 is filtered by route-map '172.18.0.5-out'
2024/07/11 11:05:26.147 BGP: [T5JFA-13199] subgroup_process_announce_selected: p=192.168.1.0/24, selected=0x7fb35fffd1f0
2024/07/11 11:05:26.147 BGP: [VERY4-P6JC8] 172.18.0.5(Unknown) [Update:SEND] 192.168.1.0/24 is filtered by route-map '172.18.0.5-out'
2024/07/11 11:05:26.147 BGP: [T5JFA-13199] subgroup_process_announce_selected: p=192.168.2.0/24, selected=0x7fb35fffd330
2024/07/11 11:05:26.147 BGP: [VERY4-P6JC8] 172.18.0.5(Unknown) [Update:SEND] 192.168.2.0/24 is filtered by route-map '172.18.0.5-out'
2024/07/11 11:05:26.147 BGP: [T5JFA-13199] subgroup_process_announce_selected: p=192.168.3.0/24, selected=0x7fb35fffd470
2024/07/11 11:05:26.147 BGP: [VERY4-P6JC8] 172.18.0.5(Unknown) [Update:SEND] 192.168.3.0/24 is filtered by route-map '172.18.0.5-out'
2024/07/11 11:05:26.149 BGP: [WJC21-WPGZF] u174:s174 announcing routes to 172.30.0.3, combined into 1 peers

Version

I am experiencing this using the images quay.io/frrouting/frr:10.0.1 and quay.io/frrouting/frr:master

How to reproduce

It's a flake, the same test sometimes passes and some other time it does not. I have the feeling that it is associated to prefixlist with community / local preferences. On the other hand, it happens often enough to be able to provide the right dump commands after a test fails.

Expected behavior

The neighbor should receive the prefixes

Actual behavior

The instance running on the cluster is not advertising the prefixes.

Additional context

No response

Checklist

I have searched the open issues for this bug.
I have not included sensitive information in this report.

The text was updated successfully, but these errors were encountered:

fedepaol · 2024-07-11T15:16:20Z

Full running config of 172.18.0.5 (the receiver)

####### Show running config
Building configuration...

Current configuration:
!
frr version 10.0.1_git
frr defaults traditional
hostname cd1e47ac22ac
log file /tmp/frr.log
log timestamp precision 3
!
debug zebra nht
debug bgp neighbor-events
debug bgp nht
debug bgp updates in
debug bgp updates out
debug bfd peer
!
password zebra
!
router bgp 4200000000
 bgp router-id 172.18.0.5
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 neighbor 172.18.0.2 remote-as 64512
 neighbor 172.18.0.3 remote-as 64512
 neighbor 172.18.0.4 remote-as 64512
 !
 address-family ipv4 unicast
  neighbor 172.18.0.2 activate
  neighbor 172.18.0.2 next-hop-self
  neighbor 172.18.0.3 activate
  neighbor 172.18.0.3 next-hop-self
  neighbor 172.18.0.4 activate
  neighbor 172.18.0.4 next-hop-self
 exit-address-family
exit
!
route-map RMAP permit 10
 set ipv6 next-hop prefer-global
exit
!
end

fedepaol · 2024-07-11T15:26:30Z

Running config for the instance running on the cluster

runningconfig.txt

fedepaol · 2024-07-11T15:27:05Z

logs of the instance running on the cluster
frrlogs.txt

ton31337 · 2024-07-12T13:21:08Z

route-map 172.18.0.5-out permit 1
 match ip address prefix-list 172.18.0.5-100-ipv4-localpref-prefixes
 on-match next
 set local-preference 100
exit
!
route-map 172.18.0.5-out permit 2
 match ip address prefix-list 172.18.0.5-100-ipv4-localpref-prefixes
 on-match next
 set local-preference 100
exit

Two identical match/set with different sequence numbers?

Btw, could you please enable debug route-map detail? It would at least tell what's going on a bit more.

Also, I'm trying to understand the use case here with on-match next. Could you drop it and test without on-match next, because you don't use it at all (as it should be).

fedepaol · 2024-07-12T13:48:38Z

route-map 172.18.0.5-out permit 1
 match ip address prefix-list 172.18.0.5-100-ipv4-localpref-prefixes
 on-match next
 set local-preference 100
exit
!
route-map 172.18.0.5-out permit 2
 match ip address prefix-list 172.18.0.5-100-ipv4-localpref-prefixes
 on-match next
 set local-preference 100
exit

Two identical match/set with different sequence numbers?

Ugh, I guess we can remove them

Btw, could you please enable debug route-map detail? It would at least tell what's going on a bit more.

Yup, will do and re-run

Also, I'm trying to understand the use case here with on-match next. Could you drop it and test without on-match next, because you don't use it at all (as it should be).

The idea (maybe I am mis-using it) is to apply the local pref and then validate the ip later on to the following maps. I need it in case I want also to apply a community to the same prefixes

fedepaol · 2024-07-15T12:39:15Z

@ton31337 here we go:

2024/07/12 15:53:58.328 BGP: [T5JFA-13199] subgroup_process_announce_selected: p=192.168.0.0/24, selected=0x7f56a3a81330
2024/07/12 15:53:58.328 BGP: [RYF1Z-ZKDRS] route_match_address_prefix_list: Prefix List 172.30.0.2-100-ipv4-localpref-prefixes specified does not exist defaulting to NO_MATCH
2024/07/12 15:53:58.328 BGP: [RYF1Z-ZKDRS] route_match_address_prefix_list: Prefix List 172.30.0.2-10:100-ipv4-community-prefixes specified does not exist defaulting to NO_MATCH
2024/07/12 15:53:58.328 BGP: [RYF1Z-ZKDRS] route_match_address_prefix_list: Prefix List 172.30.0.2-150-ipv4-localpref-prefixes specified does not exist defaulting to NO_MATCH
2024/07/12 15:53:58.328 BGP: [RYF1Z-ZKDRS] route_match_address_prefix_list: Prefix List 172.30.0.2-10:101-ipv4-community-prefixes specified does not exist defaulting to NO_MATCH
2024/07/12 15:53:58.328 BGP: [RYF1Z-ZKDRS] route_match_address_prefix_list: Prefix List 172.30.0.2-pl-ipv4 specified does not exist defaulting to NO_MATCH
2024/07/12 15:53:58.328 BGP: [KZK58-6T4Y6] No best match sequence for pfx: 192.168.0.0/24 in route-map: 172.30.0.2-out, result: no match

Show running config and full log attached :
showrunning.txt.gz

fulllog.txt.gz

fedepaol · 2024-07-15T12:40:13Z

I checked only the first, seem to be there, so I'd say it's not related to the redundant match (although it makes sense to remove it).

ton31337 · 2024-07-16T14:28:23Z

Is this happening during frr-reload, on boot, container restart?

fedepaol · 2024-07-16T14:42:52Z

Definitely during reload. It's a battery of e2e tests, where on each test we run a different configuration and do some assertions.

ton31337 · 2024-07-20T08:43:32Z

Could it be possible to have a full reload log? frr-reload.py --debug --reload /etc/frr/frr.conf. I'd like to understand what actually triggers this flake...

fedepaol · 2024-07-25T14:06:10Z

Here's another run:


This is the neighbor complaining about missing routes (172.18.0.5) 

Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *>i192.168.2.0/24   172.18.0.2               0    100      0 i
 *=i                 172.18.0.4               0    100      0 i

Displayed 1 routes and 2 total paths




This is 172.18.0.3:


Instance default:
BGP table version is 2, local router ID is 172.18.0.3, vrf id 0
Default local pref 100, local AS 64512
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *> 192.168.2.0/24   0.0.0.0                  0         32768 i
 *> 192.169.2.0/24   0.0.0.0                  0         32768 i

Displayed 2 routes and 2 total paths




ip prefix-list 172.18.0.5-pl-ipv4 seq 1 permit 192.168.2.0/24

Still have the filter

2024/07/23 15:29:13.553 BGP: [VERY4-P6JC8] 172.18.0.5(Unknown) [Update:SEND] 192.168.2.0/24 is filtered by route-map '172.18.0.5-out'

fedepaol · 2024-07-25T14:08:31Z

Last reload event:
lastreload.txt.gz

fedepaol · 2024-07-25T14:13:02Z

Sequence of reloads in the last 10 minutes
reload.txt.gz

Every reload sequence starts with
Caught SIGHUP and acquired lock! Reloading FRR.. and ends with FRR reloaded successfully! 2 seconds

Every cycle, we do a frr-reload.py --test first and a frr-reload.py --overwrite later.

fedepaol · 2024-07-26T09:29:44Z

frr.zip

fedepaol · 2024-07-26T09:31:12Z

As discussed on slack, I attached the logs of the reloader and the generated configuration when running with 9.1 (the generated config is the same regardless).

ton31337 · 2024-07-31T10:45:38Z

First, this is the wrong sequence of operations:

ip prefix-list 172.30.0.2-pl-ipv4 seq 1 permit 192.168.2.0/24
ip prefix-list 172.30.0.2-pl-ipv4 seq 2 permit 192.169.2.0/24
route-map 172.30.0.2-out permit 1
  match ip address prefix-list 172.30.0.2-pl-ipv4
route-map 172.30.0.2-out permit 2
  match ipv6 address prefix-list 172.30.0.2-pl-ipv4

ipv6 prefix-list 172.30.0.2-pl-ipv4 seq 3 deny any <<<<<<<<<<<<<<<<<

ipv6 prefix-list ... should be created before route-map statements to avoid noisy logs like below.

Looking at the logs you provided, it's not clear which AFI triggered this output (I assume IPv6):

2024/07/12 15:53:58.328 BGP: [RYF1Z-ZKDRS] route_match_address_prefix_list: Prefix List 172.30.0.2-10:100-ipv4-community-prefixes specified does not exist defaulting to NO_MATCH

Could you move ipv6 prefix-list ... creation above route-maps and re-test. I'd like to get rid of any false positives.

Also, could you enable log commands to see what actual commands were executed and in which order.

fedepaol · 2024-09-13T10:07:47Z

Sorry for the extreme delay @ton31337 , here I am with the additional information:

Here we have 172.18.0.5 not getting routes from 172.18.0.2.

From the logs the usual:

2024/09/12 16:01:41.695 BGP: [WJC21-WPGZF] u109:s109 announcing routes to 172.18.0.5, combined into 1 peers
2024/09/12 16:01:41.695 BGP: [T5JFA-13199] subgroup_process_announce_selected: p=192.168.2.0/24, selected=0x7fb5d00e47d0
2024/09/12 16:01:41.695 BGP: [KZK58-6T4Y6] No best match sequence for pfx: 192.168.2.0/24 in route-map: 172.18.0.5-out, result: no match
2024/09/12 16:01:41.695 BGP: [H5AW4-JFYQC] Route-map: 172.18.0.5-out, prefix: 192.168.2.0/24, result: deny
2024/09/12 16:01:41.695 BGP: [VERY4-P6JC8] 172.18.0.5(Unknown) [Update:SEND] 192.168.2.0/24 is filtered by route-map '172.18.0.5-out'

I applied the requested changes (define the prefix lists before the maps and adding log commands in the configuration):

Attaching a tgz with:

the frr log
the configuration file
the output of show running config
the logs of the last 3 reloads

Also, note that in a few reloads I am seeing

BGP is already running; AS is 64512
line 684: Failure to communicate[13] to bgpd, line: router bgp 64513 vrf red

line 685: Warning[4]...: early exit from config file
[2101|bgpd] Configuration file[/etc/frr/frr.conf] processing failure: 13
[2094|mgmtd] done
2024-09-12 15:57:51,557 �[91mWARNING�[0m: frr-reload.py failed due to
vtysh (exec file) exited with status 13

Not sure it's related or not. I am going to attach also a file with more reload events (reload_full.txt) which contains them too.

logs.zip

fedepaol added the triage Needs further investigation label Jul 11, 2024

ton31337 added the bgp label Jul 11, 2024

ton31337 self-assigned this Jul 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FRR 10.0.1 route maps filtering prefixes that should be sent #16367

FRR 10.0.1 route maps filtering prefixes that should be sent #16367

fedepaol commented Jul 11, 2024

fedepaol commented Jul 11, 2024

fedepaol commented Jul 11, 2024

fedepaol commented Jul 11, 2024

ton31337 commented Jul 12, 2024 •

edited

Loading

fedepaol commented Jul 12, 2024 •

edited

Loading

fedepaol commented Jul 15, 2024

fedepaol commented Jul 15, 2024

ton31337 commented Jul 16, 2024

fedepaol commented Jul 16, 2024

ton31337 commented Jul 20, 2024

fedepaol commented Jul 25, 2024

fedepaol commented Jul 25, 2024

fedepaol commented Jul 25, 2024

fedepaol commented Jul 26, 2024

fedepaol commented Jul 26, 2024

ton31337 commented Jul 31, 2024 •

edited

Loading

fedepaol commented Sep 13, 2024

FRR 10.0.1 route maps filtering prefixes that should be sent #16367

FRR 10.0.1 route maps filtering prefixes that should be sent #16367

Comments

fedepaol commented Jul 11, 2024

Description

Version

How to reproduce

Expected behavior

Actual behavior

Additional context

Checklist

fedepaol commented Jul 11, 2024

fedepaol commented Jul 11, 2024

fedepaol commented Jul 11, 2024

ton31337 commented Jul 12, 2024 • edited Loading

fedepaol commented Jul 12, 2024 • edited Loading

fedepaol commented Jul 15, 2024

fedepaol commented Jul 15, 2024

ton31337 commented Jul 16, 2024

fedepaol commented Jul 16, 2024

ton31337 commented Jul 20, 2024

fedepaol commented Jul 25, 2024

fedepaol commented Jul 25, 2024

fedepaol commented Jul 25, 2024

fedepaol commented Jul 26, 2024

fedepaol commented Jul 26, 2024

ton31337 commented Jul 31, 2024 • edited Loading

fedepaol commented Sep 13, 2024

ton31337 commented Jul 12, 2024 •

edited

Loading

fedepaol commented Jul 12, 2024 •

edited

Loading

ton31337 commented Jul 31, 2024 •

edited

Loading