-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: OSPFD encapsulation #43
Open
xspbrx
wants to merge
1,357
commits into
ospf-unnumbered
Choose a base branch
from
master
base: ospf-unnumbered
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
IPv6 multipath is broken in BGP if nexthop contains only global address. IPv6 always uses both nextop IPv6 address and ifIndex in sending routes down to zebra. In cases where only the global IPv6 address is present in the nexthop information, the existing code doesn't set the ifIndex. An example of such a case is when a route-map isused with "set ipv6 next-hop" and only global address is specified. This code causes the ifIndex to be determined and set thereby fixing the multipath programming. Signed-off-by: Dinesh G Dutt <[email protected]> Reviewed-by: Shrijeet Mukherjee <[email protected]>
0. Introduction This is the design specification for next hop tracking feature in Quagga. 1. Background Recursive routes are of the form: p/m --> n [Ex: 1.1.0.0/16 --> 2.2.2.2] where 'n' itself is resolved through another route as follows: p2/m --> h, interface [Ex: 2.2.2.0/24 --> 3.3.3.3, eth0] Usually, BGP routes are recursive in nature and BGP nexthops get resolved through an IGP route. IGP usually adds its routes pointing to an interface (these are called non-recursive routes). When BGP receives a recursive route from a peer, it needs to validate the nexthop. The path is marked valid or invalid based on the reachability status of the nexthop. Nexthop validation is also important for BGP decision process as the metric to reach the nexthop is a parameter to best path selection process. As it goes with routing, this is a dynamic process. Route to the nexthop can change. The nexthop can become unreachable or reachable. In the current BGP implementation, the nexthop validation is done periodically in the scanner run. The default scanner run interval is one minute. Every minute, the scanner task walks the entire BGP table. It checks the validity of each nexthop with Zebra (the routing table manager) through a request and response message exchange between BGP and Zebra process. BGP process is blocked for that duration. The mechanism has two major drawbacks: (1) The scanner task runs to completion. That can potentially starve the other tasks for long periods of time, based on the BGP table size and number of nexthops. (2) Convergence around routing changes that affect the nexthops can be long (around a minute with the default intervals). The interval can be shortened to achieve faster reaction time, but it makes the first problem worse, with the scanner task consuming most of the CPU resources. "Next hop tracking" feature makes this process event-driven. It eliminates periodic nexthop validation and introduces an asynchronous communication path between BGP and Zebra for route change notifications that can then be acted upon. 2. Goal Stating the obvious, the main goal is to remove the two limitations we discussed in the previous section. The goals, in a constructive tone, are the following: - fairness: the scanner run should not consume an unjustly high amount of CPU time. This should give an overall good performance and response time to other events (route changes, session events, IO/user interface). - convergence: BGP must react to nexthop changes instantly and provide sub-second convergence. This may involve diverting the routes from one nexthop to another. 3. Overview of the changes The changes are in both BGP and Zebra modules. The short summary is the following: - Zebra implements a registration mechanism by which clients can register for next hop notification. Consequently, it maintains a separate table, per (VRF, AF) pair, of next hops and interested client-list per next hop. - When the main routing table changes in Zebra, it evaluates the next hop table: for each next hop, it checks if the route table modifications have changed its state. If so, it notifies the interested clients. - BGP is one such client. It registers the next hops corresponding to all of its received routes/paths. It also threads the paths against each nexthop structure. - When BGP receives a next hop notification from Zebra, it walks the corresponding path list. It makes them valid or invalid depending on the next hop notification. It then re-computes best path for the corresponding destination. This may result in re-announcing those destinations to peers. 4. Design 4.1. Modules The core design introduces an "nht" (next hop tracking) module in BGP and "rnh" (recursive nexthop) module in Zebra. The "nht" module provides the following APIs: bgp_find_or_add_nexthop() : find or add a nexthop in BGP nexthop table bgp_find_nexthop() : find a nexthop in BGP nexthop table bgp_parse_nexthop_update() : parse a nexthop update message coming from zebra The "rnh" module provides the following APIs: zebra_add_rnh() : add a recursive nexthop zebra_delete_rnh() : delete a recursive nexthop zebra_lookup_rnh() : lookup a recursive nexthop zebra_add_rnh_client() : register a client for nexthop notifications against a recursive nexthop zebra_remove_rnh_client(): remove the client registration for a recursive nexthop zebra_evaluate_rnh_table(): (re)evaluate the recursive nexthop table (most probably because the main routing table has changed). zebra_cleanup_rnh_client(): Cleanup a client from the "rnh" module data structures (most probably because the client is going away). 4.2. Control flow The next hop registration control flow is the following: <==== BGP Process ====>|<==== Zebra Process ====> | receive module nht module | zserv module rnh module ---------------------------------------------------------------------- | | | bgp_update_ | | | main() | bgp_find_or_add_ | | | nexthop() | | | | | | | zserv_nexthop_ | | | register() | | | | zebra_add_rnh() | | | The next hop notification control flow is the following: <==== Zebra Process ====>|<==== BGP Process ====> | rib module rnh module | zebra module nht module ---------------------------------------------------------------------- | | | meta_queue_ | | | process() | zebra_evaluate_ | | | rnh_table() | | | | | | | bgp_read_nexthop_ | | | update() | | | | bgp_parse_ | | | nexthop_update() | | | 4.3. zclient message format ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are encoded in the following way: /* * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | AF | prefix len | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * . Nexthop prefix . * . . * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * . . * . . * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | AF | prefix len | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * . Nexthop prefix . * . . * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ ZEBRA_NEXTHOP_UPDATE message is encoded as follows: /* * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | AF | prefix len | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * . Nexthop prefix getting resolved . * . . * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | metric | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | #nexthops | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | nexthop type | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * . resolving Nexthop details . * . . * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * . . * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | nexthop type | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * . resolving Nexthop details . * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ 4.4. BGP data structure Legend: /\ struct bgp_node: a BGP destination/route/prefix \/ [ ] struct bgp_info: a BGP path (e.g. route received from a peer) _ (_) struct bgp_nexthop_cache: a BGP nexthop /\ NULL \/--+ ^ | : +--[ ]--[ ]--[ ]--> NULL /\ : \/--+ : | : +--[ ]--[ ]--> NULL : _ : (_)............. 4.5. Zebra data structure rnh table: O / \ O O / \ O O struct rnh { u_char flags; struct rib *state; struct list *client_list; struct route_node *node; }; 5. User interface changes quagga# show ip nht 3.3.3.3 resolved via kernel via 11.0.0.6, swp1 Client list: bgp(fd 12) 11.0.0.10 resolved via connected is directly connected, swp2 Client list: bgp(fd 12) 11.0.0.18 resolved via connected is directly connected, swp4 Client list: bgp(fd 12) 11.11.11.11 resolved via kernel via 10.0.1.2, eth0 Client list: bgp(fd 12) quagga# show ip bgp nexthop Current BGP nexthop cache: 3.3.3.3 valid [IGP metric 0], #paths 3 Last update: Wed Oct 16 04:43:49 2013 11.0.0.10 valid [IGP metric 1], #paths 1 Last update: Wed Oct 16 04:43:51 2013 11.0.0.18 valid [IGP metric 1], #paths 2 Last update: Wed Oct 16 04:43:47 2013 11.11.11.11 valid [IGP metric 0], #paths 1 Last update: Wed Oct 16 04:43:47 2013 quagga# show ipv6 nht quagga# show ip bgp nexthop detail quagga# debug bgp nht quagga# debug zebra nht 6. Sample test cases r2----r3 / \ / r1----r4 - Verify that a change in IGP cost triggers NHT + shutdown the r1-r4 and r2-r4 links + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back up + We should be back to the original nexthop via r4 now - Verify that a NH becoming unreachable triggers NHT + Shutdown all links to r4 - Verify that a NH becoming reachable triggers NHT + no shut all links to r4 7. Future work - route-policy for next hop validation (e.g. ignore default route) - damping for rapid next hop changes - prioritized handling of nexthop changes ((un)reachability vs. metric changes) - handling recursion loop, e.g. 11.11.11.11/32 -> 12.12.12.12 12.12.12.12/32 -> 11.11.11.11 11.0.0.0/8 -> <interface> - better statistics Addresses upstream comments. "show ip bgp nexthop detail" couldn't display multiple NHs due to a bug. Fix that. Fix reference counts for the nexthop cache entries Signed-off-by: Pradosh Mohapatra <[email protected]> Signed-off-by: Daniel Walton <[email protected]> Signed-off-by: Dinesh Dutt <[email protected]> Signed-off-by: Donald Sharp <[email protected]> Signed-off-by: Vivek Venkatraman <[email protected]> Fix reference counts for the nexthop cache entries. Signed-off-by: Vivek Venkatraman <[email protected]> Edited-by: Paul Jakma <[email protected]> - Fix nexthop_ipv6_add defs in rib.h not having been modified with rib_ prefix. - Remove rib_lookup_and_pushup, appears not to be used except for !HAVE_NETLINK && HAVE_STRUCT_IFALIASREQ case of ioctl.c::if_set_prefix, so it's not being used at all on platform with most testing of RIB.
…ddresses are removed. When you flap an interface repeatedly, you can get into situations where the code has not quite finished cleaning up before the next event happens. Gracefully prevent a NULL dereference. Signed-off-by: Pradosh Mohapatra <[email protected]>
In the absence of this patch, attempting to type "address-family ipv6 unicast" would result in an "Ambiguous command" error and in the case of "address-family ipv6 multicast", the command would silently fail, without the prompt dropping into the address-family mode. The cause is how the parse tree is constructed for ipv6 address family. There was an error in extract.pl.in script and in vtysh.c files which assumed that there was only address family ipv6 command, without unicast or multicast and so the command was failing. Signed-off-by: vivek <[email protected]>
'outq' field in 'show ip bgp sum' displays the number of formatted packets to a peer. Since the route announcement follows an input-buffered pattern (i.e. adj-rib-out is a separate queue of routes per peer and packets are formatted from the routes at the time of TCP write), the outq field doesn't show any interesting data worth watching. The patch is to display the adj-rib-out queue depth instead. signed-off-by: [email protected] reviewed-by: [email protected]
This patch adds support for matching on local preference in BGP route-map. Signed-off-by: Dinesh Dutt <[email protected]>
…reflectors. By default, attribute modification via route-map policy out is ignored on reflected routes. This patch provides an option to allow this modification to occur. Once enabled, it affects all reflected routes. Signed-off-by: Dinesh G Dutt <[email protected]>
Do not allow a program outside Quagga to delete a Quagga route from the kernel. To delete a Quagga route, do it inside Quagga. Signed-off-by: James Li <[email protected]>
Allow the user to actually turn off the set metric command in ospf6d and rip. Signed-off-by: Daniel Walton <[email protected]>
Signed-off-by: Daniel Walton <[email protected]>
Signed-off-by: Vipin Kumar <[email protected]> Reviewed-by: Dinesh Dutt <[email protected]> Reviewed-by: Daniel Walton <[email protected]>
* bgp_aspath.c: (aspath_prepend) aspath_delete_confed_seq may result in as2 being updated, and seg2 becoming invalid. E.g. if the first segment of of as2 is confeds. However, code there after unconditionally reads from seg2. Reset seg2, and re-do the empty check on it. Caught by valgrinding tools/aspathtest.
* bgp_aspath.h: Add BGP_AS_IS_PRIVATE macro. * bgp_aspath.c: (aspath_highest) use said macro to also ensure 4-byte private AS range is ignored in calculating highest public ASN. (aspath_private_as_check) consolidate to use said macro. Note: Extracted from 'bgpd: Add replace-as option to remove-private-as' by [email protected].
AS_PATH comparison is broken if CONFED_AS_SEQ are present. This patch fixes this issue Signed-off-by: Daniel Walton <[email protected]>
There were various failures in ANVL's aggregation tests, this patch fixes those issues found Signed-off-by: Daniel Walton <[email protected]>
…rtisement of a prefix Add these commands to bgp: clear ip bgp prefix A.B.C.D/M clear bgp ipv6 (unicast|multicast) prefix X:X::X:X/M These two commands forces a bestpath calculation to happen again if necessary to re-advertise the prefix. Signed-off-by: Daniel Walton <[email protected]>
When showing a prefix in bgp allow user to specify output based upon the bestpath chosen, multipath information of all information about a prefix(the default) Signed-off-by: Daniel Walton <[email protected]>
Allow the user to enter the 'clear ip ospf interface [IFNAME]' command this resets the connection between ospf and any peers out the specified interface. Signed-off-by: Vipin Kumar <[email protected]>
This is a fix to make sure router-LSA is updated when neighbor's interface ID change is received in hello packet. Signed-off-by: Vipin Kumar <[email protected]>
Clear interface commands for ospfv3. Allow the user to clear all peers out the specified interface. Signed-off-by: Vipin Kumar <[email protected]>
SYMPTOM: If some of the ospfv3 commands like 'show ipv6 ospf6 route' are executed with ospf6d daemon running but before having any ospfv3 configuration, then ospf6d crash is seen. ISSUE: There are a few show commands, which are (unlike others) not checking if ospf6 instance is initialized already. FIX: Add the missing checks, by using OSPF6_CMD_CHECK_RUNNING() in the commands where its needed and not yet used. Signed-off-by: Vipin Kumar <[email protected]> Reviewed-by: Pradosh Mohapatra <[email protected]> Dinesh Dutt <[email protected]>
When turning off redistribution in ospf6, allow the user to specify the full form of the command entered. Signed-off-by: Daniel Walton <[email protected]>
If a BGP path has an unreachable nexthop display that path as invalid Signed-off-by: Daniel Walton <[email protected]>
ISSUE: LSAcks (for directed acks) are being sent to neighbor's unicast address. RFC 2328 says: "The IP destination address for the packet is selected as follows. On physical point-to-point networks, the IP destination is always set to the address AllSPFRouters" Fix is to unconditionally set the destination address for LSAcks over point-to-point links as AllSPFRouters. Quagga OSPF already has similar change for OSPF DBD, LSUpdate and LSrequest packets. Signed-off-by: Vipin Kumar <[email protected]> Reviewed-by: Daniel Walton <[email protected]> Reviewed-by: Dinesh G Dutt <[email protected]>
OSPF silently ignores 'no ip ospf hello-interval X' and 'no ip ospf hello-interval X' Signed-off-by: Daniel Walton <[email protected]> Reviewed-by: Dinesh G Dutt <[email protected]>
ANVL test 17.5. The current implementation wouldn't start sending LSReq unti the DB Desc packets have all been received (no M bit). This caused the test choke up. RFC 2328 allows for sending LSReq on receiving the first DbDesc packet as long as the nbr state is Exchange. This patch fixes that. Signed-off-by: Dinesh Dutt <[email protected]> Edited-by: Paul Jakma <[email protected]> to start the sending of LsReq from the nsm_negotiation_done FSM transition function for ExStart->Exchange, rather than tacking the call to ospf_ls_req_send to the bottom of the DD desc processing function.
As per the RFC, when the NU bit is set, prefix should be ignored. However, the code is currently ignoring prefix with LA bit too. Fixing that part. In future, we should also set LA bit for the loopback addresses. Not doing this part right away, as quagga wont be backward compatible with its own previous releases. Maybe after a release or so, we should start setting LA bit too. Signed-off-by: Vipin Kumar <[email protected]> Reviewed-by: Daniel Walton <[email protected]>
When a route_node has multiple ospf6_routes under it (common subnet case), then the current implementation has an issue in adjusting the route_node->info on a ospf6_route_remove() call. The main reason is that it ends up using exact match to determine if the next ospf6_route belongs to the same route_node or not. Fixing that part to use rnode (the existing back-pointer to the route_node) from the ospf6_route to determine that. Also fixing some of the walks to turn them safe so that the route deletion is fine. Signed-off-by: Vipin Kumar <[email protected]> Reviewed-by: Vivek Venkatraman <[email protected]>
This is to avoid a crash triggered by process termination when ospf6d daemon is running and 'router ospf6' config has not been done yet. Signed-off-by: Vipin Kumar <[email protected]> Reviewed-by: Daniel Walton <[email protected]>
* routeserver.texi: The full-mesh v RS topology diagrammes are fairly obvious and don't deserve so much space. As smaller wrap-floats they might be acceptable, but there seems no way to access that TeX feature from Texinfo. So nuke them. Centre the 2 other figures. Tweak size to avoid bbox overflow messages.
* Makefile.am: The Quagga specific PDF dependency is gone. The overfull boxes that caused texi2dvi to return fail code are gone. So the custom rule can go.
* overfull hboxes cause texi2dvi to return fail, even if the PDf is written. We hacked around this by running it with '... || true', but that sucks for buildboting the docs. Fix all the overfull hboxes. * ospfd.texi: Long command definitions can cause hbox overruns in the columnar command definitions index. This leads to strange errors about "Missing number, treated as zero." when building the index - very hard to figure out. 'show ip ospf database ...' was the culprit. Use a distinct deffnx alias for each option instead of trying to stuff them into 1 line.
…er tweaks * master.cfg: Add a "build-docs" builder to test generation of HTML and PDF docs into the commit checks. With nightly=true property, upload generated docs to a static dir on the master. Run from a NightlyScheduler. Add the properties from the internal worker config to the buildbot BuildSlave properties, so they're visible in the web UI. Filter through a workers2publicprops helper, to whitelist allowed props.
was accidentally not implemented earlier
See bugzilla #948
to triggers refresh of IKE SAs immediately on this command
helps to debug configuration problems
RFC2332 states that prefix length MUST be 0xff for unique bindings. However, it seems at least some Cisco firmwares use host prefix length instead (which on wire level makes sense). Relax the handling of prefix length to treat all value longer than address length as 0xff. Additionally treat 0x00 the same way too, this is required by the RFC. This also fixes the prefix length address family to be checked against protocol address.
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
" % Ambiguous command." Fixes bug #950 Signed-off-by: Balaji Gurudoss <[email protected]>
Fix to resolve ripv2 update process from skipping over non multicast interfaces when sending updates. Reported by: Christian Hammers <[email protected]>
Cisco has a bug that it rejects packets with zero hop count. Use one to avoid potential forwarding of registration requests.
Fixes bug #955. Removed the installation of vrf specific tag and distance command. Reported by: goodman <[email protected]>
If you have an AS_PATH with more entries than what can be written into a single AS_SEGMENT_MAX it needs to be broken up. The code that noticed that the AS_PATH needs to be broken up was not correctly calculating the size of the resulting message. This patch addresses this issue.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi,
Where is the encapsulation code for OSPFD?
Thank you