Skip to content
This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

Pull in bpf/for-next (8f4a95080418) #204

Merged
merged 3,896 commits into from
May 17, 2024
Merged

Pull in bpf/for-next (8f4a95080418) #204

merged 3,896 commits into from
May 17, 2024

Conversation

htejun
Copy link
Collaborator

@htejun htejun commented May 17, 2024

No description provided.

anakryiko and others added 30 commits May 7, 2024 16:21
Extend libbpf's pre-load checks for BPF programs, detecting more typical
conditions that are destinated to cause BPF program failure. This is an
opportunity to provide more helpful and actionable error message to
users, instead of potentially very confusing BPF verifier log and/or
error.

In this case, we detect struct_ops BPF program that was not referenced
anywhere, but still attempted to be loaded (according to libbpf logic).
Suggest that the program might need to be used in some struct_ops
variable. User will get a message of the following kind:

  libbpf: prog 'test_1_forgotten': SEC("struct_ops") program isn't referenced anywhere, did you forget to use it?

Suggested-by: Tejun Heo <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Add a simple test that validates that libbpf will reject isolated
struct_ops program early with helpful warning message.

Also validate that explicit use of such BPF program through BPF skeleton
after BPF object is open won't trigger any warnings.

Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Drive-by clean up, we shouldn't use meaningless "test_" prefix for
subtest names.

Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Andrii Nakryiko says:

====================
Fix yet another case of mishandling SEC("struct_ops") programs that were
nulled out programmatically through BPF skeleton by the user.

While at it, add some improvements around detecting and reporting errors,
specifically a common case of declaring SEC("struct_ops") program, but
forgetting to actually make use of it by setting it as a callback
implementation in SEC(".struct_ops") variable (i.e., map) declaration.

A bunch of new selftests are added as well.
====================

Signed-off-by: Martin KaFai Lau <[email protected]>
…nnection attempt

Netdev CI reports occasional failures with this test
("ERROR: ns2-dX6bUE did not pick up tcp connection from peer").

Add explicit busywait call until the initial connection attempt shows
up in conntrack rather than a one-shot 'must exist' check.

Signed-off-by: Florian Westphal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
subflow_add_reset_reason(skb, ...) can fail.

We can not assume mptcp_get_ext(skb) always return a non NULL pointer.

syzbot reported:

general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 0 PID: 5098 Comm: syz-executor132 Not tainted 6.9.0-rc6-syzkaller-01478-gcdc74c9d06e7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
 RIP: 0010:subflow_v6_route_req+0x2c7/0x490 net/mptcp/subflow.c:388
Code: 8d 7b 07 48 89 f8 48 c1 e8 03 42 0f b6 04 20 84 c0 0f 85 c0 01 00 00 0f b6 43 07 48 8d 1c c3 48 83 c3 18 48 89 d8 48 c1 e8 03 <42> 0f b6 04 20 84 c0 0f 85 84 01 00 00 0f b6 5b 01 83 e3 0f 48 89
RSP: 0018:ffffc9000362eb68 EFLAGS: 00010206
RAX: 0000000000000003 RBX: 0000000000000018 RCX: ffff888022039e00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88807d961140 R08: ffffffff8b6cb76b R09: 1ffff1100fb2c230
R10: dffffc0000000000 R11: ffffed100fb2c231 R12: dffffc0000000000
R13: ffff888022bfe273 R14: ffff88802cf9cc80 R15: ffff88802ad5a700
FS:  0000555587ad2380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f420c3f9720 CR3: 0000000022bfc000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  tcp_conn_request+0xf07/0x32c0 net/ipv4/tcp_input.c:7180
  tcp_rcv_state_process+0x183c/0x4500 net/ipv4/tcp_input.c:6663
  tcp_v6_do_rcv+0x8b2/0x1310 net/ipv6/tcp_ipv6.c:1673
  tcp_v6_rcv+0x22b4/0x30b0 net/ipv6/tcp_ipv6.c:1910
  ip6_protocol_deliver_rcu+0xc76/0x1570 net/ipv6/ip6_input.c:438
  ip6_input_finish+0x186/0x2d0 net/ipv6/ip6_input.c:483
  NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
  NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
  __netif_receive_skb_one_core net/core/dev.c:5625 [inline]
  __netif_receive_skb+0x1ea/0x650 net/core/dev.c:5739
  netif_receive_skb_internal net/core/dev.c:5825 [inline]
  netif_receive_skb+0x1e8/0x890 net/core/dev.c:5885
  tun_rx_batched+0x1b7/0x8f0 drivers/net/tun.c:1549
  tun_get_user+0x2f35/0x4560 drivers/net/tun.c:2002
  tun_chr_write_iter+0x113/0x1f0 drivers/net/tun.c:2048
  call_write_iter include/linux/fs.h:2110 [inline]
  new_sync_write fs/read_write.c:497 [inline]
  vfs_write+0xa84/0xcb0 fs/read_write.c:590
  ksys_write+0x1a0/0x2c0 fs/read_write.c:643
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 3e14049 ("mptcp: support rstreason for passive reset")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Reviewed-by: Jason Xing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Commit 7e8cdc9 ("nfc: Add KCOV annotations") added
kcov_remote_start_common()/kcov_remote_stop() pair into nci_rx_work(),
with an assumption that kcov_remote_stop() is called upon continue of
the for loop. But commit d24b035 ("nfc: nci: Fix uninit-value in
nci_dev_up and nci_ntf_packet") forgot to call kcov_remote_stop() before
break of the for loop.

Reported-by: syzbot <[email protected]>
Closes: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2
Fixes: d24b035 ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
Suggested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Allow the Dynamic Interrupt Moderation (DIM) library to be built as a
module. This is particularly useful in an Android GKI (Google Kernel
Image) configuration where everything is built as a module, including
Ethernet controller drivers. Having to build DIMLIB into the kernel
image with potentially no user is wasteful.

Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
When building with clang, via:

    make LLVM=1 -C tools/testing/selftest

...clang warns about three variables that are not initialized in all
cases:

1) The opt_ipproto_off variable is used uninitialized if "testname" is
not "ip". Willem de Bruijn pointed out that this is an actual bug, and
suggested the fix that I'm using here (thanks!).

2) The addr_len is used uninitialized, but only in the assert case,
   which bails out, so this is harmless.

3) The family variable in add_listener() is only used uninitialized in
   the error case (neither IPv4 nor IPv6 is specified), so it's also
   harmless.

Fix by initializing each variable.

Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
The current behavior is to accept any strings as inputs, this results in
an inconsistent result where an unexisting scheduler can be set:

  # sysctl -w net.mptcp.scheduler=notdefault
  net.mptcp.scheduler = notdefault

This patch changes this behavior by checking for existing scheduler
before accepting the input.

Fixes: e3b2870 ("mptcp: add a new sysctl scheduler")
Cc: [email protected]
Signed-off-by: Gregory Detal <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Tested-by: Geliang Tang <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://lore.kernel.org/r/20240506-upstream-net-20240506-mptcp-sched-exist-v1-1-2ed1529e521e@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
Some usb drivers try to set small skb->truesize and break
core networking stacks.

I replace one skb_clone() by an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c611 ("net: cdc_ncm: reduce skb truesize
in rx path") and 4ce62d5 ("net: usb: ax88179_178a:
stop lying about skb->truesize")

Fixes: 361459c ("net: usb: aqc111: Implement RX data path")
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Some usb drivers try to set small skb->truesize and break
core networking stacks.

In this patch, I removed one of the skb->truesize override.

I also replaced one skb_clone() by an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c611 ("net: cdc_ncm: reduce skb truesize
in rx path") and 4ce62d5 ("net: usb: ax88179_178a:
stop lying about skb->truesize")

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Steve Glendinning <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Some usb drivers set small skb->truesize and break
core networking stacks.

In this patch, I removed one of the skb->truesize override.

I also replaced one skb_clone() by an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c611 ("net: cdc_ncm: reduce skb truesize
in rx path") and 4ce62d5 ("net: usb: ax88179_178a:
stop lying about skb->truesize")

Fixes: c9b3745 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
…t/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-05-06 (ice)

This series contains updates to ice driver only.

Paul adds support for additional E830 devices and adjusts naming for
existing E830 devices.

Marcin commonizes a couple of TC setup calls to reduce duplicated code.

Mateusz adds ice_vsi_cfg_params into ice_vsi to consolidate info.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: refactor struct ice_vsi_cfg_params to be inside of struct ice_vsi
  ice: Deduplicate tc action setup
  ice: update E830 device ids and comments
  ice: add additional E830 device ids
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
The internal tag string doesn't contain a newline. Append one when
emitting the tag via sysfs.

[Stefan] Orthogonal to the newline issue, sysfs_emit(buf, "%s", fs->tag) is
needed to prevent format string injection.

Signed-off-by: Brian Foster <[email protected]>
Fixes: a8f62f5 ("virtiofs: export filesystem tags through sysfs")
Signed-off-by: Miklos Szeredi <[email protected]>
Add DCB support to get/set trust configuration for different packet
priority information sources. Some switch allow to chose different
source of packet priority classification. For example on KSZ switches it
is possible to configure VLAN PCP and/or DSCP sources.

Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Most of Microchip KSZ switches use Internal Priority Value associated
with every frame. For example, it is possible to map any VLAN PCP or
DSCP value to IPV and at the end, map IPV to a queue.

Since amount of IPVs is not equal to amount of queues, add this
information and make use of it in some functions.

Signed-off-by: Oleksij Rempel <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
IEEE 802.1q specification provides recommendation and examples which can
be used as good default values for different drivers.

This patch implements mapping examples documented in IEEE 802.1Q-2022 in
Annex I "I.3 Traffic type to traffic class mapping" and IETF DSCP naming
and mapping DSCP to Traffic Type inspired by RFC8325.

This helpers will be used in followup patches for dsa/microchip DCB
implementation.

Signed-off-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
KSZ88X3 switches support up to 4 queues. Rework ksz8795_set_prio_queue()
to support KSZ8795 and KSZ88X3 families of switches.

Per default, configure KSZ88X3 to use one queue, since it need special
handling due to priority related errata. Errata handling is implemented
in a separate patch.

Signed-off-by: Oleksij Rempel <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Add DCB support to configure app trust sources and default port priority.

Following commands can be used for testing:
dcb apptrust set dev lan1 order pcp dscp
dcb app replace dev lan1 default-prio 3

Since it is not possible to configure DSCP-Prio mapping per port, this
patch provide only ability to read switch global dscp-prio mapping and
way to enable/disable app trust for DSCP.

Signed-off-by: Oleksij Rempel <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
KSZ88X3 switches have different behavior on different ports:
- It seems to be not possible to disable VLAN PCP classification on port
  2. It means, as soon as mutliqueue support is enabled, frames with
     VLAN tag will get PCP prios. This behavior do not affect Port 1 -
     it is possible to disable PCP prios.
- DSCP classification is not working on Port 2.

Since there are still usable configuration combinations, I added some
quirks to make sure user will get appropriate error message if not
possible configuration is chosen.

Signed-off-by: Oleksij Rempel <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
I tested ETS support on KSZ9893, so it should work other KSZ989X
variants too, which was till not listed as support.

With this change we now officially not support only ksz8 family of
chips.

Signed-off-by: Oleksij Rempel <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
…n KSZ8xxx variants

Init priority to queue mapping in the way as it shown in IEEE 802.1Q
mapping example.

Signed-off-by: Oleksij Rempel <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
802.1P (PCP) and DiffServ (DSCP) are handled now by DCB code. Let it do
all needed initial configuration.

Signed-off-by: Oleksij Rempel <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Some switches like Microchip KSZ variants do not support per port DSCP
priority configuration. Instead there is a global DSCP mapping table.

To handle it, we will accept set/del request to any of user ports to
make global configuration and update dcb app entries for all other
ports.

Signed-off-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Microchip KSZ and LAN variants do not have per port DSCP priority
configuration. Instead there is a global DSCP mapping table.

This patch provides write access to this global DSCP map. In case entry
is "deleted", we map corresponding DSCP entry to a best effort prio,
which is expected to be the default priority for all untagged traffic.

Signed-off-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Add tests covering following functionality on KSZ9477 switch family:
- default port priority
- global DSCP to Internal Priority Mapping
- apptrust configuration

This script was tested on KSZ9893R

Signed-off-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Oleksij Rempel says:

====================
add DCB and DSCP support for KSZ switches

This patch series is aimed at improving support for DCB (Data Center
Bridging) and DSCP (Differentiated Services Code Point) on KSZ switches.

The main goal is to introduce global DSCP and PCP (Priority Code Point)
mapping support, addressing the limitation of KSZ switches not having
per-port DSCP priority mapping. This involves extending the DSA
framework with new callbacks for managing trust settings for global DSCP
and PCP maps. Additionally, we introduce IEEE 802.1q helpers for default
configurations, benefiting other drivers too.

Change logs are in separate patches.

Compared to v6 this series includes some new patches for DSCP global
mapping support and QoS selftest script for KSZ9477 switches.
====================

Signed-off-by: David S. Miller <[email protected]>
The change from skb_copy to pskb_copy unfortunately changed the data
copying to omit the ethernet header, since it was pulled before reaching
this point. Fix this by calling __skb_push/pull around pskb_copy.

Fixes: 59c878c ("net: bridge: fix multicast-to-unicast with fraglist GSO")
Signed-off-by: Felix Fietkau <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Introduce a tracepoint for icmp_send, which can help users to get more
detail information conveniently when icmp abnormal events happen.

1. Giving an usecase example:
=============================
When an application experiences packet loss due to an unreachable UDP
destination port, the kernel will send an exception message through the
icmp_send function. By adding a trace point for icmp_send, developers or
system administrators can obtain detailed information about the UDP
packet loss, including the type, code, source address, destination address,
source port, and destination port. This facilitates the trouble-shooting
of UDP packet loss issues especially for those network-service
applications.

2. Operation Instructions:
==========================
Switch to the tracing directory.
        cd /sys/kernel/tracing
Filter for destination port unreachable.
        echo "type==3 && code==3" > events/icmp/icmp_send/filter
Enable trace event.
        echo 1 > events/icmp/icmp_send/enable

3. Result View:
================
 udp_client_erro-11370   [002] ...s.12   124.728002:
 icmp_send: icmp_send: type=3, code=3.
 From 127.0.0.1:41895 to 127.0.0.1:6666 ulen=23
 skbaddr=00000000589b167a

Signed-off-by: Peilin He <[email protected]>
Signed-off-by: xu xin <[email protected]>
Reviewed-by: Yunkai Zhang <[email protected]>
Cc: Yang Yang <[email protected]>
Cc: Liu Chun <[email protected]>
Cc: Xuexin Jiang <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Russell King (Oracle) and others added 26 commits May 13, 2024 17:19
Introduce a mechanism whereby platforms can create their PCS instances
prior to the network device being published to userspace, but after
some of the core stmmac initialisation has been completed. This means
that the data structures that platforms need will be available.

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Maxime Chevallier <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Co-developed-by: Romain Gantois <[email protected]>
Signed-off-by: Romain Gantois <[email protected]>
Reviewed-by: Hariprasad Kelam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Use the newly introduced pcs_init() and pcs_exit() operations to
create and destroy the PCS instance at a more appropriate moment during
the driver lifecycle, thereby avoiding publishing a network device to
userspace that has not yet finished its PCS initialisation.

There are other similar issues with this driver which remain
unaddressed, but these are out of scope for this patch.

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Maxime Chevallier <[email protected]>
[rgantois: removed second parameters of new callbacks]
Signed-off-by: Romain Gantois <[email protected]>
Reviewed-by: Hariprasad Kelam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Add support for the Renesas RZ/N1 GMAC. This support can make use of a
custom RZ/N1 PCS which is fetched by parsing the pcs-handle device tree
property.

Signed-off-by: Clément Léger <[email protected]>
Co-developed-by: Romain Gantois <[email protected]>
Signed-off-by: Romain Gantois <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Hariprasad Kelam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Romain Gantois says:

====================
net: stmmac: Add support for RZN1 GMAC devices

This is version seven of my series that adds support for a Gigabit Ethernet
controller featured in the Renesas r9a06g032 SoC, of the RZ/N1 family. This
GMAC device is based on a Synopsys IP and is compatible with the stmmac driver.

My former colleague Clément Léger originally sent a series for this driver,
but an issue in bringing up the PCS clock had blocked the upstreaming
process. This issue has since been resolved by the following series:

https://lore.kernel.org/all/[email protected]/

This series consists of a devicetree binding describing the RZN1 GMAC
controller IP, a node for the GMAC1 device in the r9a06g032 SoC device
tree, and the GMAC driver itself which is a glue layer in stmmac.

There are also two patches by Russell that improve pcs initialization handling
in stmmac.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
In this function, only updating the map can finish the job for socket
reset reason because the corresponding drop reasons are ready.

Signed-off-by: Jason Xing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Based on the existing skb drop reason, updating the rstreason map can
help us finish the rstreason job in this function.

Signed-off-by: Jason Xing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Like the previous patch does in this series, finish the conversion map is
enough to let rstreason mechanism work in this function.

Signed-off-by: Jason Xing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
There are two possible cases where TCP layer can send an RST. Since they
happen in the same place, I think using one independent reason is enough
to identify this special situation.

Signed-off-by: Jason Xing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
We're going to send an RST due to invalid syn packet which is already
checked whether 1) it is in sequence, 2) it is a retransmitted skb.

As RFC 793 says, if the state of socket is not CLOSED/LISTEN/SYN-SENT,
then we should send an RST when receiving bad syn packet:
"fourth, check the SYN bit,...If the SYN is in the window it is an
error, send a reset"

Signed-off-by: Jason Xing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Jason Xing says:

====================
tcp: support rstreasons in the passive logic

In this series, I split all kinds of reasons into five part which,
I think, can be easily reviewed. I respectively implement corresponding
rstreasons in those functions. After this, we can trace the whole tcp
passive reset with clear reasons.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
The "struct prestera_msg_vtcam_rule_add_req" uses a dynamically sized
set of trailing elements. Specifically, it uses an array of structures
of type "prestera_msg_acl_action actions_msg".

The "struct prestera_msg_flood_domain_ports_set_req" also uses a
dynamically sized set of trailing elements. Specifically, it uses an
array of structures of type "prestera_msg_acl_action actions_msg".

So, use the preferred way in the kernel declaring flexible arrays [1].

At the same time, prepare for the coming implementation by GCC and Clang
of the __counted_by attribute. Flexible array members annotated with
__counted_by can have their accesses bounds-checked at run-time via
CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for
strcpy/memcpy-family functions). In this case, it is important to note
that the attribute used is specifically __counted_by_le since the
counters are of type __le32.

The logic does not need to change since the counters for the flexible
arrays are asigned before any access to the arrays.

The order in which the structure prestera_msg_vtcam_rule_add_req and the
structure prestera_msg_flood_domain_ports_set_req are defined must be
changed to avoid incomplete type errors.

Also, avoid the open-coded arithmetic in memory allocator functions [2]
using the "struct_size" macro.

Moreover, the new structure members also allow us to avoid the open-
coded arithmetic on pointers. So, take advantage of this refactoring
accordingly.

This code was detected with the help of Coccinelle, and audited and
modified manually.

Link: https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [1]
Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [2]
Signed-off-by: Erick Archer <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/AS8PR02MB7237E8469568A59795F1F0408BE12@AS8PR02MB7237.eurprd02.prod.outlook.com
Signed-off-by: Jakub Kicinski <[email protected]>
Change the Kconfig dependency, so this driver can be built and run on ARM64
with 4K page size.
16/64K page sizes are not supported yet.

Signed-off-by: Haiyang Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
SO_KEEPALIVE support has to be set on each subflow: on each TCP socket,
where sk_prot->keepalive is defined. Technically, nothing has to be done
on the MPTCP socket. That's why mptcp_sol_socket_sync_intval() was
called instead of mptcp_sol_socket_intval().

Except that when nothing is done on the MPTCP socket, the
getsockopt(SO_KEEPALIVE), handled in net/core/sock.c:sk_getsockopt(),
will not know if SO_KEEPALIVE has been set on the different subflows or
not.

The fix is simple: simply call mptcp_sol_socket_intval() which will end
up calling net/core/sock.c:sk_setsockopt() where the SOCK_KEEPOPEN flag
will be set, the one used in sk_getsockopt().

So now, getsockopt(SO_KEEPALIVE) on an MPTCP socket will return the same
value as the one previously set with setsockopt(SO_KEEPALIVE).

Fixes: 1b3e7ed ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
SO_KEEPALIVE support has been added a while ago, as part of a series
"adding SOL_SOCKET" support. To have a full control of this keep-alive
feature, it is important to also support TCP_KEEP* socket options at the
SOL_TCP level.

Supporting them on the setsockopt() part is easy, it is just a matter of
remembering each value in the MPTCP sock structure, and calling
tcp_sock_set_keep*() helpers on each subflow. If the value is not
modified (0), calling these helpers will not do anything. For the
getsockopt() part, the corresponding value from the MPTCP sock structure
or the default one is simply returned. All of this is very similar to
other TCP_* socket options supported by MPTCP.

It looks important for kernels supporting SO_KEEPALIVE, to also support
TCP_KEEP* options as well: some apps seem to (wrongly) consider that if
the former is supported, the latter ones will be supported as well. But
also, not having this simple and isolated change is preventing MPTCP
support in some apps, and libraries like GoLang [1]. This is why this
patch is seen as a fix.

Closes: multipath-tcp/mptcp_net-next#383
Fixes: 1b3e7ed ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Link: golang/go#56539 [1]
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Up to recently, it has been recommended to use getsockopt(MPTCP_INFO) to
check if a fallback to TCP happened, or if the client requested to use
MPTCP.

In this case, the userspace app is only interested by the returned value
of the getsocktop() call, and can then give 0 for the option length, and
NULL for the buffer address. An easy optimisation is then to stop early,
and avoid filling a local buffer -- which now requires two different
locks -- if it is not needed.

Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
The sysctl lists the available schedulers that can be set using
net.mptcp.scheduler similarly to net.ipv4.tcp_available_congestion_control.

Signed-off-by: Gregory Detal <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Tested-by: Geliang Tang <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
strcpy() performs no bounds checking on the destination buffer. This
could result in linear overflows beyond the end of the buffer, leading
to all kinds of misbehaviors. The safe replacement is strscpy() [1].

This is in preparation of a possible future step where all strcpy() uses
will be removed in favour of strscpy() [2].

This fixes CheckPatch warnings:

  WARNING: Prefer strscpy over strcpy

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1]
Link: KSPP/linux#88 [2]
Reviewed-by: Geliang Tang <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
The 'else' statements are not needed here, because their previous 'if'
block ends with a 'return'.

This fixes CheckPatch warnings:

  WARNING: else is not generally useful after a break or return

Reviewed-by: Geliang Tang <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Nothing from protocol.h depends on mptcp_pm_gen.h, only code from
pm_netlink.c and pm_userspace.c depends on it.

So this include can be moved where it is needed to avoid a "unused
includes" warning.

Reviewed-by: Geliang Tang <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
So this file is now self-contained: it can be compiled alone with
analytic tools.

Reviewed-by: Geliang Tang <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Mat Martineau says:

====================
mptcp: small improvements, fix and clean-ups

This series contain mostly unrelated patches:

- The two first patches can be seen as "fixes". They are part of this
  series for -next because it looks like the last batch of fixes for
  v6.9 has already been sent. These fixes are not urgent, so they can
  wait if an unlikely v6.9-rc8 is published. About the two patches:
    - Patch 1 fixes getsockopt(SO_KEEPALIVE) support on MPTCP sockets
    - Patch 2 makes sure the full TCP keep-alive feature is supported,
      not just SO_KEEPALIVE.

- Patch 3 is a small optimisation when getsockopt(MPTCP_INFO) is used
  without buffer, just to check if MPTCP is still being used: no
  fallback to TCP.

- Patch 4 adds net.mptcp.available_schedulers sysctl knob to list packet
  schedulers, similar to net.ipv4.tcp_available_congestion_control.

- Patch 5 and 6 fix CheckPatch warnings: "prefer strscpy over strcpy"
  and "else is not generally useful after a break or return".

- Patch 7 and 8 remove and add header includes to avoid unused ones, and
  add missing ones to be self-contained.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Reinitialize the whole EST structure would also reset the mutex
lock which is embedded in the EST structure, and then trigger
the following warning. To address this, move the lock to struct
stmmac_priv. We also need to reacquire the mutex lock when doing
this initialization.

DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 505 at kernel/locking/mutex.c:587 __mutex_lock+0xd84/0x1068
 Modules linked in:
 CPU: 3 PID: 505 Comm: tc Not tainted 6.9.0-rc6-00053-g0106679839f7-dirty #29
 Hardware name: NXP i.MX8MPlus EVK board (DT)
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __mutex_lock+0xd84/0x1068
 lr : __mutex_lock+0xd84/0x1068
 sp : ffffffc0864e3570
 x29: ffffffc0864e3570 x28: ffffffc0817bdc78 x27: 0000000000000003
 x26: ffffff80c54f1808 x25: ffffff80c9164080 x24: ffffffc080d723ac
 x23: 0000000000000000 x22: 0000000000000002 x21: 0000000000000000
 x20: 0000000000000000 x19: ffffffc083bc3000 x18: ffffffffffffffff
 x17: ffffffc08117b080 x16: 0000000000000002 x15: ffffff80d2d40000
 x14: 00000000000002da x13: ffffff80d2d404b8 x12: ffffffc082b5a5c8
 x11: ffffffc082bca680 x10: ffffffc082bb2640 x9 : ffffffc082bb2698
 x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
 x5 : ffffff8178fe0d48 x4 : 0000000000000000 x3 : 0000000000000027
 x2 : ffffff8178fe0d50 x1 : 0000000000000000 x0 : 0000000000000000
 Call trace:
  __mutex_lock+0xd84/0x1068
  mutex_lock_nested+0x28/0x34
  tc_setup_taprio+0x118/0x68c
  stmmac_setup_tc+0x50/0xf0
  taprio_change+0x868/0xc9c

Fixes: b2aae65 ("net: stmmac: add mutex lock to protect est parameters")
Signed-off-by: Xiaolei Wang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Move the EST structure to struct stmmac_priv, because the
EST configs don't look like platform config, but EST is
enabled in runtime with the settings retrieved for the TC
TAPRIO feature also in runtime. So it's better to have the
EST-data preserved in the driver private data instead of
the platform data storage.

Signed-off-by: Xiaolei Wang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Xiaolei Wang says:

====================
Move EST lock and EST structure to struct stmmac_priv

1. Pulling the mutex protecting the EST structure out to avoid
    clearing it during reinit/memset of the EST structure,and
    reacquire the mutex lock when doing this initialization.

2. Moving the EST structure to a more logical location
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
The series is causing issues with PHY drivers built as modules.
Since it was only partially applied and the merge window has
opened let's revert and try again for v6.11.

Revert 6916e46 ("net: phy: Introduce ethernet link topology representation")
Revert 0ec5ed6 ("net: sfp: pass the phy_device when disconnecting an sfp module's PHY")
Revert e75e4e0 ("net: phy: add helpers to handle sfp phy connect/disconnect")
Revert fdd3539 ("net: sfp: Add helper to return the SFP bus name")
Revert 841942b ("net: ethtool: Allow passing a phy index for some commands")

Link: https://lore.kernel.org/all/171242462917.4000.9759453824684907063.git-patchwork-notify@kernel.org/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
@htejun htejun requested a review from Byte-Lab May 17, 2024 16:59
@Byte-Lab Byte-Lab merged commit 9cfa965 into sched_ext May 17, 2024
1 check passed
@Byte-Lab Byte-Lab deleted the htejun/pull-bpf-for-next branch May 17, 2024 17:55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.