* [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT
@ 2020-01-13 18:10 Toke Høiland-Jørgensen
2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-13 18:10 UTC
To: netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Jesper Dangaard Brouer, Björn Töpel, John Fastabend
Since commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances"), devmap flushing is a global operation instead of tied to a
particular map. This means that with a bit of refactoring, we can finally fix
the performance delta between the bpf_redirect_map() and bpf_redirect() helper
functions, by introducing bulking for the latter as well.
This series makes this change by moving the data structure used for the bulking
into struct net_device itself, so we can access it even when there is no
devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
mechanism becomes quite trivial, and brings bpf_redirect() up to the same
performance as bpf_redirect_map():
                   Before       After
1 CPU:
bpf_redirect_map:   8.4 Mpps     8.4 Mpps  (no change)
bpf_redirect:       5.0 Mpps     8.4 Mpps  (+68%)
2 CPUs:
bpf_redirect_map:  15.9 Mpps    16.1 Mpps  (+1% or ~no change)
bpf_redirect:       9.5 Mpps    15.9 Mpps  (+67%)
After this patch series, the only semantic difference between the two variants
of the redirect helper (apart from the absence of a map argument, obviously) is
that the _map() variant will return an error if passed an invalid map index,
whereas the bpf_redirect() helper will succeed, but drop packets in
xdp_do_redirect(). This is because the helper has no reference to the calling
netdev, so unfortunately we can't do the ifindex lookup directly in the helper.
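To illustrate the semantics (a minimal sketch, not part of this series; the
map name and the ifindex value are made up for the example), the two variants
look like this from the BPF program side:

/* Sketch: the two redirect variants as used from an XDP program. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} tx_port SEC(".maps");

SEC("xdp")
int redirect_via_map(struct xdp_md *ctx)
{
	/* Errors out at helper-call time if index 0 holds no valid device. */
	return bpf_redirect_map(&tx_port, 0, 0);
}

SEC("xdp")
int redirect_via_ifindex(struct xdp_md *ctx)
{
	/* Always returns XDP_REDIRECT here; a bogus ifindex only shows up
	 * later, as a drop in xdp_do_redirect().
	 */
	return bpf_redirect(42, 0);
}

char _license[] SEC("license") = "GPL";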
Changelog:
v2:
- Consolidate code paths and tracepoints for map and non-map redirect variants
(Björn)
- Add performance data for 2-CPU test (Jesper)
- Move fields to avoid shifting cache lines in struct net_device (Eric)
---
Toke Høiland-Jørgensen (2):
xdp: Move devmap bulk queue into struct net_device
xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
include/linux/bpf.h | 13 +++++-
include/linux/netdevice.h | 11 +++--
include/trace/events/xdp.h | 104 +++++++++++++++++++-------------------------
kernel/bpf/devmap.c | 94 +++++++++++++++++++++-------------------
net/core/dev.c | 2 +
net/core/filter.c | 86 +++++++-----------------------------
6 files changed, 132 insertions(+), 178 deletions(-)
* [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
2020-01-13 18:10 [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
@ 2020-01-13 18:10 ` Toke Høiland-Jørgensen
2020-01-15 19:45 ` John Fastabend
2020-01-15 20:17 ` Jesper Dangaard Brouer
2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
2020-01-14 17:47 ` [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Alexei Starovoitov
2 siblings, 2 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-13 18:10 UTC
To: netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Jesper Dangaard Brouer, Björn Töpel, John Fastabend
From: Toke Høiland-Jørgensen <toke@redhat.com>
Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances") changed devmap flushing to be a global operation instead of a
per-map operation. However, the queue structure used for bulking was still
allocated as part of the containing map.
This patch moves the devmap bulk queue into struct net_device. The
motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
which will be changed in a subsequent commit. To avoid other fields of
struct net_device moving to different cache lines, we also move a couple of
other members around.
We defer the actual allocation of the bulk queue structure until the
NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
ndo_xdp_xmit support before allocating the structure, which is not possible
at the time struct net_device is allocated. However, we keep the freeing in
free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
Because of this change, we lose the reference back to the map that
originated the redirect, so change the tracepoint to always report 0 as the
map ID and index. Otherwise no functional change is intended with this
patch.
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
include/linux/netdevice.h | 11 +++++---
include/trace/events/xdp.h | 2 +
kernel/bpf/devmap.c | 63 +++++++++++++++++++-------------------------
net/core/dev.c | 2 +
4 files changed, 37 insertions(+), 41 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2741aa35bec6..1f24405c1ec5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -876,6 +876,7 @@ enum bpf_netdev_command {
struct bpf_prog_offload_ops;
struct netlink_ext_ack;
struct xdp_umem;
+struct xdp_dev_bulk_queue;
struct netdev_bpf {
enum bpf_netdev_command command;
@@ -1986,12 +1987,10 @@ struct net_device {
unsigned int num_tx_queues;
unsigned int real_num_tx_queues;
struct Qdisc *qdisc;
-#ifdef CONFIG_NET_SCHED
- DECLARE_HASHTABLE (qdisc_hash, 4);
-#endif
unsigned int tx_queue_len;
spinlock_t tx_global_lock;
- int watchdog_timeo;
+
+ struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
#ifdef CONFIG_XPS
struct xps_dev_maps __rcu *xps_cpus_map;
@@ -2001,8 +2000,12 @@ struct net_device {
struct mini_Qdisc __rcu *miniq_egress;
#endif
+#ifdef CONFIG_NET_SCHED
+ DECLARE_HASHTABLE (qdisc_hash, 4);
+#endif
/* These may be needed for future network-power-down code. */
struct timer_list watchdog_timer;
+ int watchdog_timeo;
int __percpu *pcpu_refcnt;
struct list_head todo_list;
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index a7378bcd9928..72bad13d4a3c 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -278,7 +278,7 @@ TRACE_EVENT(xdp_devmap_xmit,
),
TP_fast_assign(
- __entry->map_id = map->id;
+ __entry->map_id = map ? map->id : 0;
__entry->act = XDP_REDIRECT;
__entry->map_index = map_index;
__entry->drops = drops;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index da9c832fc5c8..030d125c3839 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -53,13 +53,11 @@
(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
#define DEV_MAP_BULK_SIZE 16
-struct bpf_dtab_netdev;
-
-struct xdp_bulk_queue {
+struct xdp_dev_bulk_queue {
struct xdp_frame *q[DEV_MAP_BULK_SIZE];
struct list_head flush_node;
+ struct net_device *dev;
struct net_device *dev_rx;
- struct bpf_dtab_netdev *obj;
unsigned int count;
};
@@ -67,9 +65,8 @@ struct bpf_dtab_netdev {
struct net_device *dev; /* must be first member, due to tracepoint */
struct hlist_node index_hlist;
struct bpf_dtab *dtab;
- struct xdp_bulk_queue __percpu *bulkq;
struct rcu_head rcu;
- unsigned int idx; /* keep track of map index for tracepoint */
+ unsigned int idx;
};
struct bpf_dtab {
@@ -219,7 +216,6 @@ static void dev_map_free(struct bpf_map *map)
hlist_for_each_entry_safe(dev, next, head, index_hlist) {
hlist_del_rcu(&dev->index_hlist);
- free_percpu(dev->bulkq);
dev_put(dev->dev);
kfree(dev);
}
@@ -234,7 +230,6 @@ static void dev_map_free(struct bpf_map *map)
if (!dev)
continue;
- free_percpu(dev->bulkq);
dev_put(dev->dev);
kfree(dev);
}
@@ -320,10 +315,9 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
return -ENOENT;
}
-static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
+static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
{
- struct bpf_dtab_netdev *obj = bq->obj;
- struct net_device *dev = obj->dev;
+ struct net_device *dev = bq->dev;
int sent = 0, drops = 0, err = 0;
int i;
@@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
out:
bq->count = 0;
- trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
- sent, drops, bq->dev_rx, dev, err);
+ trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
bq->dev_rx = NULL;
__list_del_clearprev(&bq->flush_node);
return 0;
@@ -374,7 +367,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
void __dev_map_flush(void)
{
struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
- struct xdp_bulk_queue *bq, *tmp;
+ struct xdp_dev_bulk_queue *bq, *tmp;
rcu_read_lock();
list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
@@ -401,12 +394,12 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
/* Runs under RCU-read-side, plus in softirq under NAPI protection.
* Thus, safe percpu variable access.
*/
-static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
+static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
struct net_device *dev_rx)
{
struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
- struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+ struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
bq_xmit_all(bq, 0);
@@ -444,7 +437,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
if (unlikely(!xdpf))
return -EOVERFLOW;
- return bq_enqueue(dst, xdpf, dev_rx);
+ return bq_enqueue(dev, xdpf, dev_rx);
}
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -483,7 +476,6 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
struct bpf_dtab_netdev *dev;
dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
- free_percpu(dev->bulkq);
dev_put(dev->dev);
kfree(dev);
}
@@ -538,30 +530,15 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
u32 ifindex,
unsigned int idx)
{
- gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
struct bpf_dtab_netdev *dev;
- struct xdp_bulk_queue *bq;
- int cpu;
- dev = kmalloc_node(sizeof(*dev), gfp, dtab->map.numa_node);
+ dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
+ dtab->map.numa_node);
if (!dev)
return ERR_PTR(-ENOMEM);
- dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
- sizeof(void *), gfp);
- if (!dev->bulkq) {
- kfree(dev);
- return ERR_PTR(-ENOMEM);
- }
-
- for_each_possible_cpu(cpu) {
- bq = per_cpu_ptr(dev->bulkq, cpu);
- bq->obj = dev;
- }
-
dev->dev = dev_get_by_index(net, ifindex);
if (!dev->dev) {
- free_percpu(dev->bulkq);
kfree(dev);
return ERR_PTR(-EINVAL);
}
@@ -721,9 +698,23 @@ static int dev_map_notification(struct notifier_block *notifier,
{
struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
struct bpf_dtab *dtab;
- int i;
+ int i, cpu;
switch (event) {
+ case NETDEV_REGISTER:
+ if (!netdev->netdev_ops->ndo_xdp_xmit || netdev->xdp_bulkq)
+ break;
+
+ /* will be freed in free_netdev() */
+ netdev->xdp_bulkq =
+ __alloc_percpu_gfp(sizeof(struct xdp_dev_bulk_queue),
+ sizeof(void *), GFP_ATOMIC);
+ if (!netdev->xdp_bulkq)
+ return NOTIFY_BAD;
+
+ for_each_possible_cpu(cpu)
+ per_cpu_ptr(netdev->xdp_bulkq, cpu)->dev = netdev;
+ break;
case NETDEV_UNREGISTER:
/* This rcu_read_lock/unlock pair is needed because
* dev_map_list is an RCU list AND to ensure a delete
diff --git a/net/core/dev.c b/net/core/dev.c
index d99f88c58636..e7802a41ae7f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9847,6 +9847,8 @@ void free_netdev(struct net_device *dev)
free_percpu(dev->pcpu_refcnt);
dev->pcpu_refcnt = NULL;
+ free_percpu(dev->xdp_bulkq);
+ dev->xdp_bulkq = NULL;
netdev_unregister_lockdep_key(dev);
* [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
2020-01-13 18:10 [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
@ 2020-01-13 18:10 ` Toke Høiland-Jørgensen
2020-01-15 12:16 ` Maciej Fijalkowski
2020-01-15 19:43 ` John Fastabend
2020-01-14 17:47 ` [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Alexei Starovoitov
2 siblings, 2 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-13 18:10 UTC
To: netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Jesper Dangaard Brouer, Björn Töpel, John Fastabend
From: Toke Høiland-Jørgensen <toke@redhat.com>
Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
we can re-use the bulking for the non-map version of the bpf_redirect()
helper. This is a simple matter of having xdp_do_redirect_slow() queue the
frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
Unfortunately we can't make the bpf_redirect() helper return an error if
the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
have a reference to the network namespace of the ingress device at the time
the helper is called. So we have to leave it as-is and keep the device
lookup in xdp_do_redirect_slow().
Since this leaves little reason to keep the non-map redirect code in a
separate function, we get rid of the xdp_do_redirect_slow() function
entirely. This does lose us the tracepoint disambiguation, but fortunately
the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
entry structures. This means both can contain a map index, so we can just
amend the tracepoint definitions so we always emit the xdp_redirect(_err)
tracepoints, but with the map ID only populated if a map is present. This
means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
the definitions around in case someone is still listening for them.
With this change, the performance of the xdp_redirect sample program goes
from 5Mpps to 8.4Mpps (a 68% increase).
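As an aside, the consolidated tracepoint can still be inspected per-field
with e.g. bpftrace (an illustrative one-liner, not part of the patch; the
field names follow the TP_STRUCT__entry in the diff below):

# bpftrace -e 'tracepoint:xdp:xdp_redirect {
	printf("prog=%d %d->%d map_id=%d idx=%d err=%d\n",
	       args->prog_id, args->ifindex, args->to_ifindex,
	       args->map_id, args->map_index, args->err); }'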
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
include/linux/bpf.h | 13 +++++-
include/trace/events/xdp.h | 102 +++++++++++++++++++-------------------------
kernel/bpf/devmap.c | 31 +++++++++----
net/core/filter.c | 86 +++++++------------------------------
4 files changed, 95 insertions(+), 137 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b14e51d56a82..25c050202536 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -962,7 +962,9 @@ struct sk_buff;
struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
-void __dev_map_flush(void);
+void __dev_flush(void);
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+ struct net_device *dev_rx);
int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
struct net_device *dev_rx);
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -1071,13 +1073,20 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map
return NULL;
}
-static inline void __dev_map_flush(void)
+static inline void __dev_flush(void)
{
}
struct xdp_buff;
struct bpf_dtab_netdev;
+static inline
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+ struct net_device *dev_rx)
+{
+ return 0;
+}
+
static inline
int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
struct net_device *dev_rx)
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 72bad13d4a3c..cf568a38f852 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -79,14 +79,27 @@ TRACE_EVENT(xdp_bulk_tx,
__entry->sent, __entry->drops, __entry->err)
);
+#ifndef __DEVMAP_OBJ_TYPE
+#define __DEVMAP_OBJ_TYPE
+struct _bpf_dtab_netdev {
+ struct net_device *dev;
+};
+#endif /* __DEVMAP_OBJ_TYPE */
+
+#define devmap_ifindex(tgt, map) \
+ (((map->map_type == BPF_MAP_TYPE_DEVMAP || \
+ map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)) ? \
+ ((struct _bpf_dtab_netdev *)tgt)->dev->ifindex : 0)
+
+
DECLARE_EVENT_CLASS(xdp_redirect_template,
TP_PROTO(const struct net_device *dev,
const struct bpf_prog *xdp,
- int to_ifindex, int err,
- const struct bpf_map *map, u32 map_index),
+ const void *tgt, int err,
+ const struct bpf_map *map, u32 index),
- TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
+ TP_ARGS(dev, xdp, tgt, err, map, index),
TP_STRUCT__entry(
__field(int, prog_id)
@@ -103,90 +116,65 @@ DECLARE_EVENT_CLASS(xdp_redirect_template,
__entry->act = XDP_REDIRECT;
__entry->ifindex = dev->ifindex;
__entry->err = err;
- __entry->to_ifindex = to_ifindex;
+ __entry->to_ifindex = map ? devmap_ifindex(tgt, map) :
+ index;
__entry->map_id = map ? map->id : 0;
- __entry->map_index = map_index;
+ __entry->map_index = map ? index : 0;
),
- TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d",
+ TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
+ " map_id=%d map_index=%d",
__entry->prog_id,
__print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
__entry->ifindex, __entry->to_ifindex,
- __entry->err)
+ __entry->err, __entry->map_id, __entry->map_index)
);
DEFINE_EVENT(xdp_redirect_template, xdp_redirect,
TP_PROTO(const struct net_device *dev,
const struct bpf_prog *xdp,
- int to_ifindex, int err,
- const struct bpf_map *map, u32 map_index),
- TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
+ const void *tgt, int err,
+ const struct bpf_map *map, u32 index),
+ TP_ARGS(dev, xdp, tgt, err, map, index)
);
DEFINE_EVENT(xdp_redirect_template, xdp_redirect_err,
TP_PROTO(const struct net_device *dev,
const struct bpf_prog *xdp,
- int to_ifindex, int err,
- const struct bpf_map *map, u32 map_index),
- TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
+ const void *tgt, int err,
+ const struct bpf_map *map, u32 index),
+ TP_ARGS(dev, xdp, tgt, err, map, index)
);
#define _trace_xdp_redirect(dev, xdp, to) \
- trace_xdp_redirect(dev, xdp, to, 0, NULL, 0);
+ trace_xdp_redirect(dev, xdp, NULL, 0, NULL, to);
#define _trace_xdp_redirect_err(dev, xdp, to, err) \
- trace_xdp_redirect_err(dev, xdp, to, err, NULL, 0);
+ trace_xdp_redirect_err(dev, xdp, NULL, err, NULL, to);
+
+#define _trace_xdp_redirect_map(dev, xdp, to, map, index) \
+ trace_xdp_redirect(dev, xdp, to, 0, map, index);
-DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map,
+#define _trace_xdp_redirect_map_err(dev, xdp, to, map, index, err) \
+ trace_xdp_redirect_err(dev, xdp, to, err, map, index);
+
+/* not used anymore, but kept around so as not to break old programs */
+DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map,
TP_PROTO(const struct net_device *dev,
const struct bpf_prog *xdp,
- int to_ifindex, int err,
- const struct bpf_map *map, u32 map_index),
- TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
- TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
- " map_id=%d map_index=%d",
- __entry->prog_id,
- __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
- __entry->ifindex, __entry->to_ifindex,
- __entry->err,
- __entry->map_id, __entry->map_index)
+ const void *tgt, int err,
+ const struct bpf_map *map, u32 index),
+ TP_ARGS(dev, xdp, tgt, err, map, index)
);
-DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err,
+DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map_err,
TP_PROTO(const struct net_device *dev,
const struct bpf_prog *xdp,
- int to_ifindex, int err,
- const struct bpf_map *map, u32 map_index),
- TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
- TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
- " map_id=%d map_index=%d",
- __entry->prog_id,
- __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
- __entry->ifindex, __entry->to_ifindex,
- __entry->err,
- __entry->map_id, __entry->map_index)
+ const void *tgt, int err,
+ const struct bpf_map *map, u32 index),
+ TP_ARGS(dev, xdp, tgt, err, map, index)
);
-#ifndef __DEVMAP_OBJ_TYPE
-#define __DEVMAP_OBJ_TYPE
-struct _bpf_dtab_netdev {
- struct net_device *dev;
-};
-#endif /* __DEVMAP_OBJ_TYPE */
-
-#define devmap_ifindex(fwd, map) \
- ((map->map_type == BPF_MAP_TYPE_DEVMAP || \
- map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) ? \
- ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)
-
-#define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx) \
- trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map), \
- 0, map, idx)
-
-#define _trace_xdp_redirect_map_err(dev, xdp, fwd, map, idx, err) \
- trace_xdp_redirect_map_err(dev, xdp, devmap_ifindex(fwd, map), \
- err, map, idx)
-
TRACE_EVENT(xdp_cpumap_kthread,
TP_PROTO(int map_id, unsigned int processed, unsigned int drops,
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 030d125c3839..db32272c4f77 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -81,7 +81,7 @@ struct bpf_dtab {
u32 n_buckets;
};
-static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);
+static DEFINE_PER_CPU(struct list_head, dev_flush_list);
static DEFINE_SPINLOCK(dev_map_lock);
static LIST_HEAD(dev_map_list);
@@ -357,16 +357,16 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
goto out;
}
-/* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
+/* __dev_flush is called from xdp_do_flush_map() which _must_ be signaled
* from the driver before returning from its napi->poll() routine. The poll()
* routine is called either from busy_poll context or net_rx_action signaled
* from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
* net device can be torn down. On devmap tear down we ensure the flush list
* is empty before completing to ensure all flush operations have completed.
*/
-void __dev_map_flush(void)
+void __dev_flush(void)
{
- struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
+ struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
struct xdp_dev_bulk_queue *bq, *tmp;
rcu_read_lock();
@@ -398,7 +398,7 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
struct net_device *dev_rx)
{
- struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
+ struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
@@ -419,10 +419,9 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
return 0;
}
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
- struct net_device *dev_rx)
+static inline int _xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+ struct net_device *dev_rx)
{
- struct net_device *dev = dst->dev;
struct xdp_frame *xdpf;
int err;
@@ -440,6 +439,20 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
return bq_enqueue(dev, xdpf, dev_rx);
}
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+ struct net_device *dev_rx)
+{
+ return _xdp_enqueue(dev, xdp, dev_rx);
+}
+
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+ struct net_device *dev_rx)
+{
+ struct net_device *dev = dst->dev;
+
+ return _xdp_enqueue(dev, xdp, dev_rx);
+}
+
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
struct bpf_prog *xdp_prog)
{
@@ -762,7 +775,7 @@ static int __init dev_map_init(void)
register_netdevice_notifier(&dev_map_notifier);
for_each_possible_cpu(cpu)
- INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu));
+ INIT_LIST_HEAD(&per_cpu(dev_flush_list, cpu));
return 0;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 42fd17c48c5f..f023f3a8f351 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3458,58 +3458,6 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
.arg2_type = ARG_ANYTHING,
};
-static int __bpf_tx_xdp(struct net_device *dev,
- struct bpf_map *map,
- struct xdp_buff *xdp,
- u32 index)
-{
- struct xdp_frame *xdpf;
- int err, sent;
-
- if (!dev->netdev_ops->ndo_xdp_xmit) {
- return -EOPNOTSUPP;
- }
-
- err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
- if (unlikely(err))
- return err;
-
- xdpf = convert_to_xdp_frame(xdp);
- if (unlikely(!xdpf))
- return -EOVERFLOW;
-
- sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH);
- if (sent <= 0)
- return sent;
- return 0;
-}
-
-static noinline int
-xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
-{
- struct net_device *fwd;
- u32 index = ri->tgt_index;
- int err;
-
- fwd = dev_get_by_index_rcu(dev_net(dev), index);
- ri->tgt_index = 0;
- if (unlikely(!fwd)) {
- err = -EINVAL;
- goto err;
- }
-
- err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
- if (unlikely(err))
- goto err;
-
- _trace_xdp_redirect(dev, xdp_prog, index);
- return 0;
-err:
- _trace_xdp_redirect_err(dev, xdp_prog, index, err);
- return err;
-}
-
static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
struct bpf_map *map, struct xdp_buff *xdp)
{
@@ -3529,7 +3477,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
void xdp_do_flush_map(void)
{
- __dev_map_flush();
+ __dev_flush();
__cpu_map_flush();
__xsk_map_flush();
}
@@ -3568,10 +3516,11 @@ void bpf_clear_redirect_map(struct bpf_map *map)
}
}
-static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog, struct bpf_map *map,
- struct bpf_redirect_info *ri)
+int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog)
{
+ struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+ struct bpf_map *map = READ_ONCE(ri->map);
u32 index = ri->tgt_index;
void *fwd = ri->tgt_value;
int err;
@@ -3580,7 +3529,18 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
ri->tgt_value = NULL;
WRITE_ONCE(ri->map, NULL);
- err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
+ if (unlikely(!map)) {
+ fwd = dev_get_by_index_rcu(dev_net(dev), index);
+ if (unlikely(!fwd)) {
+ err = -EINVAL;
+ goto err;
+ }
+
+ err = dev_xdp_enqueue(fwd, xdp, dev);
+ } else {
+ err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
+ }
+
if (unlikely(err))
goto err;
@@ -3590,18 +3550,6 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
_trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map, index, err);
return err;
}
-
-int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
-{
- struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
- struct bpf_map *map = READ_ONCE(ri->map);
-
- if (likely(map))
- return xdp_do_redirect_map(dev, xdp, xdp_prog, map, ri);
-
- return xdp_do_redirect_slow(dev, xdp, xdp_prog, ri);
-}
EXPORT_SYMBOL_GPL(xdp_do_redirect);
static int xdp_do_generic_redirect_map(struct net_device *dev,
* Re: [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT
2020-01-13 18:10 [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
@ 2020-01-14 17:47 ` Alexei Starovoitov
2020-01-15 17:49 ` John Fastabend
2 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2020-01-14 17:47 UTC
To: Toke Høiland-Jørgensen
Cc: Network Development, bpf, Daniel Borkmann, Alexei Starovoitov,
David Miller, Jesper Dangaard Brouer, Björn Töpel,
John Fastabend
On Mon, Jan 13, 2020 at 10:11 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Since commit 96360004b862 ("xdp: Make devmap flush_list common for all map
> instances"), devmap flushing is a global operation instead of tied to a
> particular map. This means that with a bit of refactoring, we can finally fix
> the performance delta between the bpf_redirect_map() and bpf_redirect() helper
> functions, by introducing bulking for the latter as well.
>
> This series makes this change by moving the data structure used for the bulking
> into struct net_device itself, so we can access it even when there is no
> devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
> mechanism becomes quite trivial, and brings bpf_redirect() up to the same
> performance as bpf_redirect_map():
>
>                    Before       After
> 1 CPU:
> bpf_redirect_map:   8.4 Mpps     8.4 Mpps  (no change)
> bpf_redirect:       5.0 Mpps     8.4 Mpps  (+68%)
> 2 CPUs:
> bpf_redirect_map:  15.9 Mpps    16.1 Mpps  (+1% or ~no change)
> bpf_redirect:       9.5 Mpps    15.9 Mpps  (+67%)
>
> After this patch series, the only semantic difference between the two variants
> of the redirect helper (apart from the absence of a map argument, obviously) is
> that the _map() variant will return an error if passed an invalid map index,
> whereas the bpf_redirect() helper will succeed, but drop packets in
> xdp_do_redirect(). This is because the helper has no reference to the calling
> netdev, so unfortunately we can't do the ifindex lookup directly in the helper.
>
> Changelog:
>
> v2:
> - Consolidate code paths and tracepoints for map and non-map redirect variants
> (Björn)
> - Add performance data for 2-CPU test (Jesper)
> - Move fields to avoid shifting cache lines in struct net_device (Eric)
John, since you commented on v1, please review this v2. Thanks!
* Re: [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
@ 2020-01-15 12:16 ` Maciej Fijalkowski
2020-01-15 19:43 ` John Fastabend
1 sibling, 0 replies; 13+ messages in thread
From: Maciej Fijalkowski @ 2020-01-15 12:16 UTC
To: Toke Høiland-Jørgensen
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Jesper Dangaard Brouer, Björn Töpel, John Fastabend
On Mon, Jan 13, 2020 at 07:10:56PM +0100, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
> we can re-use the bulking for the non-map version of the bpf_redirect()
> helper. This is a simple matter of having xdp_do_redirect_slow() queue the
> frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
>
> Unfortunately we can't make the bpf_redirect() helper return an error if
> the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
> have a reference to the network namespace of the ingress device at the time
> the helper is called. So we have to leave it as-is and keep the device
> lookup in xdp_do_redirect_slow().
>
> Since this leaves little reason to keep the non-map redirect code in a
> separate function, we get rid of the xdp_do_redirect_slow() function
> entirely. This does lose us the tracepoint disambiguation, but fortunately
> the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
> entry structures. This means both can contain a map index, so we can just
> amend the tracepoint definitions so we always emit the xdp_redirect(_err)
> tracepoints, but with the map ID only populated if a map is present. This
> means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
> the definitions around in case someone is still listening for them.
>
> With this change, the performance of the xdp_redirect sample program goes
> from 5Mpps to 8.4Mpps (a 68% increase).
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
> include/linux/bpf.h | 13 +++++-
> include/trace/events/xdp.h | 102 +++++++++++++++++++-------------------------
> kernel/bpf/devmap.c | 31 +++++++++----
> net/core/filter.c | 86 +++++++------------------------------
> 4 files changed, 95 insertions(+), 137 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b14e51d56a82..25c050202536 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -962,7 +962,9 @@ struct sk_buff;
>
> struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
> struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
> -void __dev_map_flush(void);
> +void __dev_flush(void);
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> + struct net_device *dev_rx);
> int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> struct net_device *dev_rx);
> int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
> @@ -1071,13 +1073,20 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map
> return NULL;
> }
>
> -static inline void __dev_map_flush(void)
> +static inline void __dev_flush(void)
> {
> }
>
> struct xdp_buff;
> struct bpf_dtab_netdev;
>
> +static inline
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> + struct net_device *dev_rx)
> +{
> + return 0;
> +}
> +
> static inline
> int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> struct net_device *dev_rx)
> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
> index 72bad13d4a3c..cf568a38f852 100644
> --- a/include/trace/events/xdp.h
> +++ b/include/trace/events/xdp.h
> @@ -79,14 +79,27 @@ TRACE_EVENT(xdp_bulk_tx,
> __entry->sent, __entry->drops, __entry->err)
> );
>
> +#ifndef __DEVMAP_OBJ_TYPE
> +#define __DEVMAP_OBJ_TYPE
> +struct _bpf_dtab_netdev {
> + struct net_device *dev;
> +};
> +#endif /* __DEVMAP_OBJ_TYPE */
> +
> +#define devmap_ifindex(tgt, map) \
> + (((map->map_type == BPF_MAP_TYPE_DEVMAP || \
> + map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)) ? \
> + ((struct _bpf_dtab_netdev *)tgt)->dev->ifindex : 0)
> +
Delete one blank line
> +
> DECLARE_EVENT_CLASS(xdp_redirect_template,
>
> TP_PROTO(const struct net_device *dev,
> const struct bpf_prog *xdp,
> - int to_ifindex, int err,
> - const struct bpf_map *map, u32 map_index),
> + const void *tgt, int err,
> + const struct bpf_map *map, u32 index),
>
> - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
> + TP_ARGS(dev, xdp, tgt, err, map, index),
>
> TP_STRUCT__entry(
> __field(int, prog_id)
> @@ -103,90 +116,65 @@ DECLARE_EVENT_CLASS(xdp_redirect_template,
> __entry->act = XDP_REDIRECT;
> __entry->ifindex = dev->ifindex;
> __entry->err = err;
> - __entry->to_ifindex = to_ifindex;
> + __entry->to_ifindex = map ? devmap_ifindex(tgt, map) :
> + index;
> __entry->map_id = map ? map->id : 0;
> - __entry->map_index = map_index;
> + __entry->map_index = map ? index : 0;
> ),
>
> - TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d",
> + TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
> + " map_id=%d map_index=%d",
> __entry->prog_id,
> __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
> __entry->ifindex, __entry->to_ifindex,
> - __entry->err)
> + __entry->err, __entry->map_id, __entry->map_index)
> );
>
> DEFINE_EVENT(xdp_redirect_template, xdp_redirect,
> TP_PROTO(const struct net_device *dev,
> const struct bpf_prog *xdp,
> - int to_ifindex, int err,
> - const struct bpf_map *map, u32 map_index),
> - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
> + const void *tgt, int err,
> + const struct bpf_map *map, u32 index),
> + TP_ARGS(dev, xdp, tgt, err, map, index)
> );
>
> DEFINE_EVENT(xdp_redirect_template, xdp_redirect_err,
> TP_PROTO(const struct net_device *dev,
> const struct bpf_prog *xdp,
> - int to_ifindex, int err,
> - const struct bpf_map *map, u32 map_index),
> - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
> + const void *tgt, int err,
> + const struct bpf_map *map, u32 index),
> + TP_ARGS(dev, xdp, tgt, err, map, index)
> );
>
> #define _trace_xdp_redirect(dev, xdp, to) \
> - trace_xdp_redirect(dev, xdp, to, 0, NULL, 0);
> + trace_xdp_redirect(dev, xdp, NULL, 0, NULL, to);
>
> #define _trace_xdp_redirect_err(dev, xdp, to, err) \
> - trace_xdp_redirect_err(dev, xdp, to, err, NULL, 0);
> + trace_xdp_redirect_err(dev, xdp, NULL, err, NULL, to);
> +
> +#define _trace_xdp_redirect_map(dev, xdp, to, map, index) \
> + trace_xdp_redirect(dev, xdp, to, 0, map, index);
>
> -DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map,
> +#define _trace_xdp_redirect_map_err(dev, xdp, to, map, index, err) \
> + trace_xdp_redirect_err(dev, xdp, to, err, map, index);
> +
> +/* not used anymore, but kept around so as not to break old programs */
> +DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map,
> TP_PROTO(const struct net_device *dev,
> const struct bpf_prog *xdp,
> - int to_ifindex, int err,
> - const struct bpf_map *map, u32 map_index),
> - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
> - TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
> - " map_id=%d map_index=%d",
> - __entry->prog_id,
> - __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
> - __entry->ifindex, __entry->to_ifindex,
> - __entry->err,
> - __entry->map_id, __entry->map_index)
> + const void *tgt, int err,
> + const struct bpf_map *map, u32 index),
> + TP_ARGS(dev, xdp, tgt, err, map, index)
> );
>
> -DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err,
> +DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map_err,
> TP_PROTO(const struct net_device *dev,
> const struct bpf_prog *xdp,
> - int to_ifindex, int err,
> - const struct bpf_map *map, u32 map_index),
> - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
> - TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
> - " map_id=%d map_index=%d",
> - __entry->prog_id,
> - __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
> - __entry->ifindex, __entry->to_ifindex,
> - __entry->err,
> - __entry->map_id, __entry->map_index)
> + const void *tgt, int err,
> + const struct bpf_map *map, u32 index),
> + TP_ARGS(dev, xdp, tgt, err, map, index)
> );
>
> -#ifndef __DEVMAP_OBJ_TYPE
> -#define __DEVMAP_OBJ_TYPE
> -struct _bpf_dtab_netdev {
> - struct net_device *dev;
> -};
> -#endif /* __DEVMAP_OBJ_TYPE */
> -
> -#define devmap_ifindex(fwd, map) \
> - ((map->map_type == BPF_MAP_TYPE_DEVMAP || \
> - map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) ? \
> - ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)
> -
> -#define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx) \
> - trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map), \
> - 0, map, idx)
> -
> -#define _trace_xdp_redirect_map_err(dev, xdp, fwd, map, idx, err) \
> - trace_xdp_redirect_map_err(dev, xdp, devmap_ifindex(fwd, map), \
> - err, map, idx)
> -
> TRACE_EVENT(xdp_cpumap_kthread,
>
> TP_PROTO(int map_id, unsigned int processed, unsigned int drops,
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 030d125c3839..db32272c4f77 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -81,7 +81,7 @@ struct bpf_dtab {
> u32 n_buckets;
> };
>
> -static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);
> +static DEFINE_PER_CPU(struct list_head, dev_flush_list);
> static DEFINE_SPINLOCK(dev_map_lock);
> static LIST_HEAD(dev_map_list);
>
> @@ -357,16 +357,16 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
> goto out;
> }
>
> -/* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
> +/* __dev_flush is called from xdp_do_flush_map() which _must_ be signaled
> * from the driver before returning from its napi->poll() routine. The poll()
> * routine is called either from busy_poll context or net_rx_action signaled
> * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
> * net device can be torn down. On devmap tear down we ensure the flush list
> * is empty before completing to ensure all flush operations have completed.
> */
> -void __dev_map_flush(void)
> +void __dev_flush(void)
> {
> - struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> + struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
> struct xdp_dev_bulk_queue *bq, *tmp;
>
> rcu_read_lock();
> @@ -398,7 +398,7 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
> struct net_device *dev_rx)
>
^^^
While you're at this part of the code, maybe you could remove another
blank line? :)
> {
> - struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> + struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
> struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
>
> if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
> @@ -419,10 +419,9 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
> return 0;
> }
>
> -int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> - struct net_device *dev_rx)
> +static inline int _xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> + struct net_device *dev_rx)
> {
> - struct net_device *dev = dst->dev;
> struct xdp_frame *xdpf;
> int err;
>
> @@ -440,6 +439,20 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> return bq_enqueue(dev, xdpf, dev_rx);
> }
>
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> + struct net_device *dev_rx)
> +{
> + return _xdp_enqueue(dev, xdp, dev_rx);
AFAIK internal functions are normally prefixed with a double
underscore, no? Could we have it renamed to __xdp_enqueue?
> +}
> +
> +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> + struct net_device *dev_rx)
> +{
> + struct net_device *dev = dst->dev;
> +
> + return _xdp_enqueue(dev, xdp, dev_rx);
> +}
> +
> int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
> struct bpf_prog *xdp_prog)
> {
> @@ -762,7 +775,7 @@ static int __init dev_map_init(void)
> register_netdevice_notifier(&dev_map_notifier);
>
> for_each_possible_cpu(cpu)
> - INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu));
> + INIT_LIST_HEAD(&per_cpu(dev_flush_list, cpu));
> return 0;
> }
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 42fd17c48c5f..f023f3a8f351 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3458,58 +3458,6 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
> .arg2_type = ARG_ANYTHING,
> };
>
> -static int __bpf_tx_xdp(struct net_device *dev,
> - struct bpf_map *map,
> - struct xdp_buff *xdp,
> - u32 index)
> -{
> - struct xdp_frame *xdpf;
> - int err, sent;
> -
> - if (!dev->netdev_ops->ndo_xdp_xmit) {
> - return -EOPNOTSUPP;
> - }
> -
> - err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
> - if (unlikely(err))
> - return err;
> -
> - xdpf = convert_to_xdp_frame(xdp);
> - if (unlikely(!xdpf))
> - return -EOVERFLOW;
> -
> - sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH);
> - if (sent <= 0)
> - return sent;
> - return 0;
> -}
> -
> -static noinline int
> -xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
> - struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
> -{
> - struct net_device *fwd;
> - u32 index = ri->tgt_index;
> - int err;
> -
> - fwd = dev_get_by_index_rcu(dev_net(dev), index);
> - ri->tgt_index = 0;
> - if (unlikely(!fwd)) {
> - err = -EINVAL;
> - goto err;
> - }
> -
> - err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
> - if (unlikely(err))
> - goto err;
> -
> - _trace_xdp_redirect(dev, xdp_prog, index);
> - return 0;
> -err:
> - _trace_xdp_redirect_err(dev, xdp_prog, index, err);
> - return err;
> -}
> -
> static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
> struct bpf_map *map, struct xdp_buff *xdp)
> {
> @@ -3529,7 +3477,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
>
> void xdp_do_flush_map(void)
> {
> - __dev_map_flush();
> + __dev_flush();
Hmm, maybe it's also time for s/xdp_do_flush_map/xdp_do_flush? Driver
changes, though :<
> __cpu_map_flush();
> __xsk_map_flush();
> }
> @@ -3568,10 +3516,11 @@ void bpf_clear_redirect_map(struct bpf_map *map)
> }
> }
>
> -static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
> - struct bpf_prog *xdp_prog, struct bpf_map *map,
> - struct bpf_redirect_info *ri)
> +int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> + struct bpf_prog *xdp_prog)
> {
> + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> + struct bpf_map *map = READ_ONCE(ri->map);
> u32 index = ri->tgt_index;
> void *fwd = ri->tgt_value;
> int err;
> @@ -3580,7 +3529,18 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
> ri->tgt_value = NULL;
> WRITE_ONCE(ri->map, NULL);
>
> - err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
> + if (unlikely(!map)) {
> + fwd = dev_get_by_index_rcu(dev_net(dev), index);
> + if (unlikely(!fwd)) {
> + err = -EINVAL;
> + goto err;
> + }
> +
> + err = dev_xdp_enqueue(fwd, xdp, dev);
> + } else {
> + err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
> + }
> +
> if (unlikely(err))
> goto err;
>
> @@ -3590,18 +3550,6 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
> _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map, index, err);
> return err;
> }
> -
> -int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> - struct bpf_prog *xdp_prog)
> -{
> - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> - struct bpf_map *map = READ_ONCE(ri->map);
> -
> - if (likely(map))
> - return xdp_do_redirect_map(dev, xdp, xdp_prog, map, ri);
> -
> - return xdp_do_redirect_slow(dev, xdp, xdp_prog, ri);
> -}
> EXPORT_SYMBOL_GPL(xdp_do_redirect);
>
> static int xdp_do_generic_redirect_map(struct net_device *dev,
>
* Re: [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT
2020-01-14 17:47 ` [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Alexei Starovoitov
@ 2020-01-15 17:49 ` John Fastabend
0 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2020-01-15 17:49 UTC
To: Alexei Starovoitov, Toke Høiland-Jørgensen
Cc: Network Development, bpf, Daniel Borkmann, Alexei Starovoitov,
David Miller, Jesper Dangaard Brouer, Björn Töpel,
John Fastabend
Alexei Starovoitov wrote:
> On Mon, Jan 13, 2020 at 10:11 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Since commit 96360004b862 ("xdp: Make devmap flush_list common for all map
> > instances"), devmap flushing is a global operation instead of tied to a
> > particular map. This means that with a bit of refactoring, we can finally fix
> > the performance delta between the bpf_redirect_map() and bpf_redirect() helper
> > functions, by introducing bulking for the latter as well.
> >
> > This series makes this change by moving the data structure used for the bulking
> > into struct net_device itself, so we can access it even when there is no
> > devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
> > mechanism becomes quite trivial, and brings bpf_redirect() up to the same
> > performance as bpf_redirect_map():
> >
> >                    Before       After
> > 1 CPU:
> > bpf_redirect_map:   8.4 Mpps     8.4 Mpps  (no change)
> > bpf_redirect:       5.0 Mpps     8.4 Mpps  (+68%)
> > 2 CPUs:
> > bpf_redirect_map:  15.9 Mpps    16.1 Mpps  (+1% or ~no change)
> > bpf_redirect:       9.5 Mpps    15.9 Mpps  (+67%)
> >
> > After this patch series, the only semantic difference between the two variants
> > of the redirect helper (apart from the absence of a map argument, obviously) is
> > that the _map() variant will return an error if passed an invalid map index,
> > whereas the bpf_redirect() helper will succeed, but drop packets in
> > xdp_do_redirect(). This is because the helper has no reference to the calling
> > netdev, so unfortunately we can't do the ifindex lookup directly in the helper.
> >
> > Changelog:
> >
> > v2:
> > - Consolidate code paths and tracepoints for map and non-map redirect variants
> > (Björn)
> > - Add performance data for 2-CPU test (Jesper)
> > - Move fields to avoid shifting cache lines in struct net_device (Eric)
>
> John, since you commented on v1 please review this v2. Thanks!
Hmm, I don't think I had an initial comment, but will review regardless ;)
* RE: [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
2020-01-15 12:16 ` Maciej Fijalkowski
@ 2020-01-15 19:43 ` John Fastabend
1 sibling, 0 replies; 13+ messages in thread
From: John Fastabend @ 2020-01-15 19:43 UTC
To: Toke Høiland-Jørgensen, netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Jesper Dangaard Brouer, Björn Töpel, John Fastabend
Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
> we can re-use the bulking for the non-map version of the bpf_redirect()
> helper. This is a simple matter of having xdp_do_redirect_slow() queue the
> frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
>
> Unfortunately we can't make the bpf_redirect() helper return an error if
> the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
> have a reference to the network namespace of the ingress device at the time
> the helper is called. So we have to leave it as-is and keep the device
> lookup in xdp_do_redirect_slow().
>
> Since this leaves little reason to keep the non-map redirect code in a
> separate function, we get rid of the xdp_do_redirect_slow() function
> entirely. This does lose us the tracepoint disambiguation, but fortunately
> the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
> entry structures. This means both can contain a map index, so we can just
> amend the tracepoint definitions so we always emit the xdp_redirect(_err)
> tracepoints, but with the map ID only populated if a map is present. This
> means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
> the definitions around in case someone is still listening for them.
>
> With this change, the performance of the xdp_redirect sample program goes
> from 5Mpps to 8.4Mpps (a 68% increase).
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
* RE: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
@ 2020-01-15 19:45 ` John Fastabend
2020-01-15 22:22 ` Toke Høiland-Jørgensen
2020-01-15 20:17 ` Jesper Dangaard Brouer
1 sibling, 1 reply; 13+ messages in thread
From: John Fastabend @ 2020-01-15 19:45 UTC
To: Toke Høiland-Jørgensen, netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Jesper Dangaard Brouer, Björn Töpel, John Fastabend
Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
> instances"), changed devmap flushing to be a global operation instead of a
> per-map operation. However, the queue structure used for bulking was still
> allocated as part of the containing map.
>
> This patch moves the devmap bulk queue into struct net_device. The
> motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
> which will be changed in a subsequent commit. To avoid other fields of
> struct net_device moving to different cache lines, we also move a couple of
> other members around.
>
> We defer the actual allocation of the bulk queue structure until the
> NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
> ndo_xdp_xmit support before allocating the structure, which is not possible
> at the time struct net_device is allocated. However, we keep the freeing in
> free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
>
> Because of this change, we lose the reference back to the map that
> originated the redirect, so change the tracepoint to always report 0 as the
> map ID and index. Otherwise no functional change is intended with this
> patch.
>
> Acked-by: Björn Töpel <bjorn.topel@intel.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
LGTM. I didn't check the net_device layout with pahole, though, so I'm
trusting it's good based on the v1 discussion.
Acked-by: John Fastabend <john.fastabend@gmail.com>
* Re: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
2020-01-15 19:45 ` John Fastabend
@ 2020-01-15 20:17 ` Jesper Dangaard Brouer
2020-01-15 22:11 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2020-01-15 20:17 UTC
To: Toke Høiland-Jørgensen
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Björn Töpel, John Fastabend, brouer
On Mon, 13 Jan 2020 19:10:55 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index da9c832fc5c8..030d125c3839 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
[...]
> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
> out:
> bq->count = 0;
>
> - trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
> - sent, drops, bq->dev_rx, dev, err);
> + trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
Hmm ... I don't like that we lose the map_id and map_index identifiers.
These are part of our troubleshooting interface.
> bq->dev_rx = NULL;
> __list_del_clearprev(&bq->flush_node);
> return 0;
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
2020-01-15 20:17 ` Jesper Dangaard Brouer
@ 2020-01-15 22:11 ` Toke Høiland-Jørgensen
2020-01-16 11:24 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-15 22:11 UTC
To: Jesper Dangaard Brouer
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Björn Töpel, John Fastabend, brouer
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> On Mon, 13 Jan 2020 19:10:55 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>> index da9c832fc5c8..030d125c3839 100644
>> --- a/kernel/bpf/devmap.c
>> +++ b/kernel/bpf/devmap.c
> [...]
>> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>> out:
>> bq->count = 0;
>>
>> - trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
>> - sent, drops, bq->dev_rx, dev, err);
>> + trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
>
> Hmm ... I don't like that we lose the map_id and map_index identifier.
> This is part of our troubleshooting interface.
Hmm, I guess I can take another look at whether there's a way to avoid
that. Any ideas?
-Toke
* RE: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
2020-01-15 19:45 ` John Fastabend
@ 2020-01-15 22:22 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-15 22:22 UTC
To: John Fastabend, netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Jesper Dangaard Brouer, Björn Töpel, John Fastabend
John Fastabend <john.fastabend@gmail.com> writes:
> Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
>> instances"), changed devmap flushing to be a global operation instead of a
>> per-map operation. However, the queue structure used for bulking was still
>> allocated as part of the containing map.
>>
>> This patch moves the devmap bulk queue into struct net_device. The
>> motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
>> which will be changed in a subsequent commit. To avoid other fields of
>> struct net_device moving to different cache lines, we also move a couple of
>> other members around.
>>
>> We defer the actual allocation of the bulk queue structure until the
>> NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
>> ndo_xdp_xmit support before allocating the structure, which is not possible
>> at the time struct net_device is allocated. However, we keep the freeing in
>> free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
>>
>> Because of this change, we lose the reference back to the map that
>> originated the redirect, so change the tracepoint to always report 0 as the
>> map ID and index. Otherwise no functional change is intended with this
>> patch.
>>
>> Acked-by: Björn Töpel <bjorn.topel@intel.com>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
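For readers following along, the deferred allocation described in the quoted
commit message takes roughly this shape in devmap.c's netdev notifier. This
is a sketch reconstructed from the description above, not the verbatim patch:

static int dev_map_notification(struct notifier_block *notifier,
				ulong event, void *ptr)
{
	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
	int cpu;

	switch (event) {
	case NETDEV_REGISTER:
		/* Devices without ndo_xdp_xmit can never be redirect
		 * targets, so only allocate for those that have it; this
		 * is not knowable when struct net_device itself is
		 * allocated. Freeing stays in free_netdev().
		 */
		if (!netdev->netdev_ops->ndo_xdp_xmit || netdev->xdp_bulkq)
			break;

		netdev->xdp_bulkq =
			__alloc_percpu_gfp(sizeof(struct xdp_dev_bulk_queue),
					   sizeof(void *), GFP_ATOMIC);
		if (!netdev->xdp_bulkq)
			return NOTIFY_BAD;

		/* Point each per-CPU queue back at its owning device */
		for_each_possible_cpu(cpu)
			per_cpu_ptr(netdev->xdp_bulkq, cpu)->dev = netdev;
		break;
	/* ... other events handled as before ... */
	}
	return NOTIFY_OK;
}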
>
> LGTM. I didn't check the net_device layout with pahole though so I'm
> trusting they are good from v1 discussion.
I believe so; looks like this now:
/* --- cacheline 14 boundary (896 bytes) --- */
struct netdev_queue * _tx __attribute__((__aligned__(64))); /* 896 8 */
unsigned int num_tx_queues; /* 904 4 */
unsigned int real_num_tx_queues; /* 908 4 */
struct Qdisc * qdisc; /* 912 8 */
unsigned int tx_queue_len; /* 920 4 */
spinlock_t tx_global_lock; /* 924 4 */
struct xdp_dev_bulk_queue * xdp_bulkq; /* 928 8 */
struct xps_dev_maps * xps_cpus_map; /* 936 8 */
struct xps_dev_maps * xps_rxqs_map; /* 944 8 */
struct mini_Qdisc * miniq_egress; /* 952 8 */
/* --- cacheline 15 boundary (960 bytes) --- */
struct hlist_head qdisc_hash[16]; /* 960 128 */
/* --- cacheline 17 boundary (1088 bytes) --- */
struct timer_list watchdog_timer; /* 1088 40 */
/* XXX last struct has 4 bytes of padding */
int watchdog_timeo; /* 1128 4 */
/* XXX 4 bytes hole, try to pack */
int * pcpu_refcnt; /* 1136 8 */
struct list_head todo_list; /* 1144 16 */
/* --- cacheline 18 boundary (1152 bytes) was 8 bytes ago --- */
-Toke
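(For reference: a layout dump like the one above is typically produced with
pahole, e.g. 'pahole -C net_device vmlinux'; the exact invocation used is
not stated in the thread.)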
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
2020-01-15 22:11 ` Toke Høiland-Jørgensen
@ 2020-01-16 11:24 ` Jesper Dangaard Brouer
2020-01-16 13:51 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2020-01-16 11:24 UTC (permalink / raw
To: Toke Høiland-Jørgensen
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Björn Töpel, John Fastabend, brouer
On Wed, 15 Jan 2020 23:11:21 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > On Mon, 13 Jan 2020 19:10:55 +0100
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> >> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> >> index da9c832fc5c8..030d125c3839 100644
> >> --- a/kernel/bpf/devmap.c
> >> +++ b/kernel/bpf/devmap.c
> > [...]
> >> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
> >> out:
> >> bq->count = 0;
> >>
> >> - trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
> >> - sent, drops, bq->dev_rx, dev, err);
> >> + trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
> >
> > Hmm ... I don't like that we lose the map_id and map_index identifiers.
> > This is part of our troubleshooting interface.
>
> Hmm, I guess I can take another look at whether there's a way to avoid
> that. Any ideas?
Looking at the code and the other tracepoints...
I will actually suggest removing these two arguments, because the
trace_xdp_redirect_map tracepoint also contains the ifindexes, and to
troubleshoot, people can record both tracepoints and do the correlation
themselves.
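A sketch of what that correlation could look like from the BPF side, in the
style of the xdp_monitor sample below. The xdp_redirect_map ctx layout here
is my assumption; verify it against
/sys/kernel/debug/tracing/events/xdp/xdp_redirect_map/format before use:

/* Assumes the usual samples/bpf includes: <uapi/linux/bpf.h>, "bpf_helpers.h" */
struct redirect_map_ctx {
	u64 __pad;	// First 8 bytes are not accessible by bpf code
	int prog_id;	// offset:8;  size:4; signed:1; (assumed layout)
	u32 act;	// offset:12; size:4; signed:0;
	int ifindex;	// offset:16; size:4; signed:1;
	int err;	// offset:20; size:4; signed:1;
	int to_ifindex;	// offset:24; size:4; signed:1;
	u32 map_id;	// offset:28; size:4; signed:0;
	u32 map_index;	// offset:32; size:4; signed:0;
};

struct bpf_map_def SEC("maps") redirect_cnt = {
	.type = BPF_MAP_TYPE_PERCPU_HASH,
	.key_size = sizeof(int),	/* keyed by to_ifindex */
	.value_size = sizeof(u64),
	.max_entries = 64,
};

SEC("tracepoint/xdp/xdp_redirect_map")
int trace_redirect_map(struct redirect_map_ctx *ctx)
{
	int key = ctx->to_ifindex;
	u64 one = 1, *cnt;

	/* Count redirects per target ifindex; userspace can join this
	 * with xdp_devmap_xmit records on the same ifindex.
	 */
	cnt = bpf_map_lookup_elem(&redirect_cnt, &key);
	if (cnt)
		(*cnt)++;
	else
		bpf_map_update_elem(&redirect_cnt, &key, &one, BPF_NOEXIST);
	return 0;
}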
When changing the tracepoint I would like to keep the members 'drops' and
'sent' at the same struct offsets, as our xdp_monitor example reads
these, and I hope we can keep it working this way.
I've coded it up and tested it. The new xdp_monitor will work on
older kernels, but the old xdp_monitor will fail to attach on newer
kernels, since it reads ctx fields beyond the end of the now-smaller
tracepoint struct. I think this is fair enough, as we are backwards
compatible.
[PATCH] devmap: adjust tracepoint after Toke's changes
From: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/trace/events/xdp.h | 29 ++++++++++++-----------------
kernel/bpf/devmap.c | 2 +-
samples/bpf/xdp_monitor_kern.c | 8 +++-----
3 files changed, 16 insertions(+), 23 deletions(-)
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index cf568a38f852..f1e64689ce94 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -247,43 +247,38 @@ TRACE_EVENT(xdp_cpumap_enqueue,
TRACE_EVENT(xdp_devmap_xmit,
- TP_PROTO(const struct bpf_map *map, u32 map_index,
- int sent, int drops,
- const struct net_device *from_dev,
- const struct net_device *to_dev, int err),
+ TP_PROTO(const struct net_device *from_dev,
+ const struct net_device *to_dev,
+ int sent, int drops, int err),
- TP_ARGS(map, map_index, sent, drops, from_dev, to_dev, err),
+ TP_ARGS(from_dev, to_dev, sent, drops, err),
TP_STRUCT__entry(
- __field(int, map_id)
+ __field(int, from_ifindex)
__field(u32, act)
- __field(u32, map_index)
+ __field(int, to_ifindex)
__field(int, drops)
__field(int, sent)
- __field(int, from_ifindex)
- __field(int, to_ifindex)
__field(int, err)
),
TP_fast_assign(
- __entry->map_id = map ? map->id : 0;
+ __entry->from_ifindex = from_dev->ifindex;
__entry->act = XDP_REDIRECT;
- __entry->map_index = map_index;
+ __entry->to_ifindex = to_dev->ifindex;
__entry->drops = drops;
__entry->sent = sent;
- __entry->from_ifindex = from_dev->ifindex;
- __entry->to_ifindex = to_dev->ifindex;
__entry->err = err;
),
TP_printk("ndo_xdp_xmit"
- " map_id=%d map_index=%d action=%s"
+ " from_ifindex=%d to_ifindex=%d action=%s"
" sent=%d drops=%d"
- " from_ifindex=%d to_ifindex=%d err=%d",
- __entry->map_id, __entry->map_index,
+ " err=%d",
+ __entry->from_ifindex, __entry->to_ifindex,
__print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
__entry->sent, __entry->drops,
- __entry->from_ifindex, __entry->to_ifindex, __entry->err)
+ __entry->err)
);
/* Expect users already include <net/xdp.h>, but not xdp_priv.h */
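Based on the new TP_printk above, a rendered trace line would look roughly
like this (values illustrative):

  ndo_xdp_xmit from_ifindex=2 to_ifindex=6 action=REDIRECT sent=32 drops=0 err=0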
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index db32272c4f77..1b4bfe4e06d6 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -340,7 +340,7 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
out:
bq->count = 0;
- trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
+ trace_xdp_devmap_xmit(bq->dev_rx, dev, sent, drops, err);
bq->dev_rx = NULL;
__list_del_clearprev(&bq->flush_node);
return 0;
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index ad10fe700d7d..39458a44472e 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -222,14 +222,12 @@ struct bpf_map_def SEC("maps") devmap_xmit_cnt = {
*/
struct devmap_xmit_ctx {
u64 __pad; // First 8 bytes are not accessible by bpf code
- int map_id; // offset:8; size:4; signed:1;
+ int from_ifindex; // offset:8; size:4; signed:1;
u32 act; // offset:12; size:4; signed:0;
- u32 map_index; // offset:16; size:4; signed:0;
+ int to_ifindex; // offset:16; size:4; signed:1;
int drops; // offset:20; size:4; signed:1;
int sent; // offset:24; size:4; signed:1;
- int from_ifindex; // offset:28; size:4; signed:1;
- int to_ifindex; // offset:32; size:4; signed:1;
- int err; // offset:36; size:4; signed:1;
+ int err; // offset:28; size:4; signed:1;
};
SEC("tracepoint/xdp/xdp_devmap_xmit")
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
2020-01-16 11:24 ` Jesper Dangaard Brouer
@ 2020-01-16 13:51 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-16 13:51 UTC (permalink / raw
To: Jesper Dangaard Brouer
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
Björn Töpel, John Fastabend, brouer
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> On Wed, 15 Jan 2020 23:11:21 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
>> > On Mon, 13 Jan 2020 19:10:55 +0100
>> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >
>> >> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>> >> index da9c832fc5c8..030d125c3839 100644
>> >> --- a/kernel/bpf/devmap.c
>> >> +++ b/kernel/bpf/devmap.c
>> > [...]
>> >> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>> >> out:
>> >> bq->count = 0;
>> >>
>> >> - trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
>> >> - sent, drops, bq->dev_rx, dev, err);
>> >> + trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
>> >
>> > Hmm ... I don't like that we lose the map_id and map_index identifiers.
>> > This is part of our troubleshooting interface.
>>
>> Hmm, I guess I can take another look at whether there's a way to avoid
>> that. Any ideas?
>
> Looking at the code and the other tracepoints...
>
> I will actually suggest removing these two arguments, because the
> trace_xdp_redirect_map tracepoint also contains the ifindexes, and to
> troubleshoot, people can record both tracepoints and do the correlation
> themselves.
>
> When changing the tracepoint I would like to keep the members 'drops' and
> 'sent' at the same struct offsets, as our xdp_monitor example reads
> these, and I hope we can keep it working this way.
>
> I've coded it up and tested it. The new xdp_monitor will work on
> older kernels, but the old xdp_monitor will fail to attach on newer
> kernels, since it reads ctx fields beyond the end of the now-smaller
> tracepoint struct. I think this is fair enough, as we are backwards
> compatible.
SGTM - thanks! I'll respin and include this :)
-Toke
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread
Thread overview: 13+ messages
2020-01-13 18:10 [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
2020-01-15 19:45 ` John Fastabend
2020-01-15 22:22 ` Toke Høiland-Jørgensen
2020-01-15 20:17 ` Jesper Dangaard Brouer
2020-01-15 22:11 ` Toke Høiland-Jørgensen
2020-01-16 11:24 ` Jesper Dangaard Brouer
2020-01-16 13:51 ` Toke Høiland-Jørgensen
2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
2020-01-15 12:16 ` Maciej Fijalkowski
2020-01-15 19:43 ` John Fastabend
2020-01-14 17:47 ` [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Alexei Starovoitov
2020-01-15 17:49 ` John Fastabend