All the mail mirrored from lore.kernel.org
* [RFC PATCH 00/29] Phase 2 of fib_trie updates
@ 2015-02-24 20:47 Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list Alexander Duyck
                   ` (29 more replies)
  0 siblings, 30 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:47 UTC
  To: netdev

This patch series implements the second phase of the fib_trie changes.  I
presented on these and the previous changes at Netdev01 and netconf.  The
slides for the Netdev01 presentation can be found at
https://www.netdev01.org/docs/duyck-fib-trie.pdf.

I'm currently debating whether I should submit the entire patch set as-is
or hold off on the last 10 patches, as they currently have a potential
performance impact when a large number of entries are placed in the local
table.  Specifically, with 8K local subnets configured on a dummy
interface, the time to remove the interface increased from about 0.6
seconds to 2.4 seconds.  I am not sure how common a use case like this
would be.  I have not seen the same issue when I assign 8K routes to the
interface, as I believe fib_table_flush aggregates them all into one
resize action.

The entire series reduces the total look-up time by another 20-35% versus
what is currently in the 4.0-rc1 kernel.  For example, a set of routing
look-ups which took 140ns in the 4.0-rc1 kernel will now take only about
105ns after these patches, a reduction of 25%.

---

Alexander Duyck (29):
      fib_trie: Convert fib_alias to hlist from list
      fib_trie: Replace plen with slen in leaf_info
      fib_trie: Add slen to fib alias
      fib_trie: Remove leaf_info
      fib_trie: Only resize N/2 times instead N * log(N) times in fib_table_flush
      fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf
      fib_trie: Fib find node should return parent
      fib_trie: Update insert and delete to make use of tp from find_node
      fib_trie: Make fib_table rcu safe
      fib_trie: Return pointer to tnode pointer in resize/inflate/halve
      fib_trie: Rename tnode to key_vector
      fib_trie: move leaf and tnode to occupy the same spot in the key vector
      fib_trie: replace tnode_get_child functions with get_child macros
      fib_trie: Rename tnode_child_length to child_length
      fib_trie: Add tnode struct as a container for fields not needed in key_vector
      fib_trie: Move rcu from key_vector to tnode, add accessors.
      fib_trie: Pull empty_children and full_children into tnode
      fib_trie: Move parent from key_vector to tnode
      fib_trie: Add key vector to root, return parent key_vector in resize
      fib_trie: Push net pointer down into fib_trie insert/delete/flush calls
      fib_trie: Rewrite handling of RCU to include parent in replacement
      fib_trie: Allocate tnode as array of key_vectors instead of key_vector as array of tnode pointers
      fib_trie: Add leaf_init
      fib_trie: Update tnode_new to drop use of put_child_root
      fib_trie: Add function for dropping children from trie
      fib_trie: Use put_child to only copy key_vectors instead of pointers
      fib_trie: Move key and pos into key_vector from tnode
      fib_trie: Move slen from tnode to key vector
      fib_trie: Push bits up one level, and move leaves up into parent key_vector array


 include/net/ip_fib.h     |   80 +-
 include/net/netns/ipv4.h |    7 
 net/ipv4/fib_frontend.c  |   89 ++
 net/ipv4/fib_lookup.h    |    3 
 net/ipv4/fib_semantics.c |    4 
 net/ipv4/fib_trie.c      | 1974 ++++++++++++++++++++++++----------------------
 6 files changed, 1152 insertions(+), 1005 deletions(-)

--


* [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 21:51   ` Or Gerlitz
                     ` (2 more replies)
  2015-02-24 20:48 ` [RFC PATCH 02/29] fib_trie: Replace plen with slen in leaf_info Alexander Duyck
                   ` (28 subsequent siblings)
  29 siblings, 3 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC
  To: netdev

There isn't any advantage to having it as a list, and by making it an
hlist we make the fib_alias more compatible with the leaf_info in terms of
the type of list used.
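
For readers unfamiliar with the trade-off, here is a minimal user-space
sketch, not kernel code.  The names (sk_hlist_node, sk_hlist_head,
sk_add_head, sk_add_behind, sk_add_tail) are hypothetical stand-ins
modelled on the kernel's hlist types, and all RCU handling is left out.
It illustrates that an hlist head is a single pointer with no tail
pointer, so appending at the tail requires a walk to the last node, which
is roughly what the fib_table_insert hunk in this patch does.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's hlist types. */
struct sk_hlist_node {
	struct sk_hlist_node *next;
	struct sk_hlist_node **pprev;	/* address of the pointer that points at us */
};

struct sk_hlist_head {
	struct sk_hlist_node *first;	/* one pointer; a list_head needs two */
};

/* Insert n as the first node of h. */
static void sk_add_head(struct sk_hlist_node *n, struct sk_hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Insert n immediately after prev. */
static void sk_add_behind(struct sk_hlist_node *n, struct sk_hlist_node *prev)
{
	n->next = prev->next;
	prev->next = n;
	n->pprev = &prev->next;
	if (n->next)
		n->next->pprev = &n->next;
}

/* Append n at the tail: with no tail pointer we walk to the last node. */
static void sk_add_tail(struct sk_hlist_node *n, struct sk_hlist_head *h)
{
	struct sk_hlist_node *pos, *last = NULL;

	for (pos = h->first; pos; pos = pos->next)
		last = pos;

	if (last)
		sk_add_behind(n, last);
	else
		sk_add_head(n, h);
}
```

The walk makes a tail append linear in the list length, which is presumably
acceptable here because the alias lists hanging off a leaf are short.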

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/net/ip_fib.h     |    2 +
 net/ipv4/fib_lookup.h    |    2 +
 net/ipv4/fib_semantics.c |    4 +--
 net/ipv4/fib_trie.c      |   72 ++++++++++++++++++++++++++--------------------
 4 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 5bd120e..cba4b7c 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -136,7 +136,7 @@ struct fib_result {
 	u32		tclassid;
 	struct fib_info *fi;
 	struct fib_table *table;
-	struct list_head *fa_head;
+	struct hlist_head *fa_head;
 };
 
 struct fib_result_nl {
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index 825981b..3cd444f 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -6,7 +6,7 @@
 #include <net/ip_fib.h>
 
 struct fib_alias {
-	struct list_head	fa_list;
+	struct hlist_node	fa_list;
 	struct fib_info		*fa_info;
 	u8			fa_tos;
 	u8			fa_type;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 1e2090e..c6d2674 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1163,12 +1163,12 @@ int fib_sync_down_dev(struct net_device *dev, int force)
 void fib_select_default(struct fib_result *res)
 {
 	struct fib_info *fi = NULL, *last_resort = NULL;
-	struct list_head *fa_head = res->fa_head;
+	struct hlist_head *fa_head = res->fa_head;
 	struct fib_table *tb = res->table;
 	int order = -1, last_idx = -1;
 	struct fib_alias *fa;
 
-	list_for_each_entry_rcu(fa, fa_head, fa_list) {
+	hlist_for_each_entry_rcu(fa, fa_head, fa_list) {
 		struct fib_info *next_fi = fa->fa_info;
 
 		if (next_fi->fib_scope != res->scope ||
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 3daf022..e0d44b7 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -116,7 +116,7 @@ struct leaf_info {
 	struct hlist_node hlist;
 	int plen;
 	u32 mask_plen; /* ntohl(inet_make_mask(plen)) */
-	struct list_head falh;
+	struct hlist_head falh;
 	struct rcu_head rcu;
 };
 
@@ -339,7 +339,7 @@ static struct leaf_info *leaf_info_new(int plen)
 	if (li) {
 		li->plen = plen;
 		li->mask_plen = ntohl(inet_make_mask(plen));
-		INIT_LIST_HEAD(&li->falh);
+		INIT_HLIST_HEAD(&li->falh);
 	}
 	return li;
 }
@@ -881,7 +881,7 @@ static struct leaf_info *find_leaf_info(struct tnode *l, int plen)
 	return NULL;
 }
 
-static inline struct list_head *get_fa_head(struct tnode *l, int plen)
+static inline struct hlist_head *get_fa_head(struct tnode *l, int plen)
 {
 	struct leaf_info *li = find_leaf_info(l, plen);
 
@@ -994,14 +994,15 @@ static struct tnode *fib_find_node(struct trie *t, u32 key)
 /* Return the first fib alias matching TOS with
  * priority less than or equal to PRIO.
  */
-static struct fib_alias *fib_find_alias(struct list_head *fah, u8 tos, u32 prio)
+static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 tos,
+					u32 prio)
 {
 	struct fib_alias *fa;
 
 	if (!fah)
 		return NULL;
 
-	list_for_each_entry(fa, fah, fa_list) {
+	hlist_for_each_entry(fa, fah, fa_list) {
 		if (fa->fa_tos > tos)
 			continue;
 		if (fa->fa_info->fib_priority >= prio || fa->fa_tos < tos)
@@ -1027,9 +1028,9 @@ static void trie_rebalance(struct trie *t, struct tnode *tn)
 
 /* only used from updater-side */
 
-static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
+static struct hlist_head *fib_insert_node(struct trie *t, u32 key, int plen)
 {
-	struct list_head *fa_head = NULL;
+	struct hlist_head *fa_head = NULL;
 	struct tnode *l, *n, *tp = NULL;
 	struct leaf_info *li;
 
@@ -1130,7 +1131,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
 	struct fib_alias *fa, *new_fa;
-	struct list_head *fa_head = NULL;
+	struct hlist_head *fa_head = NULL;
 	struct fib_info *fi;
 	int plen = cfg->fc_dst_len;
 	u8 tos = cfg->fc_tos;
@@ -1192,8 +1193,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 		 */
 		fa_match = NULL;
 		fa_first = fa;
-		fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list);
-		list_for_each_entry_continue(fa, fa_head, fa_list) {
+		hlist_for_each_entry_from(fa, fa_list) {
 			if (fa->fa_tos != tos)
 				break;
 			if (fa->fa_info->fib_priority != fi->fib_priority)
@@ -1227,7 +1227,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 			state = fa->fa_state;
 			new_fa->fa_state = state & ~FA_S_ACCESSED;
 
-			list_replace_rcu(&fa->fa_list, &new_fa->fa_list);
+			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
 			alias_free_mem_rcu(fa);
 
 			fib_release_info(fi_drop);
@@ -1276,8 +1276,17 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	if (!plen)
 		tb->tb_num_default++;
 
-	list_add_tail_rcu(&new_fa->fa_list,
-			  (fa ? &fa->fa_list : fa_head));
+	if (!fa) {
+		struct fib_alias *last;
+
+		hlist_for_each_entry(last, fa_head, fa_list)
+			fa = last;
+	}
+
+	if (fa)
+		hlist_add_behind_rcu(&new_fa->fa_list, &fa->fa_list);
+	else
+		hlist_add_head_rcu(&new_fa->fa_list, fa_head);
 
 	rt_cache_flush(cfg->fc_nlinfo.nl_net);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
@@ -1419,7 +1428,7 @@ found:
 		if ((key ^ n->key) & li->mask_plen)
 			continue;
 
-		list_for_each_entry_rcu(fa, &li->falh, fa_list) {
+		hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
 			struct fib_info *fi = fa->fa_info;
 			int nhsel, err;
 
@@ -1501,7 +1510,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	int plen = cfg->fc_dst_len;
 	u8 tos = cfg->fc_tos;
 	struct fib_alias *fa, *fa_to_delete;
-	struct list_head *fa_head;
+	struct hlist_head *fa_head;
 	struct tnode *l;
 	struct leaf_info *li;
 
@@ -1534,8 +1543,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	pr_debug("Deleting %08x/%d tos=%d t=%p\n", key, plen, tos, t);
 
 	fa_to_delete = NULL;
-	fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list);
-	list_for_each_entry_continue(fa, fa_head, fa_list) {
+	hlist_for_each_entry_from(fa, fa_list) {
 		struct fib_info *fi = fa->fa_info;
 
 		if (fa->fa_tos != tos)
@@ -1561,12 +1569,12 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
-	list_del_rcu(&fa->fa_list);
+	hlist_del_rcu(&fa->fa_list);
 
 	if (!plen)
 		tb->tb_num_default--;
 
-	if (list_empty(fa_head)) {
+	if (hlist_empty(fa_head)) {
 		remove_leaf_info(l, li);
 		free_leaf_info(li);
 	}
@@ -1582,16 +1590,17 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	return 0;
 }
 
-static int trie_flush_list(struct list_head *head)
+static int trie_flush_list(struct hlist_head *head)
 {
-	struct fib_alias *fa, *fa_node;
+	struct hlist_node *tmp;
+	struct fib_alias *fa;
 	int found = 0;
 
-	list_for_each_entry_safe(fa, fa_node, head, fa_list) {
+	hlist_for_each_entry_safe(fa, tmp, head, fa_list) {
 		struct fib_info *fi = fa->fa_info;
 
 		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
-			list_del_rcu(&fa->fa_list);
+			hlist_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
 			found++;
@@ -1603,15 +1612,14 @@ static int trie_flush_list(struct list_head *head)
 static int trie_flush_leaf(struct tnode *l)
 {
 	int found = 0;
-	struct hlist_head *lih = &l->list;
 	struct hlist_node *tmp;
-	struct leaf_info *li = NULL;
+	struct leaf_info *li;
 	unsigned char plen = KEYLENGTH;
 
-	hlist_for_each_entry_safe(li, tmp, lih, hlist) {
+	hlist_for_each_entry_safe(li, tmp, &l->list, hlist) {
 		found += trie_flush_list(&li->falh);
 
-		if (list_empty(&li->falh)) {
+		if (hlist_empty(&li->falh)) {
 			hlist_del_rcu(&li->hlist);
 			free_leaf_info(li);
 			continue;
@@ -1731,7 +1739,7 @@ void fib_free_table(struct fib_table *tb)
 	kfree(tb);
 }
 
-static int fn_trie_dump_fa(t_key key, int plen, struct list_head *fah,
+static int fn_trie_dump_fa(t_key key, int plen, struct hlist_head *fah,
 			   struct fib_table *tb,
 			   struct sk_buff *skb, struct netlink_callback *cb)
 {
@@ -1744,7 +1752,7 @@ static int fn_trie_dump_fa(t_key key, int plen, struct list_head *fah,
 
 	/* rcu_read_lock is hold by caller */
 
-	list_for_each_entry_rcu(fa, fah, fa_list) {
+	hlist_for_each_entry_rcu(fa, fah, fa_list) {
 		if (i < s_i) {
 			i++;
 			continue;
@@ -1787,7 +1795,7 @@ static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
 		if (i > s_i)
 			cb->args[5] = 0;
 
-		if (list_empty(&li->falh))
+		if (hlist_empty(&li->falh))
 			continue;
 
 		if (fn_trie_dump_fa(l->key, li->plen, &li->falh, tb, skb, cb) < 0) {
@@ -2272,7 +2280,7 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 		hlist_for_each_entry_rcu(li, &n->list, hlist) {
 			struct fib_alias *fa;
 
-			list_for_each_entry_rcu(fa, &li->falh, fa_list) {
+			hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
 				char buf1[32], buf2[32];
 
 				seq_indent(seq, iter->depth+1);
@@ -2429,7 +2437,7 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 		mask = inet_make_mask(li->plen);
 		prefix = htonl(l->key);
 
-		list_for_each_entry_rcu(fa, &li->falh, fa_list) {
+		hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
 			const struct fib_info *fi = fa->fa_info;
 			unsigned int flags = fib_flag_trans(fa->fa_type, mask, fi);
 


* [RFC PATCH 02/29] fib_trie: Replace plen with slen in leaf_info
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 03/29] fib_trie: Add slen to fib alias Alexander Duyck
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC
  To: netdev

This replaces the prefix length variable in the leaf_info structure with a
suffix length value, or host identifier length in bits.  This makes things
easier to sort out, since the tnodes and leaves already carry a suffix
length and the value is compatible with the ->pos field in tnodes.
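
As an illustration of why a suffix length can stand in for the prefix mask
during lookup, here is a minimal user-space sketch.  The helper names
(sk_mismatch_plen, sk_mismatch_slen) and SK_KEYLENGTH are hypothetical;
only the two comparison forms are taken from the lookup hunk in this
patch.  Both helpers report whether key and node_key differ within the
prefix bits.  The slen form needs an explicit check for slen == 32,
because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

#define SK_KEYLENGTH 32	/* bits in an IPv4 key; mirrors KEYLENGTH */

/* Old form: mask off the host bits using a mask built from plen. */
static int sk_mismatch_plen(uint32_t key, uint32_t node_key, int plen)
{
	uint32_t mask = plen ? ~(uint32_t)0 << (SK_KEYLENGTH - plen) : 0;

	return ((key ^ node_key) & mask) != 0;
}

/* New form: shift the host bits away using slen = KEYLENGTH - plen. */
static int sk_mismatch_slen(uint32_t key, uint32_t node_key, int slen)
{
	if (slen == SK_KEYLENGTH)	/* a /0 prefix matches every key */
		return 0;

	return ((key ^ node_key) >> slen) != 0;
}
```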

I also cleaned up one spot that had some list manipulation that could be
simplified.  I basically updated it so that we just use hlist_add_head_rcu
instead of calling hlist_add_before_rcu on the first node in the list.
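
The simplified insertion can be sketched in user space as follows.  This
is only an approximation: struct sk_info and sk_insert_sorted are
hypothetical names, the list is a plain singly linked list rather than an
hlist, and there is no RCU.  It shows why a single loop is enough: when
the list is empty, or when the new entry has the smallest slen, last stays
NULL and the entry simply becomes the new head.

```c
#include <stddef.h>

/* Hypothetical singly linked stand-in for an hlist of leaf_info entries. */
struct sk_info {
	struct sk_info *next;
	unsigned char slen;
};

/* Insert new_li so the list stays sorted by ascending slen. */
static void sk_insert_sorted(struct sk_info **head, struct sk_info *new_li)
{
	struct sk_info *li, *last = NULL;

	for (li = *head; li; li = li->next) {
		if (new_li->slen < li->slen)
			break;
		last = li;
	}

	if (last) {
		/* add behind the last entry whose slen is <= ours */
		new_li->next = last->next;
		last->next = new_li;
	} else {
		/* empty list or smallest slen: becomes the new first entry */
		new_li->next = *head;
		*head = new_li;
	}
}
```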

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   63 ++++++++++++++++++++++++---------------------------
 1 file changed, 30 insertions(+), 33 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index e0d44b7..e04f102 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -114,8 +114,7 @@ struct tnode {
 
 struct leaf_info {
 	struct hlist_node hlist;
-	int plen;
-	u32 mask_plen; /* ntohl(inet_make_mask(plen)) */
+	unsigned char slen;
 	struct hlist_head falh;
 	struct rcu_head rcu;
 };
@@ -337,8 +336,7 @@ static struct leaf_info *leaf_info_new(int plen)
 {
 	struct leaf_info *li = kmalloc(sizeof(struct leaf_info),  GFP_KERNEL);
 	if (li) {
-		li->plen = plen;
-		li->mask_plen = ntohl(inet_make_mask(plen));
+		li->slen = KEYLENGTH - plen;
 		INIT_HLIST_HEAD(&li->falh);
 	}
 	return li;
@@ -873,9 +871,10 @@ static struct leaf_info *find_leaf_info(struct tnode *l, int plen)
 {
 	struct hlist_head *head = &l->list;
 	struct leaf_info *li;
+	int slen = KEYLENGTH - plen;
 
 	hlist_for_each_entry_rcu(li, head, hlist)
-		if (li->plen == plen)
+		if (li->slen == slen)
 			return li;
 
 	return NULL;
@@ -929,33 +928,29 @@ static void remove_leaf_info(struct tnode *l, struct leaf_info *old)
 		return;
 
 	/* update the trie with the latest suffix length */
-	l->slen = KEYLENGTH - li->plen;
+	l->slen = li->slen;
 	leaf_pull_suffix(l);
 }
 
 static void insert_leaf_info(struct tnode *l, struct leaf_info *new)
 {
 	struct hlist_head *head = &l->list;
-	struct leaf_info *li = NULL, *last = NULL;
-
-	if (hlist_empty(head)) {
-		hlist_add_head_rcu(&new->hlist, head);
-	} else {
-		hlist_for_each_entry(li, head, hlist) {
-			if (new->plen > li->plen)
-				break;
+	struct leaf_info *li, *last = NULL;
 
-			last = li;
-		}
-		if (last)
-			hlist_add_behind_rcu(&new->hlist, &last->hlist);
-		else
-			hlist_add_before_rcu(&new->hlist, &li->hlist);
+	hlist_for_each_entry(li, head, hlist) {
+		if (new->slen < li->slen)
+			break;
+		last = li;
 	}
 
+	if (last)
+		hlist_add_behind_rcu(&new->hlist, &last->hlist);
+	else
+		hlist_add_head_rcu(&new->hlist, head);
+
 	/* if we added to the tail node then we need to update slen */
-	if (l->slen < (KEYLENGTH - new->plen)) {
-		l->slen = KEYLENGTH - new->plen;
+	if (l->slen < new->slen) {
+		l->slen = new->slen;
 		leaf_push_suffix(l);
 	}
 }
@@ -1139,7 +1134,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	int err;
 	struct tnode *l;
 
-	if (plen > 32)
+	if (plen > KEYLENGTH)
 		return -EINVAL;
 
 	key = ntohl(cfg->fc_dst);
@@ -1425,7 +1420,8 @@ found:
 	hlist_for_each_entry_rcu(li, &n->list, hlist) {
 		struct fib_alias *fa;
 
-		if ((key ^ n->key) & li->mask_plen)
+		if (((key ^ n->key) >> li->slen) &&
+		    (li->slen != KEYLENGTH))
 			continue;
 
 		hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
@@ -1459,7 +1455,7 @@ found:
 				if (!(fib_flags & FIB_LOOKUP_NOREF))
 					atomic_inc(&fi->fib_clntref);
 
-				res->prefixlen = li->plen;
+				res->prefixlen = KEYLENGTH - li->slen;
 				res->nh_sel = nhsel;
 				res->type = fa->fa_type;
 				res->scope = fi->fib_scope;
@@ -1614,7 +1610,7 @@ static int trie_flush_leaf(struct tnode *l)
 	int found = 0;
 	struct hlist_node *tmp;
 	struct leaf_info *li;
-	unsigned char plen = KEYLENGTH;
+	unsigned char slen = 0;
 
 	hlist_for_each_entry_safe(li, tmp, &l->list, hlist) {
 		found += trie_flush_list(&li->falh);
@@ -1625,10 +1621,10 @@ static int trie_flush_leaf(struct tnode *l)
 			continue;
 		}
 
-		plen = li->plen;
+		slen = li->slen;
 	}
 
-	l->slen = KEYLENGTH - plen;
+	l->slen = slen;
 
 	return found;
 }
@@ -1739,7 +1735,7 @@ void fib_free_table(struct fib_table *tb)
 	kfree(tb);
 }
 
-static int fn_trie_dump_fa(t_key key, int plen, struct hlist_head *fah,
+static int fn_trie_dump_fa(t_key key, int slen, struct hlist_head *fah,
 			   struct fib_table *tb,
 			   struct sk_buff *skb, struct netlink_callback *cb)
 {
@@ -1764,7 +1760,7 @@ static int fn_trie_dump_fa(t_key key, int plen, struct hlist_head *fah,
 				  tb->tb_id,
 				  fa->fa_type,
 				  xkey,
-				  plen,
+				  KEYLENGTH - slen,
 				  fa->fa_tos,
 				  fa->fa_info, NLM_F_MULTI) < 0) {
 			cb->args[5] = i;
@@ -1798,7 +1794,7 @@ static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
 		if (hlist_empty(&li->falh))
 			continue;
 
-		if (fn_trie_dump_fa(l->key, li->plen, &li->falh, tb, skb, cb) < 0) {
+		if (fn_trie_dump_fa(l->key, li->slen, &li->falh, tb, skb, cb) < 0) {
 			cb->args[4] = i;
 			return -1;
 		}
@@ -2284,7 +2280,8 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 				char buf1[32], buf2[32];
 
 				seq_indent(seq, iter->depth+1);
-				seq_printf(seq, "  /%d %s %s", li->plen,
+				seq_printf(seq, "  /%zu %s %s",
+					   KEYLENGTH - li->slen,
 					   rtn_scope(buf1, sizeof(buf1),
 						     fa->fa_info->fib_scope),
 					   rtn_type(buf2, sizeof(buf2),
@@ -2434,7 +2431,7 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 		struct fib_alias *fa;
 		__be32 mask, prefix;
 
-		mask = inet_make_mask(li->plen);
+		mask = inet_make_mask(KEYLENGTH - li->slen);
 		prefix = htonl(l->key);
 
 		hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {


* [RFC PATCH 03/29] fib_trie: Add slen to fib alias
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 02/29] fib_trie: Replace plen with slen in leaf_info Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 04/29] fib_trie: Remove leaf_info Alexander Duyck
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC
  To: netdev

Make use of an empty spot in the alias to store the suffix length so that
we don't need to pull that information from the leaf_info structure.
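
The "empty spot" can be pictured with the user-space sketch below.  The
struct names are hypothetical mirrors of the fib_alias layout, with
sk_two_ptrs standing in for both hlist_node and rcu_head on the assumption
that each is two pointers wide.  On common ABIs, where pointers are
aligned to at least 4 bytes, the three u8 fields are followed by padding,
so a fourth u8 should not change the size of the structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of hlist_node and rcu_head: two pointers each. */
struct sk_two_ptrs {
	void *a;
	void *b;
};

/* Hypothetical mirror of fib_alias before this patch. */
struct sk_alias_before {
	struct sk_two_ptrs fa_list;
	void *fa_info;
	uint8_t fa_tos;
	uint8_t fa_type;
	uint8_t fa_state;
	/* compiler padding here so that rcu is pointer-aligned */
	struct sk_two_ptrs rcu;
};

/* Hypothetical mirror of fib_alias after this patch. */
struct sk_alias_after {
	struct sk_two_ptrs fa_list;
	void *fa_info;
	uint8_t fa_tos;
	uint8_t fa_type;
	uint8_t fa_state;
	uint8_t fa_slen;	/* occupies what used to be a padding byte */
	struct sk_two_ptrs rcu;
};
```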

This patch also makes a slight change to the user statistics.  Instead of
incrementing semantic_match_miss once per leaf_info miss we now just
increment it once per leaf if a match was not found.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_lookup.h |    1 +
 net/ipv4/fib_trie.c   |   36 +++++++++++++++++-------------------
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index 3cd444f..ae2e6ee 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -11,6 +11,7 @@ struct fib_alias {
 	u8			fa_tos;
 	u8			fa_type;
 	u8			fa_state;
+	u8			fa_slen;
 	struct rcu_head		rcu;
 };
 
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index e04f102..1c35261 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1221,6 +1221,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 			new_fa->fa_type = cfg->fc_type;
 			state = fa->fa_state;
 			new_fa->fa_state = state & ~FA_S_ACCESSED;
+			new_fa->fa_slen = fa->fa_slen;
 
 			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
 			alias_free_mem_rcu(fa);
@@ -1256,10 +1257,9 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	new_fa->fa_tos = tos;
 	new_fa->fa_type = cfg->fc_type;
 	new_fa->fa_state = 0;
-	/*
-	 * Insert new entry to the list.
-	 */
+	new_fa->fa_slen = KEYLENGTH - plen;
 
+	/* Insert new entry to the list. */
 	if (!fa_head) {
 		fa_head = fib_insert_node(t, key, plen);
 		if (unlikely(!fa_head)) {
@@ -1420,14 +1420,13 @@ found:
 	hlist_for_each_entry_rcu(li, &n->list, hlist) {
 		struct fib_alias *fa;
 
-		if (((key ^ n->key) >> li->slen) &&
-		    (li->slen != KEYLENGTH))
-			continue;
-
 		hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
 			struct fib_info *fi = fa->fa_info;
 			int nhsel, err;
 
+			if (((key ^ n->key) >> fa->fa_slen) &&
+			    (fa->fa_slen != KEYLENGTH))
+				continue;
 			if (fa->fa_tos && fa->fa_tos != flp->flowi4_tos)
 				continue;
 			if (fi->fib_dead)
@@ -1455,7 +1454,7 @@ found:
 				if (!(fib_flags & FIB_LOOKUP_NOREF))
 					atomic_inc(&fi->fib_clntref);
 
-				res->prefixlen = KEYLENGTH - li->slen;
+				res->prefixlen = KEYLENGTH - fa->fa_slen;
 				res->nh_sel = nhsel;
 				res->type = fa->fa_type;
 				res->scope = fi->fib_scope;
@@ -1468,11 +1467,10 @@ found:
 				return err;
 			}
 		}
-
+	}
 #ifdef CONFIG_IP_FIB_TRIE_STATS
-		this_cpu_inc(stats->semantic_match_miss);
+	this_cpu_inc(stats->semantic_match_miss);
 #endif
-	}
 	goto backtrace;
 }
 EXPORT_SYMBOL_GPL(fib_table_lookup);
@@ -1735,7 +1733,7 @@ void fib_free_table(struct fib_table *tb)
 	kfree(tb);
 }
 
-static int fn_trie_dump_fa(t_key key, int slen, struct hlist_head *fah,
+static int fn_trie_dump_fa(t_key key, struct hlist_head *fah,
 			   struct fib_table *tb,
 			   struct sk_buff *skb, struct netlink_callback *cb)
 {
@@ -1760,7 +1758,7 @@ static int fn_trie_dump_fa(t_key key, int slen, struct hlist_head *fah,
 				  tb->tb_id,
 				  fa->fa_type,
 				  xkey,
-				  KEYLENGTH - slen,
+				  KEYLENGTH - fa->fa_slen,
 				  fa->fa_tos,
 				  fa->fa_info, NLM_F_MULTI) < 0) {
 			cb->args[5] = i;
@@ -1794,7 +1792,7 @@ static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
 		if (hlist_empty(&li->falh))
 			continue;
 
-		if (fn_trie_dump_fa(l->key, li->slen, &li->falh, tb, skb, cb) < 0) {
+		if (fn_trie_dump_fa(l->key, &li->falh, tb, skb, cb) < 0) {
 			cb->args[4] = i;
 			return -1;
 		}
@@ -2281,7 +2279,7 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 
 				seq_indent(seq, iter->depth+1);
 				seq_printf(seq, "  /%zu %s %s",
-					   KEYLENGTH - li->slen,
+					   KEYLENGTH - fa->fa_slen,
 					   rtn_scope(buf1, sizeof(buf1),
 						     fa->fa_info->fib_scope),
 					   rtn_type(buf2, sizeof(buf2),
@@ -2419,6 +2417,7 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 {
 	struct tnode *l = v;
 	struct leaf_info *li;
+	__be32 prefix;
 
 	if (v == SEQ_START_TOKEN) {
 		seq_printf(seq, "%-127s\n", "Iface\tDestination\tGateway "
@@ -2427,15 +2426,14 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 		return 0;
 	}
 
+	prefix = htonl(l->key);
+
 	hlist_for_each_entry_rcu(li, &l->list, hlist) {
 		struct fib_alias *fa;
-		__be32 mask, prefix;
-
-		mask = inet_make_mask(KEYLENGTH - li->slen);
-		prefix = htonl(l->key);
 
 		hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
 			const struct fib_info *fi = fa->fa_info;
+			__be32 mask = inet_make_mask(KEYLENGTH - fa->fa_slen);
 			unsigned int flags = fib_flag_trans(fa->fa_type, mask, fi);
 
 			if (fa->fa_type == RTN_BROADCAST


* [RFC PATCH 04/29] fib_trie: Remove leaf_info
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (2 preceding siblings ...)
  2015-02-24 20:48 ` [RFC PATCH 03/29] fib_trie: Add slen to fib alias Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 05/29] fib_trie: Only resize N/2 times instead N * log(N) times in fib_table_flush Alexander Duyck
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC
  To: netdev

At this point the leaf_info hash is redundant.  By adding the suffix length
to the fib_alias hash list we no longer have need of leaf_info as we can
determine the prefix length from fa_slen.  So we can compress things by
dropping the leaf_info structure from fib_trie and instead directly connect
the leaves to the fib_alias hash list.
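
A rough user-space sketch of what searching the flattened list looks like
follows.  The names sk_alias and sk_find_slen are hypothetical, and the
TOS and priority checks of the real fib_find_alias are omitted.  It
assumes the list is kept sorted by ascending fa_slen, so the search can
skip entries with a smaller suffix length and stop as soon as it sees a
larger one.  The prefix length is then recoverable as KEYLENGTH - fa_slen.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical singly linked stand-in for a leaf's fib_alias list. */
struct sk_alias {
	struct sk_alias *next;
	uint8_t fa_slen;	/* suffix length, i.e. KEYLENGTH - prefix length */
};

/* Return the first alias with the given slen, or NULL if there is none. */
static struct sk_alias *sk_find_slen(struct sk_alias *first, uint8_t slen)
{
	struct sk_alias *fa;

	for (fa = first; fa; fa = fa->next) {
		if (fa->fa_slen < slen)
			continue;	/* still among longer prefixes */
		if (fa->fa_slen != slen)
			break;		/* passed the spot where slen would be */
		return fa;
	}

	return NULL;
}
```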

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  458 +++++++++++++++++----------------------------------
 1 file changed, 155 insertions(+), 303 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 1c35261..8896617 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -108,17 +108,10 @@ struct tnode {
 			struct tnode __rcu *child[0];
 		};
 		/* This list pointer if valid if bits == 0 (LEAF) */
-		struct hlist_head list;
+		struct hlist_head leaf;
 	};
 };
 
-struct leaf_info {
-	struct hlist_node hlist;
-	unsigned char slen;
-	struct hlist_head falh;
-	struct rcu_head rcu;
-};
-
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 struct trie_use_stats {
 	unsigned int gets;
@@ -289,11 +282,6 @@ static void __node_free_rcu(struct rcu_head *head)
 
 #define node_free(n) call_rcu(&n->rcu, __node_free_rcu)
 
-static inline void free_leaf_info(struct leaf_info *leaf)
-{
-	kfree_rcu(leaf, rcu);
-}
-
 static struct tnode *tnode_alloc(size_t size)
 {
 	if (size <= PAGE_SIZE)
@@ -327,21 +315,11 @@ static struct tnode *leaf_new(t_key key)
 		/* set bits to 0 indicating we are not a tnode */
 		l->bits = 0;
 
-		INIT_HLIST_HEAD(&l->list);
+		INIT_HLIST_HEAD(&l->leaf);
 	}
 	return l;
 }
 
-static struct leaf_info *leaf_info_new(int plen)
-{
-	struct leaf_info *li = kmalloc(sizeof(struct leaf_info),  GFP_KERNEL);
-	if (li) {
-		li->slen = KEYLENGTH - plen;
-		INIT_HLIST_HEAD(&li->falh);
-	}
-	return li;
-}
-
 static struct tnode *tnode_new(t_key key, int pos, int bits)
 {
 	size_t sz = offsetof(struct tnode, child[1ul << bits]);
@@ -864,32 +842,6 @@ static void resize(struct trie *t, struct tnode *tn)
 	}
 }
 
-/* readside must use rcu_read_lock currently dump routines
- via get_fa_head and dump */
-
-static struct leaf_info *find_leaf_info(struct tnode *l, int plen)
-{
-	struct hlist_head *head = &l->list;
-	struct leaf_info *li;
-	int slen = KEYLENGTH - plen;
-
-	hlist_for_each_entry_rcu(li, head, hlist)
-		if (li->slen == slen)
-			return li;
-
-	return NULL;
-}
-
-static inline struct hlist_head *get_fa_head(struct tnode *l, int plen)
-{
-	struct leaf_info *li = find_leaf_info(l, plen);
-
-	if (!li)
-		return NULL;
-
-	return &li->falh;
-}
-
 static void leaf_pull_suffix(struct tnode *l)
 {
 	struct tnode *tp = node_parent(l);
@@ -914,43 +866,47 @@ static void leaf_push_suffix(struct tnode *l)
 	}
 }
 
-static void remove_leaf_info(struct tnode *l, struct leaf_info *old)
+static void fib_remove_alias(struct tnode *l, struct fib_alias *old)
 {
 	/* record the location of the previous list_info entry */
-	struct hlist_node **pprev = old->hlist.pprev;
-	struct leaf_info *li = hlist_entry(pprev, typeof(*li), hlist.next);
+	struct hlist_node **pprev = old->fa_list.pprev;
+	struct fib_alias *fa = hlist_entry(pprev, typeof(*fa), fa_list.next);
 
-	/* remove the leaf info from the list */
-	hlist_del_rcu(&old->hlist);
+	/* remove the fib_alias from the list */
+	hlist_del_rcu(&old->fa_list);
 
-	/* only access li if it is pointing at the last valid hlist_node */
-	if (hlist_empty(&l->list) || (*pprev))
+	/* only access fa if it is pointing at the last valid hlist_node */
+	if (hlist_empty(&l->leaf) || (*pprev))
 		return;
 
 	/* update the trie with the latest suffix length */
-	l->slen = li->slen;
+	l->slen = fa->fa_slen;
 	leaf_pull_suffix(l);
 }
 
-static void insert_leaf_info(struct tnode *l, struct leaf_info *new)
+static void fib_insert_alias(struct tnode *l, struct fib_alias *fa,
+			     struct fib_alias *new)
 {
-	struct hlist_head *head = &l->list;
-	struct leaf_info *li, *last = NULL;
+	struct hlist_head *head = &l->leaf;
 
-	hlist_for_each_entry(li, head, hlist) {
-		if (new->slen < li->slen)
-			break;
-		last = li;
+	if (!fa) {
+		struct fib_alias *last;
+
+		hlist_for_each_entry(last, head, fa_list) {
+			if (new->fa_slen < last->fa_slen)
+				break;
+			fa = last;
+		}
 	}
 
-	if (last)
-		hlist_add_behind_rcu(&new->hlist, &last->hlist);
+	if (fa)
+		hlist_add_behind_rcu(&new->fa_list, &fa->fa_list);
 	else
-		hlist_add_head_rcu(&new->hlist, head);
+		hlist_add_head_rcu(&new->fa_list, head);
 
 	/* if we added to the tail node then we need to update slen */
-	if (l->slen < new->slen) {
-		l->slen = new->slen;
+	if (l->slen < new->fa_slen) {
+		l->slen = new->fa_slen;
 		leaf_push_suffix(l);
 	}
 }
@@ -989,8 +945,8 @@ static struct tnode *fib_find_node(struct trie *t, u32 key)
 /* Return the first fib alias matching TOS with
  * priority less than or equal to PRIO.
  */
-static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 tos,
-					u32 prio)
+static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
+					u8 tos, u32 prio)
 {
 	struct fib_alias *fa;
 
@@ -998,6 +954,10 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 tos,
 		return NULL;
 
 	hlist_for_each_entry(fa, fah, fa_list) {
+		if (fa->fa_slen < slen)
+			continue;
+		if (fa->fa_slen != slen)
+			break;
 		if (fa->fa_tos > tos)
 			continue;
 		if (fa->fa_info->fib_priority >= prio || fa->fa_tos < tos)
@@ -1023,16 +983,9 @@ static void trie_rebalance(struct trie *t, struct tnode *tn)
 
 /* only used from updater-side */
 
-static struct hlist_head *fib_insert_node(struct trie *t, u32 key, int plen)
+static struct tnode *fib_insert_node(struct trie *t, u32 key, int plen)
 {
-	struct hlist_head *fa_head = NULL;
 	struct tnode *l, *n, *tp = NULL;
-	struct leaf_info *li;
-
-	li = leaf_info_new(plen);
-	if (!li)
-		return NULL;
-	fa_head = &li->falh;
 
 	n = rtnl_dereference(t->trie);
 
@@ -1063,8 +1016,7 @@ static struct hlist_head *fib_insert_node(struct trie *t, u32 key, int plen)
 		/* we have found a leaf. Prefixes have already been compared */
 		if (IS_LEAF(n)) {
 			/* Case 1: n is a leaf, and prefixes match*/
-			insert_leaf_info(n, li);
-			return fa_head;
+			return n;
 		}
 
 		tp = n;
@@ -1072,12 +1024,8 @@ static struct hlist_head *fib_insert_node(struct trie *t, u32 key, int plen)
 	}
 
 	l = leaf_new(key);
-	if (!l) {
-		free_leaf_info(li);
+	if (!l)
 		return NULL;
-	}
-
-	insert_leaf_info(l, li);
 
 	/* Case 2: n is a LEAF or a TNODE and the key doesn't match.
 	 *
@@ -1090,7 +1038,6 @@ static struct hlist_head *fib_insert_node(struct trie *t, u32 key, int plen)
 
 		tn = tnode_new(key, __fls(key ^ n->key), 1);
 		if (!tn) {
-			free_leaf_info(li);
 			node_free(l);
 			return NULL;
 		}
@@ -1116,7 +1063,7 @@ static struct hlist_head *fib_insert_node(struct trie *t, u32 key, int plen)
 		rcu_assign_pointer(t->trie, l);
 	}
 
-	return fa_head;
+	return l;
 }
 
 /*
@@ -1126,9 +1073,9 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
 	struct fib_alias *fa, *new_fa;
-	struct hlist_head *fa_head = NULL;
 	struct fib_info *fi;
-	int plen = cfg->fc_dst_len;
+	u8 plen = cfg->fc_dst_len;
+	u8 slen = KEYLENGTH - plen;
 	u8 tos = cfg->fc_tos;
 	u32 key, mask;
 	int err;
@@ -1146,8 +1093,6 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	if (key & ~mask)
 		return -EINVAL;
 
-	key = key & mask;
-
 	fi = fib_create_info(cfg);
 	if (IS_ERR(fi)) {
 		err = PTR_ERR(fi);
@@ -1155,12 +1100,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	}
 
 	l = fib_find_node(t, key);
-	fa = NULL;
-
-	if (l) {
-		fa_head = get_fa_head(l, plen);
-		fa = fib_find_alias(fa_head, tos, fi->fib_priority);
-	}
+	fa = l ? fib_find_alias(&l->leaf, slen, tos, fi->fib_priority) : NULL;
 
 	/* Now fa, if non-NULL, points to the first fib alias
 	 * with the same keys [prefix,tos,priority], if such key already
@@ -1189,7 +1129,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 		fa_match = NULL;
 		fa_first = fa;
 		hlist_for_each_entry_from(fa, fa_list) {
-			if (fa->fa_tos != tos)
+			if ((fa->fa_slen != slen) || (fa->fa_tos != tos))
 				break;
 			if (fa->fa_info->fib_priority != fi->fib_priority)
 				break;
@@ -1257,12 +1197,12 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	new_fa->fa_tos = tos;
 	new_fa->fa_type = cfg->fc_type;
 	new_fa->fa_state = 0;
-	new_fa->fa_slen = KEYLENGTH - plen;
+	new_fa->fa_slen = slen;
 
 	/* Insert new entry to the list. */
-	if (!fa_head) {
-		fa_head = fib_insert_node(t, key, plen);
-		if (unlikely(!fa_head)) {
+	if (!l) {
+		l = fib_insert_node(t, key, plen);
+		if (unlikely(!l)) {
 			err = -ENOMEM;
 			goto out_free_new_fa;
 		}
@@ -1271,17 +1211,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	if (!plen)
 		tb->tb_num_default++;
 
-	if (!fa) {
-		struct fib_alias *last;
-
-		hlist_for_each_entry(last, fa_head, fa_list)
-			fa = last;
-	}
-
-	if (fa)
-		hlist_add_behind_rcu(&new_fa->fa_list, &fa->fa_list);
-	else
-		hlist_add_head_rcu(&new_fa->fa_list, fa_head);
+	fib_insert_alias(l, fa, new_fa);
 
 	rt_cache_flush(cfg->fc_nlinfo.nl_net);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
@@ -1314,7 +1244,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 #endif
 	const t_key key = ntohl(flp->daddr);
 	struct tnode *n, *pn;
-	struct leaf_info *li;
+	struct fib_alias *fa;
 	t_key cindex;
 
 	n = rcu_dereference(t->trie);
@@ -1417,55 +1347,51 @@ backtrace:
 
 found:
 	/* Step 3: Process the leaf, if that fails fall back to backtracing */
-	hlist_for_each_entry_rcu(li, &n->list, hlist) {
-		struct fib_alias *fa;
-
-		hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
-			struct fib_info *fi = fa->fa_info;
-			int nhsel, err;
+	hlist_for_each_entry_rcu(fa, &n->leaf, fa_list) {
+		struct fib_info *fi = fa->fa_info;
+		int nhsel, err;
 
-			if (((key ^ n->key) >> fa->fa_slen) &&
-			    (fa->fa_slen != KEYLENGTH))
-				continue;
-			if (fa->fa_tos && fa->fa_tos != flp->flowi4_tos)
-				continue;
-			if (fi->fib_dead)
-				continue;
-			if (fa->fa_info->fib_scope < flp->flowi4_scope)
-				continue;
-			fib_alias_accessed(fa);
-			err = fib_props[fa->fa_type].error;
-			if (unlikely(err < 0)) {
+		if (((key ^ n->key) >> fa->fa_slen) &&
+		    (fa->fa_slen != KEYLENGTH))
+			continue;
+		if (fa->fa_tos && fa->fa_tos != flp->flowi4_tos)
+			continue;
+		if (fi->fib_dead)
+			continue;
+		if (fa->fa_info->fib_scope < flp->flowi4_scope)
+			continue;
+		fib_alias_accessed(fa);
+		err = fib_props[fa->fa_type].error;
+		if (unlikely(err < 0)) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
-				this_cpu_inc(stats->semantic_match_passed);
+			this_cpu_inc(stats->semantic_match_passed);
 #endif
-				return err;
-			}
-			if (fi->fib_flags & RTNH_F_DEAD)
+			return err;
+		}
+		if (fi->fib_flags & RTNH_F_DEAD)
+			continue;
+		for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
+			const struct fib_nh *nh = &fi->fib_nh[nhsel];
+
+			if (nh->nh_flags & RTNH_F_DEAD)
 				continue;
-			for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
-				const struct fib_nh *nh = &fi->fib_nh[nhsel];
-
-				if (nh->nh_flags & RTNH_F_DEAD)
-					continue;
-				if (flp->flowi4_oif && flp->flowi4_oif != nh->nh_oif)
-					continue;
-
-				if (!(fib_flags & FIB_LOOKUP_NOREF))
-					atomic_inc(&fi->fib_clntref);
-
-				res->prefixlen = KEYLENGTH - fa->fa_slen;
-				res->nh_sel = nhsel;
-				res->type = fa->fa_type;
-				res->scope = fi->fib_scope;
-				res->fi = fi;
-				res->table = tb;
-				res->fa_head = &li->falh;
+			if (flp->flowi4_oif && flp->flowi4_oif != nh->nh_oif)
+				continue;
+
+			if (!(fib_flags & FIB_LOOKUP_NOREF))
+				atomic_inc(&fi->fib_clntref);
+
+			res->prefixlen = KEYLENGTH - fa->fa_slen;
+			res->nh_sel = nhsel;
+			res->type = fa->fa_type;
+			res->scope = fi->fib_scope;
+			res->fi = fi;
+			res->table = tb;
+			res->fa_head = &n->leaf;
 #ifdef CONFIG_IP_FIB_TRIE_STATS
-				this_cpu_inc(stats->semantic_match_passed);
+			this_cpu_inc(stats->semantic_match_passed);
 #endif
-				return err;
-			}
+			return err;
 		}
 	}
 #ifdef CONFIG_IP_FIB_TRIE_STATS
@@ -1500,15 +1426,14 @@ static void trie_leaf_remove(struct trie *t, struct tnode *l)
 int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
-	u32 key, mask;
-	int plen = cfg->fc_dst_len;
-	u8 tos = cfg->fc_tos;
 	struct fib_alias *fa, *fa_to_delete;
-	struct hlist_head *fa_head;
+	u8 plen = cfg->fc_dst_len;
+	u8 tos = cfg->fc_tos;
+	u8 slen = KEYLENGTH - plen;
 	struct tnode *l;
-	struct leaf_info *li;
+	u32 key, mask;
 
-	if (plen > 32)
+	if (plen > KEYLENGTH)
 		return -EINVAL;
 
 	key = ntohl(cfg->fc_dst);
@@ -1517,19 +1442,11 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	if (key & ~mask)
 		return -EINVAL;
 
-	key = key & mask;
 	l = fib_find_node(t, key);
-
 	if (!l)
 		return -ESRCH;
 
-	li = find_leaf_info(l, plen);
-
-	if (!li)
-		return -ESRCH;
-
-	fa_head = &li->falh;
-	fa = fib_find_alias(fa_head, tos, 0);
+	fa = fib_find_alias(&l->leaf, slen, tos, 0);
 
 	if (!fa)
 		return -ESRCH;
@@ -1540,7 +1457,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	hlist_for_each_entry_from(fa, fa_list) {
 		struct fib_info *fi = fa->fa_info;
 
-		if (fa->fa_tos != tos)
+		if ((fa->fa_slen != slen) || (fa->fa_tos != tos))
 			break;
 
 		if ((!cfg->fc_type || fa->fa_type == cfg->fc_type) &&
@@ -1563,17 +1480,12 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
-	hlist_del_rcu(&fa->fa_list);
+	fib_remove_alias(l, fa);
 
 	if (!plen)
 		tb->tb_num_default--;
 
-	if (hlist_empty(fa_head)) {
-		remove_leaf_info(l, li);
-		free_leaf_info(li);
-	}
-
-	if (hlist_empty(&l->list))
+	if (hlist_empty(&l->leaf))
 		trie_leaf_remove(t, l);
 
 	if (fa->fa_state & FA_S_ACCESSED)
@@ -1584,13 +1496,14 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	return 0;
 }
 
-static int trie_flush_list(struct hlist_head *head)
+static int trie_flush_leaf(struct tnode *l)
 {
 	struct hlist_node *tmp;
+	unsigned char slen = 0;
 	struct fib_alias *fa;
 	int found = 0;
 
-	hlist_for_each_entry_safe(fa, tmp, head, fa_list) {
+	hlist_for_each_entry_safe(fa, tmp, &l->leaf, fa_list) {
 		struct fib_info *fi = fa->fa_info;
 
 		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
@@ -1598,28 +1511,11 @@ static int trie_flush_list(struct hlist_head *head)
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
 			found++;
-		}
-	}
-	return found;
-}
-
-static int trie_flush_leaf(struct tnode *l)
-{
-	int found = 0;
-	struct hlist_node *tmp;
-	struct leaf_info *li;
-	unsigned char slen = 0;
 
-	hlist_for_each_entry_safe(li, tmp, &l->list, hlist) {
-		found += trie_flush_list(&li->falh);
-
-		if (hlist_empty(&li->falh)) {
-			hlist_del_rcu(&li->hlist);
-			free_leaf_info(li);
 			continue;
 		}
 
-		slen = li->slen;
+		slen = fa->fa_slen;
 	}
 
 	l->slen = slen;
@@ -1627,8 +1523,7 @@ static int trie_flush_leaf(struct tnode *l)
 	return found;
 }
 
-/*
- * Scan for the next right leaf starting at node p->child[idx]
+/* Scan for the next right leaf starting at node p->child[idx]
  * Since we have back pointer, no recursion necessary.
  */
 static struct tnode *leaf_walk_rcu(struct tnode *p, struct tnode *c)
@@ -1703,7 +1598,7 @@ int fib_table_flush(struct fib_table *tb)
 		found += trie_flush_leaf(l);
 
 		if (ll) {
-			if (hlist_empty(&ll->list))
+			if (hlist_empty(&ll->leaf))
 				trie_leaf_remove(t, ll);
 			else
 				leaf_pull_suffix(ll);
@@ -1713,7 +1608,7 @@ int fib_table_flush(struct fib_table *tb)
 	}
 
 	if (ll) {
-		if (hlist_empty(&ll->list))
+		if (hlist_empty(&ll->leaf))
 			trie_leaf_remove(t, ll);
 		else
 			leaf_pull_suffix(ll);
@@ -1733,20 +1628,18 @@ void fib_free_table(struct fib_table *tb)
 	kfree(tb);
 }
 
-static int fn_trie_dump_fa(t_key key, struct hlist_head *fah,
-			   struct fib_table *tb,
-			   struct sk_buff *skb, struct netlink_callback *cb)
+static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
+			     struct sk_buff *skb, struct netlink_callback *cb)
 {
-	int i, s_i;
+	__be32 xkey = htonl(l->key);
 	struct fib_alias *fa;
-	__be32 xkey = htonl(key);
+	int i, s_i;
 
-	s_i = cb->args[5];
+	s_i = cb->args[4];
 	i = 0;
 
 	/* rcu_read_lock is hold by caller */
-
-	hlist_for_each_entry_rcu(fa, fah, fa_list) {
+	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
 		if (i < s_i) {
 			i++;
 			continue;
@@ -1761,38 +1654,6 @@ static int fn_trie_dump_fa(t_key key, struct hlist_head *fah,
 				  KEYLENGTH - fa->fa_slen,
 				  fa->fa_tos,
 				  fa->fa_info, NLM_F_MULTI) < 0) {
-			cb->args[5] = i;
-			return -1;
-		}
-		i++;
-	}
-	cb->args[5] = i;
-	return skb->len;
-}
-
-static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
-			struct sk_buff *skb, struct netlink_callback *cb)
-{
-	struct leaf_info *li;
-	int i, s_i;
-
-	s_i = cb->args[4];
-	i = 0;
-
-	/* rcu_read_lock is hold by caller */
-	hlist_for_each_entry_rcu(li, &l->list, hlist) {
-		if (i < s_i) {
-			i++;
-			continue;
-		}
-
-		if (i > s_i)
-			cb->args[5] = 0;
-
-		if (hlist_empty(&li->falh))
-			continue;
-
-		if (fn_trie_dump_fa(l->key, &li->falh, tb, skb, cb) < 0) {
 			cb->args[4] = i;
 			return -1;
 		}
@@ -1852,8 +1713,7 @@ void __init fib_trie_init(void)
 					  0, SLAB_PANIC, NULL);
 
 	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
-					   max(sizeof(struct tnode),
-					       sizeof(struct leaf_info)),
+					   sizeof(struct tnode),
 					   0, SLAB_PANIC, NULL);
 }
 
@@ -1975,14 +1835,14 @@ static void trie_collect_stats(struct trie *t, struct trie_stat *s)
 	rcu_read_lock();
 	for (n = fib_trie_get_first(&iter, t); n; n = fib_trie_get_next(&iter)) {
 		if (IS_LEAF(n)) {
-			struct leaf_info *li;
+			struct fib_alias *fa;
 
 			s->leaves++;
 			s->totdepth += iter.depth;
 			if (iter.depth > s->maxdepth)
 				s->maxdepth = iter.depth;
 
-			hlist_for_each_entry_rcu(li, &n->list, hlist)
+			hlist_for_each_entry_rcu(fa, &n->leaf, fa_list)
 				++s->prefixes;
 		} else {
 			s->tnodes++;
@@ -2014,7 +1874,7 @@ static void trie_show_stats(struct seq_file *seq, struct trie_stat *stat)
 	bytes = sizeof(struct tnode) * stat->leaves;
 
 	seq_printf(seq, "\tPrefixes:       %u\n", stat->prefixes);
-	bytes += sizeof(struct leaf_info) * stat->prefixes;
+	bytes += sizeof(struct fib_alias) * stat->prefixes;
 
 	seq_printf(seq, "\tInternal nodes: %u\n\t", stat->tnodes);
 	bytes += sizeof(struct tnode) * stat->tnodes;
@@ -2265,29 +2125,25 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 			   &prf, KEYLENGTH - n->pos - n->bits, n->bits,
 			   n->full_children, n->empty_children);
 	} else {
-		struct leaf_info *li;
 		__be32 val = htonl(n->key);
+		struct fib_alias *fa;
 
 		seq_indent(seq, iter->depth);
 		seq_printf(seq, "  |-- %pI4\n", &val);
 
-		hlist_for_each_entry_rcu(li, &n->list, hlist) {
-			struct fib_alias *fa;
-
-			hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
-				char buf1[32], buf2[32];
-
-				seq_indent(seq, iter->depth+1);
-				seq_printf(seq, "  /%zu %s %s",
-					   KEYLENGTH - fa->fa_slen,
-					   rtn_scope(buf1, sizeof(buf1),
-						     fa->fa_info->fib_scope),
-					   rtn_type(buf2, sizeof(buf2),
-						    fa->fa_type));
-				if (fa->fa_tos)
-					seq_printf(seq, " tos=%d", fa->fa_tos);
-				seq_putc(seq, '\n');
-			}
+		hlist_for_each_entry_rcu(fa, &n->leaf, fa_list) {
+			char buf1[32], buf2[32];
+
+			seq_indent(seq, iter->depth + 1);
+			seq_printf(seq, "  /%zu %s %s",
+				   KEYLENGTH - fa->fa_slen,
+				   rtn_scope(buf1, sizeof(buf1),
+					     fa->fa_info->fib_scope),
+				   rtn_type(buf2, sizeof(buf2),
+					    fa->fa_type));
+			if (fa->fa_tos)
+				seq_printf(seq, " tos=%d", fa->fa_tos);
+			seq_putc(seq, '\n');
 		}
 	}
 
@@ -2415,8 +2271,8 @@ static unsigned int fib_flag_trans(int type, __be32 mask, const struct fib_info
  */
 static int fib_route_seq_show(struct seq_file *seq, void *v)
 {
+	struct fib_alias *fa;
 	struct tnode *l = v;
-	struct leaf_info *li;
 	__be32 prefix;
 
 	if (v == SEQ_START_TOKEN) {
@@ -2428,42 +2284,38 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 
 	prefix = htonl(l->key);
 
-	hlist_for_each_entry_rcu(li, &l->list, hlist) {
-		struct fib_alias *fa;
-
-		hlist_for_each_entry_rcu(fa, &li->falh, fa_list) {
-			const struct fib_info *fi = fa->fa_info;
-			__be32 mask = inet_make_mask(KEYLENGTH - fa->fa_slen);
-			unsigned int flags = fib_flag_trans(fa->fa_type, mask, fi);
+	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
+		const struct fib_info *fi = fa->fa_info;
+		__be32 mask = inet_make_mask(KEYLENGTH - fa->fa_slen);
+		unsigned int flags = fib_flag_trans(fa->fa_type, mask, fi);
 
-			if (fa->fa_type == RTN_BROADCAST
-			    || fa->fa_type == RTN_MULTICAST)
-				continue;
+		if ((fa->fa_type == RTN_BROADCAST) ||
+		    (fa->fa_type == RTN_MULTICAST))
+			continue;
 
-			seq_setwidth(seq, 127);
-
-			if (fi)
-				seq_printf(seq,
-					 "%s\t%08X\t%08X\t%04X\t%d\t%u\t"
-					 "%d\t%08X\t%d\t%u\t%u",
-					 fi->fib_dev ? fi->fib_dev->name : "*",
-					 prefix,
-					 fi->fib_nh->nh_gw, flags, 0, 0,
-					 fi->fib_priority,
-					 mask,
-					 (fi->fib_advmss ?
-					  fi->fib_advmss + 40 : 0),
-					 fi->fib_window,
-					 fi->fib_rtt >> 3);
-			else
-				seq_printf(seq,
-					 "*\t%08X\t%08X\t%04X\t%d\t%u\t"
-					 "%d\t%08X\t%d\t%u\t%u",
-					 prefix, 0, flags, 0, 0, 0,
-					 mask, 0, 0, 0);
+		seq_setwidth(seq, 127);
+
+		if (fi)
+			seq_printf(seq,
+				   "%s\t%08X\t%08X\t%04X\t%d\t%u\t"
+				   "%d\t%08X\t%d\t%u\t%u",
+				   fi->fib_dev ? fi->fib_dev->name : "*",
+				   prefix,
+				   fi->fib_nh->nh_gw, flags, 0, 0,
+				   fi->fib_priority,
+				   mask,
+				   (fi->fib_advmss ?
+				    fi->fib_advmss + 40 : 0),
+				   fi->fib_window,
+				   fi->fib_rtt >> 3);
+		else
+			seq_printf(seq,
+				   "*\t%08X\t%08X\t%04X\t%d\t%u\t"
+				   "%d\t%08X\t%d\t%u\t%u",
+				   prefix, 0, flags, 0, 0, 0,
+				   mask, 0, 0, 0);
 
-			seq_pad(seq, '\n');
-		}
+		seq_pad(seq, '\n');
 	}
 
 	return 0;

* [RFC PATCH 05/29] fib_trie: Only resize N/2 times instead N * log(N) times in fib_table_flush
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (3 preceding siblings ...)
  2015-02-24 20:48 ` [RFC PATCH 04/29] fib_trie: Remove leaf_info Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 06/29] fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf Alexander Duyck
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC (permalink / raw
  To: netdev

This change makes fib_table_flush call resize once on each tnode, instead
of triggering it from every removed leaf.  This significantly reduces the
time spent flushing the trie, since we no longer call trie_rebalance and
have it walk up the trie for each leaf that is removed.
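
As a rough illustration of why this helps, the sketch below counts resize
calls in a toy tree under both schemes.  Everything in it (struct toy_node,
toy_resize, the two walkers) is hypothetical and is not the kernel code; in
particular the real fib_table_flush walks iteratively through parent
pointers rather than recursing, and the toy "remove" does not detach
anything, it only counts.

```c
#include <stddef.h>

/* Hypothetical toy node, not the kernel's struct tnode. */
struct toy_node {
	struct toy_node *parent;
	struct toy_node *child[2];
	int is_leaf;
};

static unsigned long resize_calls;

/* Stand-in for resize(); it only counts how often it is invoked. */
static void toy_resize(struct toy_node *n)
{
	(void)n;
	resize_calls++;
}

/* Old scheme: each removed leaf triggers a rebalance that resizes every
 * ancestor on the way up to the root.
 */
static void toy_remove_leaf_rebalance(struct toy_node *leaf)
{
	struct toy_node *n;

	for (n = leaf->parent; n; n = n->parent)
		toy_resize(n);
}

/* New scheme: post-order walk, so each internal node is resized exactly
 * once, after all of its children have been handled.
 */
static void toy_flush_postorder(struct toy_node *n)
{
	int i;

	if (!n || n->is_leaf)
		return;

	for (i = 0; i < 2; i++)
		toy_flush_postorder(n->child[i]);

	toy_resize(n);
}
```

On a complete binary toy tree with 4 leaves and 3 internal nodes, calling
toy_remove_leaf_rebalance on every leaf makes 8 resize calls, while a
single toy_flush_postorder from the root makes 3.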

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  141 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 78 insertions(+), 63 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 8896617..02e5126 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1402,25 +1402,6 @@ found:
 EXPORT_SYMBOL_GPL(fib_table_lookup);
 
 /*
- * Remove the leaf and return parent.
- */
-static void trie_leaf_remove(struct trie *t, struct tnode *l)
-{
-	struct tnode *tp = node_parent(l);
-
-	pr_debug("entering trie_leaf_remove(%p)\n", l);
-
-	if (tp) {
-		put_child(tp, get_index(l->key, tp), NULL);
-		trie_rebalance(t, tp);
-	} else {
-		RCU_INIT_POINTER(t->trie, NULL);
-	}
-
-	node_free(l);
-}
-
-/*
  * Caller must hold RTNL.
  */
 int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
@@ -1485,8 +1466,18 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	if (!plen)
 		tb->tb_num_default--;
 
-	if (hlist_empty(&l->leaf))
-		trie_leaf_remove(t, l);
+	if (hlist_empty(&l->leaf)) {
+		struct tnode *tp = node_parent(l);
+
+		if (tp) {
+			put_child(tp, get_index(l->key, tp), NULL);
+			trie_rebalance(t, tp);
+		} else {
+			RCU_INIT_POINTER(t->trie, NULL);
+		}
+
+		node_free(l);
+	}
 
 	if (fa->fa_state & FA_S_ACCESSED)
 		rt_cache_flush(cfg->fc_nlinfo.nl_net);
@@ -1496,33 +1487,6 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	return 0;
 }
 
-static int trie_flush_leaf(struct tnode *l)
-{
-	struct hlist_node *tmp;
-	unsigned char slen = 0;
-	struct fib_alias *fa;
-	int found = 0;
-
-	hlist_for_each_entry_safe(fa, tmp, &l->leaf, fa_list) {
-		struct fib_info *fi = fa->fa_info;
-
-		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
-			hlist_del_rcu(&fa->fa_list);
-			fib_release_info(fa->fa_info);
-			alias_free_mem_rcu(fa);
-			found++;
-
-			continue;
-		}
-
-		slen = fa->fa_slen;
-	}
-
-	l->slen = slen;
-
-	return found;
-}
-
 /* Scan for the next right leaf starting at node p->child[idx]
  * Since we have back pointer, no recursion necessary.
  */
@@ -1590,30 +1554,81 @@ static struct tnode *trie_leafindex(struct trie *t, int index)
  */
 int fib_table_flush(struct fib_table *tb)
 {
-	struct trie *t = (struct trie *) tb->tb_data;
-	struct tnode *l, *ll = NULL;
+	struct trie *t = (struct trie *)tb->tb_data;
+	struct hlist_node *tmp;
+	struct fib_alias *fa;
+	struct tnode *n, *pn;
+	unsigned long cindex;
+	unsigned char slen;
 	int found = 0;
 
-	for (l = trie_firstleaf(t); l; l = trie_nextleaf(l)) {
-		found += trie_flush_leaf(l);
+	n = rcu_dereference(t->trie);
+	if (!n)
+		goto flush_complete;
+
+	pn = NULL;
+	cindex = 0;
+
+	while (IS_TNODE(n)) {
+		/* record pn and cindex for leaf walking */
+		pn = n;
+		cindex = 1ul << n->bits;
+backtrace:
+		/* walk trie in reverse order */
+		do {
+			while (!(cindex--)) {
+				t_key pkey = pn->key;
+
+				n = pn;
+				pn = node_parent(n);
+
+				/* resize completed node */
+				resize(t, n);
+
+				/* if we got the root we are done */
+				if (!pn)
+					goto flush_complete;
 
-		if (ll) {
-			if (hlist_empty(&ll->leaf))
-				trie_leaf_remove(t, ll);
-			else
-				leaf_pull_suffix(ll);
+				cindex = get_index(pkey, pn);
+			}
+
+			/* grab the next available node */
+			n = tnode_get_child(pn, cindex);
+		} while (!n);
+	}
+
+	/* track slen in case any prefixes survive */
+	slen = 0;
+
+	hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
+		struct fib_info *fi = fa->fa_info;
+
+		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
+			hlist_del_rcu(&fa->fa_list);
+			fib_release_info(fa->fa_info);
+			alias_free_mem_rcu(fa);
+			found++;
+
+			continue;
 		}
 
-		ll = l;
+		slen = fa->fa_slen;
 	}
 
-	if (ll) {
-		if (hlist_empty(&ll->leaf))
-			trie_leaf_remove(t, ll);
-		else
-			leaf_pull_suffix(ll);
+	/* update leaf slen */
+	n->slen = slen;
+
+	if (hlist_empty(&n->leaf)) {
+		put_child_root(pn, t, n->key, NULL);
+		node_free(n);
+	} else {
+		leaf_pull_suffix(n);
 	}
 
+	/* if trie is leaf only loop is completed */
+	if (pn)
+		goto backtrace;
+flush_complete:
 	pr_debug("trie_flush found=%d\n", found);
 	return found;
 }

* [RFC PATCH 06/29] fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (4 preceding siblings ...)
  2015-02-24 20:48 ` [RFC PATCH 05/29] fib_trie: Only resize N/2 times instead N * log(N) times in fib_table_flush Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 07/29] fib_trie: Fib find node should return parent Alexander Duyck
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC (permalink / raw
  To: netdev

This change makes it so that leaf_walk_rcu takes a tnode and a key instead
of the trie and a leaf.

The main idea behind this is to avoid using the leaf parent pointer, as
relying on it would add overhead in the future: I am trying to reduce the
size of a leaf down to 16 bytes on 64-bit systems and 12 bytes on 32-bit
systems.
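
The resume-by-key pattern this enables can be sketched as follows.  This is
a minimal illustration only: a sorted array stands in for the trie, and
toy_leaf_walk and toy_dump are hypothetical names, not functions from
fib_trie.c.  The caller keeps a key cursor instead of a leaf pointer, asks
for the first leaf at or above that key, and advances the cursor to the
leaf's key plus one, stopping if the cursor wraps back to 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for leaf_walk_rcu(): return the first entry whose
 * key is >= 'key' in a sorted array, or NULL when none remains.
 */
static const uint32_t *toy_leaf_walk(const uint32_t *keys, size_t n,
				     uint32_t key)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (keys[i] >= key)
			return &keys[i];

	return NULL;
}

/* Visit every leaf, resuming by key rather than by leaf pointer.
 * Returns the number of leaves visited.
 */
static size_t toy_dump(const uint32_t *keys, size_t n)
{
	const uint32_t *l;
	uint32_t key = 0;
	size_t count = 0;

	while ((l = toy_leaf_walk(keys, n, key)) != NULL) {
		count++;
		key = *l + 1;

		/* stop loop if key wrapped back to 0 */
		if (key < *l)
			break;
	}

	return count;
}
```

Without the wrap check, a leaf with the maximum key value would reset the
cursor to 0 and the walk would start over from the first leaf.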

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  216 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 120 insertions(+), 96 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 02e5126..4c82e60 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1487,71 +1487,71 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	return 0;
 }
 
-/* Scan for the next right leaf starting at node p->child[idx]
- * Since we have back pointer, no recursion necessary.
- */
-static struct tnode *leaf_walk_rcu(struct tnode *p, struct tnode *c)
+/* Scan for the next leaf starting at the provided key value */
+static struct tnode *leaf_walk_rcu(struct tnode **pn, t_key key)
 {
-	do {
-		unsigned long idx = c ? idx = get_index(c->key, p) + 1 : 0;
-
-		while (idx < tnode_child_length(p)) {
-			c = tnode_get_child_rcu(p, idx++);
-			if (!c)
-				continue;
-
-			if (IS_LEAF(c))
-				return c;
-
-			/* Rescan start scanning in new node */
-			p = c;
-			idx = 0;
-		}
+	struct tnode *tn = NULL, *n = *pn;
+	unsigned long cindex;
 
-		/* Node empty, walk back up to parent */
-		c = p;
-	} while ((p = node_parent_rcu(c)) != NULL);
+	/* record parent node for backtracing */
+	tn = n;
+	cindex = n ? get_index(key, n) : 0;
 
-	return NULL; /* Root of trie */
-}
+	/* this loop is meant to try and find the key in the trie */
+	while (n) {
+		unsigned long idx = get_index(key, n);
 
-static struct tnode *trie_firstleaf(struct trie *t)
-{
-	struct tnode *n = rcu_dereference_rtnl(t->trie);
+		/* guarantee forward progress on the keys */
+		if (IS_LEAF(n) && (n->key >= key))
+			goto found;
+		if (idx >> n->bits)
+			break;
 
-	if (!n)
-		return NULL;
+		/* record parent and next child index */
+		tn = n;
+		cindex = idx;
 
-	if (IS_LEAF(n))          /* trie is just a leaf */
-		return n;
+		/* descend into the next child */
+		n = tnode_get_child_rcu(tn, cindex++);
+	}
 
-	return leaf_walk_rcu(n, NULL);
-}
+	/* this loop will search for the next leaf with a greater key */
+	while (tn) {
+		/* if we exhausted the parent node we will need to climb */
+		if (cindex >> tn->bits) {
+			t_key pkey = tn->key;
 
-static struct tnode *trie_nextleaf(struct tnode *l)
-{
-	struct tnode *p = node_parent_rcu(l);
+			tn = node_parent_rcu(tn);
+			if (!tn)
+				break;
 
-	if (!p)
-		return NULL;	/* trie with just one leaf */
+			cindex = get_index(pkey, tn) + 1;
+			continue;
+		}
 
-	return leaf_walk_rcu(p, l);
-}
+		/* grab the next available node */
+		n = tnode_get_child_rcu(tn, cindex++);
+		if (!n)
+			continue;
 
-static struct tnode *trie_leafindex(struct trie *t, int index)
-{
-	struct tnode *l = trie_firstleaf(t);
+		/* no need to compare keys since we bumped the index */
+		if (IS_LEAF(n))
+			goto found;
 
-	while (l && index-- > 0)
-		l = trie_nextleaf(l);
+		/* Rescan start scanning in new node */
+		tn = n;
+		cindex = 0;
+	}
 
-	return l;
+	*pn = tn;
+	return NULL; /* Root of trie */
+found:
+	/* if we are at the limit for keys just return NULL for the tnode */
+	*pn = (n->key == KEY_MAX) ? NULL : tn;
+	return n;
 }
 
-
-/*
- * Caller must hold RTNL.
- */
+/* Caller must hold RTNL. */
 int fib_table_flush(struct fib_table *tb)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
@@ -1682,42 +1682,42 @@ static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
 int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
 		   struct netlink_callback *cb)
 {
-	struct tnode *l;
-	struct trie *t = (struct trie *) tb->tb_data;
-	t_key key = cb->args[2];
-	int count = cb->args[3];
-
-	rcu_read_lock();
+	struct trie *t = (struct trie *)tb->tb_data;
+	struct tnode *l, *tp;
 	/* Dump starting at last key.
 	 * Note: 0.0.0.0/0 (ie default) is first key.
 	 */
-	if (count == 0)
-		l = trie_firstleaf(t);
-	else {
-		/* Normally, continue from last key, but if that is missing
-		 * fallback to using slow rescan
-		 */
-		l = fib_find_node(t, key);
-		if (!l)
-			l = trie_leafindex(t, count);
-	}
+	int count = cb->args[2];
+	t_key key = cb->args[3];
+
+	rcu_read_lock();
 
-	while (l) {
-		cb->args[2] = l->key;
+	tp = rcu_dereference_rtnl(t->trie);
+
+	while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
 		if (fn_trie_dump_leaf(l, tb, skb, cb) < 0) {
-			cb->args[3] = count;
+			cb->args[3] = key;
+			cb->args[2] = count;
 			rcu_read_unlock();
 			return -1;
 		}
 
 		++count;
-		l = trie_nextleaf(l);
+		key = l->key + 1;
+
 		memset(&cb->args[4], 0,
 		       sizeof(cb->args) - 4*sizeof(cb->args[0]));
+
+		/* stop loop if key wrapped back to 0 */
+		if (key < l->key)
+			break;
 	}
-	cb->args[3] = count;
+
 	rcu_read_unlock();
 
+	cb->args[3] = key;
+	cb->args[2] = count;
+
 	return skb->len;
 }
 
@@ -2188,31 +2188,46 @@ static const struct file_operations fib_trie_fops = {
 
 struct fib_route_iter {
 	struct seq_net_private p;
-	struct trie *main_trie;
+	struct fib_table *main_tb;
+	struct tnode *tnode;
 	loff_t	pos;
 	t_key	key;
 };
 
 static struct tnode *fib_route_get_idx(struct fib_route_iter *iter, loff_t pos)
 {
-	struct tnode *l = NULL;
-	struct trie *t = iter->main_trie;
+	struct fib_table *tb = iter->main_tb;
+	struct tnode *l, **tp = &iter->tnode;
+	struct trie *t;
+	t_key key;
 
-	/* use cache location of last found key */
-	if (iter->pos > 0 && pos >= iter->pos && (l = fib_find_node(t, iter->key)))
+	/* use cache location of next-to-find key */
+	if (iter->pos > 0 && pos >= iter->pos) {
 		pos -= iter->pos;
-	else {
+		key = iter->key;
+	} else {
+		t = (struct trie *)tb->tb_data;
+		iter->tnode = rcu_dereference_rtnl(t->trie);
 		iter->pos = 0;
-		l = trie_firstleaf(t);
+		key = 0;
 	}
 
-	while (l && pos-- > 0) {
+	while ((l = leaf_walk_rcu(tp, key)) != NULL) {
+		key = l->key + 1;
 		iter->pos++;
-		l = trie_nextleaf(l);
+
+		if (pos-- <= 0)
+			break;
+
+		l = NULL;
+
+		/* handle unlikely case of a key wrap */
+		if (!key)
+			break;
 	}
 
 	if (l)
-		iter->key = pos;	/* remember it */
+		iter->key = key;	/* remember it */
 	else
 		iter->pos = 0;		/* forget it */
 
@@ -2224,37 +2239,46 @@ static void *fib_route_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct fib_route_iter *iter = seq->private;
 	struct fib_table *tb;
+	struct trie *t;
 
 	rcu_read_lock();
+
 	tb = fib_get_table(seq_file_net(seq), RT_TABLE_MAIN);
 	if (!tb)
 		return NULL;
 
-	iter->main_trie = (struct trie *) tb->tb_data;
-	if (*pos == 0)
-		return SEQ_START_TOKEN;
-	else
-		return fib_route_get_idx(iter, *pos - 1);
+	iter->main_tb = tb;
+
+	if (*pos != 0)
+		return fib_route_get_idx(iter, *pos);
+
+	t = (struct trie *)tb->tb_data;
+	iter->tnode = rcu_dereference_rtnl(t->trie);
+	iter->pos = 0;
+	iter->key = 0;
+
+	return SEQ_START_TOKEN;
 }
 
 static void *fib_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct fib_route_iter *iter = seq->private;
-	struct tnode *l = v;
+	struct tnode *l = NULL;
+	t_key key = iter->key;
 
 	++*pos;
-	if (v == SEQ_START_TOKEN) {
-		iter->pos = 0;
-		l = trie_firstleaf(iter->main_trie);
-	} else {
+
+	/* only allow key of 0 for start of sequence */
+	if ((v == SEQ_START_TOKEN) || key)
+		l = leaf_walk_rcu(&iter->tnode, key);
+
+	if (l) {
+		iter->key = l->key + 1;
 		iter->pos++;
-		l = trie_nextleaf(l);
+	} else {
+		iter->pos = 0;
 	}
 
-	if (l)
-		iter->key = l->key;
-	else
-		iter->pos = 0;
 	return l;
 }
 

* [RFC PATCH 07/29] fib_trie: Fib find node should return parent
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (5 preceding siblings ...)
  2015-02-24 20:48 ` [RFC PATCH 06/29] fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 08/29] fib_trie: Update insert and delete to make use of tp from find_node Alexander Duyck
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC (permalink / raw
  To: netdev

This change makes it so that the parent pointer is returned by reference in
fib_find_node.  By doing this I can use it to find the parent node when I
am performing an insertion and I don't have to look for it again in
fib_insert_node.
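
The general idea, a lookup that also hands back the last node it visited so
that insertion does not need a second descent, can be sketched on a toy
binary search tree.  The names below (struct toy_bst, toy_find, toy_insert)
are hypothetical and the structure is deliberately much simpler than the
trie; only the out-parameter pattern is the point.

```c
#include <stddef.h>

/* Hypothetical toy tree, not the kernel's trie. */
struct toy_bst {
	unsigned int key;
	struct toy_bst *child[2];
};

/* Return the node matching 'key', or NULL.  In both cases *tp is set to
 * the node visited just before the result, i.e. the parent of the match
 * or the attachment point for a new node (NULL at the root).
 */
static struct toy_bst *toy_find(struct toy_bst *root, struct toy_bst **tp,
				unsigned int key)
{
	struct toy_bst *n = root;

	*tp = NULL;

	while (n && n->key != key) {
		*tp = n;
		n = n->child[key > n->key];
	}

	return n;
}

/* Insert reuses the parent found by the lookup instead of descending
 * through the tree a second time.
 */
static void toy_insert(struct toy_bst **root, struct toy_bst *new_node)
{
	struct toy_bst *tp;

	if (toy_find(*root, &tp, new_node->key))
		return;	/* key already present */

	if (tp)
		tp->child[new_node->key > tp->key] = new_node;
	else
		*root = new_node;
}
```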

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 4c82e60..1cb9e92 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -912,10 +912,12 @@ static void fib_insert_alias(struct tnode *l, struct fib_alias *fa,
 }
 
 /* rcu_read_lock needs to be hold by caller from readside */
-static struct tnode *fib_find_node(struct trie *t, u32 key)
+static struct tnode *fib_find_node(struct trie *t, struct tnode **tp, u32 key)
 {
 	struct tnode *n = rcu_dereference_rtnl(t->trie);
 
+	*tp = NULL;
+
 	while (n) {
 		unsigned long index = get_index(key, n);
 
@@ -936,6 +938,7 @@ static struct tnode *fib_find_node(struct trie *t, u32 key)
 		if (IS_LEAF(n))
 			break;
 
+		*tp = n;
 		n = tnode_get_child_rcu(n, index);
 	}
 
@@ -1071,15 +1074,15 @@ static struct tnode *fib_insert_node(struct trie *t, u32 key, int plen)
  */
 int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 {
-	struct trie *t = (struct trie *) tb->tb_data;
+	struct trie *t = (struct trie *)tb->tb_data;
 	struct fib_alias *fa, *new_fa;
+	struct tnode *l, *tp;
 	struct fib_info *fi;
 	u8 plen = cfg->fc_dst_len;
 	u8 slen = KEYLENGTH - plen;
 	u8 tos = cfg->fc_tos;
-	u32 key, mask;
+	u32 key;
 	int err;
-	struct tnode *l;
 
 	if (plen > KEYLENGTH)
 		return -EINVAL;
@@ -1088,9 +1091,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 
 	pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
 
-	mask = ntohl(inet_make_mask(plen));
-
-	if (key & ~mask)
+	if ((plen < KEYLENGTH) && (key << plen))
 		return -EINVAL;
 
 	fi = fib_create_info(cfg);
@@ -1099,7 +1100,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 		goto err;
 	}
 
-	l = fib_find_node(t, key);
+	l = fib_find_node(t, &tp, key);
 	fa = l ? fib_find_alias(&l->leaf, slen, tos, fi->fib_priority) : NULL;
 
 	/* Now fa, if non-NULL, points to the first fib alias
@@ -1408,22 +1409,21 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
 	struct fib_alias *fa, *fa_to_delete;
+	struct tnode *l, *tp;
 	u8 plen = cfg->fc_dst_len;
-	u8 tos = cfg->fc_tos;
 	u8 slen = KEYLENGTH - plen;
-	struct tnode *l;
-	u32 key, mask;
+	u8 tos = cfg->fc_tos;
+	u32 key;
 
 	if (plen > KEYLENGTH)
 		return -EINVAL;
 
 	key = ntohl(cfg->fc_dst);
-	mask = ntohl(inet_make_mask(plen));
 
-	if (key & ~mask)
+	if ((plen < KEYLENGTH) && (key << plen))
 		return -EINVAL;
 
-	l = fib_find_node(t, key);
+	l = fib_find_node(t, &tp, key);
 	if (!l)
 		return -ESRCH;
 


* [RFC PATCH 08/29] fib_trie: Update insert and delete to make use of tp from find_node
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (6 preceding siblings ...)
  2015-02-24 20:48 ` [RFC PATCH 07/29] fib_trie: Fib find node should return parent Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 20:48 ` [RFC PATCH 09/29] fib_trie: Make fib_table rcu safe Alexander Duyck
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC
  To: netdev

This change makes the insert and delete functions use the parent tnode
pointer (tp) returned by the fib_find_node call.  As a result we no longer
have to rely on the parent pointer stored in the leaf, which will be going
away soon.
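
To illustrate why a caller-supplied parent is enough, here is a minimal
userspace sketch.  It assumes a hypothetical one-bit-per-level trie, and
set_child_or_root, attach_leaf and detach_leaf are made-up names that only
loosely mirror what put_child_root does in the patch below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node with no parent back-pointer at all. */
struct node {
	uint32_t key;
	unsigned int pos;		/* bit tested by an internal node */
	struct node *child[2];
};

struct trie {
	struct node *root;
};

/* Store n in the slot selected by key: a child slot of tp when tp is
 * set, otherwise the root pointer of the trie.
 */
static void set_child_or_root(struct trie *t, struct node *tp,
			      uint32_t key, struct node *n)
{
	if (tp)
		tp->child[(key >> tp->pos) & 1] = n;
	else
		t->root = n;
}

/* Link a leaf below the parent that the earlier lookup reported. */
static void attach_leaf(struct trie *t, struct node *tp, struct node *l)
{
	set_child_or_root(t, tp, l->key, l);
}

/* Unlink a leaf using the same parent; the leaf is never asked for it. */
static void detach_leaf(struct trie *t, struct node *tp, struct node *l)
{
	set_child_or_root(t, tp, l->key, NULL);
}
```

Because both paths take tp as an argument, the leaf itself does not need
to carry a parent field in this sketch.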

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  237 ++++++++++++++++++++-------------------------------
 1 file changed, 94 insertions(+), 143 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 1cb9e92..dcdf636 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -300,7 +300,7 @@ static inline void empty_child_dec(struct tnode *n)
 	n->empty_children-- ? : n->full_children--;
 }
 
-static struct tnode *leaf_new(t_key key)
+static struct tnode *leaf_new(t_key key, struct fib_alias *fa)
 {
 	struct tnode *l = kmem_cache_alloc(trie_leaf_kmem, GFP_KERNEL);
 	if (l) {
@@ -310,12 +310,14 @@ static struct tnode *leaf_new(t_key key)
 		 * as the nodes are searched
 		 */
 		l->key = key;
-		l->slen = 0;
+		l->slen = fa->fa_slen;
 		l->pos = 0;
 		/* set bits to 0 indicating we are not a tnode */
 		l->bits = 0;
 
+		/* link leaf to fib alias */
 		INIT_HLIST_HEAD(&l->leaf);
+		hlist_add_head(&fa->fa_list, &l->leaf);
 	}
 	return l;
 }
@@ -842,10 +844,8 @@ static void resize(struct trie *t, struct tnode *tn)
 	}
 }
 
-static void leaf_pull_suffix(struct tnode *l)
+static void leaf_pull_suffix(struct tnode *tp, struct tnode *l)
 {
-	struct tnode *tp = node_parent(l);
-
 	while (tp && (tp->slen > tp->pos) && (tp->slen > l->slen)) {
 		if (update_suffix(tp) > l->slen)
 			break;
@@ -853,10 +853,8 @@ static void leaf_pull_suffix(struct tnode *l)
 	}
 }
 
-static void leaf_push_suffix(struct tnode *l)
+static void leaf_push_suffix(struct tnode *tn, struct tnode *l)
 {
-	struct tnode *tn = node_parent(l);
-
 	/* if this is a new leaf then tn will be NULL and we can sort
 	 * out parent suffix lengths as a part of trie_rebalance
 	 */
@@ -866,51 +864,6 @@ static void leaf_push_suffix(struct tnode *l)
 	}
 }
 
-static void fib_remove_alias(struct tnode *l, struct fib_alias *old)
-{
-	/* record the location of the previous list_info entry */
-	struct hlist_node **pprev = old->fa_list.pprev;
-	struct fib_alias *fa = hlist_entry(pprev, typeof(*fa), fa_list.next);
-
-	/* remove the fib_alias from the list */
-	hlist_del_rcu(&old->fa_list);
-
-	/* only access fa if it is pointing at the last valid hlist_node */
-	if (hlist_empty(&l->leaf) || (*pprev))
-		return;
-
-	/* update the trie with the latest suffix length */
-	l->slen = fa->fa_slen;
-	leaf_pull_suffix(l);
-}
-
-static void fib_insert_alias(struct tnode *l, struct fib_alias *fa,
-			     struct fib_alias *new)
-{
-	struct hlist_head *head = &l->leaf;
-
-	if (!fa) {
-		struct fib_alias *last;
-
-		hlist_for_each_entry(last, head, fa_list) {
-			if (new->fa_slen < last->fa_slen)
-				break;
-			fa = last;
-		}
-	}
-
-	if (fa)
-		hlist_add_behind_rcu(&new->fa_list, &fa->fa_list);
-	else
-		hlist_add_head_rcu(&new->fa_list, head);
-
-	/* if we added to the tail node then we need to update slen */
-	if (l->slen < new->fa_slen) {
-		l->slen = new->fa_slen;
-		leaf_push_suffix(l);
-	}
-}
-
 /* rcu_read_lock needs to be hold by caller from readside */
 static struct tnode *fib_find_node(struct trie *t, struct tnode **tp, u32 key)
 {
@@ -974,61 +927,28 @@ static void trie_rebalance(struct trie *t, struct tnode *tn)
 {
 	struct tnode *tp;
 
-	while ((tp = node_parent(tn)) != NULL) {
+	while (tn) {
+		tp = node_parent(tn);
 		resize(t, tn);
 		tn = tp;
 	}
-
-	/* Handle last (top) tnode */
-	if (IS_TNODE(tn))
-		resize(t, tn);
 }
 
 /* only used from updater-side */
-
-static struct tnode *fib_insert_node(struct trie *t, u32 key, int plen)
+static int fib_insert_node(struct trie *t, struct tnode *tp,
+			   struct fib_alias *new, t_key key)
 {
-	struct tnode *l, *n, *tp = NULL;
-
-	n = rtnl_dereference(t->trie);
-
-	/* If we point to NULL, stop. Either the tree is empty and we should
-	 * just put a new leaf in if, or we have reached an empty child slot,
-	 * and we should just put our new leaf in that.
-	 *
-	 * If we hit a node with a key that does't match then we should stop
-	 * and create a new tnode to replace that node and insert ourselves
-	 * and the other node into the new tnode.
-	 */
-	while (n) {
-		unsigned long index = get_index(key, n);
+	struct tnode *n, *l;
 
-		/* This bit of code is a bit tricky but it combines multiple
-		 * checks into a single check.  The prefix consists of the
-		 * prefix plus zeros for the "bits" in the prefix. The index
-		 * is the difference between the key and this value.  From
-		 * this we can actually derive several pieces of data.
-		 *   if !(index >> bits)
-		 *     we know the value is child index
-		 *   else
-		 *     we have a mismatch in skip bits and failed
-		 */
-		if (index >> n->bits)
-			break;
-
-		/* we have found a leaf. Prefixes have already been compared */
-		if (IS_LEAF(n)) {
-			/* Case 1: n is a leaf, and prefixes match*/
-			return n;
-		}
-
-		tp = n;
-		n = tnode_get_child_rcu(n, index);
-	}
-
-	l = leaf_new(key);
+	l = leaf_new(key, new);
 	if (!l)
-		return NULL;
+		return -ENOMEM;
+
+	/* retrieve child from parent node */
+	if (tp)
+		n = tnode_get_child(tp, get_index(key, tp));
+	else
+		n = rcu_dereference_rtnl(t->trie);
 
 	/* Case 2: n is a LEAF or a TNODE and the key doesn't match.
 	 *
@@ -1042,7 +962,7 @@ static struct tnode *fib_insert_node(struct trie *t, u32 key, int plen)
 		tn = tnode_new(key, __fls(key ^ n->key), 1);
 		if (!tn) {
 			node_free(l);
-			return NULL;
+			return -ENOMEM;
 		}
 
 		/* initialize routes out of node */
@@ -1058,20 +978,45 @@ static struct tnode *fib_insert_node(struct trie *t, u32 key, int plen)
 	}
 
 	/* Case 3: n is NULL, and will just insert a new leaf */
-	if (tp) {
-		NODE_INIT_PARENT(l, tp);
-		put_child(tp, get_index(key, tp), l);
-		trie_rebalance(t, tp);
-	} else {
-		rcu_assign_pointer(t->trie, l);
+	NODE_INIT_PARENT(l, tp);
+	put_child_root(tp, t, key, l);
+	trie_rebalance(t, tp);
+
+	return 0;
+}
+
+static int fib_insert_alias(struct trie *t, struct tnode *tp,
+			    struct tnode *l, struct fib_alias *new,
+			    struct fib_alias *fa, t_key key)
+{
+	if (!l)
+		return fib_insert_node(t, tp, new, key);
+
+	if (!fa) {
+		struct fib_alias *last;
+
+		hlist_for_each_entry(last, &l->leaf, fa_list) {
+			if (new->fa_slen < last->fa_slen)
+				break;
+			fa = last;
+		}
 	}
 
-	return l;
+	if (fa)
+		hlist_add_behind_rcu(&new->fa_list, &fa->fa_list);
+	else
+		hlist_add_head_rcu(&new->fa_list, &l->leaf);
+
+	/* if we added to the tail node then we need to update slen */
+	if (l->slen < new->fa_slen) {
+		l->slen = new->fa_slen;
+		leaf_push_suffix(tp, l);
+	}
+
+	return 0;
 }
 
-/*
- * Caller must hold RTNL.
- */
+/* Caller must hold RTNL. */
 int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
@@ -1201,19 +1146,13 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	new_fa->fa_slen = slen;
 
 	/* Insert new entry to the list. */
-	if (!l) {
-		l = fib_insert_node(t, key, plen);
-		if (unlikely(!l)) {
-			err = -ENOMEM;
-			goto out_free_new_fa;
-		}
-	}
+	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
+	if (err)
+		goto out_free_new_fa;
 
 	if (!plen)
 		tb->tb_num_default++;
 
-	fib_insert_alias(l, fa, new_fa);
-
 	rt_cache_flush(cfg->fc_nlinfo.nl_net);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
@@ -1402,9 +1341,36 @@ found:
 }
 EXPORT_SYMBOL_GPL(fib_table_lookup);
 
-/*
- * Caller must hold RTNL.
- */
+static void fib_remove_alias(struct trie *t, struct tnode *tp,
+			     struct tnode *l, struct fib_alias *old)
+{
+	/* record the location of the previous list_info entry */
+	struct hlist_node **pprev = old->fa_list.pprev;
+	struct fib_alias *fa = hlist_entry(pprev, typeof(*fa), fa_list.next);
+
+	/* remove the fib_alias from the list */
+	hlist_del_rcu(&old->fa_list);
+
+	/* if we emptied the list this leaf will be freed and we can sort
+	 * out parent suffix lengths as a part of trie_rebalance
+	 */
+	if (hlist_empty(&l->leaf)) {
+		put_child_root(tp, t, l->key, NULL);
+		node_free(l);
+		trie_rebalance(t, tp);
+		return;
+	}
+
+	/* only access fa if it is pointing at the last valid hlist_node */
+	if (*pprev)
+		return;
+
+	/* update the trie with the latest suffix length */
+	l->slen = fa->fa_slen;
+	leaf_pull_suffix(tp, l);
+}
+
+/* Caller must hold RTNL. */
 int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
@@ -1428,7 +1394,6 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 		return -ESRCH;
 
 	fa = fib_find_alias(&l->leaf, slen, tos, 0);
-
 	if (!fa)
 		return -ESRCH;
 
@@ -1457,33 +1422,19 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	if (!fa_to_delete)
 		return -ESRCH;
 
-	fa = fa_to_delete;
-	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
+	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
-	fib_remove_alias(l, fa);
-
 	if (!plen)
 		tb->tb_num_default--;
 
-	if (hlist_empty(&l->leaf)) {
-		struct tnode *tp = node_parent(l);
-
-		if (tp) {
-			put_child(tp, get_index(l->key, tp), NULL);
-			trie_rebalance(t, tp);
-		} else {
-			RCU_INIT_POINTER(t->trie, NULL);
-		}
-
-		node_free(l);
-	}
+	fib_remove_alias(t, tp, l, fa_to_delete);
 
-	if (fa->fa_state & FA_S_ACCESSED)
+	if (fa_to_delete->fa_state & FA_S_ACCESSED)
 		rt_cache_flush(cfg->fc_nlinfo.nl_net);
 
-	fib_release_info(fa->fa_info);
-	alias_free_mem_rcu(fa);
+	fib_release_info(fa_to_delete->fa_info);
+	alias_free_mem_rcu(fa_to_delete);
 	return 0;
 }
 
@@ -1622,7 +1573,7 @@ backtrace:
 		put_child_root(pn, t, n->key, NULL);
 		node_free(n);
 	} else {
-		leaf_pull_suffix(n);
+		leaf_pull_suffix(pn, n);
 	}
 
 	/* if trie is leaf only loop is completed */


* [RFC PATCH 09/29] fib_trie: Make fib_table rcu safe
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (7 preceding siblings ...)
  2015-02-24 20:48 ` [RFC PATCH 08/29] fib_trie: Update insert and delete to make use of tp from find_node Alexander Duyck
@ 2015-02-24 20:48 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 10/29] fib_trie: Return pointer to tnode pointer in resize/inflate/halve Alexander Duyck
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:48 UTC
  To: netdev

The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock; however, after looking over the code I found
several spots where the tables were being accessed as plain pointers
without any protection.  This change puts the proper protections in place
wherever the table is accessed, so that RCU replacement of the table is
taken into account.
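
RCU itself is a kernel facility, so the following is only a userspace
analogue.  It uses C11 atomics to sketch the publish/read half of the
pattern (a release store on the update side, an acquire load and a NULL
check on the read side).  It does not model grace periods or deferred
freeing, and struct table, fib_main_slot and the helper names are all
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a routing table. */
struct table {
	int id;
};

/* Shared slot that may be replaced at any time; starts out NULL. */
static _Atomic(struct table *) fib_main_slot;

/* Update side: make the fully initialized table visible to readers. */
static void publish_table(struct table *tb)
{
	atomic_store_explicit(&fib_main_slot, tb, memory_order_release);
}

/* Read side: take one snapshot of the pointer and use only that copy. */
static struct table *get_table(void)
{
	return atomic_load_explicit(&fib_main_slot, memory_order_acquire);
}

/* The slot can legitimately be empty, so readers must check for NULL. */
static int lookup_id(void)
{
	struct table *tb = get_table();

	return tb ? tb->id : -1;
}
```

The point of the sketch is that the reader loads the pointer once and
tests it before use, rather than dereferencing the shared slot directly.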

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/net/ip_fib.h     |   69 +++++++++++++++++++++++++++-------------------
 include/net/netns/ipv4.h |    7 +++--
 net/ipv4/fib_frontend.c  |   52 +++++++++++++++++++++++++----------
 net/ipv4/fib_trie.c      |   28 +++++++++++++------
 4 files changed, 102 insertions(+), 54 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index cba4b7c..8aa6f82 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -206,12 +206,16 @@ void fib_free_table(struct fib_table *tb);
 
 static inline struct fib_table *fib_get_table(struct net *net, u32 id)
 {
+	struct hlist_node *tb_hlist;
 	struct hlist_head *ptr;
 
 	ptr = id == RT_TABLE_LOCAL ?
 		&net->ipv4.fib_table_hash[TABLE_LOCAL_INDEX] :
 		&net->ipv4.fib_table_hash[TABLE_MAIN_INDEX];
-	return hlist_entry(ptr->first, struct fib_table, tb_hlist);
+
+	tb_hlist = rcu_dereference_rtnl(hlist_first_rcu(ptr));
+
+	return hlist_entry(tb_hlist, struct fib_table, tb_hlist);
 }
 
 static inline struct fib_table *fib_new_table(struct net *net, u32 id)
@@ -222,15 +226,19 @@ static inline struct fib_table *fib_new_table(struct net *net, u32 id)
 static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
 			     struct fib_result *res)
 {
-	int err = -ENETUNREACH;
+	struct fib_table *tb;
+	int err;
 
 	rcu_read_lock();
 
-	if (!fib_table_lookup(fib_get_table(net, RT_TABLE_LOCAL), flp, res,
-			      FIB_LOOKUP_NOREF) ||
-	    !fib_table_lookup(fib_get_table(net, RT_TABLE_MAIN), flp, res,
-			      FIB_LOOKUP_NOREF))
-		err = 0;
+	for (err = 0; !err; err = -ENETUNREACH) {
+		tb = fib_get_table(net, RT_TABLE_LOCAL);
+		if (tb && !fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF))
+			break;
+		tb = fib_get_table(net, RT_TABLE_MAIN);
+		if (tb && !fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF))
+			break;
+	}
 
 	rcu_read_unlock();
 
@@ -249,28 +257,33 @@ int __fib_lookup(struct net *net, struct flowi4 *flp, struct fib_result *res);
 static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 			     struct fib_result *res)
 {
-	if (!net->ipv4.fib_has_custom_rules) {
-		int err = -ENETUNREACH;
-
-		rcu_read_lock();
-
-		res->tclassid = 0;
-		if ((net->ipv4.fib_local &&
-		     !fib_table_lookup(net->ipv4.fib_local, flp, res,
-				       FIB_LOOKUP_NOREF)) ||
-		    (net->ipv4.fib_main &&
-		     !fib_table_lookup(net->ipv4.fib_main, flp, res,
-				       FIB_LOOKUP_NOREF)) ||
-		    (net->ipv4.fib_default &&
-		     !fib_table_lookup(net->ipv4.fib_default, flp, res,
-				       FIB_LOOKUP_NOREF)))
-			err = 0;
-
-		rcu_read_unlock();
-
-		return err;
+	struct fib_table *tb;
+	int err;
+
+	if (net->ipv4.fib_has_custom_rules)
+		return __fib_lookup(net, flp, res);
+
+	rcu_read_lock();
+
+	res->tclassid = 0;
+
+	for (err = 0; !err; err = -ENETUNREACH) {
+		tb = rcu_dereference_rtnl(net->ipv4.fib_local);
+		if (tb && !fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF))
+			break;
+
+		tb = rcu_dereference_rtnl(net->ipv4.fib_main);
+		if (tb && !fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF))
+			break;
+
+		tb = rcu_dereference_rtnl(net->ipv4.fib_default);
+		if (tb && !fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF))
+			break;
 	}
-	return __fib_lookup(net, flp, res);
+
+	rcu_read_unlock();
+
+	return err;
 }
 
 #endif /* CONFIG_IP_MULTIPLE_TABLES */
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index dbe2254..d4f5b6f 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -7,6 +7,7 @@
 
 #include <linux/uidgid.h>
 #include <net/inet_frag.h>
+#include <linux/rcupdate.h>
 
 struct tcpm_hash_bucket;
 struct ctl_table_header;
@@ -38,9 +39,9 @@ struct netns_ipv4 {
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	struct fib_rules_ops	*rules_ops;
 	bool			fib_has_custom_rules;
-	struct fib_table	*fib_local;
-	struct fib_table	*fib_main;
-	struct fib_table	*fib_default;
+	struct fib_table __rcu	*fib_local;
+	struct fib_table __rcu	*fib_main;
+	struct fib_table __rcu	*fib_default;
 #endif
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	int			fib_num_tclassid_users;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 57be71d..220c4b4 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -89,17 +89,14 @@ struct fib_table *fib_new_table(struct net *net, u32 id)
 
 	switch (id) {
 	case RT_TABLE_LOCAL:
-		net->ipv4.fib_local = tb;
+		rcu_assign_pointer(net->ipv4.fib_local, tb);
 		break;
-
 	case RT_TABLE_MAIN:
-		net->ipv4.fib_main = tb;
+		rcu_assign_pointer(net->ipv4.fib_main, tb);
 		break;
-
 	case RT_TABLE_DEFAULT:
-		net->ipv4.fib_default = tb;
+		rcu_assign_pointer(net->ipv4.fib_default, tb);
 		break;
-
 	default:
 		break;
 	}
@@ -132,13 +129,14 @@ struct fib_table *fib_get_table(struct net *net, u32 id)
 static void fib_flush(struct net *net)
 {
 	int flushed = 0;
-	struct fib_table *tb;
-	struct hlist_head *head;
 	unsigned int h;
 
 	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
-		head = &net->ipv4.fib_table_hash[h];
-		hlist_for_each_entry(tb, head, tb_hlist)
+		struct hlist_head *head = &net->ipv4.fib_table_hash[h];
+		struct hlist_node *tmp;
+		struct fib_table *tb;
+
+		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist)
 			flushed += fib_table_flush(tb);
 	}
 
@@ -665,10 +663,12 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	s_h = cb->args[0];
 	s_e = cb->args[1];
 
+	rcu_read_lock();
+
 	for (h = s_h; h < FIB_TABLE_HASHSZ; h++, s_e = 0) {
 		e = 0;
 		head = &net->ipv4.fib_table_hash[h];
-		hlist_for_each_entry(tb, head, tb_hlist) {
+		hlist_for_each_entry_rcu(tb, head, tb_hlist) {
 			if (e < s_e)
 				goto next;
 			if (dumped)
@@ -682,6 +682,8 @@ next:
 		}
 	}
 out:
+	rcu_read_unlock();
+
 	cb->args[1] = e;
 	cb->args[0] = h;
 
@@ -1117,14 +1119,34 @@ static void ip_fib_net_exit(struct net *net)
 
 	rtnl_lock();
 	for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
-		struct fib_table *tb;
-		struct hlist_head *head;
+		struct hlist_head *head = &net->ipv4.fib_table_hash[i];
 		struct hlist_node *tmp;
+		struct fib_table *tb;
+
+		/* this is done in two passes as flushing the table could
+		 * cause it to be reallocated in order to accommodate new
+		 * tnodes at the root as the table shrinks.
+		 */
+		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist)
+			fib_table_flush(tb);
 
-		head = &net->ipv4.fib_table_hash[i];
 		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist) {
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+			switch (tb->tb_id) {
+			case RT_TABLE_LOCAL:
+				RCU_INIT_POINTER(net->ipv4.fib_local, NULL);
+				break;
+			case RT_TABLE_MAIN:
+				RCU_INIT_POINTER(net->ipv4.fib_main, NULL);
+				break;
+			case RT_TABLE_DEFAULT:
+				RCU_INIT_POINTER(net->ipv4.fib_default, NULL);
+				break;
+			default:
+				break;
+			}
+#endif
 			hlist_del(&tb->tb_hlist);
-			fib_table_flush(tb);
 			fib_free_table(tb);
 		}
 	}
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index dcdf636..b895ee7 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -138,6 +138,7 @@ struct trie {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	struct trie_use_stats __percpu *stats;
 #endif
+	struct rcu_head rcu;
 };
 
 static void resize(struct trie *t, struct tnode *tn);
@@ -190,6 +191,13 @@ static inline struct tnode *tnode_get_child_rcu(const struct tnode *tn,
 	return rcu_dereference_rtnl(tn->child[i]);
 }
 
+static inline struct fib_table *trie_get_table(struct trie *t)
+{
+	unsigned long *tb_data = (unsigned long *)t;
+
+	return container_of(tb_data, struct fib_table, tb_data[0]);
+}
+
 /* To understand this stuff, an understanding of keys and all their bits is
  * necessary. Every node in the trie has a key associated with it, but not
  * all of the bits in that key are significant.
@@ -1584,16 +1592,24 @@ flush_complete:
 	return found;
 }
 
-void fib_free_table(struct fib_table *tb)
+static void __trie_free_rcu(struct rcu_head *head)
 {
-#ifdef CONFIG_IP_FIB_TRIE_STATS
-	struct trie *t = (struct trie *)tb->tb_data;
+	struct trie *t = container_of(head, struct trie, rcu);
+	struct fib_table *tb = trie_get_table(t);
 
+#ifdef CONFIG_IP_FIB_TRIE_STATS
 	free_percpu(t->stats);
 #endif /* CONFIG_IP_FIB_TRIE_STATS */
 	kfree(tb);
 }
 
+void fib_free_table(struct fib_table *tb)
+{
+	struct trie *t = (struct trie *)tb->tb_data;
+
+	call_rcu(&t->rcu, __trie_free_rcu);
+}
+
 static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
 			     struct sk_buff *skb, struct netlink_callback *cb)
 {
@@ -1630,6 +1646,7 @@ static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
 	return skb->len;
 }
 
+/* rcu_read_lock needs to be hold by caller from readside */
 int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
 		   struct netlink_callback *cb)
 {
@@ -1641,15 +1658,12 @@ int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
 	int count = cb->args[2];
 	t_key key = cb->args[3];
 
-	rcu_read_lock();
-
 	tp = rcu_dereference_rtnl(t->trie);
 
 	while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
 		if (fn_trie_dump_leaf(l, tb, skb, cb) < 0) {
 			cb->args[3] = key;
 			cb->args[2] = count;
-			rcu_read_unlock();
 			return -1;
 		}
 
@@ -1664,8 +1678,6 @@ int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
 			break;
 	}
 
-	rcu_read_unlock();
-
 	cb->args[3] = key;
 	cb->args[2] = count;
 


* [RFC PATCH 10/29] fib_trie: Return pointer to tnode pointer in resize/inflate/halve
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (8 preceding siblings ...)
  2015-02-24 20:48 ` [RFC PATCH 09/29] fib_trie: Make fib_table rcu safe Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 11/29] fib_trie: Rename tnode to key_vector Alexander Duyck
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC
  To: netdev

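Resize-related functions now all return a pointer to the pointer that
references the object that was resized: either the child pointer array of
the parent or the root pointer of the trie.  inflate and halve return NULL
instead of -ENOMEM when an allocation fails.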
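
The reason a pointer to the child array is useful is that the parent can
be recovered from it with container_of.  Below is a minimal userspace
sketch of that step.  The struct layout is hypothetical (a fixed two-entry
child array rather than the kernel's), and container_of is redefined
locally with offsetof.

```c
#include <assert.h>
#include <stddef.h>

/* Local userspace version of the usual container_of idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical node; only the embedded child array matters here. */
struct node {
	unsigned int bits;
	struct node *child[2];
};

/* What a resize-style helper would hand back: the start of the child
 * array that holds the slot pointing at the resized node.
 */
static struct node **child_array(struct node *tp)
{
	return tp->child;
}

/* Walk back from the child array to the node that embeds it.  The
 * offset of "child" is the same as the offset of child[0].
 */
static struct node *parent_from_child_array(struct node **cptr)
{
	return container_of(cptr, struct node, child);
}
```

With this, a caller that only kept the returned array pointer can still
find the (possibly replaced) parent node without a stored back-pointer.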

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  132 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 80 insertions(+), 52 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index b895ee7..be1ffe8 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -141,7 +141,7 @@ struct trie {
 	struct rcu_head rcu;
 };
 
-static void resize(struct trie *t, struct tnode *tn);
+static struct tnode **resize(struct trie *t, struct tnode *tn);
 static size_t tnode_free_size;
 
 /*
@@ -455,9 +455,11 @@ static void tnode_free(struct tnode *tn)
 	}
 }
 
-static void replace(struct trie *t, struct tnode *oldtnode, struct tnode *tn)
+static struct tnode __rcu **replace(struct trie *t, struct tnode *oldtnode,
+				    struct tnode *tn)
 {
 	struct tnode *tp = node_parent(oldtnode);
+	struct tnode **cptr;
 	unsigned long i;
 
 	/* setup the parent pointer out of and back into this node */
@@ -470,6 +472,9 @@ static void replace(struct trie *t, struct tnode *oldtnode, struct tnode *tn)
 	/* all pointers should be clean so we are done */
 	tnode_free(oldtnode);
 
+	/* record the pointer that is pointing to this node */
+	cptr = tp ? tp->child : &t->trie;
+
 	/* resize children now that oldtnode is freed */
 	for (i = tnode_child_length(tn); i;) {
 		struct tnode *inode = tnode_get_child(tn, --i);
@@ -478,9 +483,11 @@ static void replace(struct trie *t, struct tnode *oldtnode, struct tnode *tn)
 		if (tnode_full(tn, inode))
 			resize(t, inode);
 	}
+
+	return cptr;
 }
 
-static int inflate(struct trie *t, struct tnode *oldtnode)
+static struct tnode __rcu **inflate(struct trie *t, struct tnode *oldtnode)
 {
 	struct tnode *tn;
 	unsigned long i;
@@ -490,7 +497,7 @@ static int inflate(struct trie *t, struct tnode *oldtnode)
 
 	tn = tnode_new(oldtnode->key, oldtnode->pos - 1, oldtnode->bits + 1);
 	if (!tn)
-		return -ENOMEM;
+		goto notnode;
 
 	/* prepare oldtnode to be freed */
 	tnode_free_init(oldtnode);
@@ -567,16 +574,15 @@ static int inflate(struct trie *t, struct tnode *oldtnode)
 	}
 
 	/* setup the parent pointers into and out of this node */
-	replace(t, oldtnode, tn);
-
-	return 0;
+	return replace(t, oldtnode, tn);
 nomem:
 	/* all pointers should be clean so we are done */
 	tnode_free(tn);
-	return -ENOMEM;
+notnode:
+	return NULL;
 }
 
-static int halve(struct trie *t, struct tnode *oldtnode)
+static struct tnode __rcu **halve(struct trie *t, struct tnode *oldtnode)
 {
 	struct tnode *tn;
 	unsigned long i;
@@ -585,7 +591,7 @@ static int halve(struct trie *t, struct tnode *oldtnode)
 
 	tn = tnode_new(oldtnode->key, oldtnode->pos + 1, oldtnode->bits - 1);
 	if (!tn)
-		return -ENOMEM;
+		goto notnode;
 
 	/* prepare oldtnode to be freed */
 	tnode_free_init(oldtnode);
@@ -608,10 +614,8 @@ static int halve(struct trie *t, struct tnode *oldtnode)
 
 		/* Two nonempty children */
 		inode = tnode_new(node0->key, oldtnode->pos, 1);
-		if (!inode) {
-			tnode_free(tn);
-			return -ENOMEM;
-		}
+		if (!inode)
+			goto nomem;
 		tnode_free_append(tn, inode);
 
 		/* initialize pointers out of node */
@@ -624,9 +628,12 @@ static int halve(struct trie *t, struct tnode *oldtnode)
 	}
 
 	/* setup the parent pointers into and out of this node */
-	replace(t, oldtnode, tn);
-
-	return 0;
+	return replace(t, oldtnode, tn);
+nomem:
+	/* all pointers should be clean so we are done */
+	tnode_free(tn);
+notnode:
+	return NULL;
 }
 
 static void collapse(struct trie *t, struct tnode *oldtnode)
@@ -783,10 +790,14 @@ static bool should_collapse(const struct tnode *tn)
 }
 
 #define MAX_WORK 10
-static void resize(struct trie *t, struct tnode *tn)
+static struct tnode __rcu **resize(struct trie *t, struct tnode *tn)
 {
+#ifdef CONFIG_IP_FIB_TRIE_STATS
+	struct trie_use_stats __percpu *stats = t->stats;
+#endif
 	struct tnode *tp = node_parent(tn);
-	struct tnode __rcu **cptr;
+	unsigned long cindex = tp ? get_index(tn->key, tp) : 0;
+	struct tnode __rcu **cptr = tp ? tp->child : &t->trie;
 	int max_work = MAX_WORK;
 
 	pr_debug("In tnode_resize %p inflate_threshold=%d threshold=%d\n",
@@ -796,52 +807,57 @@ static void resize(struct trie *t, struct tnode *tn)
 	 * doing it ourselves.  This way we can let RCU fully do its
 	 * thing without us interfering
 	 */
-	cptr = tp ? &tp->child[get_index(tn->key, tp)] : &t->trie;
-	BUG_ON(tn != rtnl_dereference(*cptr));
+	BUG_ON(tn != rtnl_dereference(cptr[cindex]));
 
 	/* Double as long as the resulting node has a number of
 	 * nonempty nodes that are above the threshold.
 	 */
 	while (should_inflate(tp, tn) && max_work) {
-		if (inflate(t, tn)) {
+		struct tnode __rcu **tcptr = inflate(t, tn);
+
+		if (!tcptr) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
-			this_cpu_inc(t->stats->resize_node_skipped);
+			this_cpu_inc(stats->resize_node_skipped);
 #endif
 			break;
 		}
 
 		max_work--;
-		tn = rtnl_dereference(*cptr);
+		cptr = tcptr;
+		tn = rtnl_dereference(cptr[cindex]);
 	}
 
 	/* Return if at least one inflate is run */
 	if (max_work != MAX_WORK)
-		return;
+		return cptr;
 
 	/* Halve as long as the number of empty children in this
 	 * node is above threshold.
 	 */
 	while (should_halve(tp, tn) && max_work) {
-		if (halve(t, tn)) {
+		struct tnode __rcu **tcptr = halve(t, tn);
+
+		if (!tcptr) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
-			this_cpu_inc(t->stats->resize_node_skipped);
+			this_cpu_inc(stats->resize_node_skipped);
 #endif
 			break;
 		}
 
 		max_work--;
-		tn = rtnl_dereference(*cptr);
+		cptr = tcptr;
+		tn = rtnl_dereference(cptr[cindex]);
 	}
 
 	/* Only one child remains */
 	if (should_collapse(tn)) {
 		collapse(t, tn);
-		return;
+		return cptr;
 	}
 
 	/* Return if at least one deflate was run */
 	if (max_work != MAX_WORK)
-		return;
+		return cptr;
 
 	/* push the suffix length to the parent node */
 	if (tn->slen > tn->pos) {
@@ -850,6 +866,8 @@ static void resize(struct trie *t, struct tnode *tn)
 		if (tp && (slen > tp->slen))
 			tp->slen = slen;
 	}
+
+	return cptr;
 }
 
 static void leaf_pull_suffix(struct tnode *tp, struct tnode *l)
@@ -931,26 +949,30 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
 	return NULL;
 }
 
-static void trie_rebalance(struct trie *t, struct tnode *tn)
+static struct fib_table *trie_rebalance(struct trie *t, struct tnode *tn)
 {
-	struct tnode *tp;
+	struct tnode __rcu **cptr = &t->trie;
 
 	while (tn) {
-		tp = node_parent(tn);
-		resize(t, tn);
-		tn = tp;
+		struct tnode *tp = node_parent(tn);
+
+		cptr = resize(t, tn);
+		if (!tp)
+			break;
+		tn = container_of(cptr, struct tnode, child[0]);
 	}
+
+	return trie_get_table(container_of(cptr, struct trie, trie));
 }
 
-/* only used from updater-side */
-static int fib_insert_node(struct trie *t, struct tnode *tp,
-			   struct fib_alias *new, t_key key)
+static struct fib_table *fib_insert_node(struct trie *t, struct tnode *tp,
+					 struct fib_alias *new, t_key key)
 {
 	struct tnode *n, *l;
 
 	l = leaf_new(key, new);
 	if (!l)
-		return -ENOMEM;
+		goto noleaf;
 
 	/* retrieve child from parent node */
 	if (tp)
@@ -968,10 +990,8 @@ static int fib_insert_node(struct trie *t, struct tnode *tp,
 		struct tnode *tn;
 
 		tn = tnode_new(key, __fls(key ^ n->key), 1);
-		if (!tn) {
-			node_free(l);
-			return -ENOMEM;
-		}
+		if (!tn)
+			goto notnode;
 
 		/* initialize routes out of node */
 		NODE_INIT_PARENT(tn, tp);
@@ -988,14 +1008,19 @@ static int fib_insert_node(struct trie *t, struct tnode *tp,
 	/* Case 3: n is NULL, and will just insert a new leaf */
 	NODE_INIT_PARENT(l, tp);
 	put_child_root(tp, t, key, l);
-	trie_rebalance(t, tp);
 
-	return 0;
+	return trie_rebalance(t, tp);
+notnode:
+	node_free(l);
+noleaf:
+	return NULL;
 }
 
-static int fib_insert_alias(struct trie *t, struct tnode *tp,
-			    struct tnode *l, struct fib_alias *new,
-			    struct fib_alias *fa, t_key key)
+static struct fib_table *fib_insert_alias(struct trie *t,
+					  struct tnode *tp, struct tnode *l,
+					  struct fib_alias *new,
+					  struct fib_alias *fa,
+					  t_key key)
 {
 	if (!l)
 		return fib_insert_node(t, tp, new, key);
@@ -1021,7 +1046,7 @@ static int fib_insert_alias(struct trie *t, struct tnode *tp,
 		leaf_push_suffix(tp, l);
 	}
 
-	return 0;
+	return trie_get_table(t);
 }
 
 /* Caller must hold RTNL. */
@@ -1154,8 +1179,9 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	new_fa->fa_slen = slen;
 
 	/* Insert new entry to the list. */
-	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
-	if (err)
+	err = -ENOMEM;
+	tb = fib_insert_alias(t, tp, l, new_fa, fa, key);
+	if (!tb)
 		goto out_free_new_fa;
 
 	if (!plen)
@@ -1536,18 +1562,20 @@ backtrace:
 		/* walk trie in reverse order */
 		do {
 			while (!(cindex--)) {
+				struct tnode __rcu **cptr;
 				t_key pkey = pn->key;
 
 				n = pn;
 				pn = node_parent(n);
 
 				/* resize completed node */
-				resize(t, n);
+				cptr = resize(t, n);
 
 				/* if we got the root we are done */
 				if (!pn)
 					goto flush_complete;
 
+				pn = container_of(cptr, struct tnode, child[0]);
 				cindex = get_index(pkey, pn);
 			}
 

* [RFC PATCH 11/29] fib_trie: Rename tnode to key_vector
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (9 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 10/29] fib_trie: Return pointer to tnode pointer in resize/inflate/halve Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 12/29] fib_trie: move leaf and tnode to occupy the same spot in the key vector Alexander Duyck
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC
  To: netdev

Rename the tnode to key_vector.  The key_vector will eventually be the
container for all of the information needed by either a leaf or a tnode.
The final result should be much smaller than the 40 bytes currently needed
for either one.

This also updates the trie struct so that it contains a one-element array
of tnode pointers.  This brings the structure more in line with how an
actual tnode itself is laid out.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
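As a rough illustration of why the one-element array helps, here is a
minimal userspace sketch (not the kernel code).  It assumes a cut-down
key_vector with only the fields needed here, a local container_of()
stand-in, and hypothetical helpers node_alloc(), slot_array(),
slots_to_node() and slots_to_trie().  The point is that a pointer to the
slot array can be turned back into its owner in the same way whether the
owner is a node or the trie root.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(): step back from a
 * pointer to a member to the structure that contains it.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified node: only the fields needed for this illustration. */
struct key_vector {
	unsigned int key;
	struct key_vector *parent;
	struct key_vector *tnode[];	/* child slots */
};

/* The root holder mirrors a node by exposing a one-element slot array. */
struct trie {
	struct key_vector *tnode[1];
};

/* Hypothetical helper: allocate a zeroed node with (1 << bits) slots. */
static struct key_vector *node_alloc(unsigned int bits)
{
	return calloc(1, sizeof(struct key_vector) +
			 (sizeof(struct key_vector *) << bits));
}

/* Slot array to use for a node whose parent is tp (NULL means root). */
static struct key_vector **slot_array(struct trie *t, struct key_vector *tp)
{
	return tp ? tp->tnode : t->tnode;
}

/* Recover the owning node from a pointer to its slot array. */
static struct key_vector *slots_to_node(struct key_vector **cptr)
{
	return container_of(cptr, struct key_vector, tnode[0]);
}

/* Recover the owning trie from a pointer to its slot array. */
static struct trie *slots_to_trie(struct key_vector **cptr)
{
	return container_of(cptr, struct trie, tnode[0]);
}
```

Because both structures name the member tnode, slot_array() needs no
address-of special case for the root, which matches the
"tp ? tp->tnode : t->tnode" form used in the diff below.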
 net/ipv4/fib_trie.c |  267 +++++++++++++++++++++++++++------------------------
 1 file changed, 141 insertions(+), 126 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index be1ffe8..cffbe47 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -93,19 +93,19 @@ typedef unsigned int t_key;
 
 #define get_index(_key, _kv) (((_key) ^ (_kv)->key) >> (_kv)->pos)
 
-struct tnode {
+struct key_vector {
 	t_key key;
 	unsigned char bits;		/* 2log(KEYLENGTH) bits needed */
 	unsigned char pos;		/* 2log(KEYLENGTH) bits needed */
 	unsigned char slen;
-	struct tnode __rcu *parent;
+	struct key_vector __rcu *parent;
 	struct rcu_head rcu;
 	union {
 		/* The fields in this struct are valid if bits > 0 (TNODE) */
 		struct {
 			t_key empty_children; /* KEYLENGTH bits needed */
 			t_key full_children;  /* KEYLENGTH bits needed */
-			struct tnode __rcu *child[0];
+			struct key_vector __rcu *tnode[0];
 		};
 		/* This list pointer if valid if bits == 0 (LEAF) */
 		struct hlist_head leaf;
@@ -134,14 +134,14 @@ struct trie_stat {
 };
 
 struct trie {
-	struct tnode __rcu *trie;
+	struct key_vector __rcu *tnode[1];
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	struct trie_use_stats __percpu *stats;
 #endif
 	struct rcu_head rcu;
 };
 
-static struct tnode **resize(struct trie *t, struct tnode *tn);
+static struct key_vector **resize(struct trie *t, struct key_vector *tn);
 static size_t tnode_free_size;
 
 /*
@@ -161,7 +161,7 @@ static struct kmem_cache *trie_leaf_kmem __read_mostly;
 #define node_parent_rcu(n) rcu_dereference_rtnl((n)->parent)
 
 /* wrapper for rcu_assign_pointer */
-static inline void node_set_parent(struct tnode *n, struct tnode *tp)
+static inline void node_set_parent(struct key_vector *n, struct key_vector *tp)
 {
 	if (n)
 		rcu_assign_pointer(n->parent, tp);
@@ -172,23 +172,23 @@ static inline void node_set_parent(struct tnode *n, struct tnode *tp)
 /* This provides us with the number of children in this node, in the case of a
  * leaf this will return 0 meaning none of the children are accessible.
  */
-static inline unsigned long tnode_child_length(const struct tnode *tn)
+static inline unsigned long tnode_child_length(const struct key_vector *tn)
 {
 	return (1ul << tn->bits) & ~(1ul);
 }
 
 /* caller must hold RTNL */
-static inline struct tnode *tnode_get_child(const struct tnode *tn,
-					    unsigned long i)
+static inline struct key_vector *tnode_get_child(struct key_vector *tn,
+						 unsigned long i)
 {
-	return rtnl_dereference(tn->child[i]);
+	return rtnl_dereference(tn->tnode[i]);
 }
 
 /* caller must hold RCU read lock or RTNL */
-static inline struct tnode *tnode_get_child_rcu(const struct tnode *tn,
-						unsigned long i)
+static inline struct key_vector *tnode_get_child_rcu(struct key_vector *tn,
+						     unsigned long i)
 {
-	return rcu_dereference_rtnl(tn->child[i]);
+	return rcu_dereference_rtnl(tn->tnode[i]);
 }
 
 static inline struct fib_table *trie_get_table(struct trie *t)
@@ -273,12 +273,13 @@ static inline void alias_free_mem_rcu(struct fib_alias *fa)
 	call_rcu(&fa->rcu, __alias_free_mem);
 }
 
+#define TNODE_SIZE sizeof(struct key_vector)
 #define TNODE_KMALLOC_MAX \
-	ilog2((PAGE_SIZE - sizeof(struct tnode)) / sizeof(struct tnode *))
+	ilog2((PAGE_SIZE - TNODE_SIZE) / sizeof(struct key_vector *))
 
 static void __node_free_rcu(struct rcu_head *head)
 {
-	struct tnode *n = container_of(head, struct tnode, rcu);
+	struct key_vector *n = container_of(head, struct key_vector, rcu);
 
 	if (IS_LEAF(n))
 		kmem_cache_free(trie_leaf_kmem, n);
@@ -290,7 +291,7 @@ static void __node_free_rcu(struct rcu_head *head)
 
 #define node_free(n) call_rcu(&n->rcu, __node_free_rcu)
 
-static struct tnode *tnode_alloc(size_t size)
+static struct key_vector *tnode_alloc(size_t size)
 {
 	if (size <= PAGE_SIZE)
 		return kzalloc(size, GFP_KERNEL);
@@ -298,19 +299,19 @@ static struct tnode *tnode_alloc(size_t size)
 		return vzalloc(size);
 }
 
-static inline void empty_child_inc(struct tnode *n)
+static inline void empty_child_inc(struct key_vector *n)
 {
 	++n->empty_children ? : ++n->full_children;
 }
 
-static inline void empty_child_dec(struct tnode *n)
+static inline void empty_child_dec(struct key_vector *n)
 {
 	n->empty_children-- ? : n->full_children--;
 }
 
-static struct tnode *leaf_new(t_key key, struct fib_alias *fa)
+static struct key_vector *leaf_new(t_key key, struct fib_alias *fa)
 {
-	struct tnode *l = kmem_cache_alloc(trie_leaf_kmem, GFP_KERNEL);
+	struct key_vector *l = kmem_cache_alloc(trie_leaf_kmem, GFP_KERNEL);
 	if (l) {
 		l->parent = NULL;
 		/* set key and pos to reflect full key value
@@ -330,10 +331,10 @@ static struct tnode *leaf_new(t_key key, struct fib_alias *fa)
 	return l;
 }
 
-static struct tnode *tnode_new(t_key key, int pos, int bits)
+static struct key_vector *tnode_new(t_key key, int pos, int bits)
 {
-	size_t sz = offsetof(struct tnode, child[1ul << bits]);
-	struct tnode *tn = tnode_alloc(sz);
+	size_t sz = offsetof(struct key_vector, tnode[1ul << bits]);
+	struct key_vector *tn = tnode_alloc(sz);
 	unsigned int shift = pos + bits;
 
 	/* verify bits and pos their msb bits clear and values are valid */
@@ -351,15 +352,15 @@ static struct tnode *tnode_new(t_key key, int pos, int bits)
 			tn->empty_children = 1ul << bits;
 	}
 
-	pr_debug("AT %p s=%zu %zu\n", tn, sizeof(struct tnode),
-		 sizeof(struct tnode *) << bits);
+	pr_debug("AT %p s=%zu %zu\n", tn, TNODE_SIZE,
+		 sizeof(struct key_vector *) << bits);
 	return tn;
 }
 
 /* Check whether a tnode 'n' is "full", i.e. it is an internal node
  * and no bits are skipped. See discussion in dyntree paper p. 6
  */
-static inline int tnode_full(const struct tnode *tn, const struct tnode *n)
+static inline int tnode_full(struct key_vector *tn, struct key_vector *n)
 {
 	return n && ((n->pos + n->bits) == tn->pos) && IS_TNODE(n);
 }
@@ -367,9 +368,10 @@ static inline int tnode_full(const struct tnode *tn, const struct tnode *n)
 /* Add a child at position i overwriting the old value.
  * Update the value of full_children and empty_children.
  */
-static void put_child(struct tnode *tn, unsigned long i, struct tnode *n)
+static void put_child(struct key_vector *tn, unsigned long i,
+		      struct key_vector *n)
 {
-	struct tnode *chi = tnode_get_child(tn, i);
+	struct key_vector *chi = tnode_get_child(tn, i);
 	int isfull, wasfull;
 
 	BUG_ON(i >= tnode_child_length(tn));
@@ -392,16 +394,16 @@ static void put_child(struct tnode *tn, unsigned long i, struct tnode *n)
 	if (n && (tn->slen < n->slen))
 		tn->slen = n->slen;
 
-	rcu_assign_pointer(tn->child[i], n);
+	rcu_assign_pointer(tn->tnode[i], n);
 }
 
-static void update_children(struct tnode *tn)
+static void update_children(struct key_vector *tn)
 {
 	unsigned long i;
 
 	/* update all of the child parent pointers */
 	for (i = tnode_child_length(tn); i;) {
-		struct tnode *inode = tnode_get_child(tn, --i);
+		struct key_vector *inode = tnode_get_child(tn, --i);
 
 		if (!inode)
 			continue;
@@ -417,36 +419,38 @@ static void update_children(struct tnode *tn)
 	}
 }
 
-static inline void put_child_root(struct tnode *tp, struct trie *t,
-				  t_key key, struct tnode *n)
+static inline void put_child_root(struct key_vector *tp, struct trie *t,
+				  t_key key, struct key_vector *n)
 {
 	if (tp)
 		put_child(tp, get_index(key, tp), n);
 	else
-		rcu_assign_pointer(t->trie, n);
+		rcu_assign_pointer(t->tnode[0], n);
 }
 
-static inline void tnode_free_init(struct tnode *tn)
+static inline void tnode_free_init(struct key_vector *tn)
 {
 	tn->rcu.next = NULL;
 }
 
-static inline void tnode_free_append(struct tnode *tn, struct tnode *n)
+static inline void tnode_free_append(struct key_vector *tn,
+				     struct key_vector *n)
 {
 	n->rcu.next = tn->rcu.next;
 	tn->rcu.next = &n->rcu;
 }
 
-static void tnode_free(struct tnode *tn)
+static void tnode_free(struct key_vector *tn)
 {
 	struct callback_head *head = &tn->rcu;
 
 	while (head) {
 		head = head->next;
-		tnode_free_size += offsetof(struct tnode, child[1 << tn->bits]);
+		tnode_free_size += offsetof(struct key_vector,
+					    tnode[1 << tn->bits]);
 		node_free(tn);
 
-		tn = container_of(head, struct tnode, rcu);
+		tn = container_of(head, struct key_vector, rcu);
 	}
 
 	if (tnode_free_size >= PAGE_SIZE * sync_pages) {
@@ -455,11 +459,12 @@ static void tnode_free(struct tnode *tn)
 	}
 }
 
-static struct tnode __rcu **replace(struct trie *t, struct tnode *oldtnode,
-				    struct tnode *tn)
+static struct key_vector __rcu **replace(struct trie *t,
+					 struct key_vector *oldtnode,
+					 struct key_vector *tn)
 {
-	struct tnode *tp = node_parent(oldtnode);
-	struct tnode **cptr;
+	struct key_vector *tp = node_parent(oldtnode);
+	struct key_vector **cptr;
 	unsigned long i;
 
 	/* setup the parent pointer out of and back into this node */
@@ -473,11 +478,11 @@ static struct tnode __rcu **replace(struct trie *t, struct tnode *oldtnode,
 	tnode_free(oldtnode);
 
 	/* record the pointer that is pointing to this node */
-	cptr = tp ? tp->child : &t->trie;
+	cptr = tp ? tp->tnode : t->tnode;
 
 	/* resize children now that oldtnode is freed */
 	for (i = tnode_child_length(tn); i;) {
-		struct tnode *inode = tnode_get_child(tn, --i);
+		struct key_vector *inode = tnode_get_child(tn, --i);
 
 		/* resize child node */
 		if (tnode_full(tn, inode))
@@ -487,9 +492,10 @@ static struct tnode __rcu **replace(struct trie *t, struct tnode *oldtnode,
 	return cptr;
 }
 
-static struct tnode __rcu **inflate(struct trie *t, struct tnode *oldtnode)
+static struct key_vector __rcu **inflate(struct trie *t,
+					 struct key_vector *oldtnode)
 {
-	struct tnode *tn;
+	struct key_vector *tn;
 	unsigned long i;
 	t_key m;
 
@@ -508,8 +514,8 @@ static struct tnode __rcu **inflate(struct trie *t, struct tnode *oldtnode)
 	 * nodes.
 	 */
 	for (i = tnode_child_length(oldtnode), m = 1u << tn->pos; i;) {
-		struct tnode *inode = tnode_get_child(oldtnode, --i);
-		struct tnode *node0, *node1;
+		struct key_vector *inode = tnode_get_child(oldtnode, --i);
+		struct key_vector *node0, *node1;
 		unsigned long j, k;
 
 		/* An empty child */
@@ -582,9 +588,10 @@ notnode:
 	return NULL;
 }
 
-static struct tnode __rcu **halve(struct trie *t, struct tnode *oldtnode)
+static struct key_vector __rcu **halve(struct trie *t,
+				       struct key_vector *oldtnode)
 {
-	struct tnode *tn;
+	struct key_vector *tn;
 	unsigned long i;
 
 	pr_debug("In halve\n");
@@ -602,9 +609,9 @@ static struct tnode __rcu **halve(struct trie *t, struct tnode *oldtnode)
 	 * nodes.
 	 */
 	for (i = tnode_child_length(oldtnode); i;) {
-		struct tnode *node1 = tnode_get_child(oldtnode, --i);
-		struct tnode *node0 = tnode_get_child(oldtnode, --i);
-		struct tnode *inode;
+		struct key_vector *node1 = tnode_get_child(oldtnode, --i);
+		struct key_vector *node0 = tnode_get_child(oldtnode, --i);
+		struct key_vector *inode;
 
 		/* At least one of the children is empty */
 		if (!node1 || !node0) {
@@ -636,9 +643,9 @@ notnode:
 	return NULL;
 }
 
-static void collapse(struct trie *t, struct tnode *oldtnode)
+static void collapse(struct trie *t, struct key_vector *oldtnode)
 {
-	struct tnode *n, *tp;
+	struct key_vector *n, *tp;
 	unsigned long i;
 
 	/* scan the tnode looking for that one child that might still exist */
@@ -654,7 +661,7 @@ static void collapse(struct trie *t, struct tnode *oldtnode)
 	node_free(oldtnode);
 }
 
-static unsigned char update_suffix(struct tnode *tn)
+static unsigned char update_suffix(struct key_vector *tn)
 {
 	unsigned char slen = tn->pos;
 	unsigned long stride, i;
@@ -665,7 +672,7 @@ static unsigned char update_suffix(struct tnode *tn)
 	 * represent the nodes with suffix length equal to tn->pos
 	 */
 	for (i = 0, stride = 0x2ul ; i < tnode_child_length(tn); i += stride) {
-		struct tnode *n = tnode_get_child(tn, i);
+		struct key_vector *n = tnode_get_child(tn, i);
 
 		if (!n || (n->slen <= slen))
 			continue;
@@ -746,7 +753,7 @@ static unsigned char update_suffix(struct tnode *tn)
  *    tnode_child_length(tn)
  *
  */
-static bool should_inflate(const struct tnode *tp, const struct tnode *tn)
+static inline bool should_inflate(struct key_vector *tp, struct key_vector *tn)
 {
 	unsigned long used = tnode_child_length(tn);
 	unsigned long threshold = used;
@@ -761,7 +768,7 @@ static bool should_inflate(const struct tnode *tp, const struct tnode *tn)
 	return (used > 1) && tn->pos && ((50 * used) >= threshold);
 }
 
-static bool should_halve(const struct tnode *tp, const struct tnode *tn)
+static inline bool should_halve(struct key_vector *tp, struct key_vector *tn)
 {
 	unsigned long used = tnode_child_length(tn);
 	unsigned long threshold = used;
@@ -775,7 +782,7 @@ static bool should_halve(const struct tnode *tp, const struct tnode *tn)
 	return (used > 1) && (tn->bits > 1) && ((100 * used) < threshold);
 }
 
-static bool should_collapse(const struct tnode *tn)
+static inline bool should_collapse(struct key_vector *tn)
 {
 	unsigned long used = tnode_child_length(tn);
 
@@ -790,14 +797,15 @@ static bool should_collapse(const struct tnode *tn)
 }
 
 #define MAX_WORK 10
-static struct tnode __rcu **resize(struct trie *t, struct tnode *tn)
+static struct key_vector __rcu **resize(struct trie *t,
+					struct key_vector *tn)
 {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	struct trie_use_stats __percpu *stats = t->stats;
 #endif
-	struct tnode *tp = node_parent(tn);
+	struct key_vector *tp = node_parent(tn);
 	unsigned long cindex = tp ? get_index(tn->key, tp) : 0;
-	struct tnode __rcu **cptr = tp ? tp->child : &t->trie;
+	struct key_vector __rcu **cptr = tp ? tp->tnode : t->tnode;
 	int max_work = MAX_WORK;
 
 	pr_debug("In tnode_resize %p inflate_threshold=%d threshold=%d\n",
@@ -813,7 +821,7 @@ static struct tnode __rcu **resize(struct trie *t, struct tnode *tn)
 	 * nonempty nodes that are above the threshold.
 	 */
 	while (should_inflate(tp, tn) && max_work) {
-		struct tnode __rcu **tcptr = inflate(t, tn);
+		struct key_vector __rcu **tcptr = inflate(t, tn);
 
 		if (!tcptr) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
@@ -835,7 +843,7 @@ static struct tnode __rcu **resize(struct trie *t, struct tnode *tn)
 	 * node is above threshold.
 	 */
 	while (should_halve(tp, tn) && max_work) {
-		struct tnode __rcu **tcptr = halve(t, tn);
+		struct key_vector __rcu **tcptr = halve(t, tn);
 
 		if (!tcptr) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
@@ -870,7 +878,7 @@ static struct tnode __rcu **resize(struct trie *t, struct tnode *tn)
 	return cptr;
 }
 
-static void leaf_pull_suffix(struct tnode *tp, struct tnode *l)
+static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
 {
 	while (tp && (tp->slen > tp->pos) && (tp->slen > l->slen)) {
 		if (update_suffix(tp) > l->slen)
@@ -879,7 +887,7 @@ static void leaf_pull_suffix(struct tnode *tp, struct tnode *l)
 	}
 }
 
-static void leaf_push_suffix(struct tnode *tn, struct tnode *l)
+static void leaf_push_suffix(struct key_vector *tn, struct key_vector *l)
 {
 	/* if this is a new leaf then tn will be NULL and we can sort
 	 * out parent suffix lengths as a part of trie_rebalance
@@ -891,9 +899,10 @@ static void leaf_push_suffix(struct tnode *tn, struct tnode *l)
 }
 
 /* rcu_read_lock needs to be hold by caller from readside */
-static struct tnode *fib_find_node(struct trie *t, struct tnode **tp, u32 key)
+static struct key_vector *fib_find_node(struct trie *t,
+					struct key_vector **tp, u32 key)
 {
-	struct tnode *n = rcu_dereference_rtnl(t->trie);
+	struct key_vector *n = rcu_dereference_rtnl(t->tnode[0]);
 
 	*tp = NULL;
 
@@ -949,26 +958,29 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
 	return NULL;
 }
 
-static struct fib_table *trie_rebalance(struct trie *t, struct tnode *tn)
+static struct fib_table *trie_rebalance(struct trie *t,
+					struct key_vector *tn)
 {
-	struct tnode __rcu **cptr = &t->trie;
+	struct key_vector __rcu **cptr = t->tnode;
 
 	while (tn) {
-		struct tnode *tp = node_parent(tn);
+		struct key_vector *tp = node_parent(tn);
 
 		cptr = resize(t, tn);
 		if (!tp)
 			break;
-		tn = container_of(cptr, struct tnode, child[0]);
+		tn = container_of(cptr, struct key_vector, tnode[0]);
 	}
 
-	return trie_get_table(container_of(cptr, struct trie, trie));
+	return trie_get_table(container_of(cptr, struct trie, tnode[0]));
 }
 
-static struct fib_table *fib_insert_node(struct trie *t, struct tnode *tp,
-					 struct fib_alias *new, t_key key)
+static struct fib_table *fib_insert_node(struct trie *t,
+					 struct key_vector *tp,
+					 struct fib_alias *new,
+					 t_key key)
 {
-	struct tnode *n, *l;
+	struct key_vector *n, *l;
 
 	l = leaf_new(key, new);
 	if (!l)
@@ -978,7 +990,7 @@ static struct fib_table *fib_insert_node(struct trie *t, struct tnode *tp,
 	if (tp)
 		n = tnode_get_child(tp, get_index(key, tp));
 	else
-		n = rcu_dereference_rtnl(t->trie);
+		n = rcu_dereference_rtnl(t->tnode[0]);
 
 	/* Case 2: n is a LEAF or a TNODE and the key doesn't match.
 	 *
@@ -987,7 +999,7 @@ static struct fib_table *fib_insert_node(struct trie *t, struct tnode *tp,
 	 *  leaves us in position for handling as case 3
 	 */
 	if (n) {
-		struct tnode *tn;
+		struct key_vector *tn;
 
 		tn = tnode_new(key, __fls(key ^ n->key), 1);
 		if (!tn)
@@ -1017,7 +1029,8 @@ noleaf:
 }
 
 static struct fib_table *fib_insert_alias(struct trie *t,
-					  struct tnode *tp, struct tnode *l,
+					  struct key_vector *tp,
+					  struct key_vector *l,
 					  struct fib_alias *new,
 					  struct fib_alias *fa,
 					  t_key key)
@@ -1054,7 +1067,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct fib_alias *fa, *new_fa;
-	struct tnode *l, *tp;
+	struct key_vector *l, *tp;
 	struct fib_info *fi;
 	u8 plen = cfg->fc_dst_len;
 	u8 slen = KEYLENGTH - plen;
@@ -1201,7 +1214,7 @@ err:
 	return err;
 }
 
-static inline t_key prefix_mismatch(t_key key, struct tnode *n)
+static inline t_key prefix_mismatch(t_key key, struct key_vector *n)
 {
 	t_key prefix = n->key;
 
@@ -1217,11 +1230,11 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 	struct trie_use_stats __percpu *stats = t->stats;
 #endif
 	const t_key key = ntohl(flp->daddr);
-	struct tnode *n, *pn;
+	struct key_vector *n, *pn;
 	struct fib_alias *fa;
 	t_key cindex;
 
-	n = rcu_dereference(t->trie);
+	n = rcu_dereference(t->tnode[0]);
 	if (!n)
 		return -EAGAIN;
 
@@ -1269,7 +1282,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 	/* Step 2: Sort out leaves and begin backtracing for longest prefix */
 	for (;;) {
 		/* record the pointer where our next node pointer is stored */
-		struct tnode __rcu **cptr = n->child;
+		struct key_vector __rcu **cptr = n->tnode;
 
 		/* This test verifies that none of the bits that differ
 		 * between the key and the prefix exist in the region of
@@ -1315,7 +1328,7 @@ backtrace:
 			cindex &= cindex - 1;
 
 			/* grab pointer for next child node */
-			cptr = &pn->child[cindex];
+			cptr = &pn->tnode[cindex];
 		}
 	}
 
@@ -1375,8 +1388,8 @@ found:
 }
 EXPORT_SYMBOL_GPL(fib_table_lookup);
 
-static void fib_remove_alias(struct trie *t, struct tnode *tp,
-			     struct tnode *l, struct fib_alias *old)
+static void fib_remove_alias(struct trie *t, struct key_vector *tp,
+			     struct key_vector *l, struct fib_alias *old)
 {
 	/* record the location of the previous list_info entry */
 	struct hlist_node **pprev = old->fa_list.pprev;
@@ -1409,7 +1422,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
 	struct fib_alias *fa, *fa_to_delete;
-	struct tnode *l, *tp;
+	struct key_vector *l, *tp;
 	u8 plen = cfg->fc_dst_len;
 	u8 slen = KEYLENGTH - plen;
 	u8 tos = cfg->fc_tos;
@@ -1473,9 +1486,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 }
 
 /* Scan for the next leaf starting at the provided key value */
-static struct tnode *leaf_walk_rcu(struct tnode **pn, t_key key)
+static struct key_vector *leaf_walk_rcu(struct key_vector **pn, t_key key)
 {
-	struct tnode *tn = NULL, *n = *pn;
+	struct key_vector *tn = NULL, *n = *pn;
 	unsigned long cindex;
 
 	/* record parent node for backtracing */
@@ -1540,14 +1553,14 @@ found:
 int fib_table_flush(struct fib_table *tb)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
+	struct key_vector *n, *pn;
 	struct hlist_node *tmp;
 	struct fib_alias *fa;
-	struct tnode *n, *pn;
 	unsigned long cindex;
 	unsigned char slen;
 	int found = 0;
 
-	n = rcu_dereference(t->trie);
+	n = rcu_dereference(t->tnode[0]);
 	if (!n)
 		goto flush_complete;
 
@@ -1562,7 +1575,7 @@ backtrace:
 		/* walk trie in reverse order */
 		do {
 			while (!(cindex--)) {
-				struct tnode __rcu **cptr;
+				struct key_vector __rcu **cptr;
 				t_key pkey = pn->key;
 
 				n = pn;
@@ -1575,7 +1588,8 @@ backtrace:
 				if (!pn)
 					goto flush_complete;
 
-				pn = container_of(cptr, struct tnode, child[0]);
+				pn = container_of(cptr, struct key_vector,
+						  tnode[0]);
 				cindex = get_index(pkey, pn);
 			}
 
@@ -1638,7 +1652,7 @@ void fib_free_table(struct fib_table *tb)
 	call_rcu(&t->rcu, __trie_free_rcu);
 }
 
-static int fn_trie_dump_leaf(struct tnode *l, struct fib_table *tb,
+static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb,
 			     struct sk_buff *skb, struct netlink_callback *cb)
 {
 	__be32 xkey = htonl(l->key);
@@ -1679,14 +1693,14 @@ int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
 		   struct netlink_callback *cb)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
-	struct tnode *l, *tp;
+	struct key_vector *l, *tp;
 	/* Dump starting at last key.
 	 * Note: 0.0.0.0/0 (ie default) is first key.
 	 */
 	int count = cb->args[2];
 	t_key key = cb->args[3];
 
-	tp = rcu_dereference_rtnl(t->trie);
+	tp = rcu_dereference_rtnl(t->tnode[0]);
 
 	while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
 		if (fn_trie_dump_leaf(l, tb, skb, cb) < 0) {
@@ -1719,7 +1733,7 @@ void __init fib_trie_init(void)
 					  0, SLAB_PANIC, NULL);
 
 	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
-					   sizeof(struct tnode),
+					   TNODE_SIZE,
 					   0, SLAB_PANIC, NULL);
 }
 
@@ -1739,7 +1753,7 @@ struct fib_table *fib_trie_table(u32 id)
 	tb->tb_num_default = 0;
 
 	t = (struct trie *) tb->tb_data;
-	RCU_INIT_POINTER(t->trie, NULL);
+	RCU_INIT_POINTER(t->tnode[0], NULL);
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	t->stats = alloc_percpu(struct trie_use_stats);
 	if (!t->stats) {
@@ -1756,16 +1770,16 @@ struct fib_table *fib_trie_table(u32 id)
 struct fib_trie_iter {
 	struct seq_net_private p;
 	struct fib_table *tb;
-	struct tnode *tnode;
+	struct key_vector *tnode;
 	unsigned int index;
 	unsigned int depth;
 };
 
-static struct tnode *fib_trie_get_next(struct fib_trie_iter *iter)
+static struct key_vector *fib_trie_get_next(struct fib_trie_iter *iter)
 {
 	unsigned long cindex = iter->index;
-	struct tnode *tn = iter->tnode;
-	struct tnode *p;
+	struct key_vector *tn = iter->tnode;
+	struct key_vector *p;
 
 	/* A single entry routing table */
 	if (!tn)
@@ -1775,7 +1789,7 @@ static struct tnode *fib_trie_get_next(struct fib_trie_iter *iter)
 		 iter->tnode, iter->index, iter->depth);
 rescan:
 	while (cindex < tnode_child_length(tn)) {
-		struct tnode *n = tnode_get_child_rcu(tn, cindex);
+		struct key_vector *n = tnode_get_child_rcu(tn, cindex);
 
 		if (n) {
 			if (IS_LEAF(n)) {
@@ -1806,15 +1820,15 @@ rescan:
 	return NULL;
 }
 
-static struct tnode *fib_trie_get_first(struct fib_trie_iter *iter,
-				       struct trie *t)
+static struct key_vector *fib_trie_get_first(struct fib_trie_iter *iter,
+					     struct trie *t)
 {
-	struct tnode *n;
+	struct key_vector *n;
 
 	if (!t)
 		return NULL;
 
-	n = rcu_dereference(t->trie);
+	n = rcu_dereference(t->tnode[0]);
 	if (!n)
 		return NULL;
 
@@ -1833,7 +1847,7 @@ static struct tnode *fib_trie_get_first(struct fib_trie_iter *iter,
 
 static void trie_collect_stats(struct trie *t, struct trie_stat *s)
 {
-	struct tnode *n;
+	struct key_vector *n;
 	struct fib_trie_iter iter;
 
 	memset(s, 0, sizeof(*s));
@@ -1877,13 +1891,13 @@ static void trie_show_stats(struct seq_file *seq, struct trie_stat *stat)
 	seq_printf(seq, "\tMax depth:      %u\n", stat->maxdepth);
 
 	seq_printf(seq, "\tLeaves:         %u\n", stat->leaves);
-	bytes = sizeof(struct tnode) * stat->leaves;
+	bytes = TNODE_SIZE * stat->leaves;
 
 	seq_printf(seq, "\tPrefixes:       %u\n", stat->prefixes);
 	bytes += sizeof(struct fib_alias) * stat->prefixes;
 
 	seq_printf(seq, "\tInternal nodes: %u\n\t", stat->tnodes);
-	bytes += sizeof(struct tnode) * stat->tnodes;
+	bytes += TNODE_SIZE * stat->tnodes;
 
 	max = MAX_STAT_DEPTH;
 	while (max > 0 && stat->nodesizes[max-1] == 0)
@@ -1898,7 +1912,7 @@ static void trie_show_stats(struct seq_file *seq, struct trie_stat *stat)
 	seq_putc(seq, '\n');
 	seq_printf(seq, "\tPointers: %u\n", pointers);
 
-	bytes += sizeof(struct tnode *) * pointers;
+	bytes += sizeof(struct key_vector *) * pointers;
 	seq_printf(seq, "Null ptrs: %u\n", stat->nullpointers);
 	seq_printf(seq, "Total size: %u  kB\n", (bytes + 1023) / 1024);
 }
@@ -1952,7 +1966,7 @@ static int fib_triestat_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq,
 		   "Basic info: size of leaf:"
 		   " %Zd bytes, size of tnode: %Zd bytes.\n",
-		   sizeof(struct tnode), sizeof(struct tnode));
+		   TNODE_SIZE, TNODE_SIZE);
 
 	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
 		struct hlist_head *head = &net->ipv4.fib_table_hash[h];
@@ -1991,7 +2005,7 @@ static const struct file_operations fib_triestat_fops = {
 	.release = single_release_net,
 };
 
-static struct tnode *fib_trie_get_idx(struct seq_file *seq, loff_t pos)
+static struct key_vector *fib_trie_get_idx(struct seq_file *seq, loff_t pos)
 {
 	struct fib_trie_iter *iter = seq->private;
 	struct net *net = seq_file_net(seq);
@@ -2003,7 +2017,7 @@ static struct tnode *fib_trie_get_idx(struct seq_file *seq, loff_t pos)
 		struct fib_table *tb;
 
 		hlist_for_each_entry_rcu(tb, head, tb_hlist) {
-			struct tnode *n;
+			struct key_vector *n;
 
 			for (n = fib_trie_get_first(iter,
 						    (struct trie *) tb->tb_data);
@@ -2032,7 +2046,7 @@ static void *fib_trie_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	struct fib_table *tb = iter->tb;
 	struct hlist_node *tb_node;
 	unsigned int h;
-	struct tnode *n;
+	struct key_vector *n;
 
 	++*pos;
 	/* next node in same table */
@@ -2118,7 +2132,7 @@ static inline const char *rtn_type(char *buf, size_t len, unsigned int t)
 static int fib_trie_seq_show(struct seq_file *seq, void *v)
 {
 	const struct fib_trie_iter *iter = seq->private;
-	struct tnode *n = v;
+	struct key_vector *n = v;
 
 	if (!node_parent_rcu(n))
 		fib_table_print(seq, iter->tb);
@@ -2180,15 +2194,16 @@ static const struct file_operations fib_trie_fops = {
 struct fib_route_iter {
 	struct seq_net_private p;
 	struct fib_table *main_tb;
-	struct tnode *tnode;
+	struct key_vector *tnode;
 	loff_t	pos;
 	t_key	key;
 };
 
-static struct tnode *fib_route_get_idx(struct fib_route_iter *iter, loff_t pos)
+static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
+					    loff_t pos)
 {
 	struct fib_table *tb = iter->main_tb;
-	struct tnode *l, **tp = &iter->tnode;
+	struct key_vector *l, **tp = &iter->tnode;
 	struct trie *t;
 	t_key key;
 
@@ -2198,7 +2213,7 @@ static struct tnode *fib_route_get_idx(struct fib_route_iter *iter, loff_t pos)
 		key = iter->key;
 	} else {
 		t = (struct trie *)tb->tb_data;
-		iter->tnode = rcu_dereference_rtnl(t->trie);
+		iter->tnode = rcu_dereference_rtnl(t->tnode[0]);
 		iter->pos = 0;
 		key = 0;
 	}
@@ -2244,7 +2259,7 @@ static void *fib_route_seq_start(struct seq_file *seq, loff_t *pos)
 		return fib_route_get_idx(iter, *pos);
 
 	t = (struct trie *)tb->tb_data;
-	iter->tnode = rcu_dereference_rtnl(t->trie);
+	iter->tnode = rcu_dereference_rtnl(t->tnode[0]);
 	iter->pos = 0;
 	iter->key = 0;
 
@@ -2254,7 +2269,7 @@ static void *fib_route_seq_start(struct seq_file *seq, loff_t *pos)
 static void *fib_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct fib_route_iter *iter = seq->private;
-	struct tnode *l = NULL;
+	struct key_vector *l = NULL;
 	t_key key = iter->key;
 
 	++*pos;
@@ -2302,7 +2317,7 @@ static unsigned int fib_flag_trans(int type, __be32 mask, const struct fib_info
 static int fib_route_seq_show(struct seq_file *seq, void *v)
 {
 	struct fib_alias *fa;
-	struct tnode *l = v;
+	struct key_vector *l = v;
 	__be32 prefix;
 
 	if (v == SEQ_START_TOKEN) {

* [RFC PATCH 12/29] fib_trie: move leaf and tnode to occupy the same spot in the key vector
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (10 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 11/29] fib_trie: Rename tnode to key_vector Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 13/29] fib_trie: replace tnode_get_child functions with get_child macros Alexander Duyck
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC
  To: netdev

If we are going to compact the leaf and tnode, we first need to make sure
the fields are all in the same place.  To that end, I am moving the leaf
pointer, which represents the fib_alias hash list, to occupy what is
currently the first key_vector pointer.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
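For reference, a small userspace sketch of the resulting layout (not the
kernel code).  The rcu_head and hlist_head definitions are stand-ins with
assumed shapes, the __rcu annotations are dropped, and leaf_bytes() and
tnode_bytes() are hypothetical helpers.  It relies on GNU C extensions
(zero-length array inside an anonymous union), as the kernel does.

```c
#include <stddef.h>

/* Userspace stand-ins for kernel types (assumed shapes, for sizing only). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *);
};
struct hlist_node;
struct hlist_head {
	struct hlist_node *first;
};
typedef unsigned int t_key;

struct key_vector {
	struct rcu_head rcu;
	t_key empty_children;
	t_key full_children;
	struct key_vector *parent;
	t_key key;
	unsigned char bits;
	unsigned char pos;
	unsigned char slen;
	union {
		/* valid if bits == 0 (LEAF) */
		struct hlist_head leaf;
		/* valid if bits > 0 (TNODE) */
		struct key_vector *tnode[0];
	};
};

/* Header size shared by leaves and tnodes: everything before the union. */
#define TNODE_SIZE offsetof(struct key_vector, tnode[0])

/* A leaf is the header plus the hlist head stored in the union. */
static size_t leaf_bytes(void)
{
	return sizeof(struct key_vector);
}

/* A tnode is the header plus (1 << bits) child pointers. */
static size_t tnode_bytes(unsigned int bits)
{
	return TNODE_SIZE + (sizeof(struct key_vector *) << bits);
}
```

With this layout the leaf list head and the first child pointer start at
the same offset, which is why TNODE_SIZE can no longer be
sizeof(struct key_vector) and the leaf cache is sized with sizeof()
instead.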
 net/ipv4/fib_trie.c |   22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index cffbe47..caa2e28 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -94,21 +94,19 @@ typedef unsigned int t_key;
 #define get_index(_key, _kv) (((_key) ^ (_kv)->key) >> (_kv)->pos)
 
 struct key_vector {
+	struct rcu_head rcu;
+	t_key empty_children;		/* KEYLENGTH bits needed */
+	t_key full_children;		/* KEYLENGTH bits needed */
+	struct key_vector __rcu *parent;
 	t_key key;
 	unsigned char bits;		/* 2log(KEYLENGTH) bits needed */
 	unsigned char pos;		/* 2log(KEYLENGTH) bits needed */
 	unsigned char slen;
-	struct key_vector __rcu *parent;
-	struct rcu_head rcu;
 	union {
-		/* The fields in this struct are valid if bits > 0 (TNODE) */
-		struct {
-			t_key empty_children; /* KEYLENGTH bits needed */
-			t_key full_children;  /* KEYLENGTH bits needed */
-			struct key_vector __rcu *tnode[0];
-		};
 		/* This list pointer if valid if bits == 0 (LEAF) */
 		struct hlist_head leaf;
+		/* The fields in this struct are valid if bits > 0 (TNODE) */
+		struct key_vector __rcu *tnode[0];
 	};
 };
 
@@ -273,7 +271,7 @@ static inline void alias_free_mem_rcu(struct fib_alias *fa)
 	call_rcu(&fa->rcu, __alias_free_mem);
 }
 
-#define TNODE_SIZE sizeof(struct key_vector)
+#define TNODE_SIZE offsetof(struct key_vector, tnode[0])
 #define TNODE_KMALLOC_MAX \
 	ilog2((PAGE_SIZE - TNODE_SIZE) / sizeof(struct key_vector *))
 
@@ -1733,7 +1731,7 @@ void __init fib_trie_init(void)
 					  0, SLAB_PANIC, NULL);
 
 	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
-					   TNODE_SIZE,
+					   sizeof(struct key_vector),
 					   0, SLAB_PANIC, NULL);
 }
 
@@ -1891,7 +1889,7 @@ static void trie_show_stats(struct seq_file *seq, struct trie_stat *stat)
 	seq_printf(seq, "\tMax depth:      %u\n", stat->maxdepth);
 
 	seq_printf(seq, "\tLeaves:         %u\n", stat->leaves);
-	bytes = TNODE_SIZE * stat->leaves;
+	bytes = sizeof(struct key_vector) * stat->leaves;
 
 	seq_printf(seq, "\tPrefixes:       %u\n", stat->prefixes);
 	bytes += sizeof(struct fib_alias) * stat->prefixes;
@@ -1966,7 +1964,7 @@ static int fib_triestat_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq,
 		   "Basic info: size of leaf:"
 		   " %Zd bytes, size of tnode: %Zd bytes.\n",
-		   TNODE_SIZE, TNODE_SIZE);
+		   sizeof(struct key_vector), TNODE_SIZE);
 
 	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
 		struct hlist_head *head = &net->ipv4.fib_table_hash[h];

* [RFC PATCH 13/29] fib_trie: replace tnode_get_child functions with get_child macros
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (11 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 12/29] fib_trie: move leaf and tnode to occupy the same spot in the key vector Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 14/29] fib_trie: Rename tnode_child_length to child_length Alexander Duyck
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC (permalink / raw
  To: netdev

I am replacing the tnode_get_child call with get_child since we are
technically pulling the child out of a key_vector now and not a tnode.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   58 ++++++++++++++++++++-------------------------------
 1 file changed, 23 insertions(+), 35 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index caa2e28..d9b192b 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -154,9 +154,11 @@ static struct kmem_cache *trie_leaf_kmem __read_mostly;
 
 /* caller must hold RTNL */
 #define node_parent(n) rtnl_dereference((n)->parent)
+#define get_child(tn, i) rtnl_dereference((tn)->tnode[i])
 
 /* caller must hold RCU read lock or RTNL */
 #define node_parent_rcu(n) rcu_dereference_rtnl((n)->parent)
+#define get_child_rcu(tn, i) rcu_dereference_rtnl((tn)->tnode[i])
 
 /* wrapper for rcu_assign_pointer */
 static inline void node_set_parent(struct key_vector *n, struct key_vector *tp)
@@ -175,20 +177,6 @@ static inline unsigned long tnode_child_length(const struct key_vector *tn)
 	return (1ul << tn->bits) & ~(1ul);
 }
 
-/* caller must hold RTNL */
-static inline struct key_vector *tnode_get_child(struct key_vector *tn,
-						 unsigned long i)
-{
-	return rtnl_dereference(tn->tnode[i]);
-}
-
-/* caller must hold RCU read lock or RTNL */
-static inline struct key_vector *tnode_get_child_rcu(struct key_vector *tn,
-						     unsigned long i)
-{
-	return rcu_dereference_rtnl(tn->tnode[i]);
-}
-
 static inline struct fib_table *trie_get_table(struct trie *t)
 {
 	unsigned long *tb_data = (unsigned long *)t;
@@ -369,7 +357,7 @@ static inline int tnode_full(struct key_vector *tn, struct key_vector *n)
 static void put_child(struct key_vector *tn, unsigned long i,
 		      struct key_vector *n)
 {
-	struct key_vector *chi = tnode_get_child(tn, i);
+	struct key_vector *chi = get_child(tn, i);
 	int isfull, wasfull;
 
 	BUG_ON(i >= tnode_child_length(tn));
@@ -401,7 +389,7 @@ static void update_children(struct key_vector *tn)
 
 	/* update all of the child parent pointers */
 	for (i = tnode_child_length(tn); i;) {
-		struct key_vector *inode = tnode_get_child(tn, --i);
+		struct key_vector *inode = get_child(tn, --i);
 
 		if (!inode)
 			continue;
@@ -480,7 +468,7 @@ static struct key_vector __rcu **replace(struct trie *t,
 
 	/* resize children now that oldtnode is freed */
 	for (i = tnode_child_length(tn); i;) {
-		struct key_vector *inode = tnode_get_child(tn, --i);
+		struct key_vector *inode = get_child(tn, --i);
 
 		/* resize child node */
 		if (tnode_full(tn, inode))
@@ -512,7 +500,7 @@ static struct key_vector __rcu **inflate(struct trie *t,
 	 * nodes.
 	 */
 	for (i = tnode_child_length(oldtnode), m = 1u << tn->pos; i;) {
-		struct key_vector *inode = tnode_get_child(oldtnode, --i);
+		struct key_vector *inode = get_child(oldtnode, --i);
 		struct key_vector *node0, *node1;
 		unsigned long j, k;
 
@@ -531,8 +519,8 @@ static struct key_vector __rcu **inflate(struct trie *t,
 
 		/* An internal node with two children */
 		if (inode->bits == 1) {
-			put_child(tn, 2 * i + 1, tnode_get_child(inode, 1));
-			put_child(tn, 2 * i, tnode_get_child(inode, 0));
+			put_child(tn, 2 * i + 1, get_child(inode, 1));
+			put_child(tn, 2 * i, get_child(inode, 0));
 			continue;
 		}
 
@@ -562,10 +550,10 @@ static struct key_vector __rcu **inflate(struct trie *t,
 
 		/* populate child pointers in new nodes */
 		for (k = tnode_child_length(inode), j = k / 2; j;) {
-			put_child(node1, --j, tnode_get_child(inode, --k));
-			put_child(node0, j, tnode_get_child(inode, j));
-			put_child(node1, --j, tnode_get_child(inode, --k));
-			put_child(node0, j, tnode_get_child(inode, j));
+			put_child(node1, --j, get_child(inode, --k));
+			put_child(node0, j, get_child(inode, j));
+			put_child(node1, --j, get_child(inode, --k));
+			put_child(node0, j, get_child(inode, j));
 		}
 
 		/* link new nodes to parent */
@@ -607,8 +595,8 @@ static struct key_vector __rcu **halve(struct trie *t,
 	 * nodes.
 	 */
 	for (i = tnode_child_length(oldtnode); i;) {
-		struct key_vector *node1 = tnode_get_child(oldtnode, --i);
-		struct key_vector *node0 = tnode_get_child(oldtnode, --i);
+		struct key_vector *node1 = get_child(oldtnode, --i);
+		struct key_vector *node0 = get_child(oldtnode, --i);
 		struct key_vector *inode;
 
 		/* At least one of the children is empty */
@@ -648,7 +636,7 @@ static void collapse(struct trie *t, struct key_vector *oldtnode)
 
 	/* scan the tnode looking for that one child that might still exist */
 	for (n = NULL, i = tnode_child_length(oldtnode); !n && i;)
-		n = tnode_get_child(oldtnode, --i);
+		n = get_child(oldtnode, --i);
 
 	/* compress one level */
 	tp = node_parent(oldtnode);
@@ -670,7 +658,7 @@ static unsigned char update_suffix(struct key_vector *tn)
 	 * represent the nodes with suffix length equal to tn->pos
 	 */
 	for (i = 0, stride = 0x2ul ; i < tnode_child_length(tn); i += stride) {
-		struct key_vector *n = tnode_get_child(tn, i);
+		struct key_vector *n = get_child(tn, i);
 
 		if (!n || (n->slen <= slen))
 			continue;
@@ -925,7 +913,7 @@ static struct key_vector *fib_find_node(struct trie *t,
 			break;
 
 		*tp = n;
-		n = tnode_get_child_rcu(n, index);
+		n = get_child_rcu(n, index);
 	}
 
 	return n;
@@ -986,7 +974,7 @@ static struct fib_table *fib_insert_node(struct trie *t,
 
 	/* retrieve child from parent node */
 	if (tp)
-		n = tnode_get_child(tp, get_index(key, tp));
+		n = get_child(tp, get_index(key, tp));
 	else
 		n = rcu_dereference_rtnl(t->tnode[0]);
 
@@ -1272,7 +1260,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 			cindex = index;
 		}
 
-		n = tnode_get_child_rcu(n, index);
+		n = get_child_rcu(n, index);
 		if (unlikely(!n))
 			goto backtrace;
 	}
@@ -1508,7 +1496,7 @@ static struct key_vector *leaf_walk_rcu(struct key_vector **pn, t_key key)
 		cindex = idx;
 
 		/* descend into the next child */
-		n = tnode_get_child_rcu(tn, cindex++);
+		n = get_child_rcu(tn, cindex++);
 	}
 
 	/* this loop will search for the next leaf with a greater key */
@@ -1526,7 +1514,7 @@ static struct key_vector *leaf_walk_rcu(struct key_vector **pn, t_key key)
 		}
 
 		/* grab the next available node */
-		n = tnode_get_child_rcu(tn, cindex++);
+		n = get_child_rcu(tn, cindex++);
 		if (!n)
 			continue;
 
@@ -1592,7 +1580,7 @@ backtrace:
 			}
 
 			/* grab the next available node */
-			n = tnode_get_child(pn, cindex);
+			n = get_child(pn, cindex);
 		} while (!n);
 	}
 
@@ -1787,7 +1775,7 @@ static struct key_vector *fib_trie_get_next(struct fib_trie_iter *iter)
 		 iter->tnode, iter->index, iter->depth);
 rescan:
 	while (cindex < tnode_child_length(tn)) {
-		struct key_vector *n = tnode_get_child_rcu(tn, cindex);
+		struct key_vector *n = get_child_rcu(tn, cindex);
 
 		if (n) {
 			if (IS_LEAF(n)) {

* [RFC PATCH 14/29] fib_trie: Rename tnode_child_length to child_length
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (12 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 13/29] fib_trie: replace tnode_get_child functions with get_child macros Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 15/29] fib_trie: Add tnode struct as a container for fields not needed in key_vector Alexander Duyck
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC (permalink / raw
  To: netdev

We are now checking the length of a key_vector instead of a tnode, so it
makes sense to rename this to child_length, since it is applicable even
to a leaf.  While at it, convert get_index from a macro to an inline
function.
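
For reference, a small userspace sketch of how the two helpers behave on a leaf and on a tnode. The struct kv_sketch type is a hypothetical cut-down key_vector, and the key values are made up for the example; the two function bodies follow the ones in the diff below.

```c
#include <assert.h>

typedef unsigned int t_key;

static_assert(sizeof(t_key) == 4, "sketch assumes a 32-bit key");

/* Hypothetical cut-down key_vector with only the fields used here. */
struct kv_sketch {
	t_key key;
	unsigned char bits;	/* log2 of the child count, 0 for a leaf */
	unsigned char pos;	/* position of the lowest index bit */
};

/* Number of children; masking bit 0 makes a leaf (bits == 0) report 0. */
static inline unsigned long child_length(const struct kv_sketch *kv)
{
	return (1ul << kv->bits) & ~(1ul);
}

/* Child slot for key; a result >= child_length() means a prefix mismatch. */
static inline unsigned long get_index(t_key key, const struct kv_sketch *kv)
{
	unsigned long index = key ^ kv->key;

	return index >> kv->pos;
}
```

With a tnode covering 192.168.0.0/16 (key 0xC0A80000, pos 8, bits 8), the address 192.168.5.1 maps to child 5, while an address outside the prefix yields an index beyond child_length() and is rejected by the same comparison.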

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   53 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index d9b192b..711f8f2 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -91,8 +91,6 @@ typedef unsigned int t_key;
 #define IS_TNODE(n) ((n)->bits)
 #define IS_LEAF(n) (!(n)->bits)
 
-#define get_index(_key, _kv) (((_key) ^ (_kv)->key) >> (_kv)->pos)
-
 struct key_vector {
 	struct rcu_head rcu;
 	t_key empty_children;		/* KEYLENGTH bits needed */
@@ -172,11 +170,18 @@ static inline void node_set_parent(struct key_vector *n, struct key_vector *tp)
 /* This provides us with the number of children in this node, in the case of a
  * leaf this will return 0 meaning none of the children are accessible.
  */
-static inline unsigned long tnode_child_length(const struct key_vector *tn)
+static inline unsigned long child_length(const struct key_vector *tn)
 {
 	return (1ul << tn->bits) & ~(1ul);
 }
 
+static inline unsigned long get_index(t_key key, struct key_vector *kv)
+{
+	unsigned long index = key ^ kv->key;
+
+	return index >> kv->pos;
+}
+
 static inline struct fib_table *trie_get_table(struct trie *t)
 {
 	unsigned long *tb_data = (unsigned long *)t;
@@ -360,7 +365,7 @@ static void put_child(struct key_vector *tn, unsigned long i,
 	struct key_vector *chi = get_child(tn, i);
 	int isfull, wasfull;
 
-	BUG_ON(i >= tnode_child_length(tn));
+	BUG_ON(i >= child_length(tn));
 
 	/* update emptyChildren, overflow into fullChildren */
 	if (n == NULL && chi != NULL)
@@ -388,7 +393,7 @@ static void update_children(struct key_vector *tn)
 	unsigned long i;
 
 	/* update all of the child parent pointers */
-	for (i = tnode_child_length(tn); i;) {
+	for (i = child_length(tn); i;) {
 		struct key_vector *inode = get_child(tn, --i);
 
 		if (!inode)
@@ -467,7 +472,7 @@ static struct key_vector __rcu **replace(struct trie *t,
 	cptr = tp ? tp->tnode : t->tnode;
 
 	/* resize children now that oldtnode is freed */
-	for (i = tnode_child_length(tn); i;) {
+	for (i = child_length(tn); i;) {
 		struct key_vector *inode = get_child(tn, --i);
 
 		/* resize child node */
@@ -499,7 +504,7 @@ static struct key_vector __rcu **inflate(struct trie *t,
 	 * point to existing tnodes and the links between our allocated
 	 * nodes.
 	 */
-	for (i = tnode_child_length(oldtnode), m = 1u << tn->pos; i;) {
+	for (i = child_length(oldtnode), m = 1u << tn->pos; i;) {
 		struct key_vector *inode = get_child(oldtnode, --i);
 		struct key_vector *node0, *node1;
 		unsigned long j, k;
@@ -549,7 +554,7 @@ static struct key_vector __rcu **inflate(struct trie *t,
 		tnode_free_append(tn, node0);
 
 		/* populate child pointers in new nodes */
-		for (k = tnode_child_length(inode), j = k / 2; j;) {
+		for (k = child_length(inode), j = k / 2; j;) {
 			put_child(node1, --j, get_child(inode, --k));
 			put_child(node0, j, get_child(inode, j));
 			put_child(node1, --j, get_child(inode, --k));
@@ -594,7 +599,7 @@ static struct key_vector __rcu **halve(struct trie *t,
 	 * point to existing tnodes and the links between our allocated
 	 * nodes.
 	 */
-	for (i = tnode_child_length(oldtnode); i;) {
+	for (i = child_length(oldtnode); i;) {
 		struct key_vector *node1 = get_child(oldtnode, --i);
 		struct key_vector *node0 = get_child(oldtnode, --i);
 		struct key_vector *inode;
@@ -635,7 +640,7 @@ static void collapse(struct trie *t, struct key_vector *oldtnode)
 	unsigned long i;
 
 	/* scan the tnode looking for that one child that might still exist */
-	for (n = NULL, i = tnode_child_length(oldtnode); !n && i;)
+	for (n = NULL, i = child_length(oldtnode); !n && i;)
 		n = get_child(oldtnode, --i);
 
 	/* compress one level */
@@ -657,7 +662,7 @@ static unsigned char update_suffix(struct key_vector *tn)
 	 * why we start with a stride of 2 since a stride of 1 would
 	 * represent the nodes with suffix length equal to tn->pos
 	 */
-	for (i = 0, stride = 0x2ul ; i < tnode_child_length(tn); i += stride) {
+	for (i = 0, stride = 0x2ul ; i < child_length(tn); i += stride) {
 		struct key_vector *n = get_child(tn, i);
 
 		if (!n || (n->slen <= slen))
@@ -690,12 +695,12 @@ static unsigned char update_suffix(struct key_vector *tn)
  *
  * 'high' in this instance is the variable 'inflate_threshold'. It
  * is expressed as a percentage, so we multiply it with
- * tnode_child_length() and instead of multiplying by 2 (since the
+ * child_length() and instead of multiplying by 2 (since the
  * child array will be doubled by inflate()) and multiplying
  * the left-hand side by 100 (to handle the percentage thing) we
  * multiply the left-hand side by 50.
  *
- * The left-hand side may look a bit weird: tnode_child_length(tn)
+ * The left-hand side may look a bit weird: child_length(tn)
  * - tn->empty_children is of course the number of non-null children
  * in the current node. tn->full_children is the number of "full"
  * children, that is non-null tnodes with a skip value of 0.
@@ -705,10 +710,10 @@ static unsigned char update_suffix(struct key_vector *tn)
  * A clearer way to write this would be:
  *
  * to_be_doubled = tn->full_children;
- * not_to_be_doubled = tnode_child_length(tn) - tn->empty_children -
+ * not_to_be_doubled = child_length(tn) - tn->empty_children -
  *     tn->full_children;
  *
- * new_child_length = tnode_child_length(tn) * 2;
+ * new_child_length = child_length(tn) * 2;
  *
  * new_fill_factor = 100 * (not_to_be_doubled + 2*to_be_doubled) /
  *      new_child_length;
@@ -725,23 +730,23 @@ static unsigned char update_suffix(struct key_vector *tn)
  *      inflate_threshold * new_child_length
  *
  * expand not_to_be_doubled and to_be_doubled, and shorten:
- * 100 * (tnode_child_length(tn) - tn->empty_children +
+ * 100 * (child_length(tn) - tn->empty_children +
  *    tn->full_children) >= inflate_threshold * new_child_length
  *
  * expand new_child_length:
- * 100 * (tnode_child_length(tn) - tn->empty_children +
+ * 100 * (child_length(tn) - tn->empty_children +
  *    tn->full_children) >=
- *      inflate_threshold * tnode_child_length(tn) * 2
+ *      inflate_threshold * child_length(tn) * 2
  *
  * shorten again:
- * 50 * (tn->full_children + tnode_child_length(tn) -
+ * 50 * (tn->full_children + child_length(tn) -
  *    tn->empty_children) >= inflate_threshold *
- *    tnode_child_length(tn)
+ *    child_length(tn)
  *
  */
 static inline bool should_inflate(struct key_vector *tp, struct key_vector *tn)
 {
-	unsigned long used = tnode_child_length(tn);
+	unsigned long used = child_length(tn);
 	unsigned long threshold = used;
 
 	/* Keep root node larger */
@@ -756,7 +761,7 @@ static inline bool should_inflate(struct key_vector *tp, struct key_vector *tn)
 
 static inline bool should_halve(struct key_vector *tp, struct key_vector *tn)
 {
-	unsigned long used = tnode_child_length(tn);
+	unsigned long used = child_length(tn);
 	unsigned long threshold = used;
 
 	/* Keep root node larger */
@@ -770,7 +775,7 @@ static inline bool should_halve(struct key_vector *tp, struct key_vector *tn)
 
 static inline bool should_collapse(struct key_vector *tn)
 {
-	unsigned long used = tnode_child_length(tn);
+	unsigned long used = child_length(tn);
 
 	used -= tn->empty_children;
 
@@ -1774,7 +1779,7 @@ static struct key_vector *fib_trie_get_next(struct fib_trie_iter *iter)
 	pr_debug("get_next iter={node=%p index=%d depth=%d}\n",
 		 iter->tnode, iter->index, iter->depth);
 rescan:
-	while (cindex < tnode_child_length(tn)) {
+	while (cindex < child_length(tn)) {
 		struct key_vector *n = get_child_rcu(tn, cindex);
 
 		if (n) {

* [RFC PATCH 15/29] fib_trie: Add tnode struct as a container for fields not needed in key_vector
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (13 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 14/29] fib_trie: Rename tnode_child_length to child_length Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 16/29] fib_trie: Move rcu from key_vector to tnode, add accessors Alexander Duyck
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC (permalink / raw
  To: netdev

This change pulls the fields not explicitly needed in the key_vector and
places them in the new tnode structure.  By doing this we will eventually
be able to reduce the key_vector down to 16 bytes on 64 bit systems, and
12 bytes on 32 bit systems.
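
A sketch of where this is heading, assuming the key_vector eventually keeps only key, bits, pos, slen and the leaf/child union. All *_sketch names are hypothetical, the rcu stand-in is only a placeholder for the bookkeeping fields, and TNODE_SIZE is written with an explicit multiplication rather than the kernel's offsetof on tnode[n] so that it stays within standard C.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int t_key;

/* Hypothetical stand-in for the kernel's struct hlist_head. */
struct hlist_head_sketch {
	void *first;
};

/* Assumed end state of the key vector once bookkeeping has moved out. */
struct kv_sketch {
	t_key key;
	unsigned char bits;
	unsigned char pos;
	unsigned char slen;
	union {
		struct hlist_head_sketch leaf;
		struct kv_sketch *tnode[1];
	};
};

/* 16 bytes with 8-byte pointers, 12 bytes with 4-byte pointers. */
static_assert(sizeof(struct kv_sketch) == 8 + sizeof(void *),
	      "holds on the common ILP32 and LP64 ABIs");

/* Container for the fields that route lookups never need to touch. */
struct tnode_sketch {
	struct {
		void *next;
		void (*func)(void *);
	} rcu;				/* placeholder for struct rcu_head */
	struct kv_sketch kv[1];
};

/* Bytes needed for a node header followed by n child pointers. */
#define TNODE_SIZE(n) \
	(offsetof(struct tnode_sketch, kv[0].tnode) + \
	 (n) * sizeof(struct kv_sketch *))
```

TNODE_SIZE(0) is then the fixed per-node overhead, and TNODE_SIZE(1) is the footprint of a leaf, whose single slot holds the fib_alias list head instead of a child pointer.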

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   87 +++++++++++++++++++++++++++------------------------
 1 file changed, 46 insertions(+), 41 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 711f8f2..b1e9141 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -108,6 +108,10 @@ struct key_vector {
 	};
 };
 
+struct tnode {
+	struct key_vector kv[1];
+};
+
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 struct trie_use_stats {
 	unsigned int gets;
@@ -264,9 +268,9 @@ static inline void alias_free_mem_rcu(struct fib_alias *fa)
 	call_rcu(&fa->rcu, __alias_free_mem);
 }
 
-#define TNODE_SIZE offsetof(struct key_vector, tnode[0])
+#define TNODE_SIZE(n) offsetof(struct tnode, kv[0].tnode[n])
 #define TNODE_KMALLOC_MAX \
-	ilog2((PAGE_SIZE - TNODE_SIZE) / sizeof(struct key_vector *))
+	ilog2((PAGE_SIZE - TNODE_SIZE(0)) / sizeof(struct key_vector *))
 
 static void __node_free_rcu(struct rcu_head *head)
 {
@@ -282,7 +286,7 @@ static void __node_free_rcu(struct rcu_head *head)
 
 #define node_free(n) call_rcu(&n->rcu, __node_free_rcu)
 
-static struct key_vector *tnode_alloc(size_t size)
+static struct tnode *tnode_alloc(size_t size)
 {
 	if (size <= PAGE_SIZE)
 		return kzalloc(size, GFP_KERNEL);
@@ -302,49 +306,51 @@ static inline void empty_child_dec(struct key_vector *n)
 
 static struct key_vector *leaf_new(t_key key, struct fib_alias *fa)
 {
-	struct key_vector *l = kmem_cache_alloc(trie_leaf_kmem, GFP_KERNEL);
-	if (l) {
-		l->parent = NULL;
-		/* set key and pos to reflect full key value
-		 * any trailing zeros in the key should be ignored
-		 * as the nodes are searched
-		 */
-		l->key = key;
-		l->slen = fa->fa_slen;
-		l->pos = 0;
-		/* set bits to 0 indicating we are not a tnode */
-		l->bits = 0;
-
-		/* link leaf to fib alias */
-		INIT_HLIST_HEAD(&l->leaf);
-		hlist_add_head(&fa->fa_list, &l->leaf);
-	}
+	struct tnode *kv = kmem_cache_alloc(trie_leaf_kmem, GFP_KERNEL);
+	struct key_vector *l = kv->kv;
+
+	if (!kv)
+		return NULL;
+
+	/* initialize key vector */
+	l->key = key;
+	l->pos = 0;
+	l->bits = 0;
+	l->slen = fa->fa_slen;
+
+	/* link leaf to fib alias */
+	INIT_HLIST_HEAD(&l->leaf);
+	hlist_add_head(&fa->fa_list, &l->leaf);
+
 	return l;
 }
 
 static struct key_vector *tnode_new(t_key key, int pos, int bits)
 {
-	size_t sz = offsetof(struct key_vector, tnode[1ul << bits]);
-	struct key_vector *tn = tnode_alloc(sz);
+	size_t sz = TNODE_SIZE(1ul << bits);
+	struct tnode *tnode = tnode_alloc(sz);
 	unsigned int shift = pos + bits;
+	struct key_vector *tn = tnode->kv;
 
 	/* verify bits and pos their msb bits clear and values are valid */
 	BUG_ON(!bits || (shift > KEYLENGTH));
 
-	if (tn) {
-		tn->parent = NULL;
-		tn->slen = pos;
-		tn->pos = pos;
-		tn->bits = bits;
-		tn->key = (shift < KEYLENGTH) ? (key >> shift) << shift : 0;
-		if (bits == KEYLENGTH)
-			tn->full_children = 1;
-		else
-			tn->empty_children = 1ul << bits;
-	}
-
-	pr_debug("AT %p s=%zu %zu\n", tn, TNODE_SIZE,
+	pr_debug("AT %p s=%zu %zu\n", tnode, TNODE_SIZE(0),
 		 sizeof(struct key_vector *) << bits);
+
+	if (!tnode)
+		return NULL;
+
+	if (bits == KEYLENGTH)
+		tn->full_children = 1;
+	else
+		tn->empty_children = 1ul << bits;
+
+	tn->key = (shift < KEYLENGTH) ? (key >> shift) << shift : 0;
+	tn->pos = pos;
+	tn->bits = bits;
+	tn->slen = pos;
+
 	return tn;
 }
 
@@ -437,8 +443,7 @@ static void tnode_free(struct key_vector *tn)
 
 	while (head) {
 		head = head->next;
-		tnode_free_size += offsetof(struct key_vector,
-					    tnode[1 << tn->bits]);
+		tnode_free_size += TNODE_SIZE(1 << tn->bits);
 		node_free(tn);
 
 		tn = container_of(head, struct key_vector, rcu);
@@ -1724,7 +1729,7 @@ void __init fib_trie_init(void)
 					  0, SLAB_PANIC, NULL);
 
 	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
-					   sizeof(struct key_vector),
+					   sizeof(struct tnode),
 					   0, SLAB_PANIC, NULL);
 }
 
@@ -1882,13 +1887,13 @@ static void trie_show_stats(struct seq_file *seq, struct trie_stat *stat)
 	seq_printf(seq, "\tMax depth:      %u\n", stat->maxdepth);
 
 	seq_printf(seq, "\tLeaves:         %u\n", stat->leaves);
-	bytes = sizeof(struct key_vector) * stat->leaves;
+	bytes = TNODE_SIZE(1) * stat->leaves;
 
 	seq_printf(seq, "\tPrefixes:       %u\n", stat->prefixes);
 	bytes += sizeof(struct fib_alias) * stat->prefixes;
 
 	seq_printf(seq, "\tInternal nodes: %u\n\t", stat->tnodes);
-	bytes += TNODE_SIZE * stat->tnodes;
+	bytes += TNODE_SIZE(0) * stat->tnodes;
 
 	max = MAX_STAT_DEPTH;
 	while (max > 0 && stat->nodesizes[max-1] == 0)
@@ -1957,7 +1962,7 @@ static int fib_triestat_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq,
 		   "Basic info: size of leaf:"
 		   " %Zd bytes, size of tnode: %Zd bytes.\n",
-		   sizeof(struct key_vector), TNODE_SIZE);
+		   TNODE_SIZE(1), TNODE_SIZE(0));
 
 	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
 		struct hlist_head *head = &net->ipv4.fib_table_hash[h];

* [RFC PATCH 16/29] fib_trie: Move rcu from key_vector to tnode, add accessors.
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (14 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 15/29] fib_trie: Add tnode struct as a container for fields not needed in key_vector Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 17/29] fib_trie: Pull empty_children and full_children into tnode Alexander Duyck
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC (permalink / raw
  To: netdev

The rcu_head is only needed once for the entire node, not once per
key_vector, so we can pull it out and move it to the tnode structure.

In addition, add accessors to be used inside the RCU functions so that we
can more easily get from the key vector to either the tnode or the trie
pointers.
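
The accessor pattern can be sketched in userspace as follows. This is only an illustration: container_of_sketch, rcu_head_sketch, kv_sketch, tnode_sketch and free_list_append are hypothetical names, and the real code uses the kernel's container_of() and struct rcu_head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *);
};

/* Cut-down key vector; the union of leaf and children is omitted. */
struct kv_sketch {
	unsigned int key;
	unsigned char bits, pos, slen;
};

/* One rcu head per node, stored in front of the embedded key vector. */
struct tnode_sketch {
	struct rcu_head_sketch rcu;
	struct kv_sketch kv[1];
};

static_assert(offsetof(struct tnode_sketch, kv) >=
	      sizeof(struct rcu_head_sketch),
	      "key vector must follow the rcu head");

/* Simplified container_of: step back from a member to its container. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing node from a pointer to its key vector. */
static inline struct tnode_sketch *tn_info(struct kv_sketch *kv)
{
	return container_of_sketch(kv, struct tnode_sketch, kv[0]);
}

/* Chain n behind tn on the pending-free list, as tnode_free_append does. */
static inline void free_list_append(struct kv_sketch *tn,
				    struct kv_sketch *n)
{
	tn_info(n)->rcu.next = tn_info(tn)->rcu.next;
	tn_info(tn)->rcu.next = &tn_info(n)->rcu;
}
```

Callers keep passing key_vector pointers around, and only the few places that free or chain nodes pay for the pointer arithmetic back to the container.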

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   46 ++++++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index b1e9141..76215d7 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -92,7 +92,6 @@ typedef unsigned int t_key;
 #define IS_LEAF(n) (!(n)->bits)
 
 struct key_vector {
-	struct rcu_head rcu;
 	t_key empty_children;		/* KEYLENGTH bits needed */
 	t_key full_children;		/* KEYLENGTH bits needed */
 	struct key_vector __rcu *parent;
@@ -109,7 +108,9 @@ struct key_vector {
 };
 
 struct tnode {
+	struct rcu_head rcu;
 	struct key_vector kv[1];
+#define tn_bits kv[0].bits
 };
 
 #ifdef CONFIG_IP_FIB_TRIE_STATS
@@ -154,6 +155,18 @@ static const int sync_pages = 128;
 static struct kmem_cache *fn_alias_kmem __read_mostly;
 static struct kmem_cache *trie_leaf_kmem __read_mostly;
 
+static inline struct tnode *tn_info(struct key_vector *kv)
+{
+	return container_of(kv, struct tnode, kv[0]);
+}
+
+static inline struct fib_table *table_info(struct trie *t)
+{
+	unsigned long *tb_data = (unsigned long *)t;
+
+	return container_of(tb_data, struct fib_table, tb_data[0]);
+}
+
 /* caller must hold RTNL */
 #define node_parent(n) rtnl_dereference((n)->parent)
 #define get_child(tn, i) rtnl_dereference((tn)->tnode[i])
@@ -186,13 +199,6 @@ static inline unsigned long get_index(t_key key, struct key_vector *kv)
 	return index >> kv->pos;
 }
 
-static inline struct fib_table *trie_get_table(struct trie *t)
-{
-	unsigned long *tb_data = (unsigned long *)t;
-
-	return container_of(tb_data, struct fib_table, tb_data[0]);
-}
-
 /* To understand this stuff, an understanding of keys and all their bits is
  * necessary. Every node in the trie has a key associated with it, but not
  * all of the bits in that key are significant.
@@ -274,17 +280,17 @@ static inline void alias_free_mem_rcu(struct fib_alias *fa)
 
 static void __node_free_rcu(struct rcu_head *head)
 {
-	struct key_vector *n = container_of(head, struct key_vector, rcu);
+	struct tnode *n = container_of(head, struct tnode, rcu);
 
-	if (IS_LEAF(n))
+	if (!n->tn_bits)
 		kmem_cache_free(trie_leaf_kmem, n);
-	else if (n->bits <= TNODE_KMALLOC_MAX)
+	else if (n->tn_bits <= TNODE_KMALLOC_MAX)
 		kfree(n);
 	else
 		vfree(n);
 }
 
-#define node_free(n) call_rcu(&n->rcu, __node_free_rcu)
+#define node_free(n) call_rcu(&tn_info(n)->rcu, __node_free_rcu)
 
 static struct tnode *tnode_alloc(size_t size)
 {
@@ -427,26 +433,26 @@ static inline void put_child_root(struct key_vector *tp, struct trie *t,
 
 static inline void tnode_free_init(struct key_vector *tn)
 {
-	tn->rcu.next = NULL;
+	tn_info(tn)->rcu.next = NULL;
 }
 
 static inline void tnode_free_append(struct key_vector *tn,
 				     struct key_vector *n)
 {
-	n->rcu.next = tn->rcu.next;
-	tn->rcu.next = &n->rcu;
+	tn_info(n)->rcu.next = tn_info(tn)->rcu.next;
+	tn_info(tn)->rcu.next = &tn_info(n)->rcu;
 }
 
 static void tnode_free(struct key_vector *tn)
 {
-	struct callback_head *head = &tn->rcu;
+	struct callback_head *head = &tn_info(tn)->rcu;
 
 	while (head) {
 		head = head->next;
 		tnode_free_size += TNODE_SIZE(1 << tn->bits);
 		node_free(tn);
 
-		tn = container_of(head, struct key_vector, rcu);
+		tn = container_of(head, struct tnode, rcu)->kv;
 	}
 
 	if (tnode_free_size >= PAGE_SIZE * sync_pages) {
@@ -968,7 +974,7 @@ static struct fib_table *trie_rebalance(struct trie *t,
 		tn = container_of(cptr, struct key_vector, tnode[0]);
 	}
 
-	return trie_get_table(container_of(cptr, struct trie, tnode[0]));
+	return table_info(container_of(cptr, struct trie, tnode[0]));
 }
 
 static struct fib_table *fib_insert_node(struct trie *t,
@@ -1055,7 +1061,7 @@ static struct fib_table *fib_insert_alias(struct trie *t,
 		leaf_push_suffix(tp, l);
 	}
 
-	return trie_get_table(t);
+	return table_info(t);
 }
 
 /* Caller must hold RTNL. */
@@ -1633,7 +1639,7 @@ flush_complete:
 static void __trie_free_rcu(struct rcu_head *head)
 {
 	struct trie *t = container_of(head, struct trie, rcu);
-	struct fib_table *tb = trie_get_table(t);
+	struct fib_table *tb = table_info(t);
 
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	free_percpu(t->stats);

* [RFC PATCH 17/29] fib_trie: Pull empty_children and full_children into tnode
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (15 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 16/29] fib_trie: Move rcu from key_vector to tnode, add accessors Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:49 ` [RFC PATCH 18/29] fib_trie: Move parent from key_vector to tnode Alexander Duyck
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC (permalink / raw
  To: netdev

This pulls the information about the child array out of the key_vector and
places it in the tnode since that is where it is needed.
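
A userspace sketch of the counters after the move, with the same carry behaviour as the empty_child_inc() and empty_child_dec() helpers touched by this patch. The *_sketch types are hypothetical, and the GNU "?:" shorthand used in the kernel is spelled out as plain if statements.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int t_key;

static_assert(sizeof(t_key) == 4, "sketch assumes a 32-bit key");

/* Cut-down key vector; only the header fields are kept. */
struct kv_sketch {
	t_key key;
	unsigned char bits, pos, slen;
};

/* Child-array statistics now live in the container, not the key vector. */
struct tnode_sketch {
	t_key empty_children;
	t_key full_children;
	struct kv_sketch kv[1];
};

/* Recover the enclosing node from a pointer to its key vector. */
static inline struct tnode_sketch *tn_info(struct kv_sketch *kv)
{
	return (struct tnode_sketch *)
		((char *)kv - offsetof(struct tnode_sketch, kv));
}

/* Increment empty_children; a wrap to 0 carries into full_children. */
static inline void empty_child_inc(struct kv_sketch *n)
{
	struct tnode_sketch *tn = tn_info(n);

	if (++tn->empty_children == 0)
		++tn->full_children;
}

/* Decrement empty_children; stepping below 0 borrows from full_children. */
static inline void empty_child_dec(struct kv_sketch *n)
{
	struct tnode_sketch *tn = tn_info(n);

	if (tn->empty_children-- == 0)
		tn->full_children--;
}
```

The carry only matters for a node with bits == KEYLENGTH, whose 2^32 empty slots do not fit in a 32-bit counter; tnode_new represents that case as empty_children == 0 with full_children == 1.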

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 76215d7..4386db7 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -92,8 +92,6 @@ typedef unsigned int t_key;
 #define IS_LEAF(n) (!(n)->bits)
 
 struct key_vector {
-	t_key empty_children;		/* KEYLENGTH bits needed */
-	t_key full_children;		/* KEYLENGTH bits needed */
 	struct key_vector __rcu *parent;
 	t_key key;
 	unsigned char bits;		/* 2log(KEYLENGTH) bits needed */
@@ -109,6 +107,8 @@ struct key_vector {
 
 struct tnode {
 	struct rcu_head rcu;
+	t_key empty_children;		/* KEYLENGTH bits needed */
+	t_key full_children;		/* KEYLENGTH bits needed */
 	struct key_vector kv[1];
 #define tn_bits kv[0].bits
 };
@@ -302,12 +302,12 @@ static struct tnode *tnode_alloc(size_t size)
 
 static inline void empty_child_inc(struct key_vector *n)
 {
-	++n->empty_children ? : ++n->full_children;
+	++tn_info(n)->empty_children ? : ++tn_info(n)->full_children;
 }
 
 static inline void empty_child_dec(struct key_vector *n)
 {
-	n->empty_children-- ? : n->full_children--;
+	tn_info(n)->empty_children-- ? : tn_info(n)->full_children--;
 }
 
 static struct key_vector *leaf_new(t_key key, struct fib_alias *fa)
@@ -348,9 +348,9 @@ static struct key_vector *tnode_new(t_key key, int pos, int bits)
 		return NULL;
 
 	if (bits == KEYLENGTH)
-		tn->full_children = 1;
+		tnode->full_children = 1;
 	else
-		tn->empty_children = 1ul << bits;
+		tnode->empty_children = 1ul << bits;
 
 	tn->key = (shift < KEYLENGTH) ? (key >> shift) << shift : 0;
 	tn->pos = pos;
@@ -390,9 +390,9 @@ static void put_child(struct key_vector *tn, unsigned long i,
 	isfull = tnode_full(tn, n);
 
 	if (wasfull && !isfull)
-		tn->full_children--;
+		tn_info(tn)->full_children--;
 	else if (!wasfull && isfull)
-		tn->full_children++;
+		tn_info(tn)->full_children++;
 
 	if (n && (tn->slen < n->slen))
 		tn->slen = n->slen;
@@ -762,8 +762,8 @@ static inline bool should_inflate(struct key_vector *tp, struct key_vector *tn)
 
 	/* Keep root node larger */
 	threshold *= tp ? inflate_threshold : inflate_threshold_root;
-	used -= tn->empty_children;
-	used += tn->full_children;
+	used -= tn_info(tn)->empty_children;
+	used += tn_info(tn)->full_children;
 
 	/* if bits == KEYLENGTH then pos = 0, and will fail below */
 
@@ -777,7 +777,7 @@ static inline bool should_halve(struct key_vector *tp, struct key_vector *tn)
 
 	/* Keep root node larger */
 	threshold *= tp ? halve_threshold : halve_threshold_root;
-	used -= tn->empty_children;
+	used -= tn_info(tn)->empty_children;
 
 	/* if bits == KEYLENGTH then used = 100% on wrap, and will fail below */
 
@@ -788,10 +788,10 @@ static inline bool should_collapse(struct key_vector *tn)
 {
 	unsigned long used = child_length(tn);
 
-	used -= tn->empty_children;
+	used -= tn_info(tn)->empty_children;
 
 	/* account for bits == KEYLENGTH case */
-	if ((tn->bits == KEYLENGTH) && tn->full_children)
+	if ((tn->bits == KEYLENGTH) && tn_info(tn)->full_children)
 		used -= KEY_MAX;
 
 	/* One child or none, time to drop us from the trie */
@@ -1870,7 +1870,7 @@ static void trie_collect_stats(struct trie *t, struct trie_stat *s)
 			s->tnodes++;
 			if (n->bits < MAX_STAT_DEPTH)
 				s->nodesizes[n->bits]++;
-			s->nullpointers += n->empty_children;
+			s->nullpointers += tn_info(n)->empty_children;
 		}
 	}
 	rcu_read_unlock();
@@ -2145,7 +2145,8 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 		seq_indent(seq, iter->depth-1);
 		seq_printf(seq, "  +-- %pI4/%zu %u %u %u\n",
 			   &prf, KEYLENGTH - n->pos - n->bits, n->bits,
-			   n->full_children, n->empty_children);
+			   tn_info(n)->full_children,
+			   tn_info(n)->empty_children);
 	} else {
 		__be32 val = htonl(n->key);
 		struct fib_alias *fa;


* [RFC PATCH 18/29] fib_trie: Move parent from key_vector to tnode
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (16 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 17/29] fib_trie: Pull empty_children and full_children into tnode Alexander Duyck
@ 2015-02-24 20:49 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 19/29] fib_trie: Add key vector to root, return parent key_vector in resize Alexander Duyck
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:49 UTC
  To: netdev

This change pulls the parent pointer from the key_vector and places it in
the tnode structure.
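
A rough userspace sketch of what the accessors look like after the move,
for illustration only. It assumes every node, leaf or internal, is
allocated inside a struct tnode, and it uses plain loads and stores where
the kernel uses rtnl_dereference() and rcu_assign_pointer(). node_depth()
is a hypothetical helper added just to exercise the parent walk.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed-down key_vector: the parent pointer is no longer here. */
struct key_vector {
	unsigned int key;
	unsigned char pos;
	unsigned char bits;
	unsigned char slen;
};

struct tnode {
	struct key_vector *parent;	/* __rcu in the kernel */
	struct key_vector kv[1];
};

static inline struct tnode *tn_info(struct key_vector *kv)
{
	return container_of(kv, struct tnode, kv);
}

/* Single-threaded stand-ins for node_parent() and node_set_parent(). */
static inline struct key_vector *node_parent(struct key_vector *n)
{
	return tn_info(n)->parent;
}

static inline void node_set_parent(struct key_vector *n,
				   struct key_vector *tp)
{
	if (n)
		tn_info(n)->parent = tp;
}

/* Hypothetical helper: count the ancestors above n. */
static inline unsigned int node_depth(struct key_vector *n)
{
	unsigned int depth = 0;

	while ((n = node_parent(n)) != NULL)
		depth++;
	return depth;
}
```

The walk still terminates on a NULL parent here, since at this point in
the series the topmost node has no parent.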

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 4386db7..b2737b7 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -92,10 +92,9 @@ typedef unsigned int t_key;
 #define IS_LEAF(n) (!(n)->bits)
 
 struct key_vector {
-	struct key_vector __rcu *parent;
 	t_key key;
-	unsigned char bits;		/* 2log(KEYLENGTH) bits needed */
 	unsigned char pos;		/* 2log(KEYLENGTH) bits needed */
+	unsigned char bits;		/* 2log(KEYLENGTH) bits needed */
 	unsigned char slen;
 	union {
 		/* This list pointer if valid if bits == 0 (LEAF) */
@@ -109,6 +108,7 @@ struct tnode {
 	struct rcu_head rcu;
 	t_key empty_children;		/* KEYLENGTH bits needed */
 	t_key full_children;		/* KEYLENGTH bits needed */
+	struct key_vector __rcu *parent;
 	struct key_vector kv[1];
 #define tn_bits kv[0].bits
 };
@@ -168,21 +168,21 @@ static inline struct fib_table *table_info(struct trie *t)
 }
 
 /* caller must hold RTNL */
-#define node_parent(n) rtnl_dereference((n)->parent)
+#define node_parent(tn) rtnl_dereference(tn_info(tn)->parent)
 #define get_child(tn, i) rtnl_dereference((tn)->tnode[i])
 
 /* caller must hold RCU read lock or RTNL */
-#define node_parent_rcu(n) rcu_dereference_rtnl((n)->parent)
+#define node_parent_rcu(tn) rcu_dereference_rtnl(tn_info(tn)->parent)
 #define get_child_rcu(tn, i) rcu_dereference_rtnl((tn)->tnode[i])
 
 /* wrapper for rcu_assign_pointer */
 static inline void node_set_parent(struct key_vector *n, struct key_vector *tp)
 {
 	if (n)
-		rcu_assign_pointer(n->parent, tp);
+		rcu_assign_pointer(tn_info(n)->parent, tp);
 }
 
-#define NODE_INIT_PARENT(n, p) RCU_INIT_POINTER((n)->parent, p)
+#define NODE_INIT_PARENT(n, p) RCU_INIT_POINTER(tn_info(n)->parent, p)
 
 /* This provides us with the number of children in this node, in the case of a
  * leaf this will return 0 meaning none of the children are accessible.


* [RFC PATCH 19/29] fib_trie: Add key vector to root, return parent key_vector in resize
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (17 preceding siblings ...)
  2015-02-24 20:49 ` [RFC PATCH 18/29] fib_trie: Move parent from key_vector to tnode Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 20/29] fib_trie: Push net pointer down into fib_trie insert/delete/flush calls Alexander Duyck
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC
  To: netdev

This change makes the root of the trie contain a key_vector.  Doing this
makes room to essentially collapse the entire trie by at least one cache
line, as the information about the tnode or leaf that the root points to
can be stored in the root itself.
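
To illustrate the idea, here is a minimal userspace sketch, not the kernel
code. It assumes trimmed-down structures and plain pointer loads in place
of the RCU accessors, and trie_of() is a hypothetical helper. The root
key_vector acts as a sentinel with pos == KEYLENGTH, so walks toward the
root stop on IS_TRIE() instead of testing for a NULL parent.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define KEYLENGTH	32
#define BITS_PER_LONG	((int)(sizeof(long) * CHAR_BIT))
#define IS_TRIE(n)	((n)->pos >= KEYLENGTH)

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

typedef unsigned int t_key;

struct key_vector {
	t_key key;
	unsigned char pos;
	unsigned char bits;
};

struct tnode {
	struct key_vector *parent;
	struct key_vector kv[1];
};

/* The root key_vector is embedded in the trie, not in a tnode. */
struct trie {
	struct key_vector kv[1];
};

static inline struct key_vector *node_parent(struct key_vector *n)
{
	return container_of(n, struct tnode, kv)->parent;
}

/* The root always yields index 0.  The explicit check avoids shifting
 * a 32-bit long by 32, which is undefined behaviour in C.
 */
static inline unsigned long get_index(t_key key, const struct key_vector *kv)
{
	unsigned long index = key ^ kv->key;

	if ((BITS_PER_LONG <= KEYLENGTH) && (KEYLENGTH == kv->pos))
		return 0;

	return index >> kv->pos;
}

/* Hypothetical helper: climb to the sentinel and recover the trie.
 * node_parent() is never called on the sentinel itself.
 */
static inline struct trie *trie_of(struct key_vector *n)
{
	while (!IS_TRIE(n))
		n = node_parent(n);

	return container_of(n, struct trie, kv);
}
```

This is the same shape as the reworked trie_rebalance(), which loops until
it reaches the trie's key_vector and then converts that back to the table.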

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  378 +++++++++++++++++++++++----------------------------
 1 file changed, 174 insertions(+), 204 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index b2737b7..432a875 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -88,8 +88,9 @@
 
 typedef unsigned int t_key;
 
-#define IS_TNODE(n) ((n)->bits)
-#define IS_LEAF(n) (!(n)->bits)
+#define IS_TRIE(n)	((n)->pos >= KEYLENGTH)
+#define IS_TNODE(n)	((n)->bits)
+#define IS_LEAF(n)	(!(n)->bits)
 
 struct key_vector {
 	t_key key;
@@ -135,14 +136,14 @@ struct trie_stat {
 };
 
 struct trie {
-	struct key_vector __rcu *tnode[1];
+	struct key_vector kv[1];
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	struct trie_use_stats __percpu *stats;
 #endif
 	struct rcu_head rcu;
 };
 
-static struct key_vector **resize(struct trie *t, struct key_vector *tn);
+static struct key_vector *resize(struct trie *t, struct key_vector *tn);
 static size_t tnode_free_size;
 
 /*
@@ -160,8 +161,9 @@ static inline struct tnode *tn_info(struct key_vector *kv)
 	return container_of(kv, struct tnode, kv[0]);
 }
 
-static inline struct fib_table *table_info(struct trie *t)
+static inline struct fib_table *table_info(struct key_vector *kv)
 {
+	struct trie *t = container_of(kv, struct trie, kv[0]);
 	unsigned long *tb_data = (unsigned long *)t;
 
 	return container_of(tb_data, struct fib_table, tb_data[0]);
@@ -192,10 +194,15 @@ static inline unsigned long child_length(const struct key_vector *tn)
 	return (1ul << tn->bits) & ~(1ul);
 }
 
+#define get_cindex(key, kv) (((key) ^ (kv)->key) >> (kv)->pos)
+
 static inline unsigned long get_index(t_key key, struct key_vector *kv)
 {
 	unsigned long index = key ^ kv->key;
 
+	if ((BITS_PER_LONG <= KEYLENGTH) && (KEYLENGTH == kv->pos))
+		return 0;
+
 	return index >> kv->pos;
 }
 
@@ -422,13 +429,13 @@ static void update_children(struct key_vector *tn)
 	}
 }
 
-static inline void put_child_root(struct key_vector *tp, struct trie *t,
-				  t_key key, struct key_vector *n)
+static inline void put_child_root(struct key_vector *tp, t_key key,
+				  struct key_vector *n)
 {
-	if (tp)
-		put_child(tp, get_index(key, tp), n);
+	if (IS_TRIE(tp))
+		rcu_assign_pointer(tp->tnode[0], n);
 	else
-		rcu_assign_pointer(t->tnode[0], n);
+		put_child(tp, get_index(key, tp), n);
 }
 
 static inline void tnode_free_init(struct key_vector *tn)
@@ -461,17 +468,16 @@ static void tnode_free(struct key_vector *tn)
 	}
 }
 
-static struct key_vector __rcu **replace(struct trie *t,
-					 struct key_vector *oldtnode,
-					 struct key_vector *tn)
+static struct key_vector *replace(struct trie *t,
+				  struct key_vector *oldtnode,
+				  struct key_vector *tn)
 {
 	struct key_vector *tp = node_parent(oldtnode);
-	struct key_vector **cptr;
 	unsigned long i;
 
 	/* setup the parent pointer out of and back into this node */
 	NODE_INIT_PARENT(tn, tp);
-	put_child_root(tp, t, tn->key, tn);
+	put_child_root(tp, tn->key, tn);
 
 	/* update all of the child parent pointers */
 	update_children(tn);
@@ -479,23 +485,20 @@ static struct key_vector __rcu **replace(struct trie *t,
 	/* all pointers should be clean so we are done */
 	tnode_free(oldtnode);
 
-	/* record the pointer that is pointing to this node */
-	cptr = tp ? tp->tnode : t->tnode;
-
 	/* resize children now that oldtnode is freed */
 	for (i = child_length(tn); i;) {
 		struct key_vector *inode = get_child(tn, --i);
 
 		/* resize child node */
 		if (tnode_full(tn, inode))
-			resize(t, inode);
+			tn = resize(t, inode);
 	}
 
-	return cptr;
+	return tp;
 }
 
-static struct key_vector __rcu **inflate(struct trie *t,
-					 struct key_vector *oldtnode)
+static struct key_vector *inflate(struct trie *t,
+				  struct key_vector *oldtnode)
 {
 	struct key_vector *tn;
 	unsigned long i;
@@ -590,8 +593,8 @@ notnode:
 	return NULL;
 }
 
-static struct key_vector __rcu **halve(struct trie *t,
-				       struct key_vector *oldtnode)
+static struct key_vector *halve(struct trie *t,
+				struct key_vector *oldtnode)
 {
 	struct key_vector *tn;
 	unsigned long i;
@@ -645,7 +648,8 @@ notnode:
 	return NULL;
 }
 
-static void collapse(struct trie *t, struct key_vector *oldtnode)
+static struct key_vector *collapse(struct trie *t,
+				   struct key_vector *oldtnode)
 {
 	struct key_vector *n, *tp;
 	unsigned long i;
@@ -656,11 +660,13 @@ static void collapse(struct trie *t, struct key_vector *oldtnode)
 
 	/* compress one level */
 	tp = node_parent(oldtnode);
-	put_child_root(tp, t, oldtnode->key, n);
+	put_child_root(tp, oldtnode->key, n);
 	node_set_parent(n, tp);
 
 	/* drop dead node */
 	node_free(oldtnode);
+
+	return tp;
 }
 
 static unsigned char update_suffix(struct key_vector *tn)
@@ -761,7 +767,7 @@ static inline bool should_inflate(struct key_vector *tp, struct key_vector *tn)
 	unsigned long threshold = used;
 
 	/* Keep root node larger */
-	threshold *= tp ? inflate_threshold : inflate_threshold_root;
+	threshold *= IS_TRIE(tp) ? inflate_threshold_root : inflate_threshold;
 	used -= tn_info(tn)->empty_children;
 	used += tn_info(tn)->full_children;
 
@@ -776,7 +782,7 @@ static inline bool should_halve(struct key_vector *tp, struct key_vector *tn)
 	unsigned long threshold = used;
 
 	/* Keep root node larger */
-	threshold *= tp ? halve_threshold : halve_threshold_root;
+	threshold *= IS_TRIE(tp) ? halve_threshold_root : halve_threshold;
 	used -= tn_info(tn)->empty_children;
 
 	/* if bits == KEYLENGTH then used = 100% on wrap, and will fail below */
@@ -799,15 +805,13 @@ static inline bool should_collapse(struct key_vector *tn)
 }
 
 #define MAX_WORK 10
-static struct key_vector __rcu **resize(struct trie *t,
-					struct key_vector *tn)
+static struct key_vector *resize(struct trie *t, struct key_vector *tn)
 {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	struct trie_use_stats __percpu *stats = t->stats;
 #endif
 	struct key_vector *tp = node_parent(tn);
-	unsigned long cindex = tp ? get_index(tn->key, tp) : 0;
-	struct key_vector __rcu **cptr = tp ? tp->tnode : t->tnode;
+	unsigned long cindex = get_index(tn->key, tp);
 	int max_work = MAX_WORK;
 
 	pr_debug("In tnode_resize %p inflate_threshold=%d threshold=%d\n",
@@ -817,15 +821,14 @@ static struct key_vector __rcu **resize(struct trie *t,
 	 * doing it ourselves.  This way we can let RCU fully do its
 	 * thing without us interfering
 	 */
-	BUG_ON(tn != rtnl_dereference(cptr[cindex]));
+	BUG_ON(tn != get_child(tp, cindex));
 
 	/* Double as long as the resulting node has a number of
 	 * nonempty nodes that are above the threshold.
 	 */
 	while (should_inflate(tp, tn) && max_work) {
-		struct key_vector __rcu **tcptr = inflate(t, tn);
-
-		if (!tcptr) {
+		tp = inflate(t, tn);
+		if (!tp) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 			this_cpu_inc(stats->resize_node_skipped);
 #endif
@@ -833,21 +836,19 @@ static struct key_vector __rcu **resize(struct trie *t,
 		}
 
 		max_work--;
-		cptr = tcptr;
-		tn = rtnl_dereference(cptr[cindex]);
+		tn = get_child(tp, cindex);
 	}
 
 	/* Return if at least one inflate is run */
 	if (max_work != MAX_WORK)
-		return cptr;
+		return node_parent(tn);
 
 	/* Halve as long as the number of empty children in this
 	 * node is above threshold.
 	 */
 	while (should_halve(tp, tn) && max_work) {
-		struct key_vector __rcu **tcptr = halve(t, tn);
-
-		if (!tcptr) {
+		tp = halve(t, tn);
+		if (!tp) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 			this_cpu_inc(stats->resize_node_skipped);
 #endif
@@ -855,34 +856,34 @@ static struct key_vector __rcu **resize(struct trie *t,
 		}
 
 		max_work--;
-		cptr = tcptr;
-		tn = rtnl_dereference(cptr[cindex]);
+		tn = get_child(tp, cindex);
 	}
 
 	/* Only one child remains */
-	if (should_collapse(tn)) {
-		collapse(t, tn);
-		return cptr;
-	}
+	if (should_collapse(tn))
+		return collapse(t, tn);
+
+	/* update parent in case inflate or halve failed */
+	tp = node_parent(tn);
 
 	/* Return if at least one deflate was run */
 	if (max_work != MAX_WORK)
-		return cptr;
+		return tp;
 
 	/* push the suffix length to the parent node */
 	if (tn->slen > tn->pos) {
 		unsigned char slen = update_suffix(tn);
 
-		if (tp && (slen > tp->slen))
+		if (slen > tp->slen)
 			tp->slen = slen;
 	}
 
-	return cptr;
+	return tp;
 }
 
 static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
 {
-	while (tp && (tp->slen > tp->pos) && (tp->slen > l->slen)) {
+	while ((tp->slen > tp->pos) && (tp->slen > l->slen)) {
 		if (update_suffix(tp) > l->slen)
 			break;
 		tp = node_parent(tp);
@@ -894,7 +895,7 @@ static void leaf_push_suffix(struct key_vector *tn, struct key_vector *l)
 	/* if this is a new leaf then tn will be NULL and we can sort
 	 * out parent suffix lengths as a part of trie_rebalance
 	 */
-	while (tn && (tn->slen < l->slen)) {
+	while (tn->slen < l->slen) {
 		tn->slen = l->slen;
 		tn = node_parent(tn);
 	}
@@ -904,12 +905,17 @@ static void leaf_push_suffix(struct key_vector *tn, struct key_vector *l)
 static struct key_vector *fib_find_node(struct trie *t,
 					struct key_vector **tp, u32 key)
 {
-	struct key_vector *n = rcu_dereference_rtnl(t->tnode[0]);
+	struct key_vector *n = t->kv;
+	unsigned long index = 0;
 
-	*tp = NULL;
+	do {
+		*tp = n;
+		n = get_child_rcu(n, index);
+
+		if (!n)
+			break;
 
-	while (n) {
-		unsigned long index = get_index(key, n);
+		index = get_cindex(key, n);
 
 		/* This bit of code is a bit tricky but it combines multiple
 		 * checks into a single check.  The prefix consists of the
@@ -924,13 +930,8 @@ static struct key_vector *fib_find_node(struct trie *t,
 		if (index & (~0ul << n->bits))
 			return NULL;
 
-		/* we have found a leaf. Prefixes have already been compared */
-		if (IS_LEAF(n))
-			break;
-
-		*tp = n;
-		n = get_child_rcu(n, index);
-	}
+		/* keep searching until we find a perfect match leaf or NULL */
+	} while (IS_TNODE(n));
 
 	return n;
 }
@@ -963,18 +964,10 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
 static struct fib_table *trie_rebalance(struct trie *t,
 					struct key_vector *tn)
 {
-	struct key_vector __rcu **cptr = t->tnode;
+	while (!IS_TRIE(tn))
+		tn = resize(t, tn);
 
-	while (tn) {
-		struct key_vector *tp = node_parent(tn);
-
-		cptr = resize(t, tn);
-		if (!tp)
-			break;
-		tn = container_of(cptr, struct key_vector, tnode[0]);
-	}
-
-	return table_info(container_of(cptr, struct trie, tnode[0]));
+	return table_info(tn);
 }
 
 static struct fib_table *fib_insert_node(struct trie *t,
@@ -989,10 +982,7 @@ static struct fib_table *fib_insert_node(struct trie *t,
 		goto noleaf;
 
 	/* retrieve child from parent node */
-	if (tp)
-		n = get_child(tp, get_index(key, tp));
-	else
-		n = rcu_dereference_rtnl(t->tnode[0]);
+	n = get_child(tp, get_index(key, tp));
 
 	/* Case 2: n is a LEAF or a TNODE and the key doesn't match.
 	 *
@@ -1012,7 +1002,7 @@ static struct fib_table *fib_insert_node(struct trie *t,
 		put_child(tn, get_index(key, tn) ^ 1, n);
 
 		/* start adding routes into the node */
-		put_child_root(tp, t, key, tn);
+		put_child_root(tp, key, tn);
 		node_set_parent(n, tn);
 
 		/* parent now has a NULL spot where the leaf can go */
@@ -1021,7 +1011,7 @@ static struct fib_table *fib_insert_node(struct trie *t,
 
 	/* Case 3: n is NULL, and will just insert a new leaf */
 	NODE_INIT_PARENT(l, tp);
-	put_child_root(tp, t, key, l);
+	put_child_root(tp, key, l);
 
 	return trie_rebalance(t, tp);
 notnode:
@@ -1061,7 +1051,7 @@ static struct fib_table *fib_insert_alias(struct trie *t,
 		leaf_push_suffix(tp, l);
 	}
 
-	return table_info(t);
+	return table_info(t->kv);
 }
 
 /* Caller must hold RTNL. */
@@ -1236,7 +1226,10 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 	struct fib_alias *fa;
 	t_key cindex;
 
-	n = rcu_dereference(t->tnode[0]);
+	pn = t->kv;
+	cindex = 0;
+
+	n = get_child_rcu(pn, cindex);
 	if (!n)
 		return -EAGAIN;
 
@@ -1244,12 +1237,9 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 	this_cpu_inc(stats->gets);
 #endif
 
-	pn = n;
-	cindex = 0;
-
 	/* Step 1: Travel to the longest prefix match in the trie */
 	for (;;) {
-		unsigned long index = get_index(key, n);
+		unsigned long index = get_cindex(key, n);
 
 		/* This bit of code is a bit tricky but it combines multiple
 		 * checks into a single check.  The prefix consists of the
@@ -1316,13 +1306,17 @@ backtrace:
 			while (!cindex) {
 				t_key pkey = pn->key;
 
-				pn = node_parent_rcu(pn);
-				if (unlikely(!pn))
+				/* If we don't have a parent then there is
+				 * nothing for us to do as we do not have any
+				 * further nodes to parse.
+				 */
+				if (IS_TRIE(pn))
 					return -EAGAIN;
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 				this_cpu_inc(stats->backtrack);
 #endif
 				/* Get Child's index */
+				pn = node_parent_rcu(pn);
 				cindex = get_index(pkey, pn);
 			}
 
@@ -1404,7 +1398,7 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
 	 * out parent suffix lengths as a part of trie_rebalance
 	 */
 	if (hlist_empty(&l->leaf)) {
-		put_child_root(tp, t, l->key, NULL);
+		put_child_root(tp, l->key, NULL);
 		node_free(l);
 		trie_rebalance(t, tp);
 		return;
@@ -1490,41 +1484,35 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 /* Scan for the next leaf starting at the provided key value */
 static struct key_vector *leaf_walk_rcu(struct key_vector **pn, t_key key)
 {
-	struct key_vector *tn = NULL, *n = *pn;
+	struct key_vector *tn, *n = *pn;
 	unsigned long cindex;
 
-	/* record parent node for backtracing */
-	tn = n;
-	cindex = n ? get_index(key, n) : 0;
-
 	/* this loop is meant to try and find the key in the trie */
-	while (n) {
-		unsigned long idx = get_index(key, n);
-
-		/* guarantee forward progress on the keys */
-		if (IS_LEAF(n) && (n->key >= key))
-			goto found;
-		if (idx >> n->bits)
-			break;
-
+	do {
 		/* record parent and next child index */
 		tn = n;
-		cindex = idx;
+		cindex = get_index(key, tn);
+
+		if (cindex >> tn->bits)
+			break;
 
 		/* descend into the next child */
 		n = get_child_rcu(tn, cindex++);
-	}
+		if (!n)
+			break;
+
+		/* guarantee forward progress on the keys */
+		if (IS_LEAF(n) && (n->key >= key))
+			goto found;
+	} while (IS_TNODE(n));
 
 	/* this loop will search for the next leaf with a greater key */
-	while (tn) {
+	while (!IS_TRIE(tn)) {
 		/* if we exhausted the parent node we will need to climb */
 		if (cindex >> tn->bits) {
 			t_key pkey = tn->key;
 
 			tn = node_parent_rcu(tn);
-			if (!tn)
-				break;
-
 			cindex = get_index(pkey, tn) + 1;
 			continue;
 		}
@@ -1547,7 +1535,7 @@ static struct key_vector *leaf_walk_rcu(struct key_vector **pn, t_key key)
 	return NULL; /* Root of trie */
 found:
 	/* if we are at the limit for keys just return NULL for the tnode */
-	*pn = (n->key == KEY_MAX) ? NULL : tn;
+	*pn = tn;
 	return n;
 }
 
@@ -1555,83 +1543,70 @@ found:
 int fib_table_flush(struct fib_table *tb)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
-	struct key_vector *n, *pn;
+	struct key_vector *pn = t->kv;
+	unsigned long cindex = 1;
 	struct hlist_node *tmp;
 	struct fib_alias *fa;
-	unsigned long cindex;
-	unsigned char slen;
 	int found = 0;
 
-	n = rcu_dereference(t->tnode[0]);
-	if (!n)
-		goto flush_complete;
+	/* walk trie in reverse order */
+	for (;;) {
+		unsigned char slen = 0;
+		struct key_vector *n;
 
-	pn = NULL;
-	cindex = 0;
+		if (!(cindex--)) {
+			t_key pkey = pn->key;
 
-	while (IS_TNODE(n)) {
-		/* record pn and cindex for leaf walking */
-		pn = n;
-		cindex = 1ul << n->bits;
-backtrace:
-		/* walk trie in reverse order */
-		do {
-			while (!(cindex--)) {
-				struct key_vector __rcu **cptr;
-				t_key pkey = pn->key;
+			/* cannot resize the trie vector */
+			if (IS_TRIE(pn))
+				break;
 
-				n = pn;
-				pn = node_parent(n);
+			/* resize completed node */
+			pn = resize(t, pn);
+			cindex = get_index(pkey, pn);
 
-				/* resize completed node */
-				cptr = resize(t, n);
+			continue;
+		}
 
-				/* if we got the root we are done */
-				if (!pn)
-					goto flush_complete;
+		/* grab the next available node */
+		n = get_child(pn, cindex);
+		if (!n)
+			continue;
 
-				pn = container_of(cptr, struct key_vector,
-						  tnode[0]);
-				cindex = get_index(pkey, pn);
-			}
+		if (IS_TNODE(n)) {
+			/* record pn and cindex for leaf walking */
+			pn = n;
+			cindex = 1ul << n->bits;
 
-			/* grab the next available node */
-			n = get_child(pn, cindex);
-		} while (!n);
-	}
+			continue;
+		}
 
-	/* track slen in case any prefixes survive */
-	slen = 0;
+		hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
+			struct fib_info *fi = fa->fa_info;
 
-	hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
-		struct fib_info *fi = fa->fa_info;
+			if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
+				hlist_del_rcu(&fa->fa_list);
+				fib_release_info(fa->fa_info);
+				alias_free_mem_rcu(fa);
+				found++;
 
-		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
-			hlist_del_rcu(&fa->fa_list);
-			fib_release_info(fa->fa_info);
-			alias_free_mem_rcu(fa);
-			found++;
+				continue;
+			}
 
-			continue;
+			slen = fa->fa_slen;
 		}
 
-		slen = fa->fa_slen;
-	}
-
-	/* update leaf slen */
-	n->slen = slen;
+		/* update leaf slen */
+		n->slen = slen;
 
-	if (hlist_empty(&n->leaf)) {
-		put_child_root(pn, t, n->key, NULL);
-		node_free(n);
-	} else {
-		leaf_pull_suffix(pn, n);
+		if (hlist_empty(&n->leaf)) {
+			put_child_root(pn, n->key, NULL);
+			node_free(n);
+		} else {
+			leaf_pull_suffix(pn, n);
+		}
 	}
 
-	/* if trie is leaf only loop is completed */
-	if (pn)
-		goto backtrace;
-flush_complete:
 	pr_debug("trie_flush found=%d\n", found);
 	return found;
 }
@@ -1639,7 +1614,7 @@ flush_complete:
 static void __trie_free_rcu(struct rcu_head *head)
 {
 	struct trie *t = container_of(head, struct trie, rcu);
-	struct fib_table *tb = table_info(t);
+	struct fib_table *tb = table_info(t->kv);
 
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	free_percpu(t->stats);
@@ -1695,15 +1670,13 @@ int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
 		   struct netlink_callback *cb)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
-	struct key_vector *l, *tp;
+	struct key_vector *l, *tp = t->kv;
 	/* Dump starting at last key.
 	 * Note: 0.0.0.0/0 (ie default) is first key.
 	 */
 	int count = cb->args[2];
 	t_key key = cb->args[3];
 
-	tp = rcu_dereference_rtnl(t->tnode[0]);
-
 	while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
 		if (fn_trie_dump_leaf(l, tb, skb, cb) < 0) {
 			cb->args[3] = key;
@@ -1739,14 +1712,12 @@ void __init fib_trie_init(void)
 					   0, SLAB_PANIC, NULL);
 }
 
-
 struct fib_table *fib_trie_table(u32 id)
 {
 	struct fib_table *tb;
 	struct trie *t;
 
-	tb = kmalloc(sizeof(struct fib_table) + sizeof(struct trie),
-		     GFP_KERNEL);
+	tb = kzalloc(sizeof(*tb) + sizeof(struct trie), GFP_KERNEL);
 	if (tb == NULL)
 		return NULL;
 
@@ -1755,7 +1726,8 @@ struct fib_table *fib_trie_table(u32 id)
 	tb->tb_num_default = 0;
 
 	t = (struct trie *) tb->tb_data;
-	RCU_INIT_POINTER(t->tnode[0], NULL);
+	t->kv[0].pos = KEYLENGTH;
+	t->kv[0].slen = KEYLENGTH;
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	t->stats = alloc_percpu(struct trie_use_stats);
 	if (!t->stats) {
@@ -1780,57 +1752,55 @@ struct fib_trie_iter {
 static struct key_vector *fib_trie_get_next(struct fib_trie_iter *iter)
 {
 	unsigned long cindex = iter->index;
-	struct key_vector *tn = iter->tnode;
-	struct key_vector *p;
-
-	/* A single entry routing table */
-	if (!tn)
-		return NULL;
+	struct key_vector *pn = iter->tnode;
+	t_key pkey;
 
 	pr_debug("get_next iter={node=%p index=%d depth=%d}\n",
 		 iter->tnode, iter->index, iter->depth);
-rescan:
-	while (cindex < child_length(tn)) {
-		struct key_vector *n = get_child_rcu(tn, cindex);
 
-		if (n) {
+	while (!IS_TRIE(pn)) {
+		while (cindex < child_length(pn)) {
+			struct key_vector *n = get_child_rcu(pn, cindex++);
+
+			if (!n)
+				continue;
+
 			if (IS_LEAF(n)) {
-				iter->tnode = tn;
-				iter->index = cindex + 1;
+				iter->tnode = pn;
+				iter->index = cindex;
 			} else {
 				/* push down one level */
 				iter->tnode = n;
 				iter->index = 0;
 				++iter->depth;
 			}
+
 			return n;
 		}
 
-		++cindex;
-	}
-
-	/* Current node exhausted, pop back up */
-	p = node_parent_rcu(tn);
-	if (p) {
-		cindex = get_index(tn->key, p) + 1;
-		tn = p;
+		/* Current node exhausted, pop back up */
+		pkey = pn->key;
+		pn = node_parent_rcu(pn);
+		cindex = get_index(pkey, pn) + 1;
 		--iter->depth;
-		goto rescan;
 	}
 
-	/* got root? */
+	/* record root node so further searches know we are done */
+	iter->tnode = pn;
+	iter->index = 0;
+
 	return NULL;
 }
 
 static struct key_vector *fib_trie_get_first(struct fib_trie_iter *iter,
 					     struct trie *t)
 {
-	struct key_vector *n;
+	struct key_vector *n, *pn = t->kv;
 
 	if (!t)
 		return NULL;
 
-	n = rcu_dereference(t->tnode[0]);
+	n = rcu_dereference(pn->tnode[0]);
 	if (!n)
 		return NULL;
 
@@ -1839,7 +1809,7 @@ static struct key_vector *fib_trie_get_first(struct fib_trie_iter *iter,
 		iter->index = 0;
 		iter->depth = 1;
 	} else {
-		iter->tnode = NULL;
+		iter->tnode = pn;
 		iter->index = 0;
 		iter->depth = 0;
 	}
@@ -2136,7 +2106,7 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 	const struct fib_trie_iter *iter = seq->private;
 	struct key_vector *n = v;
 
-	if (!node_parent_rcu(n))
+	if (IS_TRIE(node_parent_rcu(n)))
 		fib_table_print(seq, iter->tb);
 
 	if (IS_TNODE(n)) {
@@ -2216,7 +2186,7 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
 		key = iter->key;
 	} else {
 		t = (struct trie *)tb->tb_data;
-		iter->tnode = rcu_dereference_rtnl(t->tnode[0]);
+		iter->tnode = t->kv;
 		iter->pos = 0;
 		key = 0;
 	}
@@ -2262,7 +2232,7 @@ static void *fib_route_seq_start(struct seq_file *seq, loff_t *pos)
 		return fib_route_get_idx(iter, *pos);
 
 	t = (struct trie *)tb->tb_data;
-	iter->tnode = rcu_dereference_rtnl(t->tnode[0]);
+	iter->tnode = t->kv;
 	iter->pos = 0;
 	iter->key = 0;
 


* [RFC PATCH 20/29] fib_trie: Push net pointer down into fib_trie insert/delete/flush calls
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (18 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 19/29] fib_trie: Add key vector to root, return parent key_vector in resize Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 21/29] fib_trie: Rewrite handling of RCU to include parent in replacement Alexander Duyck
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC
  To: netdev

In order to make use of fib_replace_table it is necessary to pass the net
pointer to the point where the call is made.  This is the first pass at
pushing this down to where it is needed in the resize function.
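
For illustration, a simplified userspace sketch of why the net pointer is
needed. The per-namespace state caches pointers to the well-known tables,
so swapping a table means updating whichever cached pointer matches its
id. struct net_sketch and replace_table_sketch() are hypothetical names,
plain stores stand in for rcu_assign_pointer(), and the hlist replacement
done by the real function is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Well-known table ids, as defined in the rtnetlink UAPI header. */
enum {
	RT_TABLE_DEFAULT = 253,
	RT_TABLE_MAIN    = 254,
	RT_TABLE_LOCAL   = 255,
};

struct fib_table {
	unsigned int tb_id;
};

/* Hypothetical stand-in for the cached pointers kept in net->ipv4. */
struct net_sketch {
	struct fib_table *fib_local;
	struct fib_table *fib_main;
	struct fib_table *fib_default;
};

/* Point the cached pointer matching new->tb_id at the new table.
 * Tables with other ids have no cached pointer, so nothing changes.
 */
static inline void replace_table_sketch(struct net_sketch *net,
					struct fib_table *new)
{
	switch (new->tb_id) {
	case RT_TABLE_LOCAL:
		net->fib_local = new;
		break;
	case RT_TABLE_MAIN:
		net->fib_main = new;
		break;
	case RT_TABLE_DEFAULT:
		net->fib_default = new;
		break;
	default:
		break;
	}
}
```

Without access to the namespace there would be no way to reach these
cached pointers from inside resize, which is why the argument is threaded
through insert, delete and flush.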

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/net/ip_fib.h    |    9 ++++---
 net/ipv4/fib_frontend.c |   39 +++++++++++++++++++++++++------
 net/ipv4/fib_trie.c     |   59 +++++++++++++++++++++++++----------------------
 3 files changed, 68 insertions(+), 39 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 8aa6f82..52f76c5 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -190,14 +190,15 @@ struct fib_table {
 
 int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		     struct fib_result *res, int fib_flags);
-int fib_table_insert(struct fib_table *, struct fib_config *);
-int fib_table_delete(struct fib_table *, struct fib_config *);
+int fib_table_insert(struct net *net, struct fib_table *, struct fib_config *);
+int fib_table_delete(struct net *net, struct fib_table *, struct fib_config *);
 int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
 		   struct netlink_callback *cb);
-int fib_table_flush(struct fib_table *table);
+int fib_table_flush(struct net *net, struct fib_table *table);
 void fib_free_table(struct fib_table *tb);
 
-
+void fib_replace_table(struct net *net, struct fib_table *old,
+		       struct fib_table *new);
 
 #ifndef CONFIG_IP_MULTIPLE_TABLES
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 220c4b4..71979ed 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -126,6 +126,29 @@ struct fib_table *fib_get_table(struct net *net, u32 id)
 }
 #endif /* CONFIG_IP_MULTIPLE_TABLES */
 
+void fib_replace_table(struct net *net, struct fib_table *old,
+		       struct fib_table *new)
+{
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	switch (new->tb_id) {
+	case RT_TABLE_LOCAL:
+		rcu_assign_pointer(net->ipv4.fib_local, new);
+		break;
+	case RT_TABLE_MAIN:
+		rcu_assign_pointer(net->ipv4.fib_main, new);
+		break;
+	case RT_TABLE_DEFAULT:
+		rcu_assign_pointer(net->ipv4.fib_default, new);
+		break;
+	default:
+		break;
+	}
+
+#endif
+	/* replace the old table in the hlist */
+	hlist_replace_rcu(&old->tb_hlist, &new->tb_hlist);
+}
+
 static void fib_flush(struct net *net)
 {
 	int flushed = 0;
@@ -137,7 +160,7 @@ static void fib_flush(struct net *net)
 		struct fib_table *tb;
 
 		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist)
-			flushed += fib_table_flush(tb);
+			flushed += fib_table_flush(net, tb);
 	}
 
 	if (flushed)
@@ -499,13 +522,13 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 			if (cmd == SIOCDELRT) {
 				tb = fib_get_table(net, cfg.fc_table);
 				if (tb)
-					err = fib_table_delete(tb, &cfg);
+					err = fib_table_delete(net, tb, &cfg);
 				else
 					err = -ESRCH;
 			} else {
 				tb = fib_new_table(net, cfg.fc_table);
 				if (tb)
-					err = fib_table_insert(tb, &cfg);
+					err = fib_table_insert(net, tb, &cfg);
 				else
 					err = -ENOBUFS;
 			}
@@ -620,7 +643,7 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto errout;
 	}
 
-	err = fib_table_delete(tb, &cfg);
+	err = fib_table_delete(net, tb, &cfg);
 errout:
 	return err;
 }
@@ -642,7 +665,7 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto errout;
 	}
 
-	err = fib_table_insert(tb, &cfg);
+	err = fib_table_insert(net, tb, &cfg);
 errout:
 	return err;
 }
@@ -729,9 +752,9 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad
 		cfg.fc_scope = RT_SCOPE_HOST;
 
 	if (cmd == RTM_NEWROUTE)
-		fib_table_insert(tb, &cfg);
+		fib_table_insert(net, tb, &cfg);
 	else
-		fib_table_delete(tb, &cfg);
+		fib_table_delete(net, tb, &cfg);
 }
 
 void fib_add_ifaddr(struct in_ifaddr *ifa)
@@ -1128,7 +1151,7 @@ static void ip_fib_net_exit(struct net *net)
 		 * tnodes at the root as the table shrinks.
 		 */
 		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist)
-			fib_table_flush(tb);
+			fib_table_flush(net, tb);
 
 		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist) {
 #ifdef CONFIG_IP_MULTIPLE_TABLES
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 432a875..2db318e 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -143,7 +143,8 @@ struct trie {
 	struct rcu_head rcu;
 };
 
-static struct key_vector *resize(struct trie *t, struct key_vector *tn);
+static struct key_vector *resize(struct net *net, struct trie *t,
+				 struct key_vector *tn);
 static size_t tnode_free_size;
 
 /*
@@ -468,7 +469,7 @@ static void tnode_free(struct key_vector *tn)
 	}
 }
 
-static struct key_vector *replace(struct trie *t,
+static struct key_vector *replace(struct net *net, struct trie *t,
 				  struct key_vector *oldtnode,
 				  struct key_vector *tn)
 {
@@ -491,13 +492,13 @@ static struct key_vector *replace(struct trie *t,
 
 		/* resize child node */
 		if (tnode_full(tn, inode))
-			tn = resize(t, inode);
+			tn = resize(net, t, inode);
 	}
 
 	return tp;
 }
 
-static struct key_vector *inflate(struct trie *t,
+static struct key_vector *inflate(struct net *net, struct trie *t,
 				  struct key_vector *oldtnode)
 {
 	struct key_vector *tn;
@@ -585,7 +586,7 @@ static struct key_vector *inflate(struct trie *t,
 	}
 
 	/* setup the parent pointers into and out of this node */
-	return replace(t, oldtnode, tn);
+	return replace(net, t, oldtnode, tn);
 nomem:
 	/* all pointers should be clean so we are done */
 	tnode_free(tn);
@@ -593,7 +594,7 @@ notnode:
 	return NULL;
 }
 
-static struct key_vector *halve(struct trie *t,
+static struct key_vector *halve(struct net *net, struct trie *t,
 				struct key_vector *oldtnode)
 {
 	struct key_vector *tn;
@@ -640,7 +641,7 @@ static struct key_vector *halve(struct trie *t,
 	}
 
 	/* setup the parent pointers into and out of this node */
-	return replace(t, oldtnode, tn);
+	return replace(net, t, oldtnode, tn);
 nomem:
 	/* all pointers should be clean so we are done */
 	tnode_free(tn);
@@ -648,7 +649,7 @@ notnode:
 	return NULL;
 }
 
-static struct key_vector *collapse(struct trie *t,
+static struct key_vector *collapse(struct net *net, struct trie *t,
 				   struct key_vector *oldtnode)
 {
 	struct key_vector *n, *tp;
@@ -805,7 +806,8 @@ static inline bool should_collapse(struct key_vector *tn)
 }
 
 #define MAX_WORK 10
-static struct key_vector *resize(struct trie *t, struct key_vector *tn)
+static struct key_vector *resize(struct net *net, struct trie *t,
+				 struct key_vector *tn)
 {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	struct trie_use_stats __percpu *stats = t->stats;
@@ -827,7 +829,7 @@ static struct key_vector *resize(struct trie *t, struct key_vector *tn)
 	 * nonempty nodes that are above the threshold.
 	 */
 	while (should_inflate(tp, tn) && max_work) {
-		tp = inflate(t, tn);
+		tp = inflate(net, t, tn);
 		if (!tp) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 			this_cpu_inc(stats->resize_node_skipped);
@@ -847,7 +849,7 @@ static struct key_vector *resize(struct trie *t, struct key_vector *tn)
 	 * node is above threshold.
 	 */
 	while (should_halve(tp, tn) && max_work) {
-		tp = halve(t, tn);
+		tp = halve(net, t, tn);
 		if (!tp) {
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 			this_cpu_inc(stats->resize_node_skipped);
@@ -861,7 +863,7 @@ static struct key_vector *resize(struct trie *t, struct key_vector *tn)
 
 	/* Only one child remains */
 	if (should_collapse(tn))
-		return collapse(t, tn);
+		return collapse(net, t, tn);
 
 	/* update parent in case inflate or halve failed */
 	tp = node_parent(tn);
@@ -961,16 +963,16 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
 	return NULL;
 }
 
-static struct fib_table *trie_rebalance(struct trie *t,
+static struct fib_table *trie_rebalance(struct net *net, struct trie *t,
 					struct key_vector *tn)
 {
 	while (!IS_TRIE(tn))
-		tn = resize(t, tn);
+		tn = resize(net, t, tn);
 
 	return table_info(tn);
 }
 
-static struct fib_table *fib_insert_node(struct trie *t,
+static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 					 struct key_vector *tp,
 					 struct fib_alias *new,
 					 t_key key)
@@ -1013,14 +1015,14 @@ static struct fib_table *fib_insert_node(struct trie *t,
 	NODE_INIT_PARENT(l, tp);
 	put_child_root(tp, key, l);
 
-	return trie_rebalance(t, tp);
+	return trie_rebalance(net, t, tp);
 notnode:
 	node_free(l);
 noleaf:
 	return NULL;
 }
 
-static struct fib_table *fib_insert_alias(struct trie *t,
+static struct fib_table *fib_insert_alias(struct net *net, struct trie *t,
 					  struct key_vector *tp,
 					  struct key_vector *l,
 					  struct fib_alias *new,
@@ -1028,7 +1030,7 @@ static struct fib_table *fib_insert_alias(struct trie *t,
 					  t_key key)
 {
 	if (!l)
-		return fib_insert_node(t, tp, new, key);
+		return fib_insert_node(net, t, tp, new, key);
 
 	if (!fa) {
 		struct fib_alias *last;
@@ -1055,7 +1057,8 @@ static struct fib_table *fib_insert_alias(struct trie *t,
 }
 
 /* Caller must hold RTNL. */
-int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
+int fib_table_insert(struct net *net, struct fib_table *tb,
+		     struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct fib_alias *fa, *new_fa;
@@ -1185,7 +1188,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 
 	/* Insert new entry to the list. */
 	err = -ENOMEM;
-	tb = fib_insert_alias(t, tp, l, new_fa, fa, key);
+	tb = fib_insert_alias(net, t, tp, l, new_fa, fa, key);
 	if (!tb)
 		goto out_free_new_fa;
 
@@ -1384,8 +1387,9 @@ found:
 }
 EXPORT_SYMBOL_GPL(fib_table_lookup);
 
-static void fib_remove_alias(struct trie *t, struct key_vector *tp,
-			     struct key_vector *l, struct fib_alias *old)
+static void fib_remove_alias(struct net *net, struct trie *t,
+			     struct key_vector *tp, struct key_vector *l,
+			     struct fib_alias *old)
 {
 	/* record the location of the previous list_info entry */
 	struct hlist_node **pprev = old->fa_list.pprev;
@@ -1400,7 +1404,7 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
 	if (hlist_empty(&l->leaf)) {
 		put_child_root(tp, l->key, NULL);
 		node_free(l);
-		trie_rebalance(t, tp);
+		trie_rebalance(net, t, tp);
 		return;
 	}
 
@@ -1414,7 +1418,8 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
 }
 
 /* Caller must hold RTNL. */
-int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
+int fib_table_delete(struct net *net, struct fib_table *tb,
+		     struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
 	struct fib_alias *fa, *fa_to_delete;
@@ -1471,7 +1476,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	if (!plen)
 		tb->tb_num_default--;
 
-	fib_remove_alias(t, tp, l, fa_to_delete);
+	fib_remove_alias(net, t, tp, l, fa_to_delete);
 
 	if (fa_to_delete->fa_state & FA_S_ACCESSED)
 		rt_cache_flush(cfg->fc_nlinfo.nl_net);
@@ -1540,7 +1545,7 @@ found:
 }
 
 /* Caller must hold RTNL. */
-int fib_table_flush(struct fib_table *tb)
+int fib_table_flush(struct net *net, struct fib_table *tb)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct key_vector *pn = t->kv;
@@ -1562,7 +1567,7 @@ int fib_table_flush(struct fib_table *tb)
 				break;
 
 			/* resize completed node */
-			pn = resize(t, pn);
+			pn = resize(net, t, pn);
 			cindex = get_index(pkey, pn);
 
 			continue;


* [RFC PATCH 21/29] fib_trie: Rewrite handling of RCU to include parent in replacement
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (19 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 20/29] fib_trie: Push net pointer down into fib_trie insert/delete/flush calls Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 22/29] fib_trie: Allocate tnode as array of key_vectors instead of key_vector as array of tnode pointers Alexander Duyck
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC (permalink / raw
  To: netdev

This change makes it so that when we either insert or resize a tnode or
leaf, we also replace the parent of that tnode or leaf.  This is
necessary to allow for the introduction of a key_vector array inside of the
root node and tnodes.  By doing this it will be possible to push the values
currently contained in the key_vector up one level so that they can be
accessed sooner.
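
As a rough userspace illustration of the idea (not the kernel code), the
sketch below clones a parent, modifies the copy, and then publishes the
copy with a release store, which stands in for rcu_assign_pointer().  The
names node, parent_slot, node_clone and replace_child are hypothetical, and
freeing of the old parent is left to the caller since a real
implementation would have to wait for readers first.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define NCHILD 4

/* Hypothetical, simplified node: a fixed fan-out of child slots. */
struct node {
	struct node *child[NCHILD];
	int id;
};

/* Slot through which readers find the parent.  It stands in for a
 * pointer that the kernel would publish with rcu_assign_pointer().
 */
static _Atomic(struct node *) parent_slot;

/* Copy the parent so the copy can be modified while readers still
 * see the old one.
 */
static struct node *node_clone(const struct node *old)
{
	struct node *n = malloc(sizeof(*n));

	if (n)
		memcpy(n, old, sizeof(*n));
	return n;
}

/* Replace child i by publishing a modified copy of the parent.
 * Returns the old parent so the caller can free it once no reader
 * can still hold it, or NULL if the clone could not be allocated.
 */
static struct node *replace_child(unsigned int i, struct node *child)
{
	struct node *old = atomic_load_explicit(&parent_slot,
						memory_order_relaxed);
	struct node *new = node_clone(old);

	if (!new)
		return NULL;

	new->child[i] = child;
	atomic_store_explicit(&parent_slot, new, memory_order_release);
	return old;
}
```

The point of the sketch is only that a reader sees either the old parent
or the fully updated copy, never a half-modified node.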

This is still a work in progress.  The current implementation makes resize
more expensive since we have to re-allocate the node for each child that
gets resized.  I hope to make it so that we can cut this down to at most 2
allocations per node in the future.

For example, if I allocate 8K subnets on a dummy interface, removing dummy0
used to take 0.6 seconds; it now takes ~2.4 seconds.  Most of this is due
to the fact that 8K child tnodes are collapsed, meaning that the parent
tnode has to be replaced 8K times.
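
To give a feel for why this gets expensive, here is a back-of-the-envelope
model, not a measurement.  It assumes a single parent holding all of the
children in one array of 8-byte pointer slots and ignores the tnode header;
the function name clone_cost_bytes and its parameters are made up for the
illustration.

```c
#include <stddef.h>

/* Bytes copied if a parent with (1 << bits) slots of slot_size bytes
 * is cloned once for every collapsed child.
 */
static size_t clone_cost_bytes(unsigned int bits, size_t collapses,
			       size_t slot_size)
{
	size_t parent_size = ((size_t)1 << bits) * slot_size;

	return parent_size * collapses;
}
```

Under those assumptions, clone_cost_bytes(13, 8192, 8) comes to 512 MiB of
copying for the 8K collapses, which is why cutting down the number of
re-allocations per node matters.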

The same issue doesn't seem to occur though if I am using only routes.  So
for example if I assign 8K routes to the same interface it still removes
all 8K very quickly.  I believe this is due to the fact that the local table
is doing a mish-mash of the fib_table_flush and fib_table_delete whereas
the main routing table is likely handling all routes by flagging them as
dead and then flushing them.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  320 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 217 insertions(+), 103 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 2db318e..58c8a89 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -136,6 +136,10 @@ struct trie_stat {
 };
 
 struct trie {
+	/* key vector must be first to allow for use of tb->tb_data to get
+	 * the key vector OR trie as there are a few spots where getting
+	 * the kv via the trie is messier than just getting it from tb_data.
+	 */
 	struct key_vector kv[1];
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	struct trie_use_stats __percpu *stats;
@@ -179,12 +183,7 @@ static inline struct fib_table *table_info(struct key_vector *kv)
 #define get_child_rcu(tn, i) rcu_dereference_rtnl((tn)->tnode[i])
 
 /* wrapper for rcu_assign_pointer */
-static inline void node_set_parent(struct key_vector *n, struct key_vector *tp)
-{
-	if (n)
-		rcu_assign_pointer(tn_info(n)->parent, tp);
-}
-
+#define node_set_parent(n, p) rcu_assign_pointer(tn_info(n)->parent, p)
 #define NODE_INIT_PARENT(n, p) RCU_INIT_POINTER(tn_info(n)->parent, p)
 
 /* This provides us with the number of children in this node, in the case of a
@@ -339,35 +338,6 @@ static struct key_vector *leaf_new(t_key key, struct fib_alias *fa)
 	return l;
 }
 
-static struct key_vector *tnode_new(t_key key, int pos, int bits)
-{
-	size_t sz = TNODE_SIZE(1ul << bits);
-	struct tnode *tnode = tnode_alloc(sz);
-	unsigned int shift = pos + bits;
-	struct key_vector *tn = tnode->kv;
-
-	/* verify bits and pos their msb bits clear and values are valid */
-	BUG_ON(!bits || (shift > KEYLENGTH));
-
-	pr_debug("AT %p s=%zu %zu\n", tnode, TNODE_SIZE(0),
-		 sizeof(struct key_vector *) << bits);
-
-	if (!tnode)
-		return NULL;
-
-	if (bits == KEYLENGTH)
-		tnode->full_children = 1;
-	else
-		tnode->empty_children = 1ul << bits;
-
-	tn->key = (shift < KEYLENGTH) ? (key >> shift) << shift : 0;
-	tn->pos = pos;
-	tn->bits = bits;
-	tn->slen = pos;
-
-	return tn;
-}
-
 /* Check whether a tnode 'n' is "full", i.e. it is an internal node
  * and no bits are skipped. See discussion in dyntree paper p. 6
  */
@@ -413,7 +383,7 @@ static void update_children(struct key_vector *tn)
 	unsigned long i;
 
 	/* update all of the child parent pointers */
-	for (i = child_length(tn); i;) {
+	for (i = IS_TRIE(tn) ? 1 : child_length(tn); i;) {
 		struct key_vector *inode = get_child(tn, --i);
 
 		if (!inode)
@@ -439,6 +409,106 @@ static inline void put_child_root(struct key_vector *tp, t_key key,
 		put_child(tp, get_index(key, tp), n);
 }
 
+static struct key_vector *tnode_new(struct key_vector *pn, t_key key,
+				    int pos, int bits)
+{
+	size_t sz = TNODE_SIZE(1ul << bits);
+	struct tnode *tnode = tnode_alloc(sz);
+	struct key_vector *tn = tnode->kv;
+	unsigned int shift = pos + bits;
+
+	/* verify bits and pos their msb bits clear and values are valid */
+	BUG_ON(!bits || (shift > KEYLENGTH));
+
+	pr_debug("AT %p s=%zu %zu\n", tnode, TNODE_SIZE(0),
+		 sizeof(struct key_vector *) << bits);
+
+	if (!tnode)
+		return NULL;
+
+	/* populate tn_info section */
+	if (bits == KEYLENGTH)
+		tnode->full_children = 1;
+	else
+		tnode->empty_children = 1ul << bits;
+
+	/* populate key vector */
+	tn->key = (shift < KEYLENGTH) ? (key >> shift) << shift : 0;
+	tn->pos = pos;
+	tn->bits = bits;
+	tn->slen = pos;
+
+	/* link parent to node */
+	NODE_INIT_PARENT(tn, pn);
+	put_child_root(pn, tn->key, tn);
+
+	return tn;
+}
+
+static struct key_vector *tnode_clone(struct tnode *oldtnode)
+{
+	size_t sz = TNODE_SIZE(1ul << oldtnode->tn_bits);
+	struct tnode *tn = tnode_alloc(sz);
+
+	if (!tn)
+		return NULL;
+
+	memcpy(tn, oldtnode, sz);
+	return tn->kv;
+}
+
+static void tnode_replace(struct key_vector *oldtn,
+			  struct key_vector *tn)
+{
+	struct key_vector *pn = node_parent(oldtn);
+	unsigned long i = get_index(tn->key, pn);
+
+	/* setup the parent pointer out of and back into this node */
+	rcu_assign_pointer(pn->tnode[i], tn);
+
+	/* update all of the child parent pointers */
+	update_children(tn);
+
+	/* all pointers should be clean so we are done */
+	node_free(oldtn);
+}
+
+static struct key_vector *fib_table_clone(struct fib_table *oldtb)
+{
+	size_t sz = sizeof(struct fib_table) + sizeof(struct trie);
+	struct fib_table *tb;
+
+	tb = kmalloc(sz, GFP_KERNEL);
+	if (!tb)
+		return NULL;
+
+	memcpy(tb, oldtb, sz);
+	return (struct key_vector *)tb->tb_data;
+}
+
+static void __trie_drop_rcu(struct rcu_head *head)
+{
+	struct trie *t = container_of(head, struct trie, rcu);
+
+	kfree(table_info(t->kv));
+}
+
+static inline void fib_trie_replace(struct net *net,
+				    struct key_vector *oldtn,
+				    struct key_vector *tn)
+{
+	struct fib_table *tb = table_info(oldtn);
+	struct trie *t = (struct trie *)tb->tb_data;
+
+	/* replace the old table */
+	fib_replace_table(net, tb, table_info(tn));
+
+	/* update any child pointers */
+	update_children(tn);
+
+	call_rcu(&t->rcu, __trie_drop_rcu);
+}
+
 static inline void tnode_free_init(struct key_vector *tn)
 {
 	tn_info(tn)->rcu.next = NULL;
@@ -457,7 +527,7 @@ static void tnode_free(struct key_vector *tn)
 
 	while (head) {
 		head = head->next;
-		tnode_free_size += TNODE_SIZE(1 << tn->bits);
+		tnode_free_size += TNODE_SIZE(1ul << tn->bits);
 		node_free(tn);
 
 		tn = container_of(head, struct tnode, rcu)->kv;
@@ -469,22 +539,37 @@ static void tnode_free(struct key_vector *tn)
 	}
 }
 
-static struct key_vector *replace(struct net *net, struct trie *t,
-				  struct key_vector *oldtnode,
-				  struct key_vector *tn)
+static struct key_vector *vector_clone(struct key_vector *kv)
 {
-	struct key_vector *tp = node_parent(oldtnode);
-	unsigned long i;
+	/* generate a clone of either the trie or the tnode, depending
+	 * on whether the key vector indicates that it is the root.
+	 */
+	return IS_TRIE(kv) ? fib_table_clone(table_info(kv)) :
+			     tnode_clone(tn_info(kv));
+}
 
-	/* setup the parent pointer out of and back into this node */
-	NODE_INIT_PARENT(tn, tp);
-	put_child_root(tp, tn->key, tn);
+static void vector_free(struct key_vector *kv)
+{
+	if (IS_TRIE(kv))
+		kfree(table_info(kv));
+	else
+		node_free(kv);
+}
 
-	/* update all of the child parent pointers */
-	update_children(tn);
+static void vector_replace(struct net *net, struct key_vector *oldtn,
+			   struct key_vector *tn)
+{
+	/* setup the parent pointer out of and back into this node */
+	if (IS_TRIE(oldtn))
+		fib_trie_replace(net, oldtn, tn);
+	else
+		tnode_replace(oldtn, tn);
+}
 
-	/* all pointers should be clean so we are done */
-	tnode_free(oldtnode);
+static struct key_vector *resize_children(struct net *net, struct trie *t,
+					  struct key_vector *tn)
+{
+	unsigned long i;
 
 	/* resize children now that oldtnode is freed */
 	for (i = child_length(tn); i;) {
@@ -495,19 +580,25 @@ static struct key_vector *replace(struct net *net, struct trie *t,
 			tn = resize(net, t, inode);
 	}
 
-	return tp;
+	return node_parent(tn);
 }
 
 static struct key_vector *inflate(struct net *net, struct trie *t,
 				  struct key_vector *oldtnode)
 {
-	struct key_vector *tn;
+	struct key_vector *tn, *pn;
 	unsigned long i;
 	t_key m;
 
 	pr_debug("In inflate\n");
 
-	tn = tnode_new(oldtnode->key, oldtnode->pos - 1, oldtnode->bits + 1);
+	/* clone parent for us to place new tnode into */
+	pn = vector_clone(node_parent(oldtnode));
+	if (!pn)
+		return NULL;
+
+	tn = tnode_new(pn, oldtnode->key,
+		       oldtnode->pos - 1, oldtnode->bits + 1);
 	if (!tn)
 		goto notnode;
 
@@ -558,10 +649,12 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 		 * node0 and node1. So... we synthesize that bit in the
 		 * two new keys.
 		 */
-		node1 = tnode_new(inode->key | m, inode->pos, inode->bits - 1);
+		node1 = tnode_new(tn, inode->key | m,
+				  inode->pos, inode->bits - 1);
 		if (!node1)
 			goto nomem;
-		node0 = tnode_new(inode->key, inode->pos, inode->bits - 1);
+		node0 = tnode_new(tn, inode->key,
+				  inode->pos, inode->bits - 1);
 
 		tnode_free_append(tn, node1);
 		if (!node0)
@@ -575,40 +668,40 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 			put_child(node1, --j, get_child(inode, --k));
 			put_child(node0, j, get_child(inode, j));
 		}
-
-		/* link new nodes to parent */
-		NODE_INIT_PARENT(node1, tn);
-		NODE_INIT_PARENT(node0, tn);
-
-		/* link parent to nodes */
-		put_child(tn, 2 * i + 1, node1);
-		put_child(tn, 2 * i, node0);
 	}
 
-	/* setup the parent pointers into and out of this node */
-	return replace(net, t, oldtnode, tn);
+	/* swap new parent for old and free oldtnode */
+	vector_replace(net, node_parent(oldtnode), pn);
+	tnode_free(oldtnode);
+
+	return resize_children(net, t, tn);
 nomem:
 	/* all pointers should be clean so we are done */
 	tnode_free(tn);
 notnode:
+	vector_free(pn);
+
 	return NULL;
 }
 
 static struct key_vector *halve(struct net *net, struct trie *t,
 				struct key_vector *oldtnode)
 {
-	struct key_vector *tn;
+	struct key_vector *tn, *pn;
 	unsigned long i;
 
 	pr_debug("In halve\n");
 
-	tn = tnode_new(oldtnode->key, oldtnode->pos + 1, oldtnode->bits - 1);
+	/* clone parent for us to place new tnode into */
+	pn = vector_clone(node_parent(oldtnode));
+	if (!pn)
+		return NULL;
+
+	tn = tnode_new(pn, oldtnode->key,
+		       oldtnode->pos + 1, oldtnode->bits - 1);
 	if (!tn)
 		goto notnode;
 
-	/* prepare oldtnode to be freed */
-	tnode_free_init(oldtnode);
-
 	/* Assemble all of the pointers in our cluster, in this case that
 	 * represents all of the pointers out of our allocated nodes that
 	 * point to existing tnodes and the links between our allocated
@@ -626,7 +719,7 @@ static struct key_vector *halve(struct net *net, struct trie *t,
 		}
 
 		/* Two nonempty children */
-		inode = tnode_new(node0->key, oldtnode->pos, 1);
+		inode = tnode_new(tn, node0->key, oldtnode->pos, 1);
 		if (!inode)
 			goto nomem;
 		tnode_free_append(tn, inode);
@@ -634,40 +727,56 @@ static struct key_vector *halve(struct net *net, struct trie *t,
 		/* initialize pointers out of node */
 		put_child(inode, 1, node1);
 		put_child(inode, 0, node0);
-		NODE_INIT_PARENT(inode, tn);
-
-		/* link parent to node */
-		put_child(tn, i / 2, inode);
 	}
 
-	/* setup the parent pointers into and out of this node */
-	return replace(net, t, oldtnode, tn);
+	/* swap new parent for old and free oldtnode */
+	vector_replace(net, node_parent(oldtnode), pn);
+	node_free(oldtnode);
+
+	return resize_children(net, t, tn);
 nomem:
 	/* all pointers should be clean so we are done */
 	tnode_free(tn);
 notnode:
+	vector_free(pn);
+
 	return NULL;
 }
 
 static struct key_vector *collapse(struct net *net, struct trie *t,
 				   struct key_vector *oldtnode)
 {
-	struct key_vector *n, *tp;
+	struct key_vector *pn = node_parent(oldtnode);
 	unsigned long i;
 
 	/* scan the tnode looking for that one child that might still exist */
-	for (n = NULL, i = child_length(oldtnode); !n && i;)
-		n = get_child(oldtnode, --i);
+	for (i = child_length(oldtnode); i--;) {
+		struct key_vector *n = get_child(oldtnode, i);
+
+		if (!n)
+			continue;
 
-	/* compress one level */
-	tp = node_parent(oldtnode);
-	put_child_root(tp, oldtnode->key, n);
-	node_set_parent(n, tp);
+		/* attempt to clone parent, on failure return old parent */
+		pn = vector_clone(pn);
+		if (!pn)
+			return node_parent(oldtnode);
 
-	/* drop dead node */
+		/* compress one level */
+		put_child_root(pn, oldtnode->key, n);
+
+		/* drop dead node */
+		vector_replace(net, node_parent(oldtnode), pn);
+		node_free(oldtnode);
+
+		/* resize child since it could be promoted to root */
+		return IS_TNODE(n) ? resize(net, t, n) : pn;
+	}
+
+	/* no children, just update pointer to NULL */
+	put_child_root(pn, oldtnode->key, NULL);
 	node_free(oldtnode);
 
-	return tp;
+	return pn;
 }
 
 static unsigned char update_suffix(struct key_vector *tn)
@@ -974,11 +1083,16 @@ static struct fib_table *trie_rebalance(struct net *net, struct trie *t,
 
 static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 					 struct key_vector *tp,
-					 struct fib_alias *new,
-					 t_key key)
+					 struct fib_alias *new, t_key key)
 {
-	struct key_vector *n, *l;
+	struct key_vector *tn, *l, *n;
 
+	/* allocate the new parent that must be replaced */
+	tn = vector_clone(tp);
+	if (!tn)
+		return NULL;
+
+	/* allocate the new leaf we will insert */
 	l = leaf_new(key, new);
 	if (!l)
 		goto noleaf;
@@ -993,32 +1107,33 @@ static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 	 *  leaves us in position for handling as case 3
 	 */
 	if (n) {
-		struct key_vector *tn;
-
-		tn = tnode_new(key, __fls(key ^ n->key), 1);
+		tn = tnode_new(tn, key, __fls(key ^ n->key), 1);
 		if (!tn)
 			goto notnode;
 
 		/* initialize routes out of node */
-		NODE_INIT_PARENT(tn, tp);
 		put_child(tn, get_index(key, tn) ^ 1, n);
 
-		/* start adding routes into the node */
-		put_child_root(tp, key, tn);
-		node_set_parent(n, tn);
-
-		/* parent now has a NULL spot where the leaf can go */
-		tp = tn;
+		/* pop back out to bring tn to the same level as tp */
+		n = tn;
+		tn = node_parent(tn);
+	} else {
+		/* indicate we are inserting at parent */
+		n = tn;
 	}
 
 	/* Case 3: n is NULL, and will just insert a new leaf */
-	NODE_INIT_PARENT(l, tp);
-	put_child_root(tp, key, l);
+	NODE_INIT_PARENT(l, n);
+	put_child_root(n, key, l);
 
-	return trie_rebalance(net, t, tp);
+	vector_replace(net, tp, tn);
+
+	return trie_rebalance(net, t, n);
 notnode:
 	node_free(l);
 noleaf:
+	vector_free(tn);
+
 	return NULL;
 }
 
@@ -1026,8 +1141,7 @@ static struct fib_table *fib_insert_alias(struct net *net, struct trie *t,
 					  struct key_vector *tp,
 					  struct key_vector *l,
 					  struct fib_alias *new,
-					  struct fib_alias *fa,
-					  t_key key)
+					  struct fib_alias *fa, t_key key)
 {
 	if (!l)
 		return fib_insert_node(net, t, tp, new, key);


* [RFC PATCH 22/29] fib_trie: Allocate tnode as array of key_vectors instead of key_vector as array of tnode pointers
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (20 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 21/29] fib_trie: Rewrite handling of RCU to include parent in replacement Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 23/29] fib_trie: Add leaf_init Alexander Duyck
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC (permalink / raw
  To: netdev

This change makes it so that we allocate a tnode as an array of key vectors
instead of a key vector with an array of tnode pointers.  By doing this we
set things up for a 1 cache line reduction since this places the key, key
info, and pointer to the next object all within 16B on 64b systems, and
within 12B on 32b systems.

In addition I have reordered the layout of the key_vector so that the leaf
and tnode pointers are at the start of the structure.  As a result a lookup
effectively walks a chain of pointers, one after the other, to reach the
fib_alias it is looking for, and the structure has no holes.
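
A simplified userspace sketch of the resulting layout follows.  The
*_sketch names are stand-ins rather than the kernel definitions, the rcu
head of the real tnode is omitted, hlist_head is reduced to a single
pointer, and t_key is assumed to be 4 bytes.  The static assertion checks
the size claim above: one pointer plus 8 bytes of key and key info.

```c
#include <stddef.h>

typedef unsigned int t_key;

/* Stand-in for the kernel's struct hlist_head: a single pointer. */
struct hlist_head_sketch {
	void *first;
};

/* Pointer first, then key and key info, mirroring the new ordering. */
struct key_vector_sketch {
	union {
		struct hlist_head_sketch leaf;		/* bits == 0 */
		struct key_vector_sketch *tnode;	/* bits > 0 */
	};
	t_key key;
	unsigned char pos;
	unsigned char bits;
	unsigned char slen;
};

/* Simplified tnode header followed by the array of key vectors. */
struct tnode_sketch {
	t_key empty_children;
	t_key full_children;
	struct key_vector_sketch *parent;
	struct key_vector_sketch kv[];
};

/* Portable equivalent of offsetof(struct tnode, kv[n]). */
#define TNODE_SIZE_SKETCH(n) \
	(offsetof(struct tnode_sketch, kv) + \
	 (n) * sizeof(struct key_vector_sketch))

/* 16B with 8-byte pointers, 12B with 4-byte pointers. */
_Static_assert(sizeof(struct key_vector_sketch) == sizeof(void *) + 8,
	       "pointer, key and key info share one slot");

/* Child i is reached by stepping to slot i and reading its pointer. */
static struct key_vector_sketch *
get_child_sketch(struct key_vector_sketch *tn, unsigned long i)
{
	return (tn + i)->tnode;
}
```

This is also why the two-argument get_child(tn, i) calls become
get_child(tn + i) in the diff below: the index now selects a key_vector
slot instead of an entry in a pointer array.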

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   99 ++++++++++++++++++++++++++-------------------------
 1 file changed, 51 insertions(+), 48 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 58c8a89..f9abbf4 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -93,16 +93,16 @@ typedef unsigned int t_key;
 #define IS_LEAF(n)	(!(n)->bits)
 
 struct key_vector {
-	t_key key;
-	unsigned char pos;		/* 2log(KEYLENGTH) bits needed */
-	unsigned char bits;		/* 2log(KEYLENGTH) bits needed */
-	unsigned char slen;
 	union {
 		/* This list pointer if valid if bits == 0 (LEAF) */
 		struct hlist_head leaf;
 		/* The fields in this struct are valid if bits > 0 (TNODE) */
-		struct key_vector __rcu *tnode[0];
+		struct key_vector __rcu *tnode;
 	};
+	t_key key;
+	unsigned char pos;		/* 2log(KEYLENGTH) bits needed */
+	unsigned char bits;		/* 2log(KEYLENGTH) bits needed */
+	unsigned char slen;
 };
 
 struct tnode {
@@ -110,7 +110,7 @@ struct tnode {
 	t_key empty_children;		/* KEYLENGTH bits needed */
 	t_key full_children;		/* KEYLENGTH bits needed */
 	struct key_vector __rcu *parent;
-	struct key_vector kv[1];
+	struct key_vector kv[0];
 #define tn_bits kv[0].bits
 };
 
@@ -176,11 +176,11 @@ static inline struct fib_table *table_info(struct key_vector *kv)
 
 /* caller must hold RTNL */
 #define node_parent(tn) rtnl_dereference(tn_info(tn)->parent)
-#define get_child(tn, i) rtnl_dereference((tn)->tnode[i])
+#define get_child(tn) rtnl_dereference((tn)->tnode)
 
 /* caller must hold RCU read lock or RTNL */
 #define node_parent_rcu(tn) rcu_dereference_rtnl(tn_info(tn)->parent)
-#define get_child_rcu(tn, i) rcu_dereference_rtnl((tn)->tnode[i])
+#define get_child_rcu(tn) rcu_dereference_rtnl((tn)->tnode)
 
 /* wrapper for rcu_assign_pointer */
 #define node_set_parent(n, p) rcu_assign_pointer(tn_info(n)->parent, p)
@@ -281,9 +281,9 @@ static inline void alias_free_mem_rcu(struct fib_alias *fa)
 	call_rcu(&fa->rcu, __alias_free_mem);
 }
 
-#define TNODE_SIZE(n) offsetof(struct tnode, kv[0].tnode[n])
+#define TNODE_SIZE(n) offsetof(struct tnode, kv[n])
 #define TNODE_KMALLOC_MAX \
-	ilog2((PAGE_SIZE - TNODE_SIZE(0)) / sizeof(struct key_vector *))
+	ilog2((PAGE_SIZE - TNODE_SIZE(0)) / sizeof(struct key_vector))
 
 static void __node_free_rcu(struct rcu_head *head)
 {
@@ -352,7 +352,7 @@ static inline int tnode_full(struct key_vector *tn, struct key_vector *n)
 static void put_child(struct key_vector *tn, unsigned long i,
 		      struct key_vector *n)
 {
-	struct key_vector *chi = get_child(tn, i);
+	struct key_vector *chi = get_child(tn + i);
 	int isfull, wasfull;
 
 	BUG_ON(i >= child_length(tn));
@@ -375,7 +375,10 @@ static void put_child(struct key_vector *tn, unsigned long i,
 	if (n && (tn->slen < n->slen))
 		tn->slen = n->slen;
 
-	rcu_assign_pointer(tn->tnode[i], n);
+	/* update offset to correct key_vector for update */
+	tn += i;
+
+	rcu_assign_pointer(tn->tnode, n);
 }
 
 static void update_children(struct key_vector *tn)
@@ -384,7 +387,7 @@ static void update_children(struct key_vector *tn)
 
 	/* update all of the child parent pointers */
 	for (i = IS_TRIE(tn) ? 1 : child_length(tn); i;) {
-		struct key_vector *inode = get_child(tn, --i);
+		struct key_vector *inode = get_child(tn + --i);
 
 		if (!inode)
 			continue;
@@ -404,7 +407,7 @@ static inline void put_child_root(struct key_vector *tp, t_key key,
 				  struct key_vector *n)
 {
 	if (IS_TRIE(tp))
-		rcu_assign_pointer(tp->tnode[0], n);
+		rcu_assign_pointer(tp->tnode, n);
 	else
 		put_child(tp, get_index(key, tp), n);
 }
@@ -461,10 +464,12 @@ static void tnode_replace(struct key_vector *oldtn,
 			  struct key_vector *tn)
 {
 	struct key_vector *pn = node_parent(oldtn);
-	unsigned long i = get_index(tn->key, pn);
+
+	/* update offset to correct key_vector for pointer update */
+	pn += get_index(tn->key, pn);
 
 	/* setup the parent pointer out of and back into this node */
-	rcu_assign_pointer(pn->tnode[i], tn);
+	rcu_assign_pointer(pn->tnode, tn);
 
 	/* update all of the child parent pointers */
 	update_children(tn);
@@ -573,7 +578,7 @@ static struct key_vector *resize_children(struct net *net, struct trie *t,
 
 	/* resize children now that oldtnode is freed */
 	for (i = child_length(tn); i;) {
-		struct key_vector *inode = get_child(tn, --i);
+		struct key_vector *inode = get_child(tn + --i);
 
 		/* resize child node */
 		if (tnode_full(tn, inode))
@@ -611,7 +616,7 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 	 * nodes.
 	 */
 	for (i = child_length(oldtnode), m = 1u << tn->pos; i;) {
-		struct key_vector *inode = get_child(oldtnode, --i);
+		struct key_vector *inode = get_child(oldtnode + --i);
 		struct key_vector *node0, *node1;
 		unsigned long j, k;
 
@@ -630,8 +635,8 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 
 		/* An internal node with two children */
 		if (inode->bits == 1) {
-			put_child(tn, 2 * i + 1, get_child(inode, 1));
-			put_child(tn, 2 * i, get_child(inode, 0));
+			put_child(tn, 2 * i + 1, get_child(inode + 1));
+			put_child(tn, 2 * i, get_child(inode));
 			continue;
 		}
 
@@ -663,10 +668,10 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 
 		/* populate child pointers in new nodes */
 		for (k = child_length(inode), j = k / 2; j;) {
-			put_child(node1, --j, get_child(inode, --k));
-			put_child(node0, j, get_child(inode, j));
-			put_child(node1, --j, get_child(inode, --k));
-			put_child(node0, j, get_child(inode, j));
+			put_child(node1, --j, get_child(inode + --k));
+			put_child(node0, j, get_child(inode + j));
+			put_child(node1, --j, get_child(inode + --k));
+			put_child(node0, j, get_child(inode + j));
 		}
 	}
 
@@ -708,8 +713,8 @@ static struct key_vector *halve(struct net *net, struct trie *t,
 	 * nodes.
 	 */
 	for (i = child_length(oldtnode); i;) {
-		struct key_vector *node1 = get_child(oldtnode, --i);
-		struct key_vector *node0 = get_child(oldtnode, --i);
+		struct key_vector *node1 = get_child(oldtnode + --i);
+		struct key_vector *node0 = get_child(oldtnode + --i);
 		struct key_vector *inode;
 
 		/* At least one of the children is empty */
@@ -751,7 +756,7 @@ static struct key_vector *collapse(struct net *net, struct trie *t,
 
 	/* scan the tnode looking for that one child that might still exist */
 	for (i = child_length(oldtnode); i--;) {
-		struct key_vector *n = get_child(oldtnode, i);
+		struct key_vector *n = get_child(oldtnode + i);
 
 		if (!n)
 			continue;
@@ -790,7 +795,7 @@ static unsigned char update_suffix(struct key_vector *tn)
 	 * represent the nodes with suffix length equal to tn->pos
 	 */
 	for (i = 0, stride = 0x2ul ; i < child_length(tn); i += stride) {
-		struct key_vector *n = get_child(tn, i);
+		struct key_vector *n = get_child(tn + i);
 
 		if (!n || (n->slen <= slen))
 			continue;
@@ -932,7 +937,7 @@ static struct key_vector *resize(struct net *net, struct trie *t,
 	 * doing it ourselves.  This way we can let RCU fully do its
 	 * thing without us interfering
 	 */
-	BUG_ON(tn != get_child(tp, cindex));
+	BUG_ON(tn != get_child(tp + cindex));
 
 	/* Double as long as the resulting node has a number of
 	 * nonempty nodes that are above the threshold.
@@ -947,7 +952,7 @@ static struct key_vector *resize(struct net *net, struct trie *t,
 		}
 
 		max_work--;
-		tn = get_child(tp, cindex);
+		tn = get_child(tp + cindex);
 	}
 
 	/* Return if at least one inflate is run */
@@ -967,7 +972,7 @@ static struct key_vector *resize(struct net *net, struct trie *t,
 		}
 
 		max_work--;
-		tn = get_child(tp, cindex);
+		tn = get_child(tp + cindex);
 	}
 
 	/* Only one child remains */
@@ -1021,7 +1026,7 @@ static struct key_vector *fib_find_node(struct trie *t,
 
 	do {
 		*tp = n;
-		n = get_child_rcu(n, index);
+		n = get_child_rcu(n + index);
 
 		if (!n)
 			break;
@@ -1098,7 +1103,7 @@ static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 		goto noleaf;
 
 	/* retrieve child from parent node */
-	n = get_child(tp, get_index(key, tp));
+	n = get_child(tp + get_index(key, tp));
 
 	/* Case 2: n is a LEAF or a TNODE and the key doesn't match.
 	 *
@@ -1346,7 +1351,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 	pn = t->kv;
 	cindex = 0;
 
-	n = get_child_rcu(pn, cindex);
+	n = get_child_rcu(pn);
 	if (!n)
 		return -EAGAIN;
 
@@ -1383,16 +1388,13 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 			cindex = index;
 		}
 
-		n = get_child_rcu(n, index);
+		n = get_child_rcu(n + index);
 		if (unlikely(!n))
 			goto backtrace;
 	}
 
 	/* Step 2: Sort out leaves and begin backtracing for longest prefix */
 	for (;;) {
-		/* record the pointer where our next node pointer is stored */
-		struct key_vector __rcu **cptr = n->tnode;
-
 		/* This test verifies that none of the bits that differ
 		 * between the key and the prefix exist in the region of
 		 * the lsb and higher in the prefix.
@@ -1409,7 +1411,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		 * we started this traversal anyway
 		 */
 
-		while ((n = rcu_dereference(*cptr)) == NULL) {
+		while ((n = get_child_rcu(n)) == NULL) {
 backtrace:
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 			if (!n)
@@ -1441,7 +1443,7 @@ backtrace:
 			cindex &= cindex - 1;
 
 			/* grab pointer for next child node */
-			cptr = &pn->tnode[cindex];
+			n = pn + cindex;
 		}
 	}
 
@@ -1616,7 +1618,7 @@ static struct key_vector *leaf_walk_rcu(struct key_vector **pn, t_key key)
 			break;
 
 		/* descend into the next child */
-		n = get_child_rcu(tn, cindex++);
+		n = get_child_rcu(tn + cindex++);
 		if (!n)
 			break;
 
@@ -1637,7 +1639,7 @@ static struct key_vector *leaf_walk_rcu(struct key_vector **pn, t_key key)
 		}
 
 		/* grab the next available node */
-		n = get_child_rcu(tn, cindex++);
+		n = get_child_rcu(tn + cindex++);
 		if (!n)
 			continue;
 
@@ -1688,7 +1690,7 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 		}
 
 		/* grab the next available node */
-		n = get_child(pn, cindex);
+		n = get_child(pn + cindex);
 		if (!n)
 			continue;
 
@@ -1827,7 +1829,8 @@ void __init fib_trie_init(void)
 					  0, SLAB_PANIC, NULL);
 
 	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
-					   sizeof(struct tnode),
+					   sizeof(struct tnode) +
+					   sizeof(struct key_vector),
 					   0, SLAB_PANIC, NULL);
 }
 
@@ -1879,7 +1882,7 @@ static struct key_vector *fib_trie_get_next(struct fib_trie_iter *iter)
 
 	while (!IS_TRIE(pn)) {
 		while (cindex < child_length(pn)) {
-			struct key_vector *n = get_child_rcu(pn, cindex++);
+			struct key_vector *n = get_child_rcu(pn + cindex++);
 
 			if (!n)
 				continue;
@@ -1919,7 +1922,7 @@ static struct key_vector *fib_trie_get_first(struct fib_trie_iter *iter,
 	if (!t)
 		return NULL;
 
-	n = rcu_dereference(pn->tnode[0]);
+	n = rcu_dereference(pn->tnode);
 	if (!n)
 		return NULL;
 
@@ -2003,8 +2006,8 @@ static void trie_show_stats(struct seq_file *seq, struct trie_stat *stat)
 	seq_putc(seq, '\n');
 	seq_printf(seq, "\tPointers: %u\n", pointers);
 
-	bytes += sizeof(struct key_vector *) * pointers;
-	seq_printf(seq, "Null ptrs: %u\n", stat->nullpointers);
+	bytes += sizeof(struct key_vector) * pointers;
+	seq_printf(seq, "Empty vectors: %u\n", stat->nullpointers);
 	seq_printf(seq, "Total size: %u  kB\n", (bytes + 1023) / 1024);
 }
 

* [RFC PATCH 23/29] fib_trie: Add leaf_init
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (21 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 22/29] fib_trie: Allocate tnode as array of key_vectors instead of key_vector as array of tnode pointers Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 24/29] fib_trie: Update tnode_new to drop use of put_child_root Alexander Duyck
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC (permalink / raw
  To: netdev

Since the leaf will now need to configure an entry in the key_vector array
of the parent, I am adding a new function called leaf_init.  It will
eventually replace leaf_new as we begin to pull fields out of the leaf
key_vector and push them into the parent.
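
As a rough illustration of that bookkeeping (not the kernel code), the
sketch below models a parent as a fixed array of slots and shows a leaf
being published into the slot its key selects.  The toy_* names, the
fixed slot count and the mask-based index are assumptions made for the
example; RCU, the parent back-pointer and full_children are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8	/* 1 << bits, fixed here to keep the sketch small */

/* Toy stand-in for one key_vector slot; only the child pointer is modelled. */
struct toy_kv {
	struct toy_kv *child;
};

/* Toy stand-in for a tnode: bookkeeping plus the array of slots. */
struct toy_tnode {
	unsigned int pos;		/* lowest key bit used as the index */
	unsigned int bits;		/* number of index bits */
	unsigned long empty_children;	/* slots with no child */
	struct toy_kv kv[TOY_SLOTS];
};

/* Simplified index: mask out the bits this node consumes. */
static unsigned long toy_get_index(unsigned int key, const struct toy_tnode *tn)
{
	return (key >> tn->pos) & ((1ul << tn->bits) - 1);
}

/* Fix up the parent's counter, then publish the leaf in its slot. */
static void toy_leaf_init(struct toy_tnode *tn, unsigned int key,
			  struct toy_kv *leaf)
{
	unsigned long i = toy_get_index(key, tn);

	assert(i < TOY_SLOTS);	/* plays the role of BUG_ON() */

	/* a previously empty slot is about to become occupied */
	if (!tn->kv[i].child)
		tn->empty_children--;

	tn->kv[i].child = leaf;
}
```

The counter is adjusted before the pointer is stored so that the
statistics and the slot contents are updated in one place, which is the
point of having a single helper.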

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index f9abbf4..65ea194 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -412,6 +412,31 @@ static inline void put_child_root(struct key_vector *tp, t_key key,
 		put_child(tp, get_index(key, tp), n);
 }
 
+static void leaf_init(struct key_vector *tn, t_key key, struct key_vector *l)
+{
+	/* link leaf to parent */
+	NODE_INIT_PARENT(l, tn);
+
+	/* update parent node stats */
+	if (!IS_TRIE(tn)) {
+		unsigned long i = get_index(key, tn);
+		struct key_vector *n = get_child(tn + i);
+
+		BUG_ON(i >= child_length(tn));
+
+		if (!n)
+			empty_child_dec(tn);
+		else if (tnode_full(tn, n))
+			tn_info(tn)->full_children--;
+
+		/* update offset to correct key_vector for update */
+		tn += i;
+	}
+
+	/* populate key vector */
+	rcu_assign_pointer(tn->tnode, l);
+}
+
 static struct key_vector *tnode_new(struct key_vector *pn, t_key key,
 				    int pos, int bits)
 {
@@ -1128,8 +1153,7 @@ static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 	}
 
 	/* Case 3: n is NULL, and will just insert a new leaf */
-	NODE_INIT_PARENT(l, n);
-	put_child_root(n, key, l);
+	leaf_init(n, key, l);
 
 	vector_replace(net, tp, tn);
 

* [RFC PATCH 24/29] fib_trie: Update tnode_new to drop use of put_child_root
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (22 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 23/29] fib_trie: Add leaf_init Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 25/29] fib_trie: Add function for dropping children from trie Alexander Duyck
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC (permalink / raw
  To: netdev

This change makes tnode_new insert the newly allocated tnode into its
parent.  This enables the eventual move of fields from the key_vector of
the local node into the parent.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   47 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 65ea194..e91233b 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -437,40 +437,63 @@ static void leaf_init(struct key_vector *tn, t_key key, struct key_vector *l)
 	rcu_assign_pointer(tn->tnode, l);
 }
 
-static struct key_vector *tnode_new(struct key_vector *pn, t_key key,
+static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 				    int pos, int bits)
 {
 	size_t sz = TNODE_SIZE(1ul << bits);
 	struct tnode *tnode = tnode_alloc(sz);
-	struct key_vector *tn = tnode->kv;
 	unsigned int shift = pos + bits;
+	unsigned long i;
 
 	/* verify bits and pos their msb bits clear and values are valid */
 	BUG_ON(!bits || (shift > KEYLENGTH));
 
-	pr_debug("AT %p s=%zu %zu\n", tnode, TNODE_SIZE(0),
-		 sizeof(struct key_vector *) << bits);
+	pr_debug("AT %p s=%zu %zu\n", tnode, TNODE_SIZE(0), sz - TNODE_SIZE(0));
 
 	if (!tnode)
 		return NULL;
 
+	/* link tnode to parent */
+	NODE_INIT_PARENT(tnode->kv, tn);
+
+	/* update parent node stats */
+	if (!IS_TRIE(tn)) {
+		unsigned long idx = get_index(key, tn);
+		struct key_vector *n = get_child(tn + idx);
+
+		BUG_ON(idx >= child_length(tn));
+
+		if (!n)
+			empty_child_dec(tn);
+		else if (tnode_full(tn, n))
+			tn_info(tn)->full_children--;
+		if ((pos + bits) == tn->pos)
+			tn_info(tn)->full_children++;
+
+		/* update offset to correct key_vector for update */
+		tn += idx;
+	}
+
 	/* populate tn_info section */
+	key = (shift < KEYLENGTH) ? (key >> shift) << shift : 0;
 	if (bits == KEYLENGTH)
 		tnode->full_children = 1;
 	else
 		tnode->empty_children = 1ul << bits;
 
+	/* populate keys as though we are full of leaves */
+	for (i = (1ul << bits); i--;)
+		tnode->kv[i].key = key + (i << pos);
+
 	/* populate key vector */
-	tn->key = (shift < KEYLENGTH) ? (key >> shift) << shift : 0;
-	tn->pos = pos;
-	tn->bits = bits;
-	tn->slen = pos;
+	tnode->kv[0].pos = pos;
+	tnode->kv[0].bits = bits;
+	tnode->kv[0].slen = pos;
 
 	/* link parent to node */
-	NODE_INIT_PARENT(tn, pn);
-	put_child_root(pn, tn->key, tn);
+	rcu_assign_pointer(tn->tnode, tnode->kv);
 
-	return tn;
+	return tnode->kv;
 }
 
 static struct key_vector *tnode_clone(struct tnode *oldtnode)
@@ -2256,7 +2279,7 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 		fib_table_print(seq, iter->tb);
 
 	if (IS_TNODE(n)) {
-		__be32 prf = htonl(n->key);
+		__be32 prf = htonl((n->key >> n->pos) << n->pos);
 
 		seq_indent(seq, iter->depth-1);
 		seq_printf(seq, "  +-- %pI4/%zu %u %u %u\n",

* [RFC PATCH 25/29] fib_trie: Add function for dropping children from trie
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (23 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 24/29] fib_trie: Update tnode_new to drop use of put_child_root Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 26/29] fib_trie: Use put_child to only copy key_vectors instead of pointers Alexander Duyck
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC (permalink / raw
  To: netdev

The freeing of tnodes and leaves becomes a special case when the
key_vectors for those nodes have been pushed up to the parent.  The
handling is pretty straightforward.  A removed node will have its pointer
set to NULL and will require the suffix length to be reset to pos when the
suffix and pos values have been moved into the parent key_vector array.
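
A minimal sketch of that end state, assuming the suffix length and pos
already live in the parent's slot (they do not yet at this point in the
series).  The drop_* and sketch_* names and the four-slot node are
hypothetical and exist only for the example; RCU and full_children are
omitted.

```c
#include <assert.h>
#include <stddef.h>

#define DROP_SLOTS 4

/* One slot in the parent's array, holding the per-child fields. */
struct drop_slot {
	void *child;		/* NULL when the slot is empty */
	unsigned char pos;
	unsigned char slen;	/* longest suffix seen below this slot */
};

struct drop_node {
	unsigned long empty_children;
	struct drop_slot slot[DROP_SLOTS];
};

/* Remove whatever child sits in slot i and reset the slot's suffix. */
static void sketch_drop_child(struct drop_node *tn, unsigned long i)
{
	struct drop_slot *s;

	assert(i < DROP_SLOTS);
	s = &tn->slot[i];

	/* only an occupied slot changes the empty count */
	if (s->child)
		tn->empty_children++;

	s->child = NULL;
	s->slen = s->pos;	/* nothing below this slot any more */
}
```

Dropping an already empty slot leaves the counter untouched, so the
helper can be called without first checking the slot.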

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index e91233b..a76cc6d 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -346,6 +346,27 @@ static inline int tnode_full(struct key_vector *tn, struct key_vector *n)
 	return n && ((n->pos + n->bits) == tn->pos) && IS_TNODE(n);
 }
 
+static void drop_child(struct key_vector *tn, t_key key)
+{
+	/* update parent tnode statistics */
+	if (!IS_TRIE(tn)) {
+		unsigned long i = get_index(key, tn);
+		struct key_vector *n = get_child(tn + i);
+
+		if (n) {
+			empty_child_inc(tn);
+			if (tnode_full(tn, n))
+				tn_info(tn)->full_children--;
+		}
+
+		/* update offset to correct key_vector for update */
+		tn += i;
+	}
+
+	/* clear tnode pointers */
+	RCU_INIT_POINTER(tn->tnode, NULL);
+}
+
 /* Add a child at position i overwriting the old value.
  * Update the value of full_children and empty_children.
  */
@@ -403,15 +424,6 @@ static void update_children(struct key_vector *tn)
 	}
 }
 
-static inline void put_child_root(struct key_vector *tp, t_key key,
-				  struct key_vector *n)
-{
-	if (IS_TRIE(tp))
-		rcu_assign_pointer(tp->tnode, n);
-	else
-		put_child(tp, get_index(key, tp), n);
-}
-
 static void leaf_init(struct key_vector *tn, t_key key, struct key_vector *l)
 {
 	/* link leaf to parent */
@@ -815,7 +827,10 @@ static struct key_vector *collapse(struct net *net, struct trie *t,
 			return node_parent(oldtnode);
 
 		/* compress one level */
-		put_child_root(pn, oldtnode->key, n);
+		if (IS_TRIE(pn))
+			rcu_assign_pointer(pn->tnode, n);
+		else
+			put_child(pn, get_index(oldtnode->key, pn), n);
 
 		/* drop dead node */
 		vector_replace(net, node_parent(oldtnode), pn);
@@ -826,7 +841,7 @@ static struct key_vector *collapse(struct net *net, struct trie *t,
 	}
 
 	/* no children, just update pointer to NULL */
-	put_child_root(pn, oldtnode->key, NULL);
+	drop_child(pn, oldtnode->key);
 	node_free(oldtnode);
 
 	return pn;
@@ -1565,7 +1580,7 @@ static void fib_remove_alias(struct net *net, struct trie *t,
 	 * out parent suffix lengths as a part of trie_rebalance
 	 */
 	if (hlist_empty(&l->leaf)) {
-		put_child_root(tp, l->key, NULL);
+		drop_child(tp, l->key);
 		node_free(l);
 		trie_rebalance(net, t, tp);
 		return;
@@ -1768,7 +1783,7 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 		n->slen = slen;
 
 		if (hlist_empty(&n->leaf)) {
-			put_child_root(pn, n->key, NULL);
+			drop_child(pn, n->key);
 			node_free(n);
 		} else {
 			leaf_pull_suffix(pn, n);

* [RFC PATCH 26/29] fib_trie: Use put_child to only copy key_vectors instead of pointers
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (24 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 25/29] fib_trie: Add function for dropping children from trie Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:50 ` [RFC PATCH 27/29] fib_trie: Move key and pos into key_vector from tnode Alexander Duyck
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC (permalink / raw
  To: netdev

This change prepares put_child to copy key_vectors from one tnode to
another.  The only consumers for this are insert, inflate, halve, and
collapse.  The general idea is to make sure the empty/full statistics for
the parent node remain correct when we copy a key_vector from one node to
another.
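
The invariant can be sketched roughly as follows: undo the contribution
of whatever was in the slot, then add the contribution of what is being
stored.  This is a simplified model, not the kernel function; the pc_*
names, the pointer-per-slot layout and the four-slot node are
assumptions, and RCU and the suffix-length update are left out.

```c
#include <assert.h>
#include <stddef.h>

#define PC_SLOTS 4

struct pc_child {
	unsigned char pos;
	unsigned char bits;	/* 0 for a leaf, non-zero for a tnode */
};

struct pc_node {
	unsigned char pos;
	unsigned long empty_children;
	unsigned long full_children;
	struct pc_child *slot[PC_SLOTS];
};

/* A child is "full" if it is a tnode that ends exactly where tn starts. */
static int pc_full(const struct pc_node *tn, const struct pc_child *n)
{
	return n && n->bits && (n->pos + n->bits == tn->pos);
}

/* Store n in slot i, keeping the empty/full counters consistent. */
static void pc_put_child(struct pc_node *tn, unsigned long i,
			 struct pc_child *n)
{
	struct pc_child *old;

	assert(i < PC_SLOTS);
	old = tn->slot[i];

	/* remove the old occupant's contribution */
	if (old) {
		tn->empty_children++;
		if (pc_full(tn, old))
			tn->full_children--;
	}

	/* add the new occupant's contribution */
	if (n) {
		tn->empty_children--;
		if (pc_full(tn, n))
			tn->full_children++;
	}

	tn->slot[i] = n;
}
```

Treating the update as "remove old, then add new" avoids enumerating the
empty/leaf/full transitions one by one.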

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  102 +++++++++++++++++++++++++++------------------------
 1 file changed, 53 insertions(+), 49 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index a76cc6d..4a807fc 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -373,33 +373,34 @@ static void drop_child(struct key_vector *tn, t_key key)
 static void put_child(struct key_vector *tn, unsigned long i,
 		      struct key_vector *n)
 {
-	struct key_vector *chi = get_child(tn + i);
-	int isfull, wasfull;
+	struct key_vector *tnode = get_child(n);
 
-	BUG_ON(i >= child_length(tn));
-
-	/* update emptyChildren, overflow into fullChildren */
-	if (n == NULL && chi != NULL)
-		empty_child_inc(tn);
-	if (n != NULL && chi == NULL)
-		empty_child_dec(tn);
+	if (!IS_TRIE(tn)) {
+		struct key_vector *chi = get_child(tn + i);
 
-	/* update fullChildren */
-	wasfull = tnode_full(tn, chi);
-	isfull = tnode_full(tn, n);
+		BUG_ON(i >= child_length(tn));
 
-	if (wasfull && !isfull)
-		tn_info(tn)->full_children--;
-	else if (!wasfull && isfull)
-		tn_info(tn)->full_children++;
+		/* update emptyChildren and fullChildren */
+		if (chi) {
+			empty_child_inc(tn);
+			if (tnode_full(tn, chi))
+				tn_info(tn)->full_children--;
+		}
+		if (tnode) {
+			empty_child_dec(tn);
+			if (tnode_full(tn, tnode))
+				tn_info(tn)->full_children++;
 
-	if (n && (tn->slen < n->slen))
-		tn->slen = n->slen;
+			/* update suffix length */
+			if (tn->slen < tnode->slen)
+				tn->slen = tnode->slen;
+		}
 
-	/* update offset to correct key_vector for update */
-	tn += i;
+		/* update offset to correct key_vector for update */
+		tn += i;
+	}
 
-	rcu_assign_pointer(tn->tnode, n);
+	rcu_assign_pointer(tn->tnode, tnode);
 }
 
 static void update_children(struct key_vector *tn)
@@ -676,31 +677,32 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 	 * nodes.
 	 */
 	for (i = child_length(oldtnode), m = 1u << tn->pos; i;) {
-		struct key_vector *inode = get_child(oldtnode + --i);
+		struct key_vector *inode = oldtnode + --i;
+		struct key_vector *tnode = get_child(inode);
 		struct key_vector *node0, *node1;
 		unsigned long j, k;
 
 		/* An empty child */
-		if (inode == NULL)
+		if (!tnode)
 			continue;
 
 		/* A leaf or an internal node with skipped bits */
-		if (!tnode_full(oldtnode, inode)) {
-			put_child(tn, get_index(inode->key, tn), inode);
+		if (!tnode_full(oldtnode, tnode)) {
+			put_child(tn, get_index(tnode->key, tn), inode);
 			continue;
 		}
 
 		/* drop the node in the old tnode free list */
-		tnode_free_append(oldtnode, inode);
+		tnode_free_append(oldtnode, tnode);
 
 		/* An internal node with two children */
-		if (inode->bits == 1) {
-			put_child(tn, 2 * i + 1, get_child(inode + 1));
-			put_child(tn, 2 * i, get_child(inode));
+		if (tnode->bits == 1) {
+			put_child(tn, 2 * i + 1, tnode + 1);
+			put_child(tn, 2 * i, tnode);
 			continue;
 		}
 
-		/* We will replace this node 'inode' with two new
+		/* We will replace this node 'tnode' with two new
 		 * ones, 'node0' and 'node1', each with half of the
 		 * original children. The two new nodes will have
 		 * a position one bit further down the key and this
@@ -714,12 +716,12 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 		 * node0 and node1. So... we synthesize that bit in the
 		 * two new keys.
 		 */
-		node1 = tnode_new(tn, inode->key | m,
-				  inode->pos, inode->bits - 1);
+		node1 = tnode_new(tn, tnode->key | m,
+				  tnode->pos, tnode->bits - 1);
 		if (!node1)
 			goto nomem;
-		node0 = tnode_new(tn, inode->key,
-				  inode->pos, inode->bits - 1);
+		node0 = tnode_new(tn, tnode->key,
+				  tnode->pos, tnode->bits - 1);
 
 		tnode_free_append(tn, node1);
 		if (!node0)
@@ -727,11 +729,11 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 		tnode_free_append(tn, node0);
 
 		/* populate child pointers in new nodes */
-		for (k = child_length(inode), j = k / 2; j;) {
-			put_child(node1, --j, get_child(inode + --k));
-			put_child(node0, j, get_child(inode + j));
-			put_child(node1, --j, get_child(inode + --k));
-			put_child(node0, j, get_child(inode + j));
+		for (k = child_length(tnode), j = k / 2; j;) {
+			put_child(node1, --j, tnode + --k);
+			put_child(node0, j, tnode + j);
+			put_child(node1, --j, tnode + --k);
+			put_child(node0, j, tnode + j);
 		}
 	}
 
@@ -773,18 +775,23 @@ static struct key_vector *halve(struct net *net, struct trie *t,
 	 * nodes.
 	 */
 	for (i = child_length(oldtnode); i;) {
-		struct key_vector *node1 = get_child(oldtnode + --i);
-		struct key_vector *node0 = get_child(oldtnode + --i);
+		struct key_vector *node1 = oldtnode + --i;
+		struct key_vector *node0 = oldtnode + --i;
+		struct key_vector *tnode = get_child(node0);
 		struct key_vector *inode;
 
 		/* At least one of the children is empty */
-		if (!node1 || !node0) {
-			put_child(tn, i / 2, node1 ? : node0);
+		if (!get_child(node1)) {
+			put_child(tn, i / 2, node0);
+			continue;
+		}
+		if (!get_child(node0)) {
+			put_child(tn, i / 2, node1);
 			continue;
 		}
 
 		/* Two nonempty children */
-		inode = tnode_new(tn, node0->key, oldtnode->pos, 1);
+		inode = tnode_new(tn, tnode->key, oldtnode->pos, 1);
 		if (!inode)
 			goto nomem;
 		tnode_free_append(tn, inode);
@@ -827,10 +834,7 @@ static struct key_vector *collapse(struct net *net, struct trie *t,
 			return node_parent(oldtnode);
 
 		/* compress one level */
-		if (IS_TRIE(pn))
-			rcu_assign_pointer(pn->tnode, n);
-		else
-			put_child(pn, get_index(oldtnode->key, pn), n);
+		put_child(pn, get_index(oldtnode->key, pn), oldtnode + i);
 
 		/* drop dead node */
 		vector_replace(net, node_parent(oldtnode), pn);
@@ -1180,7 +1184,7 @@ static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 			goto notnode;
 
 		/* initialize routes out of node */
-		put_child(tn, get_index(key, tn) ^ 1, n);
+		put_child(tn, get_index(key, tn) ^ 1, tp + get_index(key, tp));
 
 		/* pop back out to bring tn to the same level as tp */
 		n = tn;

* [RFC PATCH 27/29] fib_trie: Move key and pos into key_vector from tnode
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (25 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 26/29] fib_trie: Use put_child to only copy key_vectors instead of pointers Alexander Duyck
@ 2015-02-24 20:50 ` Alexander Duyck
  2015-02-24 20:51 ` [RFC PATCH 28/29] fib_trie: Move slen from tnode to key vector Alexander Duyck
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:50 UTC (permalink / raw
  To: netdev

This change moves the pos and key values from the tnodes and leaves up one
level to their parent vectors.  We also add tn_pos as a means of
back-tracing up through the parent node based on the key.

We do not need a copy of the key from the parent node as the key in child 0
will always be a match for the parent key as long as the bits below tn_pos
are ignored.
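
To make the last point concrete, here is a small worked sketch.  It
assumes 32-bit keys and the slot-key layout used by tnode_new (slot i
holds the node key plus i shifted up by the node's position); the
sketch_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Key stored in slot i of a node whose index field starts at tn_pos. */
static uint32_t sketch_slot_key(uint32_t node_key, unsigned int tn_pos,
				unsigned long i)
{
	assert(tn_pos < 32);	/* a 32-bit shift by 32 is undefined */
	return node_key + ((uint32_t)i << tn_pos);
}

/* Recover the node key from whatever key currently sits in slot 0. */
static uint32_t sketch_parent_key(uint32_t child0_key, unsigned int tn_pos)
{
	/* the root covers the whole key space, so its key is 0 */
	if (tn_pos >= 32)
		return 0;

	/* clear the bits below tn_pos; slot 0 has zero index bits */
	return (child0_key >> tn_pos) << tn_pos;
}
```

For a node keyed 192.168.0.0 (0xC0A80000) with tn_pos of 8, a leaf for
192.168.0.5 falls in slot 0, and clearing its low 8 bits gives back the
node key, so no separate copy is required.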

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  124 +++++++++++++++++++++++++++++----------------------
 1 file changed, 70 insertions(+), 54 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 4a807fc..4d65f73 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -88,7 +88,7 @@
 
 typedef unsigned int t_key;
 
-#define IS_TRIE(n)	((n)->pos >= KEYLENGTH)
+#define IS_TRIE(n)	((n)->tn_pos >= KEYLENGTH)
 #define IS_TNODE(n)	((n)->bits)
 #define IS_LEAF(n)	(!(n)->bits)
 
@@ -103,6 +103,7 @@ struct key_vector {
 	unsigned char pos;		/* 2log(KEYLENGTH) bits needed */
 	unsigned char bits;		/* 2log(KEYLENGTH) bits needed */
 	unsigned char slen;
+	unsigned char tn_pos;		/* used to store tnode key info */
 };
 
 struct tnode {
@@ -200,10 +201,10 @@ static inline unsigned long get_index(t_key key, struct key_vector *kv)
 {
 	unsigned long index = key ^ kv->key;
 
-	if ((BITS_PER_LONG <= KEYLENGTH) && (KEYLENGTH == kv->pos))
+	if ((BITS_PER_LONG <= KEYLENGTH) && (KEYLENGTH == kv->tn_pos))
 		return 0;
 
-	return index >> kv->pos;
+	return index >> kv->tn_pos;
 }
 
 /* To understand this stuff, an understanding of keys and all their bits is
@@ -330,6 +331,7 @@ static struct key_vector *leaf_new(t_key key, struct fib_alias *fa)
 	l->pos = 0;
 	l->bits = 0;
 	l->slen = fa->fa_slen;
+	l->tn_pos = 0;
 
 	/* link leaf to fib alias */
 	INIT_HLIST_HEAD(&l->leaf);
@@ -343,7 +345,7 @@ static struct key_vector *leaf_new(t_key key, struct fib_alias *fa)
  */
 static inline int tnode_full(struct key_vector *tn, struct key_vector *n)
 {
-	return n && ((n->pos + n->bits) == tn->pos) && IS_TNODE(n);
+	return n && IS_TNODE(n) && ((n->tn_pos + n->bits) == tn->tn_pos);
 }
 
 static void drop_child(struct key_vector *tn, t_key key)
@@ -400,6 +402,9 @@ static void put_child(struct key_vector *tn, unsigned long i,
 		tn += i;
 	}
 
+	tn->key = n->key;
+	tn->pos = n->pos;
+
 	rcu_assign_pointer(tn->tnode, tnode);
 }
 
@@ -447,6 +452,9 @@ static void leaf_init(struct key_vector *tn, t_key key, struct key_vector *l)
 	}
 
 	/* populate key vector */
+	tn->key = key;
+	tn->pos = 0;
+
 	rcu_assign_pointer(tn->tnode, l);
 }
 
@@ -480,7 +488,7 @@ static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 			empty_child_dec(tn);
 		else if (tnode_full(tn, n))
 			tn_info(tn)->full_children--;
-		if ((pos + bits) == tn->pos)
+		if ((pos + bits) == tn->tn_pos)
 			tn_info(tn)->full_children++;
 
 		/* update offset to correct key_vector for update */
@@ -493,17 +501,18 @@ static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 		tnode->full_children = 1;
 	else
 		tnode->empty_children = 1ul << bits;
+	tnode->kv[0].tn_pos = pos;
+	tnode->kv[0].bits = bits;
+	tnode->kv[0].slen = pos;
 
 	/* populate keys as though we are full of leaves */
 	for (i = (1ul << bits); i--;)
 		tnode->kv[i].key = key + (i << pos);
 
 	/* populate key vector */
-	tnode->kv[0].pos = pos;
-	tnode->kv[0].bits = bits;
-	tnode->kv[0].slen = pos;
+	tn->key = key;
+	tn->pos = pos;
 
-	/* link parent to node */
 	rcu_assign_pointer(tn->tnode, tnode->kv);
 
 	return tnode->kv;
@@ -664,7 +673,7 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 		return NULL;
 
 	tn = tnode_new(pn, oldtnode->key,
-		       oldtnode->pos - 1, oldtnode->bits + 1);
+		       oldtnode->tn_pos - 1, oldtnode->bits + 1);
 	if (!tn)
 		goto notnode;
 
@@ -676,7 +685,7 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 	 * point to existing tnodes and the links between our allocated
 	 * nodes.
 	 */
-	for (i = child_length(oldtnode), m = 1u << tn->pos; i;) {
+	for (i = child_length(oldtnode), m = 1u << tn->tn_pos; i;) {
 		struct key_vector *inode = oldtnode + --i;
 		struct key_vector *tnode = get_child(inode);
 		struct key_vector *node0, *node1;
@@ -688,7 +697,7 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 
 		/* A leaf or an internal node with skipped bits */
 		if (!tnode_full(oldtnode, tnode)) {
-			put_child(tn, get_index(tnode->key, tn), inode);
+			put_child(tn, get_index(inode->key, tn), inode);
 			continue;
 		}
 
@@ -716,12 +725,12 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 		 * node0 and node1. So... we synthesize that bit in the
 		 * two new keys.
 		 */
-		node1 = tnode_new(tn, tnode->key | m,
-				  tnode->pos, tnode->bits - 1);
+		node1 = tnode_new(tn, inode->key | m,
+				  inode->pos, tnode->bits - 1);
 		if (!node1)
 			goto nomem;
-		node0 = tnode_new(tn, tnode->key,
-				  tnode->pos, tnode->bits - 1);
+		node0 = tnode_new(tn, inode->key,
+				  inode->pos, tnode->bits - 1);
 
 		tnode_free_append(tn, node1);
 		if (!node0)
@@ -765,7 +774,7 @@ static struct key_vector *halve(struct net *net, struct trie *t,
 		return NULL;
 
 	tn = tnode_new(pn, oldtnode->key,
-		       oldtnode->pos + 1, oldtnode->bits - 1);
+		       oldtnode->tn_pos + 1, oldtnode->bits - 1);
 	if (!tn)
 		goto notnode;
 
@@ -777,7 +786,6 @@ static struct key_vector *halve(struct net *net, struct trie *t,
 	for (i = child_length(oldtnode); i;) {
 		struct key_vector *node1 = oldtnode + --i;
 		struct key_vector *node0 = oldtnode + --i;
-		struct key_vector *tnode = get_child(node0);
 		struct key_vector *inode;
 
 		/* At least one of the children is empty */
@@ -791,7 +799,7 @@ static struct key_vector *halve(struct net *net, struct trie *t,
 		}
 
 		/* Two nonempty children */
-		inode = tnode_new(tn, tnode->key, oldtnode->pos, 1);
+		inode = tnode_new(tn, node0->key, oldtnode->tn_pos, 1);
 		if (!inode)
 			goto nomem;
 		tnode_free_append(tn, inode);
@@ -853,13 +861,17 @@ static struct key_vector *collapse(struct net *net, struct trie *t,
 
 static unsigned char update_suffix(struct key_vector *tn)
 {
-	unsigned char slen = tn->pos;
+	unsigned char slen = tn->tn_pos;
 	unsigned long stride, i;
 
+	/* simply bail out if there is nothing to do */
+	if (tn->slen == slen)
+		return 0;
+
 	/* search though the list of children looking for nodes that might
 	 * have a suffix greater than the one we currently have.  This is
 	 * why we start with a stride of 2 since a stride of 1 would
-	 * represent the nodes with suffix length equal to tn->pos
+	 * represent the nodes with suffix length equal to tn->tn_pos
 	 */
 	for (i = 0, stride = 0x2ul ; i < child_length(tn); i += stride) {
 		struct key_vector *n = get_child(tn + i);
@@ -877,11 +889,14 @@ static unsigned char update_suffix(struct key_vector *tn)
 		 * 0 and 1 << (bits - 1) could have that as their suffix
 		 * length.
 		 */
-		if ((slen + 1) >= (tn->pos + tn->bits))
+		if ((slen + 1) >= (tn->tn_pos + tn->bits))
 			break;
 	}
 
-	tn->slen = slen;
+	if (tn->slen < slen)
+		tn->slen = slen;
+	else
+		slen = 0;
 
 	return slen;
 }
@@ -955,7 +970,7 @@ static inline bool should_inflate(struct key_vector *tp, struct key_vector *tn)
 
 	/* if bits == KEYLENGTH then pos = 0, and will fail below */
 
-	return (used > 1) && tn->pos && ((50 * used) >= threshold);
+	return (used > 1) && tn->tn_pos && ((50 * used) >= threshold);
 }
 
 static inline bool should_halve(struct key_vector *tp, struct key_vector *tn)
@@ -1054,31 +1069,26 @@ static struct key_vector *resize(struct net *net, struct trie *t,
 		return tp;
 
 	/* push the suffix length to the parent node */
-	if (tn->slen > tn->pos) {
-		unsigned char slen = update_suffix(tn);
-
-		if (slen > tp->slen)
-			tp->slen = slen;
-	}
+	update_suffix(tn);
 
 	return tp;
 }
 
 static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
 {
-	while ((tp->slen > tp->pos) && (tp->slen > l->slen)) {
-		if (update_suffix(tp) > l->slen)
+	while (!IS_TRIE(tp) && tp->slen > l->slen) {
+		/* if the suffix doesn't change then we are done */
+		if (update_suffix(tp))
 			break;
+
 		tp = node_parent(tp);
 	}
 }
 
 static void leaf_push_suffix(struct key_vector *tn, struct key_vector *l)
 {
-	/* if this is a new leaf then tn will be NULL and we can sort
-	 * out parent suffix lengths as a part of trie_rebalance
-	 */
-	while (tn->slen < l->slen) {
+	/* work our way back up the trie sorting out slen in the key vectors */
+	while (!IS_TRIE(tn) && (tn->slen < l->slen)) {
 		tn->slen = l->slen;
 		tn = node_parent(tn);
 	}
@@ -1093,13 +1103,13 @@ static struct key_vector *fib_find_node(struct trie *t,
 
 	do {
 		*tp = n;
-		n = get_child_rcu(n + index);
+		n += index;
 
+		index = get_cindex(key, n);
+		n = get_child_rcu(n);
 		if (!n)
 			break;
 
-		index = get_cindex(key, n);
-
 		/* This bit of code is a bit tricky but it combines multiple
 		 * checks into a single check.  The prefix consists of the
 		 * prefix plus zeros for the bits in the cindex. The index
@@ -1170,7 +1180,7 @@ static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 		goto noleaf;
 
 	/* retrieve child from parent node */
-	n = get_child(tp + get_index(key, tp));
+	n = tp + get_index(key, tp);
 
 	/* Case 2: n is a LEAF or a TNODE and the key doesn't match.
 	 *
@@ -1178,13 +1188,13 @@ static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 	 *  first tnode need some special handling
 	 *  leaves us in position for handling as case 3
 	 */
-	if (n) {
+	if (get_child(n)) {
 		tn = tnode_new(tn, key, __fls(key ^ n->key), 1);
 		if (!tn)
 			goto notnode;
 
 		/* initialize routes out of node */
-		put_child(tn, get_index(key, tn) ^ 1, tp + get_index(key, tp));
+		put_child(tn, get_index(key, tn) ^ 1, n);
 
 		/* pop back out to bring tn to the same level as tp */
 		n = tn;
@@ -1410,14 +1420,15 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 	struct trie_use_stats __percpu *stats = t->stats;
 #endif
 	const t_key key = ntohl(flp->daddr);
-	struct key_vector *n, *pn;
+	struct key_vector *n, *pn, *tn;
+	unsigned long cindex;
 	struct fib_alias *fa;
-	t_key cindex;
 
 	pn = t->kv;
 	cindex = 0;
 
-	n = get_child_rcu(pn);
+	tn = pn;
+	n = get_child_rcu(tn);
 	if (!n)
 		return -EAGAIN;
 
@@ -1427,7 +1438,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 
 	/* Step 1: Travel to the longest prefix match in the trie */
 	for (;;) {
-		unsigned long index = get_cindex(key, n);
+		unsigned long index = get_cindex(key, tn);
 
 		/* This bit of code is a bit tricky but it combines multiple
 		 * checks into a single check.  The prefix consists of the
@@ -1449,12 +1460,15 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		/* only record pn and cindex if we are going to be chopping
 		 * bits later.  Otherwise we are just wasting cycles.
 		 */
-		if (n->slen > n->pos) {
+		if (n->slen > tn->pos) {
 			pn = n;
 			cindex = index;
 		}
 
-		n = get_child_rcu(n + index);
+		tn = n + index;
+
+		/* verify there is a tnode to go with the key vector */
+		n = get_child_rcu(tn);
 		if (unlikely(!n))
 			goto backtrace;
 	}
@@ -1465,7 +1479,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		 * between the key and the prefix exist in the region of
 		 * the lsb and higher in the prefix.
 		 */
-		if (unlikely(prefix_mismatch(key, n)) || (n->slen == n->pos))
+		if (unlikely(prefix_mismatch(key, tn)) || (n->slen <= tn->pos))
 			goto backtrace;
 
 		/* exit out and process leaf */
@@ -1477,7 +1491,9 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		 * we started this traversal anyway
 		 */
 
-		while ((n = get_child_rcu(n)) == NULL) {
+		tn = n;
+
+		while ((n = get_child_rcu(tn)) == NULL) {
 backtrace:
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 			if (!n)
@@ -1509,7 +1525,7 @@ backtrace:
 			cindex &= cindex - 1;
 
 			/* grab pointer for next child node */
-			n = pn + cindex;
+			tn = pn + cindex;
 		}
 	}
 
@@ -1914,7 +1930,7 @@ struct fib_table *fib_trie_table(u32 id)
 	tb->tb_num_default = 0;
 
 	t = (struct trie *) tb->tb_data;
-	t->kv[0].pos = KEYLENGTH;
+	t->kv[0].tn_pos = KEYLENGTH;
 	t->kv[0].slen = KEYLENGTH;
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	t->stats = alloc_percpu(struct trie_use_stats);
@@ -2298,11 +2314,11 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 		fib_table_print(seq, iter->tb);
 
 	if (IS_TNODE(n)) {
-		__be32 prf = htonl((n->key >> n->pos) << n->pos);
+		__be32 prf = htonl((n->key >> n->tn_pos) << n->tn_pos);
 
 		seq_indent(seq, iter->depth-1);
 		seq_printf(seq, "  +-- %pI4/%zu %u %u %u\n",
-			   &prf, KEYLENGTH - n->pos - n->bits, n->bits,
+			   &prf, KEYLENGTH - n->tn_pos - n->bits, n->bits,
 			   tn_info(n)->full_children,
 			   tn_info(n)->empty_children);
 	} else {


* [RFC PATCH 28/29] fib_trie: Move slen from tnode to key vector
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (26 preceding siblings ...)
  2015-02-24 20:50 ` [RFC PATCH 27/29] fib_trie: Move key and pos into key_vector from tnode Alexander Duyck
@ 2015-02-24 20:51 ` Alexander Duyck
  2015-02-24 20:51 ` [RFC PATCH 29/29] fib_trie: Push bits up one level, and move leaves up into parent key_vector array Alexander Duyck
  2015-02-25  3:53 ` [RFC PATCH 00/29] Phase 2 of fib_trie updates David Miller
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:51 UTC (permalink / raw
  To: netdev

This change pushes the suffix length up one level.  An added advantage to
pushing the suffix length up one level is that we now only have to check
the values contained in the local tnode instead of having to peek into the
pointer for each node.

I have also added a function called get_vector.  It is meant to find the
key_vector for a given tnode within the parent array for that tnode.
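
As a rough illustration of the idea, here is a stand-alone sketch of how a
node can locate its own slot in the parent array and how a suffix length
can then be recorded in that slot.  This is not the kernel code: the
toy_node and toy_vector types, the fixed four-entry array and the
toy_push_suffix helper are all assumptions made to keep the example small,
and RCU is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_node;

/* Hypothetical slot in a parent array; slen lives here, not in the child. */
struct toy_vector {
	unsigned char slen;		/* longest suffix length below this slot */
	struct toy_node *child;		/* child node, or NULL */
};

/* Hypothetical node; the real tnode layout differs. */
struct toy_node {
	uint32_t key;			/* key of the node itself */
	unsigned char pos;		/* lowest bit the node indexes on (< 32) */
	struct toy_node *parent;	/* NULL for the root */
	struct toy_vector kv[4];	/* fixed at 4 slots for the sketch */
};

static size_t toy_get_index(uint32_t key, const struct toy_node *tn)
{
	return (size_t)((key ^ tn->key) >> tn->pos);
}

/* Find the slot in the parent array that points at tn. */
static struct toy_vector *toy_get_vector(struct toy_node *tn)
{
	assert(tn->parent != NULL);
	return &tn->parent->kv[toy_get_index(tn->key, tn->parent)];
}

/* Record a suffix length in the local slot, then raise the ancestor slots. */
static void toy_push_suffix(struct toy_node *tn, uint32_t key,
			    unsigned char slen)
{
	tn->kv[toy_get_index(key, tn)].slen = slen;

	while (tn->parent) {
		struct toy_vector *kv = toy_get_vector(tn);

		if (kv->slen < slen)
			kv->slen = slen;
		tn = tn->parent;
	}
}
```

With the suffix length stored in the slot, a walk over a node only has to
read entries of the array it is already looking at, which is the advantage
described above.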

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |   61 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 4d65f73..0953247 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -207,6 +207,13 @@ static inline unsigned long get_index(t_key key, struct key_vector *kv)
 	return index >> kv->tn_pos;
 }
 
+static inline struct key_vector *get_vector(struct key_vector *tn)
+{
+	struct key_vector *kv = node_parent(tn);
+
+	return kv + get_index(tn->key, kv);
+}
+
 /* To understand this stuff, an understanding of keys and all their bits is
  * necessary. Every node in the trie has a key associated with it, but not
  * all of the bits in that key are significant.
@@ -389,13 +396,15 @@ static void put_child(struct key_vector *tn, unsigned long i,
 				tn_info(tn)->full_children--;
 		}
 		if (tnode) {
+			struct key_vector *kv = get_vector(tn);
+
 			empty_child_dec(tn);
 			if (tnode_full(tn, tnode))
 				tn_info(tn)->full_children++;
 
 			/* update suffix length */
-			if (tn->slen < tnode->slen)
-				tn->slen = tnode->slen;
+			if (kv->slen < n->slen)
+				kv->slen = n->slen;
 		}
 
 		/* update offset to correct key_vector for update */
@@ -404,6 +413,7 @@ static void put_child(struct key_vector *tn, unsigned long i,
 
 	tn->key = n->key;
 	tn->pos = n->pos;
+	tn->slen = n->slen;
 
 	rcu_assign_pointer(tn->tnode, tnode);
 }
@@ -437,6 +447,7 @@ static void leaf_init(struct key_vector *tn, t_key key, struct key_vector *l)
 
 	/* update parent node stats */
 	if (!IS_TRIE(tn)) {
+		struct key_vector *kv = get_vector(tn);
 		unsigned long i = get_index(key, tn);
 		struct key_vector *n = get_child(tn + i);
 
@@ -447,6 +458,10 @@ static void leaf_init(struct key_vector *tn, t_key key, struct key_vector *l)
 		else if (tnode_full(tn, n))
 			tn_info(tn)->full_children--;
 
+		/* update suffix length */
+		if (kv->slen < l->slen)
+			kv->slen = l->slen;
+
 		/* update offset to correct key_vector for update */
 		tn += i;
 	}
@@ -454,6 +469,7 @@ static void leaf_init(struct key_vector *tn, t_key key, struct key_vector *l)
 	/* populate key vector */
 	tn->key = key;
 	tn->pos = 0;
+	tn->slen = l->slen;
 
 	rcu_assign_pointer(tn->tnode, l);
 }
@@ -503,7 +519,6 @@ static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 		tnode->empty_children = 1ul << bits;
 	tnode->kv[0].tn_pos = pos;
 	tnode->kv[0].bits = bits;
-	tnode->kv[0].slen = pos;
 
 	/* populate keys as though we are full of leaves */
 	for (i = (1ul << bits); i--;)
@@ -512,6 +527,7 @@ static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 	/* populate key vector */
 	tn->key = key;
 	tn->pos = pos;
+	tn->slen = pos;
 
 	rcu_assign_pointer(tn->tnode, tnode->kv);
 
@@ -862,8 +878,12 @@ static struct key_vector *collapse(struct net *net, struct trie *t,
 static unsigned char update_suffix(struct key_vector *tn)
 {
 	unsigned char slen = tn->tn_pos;
+	struct key_vector *n, *tp = tn;
 	unsigned long stride, i;
 
+	/* move tn from the tnode, up to the tnode pointer */
+	tn = get_vector(tn);
+
 	/* simply bail out if there is nothing to do */
 	if (tn->slen == slen)
 		return 0;
@@ -873,10 +893,10 @@ static unsigned char update_suffix(struct key_vector *tn)
 	 * why we start with a stride of 2 since a stride of 1 would
 	 * represent the nodes with suffix length equal to tn->tn_pos
 	 */
-	for (i = 0, stride = 0x2ul ; i < child_length(tn); i += stride) {
-		struct key_vector *n = get_child(tn + i);
+	for (i = 0, stride = 0x2ul ; i < child_length(tp); i += stride) {
+		n = tp + i;
 
-		if (!n || (n->slen <= slen))
+		if (!get_child(n) || (n->slen <= slen))
 			continue;
 
 		/* update stride and slen based on new value */
@@ -889,7 +909,7 @@ static unsigned char update_suffix(struct key_vector *tn)
 		 * 0 and 1 << (bits - 1) could have that as their suffix
 		 * length.
 		 */
-		if ((slen + 1) >= (tn->tn_pos + tn->bits))
+		if ((slen + 1) >= (tn->pos + tp->bits))
 			break;
 	}
 
@@ -1076,7 +1096,13 @@ static struct key_vector *resize(struct net *net, struct trie *t,
 
 static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
 {
-	while (!IS_TRIE(tp) && tp->slen > l->slen) {
+	struct key_vector *n = tp + get_index(l->key, tp);
+
+	/* update our local vector first */
+	n->slen = l->slen;
+
+	/* work our way back up the trie sorting out slen in the key vectors */
+	while (!IS_TRIE(tp)) {
 		/* if the suffix doesn't change then we are done */
 		if (update_suffix(tp))
 			break;
@@ -1087,9 +1113,19 @@ static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
 
 static void leaf_push_suffix(struct key_vector *tn, struct key_vector *l)
 {
+	struct key_vector *n = tn + get_index(l->key, tn);
+
+	/* update our local vector first */
+	n->slen = l->slen;
+
 	/* work our way back up the trie sorting out slen in the key vectors */
-	while (!IS_TRIE(tn) && (tn->slen < l->slen)) {
-		tn->slen = l->slen;
+	while (!IS_TRIE(tn)) {
+		n = get_vector(tn);
+
+		/* if the suffix doesn't change then we are done */
+		if (n->slen < l->slen)
+			n->slen = l->slen;
+
 		tn = node_parent(tn);
 	}
 }
@@ -1460,7 +1496,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		/* only record pn and cindex if we are going to be chopping
 		 * bits later.  Otherwise we are just wasting cycles.
 		 */
-		if (n->slen > tn->pos) {
+		if (tn->slen > tn->pos) {
 			pn = n;
 			cindex = index;
 		}
@@ -1479,7 +1515,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		 * between the key and the prefix exist in the region of
 		 * the lsb and higher in the prefix.
 		 */
-		if (unlikely(prefix_mismatch(key, tn)) || (n->slen <= tn->pos))
+		if (unlikely(prefix_mismatch(key, tn)) || (tn->slen <= tn->pos))
 			goto backtrace;
 
 		/* exit out and process leaf */
@@ -1931,7 +1967,6 @@ struct fib_table *fib_trie_table(u32 id)
 
 	t = (struct trie *) tb->tb_data;
 	t->kv[0].tn_pos = KEYLENGTH;
-	t->kv[0].slen = KEYLENGTH;
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	t->stats = alloc_percpu(struct trie_use_stats);
 	if (!t->stats) {


* [RFC PATCH 29/29] fib_trie: Push bits up one level, and move leaves up into parent key_vector array
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (27 preceding siblings ...)
  2015-02-24 20:51 ` [RFC PATCH 28/29] fib_trie: Move slen from tnode to key vector Alexander Duyck
@ 2015-02-24 20:51 ` Alexander Duyck
  2015-02-25  3:53 ` [RFC PATCH 00/29] Phase 2 of fib_trie updates David Miller
  29 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 20:51 UTC (permalink / raw
  To: netdev

This is the last bit to get the key_vector up into the parent key_vector
array.  With this the bits field is moved from the local node up to the
parent, and as a result the key_vector is now defunct.

Since the key_vector is now defunct we can do a number of things.  The
first is to remove the leaf allocation, since leaves are now just elements
in the key_vector array contained in the tnode and trie structures.  The
second is to rearrange the fib_table_lookup and fib_find_node functions to
take advantage of the fact that the trie has been pushed up one level.
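
To make the layout easier to picture, here is a stand-alone sketch of an
exact-match walk over slots that embed their leaves.  It is only loosely
modelled on the reworked fib_find_node: the toy_slot type, the has_route
flag standing in for a non-empty alias list, and the rule that bits == 0
marks a leaf slot are assumptions of the sketch, and RCU is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot: key, pos and bits sit next to the child pointer. */
struct toy_slot {
	uint32_t key;		/* prefix key for this slot */
	unsigned char pos;	/* lowest bit the child array indexes on */
	unsigned char bits;	/* log2 of child array size, 0 for a leaf */
	int has_route;		/* stand-in for a non-empty alias list */
	struct toy_slot *child;	/* array of 1 << bits slots, or NULL */
};

/* Return the leaf slot whose key matches exactly, or NULL. */
static struct toy_slot *toy_find(struct toy_slot *slot, uint32_t key)
{
	while (slot) {
		uint64_t index;

		assert(slot->pos <= 32);

		/* widen before shifting so pos == 32 stays well defined */
		index = (uint64_t)(key ^ slot->key) >> slot->pos;

		/* any bit left above the index range is a prefix mismatch */
		if (index >> slot->bits)
			return NULL;

		/* the leaf lives in the slot, no extra pointer to chase */
		if (slot->bits == 0)
			return slot->has_route ? slot : NULL;

		slot = slot->child ? &slot->child[index] : NULL;
	}

	return NULL;
}
```

The point of the sketch is that the mismatch test and the leaf test both
read only the slot itself, so the child pointer is followed only when the
walk actually descends.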

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/ipv4/fib_trie.c |  484 ++++++++++++++++++++++++---------------------------
 1 file changed, 229 insertions(+), 255 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 0953247..8f48a03 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -112,7 +112,7 @@ struct tnode {
 	t_key full_children;		/* KEYLENGTH bits needed */
 	struct key_vector __rcu *parent;
 	struct key_vector kv[0];
-#define tn_bits kv[0].bits
+#define tn_bits kv[1].tn_pos
 };
 
 #ifdef CONFIG_IP_FIB_TRIE_STATS
@@ -160,7 +160,6 @@ static size_t tnode_free_size;
 static const int sync_pages = 128;
 
 static struct kmem_cache *fn_alias_kmem __read_mostly;
-static struct kmem_cache *trie_leaf_kmem __read_mostly;
 
 static inline struct tnode *tn_info(struct key_vector *kv)
 {
@@ -190,9 +189,9 @@ static inline struct fib_table *table_info(struct key_vector *kv)
 /* This provides us with the number of children in this node, in the case of a
  * leaf this will return 0 meaning none of the children are accessible.
  */
-static inline unsigned long child_length(const struct key_vector *tn)
+static inline unsigned long child_length(struct key_vector *tn)
 {
-	return (1ul << tn->bits) & ~(1ul);
+	return (1ul << tn_info(tn)->tn_bits);
 }
 
 #define get_cindex(key, kv) (((key) ^ (kv)->key) >> (kv)->pos)
@@ -297,9 +296,7 @@ static void __node_free_rcu(struct rcu_head *head)
 {
 	struct tnode *n = container_of(head, struct tnode, rcu);
 
-	if (!n->tn_bits)
-		kmem_cache_free(trie_leaf_kmem, n);
-	else if (n->tn_bits <= TNODE_KMALLOC_MAX)
+	if (n->tn_bits <= TNODE_KMALLOC_MAX)
 		kfree(n);
 	else
 		vfree(n);
@@ -325,55 +322,12 @@ static inline void empty_child_dec(struct key_vector *n)
 	tn_info(n)->empty_children-- ? : tn_info(n)->full_children--;
 }
 
-static struct key_vector *leaf_new(t_key key, struct fib_alias *fa)
-{
-	struct tnode *kv = kmem_cache_alloc(trie_leaf_kmem, GFP_KERNEL);
-	struct key_vector *l = kv->kv;
-
-	if (!kv)
-		return NULL;
-
-	/* initialize key vector */
-	l->key = key;
-	l->pos = 0;
-	l->bits = 0;
-	l->slen = fa->fa_slen;
-	l->tn_pos = 0;
-
-	/* link leaf to fib alias */
-	INIT_HLIST_HEAD(&l->leaf);
-	hlist_add_head(&fa->fa_list, &l->leaf);
-
-	return l;
-}
-
 /* Check whether a tnode 'n' is "full", i.e. it is an internal node
  * and no bits are skipped. See discussion in dyntree paper p. 6
  */
 static inline int tnode_full(struct key_vector *tn, struct key_vector *n)
 {
-	return n && IS_TNODE(n) && ((n->tn_pos + n->bits) == tn->tn_pos);
-}
-
-static void drop_child(struct key_vector *tn, t_key key)
-{
-	/* update parent tnode statistics */
-	if (!IS_TRIE(tn)) {
-		unsigned long i = get_index(key, tn);
-		struct key_vector *n = get_child(tn + i);
-
-		if (n) {
-			empty_child_inc(tn);
-			if (tnode_full(tn, n))
-				tn_info(tn)->full_children--;
-		}
-
-		/* update offset to correct key_vector for update */
-		tn += i;
-	}
-
-	/* clear tnode pointers */
-	RCU_INIT_POINTER(tn->tnode, NULL);
+	return IS_TNODE(n) && ((n->pos + n->bits) == tn->tn_pos);
 }
 
 /* Add a child at position i overwriting the old value.
@@ -385,12 +339,12 @@ static void put_child(struct key_vector *tn, unsigned long i,
 	struct key_vector *tnode = get_child(n);
 
 	if (!IS_TRIE(tn)) {
-		struct key_vector *chi = get_child(tn + i);
+		struct key_vector *chi = tn + i;
 
 		BUG_ON(i >= child_length(tn));
 
 		/* update emptyChildren and fullChildren */
-		if (chi) {
+		if (get_child(chi)) {
 			empty_child_inc(tn);
 			if (tnode_full(tn, chi))
 				tn_info(tn)->full_children--;
@@ -399,7 +353,7 @@ static void put_child(struct key_vector *tn, unsigned long i,
 			struct key_vector *kv = get_vector(tn);
 
 			empty_child_dec(tn);
-			if (tnode_full(tn, tnode))
+			if (tnode_full(tn, n))
 				tn_info(tn)->full_children++;
 
 			/* update suffix length */
@@ -408,11 +362,12 @@ static void put_child(struct key_vector *tn, unsigned long i,
 		}
 
 		/* update offset to correct key_vector for update */
-		tn += i;
+		tn = chi;
 	}
 
 	tn->key = n->key;
 	tn->pos = n->pos;
+	tn->bits = n->bits;
 	tn->slen = n->slen;
 
 	rcu_assign_pointer(tn->tnode, tnode);
@@ -424,54 +379,60 @@ static void update_children(struct key_vector *tn)
 
 	/* update all of the child parent pointers */
 	for (i = IS_TRIE(tn) ? 1 : child_length(tn); i;) {
-		struct key_vector *inode = get_child(tn + --i);
-
-		if (!inode)
-			continue;
+		struct key_vector *inode = tn + --i;
 
 		/* Either update the children of a tnode that
 		 * already belongs to us or update the child
 		 * to point to ourselves.
 		 */
-		if (node_parent(inode) == tn)
-			update_children(inode);
-		else
-			node_set_parent(inode, tn);
+		if (IS_TNODE(inode)) {
+			struct key_vector *n = get_child(inode);
+
+			if (!n)
+				continue;
+
+			if (node_parent(n) == tn)
+				update_children(n);
+			else
+				node_set_parent(n, tn);
+		} else if (inode->leaf.first) {
+			/* update hash to point to us */
+			rcu_assign_pointer(inode->leaf.first->pprev,
+					   &inode->leaf.first);
+		}
 	}
 }
 
-static void leaf_init(struct key_vector *tn, t_key key, struct key_vector *l)
+static void leaf_init(struct key_vector *tn, t_key key, struct fib_alias *fa)
 {
-	/* link leaf to parent */
-	NODE_INIT_PARENT(l, tn);
-
 	/* update parent node stats */
 	if (!IS_TRIE(tn)) {
 		struct key_vector *kv = get_vector(tn);
 		unsigned long i = get_index(key, tn);
-		struct key_vector *n = get_child(tn + i);
 
 		BUG_ON(i >= child_length(tn));
 
-		if (!n)
-			empty_child_dec(tn);
-		else if (tnode_full(tn, n))
-			tn_info(tn)->full_children--;
+		empty_child_dec(tn);
 
 		/* update suffix length */
-		if (kv->slen < l->slen)
-			kv->slen = l->slen;
+		if (kv->slen < fa->fa_slen)
+			kv->slen = fa->fa_slen;
 
 		/* update offset to correct key_vector for update */
 		tn += i;
 	}
 
+	/* We should always be handed an empty slot */
+	BUG_ON(!hlist_empty(&tn->leaf));
+
 	/* populate key vector */
 	tn->key = key;
 	tn->pos = 0;
-	tn->slen = l->slen;
+	tn->bits = 0;
+	tn->slen = fa->fa_slen;
 
-	rcu_assign_pointer(tn->tnode, l);
+	/* clean the area and drop in the new leaf */
+	hlist_add_head(&fa->fa_list, &tn->leaf);
 }
 
 static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
@@ -496,11 +457,11 @@ static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 	/* update parent node stats */
 	if (!IS_TRIE(tn)) {
 		unsigned long idx = get_index(key, tn);
-		struct key_vector *n = get_child(tn + idx);
+		struct key_vector *n = tn + idx;
 
 		BUG_ON(idx >= child_length(tn));
 
-		if (!n)
+		if (!get_child(n))
 			empty_child_dec(tn);
 		else if (tnode_full(tn, n))
 			tn_info(tn)->full_children--;
@@ -508,7 +469,7 @@ static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 			tn_info(tn)->full_children++;
 
 		/* update offset to correct key_vector for update */
-		tn += idx;
+		tn = n;
 	}
 
 	/* populate tn_info section */
@@ -518,7 +479,7 @@ static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 	else
 		tnode->empty_children = 1ul << bits;
 	tnode->kv[0].tn_pos = pos;
-	tnode->kv[0].bits = bits;
+	tnode->tn_bits = bits;
 
 	/* populate keys as though we are full of leaves */
 	for (i = (1ul << bits); i--;)
@@ -527,6 +488,7 @@ static struct key_vector *tnode_new(struct key_vector *tn, t_key key,
 	/* populate key vector */
 	tn->key = key;
 	tn->pos = pos;
+	tn->bits = bits;
 	tn->slen = pos;
 
 	rcu_assign_pointer(tn->tnode, tnode->kv);
@@ -618,7 +580,7 @@ static void tnode_free(struct key_vector *tn)
 
 	while (head) {
 		head = head->next;
-		tnode_free_size += TNODE_SIZE(1ul << tn->bits);
+		tnode_free_size += TNODE_SIZE(child_length(tn));
 		node_free(tn);
 
 		tn = container_of(head, struct tnode, rcu)->kv;
@@ -664,11 +626,12 @@ static struct key_vector *resize_children(struct net *net, struct trie *t,
 
 	/* resize children now that oldtnode is freed */
 	for (i = child_length(tn); i;) {
-		struct key_vector *inode = get_child(tn + --i);
+		struct key_vector *inode = tn + --i;
+		struct key_vector *tnode = get_child(inode);
 
 		/* resize child node */
-		if (tnode_full(tn, inode))
-			tn = resize(net, t, inode);
+		if (tnode && tnode_full(tn, inode))
+			tn = resize(net, t, tnode);
 	}
 
 	return node_parent(tn);
@@ -689,7 +652,7 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 		return NULL;
 
 	tn = tnode_new(pn, oldtnode->key,
-		       oldtnode->tn_pos - 1, oldtnode->bits + 1);
+		       oldtnode->tn_pos - 1, tn_info(oldtnode)->tn_bits + 1);
 	if (!tn)
 		goto notnode;
 
@@ -712,7 +675,7 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 			continue;
 
 		/* A leaf or an internal node with skipped bits */
-		if (!tnode_full(oldtnode, tnode)) {
+		if (!tnode_full(oldtnode, inode)) {
 			put_child(tn, get_index(inode->key, tn), inode);
 			continue;
 		}
@@ -721,7 +684,7 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 		tnode_free_append(oldtnode, tnode);
 
 		/* An internal node with two children */
-		if (tnode->bits == 1) {
+		if (inode->bits == 1) {
 			put_child(tn, 2 * i + 1, tnode + 1);
 			put_child(tn, 2 * i, tnode);
 			continue;
@@ -742,11 +705,11 @@ static struct key_vector *inflate(struct net *net, struct trie *t,
 		 * two new keys.
 		 */
 		node1 = tnode_new(tn, inode->key | m,
-				  inode->pos, tnode->bits - 1);
+				  inode->pos, inode->bits - 1);
 		if (!node1)
 			goto nomem;
 		node0 = tnode_new(tn, inode->key,
-				  inode->pos, tnode->bits - 1);
+				  inode->pos, inode->bits - 1);
 
 		tnode_free_append(tn, node1);
 		if (!node0)
@@ -790,7 +753,7 @@ static struct key_vector *halve(struct net *net, struct trie *t,
 		return NULL;
 
 	tn = tnode_new(pn, oldtnode->key,
-		       oldtnode->tn_pos + 1, oldtnode->bits - 1);
+		       oldtnode->tn_pos + 1, tn_info(oldtnode)->tn_bits - 1);
 	if (!tn)
 		goto notnode;
 
@@ -842,13 +805,12 @@ notnode:
 static struct key_vector *collapse(struct net *net, struct trie *t,
 				   struct key_vector *oldtnode)
 {
-	struct key_vector *pn = node_parent(oldtnode);
+	struct key_vector *n, *pn = node_parent(oldtnode);
 	unsigned long i;
 
 	/* scan the tnode looking for that one child that might still exist */
 	for (i = child_length(oldtnode); i--;) {
-		struct key_vector *n = get_child(oldtnode + i);
-
+		n = get_child(oldtnode + i);
 		if (!n)
 			continue;
 
@@ -865,11 +827,24 @@ static struct key_vector *collapse(struct net *net, struct trie *t,
 		node_free(oldtnode);
 
 		/* resize child since it could be promoted to root */
-		return IS_TNODE(n) ? resize(net, t, n) : pn;
+		return IS_TNODE(oldtnode + i) ? resize(net, t, n) : pn;
 	}
 
 	/* no children, just update pointer to NULL */
-	drop_child(pn, oldtnode->key);
+	n = pn;
+
+	/* update parent tnode statistics */
+	if (!IS_TRIE(pn)) {
+		/* update offset to correct key_vector for update */
+		n = get_vector(oldtnode);
+
+		empty_child_inc(pn);
+		if (tnode_full(pn, n))
+			tn_info(pn)->full_children--;
+	}
+
+	/* clear tnode pointers */
+	RCU_INIT_POINTER(n->tnode, NULL);
 	node_free(oldtnode);
 
 	return pn;
@@ -909,7 +884,7 @@ static unsigned char update_suffix(struct key_vector *tn)
 		 * 0 and 1 << (bits - 1) could have that as their suffix
 		 * length.
 		 */
-		if ((slen + 1) >= (tn->pos + tp->bits))
+		if ((slen + 1) >= (tn->pos + tn->bits))
 			break;
 	}
 
@@ -1004,7 +979,7 @@ static inline bool should_halve(struct key_vector *tp, struct key_vector *tn)
 
 	/* if bits == KEYLENGTH then used = 100% on wrap, and will fail below */
 
-	return (used > 1) && (tn->bits > 1) && ((100 * used) < threshold);
+	return (used > 1) && (tn_info(tn)->tn_bits > 1) && ((100 * used) < threshold);
 }
 
 static inline bool should_collapse(struct key_vector *tn)
@@ -1014,7 +989,7 @@ static inline bool should_collapse(struct key_vector *tn)
 	used -= tn_info(tn)->empty_children;
 
 	/* account for bits == KEYLENGTH case */
-	if ((tn->bits == KEYLENGTH) && tn_info(tn)->full_children)
+	if ((tn_info(tn)->tn_bits == KEYLENGTH) && tn_info(tn)->full_children)
 		used -= KEY_MAX;
 
 	/* One child or none, time to drop us from the trie */
@@ -1096,11 +1071,6 @@ static struct key_vector *resize(struct net *net, struct trie *t,
 
 static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
 {
-	struct key_vector *n = tp + get_index(l->key, tp);
-
-	/* update our local vector first */
-	n->slen = l->slen;
-
 	/* work our way back up the trie sorting out slen in the key vectors */
 	while (!IS_TRIE(tp)) {
 		/* if the suffix doesn't change then we are done */
@@ -1113,20 +1083,13 @@ static void leaf_pull_suffix(struct key_vector *tp, struct key_vector *l)
 
 static void leaf_push_suffix(struct key_vector *tn, struct key_vector *l)
 {
-	struct key_vector *n = tn + get_index(l->key, tn);
-
-	/* update our local vector first */
-	n->slen = l->slen;
-
 	/* work our way back up the trie sorting out slen in the key vectors */
-	while (!IS_TRIE(tn)) {
-		n = get_vector(tn);
+	if (!IS_TRIE(tn)) {
+		struct key_vector *n = get_vector(tn);
 
 		/* if the suffix doesn't change then we are done */
 		if (n->slen < l->slen)
 			n->slen = l->slen;
-
-		tn = node_parent(tn);
 	}
 }
 
@@ -1142,9 +1105,6 @@ static struct key_vector *fib_find_node(struct trie *t,
 		n += index;
 
 		index = get_cindex(key, n);
-		n = get_child_rcu(n);
-		if (!n)
-			break;
 
 		/* This bit of code is a bit tricky but it combines multiple
 		 * checks into a single check.  The prefix consists of the
@@ -1160,7 +1120,11 @@ static struct key_vector *fib_find_node(struct trie *t,
 			return NULL;
 
 		/* keep searching until we find a perfect match leaf or NULL */
-	} while (IS_TNODE(n));
+		if (IS_LEAF(n))
+			break;
+
+		n = get_child_rcu(n);
+	} while (n);
 
 	return n;
 }
@@ -1203,18 +1167,13 @@ static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 					 struct key_vector *tp,
 					 struct fib_alias *new, t_key key)
 {
-	struct key_vector *tn, *l, *n;
+	struct key_vector *tn, *n;
 
 	/* allocate the new parent that must be replaced */
 	tn = vector_clone(tp);
 	if (!tn)
 		return NULL;
 
-	/* allocate the new leaf we will insert */
-	l = leaf_new(key, new);
-	if (!l)
-		goto noleaf;
-
 	/* retrieve child from parent node */
 	n = tp + get_index(key, tp);
 
@@ -1241,14 +1200,12 @@ static struct fib_table *fib_insert_node(struct net *net, struct trie *t,
 	}
 
 	/* Case 3: n is NULL, and will just insert a new leaf */
-	leaf_init(n, key, l);
+	leaf_init(n, key, new);
 
 	vector_replace(net, tp, tn);
 
 	return trie_rebalance(net, t, n);
 notnode:
-	node_free(l);
-noleaf:
 	vector_free(tn);
 
 	return NULL;
@@ -1266,6 +1223,9 @@ static struct fib_table *fib_insert_alias(struct net *net, struct trie *t,
 	if (!fa) {
 		struct fib_alias *last;
 
+		if (hlist_empty(&l->leaf) && !IS_TRIE(tp))
+			empty_child_dec(tp);
+
 		hlist_for_each_entry(last, &l->leaf, fa_list) {
 			if (new->fa_slen < last->fa_slen)
 				break;
@@ -1284,7 +1244,7 @@ static struct fib_table *fib_insert_alias(struct net *net, struct trie *t,
 		leaf_push_suffix(tp, l);
 	}
 
-	return table_info(t->kv);
+	return trie_rebalance(net, t, tp);
 }
 
 /* Caller must hold RTNL. */
@@ -1456,25 +1416,20 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 	struct trie_use_stats __percpu *stats = t->stats;
 #endif
 	const t_key key = ntohl(flp->daddr);
-	struct key_vector *n, *pn, *tn;
-	unsigned long cindex;
+	struct key_vector *pn, *n, *l = t->kv;
+	unsigned long cindex, index = 0;
 	struct fib_alias *fa;
 
-	pn = t->kv;
-	cindex = 0;
-
-	tn = pn;
-	n = get_child_rcu(tn);
-	if (!n)
-		return -EAGAIN;
-
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	this_cpu_inc(stats->gets);
 #endif
+	pn = l;
+	cindex = index;
 
 	/* Step 1: Travel to the longest prefix match in the trie */
 	for (;;) {
-		unsigned long index = get_cindex(key, tn);
+		n = l + index;
+		index = get_cindex(key, n);
 
 		/* This bit of code is a bit tricky but it combines multiple
 		 * checks into a single check.  The prefix consists of the
@@ -1489,6 +1444,11 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		if (index & (~0ul << n->bits))
 			break;
 
+		/* grab the pointers for the next object */
+		l = get_child_rcu(n);
+		if (!l)
+			goto backtrace;
+
 		/* we have found a leaf. Prefixes have already been compared */
 		if (IS_LEAF(n))
 			goto found;
@@ -1496,43 +1456,19 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		/* only record pn and cindex if we are going to be chopping
 		 * bits later.  Otherwise we are just wasting cycles.
 		 */
-		if (tn->slen > tn->pos) {
-			pn = n;
+		if (n->slen > n->pos) {
+			pn = l;
 			cindex = index;
 		}
-
-		tn = n + index;
-
-		/* verify there is a tnode to go with the key vector */
-		n = get_child_rcu(tn);
-		if (unlikely(!n))
-			goto backtrace;
 	}
 
 	/* Step 2: Sort out leaves and begin backtracing for longest prefix */
 	for (;;) {
-		/* This test verifies that none of the bits that differ
-		 * between the key and the prefix exist in the region of
-		 * the lsb and higher in the prefix.
-		 */
-		if (unlikely(prefix_mismatch(key, tn)) || (tn->slen <= tn->pos))
-			goto backtrace;
-
-		/* exit out and process leaf */
-		if (unlikely(IS_LEAF(n)))
-			break;
-
-		/* Don't bother recording parent info.  Since we are in
-		 * prefix match mode we will have to come back to wherever
-		 * we started this traversal anyway
-		 */
-
-		tn = n;
-
-		while ((n = get_child_rcu(tn)) == NULL) {
+		/* grab the pointers for the next object */
+		while ((l = get_child_rcu(n)) == NULL) {
 backtrace:
 #ifdef CONFIG_IP_FIB_TRIE_STATS
-			if (!n)
+			if (!l)
 				this_cpu_inc(stats->null_node_hit);
 #endif
 			/* If we are at cindex 0 there are no more bits for
@@ -1561,24 +1497,46 @@ backtrace:
 			cindex &= cindex - 1;
 
 			/* grab pointer for next child node */
-			tn = pn + cindex;
+			n = pn + cindex;
 		}
-	}
 
+		/* This test verifies that none of the bits that differ
+		 * between the key and the prefix exist in the region of
+		 * the lsb and higher in the prefix.
+		 */
+		if (unlikely(prefix_mismatch(key, n)) || (n->slen <= n->pos))
+			goto backtrace;
+
+		/* exit out and process leaf */
+		if (unlikely(IS_LEAF(n)))
+			break;
+
+		/* Don't bother recording parent info.  Since we are in
+		 * prefix match mode we will have to come back to wherever
+		 * we started this traversal anyway
+		 */
+
+		n = l;
+	}
 found:
 	/* Step 3: Process the leaf, if that fails fall back to backtracing */
-	hlist_for_each_entry_rcu(fa, &n->leaf, fa_list) {
-		struct fib_info *fi = fa->fa_info;
+	fa = hlist_entry(&l->leaf.first, typeof(*fa), fa_list.next);
+	hlist_for_each_entry_from_rcu(fa, fa_list) {
+		struct fib_info *fi;
 		int nhsel, err;
 
-		if (((key ^ n->key) >> fa->fa_slen) &&
-		    (fa->fa_slen != KEYLENGTH))
+		if (((unsigned long)(key ^ n->key) >> fa->fa_slen) &&
+		    ((BITS_PER_LONG > KEYLENGTH) ||
+		     (fa->fa_slen != KEYLENGTH)))
 			continue;
 		if (fa->fa_tos && fa->fa_tos != flp->flowi4_tos)
 			continue;
+
+		fi = fa->fa_info;
+
 		if (fi->fib_dead)
 			continue;
-		if (fa->fa_info->fib_scope < flp->flowi4_scope)
+		if (fi->fib_scope < flp->flowi4_scope)
 			continue;
 		fib_alias_accessed(fa);
 		err = fib_props[fa->fa_type].error;
@@ -1636,9 +1594,13 @@ static void fib_remove_alias(struct net *net, struct trie *t,
 	 * out parent suffix lengths as a part of trie_rebalance
 	 */
 	if (hlist_empty(&l->leaf)) {
-		drop_child(tp, l->key);
-		node_free(l);
-		trie_rebalance(net, t, tp);
+		l->slen = 0;
+
+		if (!IS_TRIE(tp)) {
+			empty_child_inc(tp);
+			trie_rebalance(net, t, tp);
+		}
+
 		return;
 	}
 
@@ -1723,51 +1685,55 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 /* Scan for the next leaf starting at the provided key value */
 static struct key_vector *leaf_walk_rcu(struct key_vector **pn, t_key key)
 {
-	struct key_vector *tn, *n = *pn;
-	unsigned long cindex;
+	struct key_vector *l, *n, *tn = *pn;
+	unsigned long cindex = get_index(key, tn);
 
 	/* this loop is meant to try and find the key in the trie */
-	do {
-		/* record parent and next child index */
-		tn = n;
-		cindex = get_index(key, tn);
+	while (cindex < child_length(tn)) {
+		n = tn + cindex++;
 
-		if (cindex >> tn->bits)
-			break;
-
-		/* descend into the next child */
-		n = get_child_rcu(tn + cindex++);
-		if (!n)
+		l = get_child_rcu(n);
+		if (!l)
 			break;
 
 		/* guarantee forward progress on the keys */
-		if (IS_LEAF(n) && (n->key >= key))
+		if (IS_LEAF(n)) {
+			if (n->key < key)
+				break;
 			goto found;
-	} while (IS_TNODE(n));
+		}
+
+		/* record parent and next child index */
+		tn = l;
+		cindex = get_index(key, tn);
+	}
 
 	/* this loop will search for the next leaf with a greater key */
 	while (!IS_TRIE(tn)) {
+		t_key pkey;
+
 		/* if we exhausted the parent node we will need to climb */
-		if (cindex >> tn->bits) {
-			t_key pkey = tn->key;
+		while (cindex < child_length(tn)) {
+			/* grab the next available node */
+			n = tn + cindex++;
 
-			tn = node_parent_rcu(tn);
-			cindex = get_index(pkey, tn) + 1;
-			continue;
-		}
+			l = get_child_rcu(n);
+			if (!l)
+				continue;
 
-		/* grab the next available node */
-		n = get_child_rcu(tn + cindex++);
-		if (!n)
-			continue;
+			/* no need to compare keys since we bumped the index */
+			if (IS_LEAF(n))
+				goto found;
 
-		/* no need to compare keys since we bumped the index */
-		if (IS_LEAF(n))
-			goto found;
+			/* Rescan start scanning in new node */
+			tn = l;
+			cindex = 0;
+		}
 
-		/* Rescan start scanning in new node */
-		tn = n;
-		cindex = 0;
+		pkey = tn->key;
+
+		tn = node_parent_rcu(tn);
+		cindex = get_index(pkey, tn) + 1;
 	}
 
 	*pn = tn;
@@ -1790,7 +1756,6 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 
 	/* walk trie in reverse order */
 	for (;;) {
-		unsigned char slen = 0;
 		struct key_vector *n;
 
 		if (!(cindex--)) {
@@ -1807,42 +1772,47 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 			continue;
 		}
 
-		/* grab the next available node */
-		n = get_child(pn + cindex);
-		if (!n)
-			continue;
+		/* locate key vector within the array */
+		n = pn + cindex;
 
 		if (IS_TNODE(n)) {
+			/* grab the next available node */
+			n = get_child(n);
+
 			/* record pn and cindex for leaf walking */
-			pn = n;
-			cindex = 1ul << n->bits;
+			if (n) {
+				pn = n;
+				cindex = child_length(n);
+			}
 
 			continue;
 		}
 
-		hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
-			struct fib_info *fi = fa->fa_info;
+		if (!hlist_empty(&n->leaf)) {
+			unsigned char slen = 0;
 
-			if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
-				hlist_del_rcu(&fa->fa_list);
-				fib_release_info(fa->fa_info);
-				alias_free_mem_rcu(fa);
-				found++;
+			hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
+				struct fib_info *fi = fa->fa_info;
 
-				continue;
-			}
+				if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
+					hlist_del_rcu(&fa->fa_list);
+					fib_release_info(fa->fa_info);
+					alias_free_mem_rcu(fa);
+					found++;
 
-			slen = fa->fa_slen;
-		}
+					continue;
+				}
+
+				/* track suffix length of non-flushed leaves */
+				slen = fa->fa_slen;
+			}
 
-		/* update leaf slen */
-		n->slen = slen;
+			/* reset slen and update tnode */
+			n->slen = slen;
 
-		if (hlist_empty(&n->leaf)) {
-			drop_child(pn, n->key);
-			node_free(n);
-		} else {
-			leaf_pull_suffix(pn, n);
+			/* update parent status */
+			if (hlist_empty(&n->leaf) && !IS_TRIE(pn))
+				empty_child_inc(pn);
 		}
 	}
 
@@ -1945,11 +1915,6 @@ void __init fib_trie_init(void)
 	fn_alias_kmem = kmem_cache_create("ip_fib_alias",
 					  sizeof(struct fib_alias),
 					  0, SLAB_PANIC, NULL);
-
-	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
-					   sizeof(struct tnode) +
-					   sizeof(struct key_vector),
-					   0, SLAB_PANIC, NULL);
 }
 
 struct fib_table *fib_trie_table(u32 id)
@@ -1999,17 +1964,22 @@ static struct key_vector *fib_trie_get_next(struct fib_trie_iter *iter)
 
 	while (!IS_TRIE(pn)) {
 		while (cindex < child_length(pn)) {
-			struct key_vector *n = get_child_rcu(pn + cindex++);
-
-			if (!n)
-				continue;
+			struct key_vector *n = pn + cindex++;
 
 			if (IS_LEAF(n)) {
+				if (hlist_empty(&n->leaf))
+					continue;
+
 				iter->tnode = pn;
 				iter->index = cindex;
 			} else {
+				struct key_vector *tnode = get_child_rcu(n);
+
+				if (!tnode)
+					continue;
+
 				/* push down one level */
-				iter->tnode = n;
+				iter->tnode = tnode;
 				iter->index = 0;
 				++iter->depth;
 			}
@@ -2034,26 +2004,30 @@ static struct key_vector *fib_trie_get_next(struct fib_trie_iter *iter)
 static struct key_vector *fib_trie_get_first(struct fib_trie_iter *iter,
 					     struct trie *t)
 {
-	struct key_vector *n, *pn = t->kv;
+	struct key_vector *pn = t->kv;
 
 	if (!t)
 		return NULL;
 
-	n = rcu_dereference(pn->tnode);
-	if (!n)
-		return NULL;
+	if (IS_TNODE(pn)) {
+		struct key_vector *n = get_child_rcu(pn);
+
+		if (!n)
+			return NULL;
 
-	if (IS_TNODE(n)) {
 		iter->tnode = n;
 		iter->index = 0;
 		iter->depth = 1;
 	} else {
+		if (hlist_empty(&pn->leaf))
+			return NULL;
+
 		iter->tnode = pn;
 		iter->index = 0;
 		iter->depth = 0;
 	}
 
-	return n;
+	return pn;
 }
 
 static void trie_collect_stats(struct trie *t, struct trie_stat *s)
@@ -2079,7 +2053,7 @@ static void trie_collect_stats(struct trie *t, struct trie_stat *s)
 			s->tnodes++;
 			if (n->bits < MAX_STAT_DEPTH)
 				s->nodesizes[n->bits]++;
-			s->nullpointers += tn_info(n)->empty_children;
+			s->nullpointers += tn_info(iter.tnode)->empty_children;
 		}
 	}
 	rcu_read_unlock();
@@ -2124,7 +2098,7 @@ static void trie_show_stats(struct seq_file *seq, struct trie_stat *stat)
 	seq_printf(seq, "\tPointers: %u\n", pointers);
 
 	bytes += sizeof(struct key_vector) * pointers;
-	seq_printf(seq, "Empty vectors: %u\n", stat->nullpointers);
+	seq_printf(seq, "Empty leaves: %u\n", stat->nullpointers);
 	seq_printf(seq, "Total size: %u  kB\n", (bytes + 1023) / 1024);
 }
 
@@ -2177,7 +2151,7 @@ static int fib_triestat_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq,
 		   "Basic info: size of leaf:"
 		   " %Zd bytes, size of tnode: %Zd bytes.\n",
-		   TNODE_SIZE(1), TNODE_SIZE(0));
+		   sizeof(struct key_vector), TNODE_SIZE(0));
 
 	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
 		struct hlist_head *head = &net->ipv4.fib_table_hash[h];
@@ -2345,17 +2319,17 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 	const struct fib_trie_iter *iter = seq->private;
 	struct key_vector *n = v;
 
-	if (IS_TRIE(node_parent_rcu(n)))
+	if (IS_TRIE(n))
 		fib_table_print(seq, iter->tb);
 
 	if (IS_TNODE(n)) {
-		__be32 prf = htonl((n->key >> n->tn_pos) << n->tn_pos);
+		__be32 prf = htonl(n->key);
 
 		seq_indent(seq, iter->depth-1);
 		seq_printf(seq, "  +-- %pI4/%zu %u %u %u\n",
-			   &prf, KEYLENGTH - n->tn_pos - n->bits, n->bits,
-			   tn_info(n)->full_children,
-			   tn_info(n)->empty_children);
+			   &prf, KEYLENGTH - n->pos - n->bits, n->bits,
+			   tn_info(iter->tnode)->full_children,
+			   tn_info(iter->tnode)->empty_children);
 	} else {
 		__be32 val = htonl(n->key);
 		struct fib_alias *fa;


* Re: [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list
  2015-02-24 20:48 ` [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list Alexander Duyck
@ 2015-02-24 21:51   ` Or Gerlitz
  2015-02-24 21:52     ` Or Gerlitz
  2015-02-24 22:08     ` David Miller
  2015-02-24 22:47   ` Julian Anastasov
  2015-02-24 23:09   ` Julian Anastasov
  2 siblings, 2 replies; 39+ messages in thread
From: Or Gerlitz @ 2015-02-24 21:51 UTC (permalink / raw
  To: Alexander Duyck; +Cc: Linux Netdev List

On Tue, Feb 24, 2015 at 10:48 PM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
> There isn't any advantage to having it as a list and by making it an hlist
> we make the fib_alias more compatible with the list_info in terms of the
> type of list used.

Hi Alex, twenty-nine patches without a cover letter (unless it's on its
way) are really hard to grasp. Can you write something general about
this series, please?


* Re: [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list
  2015-02-24 21:51   ` Or Gerlitz
@ 2015-02-24 21:52     ` Or Gerlitz
  2015-02-24 22:08     ` David Miller
  1 sibling, 0 replies; 39+ messages in thread
From: Or Gerlitz @ 2015-02-24 21:52 UTC (permalink / raw
  To: Alexander Duyck; +Cc: Linux Netdev List

On Tue, Feb 24, 2015 at 11:51 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Feb 24, 2015 at 10:48 PM, Alexander Duyck
> <alexander.h.duyck@redhat.com> wrote:
>> There isn't any advantage to having it as a list and by making it an hlist
>> we make the fib_alias more compatible with the list_info in terms of the
>> type of list used.
>
> Hi Alex, twenty-nine patches without a cover letter (unless it's on its
> way) are really hard to grasp. Can you write something general about
> this series, please?

Sorry for the spam, it's there.


* Re: [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list
  2015-02-24 21:51   ` Or Gerlitz
  2015-02-24 21:52     ` Or Gerlitz
@ 2015-02-24 22:08     ` David Miller
  2015-02-24 22:14       ` Alexander Duyck
  1 sibling, 1 reply; 39+ messages in thread
From: David Miller @ 2015-02-24 22:08 UTC (permalink / raw
  To: gerlitz.or; +Cc: alexander.h.duyck, netdev

From: Or Gerlitz <gerlitz.or@gmail.com>
Date: Tue, 24 Feb 2015 23:51:40 +0200

> On Tue, Feb 24, 2015 at 10:48 PM, Alexander Duyck
> <alexander.h.duyck@redhat.com> wrote:
>> There isn't any advantage to having it as a list and by making it an hlist
>> we make the fib_alias more compatible with the list_info in terms of the
>> type of list used.
> 
> Hi Alex, twenty-nine patches without a cover letter (unless it's on its
> way) are really hard to grasp. Can you write something general about
> this series, please?

There was a 00/29 posting.


* Re: [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list
  2015-02-24 22:08     ` David Miller
@ 2015-02-24 22:14       ` Alexander Duyck
  0 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2015-02-24 22:14 UTC (permalink / raw
  To: David Miller, gerlitz.or; +Cc: netdev


On 02/24/2015 02:08 PM, David Miller wrote:
> From: Or Gerlitz <gerlitz.or@gmail.com>
> Date: Tue, 24 Feb 2015 23:51:40 +0200
>
>> On Tue, Feb 24, 2015 at 10:48 PM, Alexander Duyck
>> <alexander.h.duyck@redhat.com> wrote:
>>> There isn't any advantage to having it as a list and by making it an hlist
>>> we make the fib_alias more compatible with the list_info in terms of the
>>> type of list used.
>> Hi Alex, twenty-nine patches without a cover letter (unless it's on its
>> way) are really hard to grasp. Can you write something general about
>> this series, please?
> There was a 00/29 posting.

Yeah, Or saw it.  It just showed up late, I am guessing.  Even in my own
inbox it takes a few minutes for all of the patches to show up when I
post a set.

- Alex


* Re: [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list
  2015-02-24 20:48 ` [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list Alexander Duyck
  2015-02-24 21:51   ` Or Gerlitz
@ 2015-02-24 22:47   ` Julian Anastasov
  2015-02-24 23:09   ` Julian Anastasov
  2 siblings, 0 replies; 39+ messages in thread
From: Julian Anastasov @ 2015-02-24 22:47 UTC (permalink / raw
  To: Alexander Duyck; +Cc: netdev


	Hello,

On Tue, 24 Feb 2015, Alexander Duyck wrote:

> There isn't any advantage to having it as a list and by making it an hlist
> we make the fib_alias more compatible with the list_info in terms of the
> type of list used.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> ---
>  include/net/ip_fib.h     |    2 +
>  net/ipv4/fib_lookup.h    |    2 +
>  net/ipv4/fib_semantics.c |    4 +--
>  net/ipv4/fib_trie.c      |   72 ++++++++++++++++++++++++++--------------------
>  4 files changed, 44 insertions(+), 36 deletions(-)

> -static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
> +static struct hlist_head *fib_insert_node(struct trie *t, u32 key, int plen)

> @@ -1276,8 +1276,17 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>  	if (!plen)
>  		tb->tb_num_default++;
>  
> -	list_add_tail_rcu(&new_fa->fa_list,
> -			  (fa ? &fa->fa_list : fa_head));
> +	if (!fa) {
> +		struct fib_alias *last;
> +
> +		hlist_for_each_entry(last, fa_head, fa_list)
> +			fa = last;

	This should be changed to properly replace add_tail:
when fa is NULL we are adding new_fa with the lowest TOS value,
so it really should be added at the tail. When fa != NULL, new_fa
should go before fa. It looks like this comment is wrong:

         * If fa is NULL, we will need to allocate a new one and
         * insert to the head of f.

	It should be "to the tail of fa_head".

> +	}
> +
> +	if (fa)
> +		hlist_add_behind_rcu(&new_fa->fa_list, &fa->fa_list);

		This should be hlist_add_before_rcu, because
list_add_tail_rcu on an entry means add_before.

> +	else
> +		hlist_add_head_rcu(&new_fa->fa_list, fa_head);

		Here use hlist_add_behind_rcu after the last entry, or
hlist_add_head_rcu if the list is empty.

	What about:

	/* fa_last can come from fib_find_alias */
	struct fib_alias *fa_last = NULL;

	fa = fib_find_alias(fa_head, tos, fi->fib_priority, &fa_last);
	...

	if (fa)
		hlist_add_before_rcu(&new_fa->fa_list, &fa->fa_list);
	else if (!fa_last)
		hlist_add_head_rcu(&new_fa->fa_list, fa_head);
	else
		hlist_add_behind_rcu(&new_fa->fa_list, &fa_last->fa_list);
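
	To make the three cases concrete, here is a minimal userspace
sketch. It is only an illustration under assumptions: struct alias,
add_head(), add_before(), add_behind(), find_pos() and insert_sorted()
are hypothetical stand-ins written for this example, not the kernel's
hlist or RCU helpers, and find_pos() is a much simplified version of
what fib_find_alias would have to report (it orders on TOS only).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fib_alias; only what the sketch needs. */
struct alias {
	struct alias *next;
	struct alias **pprev;	/* address of the pointer that points at us */
	int tos;
};

struct alias_head {
	struct alias *first;
};

/* Rough analogue of hlist_add_head(): new becomes the first entry. */
static void add_head(struct alias *new, struct alias_head *h)
{
	new->next = h->first;
	if (h->first)
		h->first->pprev = &new->next;
	h->first = new;
	new->pprev = &h->first;
}

/* Rough analogue of hlist_add_before(): new goes in front of pos. */
static void add_before(struct alias *new, struct alias *pos)
{
	new->pprev = pos->pprev;
	new->next = pos;
	pos->pprev = &new->next;
	*new->pprev = new;
}

/* Rough analogue of hlist_add_behind(): new goes right after prev. */
static void add_behind(struct alias *new, struct alias *prev)
{
	assert(prev != NULL);
	new->next = prev->next;
	prev->next = new;
	new->pprev = &prev->next;
	if (new->next)
		new->next->pprev = &new->next;
}

/* Simplified search: return the first entry with a lower TOS than tos,
 * or NULL if there is none.  *last is the entry visited just before the
 * returned position (NULL if nothing was visited).
 */
static struct alias *find_pos(struct alias_head *h, int tos,
			      struct alias **last)
{
	struct alias *a;

	*last = NULL;
	for (a = h->first; a; a = a->next) {
		if (a->tos < tos)
			return a;
		*last = a;
	}
	return NULL;
}

/* The three-way insert: before fa, at the head of an empty list,
 * or behind the last entry.
 */
static void insert_sorted(struct alias_head *h, struct alias *new)
{
	struct alias *fa_last;
	struct alias *fa = find_pos(h, new->tos, &fa_last);

	if (fa)
		add_before(new, fa);
	else if (!fa_last)
		add_head(new, h);
	else
		add_behind(new, fa_last);
}
```

	Inserting entries with TOS 8, 0, 4 and 16 in that order goes
through all three branches and leaves the list ordered from highest
to lowest TOS.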

Regards

--
Julian Anastasov <ja@ssi.bg>


* Re: [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list
  2015-02-24 20:48 ` [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list Alexander Duyck
  2015-02-24 21:51   ` Or Gerlitz
  2015-02-24 22:47   ` Julian Anastasov
@ 2015-02-24 23:09   ` Julian Anastasov
  2 siblings, 0 replies; 39+ messages in thread
From: Julian Anastasov @ 2015-02-24 23:09 UTC (permalink / raw
  To: Alexander Duyck; +Cc: netdev


	Hello,

On Tue, 24 Feb 2015, Alexander Duyck wrote:

> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 3daf022..e0d44b7 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c

> @@ -1192,8 +1193,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>  		 */
>  		fa_match = NULL;
>  		fa_first = fa;
> -		fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list);
> -		list_for_each_entry_continue(fa, fa_head, fa_list) {
> +		hlist_for_each_entry_from(fa, fa_list) {
>  			if (fa->fa_tos != tos)
>  				break;
>  			if (fa->fa_info->fib_priority != fi->fib_priority)

	Also, because this loop, when implemented with an hlist,
can exit with fa = NULL, we have to set fa_last = fa at the
end of each iteration:

		hlist_for_each_entry_from(fa, fa_list) {
			if (fa->fa_tos != tos)
				break;
			if (fa->fa_info->fib_priority != fi->fib_priority)
				break;
			if (fa->fa_type == cfg->fc_type &&
			    fa->fa_info == fi) {
				fa_match = fa;
				break;
			}
+			fa_last = fa;
		}

	This is needed so that NLM_F_APPEND can work.
Basically, where list_for_* ends with pos pointing to the head,
hlist_for_* ends with NULL.
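
	A small userspace sketch of that difference, assuming a plain
NULL-terminated singly linked list in place of the kernel hlist
(struct entry and scan_run() are hypothetical names used only here):
once the cursor has run off the end there is nothing left to append
behind, so the last visited entry has to be remembered separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a fib_alias on a NULL-terminated list. */
struct entry {
	struct entry *next;
	int tos;
};

/* Walk forward from start while entries share the given tos.
 * Returns the entry the loop stopped at, which is NULL when the run
 * reaches the end of the list.  *last is updated to the last entry
 * that was still part of the run and is left untouched if there was
 * none.
 */
static struct entry *scan_run(struct entry *start, int tos,
			      struct entry **last)
{
	struct entry *e;

	assert(last != NULL);

	for (e = start; e; e = e->next) {
		if (e->tos != tos)
			break;
		*last = e;
	}
	return e;
}
```

	When the run ends in the middle of the list the returned
cursor is still usable for an add_before; when it ends at the tail
the cursor is NULL and only *last says where an append belongs.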

Regards

--
Julian Anastasov <ja@ssi.bg>


* Re: [RFC PATCH 00/29] Phase 2 of fib_trie updates
  2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
                   ` (28 preceding siblings ...)
  2015-02-24 20:51 ` [RFC PATCH 29/29] fib_trie: Push bits up one level, and move leaves up into parent key_vector array Alexander Duyck
@ 2015-02-25  3:53 ` David Miller
  2015-02-25  5:12   ` Alexander Duyck
  29 siblings, 1 reply; 39+ messages in thread
From: David Miller @ 2015-02-25  3:53 UTC (permalink / raw
  To: alexander.h.duyck; +Cc: netdev

From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Tue, 24 Feb 2015 12:47:55 -0800

> This patch series implements the second phase of the fib_trie changes.  I
> presented on these and the previous changes at Netdev01 and netconf.  The
> slides for the Netdev01 presentation can be found at
> https://www.netdev01.org/docs/duyck-fib-trie.pdf.
> 
> I'm currently debating if I should just submit the entire patch-set as-is
> or if I should hold off on submitting the last 10 patches as they currently
> have a potential performance impact in the case of a large number of
> entries placed in the local table.  Specifically I have seen that removing
> an interface in the case of 8K local subnets being configured on it
> resulted in the time for a dummy interface being removed increasing from
> about .6 seconds to 2.4 seconds.  I am not sure how common of a use-case
> something like this would be.  I have not seen the same issue if I assign
> 8K routes to the interface as I believe the fib_table_flush aggregates them
> all in to one resize action.
> 
> The entire series reduces the total look-up time by another 20-35% versus
> what is currently in the 4.0-rc1 kernel.  So for example a set of routing
> look-ups which took 140ns in the 4.0-rc1 kernel will now only take about
> 105ns after these patches.

I did a quick once-over for these changes and conceptually they look
fine.

Why are sequences of removals so much more costly now?  Is it because
of the maintenance of the information in the parent when rebalancing?

In any event, I'll say two things:

1) You should submit these changes in smaller batches anyways.
   It's easier to review and get small sets of transformations
   tested as a unit.

2) For the device removal case, we can batch the inet addr removal
   based route delete operations, and thus mitigate the rebalancing
   costs.


* Re: [RFC PATCH 00/29] Phase 2 of fib_trie updates
  2015-02-25  3:53 ` [RFC PATCH 00/29] Phase 2 of fib_trie updates David Miller
@ 2015-02-25  5:12   ` Alexander Duyck
  2015-02-27 21:01     ` David Miller
  0 siblings, 1 reply; 39+ messages in thread
From: Alexander Duyck @ 2015-02-25  5:12 UTC (permalink / raw
  To: David Miller, alexander.h.duyck; +Cc: netdev

On 02/24/2015 07:53 PM, David Miller wrote:
> From: Alexander Duyck <alexander.h.duyck@redhat.com>
> Date: Tue, 24 Feb 2015 12:47:55 -0800
>
>> This patch series implements the second phase of the fib_trie changes.  I
>> presented on these and the previous changes at Netdev01 and netconf.  The
>> slides for the Netdev01 presentation can be found at
>> https://www.netdev01.org/docs/duyck-fib-trie.pdf.
>>
>> I'm currently debating if I should just submit the entire patch-set as-is
>> or if I should hold off on submitting the last 10 patches as they currently
>> have a potential performance impact in the case of a large number of
>> entries placed in the local table.  Specifically I have seen that removing
>> an interface in the case of 8K local subnets being configured on it
>> resulted in the time for a dummy interface being removed increasing from
>> about .6 seconds to 2.4 seconds.  I am not sure how common of a use-case
>> something like this would be.  I have not seen the same issue if I assign
>> 8K routes to the interface as I believe the fib_table_flush aggregates them
>> all in to one resize action.
>>
>> The entire series reduces the total look-up time by another 20-35% versus
>> what is currently in the 4.0-rc1 kernel.  So for example a set of routing
>> look-ups which took 140ns in the 4.0-rc1 kernel will now only take about
>> 105ns after these patches.
> I did a quick once-over for these changes and conceptually they look
> fine.
>
> Why are sequences of removals so much more costly now?  Is it because
> of the maintenance of the information in the parent when rebalancing?
>
> In any event, I'll say two things:
>
> 1) You should submit these changes in smaller batches anyways.
>    It's easier to review and get small sets of transformations
>    tested as a unit.

Yeah, these will probably be submitted as 3 sets.  The first being the
leaf_info removal, then the key_vector stuff, and finally reworking the
RCU and pushing everything up one level so the pointer and key info
occupy the same cache line.

> 2) For the device removal case, we can batch the inet addr removal
>    based route delete operations, and thus mitigate the rebalancing
>    costs.

The problem is that the tnodes are now split over 2 cache lines.  As a
result in order to resize a node, or replace it with the leaf contained
in the node you end up having to replace the parent of the node as well. 

As it turns out, dropping a subnet from the local trie occurs in two
steps.  The first appears to drop the broadcast addresses and flush
them.  This causes some significant overhead, since it means the
kernel has to reallocate the 8K child tnode as each subnet/child
collapses from a 4 child tnode to just a leaf.  Then it looks like the
kernel goes through and deletes the local addresses that were there
for each subnet one at a time.  This was much cheaper in the old setup
since it was just a matter of swapping a pointer instead of having to
update a pointer and key information.
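
A rough userspace sketch of that difference, under assumptions: the
struct layouts below are simplified, hypothetical stand-ins for the
old and new arrangements (they are not the structures from the
patches), and FANOUT, old_replace() and new_replace() exist only for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FANOUT 4	/* hypothetical number of child slots */

typedef unsigned int t_key;

/* Old arrangement (assumed): the parent only holds pointers, the key
 * information lives in the child itself.
 */
struct old_node {
	t_key key;
	unsigned char pos, bits, slen;
};

struct old_parent {
	struct old_node *child[FANOUT];
};

/* New arrangement (assumed): key information sits next to the pointer
 * inside the parent's array.
 */
struct new_vec {
	t_key key;
	unsigned char pos, bits, slen;
	void *ptr;
};

struct new_parent {
	struct new_vec child[FANOUT];
};

/* Replacing a child is a single pointer store. */
static void old_replace(struct old_parent *p, size_t i, struct old_node *n)
{
	assert(i < FANOUT);
	p->child[i] = n;
}

/* Replacing a child copies several fields; this cannot be done as one
 * pointer-sized store.
 */
static void new_replace(struct new_parent *p, size_t i,
			const struct new_vec *v)
{
	assert(i < FANOUT);
	memcpy(&p->child[i], v, sizeof(*v));
}
```

Since the second form is a multi-word update, readers walking the trie
concurrently could observe a half-written slot if it were done in
place, which is why the parent ends up being replaced as well.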

- Alex


* Re: [RFC PATCH 00/29] Phase 2 of fib_trie updates
  2015-02-25  5:12   ` Alexander Duyck
@ 2015-02-27 21:01     ` David Miller
  0 siblings, 0 replies; 39+ messages in thread
From: David Miller @ 2015-02-27 21:01 UTC (permalink / raw
  To: alexander.duyck; +Cc: alexander.h.duyck, netdev

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Tue, 24 Feb 2015 21:12:48 -0800

> The problem is that the tnodes are now split over 2 cache lines.  As a
> result in order to resize a node, or replace it with the leaf contained
> in the node you end up having to replace the parent of the node as well. 

Therefore I still think that batching the rebalancing across all of the
per-device address removals will help a lot here.

This is true with or without your fib_trie modifications.



Thread overview: 39+ messages
2015-02-24 20:47 [RFC PATCH 00/29] Phase 2 of fib_trie updates Alexander Duyck
2015-02-24 20:48 ` [RFC PATCH 01/29] fib_trie: Convert fib_alias to hlist from list Alexander Duyck
2015-02-24 21:51   ` Or Gerlitz
2015-02-24 21:52     ` Or Gerlitz
2015-02-24 22:08     ` David Miller
2015-02-24 22:14       ` Alexander Duyck
2015-02-24 22:47   ` Julian Anastasov
2015-02-24 23:09   ` Julian Anastasov
2015-02-24 20:48 ` [RFC PATCH 02/29] fib_trie: Replace plen with slen in leaf_info Alexander Duyck
2015-02-24 20:48 ` [RFC PATCH 03/29] fib_trie: Add slen to fib alias Alexander Duyck
2015-02-24 20:48 ` [RFC PATCH 04/29] fib_trie: Remove leaf_info Alexander Duyck
2015-02-24 20:48 ` [RFC PATCH 05/29] fib_trie: Only resize N/2 times instead N * log(N) times in fib_table_flush Alexander Duyck
2015-02-24 20:48 ` [RFC PATCH 06/29] fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf Alexander Duyck
2015-02-24 20:48 ` [RFC PATCH 07/29] fib_trie: Fib find node should return parent Alexander Duyck
2015-02-24 20:48 ` [RFC PATCH 08/29] fib_trie: Update insert and delete to make use of tp from find_node Alexander Duyck
2015-02-24 20:48 ` [RFC PATCH 09/29] fib_trie: Make fib_table rcu safe Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 10/29] fib_trie: Return pointer to tnode pointer in resize/inflate/halve Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 11/29] fib_trie: Rename tnode to key_vector Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 12/29] fib_trie: move leaf and tnode to occupy the same spot in the key vector Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 13/29] fib_trie: replace tnode_get_child functions with get_child macros Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 14/29] fib_trie: Rename tnode_child_length to child_length Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 15/29] fib_trie: Add tnode struct as a container for fields not needed in key_vector Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 16/29] fib_trie: Move rcu from key_vector to tnode, add accessors Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 17/29] fib_trie: Pull empty_children and full_children into tnode Alexander Duyck
2015-02-24 20:49 ` [RFC PATCH 18/29] fib_trie: Move parent from key_vector to tnode Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 19/29] fib_trie: Add key vector to root, return parent key_vector in resize Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 20/29] fib_trie: Push net pointer down into fib_trie insert/delete/flush calls Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 21/29] fib_trie: Rewrite handling of RCU to include parent in replacement Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 22/29] fib_trie: Allocate tnode as array of key_vectors instead of key_vector as array of tnode pointers Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 23/29] fib_trie: Add leaf_init Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 24/29] fib_trie: Update tnode_new to drop use of put_child_root Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 25/29] fib_trie: Add function for dropping children from trie Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 26/29] fib_trie: Use put_child to only copy key_vectors instead of pointers Alexander Duyck
2015-02-24 20:50 ` [RFC PATCH 27/29] fib_trie: Move key and pos into key_vector from tnode Alexander Duyck
2015-02-24 20:51 ` [RFC PATCH 28/29] fib_trie: Move slen from tnode to key vector Alexander Duyck
2015-02-24 20:51 ` [RFC PATCH 29/29] fib_trie: Push bits up one level, and move leaves up into parent key_vector array Alexander Duyck
2015-02-25  3:53 ` [RFC PATCH 00/29] Phase 2 of fib_trie updates David Miller
2015-02-25  5:12   ` Alexander Duyck
2015-02-27 21:01     ` David Miller
