From mboxrd@z Thu Jan 1 00:00:00 1970 From: "De Lara Guarch, Pablo" Subject: Re: [PATCH] hash: update jhash function with the latest available Date: Fri, 17 Apr 2015 16:03:44 +0000 Message-ID: References: <1429190819-27402-1-git-send-email-pablo.de.lara.guarch@intel.com> <20150416140115.GA10552@bricha3-MOBL3> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Cc: "dev-VfR2kkLFssw@public.gmane.org" To: "Richardson, Bruce" Return-path: In-Reply-To: <20150416140115.GA10552@bricha3-MOBL3> Content-Language: en-US List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org Sender: "dev" > -----Original Message----- > From: Richardson, Bruce > Sent: Thursday, April 16, 2015 3:01 PM > To: De Lara Guarch, Pablo > Cc: dev-VfR2kkLFssw@public.gmane.org > Subject: Re: [dpdk-dev] [PATCH] hash: update jhash function with the late= st > available >=20 > On Thu, Apr 16, 2015 at 02:26:59PM +0100, Pablo de Lara wrote: > > Jenkins hash function was developed originally in 1996, > > and was integrated in first versions of DPDK. > > The function has been improved in 2006, > > achieving up to 60% better performance, compared to the original one. > > > > Check out: http://burtleburtle.net/bob/c/lookup3.c > > > > This patch integrates that code in the rte_jhash library, > > adding also a new function rte_jhash_word2, > > that returns two different hash values, for a single key. > > >=20 > Should the addition of the new functionality not be a separate patch from > the > update to the existing code? True, actually, I miss one extra function (2 in total). > Also, do the new functions return the exact same values as the previous > versions, > just faster? The new functions return different values from the previous version AND faster (some cases, MUCH faster) [...] > > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN > > + case 12: > > + c +=3D k[2]; b +=3D k[1]; a +=3D k[0]; break; > > + case 11: > > + c +=3D k[2]&0xffffff; b +=3D k[1]; a +=3D k[0]; break; > > + case 10: > > + c +=3D k[2]&0xffff; b +=3D k[1]; a +=3D k[0]; break; > > + case 9: > > + c +=3D k[2]&0xff; b +=3D k[1]; a +=3D k[0]; break; > > + case 8: > > + b +=3D k[1]; a +=3D k[0]; break; > > + case 7: > > + b +=3D k[1]&0xffffff; a +=3D k[0]; break; > > + case 6: > > + b +=3D k[1]&0xffff; a +=3D k[0]; break; > > + case 5: > > + b +=3D k[1]&0xff; a +=3D k[0]; break; > > + case 4: > > + a +=3D k[0]; break; > > + case 3: > > + a +=3D k[0]&0xffffff; break; > > + case 2: > > + a +=3D k[0]&0xffff; break; > > + case 1: > > + a +=3D k[0]&0xff; break; > > +#else > > + case 12: > > + c +=3D k[2]; b +=3D k[1]; a +=3D k[0]; break; > > + case 11: > > + c +=3D k[2]&0xffffff00; b +=3D k[1]; a +=3D k[0]; break; > > + case 10: > > + c +=3D k[2]&0xffff0000; b +=3D k[1]; a +=3D k[0]; break; > > + case 9: > > + c +=3D k[2]&0xff000000; b +=3D k[1]; a +=3D k[0]; break; > > + case 8: > > + b +=3D k[1]; a +=3D k[0]; break; > > + case 7: > > + b +=3D k[1]&0xffffff00; a +=3D k[0]; break; > > + case 6: > > + b +=3D k[1]&0xffff0000; a +=3D k[0]; break; > > + case 5: > > + b +=3D k[1]&0xff000000; a +=3D k[0]; break; > > + case 4: > > + a +=3D k[0]; break; > > + case 3: > > + a +=3D k[0]&0xffffff00; break; > > + case 2: > > + a +=3D k[0]&0xffff0000; break; > > + case 1: > > + a +=3D k[0]&0xff000000; break; > > +#endif >=20 > Only the constants seem different in this block. Can we get rid of the > #ifdefs using rte_XX_to_cpu() calls instead? Will add that in next version. >=20 > > + /* zero length strings require no mixing */ > > + case 0: > > + return c; > > + }; > > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN > > + } else if ((u.i & 0x1) =3D=3D 0) { > > + /* read 16-bit chunks */ > > + const uint16_t *k =3D (const uint16_t *)key; > > + const uint8_t *k8; > > + > > + /* all but last block: aligned reads and different mixing */ > > + while (length > 12) { > > + a +=3D k[0] + (((uint32_t)k[1])<<16); > > + b +=3D k[2] + (((uint32_t)k[3])<<16); > > + c +=3D k[4] + (((uint32_t)k[5])<<16); > > + > > + __rte_jhash_mix(a, b, c); > > + > > + k +=3D 6; > > + length -=3D 12; > > + } > > + > > + /* handle the last (probably partial) block */ > > + k8 =3D (const uint8_t *)k; > > + switch (length) { > > + case 12: > > + c +=3D k[4]+(((uint32_t)k[5])<<16); > > + b +=3D k[2]+(((uint32_t)k[3])<<16); > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 11: > > + /* fall through */ > > + c +=3D ((uint32_t)k8[10])<<16; > > + case 10: > > + c +=3D k[4]; > > + b +=3D k[2]+(((uint32_t)k[3])<<16); > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 9: > > + /* fall through */ > > + c +=3D k8[8]; > > + case 8: > > + b +=3D k[2]+(((uint32_t)k[3])<<16); > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 7: > > + /* fall through */ > > + b +=3D ((uint32_t)k8[6])<<16; > > + case 6: > > + b +=3D k[2]; > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 5: > > + /* fall through */ > > + b +=3D k8[4]; > > + case 4: > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 3: > > + /* fall through */ > > + a +=3D ((uint32_t)k8[2])<<16; > > + case 2: > > + a +=3D k[0]; > > + break; > > + case 1: > > + a +=3D k8[0]; > > + break; > > + case 0: > > + /* zero length requires no mixing */ > > + return c; > > + } > > +#endif >=20 > No else block for this ifdef? According to the code, for big endian, it only covers 4-byte alignment and = rest of cases. >=20 > > + } else { > > + const uint8_t *k =3D (const uint8_t *)key; > > + > > + /* all but the last block: affect some 32 bits of (a, b, c) */ > > + while (length > 12) { > > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN > > + a +=3D k[0]; > > + a +=3D ((uint32_t)k[1])<<8; > > + a +=3D ((uint32_t)k[2])<<16; > > + a +=3D ((uint32_t)k[3])<<24; > > + b +=3D k[4]; > > + b +=3D ((uint32_t)k[5])<<8; > > + b +=3D ((uint32_t)k[6])<<16; > > + b +=3D ((uint32_t)k[7])<<24; > > + c +=3D k[8]; > > + c +=3D ((uint32_t)k[9])<<8; > > + c +=3D ((uint32_t)k[10])<<16; > > + c +=3D ((uint32_t)k[11])<<24; > > +#else > > + a +=3D ((uint32_t)k[0])<<24; > > + a +=3D ((uint32_t)k[1])<<16; > > + a +=3D ((uint32_t)k[2])<<8; > > + a +=3D ((uint32_t)k[3]); > > + b +=3D ((uint32_t)k[4])<<24; > > + b +=3D ((uint32_t)k[5])<<16; > > + b +=3D ((uint32_t)k[6])<<8; > > + b +=3D ((uint32_t)k[7]); > > + c +=3D ((uint32_t)k[8])<<32; > > + c +=3D ((uint32_t)k[9])<<16; > > + c +=3D ((uint32_t)k[10])<<8; > > + c +=3D ((uint32_t)k[11]); > > +#endif >=20 > Maybe find a better way to shorten/remove this #ifdef also. E.g. shorter > ifdef defining macros for the different shift amounts, 0, 8, 16, 24. Agree. Will change that in v2. [...] Thanks for the comments. I will send a v2 soon.