From mboxrd@z Thu Jan 1 00:00:00 1970 From: Thomas Monjalon Subject: Re: [PATCH v7 0/7] Cuckoo hash - part 3 of Cuckoo hash Date: Mon, 13 Jul 2015 00:46:56 +0200 Message-ID: <2632377.THyBr7JfX0@xps13> References: <1436571020-16252-1-git-send-email-pablo.de.lara.guarch@intel.com> <1436573936-15956-1-git-send-email-pablo.de.lara.guarch@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7Bit Cc: dev@dpdk.org To: Pablo de Lara Return-path: Received: from mail-wi0-f173.google.com (mail-wi0-f173.google.com [209.85.212.173]) by dpdk.org (Postfix) with ESMTP id ABFD45A86 for ; Mon, 13 Jul 2015 00:48:08 +0200 (CEST) Received: by wicmz13 with SMTP id mz13so47759535wic.0 for ; Sun, 12 Jul 2015 15:48:08 -0700 (PDT) In-Reply-To: <1436573936-15956-1-git-send-email-pablo.de.lara.guarch@intel.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" 2015-07-11 01:18, Pablo de Lara: > This patchset is to replace the existing hash library with > a more efficient and functional approach, using the Cuckoo hash > method to deal with collisions. This method is based on using > two different hash functions to have two possible locations > in the hash table where an entry can be. > So, if a bucket is full, a new entry can push one of the items > in that bucket to its alternative location, making space for itself. > > Advantages > ~~ > - Offers the option to store more entries when the target bucket is full > (unlike the previous implementation) > - Memory efficient: for storing those entries, it is not necessary to > request new memory, as the entries will be stored in the same table > - Constant worst lookup time: in worst case scenario, it always takes > the same time to look up an entry, as there are only two possible locations > where an entry can be. > - Storing data: user can store data in the hash table, unlike the > previous implementation, but he can still use the old API > > This implementation typically offers over 90% utilization. > Notice that API has been extended, but old API remains. > Check documentation included to know more about this new implementation > (including how entries are distributed as table utilization increases). > > Changes in v7: > - Fix inaccurate documentation > > Changes in v6: > - Replace datatype for functions from uintptr_t to void * > > Changes in v5: > - Fix 32-bit compilation issues > > Changes in v4: > - Unit tests enhancements are not part of this patchset anymore. > - rte_hash structure has been made internal in another patch, > so it is not part of this patchset anymore. > - Add function to iterate through the hash table, as rte_hash > structure has been made private. > - Added extra_flag parameter in rte_hash_parameter to be able > to add new parameters in the future without breaking the ABI > - Remove proposed lookup_bulk_with_hash function, as it is > not of much use with the existing hash functions > (there are no vector hash functions). > - User can store 8-byte integer or pointer as data, instead > of variable size data, as discussed in the mailing list. > > Changes in v3: > > - Now user can store variable size data, instead of 32 or 64-bit size data, > using the new parameter "data_len" in rte_hash_parameters > - Add lookup_bulk_with_hash function in performance unit tests > - Add new functions that handle data in performance unit tests > - Remove duplicates in performance unit tests > - Fix rte_hash_reset, which was not resetting the last entry > > Changes in v2: > > - Fixed issue where table could not store maximum number of entries > - Fixed issue where lookup burst could not be more than 32 (instead of 64) > - Remove unnecessary macros and add other useful ones > - Added missing library dependencies > - Used directly rte_hash_secondary instead of rte_hash_alt > - Renamed rte_hash.c to rte_cuckoo_hash.c to ease the view of the new library > - Renamed test_hash_perf.c temporarily to ease the view of the improved unit test > - Moved rte_hash, rte_bucket and rte_hash_key structures to rte_cuckoo_hash.c to > make them private > - Corrected copyright dates > - Added an optimized function to compare keys that are multiple of 16 bytes > - Improved the way to use primary/secondary signatures. Now both are stored in > the bucket, so there is no need to calculate them if required. > Also, there is no need to use the MSB of a signature to differenciate between > an empty entry and signature 0, since we are storing both signatures, > which cannot be both 0. > - Removed rte_hash_rehash, as it was a very expensive operation. > Therefore, the add function returns now -ENOSPC if key cannot be added > because of a loop. > - Prefetched new slot for new key in add function to improve performance. > - Made doxygen comments more clear. > - Removed unnecessary rte_hash_del_key_data and rte_hash_del_key_with_data, > as we can use the lookup functions if we want to get the data before deleting. > - Removed some unnecessary includes in rte_hash.h > - Removed some unnecessary variables in rte_cuckoo_hash.c > - Removed some unnecessary checks before creating a new hash table > - Added documentation (in release notes and programmers guide) > - Added new unit tests and replaced the performance one for hash tables > > Series Acked-by: Bruce Richardson Applied, thanks Some bugs/formatting were fixed in fly. Some remaining comments may be addressed in further patches.