From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Monjalon <thomas.monjalon@6wind.com>
Subject: Re: [PATCH v7 0/7] Cuckoo hash - part 3 of Cuckoo hash
Date: Mon, 13 Jul 2015 00:46:56 +0200
Message-ID: <2632377.THyBr7JfX0@xps13>
References: <1436571020-16252-1-git-send-email-pablo.de.lara.guarch@intel.com>
 <1436573936-15956-1-git-send-email-pablo.de.lara.guarch@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7Bit
Cc: dev@dpdk.org
To: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-wi0-f173.google.com (mail-wi0-f173.google.com
 [209.85.212.173]) by dpdk.org (Postfix) with ESMTP id ABFD45A86
 for <dev@dpdk.org>; Mon, 13 Jul 2015 00:48:08 +0200 (CEST)
Received: by wicmz13 with SMTP id mz13so47759535wic.0
 for <dev@dpdk.org>; Sun, 12 Jul 2015 15:48:08 -0700 (PDT)
In-Reply-To: <1436573936-15956-1-git-send-email-pablo.de.lara.guarch@intel.com>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

2015-07-11 01:18, Pablo de Lara:
> This patchset is to replace the existing hash library with
> a more efficient and functional approach, using the Cuckoo hash
> method to deal with collisions. This method is based on using
> two different hash functions to have two possible locations
> in the hash table where an entry can be.
> So, if a bucket is full, a new entry can push one of the items
> in that bucket to its alternative location, making space for itself.
> 
> Advantages
> ~~
> - Offers the option to store more entries when the target bucket is full
>   (unlike the previous implementation)
> - Memory efficient: for storing those entries, it is not necessary to
>   request new memory, as the entries will be stored in the same table
> - Constant worst lookup time: in worst case scenario, it always takes
>   the same time to look up an entry, as there are only two possible locations
>   where an entry can be.
> - Storing data: user can store data in the hash table, unlike the
>   previous implementation, but he can still use the old API
> 
> This implementation typically offers over 90% utilization.
> Notice that API has been extended, but old API remains.
> Check documentation included to know more about this new implementation
> (including how entries are distributed as table utilization increases).
> 
> Changes in v7:
> - Fix inaccurate documentation
> 
> Changes in v6:
> - Replace datatype for functions from uintptr_t to void *
> 
> Changes in v5:
> - Fix 32-bit compilation issues
> 
> Changes in v4:
> - Unit tests enhancements are not part of this patchset anymore.
> - rte_hash structure has been made internal in another patch,
>   so it is not part of this patchset anymore.
> - Add function to iterate through the hash table, as rte_hash
>   structure has been made private.
> - Added extra_flag parameter in rte_hash_parameter to be able
>   to add new parameters in the future without breaking the ABI
> - Remove proposed lookup_bulk_with_hash function, as it is
>   not of much use with the existing hash functions
>   (there are no vector hash functions).
> - User can store 8-byte integer or pointer as data, instead
>   of variable size data, as discussed in the mailing list.
> 
> Changes in v3:
> 
> - Now user can store variable size data, instead of 32 or 64-bit size data,
>   using the new parameter "data_len" in rte_hash_parameters
> - Add lookup_bulk_with_hash function in performance  unit tests
> - Add new functions that handle data in performance unit tests
> - Remove duplicates in performance unit tests
> - Fix rte_hash_reset, which was not resetting the last entry
> 
> Changes in v2:
> 
> - Fixed issue where table could not store maximum number of entries
> - Fixed issue where lookup burst could not be more than 32 (instead of 64)
> - Remove unnecessary macros and add other useful ones
> - Added missing library dependencies
> - Used directly rte_hash_secondary instead of rte_hash_alt
> - Renamed rte_hash.c to rte_cuckoo_hash.c to ease the view of the new library
> - Renamed test_hash_perf.c temporarily to ease the view of the improved unit test
> - Moved rte_hash, rte_bucket and rte_hash_key structures to rte_cuckoo_hash.c to
>   make them private
> - Corrected copyright dates
> - Added an optimized function to compare keys that are multiple of 16 bytes
> - Improved the way to use primary/secondary signatures. Now both are stored in
>   the bucket, so there is no need to calculate them if required.
>   Also, there is no need to use the MSB of a signature to differenciate between
>   an empty entry and signature 0, since we are storing both signatures,
>   which cannot be both 0.
> - Removed rte_hash_rehash, as it was a very expensive operation.
>   Therefore, the add function returns now -ENOSPC if key cannot be added
>   because of a loop.
> - Prefetched new slot for new key in add function to improve performance.
> - Made doxygen comments more clear.
> - Removed unnecessary rte_hash_del_key_data and rte_hash_del_key_with_data,
>   as we can use the lookup functions if we want to get the data before deleting.
> - Removed some unnecessary includes in rte_hash.h
> - Removed some unnecessary variables in rte_cuckoo_hash.c
> - Removed some unnecessary checks before creating a new hash table
> - Added documentation (in release notes and programmers guide)
> - Added new unit tests and replaced the performance one for hash tables
> 
> Series Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thanks

Some bugs/formatting were fixed in fly.
Some remaining comments may be addressed in further patches.