From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1543210A3D; Thu, 25 Apr 2024 02:43:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714012984; cv=none; b=s44ZgP8au/zsmv16OKnzJBKYPDYp9flwuo4C/mAI+/mGIDHgiRfNhJ8N8CJN/LoPTivJc7RxuV3PK85YA9YWZoTMI0sM5Gt8YcWxF49U1UF9nVeL8UJGsFCkArvj5M5QPjiqK0IhQ4J9v5Kqms5p80e12vCYfXdGbrROeKzlX8o= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714012984; c=relaxed/simple; bh=yyzFqzoobxjUHohdb0ArBTlNL0goc/vEwXBKTWCqHmE=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=hktn/zztd3+XfsMPcnnHJPc0jVz1vPrJ2Dj7Y7f1c863/AWO6hTVObLRtRep9asI88o6JypGu+0KyecLlzW4ELkSMazw7SPVzFpPZJSF024x9wkUtu09A10SRwVPljbkzIMZGj8/AcVQ/ahRKXtWlLCw4xrYbEfzFwzZkswfUZ4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4VQ0Ww0xlgz4f3kkF; Thu, 25 Apr 2024 10:42:52 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id F30421A0179; Thu, 25 Apr 2024 10:42:56 +0800 (CST) Received: from [10.67.111.192] (unknown [10.67.111.192]) by APP3 (Coremail) with SMTP id _Ch0CgAXFY0twylmzAjnKg--.19671S2; Thu, 25 Apr 2024 10:42:54 +0800 (CST) Message-ID: <2ef84dfa-44a9-4d4c-b3b2-9d0b2a2e0d8e@huaweicloud.com> Date: Thu, 25 Apr 2024 10:42:53 +0800 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH bpf-next v3 07/11] bpf: Fix a false rejection caused by AND operation Content-Language: en-US To: Yonghong Song , Eduard Zingerman , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Song Liu , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Matt Bobrowski , Brendan Jackman , Paul Moore , James Morris , "Serge E . Hallyn" , Khadija Kamran , Casey Schaufler , Ondrej Mosnacek , Kees Cook , John Johansen , Lukas Bulwahn , Roberto Sassu , Shung-Hsi Yu References: <20240411122752.2873562-1-xukuohai@huaweicloud.com> <20240411122752.2873562-8-xukuohai@huaweicloud.com> <576c7c44-d1b4-42c8-8b6e-2e6b93d7547a@huaweicloud.com> <3ed8b579-8342-4d74-9050-b0bf6afe5ab3@linux.dev> From: Xu Kuohai In-Reply-To: <3ed8b579-8342-4d74-9050-b0bf6afe5ab3@linux.dev> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-CM-TRANSID:_Ch0CgAXFY0twylmzAjnKg--.19671S2 X-Coremail-Antispam: 1UD129KBjvAXoWfWr4UArW3JFWDAry5Zr4fZrb_yoW8ZFyxZo Wjgr13Jr1rXF4UGF1UJw1UJr15Jw17JrnrtryUJr13Gr10yw4UX348JryUJ3yUtr18Gr1U Ar1Utry8AFyUJr1rn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUYj7kC6x804xWl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK 8VAvwI8IcIk0rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4 AK67xGY2AK021l84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwVC0I7IYx2IY6xkF 7I0E14v26r4UJVWxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7 CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8C rVC2j2WlYx0E2Ix0cI8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4 IE7xkEbVWUJVW8JwACjcxG0xvEwIxGrwACI402YVCY1x02628vn2kIc2xKxwCF04k20xvY 0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I 0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAI cVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_Gr0_Cr1lIxAIcV CF04k26cxKx2IYs7xG6rW3Jr0E3s1lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280 aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU1c4S7UUUUU== X-CM-SenderInfo: 50xn30hkdlqx5xdzvxpfor3voofrz/ On 4/25/2024 6:06 AM, Yonghong Song wrote: > > On 4/23/24 7:25 PM, Xu Kuohai wrote: >> On 4/24/2024 5:55 AM, Yonghong Song wrote: >>> >>> On 4/20/24 1:33 AM, Xu Kuohai wrote: >>>> On 4/20/2024 7:00 AM, Eduard Zingerman wrote: >>>>> On Thu, 2024-04-11 at 20:27 +0800, Xu Kuohai wrote: >>>>>> From: Xu Kuohai >>>>>> >>>>>> With lsm return value check, the no-alu32 version test_libbpf_get_fd_by_id_opts >>>>>> is rejected by the verifier, and the log says: >>>>>> >>>>>>    0: R1=ctx() R10=fp0 >>>>>>    ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 >>>>>>    0: (b7) r0 = 0                        ; R0_w=0 >>>>>>    1: (79) r2 = *(u64 *)(r1 +0) >>>>>>    func 'bpf_lsm_bpf_map' arg0 has btf_id 916 type STRUCT 'bpf_map' >>>>>>    2: R1=ctx() R2_w=trusted_ptr_bpf_map() >>>>>>    ; if (map != (struct bpf_map *)&data_input) @ test_libbpf_get_fd_by_id_opts.c:29 >>>>>>    2: (18) r3 = 0xffff9742c0951a00       ; R3_w=map_ptr(map=data_input,ks=4,vs=4) >>>>>>    4: (5d) if r2 != r3 goto pc+4         ; R2_w=trusted_ptr_bpf_map() R3_w=map_ptr(map=data_input,ks=4,vs=4) >>>>>>    ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 >>>>>>    5: (79) r0 = *(u64 *)(r1 +8)          ; R0_w=scalar() R1=ctx() >>>>>>    ; if (fmode & FMODE_WRITE) @ test_libbpf_get_fd_by_id_opts.c:32 >>>>>>    6: (67) r0 <<= 62                     ; R0_w=scalar(smax=0x4000000000000000,umax=0xc000000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xc000000000000000)) >>>>>>    7: (c7) r0 s>>= 63                    ; R0_w=scalar(smin=smin32=-1,smax=smax32=0) >>>>>>    ;  @ test_libbpf_get_fd_by_id_opts.c:0 >>>>>>    8: (57) r0 &= -13                     ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3)) >>>>>>    ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 >>>>>>    9: (95) exit >>>>>> >>>>>> And here is the C code of the prog. >>>>>> >>>>>> SEC("lsm/bpf_map") >>>>>> int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) >>>>>> { >>>>>>     if (map != (struct bpf_map *)&data_input) >>>>>>         return 0; >>>>>> >>>>>>     if (fmode & FMODE_WRITE) >>>>>>         return -EACCES; >>>>>> >>>>>>     return 0; >>>>>> } >>>>>> >>>>>> It is clear that the prog can only return either 0 or -EACCESS, and both >>>>>> values are legal. >>>>>> >>>>>> So why is it rejected by the verifier? >>>>>> >>>>>> The verifier log shows that the second if and return value setting >>>>>> statements in the prog is optimized to bitwise operations "r0 s>>= 63" >>>>>> and "r0 &= -13". The verifier correctly deduces that the the value of >>>>>> r0 is in the range [-1, 0] after verifing instruction "r0 s>>= 63". >>>>>> But when the verifier proceeds to verify instruction "r0 &= -13", it >>>>>> fails to deduce the correct value range of r0. >>>>>> >>>>>> 7: (c7) r0 s>>= 63                    ; R0_w=scalar(smin=smin32=-1,smax=smax32=0) >>>>>> 8: (57) r0 &= -13                     ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3)) >>>>>> >>>>>> So why the verifier fails to deduce the result of 'r0 &= -13'? >>>>>> >>>>>> The verifier uses tnum to track values, and the two ranges "[-1, 0]" and >>>>>> "[0, -1ULL]" are encoded to the same tnum. When verifing instruction >>>>>> "r0 &= -13", the verifier erroneously deduces the result from >>>>>> "[0, -1ULL] AND -13", which is out of the expected return range >>>>>> [-4095, 0]. >>>>>> >>>>>> To fix it, this patch simply adds a special SCALAR32 case for the >>>>>> verifier. That is, when the source operand of the AND instruction is >>>>>> a constant and the destination operand changes from negative to >>>>>> non-negative and falls in range [-256, 256], deduce the result range >>>>>> by enumerating all possible AND results. >>>>>> >>>>>> Signed-off-by: Xu Kuohai >>>>>> --- >>>>> >>>>> Hello, >>>>> >>>>> Sorry for the delay, I had to think about this issue a bit. >>>>> I found the clang transformation that generates the pattern this patch >>>>> tries to handle. >>>>> It is located in DAGCombiner::SimplifySelectCC() method (see [1]). >>>>> The transformation happens as a part of DAG to DAG rewrites >>>>> (LLVM uses several internal representations: >>>>>   - generic optimizer uses LLVM IR, most of the work is done >>>>>     using this representation; >>>>>   - before instruction selection IR is converted to Selection DAG, >>>>>     some optimizations are applied at this stage, >>>>>     all such optimizations are a set of pattern replacements; >>>>>   - Selection DAG is converted to machine code, some optimizations >>>>>     are applied at the machine code level). >>>>> >>>>> Full pattern is described as follows: >>>>> >>>>>    // fold (select_cc seteq (and x, y), 0, 0, A) -> (and (sra (shl x)) A) >>>>>    // where y is has a single bit set. >>>>>    // A plaintext description would be, we can turn the SELECT_CC into an AND >>>>>    // when the condition can be materialized as an all-ones register.  Any >>>>>    // single bit-test can be materialized as an all-ones register with >>>>>    // shift-left and shift-right-arith. >>>>> >>>>> For this particular test case the DAG is converted as follows: >>>>> >>>>>                      .---------------- lhs         The meaning of this select_cc is: >>>>>                      |        .------- rhs         `lhs == rhs ? true value : false value` >>>>>                      |        | .----- true value >>>>>                      |        | |  .-- false value >>>>>                      v        v v  v >>>>>    (select_cc seteq (and X 2) 0 0 -13) >>>>>                            ^ >>>>> ->                        '---------------. >>>>>    (and (sra (sll X 62) 63)                | >>>>>         -13)                               | >>>>>                                            | >>>>> Before pattern is applied, it checks that second 'and' operand has >>>>> only one bit set, (which is true for '2'). >>>>> >>>>> The pattern itself generates logical shift left / arithmetic shift >>>>> right pair, that ensures that result is either all ones (-1) or all >>>>> zeros (0). Hence, applying 'and' to shifts result and false value >>>>> generates a correct result. >>>>> >>>> >>>> Thanks for your detailed and invaluable explanation! >>> >>> Thanks Eduard for detailed explanation. It looks like we could >>> resolve this issue without adding too much complexity to verifier. >>> Also, this code pattern above seems generic enough to be worthwhile >>> with verifier change. >>> >>> Kuohai, please added detailed explanation (as described by Eduard) >>> in the commit message. >>> >> >> Sure, already added, the commit message and the change now is like this: >> >> --- >> >>     bpf: Fix a false rejection caused by AND operation >> >>     With lsm return value check, the no-alu32 version test_libbpf_get_fd_by_id_opts >>     is rejected by the verifier, and the log says: >> >>     0: R1=ctx() R10=fp0 >>     ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 >>     0: (b7) r0 = 0                        ; R0_w=0 >>     1: (79) r2 = *(u64 *)(r1 +0) >>     func 'bpf_lsm_bpf_map' arg0 has btf_id 916 type STRUCT 'bpf_map' >>     2: R1=ctx() R2_w=trusted_ptr_bpf_map() >>     ; if (map != (struct bpf_map *)&data_input) @ test_libbpf_get_fd_by_id_opts.c:29 >>     2: (18) r3 = 0xffff9742c0951a00       ; R3_w=map_ptr(map=data_input,ks=4,vs=4) >>     4: (5d) if r2 != r3 goto pc+4         ; R2_w=trusted_ptr_bpf_map() R3_w=map_ptr(map=data_input,ks=4,vs=4) >>     ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 >>     5: (79) r0 = *(u64 *)(r1 +8)          ; R0_w=scalar() R1=ctx() >>     ; if (fmode & FMODE_WRITE) @ test_libbpf_get_fd_by_id_opts.c:32 >>     6: (67) r0 <<= 62                     ; R0_w=scalar(smax=0x4000000000000000,umax=0xc000000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xc000000000000000)) >>     7: (c7) r0 s>>= 63                    ; R0_w=scalar(smin=smin32=-1,smax=smax32=0) >>     ;  @ test_libbpf_get_fd_by_id_opts.c:0 >>     8: (57) r0 &= -13                     ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3)) >>     ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 >>     9: (95) exit >> >>     And here is the C code of the prog. >> >>     SEC("lsm/bpf_map") >>     int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) >>     { >>         if (map != (struct bpf_map *)&data_input) >>                 return 0; >> >>         if (fmode & FMODE_WRITE) >>                 return -EACCES; >> >>         return 0; >>     } >> >>     It is clear that the prog can only return either 0 or -EACCESS, and both >>     values are legal. >> >>     So why is it rejected by the verifier? >> >>     The verifier log shows that the second if and return value setting >>     statements in the prog is optimized to bitwise operations "r0 s>>= 63" >>     and "r0 &= -13". The verifier correctly deduces that the the value of >>     r0 is in the range [-1, 0] after verifing instruction "r0 s>>= 63". >>     But when the verifier proceeds to verify instruction "r0 &= -13", it >>     fails to deduce the correct value range of r0. >> >>     7: (c7) r0 s>>= 63                    ; R0_w=scalar(smin=smin32=-1,smax=smax32=0) >>     8: (57) r0 &= -13                     ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3)) >> >>     So why the verifier fails to deduce the result of 'r0 &= -13'? >> >>     The verifier uses tnum to track values, and the two ranges "[-1, 0]" and >>     "[0, -1ULL]" are encoded to the same tnum. When verifing instruction >>     "r0 &= -13", the verifier erroneously deduces the result from >>     "[0, -1ULL] AND -13", which is out of the expected return range >>     [-4095, 0]. >> >>     As explained by Eduard in [0], the clang transformation that generates this >>     pattern is located in DAGCombiner::SimplifySelectCC() method (see [1]). >> >>     The transformation happens as a part of DAG to DAG rewrites >>     (LLVM uses several internal representations: >>      - generic optimizer uses LLVM IR, most of the work is done >>        using this representation; >>      - before instruction selection IR is converted to Selection DAG, >>        some optimizations are applied at this stage, >>        all such optimizations are a set of pattern replacements; >>      - Selection DAG is converted to machine code, some optimizations >>        are applied at the machine code level). >> >>     Full pattern is described as follows: >> >>       // fold (select_cc seteq (and x, y), 0, 0, A) -> (and (sra (shl x)) A) >>       // where y is has a single bit set. >>       // A plaintext description would be, we can turn the SELECT_CC into an AND >>       // when the condition can be materialized as an all-ones register.  Any >>       // single bit-test can be materialized as an all-ones register with >>       // shift-left and shift-right-arith. >> >>     For this particular test case the DAG is converted as follows: >> >>                         .---------------- lhs         The meaning of this select_cc is: >>                         |        .------- rhs         `lhs == rhs ? true value : false value` >>                         |        | .----- true value >>                         |        | |  .-- false value >>                         v        v v  v >>       (select_cc seteq (and X 2) 0 0 -13) >>                               ^ >>     ->                        '---------------. >>       (and (sra (sll X 62) 63)                | >>            -13)                               | >>                                               | >>     Before pattern is applied, it checks that second 'and' operand has >>     only one bit set, (which is true for '2'). >> >>     The pattern itself generates logical shift left / arithmetic shift >>     right pair, that ensures that result is either all ones (-1) or all >>     zeros (0). Hence, applying 'and' to shifts result and false value >>     generates a correct result. >> >>     As suggested by Eduard, this patch makes a special case for source >>     or destination register of '&=' operation being in range [-1, 0]. >> >>     Meaning that one of the '&=' operands is either: >>     - all ones, in which case the counterpart is the result of the operation; >>     - all zeros, in which case zero is the result of the operation. >> >>     And MIN and MAX values could be derived based on above two observations. >> >>     [0] https://lore.kernel.org/bpf/e62e2971301ca7f2e9eb74fc500c520285cad8f5.camel@gmail.com/ >>     [1] https://github.com/llvm/llvm-project/blob/4523a267829c807f3fc8fab8e5e9613985a51565/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp >> >>     Suggested-by: Eduard Zingerman >>     Signed-off-by: Xu Kuohai >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 640747b53745..30c551d39329 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -13374,6 +13374,24 @@ static void scalar32_min_max_and(struct bpf_reg_state *dst_reg, >>         dst_reg->u32_min_value = var32_off.value; >>         dst_reg->u32_max_value = min(dst_reg->u32_max_value, umax_val); >> >> +       /* Special case: src_reg is known and dst_reg is in range [-1, 0] */ >> +       if (src_known && >> +               dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0 && >> +               dst_reg->smin_value == -1 && dst_reg->smax_value == 0) { > > do we need to check dst_reg->smin_value/smax_value here? They should not impact > final dst_reg->s32_{min,max}_value computation, right? right, the check was simply copied from the old code, which only handled the case where 64-bit range is the same as the 32-bit range > Similarly, for later 64bit min/max and, 32bit value does not really matter. > hmm, the 32-bit check is completely unnecessary. >> + dst_reg->s32_min_value = min_t(s32, src_reg->s32_min_value, 0); >> +               dst_reg->s32_max_value = max_t(s32, src_reg->s32_min_value, 0); >> +               return; >> +       } >> + >> +       /* Special case: dst_reg is known and src_reg is in range [-1, 0] */ >> +       if (dst_known && >> +               src_reg->s32_min_value == -1 && src_reg->s32_max_value == 0 && >> +               src_reg->smin_value == -1 && src_reg->smax_value == 0) { >> +               dst_reg->s32_min_value = min_t(s32, dst_reg->s32_min_value, 0); >> +               dst_reg->s32_max_value = max_t(s32, dst_reg->s32_min_value, 0); >> +               return; >> +       } >> + >>         /* Safe to set s32 bounds by casting u32 result into s32 when u32 >>          * doesn't cross sign boundary. Otherwise set s32 bounds to unbounded. >>          */ >> @@ -13404,6 +13422,24 @@ static void scalar_min_max_and(struct bpf_reg_state *dst_reg, >>         dst_reg->umin_value = dst_reg->var_off.value; >>         dst_reg->umax_value = min(dst_reg->umax_value, umax_val); >> >> +       /* Special case: src_reg is known and dst_reg is in range [-1, 0] */ >> +       if (src_known && >> +               dst_reg->smin_value == -1 && dst_reg->smax_value == 0 && >> +               dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0) { >> +               dst_reg->smin_value = min_t(s64, src_reg->smin_value, 0); >> +               dst_reg->smax_value = max_t(s64, src_reg->smin_value, 0); >> +               return; >> +       } >> + >> +       /* Special case: dst_reg is known and src_reg is in range [-1, 0] */ >> +       if (dst_known && >> +               src_reg->smin_value == -1 && src_reg->smax_value == 0 && >> +               src_reg->s32_min_value == -1 && src_reg->s32_max_value == 0) { >> +               dst_reg->smin_value = min_t(s64, dst_reg->smin_value, 0); >> +               dst_reg->smax_value = max_t(s64, dst_reg->smin_value, 0); >> +               return; >> +       } >> + >>         /* Safe to set s64 bounds by casting u64 result into s64 when u64 >>          * doesn't cross sign boundary. Otherwise set s64 bounds to unbounded. >>          */ >> >>>> >>>>> In my opinion the approach taken by this patch is sub-optimal: >>>>> - 512 iterations is too much; >>>>> - this does not cover all code that could be generated by the above >>>>>    mentioned LLVM transformation >>>>>    (e.g. second 'and' operand could be 1 << 16). >>>>> >>>>> Instead, I suggest to make a special case for source or dst register >>>>> of '&=' operation being in range [-1,0]. >>>>> Meaning that one of the '&=' operands is either: >>>>> - all ones, in which case the counterpart is the result of the operation; >>>>> - all zeros, in which case zero is the result of the operation; >>>>> - derive MIN and MAX values based on above two observations. >>>>> >>>> >>>> Totally agree, I'll cook a new patch as you suggested. >>>> >>>>> [1] https://github.com/llvm/llvm-project/blob/4523a267829c807f3fc8fab8e5e9613985a51565/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L5391 >>>>> >>>>> Best regards, >>>>> Eduard >>>> >>>> >>> >>