* Size growth? @ 2020-10-16 19:36 Tom Rini 2020-10-16 21:46 ` Simon Glass 0 siblings, 1 reply; 28+ messages in thread From: Tom Rini @ 2020-10-16 19:36 UTC (permalink / raw) To: devicetree-compiler-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 6071 bytes --] Hey all, With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc to cbca977ea121. I'm seeing some rather worrying size increases however. For example, on an aarch64 board such as leez-rk3399 we have: leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) function old new delta do_fdt 3992 4308 +316 fdt_check_header 276 492 +216 fdt_add_property_ 388 556 +168 fdt_ro_probe_ 128 252 +124 fdt_packblocks_ 176 296 +120 fdt_open_into 392 512 +120 fdt_blocks_misordered_ 96 216 +120 fdt_get_mem_rsv 108 220 +112 fdt_offset_ptr 104 208 +104 fdt_get_string 288 388 +100 fdt_splice_ 148 228 +80 fdt_valid 204 276 +72 fdt_getprop_by_offset 184 256 +72 fdt_splice_mem_rsv_ 96 152 +56 fdt_num_mem_rsv 60 116 +56 fdt_splice_struct_ 96 144 +48 fdt_shrink_to_minimum 220 268 +48 fdt_rw_probe_ 108 156 +48 fdt_mem_rsv 60 108 +48 fdt_getprop_namelen 100 148 +48 fdt_get_property_by_offset_ 100 148 +48 fdt_get_name 164 212 +48 efi_install_fdt 964 1012 +48 boot_get_fdt 888 936 +48 fdt_move 80 116 +36 reserve_fdt 72 96 +24 reloc_fdt 76 100 +24 image_setup_libfdt 284 308 +24 fit_image_load 1608 1632 +24 fit_image_get_data_and_size 172 196 +24 fit_get_end 16 40 +24 fdt_next_tag 256 280 +24 fdt_header_size 12 36 +24 fdt_get_property_namelen_ 212 236 +24 fdt_get_property_namelen 44 68 +24 fdt_get_phandle 120 144 +24 fdt_del_node 96 120 +24 fdt_del_mem_rsv 112 136 +24 fdt_add_subnode_namelen 284 308 +24 fdt_add_mem_rsv 124 148 +24 common_diskboot 680 704 +24 fdt_next_subnode 80 88 +8 spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) function old new delta fdt_get_mem_rsv 52 236 +184 fdt_add_property_ 340 484 +144 fdt_get_name 68 152 +84 fdt_splice_ 148 228 +80 fdt_num_mem_rsv 64 128 +64 fdt_splice_mem_rsv_ 96 152 +56 fdt_offset_ptr 52 104 +52 fdt_splice_struct_ 96 144 +48 fdt_shrink_to_minimum 220 268 +48 fdt_get_property_by_offset_ 36 84 +48 fdt_ro_probe_ - 36 +36 fdt_subnode_offset_namelen 200 232 +32 spl_load_simple_fit 848 872 +24 spl_fit_append_fdt 196 220 +24 fdt_getprop_by_offset 80 104 +24 fdt_get_string 64 88 +24 fdt_get_property_namelen_ 200 224 +24 fdt_get_phandle 120 144 +24 fdt_del_mem_rsv 104 128 +24 fdt_check_header 28 52 +24 fdt_add_subnode_namelen 260 284 +24 fdt_add_mem_rsv 112 136 +24 fdt_next_tag 192 212 +20 fdt_path_offset_namelen 240 256 +16 fdt_supernode_atdepth_offset 160 172 +12 fdt_node_offset_by_phandle 120 132 +12 fdt_mem_rsv 20 - -20 fdt_check_prop_offset_ 64 44 -20 fdt_check_node_offset_ 64 44 -20 And note that for the SPL case we're already setting ASSUME_MASK to 0xff so there's maximum savings already being done there. Does anyone have ideas on where / how to further tweak code size? Thanks! -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-16 19:36 Size growth? Tom Rini @ 2020-10-16 21:46 ` Simon Glass [not found] ` <CAPnjgZ3jPciWmoVpuoYb9KC2h3eCevZsq+1BzefCOCAFCDoseQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Simon Glass @ 2020-10-16 21:46 UTC (permalink / raw) To: Tom Rini, David Gibson; +Cc: Devicetree Compiler Hi Tom, On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > Hey all, > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > to cbca977ea121. I'm seeing some rather worrying size increases > however. For example, on an aarch64 board such as leez-rk3399 we have: > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > function old new delta > do_fdt 3992 4308 +316 > fdt_check_header 276 492 +216 > fdt_add_property_ 388 556 +168 > fdt_ro_probe_ 128 252 +124 > fdt_packblocks_ 176 296 +120 > fdt_open_into 392 512 +120 > fdt_blocks_misordered_ 96 216 +120 > fdt_get_mem_rsv 108 220 +112 > fdt_offset_ptr 104 208 +104 > fdt_get_string 288 388 +100 > fdt_splice_ 148 228 +80 > fdt_valid 204 276 +72 > fdt_getprop_by_offset 184 256 +72 > fdt_splice_mem_rsv_ 96 152 +56 > fdt_num_mem_rsv 60 116 +56 > fdt_splice_struct_ 96 144 +48 > fdt_shrink_to_minimum 220 268 +48 > fdt_rw_probe_ 108 156 +48 > fdt_mem_rsv 60 108 +48 > fdt_getprop_namelen 100 148 +48 > fdt_get_property_by_offset_ 100 148 +48 > fdt_get_name 164 212 +48 > efi_install_fdt 964 1012 +48 > boot_get_fdt 888 936 +48 > fdt_move 80 116 +36 > reserve_fdt 72 96 +24 > reloc_fdt 76 100 +24 > image_setup_libfdt 284 308 +24 > fit_image_load 1608 1632 +24 > fit_image_get_data_and_size 172 196 +24 > fit_get_end 16 40 +24 > fdt_next_tag 256 280 +24 > fdt_header_size 12 36 +24 > fdt_get_property_namelen_ 212 236 +24 > fdt_get_property_namelen 44 68 +24 > fdt_get_phandle 120 144 +24 > fdt_del_node 96 120 +24 > fdt_del_mem_rsv 112 136 +24 > fdt_add_subnode_namelen 284 308 +24 > fdt_add_mem_rsv 124 148 +24 > common_diskboot 680 704 +24 > fdt_next_subnode 80 88 +8 > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > function old new delta > fdt_get_mem_rsv 52 236 +184 > fdt_add_property_ 340 484 +144 > fdt_get_name 68 152 +84 > fdt_splice_ 148 228 +80 > fdt_num_mem_rsv 64 128 +64 > fdt_splice_mem_rsv_ 96 152 +56 > fdt_offset_ptr 52 104 +52 > fdt_splice_struct_ 96 144 +48 > fdt_shrink_to_minimum 220 268 +48 > fdt_get_property_by_offset_ 36 84 +48 > fdt_ro_probe_ - 36 +36 > fdt_subnode_offset_namelen 200 232 +32 > spl_load_simple_fit 848 872 +24 > spl_fit_append_fdt 196 220 +24 > fdt_getprop_by_offset 80 104 +24 > fdt_get_string 64 88 +24 > fdt_get_property_namelen_ 200 224 +24 > fdt_get_phandle 120 144 +24 > fdt_del_mem_rsv 104 128 +24 > fdt_check_header 28 52 +24 > fdt_add_subnode_namelen 260 284 +24 > fdt_add_mem_rsv 112 136 +24 > fdt_next_tag 192 212 +20 > fdt_path_offset_namelen 240 256 +16 > fdt_supernode_atdepth_offset 160 172 +12 > fdt_node_offset_by_phandle 120 132 +12 > fdt_mem_rsv 20 - -20 > fdt_check_prop_offset_ 64 44 -20 > fdt_check_node_offset_ 64 44 -20 > > And note that for the SPL case we're already setting ASSUME_MASK to > 0xff so there's maximum savings already being done there. Does anyone > have ideas on where / how to further tweak code size? Thanks! +David Gibson I suspect there are more checks that need to be made conditional. Regards, Simon > -- > Tom ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <CAPnjgZ3jPciWmoVpuoYb9KC2h3eCevZsq+1BzefCOCAFCDoseQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Size growth? [not found] ` <CAPnjgZ3jPciWmoVpuoYb9KC2h3eCevZsq+1BzefCOCAFCDoseQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2020-10-19 1:42 ` David Gibson [not found] ` <20201019014213.GA11625-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: David Gibson @ 2020-10-19 1:42 UTC (permalink / raw) To: Simon Glass; +Cc: Tom Rini, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 7052 bytes --] On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > Hi Tom, > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > Hey all, > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > to cbca977ea121. I'm seeing some rather worrying size increases > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > function old new delta > > do_fdt 3992 4308 +316 > > fdt_check_header 276 492 +216 > > fdt_add_property_ 388 556 +168 > > fdt_ro_probe_ 128 252 +124 > > fdt_packblocks_ 176 296 +120 > > fdt_open_into 392 512 +120 > > fdt_blocks_misordered_ 96 216 +120 > > fdt_get_mem_rsv 108 220 +112 > > fdt_offset_ptr 104 208 +104 > > fdt_get_string 288 388 +100 > > fdt_splice_ 148 228 +80 > > fdt_valid 204 276 +72 > > fdt_getprop_by_offset 184 256 +72 > > fdt_splice_mem_rsv_ 96 152 +56 > > fdt_num_mem_rsv 60 116 +56 > > fdt_splice_struct_ 96 144 +48 > > fdt_shrink_to_minimum 220 268 +48 > > fdt_rw_probe_ 108 156 +48 > > fdt_mem_rsv 60 108 +48 > > fdt_getprop_namelen 100 148 +48 > > fdt_get_property_by_offset_ 100 148 +48 > > fdt_get_name 164 212 +48 > > efi_install_fdt 964 1012 +48 > > boot_get_fdt 888 936 +48 > > fdt_move 80 116 +36 > > reserve_fdt 72 96 +24 > > reloc_fdt 76 100 +24 > > image_setup_libfdt 284 308 +24 > > fit_image_load 1608 1632 +24 > > fit_image_get_data_and_size 172 196 +24 > > fit_get_end 16 40 +24 > > fdt_next_tag 256 280 +24 > > fdt_header_size 12 36 +24 > > fdt_get_property_namelen_ 212 236 +24 > > fdt_get_property_namelen 44 68 +24 > > fdt_get_phandle 120 144 +24 > > fdt_del_node 96 120 +24 > > fdt_del_mem_rsv 112 136 +24 > > fdt_add_subnode_namelen 284 308 +24 > > fdt_add_mem_rsv 124 148 +24 > > common_diskboot 680 704 +24 > > fdt_next_subnode 80 88 +8 > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > function old new delta > > fdt_get_mem_rsv 52 236 +184 > > fdt_add_property_ 340 484 +144 > > fdt_get_name 68 152 +84 > > fdt_splice_ 148 228 +80 > > fdt_num_mem_rsv 64 128 +64 > > fdt_splice_mem_rsv_ 96 152 +56 > > fdt_offset_ptr 52 104 +52 > > fdt_splice_struct_ 96 144 +48 > > fdt_shrink_to_minimum 220 268 +48 > > fdt_get_property_by_offset_ 36 84 +48 > > fdt_ro_probe_ - 36 +36 > > fdt_subnode_offset_namelen 200 232 +32 > > spl_load_simple_fit 848 872 +24 > > spl_fit_append_fdt 196 220 +24 > > fdt_getprop_by_offset 80 104 +24 > > fdt_get_string 64 88 +24 > > fdt_get_property_namelen_ 200 224 +24 > > fdt_get_phandle 120 144 +24 > > fdt_del_mem_rsv 104 128 +24 > > fdt_check_header 28 52 +24 > > fdt_add_subnode_namelen 260 284 +24 > > fdt_add_mem_rsv 112 136 +24 > > fdt_next_tag 192 212 +20 > > fdt_path_offset_namelen 240 256 +16 > > fdt_supernode_atdepth_offset 160 172 +12 > > fdt_node_offset_by_phandle 120 132 +12 > > fdt_mem_rsv 20 - -20 > > fdt_check_prop_offset_ 64 44 -20 > > fdt_check_node_offset_ 64 44 -20 > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > 0xff so there's maximum savings already being done there. Does anyone > > have ideas on where / how to further tweak code size? Thanks! > > +David Gibson > > I suspect there are more checks that need to be made conditional. Seems likely. Though, as I've opined before, from what I understand the SPL is *so* restricted an environment, I'm not really convinced DTs are the right tool for the job there. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20201019014213.GA11625-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>]
* Re: Size growth? [not found] ` <20201019014213.GA11625-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> @ 2020-10-19 12:37 ` Tom Rini 2020-10-20 2:09 ` David Gibson 0 siblings, 1 reply; 28+ messages in thread From: Tom Rini @ 2020-10-19 12:37 UTC (permalink / raw) To: David Gibson; +Cc: Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 7613 bytes --] On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote: > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > > Hi Tom, > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > Hey all, > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > > to cbca977ea121. I'm seeing some rather worrying size increases > > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > > function old new delta > > > do_fdt 3992 4308 +316 > > > fdt_check_header 276 492 +216 > > > fdt_add_property_ 388 556 +168 > > > fdt_ro_probe_ 128 252 +124 > > > fdt_packblocks_ 176 296 +120 > > > fdt_open_into 392 512 +120 > > > fdt_blocks_misordered_ 96 216 +120 > > > fdt_get_mem_rsv 108 220 +112 > > > fdt_offset_ptr 104 208 +104 > > > fdt_get_string 288 388 +100 > > > fdt_splice_ 148 228 +80 > > > fdt_valid 204 276 +72 > > > fdt_getprop_by_offset 184 256 +72 > > > fdt_splice_mem_rsv_ 96 152 +56 > > > fdt_num_mem_rsv 60 116 +56 > > > fdt_splice_struct_ 96 144 +48 > > > fdt_shrink_to_minimum 220 268 +48 > > > fdt_rw_probe_ 108 156 +48 > > > fdt_mem_rsv 60 108 +48 > > > fdt_getprop_namelen 100 148 +48 > > > fdt_get_property_by_offset_ 100 148 +48 > > > fdt_get_name 164 212 +48 > > > efi_install_fdt 964 1012 +48 > > > boot_get_fdt 888 936 +48 > > > fdt_move 80 116 +36 > > > reserve_fdt 72 96 +24 > > > reloc_fdt 76 100 +24 > > > image_setup_libfdt 284 308 +24 > > > fit_image_load 1608 1632 +24 > > > fit_image_get_data_and_size 172 196 +24 > > > fit_get_end 16 40 +24 > > > fdt_next_tag 256 280 +24 > > > fdt_header_size 12 36 +24 > > > fdt_get_property_namelen_ 212 236 +24 > > > fdt_get_property_namelen 44 68 +24 > > > fdt_get_phandle 120 144 +24 > > > fdt_del_node 96 120 +24 > > > fdt_del_mem_rsv 112 136 +24 > > > fdt_add_subnode_namelen 284 308 +24 > > > fdt_add_mem_rsv 124 148 +24 > > > common_diskboot 680 704 +24 > > > fdt_next_subnode 80 88 +8 > > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > > function old new delta > > > fdt_get_mem_rsv 52 236 +184 > > > fdt_add_property_ 340 484 +144 > > > fdt_get_name 68 152 +84 > > > fdt_splice_ 148 228 +80 > > > fdt_num_mem_rsv 64 128 +64 > > > fdt_splice_mem_rsv_ 96 152 +56 > > > fdt_offset_ptr 52 104 +52 > > > fdt_splice_struct_ 96 144 +48 > > > fdt_shrink_to_minimum 220 268 +48 > > > fdt_get_property_by_offset_ 36 84 +48 > > > fdt_ro_probe_ - 36 +36 > > > fdt_subnode_offset_namelen 200 232 +32 > > > spl_load_simple_fit 848 872 +24 > > > spl_fit_append_fdt 196 220 +24 > > > fdt_getprop_by_offset 80 104 +24 > > > fdt_get_string 64 88 +24 > > > fdt_get_property_namelen_ 200 224 +24 > > > fdt_get_phandle 120 144 +24 > > > fdt_del_mem_rsv 104 128 +24 > > > fdt_check_header 28 52 +24 > > > fdt_add_subnode_namelen 260 284 +24 > > > fdt_add_mem_rsv 112 136 +24 > > > fdt_next_tag 192 212 +20 > > > fdt_path_offset_namelen 240 256 +16 > > > fdt_supernode_atdepth_offset 160 172 +12 > > > fdt_node_offset_by_phandle 120 132 +12 > > > fdt_mem_rsv 20 - -20 > > > fdt_check_prop_offset_ 64 44 -20 > > > fdt_check_node_offset_ 64 44 -20 > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > > 0xff so there's maximum savings already being done there. Does anyone > > > have ideas on where / how to further tweak code size? Thanks! > > > > +David Gibson > > > > I suspect there are more checks that need to be made conditional. > > Seems likely. OK. Does that mean you're going to take a look? > Though, as I've opined before, from what I understand the SPL is *so* > restricted an environment, I'm not really convinced DTs are the right > tool for the job there. SPL is space constrained, which is why we mask out all of the safety checks. But we still generally have enough memory that this is fine. This specific board has 256KiB for SPL, for example. But I'm not just concerned about 1KiB of growth when everything is masked off, I'm also concerned about 2KiB of growth over a changelog that doesn't read like it added a bunch of stuff that should cause everything to grow, either. -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-19 12:37 ` Tom Rini @ 2020-10-20 2:09 ` David Gibson [not found] ` <20201020020907.GA64103-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: David Gibson @ 2020-10-20 2:09 UTC (permalink / raw) To: Tom Rini; +Cc: Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 8300 bytes --] On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote: > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote: > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > > > Hi Tom, > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > > > Hey all, > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > > > to cbca977ea121. I'm seeing some rather worrying size increases > > > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > > > function old new delta > > > > do_fdt 3992 4308 +316 > > > > fdt_check_header 276 492 +216 > > > > fdt_add_property_ 388 556 +168 > > > > fdt_ro_probe_ 128 252 +124 > > > > fdt_packblocks_ 176 296 +120 > > > > fdt_open_into 392 512 +120 > > > > fdt_blocks_misordered_ 96 216 +120 > > > > fdt_get_mem_rsv 108 220 +112 > > > > fdt_offset_ptr 104 208 +104 > > > > fdt_get_string 288 388 +100 > > > > fdt_splice_ 148 228 +80 > > > > fdt_valid 204 276 +72 > > > > fdt_getprop_by_offset 184 256 +72 > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > fdt_num_mem_rsv 60 116 +56 > > > > fdt_splice_struct_ 96 144 +48 > > > > fdt_shrink_to_minimum 220 268 +48 > > > > fdt_rw_probe_ 108 156 +48 > > > > fdt_mem_rsv 60 108 +48 > > > > fdt_getprop_namelen 100 148 +48 > > > > fdt_get_property_by_offset_ 100 148 +48 > > > > fdt_get_name 164 212 +48 > > > > efi_install_fdt 964 1012 +48 > > > > boot_get_fdt 888 936 +48 > > > > fdt_move 80 116 +36 > > > > reserve_fdt 72 96 +24 > > > > reloc_fdt 76 100 +24 > > > > image_setup_libfdt 284 308 +24 > > > > fit_image_load 1608 1632 +24 > > > > fit_image_get_data_and_size 172 196 +24 > > > > fit_get_end 16 40 +24 > > > > fdt_next_tag 256 280 +24 > > > > fdt_header_size 12 36 +24 > > > > fdt_get_property_namelen_ 212 236 +24 > > > > fdt_get_property_namelen 44 68 +24 > > > > fdt_get_phandle 120 144 +24 > > > > fdt_del_node 96 120 +24 > > > > fdt_del_mem_rsv 112 136 +24 > > > > fdt_add_subnode_namelen 284 308 +24 > > > > fdt_add_mem_rsv 124 148 +24 > > > > common_diskboot 680 704 +24 > > > > fdt_next_subnode 80 88 +8 > > > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > > > function old new delta > > > > fdt_get_mem_rsv 52 236 +184 > > > > fdt_add_property_ 340 484 +144 > > > > fdt_get_name 68 152 +84 > > > > fdt_splice_ 148 228 +80 > > > > fdt_num_mem_rsv 64 128 +64 > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > fdt_offset_ptr 52 104 +52 > > > > fdt_splice_struct_ 96 144 +48 > > > > fdt_shrink_to_minimum 220 268 +48 > > > > fdt_get_property_by_offset_ 36 84 +48 > > > > fdt_ro_probe_ - 36 +36 > > > > fdt_subnode_offset_namelen 200 232 +32 > > > > spl_load_simple_fit 848 872 +24 > > > > spl_fit_append_fdt 196 220 +24 > > > > fdt_getprop_by_offset 80 104 +24 > > > > fdt_get_string 64 88 +24 > > > > fdt_get_property_namelen_ 200 224 +24 > > > > fdt_get_phandle 120 144 +24 > > > > fdt_del_mem_rsv 104 128 +24 > > > > fdt_check_header 28 52 +24 > > > > fdt_add_subnode_namelen 260 284 +24 > > > > fdt_add_mem_rsv 112 136 +24 > > > > fdt_next_tag 192 212 +20 > > > > fdt_path_offset_namelen 240 256 +16 > > > > fdt_supernode_atdepth_offset 160 172 +12 > > > > fdt_node_offset_by_phandle 120 132 +12 > > > > fdt_mem_rsv 20 - -20 > > > > fdt_check_prop_offset_ 64 44 -20 > > > > fdt_check_node_offset_ 64 44 -20 > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > > > 0xff so there's maximum savings already being done there. Does anyone > > > > have ideas on where / how to further tweak code size? Thanks! > > > > > > +David Gibson > > > > > > I suspect there are more checks that need to be made conditional. > > > > Seems likely. > > OK. Does that mean you're going to take a look? No, sorry, my time for dtc is very limited. > > Though, as I've opined before, from what I understand the SPL is *so* > > restricted an environment, I'm not really convinced DTs are the right > > tool for the job there. > > SPL is space constrained, which is why we mask out all of the safety > checks. But we still generally have enough memory that this is fine. > This specific board has 256KiB for SPL, for example. But I'm not just > concerned about 1KiB of growth when everything is masked off, I'm also > concerned about 2KiB of growth over a changelog that doesn't read like > it added a bunch of stuff that should cause everything to grow, > either. Yeah, that's a good point. It's certainly not obvious to me what would have caused the growth. I think we'll need a revision by revision test to see where it was added. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20201020020907.GA64103-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>]
* Re: Size growth? [not found] ` <20201020020907.GA64103-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> @ 2020-10-21 22:49 ` Tom Rini 2020-10-22 4:00 ` David Gibson 0 siblings, 1 reply; 28+ messages in thread From: Tom Rini @ 2020-10-21 22:49 UTC (permalink / raw) To: David Gibson; +Cc: Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 10806 bytes --] On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote: > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote: > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote: > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > > > > Hi Tom, > > > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > > > > > Hey all, > > > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > > > > to cbca977ea121. I'm seeing some rather worrying size increases > > > > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > > > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > > > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > > > > function old new delta > > > > > do_fdt 3992 4308 +316 > > > > > fdt_check_header 276 492 +216 > > > > > fdt_add_property_ 388 556 +168 > > > > > fdt_ro_probe_ 128 252 +124 > > > > > fdt_packblocks_ 176 296 +120 > > > > > fdt_open_into 392 512 +120 > > > > > fdt_blocks_misordered_ 96 216 +120 > > > > > fdt_get_mem_rsv 108 220 +112 > > > > > fdt_offset_ptr 104 208 +104 > > > > > fdt_get_string 288 388 +100 > > > > > fdt_splice_ 148 228 +80 > > > > > fdt_valid 204 276 +72 > > > > > fdt_getprop_by_offset 184 256 +72 > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > fdt_num_mem_rsv 60 116 +56 > > > > > fdt_splice_struct_ 96 144 +48 > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > fdt_rw_probe_ 108 156 +48 > > > > > fdt_mem_rsv 60 108 +48 > > > > > fdt_getprop_namelen 100 148 +48 > > > > > fdt_get_property_by_offset_ 100 148 +48 > > > > > fdt_get_name 164 212 +48 > > > > > efi_install_fdt 964 1012 +48 > > > > > boot_get_fdt 888 936 +48 > > > > > fdt_move 80 116 +36 > > > > > reserve_fdt 72 96 +24 > > > > > reloc_fdt 76 100 +24 > > > > > image_setup_libfdt 284 308 +24 > > > > > fit_image_load 1608 1632 +24 > > > > > fit_image_get_data_and_size 172 196 +24 > > > > > fit_get_end 16 40 +24 > > > > > fdt_next_tag 256 280 +24 > > > > > fdt_header_size 12 36 +24 > > > > > fdt_get_property_namelen_ 212 236 +24 > > > > > fdt_get_property_namelen 44 68 +24 > > > > > fdt_get_phandle 120 144 +24 > > > > > fdt_del_node 96 120 +24 > > > > > fdt_del_mem_rsv 112 136 +24 > > > > > fdt_add_subnode_namelen 284 308 +24 > > > > > fdt_add_mem_rsv 124 148 +24 > > > > > common_diskboot 680 704 +24 > > > > > fdt_next_subnode 80 88 +8 > > > > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > > > > function old new delta > > > > > fdt_get_mem_rsv 52 236 +184 > > > > > fdt_add_property_ 340 484 +144 > > > > > fdt_get_name 68 152 +84 > > > > > fdt_splice_ 148 228 +80 > > > > > fdt_num_mem_rsv 64 128 +64 > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > fdt_offset_ptr 52 104 +52 > > > > > fdt_splice_struct_ 96 144 +48 > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > fdt_get_property_by_offset_ 36 84 +48 > > > > > fdt_ro_probe_ - 36 +36 > > > > > fdt_subnode_offset_namelen 200 232 +32 > > > > > spl_load_simple_fit 848 872 +24 > > > > > spl_fit_append_fdt 196 220 +24 > > > > > fdt_getprop_by_offset 80 104 +24 > > > > > fdt_get_string 64 88 +24 > > > > > fdt_get_property_namelen_ 200 224 +24 > > > > > fdt_get_phandle 120 144 +24 > > > > > fdt_del_mem_rsv 104 128 +24 > > > > > fdt_check_header 28 52 +24 > > > > > fdt_add_subnode_namelen 260 284 +24 > > > > > fdt_add_mem_rsv 112 136 +24 > > > > > fdt_next_tag 192 212 +20 > > > > > fdt_path_offset_namelen 240 256 +16 > > > > > fdt_supernode_atdepth_offset 160 172 +12 > > > > > fdt_node_offset_by_phandle 120 132 +12 > > > > > fdt_mem_rsv 20 - -20 > > > > > fdt_check_prop_offset_ 64 44 -20 > > > > > fdt_check_node_offset_ 64 44 -20 > > > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > > > > 0xff so there's maximum savings already being done there. Does anyone > > > > > have ideas on where / how to further tweak code size? Thanks! > > > > > > > > +David Gibson > > > > > > > > I suspect there are more checks that need to be made conditional. > > > > > > Seems likely. > > > > OK. Does that mean you're going to take a look? > > No, sorry, my time for dtc is very limited. OK, fair point. > > > Though, as I've opined before, from what I understand the SPL is *so* > > > restricted an environment, I'm not really convinced DTs are the right > > > tool for the job there. > > > > SPL is space constrained, which is why we mask out all of the safety > > checks. But we still generally have enough memory that this is fine. > > This specific board has 256KiB for SPL, for example. But I'm not just > > concerned about 1KiB of growth when everything is masked off, I'm also > > concerned about 2KiB of growth over a changelog that doesn't read like > > it added a bunch of stuff that should cause everything to grow, > > either. > > Yeah, that's a good point. It's certainly not obvious to me what > would have caused the growth. I think we'll need a revision by > revision test to see where it was added. So, with a bit of playing around, I've used buildman to give us that, for the leez-rk3399 platform. The full results are at https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note that build #6, "scripts/dtc: Update to upstream version v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has). But what does all of this _mean_ ? I kinda think I have an answer now. One of the things that sticks out is 6dcb8ba408ec adds a lot and 11738cf01f15 reduces it just a little. With a re-revert again locally, there's just a few size growths now to look in to on the SPL side: leez-rk3399 : all +8 spl/u-boot-spl:all +180 spl/u-boot-spl:text +180 text +8 u-boot: add: 0/0, grow: 3/-1 bytes: 28/-20 (8) function old new delta fdt_move 80 92 +12 fdt_splice_ 148 156 +8 fdt_offset_ptr 104 112 +8 fdt_get_string 288 268 -20 spl-u-boot-spl: add: 1/0, grow: 9/-2 bytes: 220/-40 (180) function old new delta fdt_get_name 68 128 +60 fdt_get_mem_rsv 52 96 +44 fdt_subnode_offset_namelen 200 232 +32 fdt_next_tag 192 212 +20 fdt_path_offset_namelen 240 256 +16 fdt_supernode_atdepth_offset 160 172 +12 fdt_ro_probe_ - 12 +12 fdt_node_offset_by_phandle 120 132 +12 fdt_splice_ 148 156 +8 fdt_offset_ptr 52 56 +4 fdt_check_prop_offset_ 64 44 -20 fdt_check_node_offset_ 64 44 -20 Of those, I'm not 100% sure. That might well end up being Simon's suggestion of a few places needing a check they don't have, I'm not sure. -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-21 22:49 ` Tom Rini @ 2020-10-22 4:00 ` David Gibson [not found] ` <20201022040013.GB1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: David Gibson @ 2020-10-22 4:00 UTC (permalink / raw) To: Tom Rini; +Cc: Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 12020 bytes --] On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote: > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote: > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote: > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > > > > > Hi Tom, > > > > > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > > > > > > > Hey all, > > > > > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > > > > > to cbca977ea121. I'm seeing some rather worrying size increases > > > > > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > > > > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > > > > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > > > > > function old new delta > > > > > > do_fdt 3992 4308 +316 > > > > > > fdt_check_header 276 492 +216 > > > > > > fdt_add_property_ 388 556 +168 > > > > > > fdt_ro_probe_ 128 252 +124 > > > > > > fdt_packblocks_ 176 296 +120 > > > > > > fdt_open_into 392 512 +120 > > > > > > fdt_blocks_misordered_ 96 216 +120 > > > > > > fdt_get_mem_rsv 108 220 +112 > > > > > > fdt_offset_ptr 104 208 +104 > > > > > > fdt_get_string 288 388 +100 > > > > > > fdt_splice_ 148 228 +80 > > > > > > fdt_valid 204 276 +72 > > > > > > fdt_getprop_by_offset 184 256 +72 > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > fdt_num_mem_rsv 60 116 +56 > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > fdt_rw_probe_ 108 156 +48 > > > > > > fdt_mem_rsv 60 108 +48 > > > > > > fdt_getprop_namelen 100 148 +48 > > > > > > fdt_get_property_by_offset_ 100 148 +48 > > > > > > fdt_get_name 164 212 +48 > > > > > > efi_install_fdt 964 1012 +48 > > > > > > boot_get_fdt 888 936 +48 > > > > > > fdt_move 80 116 +36 > > > > > > reserve_fdt 72 96 +24 > > > > > > reloc_fdt 76 100 +24 > > > > > > image_setup_libfdt 284 308 +24 > > > > > > fit_image_load 1608 1632 +24 > > > > > > fit_image_get_data_and_size 172 196 +24 > > > > > > fit_get_end 16 40 +24 > > > > > > fdt_next_tag 256 280 +24 > > > > > > fdt_header_size 12 36 +24 > > > > > > fdt_get_property_namelen_ 212 236 +24 > > > > > > fdt_get_property_namelen 44 68 +24 > > > > > > fdt_get_phandle 120 144 +24 > > > > > > fdt_del_node 96 120 +24 > > > > > > fdt_del_mem_rsv 112 136 +24 > > > > > > fdt_add_subnode_namelen 284 308 +24 > > > > > > fdt_add_mem_rsv 124 148 +24 > > > > > > common_diskboot 680 704 +24 > > > > > > fdt_next_subnode 80 88 +8 > > > > > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > > > > > function old new delta > > > > > > fdt_get_mem_rsv 52 236 +184 > > > > > > fdt_add_property_ 340 484 +144 > > > > > > fdt_get_name 68 152 +84 > > > > > > fdt_splice_ 148 228 +80 > > > > > > fdt_num_mem_rsv 64 128 +64 > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > fdt_offset_ptr 52 104 +52 > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > fdt_get_property_by_offset_ 36 84 +48 > > > > > > fdt_ro_probe_ - 36 +36 > > > > > > fdt_subnode_offset_namelen 200 232 +32 > > > > > > spl_load_simple_fit 848 872 +24 > > > > > > spl_fit_append_fdt 196 220 +24 > > > > > > fdt_getprop_by_offset 80 104 +24 > > > > > > fdt_get_string 64 88 +24 > > > > > > fdt_get_property_namelen_ 200 224 +24 > > > > > > fdt_get_phandle 120 144 +24 > > > > > > fdt_del_mem_rsv 104 128 +24 > > > > > > fdt_check_header 28 52 +24 > > > > > > fdt_add_subnode_namelen 260 284 +24 > > > > > > fdt_add_mem_rsv 112 136 +24 > > > > > > fdt_next_tag 192 212 +20 > > > > > > fdt_path_offset_namelen 240 256 +16 > > > > > > fdt_supernode_atdepth_offset 160 172 +12 > > > > > > fdt_node_offset_by_phandle 120 132 +12 > > > > > > fdt_mem_rsv 20 - -20 > > > > > > fdt_check_prop_offset_ 64 44 -20 > > > > > > fdt_check_node_offset_ 64 44 -20 > > > > > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > > > > > 0xff so there's maximum savings already being done there. Does anyone > > > > > > have ideas on where / how to further tweak code size? Thanks! > > > > > > > > > > +David Gibson > > > > > > > > > > I suspect there are more checks that need to be made conditional. > > > > > > > > Seems likely. > > > > > > OK. Does that mean you're going to take a look? > > > > No, sorry, my time for dtc is very limited. > > OK, fair point. > > > > > Though, as I've opined before, from what I understand the SPL is *so* > > > > restricted an environment, I'm not really convinced DTs are the right > > > > tool for the job there. > > > > > > SPL is space constrained, which is why we mask out all of the safety > > > checks. But we still generally have enough memory that this is fine. > > > This specific board has 256KiB for SPL, for example. But I'm not just > > > concerned about 1KiB of growth when everything is masked off, I'm also > > > concerned about 2KiB of growth over a changelog that doesn't read like > > > it added a bunch of stuff that should cause everything to grow, > > > either. > > > > Yeah, that's a good point. It's certainly not obvious to me what > > would have caused the growth. I think we'll need a revision by > > revision test to see where it was added. > > So, with a bit of playing around, I've used buildman to give us that, > for the leez-rk3399 platform. The full results are at > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note > that build #6, "scripts/dtc: Update to upstream version > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has). Right.. I was meaning stepping through the dtc upstream revisions, not just the uboot revisions, so you get a fine grained idea of where the increase is coming from. > But what does all of this _mean_ ? I kinda think I have an answer now. > One of the things that sticks out is 6dcb8ba408ec adds a lot and > 11738cf01f15 reduces it just a little. Ah, that's a tricky one. If we don't handle unaligned accesses we instead get intermittent bug reports where it just crashes. I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just break for either an unaligned dtb (unlikely) or if you attempt to load an unaligned value from a property (more likely, but don't add the flag if you're not sure you don't need it). > With a re-revert again locally, there's just a few size > growths now to look in to on the SPL side: > leez-rk3399 : all +8 spl/u-boot-spl:all +180 spl/u-boot-spl:text +180 text +8 > u-boot: add: 0/0, grow: 3/-1 bytes: 28/-20 (8) > function old new delta > fdt_move 80 92 +12 > fdt_splice_ 148 156 +8 > fdt_offset_ptr 104 112 +8 > fdt_get_string 288 268 -20 > spl-u-boot-spl: add: 1/0, grow: 9/-2 bytes: 220/-40 (180) > function old new delta > fdt_get_name 68 128 +60 > fdt_get_mem_rsv 52 96 +44 > fdt_subnode_offset_namelen 200 232 +32 > fdt_next_tag 192 212 +20 > fdt_path_offset_namelen 240 256 +16 > fdt_supernode_atdepth_offset 160 172 +12 > fdt_ro_probe_ - 12 +12 > fdt_node_offset_by_phandle 120 132 +12 > fdt_splice_ 148 156 +8 > fdt_offset_ptr 52 56 +4 > fdt_check_prop_offset_ 64 44 -20 > fdt_check_node_offset_ 64 44 -20 > > Of those, I'm not 100% sure. That might well end up being Simon's > suggestion of a few places needing a check they don't have, I'm not > sure. Maybe. Really need to know which dtc/libfdt revision makes the changes to decipher that. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20201022040013.GB1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>]
* Re: Size growth? [not found] ` <20201022040013.GB1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> @ 2020-10-22 12:32 ` Tom Rini 2020-10-22 14:58 ` David Gibson 0 siblings, 1 reply; 28+ messages in thread From: Tom Rini @ 2020-10-22 12:32 UTC (permalink / raw) To: David Gibson; +Cc: Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 13196 bytes --] On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote: > > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote: > > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote: > > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > > > > > > Hi Tom, > > > > > > > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > > > > > > > > > Hey all, > > > > > > > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > > > > > > to cbca977ea121. I'm seeing some rather worrying size increases > > > > > > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > > > > > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > > > > > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > > > > > > function old new delta > > > > > > > do_fdt 3992 4308 +316 > > > > > > > fdt_check_header 276 492 +216 > > > > > > > fdt_add_property_ 388 556 +168 > > > > > > > fdt_ro_probe_ 128 252 +124 > > > > > > > fdt_packblocks_ 176 296 +120 > > > > > > > fdt_open_into 392 512 +120 > > > > > > > fdt_blocks_misordered_ 96 216 +120 > > > > > > > fdt_get_mem_rsv 108 220 +112 > > > > > > > fdt_offset_ptr 104 208 +104 > > > > > > > fdt_get_string 288 388 +100 > > > > > > > fdt_splice_ 148 228 +80 > > > > > > > fdt_valid 204 276 +72 > > > > > > > fdt_getprop_by_offset 184 256 +72 > > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > > fdt_num_mem_rsv 60 116 +56 > > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > > fdt_rw_probe_ 108 156 +48 > > > > > > > fdt_mem_rsv 60 108 +48 > > > > > > > fdt_getprop_namelen 100 148 +48 > > > > > > > fdt_get_property_by_offset_ 100 148 +48 > > > > > > > fdt_get_name 164 212 +48 > > > > > > > efi_install_fdt 964 1012 +48 > > > > > > > boot_get_fdt 888 936 +48 > > > > > > > fdt_move 80 116 +36 > > > > > > > reserve_fdt 72 96 +24 > > > > > > > reloc_fdt 76 100 +24 > > > > > > > image_setup_libfdt 284 308 +24 > > > > > > > fit_image_load 1608 1632 +24 > > > > > > > fit_image_get_data_and_size 172 196 +24 > > > > > > > fit_get_end 16 40 +24 > > > > > > > fdt_next_tag 256 280 +24 > > > > > > > fdt_header_size 12 36 +24 > > > > > > > fdt_get_property_namelen_ 212 236 +24 > > > > > > > fdt_get_property_namelen 44 68 +24 > > > > > > > fdt_get_phandle 120 144 +24 > > > > > > > fdt_del_node 96 120 +24 > > > > > > > fdt_del_mem_rsv 112 136 +24 > > > > > > > fdt_add_subnode_namelen 284 308 +24 > > > > > > > fdt_add_mem_rsv 124 148 +24 > > > > > > > common_diskboot 680 704 +24 > > > > > > > fdt_next_subnode 80 88 +8 > > > > > > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > > > > > > function old new delta > > > > > > > fdt_get_mem_rsv 52 236 +184 > > > > > > > fdt_add_property_ 340 484 +144 > > > > > > > fdt_get_name 68 152 +84 > > > > > > > fdt_splice_ 148 228 +80 > > > > > > > fdt_num_mem_rsv 64 128 +64 > > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > > fdt_offset_ptr 52 104 +52 > > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > > fdt_get_property_by_offset_ 36 84 +48 > > > > > > > fdt_ro_probe_ - 36 +36 > > > > > > > fdt_subnode_offset_namelen 200 232 +32 > > > > > > > spl_load_simple_fit 848 872 +24 > > > > > > > spl_fit_append_fdt 196 220 +24 > > > > > > > fdt_getprop_by_offset 80 104 +24 > > > > > > > fdt_get_string 64 88 +24 > > > > > > > fdt_get_property_namelen_ 200 224 +24 > > > > > > > fdt_get_phandle 120 144 +24 > > > > > > > fdt_del_mem_rsv 104 128 +24 > > > > > > > fdt_check_header 28 52 +24 > > > > > > > fdt_add_subnode_namelen 260 284 +24 > > > > > > > fdt_add_mem_rsv 112 136 +24 > > > > > > > fdt_next_tag 192 212 +20 > > > > > > > fdt_path_offset_namelen 240 256 +16 > > > > > > > fdt_supernode_atdepth_offset 160 172 +12 > > > > > > > fdt_node_offset_by_phandle 120 132 +12 > > > > > > > fdt_mem_rsv 20 - -20 > > > > > > > fdt_check_prop_offset_ 64 44 -20 > > > > > > > fdt_check_node_offset_ 64 44 -20 > > > > > > > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > > > > > > 0xff so there's maximum savings already being done there. Does anyone > > > > > > > have ideas on where / how to further tweak code size? Thanks! > > > > > > > > > > > > +David Gibson > > > > > > > > > > > > I suspect there are more checks that need to be made conditional. > > > > > > > > > > Seems likely. > > > > > > > > OK. Does that mean you're going to take a look? > > > > > > No, sorry, my time for dtc is very limited. > > > > OK, fair point. > > > > > > > Though, as I've opined before, from what I understand the SPL is *so* > > > > > restricted an environment, I'm not really convinced DTs are the right > > > > > tool for the job there. > > > > > > > > SPL is space constrained, which is why we mask out all of the safety > > > > checks. But we still generally have enough memory that this is fine. > > > > This specific board has 256KiB for SPL, for example. But I'm not just > > > > concerned about 1KiB of growth when everything is masked off, I'm also > > > > concerned about 2KiB of growth over a changelog that doesn't read like > > > > it added a bunch of stuff that should cause everything to grow, > > > > either. > > > > > > Yeah, that's a good point. It's certainly not obvious to me what > > > would have caused the growth. I think we'll need a revision by > > > revision test to see where it was added. > > > > So, with a bit of playing around, I've used buildman to give us that, > > for the leez-rk3399 platform. The full results are at > > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note > > that build #6, "scripts/dtc: Update to upstream version > > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync > > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has). > > Right.. I was meaning stepping through the dtc upstream revisions, not > just the uboot revisions, so you get a fine grained idea of where the > increase is coming from. Yes, that's what that link is. I imported every revision from point A to point B, and built U-Boot for it, and checked the sizes, since we have tooling for that. It's possible the bloat-o-meter script in the kernel could be used, but I'm not sure what the right target(s) in dtc would be, to get any useful information out. > > But what does all of this _mean_ ? I kinda think I have an answer now. > > One of the things that sticks out is 6dcb8ba408ec adds a lot and > > 11738cf01f15 reduces it just a little. > > Ah, that's a tricky one. If we don't handle unaligned accesses we > instead get intermittent bug reports where it just crashes. We really need to talk about that then. There was a problem of people turning off the sanity check for making sure the entire device tree was aligned and then having everything crash. > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > break for either an unaligned dtb (unlikely) or if you attempt to load > an unaligned value from a property (more likely, but don't add the > flag if you're not sure you don't need it). So long as it's abstracted in such a way that we don't grow the size of everything again, yes, that is the right way forward I think. These tests still introduce a big boot-time slowdown as well. > > With a re-revert again locally, there's just a few size > > growths now to look in to on the SPL side: > > leez-rk3399 : all +8 spl/u-boot-spl:all +180 spl/u-boot-spl:text +180 text +8 > > u-boot: add: 0/0, grow: 3/-1 bytes: 28/-20 (8) > > function old new delta > > fdt_move 80 92 +12 > > fdt_splice_ 148 156 +8 > > fdt_offset_ptr 104 112 +8 > > fdt_get_string 288 268 -20 > > spl-u-boot-spl: add: 1/0, grow: 9/-2 bytes: 220/-40 (180) > > function old new delta > > fdt_get_name 68 128 +60 > > fdt_get_mem_rsv 52 96 +44 > > fdt_subnode_offset_namelen 200 232 +32 > > fdt_next_tag 192 212 +20 > > fdt_path_offset_namelen 240 256 +16 > > fdt_supernode_atdepth_offset 160 172 +12 > > fdt_ro_probe_ - 12 +12 > > fdt_node_offset_by_phandle 120 132 +12 > > fdt_splice_ 148 156 +8 > > fdt_offset_ptr 52 56 +4 > > fdt_check_prop_offset_ 64 44 -20 > > fdt_check_node_offset_ 64 44 -20 > > > > Of those, I'm not 100% sure. That might well end up being Simon's > > suggestion of a few places needing a check they don't have, I'm not > > sure. > > Maybe. Really need to know which dtc/libfdt revision makes the > changes to decipher that. Yes, there's a few suggestive dtc revisions that change the size of those functions in the full gist. I think some of the satisfy Coverity changes had small impact, but I didn't look for fdt_get_name / fdt_get_mem_rsv. -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-22 12:32 ` Tom Rini @ 2020-10-22 14:58 ` David Gibson [not found] ` <20201022145804.GI1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: David Gibson @ 2020-10-22 14:58 UTC (permalink / raw) To: Tom Rini; +Cc: Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 14252 bytes --] On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote: > > > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote: > > > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote: > > > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > > > > > > > Hi Tom, > > > > > > > > > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > > > > > > > > > > > Hey all, > > > > > > > > > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > > > > > > > to cbca977ea121. I'm seeing some rather worrying size increases > > > > > > > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > > > > > > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > > > > > > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > > > > > > > function old new delta > > > > > > > > do_fdt 3992 4308 +316 > > > > > > > > fdt_check_header 276 492 +216 > > > > > > > > fdt_add_property_ 388 556 +168 > > > > > > > > fdt_ro_probe_ 128 252 +124 > > > > > > > > fdt_packblocks_ 176 296 +120 > > > > > > > > fdt_open_into 392 512 +120 > > > > > > > > fdt_blocks_misordered_ 96 216 +120 > > > > > > > > fdt_get_mem_rsv 108 220 +112 > > > > > > > > fdt_offset_ptr 104 208 +104 > > > > > > > > fdt_get_string 288 388 +100 > > > > > > > > fdt_splice_ 148 228 +80 > > > > > > > > fdt_valid 204 276 +72 > > > > > > > > fdt_getprop_by_offset 184 256 +72 > > > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > > > fdt_num_mem_rsv 60 116 +56 > > > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > > > fdt_rw_probe_ 108 156 +48 > > > > > > > > fdt_mem_rsv 60 108 +48 > > > > > > > > fdt_getprop_namelen 100 148 +48 > > > > > > > > fdt_get_property_by_offset_ 100 148 +48 > > > > > > > > fdt_get_name 164 212 +48 > > > > > > > > efi_install_fdt 964 1012 +48 > > > > > > > > boot_get_fdt 888 936 +48 > > > > > > > > fdt_move 80 116 +36 > > > > > > > > reserve_fdt 72 96 +24 > > > > > > > > reloc_fdt 76 100 +24 > > > > > > > > image_setup_libfdt 284 308 +24 > > > > > > > > fit_image_load 1608 1632 +24 > > > > > > > > fit_image_get_data_and_size 172 196 +24 > > > > > > > > fit_get_end 16 40 +24 > > > > > > > > fdt_next_tag 256 280 +24 > > > > > > > > fdt_header_size 12 36 +24 > > > > > > > > fdt_get_property_namelen_ 212 236 +24 > > > > > > > > fdt_get_property_namelen 44 68 +24 > > > > > > > > fdt_get_phandle 120 144 +24 > > > > > > > > fdt_del_node 96 120 +24 > > > > > > > > fdt_del_mem_rsv 112 136 +24 > > > > > > > > fdt_add_subnode_namelen 284 308 +24 > > > > > > > > fdt_add_mem_rsv 124 148 +24 > > > > > > > > common_diskboot 680 704 +24 > > > > > > > > fdt_next_subnode 80 88 +8 > > > > > > > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > > > > > > > function old new delta > > > > > > > > fdt_get_mem_rsv 52 236 +184 > > > > > > > > fdt_add_property_ 340 484 +144 > > > > > > > > fdt_get_name 68 152 +84 > > > > > > > > fdt_splice_ 148 228 +80 > > > > > > > > fdt_num_mem_rsv 64 128 +64 > > > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > > > fdt_offset_ptr 52 104 +52 > > > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > > > fdt_get_property_by_offset_ 36 84 +48 > > > > > > > > fdt_ro_probe_ - 36 +36 > > > > > > > > fdt_subnode_offset_namelen 200 232 +32 > > > > > > > > spl_load_simple_fit 848 872 +24 > > > > > > > > spl_fit_append_fdt 196 220 +24 > > > > > > > > fdt_getprop_by_offset 80 104 +24 > > > > > > > > fdt_get_string 64 88 +24 > > > > > > > > fdt_get_property_namelen_ 200 224 +24 > > > > > > > > fdt_get_phandle 120 144 +24 > > > > > > > > fdt_del_mem_rsv 104 128 +24 > > > > > > > > fdt_check_header 28 52 +24 > > > > > > > > fdt_add_subnode_namelen 260 284 +24 > > > > > > > > fdt_add_mem_rsv 112 136 +24 > > > > > > > > fdt_next_tag 192 212 +20 > > > > > > > > fdt_path_offset_namelen 240 256 +16 > > > > > > > > fdt_supernode_atdepth_offset 160 172 +12 > > > > > > > > fdt_node_offset_by_phandle 120 132 +12 > > > > > > > > fdt_mem_rsv 20 - -20 > > > > > > > > fdt_check_prop_offset_ 64 44 -20 > > > > > > > > fdt_check_node_offset_ 64 44 -20 > > > > > > > > > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > > > > > > > 0xff so there's maximum savings already being done there. Does anyone > > > > > > > > have ideas on where / how to further tweak code size? Thanks! > > > > > > > > > > > > > > +David Gibson > > > > > > > > > > > > > > I suspect there are more checks that need to be made conditional. > > > > > > > > > > > > Seems likely. > > > > > > > > > > OK. Does that mean you're going to take a look? > > > > > > > > No, sorry, my time for dtc is very limited. > > > > > > OK, fair point. > > > > > > > > > Though, as I've opined before, from what I understand the SPL is *so* > > > > > > restricted an environment, I'm not really convinced DTs are the right > > > > > > tool for the job there. > > > > > > > > > > SPL is space constrained, which is why we mask out all of the safety > > > > > checks. But we still generally have enough memory that this is fine. > > > > > This specific board has 256KiB for SPL, for example. But I'm not just > > > > > concerned about 1KiB of growth when everything is masked off, I'm also > > > > > concerned about 2KiB of growth over a changelog that doesn't read like > > > > > it added a bunch of stuff that should cause everything to grow, > > > > > either. > > > > > > > > Yeah, that's a good point. It's certainly not obvious to me what > > > > would have caused the growth. I think we'll need a revision by > > > > revision test to see where it was added. > > > > > > So, with a bit of playing around, I've used buildman to give us that, > > > for the leez-rk3399 platform. The full results are at > > > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note > > > that build #6, "scripts/dtc: Update to upstream version > > > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync > > > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has). > > > > Right.. I was meaning stepping through the dtc upstream revisions, not > > just the uboot revisions, so you get a fine grained idea of where the > > increase is coming from. > > Yes, that's what that link is. I imported every revision from point A > to point B, and built U-Boot for it, and checked the sizes, since we > have tooling for that. It's possible the bloat-o-meter script in the > kernel could be used, but I'm not sure what the right target(s) in dtc > would be, to get any useful information out. Ok.. ok. I might need some guidance to interpret the information there on that link. > > > But what does all of this _mean_ ? I kinda think I have an answer now. > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > 11738cf01f15 reduces it just a little. > > > > Ah, that's a tricky one. If we don't handle unaligned accesses we > > instead get intermittent bug reports where it just crashes. > > We really need to talk about that then. There was a problem of people > turning off the sanity check for making sure the entire device tree was > aligned and then having everything crash. Ok... I'm not really sure where you're going with that thought. > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > break for either an unaligned dtb (unlikely) or if you attempt to load > > an unaligned value from a property (more likely, but don't add the > > flag if you're not sure you don't need it). > > So long as it's abstracted in such a way that we don't grow the size of > everything again, yes, that is the right way forward I think. All the ASSUME flags should be resolved at compile time (at least with normal optimization levels enabled in the compiler), so testing for those shouldn't increase size at all. If they do, something is wrong. > These > tests still introduce a big boot-time slowdown as well. I'm not sure exactly what tests you mean. > > > > With a re-revert again locally, there's just a few size > > > growths now to look in to on the SPL side: > > > leez-rk3399 : all +8 spl/u-boot-spl:all +180 spl/u-boot-spl:text +180 text +8 > > > u-boot: add: 0/0, grow: 3/-1 bytes: 28/-20 (8) > > > function old new delta > > > fdt_move 80 92 +12 > > > fdt_splice_ 148 156 +8 > > > fdt_offset_ptr 104 112 +8 > > > fdt_get_string 288 268 -20 > > > spl-u-boot-spl: add: 1/0, grow: 9/-2 bytes: 220/-40 (180) > > > function old new delta > > > fdt_get_name 68 128 +60 > > > fdt_get_mem_rsv 52 96 +44 > > > fdt_subnode_offset_namelen 200 232 +32 > > > fdt_next_tag 192 212 +20 > > > fdt_path_offset_namelen 240 256 +16 > > > fdt_supernode_atdepth_offset 160 172 +12 > > > fdt_ro_probe_ - 12 +12 > > > fdt_node_offset_by_phandle 120 132 +12 > > > fdt_splice_ 148 156 +8 > > > fdt_offset_ptr 52 56 +4 > > > fdt_check_prop_offset_ 64 44 -20 > > > fdt_check_node_offset_ 64 44 -20 > > > > > > Of those, I'm not 100% sure. That might well end up being Simon's > > > suggestion of a few places needing a check they don't have, I'm not > > > sure. > > > > Maybe. Really need to know which dtc/libfdt revision makes the > > changes to decipher that. > > Yes, there's a few suggestive dtc revisions that change the size of > those functions in the full gist. I think some of the satisfy Coverity > changes had small impact, but I didn't look for fdt_get_name / > fdt_get_mem_rsv. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20201022145804.GI1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>]
* Re: Size growth? [not found] ` <20201022145804.GI1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> @ 2020-10-22 15:22 ` Tom Rini 2020-10-26 21:51 ` Rob Herring 0 siblings, 1 reply; 28+ messages in thread From: Tom Rini @ 2020-10-22 15:22 UTC (permalink / raw) To: David Gibson; +Cc: Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 15666 bytes --] On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote: > > > > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote: > > > > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote: > > > > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > > > > > > > > Hi Tom, > > > > > > > > > > > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > > > > > > > > > > > > > Hey all, > > > > > > > > > > > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > > > > > > > > to cbca977ea121. I'm seeing some rather worrying size increases > > > > > > > > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > > > > > > > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > > > > > > > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > > > > > > > > function old new delta > > > > > > > > > do_fdt 3992 4308 +316 > > > > > > > > > fdt_check_header 276 492 +216 > > > > > > > > > fdt_add_property_ 388 556 +168 > > > > > > > > > fdt_ro_probe_ 128 252 +124 > > > > > > > > > fdt_packblocks_ 176 296 +120 > > > > > > > > > fdt_open_into 392 512 +120 > > > > > > > > > fdt_blocks_misordered_ 96 216 +120 > > > > > > > > > fdt_get_mem_rsv 108 220 +112 > > > > > > > > > fdt_offset_ptr 104 208 +104 > > > > > > > > > fdt_get_string 288 388 +100 > > > > > > > > > fdt_splice_ 148 228 +80 > > > > > > > > > fdt_valid 204 276 +72 > > > > > > > > > fdt_getprop_by_offset 184 256 +72 > > > > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > > > > fdt_num_mem_rsv 60 116 +56 > > > > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > > > > fdt_rw_probe_ 108 156 +48 > > > > > > > > > fdt_mem_rsv 60 108 +48 > > > > > > > > > fdt_getprop_namelen 100 148 +48 > > > > > > > > > fdt_get_property_by_offset_ 100 148 +48 > > > > > > > > > fdt_get_name 164 212 +48 > > > > > > > > > efi_install_fdt 964 1012 +48 > > > > > > > > > boot_get_fdt 888 936 +48 > > > > > > > > > fdt_move 80 116 +36 > > > > > > > > > reserve_fdt 72 96 +24 > > > > > > > > > reloc_fdt 76 100 +24 > > > > > > > > > image_setup_libfdt 284 308 +24 > > > > > > > > > fit_image_load 1608 1632 +24 > > > > > > > > > fit_image_get_data_and_size 172 196 +24 > > > > > > > > > fit_get_end 16 40 +24 > > > > > > > > > fdt_next_tag 256 280 +24 > > > > > > > > > fdt_header_size 12 36 +24 > > > > > > > > > fdt_get_property_namelen_ 212 236 +24 > > > > > > > > > fdt_get_property_namelen 44 68 +24 > > > > > > > > > fdt_get_phandle 120 144 +24 > > > > > > > > > fdt_del_node 96 120 +24 > > > > > > > > > fdt_del_mem_rsv 112 136 +24 > > > > > > > > > fdt_add_subnode_namelen 284 308 +24 > > > > > > > > > fdt_add_mem_rsv 124 148 +24 > > > > > > > > > common_diskboot 680 704 +24 > > > > > > > > > fdt_next_subnode 80 88 +8 > > > > > > > > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > > > > > > > > function old new delta > > > > > > > > > fdt_get_mem_rsv 52 236 +184 > > > > > > > > > fdt_add_property_ 340 484 +144 > > > > > > > > > fdt_get_name 68 152 +84 > > > > > > > > > fdt_splice_ 148 228 +80 > > > > > > > > > fdt_num_mem_rsv 64 128 +64 > > > > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > > > > fdt_offset_ptr 52 104 +52 > > > > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > > > > fdt_get_property_by_offset_ 36 84 +48 > > > > > > > > > fdt_ro_probe_ - 36 +36 > > > > > > > > > fdt_subnode_offset_namelen 200 232 +32 > > > > > > > > > spl_load_simple_fit 848 872 +24 > > > > > > > > > spl_fit_append_fdt 196 220 +24 > > > > > > > > > fdt_getprop_by_offset 80 104 +24 > > > > > > > > > fdt_get_string 64 88 +24 > > > > > > > > > fdt_get_property_namelen_ 200 224 +24 > > > > > > > > > fdt_get_phandle 120 144 +24 > > > > > > > > > fdt_del_mem_rsv 104 128 +24 > > > > > > > > > fdt_check_header 28 52 +24 > > > > > > > > > fdt_add_subnode_namelen 260 284 +24 > > > > > > > > > fdt_add_mem_rsv 112 136 +24 > > > > > > > > > fdt_next_tag 192 212 +20 > > > > > > > > > fdt_path_offset_namelen 240 256 +16 > > > > > > > > > fdt_supernode_atdepth_offset 160 172 +12 > > > > > > > > > fdt_node_offset_by_phandle 120 132 +12 > > > > > > > > > fdt_mem_rsv 20 - -20 > > > > > > > > > fdt_check_prop_offset_ 64 44 -20 > > > > > > > > > fdt_check_node_offset_ 64 44 -20 > > > > > > > > > > > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > > > > > > > > 0xff so there's maximum savings already being done there. Does anyone > > > > > > > > > have ideas on where / how to further tweak code size? Thanks! > > > > > > > > > > > > > > > > +David Gibson > > > > > > > > > > > > > > > > I suspect there are more checks that need to be made conditional. > > > > > > > > > > > > > > Seems likely. > > > > > > > > > > > > OK. Does that mean you're going to take a look? > > > > > > > > > > No, sorry, my time for dtc is very limited. > > > > > > > > OK, fair point. > > > > > > > > > > > Though, as I've opined before, from what I understand the SPL is *so* > > > > > > > restricted an environment, I'm not really convinced DTs are the right > > > > > > > tool for the job there. > > > > > > > > > > > > SPL is space constrained, which is why we mask out all of the safety > > > > > > checks. But we still generally have enough memory that this is fine. > > > > > > This specific board has 256KiB for SPL, for example. But I'm not just > > > > > > concerned about 1KiB of growth when everything is masked off, I'm also > > > > > > concerned about 2KiB of growth over a changelog that doesn't read like > > > > > > it added a bunch of stuff that should cause everything to grow, > > > > > > either. > > > > > > > > > > Yeah, that's a good point. It's certainly not obvious to me what > > > > > would have caused the growth. I think we'll need a revision by > > > > > revision test to see where it was added. > > > > > > > > So, with a bit of playing around, I've used buildman to give us that, > > > > for the leez-rk3399 platform. The full results are at > > > > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note > > > > that build #6, "scripts/dtc: Update to upstream version > > > > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync > > > > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has). > > > > > > Right.. I was meaning stepping through the dtc upstream revisions, not > > > just the uboot revisions, so you get a fine grained idea of where the > > > increase is coming from. > > > > Yes, that's what that link is. I imported every revision from point A > > to point B, and built U-Boot for it, and checked the sizes, since we > > have tooling for that. It's possible the bloat-o-meter script in the > > kernel could be used, but I'm not sure what the right target(s) in dtc > > would be, to get any useful information out. > > Ok.. ok. I might need some guidance to interpret the information > there on that link. You can ignore everything before line 78. If we look at lines 86-100 there is: 09: scripts/dtc: Update to upstream version v1.4.6-25-g04b5b4062ccd aarch64: (for 1/1 boards) all +4.0 spl/u-boot-spl:all +4.0 spl/u-boot-spl:text +4.0 text +4.0 leez-rk3399 : all +4 spl/u-boot-spl:all +4 spl/u-boot-spl:text +4 text +4 u-boot: add: 2/-1, grow: 0/-1 bytes: 224/-220 (4) function old new delta fdt_ro_probe_ - 116 +116 fdt_rw_probe_ - 108 +108 fdt_rw_check_header_ 108 - -108 fdt_check_header 116 4 -112 spl-u-boot-spl: add: 2/-1, grow: 0/-1 bytes: 224/-220 (4) function old new delta fdt_ro_probe_ - 116 +116 fdt_rw_probe_ - 108 +108 fdt_rw_check_header_ 108 - -108 fdt_check_header 116 4 -112 Which means that with dtc commit 04b5b4062ccd, we got the above function size (text) changes. Overall it's a 4 byte growth for "libfdt: Clean up header checking functions" which at the end of the day, yes, that's a fine and reasonable change and nothing with my U-Boot hat on I'm concerned about. Where it gets complicated is that it's not until line 722 where we get to the end of where Simon's ASSUME_... changes were fully merged. So it's a bit more detective work to see where for example fdt_get_mem_rsv was grown, but then not shrunk back with the ASSUME flags. A quick search shows there's 5 dtc commits that change fdt_get_mem_rsv. My hope is that it would be quicker for you or someone else that works on dtc a lot to mentally swap those commits in and see "Oh, I guess we could ASSUME_... over in this hunk" or "Oh, maybe we can rewrite this change to be a bit more compact" easier than I :) > > > > But what does all of this _mean_ ? I kinda think I have an answer now. > > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > 11738cf01f15 reduces it just a little. > > > > > > Ah, that's a tricky one. If we don't handle unaligned accesses we > > > instead get intermittent bug reports where it just crashes. > > > > We really need to talk about that then. There was a problem of people > > turning off the sanity check for making sure the entire device tree was > > aligned and then having everything crash. > > Ok... I'm not really sure where you're going with that thought. In my reading of the mailing list history of how this issue came up, it was someone was booting a dragonboard or something, and they (or rather, the board maintainer set by default) the flag to use the device tree wherever it is in memory and NOT to relocate it to a properly aligned address. This in turn lead to the kernel getting an unaligned device tree and everything crashing. The "I know what I'm doing" flag was set, violated the documented requirements for device trees need to reside in memory and everything blew up. After that it was noticed that there could be some internal mis-alignment and if you tried those accesses on a CPU that doesn't support doing those reads easily there could be problems, but that's not a common at all case (as noted by it not having been seen in practice). > > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > break for either an unaligned dtb (unlikely) or if you attempt to load > > > an unaligned value from a property (more likely, but don't add the > > > flag if you're not sure you don't need it). > > > > So long as it's abstracted in such a way that we don't grow the size of > > everything again, yes, that is the right way forward I think. > > All the ASSUME flags should be resolved at compile time (at least with > normal optimization levels enabled in the compiler), so testing for > those shouldn't increase size at all. If they do, something is wrong. I'm saying that how ever this new ASSUME flag is done, it needs to be done in such a way the compiler really will be smart about it. So something like making a new function that does fdt64_ld() if we aren't ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are ASSUME_ALIGNED_ACCESS. > > These > > tests still introduce a big boot-time slowdown as well. > > I'm not sure exactly what tests you mean. Sorry, s/tests/changes/. I asked Patrice to re-run the boot time tests on a resync to exactly what's in upstream dtc and he still sees the very noticeable slowdown. -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-22 15:22 ` Tom Rini @ 2020-10-26 21:51 ` Rob Herring [not found] ` <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Rob Herring @ 2020-10-26 21:51 UTC (permalink / raw) To: Tom Rini; +Cc: David Gibson, Simon Glass, Devicetree Compiler On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: [...] > > > > > But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > 11738cf01f15 reduces it just a little. > > > > > > > > Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > instead get intermittent bug reports where it just crashes. > > > > > > We really need to talk about that then. There was a problem of people > > > turning off the sanity check for making sure the entire device tree was > > > aligned and then having everything crash. > > > > Ok... I'm not really sure where you're going with that thought. > > In my reading of the mailing list history of how this issue came up, > it was someone was booting a dragonboard or something, and they (or > rather, the board maintainer set by default) the flag to use the device > tree wherever it is in memory and NOT to relocate it to a properly > aligned address. This in turn lead to the kernel getting an unaligned > device tree and everything crashing. The "I know what I'm doing" flag > was set, violated the documented requirements for device trees need to > reside in memory and everything blew up. > > After that it was noticed that there could be some internal > mis-alignment and if you tried those accesses on a CPU that doesn't > support doing those reads easily there could be problems, but that's not > a common at all case (as noted by it not having been seen in practice). Nor a problem on many environments to begin with. More below... > > > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > break for either an unaligned dtb (unlikely) or if you attempt to load > > > > an unaligned value from a property (more likely, but don't add the > > > > flag if you're not sure you don't need it). > > > > > > So long as it's abstracted in such a way that we don't grow the size of > > > everything again, yes, that is the right way forward I think. > > > > All the ASSUME flags should be resolved at compile time (at least with > > normal optimization levels enabled in the compiler), so testing for > > those shouldn't increase size at all. If they do, something is wrong. > > I'm saying that how ever this new ASSUME flag is done, it needs to be > done in such a way the compiler really will be smart about it. So > something like making a new function that does fdt64_ld() if we aren't > ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > ASSUME_ALIGNED_ACCESS. Ah, unaligned accesses again... To summarize, both performance and size suffer with not doing unaligned accesses. Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will do unaligned accesses? That would be more aligned with what the system can support rather than sanity checking associated with ASSUME_*. To repeat from last time, everything ARMv6 and up can do unaligned accesses if enabled. That's the vast majority of Arm systems that people care about. Whether that's enabled or not is up to how SCTLR.A is configured. Last I checked, u-boot clears this. Don't know about SPL case though. Rob ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Size growth? [not found] ` <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2020-10-26 22:17 ` Tom Rini 2020-10-27 15:57 ` André Przywara 1 sibling, 0 replies; 28+ messages in thread From: Tom Rini @ 2020-10-26 22:17 UTC (permalink / raw) To: Rob Herring; +Cc: David Gibson, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 4266 bytes --] On Mon, Oct 26, 2020 at 04:51:20PM -0500, Rob Herring wrote: > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > [...] > > > > > > > But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > > 11738cf01f15 reduces it just a little. > > > > > > > > > > Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > instead get intermittent bug reports where it just crashes. > > > > > > > > We really need to talk about that then. There was a problem of people > > > > turning off the sanity check for making sure the entire device tree was > > > > aligned and then having everything crash. > > > > > > Ok... I'm not really sure where you're going with that thought. > > > > In my reading of the mailing list history of how this issue came up, > > it was someone was booting a dragonboard or something, and they (or > > rather, the board maintainer set by default) the flag to use the device > > tree wherever it is in memory and NOT to relocate it to a properly > > aligned address. This in turn lead to the kernel getting an unaligned > > device tree and everything crashing. The "I know what I'm doing" flag > > was set, violated the documented requirements for device trees need to > > reside in memory and everything blew up. > > > > After that it was noticed that there could be some internal > > mis-alignment and if you tried those accesses on a CPU that doesn't > > support doing those reads easily there could be problems, but that's not > > a common at all case (as noted by it not having been seen in practice). > > Nor a problem on many environments to begin with. More below... > > > > > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > an unaligned value from a property (more likely, but don't add the > > > > > flag if you're not sure you don't need it). > > > > > > > > So long as it's abstracted in such a way that we don't grow the size of > > > > everything again, yes, that is the right way forward I think. > > > > > > All the ASSUME flags should be resolved at compile time (at least with > > > normal optimization levels enabled in the compiler), so testing for > > > those shouldn't increase size at all. If they do, something is wrong. > > > > I'm saying that how ever this new ASSUME flag is done, it needs to be > > done in such a way the compiler really will be smart about it. So > > something like making a new function that does fdt64_ld() if we aren't > > ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > ASSUME_ALIGNED_ACCESS. > > Ah, unaligned accesses again... To summarize, both performance and > size suffer with not doing unaligned accesses. > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > do unaligned accesses? That would be more aligned with what the system > can support rather than sanity checking associated with ASSUME_*. I leave this up to the dtc folks. With my U-Boot hat on, we'll be defaulting to "unaligned is fine" for everyone. > To repeat from last time, everything ARMv6 and up can do unaligned > accesses if enabled. That's the vast majority of Arm systems that > people care about. Whether that's enabled or not is up to how SCTLR.A > is configured. Last I checked, u-boot clears this. Don't know about > SPL case though. I think it's something close to never do we ever have things configured such that truly unaligned access is a problem. The historical problem wasn't even that something wasn't at all aligned, it was 4-byte aligned when the requirement was 8-byte aligned and everything failed. Only later were totally unaligned parts of the tree found and that's still not a problem for how U-Boot uses things. -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? [not found] ` <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2020-10-26 22:17 ` Tom Rini @ 2020-10-27 15:57 ` André Przywara [not found] ` <a14daf09-4d97-052f-5071-09e67ccb925e-5wv7dgnIgG8@public.gmane.org> 1 sibling, 1 reply; 28+ messages in thread From: André Przywara @ 2020-10-27 15:57 UTC (permalink / raw) To: Rob Herring, Tom Rini; +Cc: David Gibson, Simon Glass, Devicetree Compiler On 26/10/2020 21:51, Rob Herring wrote: > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > [...] > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and >>>>>> 11738cf01f15 reduces it just a little. >>>>> >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we >>>>> instead get intermittent bug reports where it just crashes. >>>> >>>> We really need to talk about that then. There was a problem of people >>>> turning off the sanity check for making sure the entire device tree was >>>> aligned and then having everything crash. >>> >>> Ok... I'm not really sure where you're going with that thought. >> >> In my reading of the mailing list history of how this issue came up, >> it was someone was booting a dragonboard or something, and they (or >> rather, the board maintainer set by default) the flag to use the device >> tree wherever it is in memory and NOT to relocate it to a properly >> aligned address. This in turn lead to the kernel getting an unaligned >> device tree and everything crashing. The "I know what I'm doing" flag >> was set, violated the documented requirements for device trees need to >> reside in memory and everything blew up. >> >> After that it was noticed that there could be some internal >> mis-alignment and if you tried those accesses on a CPU that doesn't >> support doing those reads easily there could be problems, but that's not >> a common at all case (as noted by it not having been seen in practice). > > Nor a problem on many environments to begin with. More below... > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load >>>>> an unaligned value from a property (more likely, but don't add the >>>>> flag if you're not sure you don't need it). >>>> >>>> So long as it's abstracted in such a way that we don't grow the size of >>>> everything again, yes, that is the right way forward I think. >>> >>> All the ASSUME flags should be resolved at compile time (at least with >>> normal optimization levels enabled in the compiler), so testing for >>> those shouldn't increase size at all. If they do, something is wrong. >> >> I'm saying that how ever this new ASSUME flag is done, it needs to be >> done in such a way the compiler really will be smart about it. So >> something like making a new function that does fdt64_ld() if we aren't >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are >> ASSUME_ALIGNED_ACCESS. > > Ah, unaligned accesses again... To summarize, both performance and > size suffer with not doing unaligned accesses. > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > do unaligned accesses? That would be more aligned with what the system > can support rather than sanity checking associated with ASSUME_*. > > To repeat from last time, everything ARMv6 and up can do unaligned > accesses if enabled. But that requires the MMU to be enabled, doesn't it? If I read the ARM ARM correctly, unaligned accesses always trap on device memory, regardless of SCTLR.A. And without the MMU enabled everything is device memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to cope with that, and that most likely affects libfdt as well? Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled all the time, and I know of at least the sunxi-aarch64 SPL running with the MMU off as well. Cheers, Andre > people care about. Whether that's enabled or not is up to how SCTLR.A > is configured. Last I checked, u-boot clears this. Don't know about > SPL case though. > > Rob > ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <a14daf09-4d97-052f-5071-09e67ccb925e-5wv7dgnIgG8@public.gmane.org>]
* Re: Size growth? [not found] ` <a14daf09-4d97-052f-5071-09e67ccb925e-5wv7dgnIgG8@public.gmane.org> @ 2020-10-27 19:55 ` Rob Herring [not found] ` <CAL_JsqK_fC346UnCmXMJxKHCM6=eFBF_kmGt_fBdvwPXbPRkvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Rob Herring @ 2020-10-27 19:55 UTC (permalink / raw) To: André Przywara Cc: Tom Rini, David Gibson, Simon Glass, Devicetree Compiler On Tue, Oct 27, 2020 at 10:58 AM Andr√© Przywara <andre.przywara-AbSShOkvfpQ@public.gmane.orgm> wrote: > > On 26/10/2020 21:51, Rob Herring wrote: > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > [...] > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > >>>>>> 11738cf01f15 reduces it just a little. > >>>>> > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > >>>>> instead get intermittent bug reports where it just crashes. > >>>> > >>>> We really need to talk about that then. There was a problem of people > >>>> turning off the sanity check for making sure the entire device tree was > >>>> aligned and then having everything crash. > >>> > >>> Ok... I'm not really sure where you're going with that thought. > >> > >> In my reading of the mailing list history of how this issue came up, > >> it was someone was booting a dragonboard or something, and they (or > >> rather, the board maintainer set by default) the flag to use the device > >> tree wherever it is in memory and NOT to relocate it to a properly > >> aligned address. This in turn lead to the kernel getting an unaligned > >> device tree and everything crashing. The "I know what I'm doing" flag > >> was set, violated the documented requirements for device trees need to > >> reside in memory and everything blew up. > >> > >> After that it was noticed that there could be some internal > >> mis-alignment and if you tried those accesses on a CPU that doesn't > >> support doing those reads easily there could be problems, but that's not > >> a common at all case (as noted by it not having been seen in practice). > > > > Nor a problem on many environments to begin with. More below... > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > >>>>> an unaligned value from a property (more likely, but don't add the > >>>>> flag if you're not sure you don't need it). > >>>> > >>>> So long as it's abstracted in such a way that we don't grow the size of > >>>> everything again, yes, that is the right way forward I think. > >>> > >>> All the ASSUME flags should be resolved at compile time (at least with > >>> normal optimization levels enabled in the compiler), so testing for > >>> those shouldn't increase size at all. If they do, something is wrong. > >> > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > >> done in such a way the compiler really will be smart about it. So > >> something like making a new function that does fdt64_ld() if we aren't > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > >> ASSUME_ALIGNED_ACCESS. > > > > Ah, unaligned accesses again... To summarize, both performance and > > size suffer with not doing unaligned accesses. > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > do unaligned accesses? That would be more aligned with what the system > > can support rather than sanity checking associated with ASSUME_*. > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > accesses if enabled. > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > ARM correctly, unaligned accesses always trap on device memory, > regardless of SCTLR.A. And without the MMU enabled everything is device > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > cope with that, and that most likely affects libfdt as well? Ah yes, I think you are right. In that case, seems like we should figure out whether (internal) unaligned accesses are possible with dtc generated dtbs at least rather than just "not a common at all case (as noted by it not having been seen in practice)." I'm sure David will point out that not all dtbs come from dtc, but all the ones u-boot deals with do in reality. > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > all the time, and I know of at least the sunxi-aarch64 SPL running with > the MMU off as well. I'm making a mental note of this for the next time performance issues come up. Rob ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <CAL_JsqK_fC346UnCmXMJxKHCM6=eFBF_kmGt_fBdvwPXbPRkvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Size growth? [not found] ` <CAL_JsqK_fC346UnCmXMJxKHCM6=eFBF_kmGt_fBdvwPXbPRkvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2020-10-28 4:26 ` David Gibson [not found] ` <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: David Gibson @ 2020-10-28 4:26 UTC (permalink / raw) To: Rob Herring Cc: André Przywara, Tom Rini, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 8311 bytes --] On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote: > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > [...] > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > >>>>>> 11738cf01f15 reduces it just a little. > > >>>>> > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > >>>>> instead get intermittent bug reports where it just crashes. > > >>>> > > >>>> We really need to talk about that then. There was a problem of people > > >>>> turning off the sanity check for making sure the entire device tree was > > >>>> aligned and then having everything crash. > > >>> > > >>> Ok... I'm not really sure where you're going with that thought. > > >> > > >> In my reading of the mailing list history of how this issue came up, > > >> it was someone was booting a dragonboard or something, and they (or > > >> rather, the board maintainer set by default) the flag to use the device > > >> tree wherever it is in memory and NOT to relocate it to a properly > > >> aligned address. This in turn lead to the kernel getting an unaligned > > >> device tree and everything crashing. The "I know what I'm doing" flag > > >> was set, violated the documented requirements for device trees need to > > >> reside in memory and everything blew up. > > >> > > >> After that it was noticed that there could be some internal > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > >> support doing those reads easily there could be problems, but that's not > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > Nor a problem on many environments to begin with. More below... > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > >>>>> an unaligned value from a property (more likely, but don't add the > > >>>>> flag if you're not sure you don't need it). > > >>>> > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > >>>> everything again, yes, that is the right way forward I think. > > >>> > > >>> All the ASSUME flags should be resolved at compile time (at least with > > >>> normal optimization levels enabled in the compiler), so testing for > > >>> those shouldn't increase size at all. If they do, something is wrong. > > >> > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > >> done in such a way the compiler really will be smart about it. So > > >> something like making a new function that does fdt64_ld() if we aren't > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > >> ASSUME_ALIGNED_ACCESS. > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > size suffer with not doing unaligned accesses. > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > do unaligned accesses? That would be more aligned with what the system > > > can support rather than sanity checking associated with ASSUME_*. So, there are kind of two things here, (1) is "my platform can handle unaligned accesses" and (2) is "assume I don't need unaligned accesses". We can use the fast & small versions of fdt32_ld() etc. if either is true. However we need to consider those separately, because they can be independently true (or not) for different reasons. (1) depends on the hardware, whereas (2) depends on how you're using dtc, and, see below, you may need at least unaligned-handling fdt64_ld() in more cases than you think. > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > accesses if enabled. > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > ARM correctly, unaligned accesses always trap on device memory, > > regardless of SCTLR.A. And without the MMU enabled everything is device > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > cope with that, and that most likely affects libfdt as well? > > Ah yes, I think you are right. > > In that case, seems like we should figure out whether (internal) > unaligned accesses are possible with dtc generated dtbs at least > rather than just "not a common at all case (as noted by it not having > been seen in practice)." I'm sure David will point out that not all > dtbs come from dtc, but all the ones u-boot deals with do in > reality. Assuming the blob itself is 8-byte aligned in memory, then all structural elements (i.e. the tree metadata) of a compliant dtb will be naturally aligned. The spec requires 8-byte alignment of the mem reserve block w.r.t. the base of the blob and 4 byte aligned structure block w.r.t. the base of the blob. Likewise the layout of the mem reserve block will preserve 8-byte alignment of all the 64-bit values it contains, assuming the block itself starts 8-byte aligned. Similarly the structure blob will preserve 4-byte alignment of all its tags and other structural data (this amounts to requiring an alignment gap after node names and property values). However, "all structural elements" does not include values within property values themselves. Assuming propery alignment of the blocks and the blob itself, then all property values will *begin* 4 byte aligned. However that leaves two relevant cases: a) 64-bit property values may be 4-byte aligned but not 8-byte aligned b) complex property values including both strings and integers typically use a packed representation with no alignment gaps. Such property structures are usually avoided in modern bindings, but they definitely exist in a bunch of older bindings. Obviously that means that integer values sitting after arbitrary length strings may not have any natural alignment So acccesses made by libfdt internally should be safe(*) assuming the blob itself is loaded 8-byte aligned, and the dtb is compliant. However the libfdt user may hit both problems (a) and (b) getting things they actually want from the tree. fdt{32,64}_{ld,st}() are intended to handle those cases, so that they're useful for the caller to pull things from properties as well as for libfdt internal accesses. (*) There are a number of other functions that looked like they might be dangerous for case (a) because they are based on 64-bit property values: fdt_setprop_inplace_u64(), fdt_property_u64(), fdt_setprop_u64(), fdt_appendprop_u64() and fdt_appendprop_addrrange(). However I think they're actually ok, because the way they're built in terms of other functions means there's implicitly a memcpy() from a byte buffer. > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > the MMU off as well. > > I'm making a mental note of this for the next time performance issues come up. Right, running early with MMU off is definitely a real use case for libfdt. For similar reasons we can't assume we have an OS which will trap and handle unaligned accesses, which we might for a more conventional userspace library. This kind of underscores why I'm a bit hesitant to introduce "my platform handles unaligned acccesses" flag. Not only does it require detailed knowledge of the target CPU, but it can also depend on exactly what mode that hardware is in. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>]
* Re: Size growth? [not found] ` <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> @ 2020-10-28 12:05 ` Tom Rini 2020-10-29 2:55 ` David Gibson 2020-10-28 17:49 ` Rob Herring 1 sibling, 1 reply; 28+ messages in thread From: Tom Rini @ 2020-10-28 12:05 UTC (permalink / raw) To: David Gibson Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 8589 bytes --] On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote: > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote: > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > [...] > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > >>>>>> 11738cf01f15 reduces it just a little. > > > >>>>> > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > >>>>> instead get intermittent bug reports where it just crashes. > > > >>>> > > > >>>> We really need to talk about that then. There was a problem of people > > > >>>> turning off the sanity check for making sure the entire device tree was > > > >>>> aligned and then having everything crash. > > > >>> > > > >>> Ok... I'm not really sure where you're going with that thought. > > > >> > > > >> In my reading of the mailing list history of how this issue came up, > > > >> it was someone was booting a dragonboard or something, and they (or > > > >> rather, the board maintainer set by default) the flag to use the device > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > >> was set, violated the documented requirements for device trees need to > > > >> reside in memory and everything blew up. > > > >> > > > >> After that it was noticed that there could be some internal > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > >> support doing those reads easily there could be problems, but that's not > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > >>>>> flag if you're not sure you don't need it). > > > >>>> > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > >>>> everything again, yes, that is the right way forward I think. > > > >>> > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > >>> normal optimization levels enabled in the compiler), so testing for > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > >> > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > >> done in such a way the compiler really will be smart about it. So > > > >> something like making a new function that does fdt64_ld() if we aren't > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > size suffer with not doing unaligned accesses. > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > do unaligned accesses? That would be more aligned with what the system > > > > can support rather than sanity checking associated with ASSUME_*. > > So, there are kind of two things here, (1) is "my platform can handle > unaligned accesses" and (2) is "assume I don't need unaligned > accesses". We can use the fast & small versions of fdt32_ld() etc. if > either is true. However we need to consider those separately, because > they can be independently true (or not) for different reasons. (1) > depends on the hardware, whereas (2) depends on how you're using dtc, > and, see below, you may need at least unaligned-handling fdt64_ld() in > more cases than you think. > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > accesses if enabled. > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > ARM correctly, unaligned accesses always trap on device memory, > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > cope with that, and that most likely affects libfdt as well? > > > > Ah yes, I think you are right. > > > > In that case, seems like we should figure out whether (internal) > > unaligned accesses are possible with dtc generated dtbs at least > > rather than just "not a common at all case (as noted by it not having > > been seen in practice)." I'm sure David will point out that not all > > dtbs come from dtc, but all the ones u-boot deals with do in > > reality. > > Assuming the blob itself is 8-byte aligned in memory, then all > structural elements (i.e. the tree metadata) of a compliant dtb will > be naturally aligned. The spec requires 8-byte alignment of the mem > reserve block w.r.t. the base of the blob and 4 byte aligned structure > block w.r.t. the base of the blob. Likewise the layout of the mem > reserve block will preserve 8-byte alignment of all the 64-bit values > it contains, assuming the block itself starts 8-byte aligned. > Similarly the structure blob will preserve 4-byte alignment of all its > tags and other structural data (this amounts to requiring an alignment > gap after node names and property values). > > However, "all structural elements" does not include values within > property values themselves. Assuming propery alignment of the blocks > and the blob itself, then all property values will *begin* 4 byte > aligned. However that leaves two relevant cases: > > a) 64-bit property values may be 4-byte aligned but not 8-byte > aligned > b) complex property values including both strings and integers > typically use a packed representation with no alignment gaps. > Such property structures are usually avoided in modern bindings, > but they definitely exist in a bunch of older bindings. Obviously > that means that integer values sitting after arbitrary length > strings may not have any natural alignment > > So acccesses made by libfdt internally should be safe(*) assuming the > blob itself is loaded 8-byte aligned, and the dtb is compliant. > However the libfdt user may hit both problems (a) and (b) getting > things they actually want from the tree. fdt{32,64}_{ld,st}() are > intended to handle those cases, so that they're useful for the caller > to pull things from properties as well as for libfdt internal > accesses. > > (*) There are a number of other functions that looked like they might > be dangerous for case (a) because they are based on 64-bit > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > fdt_setprop_u64(), fdt_appendprop_u64() and > fdt_appendprop_addrrange(). However I think they're actually > ok, because the way they're built in terms of other functions > means there's implicitly a memcpy() from a byte buffer. > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > the MMU off as well. > > > > I'm making a mental note of this for the next time performance issues come up. > > Right, running early with MMU off is definitely a real use case for > libfdt. For similar reasons we can't assume we have an OS which will > trap and handle unaligned accesses, which we might for a more > conventional userspace library. > > This kind of underscores why I'm a bit hesitant to introduce "my > platform handles unaligned acccesses" flag. Not only does it require > detailed knowledge of the target CPU, but it can also depend on > exactly what mode that hardware is in. Can you please note the existing user(s) where we have just the right combination of factors and so everything fails? -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-28 12:05 ` Tom Rini @ 2020-10-29 2:55 ` David Gibson [not found] ` <20201029025503.GI5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: David Gibson @ 2020-10-29 2:55 UTC (permalink / raw) To: Tom Rini; +Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 9181 bytes --] On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote: > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote: > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote: > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > [...] > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > >>>>> > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > >>>> > > > > >>>> We really need to talk about that then. There was a problem of people > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > >>>> aligned and then having everything crash. > > > > >>> > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > >> > > > > >> In my reading of the mailing list history of how this issue came up, > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > >> was set, violated the documented requirements for device trees need to > > > > >> reside in memory and everything blew up. > > > > >> > > > > >> After that it was noticed that there could be some internal > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > >> support doing those reads easily there could be problems, but that's not > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > >>>>> flag if you're not sure you don't need it). > > > > >>>> > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > >>>> everything again, yes, that is the right way forward I think. > > > > >>> > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > >> > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > >> done in such a way the compiler really will be smart about it. So > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > So, there are kind of two things here, (1) is "my platform can handle > > unaligned accesses" and (2) is "assume I don't need unaligned > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > either is true. However we need to consider those separately, because > > they can be independently true (or not) for different reasons. (1) > > depends on the hardware, whereas (2) depends on how you're using dtc, > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > more cases than you think. > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > accesses if enabled. > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > cope with that, and that most likely affects libfdt as well? > > > > > > Ah yes, I think you are right. > > > > > > In that case, seems like we should figure out whether (internal) > > > unaligned accesses are possible with dtc generated dtbs at least > > > rather than just "not a common at all case (as noted by it not having > > > been seen in practice)." I'm sure David will point out that not all > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > reality. > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > structural elements (i.e. the tree metadata) of a compliant dtb will > > be naturally aligned. The spec requires 8-byte alignment of the mem > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > block w.r.t. the base of the blob. Likewise the layout of the mem > > reserve block will preserve 8-byte alignment of all the 64-bit values > > it contains, assuming the block itself starts 8-byte aligned. > > Similarly the structure blob will preserve 4-byte alignment of all its > > tags and other structural data (this amounts to requiring an alignment > > gap after node names and property values). > > > > However, "all structural elements" does not include values within > > property values themselves. Assuming propery alignment of the blocks > > and the blob itself, then all property values will *begin* 4 byte > > aligned. However that leaves two relevant cases: > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > aligned > > b) complex property values including both strings and integers > > typically use a packed representation with no alignment gaps. > > Such property structures are usually avoided in modern bindings, > > but they definitely exist in a bunch of older bindings. Obviously > > that means that integer values sitting after arbitrary length > > strings may not have any natural alignment > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > However the libfdt user may hit both problems (a) and (b) getting > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > intended to handle those cases, so that they're useful for the caller > > to pull things from properties as well as for libfdt internal > > accesses. > > > > (*) There are a number of other functions that looked like they might > > be dangerous for case (a) because they are based on 64-bit > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > fdt_setprop_u64(), fdt_appendprop_u64() and > > fdt_appendprop_addrrange(). However I think they're actually > > ok, because the way they're built in terms of other functions > > means there's implicitly a memcpy() from a byte buffer. > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > the MMU off as well. > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > Right, running early with MMU off is definitely a real use case for > > libfdt. For similar reasons we can't assume we have an OS which will > > trap and handle unaligned accesses, which we might for a more > > conventional userspace library. > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > platform handles unaligned acccesses" flag. Not only does it require > > detailed knowledge of the target CPU, but it can also depend on > > exactly what mode that hardware is in. > > Can you please note the existing user(s) where we have just the right > combination of factors and so everything fails? Sorry, I don't understand the question. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20201029025503.GI5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>]
* Re: Size growth? [not found] ` <20201029025503.GI5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> @ 2020-10-29 15:06 ` Tom Rini 2020-10-29 15:48 ` Rob Herring 2020-11-02 2:06 ` David Gibson 0 siblings, 2 replies; 28+ messages in thread From: Tom Rini @ 2020-10-29 15:06 UTC (permalink / raw) To: David Gibson Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 9662 bytes --] On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote: > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote: > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote: > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote: > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > [...] > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > >>>>> > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > >>>> > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > >>>> aligned and then having everything crash. > > > > > >>> > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > >> > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > >> was set, violated the documented requirements for device trees need to > > > > > >> reside in memory and everything blew up. > > > > > >> > > > > > >> After that it was noticed that there could be some internal > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > >>>>> flag if you're not sure you don't need it). > > > > > >>>> > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > >>> > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > >> > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > either is true. However we need to consider those separately, because > > > they can be independently true (or not) for different reasons. (1) > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > more cases than you think. > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > accesses if enabled. > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > Ah yes, I think you are right. > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > rather than just "not a common at all case (as noted by it not having > > > > been seen in practice)." I'm sure David will point out that not all > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > reality. > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > it contains, assuming the block itself starts 8-byte aligned. > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > tags and other structural data (this amounts to requiring an alignment > > > gap after node names and property values). > > > > > > However, "all structural elements" does not include values within > > > property values themselves. Assuming propery alignment of the blocks > > > and the blob itself, then all property values will *begin* 4 byte > > > aligned. However that leaves two relevant cases: > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > aligned > > > b) complex property values including both strings and integers > > > typically use a packed representation with no alignment gaps. > > > Such property structures are usually avoided in modern bindings, > > > but they definitely exist in a bunch of older bindings. Obviously > > > that means that integer values sitting after arbitrary length > > > strings may not have any natural alignment > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > However the libfdt user may hit both problems (a) and (b) getting > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > intended to handle those cases, so that they're useful for the caller > > > to pull things from properties as well as for libfdt internal > > > accesses. > > > > > > (*) There are a number of other functions that looked like they might > > > be dangerous for case (a) because they are based on 64-bit > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > fdt_appendprop_addrrange(). However I think they're actually > > > ok, because the way they're built in terms of other functions > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > the MMU off as well. > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > Right, running early with MMU off is definitely a real use case for > > > libfdt. For similar reasons we can't assume we have an OS which will > > > trap and handle unaligned accesses, which we might for a more > > > conventional userspace library. > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > platform handles unaligned acccesses" flag. Not only does it require > > > detailed knowledge of the target CPU, but it can also depend on > > > exactly what mode that hardware is in. > > > > Can you please note the existing user(s) where we have just the right > > combination of factors and so everything fails? > > Sorry, I don't understand the question. I'm asking what the platform(s) are that have the very specific "and here be failure" problem you're concerned with. I'm concerned that right now we're going to end up with larger pile of reverts to dtc in U-Boot rather than being able to just sync with the project properly again. -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-29 15:06 ` Tom Rini @ 2020-10-29 15:48 ` Rob Herring [not found] ` <CAL_JsqJUixnyZx-tu9EV8YZ-gSDE7i1jvMddnNZZWFzezaHftw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2020-11-02 2:06 ` David Gibson 1 sibling, 1 reply; 28+ messages in thread From: Rob Herring @ 2020-10-29 15:48 UTC (permalink / raw) To: Tom Rini Cc: David Gibson, André Przywara, Simon Glass, Devicetree Compiler On Thu, Oct 29, 2020 at 10:06 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote: > > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote: > > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote: > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > > On Tue, Oct 27, 2020 at 10:58 AM Andr√© Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote: > > > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > > >>>>> > > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > > >>>> > > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > > >>>> aligned and then having everything crash. > > > > > > >>> > > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > > >> > > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > > >> was set, violated the documented requirements for device trees need to > > > > > > >> reside in memory and everything blew up. > > > > > > >> > > > > > > >> After that it was noticed that there could be some internal > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > > >>>>> flag if you're not sure you don't need it). > > > > > > >>>> > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > > >>> > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > > >> > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > > either is true. However we need to consider those separately, because > > > > they can be independently true (or not) for different reasons. (1) > > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > > more cases than you think. > > > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > > accesses if enabled. > > > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > > > Ah yes, I think you are right. > > > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > > rather than just "not a common at all case (as noted by it not having > > > > > been seen in practice)." I'm sure David will point out that not all > > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > > reality. > > > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > > it contains, assuming the block itself starts 8-byte aligned. > > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > > tags and other structural data (this amounts to requiring an alignment > > > > gap after node names and property values). > > > > > > > > However, "all structural elements" does not include values within > > > > property values themselves. Assuming propery alignment of the blocks > > > > and the blob itself, then all property values will *begin* 4 byte > > > > aligned. However that leaves two relevant cases: > > > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > > aligned > > > > b) complex property values including both strings and integers > > > > typically use a packed representation with no alignment gaps. > > > > Such property structures are usually avoided in modern bindings, > > > > but they definitely exist in a bunch of older bindings. Obviously > > > > that means that integer values sitting after arbitrary length > > > > strings may not have any natural alignment > > > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > > However the libfdt user may hit both problems (a) and (b) getting > > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > > intended to handle those cases, so that they're useful for the caller > > > > to pull things from properties as well as for libfdt internal > > > > accesses. > > > > > > > > (*) There are a number of other functions that looked like they might > > > > be dangerous for case (a) because they are based on 64-bit > > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > > fdt_appendprop_addrrange(). However I think they're actually > > > > ok, because the way they're built in terms of other functions > > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > > the MMU off as well. > > > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > > > Right, running early with MMU off is definitely a real use case for > > > > libfdt. For similar reasons we can't assume we have an OS which will > > > > trap and handle unaligned accesses, which we might for a more > > > > conventional userspace library. > > > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > > platform handles unaligned acccesses" flag. Not only does it require > > > > detailed knowledge of the target CPU, but it can also depend on > > > > exactly what mode that hardware is in. > > > > > > Can you please note the existing user(s) where we have just the right > > > combination of factors and so everything fails? > > > > Sorry, I don't understand the question. > > I'm asking what the platform(s) are that have the very specific "and > here be failure" problem you're concerned with. I'm also and still confused with your question. > I'm concerned that > right now we're going to end up with larger pile of reverts to dtc in > U-Boot rather than being able to just sync with the project properly > again. I think we have some agreement which I believe would end being the revert you originally submitted, but just keep the fdt*_ld() accessors which always do safe accesses. Rob ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <CAL_JsqJUixnyZx-tu9EV8YZ-gSDE7i1jvMddnNZZWFzezaHftw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Size growth? [not found] ` <CAL_JsqJUixnyZx-tu9EV8YZ-gSDE7i1jvMddnNZZWFzezaHftw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2020-10-29 16:04 ` Tom Rini 2020-10-29 18:08 ` Rob Herring 0 siblings, 1 reply; 28+ messages in thread From: Tom Rini @ 2020-10-29 16:04 UTC (permalink / raw) To: Rob Herring Cc: David Gibson, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 11198 bytes --] On Thu, Oct 29, 2020 at 10:48:28AM -0500, Rob Herring wrote: > On Thu, Oct 29, 2020 at 10:06 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote: > > > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote: > > > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote: > > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote: > > > > > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini@konsulko.com> wrote: > > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > > > >>>>> > > > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > > > >>>> > > > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > > > >>>> aligned and then having everything crash. > > > > > > > >>> > > > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > > > >> > > > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > > > >> was set, violated the documented requirements for device trees need to > > > > > > > >> reside in memory and everything blew up. > > > > > > > >> > > > > > > > >> After that it was noticed that there could be some internal > > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > > > >>>>> flag if you're not sure you don't need it). > > > > > > > >>>> > > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > > > >>> > > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > > > >> > > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > > > either is true. However we need to consider those separately, because > > > > > they can be independently true (or not) for different reasons. (1) > > > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > > > more cases than you think. > > > > > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > > > accesses if enabled. > > > > > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > > > > > Ah yes, I think you are right. > > > > > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > > > rather than just "not a common at all case (as noted by it not having > > > > > > been seen in practice)." I'm sure David will point out that not all > > > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > > > reality. > > > > > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > > > it contains, assuming the block itself starts 8-byte aligned. > > > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > > > tags and other structural data (this amounts to requiring an alignment > > > > > gap after node names and property values). > > > > > > > > > > However, "all structural elements" does not include values within > > > > > property values themselves. Assuming propery alignment of the blocks > > > > > and the blob itself, then all property values will *begin* 4 byte > > > > > aligned. However that leaves two relevant cases: > > > > > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > > > aligned > > > > > b) complex property values including both strings and integers > > > > > typically use a packed representation with no alignment gaps. > > > > > Such property structures are usually avoided in modern bindings, > > > > > but they definitely exist in a bunch of older bindings. Obviously > > > > > that means that integer values sitting after arbitrary length > > > > > strings may not have any natural alignment > > > > > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > > > However the libfdt user may hit both problems (a) and (b) getting > > > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > > > intended to handle those cases, so that they're useful for the caller > > > > > to pull things from properties as well as for libfdt internal > > > > > accesses. > > > > > > > > > > (*) There are a number of other functions that looked like they might > > > > > be dangerous for case (a) because they are based on 64-bit > > > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > > > fdt_appendprop_addrrange(). However I think they're actually > > > > > ok, because the way they're built in terms of other functions > > > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > > > the MMU off as well. > > > > > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > > > > > Right, running early with MMU off is definitely a real use case for > > > > > libfdt. For similar reasons we can't assume we have an OS which will > > > > > trap and handle unaligned accesses, which we might for a more > > > > > conventional userspace library. > > > > > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > > > platform handles unaligned acccesses" flag. Not only does it require > > > > > detailed knowledge of the target CPU, but it can also depend on > > > > > exactly what mode that hardware is in. > > > > > > > > Can you please note the existing user(s) where we have just the right > > > > combination of factors and so everything fails? > > > > > > Sorry, I don't understand the question. > > > > I'm asking what the platform(s) are that have the very specific "and > > here be failure" problem you're concerned with. > > I'm also and still confused with your question. Maybe I'm just confused then. I'm asking what hardware ends up causing a fault when we read a property that starts at 0x....3 and we would not have already enabled the MMU and done whatever else is required to have the fault fixed up and handled. That's not the case in U-Boot. But that being a possible case seems to be where the underlying concern is. > > I'm concerned that > > right now we're going to end up with larger pile of reverts to dtc in > > U-Boot rather than being able to just sync with the project properly > > again. > > I think we have some agreement which I believe would end being the > revert you originally submitted, but just keep the fdt*_ld() accessors > which always do safe accesses. So keep fdt{32,64}_ld() and switch all of the callers back to fdt{32,64}_to_cpu()? OK, should I spin up a patch or ? Thanks! -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-29 16:04 ` Tom Rini @ 2020-10-29 18:08 ` Rob Herring [not found] ` <CAL_JsqJTTJAoTYwxDn3i0KETMXLyGg3WXzxN3-OdRLx=R96a-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: Rob Herring @ 2020-10-29 18:08 UTC (permalink / raw) To: Tom Rini Cc: David Gibson, André Przywara, Simon Glass, Devicetree Compiler On Thu, Oct 29, 2020 at 11:04 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > On Thu, Oct 29, 2020 at 10:48:28AM -0500, Rob Herring wrote: > > On Thu, Oct 29, 2020 at 10:06 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote: > > > > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote: > > > > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote: > > > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > > > > On Tue, Oct 27, 2020 at 10:58 AM Andr√© Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote: > > > > > > > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini@konsulko.com> wrote: > > > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > > > > >>>>> > > > > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > > > > >>>> > > > > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > > > > >>>> aligned and then having everything crash. > > > > > > > > >>> > > > > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > > > > >> > > > > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > > > > >> was set, violated the documented requirements for device trees need to > > > > > > > > >> reside in memory and everything blew up. > > > > > > > > >> > > > > > > > > >> After that it was noticed that there could be some internal > > > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > > > > >>>>> flag if you're not sure you don't need it). > > > > > > > > >>>> > > > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > > > > >>> > > > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > > > > >> > > > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > > > > either is true. However we need to consider those separately, because > > > > > > they can be independently true (or not) for different reasons. (1) > > > > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > > > > more cases than you think. > > > > > > > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > > > > accesses if enabled. > > > > > > > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > > > > > > > Ah yes, I think you are right. > > > > > > > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > > > > rather than just "not a common at all case (as noted by it not having > > > > > > > been seen in practice)." I'm sure David will point out that not all > > > > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > > > > reality. > > > > > > > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > > > > it contains, assuming the block itself starts 8-byte aligned. > > > > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > > > > tags and other structural data (this amounts to requiring an alignment > > > > > > gap after node names and property values). > > > > > > > > > > > > However, "all structural elements" does not include values within > > > > > > property values themselves. Assuming propery alignment of the blocks > > > > > > and the blob itself, then all property values will *begin* 4 byte > > > > > > aligned. However that leaves two relevant cases: > > > > > > > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > > > > aligned > > > > > > b) complex property values including both strings and integers > > > > > > typically use a packed representation with no alignment gaps. > > > > > > Such property structures are usually avoided in modern bindings, > > > > > > but they definitely exist in a bunch of older bindings. Obviously > > > > > > that means that integer values sitting after arbitrary length > > > > > > strings may not have any natural alignment > > > > > > > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > > > > However the libfdt user may hit both problems (a) and (b) getting > > > > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > > > > intended to handle those cases, so that they're useful for the caller > > > > > > to pull things from properties as well as for libfdt internal > > > > > > accesses. > > > > > > > > > > > > (*) There are a number of other functions that looked like they might > > > > > > be dangerous for case (a) because they are based on 64-bit > > > > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > > > > fdt_appendprop_addrrange(). However I think they're actually > > > > > > ok, because the way they're built in terms of other functions > > > > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > > > > the MMU off as well. > > > > > > > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > > > > > > > Right, running early with MMU off is definitely a real use case for > > > > > > libfdt. For similar reasons we can't assume we have an OS which will > > > > > > trap and handle unaligned accesses, which we might for a more > > > > > > conventional userspace library. > > > > > > > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > > > > platform handles unaligned acccesses" flag. Not only does it require > > > > > > detailed knowledge of the target CPU, but it can also depend on > > > > > > exactly what mode that hardware is in. > > > > > > > > > > Can you please note the existing user(s) where we have just the right > > > > > combination of factors and so everything fails? > > > > > > > > Sorry, I don't understand the question. > > > > > > I'm asking what the platform(s) are that have the very specific "and > > > here be failure" problem you're concerned with. > > > > I'm also and still confused with your question. > > Maybe I'm just confused then. I'm asking what hardware ends up causing > a fault when we read a property that starts at 0x....3 and we would not > have already enabled the MMU and done whatever else is required to have > the fault fixed up and handled. Ah, well ARMv4 and v5 systems. Outside of that, I don't know. But as David said, properties will start with 4 byte alignment. It's only accesses within a property value that could be a problem, but those would be done by the user of libfdt and probably already made them alignment safe if needed (or they use fdt{32,64}_ld() if relatively recently written). > That's not the case in U-Boot. But it is for u-boot as Andre pointed out if the MMU is off. > But > that being a possible case seems to be where the underlying concern is. > > > > I'm concerned that > > > right now we're going to end up with larger pile of reverts to dtc in > > > U-Boot rather than being able to just sync with the project properly > > > again. > > > > I think we have some agreement which I believe would end being the > > revert you originally submitted, but just keep the fdt*_ld() accessors > > which always do safe accesses. > > So keep fdt{32,64}_ld() and switch all of the callers back to > fdt{32,64}_to_cpu()? Right. OK, should I spin up a patch or ? Yes. Rob ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <CAL_JsqJTTJAoTYwxDn3i0KETMXLyGg3WXzxN3-OdRLx=R96a-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Size growth? [not found] ` <CAL_JsqJTTJAoTYwxDn3i0KETMXLyGg3WXzxN3-OdRLx=R96a-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2020-10-29 20:21 ` Tom Rini 0 siblings, 0 replies; 28+ messages in thread From: Tom Rini @ 2020-10-29 20:21 UTC (permalink / raw) To: Rob Herring Cc: David Gibson, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 13048 bytes --] On Thu, Oct 29, 2020 at 01:08:37PM -0500, Rob Herring wrote: > On Thu, Oct 29, 2020 at 11:04 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > On Thu, Oct 29, 2020 at 10:48:28AM -0500, Rob Herring wrote: > > > On Thu, Oct 29, 2020 at 10:06 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > > > On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote: > > > > > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote: > > > > > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote: > > > > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote: > > > > > > > > > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini@konsulko.com> wrote: > > > > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > > > > > >>>>> > > > > > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > > > > > >>>> > > > > > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > > > > > >>>> aligned and then having everything crash. > > > > > > > > > >>> > > > > > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > > > > > >> > > > > > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > > > > > >> was set, violated the documented requirements for device trees need to > > > > > > > > > >> reside in memory and everything blew up. > > > > > > > > > >> > > > > > > > > > >> After that it was noticed that there could be some internal > > > > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > > > > > >>>>> flag if you're not sure you don't need it). > > > > > > > > > >>>> > > > > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > > > > > >>> > > > > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > > > > > >> > > > > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > > > > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > > > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > > > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > > > > > either is true. However we need to consider those separately, because > > > > > > > they can be independently true (or not) for different reasons. (1) > > > > > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > > > > > more cases than you think. > > > > > > > > > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > > > > > accesses if enabled. > > > > > > > > > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > > > > > > > > > Ah yes, I think you are right. > > > > > > > > > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > > > > > rather than just "not a common at all case (as noted by it not having > > > > > > > > been seen in practice)." I'm sure David will point out that not all > > > > > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > > > > > reality. > > > > > > > > > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > > > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > > > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > > > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > > > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > > > > > it contains, assuming the block itself starts 8-byte aligned. > > > > > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > > > > > tags and other structural data (this amounts to requiring an alignment > > > > > > > gap after node names and property values). > > > > > > > > > > > > > > However, "all structural elements" does not include values within > > > > > > > property values themselves. Assuming propery alignment of the blocks > > > > > > > and the blob itself, then all property values will *begin* 4 byte > > > > > > > aligned. However that leaves two relevant cases: > > > > > > > > > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > > > > > aligned > > > > > > > b) complex property values including both strings and integers > > > > > > > typically use a packed representation with no alignment gaps. > > > > > > > Such property structures are usually avoided in modern bindings, > > > > > > > but they definitely exist in a bunch of older bindings. Obviously > > > > > > > that means that integer values sitting after arbitrary length > > > > > > > strings may not have any natural alignment > > > > > > > > > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > > > > > However the libfdt user may hit both problems (a) and (b) getting > > > > > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > > > > > intended to handle those cases, so that they're useful for the caller > > > > > > > to pull things from properties as well as for libfdt internal > > > > > > > accesses. > > > > > > > > > > > > > > (*) There are a number of other functions that looked like they might > > > > > > > be dangerous for case (a) because they are based on 64-bit > > > > > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > > > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > > > > > fdt_appendprop_addrrange(). However I think they're actually > > > > > > > ok, because the way they're built in terms of other functions > > > > > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > > > > > the MMU off as well. > > > > > > > > > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > > > > > > > > > Right, running early with MMU off is definitely a real use case for > > > > > > > libfdt. For similar reasons we can't assume we have an OS which will > > > > > > > trap and handle unaligned accesses, which we might for a more > > > > > > > conventional userspace library. > > > > > > > > > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > > > > > platform handles unaligned acccesses" flag. Not only does it require > > > > > > > detailed knowledge of the target CPU, but it can also depend on > > > > > > > exactly what mode that hardware is in. > > > > > > > > > > > > Can you please note the existing user(s) where we have just the right > > > > > > combination of factors and so everything fails? > > > > > > > > > > Sorry, I don't understand the question. > > > > > > > > I'm asking what the platform(s) are that have the very specific "and > > > > here be failure" problem you're concerned with. > > > > > > I'm also and still confused with your question. > > > > Maybe I'm just confused then. I'm asking what hardware ends up causing > > a fault when we read a property that starts at 0x....3 and we would not > > have already enabled the MMU and done whatever else is required to have > > the fault fixed up and handled. > > Ah, well ARMv4 and v5 systems. Outside of that, I don't know. Right, I don't think anything other than those cases has this specific potential problem. Which isn't a common case, but a potential case. > But as David said, properties will start with 4 byte alignment. It's > only accesses within a property value that could be a problem, but > those would be done by the user of libfdt and probably already made > them alignment safe if needed (or they use fdt{32,64}_ld() if > relatively recently written). > > > That's not the case in U-Boot. > > But it is for u-boot as Andre pointed out if the MMU is off. But that's just it, the MMU isn't, I would swear, off, at this point for anything semi modern. "Everyone" wants dcache, so we have to do some simple MMU setup. We care about that before we care about the device tree. Unless I'm way off in the weeds in my thinking right now. > > But > > that being a possible case seems to be where the underlying concern is. > > > > > > I'm concerned that > > > > right now we're going to end up with larger pile of reverts to dtc in > > > > U-Boot rather than being able to just sync with the project properly > > > > again. > > > > > > I think we have some agreement which I believe would end being the > > > revert you originally submitted, but just keep the fdt*_ld() accessors > > > which always do safe accesses. > > > > So keep fdt{32,64}_ld() and switch all of the callers back to > > fdt{32,64}_to_cpu()? > > Right. > > OK, should I spin up a patch or ? > > Yes. I'll put it on the TODO list, thanks. -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-29 15:06 ` Tom Rini 2020-10-29 15:48 ` Rob Herring @ 2020-11-02 2:06 ` David Gibson 1 sibling, 0 replies; 28+ messages in thread From: David Gibson @ 2020-11-02 2:06 UTC (permalink / raw) To: Tom Rini; +Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 10330 bytes --] On Thu, Oct 29, 2020 at 11:06:03AM -0400, Tom Rini wrote: > On Thu, Oct 29, 2020 at 01:55:03PM +1100, David Gibson wrote: > > On Wed, Oct 28, 2020 at 08:05:54AM -0400, Tom Rini wrote: > > > On Wed, Oct 28, 2020 at 03:26:01PM +1100, David Gibson wrote: > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote: > > > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > > >>>>> > > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > > >>>> > > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > > >>>> aligned and then having everything crash. > > > > > > >>> > > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > > >> > > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > > >> was set, violated the documented requirements for device trees need to > > > > > > >> reside in memory and everything blew up. > > > > > > >> > > > > > > >> After that it was noticed that there could be some internal > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > > >>>>> flag if you're not sure you don't need it). > > > > > > >>>> > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > > >>> > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > > >> > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > > either is true. However we need to consider those separately, because > > > > they can be independently true (or not) for different reasons. (1) > > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > > more cases than you think. > > > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > > accesses if enabled. > > > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > > > Ah yes, I think you are right. > > > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > > rather than just "not a common at all case (as noted by it not having > > > > > been seen in practice)." I'm sure David will point out that not all > > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > > reality. > > > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > > it contains, assuming the block itself starts 8-byte aligned. > > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > > tags and other structural data (this amounts to requiring an alignment > > > > gap after node names and property values). > > > > > > > > However, "all structural elements" does not include values within > > > > property values themselves. Assuming propery alignment of the blocks > > > > and the blob itself, then all property values will *begin* 4 byte > > > > aligned. However that leaves two relevant cases: > > > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > > aligned > > > > b) complex property values including both strings and integers > > > > typically use a packed representation with no alignment gaps. > > > > Such property structures are usually avoided in modern bindings, > > > > but they definitely exist in a bunch of older bindings. Obviously > > > > that means that integer values sitting after arbitrary length > > > > strings may not have any natural alignment > > > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > > However the libfdt user may hit both problems (a) and (b) getting > > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > > intended to handle those cases, so that they're useful for the caller > > > > to pull things from properties as well as for libfdt internal > > > > accesses. > > > > > > > > (*) There are a number of other functions that looked like they might > > > > be dangerous for case (a) because they are based on 64-bit > > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > > fdt_appendprop_addrrange(). However I think they're actually > > > > ok, because the way they're built in terms of other functions > > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > > the MMU off as well. > > > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > > > Right, running early with MMU off is definitely a real use case for > > > > libfdt. For similar reasons we can't assume we have an OS which will > > > > trap and handle unaligned accesses, which we might for a more > > > > conventional userspace library. > > > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > > platform handles unaligned acccesses" flag. Not only does it require > > > > detailed knowledge of the target CPU, but it can also depend on > > > > exactly what mode that hardware is in. > > > > > > Can you please note the existing user(s) where we have just the right > > > combination of factors and so everything fails? > > > > Sorry, I don't understand the question. > > I'm asking what the platform(s) are that have the very specific "and > here be failure" problem you're concerned with. I'm concerned that > right now we're going to end up with larger pile of reverts to dtc in > U-Boot rather than being able to just sync with the project properly > again. I'm afraid I don't know. I just know that the problem has been reported on several occasions. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? [not found] ` <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-28 12:05 ` Tom Rini @ 2020-10-28 17:49 ` Rob Herring [not found] ` <CAL_JsqJPrnjKjdmvyY2NOay0YrYc20Tr3OSr0yjq+9HjCN+anA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 28+ messages in thread From: Rob Herring @ 2020-10-28 17:49 UTC (permalink / raw) To: David Gibson Cc: André Przywara, Tom Rini, Simon Glass, Devicetree Compiler On Tue, Oct 27, 2020 at 11:26 PM David Gibson <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote: > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > On Tue, Oct 27, 2020 at 10:58 AM Andr√© Przywara <andre.przywara@arm.com> wrote: > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > [...] > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > >>>>>> 11738cf01f15 reduces it just a little. > > > >>>>> > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > >>>>> instead get intermittent bug reports where it just crashes. > > > >>>> > > > >>>> We really need to talk about that then. There was a problem of people > > > >>>> turning off the sanity check for making sure the entire device tree was > > > >>>> aligned and then having everything crash. > > > >>> > > > >>> Ok... I'm not really sure where you're going with that thought. > > > >> > > > >> In my reading of the mailing list history of how this issue came up, > > > >> it was someone was booting a dragonboard or something, and they (or > > > >> rather, the board maintainer set by default) the flag to use the device > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > >> was set, violated the documented requirements for device trees need to > > > >> reside in memory and everything blew up. > > > >> > > > >> After that it was noticed that there could be some internal > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > >> support doing those reads easily there could be problems, but that's not > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > >>>>> flag if you're not sure you don't need it). > > > >>>> > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > >>>> everything again, yes, that is the right way forward I think. > > > >>> > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > >>> normal optimization levels enabled in the compiler), so testing for > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > >> > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > >> done in such a way the compiler really will be smart about it. So > > > >> something like making a new function that does fdt64_ld() if we aren't > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > size suffer with not doing unaligned accesses. > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > do unaligned accesses? That would be more aligned with what the system > > > > can support rather than sanity checking associated with ASSUME_*. > > So, there are kind of two things here, (1) is "my platform can handle > unaligned accesses" and (2) is "assume I don't need unaligned > accesses". We can use the fast & small versions of fdt32_ld() etc. if > either is true. However we need to consider those separately, because > they can be independently true (or not) for different reasons. (1) > depends on the hardware, whereas (2) depends on how you're using dtc, > and, see below, you may need at least unaligned-handling fdt64_ld() in > more cases than you think. Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but I read it as (1). > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > accesses if enabled. > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > ARM correctly, unaligned accesses always trap on device memory, > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > cope with that, and that most likely affects libfdt as well? > > > > Ah yes, I think you are right. > > > > In that case, seems like we should figure out whether (internal) > > unaligned accesses are possible with dtc generated dtbs at least > > rather than just "not a common at all case (as noted by it not having > > been seen in practice)." I'm sure David will point out that not all > > dtbs come from dtc, but all the ones u-boot deals with do in > > reality. > > Assuming the blob itself is 8-byte aligned in memory, then all > structural elements (i.e. the tree metadata) of a compliant dtb will > be naturally aligned. The spec requires 8-byte alignment of the mem > reserve block w.r.t. the base of the blob and 4 byte aligned structure > block w.r.t. the base of the blob. Likewise the layout of the mem > reserve block will preserve 8-byte alignment of all the 64-bit values > it contains, assuming the block itself starts 8-byte aligned. > Similarly the structure blob will preserve 4-byte alignment of all its > tags and other structural data (this amounts to requiring an alignment > gap after node names and property values). > > However, "all structural elements" does not include values within > property values themselves. Assuming propery alignment of the blocks > and the blob itself, then all property values will *begin* 4 byte > aligned. However that leaves two relevant cases: > > a) 64-bit property values may be 4-byte aligned but not 8-byte > aligned I'd assume that while an arch may support only the above in terms of misalignment, an arch that supports any alignment would always support this as part of that. It would just be odd to support byte alignment only up to 32-bit. I don't think we need to optimize the former case. > b) complex property values including both strings and integers > typically use a packed representation with no alignment gaps. > Such property structures are usually avoided in modern bindings, > but they definitely exist in a bunch of older bindings. Obviously > that means that integer values sitting after arbitrary length > strings may not have any natural alignment That's the user's problem IMO. Users of older bindings having this aren't likely using a newish function like fdt32_ld either. > So acccesses made by libfdt internally should be safe(*) assuming the > blob itself is loaded 8-byte aligned, and the dtb is compliant. > However the libfdt user may hit both problems (a) and (b) getting > things they actually want from the tree. fdt{32,64}_{ld,st}() are > intended to handle those cases, so that they're useful for the caller > to pull things from properties as well as for libfdt internal > accesses. > > (*) There are a number of other functions that looked like they might > be dangerous for case (a) because they are based on 64-bit > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > fdt_setprop_u64(), fdt_appendprop_u64() and > fdt_appendprop_addrrange(). However I think they're actually > ok, because the way they're built in terms of other functions > means there's implicitly a memcpy() from a byte buffer. > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > the MMU off as well. > > > > I'm making a mental note of this for the next time performance issues come up. > > Right, running early with MMU off is definitely a real use case for > libfdt. For similar reasons we can't assume we have an OS which will > trap and handle unaligned accesses, which we might for a more > conventional userspace library. > > This kind of underscores why I'm a bit hesitant to introduce "my > platform handles unaligned acccesses" flag. Not only does it require > detailed knowledge of the target CPU, but it can also depend on > exactly what mode that hardware is in. I think there's a more simple solution with no flags. Given all internal accesses are at least 4-byte aligned, libfdt should just do 32-bit accesses internally (as it used to). Maybe we need a check up front that the dtb is 8-byte aligned though. fdt{32,64}_ld() can just keep the same implementation. If users want it optimized, then they can do their own accessors. Rob ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <CAL_JsqJPrnjKjdmvyY2NOay0YrYc20Tr3OSr0yjq+9HjCN+anA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Size growth? [not found] ` <CAL_JsqJPrnjKjdmvyY2NOay0YrYc20Tr3OSr0yjq+9HjCN+anA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2020-10-29 3:02 ` David Gibson [not found] ` <20201029030247.GJ5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: David Gibson @ 2020-10-29 3:02 UTC (permalink / raw) To: Rob Herring Cc: André Przywara, Tom Rini, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 10768 bytes --] On Wed, Oct 28, 2020 at 12:49:08PM -0500, Rob Herring wrote: > On Tue, Oct 27, 2020 at 11:26 PM David Gibson > <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote: > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote: > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > [...] > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > >>>>> > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > >>>> > > > > >>>> We really need to talk about that then. There was a problem of people > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > >>>> aligned and then having everything crash. > > > > >>> > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > >> > > > > >> In my reading of the mailing list history of how this issue came up, > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > >> was set, violated the documented requirements for device trees need to > > > > >> reside in memory and everything blew up. > > > > >> > > > > >> After that it was noticed that there could be some internal > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > >> support doing those reads easily there could be problems, but that's not > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > >>>>> flag if you're not sure you don't need it). > > > > >>>> > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > >>>> everything again, yes, that is the right way forward I think. > > > > >>> > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > >> > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > >> done in such a way the compiler really will be smart about it. So > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > So, there are kind of two things here, (1) is "my platform can handle > > unaligned accesses" and (2) is "assume I don't need unaligned > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > either is true. However we need to consider those separately, because > > they can be independently true (or not) for different reasons. (1) > > depends on the hardware, whereas (2) depends on how you're using dtc, > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > more cases than you think. > > Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but > I read it as (1). Yes. > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > accesses if enabled. > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > cope with that, and that most likely affects libfdt as well? > > > > > > Ah yes, I think you are right. > > > > > > In that case, seems like we should figure out whether (internal) > > > unaligned accesses are possible with dtc generated dtbs at least > > > rather than just "not a common at all case (as noted by it not having > > > been seen in practice)." I'm sure David will point out that not all > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > reality. > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > structural elements (i.e. the tree metadata) of a compliant dtb will > > be naturally aligned. The spec requires 8-byte alignment of the mem > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > block w.r.t. the base of the blob. Likewise the layout of the mem > > reserve block will preserve 8-byte alignment of all the 64-bit values > > it contains, assuming the block itself starts 8-byte aligned. > > Similarly the structure blob will preserve 4-byte alignment of all its > > tags and other structural data (this amounts to requiring an alignment > > gap after node names and property values). > > > > However, "all structural elements" does not include values within > > property values themselves. Assuming propery alignment of the blocks > > and the blob itself, then all property values will *begin* 4 byte > > aligned. However that leaves two relevant cases: > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > aligned > > I'd assume that while an arch may support only the above in terms of > misalignment, an arch that supports any alignment would always support > this as part of that. It would just be odd to support byte alignment > only up to 32-bit. Yes, I'd expect so. > I don't think we need to optimize the former case. I don't see how we would, in any case. > > b) complex property values including both strings and integers > > typically use a packed representation with no alignment gaps. > > Such property structures are usually avoided in modern bindings, > > but they definitely exist in a bunch of older bindings. Obviously > > that means that integer values sitting after arbitrary length > > strings may not have any natural alignment > > That's the user's problem IMO. Users of older bindings having this > aren't likely using a newish function like fdt32_ld either. That doesn't follow. The bindings still exist and are in use, e.g. on IBM PAPR systems, that's not correlated to how recent teh libfdt is. > > So acccesses made by libfdt internally should be safe(*) assuming the > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > However the libfdt user may hit both problems (a) and (b) getting > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > intended to handle those cases, so that they're useful for the caller > > to pull things from properties as well as for libfdt internal > > accesses. > > > > (*) There are a number of other functions that looked like they might > > be dangerous for case (a) because they are based on 64-bit > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > fdt_setprop_u64(), fdt_appendprop_u64() and > > fdt_appendprop_addrrange(). However I think they're actually > > ok, because the way they're built in terms of other functions > > means there's implicitly a memcpy() from a byte buffer. > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > the MMU off as well. > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > Right, running early with MMU off is definitely a real use case for > > libfdt. For similar reasons we can't assume we have an OS which will > > trap and handle unaligned accesses, which we might for a more > > conventional userspace library. > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > platform handles unaligned acccesses" flag. Not only does it require > > detailed knowledge of the target CPU, but it can also depend on > > exactly what mode that hardware is in. > > I think there's a more simple solution with no flags. Given all > internal accesses are at least 4-byte aligned, libfdt should just do > 32-bit accesses internally (as it used to). Maybe we need a check up > front that the dtb is 8-byte aligned though. That's not a bad idea. We could do it in fdt_ro_probe_(). Although, one extra case occurs to me. Someone (is it uboot?) has a wacky format where dtbs for several platforms, along with kernels and other information are bundled together in a big dtb (that is, using the dtb encoding, even though it's not actually a device tree). The "sub-dtbs" in that will be 4-byte aligned, but maybe not 8-byte aligned. > fdt{32,64}_ld() can just > keep the same implementation. Meaning the misalignment-safe ones? Means we need different internal accessors to the exported ones, but I guess that's ok. > If users want it optimized, then they > can do their own accessors. > > Rob > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20201029030247.GJ5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>]
* Re: Size growth? [not found] ` <20201029030247.GJ5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> @ 2020-10-29 15:04 ` Tom Rini 2020-10-29 19:56 ` David Gibson 0 siblings, 1 reply; 28+ messages in thread From: Tom Rini @ 2020-10-29 15:04 UTC (permalink / raw) To: David Gibson Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 10923 bytes --] On Thu, Oct 29, 2020 at 02:02:47PM +1100, David Gibson wrote: > On Wed, Oct 28, 2020 at 12:49:08PM -0500, Rob Herring wrote: > > On Tue, Oct 27, 2020 at 11:26 PM David Gibson > > <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote: > > > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote: > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > [...] > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > >>>>> > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > >>>> > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > >>>> aligned and then having everything crash. > > > > > >>> > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > >> > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > >> was set, violated the documented requirements for device trees need to > > > > > >> reside in memory and everything blew up. > > > > > >> > > > > > >> After that it was noticed that there could be some internal > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > >>>>> flag if you're not sure you don't need it). > > > > > >>>> > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > >>> > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > >> > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > either is true. However we need to consider those separately, because > > > they can be independently true (or not) for different reasons. (1) > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > more cases than you think. > > > > Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but > > I read it as (1). > > Yes. > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > accesses if enabled. > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > Ah yes, I think you are right. > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > rather than just "not a common at all case (as noted by it not having > > > > been seen in practice)." I'm sure David will point out that not all > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > reality. > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > it contains, assuming the block itself starts 8-byte aligned. > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > tags and other structural data (this amounts to requiring an alignment > > > gap after node names and property values). > > > > > > However, "all structural elements" does not include values within > > > property values themselves. Assuming propery alignment of the blocks > > > and the blob itself, then all property values will *begin* 4 byte > > > aligned. However that leaves two relevant cases: > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > aligned > > > > I'd assume that while an arch may support only the above in terms of > > misalignment, an arch that supports any alignment would always support > > this as part of that. It would just be odd to support byte alignment > > only up to 32-bit. > > Yes, I'd expect so. > > > I don't think we need to optimize the former case. > > I don't see how we would, in any case. > > > > b) complex property values including both strings and integers > > > typically use a packed representation with no alignment gaps. > > > Such property structures are usually avoided in modern bindings, > > > but they definitely exist in a bunch of older bindings. Obviously > > > that means that integer values sitting after arbitrary length > > > strings may not have any natural alignment > > > > That's the user's problem IMO. Users of older bindings having this > > aren't likely using a newish function like fdt32_ld either. > > That doesn't follow. The bindings still exist and are in use, e.g. on > IBM PAPR systems, that's not correlated to how recent teh libfdt is. > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > However the libfdt user may hit both problems (a) and (b) getting > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > intended to handle those cases, so that they're useful for the caller > > > to pull things from properties as well as for libfdt internal > > > accesses. > > > > > > (*) There are a number of other functions that looked like they might > > > be dangerous for case (a) because they are based on 64-bit > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > fdt_appendprop_addrrange(). However I think they're actually > > > ok, because the way they're built in terms of other functions > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > the MMU off as well. > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > Right, running early with MMU off is definitely a real use case for > > > libfdt. For similar reasons we can't assume we have an OS which will > > > trap and handle unaligned accesses, which we might for a more > > > conventional userspace library. > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > platform handles unaligned acccesses" flag. Not only does it require > > > detailed knowledge of the target CPU, but it can also depend on > > > exactly what mode that hardware is in. > > > > I think there's a more simple solution with no flags. Given all > > internal accesses are at least 4-byte aligned, libfdt should just do > > 32-bit accesses internally (as it used to). Maybe we need a check up > > front that the dtb is 8-byte aligned though. > > That's not a bad idea. We could do it in fdt_ro_probe_(). > > Although, one extra case occurs to me. Someone (is it uboot?) has a > wacky format where dtbs for several platforms, along with kernels and > other information are bundled together in a big dtb (that is, using > the dtb encoding, even though it's not actually a device tree). The > "sub-dtbs" in that will be 4-byte aligned, but maybe not 8-byte > aligned. Yes, about 12 years ago now U-Boot introduced (but it's useful anywhere, really...) FIT images which are what you're thinking of. That's unrelated to all of this however. -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Size growth? 2020-10-29 15:04 ` Tom Rini @ 2020-10-29 19:56 ` David Gibson [not found] ` <20201029195658.GK5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 0 siblings, 1 reply; 28+ messages in thread From: David Gibson @ 2020-10-29 19:56 UTC (permalink / raw) To: Tom Rini; +Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 11714 bytes --] On Thu, Oct 29, 2020 at 11:04:01AM -0400, Tom Rini wrote: > On Thu, Oct 29, 2020 at 02:02:47PM +1100, David Gibson wrote: > > On Wed, Oct 28, 2020 at 12:49:08PM -0500, Rob Herring wrote: > > > On Tue, Oct 27, 2020 at 11:26 PM David Gibson > > > <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote: > > > > > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara@arm.com> wrote: > > > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini-OWPKS81ov/FWk0Htik3J/w@public.gmane.org> wrote: > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > > >>>>> > > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > > >>>> > > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > > >>>> aligned and then having everything crash. > > > > > > >>> > > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > > >> > > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > > >> was set, violated the documented requirements for device trees need to > > > > > > >> reside in memory and everything blew up. > > > > > > >> > > > > > > >> After that it was noticed that there could be some internal > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > > >>>>> flag if you're not sure you don't need it). > > > > > > >>>> > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > > >>> > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > > >> > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > > either is true. However we need to consider those separately, because > > > > they can be independently true (or not) for different reasons. (1) > > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > > more cases than you think. > > > > > > Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but > > > I read it as (1). > > > > Yes. > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > > accesses if enabled. > > > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > > > Ah yes, I think you are right. > > > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > > rather than just "not a common at all case (as noted by it not having > > > > > been seen in practice)." I'm sure David will point out that not all > > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > > reality. > > > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > > it contains, assuming the block itself starts 8-byte aligned. > > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > > tags and other structural data (this amounts to requiring an alignment > > > > gap after node names and property values). > > > > > > > > However, "all structural elements" does not include values within > > > > property values themselves. Assuming propery alignment of the blocks > > > > and the blob itself, then all property values will *begin* 4 byte > > > > aligned. However that leaves two relevant cases: > > > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > > aligned > > > > > > I'd assume that while an arch may support only the above in terms of > > > misalignment, an arch that supports any alignment would always support > > > this as part of that. It would just be odd to support byte alignment > > > only up to 32-bit. > > > > Yes, I'd expect so. > > > > > I don't think we need to optimize the former case. > > > > I don't see how we would, in any case. > > > > > > b) complex property values including both strings and integers > > > > typically use a packed representation with no alignment gaps. > > > > Such property structures are usually avoided in modern bindings, > > > > but they definitely exist in a bunch of older bindings. Obviously > > > > that means that integer values sitting after arbitrary length > > > > strings may not have any natural alignment > > > > > > That's the user's problem IMO. Users of older bindings having this > > > aren't likely using a newish function like fdt32_ld either. > > > > That doesn't follow. The bindings still exist and are in use, e.g. on > > IBM PAPR systems, that's not correlated to how recent teh libfdt is. > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > > However the libfdt user may hit both problems (a) and (b) getting > > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > > intended to handle those cases, so that they're useful for the caller > > > > to pull things from properties as well as for libfdt internal > > > > accesses. > > > > > > > > (*) There are a number of other functions that looked like they might > > > > be dangerous for case (a) because they are based on 64-bit > > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > > fdt_appendprop_addrrange(). However I think they're actually > > > > ok, because the way they're built in terms of other functions > > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > > the MMU off as well. > > > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > > > Right, running early with MMU off is definitely a real use case for > > > > libfdt. For similar reasons we can't assume we have an OS which will > > > > trap and handle unaligned accesses, which we might for a more > > > > conventional userspace library. > > > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > > platform handles unaligned acccesses" flag. Not only does it require > > > > detailed knowledge of the target CPU, but it can also depend on > > > > exactly what mode that hardware is in. > > > > > > I think there's a more simple solution with no flags. Given all > > > internal accesses are at least 4-byte aligned, libfdt should just do > > > 32-bit accesses internally (as it used to). Maybe we need a check up > > > front that the dtb is 8-byte aligned though. > > > > That's not a bad idea. We could do it in fdt_ro_probe_(). > > > > Although, one extra case occurs to me. Someone (is it uboot?) has a > > wacky format where dtbs for several platforms, along with kernels and > > other information are bundled together in a big dtb (that is, using > > the dtb encoding, even though it's not actually a device tree). The > > "sub-dtbs" in that will be 4-byte aligned, but maybe not 8-byte > > aligned. > > Yes, about 12 years ago now U-Boot introduced (but it's useful anywhere, > really...) FIT images which are what you're thinking of. That's > unrelated to all of this however. Well, not entirely, because it's a plausible reason someone would have a dtb loaded at a non-8-byte aligned address (though it would be 4-byte aligned). -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20201029195658.GK5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org>]
* Re: Size growth? [not found] ` <20201029195658.GK5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> @ 2020-10-29 20:26 ` Tom Rini 0 siblings, 0 replies; 28+ messages in thread From: Tom Rini @ 2020-10-29 20:26 UTC (permalink / raw) To: David Gibson Cc: Rob Herring, André Przywara, Simon Glass, Devicetree Compiler [-- Attachment #1: Type: text/plain, Size: 12491 bytes --] On Fri, Oct 30, 2020 at 06:56:58AM +1100, David Gibson wrote: > On Thu, Oct 29, 2020 at 11:04:01AM -0400, Tom Rini wrote: > > On Thu, Oct 29, 2020 at 02:02:47PM +1100, David Gibson wrote: > > > On Wed, Oct 28, 2020 at 12:49:08PM -0500, Rob Herring wrote: > > > > On Tue, Oct 27, 2020 at 11:26 PM David Gibson > > > > <david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+@public.gmane.org> wrote: > > > > > > > > > > On Tue, Oct 27, 2020 at 02:55:17PM -0500, Rob Herring wrote: > > > > > > On Tue, Oct 27, 2020 at 10:58 AM André Przywara <andre.przywara-5wv7dgnIgG8@public.gmane.org> wrote: > > > > > > > > > > > > > > On 26/10/2020 21:51, Rob Herring wrote: > > > > > > > > On Thu, Oct 22, 2020 at 10:23 AM Tom Rini <trini@konsulko.com> wrote: > > > > > > > >> On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > > > > > > > >>> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > > > > > > >>>> On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > > > > > >>>>> On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > >>>>>> But what does all of this _mean_ ? I kinda think I have an answer now. > > > > > > > >>>>>> One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > > > > >>>>>> 11738cf01f15 reduces it just a little. > > > > > > > >>>>> > > > > > > > >>>>> Ah, that's a tricky one. If we don't handle unaligned accesses we > > > > > > > >>>>> instead get intermittent bug reports where it just crashes. > > > > > > > >>>> > > > > > > > >>>> We really need to talk about that then. There was a problem of people > > > > > > > >>>> turning off the sanity check for making sure the entire device tree was > > > > > > > >>>> aligned and then having everything crash. > > > > > > > >>> > > > > > > > >>> Ok... I'm not really sure where you're going with that thought. > > > > > > > >> > > > > > > > >> In my reading of the mailing list history of how this issue came up, > > > > > > > >> it was someone was booting a dragonboard or something, and they (or > > > > > > > >> rather, the board maintainer set by default) the flag to use the device > > > > > > > >> tree wherever it is in memory and NOT to relocate it to a properly > > > > > > > >> aligned address. This in turn lead to the kernel getting an unaligned > > > > > > > >> device tree and everything crashing. The "I know what I'm doing" flag > > > > > > > >> was set, violated the documented requirements for device trees need to > > > > > > > >> reside in memory and everything blew up. > > > > > > > >> > > > > > > > >> After that it was noticed that there could be some internal > > > > > > > >> mis-alignment and if you tried those accesses on a CPU that doesn't > > > > > > > >> support doing those reads easily there could be problems, but that's not > > > > > > > >> a common at all case (as noted by it not having been seen in practice). > > > > > > > > > > > > > > > > Nor a problem on many environments to begin with. More below... > > > > > > > > > > > > > > > >>>>> I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > > > > > >>>>> break for either an unaligned dtb (unlikely) or if you attempt to load > > > > > > > >>>>> an unaligned value from a property (more likely, but don't add the > > > > > > > >>>>> flag if you're not sure you don't need it). > > > > > > > >>>> > > > > > > > >>>> So long as it's abstracted in such a way that we don't grow the size of > > > > > > > >>>> everything again, yes, that is the right way forward I think. > > > > > > > >>> > > > > > > > >>> All the ASSUME flags should be resolved at compile time (at least with > > > > > > > >>> normal optimization levels enabled in the compiler), so testing for > > > > > > > >>> those shouldn't increase size at all. If they do, something is wrong. > > > > > > > >> > > > > > > > >> I'm saying that how ever this new ASSUME flag is done, it needs to be > > > > > > > >> done in such a way the compiler really will be smart about it. So > > > > > > > >> something like making a new function that does fdt64_ld() if we aren't > > > > > > > >> ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are > > > > > > > >> ASSUME_ALIGNED_ACCESS. > > > > > > > > > > > > > > > > Ah, unaligned accesses again... To summarize, both performance and > > > > > > > > size suffer with not doing unaligned accesses. > > > > > > > > > > > > > > > > Why not a HAS_UNALIGNED_ACCESS flag instead (or the inverse) that will > > > > > > > > do unaligned accesses? That would be more aligned with what the system > > > > > > > > can support rather than sanity checking associated with ASSUME_*. > > > > > > > > > > So, there are kind of two things here, (1) is "my platform can handle > > > > > unaligned accesses" and (2) is "assume I don't need unaligned > > > > > accesses". We can use the fast & small versions of fdt32_ld() etc. if > > > > > either is true. However we need to consider those separately, because > > > > > they can be independently true (or not) for different reasons. (1) > > > > > depends on the hardware, whereas (2) depends on how you're using dtc, > > > > > and, see below, you may need at least unaligned-handling fdt64_ld() in > > > > > more cases than you think. > > > > > > > > Okay, I guess you were thinking of (2) for ASSUME_ALIGNED_ACCESS, but > > > > I read it as (1). > > > > > > Yes. > > > > > > > > > > > To repeat from last time, everything ARMv6 and up can do unaligned > > > > > > > > accesses if enabled. > > > > > > > > > > > > > > But that requires the MMU to be enabled, doesn't it? If I read the ARM > > > > > > > ARM correctly, unaligned accesses always trap on device memory, > > > > > > > regardless of SCTLR.A. And without the MMU enabled everything is device > > > > > > > memory. We compile U-Boot with -mno-unaligned-access/-mstrict-align to > > > > > > > cope with that, and that most likely affects libfdt as well? > > > > > > > > > > > > Ah yes, I think you are right. > > > > > > > > > > > > In that case, seems like we should figure out whether (internal) > > > > > > unaligned accesses are possible with dtc generated dtbs at least > > > > > > rather than just "not a common at all case (as noted by it not having > > > > > > been seen in practice)." I'm sure David will point out that not all > > > > > > dtbs come from dtc, but all the ones u-boot deals with do in > > > > > > reality. > > > > > > > > > > Assuming the blob itself is 8-byte aligned in memory, then all > > > > > structural elements (i.e. the tree metadata) of a compliant dtb will > > > > > be naturally aligned. The spec requires 8-byte alignment of the mem > > > > > reserve block w.r.t. the base of the blob and 4 byte aligned structure > > > > > block w.r.t. the base of the blob. Likewise the layout of the mem > > > > > reserve block will preserve 8-byte alignment of all the 64-bit values > > > > > it contains, assuming the block itself starts 8-byte aligned. > > > > > Similarly the structure blob will preserve 4-byte alignment of all its > > > > > tags and other structural data (this amounts to requiring an alignment > > > > > gap after node names and property values). > > > > > > > > > > However, "all structural elements" does not include values within > > > > > property values themselves. Assuming propery alignment of the blocks > > > > > and the blob itself, then all property values will *begin* 4 byte > > > > > aligned. However that leaves two relevant cases: > > > > > > > > > > a) 64-bit property values may be 4-byte aligned but not 8-byte > > > > > aligned > > > > > > > > I'd assume that while an arch may support only the above in terms of > > > > misalignment, an arch that supports any alignment would always support > > > > this as part of that. It would just be odd to support byte alignment > > > > only up to 32-bit. > > > > > > Yes, I'd expect so. > > > > > > > I don't think we need to optimize the former case. > > > > > > I don't see how we would, in any case. > > > > > > > > b) complex property values including both strings and integers > > > > > typically use a packed representation with no alignment gaps. > > > > > Such property structures are usually avoided in modern bindings, > > > > > but they definitely exist in a bunch of older bindings. Obviously > > > > > that means that integer values sitting after arbitrary length > > > > > strings may not have any natural alignment > > > > > > > > That's the user's problem IMO. Users of older bindings having this > > > > aren't likely using a newish function like fdt32_ld either. > > > > > > That doesn't follow. The bindings still exist and are in use, e.g. on > > > IBM PAPR systems, that's not correlated to how recent teh libfdt is. > > > > > > > > So acccesses made by libfdt internally should be safe(*) assuming the > > > > > blob itself is loaded 8-byte aligned, and the dtb is compliant. > > > > > However the libfdt user may hit both problems (a) and (b) getting > > > > > things they actually want from the tree. fdt{32,64}_{ld,st}() are > > > > > intended to handle those cases, so that they're useful for the caller > > > > > to pull things from properties as well as for libfdt internal > > > > > accesses. > > > > > > > > > > (*) There are a number of other functions that looked like they might > > > > > be dangerous for case (a) because they are based on 64-bit > > > > > property values: fdt_setprop_inplace_u64(), fdt_property_u64(), > > > > > fdt_setprop_u64(), fdt_appendprop_u64() and > > > > > fdt_appendprop_addrrange(). However I think they're actually > > > > > ok, because the way they're built in terms of other functions > > > > > means there's implicitly a memcpy() from a byte buffer. > > > > > > > > > > > > Also some 32-bit ARM platforms run U-Boot proper with the MMU disabled > > > > > > > all the time, and I know of at least the sunxi-aarch64 SPL running with > > > > > > > the MMU off as well. > > > > > > > > > > > > I'm making a mental note of this for the next time performance issues come up. > > > > > > > > > > Right, running early with MMU off is definitely a real use case for > > > > > libfdt. For similar reasons we can't assume we have an OS which will > > > > > trap and handle unaligned accesses, which we might for a more > > > > > conventional userspace library. > > > > > > > > > > This kind of underscores why I'm a bit hesitant to introduce "my > > > > > platform handles unaligned acccesses" flag. Not only does it require > > > > > detailed knowledge of the target CPU, but it can also depend on > > > > > exactly what mode that hardware is in. > > > > > > > > I think there's a more simple solution with no flags. Given all > > > > internal accesses are at least 4-byte aligned, libfdt should just do > > > > 32-bit accesses internally (as it used to). Maybe we need a check up > > > > front that the dtb is 8-byte aligned though. > > > > > > That's not a bad idea. We could do it in fdt_ro_probe_(). > > > > > > Although, one extra case occurs to me. Someone (is it uboot?) has a > > > wacky format where dtbs for several platforms, along with kernels and > > > other information are bundled together in a big dtb (that is, using > > > the dtb encoding, even though it's not actually a device tree). The > > > "sub-dtbs" in that will be 4-byte aligned, but maybe not 8-byte > > > aligned. > > > > Yes, about 12 years ago now U-Boot introduced (but it's useful anywhere, > > really...) FIT images which are what you're thinking of. That's > > unrelated to all of this however. > > Well, not entirely, because it's a plausible reason someone would have > a dtb loaded at a non-8-byte aligned address (though it would be > 4-byte aligned). But that in turn gets us back to the original problem. A board by default was unfortunately setting the "do not relocate items to be aligned before use" flag, and got the mess that resulted from that. Another way to look at it I think is that since we (U-Boot) know what the alignment requirements are when loading something to memory, it's on us to make sure those requirements are met and not on a later access library to deal with "our requirements were broken, but lets use it anyways". -- Tom [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2020-11-02 2:06 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-10-16 19:36 Size growth? Tom Rini 2020-10-16 21:46 ` Simon Glass [not found] ` <CAPnjgZ3jPciWmoVpuoYb9KC2h3eCevZsq+1BzefCOCAFCDoseQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2020-10-19 1:42 ` David Gibson [not found] ` <20201019014213.GA11625-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-19 12:37 ` Tom Rini 2020-10-20 2:09 ` David Gibson [not found] ` <20201020020907.GA64103-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-21 22:49 ` Tom Rini 2020-10-22 4:00 ` David Gibson [not found] ` <20201022040013.GB1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-22 12:32 ` Tom Rini 2020-10-22 14:58 ` David Gibson [not found] ` <20201022145804.GI1821515-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-22 15:22 ` Tom Rini 2020-10-26 21:51 ` Rob Herring [not found] ` <CAL_JsqJiKETTMJX3MsEmECE+jtbwYydVSgt1a6poz_L+pPRFTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2020-10-26 22:17 ` Tom Rini 2020-10-27 15:57 ` André Przywara [not found] ` <a14daf09-4d97-052f-5071-09e67ccb925e-5wv7dgnIgG8@public.gmane.org> 2020-10-27 19:55 ` Rob Herring [not found] ` <CAL_JsqK_fC346UnCmXMJxKHCM6=eFBF_kmGt_fBdvwPXbPRkvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2020-10-28 4:26 ` David Gibson [not found] ` <20201028042601.GA5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-28 12:05 ` Tom Rini 2020-10-29 2:55 ` David Gibson [not found] ` <20201029025503.GI5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-29 15:06 ` Tom Rini 2020-10-29 15:48 ` Rob Herring [not found] ` <CAL_JsqJUixnyZx-tu9EV8YZ-gSDE7i1jvMddnNZZWFzezaHftw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2020-10-29 16:04 ` Tom Rini 2020-10-29 18:08 ` Rob Herring [not found] ` <CAL_JsqJTTJAoTYwxDn3i0KETMXLyGg3WXzxN3-OdRLx=R96a-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2020-10-29 20:21 ` Tom Rini 2020-11-02 2:06 ` David Gibson 2020-10-28 17:49 ` Rob Herring [not found] ` <CAL_JsqJPrnjKjdmvyY2NOay0YrYc20Tr3OSr0yjq+9HjCN+anA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2020-10-29 3:02 ` David Gibson [not found] ` <20201029030247.GJ5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-29 15:04 ` Tom Rini 2020-10-29 19:56 ` David Gibson [not found] ` <20201029195658.GK5604-l+x2Y8Cxqc4e6aEkudXLsA@public.gmane.org> 2020-10-29 20:26 ` Tom Rini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).