* x86 Endianness and libc printf
From: Nanakos Chrysostomos @ 2005-07-08 19:05 UTC
To: linux-assembly

Hi all,
I have been studying endianness on x86 for a few hours now, and there are
parts of the following code snippets that I cannot understand. Please help me!

endian.c
---------
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

int main()
{
        char *filename = "endian.txt";
        unsigned long buf;
        char *k = (char *)&buf;
        int fd;

        fd = open("makis", O_RDONLY);

        read(fd, &buf, 4);

        printf("%.4s\n", &buf);
        printf("%p\n", buf);
        printf("&buf: %p %#x %p\n", &buf, *k, k);
        return 0;
}

endian.txt
----------
DBCA

# ./read
DCBA
0x41424344
&buf: 0xbffff8b0 0x44 0xbffff8b0
#

In the first printf everything is fine. In the second printf the bytes
0x44, 0x43, 0x42, 0x41 are printed in reverse order, although we can see
that in memory they are in the right order after the read system call.
Why does this happen? Is it something internal to printf?

I tried to explain it with a similar approach using a union, but the same
thing happens.

endian2.c
---------
#include <stdio.h>
#include <unistd.h>

int main()
{
        union {
                long s;
                char c[sizeof(long)];
        } un;

        un.s = 0x41424344;
        if (sizeof(short) == 2) {
                if (un.c[0] == 0x41 && un.c[1] == 0x42)
                        printf("big-endian\n");
                else if (un.c[0] == 0x44 && un.c[1] == 0x43)
                        printf("little-endian\n");
                else
                        printf("unknown\n");
        } else
                printf("sizeof(short) = %d\n", (int)sizeof(short));

        printf("%.4s\n", &(un.s));
        printf("%p\n", (un.s));
        _exit(0);
}

The same as above. Should I assume that an internal operation in printf is
doing this?

I also used the assembly example below to see what happens. Memory-to-memory
moves (with push and pop) do not seem to follow the little-endian convention.
Does this happen only for memory-to-register moves and vice versa?
read.asm
--------
section .bss
        buf resd 1

section .data
        pathname db "makis",0

section .text
        global _start

_start:
        ;open
        mov eax,5
        mov ebx,pathname
        mov ecx,02
        int 0x80

        ;read
        mov ebx,eax
        mov eax,3
        mov ecx,buf
        mov edx,4
        int 0x80

        ;write
        mov eax,4
        mov ebx,1
        mov ecx,buf
        mov edx,4
        int 0x80

        ;exit
        mov eax,1
        mov ebx,0
        int 0x80

Everything works just fine. Does anyone know how I can reverse the order of
the data on the stack, from 0x44434241 to 0x41424344, without using AND and
OR? Can this be done?

The last two examples are output from gcc: one "fixed" by me to find out what
is on the stack, and the other the default output for the first example. My
version differs only in the printf library call after the read call, which I
suppose is the "black box" behind the "problem" I can't understand.

read.s
------
        .file   "read.c"
        .version        "01.01"
gcc2_compiled.:
        .section        .rodata
.LC0:
        .string "makis"
.LC1:
        .string "%#x\n"
        .text
        .align 4
        .globl main
        .type   main,@function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    $.LC0, -4(%ebp)
        subl    $8, %esp
        pushl   $0
        pushl   $.LC0
        call    open

        addl    $16, %esp
        movl    %eax, %eax
        movl    %eax, -12(%ebp)
        subl    $4, %esp
        pushl   $4
        leal    -8(%ebp), %eax
        pushl   %eax
        pushl   -12(%ebp)
        call    read

----> Before, this was the printf call, which retrieves its arguments from
the stack. As we can see, these differ for every conversion specifier.

        movl    $4,%eax
        movl    $1,%ebx
        leal    -8(%ebp),%ecx
        movl    $4,%edx
        int     $0x80

        movl    $1, %eax
        movl    $0,%ebx
        int     $0x80
.Lfe1:
        .size   main,.Lfe1-main
        .ident  "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.3 2.96-110)"

Can someone please help me with that?

Thanks in advance,
Chris.
* Re: x86 Endianness and libc printf
From: Vadim Lobanov @ 2005-07-08 19:25 UTC
To: Nanakos Chrysostomos; +Cc: linux-assembly

The x86 architecture is little endian. By definition, "little endian"
means that the little end of a multi-byte number is stored first. In
other words, if you want to store the integer value '1', then the bytes
will be arranged in the following order in memory:
        0x01 0x00 0x00 0x00

In your example below, the bytes are arranged in the following order:
        0x44 0x43 0x42 0x41
since that's the order in which they were read from the file. That's why
when you print the first byte in the buffer, you get 0x44. However,
because x86 is little endian, it will interpret 0x44 as the low-order
digits in the "overall number", and 0x41 as the high-order digits in the
"overall number".

Printf doesn't need to do anything special when printing the number. I
don't remember the code offhand, but it probably does something along
the lines of:
        while (num != 0) {
                print(num & 0x0000000F);   /* lowest hex digit */
                go to previous character;  /* fill the output right to left */
                num >>= 4;                 /* drop that digit */
        }
The processor is the one that takes that 4-byte quantity and treats the
first byte as low-order (first to get split off by the mask and shift).
That's what makes the processor little endian.

Big-endian machines flip the byte order, as you might expect. They store
the big end of the number in the first byte, and so on.

Hope that was more clear than confusing. :-)

- Vadim L

On Fri, 8 Jul 2005, Nanakos Chrysostomos wrote:

> Hi all,
> i am searching for a few hours now the endianness in the x86
> environment,and i have the following
> snippets of code,which in some places i cant understand.Please help me!!!
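A minimal C sketch of the mask-and-shift loop described in the reply above, for illustration only. The name hex32 and the fixed eight-digit width are assumptions made here; glibc's real printf is organised differently. What it shows is that the loop works on the value of the number, so it never looks at how the bytes sit in memory.

```c
#include <stdint.h>

/* Hypothetical helper (not part of libc): format a 32-bit value as
 * eight hex digits. out must have room for 9 chars including '\0'. */
void hex32(uint32_t num, char out[9])
{
    static const char digits[] = "0123456789abcdef";
    int i;

    out[8] = '\0';
    /* Fill from the last character backwards: each pass masks off the
     * low four bits, then shifts them away. */
    for (i = 7; i >= 0; i--) {
        out[i] = digits[num & 0xF];
        num >>= 4;
    }
}
```

Calling hex32(0x41424344, buf) leaves "41424344" in buf on a little-endian or a big-endian machine alike, because & and >> are defined on values rather than on memory layout.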
* Re: x86 Endianness and libc printf
From: Richard B. Johnson @ 2005-07-08 20:17 UTC
To: Nanakos Chrysostomos; +Cc: linux-assembly

On Fri, 8 Jul 2005, Nanakos Chrysostomos wrote:

> Hi all,
> i am searching for a few hours now the endianness in the x86
> environment,and i have the following
> snippets of code,which in some places i cant understand.Please help me!!!

Intel ix86 processors store the lowest-order byte at the lowest memory
address. Therefore if you had an unsigned SHORT int (16 bits) of
0xcdef, it would be represented in memory as:

        .byte   0xef, 0xcd
                  |     |____ Highest byte
                  |__________ Lowest byte

Now, it turns out that the two WORDS of a long int also have the lowest
word stored at the lowest memory address. Therefore, if we had a long
int of 0x89ABCDEF it would be stored as:

        .byte   0xef, 0xcd      # Lowest word
        .byte   0xab, 0x89      # Highest word
                  |     |____ Highest byte
                  |__________ Lowest byte

Long longs follow the same rule. It makes no difference whether the data
are on the stack or elsewhere in memory.

But.... You may need to understand how 'C' views things. On x86, gcc
treats plain char as signed by default. This can be a problem when you
look at a small object through a wider type and the small type is
signed: the compiler and the runtime library will sign-extend your
observation. For instance, a char of value 0xff will be 0xffff when
viewed as a short and 0xffffffff when viewed as an int, even though you
know from observing the byte in memory that it was 0xff. This is
absolutely correct because all 3 types are -1 in value.

Therefore, when experimenting with bytes and bits, it's certainly
recommended that you use unsigned types so you don't get this extension.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
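The sign-extension effect described in the message above can be sketched in C as follows. This is an illustrative sketch, not code from the thread: the function names are made up here, and it assumes the usual two's-complement conversion to int8_t, which gcc on x86 provides.

```c
#include <stdint.h>

/* Widen one raw byte through a signed 8-bit type: the sign bit is
 * copied into the upper bits of the result. */
int widen_signed(uint8_t raw)
{
    return (int8_t)raw;
}

/* Widen the same byte through an unsigned type: the upper bits are zero. */
unsigned int widen_unsigned(uint8_t raw)
{
    return raw;
}
```

For the byte 0xff, widen_signed returns -1 (all bits set in the int), while widen_unsigned returns 255. The byte in memory is the same in both cases; only the type used to look at it differs.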
* Re: x86 Endianness and libc printf
From: Paolo Ornati @ 2005-07-09 8:19 UTC
To: nanakos; +Cc: linux-assembly

On Fri, 8 Jul 2005 22:05:17 +0300 (EEST)
"Nanakos Chrysostomos" <nanakos@wired-net.gr> wrote:

> int main()
> {
>         char *filename= "endian.txt";
>         unsigned long buf;
>         char *k=(char *)&buf;
>         int fd;
>
>         fd = open("makis",O_RDONLY);
>
>         read(fd,&buf,4);
>
>         printf("%.4s\n",&buf);
>         printf("%p\n",buf);
>         printf("&buf: %p %#x %p\n",&buf,*k,k);
>         return 0;
> }
>
> endian.txt
> ----------
> DBCA
>
>
> #./read
> DCBA
> 0x41424344
> &buf: 0xbffff8b0 0x44 0xbffff8b0
> #
>
>
> In the first printf everything is fine.In the second printf we see
> that the 0x44,0x43,0x42,0x41 byte-data is printed in the revserse
> order,while we can see
> that in memory it is in the right order after the read system call.Why
> this happens?Is it being internal in printf???

No, it isn't printf... it's just that x86 CPUs are little endian.

When you do:
        printf("%.4s\n",&buf);

you are printing these four bytes in memory order.

When you do:
        printf("%p\n",buf);

you are treating "buf" as a single number. Since your CPU is little
endian it expects the low-order bytes to come first.

Another example:

#include <stdio.h>

int main(void) {
        unsigned int buf = 0x11223344;
        unsigned char *x = (unsigned char*)&buf;
        printf("0x%hhx\n", x[0]);
        return 0;
}

printf will print 0x44... simply because the number 0x11223344 is stored
in memory in this way (byte addresses):

        address         value
        &buf            0x44
        &buf+1          0x33
        &buf+2          0x22
        &buf+3          0x11

For more info about x86 CPUs look here:
http://developer.intel.com/design/pentium4/manuals/index_new.htm

--
        Paolo Ornati
        Linux 2.6.12.2 on x86_64
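To make the address table in the message above concrete, here is a small C sketch of how four bytes in memory order combine into one number under the little-endian rule. The helper name load_le32 is an assumption made for this illustration; on x86 the CPU does this implicitly whenever it loads a 32-bit word.

```c
#include <stdint.h>

/* Hypothetical helper: combine four bytes, lowest address first, the
 * way a little-endian CPU does when it loads a 32-bit word. */
uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Given the bytes 'D', 'C', 'B', 'A' (0x44 0x43 0x42 0x41) it returns 0x41424344, the number the second printf showed; given 0x44 0x33 0x22 0x11 it returns 0x11223344. Because it is written with shifts, the helper gives the same answer on a big-endian host too.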
* Re: x86 Endianness and libc printf
From: Nanakos Chrysostomos @ 2005-07-09 14:41 UTC
To: Paolo Ornati; +Cc: linux-assembly

Thanks very much for your response.
I know that the x86 CPU is little endian, but how does printf print out
that number, which starts with the lowest byte in memory/on the stack,
0x44? Does it somehow know internally that this is a cast?

Let's assume the following example:

#include <stdio.h>

int main(void) {
        unsigned char buf[4] = {0x44,0x43,0x42,0x41};
        unsigned char *x = (unsigned char*)&buf;
        printf("%#x\n", *(int *)x);
        return 0;
}

It prints the buf array in reverse (little-endian) order and treats it
as a number:

0x41424344

But check out the following example:

endian.txt
-----------
DBCA

example.s
----------
.section .data
.filename: .string "endian.txt"

.text
.globl _start

_start:
        pushl %ebp
        movl %esp,%ebp

        movl $5,%eax
        movl $.filename,%ebx
        movl $0x00,%ecx
        int $0x80

        movl %eax,%ebx
        movl $3,%eax
        leal -8(%ebp),%ecx
        movl $4,%edx
        int $0x80

        movl $4,%eax
        movl $1,%ebx
        movl $0x0a,-4(%ebp)
        leal -8(%ebp),%ecx
        movl $5,%edx
        int $0x80

        movl $1,%eax
        movl $0,%ebx
        int $0x80

#as -o example.o example.s
#ld -o example example.o
#./example
DBCA

Here the memory is printed starting from the lowest-order byte.
How can we print these bytes using the 'write' system call and treat
them as a number, as printf does?

Thanks in advance.
* Re: x86 Endianness and libc printf
From: Robert Plantz @ 2005-07-09 14:51 UTC
To: nanakos; +Cc: Paolo Ornati, linux-assembly

The write system call function is at a lower level than printf.
The job of printf is to convert C data types to character strings
for display. That is, the decimal integer 123 is converted to the
characters '1', '2', '3', and '\0' (the null character). This
character string is sent to write, which writes one byte (character)
at a time to standard out (usually the screen).

printf must take endianness into account when doing its type
conversions.

write is at a much lower level. It knows nothing about data types
or endianness. It simply writes one byte at a time to the specified
file descriptor.

Bob

On Sat, 2005-07-09 at 17:41 +0300, Nanakos Chrysostomos wrote:
> Thanks very much for your response.
> I know that x86 CPU is little endian,but how the printf prints out the
> that number which starts from the first lowest byte in mem /stack which is
> 0x44??
> Maybe internally knows that this is a cast???
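The contrast drawn in the reply above, between printf converting a value to text and write passing bytes through untouched, can be sketched in C like this. The helper names store_le32 and write_raw32 are assumptions made for this illustration and do not come from the thread.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: lay a 32-bit value out lowest byte first,
 * matching how x86 stores it in memory. */
void store_le32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Hand the four raw bytes to write(2); no conversion takes place.
 * Returns what write returns (bytes written, or -1 on error). */
ssize_t write_raw32(int fd, uint32_t v)
{
    unsigned char bytes[4];

    store_le32(v, bytes);
    return write(fd, bytes, sizeof bytes);
}
```

With this sketch, write_raw32(1, 0x41424344) puts the four characters DCBA on standard output, whereas printf("%x", 0x41424344) produces the eight characters 41424344. The first is the memory image of the number; the second is its textual representation.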
* Re: x86 Endianness and libc printf
From: Paolo Ornati @ 2005-07-09 16:31 UTC
To: nanakos; +Cc: linux-assembly

On Sat, 9 Jul 2005 17:41:54 +0300 (EEST)
"Nanakos Chrysostomos" <nanakos@wired-net.gr> wrote:

> #as -o example.o example.s
> #ld -o example example.o
> #./example
> DBCA
>
> We print out the memory from the lowest byte-order.
> How can we print out by using the system call 'write' this byte-order
> and treat it like a number,as printf
> does.????????????????????????????????

1) From a quick read of your assembly code it seems that you are reading
some bytes from a file and writing them to standard output. These bytes
are the ASCII codes of D, B, C, A.

NOTE that this is very different from the hexadecimal values D, B, C, A.

I hope you agree with me that 'A' != 0xA... since 'A' = 65, and 0xA =
10.

But maybe you are doing it on purpose...

2) If you want to "treat it as a number" in assembly, just put these
bytes in a register. Assuming that they are in little-endian order at
-8(%ebp):

        movl    -8(%ebp), %eax

Now you have the WHOLE number in the %eax register.

3) To print the value in a HUMAN-READABLE way you should do something
like this (written in C for simplicity):

const char digits[] = "0123456789abcdef";
unsigned int x = 64335252;      // this is the NUMBER to print
unsigned int base = 10;         // you can change it to anything
                                // from 2 to 16
char out[33];                   // room for 32 binary digits + '\0'
char *p = out + sizeof(out) - 1;

*p = '\0';
do {
        *--p = digits[x % base];        // extract low order digit
        x /= base;                      // discard low order digit
} while (x);
PRINT_WITH_SOMETHING(p);        // digits are now in reading order

This is basically what printf does...

PS: in this example I've done the RAW NUMBER DIGIT to ASCII conversion
with an array... but you can do it in other ways as well.

--
        Paolo Ornati
        Linux 2.6.12.2 on x86_64
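Putting steps 2) and 3) of the message above together, here is a hedged C sketch that converts a number to text and prints it with nothing but the write system call. The names utoa_base and print_number are made up for this illustration, and the buffer sizes assume an unsigned int of at most 32 bits, as on x86.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: convert x to text in the given base (2..16).
 * out must hold at least 33 chars. Returns the number of digits. */
size_t utoa_base(unsigned int x, unsigned int base, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[32];               /* assumes unsigned int <= 32 bits */
    size_t n = 0, i;

    /* Digits come out low-order first... */
    do {
        tmp[n++] = digits[x % base];
        x /= base;
    } while (x != 0);

    /* ...so copy them back in reverse to get reading order. */
    for (i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}

/* Print x and a newline on standard output using only write(2).
 * Returns what write returns (bytes written, or -1 on error). */
ssize_t print_number(unsigned int x, unsigned int base)
{
    char buf[34];
    size_t n = utoa_base(x, base, buf);

    buf[n++] = '\n';
    return write(1, buf, n);
}
```

After loading the four file bytes into a register or variable as one number, print_number(value, 16) writes its hex digits followed by a newline. For the bytes 0x44 0x43 0x42 0x41 loaded on x86, that text is 41424344.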