linux-assembly.vger.kernel.org archive mirror
* x86 Endiannes and libc printf
@ 2005-07-08 19:05 Nanakos Chrysostomos
  2005-07-08 19:25 ` Vadim Lobanov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Nanakos Chrysostomos @ 2005-07-08 19:05 UTC
  To: linux-assembly

Hi all,
I have been looking into endianness on x86 for a few hours now, and I
have the following snippets of code, parts of which I can't understand.
Please help!

endian.c
---------
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>     /* read() */
#include <sys/types.h>


int main()
{
        char *filename= "endian.txt";
        unsigned long buf;
        char *k=(char *)&buf;
        int fd;

        fd = open("makis",O_RDONLY);

        read(fd,&buf,4);

        printf("%.4s\n",&buf);
        printf("%p\n",buf);
        printf("&buf: %p %#x %p\n",&buf,*k,k);
        return 0;
}

endian.txt
----------
DCBA


#./read
DCBA
0x41424344
&buf: 0xbffff8b0 0x44 0xbffff8b0
#


The first printf is fine. In the second printf the bytes 0x44, 0x43,
0x42, 0x41 are printed in reverse order, even though we can see that
they are in the right order in memory after the read system call. Why
does this happen? Is it something internal to printf?


I tried to explain this with a similar approach using a union, but the
same thing happens.

endian2.c
---------
#include <stdio.h>
#include <unistd.h>

int   main()
{

        union {
                long   s;
                char    c[sizeof(long)];
        } un;

        un.s = 0x41424344;
        if (sizeof(short) == 2) {
                if (un.c[0] == 0x41 && un.c[1] == 0x42)
                        printf("big-endian\n");
                else if (un.c[0] == 0x44 && un.c[1] == 0x43)
                        printf("little-endian\n");
                else
                        printf("unknown\n");
        } else
                printf("sizeof(short) = %zu\n", sizeof(short));


        printf("%.4s\n",&(un.s));
        printf("%p\n",(un.s));
        _exit(0);

}



The same as above. Should I assume that an internal operation in printf
is doing this?


I also used the assembly example below to see what happens.
Memory-to-memory moves (with push and pop) do not seem to follow the
little-endian convention. Does this happen only for memory-to-register
moves and vice versa?

read.asm
--------
section .bss
buf     resd    1

section .data
        pathname db "makis",0
section .text

global _start

_start:

;open
mov eax,5
mov ebx,pathname
mov ecx,02
int 0x80


;read
mov ebx,eax
mov eax,3
mov ecx,buf
mov edx,4
int 0x80

;write
mov eax,4
mov ebx,1
mov ecx,buf
mov edx,4
int 0x80

;exit
mov eax,1
mov ebx,0
int 0x80


Everything works just fine. Does anyone know how I can reverse the order
of the data on the stack, from 0x44434241 to 0x41424344, without using
AND and OR? Can this be done?


The last two examples are gcc output: one that I "fixed" to find out
what is on the stack, and the default output for the first example. In
my version only the printf library call after the read call has been
changed; I suppose printf is the "black box" behind the "problem" I
can't understand.

read.s
------
        .file   "read.c"
        .version        "01.01"
gcc2_compiled.:
                .section        .rodata
.LC0:
        .string "makis"
.LC1:
        .string "%#x\n"
.text
        .align 4
.globl main
        .type    main,@function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    $.LC0, -4(%ebp)
        subl    $8, %esp
        pushl   $0
        pushl   $.LC0
        call    open

        addl    $16, %esp
        movl    %eax, %eax
        movl    %eax, -12(%ebp)
        subl    $4, %esp
        pushl   $4
        leal    -8(%ebp), %eax
        pushl   %eax
        pushl   -12(%ebp)
        call    read

# ----> The printf call used to be here. printf retrieves its arguments
# from the stack, and how it treats them differs for every conversion
# specifier.

        movl    $4,%eax
        movl    $1,%ebx
        leal    -8(%ebp),%ecx
        movl    $4,%edx
        int     $0x80


        movl    $1, %eax
        movl    $0,%ebx
        int $0x80
.Lfe1:
        .size    main,.Lfe1-main
        .ident  "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.3 2.96-110)"



Can someone please help me with this?

Thanks in advance,
Chris.



* Re: x86 Endiannes and libc printf
  2005-07-08 19:05 x86 Endiannes and libc printf Nanakos Chrysostomos
@ 2005-07-08 19:25 ` Vadim Lobanov
  2005-07-08 20:17 ` Richard B. Johnson
  2005-07-09  8:19 ` Paolo Ornati
  2 siblings, 0 replies; 7+ messages in thread
From: Vadim Lobanov @ 2005-07-08 19:25 UTC
  To: Nanakos Chrysostomos; +Cc: linux-assembly

The x86 architecture is little endian. By definition, "little endian"
means that the little end of a multi-byte number is stored first. In
other words, if you want to store the integer value '1', then the bytes
will be arranged in the following order in memory:
  0x01 0x00 0x00 0x00

In your example below, the bytes are arranged in the following order:
  0x44 0x43 0x42 0x41
since that's the order in which they were read from the file.
That's why when you print the first byte in the buffer, you get
0x44. However, because x86 is little endian, it will interpret 0x44 as
the low-order digits in the "overall number", and 0x41 as the high-order
digits in the "overall number".

Printf doesn't need to do anything special when printing the number. I
don't remember the code offhand, but it probably does something along
the lines of:
    while (num != 0) {
        print(num & 0x0000000F);
        go to previous character;
        num >>= 4;
    }
The processor is the one that loads that 4-byte quantity and treats the
first byte as low-order (so it is the first to get split off by the mask
and shift). That's what makes the processor little endian.

Big-endian machines flip the byte order, as you might expect. They store
the big end of the number in the first byte, etc.
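
As a rough sketch of the two interpretations of the same four bytes, the
byte order can be spelled out by hand in portable C. The helper names
load_le32 and load_be32 are made up for illustration and do not come
from any library:

```c
#include <stdint.h>

/* Interpret four bytes the way a little-endian CPU such as x86 does:
 * the byte at the lowest address is the least significant. */
uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interpret the same bytes the way a big-endian CPU would:
 * the byte at the lowest address is the most significant. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

For the bytes 0x44 0x43 0x42 0x41 read from the file, load_le32 gives
0x41424344 (the number x86 hands to printf) and load_be32 gives
0x44434241.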

Hope that was more clear than confusing. :-)

- Vadim L


* Re: x86 Endiannes and libc printf
  2005-07-08 19:05 x86 Endiannes and libc printf Nanakos Chrysostomos
  2005-07-08 19:25 ` Vadim Lobanov
@ 2005-07-08 20:17 ` Richard B. Johnson
  2005-07-09  8:19 ` Paolo Ornati
  2 siblings, 0 replies; 7+ messages in thread
From: Richard B. Johnson @ 2005-07-08 20:17 UTC
  To: Nanakos Chrysostomos; +Cc: linux-assembly

On Fri, 8 Jul 2005, Nanakos Chrysostomos wrote:

> Hi all,
> i am searching for a few hours now the endianness in the x86
> environment,and i have the following
> snippets of code,which in some places i cant understand.Please help me!!!

Intel ix86 processors store the lowest-order byte at the lowest
memory address. Therefore, if you had an unsigned short int
(16 bits) of 0xcdef, it would be represented in memory as:

.byte	0xef, 0xcd
            |     |____ Highest byte
            |__________ Lowest byte

Now, it turns out that the two WORDS of a long int also
have the lowest word stored in the lowest memory location.

Therefore, if we had a long int of 0x89ABCDEF it would be
stored as:

.byte	0xef, 0xcd	# Lowest word
.byte	0xab, 0x89	# Highest word
            |     |____ Highest byte
            |__________ Lowest byte


Long longs follow the same rule.
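
The layout above can be reproduced with a small sketch that writes the
bytes out low-order first. The names store_le32 and host_matches_le32
are hypothetical helpers introduced only for this illustration:

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value into out[] in little-endian order: low byte first. */
void store_le32(uint32_t value, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (unsigned char)(value >> (8 * i));   /* keep the low 8 bits */
    }
}

/* Returns 1 if this machine lays the value out in memory the same way,
 * i.e. if the host is little-endian like ix86. */
int host_matches_le32(uint32_t value)
{
    unsigned char expected[4];
    store_le32(value, expected);
    return memcmp(&value, expected, sizeof expected) == 0;
}
```

Calling store_le32 with 0x89ABCDEF fills the array with 0xef, 0xcd,
0xab, 0x89, matching the .byte listing above.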

It makes no difference whether the data are on the stack or elsewhere
in memory. But you may need to understand how C views things.
On x86, gcc treats plain char as signed by default. This can be
a problem if you view a memory object through a signed type
that is narrower than the type you print it as: the compiler
and the runtime library will sign-extend your observation.
For instance, a char of value 0xff will be 0xffff when
viewed as a short and 0xffffffff when viewed as an int,
even though you know from observing the byte in memory
that it was 0xff. This is absolutely correct, because all
three types hold the value -1.

Therefore, when experimenting with bytes and bits,
it's certainly recommended that you use unsigned types
so you don't get this extension.
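
A minimal sketch of that sign extension, assuming a two's-complement
machine with 32-bit int (as on ix86 with gcc). The function names
widen_signed and widen_unsigned are invented for the example:

```c
/* Widen a byte by way of a signed char: the sign bit is copied into the
 * upper bits, so 0xff (which is -1 as a signed char) becomes 0xffffffff. */
unsigned int widen_signed(unsigned char byte)
{
    signed char c = (signed char)byte;   /* 0xff -> -1 with gcc on x86 */
    return (unsigned int)(int)c;
}

/* Widen the same byte by way of an unsigned char: the upper bits are
 * zero-filled, so 0xff stays 0xff. */
unsigned int widen_unsigned(unsigned char byte)
{
    return (unsigned int)byte;
}
```

Bytes below 0x80, such as the ASCII letters in the original example,
come out the same either way; only bytes with the top bit set differ.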

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.


* Re: x86 Endiannes and libc printf
  2005-07-08 19:05 x86 Endiannes and libc printf Nanakos Chrysostomos
  2005-07-08 19:25 ` Vadim Lobanov
  2005-07-08 20:17 ` Richard B. Johnson
@ 2005-07-09  8:19 ` Paolo Ornati
  2005-07-09 14:41   ` Nanakos Chrysostomos
  2 siblings, 1 reply; 7+ messages in thread
From: Paolo Ornati @ 2005-07-09  8:19 UTC
  To: nanakos; +Cc: linux-assembly

On Fri, 8 Jul 2005 22:05:17 +0300 (EEST)
"Nanakos Chrysostomos" <nanakos@wired-net.gr> wrote:

> int main()
> {
>         char *filename= "endian.txt";
>         unsigned long buf;
>         char *k=(char *)&buf;
>         int fd;
> 
>         fd = open("makis",O_RDONLY);
> 
>         read(fd,&buf,4);
> 
>         printf("%.4s\n",&buf);
>         printf("%p\n",buf);
>         printf("&buf: %p %#x %p\n",&buf,*k,k);
>         return 0;
> }
> 
> endian.txt
> ----------
> DCBA
> 
> 
> #./read
> DCBA
> 0x41424344
> &buf: 0xbffff8b0 0x44 0xbffff8b0
> #
> 
> 
> In the first printf everything is fine.In the second printf we see
> that the 0x44,0x43,0x42,0x41 byte-data is printed in the revserse
> order,while we can see
> that in memory it is in the right order after the read system call.Why
> this happens?Is it being internal in printf???


No, it isn't printf... it's just that x86 CPUs are little endian.


When you do:
	printf("%.4s\n",&buf);
you are printing these four bytes in memory-order.


When you do:
	printf("%p\n",buf);
you are treating "buf" as a single number. Since your CPU is little
endian it expects the low order bytes to come first.


Another example:

#include <stdio.h>

int main(void) {
	unsigned int buf = 0x11223344;
	unsigned char *x = (unsigned char*)&buf;
	printf("0x%hhx\n", x[0]);
	return 0;
}


printf will print 0x44... simply because the number 0x11223344 is stored
in memory in this way:

address		value
&buf		0x44
&buf+1		0x33
&buf+2		0x22
&buf+3		0x11


For more info about x86 CPU look here:

http://developer.intel.com/design/pentium4/manuals/index_new.htm

-- 
	Paolo Ornati
	Linux 2.6.12.2 on x86_64


* Re: x86 Endiannes and libc printf
  2005-07-09  8:19 ` Paolo Ornati
@ 2005-07-09 14:41   ` Nanakos Chrysostomos
  2005-07-09 14:51     ` Robert Plantz
  2005-07-09 16:31     ` Paolo Ornati
  0 siblings, 2 replies; 7+ messages in thread
From: Nanakos Chrysostomos @ 2005-07-09 14:41 UTC
  To: Paolo Ornati; +Cc: linux-assembly

Thanks very much for your response.
I know that the x86 CPU is little-endian, but how does printf print that
number, when the lowest byte in memory (on the stack) is 0x44?
Maybe it knows internally that this is a cast?
Let's assume the following example:
#include <stdio.h>

int main(void) {
 	unsigned char buf[] = {0x44,0x43,0x42,0x41};
 	unsigned char *x = buf;
 	printf("%#x\n", *(unsigned int *)x);
 	return 0;
 }

It prints the buf array in reverse (little-endian) order and treats it
as a number:

0x41424344

But check out the following example:

endian.txt
-----------
DBCA

example.s
----------
.section .data
        .filename: .string  "endian.txt"


.text
.globl _start

_start:
        pushl %ebp
        movl %esp,%ebp

        movl $5,%eax
        movl $.filename,%ebx
        movl $0x00,%ecx
        int $0x80

        movl %eax,%ebx
        movl $3,%eax
        leal -8(%ebp),%ecx
        movl $4,%edx
        int $0x80

        movl $4,%eax
        movl $1,%ebx
        movl $0x0a,-4(%ebp)
        leal -8(%ebp),%ecx
        movl $5,%edx
        int $0x80

        movl $1,%eax
        movl $0,%ebx
        int $0x80


#as -o example.o example.s
#ld -o example example.o
#./example
DBCA

This prints the memory starting from the lowest byte.
How can we use the 'write' system call to print these bytes treated as
a number, as printf does?

Thanks in advance.



* Re: x86 Endiannes and libc printf
  2005-07-09 14:41   ` Nanakos Chrysostomos
@ 2005-07-09 14:51     ` Robert Plantz
  2005-07-09 16:31     ` Paolo Ornati
  1 sibling, 0 replies; 7+ messages in thread
From: Robert Plantz @ 2005-07-09 14:51 UTC
  To: nanakos; +Cc: Paolo Ornati, linux-assembly

The write system call is at a lower level than printf. The job
of printf is to convert C data types to character strings for display.
That is, the decimal integer 123 is converted to the characters '1',
'2', and '3'.

These characters are then passed to write, which copies the bytes
(characters) to standard out (usually the screen).

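printf works on the value of each argument, not on its bytes in memory;
the CPU has already applied the byte order when it loaded the number, so
printf needs no special endianness handling.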

write is at a much lower level. It knows nothing about data types or
endianness. It simply copies the given bytes, in memory order, to the
specified file descriptor.
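
The two layers can be sketched separately, assuming a POSIX system.
Here snprintf stands in for the conversion step that printf performs,
and the wrapper names format_decimal and emit are hypothetical:

```c
#include <stdio.h>
#include <unistd.h>

/* Conversion layer: turn an integer value into decimal text.
 * Returns the number of characters produced, not counting the NUL. */
int format_decimal(int value, char *buf, size_t size)
{
    return snprintf(buf, size, "%d", value);
}

/* Output layer: write(2) copies the bytes it is given to the file
 * descriptor and knows nothing about what they represent. */
ssize_t emit(int fd, const char *text, size_t len)
{
    return write(fd, text, len);
}
```

Calling format_decimal(123, buf, sizeof buf) leaves "123" in buf, and
emit(1, buf, 3) then sends those three characters to standard output.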

Bob


* Re: x86 Endiannes and libc printf
  2005-07-09 14:41   ` Nanakos Chrysostomos
  2005-07-09 14:51     ` Robert Plantz
@ 2005-07-09 16:31     ` Paolo Ornati
  1 sibling, 0 replies; 7+ messages in thread
From: Paolo Ornati @ 2005-07-09 16:31 UTC
  To: nanakos; +Cc: linux-assembly

On Sat, 9 Jul 2005 17:41:54 +0300 (EEST)
"Nanakos Chrysostomos" <nanakos@wired-net.gr> wrote:

> #as -o example.o example.s
> #ld -o example example.o
> #./example
> DBCA
> 
> We print out the memory from the lowest byte-order.
> How can we print out by using the system call 'write' this byte-order
> and treat it like a number,as printf
> does.????????????????????????????????


1) from a quick read of your assembly code it seems that you are reading
some bytes from a file and writing them to standard output.

These bytes are the ASCII codes of D, B, C, A.
NOTE that this is very different from the hexadecimal values D, B, C, A.

I hope you agree with me that 'A' != 0xA... since 'A' = 65, and 0xA = 10.

But maybe you are doing it on purpose...


2) if you want "treat it as a number" in assembly, just put these bytes
in a register.

Assuming that they are in little endian order at -8(%ebp):

	movl	-8(%ebp), %eax

Now you have the WHOLE number in %eax register.


3) To print the value in a HUMAN-READABLE way you should do something
like this (written in C for simplicity):

	const char digits[] = "0123456789abcdef";
	unsigned int x = 64335252;	// this is the NUMBER to print
	unsigned int base = 10;		// you can change it to anything from 2 to 16
	unsigned int tmp;

	// NOTE: the digits come out low-order first, so a real routine
	// collects them in a buffer and prints them in reverse order;
	// x == 0 needs a special case (print "0").
	while (x) {
		tmp = x % base;		// extract low order digit
		x /= base;		// discard low order digit
		PRINT_WITH_SOMETHING( digits[tmp] );
	}

This is basically what printf does...
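
A self-contained variant of the loop above, as a sketch only: it
collects the digits and reverses them so the text reads high-order
first. The function name utoa_base is made up for this example, and the
buffer sizes assume a 32-bit unsigned int:

```c
#include <stddef.h>

/* Convert x to text in the given base (2..16).
 * Digits are produced low-order first, so they are collected in tmp
 * and copied to out in reverse. out needs room for 33 bytes
 * (32 binary digits plus the terminating NUL). Returns the length. */
size_t utoa_base(unsigned int x, unsigned int base, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[32];
    size_t n = 0, len = 0;

    if (base < 2 || base > 16) {     /* unsupported base: empty string */
        out[0] = '\0';
        return 0;
    }

    do {                             /* do/while so that 0 prints as "0" */
        tmp[n++] = digits[x % base]; /* extract low-order digit */
        x /= base;                   /* discard low-order digit */
    } while (x != 0);

    while (n > 0) {                  /* reverse into the output buffer */
        out[len++] = tmp[--n];
    }
    out[len] = '\0';
    return len;
}
```

The resulting string and its length can be handed straight to the write
system call, for example write(1, out, len).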


PS: in this example I've done the RAW NUMBER DIGIT to ASCII conversion
with an array... but you can do it in other ways as well.


-- 
	Paolo Ornati
	Linux 2.6.12.2 on x86_64
