BaSs_HaXoR

Assembly Language Tutorial (x86)

Mar 11th, 2015
  1. Assembly Language Tutorial (x86)
  2. ##################################################################################################
  3. For more detailed information about the architecture and about processor instructions, you will need access to a 486 (or 386+) microprocessor manual. The one I like is entitled The 80386 book, by Ross P. Nelson. (This book is copyright 1988 by Microsoft Press, ISBN 1-55615-138-1.) Intel processor manuals may also be found at http://www.x86.org/intel.doc/586manuals.htm.
  4.  
  5. The GNU Assembler, gas, uses a different syntax from what you will likely find in any x86 reference manual: its two-operand instructions list the source and destination operands in the opposite order from Intel syntax. Here are the forms a gas instruction can take:
  6.  
  7.     opcode                    (e.g., pushal)
  8.     opcode operand            (e.g., pushl %edx)
  9.     opcode source,dest        (e.g., movl %edx,%eax) (e.g., addl %edx,%eax)
  10. Where there are two operands, the leftmost one is the source and the rightmost one is the destination.
  11. For example, movl %edx,%eax means "move the contents of the edx register into the eax register", and addl %edx,%eax means "add the contents of the edx and eax registers, and place the sum in the eax register".
  12.  
  13. Among the syntactic differences between gas and Intel assemblers: all register names used as operands must be preceded by a percent (%) sign, and instruction names usually end in "l", "w", or "b", indicating the size of the operands: long (32 bits), word (16 bits), or byte (8 bits), respectively. For our purposes, we will usually be using the "l" (long) suffix.
  14.  
  15. 80386+ Register Set
  16.  
  17. There are different names for the same register depending on what part of the register you want to use. To use the lowest 8 bits of eax (bits 0-7), you would use %al. For the next 8 bits (bits 8-15) of eax you would use %ah. To refer to the lowest 16 bits of eax (bits 0-15) together you would use %ax. For the entire 32 bits you would use %eax (90% of the time this is what you will be using). The form of the register name must agree with the size suffix of the instruction: movb goes with %al or %ah, movw with %ax, and movl with %eax.
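
The register naming above can be sketched in C as different views of one 32-bit value. This is only a model, not something from the original tutorial: the function names (reg_al, reg_ah, reg_ax, write_ah) are made up for illustration, and a uint32_t stands in for %eax.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each returns the part of a 32-bit value
   that the matching x86 register name refers to. */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */

/* Model of "movb $value,%ah": only bits 8-15 change. */
uint32_t write_ah(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)value << 8);
}

/* Small self-check showing how the views relate. */
void register_views_demo(void)
{
    uint32_t eax = 0x12345678u;
    assert(reg_al(eax) == 0x78u);
    assert(reg_ah(eax) == 0x56u);
    assert(reg_ax(eax) == 0x5678u);
    assert(write_ah(eax, 0xABu) == 0x1234AB78u);
}
```

Writing through the %ah view leaves the other 24 bits alone, which is why the register name has to match the operand size suffix.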
  18. Here are the important processor registers:
  19.  
  20.     EAX,EBX,ECX,EDX - "general purpose", more or less interchangeable
  21.  
  22.     EBP             - used to access data on stack
  23.                     - when this register is used to specify an address, SS is
  24.                       used implicitly
  25.  
  26.     ESI,EDI         - index registers, relative to DS,ES respectively
  27.  
  28.     SS,DS,CS,ES,FS,GS - segment registers
  29.                       - (when Intel went from the 286 to the 386, they figured
  30.                          that providing more segment registers would be more
  31.                          useful to programmers than providing more general-
  32.                          purpose registers... now, they have an essentially
  33.                          RISC processor with only _FOUR_ GPRs!)
  34.                       - these are all only 16 bits in size
  35.  
  36.     EIP            - program counter (instruction pointer), relative to CS
  37.  
  38.     ESP            - stack pointer, relative to SS
  39.  
  40.     EFLAGS         - condition codes, a.k.a. flags
  41. Segmentation
  42.  
  43. We are using the 32-bit segment addressing feature of the 486. Using 32-bit addressing as opposed to 16-bit addressing gives us many advantages:
  44. No need to worry about 64K segments. Segments can be 4 gigabytes in length under the 32-bit architecture.
  45. 32-bit segments come with a protection mechanism, which you have the option of using.
  46. You don't have to deal with any of that ugly 16-bit crud that is used in other operating systems for the PC, like DOS or OS/2; 32-bit segmentation is really a thing of beauty in comparison to that.
  47. i486 addresses are formed from a segment base address plus an offset. To compute an absolute memory address, the i486 figures out which segment register is being used, and uses the value in that segment register as an index into the global descriptor table (GDT). The entry in the GDT tells (among other things) what the absolute address of the start of the segment is. The processor takes this base address and adds on the offset to come up with the final absolute address for an operation. You'll be able to look in a 486 manual for more information about this or about the GDT's organization.
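
The base-plus-offset lookup described above might be modeled roughly as follows. This is a simplified sketch under stated assumptions: struct descriptor, toy_gdt, and translate are hypothetical names, the table contents are invented, and a real GDT entry packs access rights and flags alongside the base and limit. The one hardware detail used is that the low 3 bits of a selector are not part of the table index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified descriptor. */
struct descriptor {
    uint32_t base;   /* absolute address where the segment starts */
    uint32_t limit;  /* highest valid offset inside the segment   */
};

/* Toy table standing in for the GDT; all values are invented. */
static const struct descriptor toy_gdt[] = {
    { 0x00000000u, 0x00000000u },  /* entry 0: null descriptor        */
    { 0x00100000u, 0x0000FFFFu },  /* entry 1: 64K segment at 1 MB    */
    { 0x00200000u, 0x000FFFFFu },  /* entry 2: 1 MB segment at 2 MB   */
};

/* Returns 1 and stores base + offset in *linear on success.
   Returns 0 for the null entry, an index outside the table, or an
   offset past the limit (real hardware raises a fault instead). */
int translate(uint16_t selector, uint32_t offset, uint32_t *linear)
{
    size_t index = (size_t)(selector >> 3);  /* drop the low 3 selector bits */
    size_t count = sizeof toy_gdt / sizeof toy_gdt[0];

    if (index == 0 || index >= count)
        return 0;
    if (offset > toy_gdt[index].limit)
        return 0;
    *linear = toy_gdt[index].base + offset;
    return 1;
}

/* Small self-check. */
void translate_demo(void)
{
    uint32_t linear = 0;
    assert(translate(0x10, 0x1234u, &linear) == 1 && linear == 0x00201234u);
    assert(translate(0x08, 0x10000u, &linear) == 0);  /* past the 64K limit */
}
```

Selector 0x10 picks entry 2, so offset 0x1234 lands at 0x00201234; the second call fails because the offset exceeds the limit of entry 1.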
  48.  
  49. The i486 has six 16-bit segment registers, listed here in order of importance:
  50.  
  51. CS: Code Segment Register
  52. Added to address during instruction fetch.
  53. SS: Stack Segment Register
  54. Added to address during stack access.
  55. DS: Data Segment Register
  56. Added to address when accessing a memory operand that is not on the stack.
  57. ES, FS, GS: Extra Segment Registers
  58. Can be used as extra segment registers; also used in special instructions that span segments (like string copies).
  59. The x86 architecture supports different addressing modes for the operands. A discussion of all modes is out of the scope of this tutorial, and you may refer to your favorite x86 reference manual for a painfully detailed discussion of them. Segment registers are special: you can't do a
  60.  
  61.     movw seg-reg, seg-reg
  62. You can, however, do
  63.     movw seg-reg,memory
  64.     movw memory,seg-reg
  65.     movw seg-reg,reg
  66.     movw reg,seg-reg
  67. Note: If you movw %ss,%ax, then you should xorl %eax,%eax first to clear the high-order 16 bits of %eax, so you can work with long values.
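
The reason for the xorl in the note above can be seen in a small C model. This is an illustrative sketch only: movw_to_ax and xorl_self are hypothetical names, and a uint32_t stands in for %eax.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "movw %ss,%ax": bits 0-15 are replaced,
   bits 16-31 keep whatever was there before. */
uint32_t movw_to_ax(uint32_t eax, uint16_t value)
{
    return (eax & 0xFFFF0000u) | value;
}

/* Model of "xorl %eax,%eax": any value XORed with itself is zero. */
uint32_t xorl_self(uint32_t eax)
{
    return eax ^ eax;
}

/* Small self-check: stale high bits survive unless eax is cleared first. */
void movw_demo(void)
{
    uint32_t stale = 0xDEADBEEFu;
    assert(movw_to_ax(stale, 0x0018u) == 0xDEAD0018u);
    assert(movw_to_ax(xorl_self(stale), 0x0018u) == 0x00000018u);
}
```

Without the clear, the long value in %eax still carries the old upper half, so using it as a 32-bit quantity gives the wrong result.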
  68.  
  69.  
  70. Common/Useful Instructions
  71.  
  72. mov (especially with segment registers)
  73.     - e.g.,:
  74.         movw %es,%ax
  75.         movl %cs:4,%esp
  76.         movw _processControlBlock,%ds
  77.  
  78.     - note:     mov's do NOT set flags, and mov cannot load %cs (changing %cs takes a far jmp, call, or return)
  79.  
  80. pushl, popl       - push/pop long
  81. pushal, popal     - push/pop all eight general registers; pushal pushes EAX,ECX,EDX,EBX,ESP,EBP,ESI,EDI in that order (popal discards the saved ESP)
  82.  
  83. call  (jumps to piece of code, saves return address on stack)
  84.         e.g., call _cFunction
  85.  
  86. int   - call a software interrupt
  87.  
  88. ret   (returns from piece of code entered due to call instruction)
  89. iretl (returns from piece of code entered due to hardware or software interrupt)
  90.  
  91. sti, cli - set/clear the interrupt flag (IF) to enable/disable interrupts respectively
  92. lea  - Load Effective Address: computes the address of its memory operand and puts that address in the destination register. It does not access memory and does not affect any flags, so there is no need to push and pop the flags around it.
  93. A simple example:
  94.  
  95. CODE
  96. void function1() {
  97.     int A = 10;
  98.     A += 66;
  99. }
  100.  
  101. compiles to...
  102. function1:
  103. 1   pushl %ebp
  104. 2   movl %esp, %ebp
  105. 3   subl $4, %esp
  106. 4   movl $10, -4(%ebp)    # A
  107. 5   leal -4(%ebp), %eax
  108. 6   addl $66, (%eax)      # A
  109. 7   leave
  110. 8   ret
  111. Explanation:
  112. 1. push ebp (save the caller's frame pointer)
  113. 2. copy stack pointer to ebp
  114. 3. make space on stack for local data (4 bytes for A)
  115. 4. put value 10 in A (A lives at address ebp-4)
  116. 5. load address of A into EAX (similar to a pointer)
  117. 6. add 66 to A through the address in EAX
  118. 7. leave copies ebp back to esp and pops ebp, undoing steps 1-3; 8. ret pops the return address and jumps to it
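
Steps 5 and 6 above correspond to taking a pointer and updating through it. A minimal C sketch of the same logic, with the pointer written out explicitly, could look like this (function1_with_pointer is a hypothetical variant that returns A so the result can be observed; the original function1 returns nothing):

```c
#include <assert.h>

/* Same arithmetic as function1, with the address computation made visible. */
int function1_with_pointer(void)
{
    int A = 10;     /* like: movl $10, -4(%ebp)  */
    int *p = &A;    /* like: leal -4(%ebp), %eax */
    *p += 66;       /* like: addl $66, (%eax)    */
    return A;
}

/* Small self-check. */
void function1_demo(void)
{
    assert(function1_with_pointer() == 76);
}
```

Taking &A only computes an address, just as leal does; the memory access happens on the following line.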
  119. Mixing C and Assembly Language
  120.  
  121. The way to mix C and assembly language is to use the "asm" directive. To access C-language variables from inside of assembly language, you simply use the C identifier name as a memory operand. These variables cannot be local to a procedure, and also cannot be static inside a procedure. They must be global (but can be static global). The newline characters are necessary because the adjacent string literals are joined into a single string, and the assembler needs each instruction on its own line.
  122.  
  123. unsigned long a1, r;
  124. void junk( void )
  125. {
  126.   asm(
  127.        "pushl %eax \n"
  128.        "pushl %ebx \n"
  129.        "movl $100,%eax \n"
  130.        "movl a1,%ebx \n"
  131.        "int $69 \n"
  132.        "movl %eax,r \n"
  133.        "popl %ebx \n"
  134.        "popl %eax \n"
  135.   );
  136. }
  137.  
  138. This example does the following:
  139.  
  140. Pushes the values stored in %eax and %ebx onto the stack.
  141. Puts a value of 100 into %eax.
  142. Copies the value in global variable a1 into %ebx.
  143. Executes a software interrupt number 69.
  144. Copies the value in %eax into the global variable r.
  145. Restores (pops) the contents of the temporary registers %eax and %ebx.
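
The example above uses the basic form of asm, which is why its variables must be global. As a contrast, here is a hedged sketch using the extended asm syntax of GCC and Clang, which the tutorial itself does not cover: operands are passed in through constraints, so locals and parameters work. The function name add_u32 is hypothetical, and the #if guard falls back to plain C when the compiler or target is not a GNU-style x86 one.

```c
#include <assert.h>

/* Adds b to a with a single addl when built by GCC/Clang for x86;
   otherwise uses ordinary C addition. */
unsigned int add_u32(unsigned int a, unsigned int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"
            : "+r"(a)   /* %0: read and written, kept in a register */
            : "r"(b)    /* %1: input only                           */
            : "cc");    /* addl changes the condition codes         */
    return a;
#else
    return a + b;
#endif
}

/* Small self-check. */
void add_u32_demo(void)
{
    assert(add_u32(100u, 23u) == 123u);
}
```

The operand order inside the string is still source first, destination second, and the compiler picks the registers and saves whatever it needs, so no explicit pushl/popl pair is required here.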
  146. ##################################################################################################
  147. // http://www.hep.wisc.edu/~pinghc/x86AssmTutorial.htm
  148. // BaSs_HaXoR