Enjoyyyyyy code redundancy (0x00000)

Link-time and compile-time optimization/code generation can produce different opcodes for the same instruction mnemonic.
Also, in the situation you describe, the COMPILER may generate different opcodes to deal with things like mispredictions, code redundancy and so on. The same goes for assemblers such as NASM, FASM and MASM.

e.g. code redundancy (memory mapping only: code models, PIC, GOT blah blah)

instruction (format) encoding ---->

opcode - ModR/M - SIB    - Disp     - Imm
                  1 byte   1-4 byte   1-4 byte


0x03
ADD - Gv, Ev
      reg, reg/m32
      eax, ebx
      03C3

followed by

add eax,ebx
0x01 - the opcode found in the executable file.
ADD - Ev, Gv
      reg/m32, reg
      eax,     ebx
      01D8

A compiler/assembler is free to generate either form, and they do: some assemblers emit 03C3 and others emit 01D8.
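To make the redundancy concrete, here is a minimal Python sketch that decodes only the register-to-register form of these two ADD opcodes. The function name decode_add_reg_reg and the table REG32 are made up for this post, and everything else (memory operands, prefixes, other opcodes) is deliberately not handled.

```python
# Register numbers as used in the ModR/M byte for 32-bit operands.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_add_reg_reg(code: bytes) -> str:
    """Decode a 2-byte register-to-register ADD (opcode 0x01 or 0x03).

    Hypothetical helper for illustration only, not a real disassembler.
    """
    opcode, modrm = code[0], code[1]
    mod = modrm >> 6             # 0b11 means both operands are registers
    reg = (modrm >> 3) & 0b111   # the "Gv" operand
    rm = modrm & 0b111           # the "Ev" operand
    if mod != 0b11:
        raise ValueError("only the register-direct form is handled here")
    if opcode == 0x01:           # ADD Ev, Gv: destination is r/m
        dst, src = rm, reg
    elif opcode == 0x03:         # ADD Gv, Ev: destination is reg
        dst, src = reg, rm
    else:
        raise ValueError("opcode not handled by this sketch")
    return f"add {REG32[dst]},{REG32[src]}"

# Both byte sequences from the post decode to the same instruction.
print(decode_add_reg_reg(bytes.fromhex("03C3")))
print(decode_add_reg_reg(bytes.fromhex("01D8")))
```

The only difference between the two encodings is the direction of the opcode, which swaps the roles of the reg and r/m fields.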

There is also an alternative encoding I ran into in an article recently.
An alternative opcode for instructions with an immediate byte operand:
add byte[eax],0
8000 00
add byte[eax],0 (the article says this second opcode is invalid in 64-bit mode.)
8200 00
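A rough Python sketch of that alias, assuming only the simplest addressing form ([reg] with no SIB byte and no displacement). The name decode_group1_imm8 and the long_mode flag are invented for this example; the 64-bit restriction on 0x82 is the one the article mentions.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# The reg field of ModR/M selects the operation for opcodes 0x80 and 0x82.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]

def decode_group1_imm8(code: bytes, long_mode: bool = False) -> str:
    """Decode '<op> byte[reg],imm8' encoded with opcode 0x80 or its alias 0x82.

    Hypothetical helper for illustration; handles only mod=00 without
    SIB or displacement.
    """
    opcode, modrm, imm = code[0], code[1], code[2]
    if opcode not in (0x80, 0x82):
        raise ValueError("opcode not handled by this sketch")
    if opcode == 0x82 and long_mode:
        raise ValueError("0x82 is invalid in 64-bit mode")
    mod = modrm >> 6
    ext = (modrm >> 3) & 0b111   # opcode extension (/0 = add)
    rm = modrm & 0b111
    if mod != 0b00 or rm in (0b100, 0b101):
        raise ValueError("only the plain [reg] form is handled here")
    return f"{GROUP1[ext]} byte[{REG32[rm]}],{imm}"

print(decode_group1_imm8(bytes.fromhex("800000")))
print(decode_group1_imm8(bytes.fromhex("820000")))
```

Outside 64-bit mode both byte sequences give the same instruction; with long_mode=True the second one is rejected.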

On the other hand, in the WoW64 subsystem of the NT kernel, the 32 ---> 64 segment switch was done simply with
a FAR call through 2 special segments. In that case too, the segment selectors can be manipulated.

Now for our main topic: let's look at the methods compilers use for Branch Prediction optimization.
Intel states that 'spin-wait loops' need special treatment to avoid a performance penalty (a mispred when the loop exits). The AMD optimization guide, for its part, notes that to avoid branch penalties the 'rep' prefix should be used instead of a 'nop', because it takes less "decoding bandwidth".
The 'rep nop' instruction was used to flag a spin-wait loop; Intel later added a separate mnemonic, 'PAUSE', for the same opcode (F3 90).
Later on, though, programmers' choices diverged because of issues such as `execution time` or `mispred`.
Intel reference: (rep nop)
NOP instruction can be between 0.4-0.5 clocks and PAUSE instruction can consume 38-40 clocks.

The second method is static prediction: to keep performance penalties and mispredictions to a minimum (for example, when the Branch Target Buffer holds no state for the branch), the branch state has to be worked out statically.
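As a rough illustration of what "worked out statically" can mean, the Python sketch below applies the classic backward-taken / forward-not-taken heuristic to a short branch displacement. This heuristic is an assumption brought in for the example, not something stated above, and not every CPU generation uses it. The names rel8 and static_guess are made up.

```python
def rel8(byte: int) -> int:
    """Interpret one byte as a signed 8-bit branch displacement."""
    return byte - 0x100 if byte >= 0x80 else byte

def static_guess(disp: int) -> str:
    """Assumed heuristic: backward branches (loops) are guessed taken,
    forward branches are guessed not taken."""
    return "taken" if disp < 0 else "not taken"

# 0x0f is a forward displacement, 0xf0 a backward one (-16).
print(static_guess(rel8(0x0F)))
print(static_guess(rel8(0xF0)))
```

Under this rule a compiler can lay out code so that the likely path matches the static guess when no dynamic history is available.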

Compiler/assembler optimization is a very long and deep topic; I have only given a few examples here.

e.g. conditional branches:

       83 7d 8c 01             cmp    DWORD PTR [rbp-0x74],0x1
       operand encoding
       75 0f                   jne    4005e4 <main+0x47>
                               conditional state

       83 7d 8c 00             cmp    DWORD PTR [rbp-0x74],0x0
       operand encoding
       74 0f                   je     4005f9 <main+0x5c>
                               conditional state
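For the two short jumps above, the second byte is a signed displacement relative to the end of the instruction. The Python sketch below shows the arithmetic. The instruction addresses 0x4005d3 and 0x4005e8 are not in the listing; they are inferred by working backwards from the printed targets, so treat them as assumptions. The name rel8_target is made up.

```python
def rel8_target(insn_addr: int, code: bytes) -> int:
    """Target of a 2-byte short Jcc (opcode 0x70..0x7F followed by rel8).

    Hypothetical helper for illustration only.
    """
    opcode, disp = code[0], code[1]
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("not a short conditional jump")
    if disp >= 0x80:             # sign-extend the 8-bit displacement
        disp -= 0x100
    # The displacement is counted from the address of the next instruction.
    return insn_addr + len(code) + disp

# Addresses are assumed (inferred from the targets in the listing).
print(hex(rel8_target(0x4005D3, bytes.fromhex("750f"))))  # jne
print(hex(rel8_target(0x4005E8, bytes.fromhex("740f"))))  # je
```

The low nibble of the opcode carries the condition (5 for jne, 4 for je), which is the only difference between the two jump encodings.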