Aug 3rd, 2013
  1. The original patch was written by jeroen@linuxforge.net
  2. http://www.linuxforge.net/linux/kernel/kernel-33-gcc47-0.patch
  3.  
  4. Benchmarks by graysky
  5.  
  6. Three machines were each benchmarked twice with a make-based endpoint: once running a generic x86-64 kernel and once running an otherwise identical kernel built with the optimized gcc options.
  7.  
  8. Conclusion:
  9. Measured with a make endpoint, kernels built with this patch show small but real speed increases.
  10.  
  11. Details:
  12. 1) Three test machines: Intel Xeon X3360, Intel i7-2620M, Intel Core i7-3660K.
  13. 2) All ran the make benchmark (linked below) 35 times while booted into a 'generic' kernel. Then all ran the same make benchmark 35 times after booting into an optimized kernel. Below are the optimizations chosen for each machine.
  14. 2a) X3360 = core2
  15. 2b) i7-2620M = corei7-avx
  16. 2c) i7-3660K = core-avx-i
  17. 3) The resulting distributions were analyzed for statistical significance by ANOVA; the plots show statistically significant, albeit small, differences.
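
The repeated-run procedure in step 2 can be sketched as follows. This is only an illustration in Python, not the bash script linked under References; the function name `time_runs` and the placeholder workload are assumptions made for the example, and a real run would time a `make` invocation instead.

```python
import statistics
import subprocess
import sys
import time


def time_runs(command, runs):
    """Run `command` `runs` times and return each wall-clock duration in ms."""
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the benchmark if the workload itself fails.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


if __name__ == "__main__":
    # Hypothetical placeholder workload; substitute the make command to time.
    placeholder = [sys.executable, "-c", "pass"]
    samples = time_runs(placeholder, runs=5)
    print(f"median: {statistics.median(samples):.1f} ms over {len(samples)} runs")
```

Collecting one such list per kernel (35 runs each in the notes above) gives the two distributions that are then compared.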
  18.  
  19. Links to ANOVA plots:
  20. http://s19.postimage.org/68urcofzn/corei7_avx.png
  21. http://s19.postimage.org/ozwomuak3/core_avx_i.png
  22. http://s19.postimage.org/d0l6fj4z7/core2.png
  23.  
  24. Discussion:
  25. 1) All the assumptions for ANOVA are met:
  26. *Data are normally distributed, as shown in the normal quantile plots.
  27. *The population variances are fairly equal (Levene and Bartlett tests).
  28.  
  29. 2) The ANOVA plots clearly show significance.
  30. *Pair-wise analysis by Tukey-Kramer shows significance at the 0.05 level for all CPUs compared.
  31. Below are the differences in median values:
  32.  
  33. core2 +87.5 ms
  34. corei7-avx +79.7 ms
  35. core-avx-i +257.2 ms
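
For readers who want to reproduce this kind of comparison from the log file, the sketch below computes a one-way ANOVA F statistic and a difference in medians in plain Python. It is an assumption-laden illustration: the sample timings are made up, the sign convention (generic minus optimized) is assumed because the notes do not state it, and the Tukey-Kramer, Levene and Bartlett steps are not covered.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def median_difference(generic_ms, optimized_ms):
    """Assumed convention: positive means the optimized kernel was faster."""
    return statistics.median(generic_ms) - statistics.median(optimized_ms)


if __name__ == "__main__":
    # Hypothetical timings in ms, NOT taken from the benchmark log.
    generic = [10.0, 11.0, 12.0]
    optimized = [7.0, 8.0, 9.0]
    f_stat, df_b, df_w = one_way_anova_f([generic, optimized])
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
    print(f"median difference = {median_difference(generic, optimized):+.1f} ms")
```

The F statistic still has to be compared against the F distribution for the given degrees of freedom to obtain a p-value; a statistics package would normally handle that step.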
  36.  
  37. References:
  38. Bash script that controls the benchmark: https://github.com/graysky2/bin/blob/master/bench
  39. Log file generated by script: http://repo-ck.com/bench/compile_time_optimization.txt.gz
  40.  
  41. ---
  42. --- linux-3.10/arch/x86/include/asm/module.h 2013-02-18 18:58:34.000000000 -0500
  43. +++ linux-3.10.mod/arch/x86/include/asm/module.h 2013-04-11 17:40:04.064910866 -0400
  44. @@ -15,6 +15,16 @@
  45. #define MODULE_PROC_FAMILY "586MMX "
  46. #elif defined CONFIG_MCORE2
  47. #define MODULE_PROC_FAMILY "CORE2 "
  48. +#elif defined CONFIG_MNATIVE
  49. +#define MODULE_PROC_FAMILY "NATIVE "
  50. +#elif defined CONFIG_MCOREI7
  51. +#define MODULE_PROC_FAMILY "COREI7 "
  52. +#elif defined CONFIG_MCOREI7AVX
  53. +#define MODULE_PROC_FAMILY "COREI7AVX "
  54. +#elif defined CONFIG_MCOREAVXI
  55. +#define MODULE_PROC_FAMILY "COREAVXI "
  56. +#elif defined CONFIG_MCOREAVX2
  57. +#define MODULE_PROC_FAMILY "COREAVX2 "
  58. #elif defined CONFIG_MATOM
  59. #define MODULE_PROC_FAMILY "ATOM "
  60. #elif defined CONFIG_M686
  61. @@ -33,6 +43,16 @@
  62. #define MODULE_PROC_FAMILY "K7 "
  63. #elif defined CONFIG_MK8
  64. #define MODULE_PROC_FAMILY "K8 "
  65. +#elif defined CONFIG_MK10
  66. +#define MODULE_PROC_FAMILY "K10 "
  67. +#elif defined CONFIG_MBARCELONA
  68. +#define MODULE_PROC_FAMILY "BARCELONA "
  69. +#elif defined CONFIG_MBOBCAT
  70. +#define MODULE_PROC_FAMILY "BOBCAT "
  71. +#elif defined CONFIG_MBULLDOZER
  72. +#define MODULE_PROC_FAMILY "BULLDOZER "
  73. +#elif defined CONFIG_MPILEDRIVER
  74. +#define MODULE_PROC_FAMILY "PILEDRIVER "
  75. #elif defined CONFIG_MELAN
  76. #define MODULE_PROC_FAMILY "ELAN "
  77. #elif defined CONFIG_MCRUSOE
  78. --- linux-3.10/arch/x86/Kconfig.cpu 2013-02-18 18:58:34.000000000 -0500
  79. +++ linux-3.10.mod/arch/x86/Kconfig.cpu 2013-04-06 08:25:58.095745643 -0400
  80. @@ -139,7 +139,7 @@
  81.  
  82.  
  83. config MK6
  84. - bool "K6/K6-II/K6-III"
  85. + bool "AMD K6/K6-II/K6-III"
  86. depends on X86_32
  87. ---help---
  88. Select this for an AMD K6-family processor. Enables use of
  89. @@ -147,7 +147,7 @@
  90. flags to GCC.
  91.  
  92. config MK7
  93. - bool "Athlon/Duron/K7"
  94. + bool "AMD Athlon/Duron/K7"
  95. depends on X86_32
  96. ---help---
  97. Select this for an AMD Athlon K7-family processor. Enables use of
  98. @@ -155,12 +155,48 @@
  99. flags to GCC.
  100.  
  101. config MK8
  102. - bool "Opteron/Athlon64/Hammer/K8"
  103. + bool "AMD Opteron/Athlon64/Hammer/K8"
  104. ---help---
  105. Select this for an AMD Opteron or Athlon64 Hammer-family processor.
  106. Enables use of some extended instructions, and passes appropriate
  107. optimization flags to GCC.
  108.  
  109. +config MK10
  110. + bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
  111. + ---help---
  112. + Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
  113. + Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
  114. + Enables use of some extended instructions, and passes appropriate
  115. + optimization flags to GCC.
  116. +
  117. +config MBARCELONA
  118. + bool "AMD Barcelona"
  119. + ---help---
  120. + Select this for AMD Barcelona and newer processors.
  121. +
  122. + Enables -march=barcelona
  123. +
  124. +config MBOBCAT
  125. + bool "AMD Bobcat"
  126. + ---help---
  127. + Select this for AMD Bobcat processors.
  128. +
  129. + Enables -march=btver1
  130. +
  131. +config MBULLDOZER
  132. + bool "AMD Bulldozer"
  133. + ---help---
  134. + Select this for AMD Bulldozer processors.
  135. +
  136. + Enables -march=bdver1
  137. +
  138. +config MPILEDRIVER
  139. + bool "AMD Piledriver"
  140. + ---help---
  141. + Select this for AMD Piledriver processors.
  142. +
  143. + Enables -march=bdver2
  144. +
  145. config MCRUSOE
  146. bool "Crusoe"
  147. depends on X86_32
  148. @@ -251,8 +287,17 @@
  149. using the cpu family field
  150. in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
  151.  
  152. +config MATOM
  153. + bool "Intel Atom"
  154. + ---help---
  155. +
  156. + Select this for the Intel Atom platform. Intel Atom CPUs have an
  157. + in-order pipelining architecture and thus can benefit from
  158. + accordingly optimized code. Use a recent GCC with specific Atom
  159. + support in order to fully benefit from selecting this option.
  160. +
  161. config MCORE2
  162. - bool "Core 2/newer Xeon"
  163. + bool "Intel Core 2"
  164. ---help---
  165.  
  166. Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
  167. @@ -260,14 +305,40 @@
  168. family in /proc/cpuinfo. Newer ones have 6 and older ones 15
  169. (not a typo)
  170.  
  171. -config MATOM
  172. - bool "Intel Atom"
  173. + Enables -march=core2
  174. +
  175. +config MCOREI7
  176. + bool "Intel Core i7"
  177. ---help---
  178.  
  179. - Select this for the Intel Atom platform. Intel Atom CPUs have an
  180. - in-order pipelining architecture and thus can benefit from
  181. - accordingly optimized code. Use a recent GCC with specific Atom
  182. - support in order to fully benefit from selecting this option.
  183. + Select this for the Intel Nehalem platform. Intel Nehalem processors
  184. + include Core i3, i5, i7, Xeon: 34xx, 35xx, 55xx, 56xx, 75xx processors.
  185. +
  186. + Enables -march=corei7
  187. +
  188. +config MCOREI7AVX
  189. + bool "Intel Core 2nd Gen AVX"
  190. + ---help---
  191. +
  192. + Select this for 2nd Gen Core processors including Sandy Bridge.
  193. +
  194. + Enables -march=corei7-avx
  195. +
  196. +config MCOREAVXI
  197. + bool "Intel Core 3rd Gen AVX"
  198. + ---help---
  199. +
  200. + Select this for 3rd Gen Core processors including Ivy Bridge.
  201. +
  202. + Enables -march=core-avx-i
  203. +
  204. +config MCOREAVX2
  205. + bool "Intel Core AVX2"
  206. + ---help---
  207. +
  208. + Select this for AVX2 enabled processors including Haswell.
  209. +
  210. + Enables -march=core-avx2
  211.  
  212. config GENERIC_CPU
  213. bool "Generic-x86-64"
  214. @@ -276,6 +347,19 @@
  215. Generic x86-64 CPU.
  216. Run equally well on all x86-64 CPUs.
  217.  
  218. +config MNATIVE
  219. + bool "Native optimizations autodetected by GCC"
  220. + ---help---
  221. +
  222. + GCC 4.2 and above support -march=native, which automatically detects
  223. + the optimum settings to use based on your processor. -march=native
  224. + also detects and applies additional settings beyond -march specific
  225. + to your CPU, (eg. -msse4). Unless you have a specific reason not to
  226. + (e.g. distcc cross-compiling), you should probably be using
  227. + -march=native rather than anything listed below.
  228. +
  229. + Enables -march=native
  230. +
  231. endchoice
  232.  
  233. config X86_GENERIC
  234. @@ -300,7 +384,7 @@
  235. config X86_L1_CACHE_SHIFT
  236. int
  237. default "7" if MPENTIUM4 || MPSC
  238. - default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
  239. + default "6" if MK7 || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MPENTIUMM || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MVIAC7 || X86_GENERIC || MNATIVE || GENERIC_CPU
  240. default "4" if MELAN || M486 || MGEODEGX1
  241. default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
  242.  
  243. @@ -331,11 +415,11 @@
  244.  
  245. config X86_INTEL_USERCOPY
  246. def_bool y
  247. - depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
  248. + depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || MNATIVE || X86_GENERIC || MK8 || MK7 || MK10 || MBARCELONA || MEFFICEON || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2
  249.  
  250. config X86_USE_PPRO_CHECKSUM
  251. def_bool y
  252. - depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
  253. + depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MNATIVE
  254.  
  255. config X86_USE_3DNOW
  256. def_bool y
  257. @@ -363,17 +447,17 @@
  258.  
  259. config X86_TSC
  260. def_bool y
  261. - depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
  262. + depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM) && !X86_NUMAQ) || X86_64 || MNATIVE
  263.  
  264. config X86_CMPXCHG64
  265. def_bool y
  266. - depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
  267. + depends on X86_PAE || X86_64 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
  268.  
  269. # this should be set for all -march=.. options where the compiler
  270. # generates cmov.
  271. config X86_CMOV
  272. def_bool y
  273. - depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
  274. + depends on (MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MK7 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
  275.  
  276. config X86_MINIMUM_CPU_FAMILY
  277. int
  278. --- linux-3.10/arch/x86/Makefile 2012-12-10 22:30:57.000000000 -0500
  279. +++ linux-3.10.mod/arch/x86/Makefile 2013-04-06 07:36:39.349203123 -0400
  280. @@ -57,11 +57,25 @@
  281. KBUILD_CFLAGS += $(call cc-option,-mno-sse -mpreferred-stack-boundary=3)
  282.  
  283. # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
  284. + cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
  285. cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
  286. + cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
  287. + cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
  288. + cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
  289. + cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
  290. + cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
  291. cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
  292.  
  293. cflags-$(CONFIG_MCORE2) += \
  294. - $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
  295. + $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
  296. + cflags-$(CONFIG_MCOREI7) += \
  297. + $(call cc-option,-march=corei7,$(call cc-option,-mtune=corei7))
  298. + cflags-$(CONFIG_MCOREI7AVX) += \
  299. + $(call cc-option,-march=corei7-avx,$(call cc-option,-mtune=corei7-avx))
  300. + cflags-$(CONFIG_MCOREAVXI) += \
  301. + $(call cc-option,-march=core-avx-i,$(call cc-option,-mtune=core-avx-i))
  302. + cflags-$(CONFIG_MCOREAVX2) += \
  303. + $(call cc-option,-march=core-avx2,$(call cc-option,-mtune=core-avx2))
  304. cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
  305. $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
  306. cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
  307. --- linux-3.10/arch/x86/Makefile_32.cpu 2012-12-10 22:30:57.000000000 -0500
  308. +++ linux-3.10.mod/arch/x86/Makefile_32.cpu 2013-04-06 07:37:31.754423693 -0400
  309. @@ -23,7 +23,13 @@
  310. # Please note, that patches that add -march=athlon-xp and friends are pointless.
  311. # They make zero difference whatsosever to performance at this time.
  312. cflags-$(CONFIG_MK7) += -march=athlon
  313. +cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
  314. cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8,-march=athlon)
  315. +cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10,-march=athlon)
  316. +cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona,-march=athlon)
  317. +cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1,-march=athlon)
  318. +cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1,-march=athlon)
  319. +cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2,-march=athlon)
  320. cflags-$(CONFIG_MCRUSOE) += -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
  321. cflags-$(CONFIG_MEFFICEON) += -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
  322. cflags-$(CONFIG_MWINCHIPC6) += $(call cc-option,-march=winchip-c6,-march=i586)
  323. @@ -32,6 +38,10 @@
  324. cflags-$(CONFIG_MVIAC3_2) += $(call cc-option,-march=c3-2,-march=i686)
  325. cflags-$(CONFIG_MVIAC7) += -march=i686
  326. cflags-$(CONFIG_MCORE2) += -march=i686 $(call tune,core2)
  327. +cflags-$(CONFIG_MCOREI7) += -march=i686 $(call tune,corei7)
  328. +cflags-$(CONFIG_MCOREI7AVX) += -march=i686 $(call tune,corei7-avx)
  329. +cflags-$(CONFIG_MCOREAVXI) += -march=i686 $(call tune,core-avx-i)
  330. +cflags-$(CONFIG_MCOREAVX2) += -march=i686 $(call tune,core-avx2)
  331. cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
  332. $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
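
As a reading aid for the arch/x86/Makefile hunk, the sketch below models how the new options resolve to compiler flags. It is a simplified Python model, not Kbuild itself: `cc_option` imitates the `$(call cc-option,flag,fallback)` idiom, and `is_supported` is a hypothetical predicate standing in for the test compile that Kbuild performs with the real compiler. MATOM is left out because its rule adds separate -march and -mtune flags.

```python
# Config option -> -march name, as listed in the arch/x86/Makefile hunk.
MARCH_BY_OPTION = {
    "CONFIG_MNATIVE": "native",
    "CONFIG_MK10": "amdfam10",
    "CONFIG_MBARCELONA": "barcelona",
    "CONFIG_MBOBCAT": "btver1",
    "CONFIG_MBULLDOZER": "bdver1",
    "CONFIG_MPILEDRIVER": "bdver2",
    "CONFIG_MCORE2": "core2",
    "CONFIG_MCOREI7": "corei7",
    "CONFIG_MCOREI7AVX": "corei7-avx",
    "CONFIG_MCOREAVXI": "core-avx-i",
    "CONFIG_MCOREAVX2": "core-avx2",
}

# Options whose 64-bit rule falls back to -mtune=<same name>.
TUNE_FALLBACK = {
    "CONFIG_MCORE2",
    "CONFIG_MCOREI7",
    "CONFIG_MCOREI7AVX",
    "CONFIG_MCOREAVXI",
    "CONFIG_MCOREAVX2",
}


def cc_option(is_supported, flag, fallback=""):
    """Return `flag` if the compiler accepts it, otherwise `fallback`."""
    return flag if is_supported(flag) else fallback


def cflags_64(option, is_supported):
    """Resolve the flag the 64-bit Makefile rule would add for `option`."""
    name = MARCH_BY_OPTION[option]
    fallback = ""
    if option in TUNE_FALLBACK:
        fallback = cc_option(is_supported, f"-mtune={name}")
    return cc_option(is_supported, f"-march={name}", fallback)


if __name__ == "__main__":
    # Hypothetical compiler that accepts every flag.
    print(cflags_64("CONFIG_MCOREAVXI", lambda flag: True))
```

With a compiler that rejects a given -march value, the AMD and native entries resolve to an empty string, while the Intel Core entries fall back to -mtune when that is accepted.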