Solaris 11 guest on VMware ESXI submit only one disk I/O at a time

TheFluffyAdmin Mar 28th, 2017 74 Never
  1. https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=310468404053887
  2.  
  3. Solaris 11 guest on VMware ESXI submit only one disk I/O at a time (Doc ID 2238101.1)
  4.  
  5. In this Document
  6. Symptoms
  7. Changes
  8. Cause
  9. Solution
  10. References
  11.  
  12. APPLIES TO:
  13.  
  14. Solaris x64/x86 Operating System - Version 11.2 and later
  15. Information in this document applies to any platform.
  16. SYMPTOMS
  17.  
  18. The hypervisor (ESX 5.5U3) sees I/O from the Solaris 11 guest arriving on its virtualized LSI Logic SAS SCSI controller as synchronous I/O, meaning that Solaris 11 queues only one disk I/O at a time.
  19. 
  20. Solaris 10 guests using the same virtualized LSI Logic SAS SCSI controller show multiple disk I/Os queued on the same HBA.
  21. 
  22. The end result is a disk I/O bottleneck with Solaris 11.
  23. 
  24. Solaris 11 performs significantly worse under high disk I/O load because it appears unable to keep more than one I/O outstanding at a time.
  25.  
  26.  
  27. iostat shows the following output for c4, which is connected to the local disks.
  28.  
  29. c4 is a virtual LSI HBA:
  30.  
  31.    r/s  w/s     kr/s  kw/s wait actv wsvc_t asvc_t %w %b device
  32.    0.0  0.4      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t0d0s0
  33.    0.0 15.8      0.0  53.8  0.0  0.0    3.1    2.9  1  2 c4t0d0s1
  34.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t0d0p0
  35.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t0d0p1
  36. 1324.3 45.5 338308.7 330.5  0.1  2.0    0.1    1.5  0 13 c4
  37.    0.0 16.2      0.0  53.8  0.0  0.0    3.1    2.9  1  2 c4t0d0
  38.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t1d0
  39.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t1d0s0
  40.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t1d0p0
  41.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t1d0p1
  42.  278.2  1.2  70646.3  12.8  0.0  0.8    0.0    3.0  0 59 c4t2d0 <============= at most 3 I/Os are sent to the HBA/disk
  43.  278.2  1.2  70646.3  12.8  0.0  0.8    0.0    3.0  0 59 c4t2d0s0
  44.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t2d0s8
  45.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t2d0p0
  46.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t2d0p1
  47.  102.8  1.4  26275.9  14.5  0.0  0.2    0.0    2.1  0 19 c4t3d0
  48.  102.8  1.4  26275.9  14.5  0.0  0.2    0.0    2.1  0 19 c4t3d0s0
  49.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t3d0s8
  50.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t3d0p0
  51.    0.0  0.0      0.0   0.0  0.0  0.0    0.0    0.0  0  0 c4t3d0p1
  52.  106.8  1.4  27302.2  14.5  0.0  0.2    0.0    2.0  0 19 c4t4d0
  53.  106.8  1.4  27302.2  14.5  0.0  0.2    0.0    2.0  0 19 c4t4d0s0
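
As a rough cross-check of the iostat figures, Little's law relates the average number of active commands to the request rate and the active service time: actv is approximately (r/s + w/s) x asvc_t. The following is a minimal Python sketch, not part of the original note; the function name is made up for illustration, and the sample values are copied from the c4t2d0 and c4t3d0 rows.

```python
def avg_outstanding(reads_per_s, writes_per_s, asvc_t_ms):
    """Estimate the average number of active commands via Little's law."""
    return (reads_per_s + writes_per_s) * (asvc_t_ms / 1000.0)

# (r/s, w/s, asvc_t in ms, actv reported by iostat), copied from the table
rows = {
    "c4t2d0": (278.2, 1.2, 3.0, 0.8),
    "c4t3d0": (102.8, 1.4, 2.1, 0.2),
}

estimates = {}
for dev, (r, w, asvc, actv) in rows.items():
    estimates[dev] = avg_outstanding(r, w, asvc)
    print(f"{dev}: estimated actv {estimates[dev]:.2f}, iostat reports {actv}")
```

Both estimates land close to the actv values that iostat reports, which is consistent with each disk having, on average, fewer than one command in flight.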
  54. CHANGES
  55.  
  56. This is a new Solaris 11 installation. The issue exists only with Solaris 11.
  57. 
  58. Until now, the customer was running Solaris 10 guests on its ESXi server, and no such issue existed with Solaris 10.
  59.  
  60. CAUSE
  61.  
  62. The issue is caused by a bug in the sd driver of Solaris 11, which misinterprets the SCSI disk properties.
  63. 
  64. On Solaris 11.3, the mpt driver has tagged queuing disabled by default.
  65. 
  66. This is visible in the prtpicl output, where the TQ property of each target is set to 0 (the other properties of the mpt node are shown for context):
  67. 
  68. # prtpicl -v | grep TQ
  69. pci15ad,1976 (obp-device, 2c80000025f)
  70. :DeviceID 0x10
  71. :UnitAddress 0
  72. :device-id 0x30
  73. :vendor-id 0x1000
  74. :revision-id 0x1
  75. :class-code 0x10000
  76. :unit-address 10
  77. :subsystem-id 0x1976
  78. :subsystem-vendor-id 0x15ad
  79. ...
  80. :model SCSI bus controller
  81. :compatible (2c8000002adTBL)
  82. | pci1000,30.15ad.1976.1 |
  83. | pci1000,30.15ad.1976 |
  84. | pci15ad,1976 |
  85. | pci1000,30.1 |
  86. | pci1000,30 |
  87. | pciclass,010000 |
  88. | pciclass,0100 |
  89. ...
  90. :scsi-enumeration 0
  91. :scsi-options 0x107ff8
  92. :scsi-reset-delay 0xbb8
  93. :scsi-watchdog-tick 0xa
  94. :scsi-selection-timeout 0xfa
  95. :scsi-tag-age-limit 0x2
  96. :scsi-poll-timeout 0xa
  97. :scsi-transport-timeout 0x3c
  98. :initiator-interconnect-type SPI
  99. :tracebuffer_autostart 0
  100. :tracebuffer_filename /var/tmp/IOCtrace
  101. :snapshot_autostart 0
  102. :shapshot_filename /var/tmp/IOCsnapshot
  103. :disable-sata-mpxio no
  104. :mpxio-disable yes
  105. :ddi-vhci-class scsi_vhci
  106. ...
  107. :driver_chip_revision 1030 b0
  108. :firmware-version 1.3.41.32
  109. :target0-sync-speed 0x4e200
  110. :target0-wide 0x1
  111. :target0-TQ 0 <========================== Tag Queueing is disabled
  112. :target1-sync-speed 0x4e200
  113. :target1-wide 0x1
  114. :target1-TQ 0 <========================== Tag Queueing is disabled
  115. :target2-sync-speed 0x4e200
  116. :target2-wide 0x1
  117. :target2-TQ 0 <========================== Tag Queueing is disabled
  118. ...
  119. :devfs-path /pci@0,0/pci15ad,1976@10
  120. :driver-name mpt
  121. :binding-name pci1000,30
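
When many targets are involved, the TQ properties can be pulled out of saved prtpicl output with a few lines of script. The following is a minimal Python sketch, not part of the original note: the helper name tq_flags and the idea of saving the output to text are assumptions made for illustration, and the sample is a shortened excerpt of the listing above.

```python
import re

# Shortened excerpt of the prtpicl -v listing (assumed to be saved as text)
sample = """\
:target0-wide 0x1
:target0-TQ 0
:target1-sync-speed 0x4e200
:target1-TQ 0
:target2-TQ 0
:driver-name mpt
"""

def tq_flags(text):
    """Return {target number: TQ value} for every ':targetN-TQ V' line."""
    pattern = re.compile(r"^\s*:target(\d+)-TQ\s+(\d+)", re.MULTILINE)
    return {int(t): int(v) for t, v in pattern.findall(text)}

flags = tq_flags(sample)
# Per the note, a TQ value of 0 means tag queueing is disabled for that target
disabled = sorted(t for t, v in flags.items() if v == 0)
print("targets with tag queueing disabled:", disabled)
```

The pattern only matches the start of each line, so trailing annotations such as the arrows in the listing do not affect the result.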
  122.  
  123. This is also visible in a vmdump of the live kernel:
  124.  
  125. > ::mpt
  126. mpt_t inst mpxio suspend ntargs power
  127. ================================================================================
  128. ffffc1000421f000 0 0 0 16 OFF=D3
  129. >
  130. > ffffc1000421f000::print -t mpt ! grep tag
  131. ushort_t m_notag = 0xffff <============== no tagging, whatever the target model
  132. int m_scsi_tag_age_limit = 0
  133. > ::mpt
  134.  
  135. On Solaris 11, tagged queuing is disabled. Only HBA queuing is allowed, which is limited to at most 3 concurrent I/Os.
  136. This matches the iostat output, which shows no more than 3 I/Os active at the same time.
  137.  
  138. Comparing Solaris 10 and Solaris 11 kernel data related to disk and mpt gives:
  139.  
  140. Solaris 10 kernel dump (No issue)
  141.  
  142. CAT(vmcore.0/10X)> analyze
  143. crash file: /cfm_data/isde_data/data11/SR/3-13298414671/uc/vmdump.oseasddvordb003.0_67387170/this/vmcore.0
  144. user: REMI.COLINET@ORACLE.COM (rcolinet:553459)
  145. release: 5.10 (64-bit)
  146. version: Generic_150401-35
  147. machine: i86pc
  148. node name: oseasddvordb003
  149. system type: i86pc
  150. hostid: 260cc364
  151. dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/zvol/dsk/rpool/dump(1G)
  152. time in kernel: Fri Sep 30 08:04:51 UTC 2016
  153. age of system: 45 days 23 hours 27 minutes 58 seconds
  154. CPUs: 1 (3.99G memory)
  155. panicstr:
  156.  
  157. CAT(vmcore.0/10X)>
  158.  
  159. CAT(vmcore.0/10X)> sdump 0xffffffff853ed000 sd_lun | egrep 'throttle|queue|arq'
  160. short un_throttle = 0x100 <============ 256 commands supported
  161. short un_saved_throttle = 0x100
  162. short un_busy_throttle = 0
  163. short un_min_throttle = 8
  164. timeout_id_t un_reset_throttle_timeid = NULL
  165. unsigned int :1 un_f_arq_enabled = 1
  166. unsigned int :1 un_f_opt_queueing = 0 <============== no HBA cmd queueing
  167. unsigned int :1 un_f_use_adaptive_throttle = 0
  168. CAT(vmcore.0/10X)>
  169.  
  170. The SCSI inquiry data seen by sd, however, is the same on both releases.
  171. The difference lies in the interpretation at the HBA level:
  172.  
  173. CAT(vmcore.0/10X)> sdump 0xffffffff86092de8 scsi_inquiry
  174. struct scsi_inquiry {
  175. uchar_t inq_dtype = 0
  176. unsigned char :7 inq_qual = 0
  177. unsigned char :1 inq_rmb = 0
  178. unsigned char :3 inq_ansi = 2
  179. unsigned char :3 inq_ecma = 0
  180. unsigned char :2 inq_iso = 0
  181. unsigned char :4 inq_rdf = 2
  182. unsigned char :1 inq_hisup = 0
  183. unsigned char :1 inq_normaca = 0
  184. unsigned char :1 inq_trmiop = 0
  185. unsigned char :1 inq_aenc = 0
  186. uchar_t inq_len = 0x1f
  187. unsigned char :1 inq_protect = 0
  188. unsigned char :1 inq_5_1 = 0
  189. unsigned char :1 inq_5_2 = 0
  190. unsigned char :1 inq_3pc = 0
  191. unsigned char :2 inq_tpgs = 0
  192. unsigned char :1 inq_acc = 0
  193. unsigned char :1 inq_sccs = 0
  194. unsigned char :1 inq_addr16 = 0
  195. unsigned char :1 inq_addr32 = 0
  196. unsigned char :1 inq_ackqreqq = 0
  197. unsigned char :1 inq_mchngr = 0
  198. unsigned char :1 inq_dualp = 0
  199. unsigned char :1 inq_port = 0
  200. unsigned char :1 inq_encserv = 0
  201. unsigned char :1 inq_bque = 0
  202. unsigned char :1 inq_sftre = 1
  203. unsigned char :1 inq_cmdque = 1
  204. unsigned char :1 inq_trandis = 0
  205. unsigned char :1 inq_linked = 0
  206. unsigned char :1 inq_sync = 1
  207. unsigned char :1 inq_wbus16 = 1
  208. unsigned char :1 inq_wbus32 = 1
  209. unsigned char :1 inq_reladdr = 0
  210. char [8] inq_vid = [ 'V' 'M' 'w' 'a' 'r' 'e' ' ' ' ' ]
  211. char [0x10] inq_pid = [ 'V' 'i' 'r' 't' 'u' 'a' 'l' ' ' 'd' 'i' 's' 'k' ' ' ' ' ' ' ' ' ]
  212. char [4] inq_revision = [ '1' '.' '0' ' ' ]
  213. char [0xc] inq_serial = [ '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' ]
  214. ... // Other fields are all zero
  215. }
  216. CAT(vmcore.0/10X)>
  217.  
  218. Solaris 11 kernel dump (Issue)
  219.  
  220. CAT(vmcore.0/11X)> dev scsi
  221. ...
  222. CAT(vmcore.0/11X)> sdump 0xffffc1002968e1f0 scsi_device sd_inq
  223. struct scsi_inquiry *sd_inq = 0xffffc1001881abf0
  224. CAT(vmcore.0/11X)>
  225. CAT(vmcore.0/11X)> sdump 0xffffc1001881abf0 scsi_inquiry
  226. struct scsi_inquiry {
  227. uchar_t inq_dtype = 0
  228. unsigned char :7 inq_qual = 0
  229. unsigned char :1 inq_rmb = 0
  230. unsigned char :3 inq_ansi = 2
  231. unsigned char :3 inq_ecma = 0
  232. unsigned char :2 inq_iso = 0
  233. unsigned char :4 inq_rdf = 2
  234. unsigned char :1 inq_hisup = 0
  235. unsigned char :1 inq_normaca = 0
  236. unsigned char :1 inq_trmiop = 0
  237. unsigned char :1 inq_aenc = 0
  238. uchar_t inq_len = 0x1f
  239. unsigned char :1 inq_protect = 0
  240. unsigned char :1 inq_5_1 = 0
  241. unsigned char :1 inq_5_2 = 0
  242. unsigned char :1 inq_3pc = 0
  243. unsigned char :2 inq_tpgs = 0
  244. unsigned char :1 inq_acc = 0
  245. unsigned char :1 inq_sccs = 0
  246. unsigned char :1 inq_addr16 = 0
  247. unsigned char :1 inq_addr32 = 0
  248. unsigned char :1 inq_ackqreqq = 0
  249. unsigned char :1 inq_mchngr = 0
  250. unsigned char :1 inq_dualp = 0
  251. unsigned char :1 inq_port = 0
  252. unsigned char :1 inq_encserv = 0
  253. unsigned char :1 inq_bque = 0
  254. unsigned char :1 inq_sftre = 1
  255. unsigned char :1 inq_cmdque = 1
  256. unsigned char :1 inq_trandis = 0
  257. unsigned char :1 inq_linked = 0
  258. unsigned char :1 inq_sync = 1
  259. unsigned char :1 inq_wbus16 = 1
  260. unsigned char :1 inq_wbus32 = 1
  261. unsigned char :1 inq_reladdr = 0
  262. char [8] inq_vid = [ 'V' 'M' 'w' 'a' 'r' 'e' ' ' ' ' ]
  263. char [0x10] inq_pid = [ 'V' 'i' 'r' 't' 'u' 'a' 'l' ' ' 'd' 'i' 's' 'k' ' ' ' ' ' ' ' ' ]
  264. char [4] inq_revision = [ '1' '.' '0' ' ' ]
  265. char [0xc] inq_serial = [ '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' '\0' ]
  266. ... // Only zero values
  267. }
  268. CAT(vmcore.0/11X)>
  269.  
  270.  Comparing the mpt HBA state gives:
  271.  
  272.  Solaris 10 (No issue)
  273.  
  274.  
  275. > ::mpt
  276. mpt_t inst mpxio suspend ntargs power
  277. ================================================================================
  278. ffffffff822c7ac0 0 0 0 16 OFF=D3
  279. > ffffffff822c7ac0::print mpt_t m_notag
  280. m_notag = 0xffc0
  281. >
  282.  Solaris 11 (Issue)
  283.  
  284. > ::mpt
  285. mpt_t inst mpxio suspend ntargs power
  286. ================================================================================
  287. ffffc1000421f000 0 0 0 16 OFF=D3
  288. > ffffc1000421f000::print -t mpt m_notag
  289. ushort_t m_notag = 0xffff
  290. >
  291. We have:
  292. 
  293. - the same SCSI disk inquiry data
  294. - the same firmware version and product ID for the mpt HBA.
  295.  
  296. But we have different values for:
  297.  
  298. Solaris 10 (No issue)
  299.  
  300. ushort_t m_notag = 0xffc0
  301. ushort_t m_nowide = 0xffc0
  302.  Solaris 11 (Issue)
  303.  
  304. ushort_t m_notag = 0xffff
  305. ushort_t m_nowide = 0xe080
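
The two m_notag values are easier to compare once decoded. The following is a minimal Python sketch, not part of the original note. It assumes that m_notag is a per-target bitmask in which bit N set means tagged queuing is disabled for target N; that encoding is an assumption, consistent with the values shown and with ntargs = 16, but not stated in the note. The function name is made up for illustration.

```python
def targets_without_tagging(m_notag, ntargs=16):
    """List target IDs whose bit is set in the mask (assumed: bit N = target N)."""
    return [t for t in range(ntargs) if m_notag & (1 << t)]

for release, mask in (("Solaris 10", 0xffc0), ("Solaris 11", 0xffff)):
    no_tag = targets_without_tagging(mask)
    tagged = [t for t in range(16) if t not in no_tag]
    print(f"{release}: m_notag={mask:#06x}, tagging possible on targets {tagged}")
```

Under that assumption, the Solaris 10 value leaves tagging available on targets 0 to 5, while the Solaris 11 value disables it for all 16 targets.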
  306. SOLUTION
  307.  
  308. The sd (SCSI disk) driver makes wrong assumptions about the SCSI devices: it ignores some capabilities of the disks exposed by the VMware platform. The change needed to fix this issue is very small.
  309. 
  310. The issue is tracked as Bug 24764515 - Tag command queueing disabled for VMware mpt HBA
  311. 
  312. The customer tested the fix on its systems, and I/O throughput went from 1,000 IOPS to 140,000 IOPS.
  313. 
  314. The fix has been delivered in Oracle Solaris 11.3.17.5.0.
  315.  
  316. REFERENCES
  317.  
  318. NOTE:1580689.1 - Collaborate With Fibre Channel (FC) Storage Area Network (SAN) MOS Community Members
  319. NOTE:1502843.1 - SAN Fibre Channel (FC) Storage Connectivity Issues
  320. NOTE:1393062.2 - Information Center: Troubleshooting the Oracle Solaris 10 Operating System
  321. NOTE:1528697.2 - Information Center: Sun Storage Traffic Manager (MPXIO)
  322. BUG:24764515 - TAGGED COMMAND QUEUING DISABLED FOR SCSI-2 AND SPC TARGETS
  323. NOTE:1929376.1 - My Oracle Support - Automated Troubleshooting
  324. NOTE:1303745.1 - Troubleshooting Solaris[TM] 10 and Above (not for Solaris[TM] 8 & 9) SAN Fibre Channel (FC) HBA connectivity issues.
  325. NOTE:1542438.1 - Discovery of Fibre Channel (FC) Disk (LUN) and Tape Storage Devices from a Solaris host perspective
  326. NOTE:166650.1 - Working Effectively With Oracle Support - Best Practices
  327. NOTE:1550562.2 - Information Center: Oracle Storage Area Network (SAN) Fibre Channel (FC) Card - Overview
  328. BUG:15813747 - SUNBT7195820 MPT DRIVER ISSUING "CAN ONLY START 1 TASK MANAGEMENT COMMAND AT A T
  329. BUG:18496291 - MPT DRIVER ISSUING "CAN ONLY START 1 TASK MANAGEMENT COMMAND AT A TIME"
  330. NOTE:1003635.1 - What Does %b (or %Busy) Actually Mean in the Output of iostat(1M)?
  331. NOTE:1001444.1 - Solaris Storage Driver Troubleshooting SCSI Transport Errors - Command Failed to Complete - Command Timeout
  332. NOTE:1285485.1 - GUDS - A Script for Gathering Solaris Performance Data
  333. NOTE:1595092.1 - SRDC - How to Collect Standard Information for a Database Performance Problem
  334. NOTE:1681652.1 - Oracle Solaris Support on Virtualization Platforms
  343. Document Details
  344. 
  345. Type: PROBLEM
  346. Status: PUBLISHED
  347. Last Major Update: 03-Mar-2017
  348. Last Update: 03-Mar-2017