Proxmox Crashing with Intel I219-LM NIC logs

a guest
Aug 25th, 2019
  1. root@prox01 ~ # lspci | egrep -i --color 'network|ethernet'
  2. 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM
  3.  
  4. Aug 25 10:19:14 prox01 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  5. TDH <9e>
  6. TDT <a9>
  7. next_to_use <a9>
  8. next_to_clean <9d>
  9. buffer_info[next_to_clean]:
  10. time_stamp <10001883b>
  11. next_to_watch <9e>
  12. jiffies <100018958>
  13. next_to_watch.status <0>
  14. MAC Status <40080083>
  15. PHY Status <796d>
  16. PHY 1000BASE-T Status <3800>
  17. PHY Extended Status <3000>
  18. PCI Status <10>
  19.  
  20.  
  21. [ 520.930062] ------------[ cut here ]------------
  22. [ 520.930185] NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out
  23. [ 520.930326] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:323 dev_watchdog+0x222/0x230
  24. [ 520.930454] Modules linked in: veth ip_set ip6table_filter ip6_tables xt_multiport iptable_filter softdog nfnetlink_log nfnetlink intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mxm_wmi kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf shpchp wmi video mac_hid acpi_pad vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 e1000e(O) ptp pps_core i2c_i801 ahci libahci
  25. [ 520.930784] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 4.15.18-20-pve #1
  26. [ 520.930889] Hardware name: Micro-Star International Co., Ltd. MS-7B61/Z370 GAMING PLUS (MS-7B61), BIOS 1.EC 05/21/2019
  27. [ 520.931000] RIP: 0010:dev_watchdog+0x222/0x230
  28. [ 520.931087] RSP: 0018:ffff8f7b2e2c3e58 EFLAGS: 00010286
  29. [ 520.931175] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
  30. [ 520.931266] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
  31. [ 520.931357] RBP: ffff8f7b2e2c3e88 R08: 0000000000000000 R09: 000000000000030e
  32. [ 520.931449] R10: ffff8f7b2e2da770 R11: 0000000091b5f801 R12: 0000000000000001
  33. [ 520.931541] R13: ffff8f7b1c434000 R14: ffff8f7b1c434478 R15: ffff8f7b1d3ca080
  34. [ 520.931632] FS: 0000000000000000(0000) GS:ffff8f7b2e2c0000(0000) knlGS:0000000000000000
  35. [ 520.931737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  36. [ 520.931825] CR2: 00007f9ac7866142 CR3: 0000000f5200a001 CR4: 00000000003626e0
  37. [ 520.931916] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  38. [ 520.932008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  39. [ 520.932099] Call Trace:
  40. [ 520.932183] <IRQ>
  41. [ 520.932267] ? dev_deactivate_queue.constprop.33+0x60/0x60
  42. [ 520.932356] call_timer_fn+0x32/0x140
  43. [ 520.932482] run_timer_softirq+0x1dd/0x430
  44. [ 520.932614] ? tick_sched_handle+0x34/0x60
  45. [ 520.932693] ? ktime_get+0x43/0xa0
  46. [ 520.932779] __do_softirq+0x10c/0x2bf
  47. [ 520.932865] irq_exit+0xca/0xd0
  48. [ 520.932939] smp_apic_timer_interrupt+0x79/0x140
  49. [ 520.933026] apic_timer_interrupt+0x8c/0xa0
  50. [ 520.933113] </IRQ>
  51. [ 520.933197] RIP: 0010:cpuidle_enter_state+0xa8/0x2e0
  52. [ 520.933277] RSP: 0018:ffffa180062f7e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
  53. [ 520.933382] RAX: ffff8f7b2e2e2880 RBX: 0000000000000008 RCX: 000000000000001f
  54. [ 520.933473] RDX: 0000007949d9c9d4 RSI: fffffff4e4cd0b3e RDI: 0000000000000000
  55. [ 520.933558] RBP: ffffa180062f7e90 R08: 0000000000000002 R09: 00000000000220c0
  56. [ 520.933649] R10: ffffa180062f7e28 R11: 000000000000acc6 R12: ffff8f7b2e2ece00
  57. [ 520.933741] R13: ffffffff91773058 R14: 0000007949d9c9d4 R15: ffffffff91773040
  58. [ 520.933827] ? cpuidle_enter_state+0x97/0x2e0
  59. [ 520.933992] cpuidle_enter+0x17/0x20
  60. [ 520.934079] call_cpuidle+0x23/0x40
  61. [ 520.934164] do_idle+0x19a/0x200
  62. [ 520.934250] cpu_startup_entry+0x73/0x80
  63. [ 520.934329] start_secondary+0x1ab/0x200
  64. [ 520.934416] secondary_startup_64+0xa5/0xb0
  65. [ 520.934503] Code: 36 00 49 63 4e e8 eb 92 4c 89 ef c6 05 f8 7a d7 00 01 e8 02 1d fd ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 78 ef 39 91 e8 ce 2d 7f ff <0f> 0b eb c0 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
  66. [ 520.934654] ---[ end trace 5c099e37ef8ab3f3 ]---
  67. [ 520.934765] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
  68. [ 525.165025] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  69. [ 617.764555] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  70. TDH <4c>
  71. TDT <62>
  72. next_to_use <62>
  73. next_to_clean <4b>
  74. buffer_info[next_to_clean]:
  75. time_stamp <1000134fb>
  76. next_to_watch <4c>
  77. jiffies <100013650>
  78. next_to_watch.status <0>
  79. MAC Status <40080083>
  80. PHY Status <796d>
  81. PHY 1000BASE-T Status <7800>
  82. PHY Extended Status <3000>
  83. PCI Status <10>
  84. [ 619.780465] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  85. TDH <4c>
  86. TDT <62>
  87. next_to_use <62>
  88. next_to_clean <4b>
  89. buffer_info[next_to_clean]:
  90. time_stamp <1000134fb>
  91. next_to_watch <4c>
  92. jiffies <100013848>
  93. next_to_watch.status <0>
  94. MAC Status <40080083>
  95. PHY Status <796d>
  96. PHY 1000BASE-T Status <7800>
  97. PHY Extended Status <3000>
  98. PCI Status <10>
  99. [ 621.764591] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  100. TDH <4c>
  101. TDT <62>
  102. next_to_use <62>
  103. next_to_clean <4b>
  104. buffer_info[next_to_clean]:
  105. time_stamp <1000134fb>
  106. next_to_watch <4c>
  107. jiffies <100013a38>
  108. next_to_watch.status <0>
  109. MAC Status <40080083>
  110. PHY Status <796d>
  111. PHY 1000BASE-T Status <7800>
  112. PHY Extended Status <3000>
  113. PCI Status <10>
  114. [ 623.780696] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  115. TDH <4c>
  116. TDT <62>
  117. next_to_use <62>
  118. next_to_clean <4b>
  119. buffer_info[next_to_clean]:
  120. time_stamp <1000134fb>
  121. next_to_watch <4c>
  122. jiffies <100013c30>
  123. next_to_watch.status <0>
  124. MAC Status <40080083>
  125. PHY Status <796d>
  126. PHY 1000BASE-T Status <7800>
  127. PHY Extended Status <3000>
  128. PCI Status <10>
  129. [ 625.764728] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  130. TDH <4c>
  131. TDT <62>
  132. next_to_use <62>
  133. next_to_clean <4b>
  134. buffer_info[next_to_clean]:
  135. time_stamp <1000134fb>
  136. next_to_watch <4c>
  137. jiffies <100013e20>
  138. next_to_watch.status <0>
  139. MAC Status <40080083>
  140. PHY Status <796d>
  141. PHY 1000BASE-T Status <7800>
  142. PHY Extended Status <3000>
  143. PCI Status <10>
  144. [ 625.892422] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
  145. [ 629.211451] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  146. [ 702.790249] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  147. TDH <9e>
  148. TDT <a9>
  149. next_to_use <a9>
  150. next_to_clean <9d>
  151. buffer_info[next_to_clean]:
  152. time_stamp <10001883b>
  153. next_to_watch <9e>
  154. jiffies <100018958>
  155. next_to_watch.status <0>
  156. MAC Status <40080083>
  157. PHY Status <796d>
  158. PHY 1000BASE-T Status <3800>
  159. PHY Extended Status <3000>
  160. PCI Status <10>
  161. [ 704.806370] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  162. TDH <9e>
  163. TDT <a9>
  164. next_to_use <a9>
  165. next_to_clean <9d>
  166. buffer_info[next_to_clean]:
  167. time_stamp <10001883b>
  168. next_to_watch <9e>
  169. jiffies <100018b50>
  170. next_to_watch.status <0>
  171. MAC Status <40080083>
  172. PHY Status <796d>
  173. PHY 1000BASE-T Status <3800>
  174. PHY Extended Status <3000>
  175. PCI Status <10>
  176. [ 706.790379] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  177. TDH <9e>
  178. TDT <a9>
  179. next_to_use <a9>
  180. next_to_clean <9d>
  181. buffer_info[next_to_clean]:
  182. time_stamp <10001883b>
  183. next_to_watch <9e>
  184. jiffies <100018d40>
  185. next_to_watch.status <0>
  186. MAC Status <40080083>
  187. PHY Status <796d>
  188. PHY 1000BASE-T Status <3800>
  189. PHY Extended Status <3000>
  190. PCI Status <10>
  191. [ 708.806397] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  192. TDH <9e>
  193. TDT <a9>
  194. next_to_use <a9>
  195. next_to_clean <9d>
  196. buffer_info[next_to_clean]:
  197. time_stamp <10001883b>
  198. next_to_watch <9e>
  199. jiffies <100018f38>
  200. next_to_watch.status <0>
  201. MAC Status <40080083>
  202. PHY Status <796d>
  203. PHY 1000BASE-T Status <3800>
  204. PHY Extended Status <3000>
  205. PCI Status <10>
  206. [ 709.862071] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
  207. [ 714.113163] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  208. [ 3127.238813] fwbr101i0: port 2(tap101i0) entered disabled state
  209. [ 3127.266667] fwbr101i0: port 1(fwln101i0) entered disabled state
  210. [ 3127.266843] vmbr0: port 2(fwpr101p0) entered disabled state
  211. [ 3127.267191] device fwln101i0 left promiscuous mode
  212. [ 3127.267336] fwbr101i0: port 1(fwln101i0) entered disabled state
  213. [ 3127.293164] device fwpr101p0 left promiscuous mode
  214. [ 3127.293266] vmbr0: port 2(fwpr101p0) entered disabled state
  215. [ 4190.739818] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
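
The hex fields in the hang reports above can be read with a little arithmetic. The sketch below is only an illustration: the helper name `hex_diff` is made up for this example, and the values are copied from the log. It computes how long the stuck buffer had been waiting (jiffies minus time_stamp), how many jiffies pass between two consecutive reports, and how many TX descriptors sit between TDH and TDT.

```shell
#!/bin/sh
# hex_diff A B: print B - A in decimal; A and B are bare hex strings from the log.
# (hex_diff is a hypothetical helper name, not part of any tool.)
hex_diff() {
    echo $(( 0x$2 - 0x$1 ))
}

# Hang report with time_stamp <10001883b> and jiffies <100018958>.
stalled=$(hex_diff 10001883b 100018958)

# Consecutive reports at [617.764555] and [619.780465]:
# jiffies <100013650> and <100013848>.
ticks=$(hex_diff 100013650 100013848)

# Descriptors between head and tail: TDT <a9> minus TDH <9e>.
pending=$(hex_diff 9e a9)

echo "stalled: $stalled jiffies, between reports: $ticks jiffies, pending descriptors: $pending"
```

The two reports are about 2.016 seconds apart, so 504 jiffies in that interval suggests roughly 250 ticks per second on this kernel. On that assumption, 285 jiffies is a little over one second of stalled transmit before the driver reported the hang.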
  216.  
  217.  
  218.  
  219. root@prox01 ~ # pveversion -v
  220. proxmox-ve: 5.4-2 (running kernel: 4.15.18-20-pve)
  221. pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
  222. pve-kernel-4.15: 5.4-8
  223. pve-kernel-4.15.18-20-pve: 4.15.18-46
  224. corosync: 2.4.4-pve1
  225. criu: 2.11.1-1~bpo90
  226. glusterfs-client: 3.8.8-1
  227. ksm-control-daemon: not correctly installed
  228. libjs-extjs: 6.0.1-2
  229. libpve-access-control: 5.1-12
  230. libpve-apiclient-perl: 2.0-5
  231. libpve-common-perl: 5.0-54
  232. libpve-guest-common-perl: 2.0-20
  233. libpve-http-server-perl: 2.0-14
  234. libpve-storage-perl: 5.0-44
  235. libqb0: 1.0.3-1~bpo9
  236. lvm2: 2.02.168-pve6
  237. lxc-pve: 3.1.0-6
  238. lxcfs: 3.0.3-pve1
  239. novnc-pve: 1.0.0-3
  240. proxmox-widget-toolkit: 1.0-28
  241. pve-cluster: 5.0-38
  242. pve-container: 2.0-40
  243. pve-docs: 5.4-2
  244. pve-edk2-firmware: 1.20190312-1
  245. pve-firewall: 3.0-22
  246. pve-firmware: 2.0-7
  247. pve-ha-manager: 2.0-9
  248. pve-i18n: 1.1-4
  249. pve-libspice-server1: 0.14.1-2
  250. pve-qemu-kvm: 3.0.1-4
  251. pve-xtermjs: 3.12.0-1
  252. qemu-server: 5.0-54
  253. smartmontools: 6.5+svn4324-1
  254. spiceterm: 3.0-5
  255. vncterm: 1.5-3
  256.  
  257.  
  258. root@prox01 ~ # qm config 100
  259. agent: 1
  260. bootdisk: scsi0
  261. cores: 4
  262. ide2: local:iso/CentOS-7-x86_64-Minimal-1810.iso,media=cdrom
  263. memory: 16384
  264. name: Proxmox-VM01
  265. net0: virtio=FA:3E:C8:D5:83:2E,bridge=vmbr0,firewall=1
  266. numa: 0
  267. onboot: 1
  268. ostype: l26
  269. scsi0: prox01vmstorage:vm-100-disk-0,size=480G
  270. scsihw: virtio-scsi-pci
  271. smbios1: uuid=bee0785b-9f65-49bf-a6fb-08187ccb33c8
  272. sockets: 1
  273. vmgenid: 1c4565b5-0961-486b-adfe-b3d769206d90
  274.  
  275.  
  276. Possible solutions I am testing (slow to confirm, because my crashes only happen once every 24-48 hours):
  277.  
  278. https://serverfault.com/questions/616485/e1000e-reset-adapter-unexpectedly-detected-hardware-unit-hang
  279.  
  280. and/or
  281.  
  282. https://jhartman.pl/2018/08/06/proxmox-enp0s31f6-detected-hardware-unit-hang/
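
The linked pages are not quoted in this paste, so the following is an assumption about what they suggest: the workaround commonly reported for this e1000e "Detected Hardware Unit Hang" symptom is to turn off segmentation offload on the NIC with `ethtool`. A minimal sketch, assuming the interface is `eno1` as in the log above (override with `IFACE`), that root privileges are available, and that the setting is wanted only until the next reboot:

```shell
#!/bin/sh
# Assumption: interface name taken from the log above; override with IFACE=...
IFACE="${IFACE:-eno1}"

if command -v ethtool >/dev/null 2>&1 && [ -d "/sys/class/net/$IFACE" ]; then
    # Show the current segmentation offload settings before changing anything.
    ethtool -k "$IFACE" | grep -E 'segmentation-offload'
    # Assumed workaround: disable TCP and generic segmentation offload.
    # This is not persistent across reboots and needs root.
    if ethtool -K "$IFACE" tso off gso off; then
        status="applied"
    else
        status="failed"
    fi
else
    # ethtool or the interface is missing, so nothing is changed.
    status="skipped"
fi
echo "offload workaround on $IFACE: $status"
```

If this turns out to help, the same `ethtool -K` line would have to be run again at each boot, for example from a `post-up` line in the `eno1` stanza of `/etc/network/interfaces`; that placement is also an assumption and not taken from the paste.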
  283.  
  284. If that doesn’t work I’ll try this solution:
  285. Disabling Enhanced C1 (C1E) in the BIOS