Untitled

a guest
May 6th, 2025

**Edit**

Ha. I downgraded the kernel to:

```
> uname -a
Linux ren 6.14.2 #1-NixOS SMP PREEMPT_DYNAMIC Thu Apr 10 12:44:49 UTC 2025 x86_64 GNU/Linux
```

and evacuation works:

```
> sudo bcachefs device evacuate /dev/nvme0n1p2
Setting /dev/nvme0n1p2 readonly
0% complete: current position btree extents:25828954:26160
```

Oops. But this does not look OK:

```
[ 63.966285] bcachefs (a933c02c-19d2-40d7-b5d7-42892bd5e154): Error setting device state: device_state_not_allowed
[ 67.870661] bcachefs (nvme0n1p2): ro
[ 77.215213] ------------[ cut here ]------------
[ 77.215217] kernel BUG at fs/bcachefs/btree_update_interior.c:1785!
[ 77.215226] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 77.215230] CPU: 30 UID: 0 PID: 4637 Comm: bcachefs Not tainted 6.14.2 #1-NixOS
[ 77.215233] Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI, BIOS 1809 09/28/2023
[ 77.215235] RIP: 0010:bch2_btree_insert_node+0x50f/0x6c0 [bcachefs]
[ 77.215270] Code: c8 49 8b 7f 08 41 0f b7 47 3a eb 82 48 8b 5d c8 49 8b 7f 08 4d 8b 84 24 98 00 00 00 41 0f b7 47 3a e9 68 ff ff ff 90 0f 0b 90 <0f> 0b 90 0f 0b 31 c9 4c 89 e2 48 89 de 4c 89 ff e8 2c d8 fe ff 89
[ 77.215272] RSP: 0018:ffffafe748823b40 EFLAGS: 00010293
[ 77.215275] RAX: 0000000000000000 RBX: ffff8ea82b4d41f8 RCX: 0000000000000002
[ 77.215277] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8ea885846000
[ 77.215278] RBP: ffffafe748823b90 R08: ffff8ea885846d50 R09: 0000000000000000
[ 77.215279] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ea602757200
[ 77.215280] R13: ffff8ea885846000 R14: 0000000000000001 R15: ffff8ea82b4d4000
[ 77.215282] FS: 0000000000000000(0000) GS:ffff8eb51e700000(0000) knlGS:0000000000000000
[ 77.215283] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 77.215285] CR2: 000000c001b64000 CR3: 000000015ce22000 CR4: 0000000000f50ef0
[ 77.215286] PKRU: 55555554
[ 77.215287] Call Trace:
[ 77.215291] <TASK>
[ 77.215295] ? srso_alias_return_thunk+0x5/0xfbef5
[ 77.215301] bch2_btree_node_rewrite+0x1b3/0x370 [bcachefs]
[ 77.215323] bch2_move_btree.isra.0+0x30d/0x490 [bcachefs]
[ 77.215355] ? __pfx_migrate_btree_pred+0x10/0x10 [bcachefs]
[ 77.215378] ? bch2_move_btree.isra.0+0x106/0x490 [bcachefs]
[ 77.215402] ? __pfx_bch2_data_thread+0x10/0x10 [bcachefs]
[ 77.215426] bch2_data_job+0x10a/0x2f0 [bcachefs]
[ 77.215450] bch2_data_thread+0x4a/0x70 [bcachefs]
[ 77.215472] kthread+0xeb/0x250
```
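
For a bug report, the most useful parts of this oops are the BUG site, the faulting function in the `RIP` line, and the call-trace frames the unwinder is confident about (entries prefixed with `?` are ones it could not confirm as part of the real call chain). A minimal Python sketch that extracts those from the log above; the regular expressions are assumptions fitted to this one log's format, not a general-purpose oops parser, and `summarize_oops` is a hypothetical helper name.

```python
import re

# Excerpt of the oops quoted above (register dump omitted).
OOPS = """\
[ 77.215217] kernel BUG at fs/bcachefs/btree_update_interior.c:1785!
[ 77.215235] RIP: 0010:bch2_btree_insert_node+0x50f/0x6c0 [bcachefs]
[ 77.215287] Call Trace:
[ 77.215291] <TASK>
[ 77.215295] ? srso_alias_return_thunk+0x5/0xfbef5
[ 77.215301] bch2_btree_node_rewrite+0x1b3/0x370 [bcachefs]
[ 77.215323] bch2_move_btree.isra.0+0x30d/0x490 [bcachefs]
[ 77.215355] ? __pfx_migrate_btree_pred+0x10/0x10 [bcachefs]
[ 77.215378] ? bch2_move_btree.isra.0+0x106/0x490 [bcachefs]
[ 77.215402] ? __pfx_bch2_data_thread+0x10/0x10 [bcachefs]
[ 77.215426] bch2_data_job+0x10a/0x2f0 [bcachefs]
[ 77.215450] bch2_data_thread+0x4a/0x70 [bcachefs]
[ 77.215472] kthread+0xeb/0x250
"""

TIMESTAMP = re.compile(r"^\[\s*[\d.]+\]\s*")           # leading "[ 77.215217] "
BUG = re.compile(r"kernel BUG at (\S+):(\d+)!")
RIP = re.compile(r"RIP: [0-9a-f]+:([\w.]+)\+")
FRAME = re.compile(r"^(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(text: str) -> dict:
    """Return the BUG site, the RIP function, and the reliable trace frames."""
    summary = {"bug_site": None, "rip": None, "frames": []}
    in_trace = False
    for raw in text.splitlines():
        line = TIMESTAMP.sub("", raw)
        if m := BUG.search(line):
            summary["bug_site"] = (m.group(1), int(m.group(2)))
        elif m := RIP.search(line):
            summary["rip"] = m.group(1)
        elif line.startswith("Call Trace:"):
            in_trace = True
        elif in_trace and (m := FRAME.match(line)):
            # Skip "?" frames: the unwinder is not sure they are real callers.
            if m.group(1) is None:
                summary["frames"].append(m.group(2))
    return summary


if __name__ == "__main__":
    info = summarize_oops(OOPS)
    print("BUG at {}:{}".format(*info["bug_site"]))
    print("RIP in", info["rip"])
    # Frames are listed innermost first, so read "<-" as "called from".
    print("call chain:", " <- ".join(info["frames"]))
```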

**Original post**

My one and only NVMe drive started reporting SMART errors. Great, time for my choice of bcachefs to save me! I ordered another drive and added it to the filesystem (thanks to having two M.2 slots). I set metadata replicas to 2, but kept data replicas at 1 because I thought I could live with the possibility of some data loss. But after a few days of seeing even more smartd errors, I decided to replace the failing drive with yet another new one.

I ordered it, and now I want to remove the failing drive from the filesystem so I can swap the new one into its NVMe slot.

My understanding is that I should run `device evacuate`, then `device remove`, and then I'm OK to swap. But I can't:

```
> sudo bcachefs device evacuate /dev/nvme0n1p2
Setting /dev/nvme0n1p2 readonly
BCH_IOCTL_DISK_SET_STATE ioctl error: Invalid argument
> sudo dmesg | tail -n 3
[ 241.528859] bcachefs (a933c02c-19d2-40d7-b5d7-42892bd5e154): Error setting device state: device_state_not_allowed
[ 361.951314] block nvme0n1: No UUID available providing old NGUID
[ 498.032801] bcachefs (a933c02c-19d2-40d7-b5d7-42892bd5e154): Error setting device state: device_state_not_allowed
```

```
> sudo bcachefs device remove /dev/nvme0n1p2
BCH_IOCTL_DISK_REMOVE ioctl error: Invalid argument
> sudo dmesg | tail -n 3
[ 361.951314] block nvme0n1: No UUID available providing old NGUID
[ 498.032801] bcachefs (a933c02c-19d2-40d7-b5d7-42892bd5e154): Error setting device state: device_state_not_allowed
[ 585.233829] bcachefs (nvme0n1p2): Cannot remove without losing data
```
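
The evacuate-then-remove sequence described above can be scripted so that `device remove` is only attempted after `device evacuate` succeeds. A minimal Python sketch, assuming the `bcachefs` binary is on `PATH`, the script runs as root, and the tool exits with a non-zero status when an ioctl fails; the function name `retire_device` and the `--run` flag are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def retire_device(dev: str, dry_run: bool = True) -> list[list[str]]:
    """Evacuate and then remove a bcachefs member device.

    With dry_run=True (the default) the commands are only printed. Otherwise
    each one runs in turn; check=True raises CalledProcessError on a non-zero
    exit status, so `device remove` is never attempted after a failed evacuate.
    """
    steps = [
        ["bcachefs", "device", "evacuate", dev],
        ["bcachefs", "device", "remove", dev],
    ]
    for cmd in steps:
        print(("would run: " if dry_run else "running: ") + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return steps


if __name__ == "__main__":
    # Hypothetical usage: sudo python3 retire.py /dev/nvme0n1p2 --run
    args = [a for a in sys.argv[1:] if a != "--run"]
    device = args[0] if args else "/dev/nvme0n1p2"
    retire_device(device, dry_run="--run" not in sys.argv)
```

Stopping on the first failure matters here: running `device remove` on a device that still holds the only copy of some data is exactly what the kernel refuses above.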

I tried:

```
> sudo bcachefs data rereplicate /
```

as well as `set-state failed`, and possibly some other things, with no result.

The rereplicate completed, but did not change anything.

```
> sudo bcachefs show-super /dev/nvme1n1p2
Device: (unknown device)
External UUID: a933c02c-19d2-40d7-b5d7-42892bd5e154
Internal UUID: 61d26938-b11f-42f0-8968-372a21e8b739
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 1
Label: (none)
Version: 1.25: (unknown version)
Version upgrade complete: 1.25: (unknown version)
Oldest version on disk: 1.3: rebalance_work
Created: Sun Jan 28 21:07:10 2024
Sequence number: 383
Time of last write: Mon May 5 16:48:37 2025
Superblock size: 5.30 KiB/1.00 MiB
Clean: 0
Devices: 2
Sections: members_v1,crypt,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features: journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
block_size: 512 B
btree_node_size: 256 KiB
errors: continue [fix_safe] panic ro
metadata_replicas: 2
data_replicas: 1
metadata_replicas_required: 1
data_replicas_required: 1
encoded_extent_max: 64.0 KiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
compression: none
background_compression: none
str_hash: crc32c crc64 [siphash]
metadata_target: none
foreground_target: none
background_target: none
promote_target: none
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
promote_whole_extents: 0
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
allocator_stuck_timeout: 30
version_upgrade: [compatible] incompatible none
nocow: 0

members_v2 (size 304):
Device: 0
Label: (none)
UUID: 8e6a97e3-33c6-4aad-ac45-6122ea1eb394
Size: 3.64 TiB
read errors: 1067
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 7629918
Last mount: Mon May 5 16:48:37 2025
Last superblock write: 383
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 128 MiB
Btree allocated bitmap: 0000000000011111111111111111111111111111111111111111111111111111
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 1
Label: (none)
UUID: 4bd08f3b-030e-4cd1-8b1e-1f3c8662b455
Size: 3.72 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 1.00 MiB
First bucket: 0
Buckets: 3906505
Last mount: Mon May 5 16:48:37 2025
Last superblock write: 383
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 32.0 MiB
Btree allocated bitmap: 0000010000000000000000000000000000000000000000100000000000101111
Durability: 1
Discard: 0
Freespace initialized: 1

errors (size 184):
btree_node_bset_older_than_sb_min 1 Sat Apr 27 17:18:02 2024
fs_usage_data_wrong 1 Sat Apr 27 17:20:43 2024
fs_usage_replicas_wrong 1 Sat Apr 27 17:20:48 2024
dev_usage_sectors_wrong 1 Sat Apr 27 17:20:36 2024
dev_usage_fragmented_wrong 1 Sat Apr 27 17:20:39 2024
alloc_key_dirty_sectors_wrong 3 Sat Apr 27 17:20:35 2024
bucket_sector_count_overflow 1 Sat Apr 27 16:42:51 2024
backpointer_to_missing_ptr 5 Sat Apr 27 17:21:53 2024
ptr_to_missing_backpointer 2 Sat Apr 27 17:21:57 2024
key_in_missing_inode 5 Sat Apr 27 17:22:48 2024
accounting_key_version_0 8 Fri Oct 25 19:00:01 2024
```
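
The `Cannot remove without losing data` message lines up with this dump: `data_replicas: 1` means each user-data extent is written once, and device 0 reports `Has data: journal,btree,user`. A rough Python sketch that pulls the relevant fields out of the show-super text and estimates what removing device 0 without evacuating it would leave behind. It assumes every extent has exactly `*_replicas` copies on distinct devices and that the journal follows `metadata_replicas`; the kernel's actual check is more detailed, and the helper names are hypothetical.

```python
import re

# Excerpt of the show-super output above: replica options and member fields.
SHOW_SUPER = """\
metadata_replicas: 2
data_replicas: 1
metadata_replicas_required: 1
data_replicas_required: 1

members_v2 (size 304):
Device: 0
Size: 3.64 TiB
read errors: 1067
State: rw
Has data: journal,btree,user
Device: 1
Size: 3.72 TiB
read errors: 0
State: rw
Has data: journal,btree,user
"""

# Which option family sets the copy count for each data type
# (assumption: the journal follows metadata_replicas, like btree nodes).
REPLICAS_OPT = {"journal": "metadata", "btree": "metadata", "user": "data"}


def parse_show_super(text: str) -> tuple[dict, dict]:
    """Split `key: value` lines into filesystem options and per-device fields."""
    opts, devices, current = {}, {}, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Device" and value.isdigit():
            current = devices.setdefault(int(value), {})
        elif current is not None:
            current[key] = value
        elif re.fullmatch(r"\w+", key):
            opts[key] = value
    return opts, devices


def removal_risks(opts: dict, devices: dict, dev_idx: int) -> dict:
    """Map each data type on `dev_idx` to (copies left, copies required).

    Simplified model: every extent has exactly `<kind>_replicas` copies, each
    on a different device, so data with a copy on the removed device keeps
    replicas - 1 copies.
    """
    risks = {}
    for dtype in devices[dev_idx]["Has data"].split(","):
        kind = REPLICAS_OPT[dtype]
        replicas = int(opts[f"{kind}_replicas"])
        required = int(opts[f"{kind}_replicas_required"])
        risks[dtype] = (replicas - 1, required)
    return risks


if __name__ == "__main__":
    opts, devices = parse_show_super(SHOW_SUPER)
    for idx, dev in sorted(devices.items()):
        print(f"device {idx}: {dev['Size']}, state {dev['State']}, "
              f"read errors {dev['read errors']}")
    for dtype, (left, required) in removal_risks(opts, devices, 0).items():
        verdict = "lost" if left < required else "still readable"
        print(f"remove device 0 unevacuated -> {dtype}: "
              f"{left} copies left, {required} required: {verdict}")
```

Under this model, journal and btree data survive with one copy, but user data on device 0 would be left with zero copies, which is consistent with the kernel refusing the removal until that data has been moved off.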

Am I hitting a bug, or am I just confused about something?

`nvme0` is the failing drive; `nvme1` is the new one I just added. Another drive is waiting in its box to replace `nvme0`.

```
> bcachefs version
1.13.0
> uname -a
Linux ren 6.15.0-rc1 #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan 1 00:00:00 UTC 1980 x86_64 GNU/Linux
```

I upgraded bcachefs-tools:

```
> bcachefs version
1.25.1
```

but that does not seem to change anything.

I ran a scrub:

```
> sudo bcachefs data scrub /
Starting scrub on 2 devices: nvme0n1p2 nvme1n1p2
device     checked   corrected  uncorrected  total
nvme0n1p2  1.93 TiB  0 B        192 KiB      34.6 GiB  5721% complete
nvme1n1p2  175 GiB   0 B        0 B          34.6 GiB  505% complete
```
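
The `% complete` figures are far above 100%. As a quick arithmetic check, assuming the tool computes progress as checked / total, the displayed sizes reproduce the reported percentages to within rounding (1.93 TiB and 34.6 GiB are themselves rounded), which hints that the `total` column undercounts what scrub actually reads. A small Python sketch of that calculation; `scrub_percent` is a hypothetical helper and the formula is an assumption, not taken from bcachefs-tools.

```python
# Sizes copied from the scrub output above, converted to GiB (1 TiB = 1024 GiB).
SCRUB = {
    # device: (checked_gib, total_gib, reported_percent)
    "nvme0n1p2": (1.93 * 1024, 34.6, 5721),
    "nvme1n1p2": (175.0, 34.6, 505),
}


def scrub_percent(checked_gib: float, total_gib: float) -> float:
    """Progress as checked / total, the formula assumed here."""
    return 100 * checked_gib / total_gib


if __name__ == "__main__":
    for dev, (checked, total, reported) in SCRUB.items():
        pct = scrub_percent(checked, total)
        print(f"{dev}: {checked:.1f} GiB / {total} GiB = {pct:.1f}% "
              f"(reported {reported}%)")
```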