Advertisement
Guest User

Untitled

a guest
Jan 22nd, 2021
70
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 8.04 KB | None | 0 0
  1. From afa2ccadd8add70cf742ed7943c01be6fccd13b8 Mon Sep 17 00:00:00 2001
  2. Message-Id: <afa2ccadd8add70cf742ed7943c01be6fccd13b8.1611340095.git.fdmanana@suse.com>
  3. From: Filipe Manana <fdmanana@suse.com>
  4. Date: Fri, 22 Jan 2021 17:41:34 +0000
  5. Subject: [PATCH v3] btrfs: fix log replay failure due to race with space cache
  6. rebuild
  7.  
  8. After a sudden power failure we may end up with a space cache on disk that
  9. is not valid and needs to be rebuilt from scratch.
  10.  
  11. If that happens, during log replay when we attempt to pin an extent buffer
  12. from a log tree, at btrfs_pin_extent_for_log_replay(), we do not wait for
  13. the space cache to be rebuilt through the call to:
  14.  
  15. btrfs_cache_block_group(cache, 1);
  16.  
  17. That is because that only waits for the task (work queue job) that loads
  18. the space cache to change the cache state from BTRFS_CACHE_FAST to any
  19. other value. That is ok when the space cache on disk exists and is valid,
  20. but when the cache is not valid and needs to be rebuilt, it ends up
  21. returning as soon as the cache state changes to BTRFS_CACHE_STARTED (done
  22. at caching_thread()).
  23.  
  24. So this means that we can end up trying to unpin a range which is not yet
  25. marked as free in the block group. This results in the call to
  26. btrfs_remove_free_space() to return -EINVAL to
  27. btrfs_pin_extent_for_log_replay(), which in turn makes the log replay fail
  28. as well as mounting the filesystem. More specifically the -EINVAL comes
  29. from free_space_cache.c:remove_from_bitmap(), because the requested range
  30. is not marked as free space (ones in the bitmap), we have the following
  31. condition triggered:
  32.  
  33. static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl,
  34. (...)
  35. if (ret < 0 || search_start != *offset)
  36. return -EINVAL;
  37. (...)
  38.  
  39. It's the "search_start != *offset" that results in the condition being
  40. evaluated to true.
  41.  
  42. When this happens we got the following in dmesg/syslog:
  43.  
  44. [72383.415114] BTRFS: device fsid 32b95b69-0ea9-496a-9f02-3f5a56dc9322 devid 1 transid 1432 /dev/sdb scanned by mount (3816007)
  45. [72383.417837] BTRFS info (device sdb): disk space caching is enabled
  46. [72383.418536] BTRFS info (device sdb): has skinny extents
  47. [72383.423846] BTRFS info (device sdb): start tree-log replay
  48. [72383.426416] BTRFS warning (device sdb): block group 30408704 has wrong amount of free space
  49. [72383.427686] BTRFS warning (device sdb): failed to load free space cache for block group 30408704, rebuilding it now
  50. [72383.454291] BTRFS: error (device sdb) in btrfs_recover_log_trees:6203: errno=-22 unknown (Failed to pin buffers while recovering log root tree.)
  51. [72383.456725] BTRFS: error (device sdb) in btrfs_replay_log:2253: errno=-22 unknown (Failed to recover log tree)
  52. [72383.460241] BTRFS error (device sdb): open_ctree failed
  53.  
  54. We also mark the range for the extent buffer in the excluded extents io
  55. tree. That is fine when the space cache is valid on disk and we can load
  56. it, in which case it causes no problems.
  57.  
  58. However, for the case where we need to rebuild the space cache, because it
  59. is either invalid or it is missing, having the extent buffer range marked
  60. in the excluded extents io tree leads to a -EINVAL failure from the call
  61. to btrfs_remove_free_space(), resulting in the log replay and mount to
  62. fail. This is because by having the range marked in the excluded extents
  63. io tree, the caching thread ends up never adding the range of the extent
  64. buffer as free space in the block group since the calls to
  65. add_new_free_space(), called from load_extent_tree_free(), filter out any
  66. ranges that are marked as excluded extents.
  67.  
  68. So fix this by making sure that during log replay we wait for the caching
  69. task to finish completely when we need to rebuild a space cache, and also
  70. drop the need to mark the extent buffer range in the excluded extents io
  71. tree, as well as clearing ranges from that tree at
  72. btrfs_finish_extent_commit().
  73.  
  74. This started to happen with some frequency on large filesystems having
  75. block groups with a lot of fragmentation since the recent commit
  76. e747853cae3ae3 ("btrfs: load free space cache asynchronously"), but in
  77. fact the issue has been there for years, it was just much less likely
  78. to happen.
  79.  
  80. Signed-off-by: Filipe Manana <fdmanana@suse.com>
  81. ---
  82.  
  83. V2: Updated patch with Josef's comments and combined both patches into a
  84. single one with an updated changelog.
  85. V3: Eliminated no longer needed code, we do wait for the caching to be
  86. complete before removing free space, so a lot of code from
  87. __exclude_logged_extent() is no longer needed. Also fixed a leak of
  88. the block group in case an error happens.
  89.  
  90. fs/btrfs/extent-tree.c | 61 +++++++++++++-----------------------------
  91. 1 file changed, 18 insertions(+), 43 deletions(-)
  92.  
  93. diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
  94. index 30b1a630dc2f..0c335dae5af7 100644
  95. --- a/fs/btrfs/extent-tree.c
  96. +++ b/fs/btrfs/extent-tree.c
  97. @@ -2602,8 +2602,6 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_trans_handle *trans,
  98. struct btrfs_block_group *cache;
  99. int ret;
  100.  
  101. - btrfs_add_excluded_extent(trans->fs_info, bytenr, num_bytes);
  102. -
  103. cache = btrfs_lookup_block_group(trans->fs_info, bytenr);
  104. if (!cache)
  105. return -EINVAL;
  106. @@ -2615,11 +2613,19 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_trans_handle *trans,
  107. * the pinned extents.
  108. */
  109. btrfs_cache_block_group(cache, 1);
  110. + /*
  111. + * Make sure we wait until the cache is completely built in case it is
  112. + * missing or is invalid and therefore needs to be rebuilt.
  113. + */
  114. + ret = btrfs_wait_block_group_cache_done(cache);
  115. + if (ret)
  116. + goto out;
  117.  
  118. pin_down_extent(trans, cache, bytenr, num_bytes, 0);
  119.  
  120. /* remove us from the free space cache (if we're there at all) */
  121. ret = btrfs_remove_free_space(cache, bytenr, num_bytes);
  122. +out:
  123. btrfs_put_block_group(cache);
  124. return ret;
  125. }
  126. @@ -2629,50 +2635,22 @@ static int __exclude_logged_extent(struct btrfs_fs_info *fs_info,
  127. {
  128. int ret;
  129. struct btrfs_block_group *block_group;
  130. - struct btrfs_caching_control *caching_ctl;
  131.  
  132. block_group = btrfs_lookup_block_group(fs_info, start);
  133. if (!block_group)
  134. return -EINVAL;
  135.  
  136. - btrfs_cache_block_group(block_group, 0);
  137. - caching_ctl = btrfs_get_caching_control(block_group);
  138. -
  139. - if (!caching_ctl) {
  140. - /* Logic error */
  141. - BUG_ON(!btrfs_block_group_done(block_group));
  142. - ret = btrfs_remove_free_space(block_group, start, num_bytes);
  143. - } else {
  144. - /*
  145. - * We must wait for v1 caching to finish, otherwise we may not
  146. - * remove our space.
  147. - */
  148. - btrfs_wait_space_cache_v1_finished(block_group, caching_ctl);
  149. - mutex_lock(&caching_ctl->mutex);
  150. -
  151. - if (start >= caching_ctl->progress) {
  152. - ret = btrfs_add_excluded_extent(fs_info, start,
  153. - num_bytes);
  154. - } else if (start + num_bytes <= caching_ctl->progress) {
  155. - ret = btrfs_remove_free_space(block_group,
  156. - start, num_bytes);
  157. - } else {
  158. - num_bytes = caching_ctl->progress - start;
  159. - ret = btrfs_remove_free_space(block_group,
  160. - start, num_bytes);
  161. - if (ret)
  162. - goto out_lock;
  163. + btrfs_cache_block_group(block_group, 1);
  164. + /*
  165. + * Make sure we wait until the cache is completely built in case it is
  166. + * missing or is invalid and therefore needs to be rebuilt.
  167. + */
  168. + ret = btrfs_wait_block_group_cache_done(block_group);
  169. + if (ret)
  170. + goto out;
  171.  
  172. - num_bytes = (start + num_bytes) -
  173. - caching_ctl->progress;
  174. - start = caching_ctl->progress;
  175. - ret = btrfs_add_excluded_extent(fs_info, start,
  176. - num_bytes);
  177. - }
  178. -out_lock:
  179. - mutex_unlock(&caching_ctl->mutex);
  180. - btrfs_put_caching_control(caching_ctl);
  181. - }
  182. + ret = btrfs_remove_free_space(block_group, start, num_bytes);
  183. +out:
  184. btrfs_put_block_group(block_group);
  185. return ret;
  186. }
  187. @@ -2863,9 +2841,6 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
  188. mutex_unlock(&fs_info->unused_bg_unpin_mutex);
  189. break;
  190. }
  191. - if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
  192. - clear_extent_bits(&fs_info->excluded_extents, start,
  193. - end, EXTENT_UPTODATE);
  194.  
  195. if (btrfs_test_opt(fs_info, DISCARD_SYNC))
  196. ret = btrfs_discard_extent(fs_info, start,
  197. --
  198. 2.28.0
  199.  
  200.  
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement