[Update - No Panic] ext4 data loss

Started by devil, 2012/10/24, 15:31:17


devil

http://www.phoronix.com/scan.php?page=news_item&px=MTIxNDQ

According to this news item, kernels 3.4, 3.5, and 3.6 have a potential for ext4 data loss.

Quoting Ted Ts'o from the article: "Well, the problem won't show up if the journal has wrapped. So it will only show up if the system has been rebooted twice in fairly quick succession. A full conventional distro install probably wouldn't have triggered a bug... although someone who habitually reboots their laptop instead of using suspend/resume or hibernate, or someone who is trying to bisect the kernel looking for some other bug could easily trip over this --- which I guess is how you got hit by it."

A patch has been committed but not yet released.

In short: according to the quote from Ted Ts'o, there is no problem under normal use. Data loss could occur when a system is rebooted several times in quick succession. A patch has been submitted but not yet released.
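To see whether a machine is even running one of the kernel series named above, a quick check like the following could help. This is only a rough sketch: the function name in_named_series is made up for illustration, and it only matches the series (3.4, 3.5, 3.6) mentioned in the news item. It cannot tell which point releases actually contain the bug or the fix.

```shell
#!/bin/sh
# Sketch: does a kernel release string belong to one of the series
# named in this thread (3.4, 3.5, 3.6)? It matches the series only and
# knows nothing about which point releases are fixed.

# in_named_series RELEASE
#   exit status 0 if RELEASE is 3.4, 3.5 or 3.6 (optionally followed
#   by "." or "-" and anything else), 1 otherwise.
#   (hypothetical helper name, not part of any tool)
in_named_series() {
    case "$1" in
        3.4|3.4[.-]*|3.5|3.5[.-]*|3.6|3.6[.-]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel.
release=$(uname -r)
if in_named_series "$release"; then
    echo "$release: in a series named in the report (3.4, 3.5, 3.6)"
else
    echo "$release: not in a series named in the report"
fi
```

The pattern requires a "." or "-" after the series number so that, for example, 3.16.x is not mistaken for 3.1x or 3.6.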

edit: here is the same news in German: https://www.computerbase.de/news/2012-10/fehler-in-aktuellen-linux-kerneln/

greetz
devil

reddark

Oh crap ... my /home is on ext4 .... ;)

ralul

Yeah, Gentoo also reacts to this by masking nearly all newer linux sources: https://bugs.gentoo.org/show_bug.cgi?id=439502

[update] Gentoo now also considers this bug minor and has re-enabled the previously masked kernels ...
experiencing siduction runs better than my gentoo makes me know I know nothing

ralul

@Towo, after applying the required patch from the stable-queue:
ext4-race-condition-protection-for-ext4_convert_unwritten_extents_endio.patch
my Gentoo linux-3.6.3+7queued runs without errors. But:
- this patch for the issue is not well tested
- it is not even -rc1 approved

towo

The patch you used is not for that issue!
The patch that fixes the issue will be:

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 0f16edd..26b2983 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1351,24 +1351,33 @@ void jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 static void jbd2_mark_journal_empty(journal_t *journal)
 {
 	journal_superblock_t *sb = journal->j_superblock;
+	__be32 new_tail_sequence;

 	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
 	read_lock(&journal->j_state_lock);
-	/* Is it already empty? */
+	new_tail_sequence = cpu_to_be32(journal->j_tail_sequence);
+	/* Nothing to do? */
 	if (sb->s_start == 0) {
+		pr_err("JBD2: jbd2_mark_journal_empty bug workaround (%u, %u)\n",
+		       (unsigned) be32_to_cpu(sb->s_sequence),
+		       (unsigned) be32_to_cpu(new_tail_sequence));
+		WARN_ON(1);
+	}
+	if (sb->s_start == 0 && sb->s_sequence == new_tail_sequence) {
 		read_unlock(&journal->j_state_lock);
-		return;
+		goto set_flushed;
 	}
 	jbd_debug(1, "JBD2: Marking journal as empty (seq %d)\n",
 		  journal->j_tail_sequence);

-	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
+	sb->s_sequence = new_tail_sequence;
 	sb->s_start    = cpu_to_be32(0);
 	read_unlock(&journal->j_state_lock);

 	jbd2_write_superblock(journal, WRITE_FUA);

-	/* Log is no longer empty */
+set_flushed:
+	/* Log is empty */
 	write_lock(&journal->j_state_lock);
 	journal->j_flags |= JBD2_FLUSHED;
 	write_unlock(&journal->j_state_lock);

and has still not landed in the stable-queue.
I don't go to carnival, I just lend out my face sometimes.

ralul

As I understand the lkml discussion, this patch is an alternative attempt to fix the issue while keeping the gains of earlier patches.
But we will see what lands in the queue. I bet Greg will go the most conservative way, now that his stable patchlevel release has been damaged and his reputation put at risk ....

[edit] ... I just saw that Greg is pushing another 44 files into the queue! Isn't everyone expecting Greg to fix just this ext4 issue promptly? I expected an early release tomorrow to fix it, but that cannot happen with such a bulk. Perhaps Greg Kroah-Hartman hasn't been informed?

[edit2] The above was written during a panic attack - the next day (2012-10-25) we all calmed down :)


michaa7

Ok, you can't code, but you still might be able to write a bug report for Debian's sake


michaa7

Side note on the article mentioned above (computerbase.de - update):

siduction in good company, and a typo
Quote: ... No further error reports from affected users are known so far. These would be expected, however, since distributions such as Fedora 7 or Siduction have been using kernels 2.6.2 and 2.6.3 since their release ...

dibl

Hmmmm.  Have we now discovered that it is a bad idea to shut off a running Linux system with the power switch?  That was the news in 1990!   :lol:
System76 Oryx Pro, Intel Core i7-11800H, ASRock B860 Pro-A, Intel Core Ultra 7 265KF, Nvidia GTX-1060, SSD 990 EVO Plus.

ralul

I panicked too :(
And Towo was right (as ever) about which patch it is ...

Could someone change the title of this thread before the Friday traffic hits the forums:
"don't panic - unlikely"

ralul

The bug was triggered on a NAS that was constantly rebooted by issuing: reboot -f
The "-f" option skips the normal shutdown and clean unmount. The data loss was then due to a missed fsck.

Workaround:
This is only needed when a normal shutdown with a clean unmount was not possible. There are two possible ways:

a) when booting - at the grub menu:
1. edit the command line by pressing the key
e

2. at the end of the line "linux /boot/vmlinuz-3.6 ..." add
forcefsck

3. boot this grub entry with
Ctrl-x

b) permanent change of grub.cfg
1. edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet ro forcefsck"

2. as root:
update-grub

... you only need this when you already have serious issues with your system - when normal shutdown behavior is not possible ...
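After booting with either variant, one way to confirm that the parameter really reached the kernel is to look at /proc/cmdline. The following is a minimal sketch; has_forcefsck is a made-up helper name, and the check only tells you that the word was passed at boot, not that an fsck was actually run.

```shell
#!/bin/sh
# has_forcefsck CMDLINE
#   exit status 0 if "forcefsck" appears as a separate word in the
#   given kernel command line, 1 otherwise.
#   (hypothetical helper name, for illustration only)
has_forcefsck() {
    # Pad with spaces so the word also matches at the start or end.
    case " $1 " in
        *" forcefsck "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a running Linux system, /proc/cmdline holds the parameters the
# current kernel was booted with.
if [ -r /proc/cmdline ]; then
    if has_forcefsck "$(cat /proc/cmdline)"; then
        echo "forcefsck was passed at boot"
    else
        echo "forcefsck was not passed at boot"
    fi
fi
```

Matching on whole words avoids a false positive from some other parameter that merely contains the string.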

ralul

"ext4 revert: jbd2-don-t-write-superblock-when-if-its-empty.patch"
of linux-3.6.2 is Tytso's official fix.