Siduction Forum

Siduction Forum => Upgrade Warnings => Topic started by: devil on 2012/10/24, 15:31:17

Title: [Update - No Panic] ext4 data loss - Datenverlust bei ext4!
Post by: devil on 2012/10/24, 15:31:17
http://www.phoronix.com/scan.php?page=news_item&px=MTIxNDQ

According to this news item, kernels 3.4, 3.5, and 3.6 have a potential for ext4 data loss.

Quoting Ted Ts'o from the article: "Well, the problem won't show up if the journal has wrapped. So it will only show up if the system has been rebooted twice in fairly quick succession. A full conventional distro install probably wouldn't have triggered a bug... although someone who habitually reboots their laptop instead of using suspend/resume or hibernate, or someone who is trying to bisect the kernel looking for some other bug could easily trip over this --- which I guess is how you got hit by it."

A patch is committed but not yet released.
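
To see whether a machine runs one of the kernel series named in the article, a quick check along these lines may help. This is only a sketch: the function name in_reported_series is made up for illustration, and it matches on the series number alone, so it cannot tell whether a particular point release already carries a fix.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): succeed if a kernel release
# string belongs to one of the series reported as affected (3.4, 3.5, 3.6).
# It looks at the series only, not at individual point releases.
in_reported_series() {
    case "$1" in
        3.4|3.4[.-]*|3.5|3.5[.-]*|3.6|3.6[.-]*) return 0 ;;
        *) return 1 ;;
    esac
}

running="$(uname -r)"
if in_reported_series "$running"; then
    echo "Kernel $running is in one of the reported series (3.4, 3.5, 3.6)."
else
    echo "Kernel $running is not in one of the reported series."
fi
```

A match only means the series is one of those mentioned; whether the specific build is affected depends on which patches it contains.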

[Translated from German:] According to the article above, kernels 3.4, 3.5, and 3.6 may cause data loss on ext4.

The quote from Ted Ts'o says there is no problem in normal use. Data loss could occur if a system is rebooted several times in quick succession. A patch has been submitted but is not yet released.

edit: here it is again in German: https://www.computerbase.de/news/2012-10/fehler-in-aktuellen-linux-kerneln/

greetz
devil
Title: ext4 data loss - Datenverlust bei ext4!
Post by: reddark on 2012/10/24, 15:46:32
Oh damn ... my /home is on ext4 .... ;)
Title: ext4 data loss - Datenverlust bei ext4!
Post by: ralul on 2012/10/24, 17:19:35
Yeah, Gentoo is also reacting to this by masking nearly all newer Linux sources: https://bugs.gentoo.org/show_bug.cgi?id=439502

[update] Gentoo now also considers this bug minor and has re-enabled the previously masked kernels ...
Title: ext4 data loss - Datenverlust bei ext4!
Post by: ralul on 2012/10/24, 20:32:52
@Towo, after patching from the stable-queue with the required patch:
ext4-race-condition-protection-for-ext4_convert_unwritten_extents_endio.patch
my Gentoo linux-3.6.3+7queued runs without errors. But:
- this patch for the issue is not well tested
- it is not even -rc1 approved
Title: RE: ext4 data loss - Datenverlust bei ext4!
Post by: towo on 2012/10/24, 20:49:06
The patch you used is not for that issue!
The patch to fix the issue will be

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 0f16edd..26b2983 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1351,24 +1351,33 @@ void jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 static void jbd2_mark_journal_empty(journal_t *journal)
 {
 	journal_superblock_t *sb = journal->j_superblock;
+	__be32 new_tail_sequence;
 
 	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
 	read_lock(&journal->j_state_lock);
-	/* Is it already empty? */
+	new_tail_sequence = cpu_to_be32(journal->j_tail_sequence);
+	/* Nothing to do? */
 	if (sb->s_start == 0) {
+		pr_err("JBD2: jbd2_mark_journal_empty bug workaround (%u, %u)\n",
+		       (unsigned) be32_to_cpu(sb->s_sequence),
+		       (unsigned) be32_to_cpu(new_tail_sequence));
+		WARN_ON(1);
+	}
+	if (sb->s_start == 0 && sb->s_sequence == new_tail_sequence) {
 		read_unlock(&journal->j_state_lock);
-		return;
+		goto set_flushed;
 	}
 	jbd_debug(1, "JBD2: Marking journal as empty (seq %d)\n",
 		  journal->j_tail_sequence);
 
-	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
+	sb->s_sequence = new_tail_sequence;
 	sb->s_start    = cpu_to_be32(0);
 	read_unlock(&journal->j_state_lock);
 
 	jbd2_write_superblock(journal, WRITE_FUA);
 
-	/* Log is no longer empty */
+set_flushed:
+	/* Log is empty */
 	write_lock(&journal->j_state_lock);
 	journal->j_flags |= JBD2_FLUSHED;
 	write_unlock(&journal->j_state_lock);

and has still not landed in the stable-queue.
Title: RE: ext4 data loss - Datenverlust bei ext4!
Post by: ralul on 2012/10/24, 21:59:10
As I understood the LKML thread, this patch is an alternative attempt to fix the issue while keeping the gains of the earlier patches.
But we will see what lands in the queue. I bet Greg will go the most conservative way, now that his stable patch-level release has been damaged and his reputation put at risk ....

[edit] ... I just saw that Greg is pushing another 44 files into the queue! Isn't everyone expecting Greg to fix just this ext4 issue promptly? I expected an early release tomorrow to fix it, but that cannot happen with such a bulk. Perhaps Greg Kroah-Hartman hasn't been informed?

[edit2] I wrote this in a panic attack - the next day (2012-10-25) we all calmed down :)
Title: RE: ext4 data loss - Datenverlust bei ext4!
Post by: reddark on 2012/10/25, 00:53:00
Another article on this:
http://www.heise.de/open/meldung/Ext4-Bug-gefaehrdet-Daten-1736310.html
Title: RE: ext4 data loss - Datenverlust bei ext4!
Post by: michaa7 on 2012/10/25, 01:02:24
Unfortunately, Ted Ts'o now doubts his first analysis:

https://lkml.org/lkml/2012/10/24/535
Title: RE: ext4 data loss - Datenverlust bei ext4!
Post by: devil on 2012/10/25, 10:17:28
Update: https://www.computerbase.de/news/2012-10/fehler-in-aktuellen-linux-kerneln/

greetz
devil
Title: RE: ext4 data loss - Datenverlust bei ext4!
Post by: michaa7 on 2012/10/25, 11:05:40
Side note on the article mentioned above (computerbase.de - update):

siduction is in good company there, and there is a typo
Quote
... No further error reports from affected users are known so far. These would be expected, though, since distributions such as Fedora 7 or Siduction have been using kernels 2.6.2 and 2.6.3 since their release ...
Title: RE: ext4 data loss - Datenverlust bei ext4!
Post by: dibl on 2012/10/25, 11:24:31
Hmmmm.  Have we now discovered that it is a bad idea to shut off a running Linux system with the power switch?  That was the news in 1990!   :lol:
Title: RE: ext4 data loss - Datenverlust bei ext4!
Post by: ralul on 2012/10/25, 12:56:02
I panicked too :(
And Towo was right (as always) about the patch he found ...

Could someone change the title of this thread before Friday's traffic in the forums:
"don't panic - unlikely"
Title: workaround
Post by: ralul on 2012/10/25, 14:30:48
The bug was triggered on a NAS which was constantly rebooted by issuing: reboot -f
The "-f" option skips the normal shutdown and the clean unmount. The data loss was then due to a missed fsck.

Workaround:
This is only needed when normal shutdown with normal unmount was not possible. There are two possible ways:

a) when booting - at grub menu:
1. edit the command line by pressing the key
e

2. at end of line "linux /boot/vmlinuz-3.6 ..." add
forcefsck

3. boot this grub entry now with
Ctrl-x

b) permanent change of grub.cfg
1. edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet ro forcefsck"

2. as root:
update-grub

... you only need this if your system already has serious issues, i.e. when a normal shutdown is not possible ...
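
Variant b) above can also be sketched as a small shell helper that adds forcefsck to the GRUB_CMDLINE_LINUX_DEFAULT line only if it is not already there. The function name add_forcefsck is made up for illustration, and the sketch assumes the value is enclosed in double quotes with nothing after the closing quote. It only prints the modified file to stdout, so nothing is changed until you review the result and copy it into place yourself.

```shell
#!/bin/sh
# Hypothetical helper: print the given grub defaults file with "forcefsck"
# appended to GRUB_CMDLINE_LINUX_DEFAULT, unless it is already present.
# Assumes a double-quoted value with nothing after the closing quote.
# $1: path to the file, e.g. a copy of /etc/default/grub
add_forcefsck() {
    sed -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/{
        /(^|[" ])forcefsck([" ]|$)/!s/"$/ forcefsck"/
    }' "$1"
}

# Example use: preview the result first, change nothing on disk
#   add_forcefsck /etc/default/grub | grep '^GRUB_CMDLINE_LINUX_DEFAULT='
```

After putting the reviewed result in place, update-grub still has to be run as root, as in step 2 above.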
Title: entwarung / almost all-clear signal
Post by: michaa7 on 2012/10/25, 17:49:39
Entwarnung (all-clear), or at least almost:

https://plus.google.com/117091380454742934025/posts/Wcc5tMiCgq7

http://www.heise.de/open/meldung/Ext4-Bug-Entwarnung-1736902.html
Title: entwarung / almost all-clear signal
Post by: ralul on 2012/10/25, 22:49:08
"ext4 revert: jbd2-don-t-write-superblock-when-if-its-empty.patch"
is Ted Ts'o's official fix; it reverts the patch that went into linux-3.6.2.
Title: RE: entwarung / almost all-clear signal
Post by: piper on 2012/10/26, 03:53:34
I use reiserfs :)
Title: Re: RE: entwarung / almost all-clear signal
Post by: dibl on 2012/10/26, 12:15:39
Quote from: "piper"
I use reiserfs :)


 :shock:

Is it still being developed/maintained? I would worry about it on the new 3.x kernels.
Title: Re: RE: entwarung / almost all-clear signal
Post by: ralul on 2012/10/26, 13:32:15
Maybe openSUSE takes some care of reiserfs?
For reiser4 there is a patch for linux-3.6, but the Russian maintainer warned that there is an old bug he couldn't solve. If you look at the Phoronix tests you see that btrfs is better in most cases. And it is the same btree algo reiser4 has, and the same developer.
Title: RE: Re: RE: entwarung / almost all-clear signal
Post by: piper on 2012/10/26, 18:48:29
I will use reiserfs till it is dead ;)

My experience on 2 identical machines (dual-core; one has 4 gigs of RAM and the other has 8; one uses ext4 and the other (main) uses reiserfs): on reiserfs, compiling is much faster, building siduction/aptosid is faster, and the one that is really faster is building Android (22 gigs). Moving files (MythTV movies etc.) is also faster.

Not the best *benchmarks to test, I admit, but for me, reiserfs is faster in the way I compute and use my machine (no ssd here)

*maybe when I build my new system (8 cores, 16 gig of ram) I *might go with ext4, OTOH maybe not
Title: RE: Re: RE: entwarung / almost all-clear signal
Post by: agaida on 2012/10/26, 20:42:42
piper: We have 2012. 16G RAM is for beginners. Please, do yourself a favor and give 32 G to your new machine :D
Title: Re: RE: Re: RE: entwarung / almost all-clear signal
Post by: piper on 2012/10/27, 19:25:15
Quote from: "agaida"
piper: We have 2012. 16G RAM is for beginners. Please, do yourself a favor and give 32 G to your new machine :D


+1

I think I will listen to that :)