Siduction Forum
Siduction Forum => Upgrade Warnings => Topic started by: devil on 2012/10/24, 15:31:17
-
http://www.phoronix.com/scan.php?page=news_item&px=MTIxNDQ
According to this news item, kernels 3.4, 3.5, and 3.6 can potentially cause ext4 data loss.
Quoting Ted Tso from the article: "Well, the problem won't show up if the journal has wrapped. So it will only show up if the system has been rebooted twice in fairly quick succession. A full conventional distro install probably wouldn't have triggered a bug... although someone who habitually reboots their laptop instead of using suspend/resume or hibernate, or someone who is trying to bisect the kernel looking for some other bug could easily trip over this --- which I guess is how you got hit by it."
A patch has been committed but not yet released.
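For anyone unsure which kernel they are running, here is a minimal sketch that compares the output of uname -r against the three series named in the article. The function name is_affected_series is made up for this example, and a match only means the kernel belongs to one of those series, not that this particular point release actually carries the bug.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a kernel release string belongs to the
# 3.4, 3.5 or 3.6 series named in the article. It does not know which
# point releases actually contain the problematic change.
is_affected_series() {
    case "$1" in
        3.4|3.4[.-]*|3.5|3.5[.-]*|3.6|3.6[.-]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel that is running right now.
release="$(uname -r)"
if is_affected_series "$release"; then
    echo "kernel $release is in one of the series named in the article"
else
    echo "kernel $release is outside the 3.4 to 3.6 series"
fi
```

The pattern requires a "." or "-" after the minor version, so a release such as 3.40.1 is not mistaken for 3.4.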
[German summary, translated:] According to the article above, kernels 3.4, 3.5, and 3.6 may cause data loss on ext4.
Ted Tso's quote says there is no problem under normal use. Data loss could occur when a system is rebooted several times in quick succession. A patch has been submitted but not yet released.
edit: here it is again in German: https://www.computerbase.de/news/2012-10/fehler-in-aktuellen-linux-kerneln/
greetz
devil
-
Oh damn ... my /home is on ext4 .... ;)
-
Yeah, Gentoo has also reacted to this by masking nearly all newer kernel sources: https://bugs.gentoo.org/show_bug.cgi?id=439502
[update] Gentoo now considers this bug minor and has re-enabled the previously masked kernels ...
-
@Towo, after applying the required patch from the stable-queue:
ext4-race-condition-protection-for-ext4_convert_unwritten_extents_endio.patch
my Gentoo linux-3.6.3+7queued runs without errors. But:
- this patch for the issue is not well tested
- it is not even -rc1 approved
-
The patch you used is not for that issue!
The patch that fixes the issue will be:
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 0f16edd..26b2983 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1351,24 +1351,33 @@ void jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 static void jbd2_mark_journal_empty(journal_t *journal)
 {
 	journal_superblock_t *sb = journal->j_superblock;
+	__be32 new_tail_sequence;
 	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
 	read_lock(&journal->j_state_lock);
-	/* Is it already empty? */
+	new_tail_sequence = cpu_to_be32(journal->j_tail_sequence);
+	/* Nothing to do? */
 	if (sb->s_start == 0) {
+		pr_err("JBD2: jbd2_mark_journal_empty bug workaround (%u, %u)\n",
+		       (unsigned) be32_to_cpu(sb->s_sequence),
+		       (unsigned) be32_to_cpu(new_tail_sequence));
+		WARN_ON(1);
+	}
+	if (sb->s_start == 0 && sb->s_sequence == new_tail_sequence) {
 		read_unlock(&journal->j_state_lock);
-		return;
+		goto set_flushed;
 	}
 	jbd_debug(1, "JBD2: Marking journal as empty (seq %d)\n",
 		  journal->j_tail_sequence);
-	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
+	sb->s_sequence = new_tail_sequence;
 	sb->s_start = cpu_to_be32(0);
 	read_unlock(&journal->j_state_lock);
 	jbd2_write_superblock(journal, WRITE_FUA);
-	/* Log is no longer empty */
+set_flushed:
+	/* Log is empty */
 	write_lock(&journal->j_state_lock);
 	journal->j_flags |= JBD2_FLUSHED;
 	write_unlock(&journal->j_state_lock);
and it has still not landed in the stable-queue.
-
As I understood the LKML thread, this patch is an alternative attempt to fix the issue while keeping the gains of the earlier patches.
But we will see what lands in the queue. I bet Greg will go the most conservative way, now that his stable patch-level release has been damaged and his reputation put at risk ....
[edit] ... I just saw that Greg is pushing another 44 files into the queue! Isn't everyone expecting Greg to fix just this ext4 issue promptly? I expected an early release tomorrow to fix it. But that cannot happen with such a bulk of patches. Perhaps Greg Kroah-Hartman hasn't been informed?
[edit2] The above was written during a panic attack - by the next day (2012-10-25) we had all calmed down :)
-
Another article about this (in German):
http://www.heise.de/open/meldung/Ext4-Bug-gefaehrdet-Daten-1736310.html
-
Unfortunately, Ted Tso now doubts his first analysis:
https://lkml.org/lkml/2012/10/24/535
-
Update: https://www.computerbase.de/news/2012-10/fehler-in-aktuellen-linux-kerneln/
greetz
devil
-
Side note on the article mentioned above (computerbase.de, update):
siduction in good company, and a typo
(translated from the German) ... No further reports from affected users are known so far. They would be expected, though, since distributions such as Fedora 7 or Siduction have been using kernels 2.6.2 and 2.6.3 since their release ...
-
Hmmmm. Have we now discovered that it is a bad idea to shut off a running Linux system with the power switch? That was the news in 1990! :lol:
-
I panicked too :(
And Towo was right (as ever) with his patch finding ...
Could someone change the title of this thread before Friday's forum traffic, to something like:
"don't panic - unlikely"
-
The bug was triggered on a NAS that was constantly being rebooted with: reboot -f
The "-f" option skips the normal shutdown and clean unmount. The data loss was then due to a missed fsck.
Workaround:
This is only needed when a normal shutdown with a clean unmount was not possible. There are two ways:
a) when booting - at the grub menu:
1. edit the command line by pressing
e
2. at the end of the line "linux /boot/vmlinuz-3.6 ..." add
forcefsck
3. boot this grub entry now with
Ctrl-x
b) permanent change of grub.cfg
1. edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet ro forcefsck"
2. as root:
update-grub
... you only need this if your system already has serious issues, i.e. when a normal shutdown is not possible ...
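To confirm that the parameter from either method really reached the kernel, a minimal sketch could look at /proc/cmdline after booting. The function name has_forcefsck is made up for this example, and the sketch assumes forcefsck was added as a separate word, as in the steps above. It only shows that the parameter was passed, not that a check actually ran.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line contains
# "forcefsck" as a separate word.
has_forcefsck() {
    case " $1 " in
        *" forcefsck "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
cmdline="$(cat /proc/cmdline 2>/dev/null)"
if has_forcefsck "$cmdline"; then
    echo "this boot was started with forcefsck"
else
    echo "forcefsck is not on the kernel command line"
fi
```

Padding the string with spaces keeps the match on whole words, so a different parameter that merely contains the text forcefsck does not count.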
-
Almost an all-clear signal:
https://plus.google.com/117091380454742934025/posts/Wcc5tMiCgq7
http://www.heise.de/open/meldung/Ext4-Bug-Entwarnung-1736902.html
-
"ext4 revert: jbd2-don-t-write-superblock-when-if-its-empty.patch"
(a revert of that linux-3.6.2 patch) is Tytso's official fix.
-
I use reiserfs :)
-
Quote: "I use reiserfs :)"
:shock:
Is it still being developed/maintained? I would worry about it on the new 3.x kernels.
-
Maybe openSUSE takes some care of reiserfs?
For reiser4 there is a patch for linux-3.6, but the Russian maintainer warned that there is an old bug he couldn't solve. If you look at the Phoronix tests you see btrfs is better in most cases. And it uses the same btree algorithm as reiser4, and has the same developer.
-
I will use reiserfs till it is dead ;)
My experience on 2 identical machines (dual-core; one has 4 gigs of RAM and the other has 8; one uses ext4 and the other (main) uses reiserfs): compiling is much faster, building siduction/aptosid is faster, and the one that is really faster is building Android (22 gigs). Moving files (mythtv movies etc.) is also faster.
Not the best *benchmarks to test, I admit, but for me reiserfs is faster for the way I compute and use my machine (no SSD here)
*maybe when I build my new system (8 cores, 16 gigs of RAM) I *might go with ext4, OTOH maybe not
-
piper: It's 2012. 16G RAM is for beginners. Please do yourself a favor and give your new machine 32G :D
-
Quote: "piper: It's 2012. 16G RAM is for beginners. Please do yourself a favor and give your new machine 32G :D"
+1
I think I will listen to that :)