Author Topic:  Last du screwed my system - again  (Read 6820 times)

Offline clubex

  • User
  • Posts: 265
Last du screwed my system - again
« on: 2015/10/08, 22:44:08 »
The last du (containing libsystemd and xserver-xorg-core) has screwed my system. A journalctl -b -p err shows lots of errors, mostly relating to deleted services (mistimed?). Also my original /home partition appears not to be mounted, yet the system boots (after a lo..ng time) to a sddm login screen and thence, after another lo..ng time, to the standard KDE wallpaper but minus the panel.

There is a bug #798097 on xserver-xorg-core which may affect login, but I could still log in. The hd seems to me to be performing correctly, so I don't think it's a hardware failure. I'm tending towards a software bug causing a system-wide failure.

Strangely, this also happened to me when the last upgrade to libsystemd occurred a few days ago. Could it be that systemd is now so important and so integrated into Debian that it makes using sid no longer a practical proposition?

Has anyone else experienced this and perhaps have a solution?

Offline clubex

  • User
  • Posts: 265
Re: Last du screwed my system - again
« Reply #1 on: 2015/10/08, 23:55:12 »
For anyone interested:
investigating why home is not mounting at boot
Code: [Select]
df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             10M     0   10M   0% /dev
tmpfs           1.6G  1.2M  1.6G   1% /run
/dev/sda1        19G  7.1G   11G  40% /
tmpfs           4.0G  184K  4.0G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           4.0G     0  4.0G   0% /sys/fs/cgroup
tmpfs           799M     0  799M   0% /run/user/119
tmpfs           799M     0  799M   0% /run/user/1000
Note: no /home partition (/dev/sda2)

fstab is OK
Code: [Select]
UUID=9bd3729b-c15c-48a8-8719-f83c6b97c387     /                    ext4         defaults,relatime,errors=remount-ro           0    1   
UUID=e9aa0c2d-6161-422f-875e-a1a6194491c7     /home           ext4         auto,users,rw,exec,relatime                   0    0   
UUID=40199a5b-aeeb-45eb-9f74-0911df8e7c22     none                 swap         sw                                            0    0   

Mounting /dev/sda2 manually gives me access to the original /home and its files.
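
A rough sketch of how the "what is in fstab but not mounted" comparison could be automated instead of reading df by eye. The function name unmounted_fstab_entries is made up for illustration, and it only compares mount points (fstab uses UUIDs while the mount table shows device names), skipping swap and noauto entries.

```shell
#!/bin/sh
# List fstab mount points that should be mounted automatically but are
# missing from the mount table. Both arguments default to the live
# system files. The function name is hypothetical.
unmounted_fstab_entries() {
    fstab=${1:-/etc/fstab}
    mounts=${2:-/proc/mounts}
    awk '
        # First file (mount table): remember every active mount point.
        FILENAME == ARGV[1] { mounted[$2] = 1; next }
        # Second file (fstab): skip comments and blank lines.
        /^[ \t]*(#|$)/ { next }
        # Skip swap and entries that are not mounted at boot.
        $3 == "swap" || $2 == "none" { next }
        $4 ~ /(^|,)noauto(,|$)/ { next }
        # Anything left that is not mounted gets reported.
        !($2 in mounted) { print $2 }
    ' "$mounts" "$fstab"
}

# Usage on a live system (prints nothing when everything is mounted):
#   unmounted_fstab_entries
```

On a system in the state shown above, this would be expected to print /home, since /dev/sda2 is listed in fstab but absent from the df output.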

journalctl  -b -p err gives
Code: [Select]

-- Logs begin at Sat 2015-09-26 10:03:25 BST, end at Thu 2015-10-08 22:32:35 BST. --
Oct 08 21:10:57 westfield2 systemd[1]: nfs-common.service: Job nfs-common.service/start deleted to break ordering cycle$
Oct 08 21:10:57 westfield2 systemd[1]: rpcbind.service: Job rpcbind.service/start deleted to break ordering cycle start$
Oct 08 21:10:59 westfield2 systemd[1]: rpcbind.service: Job rpcbind.service/start deleted to break ordering cycle start$
Oct 08 21:11:04 westfield2 kernel: k10temp 0000:00:18.3: unreliable CPU thermal sensor; monitoring disabled
Oct 08 21:11:04 westfield2 systemd[1]: nfs-common.service: Job nfs-common.service/start deleted to break ordering cycle$
Oct 08 21:11:05 westfield2 systemd[1]: rpcbind.target: Job rpcbind.target/start deleted to break ordering cycle startin$
Oct 08 21:11:05 westfield2 systemd[1]: rpcbind.service: Job rpcbind.service/start deleted to break ordering cycle start$
Oct 08 21:11:05 westfield2 systemd[1]: nfs-common.service: Job nfs-common.service/start deleted to break ordering cycle$
Oct 08 21:11:05 westfield2 systemd[1]: rpcbind.target: Job rpcbind.target/start deleted to break ordering cycle startin$
Oct 08 21:11:05 westfield2 systemd[1]: rpcbind.service: Job rpcbind.service/start deleted to break ordering cycle start$
Oct 08 21:11:16 westfield2 hp[1128]: io/hpmud/pp.c 627: unable to read device-id ret=-1
Oct 08 22:11:17 westfield2 hpfax[1129]: [1129]: error: Failed to create /var/spool/cups/tmp/.hplip
Oct 08 22:11:19 westfield2 systemd[1229]: Failed unmounting /sys/kernel/debug.
Oct 08 22:11:19 westfield2 systemd[1229]: Failed unmounting /dev/mqueue.
Oct 08 22:11:19 westfield2 systemd[1229]: Failed unmounting /run/user/119.
Oct 08 22:11:19 westfield2 systemd[1229]: Failed unmounting /dev/hugepages.
Oct 08 22:11:30 westfield2 colord-sane[1100]: io/hpmud/pp.c 627: unable to read device-id ret=-1
Oct 08 22:11:44 westfield2 sddm-helper[1172]: pam_systemd(sddm-greeter:session): Failed to create session: Connection t$
Oct 08 22:11:56 westfield2 systemd[3953]: Failed unmounting /dev/hugepages.
Oct 08 22:11:56 westfield2 systemd[3953]: Failed unmounting /sys/kernel/debug.
Oct 08 22:11:56 westfield2 systemd[3953]: Failed unmounting /run/user/119.
Oct 08 22:11:56 westfield2 systemd[3953]: Failed unmounting /dev/mqueue.
Oct 08 22:11:56 westfield2 systemd[3953]: Failed unmounting /run/user/1000.
Oct 08 22:12:21 westfield2 sddm-helper[3952]: pam_systemd(sddm:session): Failed to create session: Connection timed out
Oct 08 22:12:37 westfield2 pulseaudio[5626]: [pulseaudio] pid.c: Daemon already running.
Oct 08 22:14:19 westfield2 systemd[1]: Failed to start User Manager for UID 119.
Oct 08 22:14:56 westfield2 systemd[1]: Failed to start User Manager for UID 1000.
Oct 08 22:20:23 westfield2 systemd[30270]: Failed unmounting /run/user/0.
Oct 08 22:20:23 westfield2 systemd[30270]: Failed unmounting /.
Oct 08 22:20:53 westfield2 systemd[1]: Failed to start Clean up any mess left by 0dns-up.
Oct 08 22:21:12 westfield2 systemd[1075]: Failed unmounting /run/user/0.
Oct 08 22:21:12 westfield2 systemd[1075]: Failed unmounting /.
Oct 08 22:21:44 westfield2 systemd[1]: Failed to start Clean up any mess left by 0dns-up.
Oct 08 22:21:45 westfield2 systemd[2493]: Failed unmounting /run/user/119.
Oct 08 22:21:45 westfield2 systemd[2493]: Failed unmounting /dev/hugepages.
Oct 08 22:21:45 westfield2 systemd[2493]: Failed unmounting /dev/mqueue.
Oct 08 22:21:45 westfield2 systemd[2493]: Failed unmounting /sys/kernel/debug.
Oct 08 22:22:10 westfield2 sddm-helper[2490]: pam_systemd(sddm-greeter:session): Failed to create session: Connection t$
Oct 08 22:22:18 westfield2 systemd[4921]: Failed unmounting /sys/kernel/debug.
Oct 08 22:22:18 westfield2 systemd[4921]: Failed unmounting /dev/mqueue.
Oct 08 22:22:18 westfield2 systemd[4921]: Failed unmounting /run/user/1000.
Oct 08 22:22:18 westfield2 systemd[4921]: Failed unmounting /dev/hugepages.
Oct 08 22:22:18 westfield2 systemd[4921]: Failed unmounting /run/user/119.
Oct 08 22:22:43 westfield2 sddm-helper[4920]: pam_systemd(sddm:session): Failed to create session: Connection timed out
Oct 08 22:22:49 westfield2 pulseaudio[6468]: [pulseaudio] pid.c: Daemon already running.
Oct 08 22:22:49 westfield2 pulseaudio[6474]: [pulseaudio] pid.c: Daemon already running.
Oct 08 22:24:45 westfield2 systemd[1]: Failed to start User Manager for UID 119.
Oct 08 22:25:19 westfield2 systemd[1]: Failed to start User Manager for UID 1000.
Far too many errors for my small brain to get around!
Any ideas anyone?
CPU~Quad core AMD Phenom 9650 (-MCP-) speed/max~1150/2300 MHz Kernel~4.2.3-towo.1-siduction-amd64 x86_64 Up~40 min Mem~1569.1/7989.8MB HDD~250.1GB(64.9% used) Procs~194 Client~Shell inxi~2.2.28
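
With this many lines it can help to collapse the journal output into a count per process before reading individual messages. This is only a sketch: summarize_journal_errors is a made-up helper name, and it assumes the default short journalctl output format shown above, where the fifth field is the process name followed by an optional [pid].

```shell
#!/bin/sh
# Count journal error lines per originating process (fifth field),
# ignoring the PID, so repeated messages collapse into one row.
# The function name is hypothetical; input is read from stdin.
summarize_journal_errors() {
    awk '
        /^-- / { next }                      # skip the "-- Logs begin ..." header
        NF >= 5 {
            proc = $5
            sub(/\[[0-9]+\]:$/, "", proc)    # strip a trailing "[pid]:"
            sub(/:$/, "", proc)              # or a bare trailing ":"
            count[proc]++
        }
        END { for (p in count) printf "%5d %s\n", count[p], p }
    ' | sort -rn
}

# Usage on a live system:
#   journalctl -b -p err --no-pager | summarize_journal_errors
```

For the log in this post, systemd would come out far ahead of everything else, which at least narrows down where to look first.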
Edit: forgot to add the output of blkid
Code: [Select]
/sbin/blkid                                                                                         
/dev/sda1: UUID="9bd3729b-c15c-48a8-8719-f83c6b97c387" TYPE="ext4" PARTUUID="0007a04f-01"                               
/dev/sda2: LABEL="Home" UUID="e9aa0c2d-6161-422f-875e-a1a6194491c7" TYPE="ext4" PARTUUID="0007a04f-02"                 
/dev/sda3: UUID="40199a5b-aeeb-45eb-9f74-0911df8e7c22" TYPE="swap" PARTUUID="0007a04f-03"
                                                                       
« Last Edit: 2015/10/09, 00:09:11 by clubex »

Offline ghstryder

  • User
  • Posts: 95
Re: Last du screwed my system - again
« Reply #2 on: 2015/10/09, 00:16:50 »
I don't have time right now to document everything, but my data files are on a second drive/partition that is not being mounted automatically. Nothing changed other than my doing the DU. The UUID is still good in fstab (checked with blkid and ls -l /dev/disk/by-uuid/). I can mount the partition manually by clicking in Dolphin, and it will also mount with mount -a.

Offline seasons

  • User
  • Posts: 269
Re: Last du screwed my system - again
« Reply #3 on: 2015/10/09, 01:27:45 »
I would try removing nfs-common package if you don't plan to use/read NFS volumes.

This is the closest Debian bug I could find: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788191

Offline ghstryder

  • User
  • Posts: 95
Re: Last du screwed my system - again
« Reply #4 on: 2015/10/09, 02:06:51 »
Removing nfs-common did not change anything. Everything was fine first thing this morning, when I did an apt-get update and du and there were no new packages, no changes. It is now almost 8 PM (Eastern Time, US).

My data folders live on sdc3 (a conventional drive); they are linked to /home on sda2 (an SSD). Everything works other than my desktop background and Dropbox, because those files are on the unmounted sdc3.

I did a du by going to tty1, becoming root and issuing systemctl isolate multi-user.target. Following the du, which finished uneventfully, I rebooted.
Quote
Start-Date: 2015-10-08  12:42:19
Commandline: apt-get dist-upgrade
Upgrade: man-db:amd64 (2.7.3-1, 2.7.4-1), librsvg2-2:amd64 (2.40.10-1, 2.40.11-1), os-prober:amd64 (1.67, 1.68), libexiv2-14:amd64 (0.25-2, 0.25-2.1), xserver-xorg-input-all:amd64 (7.7+9, 7.7+12), libpam-systemd:amd64 (226-4, 227-1), xserver-xorg-core:amd64 (1.17.2-1.1, 1.17.2-3), udev:amd64 (226-4, 227-1), xserver-common:amd64 (1.17.2-1.1, 1.17.2-3), libspice-server1:amd64 (0.12.5-1.2, 0.12.5-1.3), libudev1:amd64 (226-4, 227-1), libmm-glib0:amd64 (1.4.10-1, 1.4.12-1), python-six:amd64 (1.9.0-5, 1.10.0-1), librsvg2-common:amd64 (2.40.10-1, 2.40.11-1), systemd-sysv:amd64 (226-4, 227-1), libcairomm-1.0-1v5:amd64 (1.10.0-1.2, 1.12.0-1), modemmanager:amd64 (1.4.10-1, 1.4.12-1), systemd:amd64 (226-4, 227-1), libsigc++-2.0-0v5:amd64 (2.6.1-1, 2.6.1-2), xserver-xorg:amd64 (7.7+9, 7.7+12), libnss-myhostname:amd64 (226-4, 227-1), libsystemd0:amd64 (226-4, 227-1), x11-common:amd64 (7.7+9, 7.7+12), python3-six:amd64 (1.9.0-5, 1.10.0-1), libpq5:amd64 (9.4.4-2, 9.4.5-1), exfat-utils:amd64 (1.2.1-1, 1.2.1-2)
End-Date: 2015-10-08  12:42:49

Upon reboot, the screen stayed black for an unusual amount of time before the sddm screen appeared. It still seems to be taking a long time, but I have never timed it. I get an error from Dropbox, since it can't find its data.

Code: [Select]
cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>

#Entry for /dev/sda1 :
UUID=f5236926-cfcd-4df4-9439-f26a29f1d6c9       /       ext4    noatime,commit=120,errors=remount-ro    0       1
#Entry for /dev/sda2 :
UUID=c84955dd-4a65-4bbf-aabb-0166f8af7b26       /home   ext4    defaults,relatime,errors=remount-ro     0       2
#Entry for /dev/sda3 :
UUID=4dabe4a5-4b70-4d93-a4ca-97ae840b1204       none    swap    sw      0       0
#Entry for /dev/sdb1 :
UUID=44228B67228B5D34   /disks/disk2part1       ntfs-3g defaults,auto,users,locale=en_US.utf8   0       0
#Entry for /dev/sdb2 :
UUID=EE96907496903ED1   /disks/disk2part2       ntfs-3g defaults,auto,users,locale=en_US.utf8   0       0
#Entry for /dev/sdc2 :
UUID=030cf640-699f-423f-98ea-f89e06ad9278       /disks/disk3part2       ext4    auto,users,rw,exec,relatime     0       0
#Entry for /dev/sdc3 :
UUID=0b8ed16b-f34f-4e98-aebd-5912ed524de4       /disks/disk3part3       ext4    auto,users,rw,exec,relatime     0       0

The UUIDs have not changed.

Code: [Select]
# blkid /dev/sdc3
/dev/sdc3: LABEL="data" UUID="0b8ed16b-f34f-4e98-aebd-5912ed524de4" TYPE="ext4" PARTUUID="f5fda669-03"

# ls -l /dev/disk/by-uuid
total 0
lrwxrwxrwx 1 root root 10 Oct  8 19:41 0b8ed16b-f34f-4e98-aebd-5912ed524de4 -> ../../sdc3
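
To run the same UUID check for every fstab entry at once rather than one partition at a time, something like the following might do. It is a sketch under assumptions: missing_fstab_uuids is a hypothetical name, and only lines that start with UUID= are considered.

```shell
#!/bin/sh
# Report fstab UUID= entries that have no matching link under
# /dev/disk/by-uuid. Both arguments default to the live system paths.
# The function name is hypothetical.
missing_fstab_uuids() {
    fstab=${1:-/etc/fstab}
    uuid_dir=${2:-/dev/disk/by-uuid}
    # Pull the UUID value out of every line that starts with UUID=.
    sed -n 's/^UUID=\([^[:space:]]*\).*/\1/p' "$fstab" |
    while read -r uuid; do
        # Each known UUID appears as a symlink named after it.
        [ -e "$uuid_dir/$uuid" ] || echo "missing: $uuid"
    done
}

# Usage on a live system (no output means every UUID was found):
#   missing_fstab_uuids
```

Empty output here would back up the point that the UUIDs are fine and the problem lies in whatever is supposed to mount them at boot.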

As I stated earlier, I can open Dolphin, click the Data(sdc3) folder and it will mount after I provide the root password.
If I issue mount /dev/sdc3 it will mount, and if I issue mount -a all the sdc partitions mount.

At that point I can open Dropbox, change the background, etc. Everything seems to work fine unless I reboot.
« Last Edit: 2015/10/09, 02:11:03 by ghstryder »

Offline terroreek

  • User
  • Posts: 202
Re: Last du screwed my system - again
« Reply #5 on: 2015/10/09, 06:28:57 »
I just did a du, boot up is fine, but I found that boot up did take longer, and it took a bit longer for lightdm to start up. 

The one thing I did notice was that systemd-analyze shows it took 3 minutes to boot up. Usually it's between 3 and 7 seconds. systemd-analyze blame shows the two big culprits are:

Code: [Select]
terroreek@darthvader ~ ❯❯❯ systemd-analyze blame                                                                  ⏎
      3min 312ms user@105.service
         30.232s systemd-rfkill.service

When I look at the user@105.service status (systemctl status user@105.service), I see the following:

Code: [Select]
● user@105.service - User Manager for UID 105
   Loaded: loaded (/lib/systemd/system/user@.service; static; vendor preset: enabled)
   Active: failed (Result: timeout) since Fri 2015-10-09 00:14:57 EDT; 4min 27s ago
  Process: 3551 ExecStart=/lib/systemd/systemd --user (code=killed, signal=KILL)
 Main PID: 3551 (code=killed, signal=KILL)

Oct 09 00:13:27 darthvader systemd[3551]: Stopped target Default.
Oct 09 00:13:27 darthvader systemd[3551]: Stopped target Basic System.
Oct 09 00:13:27 darthvader systemd[3551]: Reached target Shutdown.
Oct 09 00:13:27 darthvader systemd[3551]: Stopped target Timers.
Oct 09 00:13:27 darthvader systemd[3551]: Stopped target Sockets.
Oct 09 00:13:27 darthvader systemd[3551]: Stopped target Paths.
Oct 09 00:14:57 darthvader systemd[1]: user@105.service: State 'stop-final-sigterm' timed out. Killing.
Oct 09 00:14:57 darthvader systemd[1]: Failed to start User Manager for UID 105.
Oct 09 00:14:57 darthvader systemd[1]: user@105.service: Unit entered failed state.
Oct 09 00:14:57 darthvader systemd[1]: user@105.service: Failed with result 'timeout'.

I am also seeing the following for systemd-rfkill.service:
Code: [Select]
● systemd-rfkill.service - Load/Save RF Kill Switch Status
   Loaded: loaded (/lib/systemd/system/systemd-rfkill.service; static; vendor preset: enabled)
   Active: failed (Result: timeout) since Fri 2015-10-09 00:12:24 EDT; 12min ago
     Docs: man:systemd-rfkill.service(8)
  Process: 753 ExecStart=/lib/systemd/systemd-rfkill (code=killed, signal=TERM)
 Main PID: 753 (code=killed, signal=TERM)

Oct 09 00:11:54 darthvader systemd[1]: Starting Load/Save RF Kill Switch Status...
Oct 09 00:12:24 darthvader systemd[1]: systemd-rfkill.service: Start operation timed out. Terminating.
Oct 09 00:12:24 darthvader systemd[1]: Failed to start Load/Save RF Kill Switch Status.
Oct 09 00:12:24 darthvader systemd[1]: systemd-rfkill.service: Unit entered failed state.
Oct 09 00:12:24 darthvader systemd[1]: systemd-rfkill.service: Failed with result 'timeout'.


I am thinking it's the systemd update. -edit- Sometimes boot up completely fails and I just get
Code: [Select]
systemd[1]: Failed to subscribe to NameOwnerChanged signal: Connection Timed out
over and over again. 

I believe this is the relevant bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801354
« Last Edit: 2015/10/09, 07:03:21 by terroreek »

Offline seasons

  • User
  • Posts: 269
Re: Last du screwed my system - again
« Reply #6 on: 2015/10/09, 07:20:54 »
Do you folks have rpcbind package installed? (I don't and I'm not experiencing the issue). It may just be a symptom, but there are error messages related to rpcbind in the OP's output and in the bug report linked in the last post. I would try removing it if not needed.

Offline terroreek

  • User
  • Posts: 202
Re: Last du screwed my system - again
« Reply #7 on: 2015/10/09, 07:33:38 »
Quote
Do you folks have rpcbind package installed? (I don't and I'm not experiencing the issue). It may just be a symptom, but there are error messages related to rpcbind in the OP's output and in the bug report linked in the last post. I would try removing it if not needed.

I removed rpcbind and it didn't help. 

linearJim

  • Guest
Re: Last du screwed my system - again
« Reply #8 on: 2015/10/09, 08:21:13 »
So until we find the root cause for this we blame systemd? :P
Why don't you try downgrading it? Maybe xserver-xorg-core too.

Offline cs

  • User
  • Posts: 94
Re: Last du screwed my system - again
« Reply #9 on: 2015/10/09, 08:43:11 »
I ran into the same problems after the systemd upgrade and downgrading systemd from 227 back to 226-4 did the trick.

I got the 226-4 versions of libpam-systemd, libsystemd0, libsystemd0:i386 and systemd from the Debian snapshot archive: http://snapshot.debian.org/

Hope it helps!

cs

linearJim

  • Guest
Re: Last du screwed my system - again
« Reply #10 on: 2015/10/09, 09:01:50 »
Quote
I ran into the same problems after the systemd upgrade and downgrading systemd from 227 back to 226-4 did the trick.

I got the 226-4 versions of libpam-systemd, libsystemd0, libsystemd0:i386 and systemd from debian snapshot http://snapshot.debian.org/

Hope it helps!

cs

nice..so putting it on hold for now
you can also find it in testing repos


Code: [Select]
systemd:
  Installed: 226-4
  Candidate: 227-1
  Version table:
     227-1 0
        500 http://debian.netcologne.de/debian/ unstable/main amd64 Packages
 *** 226-4 0
        500 http://debian.netcologne.de/debian/ testing/main amd64 Packages
        100 /var/lib/dpkg/status

Offline bluelupo

  • User
  • Posts: 2.068
    • BluelupoMe
Re: Last du screwed my system - again
« Reply #11 on: 2015/10/09, 09:16:46 »
Hi all,
I can also confirm the problems with the new version 227-1 of systemd. Either /home is not mounted or the root filesystem is mounted read-only when the system is booted.

Until the problems are fixed, you can put systemd on hold with the following command:
Code: [Select]
# apt-mark hold systemd
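
If you have already upgraded and want to see which packages the du moved to 227-1 (so they can be downgraded and held together), a sketch like this could pull them out of the apt history. packages_upgraded_to is a hypothetical helper name, and it assumes the "Upgrade:" line format of /var/log/apt/history.log as quoted earlier in this thread.

```shell
#!/bin/sh
# Print the packages that an apt history "Upgrade:" line moved to the
# given version, one per line. Reads the line(s) from stdin.
# The function name is hypothetical.
packages_upgraded_to() {
    awk -v v="$1" '
        {
            sub(/^Upgrade: /, "")
            # Entries look like "systemd:amd64 (226-4, 227-1)" and are
            # separated by "), ".
            n = split($0, entry, /\), /)
            for (i = 1; i <= n; i++) {
                # f[1] = package:arch, f[2] = old version, f[3] = new version
                split(entry[i], f, /[ (),]+/)
                if (f[3] == v) print f[1]
            }
        }
    '
}

# Usage on a live system:
#   grep '^Upgrade:' /var/log/apt/history.log | tail -n 1 | packages_upgraded_to 227-1
```

apt-mark showhold lists what is currently held, and apt-mark unhold systemd releases the hold once a fixed version is available.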
« Last Edit: 2015/10/09, 09:22:25 by bluelupo »

Offline der_bud

  • User
  • Posts: 1.072
  • member
Re: Last du screwed my system - again
« Reply #12 on: 2015/10/09, 10:11:48 »
Probably related, perhaps some of you who are affected could add logs or information:
Debian Bug #801354 - libpam-systemd, boot and services fail with "Connection timed out"
Debian Bug #801361 - systemd, attempts to unmount filesystems at end of boot
You're laughing? Why are you laughing? That's often how it goes: people laugh first and then they're dead.

Offline clubex

  • User
  • Posts: 265
Re: Last du screwed my system - again
« Reply #13 on: 2015/10/09, 11:45:31 »
Good to know I'm not the only one.

I'm going into hospital this morning, so hopefully I'll be able to try downgrading systemd some time this evening (providing they don't keep me in).

der_bud: Will do bearing in mind the above.

Offline musca

  • User
  • Posts: 725
  • sid, fly high!
Re: Last du screwed my system - again
« Reply #14 on: 2015/10/09, 12:56:03 »
Hello der_bud,

the Debian team is working on a patched version that will address the two issues:
systemd (227-2) UNRELEASED; urgency=medium
  * Revert "sd_pid_notify_with_fds: fix computing msg_controllen", it causes
    connection errors from various services on boot. (Closes: #801354)
  * debian/tests/boot-smoke: Check for failed unmounts. This reproduces
    #801361 (but not in a minimal VM, just in a desktop one).


The package isn't uploaded yet.

greetings
musca
"Man errs as long as he strives." (Goethe, Faust)