[SOLVED] Nvidia graphics, xorg memory leak since kernel 4.19.2

Started by dibl, 2018/12/01, 00:15:43

dibl

I have 4 siduction systems, only 1 has Nvidia graphics -- the others are all Intel graphics.  The one with Nvidia graphics is using the nvidia-driver.

On the system with Nvidia graphics, I have been seeing a serious memory leak since at least kernel 4.19.2; it may have started earlier.  The culprit is Xorg: the machine has 32 GB of memory, and if I leave it running overnight, by morning it has started to swap, with Xorg as the process holding the memory.

Does anyone else see this issue?
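
To put numbers on a leak like this, one option is to sample the X server's resident memory at intervals and compare the readings in the morning. The following is only a minimal sketch, not something posted in this thread: the function names `rss_kib` and `log_rss_once`, the log path, and the assumption that a single server process named `Xorg` is running are all illustrative.

```shell
#!/bin/sh
# Print the resident set size (VmRSS, in KiB) found in
# /proc/<pid>/status-style text read from stdin.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }'
}

# Append one "epoch_seconds rss_kib" line for the given PID to a log file.
# Both parameters are hypothetical names chosen for this sketch.
log_rss_once() {
    pid=$1
    logfile=$2
    printf '%s %s\n' "$(date +%s)" "$(rss_kib < "/proc/$pid/status")" >> "$logfile"
}

# Example (assumes the X server process is named Xorg):
#   while sleep 600; do
#       log_rss_once "$(pgrep -o -x Xorg)" "$HOME/xorg-rss.log"
#   done
```

A steadily rising second column over several hours of idle time would support the leak diagnosis; a flat one would point elsewhere.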

System76 Oryx Pro, Intel Core i7-11800H, ASRock B860 Pro-A, Intel Core Ultra 7 265KF, Nvidia GTX-1060, SSD 990 EVO Plus.

piper

Which nvidia driver are you using?

I cannot reproduce this with experimental 396.54-2

The only thing I can find is

https://devtalk.nvidia.com/default/topic/1043938/nvidia-geforce-4xx-series-drivers-segfault-with-kernel-4-19-x/

but that is not related here (driver-wise).

I have a Lucky Rabbit:    "Svoot" ..... (It's Swedish)

I am MAGA

dibl

don@dibl-patience:~$ inxi -G
Graphics:
  Device-1: NVIDIA GP106 [GeForce GTX 1060 6GB] driver: nvidia v: 390.87
  Display: x11 server: X.Org 1.20.3 driver: nvidia
  resolution: 1920x1200~60Hz, 1920x1080~60Hz
  OpenGL: renderer: llvmpipe (LLVM 7.0 256 bits) v: 3.3 Mesa 18.2.6



Is that v. 396 driver in the experimental repo?

piper

07:29:04 PM-Fri Nov 30-[piper@x1] $ inxi -G
Graphics:  Device-1: NVIDIA GM204 [GeForce GTX 970] driver: nvidia v: 396.54
           Display: x11 server: X.Org 1.20.3 driver: nvidia unloaded: fbdev,modesetting,nouveau,vesa
           resolution: 1920x1080~60Hz
           OpenGL: renderer: GeForce GTX 970/PCIe/SSE2 v: 4.6.0 NVIDIA 396.54

dibl

OK, fully updated, Torsten's latest kernel installed, and the 396 Nvidia driver from experimental.

System:
  Host: dibl-patience Kernel: 4.19.5-towo.3-siduction-amd64 x86_64 bits: 64
  Desktop: KDE Plasma 5.14.3 Distro: siduction 18.3.0 Patience - kde - (201805132121)
Machine:
  Type: Desktop Mobo: ASUSTeK model: ROG STRIX X299-E GAMING v: Rev 1.xx
  serial: <filter> UEFI: American Megatrends v: 1401 date: 05/21/2018
CPU:
  Topology: Quad Core model: Intel Core i7-7740X bits: 64 type: MT MCP
  L2 cache: 8192 KiB
  Speed: 3668 MHz min/max: 800/4500 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800
  5: 800 6: 800 7: 800 8: 800
Graphics:
  Device-1: NVIDIA GP106 [GeForce GTX 1060 6GB] driver: nvidia v: 396.54
  Display: x11 server: X.Org 1.20.3 driver: nvidia
  resolution: 1920x1200~60Hz, 1920x1080~60Hz
  OpenGL: renderer: llvmpipe (LLVM 7.0 256 bits) v: 3.3 Mesa 18.2.6
Audio:
  Device-1: Intel 200 Series PCH HD Audio driver: snd_hda_intel
  Device-2: NVIDIA GP106 High Definition Audio driver: snd_hda_intel
  Device-3: Logitech HD Pro Webcam C920 type: USB driver: snd-usb-audio,uvcvideo
  Sound Server: ALSA v: k4.19.5-towo.3-siduction-amd64
Network:
  Device-1: Intel Ethernet I219-V driver: e1000e
  IF: enp0s31f6 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: br0 state: up speed: N/A duplex: N/A mac: <filter>
Drives:
  Local Storage: total: 2.27 TiB used: 910.28 GiB (39.1%)
  ID-1: /dev/sda vendor: Samsung model: SSD 860 EVO 500GB size: 465.76 GiB
  ID-2: /dev/sdb vendor: Western Digital model: WD1003FZEX-00K3CA0 size: 931.51 GiB
  ID-3: /dev/sdc vendor: Western Digital model: WD1003FZEX-00K3CA0 size: 931.51 GiB
Partition:
  ID-1: / size: 58.81 GiB used: 10.19 GiB (17.3%) fs: ext4 dev: /dev/sda2
  ID-2: /home size: 393.18 GiB used: 51.12 GiB (13.0%) fs: ext4 dev: /dev/sda4
  ID-3: swap-1 size: 5.00 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda3
Sensors:
  System Temperatures: cpu: 35.0 C mobo: N/A gpu: nvidia temp: 33 C
  Fan Speeds (RPM): cpu: 0 gpu: nvidia fan: 0%
Info:
  Processes: 265 Uptime: 2m Memory: 31.33 GiB used: 1.72 GiB (5.5%) Shell: bash
  inxi: 3.0.27


I will let it run overnight and see what I have in the morning.  Thanks, piper!

towo

driver: nvidia v: 396.54
OpenGL: renderer: llvmpipe (LLVM 7.0 256 bits) v: 3.3 Mesa 18.2.6


You see the problem?
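
The mismatch being pointed at is that the kernel driver is nvidia while the OpenGL renderer is llvmpipe, Mesa's software rasterizer. A hedged sketch of a quick check follows; the function name `uses_hardware_gl` is made up for illustration, and it only looks at the `OpenGL renderer string:` line that `glxinfo` prints.

```shell
#!/bin/sh
# Read `glxinfo` output on stdin.
# Exit 0 if the renderer is not llvmpipe, 1 if it is llvmpipe
# (software rendering), 2 if no renderer line was found.
uses_hardware_gl() {
    renderer=$(grep -m1 '^OpenGL renderer string:' || true)
    case $renderer in
        '')         return 2 ;;
        *llvmpipe*) return 1 ;;
        *)          return 0 ;;
    esac
}

# Example:
#   glxinfo | uses_hardware_gl && echo "hardware GL" || echo "check the GLX setup"
```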
I don't go to Carnival, I just lend out my face sometimes.

piper

I'm wondering if he has a

nvidia-blacklists-nouveau.conf

in

/etc/modprobe.d


I always just do a

apt purge xserver-xorg-video-nouveau

But, I don't recommend that to newer users
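
For anyone who wants to check the blacklist without purging anything, here is a small sketch. The function name `nouveau_blacklist_files` is hypothetical, and the pattern assumes the usual `blacklist nouveau` line format used in modprobe.d files.

```shell
#!/bin/sh
# List the .conf files in a modprobe.d directory (default: /etc/modprobe.d)
# that contain a "blacklist nouveau" line.
nouveau_blacklist_files() {
    dir=${1:-/etc/modprobe.d}
    grep -lsE '^[[:space:]]*blacklist[[:space:]]+nouveau' "$dir"/*.conf
}

# Example:
#   nouveau_blacklist_files             # which files blacklist nouveau
#   lsmod | grep nouveau                # no output means the module is not loaded
```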

dibl

Thanks towo -- I was a little too fast with the driver from experimental, and forgot the mesa packages.  Does this look better?

Graphics:
  Device-1: NVIDIA GP106 [GeForce GTX 1060 6GB] driver: nvidia v: 396.54
  Display: x11 server: X.Org 1.20.3 driver: nvidia
  resolution: 1920x1200~60Hz, 1920x1080~60Hz
  OpenGL: renderer: llvmpipe (LLVM 7.0 256 bits) v: 3.3 Mesa 18.3.0-rc5


don@dibl-patience:/$ glxinfo | grep OpenGL
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 7.0, 256 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.3.0-rc5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 18.3.0-rc5
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.3.0-rc5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:



I do have the nouveau driver blacklisted and the package is not installed -- thanks piper.  Now that I fixed the OpenGLX version situation, it will take some time to see if the memory issue is fixed.

piper

I question the "blacklist thing" only because of what I read at openSUSE from a few people ;)

towo

OpenGL: renderer: llvmpipe (LLVM 7.0 256 bits) v: 3.3 Mesa 18.3.0-rc5
No, that's still wrong!
Your path to the right glx lib seems to be wrong. According to the changelog of the nvidia driver, there was a packaging error, which should be fixed in 296.87-3:

Quote
Revert move of libglx.so to /usr/lib/nvidia/current/modules.
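
One way to see which GLX module the X server actually loaded is to look at the Xorg log. The sketch below is an assumption-laden illustration: the function name `glx_module_vendor` is invented here, and it relies on the log containing a line of the form `Module glx: vendor="..."`, which is what Xorg logs typically show. The log itself is usually at /var/log/Xorg.0.log or ~/.local/share/xorg/Xorg.0.log, depending on how X is started.

```shell
#!/bin/sh
# Read an Xorg log on stdin and print the vendor of the last glx module
# that was loaded (e.g. "NVIDIA Corporation" or "X.Org Foundation").
glx_module_vendor() {
    sed -n 's/.*Module glx: vendor="\([^"]*\)".*/\1/p' | tail -n 1
}

# Example:
#   glx_module_vendor < /var/log/Xorg.0.log
#   find /usr/lib -name 'libglx.so*'    # where the candidate libraries live
```

If the vendor printed is the X.Org one while the nvidia driver is in use, that would be consistent with the wrong libglx.so being picked up.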

I switched to ATI, so I can't investigate further.

piper

What I would do first

apt purge nvidia.

Yes, use the period  .

apt update; apt install nvidia-driver nvidia-driver-libs nvidia-settings nvidia-smi nvidia-kernel-dkms -t experimental

If you don't need/use thermal monitors, you don't have to include nvidia-smi.

Post back if this doesn't work.
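
On why the period in the purge command matters: as described in apt-get(8) for apt versions of that era, an argument that matches no package name and contains '.', '?' or '*' is treated as a POSIX regular expression and matched by substring against all package names. The sketch below only imitates that matching with grep so the effect can be seen safely; the function name `match_packages` is made up, and later apt releases may treat such arguments differently.

```shell
#!/bin/sh
# Read package names on stdin (one per line) and print those that a
# substring POSIX regex match on the given pattern would select.
match_packages() {
    grep -E -- "$1"
}

# Example against the installed package list:
#   dpkg-query -W -f='${Package}\n' | match_packages 'nvidia.'
# A simulated run shows what would really be removed without doing it:
#   apt-get -s purge 'nvidia.'
```

Note that the match is by substring, so names such as libnvidia-ml1 are selected as well, while a package named exactly "nvidia" would not be, since the period requires one more character.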

dibl

Great -- thanks guys.  I knew something was wrong -- xorg continues to slowly suck up memory, even when no user applications are running.  I will try these ideas.