The original post: /r/nvidia by /u/Ok_Cartographer_6086 on 2024-12-20 01:47:19.
I am not a nut.
I posted here a while back that I could predict when my 4090 was about to crash: the screen would flicker, and if I went into the NVIDIA settings and saw the temp at >= 42C, I had to manually set the GPU fans to 100% to cool it down and set PowerMizer to prefer max performance. I also had frequent segmentation faults, usually when running something important to me like an IDE, even though I could game at 260 Hz for hours. It was OS and driver independent: I tried all combinations of Windows, Ubuntu, Debian, and the “don’t break Debian” drivers or the .run scripts.
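For anyone who wants to script that manual workaround on Linux instead of clicking through the settings panel, here is a rough sketch. It only builds the `nvidia-settings` command and prints it, it does not run anything. Assumptions: an X11 session where manual fan control is allowed (this usually needs Coolbits enabled), the attribute names `GPUFanControlState`, `GPUTargetFanSpeed` and `GPUPowerMizerMode` as I understand them from the driver docs, and fan index 0, which may not match your card. The function name and the 42C default are just mine for illustration.

```python
from typing import List, Optional, Sequence


def build_cooling_command(
    temp_c: float,
    threshold_c: float = 42.0,           # the "magic number" from this post
    gpu_index: int = 0,                  # assumption: single-GPU system
    fan_indices: Sequence[int] = (0,),   # assumption: check your card's fan count
) -> Optional[List[str]]:
    """Return an nvidia-settings argv that forces fans to 100% and sets
    PowerMizer to prefer maximum performance, or None below the threshold."""
    if temp_c < threshold_c:
        return None
    cmd = [
        "nvidia-settings",
        # 1 = take manual control of the fans away from the driver
        "-a", f"[gpu:{gpu_index}]/GPUFanControlState=1",
        # 1 = "Prefer Maximum Performance"
        "-a", f"[gpu:{gpu_index}]/GPUPowerMizerMode=1",
    ]
    for fan in fan_indices:
        cmd += ["-a", f"[fan:{fan}]/GPUTargetFanSpeed=100"]
    return cmd


if __name__ == "__main__":
    import shlex

    cmd = build_cooling_command(43)
    # Print instead of executing, so you can inspect the command first.
    print(shlex.join(cmd) if cmd else "below threshold, nothing to do")
```

If the printed command looks right for your setup, you can paste it into a terminal yourself.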
I understand my post was largely dismissed because 42C isn’t that hot for these chips. After a lot of really deep dives into my system, I was convinced there was an issue with the physical unit and that it was definitely not CPU / RAM / MB / OS / driver related.
I pulled off the fans and radiator and replaced the entire cooling system with liquid cooling. The first thing I noticed was the temp settling at around 60C at idle. I’ve never seen the temp that high before, but things are really stable (on Ubuntu).
Before this, the magic number was always 42C right before it crashed. (I’m not sure if there’s any significance or meaning to that number, 6x9 maybe? :)
I think there was something wrong with my temp sensor. Maybe the card was actually way hotter than the reported 42 (! if you know, you know), and taking the cooling job away from the GPU and giving it to the water cooler was the fix.
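One way to sanity-check a sensor like this is to log the reported temperature next to power draw and clocks, and look for samples where the card claims to be cool while pulling a lot of power. Below is a minimal sketch of that idea. It uses `nvidia-smi --query-gpu` with the `timestamp`, `temperature.gpu`, `fan.speed`, `power.draw` and `clocks.sm` fields. The function names and the 45C / 150W cutoffs are made up for illustration, not known-good values for a 4090.

```python
import subprocess
from typing import Dict, Optional

FIELDS = ("timestamp", "temperature.gpu", "fan.speed", "power.draw", "clocks.sm")


def _to_float(text: str) -> Optional[float]:
    """Convert a CSV cell to float; values like '[N/A]' become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line: str) -> Dict[str, object]:
    """Parse one line of `nvidia-smi --format=csv,noheader,nounits` output."""
    parts = [p.strip() for p in line.strip().split(",")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    sample: Dict[str, object] = {"timestamp": parts[0]}
    for name, raw in zip(FIELDS[1:], parts[1:]):
        sample[name] = _to_float(raw)
    return sample


def looks_suspicious(
    sample: Dict[str, object],
    max_temp_c: float = 45.0,   # hypothetical cutoff
    min_power_w: float = 150.0, # hypothetical cutoff
) -> bool:
    """Flag samples that report a low temperature at a high power draw."""
    temp = sample.get("temperature.gpu")
    power = sample.get("power.draw")
    if temp is None or power is None:
        return False
    return bool(temp <= max_temp_c and power >= min_power_w)


def read_sample() -> Dict[str, object]:
    """Query the first GPU once via nvidia-smi (needs the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


if __name__ == "__main__":
    sample = read_sample()
    print(sample, "suspicious" if looks_suspicious(sample) else "ok")
```

A flagged sample doesn't prove the sensor is bad, it's just a hint that the reading is worth comparing against something else, like a probe on the block or the coolant temp.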
What’s your high-end machine’s resting temp? Mine is at 63C right now just on reddit, but it’s a new cooling system, so I still need to add some radiators.