Message boards : Number crunching : lost 4 WUs seconds after enabling gtx-770

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,392,037,468
RAC: 1,112,278
Message 49078 - Posted: 22 Feb 2018 | 14:13:55 UTC

On a system with a GTX 1070 Ti and a lowly GTX 770, I noticed a backlog of 3 GPUGrid work units while the 770 sat idle because Milkyway had run dry. I edited the cc_config.xml setting that prevented GPUGrid from using the 770 and rebooted. All 4 work units crashed seconds after BOINC started up. I didn't have time to suspend anything.

I have run GPUGrid "longs" on 770s before; they were just really slow. I didn't expect this. Even the one that was halfway done died.

Anyway, I can now remove the 1070 TI from this core 2 quad and put it in my new Area51 system. No longer have to wait until the WUs finish to get started with the upgrade. They just didn't finish the way I had hoped.

So why did this happen?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,515,968
RAC: 5,831,839
Message 49079 - Posted: 22 Feb 2018 | 19:59:54 UTC - in response to Message 49078.
Last modified: 22 Feb 2018 | 20:00:24 UTC

"So why did this happen?"
This is a known problem with the present GPUGrid CUDA 8.0 app (v9.18). CC 3.0 cards (like the GTX 770) will produce this error if the host has multiple GPUs with different Compute Capabilities. (Your GTX 1070 Ti is CC 6.1.) You should disable the GTX 770, or move it to a different PC, to avoid this.

See the "App update 17 April 2017" thread and the middle of the 'all WUs downloaded recently produce "computation error" right away' thread.
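
For anyone who wants to keep the GTX 770 in the host for other projects, here is a minimal sketch of a per-project exclusion in cc_config.xml, the kind of setting the original post describes removing. The device number 1 and the project URL are assumptions: check the BOINC event log at startup for the number BOINC assigned to the 770, and use the GPUGrid URL exactly as your client shows it.

```xml
<!-- cc_config.xml in the BOINC data directory -->
<cc_config>
  <options>
    <!-- Keep GPUGrid off one GPU; other projects can still use it -->
    <exclude_gpu>
      <!-- Assumed project URL; must match the one shown in your client -->
      <url>http://www.gpugrid.net/</url>
      <!-- Assumed device number of the GTX 770 -->
      <device_num>1</device_num>
      <type>NVIDIA</type>
    </exclude_gpu>
  </options>
</cc_config>
```

Restart the BOINC client after editing so the exclusion takes effect.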

JStateson
Message 49099 - Posted: 25 Feb 2018 | 16:22:18 UTC

I didn't know that. All I remember is that I used to be able to use GTX 770s; they were just slow. I missed that thread back in April about the problem.

I removed the gtx 1070 and put it into a system that had two others. Had to use liquid cooling as it was in a bad location with little air flow.


I found a new problem that was also unexpected. I switched this system from VNC to Splashtop as I need to be able to access it from out of town and that feature is part of the Splashtop package I am using.


The first time I brought up Splashtop it crashed the GPUGrid work units that were on the two GTX 1070s connected to monitors. The one I had just added was not on a monitor and it continued running its GPUGrid WU just fine. I have never seen this before and I have used Splashtop for several years. I use VNC as well to get around Splashtop's limit of 5 free PCs.

I suspect I should have stopped BOINC from running before installing Splashtop. I assume that gpugrid "registered" for as much vram as it could and nothing was left over for Splashtop but that is just a guess.

I wonder if it would be useful to run GPUGrid with CPU=1 and GPU=0.9 or something like that. I have not had to do anything like that before, and this is the first time I have seen a conflict between Splashtop and CUDA.
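
A minimal sketch of what that would look like as an app_config.xml in the GPUGrid project folder. The app name acemdlong is an assumption; check client_state.xml or the task properties for the actual name. As far as I know, gpu_usage and cpu_usage only tell the BOINC client how to budget resources when scheduling tasks. They do not cap the VRAM or GPU time the app actually takes, so this may not help with a Splashtop conflict.

```xml
<!-- app_config.xml in the GPUGrid project directory -->
<app_config>
  <app>
    <!-- Assumed app name; replace with the name your client reports -->
    <name>acemdlong</name>
    <gpu_versions>
      <!-- Fraction of a GPU budgeted per task (scheduling only) -->
      <gpu_usage>0.9</gpu_usage>
      <!-- CPU cores budgeted per task -->
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```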

mmonnin
Joined: 2 Jul 16
Posts: 337
Credit: 7,711,257,660
RAC: 10,801,436
Message 49100 - Posted: 25 Feb 2018 | 16:40:32 UTC - in response to Message 49099.

If Splashtop is like Microsoft RDP, it loads a separate driver to display the remote desktop, and that driver has no compute capability, so tasks can no longer crunch. That could be the problem. TeamViewer is fine though.

JStateson
Message 49108 - Posted: 26 Feb 2018 | 15:08:08 UTC - in response to Message 49100.

"If splashtop is like Microsoft RDP it loads a separate driver to display the remote desktop which no longer has compute capability to crunch tasks. Could be the problem. Teamviewer is fine though."


Yes, RDP has that problem, as I discovered years ago. I switched to VNC first and then to Splashtop.

Just happened again, this time I lost 6 einstein tasks, 3 on each of the two 1070s connected to monitors.

I uninstalled Splashtop, but it is based on VNC, which I had originally, and there was no problem with VNC. I can look into TeamViewer when I get back into town, but VNC ran fine on this system for the whole 3 weeks I have had it.

I have other, older systems with NVIDIA cards (GTX 670 and 770, a 1060, and 2 other 1070s) and they don't have a problem with Splashtop. Some of them might be running Splashtop's "mirror driver", as I recall being asked whether I wanted to install a mirror driver on at least 1 system.

OK, I just Splashtop'ed to my movies server, which has a 1070, and it did not crash Collatz (GPUGrid has run dry for me). That system does not have the mirror driver; I access it almost daily to record or play movies and have never seen a crashed CUDA WU after logging in. I also just connected to my network storage system, which has a GTX 1060 running a GPUGrid WU, and there was no problem. It does not have the mirror driver either. Both of these systems go through an HDMI switch to a TV. My other systems have only an HDMI dummy load.

I cannot account for why this new Dell system, the only one I have ever bought assembled, cannot handle a Splashtop login without losing work on the GPUs handling the monitors. Something is misconfigured. They all run Win10 x64, and I will look into this in a week or so.
