Message boards : Graphics cards (GPUs) : Eight compute errors in a row
Author | Message |
---|---|
So, after finally getting the server to send me work, the next units went down with compute errors. They all failed less than 5-10 seconds after starting and gave me an on-screen error message. I didn't get a chance to copy it down, but an exe was failing to start. The units came from a few different subprojects (KASHIF, IBUCH, etc.). After running so smoothly for over a month, it's weird to suddenly have so many problems. I'll try to capture the error when the server sends more work units. It won't send any right now, saying I exceeded my quota and that the server has no work. | |
ID: 10177 | Rating: 0 | |
At the moment only three projects, GPU Grid, SaH, and SaH Beta, have a "steady" supply of work. The Lattice Project, Ramsey, Aqua, and Milky Way are all in the start-up phase with unknown work availability. Aqua just came online in the last 24 hours. I have not tried it yet, but they have been having problems, so I don't know their status. Milky Way is just getting going, so they do not yet have work. The Lattice Project has always been intermittent with work; they had a limited CUDA test, but I don't know if they are issuing work at this time. Einstein said they were on the verge of a CUDA release, but they have not said anything positive about it in a month or so ... | |
ID: 10186 | Rating: 0 | |
Well, there is another project, "folding@home". They make their own application, which runs on many video cards and also supports SMP, meaning 4 CPUs can work on 1 unit. I am not sure whether the GPU client also uses the CPUs, but it probably will. | |
ID: 10189 | Rating: 0 | |
Yeah, and the last time I tried to install the GPU version of the Folding At Home client (a few weeks ago) it was an impressively complicated install. I've been using computers since I was 4, so I know I can do it, but given the effort involved and the number of changes made to my computer, and since this rig is actually important for some other things, I'm not gonna do it. | |
ID: 10195 | Rating: 0 | |
You will likely get that message for 24 hours ... try Aqua, they may have their issues worked out. I have not tried it myself, but you may have nothing to lose but some time ... There are a couple of threads on the boards about the CUDA experience there, though they don't have much in them yet ... be the first ... start a fashion ... :) | |
ID: 10199 | Rating: 0 | |
anthonmg, | |
ID: 10207 | Rating: 0 | |
anthonmg, | |
ID: 10210 | Rating: 0 | |
Sorry about that. This was originally a thread about a problem related to the main one, but when the second problem cropped up, seemingly unrelated to the first, I started a new thread on it and got different help in each one. It's finally working. | |
ID: 10217 | Rating: 0 | |
Glad to hear the problem is solved! | |
ID: 10233 | Rating: 0 | |
I have one GPU that has a runtime error every week or so. It kills one running WU then kills the next 5 or 6. After a reboot, it runs fine for the next week. | |
ID: 10237 | Rating: 0 | |
If you raised the OC, lower it ... check the fans for dirt, check running temps, make sure you are running one of the "approved" versions of the drivers for your OS ... check for viruses and malware ... and, not to be too flip, reboot the machine every three days ... :) | |
ID: 10239 | Rating: 0 | |
This one GPU does run hotter than my other 3, but I have increased the fan to 90%. | |
ID: 10242 | Rating: 0 | |
I checked one of your hosts and it gets quite a few errors. It's running at 1.40 GHz, whereas the standard is 1.25 GHz. It could be the clock speed. Keep in mind that individual chips differ in their frequency / temperature headroom, and they degrade over time. | |
ID: 10272 | Rating: 0 | |
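To put the two clock figures quoted in the post above (1.40 GHz running, 1.25 GHz standard) into perspective, here is a minimal Python sketch of the arithmetic. The function name `overclock_percent` is made up for illustration and is not part of any project tool; the only inputs are the two numbers from the post.

```python
def overclock_percent(actual_ghz: float, stock_ghz: float) -> float:
    """Return how far the running clock sits above the stock clock, in percent.

    Hypothetical helper for illustration only; both arguments must use
    the same unit (GHz here).
    """
    return (actual_ghz - stock_ghz) / stock_ghz * 100.0


# Figures from the post: 1.40 GHz running versus 1.25 GHz standard.
margin = overclock_percent(1.40, 1.25)
print(f"{margin:.1f}% above the standard clock")
```

A margin of this size does not prove the clock is the cause, but it is large enough that lowering it is a cheap first test.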
This MSI card is factory OC'ed. I have not done any overclocking myself. In fact, EVGA Precision shows it at less than MSI claims, 648 to 655 respectively. | |
ID: 10283 | Rating: 0 | |
outlnder, since that card is running hotter than your other 3 MSI factory-OCed GTX 260s, it most likely has a problem, maybe a heatsink that isn't making good contact. Since it's a very new card, you may want to consider an RMA. Have you tried it in a different machine (not that you have many to spare :-) )? Edit: It looks like you might have already swapped it with a different card, since the failing one is listed as your oldest client. Personally I'd RMA it. | |
ID: 10304 | Rating: 0 | |
The last WU that errored out also caused errors in the Docking WUs being run on the CPU. This tends to tell me that it isn't the GPU that is causing this problem. I will continue to watch it, but I think the OS may be causing the errors. | |
ID: 10322 | Rating: 0 | |
That may be a very useful observation. Swap GPUs with a system that works. If the errors travel with the GPU, it looks like hardware (clock speed / temperature), but if the same machine errors out, then it's not the GPU and an RMA won't help. People frequently blame the OS when anything goes wrong, but mostly that's not the reason. There could be file corruption or a dodgy driver installation, but there are also CPUs overclocked too far, defective memory or, much more commonly, RAM set to the wrong timings, either by the user during OC or by the BIOS in automatic mode. MrS | |
ID: 10502 | Rating: 0 | |
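The swap test described in the post above boils down to a small decision table. The sketch below is only one way to write it down in Python; the function name `interpret_swap_test` and its two flags are hypothetical, and the wording of each verdict paraphrases the post rather than any official diagnostic.

```python
def interpret_swap_test(errors_on_original_machine: bool,
                        errors_on_machine_with_suspect_gpu: bool) -> str:
    """Interpret the result of swapping the suspect GPU into a working system.

    After the swap, the original machine holds a known-good GPU and the
    previously healthy machine holds the suspect GPU. Hypothetical helper
    for illustration only.
    """
    if errors_on_machine_with_suspect_gpu and not errors_on_original_machine:
        # The errors travelled with the card.
        return "GPU hardware (clock speed / temperature); an RMA may help"
    if errors_on_original_machine and not errors_on_machine_with_suspect_gpu:
        # The errors stayed behind with the machine.
        return "not the GPU; check RAM timings, CPU overclock, drivers, file corruption"
    if errors_on_original_machine and errors_on_machine_with_suspect_gpu:
        return "inconclusive; both the card and the machine may have problems"
    return "not reproduced yet; keep both systems running and watch"


# Example: after the swap, the original machine still errors out.
print(interpret_swap_test(True, False))
```

Since the failures described in this thread show up only about once a week, each configuration would need to run for at least that long before the result means much.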