
Message boards : Number crunching : Blue Screens

tomba
Message 32477 - Posted: 29 Aug 2013 | 16:53:10 UTC

Help!

I've been at this all day. After booting I get many "Display driver stopped responding and has recovered" messages. After six or seven of these I get a blue screen with stop code 0x00000116. Googling tells me it may be my GTX 660, so I replaced it with my GTX 460. Still the same.

If I turn off BOINC, all is sweetness and light!

What to do??


flashawk
Message 32485 - Posted: 29 Aug 2013 | 18:22:46 UTC - in response to Message 32477.

You should upgrade to the latest beta driver, 326.80. You won't be able to run the new tasks without it, and the drivers you're using are buggy.

tomba
Message 32486 - Posted: 29 Aug 2013 | 18:28:58 UTC - in response to Message 32485.


Can you give me a link to that beta driver? Thanks.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 32495 - Posted: 29 Aug 2013 | 19:00:53 UTC - in response to Message 32486.

Just go to nVidia.com, the normal driver search offers it just fine.

MrS
____________
Scanning for our furry friends since Jan 2002

flashawk
Message 32496 - Posted: 29 Aug 2013 | 19:03:49 UTC - in response to Message 32486.

Use the drop downs to pick out your hardware and OS.

http://www.geforce.com/drivers

tomba
Message 32521 - Posted: 30 Aug 2013 | 6:18:24 UTC - in response to Message 32496.

Got it! Thanks.

I installed 326.80. No change. With BOINC inactive everything works fine. With it active I get a blue screen one minute into every completed boot.

The active WU, which I've had for pushing two days, is here.

Should I try another GPU project, to see if it's a GPUGrid problem?

tomba
Message 32522 - Posted: 30 Aug 2013 | 7:15:48 UTC

Looks like I'm out of the woods :)

I suspended GPUGrid, got a POEM which finished and gave credit.

I aborted the active NATHAN and got another. It's been running for 15 minutes with no hiccough.

Bewildering!

skgiven
Volunteer moderator
Volunteer tester
Message 32533 - Posted: 30 Aug 2013 | 9:01:30 UTC - in response to Message 32522.

The present situation is: not much work around, lots of testing (8.02 in the short queue), NOELIA WUs in the Beta queue, and lots of posts.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Danthro Krose
Message 32650 - Posted: 3 Sep 2013 | 1:51:46 UTC - in response to Message 32521.

I've also been getting the same error lately on GPUGrid tasks, including the BSOD.

GTX 570

localizer
Message 32651 - Posted: 3 Sep 2013 | 5:37:24 UTC

... I get the 'driver stopped responding' error and recovery, and my system eventually bluescreens. This is with the 326.80 drivers.

I believe that in my case it only happens with the NOELIA WUs.

For now, I abort the offending WU; however, my system is never quite the same until after a full reboot, not to mention that my RAID array is forced to verify and repair.

Going to give these WUs a wide berth for the time being; happily crunching NATHANs and SANTIs at the moment...

werdwerdus
Message 32669 - Posted: 4 Sep 2013 | 4:52:51 UTC

Here's a tip for aborting a work unit that crashes or freezes your PC:

Start the PC in safe mode. BOINC will see no GPU available, so it can't start the work unit, and you can then abort the work unit without worrying about a freeze or crash. Reboot to normal mode and request a new task.
____________
XtremeSystems.org - #1 Team in GPUGrid
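For anyone who would rather not reboot into safe mode, here is a minimal sketch of a different technique: using BOINC's `boinccmd` command-line tool to stop GPU work, abort the offending task, and then re-enable GPU work. It assumes `boinccmd` is on the PATH and can authenticate to the local client (depending on the BOINC version, that may mean running it from the BOINC data directory), and it only helps if you can issue the commands before the driver falls over. The helper name and the task name are hypothetical placeholders, and the script only prints the command lines rather than running them.

```python
import shlex
from typing import List

# HYPOTHETICAL helper: it only *builds* boinccmd command lines; it does not execute them.
# Assumes boinccmd is on PATH and can authenticate to the local BOINC client.


def gpu_safe_abort_commands(project_url: str, task_name: str) -> List[List[str]]:
    """Return boinccmd invocations: stop GPU work, abort one task, re-enable GPU work."""
    return [
        # Stop all GPU computing so the problem task cannot (re)start on the GPU.
        ["boinccmd", "--set_gpu_mode", "never"],
        # Abort the offending task; task_name is the task (result) name shown in BOINC Manager.
        ["boinccmd", "--task", project_url, task_name, "abort"],
        # Put GPU computing back under normal preference-based scheduling.
        ["boinccmd", "--set_gpu_mode", "auto"],
    ]


if __name__ == "__main__":
    # Placeholder task name; substitute the real one from BOINC Manager's Tasks tab.
    for cmd in gpu_safe_abort_commands("http://www.gpugrid.net/", "EXAMPLE_TASK_NAME"):
        print(shlex.join(cmd))
```

Run the printed lines in order from a command prompt. The safe-mode route above remains the more robust option when the crash comes within seconds of BOINC starting.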

nanoprobe
Message 32751 - Posted: 5 Sep 2013 | 21:58:21 UTC - in response to Message 32522.

The 0x116 BSOD is most often caused by low IOH (northbridge) voltage. On the GPU side, it most commonly shows up when running multiple GPUs or an overclocked GPU.
Hope this helps.

Dave_In_Oz
Message 32820 - Posted: 7 Sep 2013 | 12:03:50 UTC

I have had similar display driver aborts on one of my Win 7 systems with 2 x GTX 670 cards, and they lead to the machine locking up completely. I only get the aborts and lock-ups when BOINC is running.

I have tried resetting the project (GPUGrid), moving between WHQL and beta drivers, and different combinations of BOINC client (latest beta and general release).

I'm also getting "output file absent" errors on my other Win 7 system with two GTX 770s.

Not sure if they are related. Resetting the project has not fixed it.

Any ideas?

Jim1348
Message 32821 - Posted: 7 Sep 2013 | 13:33:59 UTC - in response to Message 32820.
Last modified: 7 Sep 2013 | 13:35:03 UTC

Is your power supply sufficient for two GTX 670 cards?
I would try running just one of them to see if it solves the problem.
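A rough way to answer the power supply question is to add up the estimated sustained draw and compare it with a derated PSU rating. The sketch below assumes NVIDIA reference board-power figures (170 W for a GTX 670, 230 W for a GTX 770, 365 W for a GTX 590), a hypothetical 150 W for the rest of the system, and an 80% sustained-load rule of thumb. All of these are assumptions to replace with your own numbers.

```python
from typing import Sequence, Tuple

# ASSUMED reference board-power (TDP) figures in watts; check your own card's spec sheet.
ASSUMED_GPU_WATTS = {"GTX 670": 170, "GTX 770": 230, "GTX 590": 365}


def psu_headroom(psu_watts: int, gpu_watts: Sequence[int], other_watts: int,
                 max_load_percent: int = 80) -> Tuple[int, int, bool]:
    """Estimate sustained load against a derated PSU budget.

    Returns (estimated load in W, usable budget in W, whether the load fits).
    max_load_percent=80 is a rule-of-thumb derating, not a specification.
    """
    load = sum(gpu_watts) + other_watts
    # Integer arithmetic keeps the budget exact (no float rounding).
    budget = psu_watts * max_load_percent // 100
    return load, budget, load <= budget


if __name__ == "__main__":
    # Hypothetical host: two GTX 670s, ~150 W for CPU, board, drives and fans, 1100 W PSU.
    two_670s = [ASSUMED_GPU_WATTS["GTX 670"]] * 2
    print(psu_headroom(1100, two_670s, other_watts=150))  # (490, 880, True)
```

A total that fits the budget only rules out gross undersizing; it says nothing about whether the PSU itself is healthy.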

Dave_In_Oz
Message 32824 - Posted: 8 Sep 2013 | 0:29:17 UTC - in response to Message 32821.

I hope so, it is 1100 Watts :)

On the system with the dual 670s I ran a Windows upgrade install last night, and it appears to be completing CUDA 5 tasks now.

Might try the same on the one with the dual 770s.

juan BFP
Message 32825 - Posted: 8 Sep 2013 | 0:46:34 UTC

I run the NVIDIA 326.80 driver on a 770 + 670 host (no OC) with an i5-2310 and a real 750 W PSU, and it works fine.

When you installed the NVIDIA driver, did you remember to do a clean installation?

Rick A. Sponholz
Message 32840 - Posted: 8 Sep 2013 | 16:50:43 UTC - in response to Message 32477.

I was having the same problem with one of my five GPU-equipped machines. GPUGRID was the only project causing the blue screens. After dozens of failed attempts to find the cause (sorry for all the failed WUs), I removed one of my extra fans to lower the current draw on my power supply, and the problem went away. I would suggest you look into the size of your power supply, especially if you're in a warm area and your computer fans are running overtime.

Hope this helps, Rick

Operator
Message 33255 - Posted: 29 Sep 2013 | 14:55:58 UTC

I have a system with two water-cooled GTX 590s in it.

A few months ago it had a PSU failure under warranty and I let the system sit for months while other things took priority. I got around to fixing the system and updated all the drivers, etc.

I was running the latest 327.23 driver and was experiencing BSODs repeatedly.

Always stop code 0x116, with a memory dump, etc.

I decided to investigate and downloaded the MS debugging tools.

When I ran the debugger on the latest memory dump it pointed to nvlddmkm.sys as the culprit.

Here's the output of the debugger (sorry this is so long):

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa80530b14e0, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff88005c3c6b4, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 000000000000000d, Optional internal context dependent data.

Debugging Details:
------------------


FAULTING_IP:
nvlddmkm+14d6b4
fffff880`05c3c6b4 803d4e3e6a0000 cmp byte ptr [nvlddmkm!nvDumpConfig+0x1c1149 (fffff880`062e0509)],0

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

ANALYSIS_VERSION: 6.3.9431.0 (debuggers(dbg).130615-1214) amd64fre

STACK_TEXT:
fffff880`027d2758 fffff880`0693e054 : 00000000`00000116 fffffa80`530b14e0 fffff880`05c3c6b4 00000000`00000000 : nt!KeBugCheckEx
fffff880`027d2760 fffff880`0693dddb : fffff880`05c3c6b4 fffffa80`530b14e0 fffffa80`53dd8300 fffffa80`28379010 : dxgkrnl!TdrBugcheckOnTimeout+0xec
fffff880`027d27a0 fffff880`0680ff13 : fffffa80`530b14e0 00000000`00000000 fffffa80`53dd8300 fffffa80`28379010 : dxgkrnl!TdrIsRecoveryRequired+0x21f
fffff880`027d27d0 fffff880`0683ded6 : 00000000`ffffffff 00000000`0000b9ef fffff880`027d2930 00000000`00000102 : dxgmms1!VidSchiReportHwHang+0x40b
fffff880`027d28b0 fffff880`06839e21 : fffffa80`295b8000 ffffffff`feced300 fffffa80`542d9c20 fffff880`0681e9d3 : dxgmms1!VidSchWaitForCompletionEvent+0x196
fffff880`027d28f0 fffff880`06839fd9 : fffffa80`295b8000 fffffa80`28379010 fffffa80`295b8d00 00000000`00000000 : dxgmms1!VidSchiWaitForCompletePreemption+0x7d
fffff880`027d29e0 fffff880`06838eb8 : 00000000`00000014 00000000`00002624 fffffa80`542d9c20 fffffa80`28379010 : dxgmms1!VidSchiSendToExecutionQueueWithWait+0x171
fffff880`027d2ae0 fffff880`06838514 : fffff880`038b6400 fffff880`06837f00 fffffa80`00000000 fffffa80`00000004 : dxgmms1!VidSchiSubmitRenderCommand+0x920
fffff880`027d2cd0 fffff880`06838012 : 00000000`00000000 fffffa80`53dd8300 00000000`00000080 fffffa80`28379010 : dxgmms1!VidSchiSubmitQueueCommand+0x50
fffff880`027d2d00 fffff800`03517bae : 00000000`057eabfe fffffa80`28e19b50 fffffa80`24ffa040 fffffa80`28e19b50 : dxgmms1!VidSchiWorkerThread+0xd6
fffff880`027d2d40 fffff800`0326a8c6 : fffff880`038b1180 fffffa80`28e19b50 fffff880`038bc4c0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`027d2d80 00000000`00000000 : fffff880`027d3000 fffff880`027cd000 fffff880`0ede8d70 00000000`00000000 : nt!KxStartSystemThread+0x16


STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
nvlddmkm+14d6b4
fffff880`05c3c6b4 803d4e3e6a0000 cmp byte ptr [nvlddmkm!nvDumpConfig+0x1c1149 (fffff880`062e0509)],0

SYMBOL_NAME: nvlddmkm+14d6b4

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 52314e10

FAILURE_BUCKET_ID: X64_0x116_IMAGE_nvlddmkm.sys

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:x64_0x116_image_nvlddmkm.sys

FAILURE_ID_HASH: {1f9e0448-3238-5868-3678-c8e526bb1edc}

Followup: MachineOwner
---------

14: kd> lmvm nvlddmkm
start end module name
fffff880`05aef000 fffff880`065e6000 nvlddmkm (export symbols) nvlddmkm.sys
Loaded symbol image file: nvlddmkm.sys
Image path: \SystemRoot\system32\DRIVERS\nvlddmkm.sys
Image name: nvlddmkm.sys
Timestamp: Thu Sep 12 00:16:00 2013 (52314E10)
CheckSum: 00AC9E83
ImageSize: 00AF7000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4

So I've gone back to the 314.22 driver as a safeguard and we'll see how that goes. I'm looking for stability more than speed at this point. No overclocking.

Operator
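If you save the text of a WinDbg `!analyze -v` session like the one above, a few lines of Python can pull out the fields that name the stop code and the module blamed, which makes it easier to compare dumps across several crashes. This is a minimal sketch assuming the `KEY: value` layout shown in the output above; the choice of fields and the function name are illustrative, and other debugger versions may lay the report out differently.

```python
import re
from typing import Dict

# Fields from the "!analyze -v" report that identify the stop code and the blamed module.
FIELDS = ("BUGCHECK_STR", "DEFAULT_BUCKET_ID", "MODULE_NAME", "IMAGE_NAME", "FAILURE_BUCKET_ID")

# An upper-case KEY, a colon, spaces/tabs, then a value on the same line.
# Using [ \t] instead of \s stops an empty field (e.g. "STACK_TEXT:") from grabbing the next line.
FIELD_RE = re.compile(r"^[ \t]*([A-Z_]+):[ \t]+(.+)$", re.MULTILINE)


def summarize_analyze_output(text: str) -> Dict[str, str]:
    """Return the first value seen for each field of interest."""
    found: Dict[str, str] = {}
    for key, value in FIELD_RE.findall(text):
        if key in FIELDS and key not in found:
            found[key] = value.strip()
    return found


if __name__ == "__main__":
    # Excerpt of the report quoted above.
    excerpt = """
DEFAULT_BUCKET_ID:  GRAPHICS_DRIVER_TDR_FAULT
BUGCHECK_STR:  0x116
MODULE_NAME: nvlddmkm
IMAGE_NAME:  nvlddmkm.sys
FAILURE_BUCKET_ID:  X64_0x116_IMAGE_nvlddmkm.sys
"""
    for key, value in summarize_analyze_output(excerpt).items():
        print(f"{key}: {value}")
```

If every dump points at the same image with the same bucket, that is consistent with the driver-level TDR failure described above; it does not by itself distinguish a driver bug from a hardware or power problem.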


ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 33268 - Posted: 29 Sep 2013 | 20:12:39 UTC - in response to Message 33255.

What I'd try first:
- check and clean the internals, if necessary
- remove one GPU
- if it works, switch them
- if it works, put both back in and lower GPU clocks and voltages, first slightly then significantly
- try other projects

MrS

Matt
Message 33272 - Posted: 30 Sep 2013 | 3:24:33 UTC
Last modified: 30 Sep 2013 | 3:27:48 UTC

I had one of these errors today. This WU caused my driver to repeatedly fail and recover only to fail immediately again.

http://www.gpugrid.net/workunit.php?wuid=4807784

As soon as I was able to abort it between screen blackouts, the problem went away. I don't believe power is an issue, as I'm several hundred watts below my power supply's rating and it's quite cool in the room. The temps on my GPUs are reading normal (60-65 °C) as well.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 33311 - Posted: 1 Oct 2013 | 20:52:18 UTC - in response to Message 33272.

If your machine was fine again afterwards, it was very likely not a problem on your side, but rather the WU + app being in some strange state.

MrS
