
Message boards : Graphics cards (GPUs) : nVidia driver 340.52

Profile Misfit
Message 37718 - Posted: 23 Aug 2014 | 2:08:16 UTC

Ever since I installed these newest drivers for my MSI GTX 670 I've been spitting out nothing but cuda60 errors, and even a few cuda42 errors. I did a complete wipe of the drivers and went back to the previous 337.88 drivers, and cuda60 tasks are validating again. Have there been any problems reported with the 670 and the 340.52 drivers?
____________
me@rescam.org

Jacob Klein
Message 37719 - Posted: 23 Aug 2014 | 3:31:32 UTC - in response to Message 37718.
Last modified: 23 Aug 2014 | 3:39:22 UTC

I have not been having those problems with the newer drivers. But, the newer drivers do push the GPUs harder, and it's possible that your current clocks can't handle them. Please read on.

First, use a tool like Precision-X to set up a fan curve where the GPU reaches max fan before 70*C. So, at 69*C, it should be at max fan.
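If it helps to picture the curve, here's a rough sketch in Python of the idea (the temperature/fan points are made-up example values, not a recommendation; the only real requirement is hitting 100% fan before 70*C):

# Example fan curve as (temperature in C, fan speed in %) points -- hypothetical values,
# chosen so the fan reaches 100% before the 70 C clock-limiting threshold.
FAN_CURVE = [(30, 30), (50, 50), (60, 75), (69, 100)]

def fan_speed(temp_c, curve=FAN_CURVE):
    """Linearly interpolate the fan speed for a given GPU temperature."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
    return 100  # at or above the last point, the fan runs flat out

print(fan_speed(65))  # about 89% with this example curve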

Then try this: install the newer drivers, and see if you can run the Heaven 4.0 benchmark at 1920x1080, Ultra Quality, Extreme Tessellation, 8x Antialiasing, for 5 hours solid with no crashes and no watchdog .dmp files reported in C:\Windows\LiveKernelReports\WATCHDOG. If you do get a crash, use a tool like Precision-X to back off the GPU core clock by 13 MHz. Keep backing off in 13 MHz intervals until you can run Heaven at those settings for 5+ hours with no crashes. Take notes on how much you needed to back it off, so you can remember in the future. :)

For a GPU that is already completely stable at its default clocks, you can use the same process in the other direction: increase the GPU core clock step by step to find the maximum clock before Heaven yields errors.
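In loop form, the back-off procedure amounts to something like this (a sketch only; set_core_offset and run_heaven_for_hours are hypothetical stand-ins for adjusting the offset in Precision-X/Afterburner and for running Heaven yourself and checking for crashes and watchdog dumps):

STEP_MHZ = 13  # Kepler clocks move in 13 MHz steps

def find_stable_offset(set_core_offset, run_heaven_for_hours,
                       start_offset=0, min_offset=-130, hours=5):
    """Walk the core-clock offset down one 13 MHz step at a time
    until a long Heaven run survives with no crashes."""
    offset = start_offset
    while offset >= min_offset:
        set_core_offset(offset)           # stand-in: apply the offset in Precision-X/Afterburner
        if run_heaven_for_hours(hours):   # stand-in: True if no crash and no WATCHDOG .dmp files
            return offset                 # the first offset that survives the full run
        offset -= STEP_MHZ                # back off one step and try again
    raise RuntimeError("no stable offset found in the tested range")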

I did this procedure for both of the GTX 660 Ti GPUs in my system. I discovered that my EVGA GTX 660 Ti 3GB FTW was factory-overclocked too much -- I had to downclock it 52 MHz for it to be completely stable in Heaven, and now it is completely stable in GPUGrid and also in iRacing! But my MSI GTX 660 Ti 3GB OC was factory-overclocked conservatively -- I discovered that I could overclock it another 39 MHz with no problems, so now it crunches GPUGrid a little faster.

It's just a hunch that the drivers are pushing the cards too hard, but seriously, use Heaven to see if you can get the clocks into a "completely stable 24/7" clock setting. Then, once you are sure it is completely stable, test against GPUGrid tasks. I'm hopeful that the procedure solves your problem.

Regards,
Jacob

Profile Misfit
Message 37720 - Posted: 23 Aug 2014 | 8:53:27 UTC - in response to Message 37719.

I'm puking up the cuda60 errors again. One thing that stands out is that they crash immediately with CPU time = 0. I've been using MSI Afterburner with auto fan control, and the highest the temp has gotten is 79C. However, with these units crashing right away the GPU isn't given a chance to get hot. Nor are the cuda42 tasks having this insta-crash problem.

To get the temps below 70C with these fans I'm gonna need earplugs.

Last three with this error:
Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741819 (0xc0000005)
</message>
]]>


The one before that:
Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
The extended attributes are inconsistent.
(0xff) - exit code 255 (0xff)
</message>
]]>


(and the one before that was somehow killed when I rebooted into safe mode.)

Jacob Klein
Message 37721 - Posted: 23 Aug 2014 | 9:02:46 UTC

Have you tried the Heaven 4.0 suggestions I gave?
Have you tried running with CPU projects suspended?

Profile Misfit
Message 37723 - Posted: 23 Aug 2014 | 18:32:34 UTC - in response to Message 37721.

Have you tried the Heaven 4.0 suggestions I gave?
Have you tried running with CPU projects suspended?

1) Yes, but so far I haven't been able to get through a 5-hour segment with the gaming I do at night, and I work during the day. It's only been one day since you made that suggestion.

2) If you mean the CPU portion of this project, I don't run those. If you mean other projects like WCG, no. If this project can't work alongside other CPU projects I'll dump this one, or at least I will once it has moved past cuda42.

But let me ask you a question about Heaven. Will that tell me why they are crashing immediately?

Jacob Klein
Message 37725 - Posted: 23 Aug 2014 | 20:39:20 UTC

It will tell you immediately if your GPU is unstable.

Profile Misfit
Message 37732 - Posted: 25 Aug 2014 | 5:14:48 UTC - in response to Message 37725.

It will tell you immediately if your GPU is unstable.

Too bad it only allows single diagnostic runs. Anything at -26 MHz or less will crash immediately. I was doing diagnostics of 10 runs in a row down to -35 and thought I had found a sweet spot at -31 based on FPS and score. Then at that speed I was only able to run it for a few hours before the 'computer hog' family complaints started, so I had to abort. It looks like during the night the cuda60 was still failing with 0 CPU time, so I'll be at it again.

I do however have games that will run with Lucid Virtu MVP now that were crashing before. So at least that problem has been solved.

Jacob Klein
Message 37735 - Posted: 25 Aug 2014 | 11:13:16 UTC

If you don't click "benchmark", you can run it overnight.

Profile Misfit
Message 37741 - Posted: 26 Aug 2014 | 5:10:08 UTC - in response to Message 37735.

I thought of that. Unfortunately, when the app stops responding it leaves me with a black screen, no crunching, and it just eats electricity. So I'll take the long weekend for this. Meanwhile, I've dropped the card -60 MHz to stock reference specs. When I manage to pick up some cuda60 tasks, if those still error then it shouldn't be the card.

Jim1348
Message 37742 - Posted: 26 Aug 2014 | 14:13:57 UTC

With some BOINC GPU projects (e.g., Einstein and maybe POEM), I have had problems with anti-virus software. I normally don't use any of it, since I have dedicated machines. But on my main PC I have tried various AVs, and they all eventually cause problems of one sort or another. I don't run GPUGrid on my main PC, so I can't offer specific advice, but AV exclusions usually don't do much good since they apply to scanning and don't necessarily cover real-time protection. I think the exclusions in avast! did also apply to real-time protection, but it caused some other problem that I don't recall at the moment (all on Win7 64-bit).

Profile Misfit
Message 37749 - Posted: 28 Aug 2014 | 22:42:02 UTC - in response to Message 37742.

I'm currently using the free Avast. I like that and AVG. Avira had too many false positives.

Well, the cuda60s are currently finishing and validating. The only change from when this problem first started is that I've underclocked the card to the reference specs. Last night I ran FurMark's extreme burn-in; with the fans at 100% the temp topped out at 70C. I think next time I buy a card I'll ignore MSI's overclock marketing crap.

I'll see how things go through the long weekend.

Jacob Klein
Message 37750 - Posted: 28 Aug 2014 | 22:44:54 UTC - in response to Message 37749.
Last modified: 28 Aug 2014 | 22:49:20 UTC

I'm glad it's starting to work for you!

Even at low temps, a GPU will act incorrectly when the GPU core clocks are too high. And it is possible that the new drivers push the shader clusters harder than before, making them susceptible to problems compared to the previous drivers. At least, that's what I've been told.

That's why I continue to recommend getting it right with Heaven 4.0, and also using custom fan curves to keep the card below the clock-limiting thermal thresholds (70*C for Boost 1.0 GPUs like your GTX 670, 80*C for Boost 2.0 GPUs).

Sorry to sound like a broken record.

Good luck!

Jim1348
Message 37768 - Posted: 31 Aug 2014 | 10:02:50 UTC
Last modified: 31 Aug 2014 | 10:19:16 UTC

The 340.52 drivers are working very well on my GTX 750 Ti's (Asus, not overclocked) under WinXP. I am now getting 520,000 RAC, whereas with the 337.88 drivers it was under 500,000 RAC for the two cards. And I no longer get "unstable machine" restarts. This may all be due to the work units themselves, but at least the new drivers are not hurting anything. The temps are also good, 55 to 58C with a 120 mm side fan in a 20C room running NOELIA_5MG.

Jim1348
Message 37773 - Posted: 31 Aug 2014 | 15:43:00 UTC - in response to Message 37768.
Last modified: 31 Aug 2014 | 15:44:34 UTC

The temps are also good, 55 to 58C with a 120 mm side fan in a 20C room running NOELIA_5MG.

I must have read the wrong machine. These cards are now running 63 to 67C; still quite low enough for the NOELIA_5MG.

Profile Misfit
Message 37775 - Posted: 31 Aug 2014 | 20:03:58 UTC

Well, I had a few good units and then everything went south with more unknown errors. Digging deeper, I found that MSI lists a different reference base clock than what nVidia shows for its reference, that being 915 MHz. So I've dropped my card -105 MHz to match that reference. Looking at GPU-Z now, it shows:

Under the Graphics Card tab
GPU Clock 915 MHz
Boost 993 MHz

Under the Sensors tab
GPU Core Clock 1058.2 MHz (while crunching, down to 324.0 MHz when snoozed)

AIDA64 also shows a GPU Clock of 1058 MHz. Somehow this thing is auto-boosting itself and ignoring my manual settings.

-----

Just as a baseline, here is what the card reports when I let it run at its default settings, for comparison...

GPU Clock 1020 MHz
Boost 1098 MHz
Sensors 1175.8 MHz (AIDA64 1175)
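For what it's worth, both sets of readings roughly line up with the whole 13 MHz steps Jacob mentioned; a quick arithmetic check on the numbers above:

# (base clock, rated boost, observed clock under load) in MHz, from the GPU-Z readings above
for base, boost, observed in [(915, 993, 1058.2), (1020, 1098, 1175.8)]:
    print((boost - base) / 13, (observed - boost) / 13)
# prints roughly 6.0 5.0 and 6.0 6.0: the rated boost sits 6 steps above the base clock,
# and under load the card auto-boosts a further 5-6 steps on top of that.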

Jacob Klein
Message 37776 - Posted: 31 Aug 2014 | 21:44:06 UTC - in response to Message 37775.

Instead of guessing at the clock speeds, you could do the Heaven steps, to ensure stability... I tried really hard to make them simple.

Profile Misfit
Message 37777 - Posted: 1 Sep 2014 | 4:38:20 UTC - in response to Message 37776.

Instead of guessing at the clock speeds, you could do the Heaven steps, to ensure stability... I tried really hard to make them simple.

Hi Jacob. Please don't assume I failed to follow your steps. I did follow them. It was running stable at -60. I was still getting errors, which is why I dropped it down to where it is now. I can run that program all day, which makes me think it's not the GPU speed. So yes, you succeeded in making the steps simple. Unfortunately your solution has failed.

I watched as this result crapped out after 6 seconds of elapsed time. This was the first unit that caused a pop-up from Windows Error Reporting: "Acemd.841-60.exe has stopped working."

Jacob Klein
Message 37778 - Posted: 1 Sep 2014 | 4:44:50 UTC - in response to Message 37777.

Sorry. So, were you able to determine a "Max Boost Clock" speed that was completely stable for 5+ hours in Heaven?

If it's stable in Heaven, but crashing in GPUGrid, then, I'm not sure what the issue could be.

Profile Misfit
Message 37783 - Posted: 1 Sep 2014 | 23:21:38 UTC - in response to Message 37778.

Sorry. So, were you able to determine a "Max Boost Clock" speed that was completely stable for 5+ hours in Heaven?

If it's stable in Heaven, but crashing in GPUGrid, then, I'm not sure what the issue could be.

I am at a loss as well. Unfortunately, since my card is ignoring my settings and auto-boosting, it's impossible to give an exact number. From researching the issue, the card wants to boost up to a percentage of the TDP, and that changes based on the demands of the WU. Lowering the allowed power in Afterburner hasn't been effective against that.

The only OC I've ever cared about, researched, bought for, and worked out all the nitty-gritty BIOS bits on was my CPU. My 3.4 GHz i5 (Intel Turbo to 3.8 GHz) I have OC'd to a stable 4.5 GHz. When it comes to video cards I look at build quality, cooling solution and customer service. It came down to MSI or Gigabyte. My previous card was a Gigabyte in my older system, but with this new build I chose an MSI mobo, so I went with the matching card that just happened to have its own OC built in. Just for giggles I clocked the i5 down to its stock 3.4 GHz with the GPU still set at the stock 915 MHz, and cuda60 still died. I know it's not a speed/temp issue with either the GPU or the CPU.

Using Heaven, the difference between unstable (able to benchmark at least once) and fully stable was 3 FPS. I don't think my eyes will notice that. In fact, the only thing that caught my attention was the errors I've been having with cuda60. Had I not seen those, I wouldn't even know there was a problem.

Check this out: I'm not the only one having problems with cuda60 errors. Many of my cuda60s have also errored out for other crunchers. Given the application error I had yesterday that caused the Windows Error Reporting service to pop up (and not the normal system-tray popup you get when a driver crashes), I wonder if there may be something wrong with these units or the app. I would guess the app, but I think it's something the devs should at least take a look at.

Jacob Klein
Message 37784 - Posted: 2 Sep 2014 | 0:05:42 UTC - in response to Message 37783.
Last modified: 2 Sep 2014 | 0:05:53 UTC

If you adjust the "GPU CLOCK OFFSET" in Precision-X, that limits how high it will "auto-boost" while under load. Each 13 MHz decrement is a step. I had to decrease one of my GTX 660 Ti GPUs by 4 steps (-52 MHz) before Heaven was stable at maximum settings for 5+ hours.

Profile Misfit
Message 37794 - Posted: 3 Sep 2014 | 1:41:48 UTC - in response to Message 37784.

I don't have (EVGA) Precision-X. Should I dump (MSI) Afterburner for it?

Jacob Klein
Message 37795 - Posted: 3 Sep 2014 | 1:43:13 UTC - in response to Message 37794.

I believe you can use Afterburner, as it functions nearly identically to Precision-X. You are looking to control the "GPU Clock" to ensure maximum stability, and Kepler GPUs ramp up and down in 13 MHz intervals.

Profile Misfit
Message 37860 - Posted: 9 Sep 2014 | 1:56:52 UTC

Back to a full-night test of Heaven: going by the 13 MHz interval, at -39 it ran for 9 hours before the Unigine engine stopped responding. Interestingly enough, at the next interval, -52, the Unigine engine stopped responding 7 hours in. They both pass the 5-hour mark, but I don't know if technically it's supposed to go on like the Energizer bunny.

I believe you can use Afterburner, as it functions nearly identically to Precision-X. You are looking to control the "GPU Clock" to ensure maximum stability, and Kepler GPUs ramp up and down in 13 MHz intervals.

The Core Clock slider changes the base speed, which is then boosted. However, the boost always seems to be 78 MHz and always seems to be on if the card is running 3D apps. What was your source for the 13 MHz interval?

Jacob Klein
Message 37861 - Posted: 9 Sep 2014 | 2:00:53 UTC - in response to Message 37860.

Source is by experience. You can slide the slider around in 1 MHz intervals and then watch whether your GPU's clock goes up/down or not. The registered clock should only "move" in 13 MHz intervals. I believe it works much like a CPU, where the resulting clock is 13 MHz times some multiplier.

Similarly, I believe boost adds a certain number of multipliers on top of the base. That is why adjusting the base affects boost too, in 13 MHz intervals. Watch closely as you test it, record your values, and please tell me if I'm wrong.

Finally, you should be able to run Heaven for 5 months straight with no issues if your system is fully stable. I recommended 5 hours previously, but overnight is a very good test. It doesn't matter how far into the run it was; if it crashed AT ALL, your core clock is too high.

Again, from experience.

Profile Beyond
Message 37868 - Posted: 9 Sep 2014 | 17:20:53 UTC - in response to Message 37861.

Jacob, would the similar Unigine Valley benchmark test work as well as Heaven for this purpose?

Jacob Klein
Message 37869 - Posted: 9 Sep 2014 | 17:26:42 UTC

Valley would work, but it has been my experience that Heaven pushes the GPU harder than Valley.

Profile Misfit
Message 37871 - Posted: 10 Sep 2014 | 1:01:02 UTC - in response to Message 37861.

Per GPU-Z, it showed a 1 MHz boost change per 1 MHz clock change. So the boost is staying constant at 78 MHz above the base clock.

Jacob Klein
Message 37872 - Posted: 10 Sep 2014 | 1:12:38 UTC - in response to Message 37871.
Last modified: 10 Sep 2014 | 1:13:50 UTC

You need to be clearer in your findings.

For me:
- I start GPU-Z
- I click the question mark, to begin the render test, which puts the GPU under load
- In GPU-Z, I monitor the "GPU Core Clock" value on the "Sensors" tab
- With Precision-X "GPU Clock Offset" set to 0, GPU-Z shows a "GPU Core Clock" of 1241 MHz, which is my max boost.
- If I change Precision-X's "GPU Clock Offset" down a single MHz, to -1, and click Apply, I see GPU-Z's "GPU Core Clock" go from 1241 to 1228 MHz.
- If I change Precision-X to any value between -1 and -13 (using the arrow keys), and click Apply, GPU-Z still reports 1228 MHz.
- If I change Precision-X to -14, and click Apply, GPU-Z now reports 1215 MHz.

Using those steps, I conclude that my GTX 660 Ti GPUs (which use Boost v1.0) increment and decrement in 13 MHz intervals.
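Those readings are consistent with the clock snapping down to whole 13 MHz steps below the max boost. Here is a tiny model of that rule in Python; it reproduces the observations above, but the snapping rule itself is an assumption, and behaviour at other offsets is untested:

import math

MAX_BOOST = 1241  # GPU-Z "GPU Core Clock" with the offset at 0, per the steps above
STEP = 13

def registered_clock(offset_mhz):
    """Snap the offset down to a whole 13 MHz step below the max boost.
    Matches the observations above (0 -> 1241, -1..-13 -> 1228, -14 -> 1215);
    anything beyond those points is an assumption."""
    return MAX_BOOST + STEP * math.floor(offset_mhz / STEP)

for off in (0, -1, -13, -14, -52):
    print(off, registered_clock(off))  # -52 lands 4 steps down at 1189, matching the earlier 4-step example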

Are you saying your GPU doesn't increment like that? Can you confirm with exact steps?

Thanks,
Jacob

Profile Misfit
Message 37903 - Posted: 14 Sep 2014 | 3:10:39 UTC - in response to Message 37872.
Last modified: 14 Sep 2014 | 3:12:01 UTC

I've tried to download your Precision-X; however, it's currently unavailable. Some sort of copyright controversy with it.

MSI has just released v4 of its Afterburner, so I've upgraded.
I've also upgraded back to the 340.52 nVidia drivers.

Source is by experience.

Are you SeanPoe?

Are you saying your GPU doesn't increment like that? Can you confirm with exact steps?

Conditionally, yes and no. When I was adjusting the GPU speed I wasn't under a full load. Many times when I've adjusted it with GPUGrid running, it crashed the WU. When I was running Heaven fullscreen there was also no way of adjusting the speed without, at a minimum, exiting the window, which would instantly change what was shown in GPU-Z.

So, going from your last post and the guide in the link, my max Kepler boost before things start getting dropped by temperature is 7 offset steps (91 MHz). Running the GPU-Z render test does confirm the 13 MHz drop when I manually drop it by 1 MHz, as shown in your example.

Jacob Klein
Message 37904 - Posted: 14 Sep 2014 | 4:32:49 UTC - in response to Message 37903.
Last modified: 14 Sep 2014 | 4:38:06 UTC

Precision X v4 can be downloaded here:
http://www.techspot.com/downloads/5348-evga-precision-x.html

Also, you can run Heaven in non-full-screen mode by making sure the full-screen checkbox isn't checked when you run it.

I am not SeanPoe -- did you give me that link as a hint of some sort, that I should read through it?

The goal, in my opinion, should be to set the fan curve so that it reaches max fan before the thermal MHz-limiting threshold [70*C for a Kepler Boost v1.0 GPU like yours, 80*C for a Kepler Boost v2.0 GPU] ... and then to keep decreasing the core clock in Precision-X, in 13 MHz intervals, until Heaven can run overnight with no crashes and no TDRs logged in C:\Windows\LiveKernelReports\WATCHDOG.

Regards,
Jacob

Profile Misfit
Message 37909 - Posted: 15 Sep 2014 | 5:09:54 UTC - in response to Message 37904.
Last modified: 15 Sep 2014 | 5:11:05 UTC

Heaven ran overnight at -65 with no crashes and no watchdog dumps. However, even at 80% fan speed it was holding at 71C. With the noise at that speed I pretty much have a hair dryer at arm's length, which I can't have since this is my gaming rig and not a cruncher in the garage. The max temp under Heaven has never reached 80C, even when it was 80F inside the house and the fan speed was set to auto - it sometimes hits 60%, which is audible but not bothersome. I can live with the card throttling down a step. (And my max setting will be somewhere between -65 and -53.)

The link is more for my benefit since I lost it once, but it could have been written by you. I see no harm in asking.

Thanks for the Precision link. I have it now and will see how it compares to Afterburner - although too much noise is still too much.

Jacob Klein
Message 37913 - Posted: 15 Sep 2014 | 11:53:16 UTC

You can set up a fan curve to suit your tastes. I prefer the curve to eventually hit max fan before the thermal limit, so the clock doesn't get limited. We were doing that here, for testing, just to ensure that the GPU did not downclock due to the thermal limit.

Hope that makes sense.

Profile Misfit
Message 37927 - Posted: 16 Sep 2014 | 23:56:47 UTC - in response to Message 37913.

It makes sense. Thanks Jacob.

Here is something interesting. I did a boot scan (Avast) and that turned up a couple of corrupted BOINC files.

ProgramData\boinc\symbols\Kernelbase.pdb
ProgramData\boinc\symbols\ole32.pdb

So I completely uninstalled BOINC, wiped it off the drive and out of the registry, and reinstalled. Right now it's crunching away at a cuda60 where normally it would've crashed. I'm hoping I'll get lucky and that this was the cause of the problem, although I have no idea what these files do or why they would only have affected GPUGrid cuda60.

Jacob Klein
Message 37930 - Posted: 17 Sep 2014 | 1:03:15 UTC - in response to Message 37927.

I would envision that those files wouldn't matter. They are symbol files that are downloaded, I think, whenever a BOINC debug version crashes or is debugged.

Don't count on anything related to those files, having any effect on how GPUGrid runs.

Matt
Message 37936 - Posted: 18 Sep 2014 | 1:06:31 UTC

Just wanted to confirm the 13 MHz steps in Kepler boost. My GTX 680 and GTX 780 Ti cards both step up and down in 13 MHz increments. I have a Temp Target of 68C set on my 780 Ti cards, and they will clock up or down in 13 MHz steps until they reach a speed where they can maintain the desired temperature.

Profile Misfit
Message 37937 - Posted: 18 Sep 2014 | 1:56:55 UTC - in response to Message 37930.

I would envision that those files wouldn't matter. They are symbol files that are downloaded, I think, whenever a BOINC debug version crashes or is debugged.

Don't count on anything related to those files, having any effect on how GPUGrid runs.

True - but I can't help but wonder if maybe some file (the cuda60 exe) got borked. I've turned in 4 straight valid WUs since the wipe and clean install.

Well at least now I know more about my card than I ever thought I would need. I know where the stable point is. And the problem appears to be fixed.

Overnight I'll return the card to its default values and see if the instability churns up any errors. At least that way I can determine if the card contributed to the problem.

Profile Misfit
Message 38141 - Posted: 29 Sep 2014 | 2:01:00 UTC

Cuda60 has been churning along nicely, and I've ditched the MSI app for the one from EVGA. Thanks for the help.
