
Message boards : Graphics cards (GPUs) : NOELIA_SH2eq Short Work unit(s) Instantly Failing

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 36996 - Posted: 5 Jun 2014 | 16:48:27 UTC

NOELIA_SH2eq work units (argprox1/argasnx1/asnmetx3/argcysx3/alaphex6/argalax3/argvalx1/asaaspx6/asnserx6/argasnx2/argargx7/alailex7) are all failing with code 98 and the message: ERROR: file mdioload.cpp line 162: No CHARMM parameter file specified. The wingmen are generating the same error.

Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 206
Message 36997 - Posted: 5 Jun 2014 | 16:51:26 UTC - in response to Message 36996.

Hi. Well, I've already completed a few of these short tasks without problems on Linux (Ubuntu 14.04).

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 36998 - Posted: 5 Jun 2014 | 17:28:56 UTC - in response to Message 36997.
Last modified: 5 Jun 2014 | 17:35:04 UTC

Thanks for sharing your experience. I've found a couple of Linux wingmen who completed NOELIA_SH2eq work units, yet the failing Linux work units show "process exited with code 199 (0xc7, -57)" and "FATAL : Cuda driver error 35 in file 'swanlibnv2.cpp' in line 446".

Do these errors mean the same thing on Windows and Linux, or are they completely unrelated?

All failing Windows wingman hosts are generating code 98 and "ERROR: file mdioload.cpp line 162: No CHARMM parameter file specified".
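The three numbers in "code 199 (0xc7, -57)" are one value viewed three ways: 199 is the unsigned exit byte, 0xc7 is its hex form, and -57 is the same byte read as a signed 8-bit integer (matching the "swan_assert -57" line in the stderr log posted later in this thread). A minimal Python sketch of that arithmetic; the function name is illustrative only, and this says nothing about whether code 98 on Windows has the same cause:

```python
from __future__ import annotations


def decode_exit_code(code: int) -> tuple[str, int]:
    """Return the hex form and the signed 8-bit reading of an exit code."""
    low_byte = code & 0xFF  # POSIX exit statuses keep only the low 8 bits
    # Bytes >= 128 represent negative values in two's complement.
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return hex(low_byte), signed


print(decode_exit_code(199))  # ('0xc7', -57)
print(decode_exit_code(98))   # ('0x62', 98)
```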

Jeremy Zimmerman
Joined: 13 Apr 13
Posts: 61
Credit: 726,605,417
RAC: 0
Message 36999 - Posted: 5 Jun 2014 | 18:00:14 UTC - in response to Message 36998.

eXaPower,

We had the same error on the NOELIA_TRP WUs a week ago. See the message below:
http://www.gpugrid.net/forum_thread.php?id=3770&nowrap=true#36979

I have received only two NOELIA_TRP WUs since then, and both finished. There has been no update on whether a change was made.
http://www.gpugrid.net/result.php?resultid=11091492
http://www.gpugrid.net/result.php?resultid=11094060

I'm only running long WUs now, so I have not received any short ones. I thought the above information might be relevant, since it is the same error from the same researcher.

Regards,
Jeremy

[VENETO] sabayonino
Joined: 4 Apr 10
Posts: 50
Credit: 645,641,596
RAC: 44,792
Message 37437 - Posted: 27 Jul 2014 | 7:51:57 UTC - in response to Message 36999.
Last modified: 27 Jul 2014 | 8:08:07 UTC

Hi guys,

Many NATHAN WUs have failed:

http://www.gpugrid.net/results.php?hostid=174773

All of these WUs were crunched by a GTX 780 Ti.

<core_client_version>7.2.41</core_client_version>
<![CDATA[
<message>
process exited with code 199 (0xc7, -57)
</message>
<stderr_txt>
# SWAN Device 0 :
# Name : GeForce GTX 780 Ti
# ECC : Disabled
# Global mem : 3071MB
# Capability : 3.5
# PCI ID : 0000:01:00.0
# Device clock : 1071MHz
# Memory clock : 3600MHz
# Memory width : 384bit
SWAN : FATAL : Cuda driver error 700 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert -57

</stderr_txt>
]]>


My other GPUs run fine (GTX 780, GTX 760, GTX 660 Ti, GTX 750 Ti).

All cards run the 340.24 NVIDIA driver (Linux).
All systems set the SWAN_SYNC=0 environment variable (Gentoo Linux).
No overclocking (CPU or GPU).
All systems have the same hardware apart from the GPUs (i7-4770, ASUS Z87-A).

Only the 780 Ti has this many failed WUs.

The GTX 780 Ti runs other GPU projects fine.


PS: The GTX 780 Ti and GTX 760 are in the same PC; only the WUs running on the 780 Ti fail.
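To check a claim like "only the 780 Ti fails" across many tasks, one option is to pull the device name and CUDA error out of each task's stderr_txt and tally them. A minimal sketch, assuming the SWAN line formats shown in the log above ("# Name : ..." and "Cuda driver error N in file '...' in line N"); the function names are hypothetical, and fetching the logs from the results pages is left out:

```python
import re
from collections import Counter

# Patterns assume the SWAN stderr format shown in this thread.
DEVICE_RE = re.compile(r"^# Name\s*:\s*(.+?)\s*$", re.MULTILINE)
CUDA_ERR_RE = re.compile(r"Cuda driver error (\d+) in file '([^']+)' in line (\d+)")


def summarize_stderr(stderr: str):
    """Return (device, CUDA error number, source file, line), or None if no CUDA error."""
    err = CUDA_ERR_RE.search(stderr)
    if err is None:
        return None
    device = DEVICE_RE.search(stderr)
    name = device.group(1) if device else "unknown"
    return name, int(err.group(1)), err.group(2), int(err.group(3))


def failures_by_device(stderrs):
    """Count CUDA failures per GPU name across several task logs."""
    counts = Counter()
    for text in stderrs:
        summary = summarize_stderr(text)
        if summary is not None:
            counts[summary[0]] += 1
    return counts
```

Applied to the log above, summarize_stderr returns ("GeForce GTX 780 Ti", 700, "swanlibnv2.cpp", 1963). The same pattern also matches the "Cuda driver error 35 ... line 446" message reported earlier for Linux wingmen.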

[VENETO] sabayonino
Joined: 4 Apr 10
Posts: 50
Credit: 645,641,596
RAC: 44,792
Message 37442 - Posted: 27 Jul 2014 | 12:05:21 UTC

Hmm... I think the problem is high temperature.

The GTX 780 Ti's PCB was very, very hot.

I moved the GTX 780 Ti and GTX 760 to the other motherboard (the one with the 2x 660) and added 2 fans.

At the moment all WUs are crunching fine.

[VENETO] sabayonino
Joined: 4 Apr 10
Posts: 50
Credit: 645,641,596
RAC: 44,792
Message 37447 - Posted: 27 Jul 2014 | 18:08:14 UTC
Last modified: 27 Jul 2014 | 18:08:52 UTC

Nothing changed.

WUs are still failing:

http://www.gpugrid.net/results.php?hostid=174768

(The GPU was installed on the other motherboard.)


GPU

Sun Jul 27 20:07:58 2014
+------------------------------------------------------+
| NVIDIA-SMI 340.24 Driver Version: 340.24 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 780 Ti Off | 0000:01:00.0 N/A | N/A |
| 55% 79C P0 N/A / N/A | 570MiB / 3071MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 760 Off | 0000:04:00.0 N/A | N/A |
| 62% 81C P0 N/A / N/A | 530MiB / 2047MiB | N/A Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+



CPU
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +65.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +60.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +65.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +61.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +61.0°C (high = +80.0°C, crit = +100.0°C)
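The nvidia-smi table above is formatted for people; for logging temperatures while testing, nvidia-smi also has a CSV query mode (--query-gpu with --format=csv). A minimal sketch that parses that output and flags hot cards. Assumptions: that the 340.24 nvidia-smi supports each queried field, and the 75°C threshold, which is an arbitrary illustration rather than an NVIDIA limit. The function names are hypothetical:

```python
import subprocess

# Assumes an nvidia-smi that supports --query-gpu and these field names.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu,fan.speed",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text: str):
    """Parse nvidia-smi CSV rows into dicts (fan kept as text; it may be 'N/A')."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, temp, fan = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "temp_c": int(temp), "fan": fan})
    return gpus


def hot_gpus(gpus, limit_c=75):
    """Return GPUs at or above a (hypothetical) temperature threshold."""
    return [g for g in gpus if g["temp_c"] >= limit_c]


def read_gpus():
    """Run nvidia-smi and parse its output (requires the NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Fed the readings posted above (79°C and 81°C), hot_gpus flags both cards; the later readings of 55°C and 59°C flag neither.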

[VENETO] sabayonino
Joined: 4 Apr 10
Posts: 50
Credit: 645,641,596
RAC: 44,792
Message 37448 - Posted: 27 Jul 2014 | 19:47:56 UTC
Last modified: 27 Jul 2014 | 19:50:04 UTC

:O

Playing with app_config.xml changed something:

<app_config>
   <app>
      <name>acemdshort</name>
      <max_concurrent>2</max_concurrent>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>
         <cpu_usage>0.49</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

With this configuration the GPU temperature is about 54°C.

If I increase cpu_usage to 0.5 or more (e.g. 1), keeping gpu_usage=1, the temperature rises above 70°C.

If I decrease cpu_usage below 0.5, the temperature is very low but the WU runs very slowly.

In short: cpu_usage >= 0.5 raises the temperature; changing gpu_usage makes no difference.
Readings with cpu_usage=0.49 and gpu_usage=1:
Sun Jul 27 21:50:51 2014
+------------------------------------------------------+
| NVIDIA-SMI 340.24 Driver Version: 340.24 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 780 Ti Off | 0000:01:00.0 N/A | N/A |
| 40% 55C P0 N/A / N/A | 576MiB / 3071MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 760 Off | 0000:04:00.0 N/A | N/A |
| 50% 59C P0 N/A / N/A | 526MiB / 2047MiB | N/A Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+





:O Any ideas?

No problems with the GTX 780...
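When comparing cpu_usage values such as 0.49 and 0.5, generating app_config.xml from code avoids hand-editing mistakes. A minimal sketch using Python's standard xml.etree.ElementTree that emits the same structure as the configuration above; the function name is hypothetical. BOINC reads app_config.xml from the project's directory. The sketch only makes the setting reproducible and does not explain why cpu_usage affects GPU temperature:

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int,
                     gpu_usage: float, cpu_usage: float) -> str:
    """Build an app_config.xml document with a single <app> entry."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


print(build_app_config("acemdshort", 2, 1, 0.49))
```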
