Message boards : Server and website : New acemd version under test

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 819
Credit: 4,294,282
RAC: 0
Message 52562 - Posted: 4 Sep 2019 | 14:19:32 UTC

The acemd3 app is again under test. It should work on Windows (including RTX!).

Billy Ewell 1931
Joined: 22 Oct 10
Posts: 20
Credit: 267,509,845
RAC: 509,608
Message 52563 - Posted: 4 Sep 2019 | 15:08:30 UTC

I reactivated my RTX 2080 on an i7 Windows 10 machine, and unfortunately a non-acemd3 task downloaded and errored out after 8 seconds. I have now excluded all GPUGrid tasks except acemd3 (correctly, I hope) until conditions change.

(Ryle)
Joined: 7 Jun 09
Posts: 23
Credit: 839,420,996
RAC: 0
Message 52564 - Posted: 4 Sep 2019 | 15:26:53 UTC

Thanks Toni, I'm looking forward to its release. I hope the Linux version will also be released at that time. :)

eXaPower
Joined: 25 Sep 13
Posts: 280
Credit: 1,449,553,667
RAC: 1,185
Message 52565 - Posted: 4 Sep 2019 | 16:05:25 UTC - in response to Message 52562.

The acemd3 app is again under test. It should work on Windows (including RTX!).


Windows 8.1, RTX 2080 Ti: error at the start of the WU. The WU loops until suspend/resume is used, and the error message occurs each time the WU restarts.

http://www.gpugrid.net/result.php?resultid=21341937
http://www.gpugrid.net/result.php?resultid=21341954

Problem signature:
Problem Event Name: BEX64
Application Name: acemd3.exe
Application Version: 0.0.0.0
Application Timestamp: 5d6535ed
Fault Module Name: ucrtbase.DLL
Fault Module Version: 10.0.17134.12
Fault Module Timestamp: 587decd7
Exception Offset: 000000000006e75e
Exception Code: c0000409
Exception Data: 0000000000000007
OS Version: 6.3.9600.2.0.0.768.101
Locale ID: 1033
Additional Information 1: 723f
Additional Information 2: 723ff68f3f17ee5cfa26fbef8ee09749
Additional Information 3: 096f
Additional Information 4: 096f337e301f747985865265c5b96cfe

[PUGLIA] kidkidkid3
Joined: 23 Feb 11
Posts: 64
Credit: 702,058,317
RAC: 829,585
Message 52566 - Posted: 4 Sep 2019 | 16:57:56 UTC - in response to Message 52565.
Last modified: 4 Sep 2019 | 16:59:07 UTC

Hi all,
these are my settings:

ACEMD short runs (2-3 hours on fastest card): yes
ACEMD long runs (8-12 hours on fastest GPU): yes
ACEMD3 Beta: yes
Quantum Chemistry (CPU): no
Quantum Chemistry (CPU, beta): no
Python Runtime: no

So far I have received only a long-run WU, no acemd3 ... with 4 in the queue!
Any suggestions?

Thanks in advance
K.

Edit: now 0 WUs
____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)

klepel
Joined: 23 Dec 09
Posts: 161
Credit: 2,817,802,438
RAC: 629,342
Message 52567 - Posted: 4 Sep 2019 | 18:47:39 UTC

Task http://www.gpugrid.net/result.php?resultid=21342341

Errored out immediately:
Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)</message>
<stderr_txt>
# GPU [GeForce RTX 2070] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce RTX 2070
# ECC : Disabled
# Global mem : 8192MB
# Capability : 7.5
# PCI ID : 0000:1F:00.0
# Device clock : 1815MHz
# Memory clock : 7001MHz
# Memory width : 256bit
# Driver version : r430_00 : 43160
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 750

</stderr_txt>
]]>
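
A note for readers decoding the numbers in this log: the exit code is printed both as a signed value (-59) and as its 32-bit two's-complement hex form (0xffffffc5), and "device version 750" looks like the compute capability 7.5 from the device listing written without the dot. The second point is an inference from this one log, not documented SWAN behaviour, and the helper names below are purely illustrative. A minimal Python sketch of both conversions:

```python
def to_unsigned32(code: int) -> str:
    """Format a signed exit code as its 32-bit two's-complement hex form."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def device_version(capability: str) -> int:
    """Turn a 'major.minor' compute capability into the three-digit form.

    Assumption: the log's 'device version 750' is major*100 + minor*10.
    """
    major, minor = capability.split(".")
    return int(major) * 100 + int(minor) * 10


print(to_unsigned32(-59))     # 0xffffffc5
print(device_version("7.5"))  # 750
```

Read this way, the fatal line says the app contains no kernel image built for a compute capability 7.5 card.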

Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 115,984
Message 52568 - Posted: 4 Sep 2019 | 20:24:24 UTC

You need to have the new acemd3 app enabled and the "run test applications" setting turned on. Did you get the new wrapper app for Windows, acemd3 version 2.05?

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 819
Credit: 4,294,282
RAC: 0
Message 52569 - Posted: 4 Sep 2019 | 21:17:02 UTC - in response to Message 52568.
Last modified: 4 Sep 2019 | 21:17:57 UTC

The app (v206) is out for Linux and Windows.

There has been a problem with units with -1-3- in their name (solved).

The scheduler will need improvements. Right now I've seen some cases of the cuda 92 app being sent to RTXes (such cases error out with "gpu architecture").
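
To illustrate the mismatch described here, the following is a rough Python sketch of the compatibility rule a scheduler would need. It is not the project's actual scheduler code. It assumes each app ships only precompiled kernels for the architectures its CUDA toolkit knows about (no PTX fallback), and the function name is hypothetical. The plan class names are the ones mentioned in this thread; the capability limits come from NVIDIA's toolkit release notes. A real scheduler would also have to check the driver version.

```python
# Highest compute capability each CUDA toolkit can compile native code for.
MAX_CAPABILITY = {
    "cuda80": (6, 2),   # CUDA 8.0: up to Pascal
    "cuda92": (7, 2),   # CUDA 9.2: up to Volta
    "cuda100": (7, 5),  # CUDA 10.0: adds Turing (RTX 20xx)
}


def compatible_plan_classes(capability):
    """Return the plan classes whose toolkit can target the given card.

    `capability` is a (major, minor) tuple such as (7, 5) for an RTX 2070.
    """
    return [name for name, limit in MAX_CAPABILITY.items() if capability <= limit]


print(compatible_plan_classes((7, 5)))  # ['cuda100']
print(compatible_plan_classes((6, 1)))  # ['cuda80', 'cuda92', 'cuda100']
```

Under these assumptions an RTX card should only ever receive the cuda100 app, while a Pascal card can run any of the three.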

mmonnin
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 1,067
Message 52570 - Posted: 4 Sep 2019 | 22:14:57 UTC - in response to Message 52569.

The app (v206) is out for Linux and Windows.

There has been a problem with units with -1-3- in their name (solved).

The scheduler will need improvements. Right now I've seen some cases of the cuda 92 app being sent to RTXes (such cases error out with "gpu architecture").


One of the bad ones:
https://www.gpugrid.net/result.php?resultid=21342392

3 others completed successfully on Linux.

One of my PCs ran under the cuda80 plan class and another PC under the cuda100 plan class. Both have Pascal cards. The plan class is determined by the compute capability of the driver, I guess.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 381
Credit: 4,777,720,789
RAC: 929,149
Message 52571 - Posted: 4 Sep 2019 | 22:52:04 UTC

I had 2 cuda(100) units succeed and 1 fail. I also had 2 cuda(92) fail. The last unit I received was a cuda(92).


http://www.gpugrid.net/results.php?hostid=494023&offset=0&show_names=0&state=0&appid=32


The scheduler is still a problem.





Bedrich Hajek
Joined: 28 Mar 09
Posts: 381
Credit: 4,777,720,789
RAC: 929,149
Message 52572 - Posted: 5 Sep 2019 | 1:49:43 UTC - in response to Message 52571.

The -2-3- units using cuda(100) are the combination that finishes successfully. I had one more unit that was valid; the others failed.



I had 2 cuda(100) units succeed and 1 fail. I also had 2 cuda(92) fail. The last unit I received was a cuda(92).


http://www.gpugrid.net/results.php?hostid=494023&offset=0&show_names=0&state=0&appid=32


The scheduler is still a problem.






Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 819
Credit: 4,294,282
RAC: 0
Message 52574 - Posted: 5 Sep 2019 | 7:05:30 UTC - in response to Message 52572.

I'm fixing things incrementally. Failing stuff may be resent and succeed.

Azmodes
Joined: 7 Jan 17
Posts: 10
Credit: 602,208,315
RAC: 864,207
Message 52575 - Posted: 5 Sep 2019 | 8:39:40 UTC
Last modified: 5 Sep 2019 | 8:57:04 UTC

Not getting any tasks on Linux, nothing but errors on RTX on Windows so far.

I have a Windows system with both GTX and RTX cards in it. Do I have to exclude non-ACEMD3 tasks for the RTX via cc_config?

EDIT: Looks like I've been getting some new tasks on Linux too, but they're erroring out:
http://www.gpugrid.net/result.php?resultid=21343717
http://www.gpugrid.net/result.php?resultid=21341872

Two more were validated on the same system (very short, though?).
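
Whether the exclusion is actually required is not answered in this thread. If one wanted to try it, the BOINC client's cc_config.xml supports `<exclude_gpu>` entries with `<url>`, `<device_num>`, `<type>` and `<app>` children. Below is a small Python sketch that generates such a file. The device number 0 and the app short names `acemdshort` and `acemdlong` are assumptions for illustration only; check the client's event log and client_state.xml for the real values on your system.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, device_num, apps):
    """Build a cc_config.xml string excluding one GPU from the listed apps."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for app in apps:
        exclude = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(exclude, "url").text = project_url
        ET.SubElement(exclude, "device_num").text = str(device_num)
        ET.SubElement(exclude, "type").text = "NVIDIA"
        ET.SubElement(exclude, "app").text = app
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: RTX card as device 0, assumed short names of the old apps.
print(build_cc_config("http://www.gpugrid.net/", 0, ["acemdshort", "acemdlong"]))
```

The client has to re-read its config files (or be restarted) before the exclusions take effect.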

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 819
Credit: 4,294,282
RAC: 0
Message 52576 - Posted: 5 Sep 2019 | 9:00:22 UTC - in response to Message 52575.
Last modified: 5 Sep 2019 | 9:02:35 UTC

Errors with "nelems != 1" have been solved. They should go away sooner or later. Please ignore them.

All tests are very short (a few minutes) so as not to waste your time. They are, however, very important because I can see the behavior in many realistic card/app combinations.

klepel
Joined: 23 Dec 09
Posts: 161
Credit: 2,817,802,438
RAC: 629,342
Message 52577 - Posted: 5 Sep 2019 | 15:49:54 UTC

Both computers with Windows 10 and latest-generation Nvidia cards receive the following application: Long runs (8-12 hours on fastest card) v9.23 (cuda80). Both fail immediately:
http://www.gpugrid.net/results.php?hostid=504655
http://www.gpugrid.net/results.php?hostid=512242

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 437
Message 52578 - Posted: 5 Sep 2019 | 16:28:18 UTC

I completed 27 ACEMD v2.06 (cuda100) tasks without an error on Linux.

http://www.gpugrid.net/results.php?userid=5539

[AF>Libristes] hermes
Joined: 11 Nov 16
Posts: 23
Credit: 301,170,226
RAC: 60,315
Message 52579 - Posted: 5 Sep 2019 | 16:29:44 UTC - in response to Message 52577.
Last modified: 5 Sep 2019 | 16:31:31 UTC

I had 4 WUs today; all 4 are OK.
Well done, Toni!

On Arch Linux [5.2.11-zen1-1-zen|libc 2.29 (GNU libc)]
NVIDIA GeForce RTX 2080 Ti (4095MB) driver: 435.21

One of them, a toni_test ;-)

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
15:29:34 (15838): wrapper (7.7.26016): starting
15:29:34 (15838): wrapper (7.7.26016): starting
15:29:34 (15838): wrapper: running acemd3 (--boinc input --device 0)
15:31:06 (15838): acemd3 exited; CPU time 63.446112
15:31:06 (15838): called boinc_finish(0)

</stderr_txt>
]]>

Keith Myers
Joined: 13 Dec 17
Posts: 288
Credit: 237,915,213
RAC: 115,984
Message 52580 - Posted: 5 Sep 2019 | 20:58:27 UTC

Still waiting on some new apps to go along with new work to test. Not lucky so far.

[AF>Libristes] hermes
Joined: 11 Nov 16
Posts: 23
Credit: 301,170,226
RAC: 60,315
Message 52581 - Posted: 6 Sep 2019 | 6:04:07 UTC - in response to Message 52580.

One error:

https://www.gpugrid.net/result.php?resultid=21349753

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
07:54:11 (1654): wrapper (7.7.26016): starting
07:54:11 (1654): wrapper (7.7.26016): starting
07:54:11 (1654): wrapper: running acemd3 (--boinc input --device 0)
EXCEPTIONAL CONDITION: /home/user/conda/conda-bld/acemd3_1566914012210/work/src/mdio/bincoord.c, line 193: "nelems != 1"
07:54:14 (1654): acemd3 exited; CPU time 1.975979
07:54:14 (1654): app exit status: 0x86
07:54:14 (1654): called boinc_finish(195)

</stderr_txt>
]]>
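
For anyone collecting these failures, here is a small Python sketch that pulls the failed assertion and the exit code out of a wrapper stderr block like the one above. The function name and the returned dictionary layout are illustrative choices, not part of BOINC. It also shows why the message line prints 195 alongside -61: they are the same byte read as unsigned and as signed 8-bit.

```python
import re

# Sample lines copied from the stderr output above.
SAMPLE = '''07:54:11 (1654): wrapper: running acemd3 (--boinc input --device 0)
EXCEPTIONAL CONDITION: /home/user/conda/conda-bld/acemd3_1566914012210/work/src/mdio/bincoord.c, line 193: "nelems != 1"
07:54:14 (1654): acemd3 exited; CPU time 1.975979
07:54:14 (1654): app exit status: 0x86
07:54:14 (1654): called boinc_finish(195)
'''


def summarize_stderr(text):
    """Return the failed assertion (if any) and the code passed to boinc_finish."""
    exc = re.search(r'EXCEPTIONAL CONDITION: (\S+), line (\d+): "([^"]*)"', text)
    fin = re.search(r"called boinc_finish\((\d+)\)", text)
    code = int(fin.group(1)) if fin else None
    return {
        "file": exc.group(1).rsplit("/", 1)[-1] if exc else None,
        "line": int(exc.group(2)) if exc else None,
        "condition": exc.group(3) if exc else None,
        "exit_code": code,
        # Same byte read as a signed 8-bit value, e.g. 195 -> -61.
        "exit_code_signed8": code - 256 if code is not None and code > 127 else code,
    }


print(summarize_stderr(SAMPLE))
```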
