Message boards : Number crunching : This computer has finished a daily quota of 31 tasks

Jim1348
Joined: 28 Jul 12
Posts: 683
Credit: 1,371,521,768
RAC: 31,817
Message 50345 - Posted: 30 Aug 2018 | 13:16:21 UTC
Last modified: 30 Aug 2018 | 13:17:16 UTC

I seem to be one of the few people who can run the QC jobs without problems. But I have 32 GB memory, and 180 GB free on my SSD.
http://www.gpugrid.net/results.php?hostid=483848

I am running three work units at a time (4 cores each) on my i7-8700, and run through them rapidly. But now GPUGrid won't send me any more.
It would be unfortunate if I had to find another project because I run QC too fast.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 894
Credit: 2,076,424,820
RAC: 1,253,092
Message 50346 - Posted: 30 Aug 2018 | 13:37:34 UTC - in response to Message 50345.

There's something a bit strange about that statement. The current state of play is

Quantum Chemistry 3.30 x86_64-pc-linux-gnu (mt)
Number of tasks completed 5048
Max tasks per day 31
Number of tasks today 391
Consecutive valid tasks 21
Average processing rate 4.5176716287373
Average turnaround time 0.06 days

(from Application details for host 483848)

Ah - three of your tasks errored out this morning. Look at task 18604263. Apart from
==> WARNING: A newer version of conda exists. <==
current version: 4.5.4
latest version: 4.5.11

the task actually failed because of

<message>
upload failure: <file_xfer_error>
<file_name>6955_1_15_16_18_dd130713_n00001-SDOERR_SELE2-0-1-RND2528_0_1</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>

That's a job creation error by the project, outside your control. But it will have reset the daily quota to 31, and by then you were already way off into the distance.
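For anyone who wants to pull the file name and error code out of a fragment like the one above, here is a minimal sketch. It assumes the fragment is well-formed XML exactly as quoted; full task stderr output usually is not, so this is an illustration rather than a general parser. The function name `parse_upload_failure` is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# The fragment quoted in the post above, copied verbatim.
SAMPLE = """<message>
upload failure: <file_xfer_error>
<file_name>6955_1_15_16_18_dd130713_n00001-SDOERR_SELE2-0-1-RND2528_0_1</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>"""


def parse_upload_failure(text):
    """Return file name, numeric code and reason, or None if no transfer error is present.

    Hypothetical helper: assumes `text` is a well-formed <message> element.
    """
    root = ET.fromstring(text)
    err = root.find("file_xfer_error")
    if err is None:
        return None
    file_name = (err.findtext("file_name") or "").strip()
    raw_code = (err.findtext("error_code") or "").strip()
    # The code looks like "-131 (file size too big)": a number, then an optional reason.
    m = re.match(r"(-?\d+)\s*(?:\((.*)\))?", raw_code)
    code = int(m.group(1)) if m else None
    reason = m.group(2) if m and m.group(2) else ""
    return {"file_name": file_name, "error_code": code, "reason": reason}


if __name__ == "__main__":
    print(parse_upload_failure(SAMPLE))
```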

This is a good safety measure in the BOINC code. When you hit the bad batch of WUs, you were prevented from wasting bandwidth by downloading further tasks which were doomed to fail. That gives the project team 24 hours to fix the problem: if the tasks available tomorrow morning have been fixed, or otherwise come from a batch without this error, your daily quota will start rising with every task you return, and you won't hit any limit until things go wrong again.
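The quota behaviour described here can be pictured with a toy model. This is only a sketch of the behaviour as described in this post, not the actual BOINC scheduler code: the class name `HostQuota`, the growth step of one per valid task, and the starting quota of 1000 in the usage lines are all assumptions made for illustration. Only the base value of 31 and the 391 tasks already sent come from the figures above.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Toy model of a per-host daily task quota (hypothetical, not BOINC's code)."""

    base_quota: int = 31   # value the quota falls back to after an error (from the thread)
    max_per_day: int = 31  # current daily limit for this host
    sent_today: int = 0    # tasks already sent to this host today

    def can_send(self) -> bool:
        # The server stops sending once today's count reaches the limit.
        return self.sent_today < self.max_per_day

    def record_sent(self) -> None:
        self.sent_today += 1

    def record_result(self, valid: bool) -> None:
        if valid:
            # Assumption: the quota rises by one per valid task returned.
            self.max_per_day += 1
        else:
            # An errored task drops the quota back to the base value.
            self.max_per_day = self.base_quota

    def new_day(self) -> None:
        self.sent_today = 0


if __name__ == "__main__":
    host = HostQuota(max_per_day=1000, sent_today=391)  # 1000 is an invented starting quota
    print(host.can_send())            # quota still well above today's count
    host.record_result(valid=False)   # one bad task resets the quota to 31
    print(host.can_send())            # 391 already sent, so nothing more today
    host.new_day()
    host.record_result(valid=True)    # quota starts climbing again
    print(host.max_per_day)
```

In this model a fast host that has already received far more than 31 tasks is locked out for the rest of the day after a single reset, which matches what Jim saw.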

Jim1348
Joined: 28 Jul 12
Posts: 683
Credit: 1,371,521,768
RAC: 31,817
Message 50350 - Posted: 30 Aug 2018 | 13:57:07 UTC - in response to Message 50346.

That's a job creation error by the project, outside your control. But it will have reset the daily quota to 31, and by then you were already way off into the distance.

That makes sense. Thanks for looking into it. I will wait it out, though maybe with a backup project.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 787
Credit: 4,294,282
RAC: 139
Message 50401 - Posted: 5 Sep 2018 | 12:57:38 UTC - in response to Message 50350.

Two issues are at play here. The first is the limit on tasks, which is being hit because tasks complete too quickly. We should probably raise this for CPU jobs.

Then there is the "file size too big" error, which is somewhat surprising. Another cruncher completed the task, so it does not seem to be a WU issue. It is possibly restart-related (a large leftover temporary file, maybe).

Jim1348
Joined: 28 Jul 12
Posts: 683
Credit: 1,371,521,768
RAC: 31,817
Message 50402 - Posted: 5 Sep 2018 | 13:38:13 UTC - in response to Message 50401.

Then there is the "file size too big" error, which is somewhat surprising. Another cruncher completed the task, so it does not seem to be a WU issue. It is possibly restart-related (a large leftover temporary file, maybe).

I will detach and re-attach. That should clean it out.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 787
Credit: 4,294,282
RAC: 139
Message 50403 - Posted: 5 Sep 2018 | 14:12:55 UTC - in response to Message 50402.

Or just reset. I don't think it would occur in more than one WU though (or did it?).

Richard Haselgrove
Joined: 11 Jul 09
Posts: 894
Credit: 2,076,424,820
RAC: 1,253,092
Message 50407 - Posted: 5 Sep 2018 | 17:38:27 UTC - in response to Message 50403.

Or just reset. I don't think it would occur in more than one WU though (or did it?).

There were three visible at the time I first responded to the issue, I think with very similar reporting times: I did say "three of your tasks errored out this morning", though I didn't look for or comment on any other similarities.

One thing I did notice was that the errored tasks had run for approximately double the time of all Jim's other (successful) tasks. They're still visible on page 2 of error tasks for host 483848, sent around 09:30 on 30 August and reported a couple of hours later.
