gianni
<daily_result_quota> N </daily_result_quota>
Each host has a field MRD in the interval [1 .. daily_result_quota]; it's initially daily_result_quota, and is adjusted as the host sends good or bad results. The maximum number of jobs sent to a given host in a 24-hour period is MRD*(NCPUS + GM*NGPUS). You can use this to limit the impact of faulty hosts.
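The per-host limit quoted above is simple arithmetic, so here is a minimal Python sketch of it. It assumes GM is the project's GPU multiplier and treats every input as a plain number. The function name and the example host are hypothetical and are not BOINC code.

```python
def max_jobs_per_day(mrd: int, ncpus: int, ngpus: int, gm: float,
                     daily_result_quota: int) -> float:
    """Return MRD*(NCPUS + GM*NGPUS), the per-host 24-hour job limit.

    Assumption: gm is the project's GPU multiplier; names are illustrative.
    """
    # The quoted documentation says MRD lies in [1 .. daily_result_quota].
    if not 1 <= mrd <= daily_result_quota:
        raise ValueError("MRD must lie in [1, daily_result_quota]")
    return mrd * (ncpus + gm * ngpus)


# Hypothetical fresh host (MRD = daily_result_quota = 10), 4 CPUs, 1 GPU, GM = 8.
print(max_jobs_per_day(mrd=10, ncpus=4, ngpus=1, gm=8, daily_result_quota=10))  # 120
```

Because GPUs are weighted by GM, a single GPU can dominate the total. In this example the GPU contributes 80 of the 120 jobs.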
My understanding is that this option sets the limit on good plus bad results per day.
One user claims that there is no limit as long as the results are good.
Do you know?
I think that's effectively the case. The purpose of the quota is to limit the damage caused by faulty hosts - quota is reduced every time a task fails, and reaches a minimum of 1 per day. That allows a user to fix a faulty computer and re-start processing valid tasks.
Since quota is incremented every time a returned task validates, users at this project can always get a new task shortly (30 seconds) after reporting a successful run. At other projects, where validation relies on comparison with a second result, the quota may not increase immediately.
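The adjustment described in this reply can be sketched as a small class. This is a minimal illustration under stated assumptions. The reply only says the quota is "reduced" on failure and "incremented" on validation, so a step of 1 in both directions is a hypothetical choice. The upper bound of daily_result_quota comes from the quoted documentation. The class and method names are illustrative, not BOINC's.

```python
class HostQuota:
    """Per-host daily quota (MRD) as described in this thread.

    Assumption: the step size is 1 in both directions; the thread does
    not state the actual amounts.
    """

    def __init__(self, daily_result_quota: int) -> None:
        self.daily_result_quota = daily_result_quota
        self.mrd = daily_result_quota  # starts at the full quota

    def on_failure(self) -> None:
        # Reduced on every failed task, but never below 1 per day.
        self.mrd = max(1, self.mrd - 1)

    def on_valid(self) -> None:
        # Incremented on every validated task, capped at daily_result_quota.
        self.mrd = min(self.daily_result_quota, self.mrd + 1)


q = HostQuota(daily_result_quota=5)
for _ in range(10):
    q.on_failure()
print(q.mrd)  # 1: the floor, so a repaired host can still get work
q.on_valid()
print(q.mrd)  # 2
```

The floor of 1 is what lets a user fix a faulty computer and climb back up. The immediate increment on validation is why a host at this project can get a new task shortly after reporting a good one.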
There is a second quota mechanism, associated with the runtime estimation component of CreditNew (mentioned briefly in CreditNew under database changes, but without further documentation. It's active here, and the values can be seen in the application details for each host). There's a table of 'host_app_version' with a field for 'max_jobs_per_day'. Those limits tend to be more generous than the daily_result_quota, and I don't know whether there's a defined order of precedence. The general view on project message boards is that the 'max_jobs_per_day' tool is unsuccessful at limiting faulty hosts, but I don't know if any project administrator has ever tested that theory directly with David Anderson.
Hope that helps.
Edit - there's a note in Trouble-shooting the job pipeline about using <debug_quota/> to show details of quota enforcement.
I don't know if there's a similar tool for max_jobs_per_day (or if max_jobs_per_day enforcements are logged by <debug_quota/>) - I suspect the answer is 'no' in both cases, but I can ask David Anderson if you like.
From observation of my host's application details and others'.
The first thing you need to know is that it is "Application Specific". If a host runs both long and short WUs, it has a baseline of 10 on each application, and if the app changes, everybody goes back to the baseline of 10.
Baseline = 10
3 consecutive errored WUs returned: it goes down to 7
1 valid result returned: it goes immediately back to 10
Every consecutive valid result adds +1
20 consecutive valid results returned: it goes up to 30
1 errored result sends it immediately back to 10, and then -1 for each consecutive errored result.
10 errored results will reduce your MAX WU to 1 per day, but the server will actually send you 2 a day. (Don't ask me why.)
The thing to notice is that no matter how many valid results you return, one error will send you back to the baseline of 10.
In addition, if you are on 1 a day because of sending errors, 1 valid unit will return you to the baseline of 10.
Another thing to note is that an "error" includes:
Aborting a WU
Server cancellation of a WU
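The observed rules above can be modelled as a small state machine per host/application pair. This sketch is reconstructed only from these forum observations, not from BOINC's source, and the class name is hypothetical. It treats aborted and server-cancelled WUs as ordinary errors. It does not model the observed quirk of 2 tasks a day at a limit of 1.

```python
BASELINE = 10  # observed starting value for each application


class AppJobLimit:
    """Observed max_jobs_per_day behaviour for one host/application pair.

    Reconstructed from forum observations only; not taken from BOINC source.
    """

    def __init__(self, baseline: int = BASELINE) -> None:
        self.baseline = baseline
        self.limit = baseline

    def on_valid(self) -> None:
        # Below the baseline, one valid result jumps straight back to it;
        # at or above it, each consecutive valid result adds 1.
        if self.limit < self.baseline:
            self.limit = self.baseline
        else:
            self.limit += 1

    def on_error(self) -> None:
        # Errors include aborts and server cancellations (observed).
        # Above the baseline, one error drops straight back to it;
        # at or below it, each error subtracts 1, never going below 1.
        if self.limit > self.baseline:
            self.limit = self.baseline
        else:
            self.limit = max(1, self.limit - 1)


limit = AppJobLimit()
for _ in range(3):
    limit.on_error()
print(limit.limit)  # 7
limit.on_valid()
print(limit.limit)  # 10
for _ in range(20):
    limit.on_valid()
print(limit.limit)  # 30
limit.on_error()
print(limit.limit)  # 10
```

The asymmetry is the point of the observations. Gains above the baseline accumulate one at a time but are lost in a single error. Losses below the baseline are recovered by a single valid result.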
From observation of my hosts application details ...
Thanks. That's a very clear description of the 'max_jobs_per_day' version of the mechanism, but if you search the server Wiki for the word 'quota', you don't find it.
Hi Richard,
Unfortunately I am not a BOINC expert; in fact, not anywhere near.
I believe you have worked on BOINC in the past, and maybe still do, so you may be able to help with this problem.
I think reducing MAX GPU to 4 would not hurt anyone who sends mostly valid WUs, but would restrict the WUs going to bad hosts.
It doesn't cure the problem, but at least it mitigates it.
After a quick peek at the code, it does look as if max_jobs_per_day movements are logged by <debug_quota/>.
Each host has a field MRD ...
We've found a note in the code comments, starting at
https://github.com/BOINC/boinc/blob/master/db/boinc_db_types.h#L335:
"// DEPRECATED: only use is -1 means host is blacklisted", but that hasn't been documented in http://boinc.berkeley.edu/trac/wiki/ProjectOptions#Joblimits. We're continuing to search for the replacement.
Edit: it does look as if <daily_result_quota> can be used globally to limit the maximum number of tasks sent to each host attached to the project, but the description of an MRD for a single host is a bit misleading: you can't restrict individual hosts that way, unless you go all the way down to -1 and blacklist the host. Gianni, what were you actually trying to do with this field?
From observation of my hosts application details ...
The error rate can hardly get out of an "orange" figure. Don't you think you could adopt a more "proactive" approach and set MAX WU to 4 and MIN to 0?
I say MIN 0 because setting it to 1 gives 2, and setting it to -1 gives 0, so setting it to 0 must give 1 (pure logic). Impatient.
After all this time, you are not listening, are you? (Rhetorical.)