
Message boards : Number crunching : JIT (Just In Time) or Queue and Wait?

Betting Slip
Message 42029 - Posted: 26 Oct 2015 | 9:26:58 UTC
Last modified: 26 Oct 2015 | 9:27:45 UTC

I would like to propose that a JIT (Just In Time) policy be used on this project, and I will attempt to articulate why.

In view of this project's need for a FAST TURNAROUND of WUs, how about a policy of ONE WU per GPU, where a GPU only gets another WU once it has returned the last one? This simple policy would surely speed up the throughput of WUs, especially since GERARD has expressed on this forum a desire to do exactly that, and I would think he considers it an important goal.

This would also end the inefficiency of having a WU cached on one machine for several hours when it could be RUNNING on another machine that can't get any work.

We would get faster throughput of WUs and, from that, greater availability of new WUs, because new WUs are generated from returned results.

Does that make sense to you?

skgiven (volunteer moderator)
Message 42030 - Posted: 26 Oct 2015 | 12:42:03 UTC - in response to Message 42029.
Last modified: 26 Oct 2015 | 13:00:20 UTC

It does make sense, but there are issues with that approach which make it impractical.

Tasks take a long time to upload: ~6 to 10 min for me (Europe), usually longer from the US to Europe and for people on slower broadband connections.
I have 2 GPUs in each of two systems. Returning 2 WUs per GPU per day, I would lose 50 to 80 min of GPU crunching per day. For some people it could be several hours a day. Too much.

If a new task did not download until an existing task was at 90%, that might be the happy medium.

Some people also run 2 WUs at a time on the same GPU. This increases overall throughput, but requires more tasks to be available.
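skgiven's idle-time figure can be checked with a quick back-of-the-envelope calculation using the numbers from his post (4 GPUs across two systems, 2 WUs per GPU per day, 6 to 10 minute uploads); it comes out to 48 to 80 minutes, roughly his "50 to 80 min" estimate:

```python
# Back-of-the-envelope check of the upload-time estimate above.
gpus = 4                         # 2 GPUs in each of two systems
wus_per_gpu_per_day = 2
upload_min, upload_max = 6, 10   # minutes per upload

# Under a strict JIT policy the GPU would sit idle during each upload.
idle_min = gpus * wus_per_gpu_per_day * upload_min
idle_max = gpus * wus_per_gpu_per_day * upload_max
print(f"GPU time lost to uploads: {idle_min} to {idle_max} min/day")
```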
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Betting Slip
Message 42031 - Posted: 26 Oct 2015 | 13:43:57 UTC - in response to Message 42030.

I take your point, SK, but, without wanting to sound too harsh, my suggestion is aimed at project benefits, NOT user benefits, although I'm sure some of the difficulties you mention could be mitigated.

My view is that the project's benefit is more important, and while a few users may suffer, I wouldn't let them become a bottleneck.

As for running 2 WUs on one GPU: that has no project benefit whatsoever and is only designed to raise RAC. In fact, it actually slows the project down.

klepel
Message 42032 - Posted: 26 Oct 2015 | 16:04:52 UTC

Sorry! I completely disagree with this suggestion! I wanted to propose quite the opposite: three WUs per GPU!

I recently acquired two GTX 970s and put them in the same computer (quite an investment), and, as many suggested, I am running two WUs at a time, just to get a better load on them! This has nothing to do with your suggested RAC optimization, as you would well know if you read the forums and the experiences of GTX 980 (Ti) owners.

Now, with GERARD's new policy of small batches, I have a huge problem! I receive four WUs in a short time, they all finish more or less at the same time after 18 hours (within 24 hours), and then they get stuck for several hours in the upload queue. With luck it resolves itself overnight, or when I come back and upload those units by hand!

This is because the Internet provider from Spain applies outdated Internet policies in its subsidiary branches in the Southern Hemisphere: high fees, slow upload and download speeds, guaranteeing only 10% of the advertised speeds, and penalizing upload speed even more.

This translates into a minimum upload time of 30-40 minutes for each WU (if they go one by one, not in parallel). An upload often gets interrupted by WUs starting to upload from my other computers, or by the receiving upload server at GPUGRID.net, and then BOINC's awful policy of ever-longer waiting times after each interruption kicks in, and in the end the WUs do not upload at all until I intervene manually.
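The retry behaviour klepel is describing is BOINC's transfer backoff: after each failed upload, the client waits longer before retrying. A rough sketch of the idea in Python (the base delay and cap here are illustrative values, not BOINC's actual constants):

```python
import random

def retry_delay(consecutive_failures, base=60.0, cap=4 * 3600.0):
    """Exponential backoff of the kind BOINC uses for failed transfers:
    the wait roughly doubles with each consecutive failure, up to a cap.
    base/cap are made-up values for illustration."""
    delay = min(cap, base * 2 ** consecutive_failures)
    # Randomize so many hosts don't retry in lockstep.
    return random.uniform(0.5 * delay, delay)

for n in range(5):
    print(f"{n} failures -> retry within ~{min(4 * 3600, 60 * 2 ** n):.0f} s")
```

This is why a stalled upload that keeps getting interrupted can sit for hours until the user retries it manually.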

If we had three or four WUs per GPU, at least I would be able to keep crunching until the first WUs have been uploaded (automatically or by hand); as it is, the GPUs just sit idle!

Don't suggest a secondary or alternative BOINC project; I already have some with resource share set to 0%, and this is not really a solution:
1) My emphasis is on GPUGRID,
2) Not many projects have short turnaround times (or they penalize short WUs, as GPUGRID itself does), which is a necessity, since they serve only to occupy the GPUs until new WUs from GPUGRID are available.

There should be more WUs available per GPU, not fewer, for those crunching outside Europe or North America and for those with a slow Internet connection!!

Jim1348
Message 42034 - Posted: 26 Oct 2015 | 18:42:15 UTC

I like the JIT queue too, but as an option. That is, if people have good upload/download speeds and want to run only 1 work unit at a time, they could do it. Otherwise, they would use the normal queue.

But I doubt that they have enough work for both at the moment. In fact, I get the impression they mentioned the fast turn-around at all mainly because they don't. But if the work permits, I would opt for it myself.

Jacob Klein
Message 42036 - Posted: 27 Oct 2015 | 19:36:11 UTC
Last modified: 27 Oct 2015 | 19:36:32 UTC

Here's my thought:

Set the server to default to "1 task per GPU"
... but allow a web profile override, so the user could change it to "2 tasks per GPU", or even "3 tasks per GPU" (to better support the case where they actually run 2-per-GPU but are uploading).

Recap: "1 per GPU" by default, but overrideable by user to be 2 or 3.

However, for hosts that are not connected to the internet 24/7, the default setting may cause them to crunch less.

Honestly, I like the setting where it's at right now (server sends 2 per GPU), but wish we could have a web user override setting, for 3 per GPU.

Betting Slip
Message 42062 - Posted: 1 Nov 2015 | 0:11:42 UTC - in response to Message 42032.

klepel wrote:
I recently acquired two GTX 970s and put them in the same computer (quite an investment), and, as many suggested, I am running two WUs at a time, just to get a better load on them! This has nothing to do with your suggested RAC optimization


Whatever you wish to delude yourself with, by running 2 WUs on one GPU you are not only depriving other machines of work but also slowing the return of the results that create other WUs, so ultimately you are creating a bottleneck.

Jacob Klein
Message 42063 - Posted: 1 Nov 2015 | 0:20:46 UTC

It's not that simple.

Sure, in situations where 1) the availability of work units depends on others being completed, AND 2) there are no work units available ... then doing 2-at-a-time can cause a decrease in overall project efficiency.

HOWEVER

In situations where work units are readily available ... then doing 2-at-a-time can cause an increase in overall project efficiency, because we can get them done ~10% quicker.

SO ... please don't fault the users doing 2-at-a-time. If the admins believe that 1-at-a-time will help them more, then they could/should publicly request that, and some of us would be persuaded to revert to 1-at-a-time. But, in the meantime, if I can get work units done faster 2-at-a-time, increasing my machine's overall throughput, then I'm going to do that.

:)

Betting Slip
Message 42065 - Posted: 1 Nov 2015 | 0:56:03 UTC - in response to Message 42063.

Jacob Klein wrote:
Sure, in situations where 1) the availability of work units depends on others being completed, AND 2) there are no work units available ... then doing 2-at-a-time can cause a decrease in overall project efficiency.

I disagree; it is exactly that simple, as both conditions 1 and 2 of your statement apply at this time.

Jacob Klein wrote:
In situations where work units are readily available ... then doing 2-at-a-time can cause an increase in overall project efficiency, because we can get them done ~10% quicker.

Those conditions do not apply at this time. And wow, a maybe-10% gain.

Jacob Klein
Message 42066 - Posted: 1 Nov 2015 | 1:21:54 UTC

This project usually has tasks. So, the more common scenario is the scenario where 2-at-a-time would help.

I have temporarily set mine to 1-at-a-time, until plenty of work units are available.

Lighten up, please.

Betting Slip
Message 42067 - Posted: 1 Nov 2015 | 1:33:00 UTC - in response to Message 42066.

Jacob Klein wrote:
This project usually has tasks. So, the more common scenario is the scenario where 2-at-a-time would help.

Since this project benefits from the speed of return of each WU, there will NEVER be a time or situation when a small increase in throughput at the (large) expense of speed is of any benefit to this project; it benefits only the user, in RAC terms.

BTW, the original post was about the caching of WUs by one user at the expense of another user and the project. It was SK who raised the dubious benefits of running 2 WUs on one GPU.

Jacob Klein
Message 42068 - Posted: 1 Nov 2015 | 1:38:33 UTC
Last modified: 1 Nov 2015 | 1:39:08 UTC

Betting Slip wrote:
there will NEVER be a time or situation when a small increase in throughput at the (large) expense of speed will be of any benefit to this project


You are incorrect. If plenty of jobs are available, and a host can increase throughput by 10% on their machine, then that helps the project, especially if that is the normal scenario.

Like I said... I'll change to 1-at-a-time temporarily, during this non-normal work outage, but will eventually change back to 2-at-a-time.

Perhaps the best approach would be for the project server to stop handing out 2-at-a-time, when the queue is very near empty. Just a thought. There are problems with that approach, too, though.

Jacob Klein
Message 42069 - Posted: 1 Nov 2015 | 13:27:48 UTC - in response to Message 42068.
Last modified: 1 Nov 2015 | 13:55:48 UTC

Re-reading your initial post, I do believe there is a compromise that can be made server-side, but I don't know exactly what it is.

Perhaps if the server:
1) detects/estimates that it will come close to running out of work
AND
2) the remaining work units are of the type where new ones would be generated upon the completion of the existing work units
THEN ... the server could switch to a "hand out 1-per-GPU" mode.

If those conditions aren't met, it could switch back to "2-per-GPU" mode.

But the client will still crunch at its same x-per-GPU setting, which sort of sucks, since generally 2-at-a-time is better, but in cases like we have right now 1-at-a-time is better. Ideally, that too would be controlled server-side, and I don't think it's outside the realm of possibility.

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.
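For reference, the per-GPU task count Jacob is describing is what users currently set client-side with an app_config.xml file in the BOINC project directory. A sketch of the 2-per-GPU case (the app name here is an example; check the project's actual app names before using it):

```xml
<app_config>
  <app>
    <name>acemdlong</name>
    <gpu_versions>
      <!-- 0.5 GPUs per task: BOINC schedules 2 tasks per GPU -->
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The proposal above would move this choice to the server, so the project could flip between 1-per-GPU and 2-per-GPU depending on work availability.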

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.

Retvari Zoltan
Message 42070 - Posted: 1 Nov 2015 | 14:31:06 UTC - in response to Message 42069.
Last modified: 1 Nov 2015 | 14:38:27 UTC

Jacob Klein wrote:
The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

There is such a user profile setting at the Einstein@home project.
But ideally this setting should be handled by the server, as you say:

Jacob Klein wrote:
The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

However, there are hosts that don't need to run more than one workunit to achieve maximal GPU usage, so the user should be able to disable this behavior. Besides that, if the project wants to prioritize throughput, I think the server should make use of the host's profile, especially its "average turnaround time", to decide which hosts are worth sending urgent workunits to. This would handicap hosts with lesser GPUs, but it would surely decrease the overall processing time. I don't think participants with lesser GPUs would appreciate that much, though.
It's like Formula 1: if you over-complicate the rules, it hurts the competition.

Jim1348
Message 42074 - Posted: 1 Nov 2015 | 16:20:12 UTC - in response to Message 42070.
Last modified: 1 Nov 2015 | 16:25:18 UTC

Retvari Zoltan wrote:
This will handicap the hosts with lesser GPUs, but surely will decrease the overall processing time. I don't think the participants with lesser GPUs will appreciate much this though.
It's like Formula-1: if you over-complicate the rules, it will hurt competition.

I am perfectly willing to give up my GTX 660 Ti and GTX 750 Tis on this project for the moment and get faster cards later. I can always use the slower cards elsewhere. The project needs to do what is best for it, though that will alienate some people, and they need to consider that.

Folding handles it by awarding "Quick Return Bonus" points, which reward the faster cards more than they would normally get. Also, they overlap the download of the new work unit with the finishing of the old one, once it reaches 99% complete. I think that is why the Folding people never bought into BOINC but developed their own control app. Maybe BOINC could adopt some part of it?

Betting Slip
Message 42076 - Posted: 1 Nov 2015 | 16:50:48 UTC

I'm sorry, guys, but I think you are overthinking my original post.

One WU per GPU, and you don't get another one until:

- you begin to upload the last one, or
- the last one is 99% complete.

Nobody needs to be alienated, leave the project, or have their contribution or card questioned.

It really is that simple.

Retvari Zoltan
Message 42078 - Posted: 1 Nov 2015 | 17:11:00 UTC - in response to Message 42076.

Betting Slip wrote:
One WU per GPU and you don't get another one until

you begin to upload last one
last one is 99% complete
This is OK, but the BOINC client-server architecture doesn't work this way, because of the concept of "reporting completed tasks": it's not enough for the host to upload the result; the result must also be reported to the server before further processing begins (awarding credit for it, comparing results from different hosts for validation, creating and issuing new tasks, etc.). That's why the "report_results_immediately" option is recommended for GPUGrid users. But there is no preemptive way to send workunits to hosts in BOINC.
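For anyone who hasn't set it: that option lives in cc_config.xml in the BOINC data directory. A minimal sketch:

```xml
<cc_config>
  <options>
    <!-- Report each finished task at the next scheduler contact,
         instead of batching reports until the next work request. -->
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>
```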

Jim1348 wrote:
Folding handles it by awarding "Quick Return Bonus" points that rewards the faster cards more than they would normally be.
It's the same for GPUGrid.
Jim1348 wrote:
Also, they overlap the download of the new work unit with the finishing of the old work when it reaches 99% complete. I think that is why the Folding people never bought into BOINC, but developed their own control app. Maybe BOINC could adopt some part of it?
That's one possibility, but not quite a realistic one.
It is clear that BOINC is not the perfect choice for GPUGrid (or for any project) when there's a shortage of work.

Betting Slip
Message 42079 - Posted: 1 Nov 2015 | 17:44:24 UTC - in response to Message 42078.
Last modified: 1 Nov 2015 | 17:44:59 UTC

Retvari Zoltan wrote:
This is ok, but the BOINC client-server architecture is not working this way, because there's the concept of "reporting completed tasks": it's not just the host has to upload the result, it has to be reported to the server to begin further processing (awarding credits for it, comparing results from different host for validation, creating and issuing new tasks, etc). That's why the "report_results_immediately" option is recommended for GPUGrid users. But there's no preemptive way for sending workunits to hosts in BOINC.



OK, so let's get back to basics:

One WU per GPU, and you don't get another one until the last WU is uploaded and credit is granted.

The project is then as fast as it can get, given the resources available to it.

Now I will wait until someone objects because their internet connection is slow, etc. There is only so much this project can do to keep everyone happy, which, as we know, is impossible.

This project should implement policies that benefit the efficiency of the project and its scientists.

Jim1348
Message 42080 - Posted: 1 Nov 2015 | 18:01:37 UTC - in response to Message 42079.
Last modified: 1 Nov 2015 | 18:03:25 UTC

Betting Slip wrote:
One WU per GPU and you don't get another one until the last WU is uploaded and credit is granted.

OK with me, but I can (and do) accomplish that now with zero resource share. However, they could grease the wheels a little by providing more granularity in their bonus system. They are probably not prepared to do a full "Quick Return Bonus", which calculates the bonus on a continuous exponential curve, but they could provide more steps. For example, rather than just the 24- and 48-hour bonus levels, they could start at 6 hours and add a step every six hours up to maybe 36 hours. That would keep an incentive for the slower cards while rewarding the faster cards appropriately. They can work out the numbers to suit themselves, but the infrastructure seems to be more or less in place for that already.
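A ladder like the one Jim1348 describes might look like this; the multipliers are made-up illustrations, not GPUGRID's actual bonus values:

```python
def bonus_multiplier(hours):
    """Hypothetical finer-grained return bonus: 6-hour steps up to 36 h,
    replacing a two-tier 24 h / 48 h scheme. Multipliers are illustrative."""
    tiers = [(6, 1.50), (12, 1.40), (18, 1.30),
             (24, 1.20), (30, 1.10), (36, 1.05)]
    for limit, mult in tiers:
        if hours <= limit:
            return mult
    return 1.0  # no bonus past 36 hours

print(bonus_multiplier(4))   # a fast card returning in 4 hours
print(bonus_multiplier(30))  # a slower card still gets a small bonus
```

The step table keeps some incentive for slow cards while still paying the fastest returns the most, which is the point of the proposal.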

Betting Slip
Message 42081 - Posted: 1 Nov 2015 | 18:12:50 UTC - in response to Message 42080.

Jim1348 wrote:
OK with me, but I can (and do) accomplish that now with zero resource share. However, they could grease the wheels a little by providing more granularity in their bonus system. [...]


I'll certainly +1 that

Retvari Zoltan
Message 42082 - Posted: 1 Nov 2015 | 18:29:10 UTC - in response to Message 42080.

Jim1348 wrote:
OK with me, but I can (and do) accomplish that now with zero resource share. However, they could grease the wheels a little by providing more granularity in their bonus system. [...]

Besides, it makes it more "profitable" to have a low cache setting, even for fast hosts.
So basically it would encourage everyone to lower their cache size.
I think this is a very good idea.

klepel
Message 42084 - Posted: 2 Nov 2015 | 5:15:32 UTC

Before I go to bed.

Betting Slip starts his argument from the wrong hypothesis:

"GPUGRID project needs the fastest turnaround times possible and every second counts!"

This seems to me completely out of place:

First of all, if a project needs turnaround times that fast, it is in the wrong place with BOINC as its platform, since BOINC is a citizen-science platform that depends on volunteers and their willingness to spend quite a lot of their own money on hardware and energy. If there is really a need for turnaround at the speed of light, I think the project should pay for time on a supercomputer! Since GPUGRID decided to use the BOINC platform, I do think turnaround time is important, but not every second counts, as Betting Slip likes to imply.

Second: as Gerard wrote in another post somewhere, there is an intellectual process in analyzing the data generated by the WUs and generating the next batch, as the scientist has to decide where to go next. So if GPUGRID really needed results as fast as Betting Slip suggests, GPUGRID staff would have to work 24 hours a day, 7 days a week analyzing results and generating new work, which it clearly does not, as WU shortages normally occur over the weekends. And since it is a scientific research project publishing papers, it is not in a Formula 1 race where every second counts!

That said, I think your suggestion of JIT (Just in Time) and all your further argumentation discriminates against all the participants who cannot afford, or are not willing, to buy and/or replace the fastest GPU of every card generation, or who have slow internet connections, or who run two WUs at a time, with no benefit whatsoever for the project!

Vagelis Giannadakis
Message 42085 - Posted: 2 Nov 2015 | 9:44:07 UTC - in response to Message 42029.

Betting Slip wrote:
I would like to propose a JIT policy be used on this project and I will attempt to articulate why. [...] Does that make sense to you?

This proposition doesn't make much sense to me. If anything, it will only slow WU processing dramatically!

Requiring that a WU has just been completed in order to get another one means hosts that have not just completed a WU (or new hosts joining the project) will never get a new WU! Unless BOINC supports priority queuing (and the GPUGRID people implement it), this is simply not feasible.

Without priority queuing, and with the current scarcity of WUs, most hosts in the project would eventually be denied new WUs, with the exception of a few lucky "chosen ones". Not efficient and not fair!

mikey
Message 42088 - Posted: 2 Nov 2015 | 11:59:16 UTC - in response to Message 42082.

Betting Slip wrote:
One WU per GPU and you don't get another one until the last WU is uploaded and credit is granted.

Jim1348 wrote:
OK with me, but I can (and do) accomplish that now with zero resource share. However, they could grease the wheels a little by providing more granularity in their bonus system. [...]

Retvari Zoltan wrote:
Besides it makes more "profitable" to have a low cache setting even for fast hosts.
So basically it will encourage everyone to lower their cache size.
I think this is a very good idea.

It would also keep people buying faster and more powerful cards, whereas one unit at a time with the longer time bonuses they have now means that even people running 3 units at a time can still get all the time bonuses, on top of chewing through the units very fast.

No, your GPU isn't loaded to almost 100%, but you are finishing units well within the new time-bonus windows. Those windows will need to be adjusted as the cards get more powerful and faster, though: I see a 980 finishing units in about 4 hours now, so even a 6-hour window may not be tight enough, depending on whether it is doing 1, 2 or 3 units at a time.

mikey
Message 42089 - Posted: 2 Nov 2015 | 12:17:08 UTC - in response to Message 42085.
Last modified: 2 Nov 2015 | 12:18:19 UTC

Vagelis Giannadakis wrote:
This proposition doesn't make much sense to me. If anything, it will only achieve to slow WU processing dramatically!

Requiring a WU just completed, in order to get another one, means hosts that have not just completed a WU (or new hosts getting in the project) will never get a new WU! Unless BOINC supports priority queuing (and the GPUGRID people implement it), this is simply not feasible.

Without priority queuing, and with the current scarceness of WUs, eventually most hosts in the project would be denied new WUs, with the exception of a few lucky / "chosen ones". Not efficient and not fair!


The idea is that if someone has a high-powered GPU and chooses to run 3 units at a time, they could cache as many as 18 units per day on their system, the rough number they could do if they were finishing them in 4 hours each. BUT if that same person had to get a unit to 99% before downloading a new one, they would only be holding the unit they are working on, plus the new unit they get at the very end of its run, and there would be more units on GPUGrid for everyone else to download, until the project runs out.

That person finishing units in 4 hours would still get to do 6 units per day, and if the project ALSO adjusted the time-bonus credit system, that person could get similar credit to what they are getting now, while still leaving plenty of units for everyone else who wants to help.
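A quick check of the arithmetic above, using the figures in this post (a fast GPU finishing WUs in about 4 hours):

```python
# Numbers from the post above: a fast GPU finishing WUs in ~4 hours.
hours_per_wu = 4
finished_per_day = 24 // hours_per_wu        # 6 WUs/day crunched either way
slots = 3                                    # running 3 WUs at a time
cached_per_day = slots * finished_per_day    # up to 18 WUs drawn from the pool/day
print(finished_per_day, cached_per_day)
```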

The problem with not doing something like this is that they could scare people with less capable GPUs away, and then, when someone with a very powerful GPU has problems and/or leaves the project, there is no one to take up the slack, and the project suffers. As stated elsewhere, the project does not work 24/7/365, but we crunchers do; keeping a lot of users happy keeps the project going longer. Specifically catering to the latest and greatest GPUs is all well and good, until those users shut their systems down. Making small changes along the way keeps lots of people happy and makes the project sustainable over the long term.

It's the old tortoise-and-hare analogy. Having lots of tortoises means it could take forever to reach the goals; having nothing but hares is great, until they leave for somewhere else. Having a mixture is better, IMHO. Finding a way to do that which makes sense for both the project and the users is what we are talking about.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwat
Message 42090 - Posted: 2 Nov 2015 | 13:18:50 UTC - in response to Message 42089.

Since we are a wonderful community of crunchers, devoted to "The Science", affectionate and considerate to one another (especially the mega / divine crunchers towards the mere "mortal" crunchers), why don't we all lower our work buffers to something like 5 minutes (or even zero, why not??), so that the WU pool has as many WUs available as possible?

Why can't those that run > 1 WU on 1 GPU revert back to 1-on-1, at least for the current "dry" period, sacrificing some "efficiency" (or is it "more creditzzz!") for the benefit of better WU distribution?

Why do we need the project to enforce fairness unto us, and not be fair by ourselves??

Maybe RAC is the single most important crunching motive after all...
____________

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 42091 - Posted: 2 Nov 2015 | 14:32:07 UTC

Everyone has his own definition of "fair". If I were to believe some of them (which is unlikely), "fair" means requiring the GPU manufacturers to limit the speed of their cards so that some of the crunchers on GPUGrid don't fall behind in the ratings.

My definition of "useful" is for the science of GPUGrid (and all the other projects) to be advanced as fast as is feasible. I let the scientists figure that out. Sometimes they want the largest number of work units possible but don't care much about the delay ("latency" in computer-ese), and other times they want the delay to be minimized, since that gets back results they need to build new work units based on those results faster. It ultimately is dictated by the science they are trying to do, and our ideas of "fairness" are more or less irrelevant insofar as I am concerned. I will adapt my equipment to their needs, rather than vice-versa, and maybe get some useful work done.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,813,587,539
RAC: 893,726
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42092 - Posted: 2 Nov 2015 | 14:55:46 UTC

GPUGrid isn't my only project. I'm attached to 59 projects.

I run certain cache/buffer settings that minimize the RPCs to the projects' servers, while also keeping my GPUs satisfied despite the GPU Exclusions I have in place that cause work fetch problems.

And my goal is to maximize GPUGrid's ability to get work done, utilizing my resources. I couldn't care less about the credits. So, generally, to maximize utility, 2-on-a-GPU is the best way to do that.

If a change should be made to increase project utility during times when work units are scarce, it should be made server-side. I recommend my prior proposals.

klepel
Send message
Joined: 23 Dec 09
Posts: 161
Credit: 2,817,802,438
RAC: 629,342
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42093 - Posted: 2 Nov 2015 | 16:10:24 UTC
Last modified: 2 Nov 2015 | 16:11:52 UTC

I did not want to finish my point without some numbers.

But first, rereading yesterday's discussion and all of today's answers, I would like to stress what a lively community we are! There seem to be varying ways to be "fair", and it seems I have missed some.

I still would like to mention my own experience:
My GTX 570 and GTX 670 (both cards in the higher, faster segments of their generations) each crunch one WU at a time: the GTX 570 needs around 76,000 seconds to finish a Gerard (with the 24-hour 50% bonus, RAC 255,000) and the GTX 670 finishes in around 61,000 seconds.
My GTX 970, loaded with two WUs at a time, needs around 76,000 seconds for each, which is in line with my GTX 570.
And lastly, the GTX 650 Ti never meets the 24-hour deadline; it needs around 127,000 seconds and its usual RAC is 212,000 for each Gerard.
I set a very low cache number for all cards, so I receive new WUs only when the current WU is nearly finished; otherwise I would miss the 24-hour limit more often on all cards, as I have a very slow and unreliable internet connection.

The two WUs on the GTX 970 are more or less in line with the same card segment of the previous two generations. The GTX 650 Ti never finishes the WUs in 24 hours, and I would mention that in the forums you can read that the GTX 750 Ti struggles to comply with the 24-hour limit as well.

Therefore I personally believe JIT and the variants discussed in this forum are not fair at all, as they oblige users to always have the fastest cards of each generation, and therefore benefit the users who can afford to change their GPUs each generation and buy the fastest card of each.

So, as we can see from all the comments, on the user side there should be much liberty in deciding how to tweak / run the WUs on their computers, within GPUGRID's established rules, to meet the established 24 / 48 hour limits to their liking.

However, Betting Slip, I am open to any help in tweaking my cards to increase throughput, as I am running all the cards out of the box and only set the fan speed as high as possible.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 42094 - Posted: 2 Nov 2015 | 18:21:15 UTC - in response to Message 42093.

Therefore I personally believe JIT and the variants discussed in this forum are not fair at all, as they oblige users to always have the fastest cards of each generation, and therefore benefit the users who can afford to change their GPUs each generation and buy the fastest card of each.

I certainly do not plan to rush out and buy the fastest card just to maximize my points. If you want to, that is up to you, but no one is obliged to. The point system, in my view, should reflect the value to the project. How the user responds to that, like how he spends his money, is up to him.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2048
Credit: 14,826,576,669
RAC: 2,426,205
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42095 - Posted: 3 Nov 2015 | 1:31:30 UTC - in response to Message 42090.
Last modified: 3 Nov 2015 | 1:46:52 UTC

Maybe I took it personally, or just my tranquilizer had rolled away?

Since we are a wonderful community of crunchers, devoted to "The Science", affectionate and considerate to one another (especially the mega / divine crunchers towards the mere "mortal" crunchers), why don't we all lower our work buffers to something like 5 minutes (or even zero, why not??), so that the WU pool has as many WUs available as possible?
That's a provocative question, but easy to answer:
The group you refer to as "we all" does not exist as a single entity, therefore it can't act like a single entity, so it must be "governed" by rules and incentives to act coherently and effectively. (E.g. there are many contributors who don't read the forum at all, so they don't know that they should do something.) How many people read this thread at all? It has 448 views so far, far fewer than the number of active contributors.

Why can't those that run > 1 WU on 1 GPU revert back to 1-on-1, at least for the current "dry" period, sacrificing some "efficiency" (or is it "more creditzzz!") for the benefit of better WU distribution?
My personal reason for having a spare workunit is that I have strong evidence that it's in much better hands on my host than on a host which fails every single workunit. You can call it vigilantism, but it is certainly the result of inadequate governing (i.e. the server gives work to unreliable hosts).

Why do we need the project to enforce fairness unto us, and not be fair by ourselves??
The whole time of our civilization's existence wasn't enough to answer this question.
The best answer so far: that's human nature.
The first part of my reply also applies here.

Maybe RAC is the single most important crunching motive after all...
Here we go again. Every time "reforming" the credit system comes up, this assumption/accusation/disillusionment is made.
Maybe we'll have a cure for those deadly illnesses 10-15-20 years from now, and maybe this project will have a tiny part in it, but that is certainly not enough to convince the rest of the population to take part. I am confident that we could have that cure right now if all the people of the wealthier part of the world had united behind this goal 20 years ago and had been using their computers since then to take part in biomedical research; but they were playing 3D games and buying new cars & bigger houses instead. Why? Because they are motivated to do so; they are socialized to do so. The credit system is the only thing that can be the "interface" between the "material world" and the "BOINC world". Surely greed and vanity also came along from the real world, but that's human nature (that phrase again). By reforming the credit system we're trying to make the best of it.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwat
Message 42096 - Posted: 3 Nov 2015 | 11:27:10 UTC - in response to Message 42095.

The group you refer to as "we all" does not exist as a single entity, therefore it can't act like a single entity, so it must be "governed" by rules and incentives to act coherently and effectively. (E.g. there are many contributors who don't read the forum at all, so they don't know that they should do something.) How many people read this thread at all? It has 448 views so far, far fewer than the number of active contributors.

So, you're admitting that you're not willing to lower your work buffer voluntarily, and that the only way for this to happen is for the project to enforce it on us.

My personal reason for having a spare workunit is that I have strong evidence that it's in much better hands on my host than on a host which fails every single workunit. You can call it vigilantism, but it is certainly the result of inadequate governing (i.e. the server gives work to unreliable hosts).

That's a very selfish argument, which is also not valid: how do you know the results you return are not invalid, every single one of them? GPUGRID has a quorum of one, which makes all returned results potentially invalid! Just because a task doesn't crash on you, doesn't mean it doesn't contain computation errors, especially when running on a GPU. Incidentally, I strongly believe GPUGRID should establish a quorum of two.

Here we go again. Every time "reforming" the credit system comes up, this assumption/accusation/disillusionment is made.
Maybe we'll have a cure for those deadly illnesses 10-15-20 years from now, and maybe this project will have a tiny part in it, but that is certainly not enough to convince the rest of the population to take part. I am confident that we could have that cure right now if all the people of the wealthier part of the world had united behind this goal 20 years ago and had been using their computers since then to take part in biomedical research; but they were playing 3D games and buying new cars & bigger houses instead. Why? Because they are motivated to do so; they are socialized to do so. The credit system is the only thing that can be the "interface" between the "material world" and the "BOINC world". Surely greed and vanity also came along from the real world, but that's human nature (that phrase again). By reforming the credit system we're trying to make the best of it.

I'm with you 100%, I love credits myself! But, I hate hypocrisy! We're all here "for the science", but we'll complain in no time when the queues run empty!! We're not only eager to help the project, we demand to "help" (makes me wonder, how many times we've processed the same WUs, over and over, just so BOINC credit is generated and awarded...). We don't like it when scientists issue very long WUs and we lose the 24h bonus! On the other hand, we're sympathetic to the complainers, but we rush to grab as many WUs as we can, so our RAC isn't affected by the drought...

Long story short: People, you're voluntarily participating in BOINC and GPUGRID to help! The queues running empty should make you happy, because processing supply far exceeds demand! The project is making progress and the scientists get their results back ASAP. Realize what you're doing here, set up backup projects and stop whining!! Also, it won't actually hurt you to lower your work buffer and let some other folk process a WU, get some GPUGRID credit and maybe a GPUGRID badge. Your GPU may get a WU from a "humbler" project and you may lose your RAC first place, so what??
____________

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2048
Credit: 14,826,576,669
RAC: 2,426,205
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42097 - Posted: 3 Nov 2015 | 18:01:03 UTC - in response to Message 42096.
Last modified: 3 Nov 2015 | 18:19:57 UTC

So, you're admitting that you're not willing to lower your work buffer voluntarily, and that the only way for this to happen is for the project to enforce it on us.
I admit that.
Do you admit that if me, you and the 20 other people who actually read this thread were to lower our work buffers, it still wouldn't help solve the shortage, because the other 3000 users just don't care at all? BOINC is supposed to be a "set & forget" system, so it has to be well configured to work optimally without user intervention.

That's a very selfish argument, which is also not valid: how do you know the results you return are not invalid, every single one of them?
I don't know, just as you don't know either. No one does, so we are all similar from this point of view, which invalidates your argument.
But I get the goal of your argument, so I admit that I'm selfish. I also think that I'm not the only one here. Moreover, we don't have the time and resources to convince the other 3000 careless / selfish participants to be careful & fair. It's much easier to create a system which enforces on us what is best for the system. That's how almost every community works.

GPUGRID has a quorum of one, which makes all returned results potentially invalid! Just because a task doesn't crash on you, doesn't mean it doesn't contain computation errors, especially when running on a GPU. Incidentally, I strongly believe GPUGRID should establish a quorum of two.
There are a lot of random factors involved in the simulation, so it will always produce two different outputs when the same workunit is run twice.
This would take a major rewrite of the application, and it would also cut the throughput in half.
I think the scientists filter out wrong results through visual checking.

But, I hate hypocrisy! We're not only eager to help the project, we demand to "help" (makes me wonder, how many times we've processed the same WUs, over and over, just so BOINC credit is generated and awarded...).
By asking for a quorum of 2, you've asked for the same thing.

We don't like it when scientists issue very long WUs and we lose the 24h bonus! On the other hand, we're sympathetic to the complainers, but we rush to grab as many WUs as we can, so our RAC isn't affected by the drought...
My concern about my falling RAC is mostly that at first sight I can't tell the reason for it, which could be that my hosts are failing tasks, or there's a shortage, or there's an internet outage. Any of them is annoying, but only the shortage can be prevented.

klepel
Send message
Joined: 23 Dec 09
Posts: 161
Credit: 2,817,802,438
RAC: 629,342
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42098 - Posted: 3 Nov 2015 | 21:02:06 UTC

Donating the computer time, paying the associated electricity bill, and buying computer parts I otherwise would not need is the philanthropic part of participating in BOINC projects.

But reading and participating in the forums, as well as the RAC, are the fun part of participating, and yes, the latter motivated me to buy some new stuff to get over a million RAC a day.

I still do not see any benefit in JIT, other than that we are finally all at each other's throats, discussing our personal motivations for participating!

Let’s get the job done, crunching through all the WUs as fast as possible!

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2048
Credit: 14,826,576,669
RAC: 2,426,205
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42099 - Posted: 4 Nov 2015 | 11:06:31 UTC - in response to Message 42098.

... we are finally all at each other's throats, discussing our personal motivations for participating!

Let’s get the job done, crunching through all the WUs as fast as possible!
Suddenly I've realized that we're doing a Mark Twain adaptation here in this thread. :) Tom Sawyer Whitewashing the Fence

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 42101 - Posted: 4 Nov 2015 | 13:22:32 UTC - in response to Message 42099.

Suddenly I've realized that we're doing a Mark Twain adaptation here in this thread. :)

http://www.filefactory.com/preview/4o1qp2s3vet9/

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 180
Credit: 144,701,536
RAC: 1,539
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 42102 - Posted: 4 Nov 2015 | 14:26:58 UTC

Oh dear, this is all double Dutch to me :(. I process about four GPUGrid WUs every weekend, if available. If there's no work, I process other WUs.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 42103 - Posted: 4 Nov 2015 | 15:30:36 UTC - in response to Message 42102.

You raise a good point; the only reason we are having this discussion is that there has been a shortage of work. That may be easing, but the point still remains: how to prioritize the distribution of what they do have (if at all). Only GPUGrid can really decide that, since they know best what they are trying to accomplish. I hope they consider the ideas here though, and set the proper rewards. I think the crunching will fall into line after that.

Betting Slip
Send message
Joined: 5 Jan 09
Posts: 669
Credit: 2,498,095,550
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42104 - Posted: 4 Nov 2015 | 16:57:08 UTC - in response to Message 42103.

You raise a good point; the only reason we are having this discussion is that there has been a shortage of work. That may be easing, but the point still remains: how to prioritize the distribution of what they do have (if at all). Only GPUGrid can really decide that, since they know best what they are trying to accomplish. I hope they consider the ideas here though, and set the proper rewards. I think the crunching will fall into line after that.


I agree there is little point in continuing this thread; it's time for GERARD or GDF to get involved and let us know what would be good for them and the project.

Hello, hello

Gerard
Volunteer moderator
Project developer
Project scientist
Send message
Joined: 26 Mar 14
Posts: 101
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 42111 - Posted: 6 Nov 2015 | 10:48:32 UTC - in response to Message 42104.

I don't personally see any point in enforcing a maximum of 1 WU per user. I think this is a decision that must come from the user, for the various reasons that have been expressed in this thread.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2048
Credit: 14,826,576,669
RAC: 2,426,205
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42112 - Posted: 6 Nov 2015 | 12:08:46 UTC - in response to Message 42111.
Last modified: 6 Nov 2015 | 12:09:32 UTC

The best idea expressed in this thread is to encourage crunchers to lower their work buffers by creating shorter-than-24h return bonus level(s).
I think a 3rd bonus level of 75% for less than 12h would be sufficient, as a long workunit takes ~10.5h to process on a GTX 970 (Win7).
I don't think there should be a shorter period with a higher bonus, as it would not be fair to create a level which could be achieved only with the fastest cards. But that could be debated, as there are a lot of GTX 980 Tis attached to the project. Even some of my hosts could achieve higher PPD if there were a shorter bonus level with higher percentages, but it is the throughput increase of the whole project that matters, not the single cruncher's desire.
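To illustrate (a hedged sketch only: the 12h/75% tier is just my proposal above, the 24h/50% and 48h/25% tiers follow how this thread describes the current scheme, and the function name and base credit figure are made up):

```python
# Sketch of return-time bonus tiers. The 12h/75% tier is only the
# proposal from this thread; the 24h/50% and 48h/25% tiers reflect
# the existing scheme as described here. The base credit is illustrative.

def credit_with_bonus(base_credit, hours_to_return):
    """Apply the earliest (largest) bonus tier the return time reaches."""
    tiers = [(12, 0.75), (24, 0.50), (48, 0.25)]  # (max hours, bonus)
    for max_hours, bonus in tiers:
        if hours_to_return <= max_hours:
            return base_credit * (1 + bonus)
    return base_credit  # no bonus beyond 48 hours

# A GTX 970 finishing a long WU in ~10.5 h would reach the proposed tier:
print(credit_with_bonus(100_000, 10.5))  # 175000.0
print(credit_with_bonus(100_000, 20.0))  # 150000.0 (current 24h tier)
```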

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 42113 - Posted: 6 Nov 2015 | 16:18:49 UTC - in response to Message 42112.

It looks like a judicious compromise that will encourage the desired behavior (insofar as we know it) without burdening anyone.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 381
Credit: 4,777,720,789
RAC: 929,149
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42117 - Posted: 7 Nov 2015 | 11:36:13 UTC

After reading through this thread, I think we should leave things the way they are, with one exception not mentioned here: do not allow hosts with old and slow cards to download tasks. I would include the 200 series and earlier, the lower-end 400 and 500 series, early Quadro, early Tesla, and the M series. These cards take days to complete tasks (sometimes finishing after the deadline), and often finish with errors. This really slows the project.


Betting Slip
Send message
Joined: 5 Jan 09
Posts: 669
Credit: 2,498,095,550
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42118 - Posted: 7 Nov 2015 | 12:09:50 UTC - in response to Message 42117.

200 series has been excluded for a while now.

mikey
Send message
Joined: 2 Jan 09
Posts: 278
Credit: 453,901,190
RAC: 472,654
Level
Gln
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 42119 - Posted: 7 Nov 2015 | 12:14:21 UTC - in response to Message 42117.

After reading through this thread, I think we should leave things the way they are, with one exception not mentioned here: do not allow hosts with old and slow cards to download tasks. I would include the 200 series and earlier, the lower-end 400 and 500 series, early Quadro, early Tesla, and the M series. These cards take days to complete tasks (sometimes finishing after the deadline), and often finish with errors. This really slows the project.


You don't think the negative rewards for running those cards are enough? Then why stop them by design? I think cutting off people willing to TRY is not a good idea, but letting them know up front that they will not get the bonus could be. As for 'slowing the project': SETI tried, MANY years ago now, only sending resends to the top performing users; maybe they could try that here with the 980 cards. I think it could be done fairly easily by having those with 980 cards draw from the resend group before the 'new' group of units when they ask for work. I'm guessing they aren't separated now, but a folder system could fix that.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 381
Credit: 4,777,720,789
RAC: 929,149
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42121 - Posted: 7 Nov 2015 | 15:27:28 UTC - in response to Message 42118.

200 series has been excluded for a while now.



Then this page needs to be updated:


https://www.gpugrid.net/forum_thread.php?id=2507



Betting Slip
Send message
Joined: 5 Jan 09
Posts: 669
Credit: 2,498,095,550
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42122 - Posted: 7 Nov 2015 | 20:02:27 UTC - in response to Message 42121.

200 series has been excluded for a while now.



Then this page needs to be updated:


https://www.gpugrid.net/forum_thread.php?id=2507





Indeed

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,813,587,539
RAC: 893,726
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42125 - Posted: 9 Nov 2015 | 4:38:49 UTC - in response to Message 42069.

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwat
Message 42127 - Posted: 9 Nov 2015 | 9:20:09 UTC - in response to Message 42125.

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob


The problem I see with this is that it would apply equally to very unequally capable cards. For a recent, mid/high-end GPU with 4GB or more it may be OK to process 2 tasks at a time, but what about an older GPU with, say, 2GB? At best it would make processing crawl; at worst it would cause crashes.

In the end, such an approach would also need to consider the type of GPU and the amount of memory. I don't know how much complexity this would add to the scheduling logic, or how much more difficult it would make its maintenance.
____________

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,813,587,539
RAC: 893,726
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42129 - Posted: 9 Nov 2015 | 12:34:19 UTC - in response to Message 42127.
Last modified: 9 Nov 2015 | 12:35:11 UTC

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob


The problem I see with this is that it would apply equally to very unequally capable cards. For a recent, mid/high-end GPU with 4GB or more it may be OK to process 2 tasks at a time, but what about an older GPU with, say, 2GB? At best it would make processing crawl; at worst it would cause crashes.

In the end, such an approach would also need to consider the type of GPU and the amount of memory. I don't know how much complexity this would add to the scheduling logic, or how much more difficult it would make its maintenance.



I don't think you understand my proposal.

I'm proposing a user web setting that, by default, would be set to "run at most 1 task per GPU", which is no different from today. But the user could change it if they wanted to. Yes, they'd be responsible for knowing the types of GPUs attached to that profile venue/location. And the scheduling logic changes shouldn't be too difficult: the server would just need to "trump" the user setting and use a gpu_usage of 1.0 on any task it sends out that it wants back faster than if the user ran 2 tasks per GPU.

By default the web setting would function no different than how GPUGrid functions today.

PS: At one time I had a GTX 660 Ti and a GTX 460 in my machine, and because of how the BOINC server software works, it thought I had 2 GTX 660 Ti GPUs. Although the 660 Ti had enough memory for 2-per-GPU, the GTX 460 did not, so I had to set my app_config.xml to a gpu_usage of 1.0. Times have changed: I now have 3 GPUs in this rig (GTX 970, GTX 660 Ti, GTX 660 Ti) and I can use a gpu_usage of 0.5. But I'd prefer not to have to use an app_config.xml file at all!
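For anyone curious, the app_config.xml I keep mentioning amounts to roughly this sketch (the app name "acemdlong" is an assumption on my part; confirm the real name against the <app> entries in your client_state.xml):

```xml
<!-- Sketch of a BOINC app_config.xml for running 2 tasks per GPU.
     Place it in the projects/www.gpugrid.net/ folder, then use
     Options > Read config files in BOINC Manager (or restart the client).
     The app name "acemdlong" is an assumption; check client_state.xml. -->
<app_config>
  <app>
    <name>acemdlong</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage> <!-- two tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage> <!-- one CPU thread per task -->
    </gpu_versions>
  </app>
</app_config>
```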

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwat
Message 42130 - Posted: 9 Nov 2015 | 14:09:04 UTC - in response to Message 42129.

Yes, I understand better now. You mean the user setting a value of more than one task per GPU (a setting that would apply in general), with the server overriding it to force 1-to-1 whenever the need arises.
____________

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,813,587,539
RAC: 893,726
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42131 - Posted: 9 Nov 2015 | 15:49:11 UTC

Yep, you got it! GPUGrid should only implement it if they think the project throughput benefits would outweigh the development costs. I can live with micro-managing the app_config.xml file either way. I just think it sounds like a neat, appropriate feature for this project.

Profile skgiven
Volunteer moderator
Project tester
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,991,617,060
RAC: 146,649
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 42176 - Posted: 16 Nov 2015 | 22:13:03 UTC - in response to Message 42112.
Last modified: 16 Nov 2015 | 22:13:48 UTC

It's a misconception to think that you can't have a small cache of work without negatively impacting the project. Crunchers who regularly return work all make a valuable contribution. This should never be misconstrued as slowing down the research; without the crunchers, GPUGrid would not exist.

The turnaround problem stems from the research structure. Slow returns are mostly down to crunchers who have smaller cards and/or don't crunch regularly. Many of these crunchers don't understand their BOINC settings, or the consequences of slow work return or of holding a cache.
Other issues, such as crunching for other projects (task switching and priorities), bad work units, and computer or Internet problems, are significant factors too.

A solution might be to give optimal crunchers the most important work (if possible), delegate perceived lesser work to those who return work more slowly or less reliably, and only send short tasks to slow crunchers. To some extent I believe this is being done.

If return time is critical to the project, then credits should be based on return time: instead of having 2 or 3 cut-off points, there should be a continuous gradient starting at 200% for the fastest valid returns (say, within 8 hours) and dropping by 1% every half hour, down to 1% for a WU returned after ~4.5 days.
This would make the performance tables more relevant, add a bit of healthy competition, and stop people being harshly penalised for missing a bonus cut-off by a few minutes (GTX 750 Ti on WDDM).
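The gradient proposed above can be sketched in a few lines; the 8-hour baseline, 200% starting multiplier, and 1%-per-half-hour decay are illustrative numbers from the post, not project policy:

```python
def credit_multiplier(return_hours, baseline_hours=8.0):
    """Credit multiplier as a fraction (2.0 = 200%, 0.01 = 1%)."""
    if return_hours <= baseline_hours:
        return 2.0
    # Lose 1 percentage point for every half hour past the baseline.
    half_hours_late = (return_hours - baseline_hours) / 0.5
    return max(0.01, 2.0 - 0.01 * half_hours_late)

print(credit_multiplier(8))    # fastest returns: the full 200%
print(credit_multiplier(24))   # a one-day turnaround: roughly 168%
print(credit_multiplier(120))  # after ~4.5 days: floored at 1%
```

With these numbers the multiplier reaches its 1% floor 99.5 hours past the baseline, i.e. around 107.5 hours total, which matches the ~4.5 days in the post.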
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

disturber
Send message
Joined: 11 Jan 15
Posts: 11
Credit: 62,705,704
RAC: 0
Level
Thr
Scientific publications
watwatwat
Message 42229 - Posted: 28 Nov 2015 | 3:05:38 UTC

After reading these posts, I decided to set my queue time to 0.25 days. I have mismatched video cards, a 970 and a 660 Ti, so the queue is sized for the slower card. I found that work returned by the 660 Ti was given less credit, since its combined queue and compute time exceeded 24 hours. This gave me the incentive to cut back on the number of waiting WUs.

So this thread was beneficial to me: I have a smaller WU queue (1, to be exact) and end up with more credits. A win-win for all.

Thanks
