Message boards : Frequently Asked Questions (FAQ) : FAQ: Accelerator processors. A note on credits

Author Message
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Scientific publications
Message 1097 - Posted: 30 Apr 2008 | 14:50:50 UTC
Last modified: 27 Feb 2009 | 23:48:06 UTC





A conventional processor dedicates a relatively large fraction of its transistors to complex control logic in order to maximise the performance of serial code. The Cell processor contains 8 fast Synergistic Processing Elements (SPEs) designed to maximise arithmetic throughput. Graphics processing units (GPUs) have a very large number of slower cores that maximise parallel throughput.
All this computational power comes at the cost of a change in programming paradigm. An existing application would run on the Cell processor using only the PPE (PowerPC Processor Element) core, without any performance benefit. To obtain maximum performance, it is therefore necessary to use all the SPEs and to adapt the code to the underlying hardware architecture. This means addressing vectorization, memory alignment and communication between main memory and the local stores. On GPUs, a good programming environment (CUDA) is now available, which helps considerably in exploiting the full potential of these devices.

HOW DO WE ASSIGN CREDITS?

On a standard PC, BOINC assigns credits based on the average of the floating-point and integer performance of the machine, as measured by a set of benchmarks run by the client, regardless of the real performance of the application on that machine.

Credits = 0.5 * (million float ops/sec + million int ops/sec) / 864,000 * (CPU time in seconds),
(each unit of BOINC credit, the Cobblestone, corresponds to 864,000 million operations, i.e. 864,000 MIPS sustained for one second)

where "float ops" are floating-point operations and "int ops" are integer operations. On the Cell processor these benchmarks are misleading because they do not use the SPEs. The same applies to GPUs, which the benchmarks do not consider at all. In any case, as noted above, these benchmarks only indicate the speed of the machine, not the speed of the application.

For instance, the BOINC client reports the following benchmarks for this machine:
GenuineIntel Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz [Family 6 Model 15 Stepping 11]
Number of CPUs 2
Measured floating point speed 2281.82 million ops/sec
Measured integer speed 6348.82 million ops/sec

The average is therefore roughly 4,300 MIPS (million instructions per second), or equivalently about 18 Cobblestones per hour, so BOINC automatically assigns this machine about 18 Cobblestones for each CPU hour of calculation. Note that the ratio of integer to floating-point speed is approximately 2.8 (= 6348.82/2281.82), i.e. close to 3.
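As a quick check of the arithmetic, the benchmark-based formula can be written as a small Python function. This is only a sketch: the function and variable names are made up for illustration, and only the formula and the benchmark figures come from the post. With the figures quoted for the E6550, it yields a value close to the roughly 18 Cobblestones per hour discussed above.

```python
# Sketch of the benchmark-based BOINC credit formula from the post.
# Function and variable names are hypothetical.

COBBLESTONE_MOPS = 864_000.0  # million operations per Cobblestone


def boinc_benchmark_credits(float_mops_per_sec, int_mops_per_sec, cpu_seconds):
    """Credits = 0.5 * (float + int benchmark) / 864,000 * CPU time."""
    average_mips = 0.5 * (float_mops_per_sec + int_mops_per_sec)
    return average_mips / COBBLESTONE_MOPS * cpu_seconds


# Benchmark figures quoted for the Core 2 Duo E6550.
FLOAT_SPEED = 2281.82  # million float ops/sec
INT_SPEED = 6348.82    # million int ops/sec

credits_per_hour = boinc_benchmark_credits(FLOAT_SPEED, INT_SPEED, 3600)
int_to_float_ratio = INT_SPEED / FLOAT_SPEED

print(f"Cobblestones per CPU hour: {credits_per_hour:.2f}")
print(f"int/float ratio: {int_to_float_ratio:.2f}")
```

Note that the CPU time enters linearly, so the result depends only on the benchmarks and on how long the client was busy, not on what the application actually computed.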

The way we assign credits takes these facts into account.
First of all, we need to measure the floating-point performance of the application. We have built a performance model of our applications (CELLMD and ACEMD) by manually counting the number of flops per step. For a specific WU, we can compute how many floating-point operations are performed in total, on average, depending on the number of atoms, the number of steps and so on. For CELLMD it was also possible to verify that the estimated flops were correct to within a few percent of the real value (multiplication, addition, subtraction, division and reciprocal square root are each counted as a single floating-point operation). On the GPU, we can also use the interpolating texture units instead of computing some expensive expressions. In this case, as the CPU does not have anything similar, we count the number of floating-point operations of the equivalent expression. It is not easy to measure the number of integer operations, so we estimate it to be 2 times the number of floating-point operations (in fact, we reckon that a factor of up to 3 would be justified, as in the example above). Therefore,

Credits = 0.5 * (MFLOP per WU + approx. million int ops per WU) / 864,000
(MFLOP is millions of floating-point operations; the integer term is estimated as 2 * MFLOP per WU)
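The application-based formula can be sketched the same way. The code below assumes the integer term is the floating-point count times a fixed ratio (2 by default, as described above). The function name and the example workunit size are hypothetical and are not taken from any real CELLMD or ACEMD workunit.

```python
# Sketch of the application-based credit formula from the post.
# Names and the example workunit size are hypothetical.

COBBLESTONE_MOPS = 864_000.0  # million operations per Cobblestone


def workunit_credits(mflop_per_wu, int_to_float_ratio=2.0):
    """Credits = 0.5 * (MFLOP + estimated million int ops) / 864,000."""
    estimated_int_mops = int_to_float_ratio * mflop_per_wu
    return 0.5 * (mflop_per_wu + estimated_int_mops) / COBBLESTONE_MOPS


# Hypothetical workunit performing 5.76e8 MFLOP in total.
example_mflop = 5.76e8

credits_ratio_2 = workunit_credits(example_mflop)        # ratio used in the post
credits_ratio_3 = workunit_credits(example_mflop, 3.0)   # upper ratio mentioned

print(f"Credits with ratio 2: {credits_ratio_2:.1f}")
print(f"Credits with ratio 3: {credits_ratio_3:.1f}")
```

With a ratio of 2 the formula reduces to 1.5 * MFLOP / 864,000, so the credit is fixed per workunit and does not depend on how long a given host takes to finish it.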

Finally, note that this method awards credits for the real performance of the application rather than for a benchmark, as the BOINC client does, so it is somewhat penalised in comparison.

In molecular dynamics, speed is critical and we put all our efforts into providing the most efficient molecular dynamics codes. To give you an idea, the development of these codes took literally years of work. Read more on the performance and efficiency of our applications:
ACEMD
CELLMD

Profile GDF
Message 10333 - Posted: 30 May 2009 | 19:36:06 UTC - in response to Message 1097.

The crediting scheme was modified a little while ago because SETI is using a flops multiplier of 2.4x for their project. The purpose of that multiplier is to make CPU and GPU workunits return the same credits (GPUs in much less time, of course).

So we now apply a multiplier of 2.0x to all workunits, but give an additional 25% for WUs returned within two days. This helps us reduce the latency of results while keeping our credits in line with SETI's.
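A minimal sketch of this rule follows. Two things are assumptions, since the post does not spell them out: that the 25% is applied on top of the already multiplied credit, and that "within two days" means a turnaround of at most 48 hours. All names are hypothetical.

```python
# Sketch of the multiplier-plus-bonus rule described in the post.
# Assumptions (not stated in the post): the bonus is applied on top of the
# multiplied credit, and "two days" means at most 48 hours turnaround.

FLOPS_MULTIPLIER = 2.0      # applied to all workunits
FAST_RETURN_BONUS = 0.25    # additional 25% for quick returns
BONUS_WINDOW_HOURS = 48.0   # "within two days"


def granted_credits(base_credits, turnaround_hours):
    """Return the credit granted for a workunit after multiplier and bonus."""
    credits = base_credits * FLOPS_MULTIPLIER
    if turnaround_hours <= BONUS_WINDOW_HOURS:
        credits *= 1.0 + FAST_RETURN_BONUS
    return credits


fast = granted_credits(1000.0, 24.0)  # returned after one day
slow = granted_credits(1000.0, 72.0)  # returned after three days

print(f"Returned in 24 h: {fast:.0f} credits")
print(f"Returned in 72 h: {slow:.0f} credits")
```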

gdf

Profile GDF
Message 13113 - Posted: 10 Oct 2009 | 10:13:18 UTC - in response to Message 10333.

We are now in the process of measuring our own flops multiplier, as we can run ACEMD on the CPU as well, so we do not have to use the SETI one. This means that the multiplier will likely be subject to slow changes. It is computed such that the same workunit returns the same credits irrespective of where it is run (CPU or GPU); the only difference will be the time needed to finish it (much less in the GPU case).

gdf
