
Message boards : Server and website : python tasks get to 2.00% and hang

Sanford
Joined: 23 Nov 09
Posts: 5
Credit: 382,298,193
RAC: 0
Message 58899 - Posted: 11 Jun 2022 | 4:50:16 UTC

The remaining estimate is over 45 days on an 8th-gen Intel system with an NVIDIA 1060 GPU. I also see some crash popups mentioning Python; the task in BOINC still shows Active but never seems to progress. I have suspended and even rebooted, but the task is still stuck at 2%. Perhaps my mix of software (Android development/emulators, etc. that use VT-d modes) is causing problems? It would be nice if you could roll everything up into an .EXE without needing to run a VirtualBox VM.

captainjack
Joined: 9 May 13
Posts: 171
Credit: 1,305,850,756
RAC: 354,518
Message 58903 - Posted: 11 Jun 2022 | 18:53:25 UTC

AFAIK, GPUGRID does not use a VirtualBox VM. At least it doesn't on either of my machines.

Remaining estimate always starts out really high until several tasks are completed successfully, then it will start being more accurate.

A Python task will use about 65% of available CPU threads. For example, on my 6 core 12 thread CPU, Python tasks will use 7-8 threads.

All of your tasks that I checked show this error message:

OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\ProgramData\BOINC\slots\13\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.
Traceback (most recent call last):

I suggest that you increase the size of your paging file, run a Python task by itself and watch system usage, and let the task run through to completion without interruption.

Let us know if that helps.
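If you want to watch the commit limit while a task runs, you can query it from Python via the Win32 GlobalMemoryStatusEx call. This is just a hedged sketch I put together (not part of BOINC or GPUGRID tooling); it is Windows-only and simply returns None elsewhere:

```python
import ctypes
import sys

class MEMORYSTATUSEX(ctypes.Structure):
    # Layout matches the Win32 MEMORYSTATUSEX struct used by GlobalMemoryStatusEx.
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),   # commit limit: RAM + pagefile
        ("ullAvailPageFile", ctypes.c_ulonglong),   # commit space still available
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

def commit_limit_gb():
    """Return (total, available) committable memory in GB, or None off Windows."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status))
    gb = 1024 ** 3
    return (status.ullTotalPageFile / gb, status.ullAvailPageFile / gb)

if __name__ == "__main__":
    print(commit_limit_gb())
</imports>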

Keith Myers
Joined: 13 Dec 17
Posts: 1104
Credit: 1,469,862,408
RAC: 296,034
Message 58908 - Posted: 11 Jun 2022 | 20:56:12 UTC

The problem is that Windows handles the request for reserving memory without question. And the Python tasks request a ton of memory, more than what the automatically sized paging file can handle in Windows.

Solution is to deselect automatic sizing and either set system managed size or Custom size and set a very large value on the order of tens of gigabytes.

Here is a good GitHub reply on the problem that concisely sums up the issue:


The issue is with how multi-process Python works on Windows with the pytorch/cuda DLLs. The number of workers you set in the DataLoader directly relates to how many Python processes are created.

Each time a Python process imports pytorch it loads several DLLs. These DLLs have very large sections of data in them that aren't really used, but space is reserved for them in memory anyways. We're talking in the range of hundreds of megabytes to a couple gigabytes, per DLL.

When Windows is asked to reserve memory, if it says that it returned memory then it guarantees that memory will be available to you, even if you never end up using it.

Linux allows overcommitting. By default on Linux, when you ask it to reserve memory, it says "Yeah sure, here you go" and tells you that it reserved the memory. But it hasn't actually done this. It will reserve it when you try to use it, and hopes that there is something available at that time.

So, if you allocate memory on Windows, you can be sure you can use that memory. If you allocate memory on Linux, it is possible that when you actually try to use the memory that it will not be there, and your program will crash.

On Linux, when it spawns num_workers processes and each one reserves several gigabytes of data, Linux is happy to say it reserved this, even though it didn't. Since this "reserved memory" is never actually used, everything is good. You can create tons of worker processes. Just because pytorch allocated 50GB of memory, as long as it never actually uses it it won't be a problem. (Note: I haven't actually run pytorch on Linux. I am just describing how Linux would not have this crash even if it attempted to allocate the same amount of memory. I do not know for a fact that pytorch/CUDA overallocate on Linux)

On Windows, when you spawn num_workers processes and each one reserves several gigabytes of data, Windows insists that it can actually satisfy this request should the memory be used. So, if Python tries to allocate 50GB of memory, then your total RAM + page file size must have space for 50GB.

So, on Windows NumPythonProcesses*MemoryPerProcess < RAM + PageFileSize must be true or you will hit this error.


The number of workers for the Python on GPU tasks is 32 spawned workers.

So the equation is going to be 32 * MemoryPerProcess < RAM + PageFileSize

And many of the PyTorch DLLs request a couple of GB of memory allocation each.

This is why it is difficult to run these tasks on GPUs: system memory + pagefile size is most often inadequate. The pagefile size needs to be greatly increased, and doing that requires a large chunk of storage real estate for the pagefile.
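That inequality is easy to turn into a quick sanity check. A small sketch (the figures below are illustrative assumptions based on the posts above — 32 workers at roughly 2 GB each — not measured values):

```python
def required_commit_gb(num_workers, gb_per_worker):
    # NumPythonProcesses * MemoryPerProcess from the inequality above
    return num_workers * gb_per_worker

def pagefile_shortfall_gb(num_workers, gb_per_worker, ram_gb, pagefile_gb):
    """How many more GB of pagefile are needed; 0 if the inequality already holds."""
    needed = required_commit_gb(num_workers, gb_per_worker)
    available = ram_gb + pagefile_gb
    return max(0, needed - available)

# 32 workers at ~2 GB each on a 16 GB machine with a 16 GB pagefile:
# 64 GB needed vs 32 GB available.
print(pagefile_shortfall_gb(32, 2, 16, 16))   # → 32
```

So under those assumptions a 16 GB machine would need its pagefile grown by a further 32 GB before the tasks stop hitting WinError 1455.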

lukeu
Joined: 14 Oct 11
Posts: 31
Credit: 74,250,504
RAC: 24,916
Message 59273 - Posted: 19 Sep 2022 | 9:21:47 UTC - in response to Message 58908.

Hi & thanks for the clear info. I was just rejoining to crunch for the winter season, and first thought (after frantically increasing swap-size) was indeed: "is something assuming Linux-style overcommit?"

Out of curiosity I also ran up VMMap, which led me to this SO article which did the same: https://stackoverflow.com/a/69489193/932359

The frustrating bit is that it seems to be just an incorrect flag set by NVIDIA on their embedded "fat binaries": setting copy-on-write means Windows' rigorous memory accounting has to commit space for each instance. If it isn't space for _data_, it should be read-only so it can be shared (memory-mapped) between all processes. As I read it, this probably includes the binary code for all the various GPUs we don't own! :-D

Notable quote:

edit 2022-01-20: Per NVIDIA: "We have gone ahead and marked the nv_fatb section as read-only; this change will be targeting the next major CUDA release, 11.7. We are not changing the ASLR, as that is considered a safety feature."


I take it, just from the filenames, that GPUGrid is using CUDA 10.x? (Their DLLs don't seem to embed version info.)

(Problem for me is I'm trying to use an old 64GB SSD as a scratch disk for all swap & temp files. But not to worry, I'll see if I can scrape by with 48 GB & will add another swapfile on another disk if need be.)

Keith Myers
Joined: 13 Dec 17
Posts: 1104
Credit: 1,469,862,408
RAC: 296,034
Message 59276 - Posted: 19 Sep 2022 | 23:52:56 UTC - in response to Message 59273.

The Nvidia drivers are already up to CUDA 11.7 in the 515 series.

So maybe you can ping the developer abouh and see whether he can drop the lower-compatibility CUDA 10.2 and 11.3 versions he is compiling the Windows apps with and move to the CUDA 11.7 SDK, so that the nv_fatb sections in the DLLs will be marked read-only.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 790
Credit: 5,099,500,994
RAC: 2,917,204
Message 59277 - Posted: 20 Sep 2022 | 0:29:00 UTC - in response to Message 59273.

I take it, just from the filenames, that GPUGrid is using CUDA 10.x? (Their DLLs don't seem to embed version info.)


The app is CUDA 11.3.1 (11.3 Update 1) or CUDA 10.2. Which version you get probably depends on which drivers you have. Since your 512 drivers support 11.3+, you got the cuda1131 app. You can see it in your list of tasks: http://www.gpugrid.net/results.php?hostid=470942

lukeu
Joined: 14 Oct 11
Posts: 31
Credit: 74,250,504
RAC: 24,916
Message 59278 - Posted: 20 Sep 2022 | 6:45:35 UTC - in response to Message 59276.

The Nvidia drivers are already up to CUDA 11.7 in the 515 series. (...)


Ah, right. I didn't realise this would also put a minimum version constraint on the local gfx drivers. Mine was at 512.15 from Mar-2022 via Windows automatic updates. (I'm currently downloading the latest to install manually.)

I guess this catches users between:

    - "a rock": runs with larger than expected memory consumption, and
    - "a hard place": doesn't run at all due to driver incompatibility


In sum, it's probably premature for them to jump to 11.7 right now: "the rock" probably works out of the box for more users. Maybe once a sufficiently new driver has come through Windows Update, it could be time to consider it.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 790
Credit: 5,099,500,994
RAC: 2,917,204
Message 59284 - Posted: 20 Sep 2022 | 14:47:49 UTC - in response to Message 59278.

Ah, right. I didn't realise this would also put a minimum version constraint on the local gfx drivers. (...)


Since your 1060 supports old drivers, you could backdate the drivers to a CUDA 10.2-level version and get the 10.2 app. Maybe that will use less memory, since it carries no CUDA 11+ code and no binaries for CUDA 11 cards.
