Author |
Message |
rtX
Joined: 2 Apr 09 Posts: 10 Credit: 80,975,982 RAC: 37,187
|
I have a WU that has been reporting 'uploading' since around 3 October 2021 (it's currently 5 November). Nothing seems to be happening. Despite the WU taking 2d6h to process, I have tried to abort it in desperation. I can't even do that; it is just stuck there. Can anyone please offer advice on this? |
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1335 Credit: 7,455,517,459 RAC: 13,725,640
|
Resetting the project is the easiest solution.
But first, make sure you are running the latest BOINC client that has the fix for the expired security certificate that caused the stalled upload in the first place.
https://boinc.berkeley.edu/download_all.php |
|
|
zooxit
Joined: 4 Jul 21 Posts: 23 Credit: 9,329,747,892 RAC: 49,852,830
|
Hi,
I have the same problem. 15 WUs are waiting for upload in Transfers (two computers, one Windows one Linux).
If I reset the project, do I lose the WUs? :(
Or is it possible to back up the completed WUs somewhere on the disk and resend them after resetting the project?
7 WUs were sent normally from the same two computers on 26th of November. I don't remember changing anything... |
|
|
|
DON'T reset the project. It won't help.
I have both reported completed work, and downloaded new work, on four machines this morning - two Linux, and two Windows. But it's a complicated process, and has to be followed exactly. It'll take a while for me to write it up - please be patient. |
|
|
zooxit
Joined: 4 Jul 21 Posts: 23 Credit: 9,329,747,892 RAC: 49,852,830
|
Thanks for the quick answer. Waiting eagerly for the solution :)
More info:
I see now that all three of my hosts last connected to GPUGrid on the 27th at around 11 P.M. They all reported successful tasks on the 26th. |
|
|
|
Instructions for working round the 'expired certificate' problem at GPUGrid.
-1) Set 'No new tasks' for GPUGrid. You cannot both report completed work, and request new work, in the same operation.
Read the following instructions carefully and fully before starting. If you don't understand anything, STOP - now is not the moment to start learning about BOINC.
To report completed work:
1) Stop the BOINC client.
2) Navigate to the BOINC data directory.
3) Open the file 'client_state.xml' for editing, using a plain text editor.
4) Locate the section for GPUGrid. Don't change anything outside this section.
5) Locate the line that starts <scheduler_url> (towards the end of the first section, above <code_sign_key>)
6) Change the scheduler url from https to http
7) Find every example of <upload_url> within the GPUGrid section. Change https to http
8) Save the edited file
9) Restart the BOINC client. GPUGrid files should upload, and finished tasks should report, automatically - possibly after a short delay while benchmarks are run.
To fetch new work:
0) This assumes you have modified the scheduler url to report completed work. If you haven't done that already, do it now.
1) From the 'Activity' menu in BOINC Manager, suspend network activity.
2) Allow new work for GPUGrid
3) Update the GPUGrid project manually. It won't do this automatically while networking is suspended.
4) Verify that new task(s) have been allocated, and that files are waiting to download.
5) Set 'No new tasks' again for GPUGrid
6) Stop the BOINC client.
7) Navigate to the BOINC data directory.
8) Open the file 'client_state.xml' for editing, using a plain text editor.
9) Locate the section for GPUGrid. Don't change anything outside this section.
10) Find every example of <download_url> within the GPUGrid section. Change https to http
11) You may as well change the new upload urls to http while you're here - it'll save time when the task finishes.
12) Save the edited file
13) Restart the BOINC client. GPUGrid files should download, and new tasks should start running, automatically - possibly after a short delay while benchmarks are run.
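The client_state.xml edits in the steps above could be sketched as a small script. This is only a rough sketch, not a tested tool: the function name `downgrade_urls` is made up for illustration, the host `www.gpugrid.net` is an assumption taken from the URLs seen in this thread, and the script matches URLs by host name instead of locating the GPUGrid section. It deliberately leaves <master_url> alone. Stop the BOINC client and keep a copy of the file before trying anything like this.

```python
import re

# Tags named in the instructions; <master_url> is deliberately not listed.
TAGS = ("scheduler_url", "upload_url", "download_url")
HOST = "www.gpugrid.net"  # assumption: the host seen in this thread's URLs


def downgrade_urls(xml_text, tags=TAGS, host=HOST):
    """Rewrite https:// to http:// for `host`, but only inside `tags`."""
    pattern = re.compile(
        r"(<(?:%s)>\s*)https://(%s)" % ("|".join(tags), re.escape(host))
    )
    return pattern.sub(r"\g<1>http://\g<2>", xml_text)


if __name__ == "__main__":
    # Hypothetical fragment, only for illustration.
    sample = (
        "<master_url>https://www.gpugrid.net/</master_url>\n"
        "<upload_url>https://www.gpugrid.net/example_upload</upload_url>\n"
    )
    print(downgrade_urls(sample))
```

Working on the text instead of parsing the XML keeps the rest of the file byte-for-byte unchanged, which seems the safer choice for a file the client rewrites itself.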
|
|
|
|
Every text editor has a "search and replace" function, so it's easier to use that when editing the client_state.xml file.
Search for: https://www.gpugrid.net
Replace with: http://www.gpugrid.net
Is there a way to make this change permanent?
I've tried to edit the account_www.gpugrid.net.xml file, but it still tries to pull the new tasks using https.
I will try to detach one of my hosts, and reattach manually through http://www.gpugrid.net |
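Before running a global search and replace like the one above, it may help to see which tags it would touch, since other replies in this thread warn against changing <master_url>. A minimal dry-run sketch, assuming the file text has already been read into a string; `count_https_tags` is a hypothetical helper name, not part of BOINC:

```python
import re
from collections import Counter


def count_https_tags(xml_text, host="www.gpugrid.net"):
    """Count, per XML tag, the elements whose text starts with https://host."""
    pattern = re.compile(r"<(\w+)>\s*https://%s" % re.escape(host))
    return Counter(pattern.findall(xml_text))
```

If the result lists a tag you did not intend to change, a plain replace-all is probably too broad for that file.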
|
|
|
Is there a way to make this change permanent?
Every new task requires new files to be downloaded, and each new download comes with its own url - https by default.
so NO. |
|
|
|
6) Change the scheduler url from https to http
7) Find every example of <upload_url> within the GPUGrid section. Change https to http
10) Find every example of <download_url> within the GPUGrid section. Change https to http
11) You may as well change the new upload urls to http while you're here - it'll save time when the task finishes.
NB! If you change the project URL from secure HTTPS to insecure HTTP, it's wise to first change the authentication key ( <authenticator></authenticator> ) in account_www.gpugrid.net.xml to the weak account key ( https://gpugrid.net/weak_auth.php ), to prevent account abuse if someone sniffs the HTTP traffic. |
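Swapping the key in account_www.gpugrid.net.xml could look roughly like this. It is a sketch under assumptions: `set_authenticator` is a hypothetical helper, it expects exactly one <authenticator> element in the text, and the weak key itself has to be copied by hand from the weak_auth.php page.

```python
import re


def set_authenticator(account_xml, weak_key):
    """Return account_xml with the <authenticator> value replaced by weak_key."""
    pattern = re.compile(r"(<authenticator>)[^<]*(</authenticator>)")
    # A function is used as the replacement so the key is inserted literally.
    new_xml, count = pattern.subn(
        lambda m: m.group(1) + weak_key + m.group(2), account_xml
    )
    if count != 1:
        raise ValueError(
            "expected exactly one <authenticator> element, found %d" % count
        )
    return new_xml
```

Raising an error when the element is missing or repeated avoids silently writing back a file that was not what the sketch expected.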
|
|
|
I will try to detach one of my hosts, and reattach manually through http://www.gpugrid.net
Don't try this, as you won't be able to attach your host to the project again.
You should ignore the message:
GPUGRID: Notice from BOINC
You are attached to this project twice. Please remove projects named GPUGRID, then add https://www.gpugrid.net/
|
|
|
|
Instructions for working round the 'expired certificate' problem at GPUGrid.
I didn't remember whether the project state in the BOINC client would reset when project_url was changed, so I went a different way to get around the expired certificate check - "a little" more stoned xD
1) Created a local CA (for simplicity, you can use Easy-RSA; there are a lot of instructions on the Internet) and issued a certificate for www.gpugrid.net
2) Added the local CA to the BOINC client's ca-bundle.
3) Changed <authenticator></authenticator> in account_www.gpugrid.net.xml to the weak account key ( https://gpugrid.net/weak_auth.php ) to prevent account abuse if someone sniffs the HTTP traffic.
4) Configured stunnel to accept HTTPS on localhost (127.0.0.1) for the BOINC client and transmit unencrypted HTTP to GPUGRID's IP address 84.89.134.145 (yes, it's not secure, but the weak account key is used for authentication).
5) In the hosts file, reassigned the IP address for www.gpugrid.net to 127.0.0.1 (localhost).
6) PROFIT! xD
If anyone is interested in this variant, I can try to write instructions for Windows (for *nix-like systems everything is in principle the same, only the file paths differ, and I think *nix users can cope with this task anyway). |
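Step 5 is easy to get subtly wrong (a typo in the name, or the entry left commented out), so a quick check of the hosts file text may be useful. A minimal sketch: `hosts_lookup` is a hypothetical helper, and reading the hosts file itself is left out because its location differs between systems.

```python
def hosts_lookup(hosts_text, name):
    """Return the first address mapped to `name` in hosts-file text, else None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name.lower() in (n.lower() for n in names):
            return address
    return None
```

With the redirect from step 5 in place, looking up www.gpugrid.net in the file's text should give 127.0.0.1; None would mean there is no active entry for that name.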
|
|
|
Instructions for working round the 'expired certificate' problem at GPUGrid.
yesterday I downloaded 2 tasks, then set NNT. one completed and uploaded. the other is "stuck" in uploading. the file size is not too big (290MB and less, max_nbytes set to 1024MB), and the upload URLs are already all https.
so?
____________
|
|
|
|
so?
So, change them to http if you want to bypass the certificate error.
|
|
|
|
oh sorry, i misread your post, I thought you were going the other way round (http->https). I'll try that.
____________
|
|
|
|
oh sorry, i misread your post, I thought you were going the other way round (http->https). I'll try that.
Read it carefully, and fully. All steps are necessary, and in the order I've given them. |
|
|
|
I understand it. the task is trivial for me, editing client_state is no big deal. as zoltan pointed out, find and replace of the entire upload URL (it's not present anywhere else) works fine. it's done, and works. thanks.
____________
|
|
|
|
Worked like a charm for me as well. Thanks for the instructions. Changed the authenticator to the weak account key beforehand as suggested.
Edit: @Ian: your machine is a beast. the 3080Ti finished the WU ~4x faster than my 1660S. times are calling for an update. wish the prices would finally relax a little bit but might be way too soon to even hope for... |
|
|
|
Instructions for working round the 'expired certificate' problem at GPUGrid.
I followed the instructions (and thanks for posting that!), and it worked to get the files uploaded. But the task will not report. I have this same problem across 3 different machines. I get the following:
131 GPUGRID 11/28/2021 7:18:52 AM update requested by user
135 GPUGRID 11/28/2021 7:19:01 AM [sched_op] Fetching master file
136 GPUGRID 11/28/2021 7:19:01 AM Fetching scheduler list
137 GPUGRID 11/28/2021 7:19:03 AM [sched_op] Deferring communication for 1 days 00:00:00
138 GPUGRID 11/28/2021 7:19:03 AM [sched_op] Reason: 52 consecutive failures fetching scheduler list
FWIW, I did update the "<scheduler_url>" line as instructed.
____________
Reno, NV
Team: SETI.USA
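When going through Event Log excerpts like the one above, it can be handy to turn the "Deferring communication" line into a duration. A rough sketch: `parse_deferral` is a hypothetical helper, and the pattern is based only on the single log line quoted here, so treating the "N days" part as optional is an assumption.

```python
import re
from datetime import timedelta

DEFER_RE = re.compile(
    r"Deferring communication for (?:(\d+) days? )?(\d+):(\d+):(\d+)"
)


def parse_deferral(line):
    """Return the deferral in `line` as a timedelta, or None if absent."""
    m = DEFER_RE.search(line)
    if m is None:
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
```

A one-day deferral like the one in the excerpt means the client will not retry on its own for 24 hours unless an update is requested by hand.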
|
|
|
|
If you've reached that state (implying 10 or more consecutive failed attempts to contact the scheduler), you'll probably have to change the "<master_url>" - first line in the project section, in client_state.xml - to http like the others.
But don't change that line - with a global replace or otherwise - unless you really have to. You'll get warning messages in the Event Log.
Edit - after you change the master url, you'll probably be sent a new scheduler url. But that'll be https again, so then you'll need to change that again, as well. |
|
|
|
Moving the clock (time and date) back on the host also works, but BOINC runs a little funky, so as soon as you finish uploading, downloading and/or reporting your WUs, switch it back.
|
|
|
marsinph
Joined: 11 Feb 18 Posts: 41 Credit: 579,891,424 RAC: 0
|
Instructions for working round the 'expired certificate' problem at GPUGrid.
Hello Richard, thanks for this full explanation.
As you write, it's not easy.
By the way, why do we need to solve the problems ourselves, if the admin seems to do nothing?
I have tried your solution.
Instead of the CA error, I now get a transient error.
So I rolled back to the normal settings.
Once again, thank you for your help.
____________
|
|
|
|
By the way, why do we need to solve the problems ourselves, if the admin seems to do nothing?
My sympathies are with the project's scientific researchers, who are probably just as exasperated with the project's administrators as we are.
This problem has surfaced on a Sunday, which is probably the worst day of the week for a quick fix. Doing what we can to get results back for the scientists is at least a token attempt to keep things running, until the administrators reach their desks tomorrow.
|
|
|
zooxit
Joined: 4 Jul 21 Posts: 23 Credit: 9,329,747,892 RAC: 49,852,830
|
I tried as RH instructed.
Files uploaded I guess (they are gone from /var/lib/boinc/projects/www.gpugrid.net and are not visible in Transfers (manager) anymore)...
BUT, on the website it says: Upload failed
What now? (I did make a copy of the whole folder /var/lib/boinc/projects/www.gpugrid.net beforehand)
Couldn't make the download though:
To fetch new work:
I did 0), 1), 2) and 3), but then I couldn't verify that new tasks were allocated (how can they be, if the network is suspended in step 1)?).
Why did this start happening anyway - tasks were uploading normally on the 26th?
Moving to Windows machine now, hope it works there... |
|
|
|
.. but then I couldn't verify that new tasks were allocated (how can they be, if the network is suspended in step 1)?)
Learn to use other parts of BOINC's user interface. Switch to 'Advanced view', if you haven't already.
Pressing 'Update' while networking is suspended temporarily allows that one single request to get out to the network.
If it is successful, files awaiting transfer will appear on the 'Transfers' tab. The tasks themselves will be visible on the tasks tab, and details of the transaction will be listed in the Event Log. |
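For hosts without BOINC Manager, the same sequence might be driven with the boinccmd command-line tool. The sketch below only builds the argument lists for steps 1, 2, 3 and 5 of 'To fetch new work' (step 4, checking the result, stays manual). It assumes boinccmd is on the PATH, that the project URL passed in matches the master URL the client has stored, and that an update requested through boinccmd gets out while networking is suspended the same way the Manager's 'Update' button does; none of that has been verified here. `fetch_work_commands` is a hypothetical helper name.

```python
def fetch_work_commands(project_url):
    """Build boinccmd argument lists for steps 1, 2, 3 and 5 of 'To fetch new work'."""
    return [
        ["boinccmd", "--set_network_mode", "never"],             # 1) suspend network
        ["boinccmd", "--project", project_url, "allowmorework"],  # 2) allow new work
        ["boinccmd", "--project", project_url, "update"],         # 3) manual update
        ["boinccmd", "--project", project_url, "nomorework"],     # 5) no new tasks
    ]


if __name__ == "__main__":
    for argv in fetch_work_commands("https://www.gpugrid.net/"):
        print(" ".join(argv))
```

Each list could be handed to `subprocess.run` one at a time, pausing after the update to check the Event Log before setting 'No new tasks' again.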
|
|
zooxit
Joined: 4 Jul 21 Posts: 23 Credit: 9,329,747,892 RAC: 49,852,830
|
@Richard Haselgrove: I've always been using Advanced view, which doesn't mean at all that I am advanced... (for years I've only been running Einstein, Milkyway, WCG - projects that in all those years didn't require any intervention, so... no opportunity to learn there). :)
Worked on a Win11 machine like a charm! Thanks!
Didn't work on Debian 11 though - it seemed to work on my computer, but the GPUGrid website still says Upload failed.
Is there any way I can try it again with the backed-up files?
(I already tried copying the e1s* files back to /var/lib/boinc/projects/www.gpugrid.net - the files were automatically deleted after the first (un)successful upload attempt. But now they don't seem to be recognized by BOINC at all.)
|
|
|
|
Sorry, once the server has decided on an outcome, that's the end of it. We can only influence the outcome before that final report has been made.
My Linux here is Linux Mint - a form of Ubuntu. That's the only one I can advise with confidence about. |
|
|
zooxit
Joined: 4 Jul 21 Posts: 23 Credit: 9,329,747,892 RAC: 49,852,830
|
:( 4GPUs working for almost a day...
moving on
So, I cannot seem to get past step 4 in fetching data. I do exactly as instructed, but no GPUGrid files show up in Transfers.
I tried it with master_url set to http as well.
EDIT:
Event log: Not requesting tasks: don't need...project not highest priority
So, stopped all other GPU projects. Making progress... |
|
|
|
Will the administrators actually fix this problem by updating the required certificate? That seems the obvious solution. |
|
|
|
If you've reached that state (implying 10 or more consecutive failed attempts to contact the scheduler), you'll probably have to change the "<master_url>" - first line in the project section, in client_state.xml - to http like the others.
But don't change that line - with a global replace or otherwise - unless you really have to. You'll get warning messages in the Event Log.
Edit - after you change the master url, you'll probably be sent a new scheduler url. But that'll be https again, so then you'll need to change that again, as well.
Didn't work for me. I was in the same situation: tasks "uploaded" but would not "report". Changed the scheduler_url to http: nothing. Changed the master url to http and it bombed the whole project lol. Now it can't re-attach until it's fixed. So yeah, changing the master url is not the right move and will just make you lose everything. It's basically like hitting project reset.
glad I only had one stuck task to lose.
____________
|
|
|
|
The tasks aren't due until Dec 2. So I will wait to try this until just before they are late, hoping the crew issue will be fixed before that.
____________
Reno, NV
Team: SETI.USA
|
|
|
Erich56
Joined: 1 Jan 15 Posts: 1131 Credit: 9,791,257,676 RAC: 34,519,849
|
Will the administrators actually fix this problem by updating the required certificate? That seems the obvious solution.
All we can do is hope, although I am unsure how long this will take to happen. |
|
|
|
Just had a brief database outage, so somebody's awake and poking around. Two of my manual downloads from yesterday have uploaded and reported, without further manual intervention.
But I'm still getting the privacy warning on this website, so proceed with caution for the time being.
Edit: and as soon as I post that, the privacy warning disappears and I have a proper web connection again. I'll try to get replacements for those two uploads, and report back.
Edit 2: Yup, those went fine - one Windows, one Linux. Normal service is resumed. Stand back, away from the rush - that server is going to be mighty busy today!
Kudos to the team for the rapid Monday morning rescue. |
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1335 Credit: 7,455,517,459 RAC: 13,725,640
|
If you've reached that state (implying 10 or more consecutive failed attempts to contact the scheduler), you'll probably have to change the "<master_url>" - first line in the project section, in client_state.xml - to http like the others.
But don't change that line - with a global replace or otherwise - unless you really have to. You'll get warning messages in the Event Log.
Edit - after you change the master url, you'll probably be sent a new scheduler url. But that'll be https again, so then you'll need to change that again, as well.
Didn't work for me. I was in the same situation: tasks "uploaded" but would not "report". Changed the scheduler_url to http: nothing. Changed the master url to http and it bombed the whole project lol. Now it can't re-attach until it's fixed. So yeah, changing the master url is not the right move and will just make you lose everything. It's basically like hitting project reset.
glad I only had one stuck task to lose.
I did the same exact thing and had the same outcome. Essentially a project reset on the daily driver. Lost one task. |
|
|
|
I can't find a link to this page anywhere
https://www.gpugrid.net/apps.php |
|
|
|
I can't find a link to this page anywhere
https://www.gpugrid.net/apps.php
I've made one for you above.
The link went missing when the webpage was redesigned a couple of years ago.
|
|
|
|
I can't find a link to this page anywhere
https://www.gpugrid.net/apps.php
I've made one for you above.
The link went missing when the webpage was redesigned a couple of years ago.
Not missing. Apparently it was put on the “Join Us” page.
http://www.gpugrid.net/join.php
____________
|
|
|
rtX
Joined: 2 Apr 09 Posts: 10 Credit: 80,975,982 RAC: 37,187
|
Thanks to Richard Haselgrove for attempting a workaround narrative to my OP.
Recap: I've had a WU stuck on uploading since 3 October. I can't abort it. It just won't go. I am running BOINC 7.16.11, which it tells me is the latest version when I check for updates.
Using Richard's workaround, I changed to No New Tasks and changed the relevant URLs to http. It was still stuck on uploading even after requesting a manual update with communication deferred for 23h59m59s. Pressing update just resets the comm deferred clock. I checked the xml file and it was showing http. I have not changed the master URL.
On the server state page it reports all processes running properly, so I don't think the workaround worked for me.
I've read people getting frustrated with the administration of this project. Is there no server side fix for this? Should it ever have happened in the first place? I've never had problems with other projects.
I've ended up resetting the project in frustration. Now, of course, there are no new WUs on the server, but at least I have cleared the unreported task. |
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1335 Credit: 7,455,517,459 RAC: 13,725,640
|
Your Manager will not report the latest available BOINC Client and Manager. You are currently running an outdated BOINC package with a known flaw: an SSL certificate that expired at the end of September, which prevents correct communication with many projects including this one.
Please update your BOINC installation to the latest version, 7.16.20, available here.
https://boinc.berkeley.edu/download_all.php |
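Comparing the two version numbers mentioned here (7.16.11 installed, 7.16.20 recommended) is safest done numerically and not as plain strings. A small sketch with hypothetical helper names; the default of 7.16.20 simply mirrors the version named in this post:

```python
def version_tuple(version):
    """Turn '7.16.20' into (7, 16, 20) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_outdated(installed, required="7.16.20"):
    """True if `installed` is older than `required`."""
    return version_tuple(installed) < version_tuple(required)
```

A plain string comparison would rank "7.9.3" above "7.16.20", which is why the parts are converted to integers first.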
|
|
Aurum
Joined: 12 Jul 17 Posts: 401 Credit: 16,755,010,632 RAC: 972,098
|
5) Locate the line that starts <scheduler_url> (towards the end of the first section, above <code_sign_key>)
6) Change the scheduler url from https to http
7) Find every example of <upload_url> within the GPUGrid section. Change https to http
Is this still needed? |
|
|
|
No, it was a temporary, already overcome situation. |
|
|
|
No, it was a temporary, already overcome situation.
(And this is a still-bothersome double post :-) |
|
|