Message boards : Server and website : Server only allows one connection at a time from an IP? 30s cooldown is too short.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1031
Credit: 35,493,857,483
RAC: 71,175,505
Message 54877 - Posted: 22 May 2020 | 14:40:42 UTC
Last modified: 22 May 2020 | 14:42:18 UTC

So I've been pulling my hair out trying to figure out why I've had such trouble loading the GPUGRID website or communicating with the project via BOINC. It seemed like only one computer could make a connection at a time: if that computer was running BOINC, no other system in the house could load the GPUGRID website or communicate via BOINC. In all instances I could successfully ping gpugrid.net from every system, so it wasn't a DNS problem.

It comes down to one or both of two things:

1. On the GPUGRID server side, the network seems to allow only one connection at a time from a single IP address.
2. The default 30-second cooldown after a scheduler request is not long enough to release the connection so that another computer on the same IP can communicate, so all other attempts get blocked.

I have solved my problem using a brute-force method to force cooldowns longer than the default, to 10 minutes. Now suddenly all systems can reach the project through the website and BOINC. But if I have just one system using the default 30-second cooldown, it hogs the connection and no one else can communicate.

Please make your default cooldown longer and/or allow multiple connections from a single IP address. This wreaks havoc for people running multiple computers from the same location. There's no need to keep pinging for more work every 30 seconds when WUs run from 30 minutes to many hours.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,599,836,851
RAC: 8,799,578
Message 54878 - Posted: 22 May 2020 | 14:53:35 UTC - in response to Message 54877.

This has been the case for many years; I've posted about it before. It's also aggravated by the fact that all functions run on a single server (Grosso), so uploads and downloads also trigger the lockout. I can see another machine trying to download a task while I type this.

The current tasks are actually shorter than has been common at this project, which aggravates the problem, as (obviously) has the influx of new volunteers from SETI.

Obviously, volunteers with just a single computer won't know what all the fuss is about!

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,331,959
RAC: 6,322,138
Message 54880 - Posted: 22 May 2020 | 15:32:14 UTC

Count me in as afflicted also. Sure would like a solution from the project end.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 54890 - Posted: 22 May 2020 | 22:17:51 UTC

I only run 4 GPUs on 2 hosts and have had that happen. I wonder if it is a built-in defense against DOS attacks?
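To make the per-IP hypothesis concrete, this is roughly how a concurrent-connection limit of one per source IP would behave for several hosts behind one router. It is purely illustrative: Toni says later in the thread that he is not aware of any explicit limit on the server. `PerIpConnectionLimiter` is a hypothetical name, and 203.0.113.7 is a documentation address standing in for a household's single public IP.

```python
from collections import defaultdict


class PerIpConnectionLimiter:
    """Hypothetical limiter: at most `max_per_ip` open connections per source IP."""

    def __init__(self, max_per_ip: int = 1) -> None:
        self.max_per_ip = max_per_ip
        self._open = defaultdict(int)  # source IP -> number of open connections

    def try_acquire(self, ip: str) -> bool:
        """Admit a new connection from `ip` unless it is already at the limit."""
        if self._open[ip] >= self.max_per_ip:
            return False
        self._open[ip] += 1
        return True

    def release(self, ip: str) -> None:
        """Record that one connection from `ip` has closed."""
        if self._open[ip] > 0:
            self._open[ip] -= 1


if __name__ == "__main__":
    limiter = PerIpConnectionLimiter(max_per_ip=1)
    household = "203.0.113.7"  # every host behind the same router shares this IP
    print(limiter.try_acquire(household))  # host 1 starts an upload: True
    print(limiter.try_acquire(household))  # host 2 opens the website: False
    limiter.release(household)             # host 1's upload finishes
    print(limiter.try_acquire(household))  # host 2 retries: True
```

Under such a limit, a host that reconnects every 30 seconds would rarely leave a window for the others, which matches the symptom described above.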

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,331,959
RAC: 6,322,138
Message 54893 - Posted: 22 May 2020 | 23:54:05 UTC

That is what we postulated a long time ago.

But it could also be that a single server doing every function just can't handle the number of HTTPS connections brought by the influx of new users, which has pushed the server load beyond what it was originally configured for.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 54898 - Posted: 23 May 2020 | 6:03:16 UTC - in response to Message 54893.

I am not aware of an explicit limit set in the server. It can be anywhere (including ISP throttling). Does it affect the web pages too?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,599,836,851
RAC: 8,799,578
Message 54899 - Posted: 23 May 2020 | 6:24:00 UTC - in response to Message 54898.
Last modified: 23 May 2020 | 6:52:51 UTC

Yes. One computer doing a task operation (upload, report, download) can prevent another computer accessing these message boards. Or reading/writing here can prevent another computer doing task operations.

Edit - perhaps we should say that the problem happens at the 'connect' phase, when our device is attempting to open a TCP/IP connection to Grosso. I'm pretty sure it's something in the server operating system or web server components, long before any of the BOINC software comes into play.
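Since the failure reportedly happens at the connect phase, it can be checked without BOINC at all. The sketch below attempts only the TCP handshake and times it; running it on a second machine while another is uploading should show whether connect() itself is refused or times out. `probe_connect` is a hypothetical helper built on Python's standard `socket.create_connection`, and the host name www.gpugrid.net with port 443 is an assumption.

```python
import socket
import time


def probe_connect(host: str, port: int = 443, timeout: float = 10.0) -> tuple[bool, float]:
    """Attempt only the TCP handshake; return (succeeded, seconds taken).

    No HTTP request is sent, so no payload bandwidth is used.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connected; close immediately
        return True, time.monotonic() - start
    except OSError:
        # Refused connections, timeouts (socket.timeout) and DNS errors
        # (socket.gaierror) are all OSError subclasses.
        return False, time.monotonic() - start


if __name__ == "__main__":
    # Assumption: the project server answers on www.gpugrid.net:443.
    for _ in range(5):
        ok, elapsed = probe_connect("www.gpugrid.net")
        print(f"connect {'ok' if ok else 'FAILED'} after {elapsed:.2f} s")
        time.sleep(5)
```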

Aurum
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Message 54911 - Posted: 23 May 2020 | 20:56:40 UTC - in response to Message 54877.

> I have solved my problem using a brute-force method to force cooldowns longer than the default, to 10 minutes. Now suddenly all systems can reach the project through the website and BOINC. But if I have just one system using the default 30-second cooldown, it hogs the connection and no one else can communicate.
> Please make your default cooldown longer and/or allow multiple connections from a single IP address.


I can't find any cooldown command in my cc_config. How does one implement this fix???

Should this be set 0 or 1???
<report_results_immediately>1</report_results_immediately>

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 54912 - Posted: 23 May 2020 | 21:28:59 UTC - in response to Message 54911.

> Should this be set 0 or 1???
> <report_results_immediately>1</report_results_immediately>
It should be set to 1 for GPUGrid, but the GPUGrid project already sets this property in the tasks themselves, so the cc_config option has no effect on GPUGrid tasks as long as the project sets it.
Look for <report_immediately/> in client_state.xml and you'll find records like this:
<result>
    <name>2c1dB00_379_1-TONI_MDADex2sc-0-50-RND9291_0</name>
    <final_cpu_time>0.000000</final_cpu_time>
    <final_elapsed_time>0.000000</final_elapsed_time>
    <exit_status>0</exit_status>
    <state>2</state>
    <platform>windows_x86_64</platform>
    <version_num>210</version_num>
    <plan_class>cuda101</plan_class>
    <report_immediately/>
    <wu_name>2c1dB00_379_1-TONI_MDADex2sc-0-50-RND9291</wu_name>
    <report_deadline>1590700376.000000</report_deadline>
    <received_time>1590268377.505087</received_time>
    ...
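A minimal sketch for listing which results in client_state.xml carry the <report_immediately/> flag, using Python's standard xml.etree.ElementTree. It assumes the file parses as well-formed XML; the function name is hypothetical, and the example reads client_state.xml from the current directory as a placeholder, since the BOINC data directory differs by platform.

```python
import xml.etree.ElementTree as ET


def results_reported_immediately(client_state_xml: str) -> list[str]:
    """Names of <result> entries that carry a <report_immediately/> flag."""
    root = ET.fromstring(client_state_xml)
    names = []
    for result in root.iter("result"):
        # Compare with None: an element with no children, such as
        # <report_immediately/>, tests as false.
        if result.find("report_immediately") is not None:
            names.append(result.findtext("name", default="?"))
    return names


if __name__ == "__main__":
    # Placeholder path: point this at client_state.xml in your BOINC data directory.
    with open("client_state.xml", encoding="utf-8") as f:
        for name in results_reported_immediately(f.read()):
            print(name)
```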

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,331,959
RAC: 6,322,138
Message 54913 - Posted: 23 May 2020 | 22:18:51 UTC - in response to Message 54911.

> > I have solved my problem using a brute-force method to force cooldowns longer than the default, to 10 minutes. Now suddenly all systems can reach the project through the website and BOINC. But if I have just one system using the default 30-second cooldown, it hogs the connection and no one else can communicate.
> > Please make your default cooldown longer and/or allow multiple connections from a single IP address.
>
> I can't find any cooldown command in my cc_config. How does one implement this fix???
>
> Should this be set 0 or 1???
> <report_results_immediately>1</report_results_immediately>

Ian was asking Toni to change the server default to a longer period.

31 seconds is just too often.

Both Ian and I use a proprietary GPUUG client that offers a configurable cooldown period for any project through a special configuration file. With that we have been able to tame both Milkyway and GPUGrid.

That is not available with the standard client.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1031
Credit: 35,493,857,483
RAC: 71,175,505
Message 54916 - Posted: 23 May 2020 | 23:49:41 UTC - in response to Message 54913.

You could probably script something to get the job done with boinccmd, though: project update, wait, disable networking, wait 10 minutes, re-enable networking, project update, wait. Something like that. I'd have to look at all the available command options to see if there's a more elegant solution than this off-the-cuff guess.
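A rough sketch of the cycle Ian describes, driving the standard boinccmd tool from Python. `--project URL update` and `--set_network_mode never|auto` are existing boinccmd options; the project URL, the 60-second settle time after the update, and the 10-minute quiet period are assumptions to tune, and the function names are hypothetical. Note that `never` blocks network traffic for every project on that host, and a transfer still running when the network is cut simply waits until it is re-enabled. boinccmd may also need GUI RPC access to the client (for example the password in gui_rpc_auth.cfg).

```python
import subprocess
import time
from typing import Callable

# Assumption: GPUGRID's master URL exactly as `boinccmd --get_project_status` shows it.
PROJECT_URL = "https://www.gpugrid.net/"


def cycle_steps(settle_s: float, quiet_s: float, url: str = PROJECT_URL) -> list[tuple[list[str], float]]:
    """One throttling cycle as (boinccmd argv, seconds to wait afterwards)."""
    return [
        (["boinccmd", "--project", url, "update"], settle_s),   # contact the scheduler now
        (["boinccmd", "--set_network_mode", "never"], quiet_s),  # then stay off the network
        (["boinccmd", "--set_network_mode", "auto"], 0.0),       # back to normal rules
    ]


def run_cycle(
    settle_s: float = 60.0,
    quiet_s: float = 600.0,
    run: Callable[[list[str]], object] = lambda argv: subprocess.run(argv, check=True),
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Execute one cycle; `run` and `sleep` are injectable so it can be dry-run."""
    for argv, wait_s in cycle_steps(settle_s, quiet_s):
        run(argv)
        if wait_s > 0:
            sleep(wait_s)


if __name__ == "__main__":
    try:
        while True:
            run_cycle()
    finally:
        # Never leave the client stuck offline when the script stops (e.g. Ctrl-C).
        subprocess.run(["boinccmd", "--set_network_mode", "auto"], check=False)
```

Passing `run` and `sleep` in keeps the cycle testable without touching a real client.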

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 155,858
Message 54919 - Posted: 24 May 2020 | 3:41:56 UTC

I've seen a somewhat similar problem: if one of my computers was uploading a GPUGRID output file, my other computer was blocked from sending ANYTHING to the internet, such as a request to view a webpage. In my case, it took persuading my ISP to install a different brand of ADSL modem to fix this problem.

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,331,959
RAC: 6,322,138
Message 54921 - Posted: 24 May 2020 | 4:24:40 UTC - in response to Message 54919.

I'd be curious to find out whether this symptom is specific to hardware, as with your ADSL modem.

I wonder if everyone who is afflicted is an ADSL subscriber. I'm sure there are plenty of folks over in Europe with fiber connections who are immune to the issue.

Or those on cable modems in fact?

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 54922 - Posted: 24 May 2020 | 4:49:04 UTC - in response to Message 54921.

> Or those on cable modems in fact?

I am on a cable modem, and I haven't noticed this problem so far.

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,331,959
RAC: 6,322,138
Message 54923 - Posted: 24 May 2020 | 5:29:57 UTC - in response to Message 54922.
Last modified: 24 May 2020 | 5:34:56 UTC

> > Or those on cable modems in fact?
>
> I am on a cable modem, and I haven't noticed this problem so far.

Interesting. I'm assuming you have at least two of your hosts simultaneously crunching GPUGrid and using the same internet connection?

[Edit] The only project website, or indeed any website, that I have this issue with is GPUGrid.

But I am really curious now if only ADSL modem subscribers have the issue.

@Ian, are you on an ADSL internet connection?

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 54924 - Posted: 24 May 2020 | 5:42:58 UTC - in response to Message 54923.

> > > Or those on cable modems in fact?
> >
> > I am on a cable modem, and I haven't noticed this problem so far.
>
> Interesting. I'm assuming you have at least two of your hosts simultaneously crunching GPUGrid and using the same internet connection?

Four hosts simultaneously.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 54927 - Posted: 24 May 2020 | 9:58:45 UTC - in response to Message 54924.

Are you positive it's not the upload bandwidth being saturated?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 54928 - Posted: 24 May 2020 | 10:28:00 UTC - in response to Message 54927.

> Are you positive it's not the upload bandwidth being saturated?
I'm sure about it. I have this problem on my symmetrical 1 Gbps fiber-optic internet connection. Earlier I had an ADSL connection (over an old copper phone line) with 50 Mbps download and 15 Mbps upload bandwidth, which had the same problem.

Aurum
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Message 54932 - Posted: 24 May 2020 | 13:26:55 UTC - in response to Message 54913.

> ...use a proprietary GPUUG client that offers a configurable cooldown period for any project through a special configuration file. That is not available with the standard client.
Ohh, it's an elitist thing :-)

I had many idle computers this morning, all with Download Pending. I clicked Retry All on my BoincTasks Transfers page. Doing that too often seems to make the problem worse. Also, both the http and https flavors suffer from this affliction.

My cable modem in the US is an E31N2V1 Hitron Technologies. The spec sheet does not say ADSL:
http://www.hitrontech.com/wp-content/uploads/2020/04/5338868b2233337bd4cb308c048cf211.pdf

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,599,836,851
RAC: 8,799,578
Message 54934 - Posted: 24 May 2020 | 14:01:46 UTC - in response to Message 54927.
Last modified: 24 May 2020 | 14:07:54 UTC

> Are you positive it's not the upload bandwidth being saturated?

Yes.

1) It's been happening for years, when things were quieter.
2) The hourly project-specified scheduler update is sufficient to block other computers, even when no data needs to be transferred (limit reached or no tasks).
3) Downloads also trigger it.

My connection is hybrid, 'Fibre to the Cabinet': an optical trunk feed, with VDSL for the final 500 m. I'm getting 70.619 Mbps downstream and 12.773 Mbps upstream at the moment; the internet link from the UK doesn't allow me to saturate that.

And yes - it took me three goes even to get a preview of that post, because one of the machines in the spare bedroom upstairs decided to upload at the critical moment. I wasn't typing fast enough to saturate either your upload link or mine!

Edit - now typing from the machine upstairs. It had tried, but failed, to upload (the log says 'connect() failed'), so no bandwidth was used except for the attempted handshake. It cleared on the retry, and downloaded a new task as well.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1031
Credit: 35,493,857,483
RAC: 71,175,505
Message 54935 - Posted: 24 May 2020 | 15:26:31 UTC

I have a gigabit up/down fiber connection at both locations that run my computers (separate external IPs) and experience the same problem at both locations if one system is running the default 30s cooldown.

Changing the default cooldown on the project server side to something longer, like 5-10 minutes, would largely solve this problem for everyone without each user needing to run a custom client to work around it.

Toni, please implement this on the project servers.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,599,836,851
RAC: 8,799,578
Message 54936 - Posted: 24 May 2020 | 16:12:02 UTC - in response to Message 54935.

Ian, what are your current statistics? My two fastest machines are the two Linux boxes, each with 2x GTX 1660 Super or GTX 1660 Ti. I've got 321 valid tasks showing at the moment, probably since the start of the current run.

The runtimes are:

Max: 8,994.80 sec (149 minutes)
Min: 1,180.58 sec (19 minutes)
Avg: 3,114.27 sec (51 minutes)

I'm guessing your fastest will be better than 19 minutes. Maybe we ought to ask Toni to start with a 5-minute delay and see how we go, before upping it to 10 minutes if we have to?

I'm also worried about what happens if we get more bad batches: these machines spit out the error tasks in just 3 seconds. Blow two of those in succession, and I'm left waiting for the next scheduler contact.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1031
Credit: 35,493,857,483
RAC: 71,175,505
Message 54939 - Posted: 24 May 2020 | 16:52:39 UTC - in response to Message 54936.

The shortest I've seen on my 2080 Ti (PL 225 W) is about 800 s (13.3 min).
The longest I've seen on my 2080 Ti (PL 225 W) is about 3200 s (53.3 min).

The shortest I've seen on my 2070 (PL 150 W) is about 1200 s (20 min).
The longest I've seen on my 2070 (PL 150 W) is about 6000 s (100 min, about 1.7 h).

They could also allow more than 2 WUs per GPU and raise the maximum in-progress count to match. But really, things like bad batches shouldn't factor into choosing the cooldown, IMO. Treat that as an edge case, and plan for things to work normally most of the time.
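Richard's worry about back-to-back error tasks and Ian's runtimes can be compared with a deliberately simplified model: right after a scheduler contact, a GPU holds a queue of tasks and cannot get new work until the backoff has elapsed, so it idles for whatever part of the backoff its queue does not cover. `worst_case_idle_s` is a hypothetical helper, and the model ignores the project's hourly update, progress already made on the running task, and everything else BOINC's work fetch actually does.

```python
def worst_case_idle_s(queued_runtimes_s: list[float], backoff_s: float) -> float:
    """Seconds a GPU sits idle during one backoff in a simplified model.

    Assumes the GPU works through `queued_runtimes_s` back to back right after a
    scheduler contact and cannot get new work until `backoff_s` has elapsed.
    """
    return max(0.0, backoff_s - sum(queued_runtimes_s))


if __name__ == "__main__":
    for backoff in (30, 300, 600):  # current default, 5 min, 10 min
        normal = worst_case_idle_s([800, 800], backoff)  # two of the fastest 2080 Ti tasks
        errors = worst_case_idle_s([3, 3], backoff)      # two 3-second error tasks
        print(f"backoff {backoff:>3} s: idle {normal:.0f} s normally, {errors:.0f} s after two errors")
```

With two 800 s tasks queued, even a 600 s backoff leaves no idle time in this model, while two 3-second error tasks leave the GPU idle for 594 s of a 600 s backoff (24 s with the current 30 s default).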

RFGuy_KCCO
Joined: 13 Feb 14
Posts: 6
Credit: 1,047,426,005
RAC: 0
Message 54968 - Posted: 26 May 2020 | 16:30:36 UTC

Please fix this issue, as it is clearly causing problems with receiving and sending work for many users. I am setting "No New Work" on this project until the issue is corrected.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 54969 - Posted: 26 May 2020 | 18:05:25 UTC

Until now, I had no connection problems.
However, since this afternoon there have been many of them.
This should be fixed ASAP.

Gunnar Hjern
Joined: 22 May 20
Posts: 2
Credit: 22,042,067
RAC: 0
Message 54970 - Posted: 26 May 2020 | 19:10:59 UTC - in response to Message 54969.
Last modified: 26 May 2020 | 20:10:45 UTC

Hi!

I just experienced the same problem:
I have two old HP Z220s, with a GTX 960 and a GTX 750 Ti, and both were standing still with files that wouldn't download, and no new tasks because a download was pending on the current task.
It didn't help to abort the stalled downloads, or to abort the whole task - it was STILL complaining about those downloads!! :-(

In the end there was nothing to do but hit the "reset project" button on both machines, but that resulted in several hundred MB of downloading for each one! :-O

Now both machines are up and running again - let's see how long it'll last.

I hope the admins sort this problem out as soon as possible, before the server lines get completely bogged down.

Happy crunching!!!

//Gunnar

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 54972 - Posted: 26 May 2020 | 22:43:03 UTC - in response to Message 54970.

> It didn't help to abort the stalled downloads, or to abort the whole task - it was STILL complaining about those downloads!!
That's a different problem. These tasks were created before the http->https transition, so they still try to download over http, which won't succeed. You have to abort the downloads and then restart the BOINC manager, or manually edit the client_state.xml file (see the "Warning: bad tasks re-appearing in the download queue" thread for details).
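Before editing client_state.xml by hand, it may help to see which download URLs still use plain http. The sketch below only reads the file. It assumes the client stores download locations inside <file_info> as <download_url> or <url> elements, which may vary between client versions; the function name and the file path are placeholders. Stop the BOINC client before making any manual edit, since the running client rewrites this file.

```python
import xml.etree.ElementTree as ET


def http_download_urls(client_state_xml: str) -> list[tuple[str, str]]:
    """(file name, URL) pairs for <file_info> entries still using plain http://."""
    root = ET.fromstring(client_state_xml)
    found = []
    for file_info in root.iter("file_info"):
        name = file_info.findtext("name", default="?")
        # Assumption: download locations are stored as <download_url> or <url>.
        for tag in ("download_url", "url"):
            for element in file_info.findall(tag):
                url = (element.text or "").strip()
                if url.startswith("http://"):
                    found.append((name, url))
    return found


if __name__ == "__main__":
    # Placeholder path; this only reads the file and changes nothing.
    with open("client_state.xml", encoding="utf-8") as f:
        for name, url in http_download_urls(f.read()):
            print(f"{name}: {url}")
```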

Gunnar Hjern
Joined: 22 May 20
Posts: 2
Credit: 22,042,067
RAC: 0
Message 54973 - Posted: 26 May 2020 | 23:43:34 UTC - in response to Message 54972.

Thanks for pointing that out!

I didn't know about that problem, as I'm pretty new to this project, and when the same thing happened on both my computers simultaneously, I assumed it was related to this problem. :-)

I hope those faulty tasks get cleaned out of the database ASAP!!
They are effectively locking up my machines and forcing me to reset the project manually.

Happy crunching!!!
