Message boards : Number Crunching : Please Ban Bad Hosts

mmonnin
Joined: 31 May 16
Posts: 38
Combined Credit: 34,803,115
DNA@Home: 0
SubsetSum@Home: 1,023,200
Wildlife@Home: 33,779,915
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 54
Images Observed: 10,675

        
Message 7548 - Posted: 9 Jul 2018, 10:52:54 UTC

I've had 2 very long tasks become invalid because bad hosts produce nothing but invalid results. Probably 30-40k credit each just wasted.

https://csgrid.org/csg/workunit.php?wuid=3844485
https://csgrid.org/csg/workunit.php?wuid=3842552

https://csgrid.org/csg/show_host_detail.php?hostid=68534
https://csgrid.org/csg/show_host_detail.php?hostid=76594
https://csgrid.org/csg/show_host_detail.php?hostid=68685
https://csgrid.org/csg/show_host_detail.php?hostid=65811
https://csgrid.org/csg/show_host_detail.php?hostid=79738
https://csgrid.org/csg/show_host_detail.php?hostid=68497
https://csgrid.org/csg/show_host_detail.php?hostid=63831
https://csgrid.org/csg/show_host_detail.php?hostid=78977

The list goes on and on.

mmonnin
Message 7552 - Posted: 17 Jul 2018, 0:47:48 UTC
Last modified: 17 Jul 2018, 0:47:55 UTC

https://csgrid.org/csg/show_host_detail.php?hostid=69885
https://csgrid.org/csg/show_host_detail.php?hostid=76206
https://csgrid.org/csg/show_host_detail.php?hostid=59505

Luigi R.
Joined: 1 Jun 16
Posts: 1
Combined Credit: 318,829
DNA@Home: 0
SubsetSum@Home: 44,132
Wildlife@Home: 274,697
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

    
Message 7559 - Posted: 24 Jul 2018, 7:39:05 UTC

I hope that our results marked as invalid (because of bad hosts) will be resent to get validated.

https://csgrid.org/csg/results.php?hostid=69711 (2 valid)
https://csgrid.org/csg/results.php?hostid=80167
https://csgrid.org/csg/results.php?hostid=80243 (2 valid)
https://csgrid.org/csg/results.php?hostid=83590
https://csgrid.org/csg/results.php?hostid=84431
https://csgrid.org/csg/results.php?hostid=86019

mmonnin
Message 7561 - Posted: 25 Jul 2018, 1:39:30 UTC

They might but I don't think the original hosts will get credit.

morgan
Joined: 1 May 13
Posts: 2
Combined Credit: 27,613,530
DNA@Home: 111,775
SubsetSum@Home: 274,030
Wildlife@Home: 27,227,725
Wildlife@Home Watched: 180s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 3

      
Message 7569 - Posted: 21 Aug 2018, 9:08:35 UTC

I see bad hosts making some of my WUs error out as well.
One here: https://csgrid.org/csg//workunit.php?wuid=3913454

Could it help a bit if the max # of error/total/success tasks were set to a bigger/different value?

mmonnin
Message 7573 - Posted: 24 Aug 2018, 2:19:04 UTC

Started up CSG again on a host and 2 of 16 tasks were task _2 or higher. 14 of them had a bad host already. 87.5%.

1 Valid
4 Pending
812 Errors
https://csgrid.org/csg/results.php?hostid=73773

99.4% error!
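For anyone who wants to check the percentage, here is a minimal sketch of the arithmetic. The function name and the choice to count pending tasks in the total are assumptions made for illustration; the counts are the ones quoted above for host 73773.

```python
def error_rate(valid, pending, errors):
    """Percent of a host's listed tasks that ended in error.

    Assumption: pending tasks are counted in the total, which is how
    the 99.4% figure quoted in the post comes out.
    """
    total = valid + pending + errors
    if total == 0:
        raise ValueError("host has no tasks")
    return 100.0 * errors / total


# Counts quoted in the post for host 73773.
rate = error_rate(valid=1, pending=4, errors=812)
print(f"{rate:.1f}% error")
```

The same kind of division gives the 87.5% figure: 14 of 16 tasks.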

lanbrown
Joined: 17 Sep 16
Posts: 14
Combined Credit: 206,706,649
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 206,706,649
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 7579 - Posted: 9 Sep 2018, 1:18:02 UTC

https://csgrid.org/csg/show_host_detail.php?hostid=68349

State: All (145) · In progress (7) · Validation pending (2) · Validation inconclusive (0) · Valid (2) · Invalid (0) · Error (134)


2 valid and 134 errors is not a very good statistic.

https://csgrid.org/csg/show_host_detail.php?hostid=86144

State: All (541) · In progress (192) · Validation pending (8) · Validation inconclusive (0) · Valid (22) · Invalid (0) · Error (319)


Why they need 192 WU's is absurd given how long some of them take to run. Their valid to error ratio is pretty bad. It is especially bad when they are aborting a large number of WU's.

https://csgrid.org/csg/results.php?hostid=88828

State: All (8) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (0) · Invalid (0) · Error (8)


Not a single valid result.

https://csgrid.org/csg/show_host_detail.php?hostid=81455

State: All (166) · In progress (3) · Validation pending (0) · Validation inconclusive (0) · Valid (0) · Invalid (0) · Error (163)


Not a single valid result.

The above 4 caused me to lose credit for a WU since there were too many total results. I was the only one that completed it; two had computing errors and the other two were aborted by the user.

https://csgrid.org/csg/workunit.php?wuid=3944350
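The "State:" lines pasted above can be compared mechanically. The following is a rough sketch that assumes the line format stays exactly as shown (labels separated by "·", counts in parentheses). The helper names are made up for this example and are not part of any BOINC tool.

```python
import re


def parse_state_line(line):
    """Turn 'State: All (145) · In progress (7) · ...' into {'All': 145, ...}."""
    _, _, body = line.partition(":")
    counts = {}
    for part in body.split("·"):
        match = re.fullmatch(r"\s*(.+?)\s*\((\d+)\)\s*", part)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def valid_share(counts):
    """Percent of finished tasks (valid, invalid, error) that validated."""
    finished = counts["Valid"] + counts["Invalid"] + counts["Error"]
    return 100.0 * counts["Valid"] / finished if finished else 0.0


# State line copied from the post for host 68349.
line = ("State: All (145) · In progress (7) · Validation pending (2) · "
        "Validation inconclusive (0) · Valid (2) · Invalid (0) · Error (134)")
counts = parse_state_line(line)
print(counts["Valid"], counts["Error"], f"{valid_share(counts):.1f}%")
```

In-progress and pending tasks are left out of the ratio here because their outcome is not known yet; that is a design choice of the sketch, not something the site reports.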

lanbrown
Message 7580 - Posted: 9 Sep 2018, 1:19:55 UTC - in response to Message 7569.

I see bad hosts making some of my WUs error out as well.
One here: https://csgrid.org/csg//workunit.php?wuid=3913454

Could it help a bit if the max # of error/total/success tasks were set to a bigger/different value?


It could help, but doesn't fix the problem of hosts that are detrimental instead of being beneficial to the cause.
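To make the trade-off concrete, here is a toy model of how per-workunit limits could decide the fate of a workunit. The field names echo BOINC's workunit parameters (min_quorum, max_error_results, max_total_results), but the comparison logic, the limit values, and the choice to count user aborts as failures are all assumptions of this sketch. It is not the server's actual code and not CSG's real settings.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # Hypothetical values; the real CSG settings are not given in this thread.
    min_quorum: int
    max_error_results: int
    max_total_results: int


def workunit_status(outcomes, limits):
    """outcomes is a list of 'success', 'error' or 'aborted' strings.

    Assumption: aborts are treated like errors when counting failures.
    """
    successes = outcomes.count("success")
    failures = len(outcomes) - successes
    if successes >= limits.min_quorum:
        return "can validate"
    if failures > limits.max_error_results:
        return "too many errors"
    if len(outcomes) >= limits.max_total_results:
        return "too many total results"
    return "send another copy"


# The mix described earlier in the thread: one success, two compute
# errors, two user aborts.
outcomes = ["success", "error", "error", "aborted", "aborted"]
print(workunit_status(outcomes, Limits(2, 4, 5)))
print(workunit_status(outcomes, Limits(2, 8, 10)))
```

With the tighter hypothetical limits the workunit is given up; with the looser ones another copy goes out and the one good result still has a chance to validate. Neither setting stops the bad hosts from burning through the extra copies.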

Profile searching to find the meaning of life I was lost & desperate. Then God touched my heart & soul and showed me He loved me so deeply as to lay down His life for me. Jesus proves God is Love. Dios es Amor, Jesus demuestra. LPa H
Joined: 11 Jul 15
Posts: 6
Combined Credit: 58,796,055
DNA@Home: 601
SubsetSum@Home: 3,031
Wildlife@Home: 58,792,423
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 7589 - Posted: 10 Sep 2018, 20:45:42 UTC

Of the hosts listed here that can still be viewed (WU and tasks that can still be accessed) none use windoze 10, four have windoze 8:
https://csgrid.org/csg/show_host_detail.php?hostid=73773
https://csgrid.org/csg/show_host_detail.php?hostid=81455
https://csgrid.org/csg/show_host_detail.php?hostid=88828
https://csgrid.org/csg/show_host_detail.php?hostid=68349

Perhaps win 8 should not be supported by citsg...

The hosts themselves (hardware) seem powerful enough.

https://csgrid.org/csg/show_host_detail.php?hostid=86144
is an amazing linux system with 32 GB memory...
"Why they need 192 WU's"
because the machine has 64 processors (32 physical cores, hyperthreaded).
3 WU per core is not too much.
____________
I think, therefore I THINK I am.
My thinking's neither the source of my being--
nor proves it to others.
God Is Love, Jesus proves it

Profile searching to find the meaning of life I was lost & desperate. Then God touched my heart & soul and showed me He loved me so deeply as to lay down His life for me. Jesus proves God is Love. Dios es Amor, Jesus demuestra. LPa H
Message 7590 - Posted: 10 Sep 2018, 20:49:32 UTC
Last modified: 10 Sep 2018, 21:42:30 UTC

On rare occasions I have had to abort a WU, for various reasons.
Surely CSG can distinguish between a real compute error and an abort by the user.
If not, THIS needs to be fixed.
Thanks,
LLP, PhD PE

lanbrown
Message 7593 - Posted: 10 Sep 2018, 22:23:30 UTC - in response to Message 7589.
Last modified: 10 Sep 2018, 22:29:40 UTC

Of the hosts listed here that can still be viewed (WU and tasks that can still be accessed) none use windoze 10, four have windoze 8:
https://csgrid.org/csg/show_host_detail.php?hostid=73773
https://csgrid.org/csg/show_host_detail.php?hostid=81455
https://csgrid.org/csg/show_host_detail.php?hostid=88828
https://csgrid.org/csg/show_host_detail.php?hostid=68349

Perhaps win 8 should not be supported by citsg...

The hosts themselves (hardware) seem powerful enough.

https://csgrid.org/csg/show_host_detail.php?hostid=86144
is an amazing linux system with 32 GB memory...
"Why they need 192 WU's"
because the machine has 64 processors (32 physical cores, hyperthreaded).
3 WU per core is not too much.


That would depend on the length of the WU; if they are the 3-day variety, then yes, that is too much: 3 x 3 = 9 days. Even if they are the 2-day variety, 3 x 2 = 6 days.

Do outages occur? Sure, but you don't need to keep 6 to 9 days' worth of work.

This host has done more errors than valid results; so once again, why the huge number of WU's when it doesn't really contribute much in a positive manner?
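The queue-depth arithmetic can be sketched as below. It assumes every one of the 64 threads runs one task at a time and that tasks take a flat 2 or 3 days each, which are the figures used in this exchange rather than measured run times. The function name is invented for the example.

```python
def days_of_queued_work(tasks, threads, days_per_task):
    """Days needed to clear the queue if each thread runs one task at a time."""
    tasks_per_thread = tasks / threads
    return tasks_per_thread * days_per_task


# 192 tasks in progress on a 64-thread host, as discussed above.
for days_per_task in (2, 3):
    print(days_per_task, days_of_queued_work(192, 64, days_per_task))
```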

Profile searching to find the meaning of life I was lost & desperate. Then God touched my heart & soul and showed me He loved me so deeply as to lay down His life for me. Jesus proves God is Love. Dios es Amor, Jesus demuestra. LPa H
Message 7594 - Posted: 10 Sep 2018, 23:49:21 UTC - in response to Message 7593.
Last modified: 10 Sep 2018, 23:54:41 UTC

3 x 3 = 9 days. Even if they are the 2-day variety, 3 x 2 = 6 days.

Well, "too much" is subjective.
On projects which I particularly like to support, I tend to keep a pretty full 'inventory' of tasks in case of outages, which DO happen. Lack of tasks can be for several reasons: maintenance downtime, time for the scientists to access and tune the project etc.
I believe all projects set the limit not on time, but on number of tasks per CPU or GPU. (The project would have trouble keeping up with possible changes to a user's host: driver updates, the run-time environment, other system demands on the PC, etc. The BOINC manager on the user's host keeps tabs on how many hours/day the system runs, how much of the time BOINC is allowed to run, etc., and can and does block task requests.) Some projects set a pretty low limit: GPUNET only allows two, but then their due date is 5 days (120 hours). On my slower GPU I have run tasks which have taken 48 - 72, even up to 100, hours (over 4 days). But then I have a UPS (uninterruptible power supply, or battery backup).

For another example, SETI lets users download a BUNCH of tasks, but invariably gives months for a deadline. In fact, I'd say 3 per core may be on the low end as project limits go, but I set the options (Computing preferences) so that I usually don't get much more than 24 tasks for the 8 cores on each machine.

CSG has of late been giving a month or two as a deadline. The only 'drawback' (not really) is that it may take longer for a finished task to be validated and then points to be awarded. So, what's the big rush to get the points, assuming the validation DOES occur?

The real problem is not having 192 tasks (for a 64 core machine). The problem is why that linux machine has so many compute errors and wastes so much of ITS own time, and electricity!

I do hope some moderator or team member answers a very key question:

DOES CSG (correctly) differentiate a user abort of a task (especially one with 0 sec of run time) versus a true compute error?

IF not, then this should be corrected.

LLP, PhD PE
____________
I think, therefore I THINK I am.
My thinking's neither the source of my being--
nor proves it to others.
God Is Love, Jesus proves it

lanbrown
Message 7595 - Posted: 11 Sep 2018, 0:05:08 UTC - in response to Message 7594.

The real problem is not having 192 tasks (for a 64 core machine). The problem is why that linux machine has so many compute errors and wastes so much of ITS own time, and electricity!

That is the problem when dealing with a host that returns, by a wide margin, more errors than valid results. If the system had to wait for tasks it would be an improvement, since it would ultimately send fewer errors.


DOES CSG (correctly) differentiate a user abort of a task (especially one with 0 sec of run time) versus a true compute error?

IF not, then this should be corrected.

LLP, PhD PE


Why not ask your god to answer the question? You posted the same thing multiple times; wouldn't he have heard it the first time?

lanbrown
Message 7601 - Posted: 14 Sep 2018, 1:54:05 UTC

Here is the perfect bad host. I lost out on credit because there was one user abort and three WU errors. One of the hosts that errored out was added a little over a month ago. In that time, it has yet to receive a single credit; it has a 0.00.

See for yourself:
https://csgrid.org/csg/show_host_detail.php?hostid=86973

That host is useless to have on the project.

Another host; out of the 92 WU's listed, they have aborted 83 of them, returned 2 valid results, three pending and four are in progress:
https://csgrid.org/csg/results.php?hostid=82553&offset=0&show_names=0&state=6&appid=

The last computer is that of JOSE MANUEL MG.
Of the 23 results listed, all resulted in errors. It has completed some work in the past though:
https://csgrid.org/csg/show_host_detail.php?hostid=78658

enginerd
Joined: 25 Sep 17
Posts: 4
Combined Credit: 32,730,188
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 32,730,188
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 7603 - Posted: 17 Sep 2018, 6:48:32 UTC - in response to Message 7548.

I've had 2 very long tasks become invalid because bad hosts produce nothing but invalid results. Probably 30-40k credit each just wasted.

https://csgrid.org/csg/workunit.php?wuid=3844485
https://csgrid.org/csg/workunit.php?wuid=3842552

https://csgrid.org/csg/show_host_detail.php?hostid=68534
https://csgrid.org/csg/show_host_detail.php?hostid=76594
https://csgrid.org/csg/show_host_detail.php?hostid=68685
https://csgrid.org/csg/show_host_detail.php?hostid=65811
https://csgrid.org/csg/show_host_detail.php?hostid=79738
https://csgrid.org/csg/show_host_detail.php?hostid=68497
https://csgrid.org/csg/show_host_detail.php?hostid=63831
https://csgrid.org/csg/show_host_detail.php?hostid=78977

The list goes on and on.


Almost all of these hosts seem to be "Charity Engine" computers where people might not even realize that BOINC is installed - perhaps users are manually aborting mystery processes that they don't recognize.

Notice version 7.0.80 of BOINC (which was never released, and is custom to Charity Engine). Here's some background:

https://steemit.com/gridcoin/@dutch/what-is-pomegranate-the-investigation-and-interview

Unfortunately they give Gridcoin users - most of whom are longtime BOINC crunchers and not just in it for the coins - a bad name in the community as we're lumped in with this group.

May I humbly suggest that the developers enforce a minimum BOINC version > 7.1.x to start, and investigate further. SRBase did this as I understand it.

These people are wasting a lot of crunching power and frustrating the rest of us with their bad hosts.
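A minimum-version rule like the one suggested above comes down to comparing version numbers field by field. Below is a small sketch; the (7, 1, 0) threshold and the function names are placeholders for illustration, not an actual project setting or a BOINC API.

```python
def parse_version(text):
    """'7.0.80' or 'v7.6.33' -> (7, 0, 80) or (7, 6, 33)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


# Hypothetical threshold standing in for "a minimum BOINC version > 7.1.x".
MIN_VERSION = (7, 1, 0)


def meets_minimum(client_version, minimum=MIN_VERSION):
    """True if the client version is at or above the minimum."""
    return parse_version(client_version) >= minimum


for version in ("7.0.80", "7.6.33"):
    print(version, meets_minimum(version))
```

Comparing tuples of integers rather than raw strings matters once a field reaches two digits: as strings, "7.10.0" would sort below "7.6.33".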

enginerd
Message 7604 - Posted: 17 Sep 2018, 7:00:47 UTC - in response to Message 7589.

Of the hosts listed here that can still be viewed (WU and tasks that can still be accessed) none use windoze 10, four have windoze 8:
https://csgrid.org/csg/show_host_detail.php?hostid=73773
https://csgrid.org/csg/show_host_detail.php?hostid=81455
https://csgrid.org/csg/show_host_detail.php?hostid=88828
https://csgrid.org/csg/show_host_detail.php?hostid=68349

Perhaps win 8 should not be supported by citsg...

The hosts themselves (hardware) seem powerful enough.

https://csgrid.org/csg/show_host_detail.php?hostid=86144
is an amazing linux system with 32 GB memory...
"Why they need 192 WU's"
because the machine has 64 processors (32 physical cores, hyperthreaded).
3 WU per core is not too much.


The sad thing is that the Charity Engine version of BOINC is so old that it marks Windows 10 machines as Windows 8 - BOINC v7.0.80 is apparently from 2014. It seems that most Charity Engine installs are from bundled installers and ads.

But don't worry, it's apparently "The best PC app ever invented?"

https://www.charityengine.com/

Note - in the spirit of full disclosure, I am a Gridcoin user and did just have a task error out from a computation error. So I feel kind of bad now. But I've previously had ~8600 valid tasks on Wildlife. I've detached on the host with error as I don't want to spoil the fun for everyone else.

mmonnin
Message 7606 - Posted: 18 Sep 2018, 9:43:09 UTC

Interesting, I hadn't noticed the BOINC version. With charityengine users being overall #1 and #1 at CSG, I doubt they will be banned via a min BOINC version update.

enginerd
Message 7608 - Posted: 18 Sep 2018, 19:36:21 UTC - in response to Message 7606.

Interesting, I hadn't noticed the BOINC version. With charityengine users being overall #1 and #1 at CSG, I doubt they will be banned via a min BOINC version update.


Apparently Charity Engine has updated at least some of their hosts to BOINC v7.6.33, which accounts for these:

https://csgrid.org/csg/show_host_detail.php?hostid=86973
https://csgrid.org/csg/show_host_detail.php?hostid=78977

Basically, if it's a Windows host with BOINC v7.0.80, it's Charity Engine for sure.
If it's Anonymous/Windows/7.6.33 with errors, it is almost certainly Charity Engine.

Since almost all of their BOINC installs are via a bundled installer for some other software, users typically never update. The bundler rolls out a new version included with their software to new hosts, and the old hosts are left to wither away (causing problems and wasting work for many projects, but who cares, right?)
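The two rules of thumb above can be written down as a small check. This is only a restatement of the heuristic from this post; the function name, parameters, and return strings are invented for the example, and the heuristic itself is a guess about host ownership, not something the project confirms.

```python
def charity_engine_guess(os_name, boinc_version, owner_anonymous, mostly_errors):
    """Apply the post's rule of thumb to one host's visible details."""
    is_windows = "windows" in os_name.lower()
    if is_windows and boinc_version == "7.0.80":
        return "for sure"
    if (is_windows and boinc_version == "7.6.33"
            and owner_anonymous and mostly_errors):
        return "almost certainly"
    return "unknown"


# Hypothetical host details, not copied from a real host page.
print(charity_engine_guess("Microsoft Windows 8", "7.0.80", True, True))
print(charity_engine_guess("Microsoft Windows 10", "7.6.33", True, True))
print(charity_engine_guess("Linux", "7.6.33", False, False))
```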

Quoth the founder Mark McAndrew (from Steemit post https://steemit.com/gridcoin/@guk/charity-engine-controversy):
Our users don't care two hoots about BOINC, or science, or computing. That's the whole point of it, we appeal to entirely non-techies who wouldn't otherwise participate in BOINC in a million years. They're all new to it, the silent majority. That's also why our forums aren't nearly as busy as regular BOINC groups: they're not remotely interested. Our adverts specifically emphasise 'it's all automatic, you never have to lift a finger again after installing', etc.


Another from here: https://steemit.com/gridcoin/@dutch/exposing-the-pomegranate-botnet#@markmcandrew/re-ravonn-re-markmcandrew-re-deltik-re-dutch-exposing-the-pomegranate-botnet-20171212t135334095z
Nearly all (over 99%) of our users come via adverts, not via the main site.

Peppernrino
Joined: 20 Mar 17
Posts: 11
Combined Credit: 177,613,414
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 177,613,414
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 54
Images Observed: 0

    
Message 7610 - Posted: 20 Sep 2018, 18:36:02 UTC - in response to Message 7589.
Last modified: 20 Sep 2018, 18:55:01 UTC

i run nothing but windows 8.1, and i don't have a problem with errors. this feels like how AMD stopped offering win8.1 drivers... it's all just a push to win10. once win7 stops being supported, you won't have a choice.

the problem lies elsewhere, i'm sure. power outages, charityengine... :)