1) Message boards : Number Crunching : What is causing me to produce Validate Errors? (Message 7420)
Posted 21 Feb 2018 by Beyond
I have 15 spread over 8 machines. I wonder if this has something to do with it:

https://csgrid.org/csg/forum_thread.php?id=2504#7418

Almost all these WUs are very long. Everyone I've looked at is getting them. I figure I've lost about 20 days of CPU time so far. It's a waste of time and money. Stopping work. Admin???
2) Message boards : Number Crunching : Can validation be changed for inconclusive work units (Message 7409)
Posted 13 Feb 2018 by Beyond
It looks like there is something odd going on.
I cannot upload results, have a lot of errors, and do not get new WUs.

Citizen Science Grid 2/13/2018 12:29:34 PM [error] Error reported by file upload server: Server is out of disk space
3) Message boards : Number Crunching : Can validation be changed for inconclusive work units (Message 7407)
Posted 13 Feb 2018 by Beyond
I just had one that went almost a day and then was marked as "Completed, can't validate" because the 4 other people had either not bothered to return it or aborted it. Then the WU is cancelled due to "Too many total results". Could the "max # of error/total/success tasks 2, 5, 2" be upped to more than 5 so that we're not wasting days of CPU time on this sort of thing? The one reliable user gets the shaft because users like Dr Z, Punchy, Petr, and Annonymous don't bother to return their work. Here's the WU as an example of the problem:

https://csgrid.org/csg/workunit.php?wuid=2695545
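
To make the limit problem above concrete, here is a minimal Python sketch of how a "2, 5, 2" error/total/success limit can strand a lone valid result. It is a simplified model, not BOINC's actual server logic: the function name, the outcome labels, the quorum of 2, and the split of the four other tasks into two aborts and two no-replies are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkunitLimits:
    """Limits as quoted in the post: error/total/success = 2, 5, 2."""
    max_error: int = 2
    max_total: int = 5
    max_success: int = 2


def workunit_status(outcomes, limits=WorkunitLimits(), min_quorum=2):
    """Classify a work unit from its task outcomes (simplified model).

    outcomes: list of 'success', 'error' (e.g. aborted) or 'no_reply'.
    min_quorum is an assumed value, not taken from the post.
    """
    successes = outcomes.count("success")
    errors = outcomes.count("error")
    total = len(outcomes)

    if successes >= min_quorum:
        return "can validate"
    if errors > limits.max_error:
        return "too many errors"
    if total >= limits.max_total:
        return "too many total results"
    return "needs another task"


# Hypothetical history: one good result, four that never produced one.
history = ["success", "no_reply", "error", "no_reply", "error"]

print(workunit_status(history))
print(workunit_status(history, WorkunitLimits(max_total=8)))
```

Under this model, raising the total limit from 5 to 8 keeps the work unit alive so that another host can still be tried, instead of discarding the one completed result.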
4) Message boards : Number Crunching : Monster MNIST units (Message 6870)
Posted 5 Apr 2017 by Beyond
Are these from the same system?

If that's the case I need to update the credit calculation. Inconsistent runtime to credit is probably a problem with how I'm calculating credit as the convolutional neural networks get larger. This is an easier fix and not a problem with the application at any rate.

I'll make it a priority and try and get it resolved by the end of the week (unfortunately I'm stuck in some 8 hours of meetings tomorrow which will slow me down).

Here are 3 consecutive results from the same system, all EXACT MNIST Convolutional Neural Network Trainer v0.20:

Task     Work unit  Sent                      Time reported             Run time (sec)  CPU time (sec)  Credit
2142450  1000174    3 Apr 2017, 13:42:20 UTC  4 Apr 2017, 13:28:43 UTC  57,110.73       56,810.53       2,418.63
2140525  999371     3 Apr 2017, 4:41:38 UTC   4 Apr 2017, 6:40:51 UTC   54,518.36       54,258.40       1,429.07
2140165  999224     3 Apr 2017, 2:12:39 UTC   3 Apr 2017, 20:46:23 UTC  52,309.10       52,088.45       3,400.23
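
The mismatch between runtime and credit in these three results is easier to see as credit per CPU hour. The short Python sketch below does that arithmetic. It assumes the last three numbers on each line are run time (sec), CPU time (sec), and credit, as on a standard BOINC results page; the variable and function names are just for illustration.

```python
# (task ID, CPU time in seconds, credit) copied from the three results above.
results = [
    (2142450, 56810.53, 2418.63),
    (2140525, 54258.40, 1429.07),
    (2140165, 52088.45, 3400.23),
]


def credit_per_cpu_hour(cpu_seconds, credit):
    """Credit earned per hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


rates = {task: credit_per_cpu_hour(cpu, credit) for task, cpu, credit in results}

for task, rate in rates.items():
    print(f"task {task}: {rate:.1f} credits per CPU hour")

# Ratio between the best-paid and worst-paid task on the same machine.
spread = max(rates.values()) / min(rates.values())
print(f"highest/lowest ratio: {spread:.2f}")
```

The three tasks ran for similar lengths of time on the same system, yet the rates range from roughly 95 to 235 credits per CPU hour, a spread of about 2.5x.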
5) Message boards : Number Crunching : Monster MNIST units (Message 6864)
Posted 4 Apr 2017 by Beyond
So this seems like something weird may be afoot. Any chance you could link me the work units you aborted? Or if you see these again let me know the work unit so I can take a deeper look and see what's going on? There may be some kind of bug going on that's making them run significantly longer than they should.

Here's one of many:

http://csgrid.org/csg/workunit.php?wuid=980866

My theory is that the apps are now being compiled with the wrong switches for general use.

This is really, really odd. I'm looking into it. Both of those are running Windows 7, so I don't know why one would be significantly slower depending on how they were compiled.

What's even weirder is that the one it's running slower on has more cache and memory, so if anything it should be running quite a bit faster.

Not long ago these machines were very competitive in CSG; now, at least on some WUs, they're dirt slow. It really makes me wonder about the switches used for compiling the last few app versions. All of a sudden my 8-core AMD 83xx machines are abysmal on this project. They used to be relatively fast. They're still fast on other projects, where they're situated for now.
6) Message boards : Number Crunching : Monster MNIST units (Message 6857)
Posted 3 Apr 2017 by Beyond
So this seems like something weird may be afoot. Any chance you could link me the work units you aborted? Or if you see these again let me know the work unit so I can take a deeper look and see what's going on? There may be some kind of bug going on that's making them run significantly longer than they should.

Here's one of many:

http://csgrid.org/csg/workunit.php?wuid=980866

My theory is that the apps are now being compiled with the wrong switches for general use.
7) Message boards : Number Crunching : Monster MNIST units (Message 6853)
Posted 3 Apr 2017 by Beyond
exact_genome_1490477651_3_1411_1 -> 52 hrs and 64% complete (wingman complete at 68 hours).

Should be interesting to look at.

Well... this one finished, and it is interesting, but not quite what I hoped. This 96 hour work unit yielded only 747.82 points (I, of course, was expecting a veritable bonanza of over 10K points!). I know that points are not awarded on a straight-line basis of time, but this is seriously out of whack no matter what the algorithm. It obviously was not doing the kind of work needed to accrue points.

So could you investigate? If there is something hinky with my machine, then I need to fix it (though other units appear to be proceeding normally). Somewhere, this one went off the rails.

Seems to be par for the course. Huge runtimes and tiny credit on some WUs. Strangely, this seems to be happening mostly on my fastest machines. Just decided to drop those off the project at least until this is sorted out. Have one WU that's been running for 180 hours on a fast box and am expecting small credit when it finishes...
8) Message boards : News : [wildlife] EXACT v0.17 for windows xp (Message 6743)
Posted 8 Feb 2017 by Beyond
Is there anybody out there using XP?? :-P

Only our silly president. ;-)
9) Message boards : News : [wildlife] v0.13 apps released! (Message 6731)
Posted 1 Feb 2017 by Beyond
Dropped the credit awarded by a factor of 5. Probably still too high but I'll leave it there for a while given our alpha status and hopefully it will help attract some more crunchers. :)

The credits were high but now they seem low. Don't think you'll be attracting many users at this rate; in fact I see that some are leaving. The WU length has increased manyfold and doesn't seem to be very well coordinated with the credits awarded. While the WU length is greatly increasing, credit doesn't seem to be increasing at all. Longer WUs are often credited less than shorter ones.

Okay, I need to look into how I'm calculating credit. I'll bump it up a notch for the time being. How far off do you think it is right now?

The truth is that there's a large difference in credits awarded by various projects. No matter what you set it at, some will be unhappy. The self-appointed SETI credit cops will say it's too high no matter what it is. I don't think that most of us crunch primarily for credit, but it definitely seems to be a consideration. Look at it this way: if a user likes a few projects equally, which do you think they'll go with primarily? If you want to increase participation, higher credit will definitely help achieve that. It's proven to be a fact of life in BOINC-land. I don't want to suggest a specific figure, but I feel that it's too low presently.
10) Message boards : News : [wildlife] v0.13 apps released! (Message 6727)
Posted 1 Feb 2017 by Beyond
Dropped the credit awarded by a factor of 5. Probably still too high but I'll leave it there for a while given our alpha status and hopefully it will help attract some more crunchers. :)

The credits were high but now they seem low. Don't think you'll be attracting many users at this rate; in fact I see that some are leaving. The WU length has increased manyfold and doesn't seem to be very well coordinated with the credits awarded. While the WU length is greatly increasing, credit doesn't seem to be increasing at all. Longer WUs are often credited less than shorter ones.
11) Message boards : News : [wildlife] v0.13 apps released! (Message 6725)
Posted 30 Jan 2017 by Beyond
These XP machines are spewing out thousands of failed WUs. Why not ban them until this gets sorted out, or at least set a WU limit/day to mitigate the damage?

This is fixed now. I updated the plan class for Windows workunits so that XP machines should not be downloading any more WUs.

Three thumbs up.
12) Message boards : News : [wildlife] v0.13 apps released! (Message 6723)
Posted 30 Jan 2017 by Beyond
I had 3 show up this morning as "invalid". In each case the WU had been tried by XP machines that have not had a single valid WU out of 200 tried.

It is frustrating to spend 7 hours on WUs that probably have no problem, only to have them discarded because machines with problems are pulling WUs.

Completely understandable. I've been trying to shake down a solution for this, so far just crickets from the BOINC mailing lists though.

These XP machines are spewing out thousands of failed WUs. Why not ban them until this gets sorted out, or at least set a WU limit/day to mitigate the damage?
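
The "WU limit/day" idea could look something like the Python sketch below: a per-host daily quota that shrinks when a host returns failures and recovers when it returns valid work. This is only an illustration of the suggestion, not BOINC's actual scheduler behavior. The class name, the halve/double rule, and the default of 50 tasks per day are all hypothetical.

```python
class HostQuota:
    """Hypothetical adaptive daily task limit for a single host."""

    def __init__(self, max_per_day=50):
        self.max_per_day = max_per_day  # ceiling for a healthy host (assumed)
        self.quota = max_per_day        # current daily allowance
        self.sent_today = 0

    def new_day(self):
        """Reset the daily counter; the quota itself carries over."""
        self.sent_today = 0

    def can_send(self):
        return self.sent_today < self.quota

    def record_sent(self):
        self.sent_today += 1

    def record_result(self, valid):
        """Double the quota on a valid result, halve it on a failure."""
        if valid:
            self.quota = min(self.max_per_day, self.quota * 2)
        else:
            self.quota = max(1, self.quota // 2)


def simulate_failing_host(days, max_per_day=50):
    """Count tasks sent to a host whose every task fails immediately."""
    host = HostQuota(max_per_day)
    total_sent = 0
    for _ in range(days):
        host.new_day()
        while host.can_send():
            host.record_sent()
            host.record_result(valid=False)
            total_sent += 1
    return total_sent


print(simulate_failing_host(days=3))
```

In this toy model a host that fails everything is throttled to a handful of tasks within the first day and then to one per day, while a host that starts returning valid results has its allowance doubled back toward the ceiling.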