
Message boards : News : [wildlife] large batch of test workunits

Travis Desell
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 16 Jan 12
Posts: 1812
Combined Credit: 23,456,042
DNA@Home: 293,563
SubsetSum@Home: 349,212
Wildlife@Home: 22,813,267
Wildlife@Home Watched: 212,926s
Wildlife@Home Events: 51
Climate Tweets: 21
Images Observed: 755

              
Message 6532 - Posted: 23 Dec 2016, 1:38:55 UTC

I just generated a large batch of workunits as a sanity check/stress test. Again, let me know if you're having any issues with any of the apps.

David Duvall
Joined: 4 Apr 15
Posts: 3
Combined Credit: 29,779,538
DNA@Home: 267,360
SubsetSum@Home: 565,264
Wildlife@Home: 28,946,914
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

      
Message 6533 - Posted: 23 Dec 2016, 15:54:26 UTC - in response to Message 6532.

The EXACT Convolutional Neural Network Trainer 0.07 is having a bit of a problem on my Windows 7 x64 2900k system (citizen10102016, running BOINC 7.6.22). The estimated times to completion are way off, which is no big deal to me. The first unit to make it to 100% is still running about 15 minutes later, and I will leave it alone until tomorrow. The WUs start out with a bang on their progress and then slow down, eventually jumping in increments of 0.500%. Again, not a problem for me, but some participants will probably complain. (I just checked the machine, and the WU that was stuck at 100% has finished and uploaded its results. Woop! Woop!) Later today I will upgrade BOINC to the latest version and see how that works with the new WUs.

JumpinJohnny
Joined: 24 Sep 13
Posts: 237
Combined Credit: 10,275,610
DNA@Home: 192,548
SubsetSum@Home: 201,740
Wildlife@Home: 9,881,323
Wildlife@Home Watched: 55,997,833s
Wildlife@Home Events: 15,584
Climate Tweets: 327
Images Observed: 351

              
Message 6534 - Posted: 23 Dec 2016, 16:00:38 UTC - in response to Message 6532.

I received 10 of these EXACT Convolutional Neural Network Trainer tasks for windoz.
They are quite long running, as the first one to finish took 5 hours 50 min 10 sec.
Two others have been running for over 12 hours and show 50% and 70% completion. (Checkpointing seems a bit sparse.)
I have an average-speed AMD CPU and am using only 3 of 4 cores for BOINC.
When constructing the actual work units, perhaps making them a bit smaller would increase project participation and produce fewer "error - not finished in time" results that only have to be resent. (I know that's a BOINC 'learning' issue.)
Also, long work units generally have much longer due dates than the 3 - 4 days you've assigned to these.

**Thanks for working on this project again, Travis.**


PS: I hope you can get someone to work on the image and video event verification issues also.

David Duvall

      
Message 6535 - Posted: 23 Dec 2016, 16:29:28 UTC - in response to Message 6534.

JumpinJohnny, thanks for the heads-up on the completion times; I didn't even notice that one. Can I reduce the BOINC default "store at least 0.5 days" of work to something like 0.1, and the "store up to an additional 1 days" of work to 0, so that I basically load on demand after each WU completes?
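For reference, these two settings correspond to the `work_buf_min_days` and `work_buf_additional_days` preferences, which the BOINC client can read from a `global_prefs_override.xml` file in its data directory. Below is a minimal Python sketch that builds such a file's contents with the values asked about above (0.1 and 0). The function name is hypothetical, and the client would still need to be told to re-read its local preferences afterwards.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(min_days: float, additional_days: float) -> str:
    """Return XML text for a work-buffer override (hypothetical helper)."""
    root = ET.Element("global_preferences")
    # "Store at least N days of work"
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:g}"
    # "Store up to an additional N days of work"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:g}"
    return ET.tostring(root, encoding="unicode")


# Values from the question above: 0.1 days minimum, 0 additional days.
xml_text = build_prefs_override(0.1, 0.0)
print(xml_text)
```

The same two values can also be set through the computing preferences dialog in the BOINC Manager, which is probably the simpler route.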

JumpinJohnny

              
Message 6536 - Posted: 23 Dec 2016, 18:09:03 UTC

David,
The initial download of any new WU will always have a "wrong" estimated completion time until BOINC learns what time to assign it by completing some of them over a few days' time on your specific machine.
The actual time a WU takes is usually a function of the amount of work in it, which is controlled by the person compiling the app, and it varies, of course, with the efficiency of the processor doing the crunching.
That said, I would be interested to know if you see any difference after the BOINC upgrade....(I think it should not affect it.)... and if adjusting the computing preferences downward works for you.
It is generally not suggested because it causes unnecessary communications to the project server and having the WU's stored up is a good thing for guarding against project/website outages and limited availability of the work.
I keep it at 1 day plus .5.
Hopefully all these issues will be worked out as these trial WU's are tested.
CSG will get back info from these "empty" apps and can proceed with the work on the actual apps.
It should be the responsibility of the Project (CSG) to adjust the size of each app to the time allowed so that it makes sense to the volunteers and doesn't result in extra unnecessary activity on their own servers.
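A rough way to picture the "learning" described above: the first estimate is just the workunit's estimated floating-point operations divided by the host's benchmarked speed, and it gets scaled once completed tasks show how far off that was. The Python sketch below is a simplified model under that assumption, not the BOINC client's actual algorithm; the function names and all numbers are made up for illustration.

```python
def initial_estimate(fpops_est: float, host_flops: float) -> float:
    """First-guess runtime in seconds: estimated operations / host speed."""
    return fpops_est / host_flops


def corrected_estimate(fpops_est: float, host_flops: float, history) -> float:
    """Scale the first guess by the average actual/estimated ratio so far.

    history is a list of (estimated_seconds, actual_seconds) pairs
    from tasks this host has already completed.
    """
    base = initial_estimate(fpops_est, host_flops)
    if not history:
        return base  # nothing learned yet, so the raw guess is all we have
    factor = sum(actual / est for est, actual in history) / len(history)
    return base * factor


# Hypothetical workunit and host: 3.6e13 operations on a 2 GFLOPS core.
first_guess = initial_estimate(3.6e13, 2e9)
# Two finished tasks that took 2x and 1.5x the estimate.
learned = corrected_estimate(3.6e13, 2e9, [(18000, 36000), (18000, 27000)])
print(first_guess, learned)
```

In this toy model the estimate only settles down after a few tasks have finished, which matches the "wrong at first, better after a few days" behaviour.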


PPS: I hope someone can also work on the image and video event verification issues.

Travis Desell

              
Message 6537 - Posted: 23 Dec 2016, 18:22:34 UTC - in response to Message 6536.

It could be that my FLOPS calculation for the workunits is off. How is the awarded credit? If it's too low then that's the case.

JumpinJohnny

              
Message 6538 - Posted: 23 Dec 2016, 19:21:48 UTC - in response to Message 6537.

It could be that my FLOPS calculation for the workunits is off. How is the awarded credit? If it's too low then that's the case.


Hmmm, well, I would say low, but probably not the issue.
Looking around I see a fast computer doing in 5 hours what a slower wingman did in 8 hours and both got 183 credits. The same computers got 338 credits for WU's taking 8.5 and 12.5 hours.

That is not attractive credit wise but may not be the problem. Perhaps it is a BOINC client "learning" issue? Still, all that should be pre-adjusted by CPU speed identification through BOINC before uploading 8 days worth of work that is only allowed 3 days to finish. There must be some other way you can lengthen the due dates? (even if it means also giving a little better credit).

Travis Desell

              
Message 6539 - Posted: 23 Dec 2016, 19:23:45 UTC - in response to Message 6534.

I received 10 of these EXACT Convolutional Neural Network Trainer tasks for windoz.
They are quite long running, as the first one to finish took 5 hours 50 min 10 sec.
Two others have been running for over 12 hours and show 50% and 70% completion. (Checkpointing seems a bit sparse.)
I have an average-speed AMD CPU and am using only 3 of 4 cores for BOINC.
When constructing the actual work units, perhaps making them a bit smaller would increase project participation and produce fewer "error - not finished in time" results that only have to be resent. (I know that's a BOINC 'learning' issue.)
Also, long work units generally have much longer due dates than the 3 - 4 days you've assigned to these.

**Thanks for working on this project again, Travis.**


PS: I hope you can get someone to work on the image and video event verification issues also.



I'll get to increasing the deadlines some more. Originally they were just 1 day. The WUs are going to have varying runtimes, and I think they'll probably get up to 1-2 days or so depending on how things go. I can push back the deadline to maybe 5-6 days if that works.

Once I get the FLOPS right I should be able to use that to also adjust the deadline accordingly. These WUs are fixed credit, but based on the number of FLOPS when I generate them, so if the credit is right then the FLOPS should be right and I can work on the deadline.
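As a sketch of how fixed credit and a deadline could both be derived from one FLOPS estimate: this assumes the standard BOINC Cobblestone definition (200 credits per day of work on a 1 GFLOPS reference host). The slow-host speed and the safety factor are hypothetical knobs, not values used by this project, and the function names are invented for the example.

```python
SECONDS_PER_DAY = 86400.0
REFERENCE_FLOPS = 1e9            # 1 GFLOPS reference host (Cobblestone definition)
CREDITS_PER_REFERENCE_DAY = 200.0


def fixed_credit(fpops_est: float) -> float:
    """Credit for a workunit, based only on its estimated operation count."""
    reference_days = fpops_est / REFERENCE_FLOPS / SECONDS_PER_DAY
    return reference_days * CREDITS_PER_REFERENCE_DAY


def deadline_days(fpops_est: float, slow_host_flops: float, safety: float) -> float:
    """Deadline sized so a slow host, with some slack, can still finish.

    slow_host_flops and safety are assumptions chosen by the project.
    """
    runtime_seconds = fpops_est / slow_host_flops
    return safety * runtime_seconds / SECONDS_PER_DAY


# Hypothetical workunit: 4.32e13 operations (half a day on the reference host).
credit = fixed_credit(4.32e13)
deadline = deadline_days(4.32e13, slow_host_flops=1e9, safety=10.0)
print(credit, deadline)
```

Because both numbers come from the same operation count, an estimate that is too low shows up twice: as stingy credit and as a deadline that slow hosts cannot meet.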

Travis Desell

              
Message 6540 - Posted: 23 Dec 2016, 19:25:25 UTC - in response to Message 6538.

It could be that my FLOPS calculation for the workunits is off. How is the awarded credit? If it's too low then that's the case.


Hmmm, well, I would say low, but probably not the issue.
Looking around I see a fast computer doing in 5 hours what a slower wingman did in 8 hours and both got 183 credits. The same computers got 338 credits for WU's taking 8.5 and 12.5 hours.


How far off is that credit wise from being in line with other projects? Am I off by an order of magnitude? On that faster CPU, what would be an expected amount of credit for 5 hours work?

JumpinJohnny

              
Message 6541 - Posted: 23 Dec 2016, 19:36:49 UTC - in response to Message 6539.
Last modified: 23 Dec 2016, 19:42:11 UTC

Travis, thanks for the explanation.
Again, the credit is on the low end and not attractive, but probably not out of line.
I would definitely adjust the deadline upwards for the newly generated batch. That will save you from needlessly hammering your servers with overlimits that must be sent out again.
Typically I see about double the credits on most projects.
Thanks again for working on this.

PS: Please don't forget about the video events and image marking and verification issues.

Travis Desell

              
Message 6545 - Posted: 23 Dec 2016, 20:00:02 UTC - in response to Message 6541.


PS: Please don't forget about the video events and image marking and verification issues.


Will try and get that figured out ASAP. Probably will end up putting Marshall on it... :)

Skivelitis2
Joined: 16 May 15
Posts: 60
Combined Credit: 10,664,606
DNA@Home: 19,068
SubsetSum@Home: 575,552
Wildlife@Home: 10,069,986
Wildlife@Home Watched: 9,158s
Wildlife@Home Events: 4
Climate Tweets: 387
Images Observed: 52

            
Message 6546 - Posted: 23 Dec 2016, 21:08:26 UTC

Is there no checkpointing on the Linux workunits?

Travis Desell

              
Message 6547 - Posted: 23 Dec 2016, 21:19:32 UTC - in response to Message 6546.
Last modified: 23 Dec 2016, 21:19:41 UTC

Is there no checkpointing on the Linux workunits?


All the apps should be checkpointing. Are the linux apps not?

Skivelitis2

            
Message 6548 - Posted: 23 Dec 2016, 22:23:44 UTC - in response to Message 6547.
Last modified: 23 Dec 2016, 22:24:13 UTC

Is there no checkpointing on the Linux workunits?


All the apps should be checkpointing. Are the linux apps not?


I have 2 that are not:

1) 66.5% complete after 09:17:31 (dec_22_1_1_533)
2) 64.0% complete after 09:05:53 (dec_22_1_1_536)

Both v.0.07.

Yes, it is a slow machine with HT on.

Steve Hawker*
Joined: 8 Apr 13
Posts: 134
Combined Credit: 829,896
DNA@Home: 11,932
SubsetSum@Home: 299,708
Wildlife@Home: 518,257
Wildlife@Home Watched: 5,541,577s
Wildlife@Home Events: 2,169
Climate Tweets: 8,630
Images Observed: 55

              
Message 6549 - Posted: 23 Dec 2016, 23:36:11 UTC - in response to Message 6547.
Last modified: 23 Dec 2016, 23:37:23 UTC

Is there no checkpointing on the Linux workunits?


All the apps should be checkpointing. Are the linux apps not?


I ran the tasks until they were at least 5% complete and then suspended them.
I repeated this until all the tasks had some elapsed time.
I quit BOINC and restarted the computer.
On restart, every task that was suspended lost all progress. When they came to run, they would run for 34 seconds and then get postponed because BOINC couldn't get a slot lock (or a similar unhelpful message). I aborted the tasks at this point.

I've managed to get five WUs to complete. None have validated. I've logged just over 200 hours of crunching.

All were v0.05 on OS X

Travis Desell

              
Message 6550 - Posted: 24 Dec 2016, 0:34:29 UTC - in response to Message 6549.

Is there no checkpointing on the Linux workunits?


All the apps should be checkpointing. Are the linux apps not?


I ran the tasks until they were at least 5% complete and then suspended them.
I repeated this until all the tasks had some elapsed time.
I quit BOINC and restarted the computer.
On restart, every task that was suspended lost all progress. When they came to run, they would run for 34 seconds and then get postponed because BOINC couldn't get a slot lock (or a similar unhelpful message). I aborted the tasks at this point.

I've managed to get five WUs to complete. None have validated. I've logged just over 200 hours of crunching.

All were v0.05 on OS X


Now that is really odd. They should be validating. I'll look into this.

Travis Desell

              
Message 6551 - Posted: 24 Dec 2016, 0:39:51 UTC - in response to Message 6550.

Updated the apple client to v0.07. I'm going to generate a few more WUs and see how things are checkpointing.

Travis Desell

              
Message 6552 - Posted: 24 Dec 2016, 0:49:09 UTC - in response to Message 6551.

OS X is checkpointing for me on v0.07. I could shut down the BOINC manager and restart it, and the WUs started up where they left off...
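For anyone wondering what "started up where they left off" involves, here is a minimal Python sketch of the general checkpoint-and-resume pattern. It is not the EXACT app's code; the file name, the `epoch` counter, and the `stop_after` argument (which simulates shutting the client down) are all invented for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then swap it in so a crash can't corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename over the old checkpoint


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh one if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"epoch": 0}


def train(path: str, total_epochs: int, stop_after=None):
    """Run until done (or interrupted); return (start_epoch, end_epoch)."""
    state = load_checkpoint(path)
    start = state["epoch"]
    done_this_run = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and done_this_run >= stop_after:
            break  # simulate the client being shut down mid-task
        state["epoch"] += 1  # stand-in for one unit of real work
        save_checkpoint(path, state)
        done_this_run += 1
    return start, state["epoch"]


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")
    first = train(ckpt, total_epochs=10, stop_after=4)  # interrupted run
    resumed = train(ckpt, total_epochs=10)              # restart after shutdown

print(first, resumed)
```

If the checkpoint file is missing or unreadable on restart, this sketch starts again from zero, which is the "lost all progress" symptom reported earlier in the thread.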

JumpinJohnny

              
Message 6553 - Posted: 24 Dec 2016, 3:06:38 UTC - in response to Message 6540.

It could be that my FLOPS calculation for the workunits is off....
How far off is that credit wise from being in line with other projects? Am I off by an order of magnitude?


There are still Wildlife@Home "Descriptor Collection (SURF) v0.03" tasks on my task page, left over from May 2014. (*I wish they weren't there.)
Runtime = 16,511.64 s, CPU seconds = 16,330.68, credit = 1,307.40

Newly validated on 23 Dec 2016:
Wildlife@Home "EXACT Convolutional Neural Network Trainer v0.07": runtime = 21,106.98 s, CPU seconds = 21,010.11, credit = 169.68


Can you bump it up a bit for that size?
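To put the two task lines above on the same footing, here is a quick Python sketch that converts each to credits per hour of runtime and takes the ratio. The figures are the ones quoted above; the comparison assumes both tasks ran on the same host, and the function name is just for illustration.

```python
def credit_per_hour(credit: float, runtime_seconds: float) -> float:
    """Credit earned per hour of wall-clock runtime."""
    return credit / (runtime_seconds / 3600.0)


# Figures quoted from the task page above.
surf_rate = credit_per_hour(1307.40, 16511.64)   # Descriptor Collection (SURF) v0.03
exact_rate = credit_per_hour(169.68, 21106.98)   # EXACT CNN Trainer v0.07
ratio = surf_rate / exact_rate

print(f"SURF v0.03:  {surf_rate:.1f} credits/hour")
print(f"EXACT v0.07: {exact_rate:.1f} credits/hour")
print(f"SURF paid about {ratio:.1f}x more per hour")
```

On these two tasks the older app paid roughly ten times more per hour, so relative to SURF the new credit is close to an order of magnitude lower.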

Travis Desell

              
Message 6554 - Posted: 24 Dec 2016, 3:38:25 UTC - in response to Message 6553.

It could be that my FLOPS calculation for the workunits is off....
How far off is that credit wise from being in line with other projects? Am I off by an order of magnitude?


There are still Wildlife@Home "Descriptor Collection (SURF) v0.03" tasks on my task page, left over from May 2014. (*I wish they weren't there.)
Runtime = 16,511.64 s, CPU seconds = 16,330.68, credit = 1,307.40

Newly validated on 23 Dec 2016:
Wildlife@Home "EXACT Convolutional Neural Network Trainer v0.07": runtime = 21,106.98 s, CPU seconds = 21,010.11, credit = 169.68


Can you bump it up a bit for that size?


Increased the credit (and FLOPS) by 2.5 for the last small batch and all new ones. Hopefully that'll mean work estimates are more in line.
