Message boards : News : [wildlife] "monster" workunits status update

Travis Desell
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 16 Jan 12
Posts: 1791
Combined Credit: 2,265,607
DNA@Home: 293,563
SubsetSum@Home: 349,212
Wildlife@Home: 1,622,832
Wildlife@Home Watched: 212,926s
Wildlife@Home Events: 51
Climate Tweets: 21
Images Observed: 710

              
Message 6862 - Posted: 4 Apr 2017, 0:37:09 UTC
Last modified: 4 Apr 2017, 0:37:23 UTC

Some users have been reporting that some workunits are taking significantly longer to run than what wingmen are reporting -- which is a pretty weird bug. See this forum thread.

For a status update, what I believe is happening is that certain parameters to the backpropagation are causing this issue on some operating systems/architectures. I've whipped up a page to track these "monster" workunits, which can be found here. This page shows all workunits with results that had runtimes more than 4x apart from each other. Quite a few of these had initial learning rates set at 1e-08, which is extremely low. I'm thinking this might be part of the problem, so I've updated things server side to prevent the initial learning rate from being any lower than 1e-05.
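
For anyone who wants to picture the 4x runtime check and the learning-rate floor described above, here is a minimal sketch in Python. It is not the project's actual server code: the function names and the list-of-runtimes layout are hypothetical, and only the 4x threshold and the 1e-05 floor come from the post.

```python
# Values taken from the post; everything else here is a hypothetical illustration.
RUNTIME_RATIO_THRESHOLD = 4.0       # "runtimes more than 4x apart"
MIN_INITIAL_LEARNING_RATE = 1e-05   # new lower bound for the initial learning rate


def is_monster(runtimes, threshold=RUNTIME_RATIO_THRESHOLD):
    """Return True if the slowest result took more than `threshold` times
    as long as the fastest result for the same workunit."""
    positive = [t for t in runtimes if t > 0]  # ignore results with no runtime
    if len(positive) < 2:
        return False  # nothing to compare against
    return max(positive) > threshold * min(positive)


def clamp_initial_learning_rate(rate, floor=MIN_INITIAL_LEARNING_RATE):
    """Raise any initial learning rate below the floor up to the floor."""
    return max(rate, floor)
```

With this sketch, `is_monster([1000.0, 4500.0])` is true, and `clamp_initial_learning_rate(1e-08)` returns `1e-05`, while a rate such as `0.001` passes through unchanged.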

I'm hoping that with this change, newly generated workunits should not have the problem anymore. If you're still seeing it, please let me know. If you see it happen with a workunit that doesn't have a very low initial learning rate, please let me know as well, as I'll need to do some more digging.

Zir Madmax
Joined: 17 Feb 17
Posts: 11
Combined Credit: 467,831
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 467,831
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 6894 - Posted: 12 Apr 2017, 18:37:03 UTC - in response to Message 6862.

Hi... I have one monster. At the beginning it said 1 day and 8 hours; now it says over 11d 14h.
Name: exact_genome_1491416078_5_8943_1
There is no possibility of it finishing by 04/17. It is still adding an hour now and then.
Should I delete it, or can you add some extra days for me? I need at least 5 more days.
Thomas / Zir Madmax, Sweden
____________

Zir Madmax
Message 6897 - Posted: 13 Apr 2017, 6:09:41 UTC - in response to Message 6894.

New update:
exact_genome_1491416078_5_8943_1 has run 1d 23h but has 12d 5h left and is still adding more time (time limit 20170417)
and a new one,
exact_genome_1491416078_5_9502_1, has run 1d 7h and has 9d 19h left and is still adding more time (time limit 20170418)
____________

Travis Desell
Message 6899 - Posted: 13 Apr 2017, 18:33:17 UTC - in response to Message 6897.

I'll look into extending the deadline for workunits. Hopefully I'll get something done by the end of the day. I'm not quite sure whether updates on my end will change workunits that have already been sent out, however... but I can at least fix it for newly generated workunits.

Zir Madmax
Message 6901 - Posted: 14 Apr 2017, 7:39:16 UTC - in response to Message 6899.
Last modified: 14 Apr 2017, 7:52:57 UTC

There is no change on my WUs. I need 10-15 more days past the deadline for my 2 long-running tasks. :-(
Update:
exact_genome_1491416078_5_8943_1 has run 3d 2h but has 13d 10h left and is still adding more time (time limit 20170417)
and
exact_genome_1491416078_5_9502_1 has run 2d 10h but has 12d 5h left and is still adding more time (time limit 20170417)

Conan
Joined: 13 Apr 12
Posts: 138
Combined Credit: 25,774,528
DNA@Home: 399,792
SubsetSum@Home: 1,448,876
Wildlife@Home: 23,925,860
Wildlife@Home Watched: 70,910s
Wildlife@Home Events: 0
Climate Tweets: 393
Images Observed: 0

          
Message 6902 - Posted: 14 Apr 2017, 8:59:22 UTC

If the change has been made at the project end, then the user won't see the change on work units already sent out to their computer. However, if the user goes into their account at the project and looks those work units up, they should see the time extensions on their account pages.

All work units sent out after the change will show the extended time on BOINC manager as well as their account pages.

I have run past the deadline on a couple, but still returned the work unit before the third one got sent back, so I got credit.
I have recently had 145-hour, 114-hour and 98-hour work units across both Linux and Windows.

I also have 3 work units still running, with estimated times of 72 hours, 96 hours and 257 hours.

The credit so far has not been great, but I am hoping it will get better.

Conan
____________

Zir Madmax
Message 6911 - Posted: 16 Apr 2017, 9:23:47 UTC - in response to Message 6901.

exact_genome_1491416078_5_8943_1 has run 5d 2h but has 12d 19h left and is still adding more time (time limit 20170417)
and
exact_genome_1491416078_5_9502_1 has run 4d 10h but has 13d 5h left and is still adding more time (time limit 20170418)

Ozzie1989
Joined: 16 Apr 17
Posts: 4
Combined Credit: 2,278
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 2,278
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0
Message 6919 - Posted: 19 Apr 2017, 8:07:55 UTC - in response to Message 6911.

Hi all. I'm new to this but picked up my first task a few days ago.

exact_genome_1492132398_5_12116

Initially started out well but is now hanging at 48.344% with the remaining time increasing second by second.

Elapsed: 2d 10h
Remaining: 2d 14h and going up

I assume that my problem is linked to this discussion and not something I'm doing wrong!
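
A rough way to see why the remaining time keeps climbing while the progress figure is stuck: assume the remaining time is extrapolated from elapsed time and fraction done. That proportional formula is an assumption for illustration, not a statement of how the BOINC client actually computes its estimate, and the function name is hypothetical. Under that assumption, the numbers in this post are reproduced closely:

```python
def estimate_remaining_hours(elapsed_hours, fraction_done):
    """Proportional estimate (an assumption, not the BOINC client's exact
    method): if `fraction_done` of the work took `elapsed_hours`, the rest
    should take elapsed * (1 - fraction) / fraction."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours * (1 - fraction_done) / fraction_done


# Figures from the post: 2d 10h elapsed (58 hours), progress at 48.344%.
remaining = estimate_remaining_hours(58, 0.48344)
print(f"{remaining:.1f} hours remaining")  # about 62 hours, i.e. roughly 2d 14h
```

If the fraction done stays fixed at 48.344% while elapsed time keeps growing, this estimate grows with it, which matches the "going up" behaviour reported here.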

Ozzie1989
Message 6920 - Posted: 19 Apr 2017, 10:58:30 UTC - in response to Message 6919.

To add to the above, the task status page is saying:

2193743 | 51373 | 16 Apr 2017, 20:05:50 UTC | 19 Apr 2017, 5:19:17 UTC | Timed out - no response | 0.00 | 0.00 | --- | EXACT MNIST Convolutional Neural Network Trainer v0.20

Perhaps I am doing something wrong!

mmonnin
Joined: 31 May 16
Posts: 25
Combined Credit: 17,058,860
DNA@Home: 0
SubsetSum@Home: 1,023,200
Wildlife@Home: 16,035,659
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 54
Images Observed: 0

      
Message 6921 - Posted: 19 Apr 2017, 11:45:19 UTC - in response to Message 6920.

Even for units that aren't sent out past their deadline, the deadlines are pretty tight. Some are only about 1.5 days. Not everyone runs a project exclusively. There are many BOINC competitions and my team participates in some. With these short deadlines, tasks can't be completed if there is even a short 2-day competition for another project. I'm seeing a lot of "Timed out - no response" from wingmen due to the short deadlines.

Travis Desell
Message 6923 - Posted: 19 Apr 2017, 21:21:21 UTC - in response to Message 6921.

Even for units that aren't sent out past their deadline, the deadlines are pretty tight. Some are only about 1.5 days. Not everyone runs a project exclusively. There are many BOINC competitions and my team participates in some. With these short deadlines, tasks can't be completed if there is even a short 2-day competition for another project. I'm seeing a lot of "Timed out - no response" from wingmen due to the short deadlines.


I did update the deadlines to be dependent on WU size. So the deadlines should be somewhat dynamic now. Do you think they're too soon across the board? I can increase them for new workunits.
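
As an illustration only of what a size-dependent deadline could look like: the post does not give the actual rule, so the slack factor, the minimum window and the function name below are all hypothetical. A minimal sketch in Python:

```python
from datetime import datetime, timedelta, timezone


def deadline_for(sent_at, estimated_runtime_hours,
                 slack_factor=3.0, min_days=2.0):
    """Hypothetical rule: allow `slack_factor` times the estimated runtime,
    but never less than `min_days`. Both defaults are made-up values."""
    scaled = timedelta(hours=estimated_runtime_hours * slack_factor)
    floor = timedelta(days=min_days)
    return sent_at + max(scaled, floor)


sent = datetime(2017, 4, 16, 20, 5, 50, tzinfo=timezone.utc)
print(deadline_for(sent, 10))    # small WU: falls back to the 2-day minimum
print(deadline_for(sent, 100))   # large WU: 300 hours after it was sent
```

The floor is there for the concern raised in the quoted message: even a small workunit keeps a window long enough to survive a short pause on the volunteer's side.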

Ozzie1989
Message 6943 - Posted: 21 Apr 2017, 22:48:18 UTC - in response to Message 6923.

According to the task status page I didn't get any credit for the last one which went over the deadline...

2193743 | 1020651 | 51373 | 16 Apr 2017, 20:05:50 UTC | 21 Apr 2017, 20:52:00 UTC | Completed, too late to validate | 428,051.89 | 397,603.30 | 0.00 | EXACT MNIST Convolutional Neural Network Trainer v0.20

