Message boards : Number Crunching : Marked invalid, I've got a question.

P . P . L .
Joined: 10 Aug 14
Posts: 59
Combined Credit: 336,654
DNA@Home: 336,605
SubsetSum@Home: 0
Wildlife@Home: 49
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 4735 - Posted: 21 Oct 2014, 5:28:31 UTC
Last modified: 21 Oct 2014, 5:31:07 UTC

Hi Travis.

I've had just two of these marked as completed and invalid. Both have the same sort of result file, which I can see on my page. They came from two different rigs, and both are the longer tasks.

Can you tell me what causes these? Sorry about all the text; there are 32 lines of 0s and other numbers.

Thanks.

http://volunteer.cs.und.edu/csg/workunit.php?wuid=164815

gibbs_test_hg19_1000fa_3_119__172401_10000

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<stderr_txt>

0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,3,137,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

986 sequences with 988407 total base pairs.
doing walk: 0
seeding: 1623381610
Zeroing counts for motif models.
Incrementing intial counts for motifs.
burn in period: 0, sample period: 10000
09:26:10 (3533): called boinc_finish(0)

</stderr_txt>
____________

Travis Desell
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 16 Jan 12
Posts: 1813
Combined Credit: 23,514,257
DNA@Home: 293,563
SubsetSum@Home: 349,212
Wildlife@Home: 22,871,482
Wildlife@Home Watched: 212,926s
Wildlife@Home Events: 51
Climate Tweets: 21
Images Observed: 774

              
Message 4736 - Posted: 21 Oct 2014, 16:32:36 UTC - in response to Message 4735.

That's very odd. The output of that task looks very off (it shouldn't be printing that). Maybe it was using a wrong or different application version?

P . P . L .

  
Message 4737 - Posted: 21 Oct 2014, 20:58:08 UTC
Last modified: 21 Oct 2014, 20:59:00 UTC

Travis.

These are the two tasks. Are they using the new app in both cases?

gibbs_test_hg19_1000fa_3_119__172401_10000

Status | Run time (s) | CPU time (s) | Credit | Application
Completed and validated | 7,912.40 | 7,612.46 | 45.38 | DNA@Home Gibbs Sampler v0.44
Completed, marked as invalid | 7,132.73 | 7,029.25 | 0.00 | DNA@Home Gibbs Sampler v0.44
Completed and validated | 7,860.41 | 7,804.56 | 45.38 | DNA@Home Gibbs Sampler v0.47

====================================

gibbs_test_hg19_1000fa_3_119__171603_40000

Status | Run time (s) | CPU time (s) | Credit | Application
Completed, marked as invalid | 8,520.61 | 8,463.99 | 0.00 | DNA@Home Gibbs Sampler v0.47
Completed and validated | 6,551.01 | 4,788.90 | 35.79 | DNA@Home Gibbs Sampler v0.44
Completed and validated | 6,473.41 | 6,124.50 | 35.79 | DNA@Home Gibbs Sampler v0.47
____________

Ananas
Joined: 12 Aug 14
Posts: 27
Combined Credit: 1,727,216
DNA@Home: 959,874
SubsetSum@Home: 767,342
Wildlife@Home: 0
Wildlife@Home Watched: 73,544s
Wildlife@Home Events: 10
Climate Tweets: 0
Images Observed: 0

        
Message 4738 - Posted: 21 Oct 2014, 21:12:43 UTC
Last modified: 21 Oct 2014, 21:26:13 UTC

Linux boxes are not the only ones affected:

The same error type occurs on Windows 8.1 x64 (not my machine), which means it cannot be an issue with the 0.47 build (there is no 0.47 on Windows).

It looks much like an uninitialized variable, or an array that is written past its bounds and corrupts a variable that sits after it in memory.

All machines produce mostly normal-looking results and fail only rarely.

P.S.: Here is one more, on Windows 7 x64 this time.

And this box has several on Linux x64, all 0.47 this time.

P . P . L .

  
Message 4739 - Posted: 22 Oct 2014, 2:04:56 UTC

Well I'm sort of glad that it's not just me then. ;)
____________

Ananas

        
Message 4742 - Posted: 24 Oct 2014, 15:57:23 UTC

This is probably really bad:

http://volunteer.cs.und.edu/csg/workunit.php?wuid=182132

In this case, two of those strange results ended up in the scientific database, and the one with the normal-looking stdout was marked invalid.
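
One way a correct result can lose out, sketched under the assumption that the validator accepts whichever output a majority of replicas agree on. This is a simplification of quorum-based validation; the function name `validate`, the `min_quorum` parameter, and the string results are hypothetical and are not taken from the project's code.

```python
from collections import Counter


def validate(results, min_quorum=2):
    """Return (canonical, flags) for a list of replica outputs.

    flags[i] is True when result i matches the canonical output.
    Simplified illustration only: a real validator compares result
    files and may issue extra replicas instead of giving up.
    """
    counts = Counter(results)
    canonical, votes = counts.most_common(1)[0]
    if votes < min_quorum:
        # No two replicas agree, so nothing can be validated.
        return None, [False] * len(results)
    return canonical, [r == canonical for r in results]


# Two replicas that share the same faulty output outvote the correct one.
canonical, flags = validate(["strange", "strange", "normal"])
print(canonical, flags)
```

Under this assumption, agreement is all that counts: two replicas hitting the same bug form a quorum, and the normal-looking result is the one marked invalid.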

Travis Desell

              
Message 4743 - Posted: 24 Oct 2014, 16:35:50 UTC - in response to Message 4739.

I'll be looking into this over the weekend and hopefully can find a fix. Strange that it's only happening occasionally.

P . P . L .

  
Message 4942 - Posted: 10 Jan 2015, 2:10:32 UTC

Hi Travis.

Still getting invalids with both apps and losing a lot of time. Hope you can get it fixed soonish.

Task | Work unit | Computer | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application
2366044 | 1148371 | 4199 | 8 Jan 2015, 4:57:08 UTC | 8 Jan 2015, 20:44:26 UTC | Completed, marked as invalid | 1,185.49 | 1,170.04 | 0.00 | DNA@Home Gibbs Sampler v0.48
2360740 | 1145844 | 4200 | 7 Jan 2015, 21:37:23 UTC | 9 Jan 2015, 0:16:21 UTC | Completed, marked as invalid | 19,251.23 | 19,116.83 | 0.00 | DNA@Home Gibbs Sampler v0.50
2336135 | 1133808 | 4199 | 6 Jan 2015, 8:56:03 UTC | 7 Jan 2015, 21:41:55 UTC | Completed, marked as invalid | 16,290.49 | 15,980.51 | 0.00 | DNA@Home Gibbs Sampler v0.48
2318899 | 1125515 | 4200 | 5 Jan 2015, 2:33:35 UTC | 6 Jan 2015, 0:10:00 UTC | Completed, marked as invalid | 19,049.79 | 18,956.66 | 0.00 | DNA@Home Gibbs Sampler v0.50
____________

Conan
Joined: 13 Apr 12
Posts: 151
Combined Credit: 47,672,899
DNA@Home: 399,792
SubsetSum@Home: 1,448,876
Wildlife@Home: 45,824,231
Wildlife@Home Watched: 70,910s
Wildlife@Home Events: 0
Climate Tweets: 412
Images Observed: 0

          
Message 4943 - Posted: 10 Jan 2015, 4:48:30 UTC

G'Day P.P.L.,
I just checked, and each of your "invalids" was caused by the work unit restarting from a checkpoint. When this happens the "seed" changes, which invalidates your work units.
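
To illustrate the mechanism described above, here is a minimal sketch of why a changed seed after a restart makes a result diverge from its replicas, and how saving the random generator's state in the checkpoint avoids it. Everything here is an assumption for illustration: `run_chain`, `checkpoint_at` and `restore_rng` are hypothetical names, and the real Gibbs sampler application is not written this way.

```python
import pickle
import random


def run_chain(seed, steps, checkpoint_at=None, restore_rng=True):
    """Draw `steps` pseudo-random samples, optionally simulating a restart."""
    rng = random.Random(seed)
    samples = []
    for i in range(steps):
        if checkpoint_at is not None and i == checkpoint_at:
            # Simulate "write checkpoint, exit, resume from checkpoint".
            saved = pickle.dumps((samples, rng.getstate()))
            samples, state = pickle.loads(saved)
            if restore_rng:
                rng = random.Random()
                rng.setstate(state)  # continue the same random stream
            else:
                # Stand-in for a generator that gets a new seed on restart.
                rng = random.Random(seed + 1)
        samples.append(rng.randrange(4))  # e.g. pick one of A, C, G, T
    return samples


uninterrupted = run_chain(42, 1000)
resumed_ok = run_chain(42, 1000, checkpoint_at=500)
resumed_reseeded = run_chain(42, 1000, checkpoint_at=500, restore_rng=False)

print(resumed_ok == uninterrupted)        # same stream after the restart
print(resumed_reseeded == uninterrupted)  # diverges after step 500
```

In this sketch the re-seeded run matches the uninterrupted one up to the checkpoint and differs afterwards, which is the kind of mismatch that would cause a validator comparing replicas to reject it.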

No idea what the cause is, so we will have to wait till Travis has some spare time to find out why this happens.
Can't say I have struck this myself, and it is happening on both your computers.

Waiting on Travis.

Conan
____________

Euclid
Joined: 31 Dec 14
Posts: 1
Combined Credit: 2,214
DNA@Home: 1,155
SubsetSum@Home: 1,059
Wildlife@Home: 0
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0
Message 4944 - Posted: 10 Jan 2015, 11:23:47 UTC

Me too, on a WU that took 15,000 seconds of CPU time. Time wasted...

Sorry, I am disconnecting from the project.

Travis Desell

              
Message 4949 - Posted: 11 Jan 2015, 23:26:03 UTC - in response to Message 4944.

Still working on this; things are a little backed up because of a paper we're writing on the DNA@Home results. That's due this Thursday, so hopefully I'll be able to get back to this next weekend (I was having trouble recreating the problem when starting from checkpoints).

gtippitt
Joined: 26 Mar 14
Posts: 4
Combined Credit: 10,688,594
DNA@Home: 950,688
SubsetSum@Home: 0
Wildlife@Home: 9,737,906
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

    
Message 4953 - Posted: 13 Jan 2015, 0:46:30 UTC - in response to Message 4949.

If it might help you track down the bug, I've had 11 of these invalid jobs in the past week. Most were the typical case where I'm the odd man out and the other two get credit. Two of the work units were odd in that all 3 results were invalid; those two are linked below. I've spent about 239,533 seconds (66 hours) of CPU time on invalid tasks this week.

http://volunteer.cs.und.edu/csg/workunit.php?wuid=1170301

http://volunteer.cs.und.edu/csg/workunit.php?wuid=1178312

Good luck with the paper. I have sympathy with that pressure. When my late wife was a graduate student in Cognitive Psychology, I was working as an actuarial programmer, so I always got stuck doing their data analysis for papers.

Greg

gtippitt

    
Message 4956 - Posted: 15 Jan 2015, 18:39:42 UTC - in response to Message 4953.

Travis,
On the same machine that had work unit 1203615 (gibbs_snail_hg19_1000fa_3motifs_3_154__207475_290000), where none of the 3 machines got credit, I have gotten 14 tasks with errors. I sometimes get errors when systems crash, but on these work units, every user that tried to run them got errors.

http://volunteer.cs.und.edu/csg/results.php?hostid=5424&offset=0&show_names=0&state=6&appid=

Because they end quickly, these are not as disappointing as the work units where we've worked for a long time without credit, but they are happening at the same time as the rash of invalid tasks, so they might be a different symptom of the same underlying problem.

I have 6 systems running CSG, all with the same hardware and Ubuntu 14.04, so I could run tests for you if that might help.

I hope you get a few extra hours sleep when the paper is complete.

Greg

Jacob Klein
Joined: 1 Sep 14
Posts: 1
Combined Credit: 232,135
DNA@Home: 209,622
SubsetSum@Home: 1,349
Wildlife@Home: 21,164
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

    
Message 4985 - Posted: 9 Feb 2015, 3:32:15 UTC
Last modified: 9 Feb 2015, 3:32:42 UTC

Is this (task getting marked as invalid because it resumes from a checkpoint) still happening?

For instance, check this work unit:
http://volunteer.cs.und.edu/csg/workunit.php?wuid=1375428
... my task was marked as invalid.

How can we fix this? I don't want to waste my computer's processing time.

Travis Desell

              
Message 4988 - Posted: 9 Feb 2015, 18:13:51 UTC - in response to Message 4985.

I'm pretty sure the error is due to checkpointing, but I've had trouble reproducing it in order to debug it. Once I'm back from my trip I'll bump this up to a higher priority.

