Message boards : Number Crunching : Running time / validation errors / running time

Author Message
Profile marsinph
Joined: 2 Apr 18
Posts: 13
Combined Credit: 4,184,350
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 4,184,350
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 7571 - Posted: 22 Aug 2018, 15:39:39 UTC

Hello,
Like many of us, I receive validation errors, caused by WUs that other hosts return too late or with computation errors.
Is it possible to reduce the size of each WU?
Some of them on my hosts have an estimated running time of about five days.
That in itself is not a problem, but it takes everything away from other projects, with the risk of "invalid tasks".
Each task also takes about 600 or 700 MB of RAM,
and is estimated at about 2,000,000 GFLOPs to complete!
That is to say, on a normal host with 8 GB of RAM, this takes about 70% of the available memory,
which leaves very little to run Windows, antivirus, and so on.
Of course, I can manually let only 7 WUs run at the same time, but then I have to check every couple of hours to see what is happening.

Why not reduce the size of each WU?
And the credits accordingly, of course.
It would be more efficient for everyone!
Best regards from Belgium first team
____________

Profile marsinph

  
Message 7576 - Posted: 4 Sep 2018, 14:46:40 UTC

No answer!

Because the WUs are very long, I have decided to run only four at the same time. Also, because of the amount of memory (700 MB/WU), I cannot run more.
And I keep none in "waiting" or "suspended" state, because they stay in RAM.
I need RAM for other projects and to be able to work.
So now, I decide to cancel all WU exceeding 1 day.
It is not a problem of the return deadline. They are simply very long.
It is only to keep resources for other software.

So once again, why not reduce the running time (and also the credits)?
Instead of sending some WUs that need an estimated five days, send no WU longer than one day!
Best regards

mmonnin
Joined: 31 May 16
Posts: 38
Combined Credit: 34,803,115
DNA@Home: 0
SubsetSum@Home: 1,023,200
Wildlife@Home: 33,779,915
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 54
Images Observed: 10,675

        
Message 7577 - Posted: 4 Sep 2018, 18:34:16 UTC
Last modified: 4 Sep 2018, 18:38:13 UTC

The admin won't even respond to a direct email.

Going around and complaining on every BOINC project forum for things that are normal doesn't help either.

Profile marsinph

  
Message 7578 - Posted: 6 Sep 2018, 19:24:51 UTC - in response to Message 7577.

mmonnin,
OK, you are right.
I am only trying to understand.
You are very active on a lot of projects, like me. Instead of complaining, please give an explanation!
One last point here: my settings are set to accept only one day of work.
This is to avoid getting WUs that cannot be returned before the deadline.
It is my choice.
But this project sends WUs with an expected running time of about 2, 3 or 5 days.
Please do not talk about running time and deadlines, because this project sends about 10 WUs, each with an expected running time of several days.
Because the WUs take a lot of RAM (750 MB/WU), I do not want to block my RAM for several days!

Also, until now nobody has been able to explain why, on the same host, with the same app, the same estimated GFLOPs, everything the same,
one WU (I repeat: same host, same app and same batch file) receives half the credits while running twice as long!
Please analyze my hosts, everything is fully open.
So, until there is an explanation, I consider the credits here to be very unstable.
They do not follow any logic.
Once again, and for the last time, I repeat: I am comparing WUs on the same host, same config, same app...
Best regards

lanbrown
Joined: 17 Sep 16
Posts: 14
Combined Credit: 206,753,842
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 206,753,842
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 7581 - Posted: 9 Sep 2018, 1:29:09 UTC - in response to Message 7576.


So now, I decide to cancel all WU exceeding 1 day.


By aborting units you are contributing to the problems of not getting credit for a WU that gets completed but no credit given because it cannot be validated. It will only send a WU out five times. In some cases I completed a WU but I can't get credit for it because some people decided to abort the WU.

So the bigger issue is bad hosts that return a lot of invalid WU's and some hosts where large quantities of WU's are aborted by the user.

You can control how much RAM BOINC will use; why not use that to control how many WU's are run?
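
As a rough illustration of that suggestion, here is a minimal Python sketch that estimates how many tasks fit under a RAM cap. The 750 MB per task and the 8 GB host are figures from this thread; the 50% cap is only an assumed example value, and `tasks_that_fit` is a hypothetical helper, not part of BOINC.

```python
def tasks_that_fit(total_ram_mb: float, ram_cap_pct: float, task_ram_mb: float) -> int:
    """Return how many tasks fit under a RAM cap given as a percentage of total RAM."""
    if task_ram_mb <= 0:
        raise ValueError("task_ram_mb must be positive")
    # Memory budget allowed for all tasks together, in MB.
    budget_mb = total_ram_mb * ram_cap_pct / 100.0
    # Only whole tasks count, so round down.
    return int(budget_mb // task_ram_mb)


if __name__ == "__main__":
    # 8 GB host, assumed 50% cap, about 750 MB per task (figure from the thread).
    print(tasks_that_fit(8 * 1024, 50, 750))
```

Under these assumptions the host would run five such tasks at once; a lower cap reduces that number without any manual suspending.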

Profile marsinph

  
Message 7582 - Posted: 9 Sep 2018, 11:50:23 UTC - in response to Message 7581.


So now, I decide to cancel all WU exceeding 1 day.


By aborting units you are contributing to the problems of not getting credit for a WU that gets completed but no credit given because it cannot be validated. It will only send a WU out five times. In some cases I completed a WU but I can't get credit for it because some people decided to abort the WU.

So the bigger issue is bad hosts that return a lot of invalid WU's and some hosts where large quantities of WU's are aborted by the user.

You can control how much RAM BOINC will use; why not use that to control how many WU's are run?



Hello,
I understand that aborting WUs is not fair. Sorry, I apologize.

To prevent an excessive number of WUs, my BAM settings are set to "accept tasks" for one day.
But sometimes I receive ten WUs with an estimated running time of several days (up to 5 days on an i7-3820, 3.6 GHz, 8 cores and 8 GB RAM)!

Of course, I manually let only two WUs run at the same time so as not to block all the RAM (one WU needs about 750 MB).
As you can see in my stats, I run several projects.
But if I let one WU run for five days, it blocks (reserves) 750 MB,
with the added risk of an invalid WU.
Best regards
____________

mmonnin

        
Message 7583 - Posted: 9 Sep 2018, 14:31:41 UTC

Some tasks just take several days to complete. Even on my 2700x some tasks take 3 days. The scheduler can't predict that. It'll just send you tasks to run for a day. If you need a task to run then it will send you a task. It won't withhold a task because it may take too long. It's going to send a task to keep a CPU busy.

Run times and memory usage are part of each individual project. There's nothing we can do about it. If the run times and memory usage is beyond what you want to run then the project isn't for you.

lanbrown

  
Message 7584 - Posted: 9 Sep 2018, 18:29:42 UTC - in response to Message 7583.

Run times and memory usage are part of each individual project. There's nothing we can do about it. If the run times and memory usage is beyond what you want to run then the project isn't for you.


+1

There are many projects out there. There are other projects that use around the same memory; they don't run for nearly as long though but that has nothing to do with it. Say they run for 12 hours. How is it any different that it uses 750MB for 12 hours but then you just run more of those back to back. After a day you ran two WU's that took 750MB or one WU that took that. The memory is being used regardless.
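
That argument can be checked with simple arithmetic. Below is a minimal Python sketch using the 750 MB and 12-hour figures from this post; `memory_hours` is a hypothetical helper name chosen for illustration.

```python
def memory_hours(task_ram_mb: float, hours_per_task: float, tasks_back_to_back: int) -> float:
    """Total MB-hours of RAM held by tasks that run one after another on one core."""
    return task_ram_mb * hours_per_task * tasks_back_to_back


# Two 12-hour tasks run back to back versus one 24-hour task, 750 MB each.
two_short = memory_hours(750, 12, 2)
one_long = memory_hours(750, 24, 1)

if __name__ == "__main__":
    print(two_short, one_long, two_short == one_long)
```

Both cases hold 750 MB for the same 24 hours, so the run length of a single WU does not change how much RAM the core ties up over a day.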

If you have limited RAM and need to use the machine for other tasks, then either less CPU cores should be allowed for BOINC or you need to pick and choose projects that fit within the memory footprint.

Profile marsinph

  
Message 7587 - Posted: 10 Sep 2018, 15:45:29 UTC - in response to Message 7584.

Run times and memory usage are part of each individual project. There's nothing we can do about it. If the run times and memory usage is beyond what you want to run then the project isn't for you.


+1

There are many projects out there. There are other projects that use around the same memory; they don't run for nearly as long though but that has nothing to do with it. Say they run for 12 hours. How is it any different that it uses 750MB for 12 hours but then you just run more of those back to back. After a day you ran two WU's that took 750MB or one WU that took that. The memory is being used regardless.

If you have limited RAM and need to use the machine for other tasks, then either less CPU cores should be allowed for BOINC or you need to pick and choose projects that fit within the memory footprint.




Hello, thank you for the explanation.
On one host I have 8 GB; on all other hosts I have 16 GB.
So there is enough on the 16 GB hosts. But if I also run NFS or Cosmology (also 750 MB/WU), the host with 8 GB runs out of memory.

Suspending one project changes nothing, because the RAM stays allocated.
Of course, I can choose in BAM not to leave tasks in RAM while suspended, but then a lot of time is lost.

So, depending on availability, I let WUs run or not. That is why I return few WUs.
From my stats you can see that I "crunch" on many projects, depending on the resources I need on my hosts. Three are fully dedicated to BOINC (host 82553 with 8 GB RAM, host 80783 with 16 GB and host 80117 with 16 GB).
My fourth and last host runs depending on my normal use.

This is to explain the configs.

But it does not explain why, on the dedicated host, I sometimes receive ten WUs with an estimated running time of several days! Especially when my BAM settings say not to accept more than one day of work. This is to avoid getting too many WUs that could not be finished in time.
____________

lanbrown

  
Message 7588 - Posted: 10 Sep 2018, 16:19:21 UTC - in response to Message 7587.

This is to explain the configs.

But it does not explain why, on the dedicated host, I sometimes receive ten WUs with an estimated running time of several days! Especially when my BAM settings say not to accept more than one day of work. This is to avoid getting too many WUs that could not be finished in time.


If you look at the preferences at CS Grid, they don't have a setting like that. So while you might set that at BAM, CS Grid cannot respect a setting that they do not have. You get what WU's are available.

Since you also run other projects, it sounds like you might be better off removing this one host from CS Grid and let it run other projects where it doesn't have an issue. Sometimes it is more beneficial not to run a project on a particular machine than run it.

Profile searching to find the meaning of life I was lost & desperate. Then God touched my heart & soul and showed me He loved me so deeply as to lay down His life for me. Jesus proves God is Love. Dios es Amor, Jesus demuestra. LPa H
Joined: 11 Jul 15
Posts: 6
Combined Credit: 58,796,055
DNA@Home: 601
SubsetSum@Home: 3,031
Wildlife@Home: 58,792,423
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 7591 - Posted: 10 Sep 2018, 21:41:14 UTC - in response to Message 7583.

Hi, mmonnin,
" part of each individual project. There's nothing we can do about it"
You say, "we can do"... are you a moderator for this project, or otherwise on the project team?

Message 7581 - Posted: 9 Sep 2018, 1:29:09 UTC :
"By aborting units you are contributing to the problems ..."

On rare occasions I have to abort a WU, for various reasons.
Surely CSG can distinguish between a real compute error and an abort by the user.
If not, THIS needs to be fixed.
Thanks,
LLP, PhD PE
____________
I think, therefore I THINK I am.
My thinking's neither the source of my being--
not proves it to others.
God Is Love, Jesus proves it

mmonnin

        
Message 7597 - Posted: 11 Sep 2018, 1:24:16 UTC - in response to Message 7591.

Hi, mmonnin,
" part of each individual project. There's nothing we can do about it"
You say, "we can do"... are you a moderator for this project, or otherwise on the project team?

Message 7581 - Posted: 9 Sep 2018, 1:29:09 UTC :
"By aborting units you are contributing to the problems ..."

On rare occasions I have to abort a WU, for various reasons.
Surely CSG can distinguish between a real compute error and an abort by the user.
If not, THIS needs to be fixed.
Thanks,
LLP, PhD PE


No, just a regular user. And as a regular user, "There's nothing we can do about it" means that the user cannot change the memory usage of the app. Your quote of me is out of context.

Gap Filler
Joined: 1 Mar 17
Posts: 2
Combined Credit: 7,008,322
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 7,008,322
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 7600 - Posted: 12 Sep 2018, 8:32:21 UTC

Another possibility to limit the impact is to try an app_config.xml in your boinc\projects\<project>\ directory

Something like
C:\ProgramData\BOINC\projects\csgrid.org_csg\app_config.xml

<app_config>
  <app>
    <name>exact_bn_sfmp</name>
    <max_concurrent>6</max_concurrent>
  </app>
</app_config>

See https://boinc.berkeley.edu/wiki/Client_configuration
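
If the file should be generated rather than typed by hand, a minimal Python sketch could look like the following. The app name `exact_bn_sfmp` and the `<max_concurrent>` element are taken from the example above; the function name `build_app_config` and the limit of 2 are only illustrative assumptions, and writing the result into the project directory is left to the reader.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Build the text of an app_config.xml that limits one app to max_concurrent tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # Return the XML as a plain string; saving it as app_config.xml is up to the caller.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed example: allow at most 2 tasks of this app at the same time.
    print(build_app_config("exact_bn_sfmp", 2))
```

After the file is saved in the project directory, the BOINC client has to re-read its config files (or be restarted) before the limit takes effect.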

