1) Message boards : Number Crunching : Marked invalid, I've got a question. (Message 4956)
Posted 15 Jan 2015 by gtippitt
Travis,
On the same machine that had work unit 1203615 (gibbs_snail_hg19_1000fa_3motifs_3_154__207475_290000), where none of the 3 machines got credit, I have gotten 14 tasks with errors. I sometimes get errors when systems crash, but on these work units, every user that tried to run them got errors.

http://volunteer.cs.und.edu/csg/results.php?hostid=5424&offset=0&show_names=0&state=6&appid=

These are not as disappointing as the work units where we've worked for a long time without credit, since they fail quickly, but they are happening at the same time as the rash of invalid tasks, so they might be a different symptom of the same underlying problem.

I have 6 systems running CSG, all with the same hardware and Ubuntu 14.04 software, in case running any tests on them would help you.

I hope you get a few extra hours of sleep when the paper is complete.

Greg
2) Message boards : Number Crunching : Running multiple tasks on a multi-core system takes longer. (Message 4955)
Posted 14 Jan 2015 by gtippitt
Your problem is most likely that you don't have enough RAM for more than 1 or 2 CSG/DNA tasks to run concurrently. This project requires a great deal more RAM per task than POEM does. The discussion in the "Memory Usage" thread in this forum has details on this and on how to balance the load on multi-core systems.

Some of the DNA tasks currently running on my systems require 1.5 GB per task, which for your systems would be about half the machine's memory for a single DNA task. Restricting your systems to running 1 or 2 DNA tasks, and allowing POEM to use the other cores, will give you better overall throughput of work units.
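If you want to cap it, the usual way is an app_config.xml file in the CSG project folder under your BOINC data directory. A minimal sketch is below; it assumes a reasonably recent BOINC 7.x client, and the limit of 2 is only an example, so set whatever your RAM allows.

    <!-- app_config.xml, placed in the CSG project folder under the BOINC data directory.
         Sketch only: assumes a BOINC 7.x client; the limit of 2 is just an example. -->
    <app_config>
        <!-- Run at most 2 tasks from this project at the same time. -->
        <project_max_concurrent>2</project_max_concurrent>
    </app_config>

After saving it, have the client re-read its config files from the BOINC Manager, or just restart BOINC.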

Good Luck,
Greg
3) Message boards : Number Crunching : Marked invalid, I've got a question. (Message 4953)
Posted 13 Jan 2015 by gtippitt
If it might help you track down the bug, I've had 11 of these invalid jobs in the past week. Most were the typical case where I'm the odd man out and the other two hosts get credit. Two of the work units were odd in that all 3 results were invalid; those two are linked below. I've had about 239,533 seconds (roughly 66 hours) of CPU time on invalid tasks this week.

http://volunteer.cs.und.edu/csg/workunit.php?wuid=1170301

http://volunteer.cs.und.edu/csg/workunit.php?wuid=1178312

Good luck with the paper. I sympathize with that pressure. When my late wife was a graduate student in Cognitive Psychology, I was working as an actuarial programmer, so I always got stuck doing their data analysis for papers.

Greg
4) Message boards : Number Crunching : Memory usage (Message 4952)
Posted 12 Jan 2015 by gtippitt
Having 1 GB per job is a good rule of thumb for this project. I've got 7 motherboards with 24 cores each (quad AMD 8431 hex-core CPUs) running DNA jobs. The RAM in these systems varies from 16 GB up, and only the ones with at least 32 GB can run 20 DNA jobs without memory problems. Setting a limit on concurrent DNA jobs, so the system can run other jobs on the remaining cores, is the best solution. POEM@Home is a good choice, as they are working on similar research and use less RAM.

You should also check your BOINC preferences and turn off the option to leave tasks in memory while suspended. This applies not only when a job is suspended by a user's non-BOINC work, but also when a BOINC job is preempted so another with higher priority can finish.
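If you would rather set that in a file than through the web preferences, the same option lives in global_prefs_override.xml in the BOINC data directory. A minimal sketch, assuming you keep whatever other overrides you already have in that file:

    <!-- global_prefs_override.xml, in the BOINC data directory.
         0 means do NOT keep suspended tasks in memory; merge with any existing overrides. -->
    <global_preferences>
        <leave_apps_in_memory>0</leave_apps_in_memory>
    </global_preferences>

Then have the Manager re-read the local prefs file, or restart the client, so it takes effect.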

My Ubuntu systems are diskless and use iSCSI drives from my disk server. For the 24-core systems, I limit the number of cores BOINC may use to 23, leaving 1 core free to take care of system overhead and LAN I/O. With this setting, I get overall CPU utilization of at least 95%. If BOINC jobs are running on all CPU cores, the CPU utilization tops out at about 85% because jobs sit waiting on resources. These systems are also running jobs on 1 or more GPUs for POEM and GPUGrid, so keeping resources free to prevent bottlenecks feeding the GPUs is important.
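There is more than one way to set that core limit; one sketch, using the standard "use at most N% of the processors" preference in the same global_prefs_override.xml, could look like this. 96% of 24 cores should come out to 23, assuming the client rounds down:

    <!-- global_prefs_override.xml: restrict BOINC to 23 of 24 cores.
         96% of 24 = 23.04, which the client should truncate to 23. -->
    <global_preferences>
        <max_ncpus_pct>96</max_ncpus_pct>
    </global_preferences>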

On multicore systems I would recommend not running DNA jobs on all cores; limit the number of concurrent DNA jobs and leave 1 or more cores free to run smaller jobs like POEM and to take care of system overhead. This way the DNA jobs get everything they need to run as fast as possible. For example, on dual- and quad-core systems I set a 1-task limit for projects like CSG and WCG, and let smaller-RAM projects like SETI and POEM use whatever is left.
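If you only want to limit the DNA application itself rather than everything from a project, app_config.xml also takes a per-application limit. A sketch follows; the app name here is a placeholder, not the real short name, which you would look up in client_state.xml:

    <!-- app_config.xml: per-application limit.
         "dna_app_name" is a placeholder; use the short app name from client_state.xml. -->
    <app_config>
        <app>
            <name>dna_app_name</name>
            <max_concurrent>1</max_concurrent>
        </app>
    </app_config>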

Every project is a bit different. Some run best using the fast but limited instruction set of a GPU, while other projects need a CPU, which is slower but has a more robust instruction set. Some projects do only a few operations against data that can be chopped into small chunks and run with few resources. Other projects with more complex logic will require more resources, even with small chunks of data.

For a simple (and silly) example:
I have a bunch of small animals that need to be sorted into different holding areas. These animals are either wild sewer rats or tame pet chinchillas. No matter how many animals I need to sort, or how many people are sorting them, the instructions are fairly short. Is the animal smooth-haired or fuzzy? Simple logic that requires few instructions to be written.

Another project has lots of pictures of cats, guinea pigs, rabbits, and dogs to be sorted and counted. The instructions for this project would be much more complex. How do you tell the difference between a long-haired guinea pig, a Persian kitten, and a Yorkshire terrier? How do you tell the Beagle from a rabbit? Both have long ears. You can't hear the Beagle's bay that she's smelled a rabbit. Both are standing still and alert, smelling the air. Some of the rabbits are white, while others are brown. Some of the Beagles are white, while others are tri-color. Both are in the same tall grass, so fur length and size are difficult to determine from the two photos. A long list of instructions will be needed for the sorters. It is not as simple as "use less paper, so the sorters don't need to read as many instructions." Using a smaller font size won't help.

It isn't that some projects are "better"; they are all trying to determine different things, even when their goals and subjects of research are similar.

Greg