Author |
Message |
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
Well, that was a deep rabbit hole! After implementing my own exp() function, finding out about hexfloats, and upgrading Visual Studio to a version that actually implements them... AFAIK we really do have new working applications! Everything is checkpointing and returning identical results on my end across Windows/OS X/Linux, so please let me know if you see anything fishy happening.
I've got the new validator and assimilator up and going, and will be keeping an eye on them. I'm running them manually to check output, etc., so they won't show up on the status page until I enable them to run by default. With any luck, we really should be up and running for a while now! |
|
|
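Editor's sketch: the post above mentions two tricks for getting identical results on every platform, a hand-written exp() and hexfloats. The Python below only illustrates the idea and is not the project's code; `portable_exp` and its `terms` parameter are hypothetical names. It builds exp() from plain additions, multiplications and divisions, so no platform math library is involved, and then shows that a hex float string round-trips every bit of a value while a short decimal string does not.

```python
import math


def portable_exp(x: float, terms: int = 30) -> float:
    """Hypothetical exp() built only from +, * and /.

    Assumption: basic IEEE 754 arithmetic gives the same result on each
    platform, whereas library exp() implementations may differ in the last bits.
    """
    # Range reduction: exp(x) == exp(x / 2**k) ** (2**k), with |x / 2**k| <= 0.5
    k = 0
    while abs(x) > 0.5:
        x /= 2.0
        k += 1

    # Taylor series for the reduced argument: 1 + x + x^2/2! + ...
    term = 1.0
    total = 1.0
    for n in range(1, terms):
        term *= x / n
        total += term

    # Undo the range reduction by squaring k times
    for _ in range(k):
        total *= total
    return total


value = portable_exp(1.0)

# A hex float string stores the exact bits, so it survives a write/read cycle.
hex_text = value.hex()
restored = float.fromhex(hex_text)

# A short decimal string (six significant digits here) loses information.
decimal_text = f"{value:.6g}"
truncated = float(decimal_text)

print(hex_text, restored == value)        # exact round trip
print(decimal_text, truncated == value)   # not an exact round trip
print(abs(value - math.exp(1.0)))         # difference from the library exp()
```

Writing checkpoints and results as hex floats is one way to keep a text file format while still comparing results bit for bit between hosts.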
WTBroughton
Joined: 25 Apr 12 Posts: 2 Combined Credit: 611,562 DNA@Home: 34,170 SubsetSum@Home: 15,272 Wildlife@Home: 562,121 Wildlife@Home Watched: 0s Wildlife@Home Events: 0 Climate Tweets: 1 Images Observed: 0
 |
I'm getting the following errors, a mismatch on the version number of the input file (v0.11).
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
arguments:
'projects/csgrid.org_csg/exact_client_0.13_windows_x86_64.exe'
'--samples_file'
'samples.bin'
'--genome_file'
'input_genome.txt'
'--output_file'
'output_genome.txt'
'--checkpoint_file'
'checkpoint.txt'
converting arguments to vector
boincified samples filename: '../../projects/csgrid.org_csg/mnist_training_data.bin'
boincified genome filename: '../../projects/csgrid.org_csg/exact_genome_1483942068_21_12434.txt'
boincified output filename: '../../projects/csgrid.org_csg/exact_genome_1483942068_21_12434_3_r732544715_0'
boincified checkpoint filename: 'checkpoint.txt'
parsed arguments, loading images
number_classes: 10
rowscols: 28
vals_per_pixel: 1
reading image set with 5923 images.
reading image set with 6742 images.
reading image set with 5958 images.
reading image set with 6131 images.
reading image set with 5842 images.
reading image set with 5421 images.
reading image set with 5918 images.
reading image set with 6265 images.
reading image set with 5851 images.
reading image set with 5949 images.
image_size: 28x28 = 784
read 60000 images.
class 0: 5923
class 1: 6742
class 2: 5958
class 3: 6131
class 4: 5842
class 5: 5421
class 6: 5918
class 7: 6265
class 8: 5851
class 9: 5949
normalizing images.
average pixel value: 0.13066
pixel variance: 0.0949304
pixel standard deviation: 0.308108
normalized.
loaded images
starting from input file: '../../projects/csgrid.org_csg/exact_genome_1483942068_21_12434.txt'
read CNN_Genome file with version string: 'v0.11'
breaking because version_str '0.11' did not match EXACT_VERSION '0.13': 1
parsed input file
ERROR: exact application with version '0.13' trying to process workunit with incompatible input version: '0.11'
11:03:51 (12692): called boinc_finish(1)
</stderr_txt>
]]> |
|
|
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
I'm getting the following errors, a mismatch on the version number of the input file (v0.11).
Sorry, this is working as intended; it's part of the reason I let the system (mostly) flush out of workunits before starting up the new application. There were a couple of stragglers left, but I didn't want to wait around another week to see if/when those results would come home.
v0.13 uses a different input file format, so it can't parse older workunits generated for v0.11 (and vice versa). |
|
|
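Editor's sketch: the failure in the log above comes down to a version-string comparison before any training starts. A minimal Python illustration of that kind of guard follows; `versions_match` and `exit_code_for` are hypothetical names, and only the strings 'v0.11', '0.11' and '0.13' and the exit code 1 are taken from the log.

```python
EXACT_VERSION = "0.13"  # application version string, as printed in the log


def versions_match(file_version: str, app_version: str = EXACT_VERSION) -> bool:
    """Return True when a genome file's version matches the application's.

    The log shows the file storing 'v0.11' while the comparison uses '0.11',
    so this sketch assumes a leading 'v' is dropped before comparing.
    """
    version_str = file_version.strip()
    if version_str.startswith("v"):
        version_str = version_str[1:]
    return version_str == app_version


def exit_code_for(file_version: str) -> int:
    """Exit code a client could report: 0 to continue, 1 for a mismatch."""
    return 0 if versions_match(file_version) else 1


print(exit_code_for("v0.11"))  # old workunit handed to the 0.13 application
print(exit_code_for("v0.13"))  # matching workunit
```

Failing fast like this costs the volunteer a few seconds instead of a full run that would produce an unusable result.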
WTBroughton
Joined: 25 Apr 12 Posts: 2 Combined Credit: 611,562 DNA@Home: 34,170 SubsetSum@Home: 15,272 Wildlife@Home: 562,121 Wildlife@Home Watched: 0s Wildlife@Home Events: 0 Climate Tweets: 1 Images Observed: 0
 |
Understood, no problem, I'll keep on crunching. |
|
|
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
Understood, no problem, I'll keep on crunching.
Thanks! Every cruncher/computer helps. :) |
|
|
Chris Skull
Joined: 11 Apr 15 Posts: 20 Combined Credit: 4,766,966 DNA@Home: 55,861 SubsetSum@Home: 1,272,523 Wildlife@Home: 3,438,582 Wildlife@Home Watched: 1,312,789s Wildlife@Home Events: 475 Climate Tweets: 0 Images Observed: 46
 |
Isn't the granted credit a little bit high? :)
____________
Greetz
Chris
|
|
|
JumpinJohnny
Joined: 24 Sep 13 Posts: 237 Combined Credit: 10,275,610 DNA@Home: 192,548 SubsetSum@Home: 201,740 Wildlife@Home: 9,881,323 Wildlife@Home Watched: 55,997,833s Wildlife@Home Events: 15,584 Climate Tweets: 389 Images Observed: 351
 |
Isn't the granted credit a little bit high? :)
I also noticed that the v0.13 work units are about 80 percent faster than the previous versions but pay about the same credit.
|
|
|
JumpinJohnny
Joined: 24 Sep 13 Posts: 237 Combined Credit: 10,275,610 DNA@Home: 192,548 SubsetSum@Home: 201,740 Wildlife@Home: 9,881,323 Wildlife@Home Watched: 55,997,833s Wildlife@Home Events: 15,584 Climate Tweets: 389 Images Observed: 351
 |
Isn't the granted credit a little bit high? :)
I also noticed that the v0.13 work units are about 80 percent faster than the previous versions but pay about the same credit.
EDIT: Should have said EIGHT TIMES faster.
Yes, very high granted credit for CPU work. |
|
|
Conan
Joined: 13 Apr 12 Posts: 151 Combined Credit: 47,672,899 DNA@Home: 399,792 SubsetSum@Home: 1,448,876 Wildlife@Home: 45,824,231 Wildlife@Home Watched: 70,910s Wildlife@Home Events: 0 Climate Tweets: 450 Images Observed: 0
 |
How much memory do these new work units require on Windows 32 bit?
On my Linux 64 bit they seem to use about 400 to 450 MB
However I only get errors on my Windows computer even though only 2.3 GB is in use before a CSG WU starts, the WU then fails in seconds due to an "Out Of Memory" error.
In the case of the credits it helps when there are a lot of errors during the sorting out of an application, to make up for the ones that don't make it to completion.
So I don't see a big issue with them, as I am losing work units which get nothing.
Conan
|
|
|
[AF>France>IDF]Lic
Joined: 30 Aug 13 Posts: 6 Combined Credit: 11,024,652 DNA@Home: 436,798 SubsetSum@Home: 140,468 Wildlife@Home: 10,447,386 Wildlife@Home Watched: 16,667s Wildlife@Home Events: 0 Climate Tweets: 384 Images Observed: 264
 |
On my Windows 32-bit host, I get some errors:
26/01/2017 09:32:03 Citizen Science Grid Starting exact_genome_1485416402_21_6679_0
26/01/2017 09:32:03 Citizen Science Grid [task_debug] task_state=EXECUTING for exact_genome_1485416402_21_6679_0 from start
26/01/2017 09:32:03 Citizen Science Grid Starting task exact_genome_1485416402_21_6679_0 using exact version 13
26/01/2017 09:32:03 Citizen Science Grid Starting exact_genome_1485416402_21_6678_1
26/01/2017 09:32:03 Citizen Science Grid [task_debug] task_state=EXECUTING for exact_genome_1485416402_21_6678_1 from start
26/01/2017 09:32:03 Citizen Science Grid Starting task exact_genome_1485416402_21_6678_1 using exact version 13
26/01/2017 09:32:07 Citizen Science Grid [task_debug] Process for exact_genome_1485416402_21_6679_0 exited
26/01/2017 09:32:07 Citizen Science Grid [task_debug] task_state=EXITED for exact_genome_1485416402_21_6679_0 from handle_exited_app
26/01/2017 09:32:07 Citizen Science Grid [task_debug] result state=COMPUTE_ERROR for exact_genome_1485416402_21_6679_0 from CS::report_result_error
26/01/2017 09:32:07 Citizen Science Grid [task_debug] Process for exact_genome_1485416402_21_6679_0 exited
26/01/2017 09:32:07 Citizen Science Grid [task_debug] exit code -529697949 (0xe06d7363):
26/01/2017 09:32:07 Citizen Science Grid [task_debug] Process for exact_genome_1485416402_21_6678_1 exited
26/01/2017 09:32:07 Citizen Science Grid [task_debug] task_state=EXITED for exact_genome_1485416402_21_6678_1 from handle_exited_app
26/01/2017 09:32:07 Citizen Science Grid [task_debug] result state=COMPUTE_ERROR for exact_genome_1485416402_21_6678_1 from CS::report_result_error
26/01/2017 09:32:07 Citizen Science Grid [task_debug] Process for exact_genome_1485416402_21_6678_1 exited
26/01/2017 09:32:07 Citizen Science Grid [task_debug] exit code -529697949 (0xe06d7363):
26/01/2017 09:32:07 Citizen Science Grid Computation for task exact_genome_1485416402_21_6679_0 finished
26/01/2017 09:32:07 Citizen Science Grid Output file exact_genome_1485416402_21_6679_0_r1407925858_0 for task exact_genome_1485416402_21_6679_0 absent
26/01/2017 09:32:07 Citizen Science Grid [task_debug] result state=COMPUTE_ERROR for exact_genome_1485416402_21_6679_0 from CS::app_finished
26/01/2017 09:32:07 Citizen Science Grid Computation for task exact_genome_1485416402_21_6678_1 finished
26/01/2017 09:32:07 Citizen Science Grid Output file exact_genome_1485416402_21_6678_1_r1308759064_0 for task exact_genome_1485416402_21_6678_1 absent
26/01/2017 09:32:07 Citizen Science Grid [task_debug] result state=COMPUTE_ERROR for exact_genome_1485416402_21_6678_1 from CS::app_finished
If you have any ideas, please let me know. |
|
|
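Editor's sketch: the exit code in the log above is easier to read once the signed number is viewed as an unsigned 32-bit value. 0xE06D7363 is the exception code that Microsoft's C++ runtime uses when a C++ exception is thrown, and its low three bytes are the ASCII letters "msc". An exit with this code usually means a C++ exception was not caught; the log does not say which exception it was. The function names below are hypothetical.

```python
def as_unsigned_hex(signed_code: int) -> str:
    """Format a signed 32-bit exit code the way the BOINC log prints it in hex."""
    return f"0x{signed_code & 0xFFFFFFFF:08x}"


def low_bytes_as_text(signed_code: int) -> str:
    """Decode the low three bytes of the code as ASCII."""
    return (signed_code & 0xFFFFFF).to_bytes(3, "big").decode("ascii")


exit_code = -529697949  # value reported in the log above

print(as_unsigned_hex(exit_code))    # hex form of the same code
print(low_bytes_as_text(exit_code))  # marker left by the MSVC runtime
```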
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
Isn't the granted credit a little bit high? :)
I also noticed that the v0.13 work units are about 80 percent faster than the previous versions but pay about the same credit.
EDIT: Should have said EIGHT TIMES faster.
Yes, very high granted credit for CPU work.
Okay, I'll drop the credit a bit. I did make quite a few performance enhancements while I was debugging. |
|
|
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
How much memory do these new work units require on Windows 32 bit?
On my Linux 64 bit they seem to use about 400 to 450 MB
However I only get errors on my Windows computer even though only 2.3 GB is in use before a CSG WU starts, the WU then fails in seconds due to an "Out Of Memory" error.
In the case of the credits it helps when there are a lot of errors during the sorting out of an application, to make up for the ones that don't make it to completion.
So I don't see a big issue with them, as I am losing work units which get nothing.
Conan
This is odd. Do you have a link to any tasks that have done this, so I can check the standard error output? |
|
|
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
How much memory do these new work units require on Windows 32 bit?
On my Linux 64 bit they seem to use about 400 to 450 MB
However I only get errors on my Windows computer even though only 2.3 GB is in use before a CSG WU starts, the WU then fails in seconds due to an "Out Of Memory" error.
In the case of the credits it helps when there are a lot of errors during the sorting out of an application, to make up for the ones that don't make it to completion.
So I don't see a big issue with them, as I am losing work units which get nothing.
Conan
Yuck. I *think* this may be because the AMD host is parsing the binary images file incorrectly...
If you look at this:
http://csgrid.org/csg/result.php?resultid=1649068
When it loads the images the numbers are kind of ridiculous:
parsed arguments, loading images
number_classes: 1718580028
rowscols: 1768710004
vals_per_pixel: 775842670
Now I'm going to need to find an AMD host to test on. Will probably need to come up with a text version of the input file, which is a little troublesome as that will be *huge*. |
|
|
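Editor's sketch: the three header values quoted above are not random. Packed back into little-endian 32-bit integers (an assumption about how the header is read), their bytes spell readable ASCII. The text looks like the start of a BOINC soft-link file rather than image data, which would suggest the application opened the link stub instead of the file it points to. That is an inference from the numbers, not something confirmed in this thread. `header_as_text`, `header_looks_sane` and the bounds used are hypothetical.

```python
import struct

# number_classes, rowscols and vals_per_pixel as reported by the failing host
header_values = [1718580028, 1768710004, 775842670]


def header_as_text(values) -> str:
    """Re-pack the integers as little-endian int32 and decode the bytes as ASCII."""
    raw = b"".join(struct.pack("<i", v) for v in values)
    return raw.decode("ascii", errors="replace")


def header_looks_sane(number_classes: int, rowscols: int, vals_per_pixel: int) -> bool:
    """Cheap plausibility check a reader could run before allocating memory.

    The bounds are illustrative guesses, not limits taken from the project.
    """
    return (
        0 < number_classes <= 1000
        and 0 < rowscols <= 4096
        and vals_per_pixel in (1, 3, 4)
    )


print(header_as_text(header_values))
print(header_looks_sane(10, 28, 1))          # values from a healthy run
print(header_looks_sane(*header_values))     # values from the failing run
```

A check like `header_looks_sane` would also turn the "Out Of Memory" failure into a clearer error message, since the huge values would be rejected before any buffers are sized from them.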
JumpinJohnny
Joined: 24 Sep 13 Posts: 237 Combined Credit: 10,275,610 DNA@Home: 192,548 SubsetSum@Home: 201,740 Wildlife@Home: 9,881,323 Wildlife@Home Watched: 55,997,833s Wildlife@Home Events: 15,584 Climate Tweets: 389 Images Observed: 351
 |
How much memory do these new work units require on Windows 32 bit?
On my Linux 64 bit they seem to use about 400 to 450 MB
However I only get errors on my Windows computer even though only 2.3 GB is in use before a CSG WU starts, the WU then fails in seconds due to an "Out Of Memory" error.
Yuck. I *think* this may be because the AMD host is parsing the binary images file incorrectly...
....
http://csgrid.org/csg/result.php?resultid=1649068
......
Now I'm going to need to find an AMD host to test on. ....
.....................................................................
Just my 2 cents worth:
The failing computers are all 32-bit Windoz XP:
>>> http://csgrid.org/csg/show_host_detail.php?hostid=107
>>>http://csgrid.org/csg/show_host_detail.php?hostid=1966
Identical AMD cpus are running Win 7 (64) and Linux with NO errors.
The issue is the OS. >>> Win XP <<< 32 bit hosts with AMD. |
|
|
Conan
Joined: 13 Apr 12 Posts: 151 Combined Credit: 47,672,899 DNA@Home: 399,792 SubsetSum@Home: 1,448,876 Wildlife@Home: 45,824,231 Wildlife@Home Watched: 70,910s Wildlife@Home Events: 0 Climate Tweets: 450 Images Observed: 0
 |
My computers are there for all to see. Yes, the one failing is Windows XP 32 bit.
I stopped running Einstein (each WU took over 500MB) and this has dropped me back to just 1.8 GB of memory in use so it can't be memory related.
I have checked but can't see that my anti-virus is blocking certain files, so it appears OK.
As already pointed out I have a same generation AMD processor running Linux and it is having no problems at all.
I was going on what the output was saying, as I noted a comment saying "Out of Memory"; there is also a comment saying that a certain file can't be found or opened.
Conan
|
|
|
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
My computers are there for all to see. Yes, the one failing is Windows XP 32 bit.
I stopped running Einstein (each WU took over 500MB) and this has dropped me back to just 1.8 GB of memory in use so it can't be memory related.
I have checked but can't see that my anti-virus is blocking certain files, so it appears OK.
As already pointed out I have a same generation AMD processor running Linux and it is having no problems at all.
I was going on what the output was saying, as I noted a comment saying "Out of Memory"; there is also a comment saying that a certain file can't be found or opened.
Conan
Okay, that's very odd. It looks like the Windows binary may not be hooking into BOINC correctly, given the line:
03:46:10 (5440): Can't open init data file - running in standalone mode
I wonder if something is wonky with your BOINC installation on that machine? |
|
|
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
I sent an email out to the BOINC mailing lists; hopefully we can get this figured out. |
|
|
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
Isn't the granted credit a little bit high? :)
I also noticed that the v0.13 work units are about 80 percent faster than the previous versions but pay about the same credit.
EDIT: Should have said EIGHT TIMES faster.
Yes, very high granted credit for CPU work.
Dropped the credit awarded by a factor of 5. It's probably still too high, but I'll leave it there for a while given our alpha status, and hopefully it will help attract some more crunchers. :) |
|
|
JumpinJohnny
Joined: 24 Sep 13 Posts: 237 Combined Credit: 10,275,610 DNA@Home: 192,548 SubsetSum@Home: 201,740 Wildlife@Home: 9,881,323 Wildlife@Home Watched: 55,997,833s Wildlife@Home Events: 15,584 Climate Tweets: 389 Images Observed: 351
 |
The WU times increased by a factor of 5 ... the credits seem about the same.
--and--
as I noted before:
The only machines I see failing for everyone are windoz xp using AMD cpu |
|
|
Travis Desell (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 16 Jan 12 Posts: 1813 Combined Credit: 23,514,257 DNA@Home: 293,563 SubsetSum@Home: 349,212 Wildlife@Home: 22,871,482 Wildlife@Home Watched: 212,926s Wildlife@Home Events: 51 Climate Tweets: 26 Images Observed: 774
 |
The WU times increased by a factor of 5 ... the credits seem about the same.
--and--
as I noted before:
The only machines I see failing for everyone are windoz xp using AMD cpu
I bumped up the number of epochs the current WUs train for from 100 to 200. Initially they were set at 50. The results coming in look good, but it looks like the neural networks could do even better with more time to train. |
|
|