[Radiance-general] Error: rtcontrib: fatal - incomplete ray value from rtrace

Greg Ward gregoryjward at gmail.com
Thu May 19 17:15:54 PDT 2011


Hi David,

Andrew McNeil recently found a problem with rtcontrib running in 64-bit mode on a 32-bit operating system, where very complex ray trees would cause the process to hang.  That sounds a bit like what you're experiencing.  If you like, you can download the latest HEAD release from www.radiance-online.org and either recompile everything or (simpler) take src/util/rtcontrib.c from the unpacked HEAD, substitute it for the copy in your existing source tree, and recompile just rtcontrib.  The code for rtrace hasn't changed in any important way, so this should fix it if your problem is the same.

Regarding signal 9, some Unix implementations use this uncatchable signal to terminate processes that have exceeded their resource limits.  Other systems send a specialized signal saying what went wrong, and rtrace would catch this and report the problem (e.g., "file size limit exceeded").  If your system is just killing the process with signal 9, there's no real way to know what's going wrong.  All you can do is check that your resource limits are ample for your task.
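
As a rough way to check those limits from inside a job script, something like the following Python sketch may help.  It is not part of Radiance.  It uses only the standard resource, signal and subprocess modules on a Unix system, and the choice of which limits to print is my guess at what matters for a large rendering job.  The helper names (report_limits, describe_exit) are made up for illustration, and the child process at the end is a stand-in that kills itself, not a real rtrace run.

```python
import resource
import signal
import subprocess
import sys

# Limits that seem most likely to matter for a big job (an assumption,
# not a complete list; your batch system may enforce others).
LIMITS = {
    "address space (bytes)": resource.RLIMIT_AS,
    "data segment (bytes)": resource.RLIMIT_DATA,
    "CPU time (seconds)": resource.RLIMIT_CPU,
    "file size (bytes)": resource.RLIMIT_FSIZE,
}


def fmt(value):
    """Show RLIM_INFINITY as 'unlimited' instead of a raw number."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report_limits():
    """Return one line per limit with its soft and hard values."""
    lines = []
    for label, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        lines.append(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
    return lines


def describe_exit(returncode):
    """Translate a subprocess return code; negative means 'killed by signal'."""
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print("\n".join(report_limits()))
    # Stand-in child that sends itself SIGKILL, imitating a job that the
    # system terminated with signal 9.
    child = subprocess.run(
        [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    )
    print(describe_exit(child.returncode))
```

Running this on the compute node itself (rather than the login node) is the useful case, since batch systems often apply different limits to submitted jobs.  Note that it can only tell you that signal 9 was delivered, not which limit triggered it.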

By the way, Andy only ran into this error when he was using many bounces and some rather high parameters in rtcontrib, which caused the ray trees to occasionally exceed 2 GBytes in size -- from a single ray!  It's difficult to say if this is happening in your case without seeing your parameter settings, but it's worth trying to patch rtcontrib in any case as a first step.
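
To give a feel for why 2 GBytes is a natural breaking point, here is a small Python sketch.  It assumes, purely for illustration, that a byte count somewhere is held in a signed 32-bit integer.  I am not saying that this is the actual mechanism inside rtcontrib, and the function names are invented for the example.

```python
import ctypes

GIB = 1024 ** 3            # bytes in one gigabyte (binary)
INT32_MAX = 2 ** 31 - 1    # largest value a signed 32-bit integer can hold


def fits_in_int32(nbytes):
    """True if a byte count can be stored in a signed 32-bit integer."""
    return nbytes <= INT32_MAX


def as_int32(nbytes):
    """What the byte count turns into if it is forced into 32 signed bits."""
    return ctypes.c_int32(nbytes).value


if __name__ == "__main__":
    for size in (2 * GIB - 1, 2 * GIB):
        print(size, fits_in_int32(size), as_int32(size))
```

A count of exactly 2 GBytes no longer fits and wraps around to a negative number, which is the sort of thing that can leave a reader waiting for data that never seems to arrive.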

Best,
-Greg

> From: David Appelfeld <d.appelfeld at gmail.com>
> Date: May 19, 2011 4:41:33 PM PDT
> 
> Hello Greg, 
> I suspect the machine a little, because it stops after a different amount of time on each run: sometimes it finishes the calculation for 4 points, sometimes for 10, etc.
> 
> In the beginning I had only one script for the whole grid and ran it on 8 cores on one node, which was the maximum. But there was probably not enough memory: it gave me an error message that there was not enough allocated memory, went into a waiting state, and never restarted. It stopped even though I could still see that the job I had submitted to run the script was active. Unfortunately I don't remember the whole error message.
> Then I split the calculation into more scripts, each using fewer cores, to save some time and to avoid the previous problem. I ran several scripts without a problem. Then I got this message, so I deleted the job and resubmitted it without making any changes. Some of the jobs ran successfully, but some just gave this error message and stopped. Again I could see that the job was still active, but nothing was coming out of the calculations.
> As I write this, I am thinking that the server is getting overloaded and pauses the calculation, and then rtrace cannot finish the process.
> I don't know if this makes any sense, but I can try to submit it again and hope that it will finish.
> 
> Could you please say more about signal 9 and whether I can do anything about it?
> 
> Thank you
> David


