[Radiance-general] sharing indirect values for parallel
processing?
Jack de Valpine
jedev at visarc.com
Fri Feb 4 17:12:53 CET 2005
Hi all interested,
Cross posting to dev as this is probably a more appropriate space for
this conversation.
OK, let's see who I can irritate the most...
As a refresher, there have been numerous threads on this topic on
Radiance Dev (in no order other than my searching through my mail):
* Before we give up on lock files...
* multiprocessor systems, Radiance and you
* as well as others if you want to delve into the depths of the
pre-radiance-online mailing list archives
In general, I recall that there are a couple of directions to go:
* network filesystem locking - such as NFS or Samba, where we are
dependent on either the locking mechanism actually working (eg
NFS) or the filesystem (Samba) being installed
* client/server - probably more hairy from an implementation
standpoint as well as from a porting point of view. Although,
perhaps guaranteeing the best performance for selected OSes?
Not to rehash old stuff, but could one of the more knowledgeable
developers (Greg, Georg, Peter, Carsten...?) give us a refresher on what
the options are and perhaps some idea of the time that would be needed
to implement a working solution? Locking is a recurring problem. It
would be nice to settle on a consensus solution (ie what direction to
pursue) and then a strategy for implementation (ie resources, person(s),
money...), so perhaps we as a community could figure out how to move
this forward (if as always there is enough interest).
I must admit to having run into this wall on a variety of occasions. NFS
(v3) on linux is "supposed" to lock correctly (sync mode on the
mount/fstab). As a check, there is a test suite from Sun
(www.connectathon.org) that is supposed to test the nfs server. I
remember running this test suite in the past and getting positive
results on linux. Nevertheless, I have found it extremely difficult to
get working results with a networked image render (eg rpiece distributed
over multiple cpu nodes). There end up being problems either with
ambient values between image cells, or with locking of the syncfile
for distributing image cells to different machines, or both. I even
implemented a client/server in perl at one point to try to fight this
problem with the syncfile (with partial success as I recall, and perhaps
more if my time would allow). Not to cause offense... But is it possible
that the locking code in Radiance itself needs to be checked?
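To make the syncfile problem concrete, here is a minimal sketch of the kind of locked read-modify-write that distributing image cells depends on. It uses Python's fcntl.lockf (POSIX advisory locks) rather than anything from Radiance. The function name claim_next_cell and the one-integer file layout are hypothetical and are not rpiece's actual syncfile format. Whether the lock is honored across machines still depends entirely on the network filesystem, which is the open question here.

```python
import fcntl
import os


def claim_next_cell(syncfile, total_cells):
    """Claim the next unrendered cell index, or return None when all are taken.

    Hypothetical layout: the sync file holds one integer, the next free cell.
    """
    fd = os.open(syncfile, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive advisory lock on the whole file; blocks until granted.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        data = os.read(fd, 64).decode().strip()
        next_cell = int(data) if data else 0
        if next_cell >= total_cells:
            return None
        # Rewrite the counter while still holding the lock.
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, f"{next_cell + 1}\n".encode())
        os.fsync(fd)
        return next_cell
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Each render process would call claim_next_cell in a loop until it returns None. If two nodes ever receive the same index, the lock is not being enforced by the filesystem.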
In brief follow-up to Lars's comments about openmosix/mosix: as I
understand it, the mfs filesystem is supposed to implement locking
correctly. There are also other more sophisticated network filesystems
such as GFS (Sistina, I think, and commercial), OpenGFS and many others.
However these all require a separate special install and perhaps
modification of the kernel or installation of a modified kernel, and
there is serious question as to whether these are portable to other
OSes such as MS version whatever (as the main offender of portability).
Note also that I tried openmosix at one point. One problem that I found
is that if you start multiple large (eg memory size) jobs on the master
node then this can lead to excessive paging, since the master node
tries to start the jobs at the same time in its own memory space prior
to migrating them off to other nodes in the cluster. So if your job
requires 1 Gig of memory to hold the scene and you want to run 10 jobs
on 5 dual processor nodes with each node having 2 Gig of memory, if you
start all the jobs on one node then you are hosed. If you start them on
individual nodes, then you should be using a different clustering
solution since this completely negates the value of the migration
algorithms in openmosix. Now it has been a while since I used OpenMosix,
so perhaps things are different...
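The arithmetic behind that example, as a rough sketch. It counts only the scene memory per job and ignores OS overhead and any memory sharing between processes; the function name is made up for illustration.

```python
def oversubscription(jobs_started, scene_gb, node_ram_gb):
    """Ratio of memory demanded at startup to physical RAM on one node."""
    return (jobs_started * scene_gb) / node_ram_gb


scene_gb = 1      # memory needed to hold the scene, per job
node_ram_gb = 2   # RAM per dual processor node
nodes = 5
jobs = 10

# All jobs launched on the master node before migration.
all_on_master = oversubscription(jobs, scene_gb, node_ram_gb)

# Jobs launched by hand, two per node.
spread_out = oversubscription(jobs // nodes, scene_gb, node_ram_gb)

print(all_on_master)  # 5.0 -> five times the master's RAM, heavy paging
print(spread_out)     # 1.0 -> fits, but no migration is involved
```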
Note also that named pipes do not work on OpenMosix (at least as of mid
2003; you can see my brief inquiry to the openmosix list and Moshe Bar's
even briefer reply back in April of 2003). So if you want to do
memory sharing on multiprocessor nodes you have to roll your own batch
job distributor.
-Jack de Valpine
Georg Mischler wrote:
>Lars O. Grobe wrote:
>
>
>
>>>The most straightforward solution to our problem would probably
>>>be to use lock files, as Greg suggested in earlier discussions.
>>>Unfortunately nobody has found the time yet to actually implement
>>>that. If anyone wants to volunteer, please move the discussion of
>>>your proposal to the dev-list.
>>>
>>>
>>Hi,
>>
>>as I won't be able to help on the implementation, I won't bring this to
>>the dev-list for now ;-) However, I guess the only needed feature of
>>the shared fs used is a working byte range locking, right? So I will
>>find out if the fs provided by openmosix (mfs) has this feature, which
>>would make a set of mosix nodes a great radiance installation.
>>
>>
>
>Ambient files are only written at the end, so file locking
>and byte range locking have the same effect.
>We also need a solution that works on all platforms and on
>all file systems. Requiring third party software just to get
>reliable file sharing is clearly out of the question.
>
>
>-schorsch
>
>
>
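For reference, the lock-file approach mentioned in the quoted message could look roughly like the sketch below. It relies only on exclusive file creation (O_CREAT | O_EXCL), not on byte range locking. The ".lock" suffix, the function names and the timeout values are assumptions made for illustration, and nothing here is taken from the Radiance source. One caveat: exclusive create has itself been unreliable on some older NFS setups, so this would need testing on the filesystems in question.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, poll=0.05, timeout=10.0):
    """Hold an exclusive lock on `path` by creating `path + '.lock'`."""
    lock = path + ".lock"  # hypothetical naming convention
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Fails with FileExistsError if another process holds the lock.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.unlink(lock)


def append_values(path, payload):
    """Append a block of bytes to the shared file while holding the lock."""
    with lock_file(path):
        with open(path, "ab") as f:
            f.write(payload)
```

A process that dies while holding the lock leaves a stale lock file behind, so a real implementation would also need some policy for detecting and clearing those.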