[Radiance-general] Research tools: who what which how?

Georg Mischler schorsch at schorsch.com
Fri Apr 15 17:00:33 PDT 2016


Somewhat off-topic, but...

Out of curiosity, I played around with those examples a bit.
I had to slightly modify all of them, first to get Perl and Ruby to run
at all, and then to make sure all of them produce tabs instead of 
spaces.
Btw: The Perl version adds an extra tab at the end of each line.


perl -anF'\t|\n' -e'$n=@F-1 if \!$n;for(0..$n){push@{$$m[$_]},$F[$_]} 
END{print map{join"\t",@$_,"\n"}@$m}'

python -c "import sys; print('\n'.join('\t'.join(c) for c in 
zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))"

ruby -e 'puts readlines.map(&:split).transpose.map(){|x|x*"\t"}'


The result shows (as expected) that this kind of comparison is utterly
meaningless. There are simply too many factors out of your control
that can influence the result. Between Perl and Python, the
executables that happen to be installed on my box give pretty much the
opposite result of what you reported. Maybe my Python is in better shape
than yours, because it gets more exercise... ;)

I'm actually a bit surprised by the bad performance of Perl. One of
the reasons may be the suboptimal algorithm, which explicitly loops
through the data in the interpreter. The other two use a functional
approach, where the heavy lifting is handled in C. Ruby has a slower
startup time, but its operating performance is much closer to Python's.
I also didn't expect that much of a speed-up with Python 3 over 2.
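
To make the difference between the two approaches concrete, here is a
minimal Python sketch. The function names transpose_loop and
transpose_zip are made up for this illustration. The first one mimics
the explicit per-element loop of the Perl one-liner, the second hands
all rows to the builtin zip(), which does the work in C in CPython.
It is only meant to show the two styles, not to serve as a benchmark.

```python
def transpose_loop(rows):
    """Transpose by appending every element individually, in the interpreter."""
    cols = []
    for row in rows:
        for i, value in enumerate(row):
            # Open a new output row the first time column i is seen.
            if i == len(cols):
                cols.append([])
            cols[i].append(value)
    return cols


def transpose_zip(rows):
    """Transpose by passing all rows to the builtin zip() at once."""
    return [list(col) for col in zip(*rows)]


rows = [["1", "2", "3"], ["4", "5", "6"]]
# Both styles produce the same result for rectangular input.
assert transpose_loop(rows) == transpose_zip(rows)
print(transpose_zip(rows))
```

For rectangular input both functions return the same list of columns.
They differ for ragged input, where zip() stops at the shortest row.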

The Python version is easy to understand once you know that the
builtin function zip(), called with the unpacked rows as zip(*rows),
is equivalent to Ruby's transpose. The rest is IO and string
manipulation. You also may not be familiar with generator
expressions. Very powerful stuff!
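
For readers who find the one-liner baffling, the same steps can be
spelled out over several lines. This is just a sketch of the identical
technique. The function name transpose_text and the sample data are
invented for the illustration.

```python
import io


def transpose_text(stream):
    """Return the tab-separated transpose of whitespace-separated rows."""
    # Generator expression: rows are split lazily, blank lines are skipped.
    rows = (line.split() for line in stream if line.strip())
    # zip(*rows) yields one tuple per column, which is the transposition.
    # Note that zip() stops at the shortest row if the rows are ragged.
    columns = zip(*rows)
    # Join each column with tabs, and the resulting lines with newlines.
    return "\n".join("\t".join(col) for col in columns)


# Any iterable of lines works here, for example sys.stdin.
sample = io.StringIO("1\t2\t3\n4\t5\t6\n")
print(transpose_text(sample))
```

Passing sys.stdin instead of the StringIO object gives the behaviour of
the command line version.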


150 KB  (160 x 160 matrix), run times in seconds

0.2  perl 5
0.1  python 2.7
0.02 python 3.4
0.4  ruby


6 MB  (1000 x 1000 matrix)

1.02  perl 5
0.36  python 2.7
0.22  python 3.4
0.48  ruby 2.1


24 MB  (2000 x 2000 matrix)

4.11  perl 5
1.74  python 2.7
1.13  python 3.4
2.41  ruby 2.1
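
If you want to try this kind of measurement on your own box, a rough
Python sketch follows. Everything in it is an assumption made for the
illustration: the helper names, the random cell format (five characters
plus a tab, which gives roughly the file sizes listed above), and the
choice to time only the in-process transposition. It leaves out
interpreter startup and file IO, so its numbers are not comparable to
the ones in the tables. It needs Python 3.3 or later for
time.perf_counter().

```python
import random
import time


def make_matrix_text(n, seed=0):
    """Build an n x n tab-separated matrix of random numbers as one string."""
    rng = random.Random(seed)
    return "\n".join(
        "\t".join("%.3f" % rng.random() for _ in range(n)) for _ in range(n)
    ) + "\n"


def transpose_string(text):
    """Same technique as the Python one-liner, applied to a string."""
    rows = (line.split() for line in text.splitlines() if line.strip())
    return "\n".join("\t".join(col) for col in zip(*rows))


def best_time(func, arg, repeat=5):
    """Return the fastest of several runs, to dampen background noise."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    for n in (160, 1000):
        text = make_matrix_text(n)
        print("%d x %d: %.3f s" % (n, n, best_time(transpose_string, text)))
```

Taking the minimum of several runs removes some of the noise, but
not the influence of the particular interpreter build and machine.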


Cheers
-schorsch


Am 2016-04-13 19:01, schrieb Christopher Rush:
> For a trivial example, transposing a matrix in a tab-delimited data
> file... which of these is most easily understood, reproducible, and
> fastest? Also note, I couldn't find a working Ruby example...
> 
> 
> perl -anF'\t|\n' -e'$n=@F-1if!$n;for(0..$n){push@{$$m[$_]},$F[$_]}
> END{print map{join"\t",@$_,"\n"}@$m}'
> ^^ this takes a quarter of a second on 148K file, and doesn't look
> particularly clean but I could probably figure it out by researching
> the documentation
> 
> python -c "import sys; print('\n'.join(' '.join(c) for c in
> zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))"
> ^^ this takes half a second on 148K file but looks a bit baffling to me
> 
> awk '{for (f=1;f<=NF;f++) col[f] = col[f]":"$f} END {for
> (f=1;f<=NF;f++) print col[f]}' | tr ':' ' '
> ^^ this takes 2 seconds on 148K file which isn't very good, but
> probably the easiest to interpret by eye, in my opinion
> 
> ruby -e 'puts readlines.map(&:split).transpose.map{|x|x*" "}'
> ^^ I couldn't make this or any other Ruby examples I found online
> work, which might mean I have a basic misunderstanding of how to type
> this in a single line workflow on the terminal. And all the examples I
> could find look like a black box to me because apparently there are
> functions built in to do this task.
> 
> echo '' >tmp1;  cat m.txt |while read l ; do paste tmp1 <(echo $l | tr
> -s ' ' \\n)>tmp2; cp tmp2 tmp1; done
> ^^ this series of commands is basically disqualified because it takes
> far too long
> 
> 
> Taken from these threads:
> http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash
> http://stackoverflow.com/questions/3249508/transpose-in-perl

-- 
Georg Mischler  --  simulations developer  --  schorsch at schorsch com
+schorsch.com+  --  lighting design tools  --  http://www.schorsch.com/



