Cluster Management in VisLab/Archive

From ISRWiki

Some obsolete information about Cluster Management in VisLab is kept here, for the sake of history.

Cortex and the uname command

Prior to 2009-05-12, the command

  uname -n

(which normally outputs the unique hostname of the machine you are logged into) did not work as we wanted on Cortex. This happened because all the machines in the cluster share the same network disk. In particular, the file /etc/hostname is shared among all of them; it contains the string "source", which is therefore the real result of "/bin/uname -n" on every node. That string is of little use to our "yarp run" script, which sometimes needs to answer the question "where am I?", that is, on which server it is running.

Forcing uname to give the desired output

Since we want each machine of the cluster to provide its unique name (i.e., cortex1 ... cortex5) as the output of "uname -n", we start from the following command, which yields the number that distinguishes each machine:

  ifconfig eth0 | grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}' | awk -F. '{print $4}'

Basically, this command extracts the last byte from the current machine IP address (the number after the last dot). For example, if you type the command on cortex3 you will get "3" (the last byte of 10.10.1.3) as output.
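
As a rough illustration, the same pipeline can be tried on any machine by feeding it canned text instead of live ifconfig output. The function name last_octet and the sample text are assumptions made for this sketch: the sample imitates the old net-tools output format, and only the address 10.10.1.3 comes from the example above.

```shell
# Same pipeline as above, but reading ifconfig-style text from stdin.
# "last_octet" is a hypothetical helper name used only in this sketch.
last_octet() {
    grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}' | awk -F. '{print $4}'
}

# Hypothetical sample of what "ifconfig eth0" might print on cortex3
# (HWaddr, Bcast and Mask values are made up).
sample='eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:10.10.1.3  Bcast:10.10.1.255  Mask:255.255.255.0'

# grep keeps the "inet addr" line, the first awk keeps "addr:10.10.1.3",
# the second keeps "10.10.1.3", the third keeps the fourth dotted field.
printf '%s\n' "$sample" | last_octet
```

With the sample above this prints "3".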

The idea is to make this custom command available system-wide in the cluster by defining it as an alias that answers to "uname -n" (to be more accurate, the alias is just called "uname", and any additional parameter is ignored). Note that this trick overrides the output of the real uname command on the machine; should you need the real one, you can still call it as "/bin/uname".

To enforce the alias for everyone, we added the following lines to user icub's ~/.bashrc:

  shopt -s expand_aliases
  alias uname="echo -n cortex; ifconfig eth0 | grep 'inet addr' | awk '{print \$2}' | awk -F: '{print \$2}' | awk -F. '{print \$4}'; echo > /dev/null"

The shopt line enables alias expansion in non-interactive shells. The second line defines the actual "uname" alias, which consists of three commands separated by semicolons:

  • "echo -n cortex" prints the word "cortex" without a trailing newline;
  • "ifconfig eth0 | grep 'inet addr' | awk '{print \$2}' | awk -F: '{print \$2}' | awk -F. '{print \$4}'" extracts the last byte of the current machine's IP address;
  • "echo > /dev/null" swallows everything that is typed after "uname", such as "-n": the shell appends those arguments to this last command, whose output is discarded.
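
The way trailing arguments end up on the last command of the alias can be seen in a small, self-contained sketch. It runs in a child bash so that the real uname is left untouched. The alias name whereami and the wrapper function demo_alias are hypothetical, and a fixed "echo 3" stands in for the ifconfig pipeline.

```shell
# Run the demonstration in a child bash so the real uname is not shadowed.
# "whereami" is a hypothetical alias name; "echo 3" stands in for the
# ifconfig pipeline (3 being the last byte of cortex3's address).
demo_alias() {
    bash <<'EOF'
shopt -s expand_aliases
alias whereami="echo -n cortex; echo 3; echo > /dev/null"
# The next line expands to: echo -n cortex; echo 3; echo > /dev/null -n
whereami -n
EOF
}

demo_alias
```

This prints "cortex3": the "-n" typed after the alias is handed to the final echo, whose output goes to /dev/null.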

Old workaround

Before 2009-05-12 we customized $ICUB_DIR/scripts/yarprun.sh and saved our own version as yarprunVislab.sh. Since "uname -n" output "source" on all five Cortex computers, we changed the line

 ID=/`uname -n`

to:

 ID=/`uname -n`
 # On the Cortex machines "uname -n" returns "source": build the name from the IP address instead
 if [ "$ID" = "/source" ]; then
     ID=/cortex`ifconfig eth0 | grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}' | awk -F. '{print $4}'`
 fi

Of course, we also had to copy $ICUB_DIR/scripts/icub-cluster.py to $ICUB_DIR/scripts/icub-clusterVislab.py and change every invocation of yarprun.sh to yarprunVislab.sh in the latter.

Besides, we had to copy yarprunVislab.sh to all of the machines (Chico2, pc104). To copy it to pc104, we actually had to copy it to icubsrv, in the correct location (see pc104):

 scp yarprunVislab.sh 10.10.1.51:/exports/code-pc104/iCub/scripts/

This method, while cumbersome, worked: from Chico2 we could control "yarp run" on Chico2, pc104 and Cortex1..5. We did not bother to get it running on Cortex6 or icubsrv, though, as those computers are very rarely used in demos.
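
The hostname fallback used in yarprunVislab.sh can be tried in isolation with a minimal sketch like the one below. The function resolve_id is a hypothetical helper, not part of the original script: it takes the hostname and the last byte of the IP address as arguments instead of calling uname and ifconfig, and the argument values in the two calls are illustrative only.

```shell
# Hypothetical helper mirroring the fallback logic of yarprunVislab.sh.
# $1: what "uname -n" returned; $2: last byte of the machine's IP address.
resolve_id() {
    id="/$1"
    if [ "$id" = "/source" ]; then
        # Shared /etc/hostname: derive the name from the IP address instead
        id="/cortex$2"
    fi
    printf '%s\n' "$id"
}

resolve_id source 3    # Cortex machine: falls back to the IP-based name
resolve_id chico2 0    # any other machine keeps its own hostname (second argument unused)
```

The first call prints "/cortex3" and the second prints "/chico2".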