What follows are the notes specific to the installation of xCAT-1.1.7
on the GIREF cluster. This installation was carried out in December
2002. Since then, there has been an upgrade to 1.2.0-pre2b, which
required small changes to the xCAT configuration files specific to
the local installation.
27Nov2002:
----------
1) Partitioning:
/ 5G
/opt 5G
/boot 100M
/tmp 2G
/var 2G
/install 1.5G
swap 1G
/home 54G (remainder)
2) configure root: (passwd deleted)
configure galileo user (passwd deleted)
3) configure NIS:
domain galileo.net
server 10.1.1.101
4) activate NIS server (it isn't activated by default)
NOTE: may replace this with an LDAP server later
5) put galileo name in GIREF's DNS.
At this point, while trying to copy the rh73 CDROMs onto the disk, it
became obvious that the partitioning layout was wrong
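The space problem above can be caught before starting a large copy. A minimal sketch, assuming a POSIX df; the helper name has_space_kb and the size in the usage line are illustrative and not part of the original notes:

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at
# least NEED_KB kilobytes available.
has_space_kb() {
    dir=$1
    need_kb=$2
    # df -P prints one portable line per filesystem; column 4 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example (the size is a placeholder, not a measured value):
# has_space_kb /install 2100000 || echo "/install is too small" >&2
```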
28Nov2002 - 10Dec2002:
----------------------
0) packages selection is important:
- anaconda
- dhcp
- expect
- http
- lynx
- pdksh
- php (for ganglia)
- tftpd, tftp
- ucd-snmp
- uucp
1) Partitioning:
/ 15G
/tmp 3G
/var 3G
/boot 200M
swap 2G
/home 48G (remainder)
2) name: n01.galileo.lan
3) configure root: (passwd ...deleted...)
configure galileo user: (passwd ...deleted...)
4) configure NIS:
domain galileo.lan
broadcast
NOTE: apparently this doesn't work at all,
and I can't communicate properly
with the ether switch
5) transfer rh73 CDROMs from merlin to n01
6) transfer rh73 updates from ftp.crc.ca to n01.
Also cleaned up the updates.
The script what_to_upd.py helps select what to update.
Install the new kernel and boot into it.
7) install xcat (transfer from gandalf)
8) configure xcat:
- cp /opt/xcat/samples/xcat.{csh,sh} /etc/profile.d/
- echo "XCATROOT=/opt/xcat" >/etc/sysconfig/xcat
- create /opt/xcat/etc/site.tab
- create /etc/hosts
- create /opt/xcat/etc/nodelist.tab
- create /opt/xcat/etc/noderes.tab
- create /opt/xcat/etc/nodetype.tab
- create /opt/xcat/etc/nodehm.tab
- create /opt/xcat/etc/passwd.tab
- create /opt/xcat/etc/mac.tab using the cluster database (from merlin)
- create /opt/xcat/etc/mpa.tab
- create /opt/xcat/etc/mp.tab
- create /opt/xcat/etc/conserver.tab (whatever ??)
- create /opt/xcat/etc/conserver.cf (whatever ??)
- create /opt/xcat/etc/vnc.tab
NOTE: I got lost in the MPN configuration and the other resources:
apc.tab, cisco3500.tab, conserver.cf,
conserver.tab, emp.tab, rtel.tab, tty.tab
9) on n01, deactivate unneeded services:
list="kudzu apmd autofs iptables ipchains rawdevices lpd rhnsd"
for s in $list; do /sbin/chkconfig --level 0123456 $s off; done
10) on n01, configure syslog to accept remote logging:
- in /etc/sysconfig/syslog set SYSLOG_OPTIONS="-m 0 -r"
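The edit above (and the similar DHCPDARGS edit in step 18) can be scripted instead of done by hand. A minimal sketch; the helper name set_sysconfig_var is hypothetical, and it assumes the value contains no "|", "&" or backslash characters:

```shell
# Hypothetical helper: set KEY="VALUE" in a sysconfig-style file,
# replacing an existing assignment or appending one if none exists.
set_sysconfig_var() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        # Rewrite through a temporary file, then copy the content back
        # so the original file keeps its owner and permissions.
        sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Example:
# set_sysconfig_var /etc/sysconfig/syslog SYSLOG_OPTIONS "-m 0 -r"
```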
11) activate and configure snmptrapd:
/sbin/chkconfig --level 345 snmptrapd on
cp /opt/xcat/samples/etc/snmptrapd.conf /etc/snmp
12) add some needed aliases:
echo "root: galileo" >> /etc/aliases
echo "alerts: galileo,ctibirna@giref.ulaval.ca" >> /etc/aliases
newaliases
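Because the echo commands above append unconditionally, running this step twice (for instance during a second install) would duplicate the entries. A minimal sketch of an idempotent variant; the helper name add_alias is hypothetical:

```shell
# Hypothetical helper: append "NAME: TARGETS" to an aliases file
# unless an entry for NAME is already present.
add_alias() {
    file=$1 name=$2 targets=$3
    if grep -q "^${name}:" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s: %s\n' "$name" "$targets" >> "$file"
}

# Example:
# add_alias /etc/aliases root galileo
# add_alias /etc/aliases alerts "galileo,ctibirna@giref.ulaval.ca"
# newaliases
```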
13) activate, configure and test tftpd:
- edit /etc/xinetd.d/tftp file to:
- remove the root jail (replace -s /tftpboot with -s /)
- add logging (-v -v)
NOTE: the above is needed for the way xcat prepares the pxe boot
/sbin/chkconfig tftp on
mkdir /tftpboot
echo "Hi hi" >/tftpboot/test
tftp n01
get /tftpboot/test
quit
rm /tftpboot/test
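The edit of /etc/xinetd.d/tftp described in step 13 could be applied with sed rather than by hand. A minimal sketch, assuming the file contains a line of the form "server_args = ..."; the helper name patch_tftp_args is hypothetical:

```shell
# Hypothetical helper: rewrite the server_args line of an xinetd tftp
# service file so that tftpd runs without the /tftpboot jail (-s /)
# and with verbose logging (-v -v). Other lines are left untouched.
patch_tftp_args() {
    file=$1
    sed 's|^\([[:space:]]*server_args[[:space:]]*=\).*|\1 -v -v -s /|' \
        "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# Example:
# patch_tftp_args /etc/xinetd.d/tftp
```

xinetd has to reread its configuration before the change takes effect.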
14) configure NFS:
echo "/install *(ro,no_root_squash)" > /etc/exports
echo "/opt/xcat 10.1.1.0/24(ro,no_root_squash)" >> /etc/exports
echo "/usr/local 10.1.1.0/24(ro,no_root_squash)" >> /etc/exports
echo "/home 10.1.1.0/24(rw,no_root_squash)" >> /etc/exports
/sbin/chkconfig --level 345 nfs on
15) activate and configure NTP:
echo "restrict merlin.giref.ulaval.ca mask 255.255.255.0 nomodify notrap noquery"> /etc/ntp.conf
echo "server merlin.giref.ulaval.ca">> /etc/ntp.conf
echo "">> /etc/ntp.conf
echo "restrict 10.1.1.0 mask 255.255.255.0 notrust nomodify notrap">> /etc/ntp.conf
echo "restrict 10.1.2.0 mask 255.255.255.0 notrust nomodify notrap">> /etc/ntp.conf
echo "restrict 132.203.7.0 mask 255.255.255.0 notrust nomodify notrap">> /etc/ntp.conf
/sbin/chkconfig --level 345 ntpd on
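The chain of echo commands in step 15 repeats the same restrict line for each client subnet. A minimal sketch that generates the same six lines from a server name and a list of subnets; the helper name gen_ntp_conf is hypothetical, and the /24 mask is taken from the lines above:

```shell
# Hypothetical helper: print an ntp.conf like the one in step 15 for
# SERVER, with one restrict line per client subnet given after it.
gen_ntp_conf() {
    server=$1
    shift
    echo "restrict $server mask 255.255.255.0 nomodify notrap noquery"
    echo "server $server"
    echo ""
    for subnet in "$@"; do
        echo "restrict $subnet mask 255.255.255.0 notrust nomodify notrap"
    done
}

# Example:
# gen_ntp_conf merlin.giref.ulaval.ca 10.1.1.0 10.1.2.0 132.203.7.0 > /etc/ntp.conf
```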
16) checked the ssh config.
gensshkeys root
17) configure DNS:
/sbin/chkconfig --level 345 named on
/opt/xcat/sbin/makedns
18) configure DHCP:
NOTE: dhcp package has to be installed
- in /etc/sysconfig/dhcpd, add DHCPDARGS="eth0", which makes sure that
only requests from the cluster get answered
/sbin/chkconfig --level 345 dhcpd on
/opt/xcat/sbin/makedhcp --new
/opt/xcat/sbin/makedhcp --allmac
18b) configure NIS server:
/sbin/chkconfig --level 345 ypserv on
/sbin/service ypserv start
/sbin/chkconfig --level 345 ypbind on
/sbin/service ypbind start
cd /var/yp; make
19) reboot n01!!
NOTE: the ether switch finally recognizes the node as 1000Mb
20) configured ether switch:
- new IP 10.1.1.200
- new alert1: to 10.1.1.101:25
- new name: ether
21) make first stage:
copy all CDs in /install/rh73
cd /opt/xcat/build/rh73
./applypatch; ./e1000patch; ./nofibre
cd /opt/xcat/netboot
./mknb --update
cd /opt/xcat/stage
./mkstage
NOTE: this creates some files in /tftpboot
22) fix the kickstart templates:
cd /opt/xcat/ks73 (check that the files you need are here and their content)
./mkks
23) prepare the postinstall dir:
mkdir /opt/xcat/post
mkdir /opt/xcat/post/updates
mkdir /opt/xcat/post/updates/rh73
get all updates from ftp.redhat.com or ftp.crc.ca
cd /opt/xcat/post
cp -vr gm-routes kernel rc.d rpm73 sync /install/post
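The directory preparation in step 23 can be checked before the copy. A minimal sketch that creates the updates tree in one call and reports which of the directories named in the cp command are missing; the helper name prepare_post is hypothetical:

```shell
# Hypothetical helper: create the update tree under ROOT and report
# which of the directories copied in step 23 are missing from ROOT.
# Returns non-zero if at least one is missing.
prepare_post() {
    root=$1
    # -p creates missing parents and does not fail if they exist.
    mkdir -p "$root/updates/rh73" || return 1
    missing=0
    for d in gm-routes kernel rc.d rpm73 sync; do
        if [ ! -e "$root/$d" ]; then
            echo "missing: $root/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# prepare_post /opt/xcat/post && echo "ready to copy to /install/post"
```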
24) prepare kernel:
cd /usr/src/linux-2.4
make mrproper
cp configs/*i686-smp* .config
vi Makefile (replace -10custom with -10smp)
make menuconfig
make dep
make modules &
sleep 40; kill %
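The manual Makefile edit in step 24 could be done non-interactively. A minimal sketch, assuming the -10custom suffix sits on the EXTRAVERSION line of the kernel Makefile (not stated in the notes); the helper name set_smp_extraversion is hypothetical:

```shell
# Hypothetical helper: replace the "-10custom" suffix with "-10smp" on
# the EXTRAVERSION line of a kernel Makefile, leaving other lines alone.
set_smp_extraversion() {
    makefile=$1
    sed '/^EXTRAVERSION/ s/-10custom/-10smp/' "$makefile" > "$makefile.tmp" &&
        cat "$makefile.tmp" > "$makefile" && rm -f "$makefile.tmp"
}

# Example:
# set_smp_extraversion /usr/src/linux-2.4/Makefile
```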
25) compile asm driver (so that mpcli and sp tools work)
look for updates on ibm.com
rpm -Uvh ibmasm-src-redhat
rpm -Uvh /usr/local/ibmasm/ibmasm-1.06*
sp ReadLog (for test)
26) prepare the mpa
mpasetup
mpacheck
27) configure the ASMs:
nodeset compute stage3
- then reboot all nodes, with about 20s pause between each
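The staggered reboot in step 27 can be written as a small loop. A minimal sketch; the helper name staggered and the node names in the usage line are hypothetical, and the command that actually reboots a node is left as a parameter because the notes do not say which one was used:

```shell
# Hypothetical helper: run CMD once per node, pausing PAUSE seconds
# between nodes so that they do not all restart at the same time.
# Usage: staggered PAUSE CMD NODE...
staggered() {
    pause=$1
    cmd=$2
    shift 2
    first=1
    for node in "$@"; do
        # Sleep before every node except the first.
        [ "$first" -eq 1 ] || sleep "$pause"
        first=0
        # CMD is split on whitespace on purpose, so it may carry arguments.
        $cmd "$node"
    done
}

# Dry run (only prints what would be done):
# staggered 20 "echo reboot" n02 n03 n04
```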
10Dec2002:
----------
28) Install myrinet:
# - download gm-1.6.3_Linux from myri.com
# - create rpm for cluster using /opt/xcat/build/gm/gmmaker - failed
- download gm-1.5.2.1 from myri.com (according to advice on xcat-user ml)
- create rpm for cluster using /opt/xcat/build/gm/gmmaker
- install gm rpm on n01
- copy gm rpm to /install/post/kernel
11Dec2002:
----------
29) Install ganglia:
- install gmond*.rpm, gmetad*.rpm, webfront*.rpm on n01
- copy gmond*.rpm in /install/post/rpm73/
- change /etc/gmond.conf to put the name and the owner of the cluster
- change /etc/gmetad.conf to indicate source "localhost"
/sbin/chkconfig httpd --level 345 on
/sbin/service httpd start
12Dec2002:
----------
SUPPLEM: In order to use distcc, I had to modify
the kickstart file to add the "@ Software Development" package group.
13Dec2002:
----------
SUPPLEM: Need to install:
- ICC
- mpich-gm
- petsc
- Atlas
157Dec2002:
-----------
Second install.
NOTE: new MAC for RSA: 00:09:6B:0A:24:D2