Introduction

The reference documents for installing xCAT are:

  • xCAT-mini-HOWTO.html;
  • nodeinstall-HOWTO.html.

These documents are in the doc directory of xCAT, which is archived in the xcat-dist-doc-[version].tgz package.

It is important to review these two documents when installing a new version of xCAT, because each new version may differ in specific ways from the previous ones.

Installing xCAT

The following notes are specific to the installation of xCAT-1.1.7 on the GIREF cluster. This installation was done in December 2002. It has since been upgraded to 1.2.0-pre2b, which involved small changes to the xCAT configuration files specific to the local installation.

27Nov2002:
----------
1) Partitioning:
/		5G
/opt		5G
/boot		100M
/tmp		2G
/var		2G
/install	1.5G
swap		1G
/home		54G (remainder)

2) configure root: (passwd deleted)
   configure galileo user (passwd deleted)

3) configure NIS:
	domain galileo.net
	server 10.1.1.101

4) activate the NIS server (it is not activated by default)
	NOTE: it may be replaced by an LDAP server later

5) add the name galileo to GIREF's DNS.

At this point, while trying to copy the rh73 CDROMs to the disk, it
became obvious that the partitioning layout was wrong.


28Nov2002 - 10Dec2002:
----------------------
0) package selection is important:
	- anaconda
	- dhcp
	- expect
	- http
	- lynx
	- pdksh
	- php (for ganglia)
	- tftpd, tftp
	- ucd-snmp
	- uucp

1) Partitioning:
/		15G
/tmp		 3G
/var		 3G
/boot		200M
swap		 2G
/home		48G (remainder)

2) name: n01.galileo.lan

3) configure root: (passwd ...deleted...)
   configure galileo user: (passwd ...deleted...)

4) configure NIS:
	domain galileo.lan
	broadcast
	NOTE: apparently, it doesn't work at all.
	And I don't have proper communication
	with the ether switch

5) transfer rh73 CDROMs from merlin to n01

6) transfer rh73 updates from ftp.crc.ca to n01.
	Also cleaned up the updates.
	The script what_to_upd.py helps select what to update.
	Install the new kernel and make it the boot kernel.

7) install xcat (transfer from gandalf)

8) configure xcat:
	- cp /opt/xcat/samples/xcat.{csh,sh} /etc/profile.d/
	- echo "XCATROOT=/opt/xcat" >/etc/sysconfig/xcat
	- create /opt/xcat/etc/site.tab
	- create /etc/hosts
	- create /opt/xcat/etc/nodelist.tab
	- create /opt/xcat/etc/noderes.tab
	- create /opt/xcat/etc/nodetype.tab
	- create /opt/xcat/etc/nodehm.tab
	- create /opt/xcat/etc/passwd.tab
	- create /opt/xcat/etc/mac.tab using the cluster database (from merlin)
	- create /opt/xcat/etc/mpa.tab
	- create /opt/xcat/etc/mp.tab
	- create /opt/xcat/etc/conserver.tab (whatever ??)
	- create /opt/xcat/etc/conserver.cf (whatever ??)
	- create /opt/xcat/etc/vnc.tab

   NOTE: I got lost in the MPN configuration and the other resource tables:
	apc.tab, cisco3500.tab, conserver.cf,
	conserver.tab, emp.tab, rtel.tab, tty.tab
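The hand-written /etc/hosts could also be generated. A minimal sketch, assuming a hypothetical addressing scheme in which node nNN has address 10.1.1.(100+NN) in the galileo.lan domain; that scheme is an assumption, not something these notes state, so check it against the real node list before use. gen_hosts is an illustrative helper, not an xCAT command.

```shell
#!/bin/bash
# gen_hosts COUNT [DOMAIN] [NET]
# Print /etc/hosts lines for localhost and nodes n01..nCOUNT.
# ASSUMPTION (hypothetical scheme): node nNN gets address NET.(100+NN),
# so COUNT must not exceed 154 to stay a valid host address.
gen_hosts() {
    local count="$1" domain="${2:-galileo.lan}" net="${3:-10.1.1}"
    local i=1 name
    printf '127.0.0.1\tlocalhost.localdomain localhost\n'
    while [ "$i" -le "$count" ]; do
        name=$(printf 'n%02d' "$i")
        printf '%s.%d\t%s.%s %s\n' "$net" $((100 + i)) "$name" "$domain" "$name"
        i=$((i + 1))
    done
}
```

Usage: gen_hosts 32 > /tmp/hosts.new (32 nodes is only an example), then review the result before copying it over /etc/hosts.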

9) on n01, deactivate the following services:
	list="kudzu apmd autofs iptables ipchains rawdevices lpd rhnsd"
	for s in $list; do /sbin/chkconfig --level 0123456 $s off; done

10) on n01, configure syslog to accept remote logging:
	- in /etc/sysconfig/syslog set SYSLOG_OPTIONS="-m 0 -r"
	  (-r accepts messages from remote hosts, -m 0 disables the MARK lines)
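The sysconfig edit can be scripted. A minimal sketch with a hypothetical helper, set_sysconfig_var, which replaces the variable if it is already set and appends it otherwise; it assumes the value contains no "|" or "&" characters, which would confuse the sed expression.

```shell
#!/bin/bash
# set_sysconfig_var FILE NAME VALUE
# Set NAME="VALUE" in a sysconfig-style file: rewrite an existing
# NAME=... line, or append one if the variable is not present yet.
set_sysconfig_var() {
    local file="$1" name="$2" value="$3"
    if grep -q "^${name}=" "$file" 2>/dev/null; then
        # Go through a temporary file: older sed versions have no -i.
        sed "s|^${name}=.*|${name}=\"${value}\"|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s="%s"\n' "$name" "$value" >> "$file"
    fi
}
```

Usage:
	set_sysconfig_var /etc/sysconfig/syslog SYSLOG_OPTIONS "-m 0 -r"
	/sbin/service syslog restart
The same helper also covers the DHCPDARGS="eth0" setting in /etc/sysconfig/dhcpd (step 18).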

11) activate and configure snmptrapd:
	/sbin/chkconfig --level 345 snmptrapd on
	cp /opt/xcat/samples/etc/snmptrapd.conf /etc/snmp

12) add some needed aliases:
	echo "root: galileo" >> /etc/aliases
	echo "alerts: galileo,ctibirna@giref.ulaval.ca" >> /etc/aliases
	newaliases
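Running the echo lines twice duplicates the aliases. A sketch of an idempotent variant (add_alias is a hypothetical helper) that only appends an alias when no uncommented entry with that name exists yet; an existing entry is left untouched and has to be edited by hand.

```shell
#!/bin/bash
# add_alias FILE NAME TARGETS
# Append "NAME: TARGETS" to an aliases file unless NAME is already
# defined on an uncommented line, so re-running setup adds no duplicates.
add_alias() {
    local file="$1" name="$2" targets="$3"
    if ! grep -q "^${name}:" "$file" 2>/dev/null; then
        printf '%s: %s\n' "$name" "$targets" >> "$file"
    fi
}
```

Usage:
	add_alias /etc/aliases root galileo
	add_alias /etc/aliases alerts "galileo,ctibirna@giref.ulaval.ca"
	newaliases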

13) activate, configure and test tftpd:
	- edit the /etc/xinetd.d/tftp file to:
		- remove the root jail (replace -s /tftpboot with -s /)
		- add verbose logging (-v -v)
	NOTE: this is required by the way xCAT prepares the PXE boot

	/sbin/chkconfig tftp on
	mkdir /tftpboot
	echo "Hi hi" >/tftpboot/test
	tftp n01
		get /tftpboot/test
		quit
	rm /tftpboot/test
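The xinetd edit can be done non-interactively. A minimal sketch, assuming the stock file contains a single "server_args = -s /tftpboot" line; patch_tftp_xinetd is a hypothetical helper that rewrites whatever follows "server_args =" to "-v -v -s /".

```shell
#!/bin/bash
# patch_tftp_xinetd FILE
# Rewrite the server_args line of an xinetd tftp service file so that
# in.tftpd logs verbosely (-v -v) and serves from / instead of the
# /tftpboot root jail. ASSUMPTION: exactly one server_args line exists.
patch_tftp_xinetd() {
    local file="$1"
    sed 's|^\([[:space:]]*server_args[[:space:]]*=\).*$|\1 -v -v -s /|' \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Usage: patch_tftp_xinetd /etc/xinetd.d/tftp, then enable the service and repeat the tftp test above.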
	

14) configure NFS:
	echo "/install *(ro,no_root_squash)" > /etc/exports
	echo "/opt/xcat 10.1.1.0/24(ro,no_root_squash)" >> /etc/exports
	echo "/usr/local 10.1.1.0/24(ro,no_root_squash)" >> /etc/exports
	echo "/home 10.1.1.0/24(rw,no_root_squash)" >> /etc/exports
	/sbin/chkconfig --level 345 nfs on
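The four echo commands can be replaced by a single here-document, which rewrites /etc/exports as a whole and can safely be re-run. A sketch (write_exports is a hypothetical helper) with the same four exports:

```shell
#!/bin/bash
# write_exports FILE
# Overwrite FILE with the cluster's NFS exports in one command,
# so re-running it never appends duplicate lines.
write_exports() {
    cat > "$1" <<'EOF'
/install *(ro,no_root_squash)
/opt/xcat 10.1.1.0/24(ro,no_root_squash)
/usr/local 10.1.1.0/24(ro,no_root_squash)
/home 10.1.1.0/24(rw,no_root_squash)
EOF
}
```

Usage: write_exports /etc/exports; if nfs is already running, /usr/sbin/exportfs -ra re-reads the file without restarting the service.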

15) activate and configure NTP:
	echo "restrict merlin.giref.ulaval.ca mask 255.255.255.0 nomodify notrap noquery"> /etc/ntp.conf
	echo "server merlin.giref.ulaval.ca">> /etc/ntp.conf
	echo "">> /etc/ntp.conf
	echo "restrict 10.1.1.0 mask 255.255.255.0 notrust nomodify notrap">> /etc/ntp.conf
	echo "restrict 10.1.2.0 mask 255.255.255.0 notrust nomodify notrap">> /etc/ntp.conf
	echo "restrict 132.203.7.0 mask 255.255.255.0 notrust nomodify notrap">> /etc/ntp.conf
	/sbin/chkconfig --level 345 ntpd on
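The ntp.conf above follows a regular pattern: the upstream server, then one restrict line per local network. A sketch of a generator (gen_ntp_conf is a hypothetical helper) that prints the same six lines, so adding a network only means adding an argument:

```shell
#!/bin/bash
# gen_ntp_conf SERVER NET...
# Print an ntp.conf with the same lines as the echo commands above:
# the upstream SERVER, a blank line, then one restrict line per network.
gen_ntp_conf() {
    local server="$1" net
    shift
    printf 'restrict %s mask 255.255.255.0 nomodify notrap noquery\n' "$server"
    printf 'server %s\n' "$server"
    printf '\n'
    for net in "$@"; do
        printf 'restrict %s mask 255.255.255.0 notrust nomodify notrap\n' "$net"
    done
}
```

Usage: gen_ntp_conf merlin.giref.ulaval.ca 10.1.1.0 10.1.2.0 132.203.7.0 > /etc/ntp.conf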

16) check the ssh config, then generate the SSH keys for root:
	gensshkeys root

17) configure DNS:
	/sbin/chkconfig --level 345 named on
	/opt/xcat/sbin/makedns

18) configure DHCP:
	NOTE: the dhcp package has to be installed
	- in /etc/sysconfig/dhcpd, add DHCPDARGS="eth0", so that dhcpd only
		listens on eth0 and answers only requests from the cluster
	/sbin/chkconfig --level 345 dhcpd on
	/opt/xcat/sbin/makedhcp --new
	/opt/xcat/sbin/makedhcp --allmac

18b) configure NIS server:
	/sbin/chkconfig --level 345 ypserv on
	/sbin/service ypserv start

	/sbin/chkconfig --level 345 ypbind on
	/sbin/service ypbind start

	cd /var/yp; make

	
19) reboot n01!!
	NOTE: the ether switch finally recognizes the node's link as 1000 Mb/s

20) configure the ether switch:
	- new IP 10.1.1.200
	- new alert1: to 10.1.1.101:25
	- new name: ether

21) make first stage:
	copy all CDs into /install/rh73
	cd /opt/xcat/build/rh73
	./applypatch; ./e1000patch; ./nofibre
	cd /opt/xcat/netboot
	./mknb --update
	cd /opt/xcat/stage
	./mkstage
	NOTE: this creates some files in /tftpboot

22) fix the kickstart templates:
	cd /opt/xcat/ks73 (check that the files you need are present and that their content is correct)
	./mkks

23) prepare the postinstall dir:
	mkdir /opt/xcat/post
	mkdir /opt/xcat/post/updates
	mkdir /opt/xcat/post/updates/rh73
	get all updates from ftp.redhat.com or ftp.crc.ca
	cd /opt/xcat/post
	cp -vr gm-routes kernel rc.d rpm73 sync /install/post

24) prepare kernel:
	cd /usr/src/linux-2.4
	make mrproper
	cp configs/*i686-smp* .config
	vi Makefile (replace -10custom with -10smp)
	make menuconfig
	make dep
	make modules &
	sleep 40; kill %	(stops the background make after about 40 s)
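The "vi Makefile" step can be scripted. A minimal sketch, assuming the top-level Makefile has a single "EXTRAVERSION = ..." line as in 2.4 kernels; set_extraversion is a hypothetical helper. The -10smp suffix presumably makes modules built from this tree match the running SMP kernel, so compare the result with the output of uname -r.

```shell
#!/bin/bash
# set_extraversion MAKEFILE SUFFIX
# Replace the EXTRAVERSION line of a kernel Makefile, e.g. turning
# "EXTRAVERSION = -10custom" into "EXTRAVERSION = -10smp".
set_extraversion() {
    local makefile="$1" suffix="$2"
    sed "s|^EXTRAVERSION[[:space:]]*=.*|EXTRAVERSION = ${suffix}|" \
        "$makefile" > "$makefile.tmp" && mv "$makefile.tmp" "$makefile"
}
```

Usage: cd /usr/src/linux-2.4 && set_extraversion Makefile -10smp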

25) compile the ASM driver (required for the mpcli and sp tools to work)
	look for updates on ibm.com
	rpm -Uvh ibmasm-src-redhat
	rpm -Uvh /usr/local/ibmasm/ibmasm-1.06*
	sp ReadLog (as a test)

26) prepare the MPA:
	mpasetup
	mpacheck

27) configure the ASMs:
	nodeset compute stage3
	- then reboot all nodes, with a pause of about 20 s between each
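The staggered reboot can be scripted. A sketch under stated assumptions: reboot_node is a hypothetical hook whose body must be replaced by the command actually used to reboot one node (as written it only prints, i.e. a dry run), and the node names in the usage line are examples.

```shell
#!/bin/bash
# Hypothetical per-node hook: replace the body with the command actually
# used to reboot one node. As written it only prints (dry run).
reboot_node() {
    echo "would reboot $1"
}

# staggered_reboot DELAY NODE...
# Call reboot_node for each node in turn, waiting DELAY seconds between
# consecutive nodes (no wait after the last one).
staggered_reboot() {
    local delay="$1" node first=1
    shift
    for node in "$@"; do
        if [ "$first" -eq 0 ]; then
            sleep "$delay"
        fi
        first=0
        reboot_node "$node"
    done
}
```

Usage: staggered_reboot 20 n02 n03 n04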

10Dec2002:
----------
28) Install myrinet:
	- first attempt (failed): download gm-1.6.3_Linux from myri.com and
	  create the cluster rpm using /opt/xcat/build/gm/gmmaker
	- download gm-1.5.2.1 from myri.com instead (as advised on the xcat-user mailing list)
	- create the rpm for the cluster using /opt/xcat/build/gm/gmmaker
	- install the gm rpm on n01
	- copy the gm rpm to /install/post/kernel

11Dec2002:
----------
29) Install ganglia:
	- install gmond*.rpm, gmetad*.rpm, webfront*.rpm on n01
	- copy gmond*.rpm into /install/post/rpm73/
	- edit /etc/gmond.conf to set the name and the owner of the cluster
	- edit /etc/gmetad.conf to use "localhost" as the data source
	/sbin/chkconfig --level 345 httpd on
	/sbin/service httpd start
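The two configuration edits can be scripted. A minimal sketch, assuming the ganglia 2.5-style directives (name "..." and owner "..." in gmond.conf, data_source "..." host in gmetad.conf); the helper names and the values in the usage lines are hypothetical. Existing uncommented directives are dropped and the new ones appended; commented examples are kept.

```shell
#!/bin/bash
# set_gmond_identity CONF NAME OWNER
# Replace any uncommented name/owner directives in gmond.conf.
set_gmond_identity() {
    local conf="$1" name="$2" owner="$3"
    # grep -v exits 1 when it prints nothing; that is not an error here.
    grep -v -E '^[[:space:]]*(name|owner)[[:space:]]' "$conf" > "$conf.tmp" || true
    printf 'name  "%s"\nowner "%s"\n' "$name" "$owner" >> "$conf.tmp"
    mv "$conf.tmp" "$conf"
}

# set_gmetad_source CONF LABEL
# Make gmetad poll the local gmond only.
set_gmetad_source() {
    local conf="$1" label="$2"
    grep -v -E '^[[:space:]]*data_source[[:space:]]' "$conf" > "$conf.tmp" || true
    printf 'data_source "%s" localhost\n' "$label" >> "$conf.tmp"
    mv "$conf.tmp" "$conf"
}
```

Usage:
	set_gmond_identity /etc/gmond.conf "galileo" "GIREF"
	set_gmetad_source /etc/gmetad.conf "galileo"
then restart the gmond and gmetad services.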

12Dec2002:
----------
SUPPLEM: In order to use distcc, I had to modify
	 the kickstart file to add the "@ Software Development" package group.
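A minimal sketch of that kickstart change (add_ks_group is a hypothetical helper): it inserts the group line right after the %packages line unless the group is already listed. Which file to edit depends on the xCAT template in use; compute.tmpl below is a hypothetical name.

```shell
#!/bin/bash
# add_ks_group KSFILE GROUP
# Insert "@ GROUP" right after the %packages line of a kickstart file,
# unless that exact group line is already present.
add_ks_group() {
    local ks="$1" group="$2"
    if grep -qx "@ ${group}" "$ks"; then
        return 0
    fi
    awk -v line="@ ${group}" '
        { print }
        /^%packages/ { print line }
    ' "$ks" > "$ks.tmp" && mv "$ks.tmp" "$ks"
}
```

Usage: add_ks_group /opt/xcat/ks73/compute.tmpl "Software Development", presumably followed by re-running ./mkks in /opt/xcat/ks73 as in step 22.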


13Dec2002:
----------
SUPPLEM: Need to install: 
	- ICC
	- mpich-gm
	- petsc
	- Atlas


157Dec2002:
-----------
Second install.
	NOTE: new MAC for RSA: 00:09:6B:0A:24:D2