post.xcat_restructure

penguhyang edited this page Mar 8, 2016 · 38 revisions

Mini-design for restructuring post.xcat

Background

The original post.xcat file is not easy to debug. We should distinguish between critical errors and plain errors. When an error happens, we should record the detailed information in log files on both the MN (management node) and the node. Also, because there is no big difference between the post.ubuntu and post.xcat scripts, we will merge post.ubuntu into post.xcat so that post.xcat is consistent across redhat, sles and ubuntu.

This mini-design supports redhat6.7, redhat7, sles11, sles12, ubuntu14, ubuntu15, and so on.

For redhat: On the node, with xcatdebugmode off, only critical errors are written to the log (/var/log/xcat/xcat.log) if they occur; with xcatdebugmode on, plain errors and critical errors are written if they occur, and the running process is logged too. On the MN, error messages are recorded in the folder /var/log/xcat/

For sles: On the node, with xcatdebugmode off, critical errors are written to the log (/var/log/xcat/xcat.log) if they occur; with xcatdebugmode on, plain errors and critical errors are written if they occur; the running process is logged whether xcatdebugmode is on or off. On the MN, error messages are recorded in the file /var/log/messages

For ubuntu: On the node, with xcatdebugmode off, only critical errors are written to the log (/var/log/xcat/xcat.log) if they occur; with xcatdebugmode on, plain errors and critical errors are written if they occur, and the running process is logged too. On the MN, error messages are recorded in the folder /var/log/xcat/
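
The redhat and ubuntu rules above can be sketched as a small logging helper. This is only an illustration, not the code in post.xcat: the function name log_msg, the level names, and the XCAT_LOG and XCATDEBUGMODE variables are hypothetical, and treating the value 1 as "on" is an assumption. The sles variant would additionally let the running-process (info) messages through when debug mode is off.

```shell
#!/bin/sh
# Hypothetical helper; only the log path and the on/off semantics of
# xcatdebugmode come from the design text.
XCAT_LOG="${XCAT_LOG:-/var/log/xcat/xcat.log}"
XCATDEBUGMODE="${XCATDEBUGMODE:-0}"   # assumption: 1 means debug mode is on

# log_msg LEVEL MESSAGE...
# LEVEL is one of: critical, plain, info (info = running-process output)
log_msg() {
    level="$1"
    shift
    case "$level" in
        critical)
            ;;                                  # always recorded
        plain|info)
            # recorded only when debug mode is on
            [ "$XCATDEBUGMODE" = "1" ] || return 0
            ;;
        *)
            return 1                            # unknown level
            ;;
    esac
    mkdir -p "$(dirname "$XCAT_LOG")" || return 1
    echo "$level: $*" >> "$XCAT_LOG"
}
```

With debug mode off, `log_msg plain ...` is silently dropped while `log_msg critical ...` still reaches the log file.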

Critical error

Solution: write the error information to the node (/var/log/xcat/xcat.log) and the MN (/var/log/xcat/), set the status of this node to failed on the MN, and halt the system.

1. openssl is not installed on the system
2. Downloading the postscripts fails
    We use the wget command to download the postscripts from http://$i$INSTALLDIR/postscripts/ on the MN. This may fail for several reasons:
    1) The wget command is not available
    2) The network is unreachable
3. getpostscript.awk does not exist
    First we try to download the mypostscript.$NODE file from the MN; if the MN has this file, we rename it to mypostscript. If the MN does not have this file, we try to create the mypostscript file using getpostscript.awk. If getpostscript.awk is not in the /xcatpost folder, this error occurs.
4. Creating mypostscript fails
    The mypostscript file is used to generate mypostscript.post and other files. If it cannot be generated by either of these two methods, this error occurs.

Plain error

Solution: write the error information to the node (/var/log/xcat/xcat.log) and the MN (/var/log/xcat), but do not halt the system.

1. Downloading the precreated mypostscript file fails
2. Creating the mypostscript.post file fails
3. Creating the xcatpostinit1 file fails
4. Creating the xcatinstallpost file fails
5. Creating the xcatdsklspost file fails
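
To make the difference between the two error classes concrete, here is a minimal sketch of two handlers. The names plain_error, critical_error and record_error are hypothetical, and STATUS_CMD and HALT_CMD are placeholder hooks that only echo by default; the real status update and halt commands, and the forwarding of the message to the MN, are deliberately left out.

```shell
#!/bin/sh
XCAT_LOG="${XCAT_LOG:-/var/log/xcat/xcat.log}"
# Hypothetical hooks: harmless placeholders standing in for the real
# "report status to the MN" and "halt the system" commands.
STATUS_CMD="${STATUS_CMD:-echo would-report-status}"
HALT_CMD="${HALT_CMD:-echo would-halt}"

# Append one message to the node-side log file.
record_error() {
    echo "$*" >> "$XCAT_LOG"
}

# Plain error: record it and let the installation continue.
plain_error() {
    record_error "$*"
    return 0
}

# Critical error: record it, mark the node as failed, then halt.
critical_error() {
    record_error "$*, halt ..."
    $STATUS_CMD failed
    $HALT_CMD
}
```

For example, `critical_error "/usr/bin/openssl does not exist"` records the same line that the planned output uses for the missing openssl case, while `plain_error "failed to generate /xcatpost/mypostscript.post"` only records the message.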

Code Logic and Process

  1. Export environment variables such as MASTER_IP, NODESTATUS, and TFTPDIR.
  2. Include the xCAT library to use some of its functions.
  3. Set values for the variables INSTALLDIR and TFTPDIR if they have not been set.
  4. Sleep for a while, then download the postscripts from the management node and write the related information to the xcatinfo file.
  5. Before downloading the postscripts from the management node, check whether openssl and wget are installed. If not, the system should halt.
  6. Download the postscripts from the MN with the wget command and check whether the download succeeded. If not, the system should halt.
  7. Once the postscripts have been downloaded successfully, create the mypostscript file.
     7.1. First try to download the mypostscript.$NODE file. This file is created when the precreatemypostscripts attribute is set to 1. If the file exists, rename it to mypostscript.
     7.2. If there is no mypostscript.$NODE file, generate the mypostscript file with getpostscript.awk. If the getpostscript.awk file does not exist, the system should halt.
     7.3. We use a while loop to generate mypostscript with getpostscript.awk in case an attempt fails.
  8. Use the sed command to add run_ps before the commands in the mypostscript file. We output the run_ps subroutine and append the mypostscript file content to recreate the mypostscript file. If this file can't be created, the system will halt.
  9. Now that we have the mypostscript file, use it to create the mypostscript.post file: a sed command deletes the items between postscripts-start-here and postscripts-end-here.
  10. Create the post init file (xcatpostinit1).
  11. Create the xcatinstallpost file.
  12. Create the dskls post file (xcatdsklspost).
  13. Finally, create the mypostscript file, using a sed command to delete the items between postbootscripts-start-here and postbootscripts-end-here.
  14. Update the node status using updateflag.awk.
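
The two sed steps (creating mypostscript.post, then the final mypostscript) both delete a marked section of the file. A minimal sketch of that operation follows; the function name strip_section is hypothetical, and removing the marker lines together with the items between them is an assumption about the intended behavior.

```shell
#!/bin/sh
# strip_section FILE START_MARKER END_MARKER
# Prints FILE without the block that runs from the first line matching
# START_MARKER through the next line matching END_MARKER.
# Assumption: the marker lines themselves are deleted as well.
strip_section() {
    sed "/$2/,/$3/d" "$1"
}

# Intended use, with the paths named in this design:
#   strip_section /xcatpost/mypostscript \
#       postscripts-start-here postscripts-end-here > /xcatpost/mypostscript.post
```

For the final step, the result would have to be written to a temporary file first and then moved over mypostscript, because redirecting the output straight into the file being read would truncate it.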

Planned Outputs

When xcatdebugmode is on, the following log information is saved.

1. The system sleeps for a while to get ready. The output looks like:
sleep 16

2. Before downloading postscripts from the management node, check whether openssl is installed. If not, the output looks like:
/usr/bin/openssl does not exist, halt ...

3. Generate the xcatinfo file. Output:
/opt/xcat/xcatinfo generated

4. When downloading the postscripts from the management node
    1. Show this message as a reminder that we are going to download the postscripts:
    trying to download postscripts from http://$MASTER_IP$INSTALLDIR/postscripts/
    2. If the system has no wget command, we can't download. Output:
    /usr/bin/wget does not exist, halt ...
    3. Download the postscripts from the management node.
        1. If the postscripts are downloaded successfully, the output looks like:
        postscripts downloaded successfully
        2. If we can't download the postscripts, the output looks like:
        failed to download postscripts from http://$MASTER_IP$INSTALLDIR/postscripts/, halt ...
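
The download step and its messages could look roughly like the sketch below. The function name download_postscripts, the WGET override, the return code 2 for "caller should halt", and the particular set of wget options are all assumptions made for illustration; only the messages and the /usr/bin/wget path come from the list above.

```shell
#!/bin/sh
# download_postscripts URL DEST
# Returns 0 on success and 2 on a critical failure (the caller is expected
# to halt on 2). WGET is a hypothetical override, useful for testing.
download_postscripts() {
    url="$1"
    dest="$2"
    wget_bin="${WGET:-/usr/bin/wget}"

    echo "trying to download postscripts from $url"
    if [ ! -x "$wget_bin" ]; then
        echo "$wget_bin does not exist, halt ..."
        return 2
    fi
    # -r recurse, -nH drop the host directory, -N fetch only newer files,
    # --no-parent stay below the URL, -T network timeout in seconds,
    # -P directory to save into.
    if "$wget_bin" -r -nH -N --no-parent -T 60 -P "$dest" "$url" >/dev/null 2>&1; then
        echo "postscripts downloaded successfully"
        return 0
    fi
    echo "failed to download postscripts from $url, halt ..."
    return 2
}
```

A caller would pass http://$MASTER_IP$INSTALLDIR/postscripts/ as the URL and treat a return code of 2 as a critical error.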

5. Generate the mypostscript file
    1. From the precreated mypostscript file
        1. Show this message as a reminder that we are going to download the precreated mypostscript file:
        trying to download precreated mypostscript file http://$MASTER_IP$TFTPDIR/mypostscripts/mypostscript.$NODE
        2. If the precreated mypostscript file is downloaded successfully, the output looks like:
        precreated mypostscript downloaded successfully
    2. Using getpostscript.awk
        1. If we can't download the precreated mypostscript, we try to generate the mypostscript file using getpostscript.awk. Show this message as a reminder that we are going to generate it:
        failed to download precreated mypostscript, trying to generate with getpostscript.awk
        2. If the getpostscript.awk file doesn't exist, the output looks like:
        /xcatpost/getpostscript.awk does not exist, halt ...
    3. If the file can't be generated by either of these two methods, the output looks like:
    generate mypostscript file failure, halt ...
    4. If the file is generated successfully, output:
    generate mypostscript file successfully
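
The fallback from the precreated file to getpostscript.awk, including the retry loop, can be sketched as follows. Everything except the messages is hypothetical: fetch_precreated and generate_with_awk are stand-in hooks (the defaults simply fail), build_mypostscript is an invented name, the retry count of 3 is an arbitrary choice, and the return code 2 means "critical, caller should halt". The initial "trying to download" reminder is omitted for brevity.

```shell
#!/bin/sh
# Hypothetical hooks. Each must create the file named by $1 and return 0 on
# success. The real script would download mypostscript.$NODE and run
# getpostscript.awk here; these placeholders always fail.
fetch_precreated() { return 1; }
generate_with_awk() { return 1; }

# build_mypostscript TARGET AWK_SCRIPT [MAX_TRIES]
build_mypostscript() {
    target="$1"
    awk_script="$2"
    max_tries="${3:-3}"

    if fetch_precreated "$target"; then
        echo "precreated mypostscript downloaded successfully"
        return 0
    fi
    echo "failed to download precreated mypostscript, trying to generate with getpostscript.awk"

    if [ ! -f "$awk_script" ]; then
        echo "$awk_script does not exist, halt ..."
        return 2
    fi

    # Retry the generation a few times in case an attempt fails.
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if generate_with_awk "$target" && [ -s "$target" ]; then
            echo "generate mypostscript file successfully"
            return 0
        fi
        tries=$((tries + 1))
    done
    echo "generate mypostscript file failure, halt ..."
    return 2
}
```

Keeping the two sources behind separate hooks makes the control flow testable on a machine that has no management node.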

6. Time to generate mypostscript.post
    1. If successfully generated, output:
    /xcatpost/mypostscript.post generated
    2. If failed to generate, output:
    failed to generate /xcatpost/mypostscript.post

7. Time to generate xcatpostinit1
    1. If successfully generated, output:
    /etc/init.d/xcatpostinit1 generated
    2. If failed to generate, output:
    failed to generate /etc/init.d/xcatpostinit1
    3. Enable xcatpostinit1. Output (for redhat and sles):
    service xcatpostinit1 enabled

8. Time to generate xcatinstallpost
    1. If successfully generated, output:
    /opt/xcat/xcatinstallpost generated
    2. If failed to generate, output:
    failed to generate /opt/xcat/xcatinstallpost

9. Time to generate xcatdsklspost
    1. If successfully generated, output:
    /opt/xcat/xcatdsklspost generated
    2. If failed to generate, output:
    failed to generate /opt/xcat/xcatdsklspost
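
Items 6 to 9 share one pattern: after writing a file, report whether it was generated. A small helper along these lines would produce both messages; the name report_generated is hypothetical, and treating "exists and is non-empty" as success is an assumption.

```shell
#!/bin/sh
# report_generated PATH
# Prints the success or failure line for PATH. A failure here is a plain
# error, so the function only returns 1 and never halts.
report_generated() {
    if [ -s "$1" ]; then
        echo "$1 generated"
        return 0
    fi
    echo "failed to generate $1"
    return 1
}
```

It would be called right after each file is written, for example `report_generated /opt/xcat/xcatdsklspost`.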

10. Running mypostscript
    1. Output this information before running mypostscript:
    running mypostscript
    2. Output this information after running mypostscript:
    mypostscript returned

11. Show this message as a reminder that grub has been updated (for redhat and sles):
    /boot/grub/grub.conf updated

12. Report the installation status:
    finished node installation, reporting status...
