SLES 12: Frequently Asked Questions

SLES12:


SLES12 is a major OS upgrade of SLES. SLES12 provides expanded functionality, software, and modules compared to SLES11, as well as a newer version of the Slurm job manager used on Discover. NCCS is actively moving Discover compute nodes to SLES12 to provide a cleaner and newer compute environment for all users.

The same $HOME and $NOBACKUP filesystems are available in SLES12 as they were in SLES11. These are global filesystems and they will be maintained in the same place moving forward.

See the Using Cron on Discover documentation.

Under SLES12, the OpenSSH configuration differs from SLES11: it uses a more restrictive set of host key algorithms and ciphers by default, which can cause issues when pulling the latest versions of code from version control. To fix the issue, add the following to the .ssh/config file in your home directory:

Host *
HostKeyAlgorithms ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,ssh-rsa,ssh-dss

If you are still having issues, you may also need to back up your .ssh/known_hosts file and have it regenerated when connecting to servers.
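
For example, the following command moves the existing file aside so that a fresh known_hosts is regenerated the next time you connect:

mv ~/.ssh/known_hosts ~/.ssh/known_hosts.bak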

The NCCS recommends moving to SLES12 exclusively as soon as possible to avoid working in both SLES11 and SLES12 environments simultaneously and having to repeatedly adjust your .ssh/config and/or .ssh/known_hosts files.

As with SLES11, SLES12 supports X11 forwarding for applications that require an X session on Discover, such as MATLAB. To connect to Discover and start an X session, make the following configuration changes on your local system:

For Windows and PuTTY users, ensure that an X server such as Xming or VcXsrv is installed and that your Discover PuTTY session has X11 forwarding enabled. To enable X11 forwarding in PuTTY:

  1. In PuTTY, navigate to Connection -> SSH -> X11
  2. Check the box next to Enable X11 forwarding
  3. Set the X display location to :0.0
  4. Connect to Discover. If you regularly need X11 forwarding, it is recommended that you also create a saved PuTTY session with these settings.

For Mac users, ensure that XQuartz is installed on your system and set the following variable on your system, either at the command line or in your shell's configuration file (such as .bashrc for BASH users):

export DISPLAY=:0

To enable X11 forwarding in your .ssh/config file, add the following entries on your client system (macOS/Linux), *not* on Discover:

Host discover.nccs.nasa.gov dirac.nccs.nasa.gov
User <USERID>
LogLevel Quiet
ProxyCommand ssh -l <USERID> login.nccs.nasa.gov direct %h
Protocol 2
ForwardX11 yes
Host login.nccs.nasa.gov login
PKCS11Provider /usr/lib/ssh-keychain.dylib

The second stanza allows for PIV-based authentication to login.nccs.nasa.gov. You will then be prompted for your NCCS LDAP password as normal.

Connect to Discover using the following command:

ssh -Y login.nccs.nasa.gov

If you have a bastion node config in place, you can use:

ssh -Y discover.nccs.nasa.gov

To connect to SLES12 from within Discover, use the following command:

ssh -Y discover-sles12

For Linux users, follow the same guidance as for Mac users; you do not need to install XQuartz, since Linux systems typically provide an X server already.

For additional guidance, see our documentation on configuring bastion nodes.

A quick way to tell what processor your jobs are running on is through the /proc/cpuinfo file. The following command gives output that is useful in determining which processor a job is using:

cat /proc/cpuinfo | grep "model name" | sort | uniq -c
48 model name                : Intel(R) Xeon(R) Gold 6240R CPU @ 2.4GHz

In the above example, counting the unique output lines with "uniq -c" shows that the node has 48 cores of Intel Xeon Gold 6240R processors.

To tell if the node being used is SLES11 or SLES12, you can check the contents of the /etc/SuSE-release file.

For SLES11, the file will have the following content:

cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3

For SLES12, the file will have the following content:

cat /etc/SuSE-release
SUSE Linux Enterprise Server 12 (x86_64)
VERSION = 12
PATCHLEVEL = 3

Additionally, the /etc/os-release file only exists in SLES12. For scripting, jobs can check if the file exists to make sure that they are running in SLES12 instead of SLES11.
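
For example, a minimal check that a job script could use (a sketch, not an NCCS-provided script) is:

if [ -f /etc/os-release ]; then
    echo "Running under SLES12"
else
    echo "Running under SLES11"
fi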



Slurm 19:


The NCCS is actively moving Discover compute resources to the SLES12 realm. Every week, NCCS plans to transition more compute nodes to SLES12, completing the transition of all nodes within the next two months. Users should transition their work to the SLES12/Slurm19 realm as soon as possible, and once your work is running there, you should continue running exclusively in that realm.

During the transition to SLES12, there will be limited resources available in the SLES12/Slurm19 realm. As more user work migrates to the SLES12/Slurm19 realm, the NCCS will transition more resources into that realm. However, there may be times when that realm is busy and you may have to wait for your job(s) to be scheduled.

The SLES12 Slurm partitions (queues), while named the same, are separate from the SLES11 Slurm partitions. You will only see your work in one realm when you issue the squeue command, based on which login node you are on.

Connect to Discover as you normally would, and then run "ssh discover-sles12". From these login nodes, if you issue an "squeue" command, you will only see the jobs that are running or pending in the SLES12/Slurm19 realm (and none of your work that may still be running or pending in the SLES11 realm).

Having completed the conversion from PBS to SchedMD's Slurm in October 2013, the NCCS is dropping support for the processing of inline #PBS directives. This is important for job scripts written to execute across both PBS and Slurm clusters: removing the interpretation of #PBS directives ensures deterministic behavior for inline option parsing.

Yes, but the NCCS strongly encourages the use of the native Slurm commands, because qsub and other q* commands are community-contributed freeware wrapper scripts that are not covered by SchedMD's support contract.

Please explore the SchedMD Rosetta Stone for help translating many heavily used PBS (and other resource manager) commands to Slurm commands and syntax, or contact NCCS User Support for additional assistance.
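
As an illustration (see the Rosetta Stone for the complete mapping), a few common directive translations are:

#PBS -N myjob              ->  #SBATCH --job-name=myjob
#PBS -l walltime=01:00:00  ->  #SBATCH --time=01:00:00
#PBS -o /path/to/out.log   ->  #SBATCH --output=/path/to/out.log
#PBS -j oe                 ->  (not needed; Slurm combines stdout and stderr by default)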

You are most likely asking this question because you have been using qsub or the #PBS inline directives.
PBS's (qsub) default behavior is to separate job output and error listings into two distinct files. Slurm (sbatch) instead directs stdout and stderr to the same file, which defaults to slurm-<jobid>.out in the directory from which the job was submitted. To obtain the PBS-like default behavior, specify:

#SBATCH -o <path/to/output_file>
#SBATCH -e <path/to/error_file>

You must use the sbatch command and not the qsub wrapper for this to work.
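
As a minimal sketch (the job name, time limit, paths, and program are placeholders, not NCCS-specific values), a job script with separate output and error files might look like:

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=00:10:00
#SBATCH -o /path/to/myjob.out
#SBATCH -e /path/to/myjob.err

# Replace with your application
./my_program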

Please convert to native Slurm sbatch --wait, or sbatch --dependency, as appropriate to your use case. NB: using qsub -W block=true when run from inside another batch job causes your account to be charged for the original job's resources for the entire, indeterminate amount of time it takes for your next job to be scheduled and executed. The NCCS recommends using job dependencies to ensure that your workflow executes in the proper order and is not subject to the non-deterministic wait for a job launched from within another job to be scheduled and to complete.
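
For example (the script names here are placeholders), a follow-on job can be held until the first job completes successfully:

# Submit the first job and capture its job ID
jobid=$(sbatch --parsable first_step.sh)
# Submit the second job, to start only if the first finishes successfully
sbatch --dependency=afterok:$jobid second_step.sh
# Alternatively, "sbatch --wait first_step.sh" blocks until first_step.sh finishes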

With the limited number of SLES12 nodes available at this time, all Slurm QoS priorities have been made equal while users work on converting from SLES11 to SLES12. Once more users have converted to SLES12, we will re-implement the same QoS priorities as they existed under SLES11.



Software/Modules:


SLES12 uses a different (newer) Lmod-based modules environment system to manage software modules. This Lmod system supports dependencies, such that modules are only shown as available once the necessary dependencies have been loaded. Users should still use the commands “module avail”, “module load”, etc. as before.

With the new module environment, the module lists are not always refreshed automatically. Users may need to clear their Lmod cache by running the following command:

rm /home/$USER/.lmod.d/.cache/spiderT.x86_64_Linux.lua*

This will remove the cached list, which will be regenerated the next time “module avail” is run.

All MPI modules and software packages that were compiled with a specific compiler are only available once the underlying compiler module is loaded. For example, after the Intel compiler is loaded with "module load", running "module avail" will list the MPI modules and associated software available to be loaded. Similarly, if a particular piece of software was compiled using GCC 7 on SLES12, it will not appear until the GCC 7 module has been loaded.
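
As a simple illustration, using the GCC module path noted later in this FAQ (your compiler and versions may differ):

module avail                 # GCC-built packages are not yet listed
module load comp/gcc/7.4.0
module avail                 # software built with GCC 7 now appears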

Along with the deployment of SLES12, only the most recent, supported Intel compiler versions are available. In order to keep more up-to-date versions of software available, the NCCS plans to keep only the latest two Intel compiler versions installed, in line with Intel's support. Older versions of Intel compilers and other software that are not compatible with SLES12 have not been installed and are considered discontinued within the NCCS.

NOTE: While the latest versions of the Intel v17 and v18 compilers are installed, the NCCS recommends moving to Intel compilers v19 or v19.1. The older versions (v17 and v18) of the Intel compilers are not supported under SLES12 and will be removed in the near future.

Under SLES12, a number of the module paths have changed. For example, in SLES11, GCC 7 is loaded with the module path other/comp/gcc-7.3; in SLES12, the same compiler is listed as comp/gcc/7.4.0. If modules that you use daily do not appear available in SLES12, the module path has likely changed and will need to be updated in your shell profile.
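
For example, a line in your shell profile that loads GCC 7 would change as follows:

module load other/comp/gcc-7.3   # SLES11 module path
module load comp/gcc/7.4.0       # SLES12 module path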

Software may not have been moved over to SLES12 either because it is incompatible with SLES12 or because it is no longer supported. The software may also be available as a module: running "module avail" will show whether it is. If it is a piece of software you would like to have available, please contact the NCCS User Services Group by e-mail at support@nccs.nasa.gov.



Local Software:


The NCCS has installed many of the software packages that we believe users will need for running under SLES12. We do *not* plan to install all the software that is currently available under SLES11. Specifically, for software that has multiple versions available under SLES11, we have only installed the most recent version(s) under SLES12 as many older versions are not compatible with SLES12.

Xxdiff has been installed as a module under SLES12.

Please use “module load xxdiff” to load the module, and then you will be able to run the xxdiff command.

Much of the software available in SLES11 is either not compatible with SLES12 or is no longer supported by the vendor. As a result, the NCCS will not be porting over versions of software that are unsupported, either by the vendor or under SLES12.

Local software libraries that programs require to run, such as the GNU Scientific Library (GSL), can be found in the same location as in SLES11: /usr/local/other.
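
As a sketch of how a program might be built against such a library (the directory name <gsl-dir> is a placeholder; check the actual path under /usr/local/other on Discover):

gcc myprog.c -I/usr/local/other/<gsl-dir>/include \
    -L/usr/local/other/<gsl-dir>/lib -lgsl -lgslcblas -lm -o myprog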

  1. Try "module avail" to see if there is an existing module for the software.
  2. If the software is compiler specific, try loading the compiler that you wish to use, then issue "module avail" again to see if a version of the software has been built for that compiler.
  3. Check under /usr/local/other, as some software is not managed via modules (see the command sketch after this list).
  4. If all else fails, contact the NCCS User Services Group at support@nccs.nasa.gov.
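
A quick command sketch of steps 1-3, with <software> as a placeholder for the package name (note that "module avail" writes to stderr, hence the redirection):

module avail 2>&1 | grep -i <software>   # step 1
module load comp/gcc/7.4.0               # step 2: load the compiler you plan to use
module avail 2>&1 | grep -i <software>
ls /usr/local/other | grep -i <software> # step 3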



Do you have another question or concern?


If you do, please contact the NCCS User Services Group by e-mail at support@nccs.nasa.gov.