All the FLOPs

[Image: cluster monitoring output, 19 TFLOPs]

When I get asked what I do (well, I should really say I’m a musician or a father or something like that) I say I’m in computers … then the questions start. Then I have to explain that I don’t work with computers like your laptop; I work on high performance computing (HPC) clusters. HPC clusters are what scientists use to run HUGE problems.

The image above is the output of some monitoring and testing of a system I just installed and the performance it’s achieving. This particular system has 94 distinct servers containing a total of 5,888 GB of RAM and 1,128 processors, all connected with a network running at 40 Gbps (that would be like 40 connections from Google Fiber all at once). The performance is measured in floating point operations per second (FLOPs). This system is running at 19 TFLOPs, or 19,000 GFLOPs, or 19,000,000,000,000 distinct operations on floating point numbers PER SECOND. For comparison, my new MacBook Air will do around 0.0020 TFLOPs.


RPMS for TORQUE with Nvidia GPU Support

Getting TORQUE built into RPMs with GPU support was considerably more frustrating than I expected. I’m really not a fan of TORQUE, as I often run into silly problems or serious limitations; pbs_sched is so simplistic that it’s really not the best fit for most users … but I still have to support it, so here goes.

First we need to install CUDA. Thankfully, Nvidia has added a yum repo so this whole process has gotten a little bit easier. The Getting Started Guide has all of the info, but it’s a bit much to wade through since it tackles multiple distros. The basic process is to enable the EPEL repository, enable the Nvidia repository (install the appropriate RPM from the CUDA Downloads page), then install the cuda and gpu-deployment-kit packages with yum.
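On RHEL/CentOS 6 the repo setup boils down to something like this sketch; the CUDA repo RPM filename below is only a placeholder for whatever you actually grab from the Downloads page:

# Enable EPEL (assumes epel-release is available from the distro's extras repo)
yum -y install epel-release
# Install the Nvidia repo RPM downloaded from the CUDA Downloads page (placeholder name)
rpm -Uvh cuda-repo-rhel6-*.rpm

With the repo in place, install the packages: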

yum -y install cuda gpu-deployment-kit

Download the source for TORQUE from Cluster Resources/Adaptive Computing. I used version 4.2.7 and all of the examples will reference this.
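Unpack the tarball and move into the source tree, for example (assuming the 4.2.7 tarball was saved to the current directory):

# Extract the TORQUE 4.2.7 source and change into it
tar -xzf torque-4.2.7.tar.gz
cd torque-4.2.7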

With the source untarred, run configure with a few options. The annoying one is --with-default-server, since omitting it makes the clients connect only to localhost instead of the actual pbs_server process. No amount of config file changes or environment settings changes this behavior.

./configure --with-default-server=head.cluster --enable-nvidia-gpus --with-nvml-lib=/usr/lib64/nvidia --with-nvml-include=/usr/include/nvidia/gdk
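If configure complains about NVML, it’s worth checking that the GPU Deployment Kit actually put the library and headers where those flags point; the exact paths can shift between GDK releases, so adjust as needed:

# Sanity check the NVML library and header locations used above
ls /usr/lib64/nvidia/libnvidia-ml.so*
ls /usr/include/nvidia/gdk/nvml.h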

Now, you would think that configure line would carry all of the correct options through to the RPM build and everything would just go smoothly. NOPE! None of the GPU stuff gets added to the torque.spec file. Fancy! So edit torque.spec and look for the %configure section. It will look like:

%configure --includedir=%{_includedir}/%{name} --with-default-server=%{torque_server} \
--with-server-home=%{torque_home} %{ac_with_debug} %{ac_with_libcpuset} \
--with-sendmail=%{sendmail_path} %{ac_with_numa} %{ac_with_memacct} %{ac_with_top} \
--disable-dependency-tracking %{ac_with_gui} %{ac_with_scp} %{ac_with_syslog} \
--disable-gcc-warnings %{ac_with_munge} %{ac_with_pam} %{ac_with_drmaa} \
--disable-qsub-keep-override %{ac_with_blcr} %{ac_with_cpuset} %{ac_with_spool} %{?acflags}
%{__make} %{?_smp_mflags} %{?mflags}

Change this to:

%configure --includedir=%{_includedir}/%{name} --with-default-server=%{torque_server} \
--enable-nvidia-gpus --with-nvml-lib=/usr/lib64/nvidia --with-nvml-include=/usr/include/nvidia/gdk \
--with-server-home=%{torque_home} %{ac_with_debug} %{ac_with_libcpuset} \
--with-sendmail=%{sendmail_path} %{ac_with_numa} %{ac_with_memacct} %{ac_with_top} \
--disable-dependency-tracking %{ac_with_gui} %{ac_with_scp} %{ac_with_syslog} \
--disable-gcc-warnings %{ac_with_munge} %{ac_with_pam} %{ac_with_drmaa} \
--disable-qsub-keep-override %{ac_with_blcr} %{ac_with_cpuset} %{ac_with_spool} %{?acflags}
%{__make} %{?_smp_mflags} %{?mflags}

Now run make -j followed by make rpm, and your RPMs will be joyfully created in ~/rpmbuild/RPMS/x86_64.
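Something like this, assuming a default per-user rpmbuild tree:

# Build the source, then package it into RPMs
make -j$(nproc)
make rpm
# The finished packages land here
ls ~/rpmbuild/RPMS/x86_64/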
