
Building HPC with Raspberry Pi

Raspberry Pi Cluster View

Step - 0: The Hardware

Step - 1: Prepare the Micro-SD cards for Raspberry Pi OS

Download the latest version of Raspbian Lite OS using your macOS/Linux terminal:

wget https://downloads.raspberrypi.org/raspbian_lite_latest -O raspbian_lite_latest.zip

Now extract the zipped file

unzip raspbian_lite_latest.zip

Check the extracted contents for the disk image file with the .img extension (e.g. 2020-02-13-raspbian-buster-lite.img). Now insert an SD/Micro-SD card into your laptop and find the attached device/mount point of the card.

For macOS:

diskutil list

For Linux:

lsblk

Let’s say it is attached at /dev/disk2. First, unmount the disk:

diskutil unmountDisk /dev/disk2

For Linux (use sudo if necessary):

umount /dev/disk2

Then flash the image to the memory card (this may take several minutes):

sudo dd if=2020-02-13-raspbian-buster-lite.img of=/dev/disk2

If successful, a drive will be mounted under the name boot. Raspberry Pi OS ships with SSH disabled by default. We don’t want that. To enable it, create an empty file named ssh inside the boot directory.

For macOS, you can find it under /Volumes/boot:

cd /Volumes/boot/

Now, type

touch ssh 

Now we have a Raspbian Lite OS card with SSH enabled.

Let’s eject the card from the Mac

cd
diskutil unmountDisk /dev/disk2

Repeat this process for all three memory cards. Now insert the cards into your Raspberry Pis. Remember to mark the master node to distinguish it from the others.

With the memory cards in place, connect a network cable (CAT5/6/6A) to the Ethernet port of each Pi. Do not power on the Pis yet.

External storage as shared storage

A cluster is built on the idea of machines working together. To do so, they need access to the same files. We can arrange this by mounting an external SSD (not necessary, but convenient and faster) or a flash drive on the master node and exporting that storage over the Network File System (NFS), which lets all nodes access the same files.
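The deployment scripts later in this guide handle the real setup, but conceptually the share boils down to one line in /etc/exports on the master plus a mount on each worker. The sketch below is an illustration only: the /shared_dir path matches the guide, while the subnet and export options are assumptions.

```shell
# Illustration of the NFS pieces (written to a local example file,
# not installed anywhere): the export line the master would serve.
cat > exports.example <<'EOF'
/shared_dir 10.10.0.0/24(rw,sync,no_subtree_check)
EOF
cat exports.example
# On each worker, the share would then be mounted with something like:
#   sudo mount -t nfs 10.10.0.11:/shared_dir /shared_dir
```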

The process is straightforward. At this point, insert the external storage into your master node.

Step - 2: Network Setup

To do this part, you need a wireless router with DHCP enabled. The Dynamic Host Configuration Protocol (DHCP) will allocate IPs as soon as we connect our Raspberry Pis to the network. If you have a network switch, first plug the other ends of the Ethernet cables connected to the Pis into the switch, then run one extra cable from the switch to the wireless router. The physical network is now complete. Power on the wireless router and the switch.

Note: If you do not have a network switch, connect the network cables directly to the wireless router.

Now, log in to the wireless router management page using a browser. If your laptop is connected to the same network, just type the gateway IP into the address bar. For example, if your IP is 10.10.0.10, your gateway is usually 10.10.0.1. If you have trouble getting into the management page, look for the login details printed on the router body. After getting in, go to the connected-devices page and keep it open.

Setting up Wireless Router for the first time

⚠️ These instructions apply to TP-Link routers. 🎬 Video instruction

Similar information is available for other manufacturers:

If you want to use the router as a modem, follow the instructions in the pictures below: Tp-link modem setup, Tp-link modem setup 2. ⚠️ If you already have a modem and want to use this device as a router only, follow: Tp-link modem setup 3.

Now, power on the master node first by connecting the USB-C cable to a power outlet (or the 6-port USB power supply) and keep refreshing the connected-devices page. If everything goes well, you should see a new device named raspberrypi connected to the network. Note down the IPv4 address associated with it.

Next, power on one of the compute nodes and do the same (note it as node01). Repeat the process for all the compute nodes. At the end, you should have something similar to the following:

  • master IPv4: 10.10.0.11
  • node01 IPv4: 10.10.0.12
  • node02 IPv4: 10.10.0.13

Now, try to ping each of the Pis from your computer terminal, wait a couple of seconds, then stop it by pressing Ctrl + C.

ping 10.10.0.11

You should get an output very similar to the following

PING 10.10.0.11: 56 data bytes
64 bytes from 10.10.0.11: icmp_seq=0 ttl=59 time=1.947 ms
64 bytes from 10.10.0.11: icmp_seq=1 ttl=59 time=3.582 ms
64 bytes from 10.10.0.11: icmp_seq=2 ttl=59 time=3.595 ms
64 bytes from 10.10.0.11: icmp_seq=3 ttl=59 time=3.619 ms
...
--- 10.10.0.11 ping statistics ---
6 packets transmitted, 6 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.947/3.317/3.635/0.614 ms
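To check all three Pis at once, a small loop can be used. The IPs are the example addresses from the list above; -c limits the packet count so each ping terminates on its own, and the unit of -W (reply timeout) differs between Linux and macOS.

```shell
# Ping each Pi twice and report reachability (example IPs from above).
for ip in 10.10.0.11 10.10.0.12 10.10.0.13; do
    if ping -c 2 -W 2 "$ip" >/dev/null 2>&1; then
        echo "$ip is up"
    else
        echo "$ip is unreachable"
    fi
done
```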

Note: If your wireless router management page offers IP reservation, it is advisable to reserve IPs for all the Pis. However, it is not mandatory.

Step - 3: Setting Up the Master Node

Now, log in to your master node over SSH, using the master IP you noted in Step 2:

ssh pi@10.10.0.11

Upon connection, use the password raspberry. (Note: it is the default password.)

Now, use the following command to download the shell script for master node

pi@raspberrypi~$ wget https://raw.githubusercontent.com/sayanadhikari/wipi/automated/deployment/master_deployment.sh

The script you just downloaded should be in /home/pi/ with the name master_deployment.sh.

Safety check for shared storage:

Just before setting up the network, we inserted the external storage into the master node. Now use the following command to see the device location and mount point of your storage.

pi@raspberrypi ~> lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0     105:0    0  7.4G  0 disk 
├─mmcblk0p1 105:1    0 43.8M  0 part /boot
└─mmcblk0p2 105:2    0  7.4G  0 part /
sda         3:16     0 59.2G  0 disk
└─sda1      3:17     0 59.2G  0 part

In our case, the main partition of the external storage is /dev/sda1. If it is different for you, edit the master_deployment.sh file and modify the following lines accordingly. (Note: replace /dev/sda1 with /dev/sda2 or whatever path you get from the lsblk command.)

mkfs.ext4 /dev/sda1
UUID=$(blkid -o value -s UUID /dev/sda1)
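The UUID captured above is typically used to give the drive a stable /etc/fstab entry so it mounts at boot. The line below illustrates that pattern; the /shared_dir mount point matches the guide, but the exact options the script writes are an assumption, and the UUID shown is a placeholder.

```shell
# Illustration: a UUID-based fstab entry for the shared drive.
UUID="xxxx-xxxx"   # placeholder; blkid prints the real value on your Pi
echo "UUID=${UUID} /shared_dir ext4 defaults 0 2"
```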

Now run the script to prepare the master node.

pi@raspberrypi~$ sudo bash ./master_deployment.sh

At the end of the script execution, the system will automatically reboot. After the reboot, log in to the master node again using

ssh pi@10.10.0.11

and run the following command

pi@master~$ sudo chmod 777 -R /shared_dir

Step - 4: Setting Up the Worker Nodes

We already have the IPs for the worker nodes (see Step 2). Now let's prepare them one by one. Log in to node01 using its IP:

ssh pi@10.10.0.12

Upon connection, use the password raspberry. (Note: it is the default password.)

Now, use the following command to download the shell script for worker node

pi@raspberrypi~$ wget https://raw.githubusercontent.com/sayanadhikari/wipi/automated/deployment/node_deployment.sh

The script you just downloaded should be in /home/pi/ with the name node_deployment.sh. Now run the script to prepare the worker node.

pi@raspberrypi~$ sudo bash ./node_deployment.sh node01

At the end of the script execution, the system will automatically reboot.

Now repeat the process for the rest of the worker nodes: log in to each one using its respective IP (see Step 2), and remember to replace “node01” in the last command with the respective node name.
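If you have many workers, the per-node steps can be scripted. The sketch below writes such a helper; the node names and IPs are the Step 2 examples (adjust to yours), and the script is only syntax-checked here — review it before running it for real, since each node reboots when node_deployment.sh finishes.

```shell
# Sketch: automate the per-node deployment over SSH (example IPs from Step 2).
cat > deploy_workers.sh <<'EOF'
#!/bin/bash
for entry in node01:10.10.0.12 node02:10.10.0.13; do
    name=${entry%%:*}   # e.g. node01
    ip=${entry##*:}     # e.g. 10.10.0.12
    ssh "pi@${ip}" "wget -q https://raw.githubusercontent.com/sayanadhikari/wipi/automated/deployment/node_deployment.sh && sudo bash ./node_deployment.sh ${name}"
done
EOF
bash -n deploy_workers.sh && echo "deploy_workers.sh parses"
```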

Step - 5: Configuring SLURM on Master Node

Slurm is an open-source, highly scalable cluster-management and job-scheduling system, used for both large and small Linux clusters. Let’s install it on our Pi cluster. To do that, first log in to the master node using SSH again:

ssh pi@10.10.0.11

Then, go to the deployment directory inside the wipi repository,

pi@master ~> cd /home/pi/wipi/deployment

To allow password-less ssh across the system, run the password_less_ssh_master.sh using the following command,

pi@master~$ bash password_less_ssh_master.sh

Now run the script slurm_config_master.sh to prepare the master node for slurm.

pi@master~$ sudo bash slurm_config_master.sh

To ensure smooth operation, the system will reboot at this point.

Step - 6: Configuring SLURM on Compute Nodes

We have successfully configured the master node; now we need to do the same on the compute nodes. Log in to one of the nodes, say node01:

ssh pi@10.10.0.12

Then, go to the deployment directory inside the wipi repository,

pi@node01 ~> cd /home/pi/wipi/deployment

Now run the script slurm_config_nodes.sh to prepare the worker node for slurm.

pi@node01~$ sudo bash slurm_config_nodes.sh

To ensure smooth operation, the system will reboot at this point.

Diagnostic check for MUNGE

Now, we need to verify that the SLURM controller can successfully authenticate with the client nodes using MUNGE. To do that, log in to the master node and run the following command:

pi@master ~> ssh pi@node01 munge -n | unmunge

Upon successful operation, you should get output something similar to the following,

ssh pi@node01 munge -n | unmunge
pi@node01's password: 
STATUS:           Success (0)
ENCODE_HOST:      master (127.0.1.1)
ENCODE_TIME:      2020-08-30 22:45:00 +0200 (1598820300)
DECODE_TIME:      2020-08-30 22:45:00 +0200 (1598820300)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha256 (5)
ZIP:              none (0)
UID:              pi (1001)
GID:              pi (1001)
LENGTH:           0

Sometimes you might get an error; this usually indicates that the exact MUNGE key was not copied to that node.
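In that case the key can be re-synced from the master. The sketch below uses the default MUNGE key path and node01 as the example node; the deployment scripts may do this differently, and you may additionally need to fix the key's ownership on the node. The script is only syntax-checked here.

```shell
# Sketch: push the master's munge key to a node and restart munge there.
cat > sync_munge_key.sh <<'EOF'
#!/bin/bash
NODE=${1:-node01}
# Stream the key (readable only by root) to the node, then restart the daemon.
# You may also need: ssh "pi@${NODE}" sudo chown munge: /etc/munge/munge.key
sudo cat /etc/munge/munge.key | \
    ssh "pi@${NODE}" "sudo tee /etc/munge/munge.key >/dev/null && sudo systemctl restart munge"
EOF
bash -n sync_munge_key.sh && echo "sync_munge_key.sh parses"
```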

Now repeat this process on all the other nodes.

Step - 7: Test SLURM

Login to master node using ssh and type the following command

pi@master ~>sinfo

You should get an output something like this

PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
picluster*    up   infinite      2   idle node[01-02]

You can run a simple task that asks each node for its hostname:

pi@master ~>srun --nodes=2 hostname

It will give you an output similar to

node02
node01

Step - 8: Powering On and Off (Cluster)

Write a shell script with the following lines of code and save it as clusterup.sh:

#!/bin/bash
sudo scontrol update NodeName=node[01-02] state=resume
sinfo
echo "Nodes up and running"

Each time you power on your cluster, run the following command at startup (this assumes clusterup.sh is on your PATH as clusterup, or aliased accordingly):

pi@master ~>clusterup

Each time you need to power off your cluster, run the following command,

pi@master ~>clusterdown
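The guide only shows clusterup.sh; a matching clusterdown.sh might look like the sketch below. The drain-then-poweroff approach and the node names are assumptions, and the script is only syntax-checked here.

```shell
# Sketch of clusterdown.sh: take nodes out of service, then power them off.
cat > clusterdown.sh <<'EOF'
#!/bin/bash
sudo scontrol update NodeName=node[01-02] state=drain reason="power down"
for node in node01 node02; do
    ssh "$node" sudo poweroff
done
echo "Nodes powered down"
EOF
bash -n clusterdown.sh && echo "clusterdown.sh parses"
```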

If you want to check the temperatures of individual nodes, use the following command,

pi@master ~>tempcheck
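tempcheck is not defined in the guide either; one way to implement it is sketched below, using vcgencmd, the Raspberry Pi firmware query tool. The node names are assumptions, and the script is only syntax-checked here.

```shell
# Sketch of tempcheck: read each Pi's SoC temperature via vcgencmd over SSH.
cat > tempcheck.sh <<'EOF'
#!/bin/bash
for node in master node01 node02; do
    printf '%s: ' "$node"
    ssh "$node" vcgencmd measure_temp
done
EOF
bash -n tempcheck.sh && echo "tempcheck.sh parses"
```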

Step - 9: OpenMPI

Open MPI is an open-source implementation of the Message Passing Interface (MPI), a standard that describes how messages are exchanged between different processes. It allows us to run a job across multiple nodes of the cluster.

Now let's test Open MPI. Go to the open_mpi directory inside /shared_dir:

pi@master ~>cd /shared_dir/open_mpi

Now, compile the program using mpicc

mpicc hello_mpi.c

This creates an executable named a.out. You can run the executable using the following command:

mpirun -np 2 -hostfile hostfile ./a.out
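The hostfile names the machines MPI may launch on. Its contents are not shown in the guide; a sketch, assuming the node hostnames from earlier and four cores per Pi:

```
node01 slots=4
node02 slots=4
```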

Now, let’s test the same using the SLURM job manager. To do so, we first have to create a job script. Go to the slurm_jobs directory inside /shared_dir:

pi@master ~>cd /shared_dir/slurm_jobs

To submit a job use the following command

sbatch hello_mpi_slurm.sh 
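The contents of hello_mpi_slurm.sh are not shown in the guide; a plausible sketch follows. The partition name is taken from the sinfo output in Step 7, while the task counts and paths are assumptions; the script is only syntax-checked here.

```shell
# Sketch of a SLURM job script for the MPI hello-world.
cat > hello_mpi_slurm.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello_mpi
#SBATCH --partition=picluster
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
cd /shared_dir/open_mpi
mpirun ./a.out
EOF
bash -n hello_mpi_slurm.sh && echo "hello_mpi_slurm.sh parses"
```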

To view the status of any job

squeue -u pi

NOTE: pi is your username


Contributors

sayanadhikari, rinku-mishra
