Category: Rackspace Cloud

Dynamic Inventory with Ansible and Rackspace Cloud

March 4, 2014

Typically, with Ansible you create one or more hosts files, which it calls inventory files, and Ansible picks the servers from those files and runs your playbooks against them. This is simple and straightforward. However, if you are using the cloud, it's very likely that your applications are creating and deleting servers based on some other logic, and maintaining a static inventory file becomes impractical. In that case, Ansible can talk directly to your cloud (AWS, Rackspace, OpenStack, etc.) or to a dynamic source (Cobbler, etc.) through what it calls Dynamic Inventory plugins, without you having to maintain a static list of servers.
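
For contrast, a static inventory file is just a plain list of hosts, optionally arranged into groups. A minimal, hypothetical example:

[webservers]
web1.example.com
web2.example.com

[dbservers]
db1.example.com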

Here, I will go through the process of using the Rackspace Public Cloud Dynamic Inventory Plugin with Ansible.

Install Ansible
First of all, if you have not already installed Ansible, go ahead and do so. I like to install Ansible inside a virtualenv using pip.

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install python-dev python-virtualenv
virtualenv env
source env/bin/activate
pip install ansible
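
You can verify the install with:

ansible --version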

Install Rax Dynamic Inventory Plugin
Ansible maintains an external RAX inventory script in its repository (I am not sure why these plugins are not bundled with the Ansible package). The rax.py script depends on the pyrax module, which is the client binding for the Rackspace Cloud.

pip install pyrax
wget https://raw.github.com/ansible/ansible/devel/plugins/inventory/rax.py
chmod +x rax.py

The script needs a configuration file named ~/.rackspace_cloud_credentials, which stores your Rackspace Cloud auth credentials.

cat ~/.rackspace_cloud_credentials
[rackspace_cloud]
username = <username>
api_key = <apikey>
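
Since this file holds your API key, it is a good idea to lock down its permissions:

chmod 600 ~/.rackspace_cloud_credentials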

Run rax.py
As you can see, rax.py is a very simple script that provides a couple of options to list your servers and to show details for a single host. By default, it grabs the servers in all Rackspace regions. If you are interested in only one region, you can set the RAX_REGION environment variable.

./rax.py --list
RAX_REGION=DFW ./rax.py --list
RAX_REGION=DFW ./rax.py --host some-cloud-server
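
Like any Ansible dynamic inventory script, --list emits a JSON map of group names to hosts, and --host emits the variables for a single host. The exact groups depend on your regions and server metadata; abridged, illustrative output might look like:

{
    "DFW": ["some-cloud-server", "another-cloud-server"],
    "some-group": ["some-cloud-server"]
}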

Create Cloud Servers
Since you already have pyrax installed as a dependency of the rax.py inventory plugin, you can use the command line to create a cloud server named staging-apache1 and tag it with the staging-apache group using the metadata key-value feature.

export OS_USERNAME=<username>
export OS_PASSWORD=<apikey>
export OS_TENANT_NAME=<username>
export OS_AUTH_SYSTEM=rackspace
export OS_REGION_NAME=DFW
export OS_AUTH_URL=https://identity.api.rackspacecloud.com/v2.0/
ssh-keygen
nova keypair-add --pub-key ~/.ssh/id_rsa.pub stagingkey
nova boot --image 80fbcb55-b206-41f9-9bc2-2dd7aac6c061 --flavor 2 --meta group=staging-apache --key-name stagingkey staging-apache1

If you want to install Apache on more staging servers, you would create a server named staging-apache2 and tag it with the same group name, staging-apache, and so on.
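
If you are creating several of these at once, a quick hypothetical loop over the same nova boot command does the job:

for i in 2 3 4; do
    nova boot --image 80fbcb55-b206-41f9-9bc2-2dd7aac6c061 --flavor 2 \
        --meta group=staging-apache --key-name stagingkey staging-apache$i
done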

Also note that we are injecting SSH keys into the servers on creation, so Ansible will be able to log in over SSH without a password. With Ansible, you also have the option of using a username and password if you prefer.
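
For example, passing -k (--ask-pass) makes Ansible prompt for the SSH password instead of using the key (this may require the sshpass utility on the control machine):

ansible -i rax.py staging-apache -u root -m ping -k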

Once the server is booted, let's make sure Ansible can ping all the servers tagged with the group staging-apache.

ansible -i rax.py staging-apache -u root -m ping
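
A healthy server should come back with something like:

staging-apache1 | success >> {
    "changed": false,
    "ping": "pong"
}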

Run a sample playbook
Now, let's create a very simple playbook to install Apache on the inventory.

$ cat apache.yml
- hosts: staging-apache
  tasks:
      - name: Installs apache web server
        apt: pkg=apache2 state=installed update_cache=true
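
The playbook can of course carry more tasks; for instance, a sketch that also makes sure Apache is running and starts on boot:

- hosts: staging-apache
  tasks:
      - name: Installs apache web server
        apt: pkg=apache2 state=installed update_cache=true
      - name: Ensures apache is running and enabled
        service: name=apache2 state=started enabled=yes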

Let's run the apache playbook on all RAX servers in the DFW region that match the group staging-apache.

RAX_REGION=DFW ansible-playbook -i rax.py apache.yml

With a static inventory, you would be doing this instead, manually updating the hosts file every time servers change:

ansible-playbook -i hosts apache.yml

Now you can SSH into the staging-apache1 server and make sure everything is configured as per your playbook.

ssh -i ~/.ssh/id_rsa root@staging-apache1

You may add more servers to the staging-apache group, and on the next run, Ansible will detect the updated inventory dynamically and run the playbooks against it.

Rackspace Public Cloud is based on OpenStack Nova, so the nova.py inventory script should work pretty much the same way. You can look at the complete list of dynamic inventory plugins in the Ansible repository. Adding a new inventory plugin for, say, Razor, that isn't already there would be fairly simple.

Deploying a multinode Hadoop 2.0 cluster using Apache Ambari

October 31, 2013

The Apache Hadoop community recently made the GA release of Apache Hadoop 2.0, which is a pretty big deal. Hadoop 2.0 is basically a re-architecture and rewrite of major components of classic Hadoop, including the next-generation MapReduce framework based on Hadoop YARN, and federated NameNodes. The bottom line is that the architectural changes in Hadoop 2.0 allow it to scale to much larger clusters.

Deploying Hadoop manually can be a long and tedious process. I really wanted to try the new Hadoop, and I quickly realized that Apache Ambari now supports deploying Hadoop 2.0. Apache Ambari has come a long way since last year and has really become one of my preferred deployment tools for Hadoop 1.*.

In this article, I will go through the steps I followed to get a Hadoop 2.0 cluster running on the Rackspace Public Cloud. I chose the Rackspace public cloud because I have easy access to it, but doing this on Amazon or even on dedicated servers should be just as easy.

1. Create cloud servers on Rackspace Public Cloud.

You can create cloud servers using the Rackspace Control Panel or using their APIs directly or using any of the widely available bindings.

For the Hadoop cluster, I am using:

  • Large flavors, i.e. 8GB or above.
  • CentOS 6.x as the guest operating system.

To actually create the servers, I will use a slightly modified version of the bulk server creation script described in the next post. I will create one server for Apache Ambari and a number of servers for the Apache Hadoop cluster, and I will then use Ambari to install Hadoop onto the cluster servers.

So basically, I have created the following servers:

myhadoop-Ambari
myhadoop1
myhadoop2
myhadoop3
myhadoop4
myhadoop5

and have recorded the hostname, public/private IP addresses and root password for each.

2. Prepare the servers.

SSH into the newly created Ambari server, e.g. myhadoop-Ambari, and update its /etc/hosts file with an entry for each server above.
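
With illustrative addresses (use the ones you recorded), the entries would look like:

10.181.10.10    myhadoop-Ambari
10.181.10.11    myhadoop1
10.181.10.12    myhadoop2
10.181.10.13    myhadoop3
10.181.10.14    myhadoop4
10.181.10.15    myhadoop5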

Also create a hosts.txt file with the hostnames of the servers from above.

root@myhadoop-Ambari$ cat hosts.txt
myhadoop1
myhadoop2
myhadoop3
myhadoop4
myhadoop5

At this point, from the same Ambari server, run the following script, which will SSH into each of the servers listed in hosts.txt and set them up.

Specifically, the script sets up passwordless SSH between the servers and, among other things, disables iptables.

prepare-cluster.sh

#!/bin/bash

set -x

# Generate SSH keys
ssh-keygen -t rsa
cd ~/.ssh
cat id_rsa.pub >> authorized_keys

cd ~
# Distribute SSH keys
for host in `cat hosts.txt`; do
    cat ~/.ssh/id_rsa.pub | ssh root@$host "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"
    cat ~/.ssh/id_rsa | ssh root@$host "cat > ~/.ssh/id_rsa; chmod 400 ~/.ssh/id_rsa"
    cat ~/.ssh/id_rsa.pub | ssh root@$host "cat > ~/.ssh/id_rsa.pub"
done

# Distribute hosts file
for host in `cat hosts.txt`; do
    scp /etc/hosts root@$host:/etc/hosts
done

# Prepare other basic things
for host in `cat hosts.txt`; do
    ssh root@$host "sed -i s/SELINUX=enforcing/SELINUX=disabled/g /etc/selinux/config"
    ssh root@$host "chkconfig iptables off"
    ssh root@$host "/etc/init.d/iptables stop"
    echo "enabled=0" | ssh root@$host "cat > /etc/yum/pluginconf.d/refresh-packagekit.conf"
done

Note that this step will ask for the root password of each of the servers before setting them up for passwordless access.

3. Install Ambari.

While still on the Ambari server, run the following script that will install Apache Ambari.

install-ambari-server.sh

#!/bin/bash

set -x

if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root"
    exit 1
fi

# Install Ambari server
cd ~
wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/GA/ambari.repo
cp ambari.repo /etc/yum.repos.d/
yum install -y epel-release
yum repolist
yum install -y ambari-server

# Setup Ambari server
ambari-server setup -s

# Start Ambari server
ambari-server start

ps -ef | grep Ambari

Once the installation completes, you should be able to point a browser at the Ambari server's address and access its web interface.

http://myhadoop-Ambari:8080

The default username and password are admin/admin.
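
If the web interface does not come up, you can verify from the shell that the server is actually running:

ambari-server status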

4. Install Hadoop.

Once logged into the Ambari web portal, it is pretty intuitive to create a Hadoop Cluster through its wizard.

It will ask for the hostnames and the SSH private key, both of which you can get from the Ambari server:

root@myhadoop-Ambari$ cat hosts.txt
root@myhadoop-Ambari$ cat ~/.ssh/id_rsa

You should be able to just follow the wizard and complete the Hadoop 2.0 installation at this point. The process to install Hadoop 1.* is almost exactly the same, although some of the services, like YARN, don't exist there.

Apache Ambari will let you install a plethora of services, including HDFS, YARN, MapReduce2, HBase, Hive, Oozie, Ganglia, Nagios, ZooKeeper, and the Hive and Pig clients. As you go through the installation wizard, you can choose which service goes on which server.

5. Validate Hadoop.

SSH to myhadoop1 and run the following script, which does a wordcount on all the books of Shakespeare.

wordcount2.sh

#!/bin/bash

set -x

# Clean up any previous run (the error is harmless if the directory does not exist)
su - hdfs -c "hadoop fs -rm -r /shakespeare"
cd /tmp
wget http://homepages.ihug.co.nz/~leonov/shakespeare.tar.bz2
tar xjvf shakespeare.tar.bz2
now=`date +"%y%m%d-%H%M"`
su - hdfs -c "hadoop fs -mkdir -p /shakespeare"
su - hdfs -c "hadoop fs -mkdir -p /shakespeare/$now"
su - hdfs -c "hadoop fs -put /tmp/Shakespeare /shakespeare/$now/input"
su - hdfs -c "hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-76.jar wordcount /shakespeare/$now/input /shakespeare/$now/output"
su - hdfs -c "hadoop fs -cat /shakespeare/$now/output/part-r-* | sort -nk2"

So you have your first Hadoop 2.0 cluster running and validated. Feel free to look into the scripts; it's mostly the instructions from the Hortonworks docs, scripted out. Have fun Hadooping!

Bulk creating Rackspace cloud servers using a script

October 31, 2013

I keep having to create a large number of cloud servers on the Rackspace Cloud so I can play with things like Hadoop and Cassandra. Using the control panel to create one server at a time, record each login password and IP, and wait until the server goes active gets tedious very quickly. So here's a little script that installs the REST API python binding 'rackspace-novaclient' on an Ubuntu server, prompts you for the image, flavor and number of servers to create, and then goes and creates the servers.

On an Ubuntu server, first export your Rackspace Cloud auth credentials (either as root or as a sudo user):

export OS_USERNAME=<username>
export OS_PASSWORD=<apikey>
export OS_TENANT_NAME=<username>
export OS_AUTH_SYSTEM=rackspace
export OS_AUTH_URL=https://identity.api.rackspacecloud.com/v2.0/
export OS_REGION_NAME=DFW
export OS_NO_CACHE=1

Here is the actual script:

#!/bin/bash

set -x

# Install the client (non-interactively, so the script does not block on a prompt)
if [[ $EUID -ne 0 ]]; then
	sudo apt-get update
	sudo apt-get install -y python-dev python-pip python-virtualenv
else
	apt-get update
	apt-get install -y python-dev python-pip python-virtualenv
fi

virtualenv ~/.env
source ~/.env/bin/activate
pip install pbr
pip install python-novaclient
pip install rackspace-novaclient

# Read AUTH Credentials
: ${OS_USERNAME:?"Need to set OS_USERNAME non-empty"}
: ${OS_PASSWORD:?"Need to set OS_PASSWORD non-empty"}
: ${OS_TENANT_NAME:?"Need to set OS_TENANT_NAME non-empty"}
: ${OS_AUTH_SYSTEM:?"Need to set OS_AUTH_SYSTEM non-empty"}
: ${OS_AUTH_URL:?"Need to set OS_AUTH_URL non-empty"}
: ${OS_REGION_NAME:?"Need to set OS_REGION_NAME non-empty"}
: ${OS_NO_CACHE:?"Need to set OS_NO_CACHE non-empty"}

# Write the credentials to a file and source it
cat > ~/novarc <<EOF
export OS_USERNAME=$OS_USERNAME
export OS_PASSWORD=$OS_PASSWORD
export OS_TENANT_NAME=$OS_TENANT_NAME
export OS_AUTH_SYSTEM=$OS_AUTH_SYSTEM
export OS_AUTH_URL=$OS_AUTH_URL
export OS_REGION_NAME=$OS_REGION_NAME
export OS_NO_CACHE=$OS_NO_CACHE
EOF
source ~/novarc

# Prompt for the cluster details
read -p "Name of cluster: " CLUSTER_NAME
read -p "Size of cluster: " CLUSTER_SIZE
read -p "Image id: " IMAGE_ID
read -p "Flavor id: " FLAVOR_ID

# Boot a server and record its name and root password
boot() {
	password=`nova boot --flavor $1 --image $2 $3 | grep 'adminPass' | awk '{print $4}'`
	echo $3 $password >> 'server_passwords.txt'
}

CLUSTER_SIZE=`expr $CLUSTER_SIZE - 1`
for i in $(eval echo "{1..$CLUSTER_SIZE}")
do
	boot $FLAVOR_ID $IMAGE_ID $CLUSTER_NAME$i
done

is_not_active() {
	status=`nova show $1 | grep 'status' | awk '{print $4}'`
	if [ "$status" != "ACTIVE" ] && [ "$status" != "ERROR" ]; then
		echo "$1 in $status"
		return 0
	else
		return 1
	fi
}

# Wait for all the instances to go ACTIVE or ERROR
while true
do
	READY=1
	for i in $(eval echo "{1..$CLUSTER_SIZE}")
	do
		if is_not_active $CLUSTER_NAME$i; then
			READY=0
		fi
	done

	echo "READY is $READY"
	if [ "$READY" -eq "1" ]; then
		break
	fi
	sleep 5
done

for i in $(eval echo "{1..$CLUSTER_SIZE}")
do
	echo $CLUSTER_NAME$i >> 'hosts.txt'
done
cat hosts.txt

record_ip(){
	private_ip=`nova show $1 | grep 'private network' | awk '{print $5}'`
	public_ip=`nova show $1 | grep 'accessIPv4' | awk '{print $4}'`
	echo $private_ip $1 >> 'etc_hosts.txt'
	echo $public_ip $1 >> 'etc_hosts.txt'
}

for i in $(eval echo "{1..$CLUSTER_SIZE}"); do record_ip $CLUSTER_NAME$i; done

cat etc_hosts.txt

echo "CLUSTER is READY"

Then, execute the script:

./create-clusters.sh

Alternatively, I put the script on GitHub, so you can also just curl it and pipe it to bash.

bash <(curl -s https://raw.github.com/sacharya/hadoop-101/master/rax/create-clusters.sh)

The script will wait until all the servers go to an ACTIVE status, and will save the IPs, hostnames and passwords for each of the servers into these three files:

  • etc_hosts.txt
  • hosts.txt
  • server_passwords.txt
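
As an illustration (hypothetical names, addresses and passwords), the files end up looking like:

$ cat hosts.txt
mycluster1
mycluster2

$ cat etc_hosts.txt
10.181.10.11 mycluster1
166.78.240.21 mycluster1
10.181.10.12 mycluster2
166.78.240.22 mycluster2

$ cat server_passwords.txt
mycluster1 Z8s4kJh2mPxQ
mycluster2 q7GfR2nLw0aB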