Validating JSON using Python jsonschema

August 27, 2013

JSON data can be described using JSON Schema, which can then be used to validate input JSON and to do all kinds of automated testing and processing of data. Coming from the Java and XML world, I find it very handy to validate incoming JSON requests on RESTful APIs.

jsonschema is an implementation of JSON Schema for Python, and it's pretty easy to use.

Given your JSON data and an associated schema that you have created for it, using the jsonschema library is straightforward:

pip install jsonschema

import json
import jsonschema

schema = open("schema.json").read()
print schema

data = open("data.json").read()
print data

try:
    jsonschema.validate(json.loads(data), json.loads(schema))
except jsonschema.ValidationError as e:
    print e.message
except jsonschema.SchemaError as e:
    print e

This will validate your schema first and then validate the data. If you are sure your schema is valid, you can directly use one of the available Validators.

# Use a Draft3Validator directly
try:
    v = jsonschema.Draft3Validator(json.loads(schema))
    v.validate(json.loads(data))
except jsonschema.ValidationError as e:
    print e.message

This will just report the first error it catches. Interestingly, you can also use lazy validation to report all validation errors:

# Lazily report all errors in the instance
v = jsonschema.Draft3Validator(json.loads(schema))
for error in sorted(v.iter_errors(json.loads(data)), key=str):
    print error.message
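Putting it all together, here is a self-contained sketch of the lazy-validation approach (Python 3 syntax, inline dicts instead of the schema/data files, and a made-up "age" property purely for illustration):

```python
import jsonschema

# A minimal schema; "city" and the hypothetical "age" property are for illustration
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "age": {"type": "number"},
    },
}

# Both fields are deliberately wrong, so lazy validation reports two errors
data = {"city": 42, "age": "not-a-number"}

v = jsonschema.Draft3Validator(schema)
for error in sorted(v.iter_errors(data), key=str):
    print(error.message)
```

Unlike `jsonschema.validate`, which stops at the first problem, `iter_errors` yields every violation, which is much friendlier for reporting back to an API caller.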

For your reference, here is my sample JSON and its schema for you to start with. You can generate a basic schema from your data using one of the schema-generation tools out there, or you can dive into the jsonschema spec yourself for the details.


{
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York"
  },
  "phoneNumber": [
    {
      "number": "212 555-1234"
    }
  ]
}

{
  "$schema": "",
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "city": { "type": "string" },
        "houseNumber": { "type": "string" },
        "streetAddress": { "type": "string" }
      }
    },
    "phoneNumber": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "number": { "type": "string" },
          "type": { "type": "string" }
        }
      }
    }
  }
}

Bash Script to Backup MySQL Databases

February 24, 2013

mysqldump is a program to do a dump of a MySQL database. It creates a .sql file, which you can then use to restore the database.

Back up a MySQL database:

mysqldump -u mysql_user -h ip -pmysql_password database_name > database_name.sql

To restore a database from the database_name.sql file:

mysql -u mysql_user -h ip -pmysql_password database_name < database_name.sql

Back up all databases on the server:
Interestingly, you can back up all the databases on the server at once:

mysqldump -u mysql_user -h ip -pmysql_password -A > all_databases.sql

To restore all databases:

mysql  -u mysql_user -h ip -pmysql_password < all_databases.sql

Back up a table in a MySQL database:
You can also run mysqldump at the table level:

mysqldump -u mysql_user -h ip -pmysql_password database_name table_name > table_name.sql

To restore the table to the database:

mysql -u mysql_user -h ip -pmysql_password database_name < table_name.sql

I have a bunch of MySQL databases hosted on a bunch of servers, and I have been pretty lazy about backing them up regularly. So I wrote a quick bash script to back up the MySQL databases, creating a separate backup file for each database.

Here’s what the script does in short:
1. You provide it a list of IP addresses, usernames and passwords.
2. It will mysqldump all the databases on each host server that the user has access to.
3. It will store all the dumps in the backup_dir and compress each dump using gzip.

Bash doesn't really support multidimensional arrays, so I had to store the IP, username and password as a comma-separated string and split it up in each iteration. Ugly, but it gets the job done for now.
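In isolation, the comma-splitting trick looks like this (the host, user, and password below are placeholder values):

```shell
#!/bin/bash
server="db.example.com,dbuser,secret"   # hypothetical "ip,user,password" entry
IFS=',' read -ra cols <<< "$server"     # split on commas into an array
echo "${cols[0]}"    # the ip
echo "${cols[1]}"    # the username
echo "${cols[2]}"    # the password
```

Setting IFS only for the `read` keeps the rest of the script's word-splitting behavior untouched.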

#!/bin/bash
# Script to do a backup of all mysql databases on different hosts.

# One entry per host: "ip,mysql_user,mysql_password" (placeholder values below)
servers=("ip1,user1,password1" "ip2,user2,password2")

function mysql_dump() {
        local ip="$1"
        local mysql_user="$2"
        local mysql_password="$3"
        mysql_databases=`mysql -u ${mysql_user} -p${mysql_password} -h ${ip} -e "show databases" | sed /^Database$/d`
        for database in $mysql_databases
        do
                if [ "${database}" == "information_schema" ]; then
                        echo "Skipping $database"
                        continue
                fi
                echo "Backing up ${database}"
                mysqldump -u ${mysql_user} -p${mysql_password} -h ${ip} ${database} | gzip > "${backup_dir}/${database}.gz"
        done
}

backup_date=`date +%Y_%m_%d_%H_%M`
backup_dir="mysql_backup_${backup_date}"
mkdir -p "${backup_dir}"

for server in "${servers[@]}"
do
        IFS=',' read -ra cols <<< "${server}"
        mysql_dump "${cols[0]}" "${cols[1]}" "${cols[2]}"
done

Error Messages in Java

December 1, 2011

Every time something goes wrong within the application, or you want to notify the user of something, you display messages to the user. These messages could be simple validation errors or severe errors in the backend. Such events can happen anywhere within the system, and we want to be able to tell the user what happened in a user-friendly way. So it's important that these messages are well written and consistent, and they had better be stored in a central place for maintenance.

Here's a simple and clean way to do it that I have used over the years. It's pretty simple, really. I am using an enum for all error messages, along with MessageFormat so I can pass in extra variables. While the messages themselves are still in code, at least they are all in one file that anyone can use and maintain.

package errors;

import java.text.MessageFormat;

public enum ErrorMessages {

     USERNAME_NOT_FOUND("User not found."),
     USERNAME_EXISTS("Username {0} already exists."),
     USERNAME_CONTAINS_INVALID_CHARS("Username {0} contains invalid characters {1}.");

     private final String message;

     ErrorMessages(String message) {
          this.message = message;
     }

     @Override
     public String toString() {
          return message;
     }

     public String getMessage(Object... args) {
          return MessageFormat.format(message, args);
     }

     public static void main(String[] args) {
          System.out.println(ErrorMessages.USERNAME_CONTAINS_INVALID_CHARS.getMessage("s%acharya", "%"));
     }
}

Sometimes the above solution is not enough. You want to do internationalization on the messages and display different messages to the user based on the locale. In such a case, you can externalize the messages to a file, and have translation of the messages in each locale file. You can then use ResourceBundle and currentLocale to get the right properties file and construct the right message for the given key using MessageFormat.

package errors;

import java.text.MessageFormat;
import java.util.Locale;
import java.util.ResourceBundle;

public class Messages {

     public static String getMessage(String key, Object... args) {
          // Decide which locale to use
          Locale currentLocale = new Locale("en", "US");
          ResourceBundle messages = ResourceBundle.getBundle("resource", currentLocale);

          MessageFormat formatter = new MessageFormat(messages.getString(key), currentLocale);
          return formatter.format(args);
     }

     public static void main(String[] args) {
          System.out.println(Messages.getMessage("USERNAME_EXISTS", "sacharya"));
          System.out.println(Messages.getMessage("USERNAME_CONTAINS_INVALID_CHARS", "s%acharya", "%"));
     }
}


The following properties files should be available on the classpath so the class can find them (one per supported locale):

USERNAME_NOT_FOUND=User not found.
USERNAME_EXISTS=Username {0} already exists.
USERNAME_CONTAINS_INVALID_CHARS=Username {0} contains invalid characters {1}.



Sample usages have been demonstrated in the main method above. Choose whatever way serves your purpose.
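As a quick, self-contained illustration of the MessageFormat substitution both approaches rely on (the pattern string is taken from the messages above):

```java
import java.text.MessageFormat;

public class MessageFormatDemo {
    public static void main(String[] args) {
        // {0} and {1} are replaced positionally by the trailing arguments
        String pattern = "Username {0} contains invalid characters {1}.";
        System.out.println(MessageFormat.format(pattern, "s%acharya", "%"));
        // prints: Username s%acharya contains invalid characters %.
    }
}
```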

Nginx Proxy to Jetty for Java Apps

March 4, 2011

Traditionally, I used to go with Apache, mod_jk and Tomcat to host any Java web apps. But this time I was working on a small hobby project written in Groovy on Grails and had to deploy it to a VPS with very limited resources. I had to make the most of the server configuration I had, so I went with a combination of Nginx and Jetty.

If you've never heard of Nginx, it is a very simple HTTP server known for its high performance, low and predictable resource consumption, and low memory footprint under load. It uses an asynchronous event-driven model to handle requests, which enables it to handle a large number of requests concurrently and efficiently.

Similarly, Jetty provides a very good Java servlet container. Jetty can be used either as a standalone application server or embedded into an application or framework as an HTTP component or a servlet engine. It serves as a direct alternative to Tomcat in many cases. Because of its use of advanced NIO and its small memory footprint, it provides very good scalability.

Below, I will jot down the steps I went through to configure Nginx as a frontend to Jetty on my VPS running Ubuntu Hardy.

Install Java:

$ sudo apt-get install openjdk-6-jdk
$ java -version
java version "1.6.0_0"
OpenJDK  Runtime Environment (build 1.6.0_0-b11)
OpenJDK 64-Bit Server VM (build 1.6.0_0-b11, mixed mode)

$ which java

Install Jetty:

Download the latest version of Jetty, and upload the tar file to the directory of your choice on your server.

$ scp jetty-6.1.22.tar

Now login to your server, go to the directory where you uploaded Jetty above.

$ cd /user/java
$ tar xvf  jetty-6.1.22.tar

Now you can start or stop the Jetty server using the following commands:

$ cd /user/java/jetty-6.1.22/bin
$ ./jetty.sh start

$ ps aux | grep java
root     21766  1.2 72.4 1085176 387196 ?    Sl   Mar27   1:12 /usr/lib/jvm/java-6-openjdk/
/bin/java -Djetty.home=/user/java/jetty-6.1.22 -jar
/user/java/jetty-6.1.22/start.jar /user/java/jetty-6.1.22/etc/jetty-logging.xml

The Jetty logs are under jetty-6.1.22/logs, if you are interested.

Open up your bash profile and set the following paths:

$ vi ~/.bash_profile

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export JETTY_HOME=/user/java/jetty-6.1.22
Now that Jetty is running, you can go to its default port 8080 and verify that everything is working as expected.

Now that you have Jetty, it's time to deploy your app to the Jetty container.

$ scp myapp.war
$ jar -xvf myapp.war

$ vi /user/java/jetty-6.1.22/contexts/myapp.xml

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Mort Bay Consulting//DTD Configure//EN" "">
<Configure class="org.mortbay.jetty.webapp.WebAppContext">
  <Set name="contextPath">/</Set>
  <Set name="resourceBase"><SystemProperty name="jetty.home" default="."/>/webapps/myapp</Set>
</Configure>

Restart Jetty and go to http://ipAddress:8080/myapp, and you should see your app.

Install Nginx:

$ sudo aptitude install nginx

This will install Nginx under /etc/nginx

You can start, stop or restart the Nginx server using the commands:

$ sudo /etc/init.d/nginx start
$ sudo /etc/init.d/nginx stop
$ sudo /etc/init.d/nginx restart

Go to your server's IP address (or localhost if local) in your browser, and you should see the default welcome page.

Nginx Proxy to Jetty:
Now, let's configure Nginx as a proxy to our Jetty server:

$ cd /etc/nginx/sites-available
$ vi default

Point your proxy_pass at Jetty:

location / {
    proxy_pass http://localhost:8080;
}

Basically, Nginx listens on port 80 and forwards requests to port 8080. Jetty maps / to /webapps/myapp, which means any request proxied from Nginx is served by that application.

Now, if you type your IP address or domain name in the browser, content will be served from your application in Jetty. Right now, you are serving everything through Jetty, including static files like images, javascript and css. But you can easily serve the static files directly through Nginx. Just add a couple of locations in there:

location /images {
    root /user/java/jetty-6.1.22/webapps/myapp;
}
location /css {
    root /user/java/jetty-6.1.22/webapps/myapp;
}
location /js {
    root /user/java/jetty-6.1.22/webapps/myapp;
}

My final configuration is:

server {
    listen   80;

    access_log  /var/log/nginx/localhost.access.log;

    location / {
        proxy_pass http://localhost:8080;
    }
    location /images {
        root /user/java/jetty-6.1.22/webapps/myapp;
    }
    location /css {
        root /user/java/jetty-6.1.22/webapps/myapp;
    }
    location /js {
        root /user/java/jetty-6.1.22/webapps/myapp;
    }

    # redirect server error pages to the static page /50x.html
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /var/www/nginx-default;
    }
}

Find the Jar File Given a Class Name

March 3, 2011

Oftentimes, while working in Java, you get a ClassNotFoundException or a ClassCastException and you are trying to find out which jar the class belongs to and where it is located in the classpath. Your application is either not finding the class, or finding a wrong class with the same name in the classpath. So you want to know which jar your class is coming from at runtime, and whether that is the right class.

Grep to find all Jars with the Class name:

You could write a little bash script to find the class file within your file system, but that doesn't tell you what's loaded in the classpath. It will simply give you every jar on disk that contains the class name:

$ cd ~/.groovy
$ find . -name "*.jar" -exec sh -c 'jar -tf {} | grep -H --label {} org.apache.commons.httpclient.HttpClient.class' \;

The above command searches the current directory and all subdirectories for jars. Then, for each jar file, it lists the jar's contents using jar -tf and looks for the class org.apache.commons.httpclient.HttpClient.class. The output will differ depending on where I run the script from.

Java Class to find Jars with the given Class in Classpath:

But you only want to find the jar file loaded into the Java classpath. Here's a simple Java class that does just that from a main method:


import java.net.URL;

import org.apache.commons.httpclient.HttpClient;

public class MainApp {

    public static void main(String[] args) {
        System.out.println(findPathJar(HttpClient.class));
    }

    public static String findPathJar(Class<?> context) throws IllegalStateException {
        URL location = context.getResource('/' + context.getName().replace(".", "/")
                            + ".class");
        String jarPath = location.getPath();
        return jarPath.substring("file:".length(), jarPath.lastIndexOf("!"));
    }
}

This will print the jar file in the classpath that contains the class HttpClient.class:


The output will be same no matter where I am running the class from, since it is looking at the classpath and not the current directory.

Handy Groovy Script to find Jar with the given Class in Classpath:

#!/usr/bin/env groovy

def klass
System.in.withReader {
   println "Enter the Class name you want to find the jar for:"
   klass = it.readLine()
}
def context = Class.forName(klass)
def absolutePath = context.getResource('/' + context.getName().replace(".", "/")
           + ".class").getPath()
println absolutePath.substring("file:".length(), absolutePath.lastIndexOf("!"))

Running this as a script, I get:

$ ./getJarFile.groovy 
Enter the Class name you want to find the jar for:

Again, the output will be same no matter where I am running the script from.

Last of all, I find jar-indexing sites very helpful too for finding the jar files for a given class.

Facebook Connect with JSecurity on Grails

April 18, 2010

Say you have a Grails application using JSecurity (now called Apache Shiro) for authentication. How do you provide an alternate mechanism to authenticate users using Facebook Connect?

Good thing Grails has a Facebook Connect plugin, so you can authenticate users without registering them in your system. But what you will want is to integrate the Facebook Connect plugin with the JSecurity plugin. That way, even though a user is actually authenticated using Facebook Connect, JSecurity knows about the user and can assign roles and permissions to the user.

I assume you have the Apache Shiro plugin installed and authentication working properly. Now go ahead and install the Facebook Connect plugin by following its documentation.

In addition to the regular JSecurity Login form, you will have the Facebook Login button as follows:


<script type="text/javascript">
        function delayer() {
            window.location = "${createLink(controller:'auth', action: 'signin')}"
        }
</script>
<div id="login_section">
    <h4>Login using Facebook Connect</h4>
    <div class="login_form">
        <fb:login-button autologoutlink="false" onlogin="setTimeout('delayer()',100)"></fb:login-button>
	<fb:name uid="loggedinuser" useyou="false"></fb:name>
	<fb:profile-pic uid="loggedinuser" size="normal" />
	<g:facebookConnectJavascript />
    </div> <!--Login Form Ends-->
</div>

The above code snippet will display the Facebook login button. When the user clicks on the button, Facebook Connect's login dialog is displayed. Once the user authenticates himself to Facebook, the delayer() javascript method is called, which then redirects to the auth/signin Grails action.

Now let's insert the following snippet of code into the signin action.


def facebookConnectService

def signin = {
    if(!params?.username && !params?.password) {
        if(facebookConnectService.isLoggedIn(request)) {
            try {
                params.rememberMe = false
                params.username = "facebook"
                params.password = "randompassword"
            } catch (Exception e) {
                flash.error = "We are sorry. Please try again in a while."
                redirect(controller: 'home', action: 'index')
                return
            }
        }
    }
    // Rest of the Code continues
}

where facebookConnectService is the service class provided by the Grails Facebook Connect plugin. We are just checking whether the user is logged in to Facebook, and then setting values for username and password as JSecurity needs them.

The actual authentication of the user is done in grails-app/realms/ShiroDbRealm.groovy.

Now let's add a few lines to our ShiroDbRealm.groovy to handle our Facebook user:


def facebookConnectService

def authenticate(authToken) { "Attempting to authenticate ${authToken.username} in DB realm..."
    def username = authToken.username

    // Null username is invalid
    if (username == null) {
        throw new AccountException('Null usernames are not allowed by this realm.')
    }

    def user = null
    def account = null
    if (username == "facebook") {
        def facebookUser = null
        try {
            facebookUser = getFacebookUser()
        } catch (Exception e) {
            log.error e
        }
        try {
            user = User.findByUsername(facebookUser.username)
        } catch (Exception e) {
            log.error e
        }

        if (!user) {
            facebookUser.passwordHash = new Sha1Hash("randompassword").toHex()
            if (facebookUser.validate()) {
                user = true)
            } else {
                facebookUser.errors.allErrors.each {
                    log.error it
                }
            }

            account = new SimpleAccount(user.username, user.passwordHash, "ShiroDbRealm")
            return account
        } else {
            account = new SimpleAccount(user.username, user.passwordHash, "ShiroDbRealm")
            return account
        }
    }
    // Rest of the Code continues
}

def getFacebookUser() {
        String userId = facebookConnectService.getFacebookClient().users_getLoggedInUser()
        java.lang.Iterable<java.lang.Long> userIds = new ArrayList<java.lang.Long>()
        userIds.add(Long.valueOf(userId))
        Set<ProfileField> c = new HashSet<ProfileField>()
        // add the ProfileFields you want returned, e.g. first_name, last_name, name

        def myresults = facebookConnectService.getFacebookClient().users_getInfo(userIds, c)
        def useObj = myresults.getJSONObject(0)
        User user = new User()
        user.username = userId
        user.firstName = useObj.getString("first_name")
        user.lastName = useObj.getString("last_name") = useObj.getString("name")
        user.displayName =
        Date date = new java.util.Date()
        user.dateCreated = date
        user.lastUpdated = date
        user.lastVisited = date
        return user
}

Basically, if the user uses Facebook Connect, we set his username to "facebook" and password to "randompassword" in the signin action. In the authenticate method, if the username is "facebook", we get all the info about the user from Facebook. If a user with that Facebook id is already in our database, he is an existing user and is authenticated. If the user doesn't exist in our database, a new user is added to the database with a username and password. The username and password are only for integrating Facebook Connect with JSecurity, and the user has no idea about them. Now, even though the user is actually authenticated by Facebook, he is still a JSecurity user in our database, and will be treated just like any other user.

Please note that I have added a few properties like firstName, lastName, displayName etc to the User.groovy which is created by JSecurity. Feel free to add new properties to the User object if you want to capture more user info from Facebook for that user.

Now you are good to go. Deploy and test the app.

Soon, you will notice that this works in development, but there is a bug in the current revision of the Grails Facebook Connect plugin in the production environment, due to which it cannot find FacebookConnectConfig.groovy in prod. Download the source of the Facebook Connect Grails plugin and modify the afterPropertiesSet method in services/FacebookConnectService.groovy in the grails-facebook-connect-0.1 project.


void afterPropertiesSet() {
    // check if there is a compiled class for the Facebook connect config groovy script
    // this will be the case when an application is bundled in a WAR
    def config = null
    try {
        config = Class.forName("FacebookConnectConfig").newInstance()
    } catch(Exception e) {
        // no compiled config class on the classpath
    }
    if (config != null) {
        // compiled config class found, we must be running as a WAR
        facebookConnectConfig = new ConfigSlurper().parse(config.getClass())
    } else {
        // no compiled class exists for the config, we must be running the Grails built-in web server
        GroovyClassLoader loader = new GroovyClassLoader(getClass().getClassLoader())
        Class clazz = loader.parseClass(new File("grails-app/conf/FacebookConnectConfig.groovy"))
        facebookConnectConfig = new ConfigSlurper().parse(clazz)
    }
}
Now use the patched plugin in your app and that should work!

Should you use Facebook Connect?:
Based on my few months of experience, users are pretty hesitant to log in to a site using Facebook Connect. In my case, less than 10% of the users used Facebook Connect on my site, and it certainly wasn't worth the effort I put into making it work and maintaining it. It's kind of clunky at times too. I don't think I am ever gonna use it again just for the sake of authentication. So make sure it really adds some value to your application and users before you decide to integrate it into your app just because you thought it's cool.

Gant is Awesome

August 19, 2009

If you have never heard of Gant, it's a tool that lets you write Ant scripts in Groovy. Rather than being a replacement for Ant, it actually empowers your Ant scripts by letting you use Ant and Groovy together easily and seamlessly.

I have been playing with Memcached for a while now, and every once in a while I have to flush the cache on all the Memcached servers in development. Since I am using a cluster of servers, it's tedious to telnet to each of them and issue the flush_all command.

$ telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
flush_all
OK

If I wanted to view the stats, I had to telnet to the server and issue the stats command to each.

The SpyMemcached Java Client that I am using provides those APIs in Java for interacting with the servers.

Since I kept doing it so often, I decided to add a couple of targets in my build script written in Gant.

1. flush-all:

Ant provides an optional telnet task, backed by the Apache Commons Net library, so that you can issue any command over telnet. To use the library, let's put it into Ant's path first.

$ cp commons-net-2.0.jar ~/.ant/lib/

If you were to do it in Ant, the task would look like:

<telnet port="11211" server="localhost">
    <write string="flush_all"/>
    <read string="OK"/>
</telnet>

I can do the same thing in Gant:

ant.telnet( port:11211, server:"localhost") {
     write(string: "flush_all")
     read(string: "OK")
}

Nothing fancy, but it's a lot more readable in Groovy than in XML.

2. stats:

We can do a similar thing for stats by issuing the stats command over telnet. But instead, let's use the SpyMemcached library to get the stats of each server in the cluster. So let's put that library in Ant's path first:

$ cp memcached-2.3.1.jar ~/.ant/lib/

MemcachedClient cache = new MemcachedClient(
                 AddrUtil.getAddresses("server6:11211 server7:11211 server8:11211"));

That's it. I can write this Java code right from my Gant script.

Putting together everything and with a little bit of formatting, here’s what my build.gant now looks like:

import net.spy.memcached.MemcachedClient
import net.spy.memcached.BinaryConnectionFactory
import net.spy.memcached.AddrUtil

target ( 'cache-flush' : "Flush cache on all servers") {
     for(int i=6; i < 9; i++){
          ant.telnet( port:11211, server:"server${i}") {
               println "Flushing Cache on server${i}"
               write(string: "flush_all")
               read(string: "OK")
          }
     }
     println "Successfully flushed cache on all servers"
}

target ( 'cache-stats' : "Cache stats on all servers") {
     try {
          MemcachedClient cache = new MemcachedClient(
                AddrUtil.getAddresses("server6:11211 server7:11211 server8:11211"))
          Map allStats = cache.getStats()

          println("CACHE STATS FOR ALL SERVERS")
          System.out.printf("%30s", "")
          allStats.keySet().each {
               System.out.printf("%-30s", it)
          }
          println ""

          List list = new ArrayList()
          allStats.keySet().each {
               Map stats = allStats.get(it)
               list.add(stats)
          }

          list.get(0).keySet().each { a ->
               System.out.printf("%-30s %-30s %-30s %-30s\n",
                    a, list.get(0).get(a), list.get(1).get(a), list.get(2).get(a))
          }
     } catch(Exception e) {
          println "Exception caught while getting cache stats ${e}"
     }
}

Notice how seamlessly I was able to use Ant, Groovy and Java all in one. Super powerful.

Now, by going to the location of my build.gant file, I can just do :

$ gant cache-flush

$ gant cache-stats

Bottomline – Gant is awesome.

Using Memcached with Java

August 10, 2009

Why not JBoss Cache?
By default, if you are looking for a caching solution for your Java-based enterprise application, the tendency is to go with Java caches. I have been using JBoss Cache for a couple of years now. It is a very powerful, smart cache which provides clustering, synchronized replication and transaction support. Meaning, given a cluster of JBoss Cache instances, each instance is aware of the others and will be kept in sync. That way, if one of the instances goes down, the other instances will still be serving your data.

Having been plagued with memory problems over and over again, I finally gave up on JBoss Cache and decided to go with a simpler and dumber solution: Memcached.

Memcached is widely popular, especially in the PHP and Rails communities. My main reasons for switching from JBoss Cache to Memcached are:

1. JBoss Cache is replicated, so there is the overhead of syncing the nodes: all the nodes try to keep the same state. Memcached is distributed, and each node is dumb about the other nodes. Each piece of data lives on only one of the nodes, and the nodes don't know about each other. If one node fails, only some hits are missed. While this may seem like a disadvantage, it is actually a blessing if you are willing to trade that sophistication for simplicity and ease of maintenance.

2. JBoss Cache comes with a pretty complicated configuration. Memcached doesn't require any configuration.

3. JBoss Cache lives in your JVM, and you have to tune the JVM for optimum memory, which isn't always fun as the nature and amount of your data changes. Memcached uses the amount of RAM you specify; if the memory becomes full, it evicts older data based on LRU.

In short, the fact that Memcached is so simple and requires almost no maintenance was a big, big win for me. However, if your application is such that a sophisticated cache makes sense, you should definitely consider using one.


The Memcached server is an in-memory cache that stores anything from binary data to text to primitives, associated with a key, as a key-value pair. Like any other cache, storing data in memory saves you from going to the database, file server or any backend system every time a user requests the data. That takes a lot of load off your backend systems, leading to higher scalability. Since the data is stored in memory, it is generally faster than making an expensive backend call too.

However, Memcached is not a persistent store, and doesn’t guarantee something will be in the cache just because you stored it. So you should never rely on the fact that Memcached is storing your data. Memcached should strictly be used for caching purposes only, and not for reliable storage.
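The access pattern this implies (check the cache, fall back to the backend on a miss, then repopulate the cache) is commonly called cache-aside. A minimal sketch of that logic, using a plain HashMap and a hypothetical loadUserFromDb helper as stand-ins for the MemcachedClient and the real backend, since the flow is identical:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAsideDemo {
    // Stand-in for MemcachedClient; real code would call c.get()/c.set()
    static Map<String, Object> cache = new HashMap<>();

    // Hypothetical backend lookup
    static Object loadUserFromDb(int id) {
        return "user-" + id;
    }

    static Object getUser(int id) {
        String key = "user:" + id;
        Object user = cache.get(key);      // 1. try the cache first
        if (user == null) {
            user = loadUserFromDb(id);     // 2. miss (or eviction): hit the backend
            cache.put(key, user);          // 3. repopulate the cache
        }
        return user;
    }

    public static void main(String[] args) {
        System.out.println(getUser(42));   // first call misses and loads from the "DB"
        System.out.println(getUser(42));   // second call is served from the cache
    }
}
```

Because step 2 always falls back to the backend, an eviction or a lost Memcached node costs you latency, never correctness.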

The only limitations of Memcached that you need to be aware of are that a key should be less than 250 characters and each value shouldn't exceed 1 MB.

1. Install Libevent
Memcached uses the Libevent library for network IO.

$ cd libevent-1.4.11-stable
$ autoconf
$ ./configure --prefix=/usr/local
$ make
$ sudo make install

2. Install Memcached:
Download the latest version of Memcached from its official site. Memcached was originally developed by Danga Interactive for LiveJournal.

$ cd memcached-1.4.0
$ autoconf
$ ./configure --prefix=/usr/local
$ make
$ sudo make install

3. Run Memcached:
Start memcached as a daemon with 512MB of memory on port 11211 (the default). Then you can telnet to the server and port and use any of the available commands.

$ memcached -d -m 512 -p 11211

$ telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
get joe
END
set joe 0 3600 10  (Note: flags 0, TTL 3600 and 10 bytes)
0123456789
STORED
get joe
VALUE joe 0 10
0123456789
END

Spy Memcached (Memcached Java Client):
Basic Usage:

There are a few good Java clients for Memcached. I briefly looked at Whalin's Memcached client and Dustin's SpyMemcached client, and decided to go with the latter for minor reasons. You can start with the API as shown in the docs:

MemcachedClient c=new MemcachedClient(new InetSocketAddress("", 11211));
c.set("someKey", 3600, someObject);
Object myObject=c.get("someKey");

The MemcachedClient is a single-threaded client to each of the Memcached servers in the pool. The set method sets an object in the cache for a given key; if a value already exists for the key, it overwrites the value. It takes a timeToLive value in seconds, which is the expiration time for the object. Even though many requests may be coming in, the client handles only one at a time, while the rest wait in the queue. The get method retrieves the object for the given key, and the delete method deletes the value.

There are other methods available for storage, retrieval and update, but you will get by most of the time with just the three methods get, set and delete.


Security:

By design, the Memcached server doesn't have any authentication around it. So it's your job to secure the Memcached server or its port from outside networks. Furthermore, just to obscure the key, you can prefix your key with some secret code or use the hash of the key as the key.

For example:

String randomCode = "aaaaaaaaaaaaaaaaaaaa";
c.set(randomCode + "someKey", 3600, someObject);
Object myObject=c.get(randomCode + "someKey");

Adding/Removing a cache server:

If you need to scale up and want to add a new Memcached server, you just add the server's IP and port to the pool of existing servers, and the Memcached client will take it into account. If you want to scale down and get rid of a server, just remove it from the pool. There will be cache misses for the data living on that server for a while, but the cache will soon recover as it starts caching the data onto the other available servers. The same thing happens if you lose connectivity to one of the servers. If you are worried about flooding the database when you lose a Memcached server, you should have the data pre-fetched onto another server. However, the Memcached servers themselves don't know anything about each other; it's all a function of the client.

MemcachedClient c =  new MemcachedClient(new BinaryConnectionFactory(),
                        AddrUtil.getAddresses("server1:11211 server2:11211"));

Connection Pooling:

The MemcachedClient keeps a TCP connection open to each memcached server (Facebook has released a modified version of memcached that uses UDP to reduce the number of connections). So you might want to know how many connections are being used:

$ netstat -na | grep 11211
tcp4       0      0        ESTABLISHED
tcp4       0      0        ESTABLISHED

There is really no way to explicitly close the TCP connections. However, each get or set is atomic in itself, and there is no real harm in opening as many TCP connections as you like, as memcached is designed to work well with a large number of open connections.
Update: As Matt said in the comments below, each connection is asynchronous and non-blocking. So while there is one thread per node, requests are multiplexed, and there is very little benefit to doing explicit pooling with multiple copies of the client connections.

MyCache Singleton:

So with all the changes, here’s what my wrapper around MemcachedClient looks like:

import net.spy.memcached.AddrUtil;
import net.spy.memcached.BinaryConnectionFactory;
import net.spy.memcached.MemcachedClient;

public class MyCache {
	private static final String NAMESPACE = "SACHARYA:5d41402abc4b2a76b9719d91101";
	private static final int POOL_SIZE = 21;
	private static MyCache instance = null;
	private static MemcachedClient[] m = null;

	private MyCache() {
		try {
			// Set up a pool of clients; each client keeps one
			// connection open per memcached node.
			m = new MemcachedClient[POOL_SIZE];
			for (int i = 0; i < POOL_SIZE; i++) {
				m[i] = new MemcachedClient(
						new BinaryConnectionFactory(),
						AddrUtil.getAddresses("server1:11211 server2:11211"));
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public static synchronized MyCache getInstance() {
		System.out.println("Instance: " + instance);
		if (instance == null) {
			System.out.println("Creating a new instance");
			instance = new MyCache();
		}
		return instance;
	}

	public void set(String key, int ttl, final Object o) {
		getCache().set(NAMESPACE + key, ttl, o);
	}

	public Object get(String key) {
		Object o = getCache().get(NAMESPACE + key);
		if (o == null) {
			System.out.println("Cache MISS for KEY: " + key);
		} else {
			System.out.println("Cache HIT for KEY: " + key);
		}
		return o;
	}

	public Object delete(String key) {
		return getCache().delete(NAMESPACE + key);
	}

	private MemcachedClient getCache() {
		// Pick a random client from the pool
		return m[(int) (Math.random() * POOL_SIZE)];
	}
}

In the above code:
1. I am using the BinaryConnectionFactory (a newer feature) that implements the binary wire protocol, which is more efficient to parse than the text protocol.

2. MyCache is a singleton, and it sets up 21 connections when it is instantiated.

3. My keys are of the format SACHARYA:5d41402abc4b2a76b9719d91101:key, where SACHARYA is my domain. That way I can use the same memcached server to store data for two different applications. The random string 5d41402abc4b2a76b9719d911017c592 is just for some security through obscurity, which we discussed above. Finally, the key itself would be something like a userId, a username, a sql query, or any string that uniquely identifies the data to be stored.

Sample Use:

Generally you can use caching wherever there is a bottleneck. I use it at the Data Access Layer to save myself a database or webservice call. If there is computation-heavy business logic, I cache the output at the business layer. Or you can cache at the presentation layer. Or at every layer. It all depends on what you are trying to achieve.

public List<Product> getAllProducts() {
        List<Product> products = (List<Product>) MyCache.getInstance().get("AllProducts");
        if (products != null) {
              return products;
        }
        products = getAllProductsFromDB();
        if (products != null) {
              MyCache.getInstance().set("AllProducts", 3600, products);
        }
        return products;
}

public void updateProduct(String id) {
        // update the product in the DB, then invalidate the stale cached list
        MyCache.getInstance().delete("AllProducts");
}

public void deleteProduct(String id) {
        // delete the product from the DB, then invalidate the stale cached list
        MyCache.getInstance().delete("AllProducts");
}

Warming the Cache:

When the application is first started, there is nothing in the cache. So you might want to pre-warm the cache through a job scheduler, just to avoid a large number of backend calls at once. I generally like to keep this piece outside of the application itself. It could be a separate app in its own right that pre-warms the cache based on a hit-list of keys.

Measuring Cache Effectiveness:

The stats command provides important information about how your cache is performing. Among other parameters, it provides the total get request and how many were hit and missed.

$ telnet localhost 11211
stats
STAT cmd_get 13219
STAT get_hits 12232
STAT get_misses 512

This means that of 13219 total get requests, the cache came back with results for 12232, a hit rate of 12232/13219 = 92.5%, which isn't that bad.
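As a quick sanity check on that arithmetic, here is a throwaway sketch (class and method names are mine) that computes the hit rate from the two stats counters:

```java
public class HitRate {
    // Hit rate as a percentage of all get commands.
    static double hitRate(long hits, long gets) {
        return 100.0 * hits / gets;
    }

    public static void main(String[] args) {
        // Counters taken from the stats output above.
        System.out.printf("hit rate: %.1f%%%n", hitRate(12232, 13219));
    }
}
```

In a real app you would feed this from the parsed output of the stats command rather than hard-coded numbers.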

Now once you have a general idea of your cache hit rate, you can improve it even further by logging which particular requests were missed and optimizing them over time.

You can get the memory stats by using command “stats slabs” or you can invalidate items in cache using “flush all”.


You should never rely on your cache alone, though. If you somehow lose connectivity to your caching server, the application should perform exactly the same; use caching only for scalability and/or speed. Implementing the cache itself is pretty simple. The difficult part is deciding which data to cache, how long to cache it, when to invalidate the cache, when to update stale data, and how to prevent the database from being flooded once the cache is invalidated. This depends on the nature of your data, how fresh you want it, and how you update it. Keep measuring the stats and gradually improve the effectiveness over time.

Grails, DBCP & Stale Connections

July 13, 2009

Does your application break after every few hours of inactivity even though you have enough database connections in the pool? It is a common problem with database connection pooling (and idle sockets connections in general).

I have been running a few Grails apps with a PostgreSQL database, using Apache Commons DBCP connection pooling. Most of these apps are pretty busy and have been working quite well so far. But I have one critical app that doesn't get used as much. Recently, after a bug report, I was looking through the logs and realized that this app was throwing a socket connection exception after every hour of idle time. Try it again, and it would work. So why was it rejecting the DB connection the first time? Creepy!

I checked out the other apps I have, and all of them were suffering from the same problem, depending on how idle they were. I couldn't ignore it any longer.

I started off with the basics.

Basic configuration:

I have my datasource defined in the DataSource.groovy file under grails-app/conf. I have enabled connection pooling and I am using the appropriate PostgreSQL-JDBC driver. Grails comes with Apache DBCP connection pooling by default. So it just works.

environments {
     production {
          dataSource {
               pooled = true
               driverClassName = "org.postgresql.Driver"
               dbCreate = "update"
               url = "jdbc:postgresql://"
               username = "myuser"
               password = "mypassword"
          }
     }
}

This is what my production configuration looks like – running whatever the default behavior that Grails comes with. I printed the datasource object to see what DBCP configurations my app is using by default. { println it }

As the result shows, it is using the default values as defined in Apache DBCP


Using netstat, I started watching if the application has any connections to the database open:

$ netstat -na | grep 5432
tcp4       0      0     ESTABLISHED

So DBCP says it has no connections open (active or idle) to the database yet in the pool. But netstat shows there is a TCP connection established to the database. Where does this connection come from – loading & initializing the Driver?

Now let's use the application for a while. Once you start asking the pool for connections, existing idle connections in the pool are at your service; if there are no connections in the pool, new connections are created and handed to you, and they are returned to the pool after the query completes. So at any given time, there will be minIdle to maxIdle (0 to 8 in our case) connections in the pool.

$ netstat -na | grep 5432
tcp4       0      0     ESTABLISHED
tcp4       0      0     ESTABLISHED

Now with some connections in the pool, I left the app idle for an hour and then tried to access it (netstat still showed the 2 established TCP connections). The first query threw the following exception:

org.springframework.dao.DataAccessResourceFailureException: could not execute query;
nested exception is org.hibernate.exception.JDBCConnectionException: could not execute query

Caused by: org.hibernate.exception.JDBCConnectionException: could not execute query
at org.hibernate.exception.SQLStateConverter.convert(

Caused by: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(

Caused by: org.hibernate.exception.JDBCConnectionException: could not execute query
at org.hibernate.exception.SQLStateConverter.convert(

Caused by: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(

Caused by: Broken pipe
at Method)

The subsequent query succeeds. There was one connection (ignore the other TCP connection used during Driver initialization) in the pool, but the database responded by saying that the connection was not valid.


At this point it was clear that the TCP connection that was sitting idle was already broken, but our app still assumed it to be open. By idle connections, I mean connections in the pool that aren’t in active use at the moment by the application. After some search, I came to the conclusion that the network firewall between my app and the database is dropping the idle/stale connections after 1 hour. It seemed to be a common problem that many people have faced.

By default, DBCP holds pooled connections open indefinitely. But a database connection is essentially a socket connection, and it doesn't come for free: the host OS, the database host, and the firewall each have to allocate a certain amount of memory and other resources per socket. It makes sense for those devices not to hold onto idle connections forever. So the idea is to make sure you don't have stale connections in your pool that would otherwise be silently dropped by the OS or firewall. A system has no way of knowing whether a connection is broken unless it sends a packet and waits for an acknowledgement, so even when the connection is timed out or closed by one side, the other side may still think it is open.

While there may not be a firewall between your server and database, even the OS has a timeout on TCP connections. You could probably increase the TCP keepalive of the OS itself, but that will affect the whole system, and yet you are only postponing the problem.
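For completeness, Java does let you opt a single socket into the OS keepalive mechanism via SO_KEEPALIVE, though the probe interval still comes from the OS (often two hours by default), so it is no substitute for fixing the pool itself. A minimal sketch (class and method names are mine):

```java
import java.net.Socket;

public class KeepAliveDemo {
    // Ask the OS to send keepalive probes on this socket when idle;
    // how often it probes is a kernel setting, not set here.
    static boolean enableKeepAlive(Socket s) throws Exception {
        s.setKeepAlive(true);
        return s.getKeepAlive();
    }

    public static void main(String[] args) throws Exception {
        Socket s = new Socket();
        System.out.println("SO_KEEPALIVE: " + enableKeepAlive(s));
        s.close();
    }
}
```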

Now lets try to modify some of the DBCP settings for the dataSource.

1. Validating Connections: DBCP allows you to define a validation query and do a sanity check on a connection before your application actually uses it.
By default:

validationQuery=null
testOnBorrow=true

That is, no validation query is configured, so even though testOnBorrow is on, no check is actually performed.

The validation query must return at least one row, and using it you can have DBCP test the connection while it is idle (testWhileIdle), before you borrow it (testOnBorrow) and before you return it (testOnReturn).
So let's change the settings to:

validationQuery="SELECT 1"
testOnBorrow=true
testOnReturn=true
testWhileIdle=true

If any connection fails to validate, it will be dropped from the pool. There might be some performance implications of running these three test SQLs (which I am not worried about at the moment), so you might want to start with just testOnBorrow.

2. Evicting Idle Connections: DBCP can run an idle-object evictor at a regular interval and evict any connections that have been idle longer than some threshold. By default this behavior is turned off, since timeBetweenEvictionRunsMillis is set to -1:

timeBetweenEvictionRunsMillis=-1
minEvictableIdleTimeMillis=1000 * 60 * 30

Now let's run the evictor every 30 minutes and evict any connections that have been idle for more than 30 minutes:

timeBetweenEvictionRunsMillis=1000 * 60 * 30
minEvictableIdleTimeMillis=1000 * 60 * 30

It turns out that you cannot change the DBCP settings from the DataSource.groovy file; the dataSource bean injected there is exposed as a plain javax.sql.DataSource. I can however do so by overriding the defaults from BootStrap.groovy, which applies the settings during start up.

import org.codehaus.groovy.grails.commons.ApplicationAttributes
class BootStrap {

     def init = { servletContext ->

          def ctx = servletContext.getAttribute(
                         ApplicationAttributes.APPLICATION_CONTEXT)
          def dataSource = ctx.dataSource

          dataSource.setMinEvictableIdleTimeMillis(1000 * 60 * 30)
          dataSource.setTimeBetweenEvictionRunsMillis(1000 * 60 * 30)

          dataSource.setValidationQuery("SELECT 1")

          dataSource.properties.each { println it }
     }
}

You can do the same from grails-app/conf/spring/resources.groovy:

import org.apache.commons.dbcp.BasicDataSource
beans = {
     dataSource(BasicDataSource) {
          validationQuery = "SELECT 1"
     }
}

This seems to have solved the problem for me. Since my firewall was dropping the socket connections at 60 minutes, all I did was proactively run the idle object evictor every 30 minutes, flush connections that had been idle for more than 30 minutes, and let the pool regenerate new connections. I also added a sanity check on the connections in the pool.

Writing Ejabberd Modules

June 10, 2009

Ejabberd is an open-source XMPP server written in Erlang. Although XMPP has been known for building instant messaging applications for a long time now, over the last few years, people have used it for building very interesting realtime applications. In fact both XMPP and Erlang are being boosted by a new found enthusiasm in the last couple of years. Having played with Erlang for a couple of years now, Ejabberd is an obvious choice for me as an XMPP server. Here I am just documenting things for myself as I learn to write simple modules – which is a powerful way of extending and plugging into the basic Ejabberd server. 

I assume you already have Erlang installed. So lets install Ejabberd from the source.

$ svn co ejabberd
$ cd ejabberd/src
$ ./configure
$ make
$ sudo make install

Upon installation you will notice the following important directories and files created, whose meanings are quite obvious from their names:

      -> ejabberd.cfg
      -> ejabberdctl.cfg

You can then use the following commands to start and stop the server.

$sudo /sbin/ejabberdctl start
$sudo /sbin/ejabberdctl status
The node ejabberd@localhost is started with status: started
ejabberd 2.1.0-alpha is running in that node
$sudo /sbin/ejabberdctl stop

Writing an Internal Module:
All internal modules in Ejabberd start with the prefix 'mod_' and implement the gen_mod behaviour through two callbacks:

start(Host, Opts) -> ok
stop(Host) -> ok

where Host is the name of the virtual host running the module, and Opts is its set of options.
So let's just write a basic module that prints something when it starts.

$ cd /ERLANG_LIB/ejabberd/src
$ vi mod_hello.erl


-module(mod_hello).
-behaviour(gen_mod).

-include("ejabberd.hrl").

-export([start/2, stop/1]).

start(_Host, _Opt) ->
        ?INFO_MSG("Loading module 'mod_hello' ", []).

stop(_Host) ->
        ok.

Compile the module, move the beam file to /lib/ejabberd/ebin:

$ erlc mod_hello.erl
$ sudo mv  mod_hello.beam /lib/ejabberd/ebin

Now we need to tell Ejabberd to load the mod_hello module. So let's go to the section of ejabberd.cfg where modules are enabled, and add the following line to the file:

$ sudo vi /etc/ejabberd/ejabberd.cfg
  {mod_hello, []},

Start the server and verify that the message is printed in the log.

$ sudo /sbin/ejabberdctl start
$ less /var/log/ejabberd.log


Writing an HTTP module:
Building an HTTP module is similar to building internal modules, but the nice thing about HTTP modules is that you can take requests from URLs, process them in the module, and send a response back. So if you need any extra interactivity or information exposed via a URL, an HTTP module is the way to do it.
Each HTTP module implements the gen_mod behaviour, which means it has start/2 and stop/1 functions. It also has a request handler, process/2, that handles the actual request.

Now lets write a HTTP module mod_available_user.erl. Given a URL like http://localhost:5280/users/sacharya, lets build an HTTP module that will send a response back telling whether the username ‘sacharya’ is already registered or not. So let me grab the sample template from Ejabberd Documentation, and modify it.

$ vi mod_available_user.erl





-module(mod_available_user).
-behaviour(gen_mod).

-include("ejabberd.hrl").

-export([start/2, stop/1, process/2]).

start(_Host, _Opts) ->
    ok.

stop(_Host) ->
    ok.

process([User], _Request) ->
    {xmlelement, "html", [{"xmlns", "http://www.w3.org/1999/xhtml"}],
     [{xmlelement, "head", [],
       [{xmlelement, "title", [], []}]},
      {xmlelement, "body", [],
       [{xmlelement, "p", [], [{xmlcdata, is_user_exists(User)}]}]}]}.

is_user_exists(User) ->
        Result = ejabberd_auth:is_user_exists(User, "localhost"),
        case Result of
                true -> "The username " ++ User ++ " is already taken.";
                false -> "The username " ++ User ++ " is available."
        end.

How process works is pretty clear if you have some familiarity with Erlang. All we are doing is calling the function ejabberd_auth:is_user_exists/2 and deciding what message to display in the response.

Compile the file and move the beam to ebin.

$ erlc -I /lib/ejabberd/include/ mod_available_user.erl
$ sudo mv mod_available_user.beam /lib/ejabberd/ebin

The -I (include) flag in the erlc command points the compiler at the directory where the included .hrl files live.
Now let's add the module to the configuration file so that the HTTP request handler dispatches any request under /users to our module.

$ vi /etc/ejabberd/ejabberd.cfg
{5280, ejabberd_http, [
                        {request_handlers, [{["users"], mod_available_user}]}
                      ]}

Create some Users and Test:
To test the above module, we need some users. So let's create an admin user on the domain, log in as admin, and create a couple of users.

$ sudo ejabberdctl start
$ sudo ejabberdctl register admin localhost password
User admin@localhost succesfully registered

Now go to /etc/ejabberd/ejabberd.cfg and grant this user admin rights:

{acl, admin, {user, "admin", "localhost"}}.

Now bring up http://localhost:5280/admin in the browser and log in with username admin@localhost and password password.

Once you are logged in, go to http://localhost:5280/admin/server/localhost/users/, and add a few users eg. sudarshan@localhost, acharya@localhost.

Now that we have the users created and the HTTP module to check the users ready, lets test some URLs:

Request: http://localhost:5280/users/sudarshan
Response: The username sudarshan is already taken.

Request: http://localhost:5280/users/joeuser
Response: The username joeuser is available.

So in a very easy way, you can build applications that interact with your core Ejabberd services through the URL.