Tag: Grails

Grails, DBCP & Stale Connections

July 13, 2009

Does your application break after every few hours of inactivity even though you have enough database connections in the pool? It is a common problem with database connection pooling (and idle sockets connections in general).

I have been running a few Grails app with PostgreSQL database using Apache Commons DBCP connection pooling. Most of these apps are pretty busy, and are working quite well so far. But I have one critical app that doesn’t get used as much. Recently after a bug report, I was watching through the logs and I realized that this app was complaining about Socket Connection Exception after every hour of idle time. Try it again, and it would work. So why was it rejecting the DB connection the first time? Creepy!

I checked out the other apps I have, and all of them were suffering from the same problem – depending on how idle they were. I couldn’t ignore any longer.

I started off with the basics.

Basic configuration:

I have my datasource defined in the DataSource.groovy file under grails-app/conf. I have enabled connection pooling and I am using the appropriate PostgreSQL-JDBC driver. Grails comes with Apache DBCP connection pooling by default. So it just works.

environments {
     production {
          dataSource {
               pooled = true
               driverClassName = "org.postgresql.Driver"
               dbCreate = "update"
               url = "jdbc:postgresql://sacharya.com:5432/sacharya_prod_db"
               username = "myuser"
               password = "mypassword"
               logSql=true
          }
     }
}

This is what my production configuration looks like – running whatever the default behavior that Grails comes with. I printed the datasource object to see what DBCP configurations my app is using by default.

dataSource.properties.each { println it }
minIdle=0
initialSize=0
maxIdle=8
numIdle=0
maxActive=8
numActive=0

As the result shows, it is using the default values as defined in Apache DBCP

Problem:

Using netstat, I started watching if the application has any connections to the database open:

$ netstat -na | grep 5432
tcp4       0      0  24.25.26.27.50095     98.129.169.246.5432     ESTABLISHED

So DBCP says it has no connections open (active or idle) to the database yet in the pool. But netstat shows there is a TCP connection established to the database. Where does this connection come from – loading & initializing the Driver?

Now lets use the application for a while. Once you start asking the pool for connections, existing connections idle in the pool are at your service, or if there no connections in the pool, new connections are created and given to you, which get returned back to the pool after you complete the query. So at any given time, there will be minIdle to maxIdle ( 0 to 8 in our case) connections in the pool.

$ netstat -na | grep 5432
tcp4       0      0  24.25.26.27.51308     98.129.169.246.5432     ESTABLISHED
tcp4       0      0  24.25.26.27.50095     98.129.169.246.5432     ESTABLISHED

Now with some connections in the pool, I left the app idle for an hour and then I tried to access the app (netstat still shows 2 TCP connection). The first query got the following exception:

org.springframework.dao.DataAccessResourceFailureException: could not execute query;
nested exception is org.hibernate.exception.JDBCConnectionException: could not execute query

Caused by: org.hibernate.exception.JDBCConnectionException: could not execute query
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:97)

Caused by: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)

Caused by: org.hibernate.exception.JDBCConnectionException: could not execute query
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:97)

Caused by: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)

Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)

The subsequent query succeeds. There was one connection (ignore the other TCP connection used during Driver initialization) in the pool, but the database responded by saying that the connection was not valid.

Reasoning:

At this point it was clear that the TCP connection that was sitting idle was already broken, but our app still assumed it to be open. By idle connections, I mean connections in the pool that aren’t in active use at the moment by the application. After some search, I came to the conclusion that the network firewall between my app and the database is dropping the idle/stale connections after 1 hour. It seemed to be a common problem that many people have faced.

By default, DBCP holds the pooled connections open for infinite time. But a database connection is essentially a socket connection, and it doesn’t come for free. The host OS, database host, and firewall have to allocate a certain amount of memory and other resources for each socket connection. It makes sense to those devices not to hold onto idle connections for ever. So the idea is to make sure that you don’t have stale connections in your pool that would otherwise be silently dropped by OS or firewall. The system has no way of knowing if the connection is broken unless is sends a packet and waits for an acknowledgement. So even when the connection is timed out or closed by one side, the other side may still think the connection is open.

While there may not be a firewall between your server and database, even the OS has a timeout on TCP connections. You could probably increase the TCP keepalive of the OS itself, but that will affect the whole system, and yet you are only postponing the problem.

Solution:
Now lets try to modify some of the DBCP settings for the dataSource.

1. Validating Connections: DBCP allows you do define a validation query and do a sanity check of the connection before you actually use it in your application.
By default,

validationQuery=null
testOnBorrow=false
testOnReturn=false
testWhileIdle=false

The validation query must return at least one row, and using the query you can have DBCP test the connection for you while its idle, before you borrow and before you return it.
So lets change it to:

validationQuery="SELECT 1"
testOnBorrow=true
testOnReturn=true
testWhileIdle=true

If any of the connection object fails to validate, it will be dropped from the pool. There might be some performance implications of running these three SQLs (which I am not worried at the momet), and hence you might just want to try testOnBorrow.

2. Evicting Idle Connections: DBCP can run an idle object evictor at a regular interval and evict any connections that are older than some threshold. By default this behavior is turned off since timeBetweenEvictionRunsMillis is set to -1.

timeBetweenEvictionRunsMillis=-1
numTestsPerEvictionRun=3
minEvictableIdleTimeMillis=1000 * 60 * 30

Now lets run the evictor every 30 minutes and evict any connections older than 30 minutes.

timeBetweenEvictionRunsMillis=1000 * 60 * 30
numTestsPerEvictionRun=3
minEvictableIdleTimeMillis=1000 * 60 * 30

It turns out that you cannot change the DBCP settings from the DataSource.groovy file. The datasource object injected in the DataSource.groovy file is an instance of the javax.sql.DataSource. I can however do so by overriding the default DataSource from the BootStrap.groovy, which sets up the settings during start up.

import org.codehaus.groovy.grails.commons.ApplicationAttributes
class BootStrap {

     def init = { servletContext ->

          def ctx=servletContext.getAttribute(
                       ApplicationAttributes.APPLICATION_CONTEXT)
          def dataSource = ctx.dataSource

          dataSource.setMinEvictableIdleTimeMillis(1000 * 60 * 30)
          dataSource.setTimeBetweenEvictionRunsMillis(1000 * 60 * 30)
          dataSource.setNumTestsPerEvictionRun(3)

          dataSource.setTestOnBorrow(true)
          dataSource.setTestWhileIdle(false)
          dataSource.setTestOnReturn(false)
          dataSource.setValidationQuery("SELECT 1")

          dataSource.properties.each { println it }
     }
}

You can do the same from grails-app/conf/spring/Resource.groovy:

import org.apache.commons.dbcp.BasicDataSource
beans = {
     dataSource(BasicDataSource) {       
          minEvictableIdleTimeMillis=1800000
          timeBetweenEvictionRunsMillis=1800000
          numTestsPerEvictionRun=3

          testOnBorrow=true
          testWhileIdle=true
          testOnReturn=true
          validationQuery="SELECT 1"
     }
}

This seems to have solved the problem for me. Since my firewall was dropping the socket connections at 60 minutes, all I did was proactively run the idle object evictor every half 30 minutes, flush connections that are idle for more than 30 minutes and regenerate new connections in the pool. I also did sanity check over the connections in the pool.