Writing Ejabberd Modules

June 10, 2009

Ejabberd is an open-source XMPP server written in Erlang. XMPP has long been known as a protocol for instant messaging, but over the last few years people have also used it to build very interesting realtime applications. In fact, both XMPP and Erlang have enjoyed newfound enthusiasm over the last couple of years. Having played with Erlang for a couple of years now, Ejabberd is an obvious choice for me as an XMPP server. Here I am just documenting things for myself as I learn to write simple modules, which are a powerful way of extending and plugging into the basic Ejabberd server.

Installation:
I assume you already have Erlang installed. So let's install Ejabberd from source.

$ cd ERLANG_LIB
$ svn co http://svn.process-one.net/ejabberd/trunk ejabberd
$ cd ejabberd/src
$ ./configure
$ make
$ sudo make install

Upon installation you will notice the following important directories and files created, whose meanings are quite obvious from their names:

/etc/ejabberd
      -> ejabberd.cfg
      -> ejabberdctl.cfg
/sbin/ejabberdctl
/lib/ejabberd
/var/log/ejabberd/ejabberd.log

You can then use the following commands to start and stop the server.

$sudo /sbin/ejabberdctl start
$sudo /sbin/ejabberdctl status
The node ejabberd@localhost is started with status: started
ejabberd 2.1.0-alpha is running in that node
$sudo /sbin/ejabberdctl stop

 
Writing an Internal Module:
All internal modules in Ejabberd have names starting with ‘mod_’ and implement the gen_mod behavior through two functions:

start(Host, Opts) -> ok
stop(Host) -> ok

where Host is the name of the virtual host running the module, and Opts is the set of options.
So let's just write a basic module that will print something when it starts.

$ cd /ERLANG_LIB/ejabberd/src
$ vi mod_hello.erl
-module(mod_hello).
-behavior(gen_mod).

-export([
    start/2,
    stop/1
    ]).

%% Provides the ?INFO_MSG logging macro.
-include("ejabberd.hrl").

start(_Host, _Opt) ->
        ?INFO_MSG("Loading module 'mod_hello' ", []),
        ok.

stop(_Host) ->
        ok.
 

Compile the module, move the beam file to /lib/ejabberd/ebin:

$ erlc mod_hello.erl
$ sudo mv  mod_hello.beam /lib/ejabberd/ebin

Now we need to configure Ejabberd to load the mod_hello module. So let's go to the section of ejabberd.cfg where modules are enabled, and add the following line.

$ sudo vi /etc/ejabberd/ejabberd.cfg
{modules,
 [
  .....
  {mod_hello, []},
  .....
 ]
}.

Start the server and verify that the message is printed in the log.

$ sudo /sbin/ejabberdctl start
$ less /var/log/ejabberd/ejabberd.log

Sweet!

Writing an HTTP module:
Building an HTTP module is similar to building an internal module, but the nice thing about HTTP modules is that you can receive requests on URLs, process them in the module and then send a response back. So if you need any extra interactivity or information exposed via a URL, an HTTP module is the way to do it.
Each HTTP module implements the gen_mod behavior. That means it has start/2 and stop/1 functions. It also has a request handler function, process/2, that handles the actual request.

Now let's write an HTTP module, mod_available_user.erl. Given a URL like http://localhost:5280/users/sacharya, let's build an HTTP module that will send a response back telling whether the username ’sacharya’ is already registered or not. So let me grab the sample template from the Ejabberd documentation and modify it.

$ vi mod_available_user.erl

-module(mod_available_user).

-behavior(gen_mod).

-export([
    start/2,
    stop/1,
    process/2
    ]).

-include("ejabberd.hrl").
-include("jlib.hrl").
-include("web/ejabberd_http.hrl").

start(_Host, _Opts) ->
    ok.

stop(_Host) ->
    ok.

process(Path, _Request) ->
    {xmlelement, "html", [{"xmlns", "http://www.w3.org/1999/xhtml"}],
     [{xmlelement, "head", [],
       [{xmlelement, "title", [], []}]},
      {xmlelement, "body", [],
       [{xmlelement, "p", [], [{xmlcdata, is_user_exists(Path)}]}]}]}.

is_user_exists(User) ->
        Result = ejabberd_auth:is_user_exists(User, "localhost"),
        case Result of
                true -> "The username " ++ User ++ " is already taken.";
                false ->"The username " ++ User ++ " is available."
        end.

How process/2 works is pretty clear if you have some familiarity with Erlang. All we are doing is calling the function is_user_exists/2 provided by the ejabberd_auth module and deciding what message to display in the response.

Compile the file and move the beam to ebin.

$ erlc -I /lib/ejabberd/include/ mod_available_user.erl
$ sudo mv mod_available_user.beam /lib/ejabberd/ebin

The ‘-I’ (include) flag in the erlc command just points to the directory where the included .hrl files are located.
Now let's add the module to the configuration file so that the request handler will dispatch any requests to /users to our module.

$ vi /etc/ejabberd/ejabberd.cfg
{5280, ejabberd_http, [
                         captcha,
                         http_poll,
                         web_admin,
                        {request_handlers, [{["users"], mod_available_user}]}
                        ]}

Create some Users and Test:
To test the above module, we need some users. So let's create an admin user in the domain, log in as admin and create a couple of users.

$ sudo ejabberdctl start
$ sudo ejabberdctl register admin localhost password
User admin@localhost succesfully registered

Now open /etc/ejabberd/ejabberd.cfg and add this user to the admin ACL:

{acl, admin, {user, "admin", "localhost"}}.

Now open localhost:5280/admin in the browser and log in with username: admin@localhost and password: password

Once you are logged in, go to http://localhost:5280/admin/server/localhost/users/, and add a few users, e.g. sudarshan@localhost, acharya@localhost.

Now that we have the users created and the HTTP module to check the users ready, let's test some URLs:

Request: http://localhost:5280/users/sudarshan
Response: The username sudarshan is already taken.

Request: http://localhost:5280/users/joeuser
Response: The username joeuser is available.

So in a very easy way, you can build applications that interact with your core Ejabberd services through the URL.
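
As a rough illustration of the application side, here is a minimal sketch of a Java client for the /users URL. It assumes the listener on localhost:5280 configured above and the two response phrases printed by mod_available_user; the class and method names (UsernameCheck, buildUri, isTaken, fetch) are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

public class UsernameCheck {

    // Assumption: host and port match the ejabberd_http listener configured above.
    public static URI buildUri(String user) {
        try {
            // The multi-argument URI constructor quotes illegal characters in the path.
            return new URI("http", null, "localhost", 5280, "/users/" + user, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad username: " + user, e);
        }
    }

    // The module answers with a small XHTML page; we only look for the phrase it prints.
    public static boolean isTaken(String body) {
        return body.contains("is already taken");
    }

    // Plain GET request; requires the Ejabberd server to be running.
    public static String fetch(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String user = args.length > 0 ? args[0] : "sudarshan";
        String body = fetch(buildUri(user));
        System.out.println(user + (isTaken(body) ? " is taken" : " is available"));
    }
}
```

Matching on the response text is fragile; it only works here because we control the module's wording.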

Optimization - still an Enemy?

May 31, 2009

In 1974, within the context of Structured Programming, Donald Knuth said “Premature optimization is the root of all evil”.

All evil huh, Donald?

Eventually, with evolution “Don’t optimize early” or “Don’t optimize unless something is a problem” became one of the mantras of software development. In fact, it is one of the most obediently followed mantras in software industry. Probably the reason they follow it so closely is because the rule says “Don’t do it”. Don’t do anything, and you are following the principle. Don’t do anything (that you already don’t wanna do) and you are getting a pat on your back for doing great. How awesome! How easy! Don’t you wish all other principles in software development (and life in general) were as lenient as this one?

That cute little tip from the Guru decades ago has now been cleverly molded to a free-pass to build things of the barest minimum quality just enough to save your ass at the moment. As long as it seems to work somehow, its fine. In a way, its legal and ethical to do things the wrong (often quicker) way as long as its not a problem today. In the uber-speedy AGILE’s caricatures of today, it means “Just do it and push it into production”. In a slightly different scenario, it could mean “Just do it for now and  realize at the end of the week what you just did was a crap”. Whether you are releasing the insecure and bloated piece of unresponsive crap that nobody is proud of or fixing the crap to make it less crap depends on how shameless and ignorant you are. Unless a customer has found the problem, a problem is not a problem. So release it anyway coz its not a problem yet. Let it be a problem tomorrow.

Businesses, in general, seem to have a phobia for quality. Few companies make a quality product, and worry about scaling it to quantities. The majority get quantities of low-quality products, and then spend their life trying to uplift the curse. Some don’t have any clues, they just exist coz they were established and they have to keep moving.

Early analysis & optimization is a measure of your commitment towards quality. Ironically, when you hear of optimization, you think it’s a developer’s problem - you think it’s that extra for-loop in the developer’s code. I agree, developers are not perfect. But neither is anyone else, nor is the process as a whole.

Of course, in a practically perfect team, the customers would know exactly what they want. And the analysts would know exactly whats feasible and appropriate. The architect would architect the perfect holistic view of the business requirements with the system requirements and the technologies. The designers would design the perfect interfaces and workflows. The developers would write the perfect code. And all this would be accomplished in a perfect environment provided by the manager. Oh well, do we still need testers coz we have already written the perfect implementation for the perfect set of requirements?

The reality is that the customer has no clue about what he wants. The analyst has no idea about whats feasible. The architect is the one who retired from programming and hasn’t learnt any new technologies in the last five years. The developers have no clue about anything other than the few shortcuts in their favorite IDEs. Oh ok. We need testers by now. After all, we all are humans and to err is human. So testers are the people that would come at the end and validate the work.

But it solves only a tiny piece of the puzzle. Quality assurance will verify that the code and UI behave as per the requirements, and the verification is based on the few sets of tests it runs. It doesn’t validate the requirements. It doesn’t validate the architecture. It doesn’t validate the code. It doesn’t validate the aesthetics or the usability. It doesn’t validate the infrastructure. If you can have bugs in the developer’s code, you can have bugs in the requirements too. You can have bugs in the architecture too. You can have bugs in the process and management itself. And the effect of defects in the rest of the process is usually a lot bigger than that of defects in the code.

You see the fundamental flaw in the process that I am talking about? Validation is done at the end of the whole cycle, and worse, it validates a small portion of the whole work. The rest goes unvalidated for the most part, and you try to hide the filth under the carpet for ever. But the ghost will appear from under the carpet.

What should happen is, after every phase of the software development life cycle, you should validate the output against your standards and optimize it right then if necessary. Don’t let the flaw bubble up the stack. You should optimize your requirements, you should optimize your design, you should optimize your architecture. Everything! If you are just postponing your problems, you are making them bigger and incurable. What would take 2 days to fix during design will take a month to fix after it’s developed.

Early optimization is a taboo word in software development industry. Its a shameful thing to do coz you are breaching the most popular software principle.

If you are a developer programming for the web, it’s even more complicated than it appears. You develop the product on your laptop like a laboratory test. What runs on your box will now be shipped to the Internet to be used by people all over the world with different connections, with different clients and tools, and with different degrees of understanding. It’s such a complicated thing that you do have to set your standards and priorities correctly. If you don’t think about performance until it’s a problem, then you are doomed. If you don’t think about scalability until it’s a problem, then you are doomed. Because if you got your basics wrong in design and analysis, the only way to fix and optimize it would be to redo the whole thing, which is usually not an option. You cannot optimize a crappy product and make it great again, all you can do is make it less crappy.

Optimization isn’t about spending weeks in a bat-cave trying to find a problem in somebody else’s shitty code. It’s more about doing things right from the beginning and minimizing the stupid things that could be avoided. It’s about setting up a realistic standard for yourself, and living up to it every day. It’s about having a vision even before you do something, and doing what it takes to make it happen. Optimization is about fixing things that you know will be a problem in the relevant future. It’s about curing the cancer before you even develop it. It’s about being optimistic that you will be in business one year from now.

How many times did you rush a feature out in a week, and then spend the next three weeks fixing the problems in production? How many times have you made the wrong design decisions coz you thought Early Optimization is evil? How many times have you worked on the rewrite of a recent project? How many times did you try to achieve things like performance, scalability and security backwards, rather than incorporating them from the ground up?

NOT Optimizing early is clearly a much bigger problem in software development than optimizing early. But even then, if you can get away with a bad product, why bother making a good product. Right?

Book: Writing secure code

April 3, 2009

Have there been any inventions in human history which are as insecure as the softwares we use today? Just pull up the news in the last couple of weeks:

1. Google shares Docs without your permission: March 7, 2009 and again on March 26, 2009

2. Facebook reveals private photos on wall posts: March 20, 2009

3. Safari browser cracked in 2 seconds: March 18th, 2009

4. Cached data exposes 20,000 Credit cards: March 20, 2009

If you actually go through SANS list, you will be scared. And that is the world we live in today.

Writing Secure Code

An enormous number of softwares have been written, deployed, and exposed over the Internet in the last 10 years without enough thought on Security, and thus Security is going to be a huge huge thing for the next 10 years. After all, you have to clean up your own shit, right (unless you are a dog) ?

I started this book about a month ago, and I just finished it today. Written by Michael Howard and David LeBlanc from Microsoft, the book mostly talks in reference to C/C++, and the Dot Net framework. But unlike all other books that talked about Cryptography, secure protocols and algorithms, this book actually talks about writing secure code on a daily basis, and develops some principles for building secure software. In that sense, although a little old and a little too big, this book is an awesome read for someone wanting to write secure code. While Absolute Security is a Myth, at least you can make it difficult for attackers to exploit the vulnerabilities.

Every good developer is a hacker himself. So the book goes into the details of buffer overflows, integer overflows, cross-site scripting, SQL injection, Code Access Security, proper use of Access Control Lists, cryptographic techniques and their proper use, encoding and internationalization, canonicalization etc.

The book argues that Security should be part of the design rather than an add-on at the end of coding. You should define your trusted and untrusted boundaries, analyze all the threats involved, and evaluate risks associated with the threats, and define your security goals based on the risk factor. All this should be a fairly short, simple and high level process, but it will tell the developer what you need to pay extra attention to and tell the QA what you need to watch out for. Through code review only, you can reduce your bugs by 80%, and most of the bugs found in code review will hardly ever be found through QA testing.

The most interesting side of the book is how it relates all the security problems to some breach of fundamental security principles. Just for my own reference, there are quite a few security principles stated in the book that should be built into every developer’s subconscious:

1. Minimize the attack surface. (Think of hidden fields in forms)

2. All input is evil, unless proven otherwise. Also, assume all external systems you talk to are insecure. (Think whatever you want.)

3. Use principle of least privilege. Use elevated privilege only when you have to, and use it for the shortest amount of time possible.

4. Use defense in depth. Use OS level ACLs as the last line of defense.

5. Avoid security through obscurity. (Microsoft ?)

6. Security features != Secure features. Also don’t ever write your own encryption algorithms (unless ?).

7. Client-side security is an Oxymoron. Don’t try it (at work or otherwise).
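
To make principle 2 a little more concrete, here is a minimal sketch of allow-list input validation in Java. The username rule (3 to 16 lowercase letters, digits or underscores) and the class name InputCheck are assumptions made up for this example, not something taken from the book.

```java
import java.util.regex.Pattern;

public class InputCheck {

    // Hypothetical rule: 3 to 16 lowercase letters, digits, or underscores.
    private static final Pattern USERNAME = Pattern.compile("[a-z0-9_]{3,16}");

    // Accept only input that matches the allow-list; everything else is rejected.
    public static boolean isValidUsername(String input) {
        return input != null && USERNAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUsername("sacharya"));
        System.out.println(isValidUsername("bob'; DROP TABLE users;--"));
    }
}
```

The point of the allow-list is that you describe what good input looks like, instead of trying to enumerate every bad input.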

All in all, this book has completely changed the way I used to look at my own code and systems, and has definitely made me a better developer and a thinker.

Now I see loopholes everywhere in my own code. Am I guaranteed to be safe from SQL Injection attacks just because I used parameterized query? Am I safe from Cross-site Scripting attacks just because I encoded the output? What if the attacker doesn’t use a browser? Did I canonicalize the filenames after the input? Although the language I use isn’t vulnerable to buffer overflows, am I safe from Integer overflows? What user is my application running as? What if somebody has already hacked into my system? While its not possible to chase after every theoretical bug within the code, we can at least prevent the ones that are obvious or extremely malicious.
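
Two of the questions above, filename canonicalization and integer overflow, can be sketched in a few lines of Java. This is only an illustration under assumptions: the base directory /srv/uploads and the names LoopholeChecks, staysInside and checkedTotal are hypothetical, and the path check is purely lexical (it does not resolve symlinks).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LoopholeChecks {

    // Canonicalize first, then check: resolve the user-supplied name against the
    // base directory, collapse "." and ".." segments, and verify the result is
    // still under the base. Lexical only; symlinks would need toRealPath().
    public static boolean staysInside(Path baseDir, String userSuppliedName) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(userSuppliedName).normalize();
        return resolved.startsWith(base);
    }

    // Plain int addition silently wraps around on overflow;
    // Math.addExact throws ArithmeticException instead.
    public static int checkedTotal(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        Path uploads = Paths.get("/srv/uploads"); // hypothetical upload directory
        System.out.println(staysInside(uploads, "notes/a.txt"));
        System.out.println(staysInside(uploads, "../etc/passwd"));
        try {
            checkedTotal(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```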

But its businesses that build softwares, right? So here comes the million dollar question:

Is it possible to write highly secure softwares without costing extra money and substantial time for the company?

Answer: In general, at least most of the time, YES! Writing secure software is a habit more than anything else. However secure code is only a small piece of the puzzle, and cannot alone make the system secure if other basics are violated.

Wordpress @ Slicehost

March 26, 2009

So I finally moved this blog from shared hosting with Godaddy to a Slicehost 256MB VPS slice running Ubuntu Hardy. The whole process of setting up DNS and installing Apache, MySql, Postfix and Wordpress (including my favourite theme and plugins) was very easy, and I didn’t run into any problems. I did back up my database with Godaddy before migration, but the ‘Export/Import as XML’ seemed to work just fine. All in all, I was able to get it up and running in about an hour with all the content migrated. When there are documents like Mensk.com and Slicehost Articles, you really don’t have anything left to think about.

With that said, I really wanted to get rid of Wordpress this time, or any other Wordpress wannabes. Wordpress is an awesome piece of software, but it’s just not what I ideally would like to have.

1. Wordpress isn’t really suited for posting long snippets of code. If you want to get it working, you end up spending some time trying to fix encoding, line wrap and syntax highlighting issues.

2. Wordpress is just too big for me. I don’t need those fancy features.

3. I don’t need a database to store a handful of rants of mine. Ideally, I would like to write a blog post in a text file (using some basic markup), and then just FTP it to a specific directory on my server, and it would just work. The day I don’t want to have a blog anymore, I would just grab that directory from my server and take it with me.

4. Everytime I see a cool plugin or a theme I wanna try, I don’t want to be looking into every single line of code to see if there is anything malicious in there.

5. Every time I hear about any new vulnerability found in Wordpress, I don’t want to be worried about doing an upgrade.

I did briefly go through the major blogging and some wiki softwares but they are all built around the same philosophy and more or less suffer from the same problems. At one point, I almost went with Webby (static site-generator based on Ruby), but then I would have to go through a separate plugin for comments like Disqus, which I didn’t want.

So eventually I had to decide between writing my own basic blogging software or using Wordpress. I chose the latter, coz I think there are things way more important to do in the world than writing your own blogging software in 2009. Well, that might be just another way of saying that I am a loser.

Invoking Private Methods

March 3, 2009

A private modifier in Java means that the member (variable or method) can only be accessed in its own class.

By rule, you should always make a class member private unless you have a reason not to. If you want a method to be visible outside of the class, you should make it public or protected. But let’s say you encounter a case when you need to invoke the private method of another class (You might need it while writing JUnit tests, or while writing debugger tools where you need to access all public and private members.). Can you access a private method of Class B from Class A? Is it possible?

Well, yeah. Use the Reflection API in Java. This will allow you to suppress default Java language access control checks when using reflected members.

The AccessibleObject class within the java.lang.reflect package contains a method setAccessible(boolean flag). A false flag will enforce Java language access checks, whereas a true flag will suppress the access checks. So by setting the flag to true, you will be able to invoke a private method of another class.

Let’s say we have a Calculator class which has a private method called add.

package access;

public class Calculator {
	private int add(Integer a, Integer b) {
		return a + b;
	}
}

Now, by using Reflection, you can get a java.lang.reflect.Method object that represents the specified method. The Method class inherits from java.lang.reflect.AccessibleObject, which provides the setAccessible(boolean flag) method that you can use to suppress the access checks.

package access;

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class MainApp {

	public static void main(String[] args) {

		Calculator ac = new Calculator();

		try {

			Class<?> c = ac.getClass();
			Class<?>[] params = new Class<?>[] { Integer.class, Integer.class };
			Method m = c.getDeclaredMethod("add", params);

			m.setAccessible(true);
			Object o = m.invoke(ac, 1, 2);

			System.out.println("The sum of the numbers is: "
					+ ((Integer) o).intValue());

		} catch (NoSuchMethodException x) {
			x.printStackTrace();
		} catch (InvocationTargetException x) {
			x.printStackTrace();
		} catch (IllegalAccessException x) {
			x.printStackTrace();
		}

	}

}

Once you set the Accessible flag to true, you can then invoke the method by passing any arguments that it requires. Running the class will print a sum of 3, which is calculated and returned by the private method ‘add’.
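
For the JUnit use case mentioned earlier, the same steps can be folded into a small reusable helper. The following is only a sketch: the class name PrivateInvoker and the method invokePrivate are made up for this example, and Calculator is redeclared (without the package) so the block compiles on its own.

```java
import java.lang.reflect.Method;

// Same shape as the Calculator above, repeated so this file is self-contained.
class Calculator {
    private int add(Integer a, Integer b) {
        return a + b;
    }
}

public class PrivateInvoker {

    // Looks up a declared (possibly private) method by name and parameter types,
    // suppresses the access check, and invokes it on the target object.
    public static Object invokePrivate(Object target, String name,
            Class<?>[] paramTypes, Object... args) throws ReflectiveOperationException {
        Method m = target.getClass().getDeclaredMethod(name, paramTypes);
        m.setAccessible(true);
        return m.invoke(target, args);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object sum = invokePrivate(new Calculator(), "add",
                new Class<?>[] { Integer.class, Integer.class }, 1, 2);
        System.out.println("The sum of the numbers is: " + sum);
    }
}
```

ReflectiveOperationException is the common superclass of the three exceptions caught in MainApp, so the helper can declare just that one.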

If you don’t set the flag to true, you will get an IllegalAccessException saying:

Class access.MainApp can not access a member of class access.Calculator with modifiers “private”.
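
To see both outcomes side by side, here is a small sketch that runs the same call with the flag on and off. The class AccessCheckDemo and the method tryAdd are hypothetical names for this example, and Calculator is redeclared as a separate top-level class so the access check actually applies.

```java
import java.lang.reflect.Method;

// A separate top-level class, so its private members are not visible to the demo class.
class Calculator {
    private int add(Integer a, Integer b) {
        return a + b;
    }
}

public class AccessCheckDemo {

    // Invokes the private add method, with or without suppressing access checks,
    // and reports what happened as a string.
    public static String tryAdd(boolean suppressChecks) {
        try {
            Method m = Calculator.class.getDeclaredMethod("add", Integer.class, Integer.class);
            m.setAccessible(suppressChecks);
            return "sum = " + m.invoke(new Calculator(), 1, 2);
        } catch (IllegalAccessException e) {
            // Thrown when the access check is still enforced.
            return "IllegalAccessException";
        } catch (ReflectiveOperationException e) {
            return "other failure: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println("flag = true:  " + tryAdd(true));
        System.out.println("flag = false: " + tryAdd(false));
    }
}
```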

Note: If there is a Security Manager, the context in which the code is run must have the suppressAccessChecks permission.