Month: January 2009

REST WS is like fastfood!

January 17, 2009

I don’t know why I always had this picture in my mind. When I think of SOAP, I think of a typical public company and its IT department in a never-ending integration dilemma. When I think of REST, I mistakenly visualize some recently hacked social networking site or web2.0 application surviving on Google ads. I don’t know where that image came from, but I know it’s not fair and not always true.

SOAP wasn’t the easiest or the best, but there wasn’t anything better - up until REST emerged, when suddenly people started typecasting SOAP as bulky and over-engineered for most problems. I went with the crowd. If SOAP WS was the protocol of the Enterprise, REST brought a new set of developers to the game. REST was the protocol of the get-it-done and we-will-see believers. 

While I have been consuming RESTful web services for a couple of years now, I hadn’t actually designed and implemented one yet. But I knew HTTP, so I thought I knew REST. Just represent a resource in the URL and let different actions be invoked on it. If you use REST, you are probably using Rails or Grails or Django or the like. That makes it even easier. Just make your domain object addressable through the URL, and that’s all. We will address the other concerns in future iterations.

But once I started writing one myself, I started digging into books, articles and discussions looking for technical details of the RESTful world. All I saw was a mostly bewildered crowd and a small creative bunch that had already embraced REST. And all the advice and guidelines I could find came from purists and fanatics who stamp everything: this is Low REST, this is High REST, this is POX (Plain Old XML), this is unRESTfully REST, this is RESTfully unREST. I don’t understand why the RESTful Quotient is so overemphasized. Who the hell cares what Roy Fielding meant by REST in 2000? We just care about RESTful web services over HTTP, with an emphasis on every word except REST. For me, RESTful services mean a way to expose some functionality through the URL. Whether I want to be disciplined about what I call resources is my design decision, but REST as an architectural style still has to answer my enterprise concerns.

For me, a web service, whether it is RPC or SOAP or REST, has to answer the same enterprise concerns. The only difference is how each approaches them:

1. Simple: Ok, REST is simple. REST is simple if you are talking about the barest minimum it takes to get it up and running. 

2. Secure: There are new standards like OAuth and HMAC-based authentication that make REST a little too complex, but they solve only a limited set of problems and scenarios. How do you secure your resource? If you are POSTing or PUTting an XML file, how do you secure it? How can you keep talking about REST without talking about an authentication mechanism for XML, or for any data exchange format for that matter? That takes us back to the SOAP envelope and WS-Security or something similar, doesn’t it?

Developers work on tight deadlines. Since security isn’t a built-in aspect of REST, developers tend to sacrifice it to meet deadlines, just like they used to do with JUnit. That makes REST security practically a nice-to-have. The few REST services that I have used all dealt with critical information, but one of them didn’t have even rudimentary authentication, one had HTTP Basic, and at most they had Basic over SSL.

3. Transactional: Is a resource the correct unit of transaction? What if you need to deal with transactions across multiple resources? Do you handle that on the server side or let the client handle it? If I make a convenience API to deal with transactions and expose it as a URL, would you blame me for breaching the RESTful contract? How important is being truly RESTful? Does REST even want to address transactions, or are PUT and POST on a resource enough?

4. Efficient: How can it be efficient without support for complex transactions? Whether it is REST or not, a web service exists to give clients appropriate control over their assets, and it should be reasonably fast. Again, is providing the necessary functionality important, or is being RESTful important? How do I CRUD on a collection of resources? Most of the RESTful public APIs that I have used either provide me too much (they are painfully slow) or provide me too little. What I need is information, and it could be spread across multiple resources.

Is a RESTful service anything more than pretty URLs taking and returning XML/JSON? Does it really make sense when I need information and action across resources? Is it still fine if there is no API definition and the API keeps changing? Is it still fine if it only does the happy path well? How many developers know enough about REST to ensure that the standards that should have been specified and enforced by the architecture are met? It’s scary that too many critical web services have already been written in REST just for the sake of ease, without much thought put into it.

I will probably never have to go back to SOAP again (it’s not my decision, though), but the day REST has answered the important questions, it will look like some sort of SOAP WS stack, back to where we started.

Until then, REST is like fastfood. SOAP WS is a little tedious to cook, but you won’t regret it.

Download Youtube video

January 11, 2009

Downloading Youtube videos has been covered by a number of bloggers in a number of languages. I just wanted to try it out myself in Erlang, but I had no idea where to start.

Now that I’ve decided to start, let me grab an Eddie Vedder song from Youtube:

http://www.youtube.com/watch?v=gct6BB6ijcw

And let’s sniff the HTTP traffic using Wireshark to see what actually happens when we watch a Youtube video. The browser made a connection to Youtube, and before it started streaming the video, the input URL got redirected to a different Youtube URL (depending on whether Youtube is caching the video or not, it could probably be redirected in turn to a different cache server or a different IP):

No.     Time        Source                Destination           Protocol Info
     96 3.295105    xx.xx.xx.xx        208.65.153.253        HTTP     GET /get_video?video_id=gct6BB6ijcw&t=OEgsToPDskJ6n06uQXzbbyp7xAnxK6pN&el=detailpage&ps= HTTP/1.1

No.     Time        Source                Destination           Protocol Info
    115 3.879454    xx.xx.xx.xx        209.85.239.30         HTTP     GET /get_video?origin=lax-v113.lax.youtube.com&video_id=gct6BB6ijcw&ip=xx.xx.xx.xx&region=0&signature=587F68CED7B14F380192AAB1D58942F0EAB9AE7B.6379C474E29D2B2348E1A69954D7FACFC461F964&sver=2&expire=1231731436&key=yt4&ipbits=0 HTTP/1.1

Upon playing with the new URL in the browser, I realized that the URL that Youtube gets the video from is

http://youtube.com/get_video?video_id=gct6BB6ijcw&t=OEgsToPDskJ-EKxTpxj79WK0fWWs_YjO

The param “video_id” is the same as the param “v” in the original URL. So we only need to find the value for the param “t”.
Let’s look at the HTML source for

http://www.youtube.com/watch?v=gct6BB6ijcw

and see if it contains the value for the parameter “t”. Luckily, grepping for “&t=”, I found this in the source:

&t=OEgsToPDskKrsk1Xwku653CJbAXXrdJb

The value of “t” in the HTML source and the value of “t” in the redirect URL are different, but I found that both values seemed to work when appended to the URL. I tried it again just to make sure and realized that the value of the parameter “t” changes for every request, but all values seem to work (probably it is timestamp dependent!).

So we have a plan now:
1. Get a Youtube video URL and make an HTTP request to it.
2. Get the body of the response and find the value of “t” using regex pattern matching.
3. Generate the proper redirect URL using the two parameters “video_id” and “t”.
4. Make an HTTP request to the new URL, stream the bytes and write them to a file with the “.flv” extension.

Here is the full source code in Erlang: 

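%% Downloads a Youtube video as an .flv file.
%% Uses the http and regexp modules that OTP shipped at the time of writing;
%% later OTP releases dropped both in favour of httpc and re.
-module(video_downloader).
-export([download/1]).

%% Fetch the watch page, work out the video URL and save the video.
download(URL) ->
    {ok, {_Status, _Header, Body}} = http:request(URL),
    Video_URL = get_video_download_url(URL, Body),
    stream_video(Video_URL).

%% Request the video and write the whole response body to myvideo.flv.
stream_video(Video_URL) ->
    io:format("Downloading video from ~p~n", [Video_URL]),
    {ok, {_Status, _Header, Body}} = http:request(Video_URL),
    ok = file:write_file("myvideo.flv", Body),
    io:format("Download complete!").

%% Find the "&t=..." fragment in the page source and append it to the
%% watch URL rewritten as a get_video URL.
get_video_download_url(URL, Body) ->
    Matcher = "&t=[A-Za-z0-9-_]*",
    {match, Start, Length} = regexp:first_match(Body, Matcher),
    T = string:substr(Body, Start, Length),
    {ok, New, _No} = regexp:sub(URL, "watch\\?v=", "get_video?video_id="),
    New ++ T.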

Let’s run it from the Erlang shell:

1> c(video_downloader.erl).
{ok,video_downloader}
2> inets:start().
ok
3> video_downloader:download("http://www.youtube.com/watch?v=gct6BB6ijcw").
Downloading video from "http://www.youtube.com/get_video?video_id=gct6BB6ijcw&t\
=OEgsToPDskLUZgy2pfyoRf-AtXCdHhYG"
Download complete!ok

Go to your current directory and use any FLV viewer to check that the downloaded file is a working video, or convert it to the format of your choice and watch it offline.
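For anyone trying this on a newer Erlang/OTP release, where the http and regexp modules are no longer available: steps 2 to 4 of the plan could look roughly like the sketch below, which uses re and httpc instead. The module name video_url and the functions download_url/2 and save/2 are made up for this sketch and are not part of the code above. It also assumes that the page source still contains an “&t=” fragment, that the get_video URL scheme is unchanged, and that inets:start() has been called before save/2.

```erlang
-module(video_url).
-export([download_url/2, save/2]).

%% Steps 2 and 3: pull the "&t=..." fragment out of the page source and
%% append it to the watch URL rewritten as a get_video URL.
%% Returns {ok, Url} or {error, token_not_found} instead of crashing.
download_url(WatchUrl, Body) ->
    %% The hyphen sits last in the class so it is read as a literal.
    case re:run(Body, "&t=[A-Za-z0-9_-]+", [{capture, first, list}]) of
        {match, [T]} ->
            Base = re:replace(WatchUrl, "watch\\?v=", "get_video?video_id=",
                              [{return, list}]),
            {ok, Base ++ T};
        nomatch ->
            {error, token_not_found}
    end.

%% Step 4: the {stream, FileName} option asks httpc to write the body
%% straight to the file instead of returning it as one big term.
save(VideoUrl, FileName) ->
    httpc:request(get, {VideoUrl, []}, [], [{stream, FileName}]).
```

The main difference from the original is that a missing “t” value comes back as an error tuple rather than a badmatch, and that the video body never has to sit in memory as a whole.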

I would love to give it an Erlang twist by spawning a few concurrent processes to download videos, but this is not a good example to try that from my local box: too much disk IO, too much network IO and a slow Internet connection.
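Still, the concurrent version is not much code. Here is a minimal sketch of the idea: spawn one process per item, let each process run the same function, and collect the results in the original order. The module name parallel_download and the function run/2 are hypothetical, and the worker function is passed in as an argument so that the sketch does not depend on the network.

```erlang
-module(parallel_download).
-export([run/2]).

%% Run Fun(Item) for every item in its own process and return the
%% results in the same order as Items.
run(Fun, Items) ->
    Parent = self(),
    Refs = [start_worker(Parent, Fun, Item) || Item <- Items],
    %% Ref is already bound here, so each receive waits for the reply
    %% of one specific worker.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Spawn one worker and return the reference that tags its reply.
start_worker(Parent, Fun, Item) ->
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, safe_apply(Fun, Item)} end),
    Ref.

%% Turn a crash in the worker into an error tuple so that the parent
%% is never left waiting for a reply that will not come.
safe_apply(Fun, Item) ->
    try
        {ok, Fun(Item)}
    catch
        Class:Reason -> {error, {Class, Reason}}
    end.
```

With the downloader above this would be called as parallel_download:run(fun video_downloader:download/1, ListOfURLs), but note that stream_video/1 always writes to myvideo.flv, so the file name would have to vary per video before concurrent downloads make sense.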