Crawling anonymously with Tor in Python

March 5, 2014

There are a lot of valid use cases where you need to protect your identity while communicating over the public internet. It is 2013, so you probably already know about Tor. Most people use Tor through the browser. The cool thing is that you can access the Tor network programmatically, so you can build interesting tools with privacy built in.

The most common use case for hiding your identity with Tor, or for changing identities programmatically, is crawling a website like Google (well, this one is harder than you think) when you don’t want to be rate-limited or blocked.

It did take a fair amount of trial and error to get this working, though.
First of all, let’s install Tor.

apt-get update
apt-get install tor
/etc/init.d/tor restart

You will notice that the SOCKS listener is on port 9050.
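
One quick way to confirm that the listener is up is to try opening a TCP connection to the port. This is a minimal sketch in Python 3 (the script later in this post targets Python 2); the helper name port_is_open is made up for illustration, and the address assumes Tor runs locally with its default SOCKS port.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError if nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("127.0.0.1", 9050) should return True once Tor has started. It only shows that something is listening on the port, not that Tor has finished building circuits.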

Let’s enable the ControlPort listener so that Tor listens on port 9051. This is the port Tor listens on for commands from applications talking to the Tor controller. The hashed password enables authentication on that port, so that not just any process can connect and control Tor.

You can create a hashed password out of your password using:

tor --hash-password mypassword
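
If you are curious what that command produces: to the best of my understanding, the value is a salted, iterated SHA-1 hash (the RFC 2440 "iterated and salted S2K" scheme), written as 16: followed by the hex of 8 salt bytes, one iteration-count byte, and the 20-byte digest. The sketch below is an illustration in Python 3 under that assumption; the function name tor_hash_password is hypothetical, and for a real torrc you should still use the tor command itself.

```python
import binascii
import hashlib
import os

def tor_hash_password(password, salt=None):
    """Illustrative sketch of the hash format printed by `tor --hash-password`."""
    if salt is None:
        salt = os.urandom(8)              # 8 random salt bytes
    indicator = 0x60                      # iteration-count specifier byte
    # Decode the specifier into the number of bytes to hash (65536 for 0x60).
    count = (16 + (indicator & 15)) << ((indicator >> 4) + 6)
    data = salt + password.encode("utf-8")
    digest = hashlib.sha1()
    while count > 0:
        chunk = data[:count]              # the last repetition may be truncated
        digest.update(chunk)
        count -= len(chunk)
    spec = salt + bytes([indicator])
    hexed = binascii.hexlify(spec + digest.digest()).decode("ascii").upper()
    return "16:" + hexed
```

Because the salt is random, running this twice on the same password gives different strings, which is also why tor --hash-password prints a different value each time.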

Then update the torrc (/etc/tor/torrc on Debian and Ubuntu) with the port and the hashed password.


ControlPort 9051
HashedControlPassword 16:872860B76453A77D60CA2BB8C1A7042072093276A3D701AD684053EC4C

Restart Tor again so that the configuration changes are applied.

/etc/init.d/tor restart

Next, we will install pytorctl, a Python module for interacting with the Tor controller. It lets us send commands to the Tor control port, and read the replies, programmatically.

apt-get install git
apt-get install python-dev python-pip
git clone https://github.com/aaronsw/pytorctl.git
pip install pytorctl/
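
To get a feel for what a library like pytorctl does under the hood, here is a rough Python 3 sketch that talks to the control port directly over a socket. It is not pytorctl's code: the function names are hypothetical, the host and port assume the ControlPort 9051 setup above, and the reply handling is deliberately naive (a single recv per command).

```python
import socket

def quote_password(password):
    """Wrap a password as a quoted string, escaping backslashes and quotes."""
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def newnym_commands(password):
    """Return the command lines that authenticate and ask for a new identity."""
    return [
        "AUTHENTICATE " + quote_password(password) + "\r\n",
        "SIGNAL NEWNYM\r\n",
        "QUIT\r\n",
    ]

def send_newnym(password, host="127.0.0.1", port=9051):
    """Send the commands to the control port and return the raw replies."""
    replies = []
    with socket.create_connection((host, port), timeout=10) as sock:
        for line in newnym_commands(password):
            sock.sendall(line.encode("utf-8"))
            replies.append(sock.recv(1024).decode("utf-8", "replace").strip())
    return replies
```

With a correct password, the first two replies should each be "250 OK". Note that you send the plain password here, not the hashed value stored in the torrc.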

Tor itself is not an HTTP proxy. So, in order to get access to the Tor network, we will use Privoxy as an HTTP proxy that forwards traffic to Tor over SOCKS5.

Install Privoxy.

apt-get install privoxy

Now let’s tell Privoxy to use Tor.
Go to /etc/privoxy/config and enable forward-socks5. This tells Privoxy to route all traffic through the SOCKS server at localhost port 9050:

forward-socks5 / localhost:9050 .

Restart Privoxy after making the change to the configuration file.

/etc/init.d/privoxy restart

In the script below, we’re using urllib2 with the proxy. Privoxy listens on port 8118 by default and forwards the traffic to port 9050, where the Tor SOCKS listener is.
Additionally, in the renew_connection() function, I am sending a signal to the Tor controller to change the identity, so you get a new identity without restarting Tor. You don’t have to change the IP, but it comes in handy when you are crawling and don’t want to be blocked based on IP.


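from TorCtl import TorCtl
import urllib2

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}

def request(url):
    # Route the request through Privoxy (port 8118), which forwards it to Tor.
    def _set_urlproxy():
        proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
        opener = urllib2.build_opener(proxy_support)
        urllib2.install_opener(opener)
    _set_urlproxy()
    request = urllib2.Request(url, None, headers)
    return urllib2.urlopen(request).read()

def renew_connection():
    # Ask the Tor controller for a new identity by sending the NEWNYM signal.
    conn = TorCtl.connect(controlAddr="127.0.0.1", controlPort=9051, passphrase="your_password")
    conn.send_signal("NEWNYM")
    conn.close()

for i in range(0, 10):
    renew_connection()
    print request("http://icanhazip.com/")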
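
If you are on Python 3, where urllib2 no longer exists, the proxy part can be sketched with urllib.request instead. This is only an approximation of the same idea: the names build_tor_opener and fetch are made up, the proxy address assumes Privoxy's default port 8118 on localhost, and mapping https to the proxy as well is my addition so that HTTPS requests do not silently bypass Tor.

```python
import urllib.request

PRIVOXY = "http://127.0.0.1:8118"   # assumed: Privoxy's default listen address

def build_tor_opener(proxy=PRIVOXY):
    """Build an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)

def fetch(url, opener=None, timeout=30):
    """Fetch a URL through the proxy and return the body as text."""
    opener = opener or build_tor_opener()
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener.open(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace")
```

With Tor and Privoxy running, fetch("http://icanhazip.com/") should return the address of a Tor exit node rather than your own. Building a dedicated opener, instead of calling install_opener, keeps the proxy from affecting other code in the same process.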

Running the script:

python ip_renew.py

Now, watch your IP change every few seconds.
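
One caveat: as far as I know, Tor may rate-limit new-identity requests, so asking for one before every single fetch does not guarantee a fresh IP each time. A possible way to space the renewals out is sketched below in Python 3. The function name crawl_with_rotation is hypothetical, fetch and renew stand for whatever request and renewal functions you use, and the 10-second default is an assumption rather than a documented guarantee.

```python
import time

def crawl_with_rotation(urls, fetch, renew, min_interval=10.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Fetch each URL, requesting a new identity before every fetch.

    Renewals are kept at least min_interval seconds apart.
    """
    results = []
    last_renew = None
    for url in urls:
        if last_renew is not None:
            # Wait out whatever is left of the interval since the last renewal.
            wait = min_interval - (clock() - last_renew)
            if wait > 0:
                sleep(wait)
        renew()
        last_renew = clock()
        results.append(fetch(url))
    return results
```

The clock and sleep parameters exist only so the pacing can be exercised without real waiting; in normal use you would leave them at their defaults.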

Use it, but don’t abuse it.