There are many valid use cases in which you need to protect your identity while communicating over the public internet. It is 2013, so you probably already know about Tor. Most people use Tor through the browser. The cool thing is that you can also access the Tor network programmatically, so you can build interesting tools with privacy built in.
The most common reason to hide your identity with Tor, or to change identities programmatically, is crawling a website like Google (well, this one is harder than you think) when you don’t want to be rate-limited or blocked.
It took a fair amount of trial and error to get this working, though.
Tor
First of all, let’s install Tor.
apt-get update
apt-get install tor
/etc/init.d/tor restart
You will notice that the SOCKS listener is on port 9050.
Let’s enable the ControlPort so that Tor listens on port 9051. This is the port on which Tor accepts commands from applications that talk to the Tor controller. The hashed password enables authentication on that port, so that not just anyone can connect to it.
You can create a hashed password out of your password using:
tor --hash-password mypassword
So, update the torrc with the port and the hashed password.
/etc/tor/torrc
ControlPort 9051
HashedControlPassword 16:872860B76453A77D60CA2BB8C1A7042072093276A3D701AD684053EC4C
Restart Tor again so the configuration changes are applied.
/etc/init.d/tor restart
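Note that tor --hash-password prints a different string every time, even for the same password. That is expected, because the value is salted. As far as I understand the format, it is "16:" followed by 8 random salt bytes, one iteration-count byte (0x60) and a SHA-1 digest, all written in hex. The sketch below reproduces that layout in Python 3 only to illustrate why the output varies; tor_style_hash is a name made up here, and the tor binary remains the authoritative way to generate the value.

```python
import hashlib
import os


def tor_style_hash(password, salt=None):
    """Illustrative sketch of the salted value printed by `tor --hash-password`."""
    if salt is None:
        salt = os.urandom(8)  # a fresh random salt per call, hence a new hash each time
    indicator = 0x60  # iteration-count byte (assumed to match what Tor emits)
    # RFC 2440 iterated S2K: decode the count byte into a number of bytes to hash
    count = (16 + (indicator & 15)) << ((indicator >> 4) + 6)
    data = salt + password.encode("utf-8")
    digest = hashlib.sha1()
    # Feed salt+password repeatedly until exactly `count` bytes have been hashed
    full, rest = divmod(count, len(data))
    digest.update(data * full)
    digest.update(data[:rest])
    body = salt + bytes([indicator]) + digest.digest()
    return "16:" + body.hex().upper()
```

Two calls with the same password give different strings unless the salt is fixed, so any one output of tor --hash-password can go into torrc.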
PyTorCtl
Next, we will install pytorctl, which is a Python module for interacting with the Tor controller. This lets us send commands to the Tor control port and read its responses programmatically.
apt-get install git
apt-get install python-dev python-pip
git clone git://github.com/aaronsw/pytorctl.git
pip install pytorctl/
Privoxy
Tor itself is not an HTTP proxy. So in order to get access to the Tor network, we will use Privoxy as an HTTP proxy that forwards through SOCKS5.
Install Privoxy.
apt-get install privoxy
Now let’s tell Privoxy to use Tor. The following setting makes Privoxy route all traffic through the SOCKS server at localhost port 9050.
Go to /etc/privoxy/config and enable forward-socks5:
forward-socks5 / localhost:9050 .
Restart Privoxy after making the change to the configuration file.
/etc/init.d/privoxy restart
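Before running any script, it can save time to confirm that all three listeners are up. A minimal Python 3 sketch, assuming the default ports mentioned in this post (9050 for SOCKS, 9051 for the control port, 8118 for Privoxy) and that everything runs on localhost; port_is_open, check_listeners and DEFAULT_PORTS are helper names invented here.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_listeners(ports, host="127.0.0.1"):
    """Map each service name to True/False depending on whether its port answers."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


# Default ports used in this post; adjust if your configuration differs.
DEFAULT_PORTS = {"Tor SOCKS": 9050, "Tor control": 9051, "Privoxy": 8118}
```

Calling check_listeners(DEFAULT_PORTS) returns a dictionary of service names and booleans. A False for the control port often means ControlPort is still commented out in torrc, or Tor was not restarted after the change.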
Script:
In the script below, we use urllib2 with Privoxy as the proxy. Privoxy listens on port 8118 by default and forwards the traffic to port 9050, where the Tor SOCKS listener is.
Additionally, in the renew_connection() function, I send a signal to the Tor controller to change the identity, so you get new identities without restarting Tor. You don’t have to change the IP, but it comes in handy when you are crawling and don’t want to be blocked based on IP.
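Under the hood, the identity change is a short text exchange on the control port: the client authenticates, then sends SIGNAL NEWNYM, and Tor answers each successful command with a line starting with 250. The Python 3 sketch below does this with a plain socket instead of TorCtl, purely to show what travels over the wire. send_newnym is a hypothetical helper; it assumes password authentication as configured above and does not escape quotes or backslashes in the password.

```python
import socket


def send_newnym(password, host="127.0.0.1", port=9051, timeout=10.0):
    """Authenticate to the Tor control port and request a new identity.

    Returns True if Tor answered "250" to both AUTHENTICATE and SIGNAL NEWNYM.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:

            def command(line):
                # Control-port commands are single lines terminated by CRLF
                sock.sendall(line.encode("ascii") + b"\r\n")
                reply = reader.readline().decode("ascii", "replace")
                return reply.startswith("250")

            # Assumption: the password contains no quotes or backslashes
            if not command('AUTHENTICATE "%s"' % password):
                return False
            ok = command("SIGNAL NEWNYM")
            command("QUIT")
            return ok
```

A False return here corresponds to the situations where the script fails to get a usable connection, for example a wrong password or a control port that is not enabled.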
ip_renew.py
from TorCtl import TorCtl
import urllib2

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers={'User-Agent':user_agent}

def request(url):
    def _set_urlproxy():
        proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
        opener = urllib2.build_opener(proxy_support)
        urllib2.install_opener(opener)
    _set_urlproxy()
    request=urllib2.Request(url, None, headers)
    return urllib2.urlopen(request).read()

def renew_connection():
    conn = TorCtl.connect(controlAddr="127.0.0.1", controlPort=9051, passphrase="your_password")
    conn.send_signal("NEWNYM")
    conn.close()

for i in range(0, 10):
    renew_connection()
    print request("http://icanhazip.com/")
Running the script:
python ip_renew.py
Now, watch your IP change every few seconds.
Use it, but don’t abuse it.
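Tor does not necessarily switch circuits the instant it receives NEWNYM, and it may rate-limit repeated requests, so back-to-back fetches can return the same address. One way to handle that is to poll until the reported address differs from the previous one. The following is a rough Python 3 sketch (the script above is Python 2); make_opener, fetch_ip and wait_for_new_ip are names made up here, while the Privoxy address and the icanhazip.com URL are the ones used earlier in this post.

```python
import time
import urllib.request


def make_opener(proxy="127.0.0.1:8118"):
    """Build an opener that sends HTTP requests through Privoxy."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


def fetch_ip(opener, url="http://icanhazip.com/", timeout=30):
    """Return the public IP address reported by the echo service."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def wait_for_new_ip(get_ip, old_ip, attempts=10, delay=2.0):
    """Poll get_ip() until it differs from old_ip; return None if it never does."""
    for _ in range(attempts):
        current = get_ip()
        if current != old_ip:
            return current
        time.sleep(delay)
    return None
```

After asking Tor for a new identity, a loop could call wait_for_new_ip(lambda: fetch_ip(opener), old_ip) and carry on only once it returns an address. A new circuit can also end at the same exit relay, so a None result is not necessarily an error.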
Hi
Thanks for this tutorial. How are you getting the hashed password?
I tried to use this:
tor --hash-password your_password
But every time it returns a different hashed password. How should I do that?
Thanks
Arul
I have automated the process for both Ubuntu and CentOS.
https://github.com/arulrajnet/operationalscripts/blob/master/tools/tor_installer.py
Hi
Thanks for this tutorial. I have tried everything mentioned in the tutorial above and my program runs, but all 10 times I get the same IP address.
Please help me solve this problem.
Thanks
pavan
I ran into this issue also and found a fix. It takes time to change identity through tor, so I edited the ip_renew.py file to handle the issue.
1) Create two new variables underneath the “headers” variable:
oldIP = "0.0.0.0"
newIP = "0.0.0.0"
2) Change the for loop to:
for i in range(0, 10):
    if oldIP == "0.0.0.0":
        renew_connection()
        oldIP = request("http://icanhazip.com/")
    else:
        oldIP = request("http://icanhazip.com/")
        renew_connection()
        newIP = request("http://icanhazip.com/")

    while oldIP == newIP:
        newIP = request("http://icanhazip.com/")
    print request("http://icanhazip.com/")
Hi,
I get an “invalid syntax” error for print request.
Could you help me?
Ballman.
On line 17, what should I fill in for the password (passphrase="your_password")?
Whoa… there really is all kinds of stuff out there. Yikes.
[…] is a socks proxy. Connecting to it directly with the example you cite fails with “urlopen error Tunnel connection failed: 501 Tor is not an HTTP Proxy”. As […]
Hi all,
I am new to this kind of thing. I want to start an automated ,,traing” system with some associates, but my local provider is blocking me. If someone is interested in researching this idea and joining the project, please drop me an email at:
brigadafulger@gmail.com
Thanks
Hello, greetings to all. First of all, thank you for your work and for sharing.
I am just starting out in programming, and running the code gives me the following error:
Failed to read authentication cookie (permission denied): /var/run/tor/control.authcookie
Traceback (most recent call last):
  File "ip_renew.py", line 22, in <module>
    renew_connection()
  File "ip_renew.py", line 18, in renew_connection
    conn.send_signal("NEWNYM")
AttributeError: 'NoneType' object has no attribute 'send_signal'
256
I wonder whether Tor needs additional configuration, or whether I am running it incorrectly.
Thank you for your understanding and support on this issue.
Your Tor may be version 0.2.5.1; you need to update to 0.2.7.1.
Here is the correct procedure:
https://www.torproject.org/docs/debian.html.en
You need to remove the hashtags from the relevant lines in the torrc file.
Specifically, the lines that read
## The port on which Tor will listen for local connections from Tor
## controller applications, as documented in control-spec.txt.
#ControlPort 9051
## If you enable the controlport, be sure to enable one of these
## authentication methods, to prevent attackers from accessing it.
#HashedControlPassword ……
should read
## The port on which Tor will listen for local connections from Tor
## controller applications, as documented in control-spec.txt.
ControlPort 9051
## If you enable the controlport, be sure to enable one of these
## authentication methods, to prevent attackers from accessing it.
HashedControlPassword …….
Thanks for the simple and effective tutorial.
Works like a breeze with 0 errors on:
Ubuntu Server 15.10
python 2.7.10
Hi,
I’m getting an error:
Connection refused. Is the ControlPort enabled?
Traceback (most recent call last):
  File "ip_renew.py", line 22, in <module>
    renew_connection()
  File "ip_renew.py", line 18, in renew_connection
    conn.send_signal("NEWNYM")
AttributeError: 'NoneType' object has no attribute 'send_signal'
I tried a different port, but it made no difference.
Hey guys,
I was wondering if it is possible to be traced back when using this kind of crawling.
Does anyone have enough experience to answer this question?
Any help will be warmly received
hi! this is awesome, but why did you use TorCtl instead of Stem? TorCtl’s been deprecated for some time, and occasionally spits errors. I think Stem may be a cleaner option, in this case.
thanks again for the tutorial!!