How to: Using Python and Google to find hundreds of e-mail addresses
Who has never received lots of unwanted messages in their inbox? Very few of us. As you probably know, you should never leave your e-mail address on a web page. To understand why, I suggest you study this small Python script, which scans Google search results for e-mail addresses. You may be surprised by how many results it gets.
Please note that this article is only a proof of concept. It should only be used to study and learn how easily spammers can retrieve e-mail addresses using Google and a simple Python script.
For those who might be tempted, remember that Christopher William Smith, often referred to as the king of spam, has recently been sentenced to 30 years in prison…
Well, enough discussion. Let's look at what we came here for, the code itself:
#!/usr/bin/python
import sys
import re
import urllib2

def StripTags(text):
    # Repeatedly cut out the first <...> span until no complete tag is left.
    finished = 0
    while not finished:
        finished = 1
        start = text.find("<")
        if start >= 0:
            stop = text[start:].find(">")
            if stop >= 0:
                text = text[:start] + text[start+stop+1:]
                finished = 0
    return text

if len(sys.argv) != 2:
    print "\nrsx.py : Find hundreds of e-mail addresses on Google.\n"
    print "\nUsage : ./rsx.py <domain>\n"
    print "\nexample: ./rsx.py gmail.com \n"
    sys.exit(1)

domain_name = sys.argv[1]
# Escape the domain so that its dots only match literal dots.
pattern = r'([\w\.\-]+@' + re.escape(domain_name) + ')'
d = {}  # the keys of this dictionary are the unique addresses found

# First pass: Google Groups, 40 pages of 10 results each.
page_counter = 0
try:
    while page_counter < 400:
        results = 'http://groups.google.com/groups?q='+str(domain_name)+'&hl=en&lr=&ie=UTF-8&start=' + repr(page_counter) + '&sa=N'
        request = urllib2.Request(results)
        request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
        opener = urllib2.build_opener()
        text = opener.open(request).read()
        emails = re.findall(pattern, StripTags(text))
        for email in emails:
            d[email] = 1
        page_counter = page_counter + 10
except IOError:
    print "No result found!"

# Second pass: Google web search for "@domain" (%40 is the URL-encoded "@").
page_counter_web = 0
try:
    print "\n\n+++++++++++++++++++++++++++++++++++++++++++++++++++++"
    print "+ Results:"
    print "+++++++++++++++++++++++++++++++++++++++++++++++++++++\n\n"
    # The loop must run while the counter is below 400, not above it.
    while page_counter_web < 400:
        results_web = 'http://www.google.com/search?q=%40'+str(domain_name)+'&hl=en&lr=&ie=UTF-8&start=' + repr(page_counter_web) + '&sa=N'
        request_web = urllib2.Request(results_web)
        request_web.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
        opener_web = urllib2.build_opener()
        text = opener_web.open(request_web).read()
        emails_web = re.findall(pattern, StripTags(text))
        for email_web in emails_web:
            d[email_web] = 1
        page_counter_web = page_counter_web + 10
except IOError:
    print "No results found!"

# Print every unique address collected by both passes.
for uniq_email in d.keys():
    print uniq_email
This command line program should be launched this way (assuming you named the file rsx.py):
python rsx.py gmail.com
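The heart of the script is the combination of StripTags and re.findall, and that part can be tried offline without sending a single request. The following is only a minimal sketch of that step, written for Python 3 rather than the Python 2 of the script. The names strip_tags, find_emails and sample are hypothetical, the tag stripping is done with a regular expression instead of the original loop, and the domain is passed through re.escape so that its dots match literally.

```python
import re

def strip_tags(html):
    """Remove anything that looks like an HTML tag (<...>)."""
    return re.sub(r"<[^>]*>", "", html)

def find_emails(html, domain):
    """Return the sorted, de-duplicated addresses at `domain` found in `html`."""
    pattern = r"[\w.\-]+@" + re.escape(domain)
    return sorted(set(re.findall(pattern, strip_tags(html))))

# Hypothetical page fragment, standing in for a downloaded result page.
# The first address is split by a <b> tag, as markup in a page may do.
sample = (
    "<p>Write to <b>alice</b>@example.com or bob.smith@example.com.</p>"
    "<p>Again: bob.smith@example.com, but not carol@exampleXcom.</p>"
)

print(find_emails(sample, "example.com"))
# prints ['alice@example.com', 'bob.smith@example.com']
```

Stripping the tags first is what lets the first address be found at all, and the set removes the duplicate in the same way the dictionary keys do in the script.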
It’s impressive to see how many e-mail addresses you can find with only 67 lines of code!
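The script relies on urllib2, which only exists in Python 2. For readers on Python 3, the paging logic might be sketched as below. This is an assumption-laden illustration, not a tested replacement: result_page_urls and fetch are hypothetical names, the query parameters are copied from the script's web search URL, and whether the search engine still answers such requests is not something the sketch checks. The fetch function is defined but deliberately not called.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same User-Agent string as in the script.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"

def result_page_urls(domain, pages=40, per_page=10):
    """Yield one search URL per result page, mirroring the script's loop."""
    for start in range(0, pages * per_page, per_page):
        query = urlencode({
            "q": "@" + domain,   # urlencode turns "@" into %40
            "hl": "en",
            "lr": "",
            "ie": "UTF-8",
            "start": start,      # 0, 10, 20, ... selects the page
            "sa": "N",
        })
        yield "http://www.google.com/search?" + query

def fetch(url):
    """Download one page as text. Needs network access, so it is not run here."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")

urls = list(result_page_urls("gmail.com"))
print(len(urls))   # number of pages that would be requested
print(urls[0])     # first page, start=0
print(urls[-1])    # last page, start=390
```

Building the query with urlencode avoids hand-written escapes such as %40 and keeps the start parameter, which does the paging, in one visible place.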
I do this with awk in 26 lines…
@KINGSPAMMER: Interesting, any example to share with us?
Impressive, but I hope this will not be used for spam.
not working
File “rsx.py”, line 16
stop = text[start:].find(.>.)
@internet: Which Python version do you use?
Let's see if it works
Don’t you have something I can just type into Google?
Google doesn’t seem to take the “@”-sign. So I cannot search for e-mail-addresses! (Is this for spam-prevention?)
Why I want this:
I found a site on the net where people can publish texts anonymously, but some type their e-mail addresses in the text so that you can answer them. What I wanted to do is search the site (using Google’s “site:” operator) for texts where this is the case, and therefore search for any e-mail addresses.
Is there a way or would that already be illegal?
Sorry for being stupid. You seem to be really clever and know about things!
Pretty amazing how such simple code can do so much damage! jk jk
You should look at installing commentluv or disqus on your blog. This will make it dofollow and hence you will gain more traffic.
[...] way of finding e-mails for spamming … I read this article on Cats Who Code; they also say that spamming is punishable by [...]
This sounds interesting. And with only 67 lines! Quick question: will this script also work on older versions of Python?
@KINGSPAMMER: Would you mind sharing how you did it with only 26 lines?
Wow, what a great snippet with only 67 lines. Any sample of how I can do this in PHP?
LOL, we can’t avoid spammers :/