Who has never received a flood of unwanted messages in their inbox? Very few of us. You probably already know that you should never leave your e-mail address on a web page. To understand why, I suggest you study this small Python script, which scans Google search results for e-mail addresses. You may be surprised by how many results it gets.

Please note that this article is only a proof of concept. It may only be used to study and learn how easily spammers can retrieve e-mail addresses using Google and a simple Python script.
For anyone tempted by the idea, remember that Christopher William Smith, often referred to as the king of spam, has recently been sentenced to 30 years in jail…

Enough discussion, let's get to what we are really interested in, the code itself:

#!/usr/bin/python
# Python 2 script: it relies on urllib2 and the print statement.

import sys
import re
import urllib2

def StripTags(text):
    # Repeatedly cut out the first "<...>" span until no complete tag is left.
    finished = False
    while not finished:
        finished = True
        start = text.find("<")
        if start >= 0:
            stop = text[start:].find(">")
            if stop >= 0:
                text = text[:start] + text[start+stop+1:]
                finished = False
    return text

if len(sys.argv) != 2:
    print "\nrsx.py : Find hundreds of e-mail addresses on Google.\n"
    print "\nUsage : ./rsx.py <domain_name>\n"
    print "\nExample: ./rsx.py gmail.com \n"
    sys.exit(1)

domain_name=sys.argv[1]
d={}  # the dictionary keys act as a set of unique addresses
page_counter = 0
try:
    # Google Groups: walk through the result pages, 10 results at a time
    while page_counter <400:
        results = 'http://groups.google.com/groups?q='+str(domain_name)+'&hl=en&lr=&ie=UTF-8&start=' + repr(page_counter) + '&sa=N'
        request = urllib2.Request(results)
        request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
        opener = urllib2.build_opener()
        text = opener.open(request).read()
        # re.escape keeps the dots of the domain from matching any character
        emails = (re.findall(r'([\w\.\-]+@'+re.escape(domain_name)+')',StripTags(text)))
        for email in emails:
            d[email]=1
        page_counter = page_counter +10
except IOError:
    print "No result found!"
page_counter_web=0
try:
    print "\n\n+++++++++++++++++++++++++++++++++++++++++++++++++++++"
    print "+ Results:"
    print "+++++++++++++++++++++++++++++++++++++++++++++++++++++\n\n"

    # Google web search: the condition must be "<400", otherwise the loop never runs
    while page_counter_web <400 :
        results_web = 'http://www.google.com/search?q=%40'+str(domain_name)+'&hl=en&lr=&ie=UTF-8&start=' + repr(page_counter_web) + '&sa=N'
        request_web = urllib2.Request(results_web)
        request_web.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
        opener_web = urllib2.build_opener()
        text = opener_web.open(request_web).read()
        emails_web = (re.findall(r'([\w\.\-]+@'+re.escape(domain_name)+')',StripTags(text)))
        for email_web in emails_web:
            d[email_web]=1
        page_counter_web = page_counter_web +10

except IOError:
    print "No results found!"
# Print every unique address collected by both searches
for uniq_emails_web in d.keys():
    print uniq_emails_web

This command line program should be launched this way (assuming you named the file rsx.py):

python rsx.py gmail.com
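
The script above targets Python 2 (urllib2, print statement). Here is a minimal sketch of its extraction step only, written for Python 3 and run on a made-up string instead of a live Google page. The names strip_tags and find_addresses and the example.com addresses are illustrative assumptions, not part of rsx.py, and a regular expression replaces the StripTags loop.

```python
import re

def strip_tags(html):
    """Remove anything that looks like an HTML tag (<...>) from the text."""
    return re.sub(r"<[^>]*>", "", html)

def find_addresses(html, domain_name):
    """Return the sorted, de-duplicated addresses at domain_name found in html."""
    # re.escape makes the dots in the domain match literally.
    pattern = r"[\w.\-]+@" + re.escape(domain_name)
    return sorted(set(re.findall(pattern, strip_tags(html))))

# Hypothetical sample text; the addresses are made up for illustration.
sample = ("Contact alice@<b>example.com</b>, bob.smith@example.com "
          "or again alice@example.com.")
print(find_addresses(sample, "example.com"))
# → ['alice@example.com', 'bob.smith@example.com']
```

Stripping the tags first matters: the first address in the sample is split by markup and would not match the pattern otherwise. The set plays the same role as the dictionary d in the script.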

It's impressive to see how many e-mail addresses you can find with fewer than 70 lines of code!
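
The two loops in the script build their result-page URLs by string concatenation. As a hedged sketch, the same pagination (start=0, 10, ..., 390) can be written in Python 3 with urllib.parse.urlencode, which takes care of encoding the "@" as %40. The function search_urls and its parameters are hypothetical names. The block only builds strings and sends no request, and it does not check whether Google still accepts these URLs.

```python
from urllib.parse import urlencode

def search_urls(domain_name, base="http://www.google.com/search", pages=40, step=10):
    """Yield one query URL per result page, mirroring the script's start offsets."""
    for start in range(0, pages * step, step):
        query = urlencode({
            "q": "@" + domain_name,  # encoded as %40 followed by the domain
            "hl": "en",
            "lr": "",
            "ie": "UTF-8",
            "start": start,
            "sa": "N",
        })
        yield base + "?" + query

urls = list(search_urls("example.com"))
print(len(urls))
print(urls[0])
```

Letting urlencode do the escaping also protects against a domain argument that contains characters with a special meaning in a URL, which the plain concatenation in the script does not handle.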


10 Comments

  1. KINGSPAMMER
    Posted June 17, 2008 at 3:06 am | Permalink

    I do this with awk in 26 lines…

  2. Posted June 17, 2008 at 7:18 am | Permalink

    @KINGSPAMMER: Interesting, any example to share with us?

  3. Ernesto Sanchez
    Posted June 17, 2008 at 12:37 pm | Permalink

    Impressive, but I hope this will not be used for spam.

  4. internet
    Posted June 19, 2008 at 11:56 am | Permalink

    not working

    File “rsx.py”, line 16
    stop = text[start:].find(.>.)

  5. Posted June 20, 2008 at 2:43 am | Permalink

    @internet: Which python version do you use?

  6. Posted July 5, 2008 at 10:58 pm | Permalink

    let see if it works ;)

  7. Stupid Moron
    Posted October 8, 2008 at 5:47 am | Permalink

    Don’t you have something I can just type into Google?
    Google doesn’t seem to take the “@”-sign. So I cannot search for e-mail-addresses! (Is this for spam-prevention?)

    Why I want this:
    I found a site on the net where people can publish texts anonymously, But some type their e-mail-addresses in the text so that you could answer them. Now what I wanted to do is to search the site (using Google’s “site:”-operator) for texts where this is the case (and therefore searching for any e-mail addresses).

    Is there a way or would that already be illegal?
    Sorry for being stupid. You seem to be really clever and know about things!

  8. Posted January 3, 2010 at 10:19 pm | Permalink

    Pretty amazing how such simple code can do so much damage! jk jk

  9. Posted January 31, 2010 at 10:03 am | Permalink

    You should look at installing commentluv or disqus on your blog. This will make it dofollow and hence you will gain more traffic.

  10. Posted May 18, 2010 at 2:52 pm | Permalink

    This sounds interesting. And only with 67 lines! Quick question: will this script be applicable even to older versions of Python?

    @KINGSPAMMER: Would you mind sharing how you did it with only 26 lines?
