How to: Using Python and Google to find hundreds of e-mail addresses

by Jean-Baptiste Jung.

Who has never received piles of unwanted messages in their inbox? Very few of us, certainly. You probably already know that you should never leave your e-mail address on a web page. To understand why, I suggest you study this small Python script, which scans Google search results for e-mail addresses. You may be surprised by how many results it returns.

Please note that this article is only a proof of concept. It should be used only to study and learn how easily spammers can retrieve e-mail addresses using Google and a simple Python script.
For anyone tempted by the idea, remember that Christopher William Smith, often referred to as the king of spam, was recently sentenced to 30 years in jail…

Well, enough discussion. Let’s get to what we are really interested in, the code itself:

#!/usr/bin/python

import sys
import re
import urllib2

def StripTags(text):
    finished = 0
    while not finished:
        finished = 1
        start = text.find("<")
        if start >= 0:
            stop = text[start:].find(">")
            if stop >= 0:
                text = text[:start] + text[start+stop+1:]
                finished = 0
    return text

# Expect exactly one argument: the domain to search for.
if len(sys.argv) != 2:
    print "\nrsx.py : Find hundreds of e-mail addresses on Google.\n"
    print "\nUsage : ./rsx.py <domain>\n"
    print "\nexample: ./rsx.py gmail.com \n"
    sys.exit(1)

domain_name=sys.argv[1]
d={}
page_counter = 0
try:
    while page_counter <400:
        results = 'http://groups.google.com/groups?q='+str(domain_name)+'&hl=en&lr=&ie=UTF-8&start=' + repr(page_counter) + '&sa=N'
        request = urllib2.Request(results)
        request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
        opener = urllib2.build_opener()
        text = opener.open(request).read()
        # Escape the domain so that "." only matches a literal dot.
        emails = re.findall(r'([\w.\-]+@' + re.escape(domain_name) + ')', StripTags(text))
        for email in emails:
            d[email]=1
        page_counter = page_counter +10
except IOError:
    print "No results found!"
page_counter_web=0
try:
    print "\n\n+++++++++++++++++++++++++++++++++++++++++++++++++++++"+""
    print "+ Results:"+""
    print "+++++++++++++++++++++++++++++++++++++++++++++++++++++\n\n"+""

    while page_counter_web < 400:
        results_web = 'http://www.google.com/search?q=%40'+str(domain_name)+'&hl=en&lr=&ie=UTF-8&start=' + repr(page_counter_web) + '&sa=N'
        request_web = urllib2.Request(results_web)
        request_web.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
        opener_web = urllib2.build_opener()
        text = opener_web.open(request_web).read()
        # Escape the domain so that "." only matches a literal dot.
        emails_web = re.findall(r'([\w.\-]+@' + re.escape(domain_name) + ')', StripTags(text))
        for email_web in emails_web:
            d[email_web]=1
        page_counter_web = page_counter_web +10

except IOError:
    print "No results found!"+""
for uniq_emails_web in d.keys():
    print uniq_emails_web+""

This command line program should be launched this way (assuming you named the file rsx.py):

python rsx.py gmail.com

It’s impressive to see how many e-mail addresses you can find with only 67 lines of code!
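
The part of the script that does the real work is small: strip the HTML tags, match addresses for the given domain, and keep each address once. The listing above is written for Python 2. The following is a minimal sketch of just that extraction step in Python 3, run against a made-up sample page instead of live search results. The names strip_tags, extract_emails and SAMPLE_PAGE are hypothetical and do not appear in the original script, and a regular expression replaces the StripTags loop.

```python
import re

# Hypothetical helpers for illustration; not part of the original rsx.py.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(html):
    """Remove anything that looks like an HTML tag."""
    return TAG_RE.sub("", html)


def extract_emails(html, domain):
    """Return the sorted, de-duplicated addresses at `domain` found in `html`."""
    # re.escape makes the "." in the domain match only a literal dot.
    pattern = re.compile(r"[\w.\-]+@" + re.escape(domain))
    return sorted(set(pattern.findall(strip_tags(html))))


# Made-up sample standing in for one page of search results.
SAMPLE_PAGE = (
    "<p>Contact <b>alice</b>@example.com or bob@example.com.</p>"
    "<p>Again: bob@example.com, and carol@exampleXcom is not a match.</p>"
)

if __name__ == "__main__":
    for address in extract_emails(SAMPLE_PAGE, "example.com"):
        print(address)
```

Tags are stripped before matching because markup in the middle of an address, as in the first sample line, would otherwise hide part of it from the pattern. A set plays the role of the dictionary d in the script: it simply discards duplicates.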

Comments (18)

  4. Stupid Moron said:

    Don’t you have something I can just type into Google?
    Google doesn’t seem to take the “@”-sign. So I cannot search for e-mail-addresses! (Is this for spam-prevention?)

    Why I want this:
    I found a site on the net where people can publish texts anonymously, but some type their e-mail-addresses in the text so that you could answer them. Now what I wanted to do is to search the site (using Google’s “site:”-operator) for texts where this is the case (and therefore searching for any e-mail addresses).

    Is there a way or would that already be illegal?
    Sorry for being stupid. You seem to be really clever and know about things!

  5. Jamie said:

    This sounds interesting. And only with 67 lines! Quick question: will this script be applicable even to older versions of Python?

    @KINGSPAMMER: Would you mind sharing how you did it with only 26 lines?
