O'Reilly Book Excerpts: Google Hacks, 2nd Edition

Hacking Google

by Rael Dornfest, Tara Calishain

Editor's note: With access to billions of documents in over 30 languages, Google is a researcher's dream. But as with any invaluable tool, knowing the insider tricks of the trade is a must to save time and needless effort. Tara Calishain and Rael Dornfest, authors of Google Hacks, 2nd Edition, have set out to educate the masses in the ins and outs of Google. In today's excerpt, they offer the inside scoop on scattersearching, cartography, Google on the go, Gmail Lite, and AdSense. With over 150 million Google searches conducted every day, why be just a number?

moderate

Scattersearch with Yahoo! and Google

Sometimes, illuminating results can be found when scraping from one site and feeding the results into the API of another. With scattersearching, you can narrow down the most popular related searches, as suggested by Yahoo! and counted by Google.

We've combined a scrape of a Yahoo! web page with a Google search before [Hack #41], blending scraped data with data generated via a web service API to good effect. In this hack, we're doing something similar, except this time we're taking the results of a Yahoo! search and blending them with a Google search.

Yahoo! has a "Related searches" feature: you enter a search term and, if any are available, get a list of related terms under the search box. This hack scrapes those related terms and runs a Google allintitle: search for each one, so that only pages with every word of the term in their titles are counted. It then reports the estimated result count for each search, along with a direct link to the results. Aside from showing how scraped and API-generated data can live together in harmony, this hack is good to use when you're exploring concepts; for example, you might know that something called Pokemon exists, but you might not know anything about it. You'll get Yahoo!'s related searches and an idea of how many results each of those searches generates in Google. From there, you can choose the search terms that generate the most results or look the most promising based on your limited knowledge, or you can simply pick a road that appears less traveled.
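If you'd rather have a script do that sorting than eyeball the list, one option is to rank the terms by their counts. This is a minimal Perl sketch, not part of the original hack: rank_by_count() is a hypothetical helper, and the counts are placeholder numbers standing in for the estimatedTotalResultsCount values the script collects.

```perl
#!/usr/bin/perl -w
# Sketch: rank related searches by Google result count, most first.
# rank_by_count() is a hypothetical helper; the counts below are
# placeholders, not real search results.
use strict;

# Takes a hash ref of term => count and returns the terms sorted by
# count (descending), breaking ties alphabetically.
sub rank_by_count {
    my ($counts) = @_;
    my @ranked = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
                 keys %$counts;
    return @ranked;
}

my %counts = ('term one' => 120, 'term two' => 4500, 'term three' => 120);
print "$_: $counts{$_}\n" for rank_by_count(\%counts);
```

Breaking ties alphabetically keeps the order stable from run to run. To favor the road less traveled instead, swap $a and $b in the count comparison only.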

The Code

Save the following code to a file called scattersearch.pl.

TIP: Bear in mind that this hack, while using the Google API for the Google portion, scrapes Yahoo!'s search pages and is therefore rather brittle. If it stops working at any point, take a gander at the regular expressions; they're almost sure to be the breakage point.

#!/usr/bin/perl -w
#
# Scattersearch -- Use the search suggestions from
# Yahoo! to build a series of allintitle: searches at Google.

use strict;

use LWP::UserAgent;
use URI;
use SOAP::Lite;
use CGI qw/:standard/;

# Get our query, else die miserably.
my $query = shift @ARGV
    or die "Usage: perl scattersearch.pl \"search terms\"\n";

# Your Google API developer's key.
my $google_key = 'insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

# Search Yahoo! for the query; rs => "more" asks for the
# longer list of related searches.
my $ua  = LWP::UserAgent->new;
my $url = URI->new('http://search.yahoo.com/search');
$url->query_form(rs => "more", p => $query);
my $response = $ua->get($url);
die "Couldn't fetch Yahoo! results: ", $response->status_line, "\n"
    unless $response->is_success;
my $yahoosearch = $response->content;
$yahoosearch =~ s/[\f\t\n\r]//g;    # flatten the page onto one line

# And determine if there were any results. This pattern depends
# on Yahoo!'s page layout, so check it first if the script breaks.
my ($recommended) = $yahoosearch =~ m!Also try:(.*?)  !is;
die "Sorry, there were no results!\n" unless $recommended;

# Now, add all our results into
# an array for Google processing.
my @googlequeries;
while ($recommended =~ m!<a href=".*?">(.*?)</a>!gis) {
    my $searchitem = $1;
    $searchitem =~ s/nobr|<[^>]*>|\///g;    # strip leftover markup
    print STDERR "Related search: $searchitem\n";  # progress, kept out of the HTML
    push @googlequeries, $searchitem;
}

# Print our header for the results page.
print join "\n",
     start_html("ScatterSearch"),
     h1("Your Scattersearch Results"),
     p("Your original search term was '" . CGI::escapeHTML($query) . "'"),
     p("That search had " . scalar(@googlequeries) . " recommended terms."),
     p("Here are result numbers from a Google search"),
     CGI::start_ol();

# Create our Google object for API searches.
my $gsrch = SOAP::Lite->service("file:$google_wsdl");

# Running the actual Google queries.
foreach my $googlesearch (@googlequeries) {
    my $titlesearch = "allintitle:$googlesearch";
    # doGoogleSearch arguments: key, query, start, maxResults, filter,
    # restrict, safeSearch, lr, ie, oe. One result is enough, since
    # we only want the estimated total.
    my $result = $gsrch->doGoogleSearch($google_key, $titlesearch,
                                        0, 1, "false", "", "false",
                                        "", "", "");
    # Link to the same allintitle: search whose count we report.
    my $link = URI->new('http://www.google.com/search');
    $link->query_form(q => $titlesearch, num => 100);
    print li("There were $result->{estimatedTotalResultsCount} "
             . "results for the recommended search "
             . a({-href => "$link"}, CGI::escapeHTML($googlesearch)));
}

print CGI::end_ol(), end_html;
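Because the Yahoo! scraping is the fragile part, as the tip above warns, it can help to test the extraction patterns offline before suspecting the Google side. This sketch wraps the script's regular expressions in a hypothetical extract_related() sub and runs them against a made-up HTML snippet. The snippet is an assumption about the shape of the markup, not a copy of Yahoo!'s actual page, so swap in a saved copy of a live results page to check the real thing.

```perl
#!/usr/bin/perl -w
# Sketch: exercise Scattersearch's Yahoo! patterns without a network.
# extract_related() is a hypothetical wrapper around the script's own
# regular expressions. The sample HTML is made up; it only mimics the
# shape the patterns expect, not Yahoo!'s real markup.
use strict;

sub extract_related {
    my ($html) = @_;
    $html =~ s/[\f\t\n\r]//g;                 # flatten to one line
    my ($recommended) = $html =~ m!Also try:(.*?)  !is;
    return () unless $recommended;           # pattern didn't match
    my @terms;
    while ($recommended =~ m!<a href=".*?">(.*?)</a>!gis) {
        my $item = $1;
        $item =~ s/nobr|<[^>]*>|\///g;        # strip leftover markup
        push @terms, $item;
    }
    return @terms;
}

my $sample = qq{<div>Also try: <a href="/r1"><nobr>siamese cat</nobr></a>, }
           . qq{<a href="/r2">siamese twins</a>  </div>};

my @terms = extract_related($sample);
print scalar(@terms), " related term(s): ", join(", ", @terms), "\n";
```

If extract_related() returns nothing for a page that visibly lists related searches, the "Also try:" anchor or the two-space terminator in the first pattern is the likely culprit.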

Running the Hack

This script prints an HTML page to standard output, ready for you to save and upload to a publicly accessible web site. If you want to save the output of a search for siamese to a file called scattersearch.html in your Sites directory, run the following command ["How to Run the Hacks" in the Preface]:

% perl scattersearch.pl "siamese" > ~/Sites/scattersearch.html

Your final results, as rendered by your browser, will look similar to Figure 2-15.

Figure 2-15. Scattersearch results for siamese

You'll have to do a little experimenting to find out which terms have related searches. Broadly speaking, very general search terms are bad; it's better to zero in on terms that people would search for and that would be easy to group together. At the time of this writing, for example, heart has no related search terms, but blood pressure does.

Hacking the Hack

You have two choices: you can either hack the interaction with Yahoo! or expand the hack to draw suggestions from another source, either alongside Yahoo! or in its place. Let's look at Yahoo! first. If you take a close look at the code, you'll see we're passing an unusual parameter to our Yahoo! search results page:

$url->query_form(rs => "more", p => $query);

The rs => "more" parameter tells Yahoo! to show its expanded list of related search terms, up to 10 of them. If you remove that parameter, you'll get roughly four related searches when they're available. That might suit you if you want only a few, but perhaps you want dozens and dozens! In that case, replace more with all.
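To see exactly what each setting sends to Yahoo!, here is a small sketch that builds the request URL with the same URI->query_form call the script uses. related_url() is a hypothetical helper; passing no mode leaves rs out entirely.

```perl
#!/usr/bin/perl -w
# Sketch: the Yahoo! request URL for each related-searches setting.
# related_url() is a hypothetical helper built on URI->query_form,
# the same call scattersearch.pl uses.
use strict;
use URI;

# $mode is "more", "all", or undef to leave rs out altogether.
sub related_url {
    my ($query, $mode) = @_;
    my $url = URI->new('http://search.yahoo.com/search');
    my @params = defined $mode ? (rs => $mode) : ();
    $url->query_form(@params, p => $query);
    return $url->as_string;
}

print related_url("blood pressure", $_), "\n" for "more", "all";
print related_url("blood pressure"), "\n";
```

query_form keeps the parameters in the order given and encodes spaces as +, so the only difference between the three URLs is the rs pair.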

Beware, though: this can generate a lot of related searches, and since each one costs a Google API query, it can quickly eat up your daily allowance of Google API requests (1,000 per key under the Google Web APIs). Tread carefully.
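One way to tread carefully is to put a ceiling on how many related terms reach the Google API. The sketch below is a hypothetical addition, not part of the original script: $max_queries and cap_queries() are illustrative names, and in scattersearch.pl you would apply the cap to @googlequeries just before the foreach loop.

```perl
#!/usr/bin/perl -w
# Sketch: cap the number of related terms sent to the Google API.
# $max_queries and cap_queries() are hypothetical additions, not
# part of the original scattersearch.pl.
use strict;

my $max_queries = 10;    # choose a budget that suits your key

# Return at most $max items from @queries, keeping their order.
sub cap_queries {
    my ($max, @queries) = @_;
    return @queries <= $max ? @queries : @queries[0 .. $max - 1];
}

# Stand-in for the terms scraped from Yahoo!.
my @googlequeries = map { "related term $_" } 1 .. 25;
my @to_run = cap_queries($max_queries, @googlequeries);
printf "Running %d of %d related searches\n",
       scalar(@to_run), scalar(@googlequeries);
```

Keeping the first terms preserves Yahoo!'s own ordering of its suggestions, so the cap trims the tail rather than picking arbitrarily.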

Kevin Hemenway and Tara Calishain
