ONLamp.com


Creating Google Custom Search Engines

by Bernard Farrell
09/06/2007

Why Do You Need a Better Search Engine?

It's early in the evening and some old school friends just called unexpectedly. They'll be at your house in 90 minutes and you want to make a quick meal with what you have in the house. Where do you find an easy-to-make recipe using the ingredients you've got to hand (chicken pieces, onions, potatoes, cream, wine, and seasonings)? You open your browser, go to Google, search for a recipe with these ingredients, and you're given nearly 600,000 pages. Well, that's not a lot of help. It might take you an hour to find a good recipe among all those sites.

There must be a better way to search just the sites that you'd normally use for recipes and return a smaller set of results that are more likely to help you get a meal on the table. This is where a Google Custom Search Engine can help you.

What Is a Custom Search Engine?

A custom search engine (CSE) tells Google which sites to search and which to avoid when dealing with a search query. This makes it much easier to get specific, guided answers to questions about a particular subject area. If you create a CSE, you can use your expertise in a subject to control where Google looks for information about that topic. And you may even make some money in the process, because the custom search engine returns AdSense advertisements with each set of results. If you have an AdSense account, then the revenue from those advertisements can go to you. Don't get too excited, though: you'll probably get less than a few dollars a month from AdSense unless your search engine becomes really successful.

You can tune this list of sites over time, adding and removing sites from the list. This makes it easy to improve the results based on the queries entered by people who are using your custom search engine.

Here's an example of a custom search engine for Bermuda. The editor of this CSE has chosen 80 sites that are worth searching to produce results about Bermuda that help him and others. So, if I love scuba diving and I search on this site for the term "scuba tours," I get results that are about scuba tours in Bermuda. If I used the normal Google search page, I'd get over 2 million results and I'd spend much more time looking for Bermuda-related scuba touring information.

A properly built CSE returns search results intended for specific audiences or areas of interest. It's important to get the search engine name and description right so people don't get frustrated when trying to use it. The rest of the work is deciding where to search. Later, you can organize the results into categories to help fine-tune them, but that's an optional step you can skip when you're starting out. It may sound like a lot of work, but you can actually create a new custom search engine in minutes. You'll see how in the next section. Later, I'll show you how to tune the CSE interface, site selection, and results.

Less Than 10 Minutes to a New Search Engine

I'm going to create a custom search engine for recipes. Along the way I'll show you the type of information you'll need to make your own CSE. Before you get going, you need a Google account. If you don't have one, go to the Google create account page.

The first step in creating your custom search engine is to visit the Google CSE home page. Here we'll complete a two-step process: entering the setup data and then trying out the engine. After that, our CSE is ready for use. There are restrictions on what you can do with a custom search engine; these are given on Google's CSE Terms of Service page.

Before you start, you need some basic information. Don't worry if you're not sure how to complete some of the details; you can easily change any of the CSE settings after you've created your engine.

  • Name: Choose something that helps people see what type of engine it is. I'll call this the Simple Recipes Search Engine. It's not aimed at gourmet chefs. Note that the Terms of Service do not allow you to use Google as part of the name for your engine.
  • Description: Describe what the search engine does and the type of people who might use it. Keep this short and easy to understand.
  • Engine keywords: These are examples of the type of search query someone would enter. They help the Google engine choose which pages it should promote more. You can enter single words, or phrases enclosed in quotes. Mine are homecooking "easy to prepare" and "simple cooking".
  • Language: Choose a specific one, or all. I'll choose English for this engine.
  • What to search: You can specify the only sites to search, a set of sites that should be given a higher priority, or the entire Web. I'm going to select the sites for Google to include in the results.
  • Which sites to search: Initially, you may only have a small set of sites in mind. Later, you can add and remove sites.

This is one of the most important parts of creating the search engine. Here you control how much of each site is included in results by how you enter its URL. I'll use the site http://homecooking.about.com/library/archive/ as an example.

URL specified for CSE                               | What's Included                     | Scope
*.about.com/*                                       | The whole about.com domain          | All pages in about.com
homecooking.about.com/*                             | All of about.com's Homecooking site | Many pages
homecooking.about.com/library/archive/*             | Recipes part of the site            | Recipes pages
homecooking.about.com/library/archive/blbbindex.htm | Only this page                      | One page
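
To get a feel for how the scope narrows from one pattern to the next, here is a minimal Python sketch that checks which of the patterns above cover a given URL. It assumes the asterisk behaves like a simple shell-style wildcard, which is an approximation for illustration and not a description of how Google matches URLs. The helper names strip_scheme and matching_patterns are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns from the table above, from broadest to narrowest.
PATTERNS = [
    "*.about.com/*",
    "homecooking.about.com/*",
    "homecooking.about.com/library/archive/*",
    "homecooking.about.com/library/archive/blbbindex.htm",
]


def strip_scheme(url):
    """Drop a leading http:// or https:// so URLs line up with the patterns."""
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url


def matching_patterns(url):
    """Return every pattern in PATTERNS that covers the given URL.

    Assumption: '*' is treated as a shell-style wildcard that matches
    any run of characters, including '/'.
    """
    target = strip_scheme(url)
    return [p for p in PATTERNS if fnmatchcase(target, p)]


if __name__ == "__main__":
    page = "http://homecooking.about.com/library/archive/blbbindex.htm"
    for pattern in matching_patterns(page):
        print(pattern)
```

The index page itself is covered by all four patterns, while any other page under /library/archive/ would be covered only by the first three. The narrower the pattern you enter, the fewer pages end up in your results.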

Unless you're creating the custom search engine for a non-profit, university, or government agency, you have to display advertisements on the results pages. And, of course, you must agree to the Terms of Service.

The completed first page looks something like the following.

Figure 1. Initial CSE page

When you click the Next button, you're taken to the Try it Out page. Here you can test whether your engine produces results that you like. Don't set your expectations too high at this stage, because you'll come back and tune the results further later. I'll try the query for pie crust and see what I get back.

Figure 2. Pie crust query

These results are appropriate and they all come from the right site. I check the box to get a confirmation email and press Finish.

Note: this confirmation email has lots of useful pointers for managing your engine, so it's worth requesting it. If you forget to check this box or you delete the email message, don't worry. Log on to Google and click on the My Search Engines link.

I've placed this search engine online with all of its data so you can see how it works and what has been specified to Google. You can find my site for the search engine at http://recipeclues.com/. And the Google homepage for the Simple Recipes search engine is here.

Now we can start to customize the search engine in a number of useful ways. First, I'll show you how to add more sites to the list that Google uses for results.

Adding More Sites Using Forms

You can add more sites to your search engine in a number of ways. If you've already got a list of sites in mind, you can just go to the My Search Engines page and click on the control panel link for your search engine. Then, click on the Sites tab and you have a form that allows you to add and remove sites and check whether you've already got a site defined in your search engine.

Figure 3. Sites tab

If you've only got a small number of sites, this interface is just about usable. One challenge is that it only allows you to work with 20 of the sites that you've specified at a time. Initially, that may work. But once you have a custom search engine that's using more than about 40 sites, you'll find this interface very tiring to use.

When you click on the Add Sites button, you'll get a popup dialog that allows you to enter individual sites.

Figure 4. Enter individual sites

On this form, the final option lets you dynamically extract links from a page and add them to your search engine. It's useful when the page you're pointing at is a blogroll or a list of linked sites. This option was added to the CSE control interface recently and is not currently available in the bulk input approach.
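
If you'd like to preview what a blogroll page might contribute before pointing Google at it, a rough Python sketch along the following lines can help. It is not how Google extracts links; it simply collects the links on a page and reduces them to one host/* pattern per external site. The class and function names, and the example.* addresses in the sample page, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def site_patterns(html, base_url):
    """Return sorted, de-duplicated host/* patterns for a page's external links."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    own_host = urlparse(base_url).netloc
    hosts = {urlparse(link).netloc for link in collector.links}
    hosts.discard(own_host)  # skip links back to the blogroll's own site
    hosts.discard("")        # skip links without a host, such as mailto:
    return sorted(f"{host}/*" for host in hosts)


# Hypothetical blogroll page; all addresses are placeholders.
SAMPLE = """
<ul>
  <li><a href="http://recipes.example.org/soups/">Soups</a></li>
  <li><a href="http://baking.example.net/bread.html">Bread</a></li>
  <li><a href="/about.html">About this blog</a></li>
</ul>
"""

if __name__ == "__main__":
    for pattern in site_patterns(SAMPLE, "http://blog.example.com/links.html"):
        print(pattern)
```

The sketch reduces every link to a whole-site pattern, which is the broadest choice. For a real engine you may prefer to narrow some of those patterns to a single section of a site.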
