Request that your site be "indexed" or "crawled" so that it shows up in searches from the main UB home page, and learn how to add Google search boxes to your UB-hosted website.
Operating System: Any
Applies To: Web Developers
Last Reviewed: March 23, 2013
Any website hosted on a UB webserver is eligible for inclusion in UB's Google Search Appliance. To request service, choose "Google Search Appliance" as the service name on the following form.
There are three methods to prevent UB's Google Search Appliance from crawling your site:
A robots.txt file is a plain-text file that search engine spiders read, and it lets a webmaster allow or disallow crawling of a site. To control crawling for the entire site, create a robots.txt file and place it in the root directory of your site. Access is allowed or denied per spider, identified by its name, or User-agent. The User-agent of UB's Google Search Appliance is ubgsa.
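Under the robots.txt convention, a crawler requests the file only from the root of the host, so a copy placed in a subdirectory is not consulted. The minimal sketch below (the host name is a hypothetical example) shows which robots.txt URL applies to a given page.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    An absolute path ("/robots.txt") replaces the page's path, query string,
    and fragment, leaving only the scheme and host of the original URL.
    """
    return urljoin(page_url, "/robots.txt")


# Hypothetical example host; substitute your own site.
print(robots_url("https://www.example.buffalo.edu/dept/news/index.html"))
```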
To prevent the UB search engine from crawling your site while still allowing all other search engines to crawl it, include the following in the robots.txt file:
User-agent: ubgsa
Disallow: /
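One way to sanity-check these rules before uploading them is Python's standard urllib.robotparser module. This is a sketch: Python's parser is not the appliance's own implementation, and the page URL is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that blocks only UB's Google Search Appliance (User-agent "ubgsa").
ROBOTS_TXT = """\
User-agent: ubgsa
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page on a UB-hosted site.
page = "https://www.example.buffalo.edu/index.html"
print(parser.can_fetch("ubgsa", page))      # False: the UB appliance is blocked
print(parser.can_fetch("Googlebot", page))  # True: no rule applies to other crawlers
```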
To allow only the UB search engine to crawl your site, include the following information in the robots.txt file:
User-agent: ubgsa
Disallow:

User-agent: *
Disallow: /
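An empty Disallow line means "nothing is disallowed" for that User-agent, while the "*" record covers every other crawler. A minimal check with Python's urllib.robotparser (not the appliance itself; the page URL is hypothetical):

```python
from urllib.robotparser import RobotFileParser

# robots.txt that lets only UB's appliance ("ubgsa") crawl the site.
ROBOTS_TXT = """\
User-agent: ubgsa
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.example.buffalo.edu/index.html"  # hypothetical page
print(parser.can_fetch("ubgsa", page))      # True: empty Disallow allows everything
print(parser.can_fetch("Googlebot", page))  # False: falls through to the "*" record
```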
To prevent all search engines from crawling your site, include the following information in the robots.txt file:
User-agent: *
Disallow: /
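With only a "*" record, every crawler, including ubgsa, is refused. The same hedged check with Python's urllib.robotparser (hypothetical page URL):

```python
from urllib.robotparser import RobotFileParser

# robots.txt that blocks all crawlers from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.example.buffalo.edu/index.html"  # hypothetical page
for agent in ("ubgsa", "Googlebot"):
    # Both agents fall through to the "*" record, which disallows "/".
    print(agent, parser.can_fetch(agent, page))
```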
To prevent all search engines from indexing an individual Web page or following its links, place the following robots META tag between the <head> and </head> tags of the HTML page:
<meta name="robots" content="noindex, nofollow">
To prevent all search engines from showing a "Cached" link for your page, place this tag in the <head> section of your page:
<meta name="robots" content="noarchive">
To allow other search engines to show a "Cached" link while preventing only the UB Search Engine and google.com from displaying one, use the following tag:
<meta name="googlebot" content="noarchive">
This tag only removes the "Cached" link for the page. The UB Search Engine and google.com will continue to index the page and display a snippet.
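To confirm which robots directives a page actually declares, a small parser built on Python's standard html.parser module can collect them. This is an illustrative sketch, not how UB's appliance reads pages; it assumes directive names and values are case-insensitive and comma-separated, as in the tags shown above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = {}  # meta name -> set of lower-cased directives

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        name = attr.get("name", "").strip().lower()
        if name in ("robots", "googlebot"):
            values = {v.strip().lower() for v in attr.get("content", "").split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


page = """<html><head>
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
</head><body>...</body></html>"""

p = RobotsMetaParser()
p.feed(page)
print(p.directives)  # {'googlebot': {'noarchive'}}
```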
To let visitors search the UB Google Search Appliance from your site, add an HTML search form to your Web pages. You can also restrict a search to a subset of the pages indexed by the search engine (e.g., your Web site only). To add a search field to your site, add the following to your HTML code:
To restrict the search box to return results only from an entire Web domain, a single Web site, or a subset of a Web site, add the as_sitesearch parameter to your search form:
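A search form submitted with method="get" sends its fields as URL query parameters, so an as_sitesearch field simply becomes one more parameter in the request. The sketch below builds that URL; SEARCH_ENDPOINT is a hypothetical placeholder, and the real UB form probably also carries hidden fields (for example, collection and front-end names) that are not reproduced here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: use the action URL from UB's own search-box code.
SEARCH_ENDPOINT = "https://search.example.buffalo.edu/search"


def site_search_url(query: str, site: str) -> str:
    """Build the GET URL a form produces when it restricts results with as_sitesearch."""
    params = {"q": query, "as_sitesearch": site}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(site_search_url("parking permits", "www.buffalo.edu"))
# https://search.example.buffalo.edu/search?q=parking+permits&as_sitesearch=www.buffalo.edu
```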
You can also restrict the search to a specific directory under a domain:
If the URL value ends with a trailing slash ('/'), the search is restricted to that specific folder only. Without a trailing slash, results are returned for the directory and all subfolders under it.
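The trailing-slash rule can be modeled as a simple prefix test. This is only a model of the behavior described above, not the appliance's code; it handles directory values only (not domain matching), compares URLs without the scheme, and uses hypothetical paths.

```python
def in_scope(doc_url: str, as_sitesearch: str) -> bool:
    """Model the described as_sitesearch scoping for a directory value.

    URLs are written without the scheme, e.g. "ubit.buffalo.edu/software/a.html".
    """
    if as_sitesearch.endswith("/"):
        # Trailing slash: only documents directly inside that folder.
        if not doc_url.startswith(as_sitesearch):
            return False
        return "/" not in doc_url[len(as_sitesearch):]
    # No trailing slash: the directory itself and every subfolder beneath it.
    return doc_url == as_sitesearch or doc_url.startswith(as_sitesearch + "/")


# Hypothetical document in a subfolder of the directory.
doc = "ubit.buffalo.edu/software/windows/install.html"
print(in_scope(doc, "ubit.buffalo.edu/software"))   # True: subfolders included
print(in_scope(doc, "ubit.buffalo.edu/software/"))  # False: only that folder
```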
Sometimes a UB site can be reached at more than one URL, such as hostname.buffalo.edu or www.hostname.buffalo.edu. The UB search engine is configured not to crawl every alternate URL, and generally crawls each site under only one address. Before setting the "as_sitesearch" parameter, check which address the UB Search Engine lists for your site, and use that URL or directory.
To search across multiple Web sites, use the "as_q" parameter instead of "as_sitesearch", and use "site:" with one or more "OR" operators in the value:
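The as_q value is an ordinary query string, so several sites can be joined with "OR". A minimal sketch that builds the value and the resulting GET URL; SEARCH_ENDPOINT and the site list are hypothetical examples.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: use the action URL from UB's own search-box code.
SEARCH_ENDPOINT = "https://search.example.buffalo.edu/search"


def multi_site_query(sites: list) -> str:
    """Build an as_q value such as 'site:a OR site:b' covering several Web sites."""
    return " OR ".join(f"site:{s}" for s in sites)


def multi_site_search_url(query: str, sites: list) -> str:
    """Build the GET URL for a search restricted to several sites via as_q."""
    params = {"q": query, "as_q": multi_site_query(sites)}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(multi_site_query(["ubit.buffalo.edu", "www.buffalo.edu"]))
# site:ubit.buffalo.edu OR site:www.buffalo.edu
```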
It also helps to indicate on the search button that the search is restricted to a site. To do that, change the value of the btnG parameter:
For example, this code would create a custom search box that searches only the ubit.buffalo.edu website:
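As an illustration of how the pieces fit together, the sketch below generates the markup for a GET form with a text field named q, a hidden as_sitesearch field, and a btnG submit button whose value is the button label. The action URL is a hypothetical placeholder, and UB's actual form may need additional hidden fields that are not shown here.

```python
from html import escape

# Hypothetical placeholder: use the action URL from UB's own search-box code.
SEARCH_ENDPOINT = "https://search.example.buffalo.edu/search"


def search_form(site: str, button_label: str) -> str:
    """Return HTML for a GET search form restricted to one site with as_sitesearch."""
    # escape() guards against quotes or ampersands in the attribute values.
    return (
        f'<form method="get" action="{escape(SEARCH_ENDPOINT)}">\n'
        '  <input type="text" name="q" size="30">\n'
        f'  <input type="hidden" name="as_sitesearch" value="{escape(site)}">\n'
        f'  <input type="submit" name="btnG" value="{escape(button_label)}">\n'
        "</form>"
    )


print(search_form("ubit.buffalo.edu", "Search UBIT"))
```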