GoogleMini Search Appliance Fails to Boot

by Tim 1. February 2011 00:24

A sinking feeling washes over you as the network monitor alerts you to a lost server, a bit like losing contact with one of the fleet at sea. Today it is the GoogleMini, purchased a couple of years ago to quickly give us good, familiar search results on our externally facing websites.

Support Policy

The GoogleMini comes with a maximum of two years' support. After that time, if support renewal is required, you must accept a new GoogleMini to go with it. Thinking of it as a free box with your support contract is less painful than thinking about how another 1U server is going to be scrapped every two years. The typical response from Google to cries for help on older boxes is:

Hello **********,

Thank you for writing to Google.

The technical support and hardware warranty for the Google Mini can be extended to up to a maximum of two years. Since your device is past the two year mark, you would need to purchase a new Google Mini if you wanted to be under support again.

However, you can still use the Google Mini to provide the most relevant search results in the industry to your users. The Google search technology comes under a perpetual license, so you are free to keep using it for the life of the hardware.

This support arrangement makes the total cost of ownership high for this solution, but the solution in a box is attractive and hard to resist as it leaves the development and IT infrastructure teams free to concentrate on innovation.

Broken GoogleMini

So what of the alert, you ask? It turns out that our faithful server, which has run without a restart for two years, has stopped working. On reboot the fans make a lot of noise, but the monitor shows a blank, black screen and there are no beeps and no BIOS screen.

Although the server is out of warranty, the licence terms probably prevent us from continuing to use it once it has been opened up. Hence for us the only option is to buy a new GoogleMini or find another solution for serving search results.

Curiosity and opening up the GoogleMini

ATTENTION: Do not open your GoogleMini if you intend to keep using it and/or it is still covered by your support contract.

Curiosity reigned: off came the rack mount cheeks and out came the tamper proof screws. The all-covering blue Google sticker was peeled off the top of the unit to allow the top to slide off.

A quick note on the two screws in the back. The tamper proof screws can be opened by sawing a slot into them (gentle use of a hacksaw) so that a normal flat head screwdriver fits, or alternatively some claim that a small flat head screwdriver wedged into the screw will allow it to be undone.

Broken memory bank on motherboard

Inside you find a familiar server layout. Thanks to a post on the internet, the memory was removed and the server started doing POST beeping. Ah-ha, not so dead! Putting memory back into memory bank 1 alone allowed the server to start booting. Putting any memory into memory bank 2 stopped it booting again. Looks like a broken motherboard. New memory was loaded into the first bank only. After a lot of file system checking, some half an hour later, it is running.

Apparently the hardware inside does vary a little; this one has a P8SCT motherboard, marketed by SuperMicro. I guess that, in order to regain faith in the server, ordering a new motherboard would breathe some life back into it.

Repurpose the hardware

So the challenge of fixing the server hardware problem is resolved, but as it has been opened it must now be repurposed for a new life as a Linux server for other uses. There are a few postings from people who have very successfully done this with the GoogleMini, and it seems like the right thing to do from an environmental point of view. The kit is a dual Pentium 3 so should have plenty to give to other deployments, and importantly it looks cool in blue.

To do anything with it, you must get into the BIOS, which is normally protected with a Google owned password. The following snippets help get around that one:

[4]

Step 3 — Resetting the BIOS
Different Google Minis come with different internals. This one happened to have a single processor Super Micro Computer SUPERO P8SCT motherboard. Other Minis have come with dual processor Pentium 3 boards. In our case, to clear the CMOS and eliminate the BIOS password requirement, I had to bridge two contact pads with a screwdriver. Close everything up, power on the machine, hit the DEL key and straight into the BIOS we go.

[5]

Clear the BIOS by jumping the JBT1 pins according to the motherboard manual page 2-19, chapter 2-7 (Jumper Settings). Remove power and the AC cord, short-circuit for a few seconds with a small screwdriver, job done.

Cloud based search

So where from here? There may be cloud based search solutions on offer now; however, for the very large sites this box was indexing, the cost of bandwidth in and out of the data centre to support external indexing would be unbearable. The Mini, sat next to the web server in the same rack at the Co-Lo, seems to make sense.

References:

[1] How do I salvage an old Google search appliance after an apparently failed BIOS update

[2] How to Turn a Google Mini into a Home Server

[3] AnandTech Search goes Google - This post provides some great photographs and information on how to get into your GoogleMini.

[4] Google Mini – Can’t Boot Wont Boot 

[5] HACKING THE GOOGLE MINI

[6] Support Policy

The Google Mini is offered as a perpetual license, including one year of support and hardware replacement coverage, for a total price of $1995 for search across 50,000 documents. Additional versions deliver search across 100,000 documents for $2,995, 200,000 documents for $5,995, and 300,000 documents for $8,995. A second year of support and hardware replacement coverage is available for $995.


Tags:

Google Mini | Infrastructure

Google Mini Remove URL from index

by Tim 27. August 2009 17:37

One of my ASP.NET ecommerce applications uses URL rewriting for product pages. For example:

Item Sku Number: 473-151
Product Description: Bright Products, Black box converter
Website URL: href=http://www.mydomain.com/Products/473-151- Bright Products, Black box converter

Note: Text after the SKU item number is irrelevant, as the ASP.NET engine disregards it; only the 473-151 finds the page.

Google Mini

We use a Google Mini to index the site and provide search results to the site users. In the past the product page could be reached by a number of different routes using different URLs, so to help keep the page count down in the results from the box we use the canonical link tag in the page header to state the definitive URL for the page.

Canonical tags are supported by all the search engines of importance. The tag looks like this:
<link rel="canonical" href="http://www.mydomain.com/Products/473-151- Bright Products, Black box converter" />
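
As a rough sketch of how such a tag could be generated so that the href is always quoted and safely encoded, something like the following would do. The names BuildCanonicalLink, baseUrl, sku and description are hypothetical and not part of the site's real code, and percent-encoding the descriptive part of the URL is an assumption on my part rather than what the site above actually does.

```vbnet
Imports System
Imports System.Net

Module CanonicalLinkSketch

    ' Hypothetical helper: builds a canonical link tag for a product page.
    ' baseUrl, sku and description are illustrative parameters only.
    Function BuildCanonicalLink(ByVal baseUrl As String, ByVal sku As String, _
                                ByVal description As String) As String
        ' Percent-encode the SKU and description so that spaces and commas
        ' are safe inside the URL path (assumption: the site accepts this form).
        Dim slug As String = Uri.EscapeDataString(sku & "- " & description)
        Dim url As String = baseUrl.TrimEnd("/"c) & "/Products/" & slug

        ' HTML-encode the URL and always quote the attribute value.
        Return String.Format("<link rel=""canonical"" href=""{0}"" />", _
                             WebUtility.HtmlEncode(url))
    End Function

    Sub Main()
        Console.WriteLine(BuildCanonicalLink("http://www.mydomain.com", _
            "473-151", "Mighty Products, Black box converter"))
    End Sub

End Module
```

Because the tag is built from the current description, it changes whenever the description does.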

Change of description

Recently a supplier complained that the description of the product in the URL for the item was wrong, although on the page it was correct. After investigation it was found that the item description had been changed because the supplier had rebranded, see below.

Item Sku Number: 473-151
Product Description: Mighty Products, Black box converter
Website URL: href=http://www.mydomain.com/Products/473-151- Bright Products, Black box converter

This meant that when the item was searched for in the Google Mini, it found “Black box converter” but showed the incorrect URL above. The URL should be as follows:

Website URL: href=http://www.mydomain.com/Products/473-151- Mighty Products, Black box converter

What's wrong?

So what is wrong? It turns out that the Google Mini still has the old URL in the index. In fact it turns out that the page is very persistent at staying in the index. Thus the box happily crawls it each night.

It seems “the index” is a list of pages the Google Mini has found at some time in the past. In fact a page can now have been “unlinked” from the site, having no inbound links to it, but it will still persist in the index and thus results.

The only ways to remove a page from the Google Mini index are set out in the document Administering Crawl for Web and File Share Content: Introduction, which states that a document is removed when:

  • The license limit is exceeded
  • The crawl pattern is changed
  • The robots.txt file is changed
  • Document is not found (404)

These are the only ways that a page will be removed. In this scenario the old URL still returns a valid page, because it carries the same item SKU number, so it keeps being indexed under the wrong URL, potentially forever!

It is also worth noting that if your problem is re-indexing the content of a page, rather than its URL, then check the “Last-Modified” header returned with the page in the response from the web server. This is particularly an issue for dynamic pages; static pages are normally dealt with appropriately using the last modified date of the file on the site's file system. You can study the headers from the page by using a developer tool bar (now built into IE8).
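
If you would rather check the header from code than from a browser tool bar, a small console program along these lines should do it. This is only a sketch: the URL is a placeholder, and it assumes the page answers HEAD requests (if yours does not, change the method to GET).

```vbnet
Imports System
Imports System.Net

Module HeaderCheckSketch

    Sub Main()
        ' Placeholder URL: substitute a page from your own site.
        Dim url As String = "http://www.mydomain.com/Products/473-151"

        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "HEAD"             ' ask for the headers only, no body
        request.AllowAutoRedirect = False   ' report the first response, not a redirect target

        Try
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Console.WriteLine("Status: {0}", CInt(response.StatusCode))

                ' The indexer returns Nothing when the server did not send the header.
                Dim lastModified As String = response.Headers("Last-Modified")
                If lastModified Is Nothing Then
                    Console.WriteLine("No Last-Modified header returned")
                Else
                    Console.WriteLine("Last-Modified: {0}", lastModified)
                End If
            End Using
        Catch ex As WebException
            ' 4xx and 5xx responses, and network failures, surface as exceptions.
            Console.WriteLine("Request failed: {0}", ex.Message)
        End Try
    End Sub

End Module
```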

Solution attempt 1

Aha, I thought, I know how to tackle this. The old URL no longer exists, as it has been superseded by the new page, so the ASP.NET site should issue a Response.Status = “301 Moved Permanently” for it. That should prompt the Google Mini to index the new page, register that URL and presumably drop the old URL from the index.

A couple of lines and the problem was solved, I thought.

' Redirect when the requested URL is not the official URL for this product
If Not officialUriForPage.PathAndQuery.EndsWith(Request.RawUrl) Then
  Response.Clear()
  Response.Status = "301 Moved Permanently"
  ' Send the client to the current public URL for the product
  Response.AddHeader("Location", utility.GetPublicProductURL( _
            Me.ProductDetails.ProductId, Me.ProductDetails.ItemDescription))
  Response.End()
End If

 

So now the old page will issue a “301 Moved Permanently” response to the browser and to the Google Mini, which will go and index the new page and drop the old URL. However, it doesn't work that way.

Solution attempt 2

After the overnight index, solution 1 turned out to be a failure. Reading the documentation again, it turns out the Google Mini is being helpful and returning both URLs, the new one and the moved one, for any search that has a hit inside the new URL's content. It seems that the four methods of removal noted earlier really are the only ways to remove a page from the index.

Action

I could put the URL I wanted to remove from the index into the “Don’t Crawl URLS” box of the crawl pattern definition in the Google Crawl admin pages. After 15 minutes to six hours, the Google Mini would then re-examine the index, realise the page should no longer be there and remove it. This falls under “The crawl pattern is changed”, item two in the earlier list of conditions for removal. I could then remove the don’t crawl URL from the Google box again, so that I don’t forget it and accidentally block a future replacement URL. This would work for a few pages, but we have about 15,000 products online, so we need something better.

Instead I went for the last option in the list, “If the search appliance receives a 404 (Document not found) error from the Web server when attempting to fetch a document, the document is removed from the index.”.

Hence I changed the code sample above to respond with our generic 404 not found page rather than with the moved redirect. Check that the 404 page sends a 404 status code in the response header, or the Google Mini will not see the 404 status. However, I only want this to happen for the Google Mini, not for end users. An end user just wants to be redirected to the new URL; a 404 not found is rude and would lose sales, as users assume the item no longer exists. Luckily the Google box sends a configurable User-Agent header in its requests, so we can behave differently for it.

' Read the User-Agent header once; it is Nothing when the client sends none
Dim userAgent As String = System.Web.HttpContext.Current.Request. _
         ServerVariables("HTTP_USER_AGENT")

If Not IsNothing(userAgent) AndAlso userAgent.Contains("gsa-crawler") Then
    'Not found for Google Mini, so the old URL is dropped from the index
    Response.Clear()
    Response.Status = "404 Not Found"
    'Location has no effect on a 404; the status code is what the crawler acts on
    Response.AddHeader("Location", "/ErrorPages/404.aspx")
    Response.End()
Else
    'Perm redirect for everyone else
    Response.Clear()
    Response.Status = "301 Moved Permanently"
    Response.AddHeader("Location", common.utility.GetPublicProductURL( _
                Me.ProductDetails.ProductId, Me.ProductDetails.ItemDescription))
    Response.End()
End If
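
To confirm that the crawler and a normal browser really do get different answers, the old URL can be requested twice with different User-Agent values. The sketch below is an illustration only: GetStatusFor is a hypothetical helper, the URL is the example one from this post, and "Mozilla/5.0" simply stands in for any browser. If the branching works, I would expect 404 for the crawler and 301 for the browser.

```vbnet
Imports System
Imports System.Net

Module CrawlerResponseCheck

    ' Hypothetical helper: returns the numeric HTTP status the server sends
    ' to a client presenting the given User-Agent string.
    Function GetStatusFor(ByVal url As String, ByVal userAgent As String) As Integer
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.UserAgent = userAgent
        request.AllowAutoRedirect = False   ' we want to see the 301 itself

        Try
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Return CInt(response.StatusCode)
            End Using
        Catch ex As WebException
            ' 4xx and 5xx arrive as exceptions; the response is still attached.
            Dim errorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errorResponse Is Nothing Then Throw ' network failure, no status to report
            Dim status As Integer = CInt(errorResponse.StatusCode)
            errorResponse.Close()
            Return status
        End Try
    End Function

    Sub Main()
        Dim oldUrl As String = _
            "http://www.mydomain.com/Products/473-151- Bright Products, Black box converter"

        Console.WriteLine("Crawler: {0}", GetStatusFor(oldUrl, "gsa-crawler"))
        Console.WriteLine("Browser: {0}", GetStatusFor(oldUrl, "Mozilla/5.0"))
    End Sub

End Module
```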

I hope the problem is now resolved.

Tags:

ASP.NET | Google Mini

Google Mini excluding ASP.NET page fragments

by Tim 17. February 2009 11:55

You have configured your Google Mini and got it integrated with your site. What you find now is that your results are getting skewed by irrelevant content on your site. This is what I’ve just found.

Exclude unwanted page sections

The result set was upset by the “customers who bought this also bought…” control and the site page header and footer. This turned out to be very simple to resolve. There are HTML comment tags that can be used to stop parts of the page from getting indexed. They are defined in the document Excluding Unwanted Text from the Index.
Here are the examples pulled from that documentation for brevity:

<!--googleoff: anchor--><A href=sharks_rugby.html>shark </A> <!--googleon: anchor-->
<!--googleoff: snippet-->Come to the fair!<!--googleon: snippet-->
<!--googleoff: all-->Come to the fair!<!--googleon: all-->

You surround the control or section of the page you do not want to participate in the results with one of the pairs of HTML comment tags shown above; the index attribute described below follows the same googleoff/googleon pattern. This will not affect the rendering of your page but does mean something to the Google search appliance.

index: The words between the tags are ignored by Google; they are treated as if they don’t occur on the page at all.

anchor: Text in an HTML anchor tag that links to another page will not cause that destination page to appear as a result because of the link on this page.

snippet: The search result will not use the text between the tags in the auto generated snippet that is included in the results.

all: Turns on all the attributes. Text between the tags is not indexed, followed to another linked-to page, or used for a snippet.
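
To get a feel for what is left on a page once the tags are in place, the excluded regions can be stripped out with a regular expression. This is only a rough approximation under my own assumptions about how the tags pair up, not how the appliance itself parses pages; StripUnindexed is a hypothetical helper and the sample page is made up.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GoogleOffPreview

    ' Hypothetical helper: removes regions wrapped in googleoff/googleon
    ' "index" or "all" comments, as a rough preview of the indexable text.
    Function StripUnindexed(ByVal html As String) As String
        ' \1 makes the closing tag use the same attribute as the opening tag.
        Dim pattern As String = _
            "<!--\s*googleoff:\s*(index|all)\s*-->.*?<!--\s*googleon:\s*\1\s*-->"
        Return Regex.Replace(html, pattern, "", _
            RegexOptions.Singleline Or RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim page As String = "<h1>Black box converter</h1>" & _
            "<!--googleoff: all--><a href=""/contact"">Contact us</a><!--googleon: all-->" & _
            "<p>Converts black boxes.</p>"

        ' Prints: <h1>Black box converter</h1><p>Converts black boxes.</p>
        Console.WriteLine(StripUnindexed(page))
    End Sub

End Module
```

Running real pages through something like this is a quick way to spot when too much content has been excluded.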

To solve my problem googleoff was applied to:

  • “Customers who bought this bought” control reference
  • Product category breadcrumb on the product pages
  • Master page header and footers

This has resulted in “contact us” no longer returning every page in the site, as it used to be linked from every page through the site master pages, and it has made the snippets in the search results much more relevant. The results are much richer.

Caution should be applied to avoid excluding too much of your content from Google, as you can’t predict what someone is searching for on your site or why. Excluding too much content may hinder them in finding what they require, or prevent them from ever getting what they need.

Check the documentation for the other controls you have available to control the indexing of pages (the crawl).

Tags:

Google Mini
