Geeks With Blogs
pemo (Notes from a Small [Academic] Island) ++++++++++++++++++ It's all academic! ++++++++++++++++++

Two suggestions – both to do with Google assessing its own performance (and improving upon it) – one long, one short…

1. Google should remember what you search for, as it doesn't learn anything about how humans refine search-expressions (unless it IP-logs or something).

Currently, Googleites enter a search-expression, and Google returns n pages of links to what it considers suitable pages. However, sometimes what's returned is, for a variety of reasons, inappropriate, so the user refines their query and resubmits it. Hopefully, within an iteration or two, the user finds what they're looking for.

However, because the web (HTTP, that is) is naturally stateless, Google doesn't know that a previous attempt at finding information is related to a subsequent attempt. Therefore, its ability to learn something about how people use language to pinpoint information is limited.

Now, Google could of course inject a little state into this. For example, they could put/update a cookie on your machine. The cookie contains the details of your last search: stuff like what words you used, the time/date of the query, and a unique ID (identifying you). Whenever you search, the contents of the cookie (the data detailing your previous search) are sent to Google along with the new search words, and, as part of the reply, the cookie is updated with the new search data. At the back-end, Google could very simply look at the date/time of the earlier search, and, if it's very close to the date/time of the subsequent search, well, the probability of this second visit being a refinement of the first dramatically increases.
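To make the cookie idea concrete, here's a minimal sketch in Python. Everything in it is my own assumption rather than anything Google is known to do: the function name handle_search, the JSON cookie layout, the placeholder ID "anon-1", and especially the two-minute window standing in for "very close".

```python
import json
from datetime import datetime, timedelta

# Assumed threshold: searches at most two minutes apart count as "very close".
REFINEMENT_WINDOW = timedelta(minutes=2)


def handle_search(cookie_value, query, now, user_id="anon-1"):
    """Return (is_refinement, new_cookie_value) for one search request.

    cookie_value is the JSON string sent by the browser, or None if the
    browser has no cookie yet. The layout is hypothetical.
    """
    is_refinement = False
    if cookie_value:
        previous = json.loads(cookie_value)
        user_id = previous["user_id"]  # keep the ID already assigned
        elapsed = now - datetime.fromisoformat(previous["time"])
        # A search soon after the previous one is probably a refinement.
        is_refinement = timedelta(0) <= elapsed <= REFINEMENT_WINDOW

    # The reply overwrites the cookie with the details of this search.
    new_cookie = json.dumps({
        "user_id": user_id,
        "words": query.split(),
        "time": now.isoformat(),
    })
    return is_refinement, new_cookie
```

Each reply hands back the updated cookie, so the next request carries the details of this search with it.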

Alternatively, they could store this sort of stuff server-side, instead of on your machine.  They could (say) log your IP address, and store that with the query data (as before - words used, time/date submitted (the ID is now your IP address)).  The process from this point on is the same as the last of course.  However, this latter approach has lots of plus points to it, for example: 1. you wouldn't know they're doing it. 2. you can have cookies turned off, and it doesn't matter. 3. nothing about what you've been searching for is stored on your computer, …
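The server-side variant might look something like the sketch below. Again, the class name SearchLog, the in-memory dictionary, and the two-minute window are all assumptions of mine, made up for illustration. A real log would need persistent storage, and one IP address can sit in front of many different people, so treat the IP as a rough ID at best.

```python
from datetime import datetime, timedelta

# Assumed threshold for "very close" searches.
REFINEMENT_WINDOW = timedelta(minutes=2)


class SearchLog:
    """Hypothetical in-memory log keyed by IP address."""

    def __init__(self):
        self._last = {}  # ip -> (words, time) of the most recent search

    def record(self, ip, query, now):
        """Store this search; return the previous words if it looks like
        a refinement of the last search from the same IP, else None."""
        previous = self._last.get(ip)
        self._last[ip] = (query.split(), now)
        if previous is None:
            return None
        prev_words, prev_time = previous
        if timedelta(0) <= now - prev_time <= REFINEMENT_WINDOW:
            return prev_words
        return None
```

Nothing is stored on the user's machine here; the only thing identifying them is the address the request came from.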

In either case, when a subsequent but related search is detected, important linguistic information could be learned through studying the relationship between the old and new search-expressions.
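As a very rough first step at studying that relationship, one could simply compare the words in the two expressions. The sketch below is nothing more than a word-level diff. The function name diff_queries and the kept/added/dropped labels are mine, and real linguistic analysis would obviously need far more than this.

```python
def diff_queries(old, new):
    """Compare two search-expressions word by word (case-insensitive).

    Returns which words the user kept, added, and dropped when refining.
    """
    old_words = old.lower().split()
    new_words = new.lower().split()
    return {
        "kept": [w for w in new_words if w in old_words],
        "added": [w for w in new_words if w not in old_words],
        "dropped": [w for w in old_words if w not in new_words],
    }
```

For example, going from "jaguar speed" to "jaguar car top speed" keeps both original words and adds "car" and "top", which hints that the user was disambiguating rather than starting over.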

It's a shame that they don't do this (?) – because, if they did (and could establish that this is a refinement of that), it'd be well interesting to analyse the data, and to see if one can say something about how humans use natural-language to interact with search-engines, and generally optimise their thoughts - turning mentalese (to pinch a Pinker term) into the most suitable/successful, concrete words.

BTW, there's an indication that Google doesn't do this currently – because, if it did, it should be able to offer you better alternative search-expressions (very different to the 'Did you mean' stuff)! Additionally, and in proportion to the number of search words used (and over time using NLP techniques), these alternatives should also become ever more accurate. However, this doesn't happen, so I assume they don't capture this sort of data. Shame really.

2. Google should watch what websites you visit.

When you've found a useful-looking results-page, does Google know which link on it you chose to navigate to? No – Google doesn't do click-through (yet).

And again, it's a shame really, as this data could help improve their PageRank algorithm, and certainly lead to some interesting machine-learning work in reparsing the page to better correlate its content with your search-expression.
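Just to show how little is needed to start collecting click-through data, here's a small sketch. It only counts which results get chosen for which search-expression. It isn't PageRank or any ranking algorithm, and the class name ClickLog, its methods, and the example.com URLs are all made up for illustration.

```python
from collections import Counter, defaultdict


class ClickLog:
    """Hypothetical tally of which result links users choose per query."""

    def __init__(self):
        self._clicks = defaultdict(Counter)  # query -> Counter of URLs

    def record_click(self, query, url):
        """Note that a user who searched for `query` followed `url`."""
        self._clicks[query.lower()][url] += 1

    def most_clicked(self, query, n=3):
        """Return up to n URLs for this query, most-clicked first."""
        counts = self._clicks.get(query.lower(), Counter())
        return [url for url, _ in counts.most_common(n)]


log = ClickLog()
log.record_click("jaguar car", "http://example.com/cars")
log.record_click("jaguar car", "http://example.com/cars")
log.record_click("jaguar car", "http://example.com/cats")
top = log.most_clicked("Jaguar Car")
```

Counts like these would be the raw material; correlating the chosen page's content with the search-expression is where the machine-learning work would come in.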


Google is one of the busiest websites in the world, and its raison d'être is to interact with humans using natural language. From an AI, CL, ML, and NLP point of view, it'd be a great place to be a researcher (I reckon)!

Posted on Saturday, June 19, 2004 8:18 AM

Copyright © pemo