Computing Cluster Scores

Posted in Data Grids by admin on August 9, 2010

Google's solution to the challenges of search queries

Search engines aim to return the most relevant results for a query, but they are limited by the queries users actually type. A search query may be too specific or too general for the engine to identify good results. Google has filed patent applications on alternative query terms, or query refinements, as a solution.

Google's solution

Search queries that fail to return good results often include homonyms, words that share a sound or spelling but have different meanings. Words used in the wrong context can also confuse a search engine. Very general terms return results that are too broad, while very narrow terms can be so restrictive that the results do not answer the question.

Google presents a system and method that attempts to address this problem. In this system, a stored query and a stored document are associated as a logical pairing, and the pairing is assigned a weight. When a search query is issued, a set of search documents is produced, and at least one search document is matched to at least one stored document. The stored queries and assigned weights associated with the matching stored documents are then retrieved. At least one cluster is formed from them, and the stored queries are scored for each cluster relative to at least one other cluster. At least one of the scored queries is suggested as a set of query refinements.
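The retrieval step can be pictured with a small sketch. This is only an illustration, not the patent's implementation: the `ASSOCIATIONS` table, the document ids, the weights and the choice to sum weights across matching documents are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical precomputed pairings: stored document id -> {stored query: weight}.
ASSOCIATIONS = {
    "doc_a": {"jaguar car": 0.9, "jaguar dealership": 0.4},
    "doc_b": {"jaguar animal": 0.8},
    "doc_c": {"jaguar car": 0.3, "jaguar speed": 0.5},
}

def retrieve_weighted_queries(search_doc_ids, associations):
    """Collect stored queries for search results that match a stored document.

    Weights for the same query are summed across documents (an assumption).
    Search documents with no stored counterpart are simply skipped.
    """
    totals = defaultdict(float)
    for doc_id in search_doc_ids:
        for query, weight in associations.get(doc_id, {}).items():
            totals[query] += weight
    return dict(totals)

# "doc_x" stands for a search result that has no stored counterpart.
candidates = retrieve_weighted_queries(["doc_a", "doc_c", "doc_x"], ASSOCIATIONS)
```

The resulting `candidates` mapping is the raw material for the clustering and scoring steps.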

The process begins when Google selects a set of result documents, for example 100, for clustering. During this phase, term vectors are computed for each of these documents, which are ranked by relevance score. The documents are matched to stored documents in a database of associations. The associations between alternative queries and stored documents have been computed in advance for the whole set of stored documents.

Term vectors are also created for the alternative queries. Clusters are formed from the two sets of term vectors, and a centroid is computed for each cluster. Stored queries related to a search document in the cluster are scored according to their distance from the centroid and the percentage of stored documents that occur in the cluster. The best suggested query refinement is the one that contains the highest number of search terms and occurs most frequently for the documents in the cluster.
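A minimal sketch of scoring the queries in one cluster follows. It assumes the cluster has already been formed, uses cosine similarity as the measure of closeness to the centroid, and multiplies that by the share of stored documents in the cluster. Both choices, and all function names, are assumptions for illustration; the patent application may combine these signals differently.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term vector for a query or document."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Average of the term vectors in a cluster."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: count / len(vectors) for term, count in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_queries(cluster_queries, cluster_doc_count, total_doc_count):
    """Score each query by closeness to the centroid times cluster coverage."""
    vectors = [term_vector(q) for q in cluster_queries]
    center = centroid(vectors)
    coverage = cluster_doc_count / total_doc_count  # share of stored documents
    return {q: cosine(v, center) * coverage
            for q, v in zip(cluster_queries, vectors)}

scores = score_queries(
    ["jaguar car", "jaguar car price", "jaguar dealership"],
    cluster_doc_count=6,
    total_doc_count=10,
)
best = max(scores, key=scores.get)
```

Sorting `scores` in descending order gives a ranked list of candidate refinements for the cluster.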

Query names for the other clusters can be generated to arrive at the suggested query refinements. The refinements are sorted by relevance score. Alternative queries may include negated forms of terms that appear in the set of refinements but not in the original search query. A selection of previous user search queries can be used to precompute a set of refinements. Those queries are issued and their search results are kept in a database for future user search requests. The refined queries are presented to the user along with the original search results.

The precomputation stage happens before any query is entered into the search engine. It is best described in terms of at least four components: an associator, a selector, a regenerator and an inverter.

The associator creates relevance-weighted associations between stored queries and stored documents. The selector decides which stored documents and stored queries should be retrieved. The regenerator examines query logs and selects stored documents on the basis of previous searches. The inverter examines cached data and selects documents and related queries based on that data.
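As a rough illustration of the associator's job, the sketch below builds weighted query-to-document associations from a list of log entries. The log format (pairs of query and document id) and the weighting (each query's share of the entries for a document) are assumptions chosen to keep the example short; the source does not specify either.

```python
from collections import Counter, defaultdict

def build_associations(log_entries):
    """Build {doc_id: {query: weight}} from (query, doc_id) log entries.

    The weight is the fraction of a document's log entries that came from
    the given query, a stand-in for a real relevance weight.
    """
    counts = defaultdict(Counter)
    for query, doc_id in log_entries:
        counts[doc_id][query] += 1

    associations = {}
    for doc_id, query_counts in counts.items():
        total = sum(query_counts.values())
        associations[doc_id] = {q: c / total for q, c in query_counts.items()}
    return associations

# Hypothetical log entries: the query issued and a document returned for it.
log = [
    ("jaguar car", "doc_a"),
    ("jaguar car", "doc_a"),
    ("jaguar price", "doc_a"),
    ("jaguar animal", "doc_b"),
]
associations = build_associations(log)
```

Because this runs before any live query arrives, the cost of scanning the logs does not affect response time.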

The query refinement system itself consists of four components. A matcher matches one or more stored documents to the actual search documents that the search engine generated in response to a search query. It also identifies the stored queries and assigned weights using the associations for the matched stored documents. A clusterer forms one or more clusters using term vectors built from the terms that occur in the combined set of stored queries and their weights. A scorer computes centroids that represent the weighted center of each cluster's term vectors. A presenter offers the highest-scoring search queries to the user as one or more query refinements. The interesting aspect of this approach is how much user data is incorporated into the results through log files and cached data.

The patent application shows one way to produce query refinements, but no one outside Google really knows exactly how it comes up with alternative results. It does, however, offer some clues about how to create content for web sites and how these alternative results come to be displayed. Carefully considering the words people are likely to search for, and what Google shows in its results for those phrases, can give a hint about how search data might shape the approach to a web page.

Multi-stage processing of queries

Determining how relevant a page is to a searcher's query involves considering how a term or phrase is used in the context of that page. Google has also filed a patent application that looks at possible ways to take the context of these words into account. It describes a procedure for determining relevance and finding search results.

The possible actions described in this document can be divided into stages. The first stage deals with removing stop words, stemming terms, and expanding queries with synonyms and related terms that commonly co-occur with them. During this stage, relevance scores between the query and each document are computed using one or more scoring algorithms. The second stage uses the adjacency and proximity of terms to rank documents. The third stage reviews term attributes to determine whether the terms appear in titles, headings or metadata, or whether they have certain font characteristics. The fourth and final stage generates the snippets returned with the results.
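The first two stages can be sketched in a few lines. Everything here is a simplified stand-in: the stop-word list, the synonym table and the suffix-stripping `naive_stem` function are hypothetical, and a real system would use a proper stemmer and a far richer expansion source.

```python
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "to"}
SYNONYMS = {"car": {"automobile"}}  # hypothetical expansion table

def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(query):
    """Stage 1: drop stop words, stem, then expand with synonyms."""
    terms = [naive_stem(w) for w in query.lower().split() if w not in STOP_WORDS]
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return terms, expanded

def proximity_span(doc_tokens, terms):
    """Stage 2: size of the smallest token window containing every term.

    Returns None if the terms never all occur. Smaller spans suggest the
    terms are used close together in the document.
    """
    needed = set(terms)
    if not needed:
        return None
    best = None
    for start in range(len(doc_tokens)):
        seen = set()
        for end in range(start, len(doc_tokens)):
            if doc_tokens[end] in needed:
                seen.add(doc_tokens[end])
            if seen == needed:
                width = end - start + 1
                if best is None or width < best:
                    best = width
                break
    return best

terms, expanded = preprocess("used cars for sale")
span = proximity_span("great used car on sale today".split(), terms)
```

In practice the document tokens would go through the same stemming as the query, so that "cars" in a document still matches the stemmed query term.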

Interactive query refinement has been shown to improve retrieval. Major search engines use the history of user actions, such as queries or clicks, to personalize search results. Query-specific web recommendations (QSRs) work retroactively: they answer queries from the user's history and return new results. Their main purpose is to recommend new web pages for a user's old queries. However, this is only useful if the user has a standing interest in a particular query. The focus can also shift from individual queries to query sessions, which include all actions associated with a given initial query. A query is considered a refinement of the previous query if both queries share at least one term.
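The refinement rule in the last sentence is simple enough to express directly. The sketch below applies it to group a user's query history into sessions; whitespace tokenization and the function names are assumptions made for the example.

```python
def is_refinement(previous_query, query):
    """True if the two queries share at least one term."""
    previous_terms = set(previous_query.lower().split())
    return bool(previous_terms & set(query.lower().split()))

def group_sessions(queries):
    """Group consecutive queries into sessions of successive refinements."""
    sessions = []
    for query in queries:
        if sessions and is_refinement(sessions[-1][-1], query):
            sessions[-1].append(query)
        else:
            sessions.append([query])
    return sessions

history = ["jaguar", "jaguar car price", "used car price", "weather boston"]
sessions = group_sessions(history)
```

Here the first three queries chain together because each shares a term with the one before it, while the last query starts a new session.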

