Information Retrieval Ranking versus Machine Learning Ranking
Search engines have historically ranked organic results in response to the query terms searchers entered into a search box based on a combination of an information retrieval score (looking at the relevance between those terms and their use in resources on the web) and an authority score, based on a measure such as PageRank. But Google may begin using a machine learning model to rank more content.
Search engineers have been telling us that search engines may now be using machine learning models to rank web pages. We aren't quite sure how machine learning has been used, but it is good to see descriptions of how pages ranked that way may appear in search results.
One patent granted at the start of December 2020 is worth looking at, and I saved a copy of the patent to write about from back then.
One of the inventors behind this patent, Tushar Chandra, worked to implement Sibyl, and a video about that project is highly recommended:
https://www.youtube.com/watch?v=QoUVwGZb9tA
Recommendation systems sometimes have separate information retrieval and machine-learned ranking stages.
The difference between those types of results?
The information retrieval stage selects documents (videos, advertisements, music, text documents, etc.) from a corpus based on various signals, while the machine-learned system ranks the output of the information retrieval system.
For example, when a searcher enters a query such as “cat”, a contextual information retrieval system may select a set of candidate advertisements that contain the word “cat” from all available advertisements.
Those candidate advertisements may then be ranked based on a machine-learned model that has been trained to predict the likelihood of an advertisement being clicked through by a searcher based on various features, such as:
- The type of user
- The location of the user
- The time of day at which the query was made
- Etc.
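The two-stage flow above can be sketched in a few lines of Python. Everything here is a hypothetical illustration, not something from the patent: the toy ad corpus, the feature names, and the scoring weights are all made up.

```python
# Stage 1 (information retrieval) followed by stage 2 (machine-learned ranking).
# All data and weights below are invented for illustration.

ads = [
    {"id": "ad_1", "text": "cat food on sale"},
    {"id": "ad_2", "text": "dog leashes"},
    {"id": "ad_3", "text": "cat toys for kittens"},
]

def retrieve(query, corpus):
    """Stage 1: a cheap IR pass - keep ads whose text contains the query term."""
    return [ad for ad in corpus if query in ad["text"].split()]

def ml_score(ad, user_type, location, hour):
    """Stage 2: a stand-in for a machine-learned click-probability model."""
    score = 0.1
    if user_type == "pet_owner":
        score += 0.2
    if location == "US":
        score += 0.1
    if 18 <= hour <= 22:  # evenings score higher in this toy model
        score += 0.05
    return score

candidates = retrieve("cat", ads)
ranked = sorted(candidates, key=lambda ad: ml_score(ad, "pet_owner", "US", 20),
                reverse=True)
```

The key point from the patent is the hand-off: the IR stage narrows the corpus before the more expensive model ever runs.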
The patent tells us about the difference between the two:
An information retrieval tool is computationally efficient, but can only produce a rough estimate of which items are best recommended to a user.
A machine-learned model can produce more accurate recommendations but is often more computationally intensive than an information retrieval tool.
Because the information retrieval tool is less accurate, it can exclude from consideration certain candidates that the machine-learned model would otherwise rank highly.
According to the Google patent, rules from a machine learning model are received, with each of the machine learning rules containing one or more features, an outcome, and an outcome probability predicted by the machine learning model for those features and that outcome.
By looking at LinkedIn Profiles of the inventors of this patent, it appears that this machine learning approach is likely used for the optimization of videos at YouTube, and the examples in the patent focus on videos.
An entry for a token-based index can exist for each of the rules and may contain one or more tokens based on the features of the rule, the outcome of the rule, and the outcome probability of the rule.
A query may be received and a subset of tokens that correspond to the query may be identified.
The token-based index may be used to obtain several outcome probabilities based on the subset of tokens.
An outcome may be selected based on the plurality of outcome probabilities and may be provided to a user.
A subset of selected outcomes may be ranked.
For example, a hundred videos (outcomes) may be ranked by probability with the top twenty shown, from highest to lowest probability, to the user.
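That ranking step is a simple sort. In this sketch, random values stand in for model-predicted probabilities:

```python
import random

random.seed(0)
# One hundred hypothetical (video_id, predicted_probability) outcomes.
outcomes = [(f"video_{i}", random.random()) for i in range(100)]

# Rank by predicted probability, highest first, and keep the top twenty to show.
top_twenty = sorted(outcomes, key=lambda o: o[1], reverse=True)[:20]
```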
Systems and techniques according to the present disclosure may generate or change one or more indices based on rules and results of a model that is the product of a machine learning system.
The generated or modified indices may be used to provide results based on a search technique.
More characteristics, advantages, and implementations of the disclosed subject matter may be set forth in or become apparent from consideration of the following detailed description, drawings, and claims.
This patent can be found at:
Searchable index
Inventors: Jeremiah Harmsen, Tushar Deepak Chandra, Marcus Fontoura
Assignee: Google LLC
US Patent: 10,853,360
Granted: December 1, 2020
Filed: March 27, 2019
Abstract
Systems and techniques are disclosed for generating entries for a searchable index based on rules generated by one or more machine-learned models. The index entries can include one or more tokens correlated with an outcome and an outcome probability. A subset of tokens can be identified based on the characteristics of an event. The index may be searched for outcomes and their respective probabilities that correspond to tokens that are similar to or match the subset of tokens based on the event.
What Does this Machine Learning Model Look Like?
This patent is about a searchable index created using rules from a machine-learned model.
This lets the superior intelligence and logic of a machine-learned model be embodied in an easily-searchable index.
And standard information retrieval tools can efficiently retrieve data using the index.
Compare this to a system with separate information retrieval and machine-learned ranking stages: the single index eliminates the loss of candidates during the information retrieval stage that would otherwise be scored highly by the machine-learned model.
Techniques from the patent can use machine-learned models generated using supervised learning data such as labeled examples.
Those labeled examples may be input into a machine learning system and output from the machine learning system may be a machine-learned model with weights generated in response to the labeled data.
The labeled examples may contain both an outcome and properties associated with a specific instance.
The weights and outcomes may be unitless numerical values, percentages, counts of occurrences, or other quantification.
A machine learning system may receive labeled data (e.g., labeled examples) to develop a machine-learned model that contains weights generated in response to the supervised labeled data.
One or more rules may be created by a machine-learned model.
Example Machine Learning Rules for SERPs
This example appears to have been created for use with videos that may be shown in response to a search query.
A rule can include an outcome, a set of features, and a probability. For example, the rule:
(keyword:car, video:carmaker_1) → 0.03
This tells us that when a searcher submits the search keyword “car” at a search engine (a feature) and is returned a video about carmaker_1 (the outcome), there is a 3% probability that the searcher will select the video to view it (the probability).
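Such a rule might be represented as a simple record. The field names here are hypothetical; the patent describes only the three components (features, outcome, probability):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    features: tuple      # e.g. ("keyword:car",)
    outcome: str         # e.g. "video:carmaker_1"
    probability: float   # predicted chance the outcome occurs

# The rule from the patent's example: (keyword:car, video:carmaker_1) → 0.03
rule = Rule(features=("keyword:car",), outcome="video:carmaker_1",
            probability=0.03)
```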
The patent tells us that the entries in a searchable index may include documents and searchable tokens.
We also know that a token in a machine-learned token-based index can be referred to as an indexed token.
And an indexed token may or may not include a keyword.
So, an index can include one token with the keyword “car”, as well as other tokens that do not include keywords but that relate to other features such as location, language, and browser settings.
In that case, a feature may include any information known about a user, such as a query text submitted by the user, a browser configuration set by the user, etc.
Also, a feature can be general state information such as:
- Time of day
- Geographic location
- Etc.
The patent then shows us an example: a webpage can produce an entry as follows:
web_page_1: [text:boxcar, 4.0], [image:train, 2.0]
This entry indicates that a page of a website (“web_page_1”, a document) includes the text string “boxcar” four times and two images of trains.
So, a standard rule-based machine-learned model can be shown as a set of documents and tokens with weights.
For example, the following rules indicate the likelihood that a user who enters the search keyword “car” into a search engine will select a video about a particular carmaker:
(keyword:car, video:carmaker_1) → 0.03
(keyword:car, video:carmaker_2) → 0.05
These rules can become entries that include a set of searchable tokens corresponding to each video, such as:
carmaker_1: [keyword:car, 0.03]
carmaker_2: [keyword:car, 0.05]
An entry can include an outcome (such as “carmaker_1”), tokens (such as keyword: car), and a weight (such as 3%).
The tokens can be based on the occurrence of features in a machine-learned rule.
A weight can correspond to the probability that the outcome will occur based on the occurrence of certain features, represented here as tokens.
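The rule-to-entry conversion described above could look something like this sketch, using the carmaker figures from the text (the data layout is an assumption):

```python
# Each rule: (feature tokens, outcome, predicted probability), per the example
# rules (keyword:car, video:carmaker_1) → 0.03 and
# (keyword:car, video:carmaker_2) → 0.05.
rules = [
    (("keyword:car",), "carmaker_1", 0.03),
    (("keyword:car",), "carmaker_2", 0.05),
]

# Convert rules into index entries keyed by outcome, where each entry holds
# the searchable tokens and the weight (outcome probability).
index = {}
for features, outcome, probability in rules:
    index.setdefault(outcome, []).append((features, probability))
```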
Since the data described in the patent has the same structure as a standard web search information retrieval problem, standard web search and other information retrieval techniques can be used, such as inverted indices and posting lists.
An implementation following the patent can end the need for a separate information retrieval step and can score all of the outcomes using the machine-learned model.
Because of that, search results may be more appropriate to a searcher than a typical indexed search, such as ones based on attributes of the user, context of the search, etc.
As an example, features can be associated with a given user.
So the index can use the presence (1) or absence (0) of features such as the query, the user's location (Europe or America), and whether the searcher has a high-bandwidth connection.
A machine-learned model can include weights that may show the relative contributions of various features to the likelihood of an outcome.
Those weights show the relative contributions of features to the likelihood that a user will select a particular video A-D for viewing. The presence or absence of features for a given user can be combined with the weight of each feature for an outcome to determine the likelihood of that outcome for that user.
The presence of keyword:car and America for User B, combined with the weights for those features for Video C (0.5 and 0.2, respectively), can be used to predict the probability that User B will select Video C for viewing.
The likelihood that each searcher will choose to view each different video can be calculated.
The weights may be unitless numerical values, percentages, counts of occurrences, or other quantification.
Rules can be generated based on the machine-learned model.
A rule can correlate at least one feature with a probability of occurrence of a given outcome.
Many rules can be generated based on the occurrences of various combinations of features.
A set of tokens can be generated based on the features in a rule.
Examples of such tokens may include [keyword:car], [location:Europe] and [bandwidth:high].
Tokens corresponding to a set of features in a rule can be considered in combination with a probability of a given outcome and indexed.
The tokens [keyword:car], [location:Europe], and [bandwidth:high] can be correlated with a 4% probability that Video C will be selected to be viewed:
Video C: [keyword:car, location:Europe, bandwidth:high, 0.04]
This information can be stored as a searchable index entry along with other such results derived from other rules.
The total compiled index can be searched using standard information retrieval tools.
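Here is one hedged sketch of how such a compiled index could be searched with a standard inverted-index technique. The entries, probabilities, and the matching criterion (all of an entry's tokens must appear in the query context) are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical entries: outcome -> (token set, outcome probability), in the
# spirit of "Video C: [keyword:car, location:Europe, bandwidth:high, 0.04]".
entries = {
    "Video C": ({"keyword:car", "location:Europe", "bandwidth:high"}, 0.04),
    "Video D": ({"keyword:car", "location:America"}, 0.07),
}

# Build an inverted index: token -> set of outcomes whose entry contains it.
inverted = defaultdict(set)
for outcome, (tokens, _) in entries.items():
    for token in tokens:
        inverted[token].add(outcome)

def search(query_tokens):
    """Union the posting lists, then keep outcomes whose tokens all match."""
    candidates = set()
    for token in query_tokens:
        candidates |= inverted.get(token, set())
    hits = [(o, entries[o][1]) for o in candidates
            if entries[o][0] <= query_tokens]
    return sorted(hits, key=lambda h: h[1], reverse=True)

results = search({"keyword:car", "location:Europe", "bandwidth:high"})
```

Video D is fetched as a candidate via the keyword:car posting list but dropped because its location token does not match the query context.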
This generated index is referred to as a token-based index.
It is based on tokens that are not limited to keywords.
One or more ranking techniques can be employed to refine the search results to respond to a query.
The index can be used to consider only those tokens corresponding to features for a given user.
A search of the index can retrieve all videos for which probabilities have been calculated for a user whose last viewed video was about cats and whose location is America.
The resulting videos can be ranked by probability from most to least likely to be selected next for viewing when presented as subsequent viewing options to the user.
Descriptions (e.g., thumbnails) for the top-ranked videos in the list can be recommended to the user.
An outcome may be one for which the machine learning model predicted an outcome value.
The value may be a weight, likelihood, or probability. (By way of example only, we refer to probability within the patent.)
The outcome may be any applicable result such as:
- A regression-based prediction
- A ranking
- A conversion probability
- A click through probability
- A duration prediction (e.g., how long a user is likely to view/interact with content associated with the outcome), or the like.
As discussed in the previous example, a user viewing video Y and a user viewing video Z are examples of outcomes.
Other examples of outcomes may be:
- Selecting a promotion
- Opening an account
- Purchasing a product or service
- The duration for which a user views a content (e.g., a video, an image, a text, etc.)
- Repeat access (e.g., how likely a user is to revisit content), or the like
An outcome probability may be represented in any useful form, including:
- Integer representations
- Boolean categorization
- Normalization (e.g., the probability value converted into a normalized probability, conversion rate, percentage, etc.)
An outcome probability may be any applicable prediction such as a percentage, ratio, or the like and/or may correspond to a prediction of the amount spent (e.g., dollars spent), amount of exposure time (e.g., video minutes watched), or the like.
The outcome probability may be derived from the prediction made by the machine learning model.
A searchable index may contain many entries, each associated with an outcome.
The entries may correspond to an outcome probability that predicts the likelihood of a searcher selecting the content associated with the outcome.
This outcome probability may represent the percentage chance of a user selecting content associated with the outcome.
What Will Machine Learning Model Results Look Like?
Interestingly, examples from the patent involve videos.
Tokens associated with a query may be matched against the tokens in the searchable token-based index and, using a search algorithm, outcomes may be selected based on one or more outcome probabilities.
A machine learning system may generate and update models to make predictions and provide rankings.
A machine learning model-based prediction may contain an outcome, one or more features, and a prediction value.
To make a prediction with a machine learning model, many features for a given event may be provided to the model and, based on the presence of those features, the model may output a probability or prediction.
A machine learning model predicting whether a searcher will view video Y (an example of an outcome) may be provided with features indicating that the user is located in the United States, has viewed music video X in the past, and has set her default language as English (examples of features).
This machine learning model may contain weights for each of the features:
- 0.5 for being located in the United States
- 0.9 for having viewed music video X
- 0.3 for setting the default language as English
The machine learning model may contain weights for other features (e.g., the user is located in Canada) but, as those features are not present in this example prediction, their weights may not contribute to the prediction.
The absence of a particular feature may also be important in predicting an outcome and may be considered.
The machine learning model may provide a weight for whether the user will view music video Y based on the absence of a feature (e.g., the user is not using a mobile device).
The prediction value may be normalized to represent a percentage or probability in any applicable manner.
The instance could contain the outcome: “whether the user will view video Y”, the features: “located in the United States”, “viewed video X”, and “default language English”, and the prediction: “0.9” (normalized).
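The weight-combination step above can be sketched as follows. The logistic squashing function is an assumption: the patent says only that the value may be normalized, not how.

```python
import math

# Feature weights from the patent's example; feature names are paraphrased.
weights = {
    "located_in_US": 0.5,
    "viewed_video_X": 0.9,
    "default_language_english": 0.3,
}

# Features present for this particular user/event.
present = {"located_in_US", "viewed_video_X", "default_language_english"}

# Sum the weights of the present features: 0.5 + 0.9 + 0.3 = 1.7.
raw = sum(w for feature, w in weights.items() if feature in present)

def normalize(x):
    # One possible normalization (logistic); the patent leaves the method open.
    return 1.0 / (1.0 + math.exp(-x))

probability = normalize(raw)
```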
A feature may be any applicable characteristic associated with an instance and may be based on a user (e.g., user demographic, user history, user associations such as user accounts or preferences), a device (e.g., a user device type, device features, device capabilities, device configuration, etc.), a current event, or the like.
Features can include a searcher location, a searcher language preference, a view history, a searcher account, a searcher subscription, a device model type, a device screen resolution, a device operating system, a holiday designation, a sporting event occurrence, or the like.
A feature may be a search input (such as a text keyword, an image, an audio file, etc.) received from a searcher.
The outcome, features, and/or the prediction may be represented in any useful form, such as integer representations, Boolean categorization, or normalization (e.g., the probability value converted into a normalized probability, conversion rate, percentage, etc.). According to the patent, the machine learning model may be trained using the prior selections of one or more users (outcomes).
The prediction made by the machine learning model may be based on the successful selection or lack of selection of an outcome such that the predicted outcome probability may increase based on selected candidate results and may decrease based on unselected outcomes.
What Will Rules for the Machine Learning Model Look Like?
An instance of a machine learning model-based rule may contain an outcome, one or more features, and an outcome probability.
In an example of a prediction made via a machine learning model, many features for a given event may be provided to the machine learning model and, based on the presence of the features, the machine learning model may output a probability or prediction.
A more detailed example:
A machine learning model that predicts whether a user will view video Y (an outcome) may be provided with data that the user is located in the United States, has viewed music video X in the past, and has set her default language as English (features).
This machine learning model may prescribe weights for each of the features, e.g., 0.5 for being located in the United States, 0.9 for having viewed music video X, and 0.3 for setting the default language as English.
So, the machine learning model may predict that the user will view music video Y with a weight of 1.7 based on the features associated with the rule.
The probability value may be normalized to represent a percentage or probability in any applicable manner.
The instance may contain the outcome: “whether the user will view video Y”, the features: “located in the United States”, “viewed video X”, and “default language English”, and the prediction: “0.9” (normalized).
The outcome, features, and/or the probability may be represented in any applicable manner such as hash values, integer representations, Boolean categorization, normalization (e.g., the probability value converted into a normalized probability, conversion rate, percentage, etc.).
So an outcome for “Selecting video X” may be represented with a hash value “e0d123e5f316”.
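Hashing an outcome to a short identifier might look like this; the choice of MD5 truncated to twelve hex characters is an assumption, since the patent does not specify a hash function:

```python
import hashlib

# Represent an outcome as a short, fixed-length hash token.
outcome = "Selecting video X"
token = hashlib.md5(outcome.encode("utf-8")).hexdigest()[:12]
```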
At the next step, an instance of a rule based on a machine learning model may be converted into an entry in a searchable feature-based index.
The entry in the searchable feature-based index may contain an outcome associated with one or more tokens and an outcome probability.
A token may be based on a feature contained within a rule.
That model may predict a probability of 0.9 for the outcome “the user will view video Y” based on various features.
The token-based index may correlate the same tokens to other outcomes, each with its own probability.
For example, the same tokens may be correlated to the outcome “the user will view video Z” with a probability of 0.8.
A searchable token-based index may be an inverted index or a posting list such that it is an index data structure that is configured to store a mapping from content (e.g., words, numbers, values, etc.) to locations in a database file, to documents, or a set of documents.
This searchable token-based index may allow fast full-text searches and may be a database file itself rather than its index.
A query may then be received.
A query may be generated based on the actions of a human user, a computer, a database, software, an application, a server, or the like.
The term query may include any input that can be used to search the index to get a probability of one or more outcomes based on the occurrence of one or more events.
When a searcher selects a given video, the characteristics of the selection (e.g., the identity of the video, the topic of the video, location of the user, etc.) can be used as the basis of a query to search the index for outcomes and their respective probabilities that the user will select other videos to watch next.
The results of the query can predict, for example, that the user will select Video B with a probability of 0.2, Video C with a probability of 0.1, and Video D with a probability of 0.4.
A query may be formulated based on a subset of tokens that may be identified, e.g., based on an event.
For example, a keyword search for “car” may have been submitted by a user in Canada at 5:07 PM ET with the language setting of the user’s browser set to “French.”
The subset of tokens that may be identified can include keyword: car, location: Canada, time:5:07 PM ET, and language: French.
These tokens can be used to search the index for outcomes and probabilities correlated with the same or similar tokens in the index.
These tokens may correspond, for example, to entries in the index, which can be retrieved using standard index search techniques.
One or more outcomes (above, videos) may be selected from the results, e.g., based on their respective outcome probabilities.
The outcome Video F may have the highest probability of being selected for viewing next.
Accordingly, a link to Video F may be provided to the user. The next highest-ranked (most probable) videos (Video R and Video A) may also be presented.
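Putting the query step together, here is a hedged sketch with hypothetical entries and probabilities chosen so that Video F, Video R, and Video A come out on top, as in the example above:

```python
# Hypothetical token-based index: outcome -> (token set, outcome probability).
index = {
    "Video F": ({"keyword:car", "location:Canada"}, 0.6),
    "Video R": ({"keyword:car", "language:French"}, 0.4),
    "Video A": ({"keyword:car"}, 0.3),
    "Video Q": ({"keyword:boat"}, 0.9),
}

# Tokens identified from the event: a "car" search from Canada at 5:07 PM ET
# in a French-language browser.
query = {"keyword:car", "location:Canada", "time:5:07PM_ET", "language:French"}

# Keep outcomes whose entry tokens all appear in the query context, then rank
# by outcome probability, highest first.
matches = [(name, prob) for name, (tokens, prob) in index.items()
           if tokens <= query]
ranked = sorted(matches, key=lambda m: m[1], reverse=True)
```

Video Q has the highest raw probability but never matches the query tokens, so it is filtered out before ranking.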