Promoting Brands in Your Store Using the Elasticsearch Multi Search API

This article describes a solution to the problem of promoting brands in the search results of an online shop with an Elasticsearch-based search component. The particular context for this solution is a Magento Enterprise 1.14.2.0 multi-website store using the Evozon Search Extension, but the general idea should apply across shop platforms and, potentially, across search implementations.

The challenge

What exactly do I mean by ‘promoting brands’? The problem is defined as follows: if a customer is not searching for a specific brand, the search results should not be ranked purely by relevance (the default behaviour), but should be arranged so that products from certain brands appear at the top of the search results page. The size and order of these blocks of results should be configurable. Additionally, it should be possible to show multiple blocks of results from the same brand on the same page.

Stage 0: Perplexity

My initial reaction to this problem was along the lines of "we can’t know if the customer’s query matches a brand or not. We could try comparing it to existing brands in our store, but we have ngrams! We have fuzzy search! Also, search doesn’t work like that. There is absolutely no way to build a query that will retrieve results in this format!" I may have also experienced a brief but powerful desire to change careers and become a hexa-cow herder on Eden Prime.

Stage 1: Analysis

As the initial effect of the problem wore off, I turned my attention to designing a solution.

The first subproblem to solve was how to determine whether the query string matched a brand name. The best solution I could come up with was using highlighting, which adds the matched fields to the query response and applies a highlight to their content.

As for displaying the search results in a configurable way, it’s true that you can’t build a single query that satisfies the given constraints, but what about using multiple queries and restructuring their results on the application side? Well, executing these queries in sequence would negatively impact performance. Executing them in parallel, like with curl_multi_exec(), would be a better solution, and we already had basic support for this in our search extension. But, as it turns out, there is an Elasticsearch native solution: the multi search API, which makes it possible to execute multiple queries at once. And, after you have the results from this multi query, you can present them to the user in any way you want. In the end, I opted for this approach.

With this general outline in mind, I moved on to the implementation.

Stage 2: Implementation

Setting up the configuration

The configuration had to match a brand name to a value representing its position on the search results page and another value representing the number of products to display. In concrete terms, I had to make it possible for an admin to set up something like the following:

Given that I was working within the context of a Magento store, I used a mapper to display this configuration in the admin area:

Building the multi search request

If you already have multi search in place, the actual implementation should be relatively painless. If, however, the search component relies on building and sending single queries and receiving single responses, as was the case in my situation, you will need to be a little more creative.

What I did was intercept the query right before it was executed and, if the promoted brands feature was enabled and the query was not filtered by brand, convert it to a multi search query. What does that mean in practice? For clarification purposes, let’s assume a spherical hexa-cow simplified case: the user has searched for ‘40’ and this search has been encoded into the following query body:

{  
   "query":{  
      "query_string":{  
         "query":"40"
      }
   },
   "from":0,
   "size":10
}

Our implementation actually uses a multi-match query, which is important for reasons I will explain later, but this basic example will suffice for illustrative purposes.

If this query were executed, we would receive a response such as the following:

{  
   "took":1,
   "timed_out":false,
   "_shards":{  
      "total":1,
      "successful":1,
      "failed":0
   },
   "hits":{  
      "total":10,
      "max_score":1.3862944,
      "hits":[  
         {  
            "_index":"products_index",
            "_type":"product",
            "_id":"1",
            "_score":1.3862944,
            "_source":{  
               "name":"N7 Hurricane",
               "brand":"Systems Alliance",
               "fire_mode":"Automatic",
               "default_ammo":"40/240"
            }
         },
         {  
            "_index":"products_index",
            "_type":"product",
            "_id":"2",
            "_score":1.3562911,
            "_source":{  
               "name":"Blood Pack Punisher",
               "brand":"Blood Pack",
               "fire_mode":"Automatic",
               "default_ammo":"40/320"
            }
         }, 
         {
            ...
         }
      ]
   }
}

And on the frontend, the results might look like this:

The first step towards the solution is replicating the original query body once for every promoted brand in your configuration. These brand-specific copies must filter out documents that do not match the corresponding brand and correctly set the matching ‘from’ and ‘size’ values. As an example, the query body copy for our first promoted brand, Kassa Fabrication, could look like this:

{  
   "query":{  
      "bool":{  
         "must":[  
            {  
               "query_string":{  
                  "query":"40"
               }
            },
            {  
               "query_string":{  
                  "query":"brand:\"Kassa Fabrication\""
               }
            }
         ]
      }
   },
   "from":0,
   "size":3
}

Of course, you can build your query and add filters in other ways too. What you are trying to achieve here is retrieving a set of documents that match the original query AND the promoted brand, no matter what this looks like in your implementation.

When you build these copies, be careful with the ‘from’ values - in case you have repeating brands, like in the example configuration, these values must be adjusted accordingly. In the context of our example, for the next Kassa Fabrication copy, the ‘from’ value must be 3.
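The copying step, including the ‘from’ adjustment for repeated brands, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the (brand, size) list format and the example configuration in the test are hypothetical, not part of the extension. The brand name is wrapped in quotes so that a multi-word name is matched as a phrase within the brand field.

```python
import copy


def build_brand_copies(original_body, promoted, brand_field="brand"):
    """Build one query body per promoted block.

    promoted: list of (brand, size) tuples, already sorted by position
    (a hypothetical format for the admin configuration).
    """
    next_from = {}  # running offset per brand, so repeated brands do not overlap
    copies = []
    for brand, size in promoted:
        body = copy.deepcopy(original_body)
        # Assumption: brand names contain no double quotes; escape them otherwise.
        brand_clause = {"query_string": {"query": f'{brand_field}:"{brand}"'}}
        # Original query AND promoted brand.
        body["query"] = {"bool": {"must": [body["query"], brand_clause]}}
        body["from"] = next_from.get(brand, 0)
        body["size"] = size
        next_from[brand] = body["from"] + size
        copies.append(body)
    return copies
```

Because the offsets are tracked per brand, a second Kassa Fabrication block of size 3 automatically starts at 3, while blocks for other brands still start at 0.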

Afterwards, you will need to build one more query copy that excludes all the promoted brands and has the same ‘from’ and ‘size’ values as the original query. Why? Because your queries might not find any promoted products, or might find fewer than you need to fill a whole search results page. So, assuming you are on the first page of results and pagination dictates that you should display 10 products, you could construct the following query to retrieve non-promoted products:

{  
   "query":{  
      "bool":{  
         "must":{  
            "query_string":{  
               "query":"40"
            }
         },
         "must_not":[  
            {  
               "query_string":{  
                  "query":"brand:\"Kassa Fabrication\""
               }
            },
            {  
               "query_string":{  
                  "query":"brand:Hahne-Kedar"
               }
            },
            {  
               "query_string":{  
                  "query":"brand:\"Sirta Foundation\""
               }
            }
         ]
      }
   },
   "from":0,
   "size":10
}
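A query like the one above could be generated along these lines. Again, this is a hedged sketch: the function name and parameters are hypothetical, and the brand names are quoted as an assumption about how multi-word names should be matched.

```python
import copy


def build_exclusion_copy(original_body, promoted_brands, brand_field="brand"):
    """Copy the original query, excluding every promoted brand.

    The 'from' and 'size' values of the original query are kept as they are.
    """
    body = copy.deepcopy(original_body)
    # A brand may be configured more than once; exclude each brand only once.
    unique_brands = list(dict.fromkeys(promoted_brands))
    body["query"] = {
        "bool": {
            "must": body["query"],
            "must_not": [
                {"query_string": {"query": f'{brand_field}:"{brand}"'}}
                for brand in unique_brands
            ],
        }
    }
    return body
```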

All these copies are to be sent to the _msearch endpoint as part of a single request. When building this request, make sure that the queries are sorted by their configured positions, as this will make the processing of the response much easier. The final query body should look something like this:

{"index": "products_index"}
{[original query body]}
{"index": "products_index"}
{[first query body copy (promoted brand with position 1)]}
...
{"index": "products_index"}
{[nth query body copy (promoted brand with position n)]}
{"index": "products_index"}
{[query body copy that excludes the promoted brands]}

Note: Each header and each query body must occupy a single line, and the request body must end with a newline character, so adding a new line after the last query body is mandatory.
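Assembling the request body by hand is error-prone, so here is a minimal Python sketch of the serialization. The function name is hypothetical; the only library call used is json.dumps, which emits each object on a single line when no indent is given.

```python
import json


def build_msearch_body(index, bodies):
    """Serialize query bodies into the newline-delimited _msearch format.

    bodies: query bodies in the order they should be executed
    (original query, promoted copies by position, exclusion copy).
    """
    lines = []
    for body in bodies:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(body))              # query body on one line
    # The request body has to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string is what gets sent as the body of the request to the _msearch endpoint.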

The response for this query might be the following:

{  
   "responses":[  
      {  
         "took":1,
         "timed_out":false,
         "_shards":{  
            "total":1,
            "successful":1,
            "failed":0
         },
         "hits":{  
            "total":500,
            "max_score":1.5,
            "hits":[original query hits]
         }
      },
      {  
         "took":1,
         "timed_out":false,
         "_shards":{  
            "total":1,
            "successful":1,
            "failed":0
         },
         "hits":{  
            "total":3,
            "max_score":1.1,
            "hits":[first query hits]
         }
      },
      ...
      {  
         "took":1,
         "timed_out":false,
         "_shards":{  
            "total":1,
            "successful":1,
            "failed":0
         },
         "hits":{  
            "total":3,
            "max_score":1.6,
            "hits":[nth query hits]
         }
      },
      {  
         "took":1,
         "timed_out":false,
         "_shards":{  
            "total":1,
            "successful":1,
            "failed":0
         },
         "hits":{  
            "total":10,
            "max_score":1.2,
            "hits":[hits for query that excludes the promoted brands]
         }
      }
   ]
}

You now have all the documents you need, but there is a bit more ground to cover before you can display them on the frontend. As a preview of the final goal, here is what this multi query response will look like to the customer (the numbers to the right are the brands’ positions):

Knowing when to promote brands

Returning to the first constraint of our problem, you must make sure that you only display the promoted results when the customer is not looking for a particular brand. As I briefly mentioned before, you can solve this via highlighting, specifically by modifying the original query to also highlight the brand field.

The catch with highlighting is that, in order to use it, the highlighted field must be among the queried fields. As I mentioned above, our implementation uses a multi-match query that already includes the brand field, so this was not a problem I had to solve. A multi-match query adapted for our example would look like this:

{  
   "query":{  
      "multi_match":{  
         "query":"40",
         "fields":[  
            "brand",
            "[field_1]",
            "[field_n]"
         ],
         "type":"cross_fields",
         "minimum_should_match":"50%",
         "tie_breaker":0.1
      }
   },
   "from":0,
   "size":10,
   "highlight":{  
      "fields":{  
         "brand":{  
            "type":"plain"
         }
      }
   }
}

If the response for the original query has a highlight key that contains the brand field, it means that the user has searched for a brand and therefore you should return the result of this query.

If the brand field is not highlighted in the response, then it is okay to display promoted brands, so you should return the multi query results.
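The check itself is small. The Python sketch below assumes that a brand highlight on any hit of the original query's response counts as "the customer searched for a brand"; that threshold is my assumption for the illustration, and the function name is hypothetical.

```python
def brand_was_searched(response, brand_field="brand"):
    """Return True if any hit in the response has the brand field highlighted."""
    hits = response.get("hits", {}).get("hits", [])
    # Hits without matched highlighted fields carry no 'highlight' key at all.
    return any(brand_field in hit.get("highlight", {}) for hit in hits)
```

With the multi search response at hand, you would run this on the first (original) response and return that response unchanged when the function yields True.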

Building the single search response

If it turns out that you should return the multi query results and your implementation expects a single result object, you have to intervene and convert these responses into such an object.

What you can do here is take the first (original) query response and replace its hits with the hits from the other queries, structured as per your configuration. This will preserve any additional information the main query might retrieve about your result set as a whole (such as the total count), so you can safely return this altered response and any other components involved will tick happily along, unaware of your intervention.
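As a rough illustration of this conversion, here is a Python sketch. It assumes the responses arrive in the order in which the queries were sent (original query first, promoted copies by position, exclusion query last) and that non-promoted hits simply fill whatever slots remain on the page. The function and parameter names are hypothetical.

```python
import copy


def merge_responses(responses, page_size):
    """Fold the multi search responses into a single response object."""
    original, *promoted, rest = responses
    merged_hits = []
    for response in promoted:  # already ordered by configured position
        merged_hits.extend(response["hits"]["hits"])
    merged_hits = merged_hits[:page_size]
    # Fill the remaining slots with non-promoted products.
    remaining = page_size - len(merged_hits)
    merged_hits.extend(rest["hits"]["hits"][:remaining])
    # Keep the original response's metadata (total count and so on).
    result = copy.deepcopy(original)
    result["hits"]["hits"] = merged_hits
    return result
```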

Pagination

This is by far the trickiest problem to solve in this context. Taking the situation described here, you may have noticed that the query excluding promoted brands had a size of 10, but only one result could be displayed to the user, as the other 9 slots were filled with promoted brands. This means that, on the next page, the ‘from’ value for this query should be 1, not 10, as pagination would require. Additionally, because we showed two Kassa Fabrication result sets, on the next page, the first query should start from 6, not 3.

I must admit I do not have a fully satisfactory solution to this. I opted to keep track of the offsets in the session, but they are calculated when the customer visits a page (as I can’t know beforehand how many documents the queries will retrieve) and I am relying on the assumption that the pages will be visited in order. This means that, if the customer performs a search and then goes directly to page 5 of the results, the offsets will be derived from the standard pagination and will be inaccurate. I would love to have a better solution to this, so do let me know if you come up with one :)
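For reference, the bookkeeping behind the session offsets boils down to something like the Python sketch below. The names, the dictionary layout and the "__rest__" key for the non-promoted query are all hypothetical, and the sketch shares the limitation described above: it only works if the pages are visited in order.

```python
REST_KEY = "__rest__"  # hypothetical key for the query that excludes promoted brands


def advance_offsets(offsets, shown):
    """Return the 'from' values to use on the next page.

    offsets: 'from' values used for the current page, keyed by brand.
    shown:   number of hits actually displayed on the current page, per key.
    """
    updated = dict(offsets)
    for key, count in shown.items():
        updated[key] = updated.get(key, 0) + count
    return updated
```

Using the number of hits actually displayed, rather than the configured sizes, matters because a brand query may return fewer products than its block allows. For the first page of the example, six displayed Kassa Fabrication products and one non-promoted product give next-page offsets of 6 and 1 respectively.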

Stage 3: Rejoicing

Congratulations, you have successfully harnessed the power of multi search and highlighting to show the results that you want to promote in your store! ... Or at least I sincerely hope so :) I tried to make the solution sufficiently generic to help anyone facing this problem, but if you think that something different or more is needed, please share your struggles and experiences and let’s see if we can build something even better in the end.

