Introduction
WebSphere Commerce uses Apache Solr, a fast open-source Java search server, for its searches. Product and other catalog data are stored in the database, but DB queries are heavy and performance-intensive, so we cannot keep querying the DB to get the results for every search.
To solve this and make searches fast, Commerce introduced Solr. Solr achieves fast search responses because, instead of searching the text directly, it searches an index. Solr stores this index in a directory called index inside the data directory. Apart from being fast, Solr also lets us change the relevancy of search results. This means that by changing certain parameters we can tune the order in which the search results appear.
For example, if somebody searches for "pepsi", ideally the Pepsi drink should come at the top. 7up is also a Pepsi product, so it will have the word "pepsi" in its manufacturer name. In this scenario we can say that a match for the search term in the product name is more relevant than a match in the manufacturer name. This way we can make sure that the correct product is displayed at the top. 7up will still be in the search results, but with a lower relevancy. Solr also lets us boost the products we want by changing the boost factor, change the sorting options, filter specific products or categories out of the results, and much more.
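As a rough illustration of the "name match beats manufacturer match" idea, the sketch below builds a request that uses Solr's extended DisMax parser (defType=edismax) with per-field weights in qf. This is a generic Solr technique and not necessarily how Commerce configures relevancy internally. The host, core name and the fields name and mfName are borrowed from the examples later in this post, and the weights are made-up values.

```python
from urllib.parse import urlencode

# Host and core are assumptions taken from the example URLs later in this post.
base = "http://localhost/solr/MC_10001_CatalogEntry_en_US/select"

params = {
    "q": "pepsi",
    "defType": "edismax",      # query parser that supports per-field boosts
    "qf": "name^10 mfName^1",  # illustrative weights: name matches are weighted higher
}

# urlencode percent-encodes "^" and turns the space into "+"
url = base + "?" + urlencode(params)
print(url)
```

With weights like these, a product whose name contains "pepsi" would normally score above one that only mentions it in the manufacturer name, while both still appear in the results.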
Commerce reindexing
Commerce reindexing is the process of indexing the data from the DB into Solr. It consists of two steps.
Pre-process: Prepares the data to be indexed in Solr and saves it into temporary tables.
Build-index: Indexes the data from the temporary tables into Solr.
Important files for reindexing
solrconfig.xml : This file holds the Solr configuration, such as connection timeout, replication handling, max connections etc.
schema.xml : This file holds the definitions of the Solr fields.
pre-process XMLs : These XMLs hold the SQL statements that create the temporary tables and populate them as needed.
wc-data-config.xml : This file holds the query that fetches the data to populate the Solr indexes, and also the mapping between the temporary tables and the Solr field names.
SOLR Queries
We save fields in Solr in a specific core. For product searches we mostly use MC_10001_CatalogEntry_en_US (where 10001 is the master catalog ID of the store). As an easy analogy, I will compare SQL queries to Solr queries.
If we want to get all the products from the DB, we would write the query as
SELECT * FROM CATENTRY;
The corresponding Solr query would be
http://localhost/solr/MC_10001_CatalogEntry_en_US/select?q=*:*
The q parameter is the main query of the request and is equivalent to the search term. Here we are passing q as *:* which means the query will fetch every doc in Solr.
Now if we want to get all the products which have "milk" in the product name, we would use the DB query
SELECT CT.* FROM CATENTRY CT, CATENTDESC CD WHERE CD.CATENTRY_ID = CT.CATENTRY_ID AND LOWER(CD.NAME ) LIKE '%milk%'
In Solr we can write the same query as
http://localhost/solr/MC_10001_CatalogEntry_en_US/select?q=milk&fq=name:milk
Here "fq" stands for filter query. Filter queries are used to add conditions to Solr queries. CD.NAME in the above DB query is indexed as the field "name" in Solr. So the Solr query means: get all the docs that match the term milk in the default search field, then filter them so that only results with "milk" in the field "name" remain. Results which have "milk" only in other fields are omitted.
Say we now want all the products which have milk in the product name and are in the Dairy category. The DB query will look like the one below. You can see that the query is becoming bigger and messier
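If these URLs are assembled in code rather than typed by hand, a small helper keeps q and the fq conditions separate. The sketch below is only an illustration: select_url is a hypothetical helper name, and the host and core come from the example URL above.

```python
from urllib.parse import urlencode

def select_url(base, q, filters=()):
    """Build a Solr select URL with one q and any number of fq parameters."""
    # A list of pairs lets the same key (fq) appear more than once.
    params = [("q", q)] + [("fq", f) for f in filters]
    # safe= keeps ":", "*" and parentheses readable instead of percent-encoding them
    return base + "?" + urlencode(params, safe=":*()")

# Host and core are taken from the example URL above.
base = "http://localhost/solr/MC_10001_CatalogEntry_en_US/select"

print(select_url(base, "*:*"))                  # every doc
print(select_url(base, "milk", ["name:milk"]))  # milk, filtered on the name field
```

Each extra condition is then just one more entry in the filters list.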
SELECT CT.* FROM CATENTRY CT, CATENTDESC CD , CATGPENREL CG, CATGRPDESC CGD WHERE CD.CATENTRY_ID = CT.CATENTRY_ID AND CG.CATENTRY_ID = CT.CATENTRY_ID
AND CGD.CATGROUP_ID = CG.CATGROUP_ID
AND LOWER(CD.NAME ) LIKE '%milk%'
AND CGD.NAME LIKE '%Dairy%'
The Solr query for the same would look like
http://localhost/solr/MC_10001_CatalogEntry_en_US/select?q=*&fq=name:milk&fq=categoryname:Dairy
Note : categoryname is the Solr field corresponding to the DB category name column
It is just the addition of another filter query and bang, you get the results!
Let us make a slight modification to fetch the products in the Dairy as well as the Fresh category. The SQL will be
SELECT CT.* FROM CATENTRY CT, CATENTDESC CD , CATGPENREL CG, CATGRPDESC CGD WHERE CD.CATENTRY_ID = CT.CATENTRY_ID
AND CG.CATENTRY_ID = CT.CATENTRY_ID
AND CGD.CATGROUP_ID = CG.CATGROUP_ID
AND LOWER(CD.NAME ) LIKE '%milk%'
AND (CGD.NAME LIKE '%Dairy%' OR CGD.NAME LIKE '%Fresh%')
Let us get help from Solr again. The query is almost the same as the one above, with a slight change
http://localhost/solr/MC_10001_CatalogEntry_en_US/select?q=*&fq=name:milk&fq=categoryname:(Dairy+Fresh)
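One detail worth spelling out: in a URL query string a "+" is an encoded space, so Solr receives categoryname:(Dairy Fresh). Whether two bare terms mean "either" or "both" depends on the default operator in effect (OR unless it has been changed), so writing the OR explicitly is the safer form. The sketch below only uses Python's standard URL parsing to show what the server sees. It does not talk to Solr.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = ("http://localhost/solr/MC_10001_CatalogEntry_en_US/select"
       "?q=*&fq=name:milk&fq=categoryname:(Dairy+Fresh)")

# parse_qs decodes "+" to a space and collects repeated keys into a list
params = parse_qs(urlsplit(url).query)
print(params["fq"])  # ['name:milk', 'categoryname:(Dairy Fresh)']

# Explicit OR, which does not rely on the default operator
explicit = urlencode({"fq": "categoryname:(Dairy OR Fresh)"}, safe=":()")
print(explicit)      # fq=categoryname:(Dairy+OR+Fresh)
```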
Say we now want all the above results, but without the products from a specific manufacturer (say RanjithsMilk). The DB query would look like
SELECT CT.* FROM CATENTRY CT, CATENTDESC CD , CATGPENREL CG, CATGRPDESC CGD WHERE CD.CATENTRY_ID = CT.CATENTRY_ID
AND CG.CATENTRY_ID = CT.CATENTRY_ID
AND CGD.CATGROUP_ID = CG.CATGROUP_ID
AND LOWER(CD.NAME ) LIKE '%milk%'
AND (CGD.NAME LIKE '%Dairy%' OR CGD.NAME LIKE '%Fresh%')
AND CT.MFNAME NOT LIKE '%RanjithsMilk%'
This is really messy, as we can see a couple of LIKEs and one NOT LIKE in the same query. If we write a query for the above requirement in Solr, it would look like
http://localhost/solr/MC_10001_CatalogEntry_en_US/select?q=*&fq=name:milk&fq=categoryname:(Dairy+Fresh)&fq=-mfName:RanjithsMilk
Note : mfName corresponds to the DB column CT.MFNAME and the " - " prefix excludes the matching results from the query
Yes, it is just another filter query!
Now if we look at the SQL and the Solr queries, it is pretty evident that the Solr queries are really easy to write and fast to execute. This is a very simple example. A real search will have many more conditions, and that is where Solr becomes really handy.
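To make the semantics concrete, the sketch below imitates the three filter queries on a tiny in-memory list. It is plain Python and not Solr: the documents are made-up sample data, has_term is a crude stand-in for Solr's term matching, and only the field names come from this post. The point it shows is that every fq must pass, so each added filter can only narrow the result set.

```python
# Hypothetical sample docs; field names follow this post, values are made up.
docs = [
    {"name": "Full cream milk", "categoryname": "Dairy", "mfName": "RanjithsMilk"},
    {"name": "Skim milk", "categoryname": "Fresh", "mfName": "OtherBrand"},
    {"name": "Milk chocolate", "categoryname": "Snacks", "mfName": "OtherBrand"},
]

def has_term(doc, field, term):
    """Crude stand-in for a term match: case-insensitive whole-word lookup."""
    return term.lower() in doc[field].lower().split()

filters = [
    # fq=name:milk
    lambda d: has_term(d, "name", "milk"),
    # fq=categoryname:(Dairy Fresh), read as Dairy OR Fresh
    lambda d: any(has_term(d, "categoryname", c) for c in ("Dairy", "Fresh")),
    # fq=-mfName:RanjithsMilk, the "-" negates the match
    lambda d: not has_term(d, "mfName", "RanjithsMilk"),
]

# A doc survives only if it passes every filter.
result = [d for d in docs if all(f(d) for f in filters)]
print([d["name"] for d in result])  # ['Skim milk']
```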
Paginated results : We can use the start (start index of the results) and rows (number of rows to be fetched) parameters to get paginated results. The query below will fetch the first five results.
http://localhost/solr/MC_10001_CatalogEntry_en_US/select?q=*&fq=name:milk&fq=categoryname:(Dairy+Fresh)&fq=-mfName:Coles&start=0&rows=5
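A storefront usually thinks in page numbers rather than offsets, so the conversion to start and rows is worth writing down once. The helper below is a sketch with a hypothetical name (page_window) and assumes pages are numbered from 1.

```python
def page_window(page, page_size):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # start is a zero-based offset into the full result list
    return {"start": (page - 1) * page_size, "rows": page_size}

print(page_window(1, 5))  # {'start': 0, 'rows': 5}
print(page_window(3, 5))  # {'start': 10, 'rows': 5}
```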
Field lists : With the fl parameter we can specify the fields that we need in the response, which gets rid of unused data. The query below will return only name and categoryname in the results.
http://localhost/solr/MC_10001_CatalogEntry_en_US/select?q=*&fq=name:milk&fq=categoryname:(Dairy+Fresh)&fq=-mfName:Coles&start=0&rows=5&fl=name,categoryname
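Putting the pieces together, the sketch below rebuilds the URL above in Python and adds a small function that would fetch the matching docs. It assumes a Solr instance reachable at the host and core used in this post, and it adds wt=json (a standard Solr parameter that is not in the URL above) so the response can be parsed with the json module. build_url and fetch_docs are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and core are taken from the example URLs in this post.
BASE = "http://localhost/solr/MC_10001_CatalogEntry_en_US/select"

def build_url(fields, start=0, rows=5):
    """Build the filtered, paginated query and restrict the returned fields."""
    params = [
        ("q", "*"),
        ("fq", "name:milk"),
        ("fq", "categoryname:(Dairy Fresh)"),
        ("fq", "-mfName:Coles"),
        ("start", start),
        ("rows", rows),
        ("fl", ",".join(fields)),  # comma-separated field list
        ("wt", "json"),            # assumption: ask for a JSON response
    ]
    return BASE + "?" + urlencode(params, safe=":*(),")

def fetch_docs(url):
    """Run the query and return the list of docs (needs a running Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]

print(build_url(["name", "categoryname"]))
```

Calling fetch_docs(build_url(["name", "categoryname"])) against a live core should return at most five docs, each carrying only the two requested fields.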
These are some of the basic tips for querying Solr. There is much more we can do, and that can be covered in a different post.