Wednesday 25 July 2018

FAQ about Solr AEM Integrations

What are the differences between embedded and remote Solr usage in AEM?
  • Embedded Solr is recommended only for development purposes. Production search should be implemented with a remote/external Solr, which makes the solution more scalable.
  • If the site has a small amount of content, embedded Solr could be used, but for larger content volumes external Solr is recommended.
  • With external Solr we have more control over the schema, index options, field boosting and other direct configurations.
  • External Solr is also recommended when data/content from third-party applications needs to be indexed in addition to AEM content.

Difference between Solr & Lucene
  • Lucene is the core search library and Solr is a server built on top of Lucene. Reading data from Lucene requires custom programs, whereas Solr provides an HTTP interface and admin UI, making it easier to query and inspect indexed data.

Advantages of using Solr search


Given below are the major advantages of using Solr as the indexing/search engine with AEM.
  • Quick learning curve.
  • Horizontal & vertical scaling through SolrCloud
  • Clustering through Apache ZooKeeper
  • Rich full text query syntax
  • Highly configurable relevance and indexing
  • Plugin architecture for query parsing, searching & ranking, indexing

 

Wednesday 2 August 2017

Steps to implement a search

This post discusses the steps to implement search in any application.

The index of the search implementation blog posts can be found at this location

Set up

Setting up the search engine and indexing options is the first step in a search implementation.
  • Hosting: Hosting the search tool.
  • Indexing: Indexing the data (a minimal SolrJ sketch is given after this list).
  • Index frequency: Configuring how often the index is refreshed (daily, monthly, etc.).
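
As a reference, a minimal indexing sketch using the SolrJ client is given below. The Solr URL, the core name (us_en) and the fields are only assumptions for illustration; adjust them to your own setup and schema.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexSample {
    public static void main(String[] args) throws Exception {
        // Point the client at the core that should receive the documents (assumed URL/core).
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/us_en").build();

        // Build a document with fields that are assumed to exist in schema.xml.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "/content/project/us/en/home/samplepage.html");
        doc.addField("title", "Sample page");

        client.add(doc);
        client.commit();   // make the new document searchable
        client.close();
    }
}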

 
Configure
This step includes any configuration related to the search experience. A SolrJ sketch covering faceting and pagination follows the list.
  • Metadata - Used for faceting, sorting, ranking, relevance.
  • Breadcrumbs - for a better navigation.
  • Pagination - for a better navigation.
  • Recent Searches - Search assist.
  • Search suggestions (Did you mean) - Search assist.
  • Auto complete - Search assist.
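
Faceting and pagination map directly to query parameters. Below is a hedged SolrJ sketch of such a query; the core URL and the 'category' facet field are assumptions.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ConfigureSample {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/us_en").build();

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);              // enable faceting for navigation/filtering
        query.addFacetField("category");   // facet on an assumed metadata field
        query.setStart(0);                 // pagination: offset of the first result
        query.setRows(10);                 // pagination: page size

        QueryResponse response = client.query(query);
        System.out.println("Total hits: " + response.getResults().getNumFound());
        client.close();
    }
}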

Fine-tune
Once the search implementation is done, we need to fine-tune the search by analysing the results.
  • Promotion: Based on search analysis, specific results can be promoted/boosted (see the sketch after this list).
  • Dictionaries: Configure synonym and spell-check dictionaries for better results.
  • Banners: Display any organizational promotion alongside results.
  • Redirects: Send the user directly to a specific page for certain queries.
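
One sketch of promotion through boosting (not the only way to promote results): with the eDisMax parser, a boost query lifts matching documents in the ranking. The 'promoted' field, the searched fields and the boost factor are assumptions, and the snippet reuses the SolrJ client/query classes from the indexing sketch above.

SolrQuery query = new SolrQuery("laptop");
query.set("defType", "edismax");          // use the eDisMax query parser
query.set("qf", "title description");     // assumed searchable fields
query.set("bq", "promoted:true^10");      // assumed boolean field; boosts promoted documents
QueryResponse response = client.query(query);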

General considerations for choosing your future search platform:

Search Management and Maintenance -
Thinking about previous data migration and future search upgrades.
User experience -
Different user interfaces for a range of use cases.
Content -
Content which resides in file repositories, document management systems, databases or CRM applications.
Classifying and unifying cross-system data -
Content processing and defining metadata for the new index.
Hybrid Scenarios -
Searching data from both on-premise and cloud sources.
Result set customization -
An efficient way of displaying results.
Ranking & Relevance -
Fine-tuning the order of results.
The migration process -
Provide a transparent, solid experience to end users and educate them about the new system.

Read More

Steps to implement any search technology
AEM Dispatcher, why it is needed?
AEM Desktop App
Figure out the best search technology or tool
Steps to implement search in Solr
Quality of Search - fine tuning search implementation
FAQ on search implementation


Thursday 14 May 2015

Search across Solr Cores

Solr Findings: Multi Core, Multi item search in Solr

Listed below are some features which will be helpful while working with Solr.

1) Searching across all cores: There are cases where we need to search across multiple cores in Solr. This can be enabled using shards.
Solr's 'shards' feature splits huge indexes to make search faster, and cores can be treated as shards for an 'all core' search. A sample is given below. Say we have two cores (us_en, es_en).
Both cores can be queried using the parameters below:
http://localhost:8983/solr/us_en/select?shards=localhost:8983/solr/us_en,localhost:8983/solr/es_en&indent=true&q=*:*&df=title

Here df is the default query field and we are doing a '*:*' (match all) search.
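
The same all-core search can be issued from SolrJ by setting the shards parameter. A minimal sketch, assuming a SolrJ HttpSolrClient named 'client' pointing at the us_en core:

SolrQuery query = new SolrQuery("*:*");
query.set("shards", "localhost:8983/solr/us_en,localhost:8983/solr/es_en");  // cores treated as shards
query.set("df", "title");                                                    // default search field
QueryResponse response = client.query(query);
System.out.println("Hits across both cores: " + response.getResults().getNumFound());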


2) Multi-item search: Let us say we need to search for multiple items using the same query. We can pass multiple values in a filter query to a core as shown below.

http://localhost:8983/solr/es_en/select?q=*:*&df=id&wt=json&indent=true&fq=(id:"id1" | id:"id2" | id:"id3")

A tested URL is shown below:
http://localhost:8983/solr/us_en/select?q=*%3A*&df=id&wt=json&indent=true&fq=(id:"id1" | id:"id2" | id:"id3")
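
The same multi-item lookup can be expressed in SolrJ with a filter query. A sketch, assuming a client for the es_en core; the ids are placeholders, and OR is the standard query-syntax operator corresponding to the '|' used above.

SolrQuery query = new SolrQuery("*:*");
query.set("df", "id");
query.addFilterQuery("id:\"id1\" OR id:\"id2\" OR id:\"id3\"");   // placeholder ids
QueryResponse response = client.query(query);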

Friday 20 February 2015

Security precautions for Solr on Dispatcher

Things to take care of while configuring Solr behind the Dispatcher:

Solr can be accessed using direct URLs. If we fail to block access to the admin UI, it becomes a security vulnerability for the application. We should also block queries containing <delete> for trouble-free operation.

How are rules created in the dispatcher for security?

There are some default security rules enabled for the dispatcher; typical examples are shown below:

RewriteCond %{QUERY_STRING} ^.*(localhost|loopback|127\.0\.0\.1).*                     [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(\*|;|<|>|'|"|\)|%0A|%0D|%27|%3C|%3E||%7C|%26|%24|%25|%2B).*         [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(;|<|>|'|"|\)|%0A|%0D|%27|%3C|%3E|).*(/\*|union|select|insert|cast|set|declare|drop|update|md5|benchmark).* [NC]

We need to rewrite these rules in such a way that normal queries are not blocked, while update/delete operations are.

How should delete-by-id and delete-all be prevented at the dispatcher?

Delete by id from the Solr core en_US

If the id is '/content/project/us/en/Home/testarticle.html',
invoking the URL below will delete that id and all its records from the index.
http://localhost:8983/solr/en_US/update?stream.body=<delete><query>id:"/content/project/us/en/home/testarticle.html"</query></delete>&commit=true

Delete all from a Solr core en_US
Invoking the URL below deletes all data from the index for the specific core en_US. Think twice before executing this command, because it deletes *ALL* documents.
http://localhost:8983/solr/en_US/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true

How can the publish instance avoid exposing the full Solr server URL?
Use a relative path such as '/solr' in the page/component that issues the Solr query, and add a rewrite rule on the dispatcher that maps '/solr' to the actual Solr server. This way the Solr server URL stays hidden behind the dispatcher.

How to search across all Solr fields for a query?


Solr offers many levels of configuration, which makes it a rich search tool. Usually we search Solr against a specific field defined in schema.xml, but there are cases where we need to search across multiple fields. Let us see how this can be achieved.

There are two ways to do this.

1. Using DisMax: Solr ships with the DisMax query parser, so in the query we just need to pass all the fields in the qf parameter as shown below.

/select?defType=dismax&q="query1","query2","query3"&qf=field1 field2 field3

In the above case we are searching for three terms, query1, query2 and query3 (placed in quotes so that phrases containing spaces fetch matching results).
field1, field2 and field3 are the fields in schema.xml to be searched.
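
The same DisMax query can be built with SolrJ. A minimal sketch using the placeholder field names from above and assuming an existing 'client':

SolrQuery query = new SolrQuery("\"query1\" \"query2\" \"query3\"");  // quoted terms/phrases
query.set("defType", "dismax");                                       // use the DisMax query parser
query.set("qf", "field1 field2 field3");                              // fields to search, space separated
QueryResponse response = client.query(query);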

2. Another way is to collect all the data into a single field by copying it through schema.xml.

We need to add the field definition below to the schema file:

<field name="datacollection" type="text_general" indexed="true" stored="false" multiValued="true"/>

Then copy the contents of the required fields into the new field:

<copyField source="field1" dest="datacollection"/>
<copyField source="field2" dest="datacollection"/>
<copyField source="field3" dest="datacollection"/>

Then query with datacollection as the default field.
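
For example, a SolrJ query against the combined field could look like this (a sketch assuming an existing 'client'):

SolrQuery query = new SolrQuery("search words");
query.set("df", "datacollection");   // default field populated by the copyField rules
QueryResponse response = client.query(query);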

JSON response from Solr

Let us see how to generate a custom JSON format from a Solr search result.

Required Configuration in Solr to get the desired JSON format:

There are cases where we need a JSON format different from the usual one provided by Solr. Let us see how to get such a custom JSON-formatted output from Solr.

Steps:

1) Add the line below to solrconfig.xml (for a multi-core configuration, add it to the solrconfig.xml of each core):
<queryResponseWriter name="xslt" class="org.apache.solr.response.XSLTResponseWriter"/>

2) Given below is a sample XSLT that writes the JSON response. Save it in your /core/conf/xslt folder.

3) Append '&wt=xslt&tr=json.xsl' to the query to invoke the required JSON format.
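
Since the transformed output is returned like any other Solr response, it can be fetched over HTTP. A small sketch using Java's built-in HTTP client (Java 11+), with the core name us_en assumed:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class XsltJsonFetch {
    public static void main(String[] args) throws Exception {
        // Query the assumed us_en core and ask for the json.xsl transformation.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8983/solr/us_en/select?q=*:*&wt=xslt&tr=json.xsl"))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());   // custom JSON produced by json.xsl
    }
}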

Sample JSON.xsl

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" indent="yes" media-type="application/json"/>
  <xsl:variable name="count" select="response/result/@numFound"/>

  <xsl:template match="response">
    <xsl:text>{"relatedContent":{"totalRecords": <xsl:value-of select="$count"/></xsl:text>  <xsl:text>,"results":[</xsl:text>
    <xsl:apply-templates select="result/doc"/>
    <xsl:text>]}}</xsl:text>
  </xsl:template>

  <xsl:template match="result/doc">

    <xsl:text>{"title":"</xsl:text><xsl:apply-templates select="str[@name='title']"/>  
    <xsl:text>","imageReference":"</xsl:text><xsl:apply-templates select="arr[@name='imageReference']"/>
<xsl:text>","imageAltText":"</xsl:text><xsl:apply-templates select="str[@name='imageAltText']"/>
<xsl:text>","firstName":"</xsl:text><xsl:apply-templates select="str[@name='firstName']"/>
<xsl:text>","lastName":"</xsl:text><xsl:apply-templates select="str[@name='lastName']"/>
    <xsl:text>"}</xsl:text>
<!-- append comma to all elements except last element-->
     <xsl:if test="position()!=last()">
       <xsl:text>,</xsl:text>
      </xsl:if>

   </xsl:template>
   

</xsl:stylesheet>

Integration of Solr with AEM/CQ + ZooKeeper - better designs

Here we explain the design approaches for integrating Solr search with AEM.

Solr with CQ : Better designs

When we are in the design phase of Solr search with CQ, there are different approaches; one of the better approaches is explained below.

How many Solr instances in the production environment?
For the system to handle a high volume of requests and remain fail-safe, we need a minimum of two Solr instances with a load balancer in front of them. Whenever a request comes in, the load is balanced and the request is served by one of the instances.



How do updates work?
To keep both Solr instances updated through the load balancer, we need Apache ZooKeeper configured (http://zookeeper.apache.org/). ZooKeeper helps us share the configuration across servers.
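
If the Solr instances run in SolrCloud mode (the usual way ZooKeeper is used with Solr), client code can also connect through ZooKeeper instead of a fixed Solr URL. A hedged SolrJ sketch, assuming a recent SolrJ version, a local ZooKeeper and a collection named us_en:

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ZkSearchSample {
    public static void main(String[] args) throws Exception {
        // Connect via ZooKeeper so the client always sees the live Solr nodes (assumed address).
        CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:2181"), Optional.empty()).build();
        client.setDefaultCollection("us_en");   // assumed collection name

        QueryResponse response = client.query(new SolrQuery("*:*"));
        System.out.println("Total hits: " + response.getResults().getNumFound());
        client.close();
    }
}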

Can we have XSLT transformation for JSON through this design?
Yes. When we need a custom JSON response from Solr, we use our own XSLT files for the transformation. These can be kept as usual inside the server where Solr is deployed (for example, JBoss).

My XSLT changes are not appearing on the server. How do I fix this?
The server sometimes caches XSLT files after changes. To get the XSLT files refreshed, follow the steps below.
  • Stop JBoss
  • Stop ZooKeeper
  • Clear temp folders of JBoss
  • Start JBoss and ZooKeeper