Radimaging Ltd - Paul Beck's Technical Working Notes for Microsoft Technology: continous crawl

Monday, 22 July 2013

SharePoint 2013 Search Overview

Overview: This post explains how SharePoint 2013 Search works. SharePoint 2013 Search is the latest search engine in the SharePoint product and replaces both the SharePoint Search and FAST Search options that were used with SP 2010.

6 Components:

Query
Index
Analytics
Content Processing
Crawl
Admin

Tip: Group Query & Index roles on the same server. Then group Analytics & Content processing. Group Crawl & Admin (you can also add content processing to these servers).

Installing search on SP 2013 creates 4 databases (all of them use the 'Simple' recovery model by default):

Search Admin,
Search Analytics Reporting,
Search Crawl, and
Search Links.

A guideline from Microsoft is: "Add one index partition for every 10 million items in the search index." The right figure depends on how you use search: if you run few queries, can live with longer response times, or your kit can handle a heavier load, each partition can hold more items; otherwise you may need fewer items per partition.
Index partitions split the index vertically, so if you have 25 million search items and want fewer than 10 million per partition, you need 3 index partitions, one on each of 3 index servers. This layout has no redundancy: if any index server goes down, your search is broken. An index replica, as the name suggests, is a copy of a partition; I think of this as horizontal scaling. So if you want HA on your 25 million item search farm, you need a second replica of each of the 3 partitions, which gives you 6 index servers in total. Using index replicas will also improve query response speed.
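The arithmetic in the 25 million item example can be sketched in a few lines of Python. This is only an illustration: the function name index_topology and its parameters are made up for this note, and the 10 million default is simply the Microsoft guideline quoted above, not a hard limit.

```python
def index_topology(total_items, items_per_partition=10_000_000, replicas_per_partition=1):
    """Return (partitions, index_components) for a given index size.

    items_per_partition is a guideline value, not a hard limit.
    replicas_per_partition=1 means no redundancy; 2 gives one extra copy.
    """
    if total_items < 0 or items_per_partition <= 0 or replicas_per_partition < 1:
        raise ValueError("invalid sizing input")
    # Ceiling division: 25 million items at 10 million per partition needs 3 partitions.
    partitions = max(1, -(-total_items // items_per_partition))
    # One index component (in this post, one server) per replica of each partition.
    return partitions, partitions * replicas_per_partition


print(index_topology(25_000_000))                            # (3, 3)
print(index_topology(25_000_000, replicas_per_partition=2))  # (3, 6)
```

The first call is the non-redundant layout (3 partitions on 3 servers); the second is the HA layout with 6 index servers.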

Search Architectures for SharePoint 2013 - From MS (Kavindra Palaraja out of his oit2013-model-sharepoint-search-architecture.pdf document). This is not my diagram but it explains the components nicely.

Search Components in SharePoint 2013

Replicas and index partitions are also referred to as rows and columns, respectively.

Add a new Index partition when the number of documents in the index exceeds 30 million.

SharePoint 2013 supports 3 types of Crawls:

Full (SP2010)
Incremental (SP2010) and
Continuous.

The continuous crawl only works on SP2013 content and displays content in the search results as soon as the data has been crawled and run through the content processing component (CPC); it doesn't wait for the crawl to complete. Note: security changes are only picked up after an incremental crawl is run. There are no crawl logs for continuous crawls, so for troubleshooting go to the SQL Search Service DB and look at the table MSSMiniCrawls (verify).
"multiple continuous crawls can run at the same time. Therefore, even if one continuous crawl is processing a large content update, another continuous crawl can start at the predefined time interval and crawl other updates. Continuous crawls of a particular content repository can also occur while a full or incremental crawl is in progress for the same repository." Technet
It is a good idea to also run incremental crawls: they index more than just SP2013 content, and because a continuous crawl does not process or retry items that return errors, the incremental crawl cleans these items up.
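The overlap described in the TechNet quote can be pictured with a toy model. This is a sketch only, not how the crawler is implemented: the function active_crawls, the 15 minute interval and the crawl durations are all assumptions chosen for illustration.

```python
def active_crawls(durations, interval, t):
    """Count the crawls still running at minute t.

    Crawl i is assumed to start at i * interval and run for durations[i] minutes,
    regardless of whether earlier crawls have finished.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return sum(
        1
        for i, duration in enumerate(durations)
        if i * interval <= t < i * interval + duration
    )


# Hypothetical run: the first crawl hits a large content update (40 minutes),
# the next two are small (5 minutes each); a new crawl starts every 15 minutes.
durations = [40, 5, 5]
for minute in (10, 16, 32, 45):
    print(minute, active_crawls(durations, interval=15, t=minute))
```

At minutes 16 and 32 two crawls are running at once: the long first crawl has not blocked the later ones from starting on schedule.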

Tip: Results can also be security trimmed at query time. This is done with FTC (Full Trust Code) that must be deployed on the search server hosting the query role, on-prem.
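Conceptually, query-time trimming is a filter applied to the result set before it is returned to the user. The Python below is only a sketch of that idea: a real SharePoint trimmer is .NET full trust code, and the trim_results function, the allowed_groups field and the sample data are all hypothetical.

```python
def trim_results(results, user_groups):
    """Keep only the results the user may see.

    Each result is assumed to carry an 'allowed_groups' list; a result is kept
    when the user belongs to at least one of those groups.
    """
    groups = set(user_groups)
    return [r for r in results if groups & set(r["allowed_groups"])]


# Hypothetical result set and user.
results = [
    {"title": "HR policy", "allowed_groups": ["HR", "Managers"]},
    {"title": "Team wiki", "allowed_groups": ["Everyone"]},
    {"title": "Payroll", "allowed_groups": ["Finance"]},
]
visible = trim_results(results, ["Everyone", "Managers"])
print([r["title"] for r in visible])  # ['HR policy', 'Team wiki']
```

Because the check runs per query, it adds work to every search request, which is one reason trimming is normally done from the ACLs stored in the index at crawl time.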

More Info:
Capacity management and sizing overview for SharePoint Server 2013
http://www.microsoft.com/en-us/download/details.aspx?id=30383
http://www.microsoft.com/en-gb/download/details.aspx?id=30374
SP2013 Stretch Farms
SP2013 Database types and desc

Design Goal - Index partitions and Index replicas.

Partitions marked in Red.

Note: 2016/11/16 - Email messages (e.g. .msg files) added to SharePoint have always been crawled; however, in MOSS and SP2010 their attachments did not get crawled. SP2013 (it may be only since SP1) and Office 365 also index the attachments of messages saved in SharePoint.