Showing posts with label CEWS. Show all posts

Thursday 6 April 2017

SharePoint Search to extract SharePoint list Data into SQL

Problem:  I have multiple lists, spread across multiple site collections, whose data changes need to be pushed into SQL.  Think 1000 site collections with 5 lists in each, so 5K lists are being updated (my actual requirement is much larger).

Initial Hypothesis: In SP 2010 or SP 2013 I would use a Full Trust Event Receiver and register each list using a feature, but now we are in the new world of no full trust code.  The first thought is RER (Remote Event Receivers), but I would need to register 5,000 of these and they are notoriously unreliable for delivery.  Search already has to pick up all changes to list items for indexing, and I can break into the crawl pipeline during Content Enrichment.

Proposed Solution:


Points to Consider:
  1. I need to create a Content Enrichment Web Service (CEWS) that is called from the crawl component.  Only a single web service can be registered on a Search Service Application (SSA), so if, for example, BA Insights is already registered, another CEWS web service can't be added.  Consider using the Microsoft CEWS toolkit/framework on the web service endpoint: it provides a pipeline of stages, so multiple pieces of logic can be strung together and custom logic added later.  We write our own stages and hook them into the CEWS framework.
  2. CEWS does not process item deletes, so a possibility is using the crawl log to identify deletes.
  3. There is no CEWS on SharePoint Online/O365.
Note: The properties passed to the CEWS are minimal by default (common fields such as Title).  You do not get a dump of all properties: each one has to be registered as a managed property (MP) and explicitly requested before its data reaches the CEWS web service.
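
The stage idea from point 1 can be sketched roughly as below. This is only an illustration in Python, not the CEWS toolkit itself (a real CEWS is a WCF service): the `Stage` type, the `ListItems` table and the use of `sqlite3` as a stand-in for SQL Server are all assumptions made for the sketch, as is the choice of `Path`, `Title` and `LastModifiedTime` as the requested properties.

```python
import sqlite3
from typing import Callable, Dict, List

# An item is the bag of requested properties the crawl sends for one list item.
Item = Dict[str, str]
# A stage takes an item and returns the (possibly modified) item.
Stage = Callable[[Item], Item]


def trim_title(item: Item) -> Item:
    """Example stage: normalise the Title before anything else sees it."""
    return {**item, "Title": item.get("Title", "").strip()}


def make_sql_stage(conn: sqlite3.Connection) -> Stage:
    """Build a stage that upserts each item into a hypothetical ListItems table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ListItems ("
        "url TEXT PRIMARY KEY, title TEXT, modified TEXT)"
    )

    def stage(item: Item) -> Item:
        # INSERT OR REPLACE keeps the stage idempotent when an item is recrawled.
        conn.execute(
            "INSERT OR REPLACE INTO ListItems (url, title, modified) "
            "VALUES (?, ?, ?)",
            (item["Path"], item.get("Title", ""), item.get("LastModifiedTime", "")),
        )
        conn.commit()
        return item

    return stage


def run_pipeline(stages: List[Stage], item: Item) -> Item:
    """Run the item through every stage in order, as a single endpoint would."""
    for stage in stages:
        item = stage(item)
    return item
```

The single registered endpoint would call `run_pipeline` once per crawled item, so adding more logic later means appending another stage rather than registering another web service. Note that nothing here handles deletes.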

27/04/2017 Note: An idea I had for dealing with "Deletes": I could put the item into a status of "About to Delete", the crawl would pick up the status, the CEWS could delete the row from the SQL database, and then the web service could delete the list item from the SharePoint list.  It doesn't work: the CEWS pipeline does not pick up an item that is updated and then deleted.  So the crawl is smart, but I can't handle deletes using this approach, so more work is needed to fix the delete ...


Tuesday 31 May 2011

FAST Search Overview

Overview:  Researching FAST for SharePoint 2010 Enterprise, I am logging my findings to provide a basic overview for using FAST with SP2010.

FAST for SP2010 OOTB Search Results

Tips:
  • Install FAST on its own hardware (x64 Windows 2008 R2); the minimum is 4GB RAM and 4 CPUs, but you should use 16GB RAM and 8 CPUs;
  • Min disk 50GB, 1TB with multiple spindles recommended;
  • Install FAST on separate hardware (not on SP2010 or DC machines);
  • Needs Internet access on port 80, and the IP address should be static;
  • FAST needs SP2010 Enterprise Edition;
  • SP2010 search must be installed, as it is still used for the People search results;
  • FAST uses a db for its configuration, so it can run off the existing SQL Server farm used by SP2010;
  • The indexed data out of the content databases is stored in FAST indexes on the file system not in the SQL Server db;
Technical Overview:
Main components are: Crawler (examines the data to be made searchable), Web Analyser and Indexer (performs the search queries).


Crawler servers have a maximum of 30 million docs per node (server), and the crawler produces 2 databases. Sizing is roughly 3GB per million docs in the log file and 4GB per million docs in content.

Web Analyser servers have a recommended maximum of 30 million docs per node. Storage is 5GB per million docs.

Indexer servers serve the queries back, and the recommendation is to stay under 15 million docs per node. They need roughly 120GB per million documents crawled. Due to the high IOPS required by the index servers, it is best to keep these as physical servers. VMware experts can tune VMs for high IOPS, but a specialist is highly recommended.
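
To make the sizing figures above easier to apply, here is a rough calculator. It is a sketch that simply multiplies out the numbers quoted in this post; the function and field names are my own, and it assumes the per-million storage figures scale linearly.

```python
import math
from dataclasses import dataclass

# Per-node document limits and per-million-doc storage figures from the notes above.
CRAWLER_DOCS_PER_NODE = 30_000_000
WEB_ANALYSER_DOCS_PER_NODE = 30_000_000
INDEXER_DOCS_PER_NODE = 15_000_000
CRAWLER_GB_PER_MILLION = 3 + 4  # log file + content
WEB_ANALYSER_GB_PER_MILLION = 5
INDEXER_GB_PER_MILLION = 120


@dataclass
class Sizing:
    crawler_nodes: int
    web_analyser_nodes: int
    indexer_nodes: int
    crawler_gb: float
    web_analyser_gb: float
    indexer_gb: float


def estimate(docs: int) -> Sizing:
    """Estimate node counts and total storage (GB) for a given document count."""
    millions = docs / 1_000_000
    return Sizing(
        crawler_nodes=math.ceil(docs / CRAWLER_DOCS_PER_NODE),
        web_analyser_nodes=math.ceil(docs / WEB_ANALYSER_DOCS_PER_NODE),
        indexer_nodes=math.ceil(docs / INDEXER_DOCS_PER_NODE),
        crawler_gb=millions * CRAWLER_GB_PER_MILLION,
        web_analyser_gb=millions * WEB_ANALYSER_GB_PER_MILLION,
        indexer_gb=millions * INDEXER_GB_PER_MILLION,
    )
```

For example, `estimate(40_000_000)` gives 2 crawler nodes, 2 Web Analyser nodes and 3 indexer nodes, with about 280GB, 200GB and 4,800GB of storage respectively.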

Licensing is done per server or VM and is roughly 14K/server/VM (Not verified)

People search is still surfaced using SP2010 Search so don’t remove it from the SP2010 farm. I believe you can use FAST to do people search also but it doesn’t support phonetic searches so probably a good idea to leave people search with SharePoint’s enterprise search.

Virtualization: Don’t virtualize the SQL Server, use a SAN. Don’t virtualize the Indexer servers.

Advantages of FAST:

• Higher performance and scalability

• Faceted searches are provided OOTB

• Improved metadata extractors

• Previews and thumbnail previews for PowerPoint, Word and PDF documents

• Federation has exact number counts (I love this)

• Programmatic hook into the content publishing pipeline

Setting up FAST for SP2010 on a developer VM:  This is based on TechNet articles about FAST for SP2010 and is not my original work; it has been adapted to my specific requirement for FAST on a development VM.
References:

Simple Logical Architecture http://www.social-point.com/sharepoint-2010-search-and-fast-search

FAST for SharePoint 2010 Troubleshooting

1.> Check FAST servers are running.  PS> nctrl status
2.> Ensure the OSearch14 Windows service is running as one of the two accounts specified during the FAST install
3.> Check the Certificate Connection using PS (SharePoint)> Ping-SPEnterpriseSearchContentService -HostName FS4SP1.demo.dev:13391

PS> Restart-Service -Name "OSearch14"
4.> Authorisation crawl errors.  Check the account that is performing the crawl has permissions



  • FAST logs are found in <>\FastSearch\var\log
  • \syslog\all.log is the best log for fault finding.
  • \querylogs shows all the logs for queries
  • Use Perfmon to monitor FAST; FAST has its own set of counters.
Updated 2017-04-05:  SP2013 and SP2016 allow breaking into the search pipeline during the crawl using CEWS (Content Enrichment Web Service).  Office 365/SharePoint Online does not support CEWS.  Also watch out for how deletes go through the web service!  Only one CEWS web service registration is allowed per Search Service Application, so look at the Microsoft CEWS toolkit if you need more than one web service on the crawl.
http://www.netwoven.com/2014/07/using-multiple-endpoints-as-a-content-enrichment-web-service-in-sharepoint-2013-search/