Tuesday 31 May 2011

FAST Search Overview

Overview:  I am researching FAST for SharePoint 2010 Enterprise and logging my findings to provide a basic overview of using FAST with SP2010.

FAST for SP2010 OOTB Search Results

Tips:
  • Install FAST on its own x64 Windows 2008 R2 hardware; 4GB RAM and 4 CPUs is the minimum, and 16GB RAM and 8 CPUs should be used;
  • Minimum disk is 50GB; 1TB with multiple spindles is recommended;
  • Install FAST on separate hardware (not on SP2010 or DC machines);
  • FAST needs Internet access on port 80, and its IP address should be static;
  • FAST needs SP2010 Enterprise edition;
  • SP2010 search must still be installed, as it is used for the People search results;
  • FAST uses a database for its configuration, which runs off the existing SQL Server farm used by SP2010;
  • The indexed data from the content databases is stored in FAST indexes on the file system, not in the SQL Server database.
Technical Overview:
The main components are: the Crawler (examines the data to be made searchable), the Web Analyser, and the Indexer (performs the search queries).


Crawler servers handle a maximum of 30 million documents per node (server), and the crawler produces two databases. Sizing is roughly 3GB per million documents for the log file and 4GB per million documents for content.

Web Analyser servers have a recommended maximum of 30 million documents per node, with storage of 5GB per million documents.

Indexer servers serve the queries back; the recommendation is to stay under 15 million documents per node, and they need roughly 120GB per million documents crawled. Due to the high IOPS required by the index servers, it is best to keep these as physical servers. VMware experts can tune VMs for high IOPS, but a specialist is highly recommended.

Licensing is done per server or VM and is roughly 14K per server/VM (not verified).

People search is still surfaced using SP2010 search, so don’t remove it from the SP2010 farm. I believe you can use FAST for people search too, but it doesn’t support phonetic searches, so it is probably a good idea to leave people search with SharePoint’s enterprise search.

Virtualization: Don’t virtualize the SQL Server; use a SAN. Don’t virtualize the Indexer servers.

Advantages of FAST:

• Higher performance and scalability

• Faceted searches are provided OOTB

• Improved metadata extractors

• Previews and thumbnail previews for PowerPoint, Word and PDF documents

• Federation has exact number counts (I love this)

• Programmatic hook into the content processing pipeline

Setting up FAST for SP2010 on a developer VM:  This is based on TechNet articles about FAST for SP2010 and is not my original work; it has been adapted to my specific requirement for FAST on a development VM.
References:

Simple Logical Architecture http://www.social-point.com/sharepoint-2010-search-and-fast-search

FAST for SharePoint 2010 Troubleshooting

1.> Check the FAST servers are running.  PS> nctrl status
2.> Ensure the OSearch14 Windows service is running as one of the two accounts specified during the FAST installation (see the sketch after this list)
3.> Check the certificate connection using PS (SharePoint)> Ping-SPEnterpriseSearchContentService -HostName FS4SP1.demo.dev:13391

PS> Restart-Service -Name "OSearch14"
4.> Crawl authorisation errors.  Check that the account performing the crawl has permissions
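Step 2 can be verified with a quick WMI query (a sketch; StartName is the account the service runs as):

PS> Get-WmiObject Win32_Service -Filter "Name='OSearch14'" | Select-Object Name, StartName, State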



  • FAST logs are found in <>\FastSearch\var\log;
  • \syslog\all.log is the best log for fault finding;
  • \querylogs shows all the logs for queries;
  • Use Perfmon to monitor FAST; it has its own set of counters.
Updated 2017-04-05:  SP2013 and SP2016 allow breaking into the search pipeline on the crawl using CEWS (Content Enrichment Web Service).  Office 365/SharePoint Online does not support CEWS.  Also watch out for how deletes go through the web service!  Note that only one registration of a CEWS web service is allowed per content processing pipeline, so look at the Microsoft CEWS toolkit if you need more than one web service on the crawl; a minimal CEWS skeleton is sketched after the link below.
http://www.netwoven.com/2014/07/using-multiple-endpoints-as-a-content-enrichment-web-service-in-sharepoint-2013-search/
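For reference, a minimal CEWS skeleton looks roughly like this (a sketch based on the MSDN CEWS sample; the output property name MyEnrichedProperty is illustrative and must also be configured as an output property on the Search service application):

using System.Collections.Generic;
using Microsoft.Office.Server.Search.ContentProcessingEnrichment;
using Microsoft.Office.Server.Search.ContentProcessingEnrichment.PropertyTypes;

public class EnrichmentService : IContentProcessingEnrichmentService
{
    private readonly ProcessedItem processedItem = new ProcessedItem
    {
        ItemProperties = new List<AbstractProperty>()
    };

    // Called once per crawled item for the input/output properties configured on the SSA.
    public ProcessedItem ProcessItem(Item item)
    {
        processedItem.ItemProperties.Clear();
        // Stamp an illustrative managed property during the crawl.
        processedItem.ItemProperties.Add(
            new Property<string> { Name = "MyEnrichedProperty", Value = "enriched" });
        return processedItem;
    }
}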

Saturday 28 May 2011

Customer Care Accelerator for SharePoint

A few days ago I was in an Architecture Design Session with Microsoft Consulting, and they showed me the Customer Care Accelerator (CCA) for Microsoft Dynamics CRM 2011.

CCA simply allows you to maintain context between disconnected systems.  If I select a customer in the Windows application that stores my customer orders, I can take the selected CustomerId and use it directly in SharePoint to, say, retrieve all documents related to that customer.

In summary, CCA allows me to use data from various desktop applications, such as web forms, CRM, SharePoint or websites.  This is pretty useful for sticking together historically disparate systems.

More Info:
http://dynamics-crm.pinpoint.microsoft.com/en-GB/applications/customer-care-accelerator-for-microsoft-dynamics-crm-2011-12884914795
http://community.dynamics.com/product/crm/crmtechnical/b/crmukblog/archive/2011/05/11/getting-started-with-cca-for-crm-2011.aspx
http://blogs.msdn.com/b/ukcrm/archive/2011/05/11/getting-started-with-cca-for-crm-2011.aspx

Scanning, Storage & RBS

Problem:  The client has millions of physical documents that need to be available via SharePoint; additionally, documentation still arrives in physical form and needs to be scanned and classified.

Initial Hypothesis: SP2010 can store documents in the SQL database as blobs; however, performance-wise it is not really made for large blob storage, and SQL storage is expensive (RAID, HA). Remote Blob Storage (RBS) helps with storing blobs but does not get around the limitations imposed by Microsoft's guidance.  RBS can reduce storage costs and improve performance if your data involves a lot of large blobs (over 256KB is a good size).  My rough sums show a huge data requirement: for example, 600,000 customers transact with the client, and on average each customer generates 3 physical documents a year, so we are talking 1.8 million scanned documents a year.

Documents need to be scanned at 300 dpi so they can be printed and stored adequately.  With compression, and converting these files into TIFF/PDF, we are assuming an average of 1 MB per file. So the storage requirement per year would be 1.8 million scanned documents at 1MB per file, meaning roughly 1,800GB of storage.

We have a restriction of 200GB per content database in SP2010 (the threshold Microsoft will support up to), so we would require 9 new site collections, each on a new content database, per year to meet this requirement.
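As a quick sanity check on these sums (PowerShell used as a calculator; the 1MB average file size is our assumption):

PS> $docsPerYear = 600000 * 3                 # 1,800,000 scanned documents a year
PS> $storageGB = $docsPerYear * 1MB / 1GB     # ~1,758 GB, call it 1,800 GB a year
PS> [math]::Ceiling($storageGB / 200)         # 9 content databases a year at the 200 GB guidance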

Tip: Also worth considering are the thresholds and boundaries provided by the SharePoint team.  Site collection maximum size is 100GB; this scenario has a caveat in that a single site collection using a single document library/site supports up to 1TB in the content database.  You can nest subsites in a site collection, but 2,000 per view is the recommended limit.  Maximum of 300 content databases per web application; maximum of 5,000 site collections per content database.

Our storage cost is much higher, as our disks are RAID, so at a minimum we would use 3 times this in actual physical disk space.  On top of this, the indexes will be about 25% of the storage requirement.  So price and performance get out of control pretty quickly.

Resolution: Using RBS, my estimate is that externalising these blobs will reduce the content database size by 90%. However, the content database size limit is calculated including the RBS store, so while RBS gives us cheaper, resilient storage, it does not reduce the content database sizing against the supported limits.

Updated: 21/07/2011 - RBS sizing Calc

Scanning tips for SP:
  • TIFF or PDF are the common base storage file types;
  • 300 dpi is good print quality; most requirements can be lower;
  • Black and white is far smaller than greyscale scanning;
  • PDFs, if stored correctly, can be indexed by the search crawler.
More Info Scanning:
http://www.psigen.com/ - scanning and capture for SP2010.
Capturx from www.adapx.com/sharepoint is a pen that automates data capture on forms.
CoSign does digital signatures and looks to have pretty decent SharePoint integration: http://www.arx.com/digital-signature/sharepoint
www.kodak.com/go/sharepoint
http://www.goscan.com/connectors-sharepoint.php
http://www.kofax.com/solutions/microsoft.asp

More Info Sizing:
HP Sizer for SP2010
Capacity management for SP2010 - software boundaries

Tuesday 24 May 2011

Building a Customer Taxonomy

Problem: I need to create a taxonomy for my client.

Initial Hypothesis:  Either start by interviewing stakeholders and building up a taxonomy that is offered via a service application, or
buy an existing industry taxonomy (http://blog.wandinc.com/2011/02/list-of-taxonomies-for-sharepoint-2010.html) and offer it via a service application.

Resolution: My preferred option would be to buy a taxonomy and amend it with key stakeholders.  WAND has good taxonomies, I believe: http://www.wandinc.com/

Friday 20 May 2011

SUGUK Southampton Meet

SUGUK South Meeting - Thursday 19th May 2011 http://suguksouth.eventbrite.com/
Thanks to Ian Woodgate for arranging the user group session.  A good close-knit user group, and using Southampton University's facilities is great. On the night we did 3 developer sessions, and I got some useful tips and ideas from the other presenters, namely Martin Hatch, Chris McKinley and Darren White.
LINQ to SharePoint Sandbox solution development demo presentation
Ian Woodgate's blog & summary of the evening - http://blog.pointbeyond.com/2011/05/21/sharepoint-uk-user-group-southampton-19-may/
Martin Hatch's blog is: http://www.martinhatch.com/
Chris McKinley's blog: http://crmckinley.sharepointspace.com
Demo Code Download
VS project - create lists & set up referential integrity.
VS project - deploy a visual user control web part via a sandbox solution.
Slide Deck - More detailed

Sunday 15 May 2011

SharePoint Retreat Roundup

Summary of the day:  "Here is my feedback from the SharePoint Retreat meeting on Saturday 14th of May 2011.   9 people were at the retreat. We did 5 hours' work and were really pushed for time - the day covered too many areas and needed more time to use the SPRetreat format properly. The venue was good - not exactly central, but I was definitely happy with it. I thought my presentation went well - only 1 other guy had used LINQ to SP in practice, so it was a good topic choice, and the practice exercise went well with everyone sharing. However, we ran out of time, and Ashraf was left with about an hour to present, which he did very well, on InfoPath 2010 - a lot of the guys knew InfoPath 2007 and I think they got a lot out of Ash's session.
My general feeling was the attendees were all glad they came, and I believe we all walked away with good information. I think all the guys want to have more of these on a Saturday, and guys are volunteering to do sessions. I like the coding kata format, but it takes a long time and the day should only address 1 topic. All the guys that attended liked having 1 practice session per topic, and the retrospective is great. Doing 2 topics worked well for the 5-odd hours, which allowed us to only use up our day to lunch time. But I guess it really depends how in depth one goes on each topic.
I'm going to open up a dialog with all the attendees and look to do another event in July."
Please post comments with your thoughts on the day - please suggest topics and a format for a July day.   Thanks
paul


The videos I recorded came out very poorly; I'll try to get better light and equipment next time.
Video of the slides explaining LINQ to SharePoint on YouTube

SPLINQ - Parameters.xml to show hidden fields

Problem:  I can never remember the syntax for the parameters file and need to look up how to include hidden SP2010 fields in the LINQ to SharePoint proxy.

Initial Hypothesis:  Use a parameters.xml file to get SPMetal to generate the proxy with hidden fields.  You can use the IncludeHiddenColumns element within the ContentType element to include all hidden fields or, as I have done in the resolution example, specify the fields to include using the Column element.

Resolution:
1.> Create a parameters.xml (name it for your specific project) and add it to the 14\bin directory.
2.> Use the DOS cmd prompt to generate the LINQ to SharePoint proxy.
3.> Add the generated LINQ to SharePoint proxy code to your Visual Studio project.
4.> Add a query to show one of the hidden fields that is now part of your proxy (all four steps are sketched below).
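For steps 1 and 2, a minimal parameters.xml looks something like this (a sketch; the hidden column and member names are illustrative, and IncludeHiddenColumns could replace the Column element to pull in every hidden field):

<?xml version="1.0" encoding="utf-8"?>
<Web AccessModifier="Internal" xmlns="http://schemas.microsoft.com/SharePoint/2009/spmetal">
  <ContentType Name="Document" Class="Document">
    <Column Name="Created_x0020_By" Member="DocumentCreatedBy" />
  </ContentType>
</Web>

And the SPMetal command for step 2 is along these lines:

SPMetal.exe /web:http://demo1/sites/sponline /code:SPLinqProxy.cs /parameters:parameters.xml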
The demo code uses the default Microsoft.SharePoint.Linq.DataContext class to retrieve the data; it is easier to use your own DataContext class, created when you build your LINQ to SharePoint proxy code.  The sketch below shows how to use the proxy-generated data context.
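A sketch of steps 3 and 4 (it assumes SPMetal generated a context class named SPLinqProxyDataContext with a Documents list, and that the hidden column was surfaced as DocumentCreatedBy):

using System;
using System.Linq;
using Microsoft.SharePoint;

// The generated context takes the web URL; dispose it when done.
using (var ctx = new SPLinqProxyDataContext(SPContext.Current.Web.Url))
{
    // DocumentCreatedBy is the hidden column surfaced via parameters.xml.
    var docs = from d in ctx.Documents
               select new { d.Title, d.DocumentCreatedBy };
    foreach (var doc in docs)
    {
        Console.WriteLine("{0} created by {1}", doc.Title, doc.DocumentCreatedBy);
    }
}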

More Info: You can use the same technique for custom site columns; however, you will need to manually edit the proxy to perform the appropriate casting.  I would still use the ICustomMapping interface for custom site columns.

Wes Hacket gave me this tip - using CKSDev you can create a new SPI template that generates the proxy.  With the proxy comes a parameters file and a custom code file, and you can regenerate the proxy by right-clicking on the proxy and running the SharePoint generator tool.  Screenshot to come!

SPMetal Code Generation Rules - http://msdn.microsoft.com/en-us/library/ee537010.aspx 

Tuesday 10 May 2011

SharePoint Designer Filter using Contains Operator

Problem:  I converted a List View web part that shows a document library into a Data View web part.  I tried to add a filter based on a query string, but the query string is only part of the field value, so I need to use a contains statement.

Resolution: The "Contains" operator is missing from the filter UI; by switching into code view I can use the contains() function in my XSLT, as shown below.

Data View XSLT tip: retrieve a column named Kno and intermingle it with XHTML, as sketched below.
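A sketch of the relevant XSLT (the Kno column is from this example; the $KnoParam query-string parameter is assumed to be declared in the web part's ParameterBindings):

<xsl:variable name="Rows" select="/dsQueryResponse/Rows/Row[contains(@Kno, $KnoParam)]" />
<xsl:for-each select="$Rows">
  <div class="customer">Customer number: <xsl:value-of select="@Kno" /></div>
</xsl:for-each>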

Monday 9 May 2011

ICustomMapping LinkFieldValue errors in sandbox solutions

Problem:  While trying to use the ICustomMapping interface to add additional columns of type LinkFieldValue to a sandbox solution, I am getting the error:
{"Type 'Microsoft.SharePoint.Publishing.Fields.LinkFieldValue' in Assembly 'Microsoft.SharePoint.Publishing, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' is not marked as serializable."}

Initial Hypothesis: I could not see why I was getting the error, but a Google search found a post that gave me the answer.

Resolution:  Change the code from:
SPListItem item = (SPListItem)listItem;
// Assigning the LinkFieldValue object directly fails in the sandbox,
// because LinkFieldValue is not marked as serializable.
this.ContentLinkField = (LinkFieldValue)item["LT1"];
To:
SPListItem item = (SPListItem)listItem;
// GetFormattedValue returns the field value as a string, which serializes fine;
// the LinkFieldValue is then rebuilt from that string.
string value = item.GetFormattedValue("LT1");
this.ContentLinkField = new LinkFieldValue(value);

More info:
http://www.alexbruett.net/?p=255 

Thursday 5 May 2011

WebControls not supported in sandbox solutions

Problem:  I wanted to use an SPGridView in a sandbox solution, so I added the code, but IntelliSense doesn't pick it up.  I looked up the SPGridView class documentation on MSDN and it's not supported.  I'm not really sure why, but hardly anything in the Microsoft.SharePoint.WebControls namespace works with sandbox solutions.

Initial Hypothesis:
Microsoft.SharePoint.WebPartPages.ListViewWebPart is another example of a control that can't be added programmatically; however, it can be added using SharePoint Designer.  I'm baffled!

File Upload Size Limits

Problem: Need to store files in excess of 2GB in SharePoint.
Initial Hypothesis:
  • 50MB is the default upload limit set by SharePoint OOTB.  You can change this on the farm, as described here by Dave Coleman, up to 2GB (2047MB) - see the PowerShell sketch after this list;
  • A common misconception is that by using RBS, rather than your content database, to store the blob you can overcome this 2GB limit.  Well, it's partly true...  The maximum size for a file in SQL Server is 2GB; however, the next constraint is the SharePoint 2010 server object model, which has a hard limit of 2GB per upload, so moving to RBS won't overcome the problem;
  • I believe SharePoint limits the upload to 2GB due to IIS's worker process (w3wp.exe): to upload a file, the full stream must be held in the worker process's available memory.  Each w3wp.exe worker process runs well with 2-4GB of memory (this is not a boundary, just a good idea on x64), so it makes sense to me that the SP2010 team limited any file upload to 2GB;
  • Also be aware that increasing your upload file size to 2GB has performance ramifications: if a user uploads a file and there is no memory available, no new requests can be handled until the memory is available again.
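A sketch of the farm change mentioned in the first bullet (MaximumFileSize is set in MB, per web application; the URL is illustrative):

PS> $webApp = Get-SPWebApplication http://demo1
PS> $webApp.MaximumFileSize = 2047    # MB - the SP2010 hard limit
PS> $webApp.Update()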
Resolution:  Store large files outside SharePoint and surface them in SharePoint.  I believe there is another solution using a Telerik Silverlight upload control, but I haven't tried it.
More Info:
http://blah.winsmarts.com/2010-3-Large_File_Upload_in_SharePoint_2010.aspx

Update 24/07/2013: SharePoint 2013 has the same hard limit of 2GB for the maximum upload size.  TechNet states 50MB is the default limit for SP2013, but the default from an OOTB install is 250MB, which is the same value you get with SharePoint Online/Office 365.

Tuesday 3 May 2011

Corrupt site column cannot be deleted

Problem: I have deployed a site column (field) and content type declaratively; the site column is not valid, and I get the following error when looking at my site columns in SharePoint 2010's UI: "Field type xxx is not installed properly. Go to the list settings page to delete this field."
Initial Hypothesis:  The cause of the issue is simply that I created a site column of type "Bool".  Type "Bool" does not exist; "Boolean" is the correct type, as shown below:
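The offending declarative field looked something like this (a sketch; the field ID and name are illustrative, and Type should have read "Boolean"):

<Field ID="{...}" Name="GrossValueCertified" DisplayName="Financials Gross Value Certified" Type="Bool" Group="Custom Columns" />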
Resolution: Delete the site column.  This can't be done: I tried SharePoint's UI (as shown in the problem image), SharePoint Designer, and Solution Explorer with the CKSDev extensions; all the tools error, as the site column object cannot be instantiated.
So now it was time to try PowerShell, which also fails, as it can't instantiate the site column object either.
PS> $web = Get-SPWeb http://demo1/sites/sponline
PS> $fields = $web.Fields
PS> foreach($field in $fields) { Write-Host $field.Id }
This proves the site column exists but is corrupted.  I tried deleting it using PowerShell (sketched below); obviously, the site column won't be removed.
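The delete attempt was roughly this (using the corrupt column's display name):

PS> $field = $web.Fields["Financials Gross Value Certified"]
PS> $field.Delete()

This errors, because SharePoint cannot instantiate the corrupt field object.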

Summary:  At this point I am in a knot.  The site column is causing errors and can't be removed, so the options are to delete the site collection or to do the bad stuff and fix the error directly in the database.

                         =================

Problem:  A corrupt site column needs to be deleted using T-SQL.

Initial Hypothesis: Find the corrupt/offending field; as you can't use SharePoint's API to remove the field, it will need to be done directly in the content database using T-SQL.

Resolution:
1.> Find the site column causing the issue using PowerShell (sketched below).
2.> Open the field.text file: I did a search for "bool" in the text document; there are a lot of results, but all the others read "boolean", so simply search for the type shown in your error message.
3.> Find the content database that is storing the offending site column using Central Admin.
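The dump for steps 1 and 2 can be produced with something like this (a sketch; it writes each field's schema to the field.text file):

PS> $web = Get-SPWeb http://demo1/sites/sponline
PS> $web.Fields | ForEach-Object { $_.SchemaXml } | Out-File field.text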
4.> Open "Microsoft SQL Server Management Studio" and run a query to find the offending record:

T-SQL:  SELECT * FROM [Demo_PortalDB].[dbo].[ContentTypes] WHERE Definition LIKE ('%332B55548E6C%')
5.> Delete the offending record:

T-SQL: DELETE FROM [Demo_PortalDB].[dbo].[ContentTypes] WHERE Definition LIKE ('%332B55548E6C%')

Summary:  Editing a SharePoint database directly is not supported by Microsoft and should not be done under any circumstances.  I can't find another way to fix this issue, so if anyone has a suggestion I'd love to hear it.

Error Message Examples:
Field type Bool is not installed properly. Go to the list settings page to delete this field.
Field type Financials Gross Value Certified is not installed properly. Go to the list settings page to delete this field.
Field type UKTelephone is not installed properly. Go to the list settings page to delete this field.

Monday 2 May 2011

SharePoint Retreat South West London

Overview:  Ashraf Islam & I are presenting a SharePoint Retreat on 14 May in Cobham (South West London).  We are looking at LINQ to SharePoint & InfoPath on the day.  This event broadly follows Andrew Woodward's SharePoint Retreat format: it's free, everyone attending will get real-world coding practice, and we do retrospectives after each session.  http://www.21apps.com/sharepoint/spretreat-swlondon2011/
To book for the event: http://spretreatswlondon.eventbrite.com/

LINQ to SharePoint Session Downloads:
VS project - create lists & set up referential integrity.
VS project - deploy a visual user control web part via a sandbox solution. 

Update 14 May 2011 - Thanks to everyone; I gained a lot from the day, and it was good to meet and discuss things with some passionate devs & architects - paul
Update 28 May 2011 - Video I recorded on using LINQ to SharePoint

Sunday 1 May 2011

SP2010 Data Access options for developers

Overview:  This post contains 4 WMVs that explain both client-side and server-side data access options for developers.  This was a single presentation that I have broken up into 4 smaller units of 10-15 minutes; each is about 15 MB in size.

Part 1 - Overview of Data Access options for SharePoint 2010 developers
Part 2 - Introduction to LINQ to SharePoint
Part 3 - More options
Part 4 - LINQ to SharePoint tips

SP2010 lists vs DB tables

Overview:  When developing with the SP2010 server object model you don't have to store data in SharePoint lists, especially considering the BCS in SP2010.  This presentation video was part of a session presented on data access options for SharePoint 2010 in April 2011.  It specifically compares database tables vs SharePoint lists, to help decide which option to use.
http://www.youtube.com/watch?v=8ecYVdR3a1g