Thursday, 31 October 2013

Monitoring SharePoint 2013

Overview: SharePoint farms have several dependencies, so monitoring a farm effectively means reviewing a lot of components.  Broadly there are two forms of monitoring: preventative and reactive.  Also see my post on performance monitoring SP 2013.

Preventative Monitoring: reviewing your SharePoint estate to try to identify issues before they occur.  A simple example is spotting that a database is running out of storage space, which lets you remediate before the system fails.
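As a rough illustration of the storage example, here is a minimal Python sketch that classifies the free space on a volume.  The function names and the 20%/10% thresholds are my own assumptions for illustration, not values from any of the products mentioned in this post.

```python
import shutil


def classify_free_space(free_pct, warn_free_pct=20.0, crit_free_pct=10.0):
    """Map a free-space percentage to a simple alert level.

    The thresholds are illustrative assumptions; tune them per volume.
    """
    if free_pct < crit_free_pct:
        return "CRITICAL"
    if free_pct < warn_free_pct:
        return "WARNING"
    return "OK"


def disk_status(path, warn_free_pct=20.0, crit_free_pct=10.0):
    """Return (level, free_pct) for the volume that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_pct = 100.0 * usage.free / usage.total if usage.total else 0.0
    return classify_free_space(free_pct, warn_free_pct, crit_free_pct), free_pct


# Example (the drive letter is hypothetical):
#   level, free_pct = disk_status("D:\\")
#   print(f"D: is {level} with {free_pct:.1f}% free")
```

Run on a schedule against the data and log drives, a check like this gives you the early warning described above.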

Reactive Monitoring: ensuring you are notified as early as possible so you can identify the root cause and fix it with minimal downtime.  At its simplest this means waiting for the help desk to escalate that the farm is down.  A more mature approach is a simple set of web requests that alerts the administrators as soon as a service is down.
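A "simple set of web requests" could look something like the Python sketch below.  It is only a sketch under assumptions: the URLs are hypothetical, and it assumes the endpoint answers an anonymous GET.  A SharePoint site that demands Windows authentication will return 401, which this code reports as down.

```python
import time
import urllib.error
import urllib.request


def check_url(url, timeout=10.0):
    """Issue one GET and return (is_up, http_status, seconds_taken).

    http_status is None when no HTTP response arrived at all
    (DNS failure, connection refused, timeout).
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # pull the whole body so the timing is realistic
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        status = None
    elapsed = time.monotonic() - start
    is_up = status is not None and 200 <= status < 400
    return is_up, status, elapsed


def find_failures(urls, timeout=10.0):
    """Return the (url, status) pairs that did not respond successfully."""
    failures = []
    for url in urls:
        is_up, status, _ = check_url(url, timeout)
        if not is_up:
            failures.append((url, status))
    return failures


# Example (hypothetical URLs); hand any failures to your alerting of choice:
#   find_failures(["http://intranet.demo.local/", "http://search.demo.local/"])
```

Scheduling this every few minutes and mailing the failure list to the administrators is already a step up from waiting on the help desk.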

Monitoring needs to be a combination of preventative and reactive monitoring done via automation and manual verification.  As the automation piece improves, there is less reliance on the manual monitoring.

Tooling:
There is a wealth of tools, such as SolarWinds and its competitors: "Enterprise IT management from such vendors as CA Technologies (Unicenter), BMC, IBM (Tivoli), and Hewlett-Packard (OpenView)."  (Wikipedia)

SolarWinds Monitoring Screen

Idera has monitoring tools specifically for SharePoint.  SolarWinds is a good option for monitoring SP farms and their dependencies: Windows OS, machine resources, SQL, SP, and WAC/Office Web Apps.  Couple this with web monitoring and you get a comprehensive reactive and preventative monitoring solution.  It will tell you before a collapse if the server, OS, SQL or SP is slowing down or running out of resources.  If a service stops, partially or completely, the operations team is notified and the location of the error is highlighted, as opposed to just "it's not working".

AvePoint's DocAve 6 has a solid monitoring tool for SharePoint and the servers, so if you already have DocAve this would be my choice.  The UI gets jumbled on big farms, but overall the tool is easy to use and does a solid job.

Metalogix's Diagnostic Manager looks like a nice tool.  It is very similar to the DocAve monitor, but you don't need to deploy any pieces onto the farm.  The UI can be a bit busy, but it is definitely a product to look at.
Metalogix's Diagnostic Manager sample screen shots

Other tools include ExtraHop's traffic monitor: by checking how long responses take, it determines with minimal interference how well the components/nodes in the infrastructure are performing.  DocAve 6.3 has a good monitoring solution specific to SharePoint; it will monitor down to the OS and report on CPU and memory.

AlertFox is a monitoring service (SaaS).  It can perform HTTP GET requests at regular intervals (e.g. every 5 minutes).  This keeps your web servers warm, measures the response time from various locations around the world, and can check the speed of multistep actions such as logging into your web site and performing a search.  There are a lot of these services, but AlertFox is good.  It includes dashboard, email and SMS notification.

SharePoint Best Practices Analyser: CA > Monitoring > Health Analyzer is a good place to see common problems.

Event Viewer and the ULS logs are also good places to do reactive and even preventative monitoring; however, these logs need to be trawled manually.
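Some of the trawling can be scripted.  The Python sketch below counts ULS entries per level and pulls out the serious ones.  It assumes the usual tab-delimited ULS layout (Timestamp, Process, TID, Area, Category, EventID, Level, Message, Correlation) and treats "Critical" and "Unexpected" as the levels worth alerting on; both are assumptions to check against your own log files, and the function names are mine.

```python
from collections import Counter

# Levels treated as alert-worthy here; an assumption, adjust to taste.
ALERT_LEVELS = frozenset({"Critical", "Unexpected"})


def scan_uls_lines(lines, alert_levels=ALERT_LEVELS):
    """Return (counts_per_level, hits) for an iterable of ULS log lines.

    Each hit is a (timestamp, area, category, message) tuple.
    """
    counts = Counter()
    hits = []
    for line in lines:
        # ULS pads its columns with spaces, so strip every field.
        fields = [field.strip() for field in line.rstrip("\r\n").split("\t")]
        if len(fields) < 8 or fields[0] == "Timestamp":
            continue  # skip the header row and anything malformed
        level = fields[6]
        counts[level] += 1
        if level in alert_levels:
            hits.append((fields[0], fields[3], fields[4], fields[7]))
    return counts, hits


def scan_uls_file(path):
    """Scan one ULS log file; the encoding handling is a cautious guess."""
    with open(path, encoding="utf-8-sig", errors="replace") as handle:
        return scan_uls_lines(handle)


# Example (hypothetical file name in the SharePoint 2013 LOGS folder):
#   counts, hits = scan_uls_file(r"C:\...\15\LOGS\SERVER-20131031-0900.log")
```

This does not replace reading the logs when something breaks, but a daily count of Critical and Unexpected entries is a cheap preventative signal.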

Key items to monitor for me are:
1.> OS/VM: CPU, OS memory, OS disk capacity/utilisation.
2.> Windows Services: Each role needs a set of services, so your monitoring tool can verify they are running.  Example services for SharePoint servers are shown in Appendix A.  If you have agents such as DocAve from AvePoint, verify these are running too.  The Office Web Apps 2013 service is WACSM.
3.> SQL Server: Verify the services are running, monitor SQL performance ...
4.> SharePoint: Verify web requests are returning results and measure the response time; a rising response time may indicate a bottleneck is starting to occur.  If you are using multiple front-end servers, check that each server is working.
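For item 2, the comparison itself is easy to script.  Below is a minimal Python sketch that checks observed service states against an expected set modelled on the WFE/APP role in Appendix A.  The expected dictionary is my own simplification (only the services that should be started), and reading the live states relies on the third-party psutil package, whose win_service_get call only exists on Windows.

```python
# Services that should be running on a WFE/APP server (see Appendix A).
# SPTimerV4 is the SharePoint Timer Service.
EXPECTED_WFE_SERVICES = {
    "SPAdminV4": "running",
    "SPTimerV4": "running",
    "SPTraceV4": "running",
}


def find_service_problems(expected, observed):
    """Compare {service_name: state} dictionaries and describe mismatches."""
    problems = []
    for name, wanted in expected.items():
        actual = observed.get(name)
        if actual is None:
            problems.append(f"{name}: service not found")
        elif actual != wanted:
            problems.append(f"{name}: expected {wanted}, found {actual}")
    return problems


def read_service_states(names):
    """Read the current state of each named Windows service via psutil.

    Assumption: psutil is installed and this runs on the Windows server.
    Services that do not exist are left out of the result.
    """
    import psutil  # third-party; imported lazily so the comparison works anywhere

    states = {}
    for name in names:
        try:
            states[name] = psutil.win_service_get(name).status()
        except psutil.NoSuchProcess:
            pass  # reported later as "service not found"
    return states


# Example, run on the SharePoint server itself:
#   observed = read_service_states(EXPECTED_WFE_SERVICES)
#   for problem in find_service_problems(EXPECTED_WFE_SERVICES, observed):
#       print(problem)
```

Keeping the expected states in one dictionary per role means the same check can be reused for the search servers with a different dictionary.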

Appendix A. SharePoint 2013 Services to Monitor

WFE & APP Roles

| Service | Name | Status | Startup Type | Log On As |
|---|---|---|---|---|
| SharePoint Administration | SPAdminV4 | Started | Automatic | Local System |
| SharePoint Search Host Controller | SPSearchHostController | | Disabled | Network Service |
| SharePoint Server Search 15 | OSearch15 | | Disabled | Local System |
| SharePoint Timer Service | SPTimerV4 | Started | Automatic | Demo\Sp_farm* |
| SharePoint Tracing Service | SPTraceV4 | Started | Automatic | Demo\Sp_Service* |
| SharePoint User Code Host | SPUserCodeV4 | | Disabled | Demo\Sp_farm* |

Search Role

| Service | Name | Status | Startup Type | Log On As |
|---|---|---|---|---|
| SharePoint Administration | SPAdminV4 | Started | Automatic | Local System |
| SharePoint Search Host Controller | SPSearchHostController | Started | Automatic | Demo\SP_SearchService |
| SharePoint Server Search 15 | OSearch15 | Started | Manual | Demo\SP_SearchService |
| SharePoint Timer Service | SPTimerV4 | Started | Automatic | Demo\Sp_farm* |
| SharePoint Tracing Service | SPTraceV4 | Started | Automatic | Demo\Sp_Service* |
| SharePoint User Code Host | SPUserCodeV4 | | Disabled | Demo\Sp_farm* |
