
Sunday 2 January 2022

App Insights Overview for SaaS logging and tracing

Overview:  App Insights provides independent infrastructure for logging and tracing activities.  It is tightly coupled with Azure services, including PaaS, which allows for consistent, scalable logging.  App Insights now stores logs in Azure Log Analytics; both sit under the umbrella of Azure Monitor.

On a SaaS solution, I am looking for App Insights to log any errors and to have the ability to log trace information.  I want a unique correlationId (to allow for distributed tracing) on the front end so that, if there is an error, support can identify the exact issue/transactions.  A unique correlationId in the HTTP header allows a transaction to be identified, which is useful for tracing and performance monitoring.  Using the App Insights SDKs and implementing a common logging module is a good idea.  There are two common areas that need calling out to ensure the ability to trace transactions (see the sketch after this list):

  1. SPA's (requirement to generate a unique operation/correlationId per operation, not per pageview), and
  2. Long-running operations such as timer jobs or service bus calls.
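
To make the first point concrete, here is a minimal C# sketch (my own example, not from this post) of wrapping each logical action in an App Insights operation so that every trace and dependency logged inside it shares one operation/correlation id; the operation name "SubmitOrder" is made up:

using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

var config = TelemetryConfiguration.CreateDefault(); // picks up the configured connection string
var client = new TelemetryClient(config);

// StartOperation creates a new operation id; everything tracked inside inherits it,
// so support can filter on a single id to see the whole transaction.
using (var op = client.StartOperation<RequestTelemetry>("SubmitOrder"))
{
    client.TrackTrace("Calling payment API"); // automatically correlated to the operation
    op.Telemetry.Success = true;              // record the operation's outcome
}
client.Flush(); // ensure buffered telemetry is sent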

Support & DevOps:

Having a correlationId allows first-line support to log the correlationId and quickly follow the request without asking for replication steps.  This context-tracing approach is common in newer applications.  Third-line support has full traceability of an issue and can empirically see each part of the perceived performance broken down using the correlationId in the header.

Key APIs can be continuously monitored for errors and slowdowns in performance, and alerts can be configured around this monitoring. 

Building a first-line support tool that displays the errors in a hierarchy and includes help scripts and knowledge bases is a good option for streamlining support.

App Insights has live metrics monitoring, and the Kusto Query Language (KQL) is useful for running specific monitoring queries.


Summary Report for Support

// I'm sure there are nicer ways to write/improve my Kusto, so pls let me know where the code can be improved
// Compares average request duration per slow performance bucket over the last three days.
let result0 = requests
    | where timestamp between (ago(1d) .. now())
    | where performanceBucket in ("7sec-15sec", "15sec-30sec", "30sec-1min", "1min-2min")
    | summarize requestCount = sum(itemCount), avgDuration = avg(duration) by performanceBucket;
let result1 = requests
    | where timestamp between (ago(2d) .. ago(1d))
    | where performanceBucket in ("7sec-15sec", "15sec-30sec", "30sec-1min", "1min-2min")
    | summarize requestCount1 = sum(itemCount), avgDuration1 = avg(duration) by performanceBucket;
let result2 = requests
    | where timestamp between (ago(3d) .. ago(2d))
    | where performanceBucket in ("7sec-15sec", "15sec-30sec", "30sec-1min", "1min-2min")
    | summarize requestCount2 = sum(itemCount), avgDuration2 = avg(duration) by performanceBucket;
result0
| join kind=inner result1 on performanceBucket
| join kind=inner result2 on performanceBucket
| project
    performanceBucket,
    ['1) Today'] = round(avgDuration, -2) / 1000,      // round to the nearest 100 ms, then convert ms to seconds
    ['2) Yesterday'] = round(avgDuration1, -2) / 1000,
    ['3) Two Days Ago'] = round(avgDuration2, -2) / 1000
| render columnchart
    with (
        kind=unstacked,
        ytitle="Seconds Taken",
        xtitle="Performance Group",
        title="Ensure the 'Today' bar is not significantly higher than previous days");


Monitoring:  Azure dashboards are great for monitoring application health and performance.  They are easy to customise, it's simple to make unique dashboards, and security is easy to control.  sentry.io also monitors APIs, although I have not used it.  I like all the Azure tooling coming out for testing, and I feel continuously running Postman collections and reporting to App Insights is the best way to go.  Azure Dashboards can be limiting; Grafana can be a great alternative/enhancement.  Check out Azure Managed Grafana.

Alerting: I all too often see an overuse of alerting, resulting in recipients ignoring a plethora of emails.  I believe in minimising alerts, especially via email and SMS-type messaging.  I like to create a dedicated channel for alerting that includes all DevOps members, and either notify it via a Teams card or, even easier, email the channel.  This can be broken down further, but to start I create an alerting channel for each DTAP environment.

Note: The default channel setup only allows members of the Teams channel to send email, so alerts sent from Azure Monitor rules won't be accepted.  On the channel, an admin needs to go to the "advanced settings" and change the option from "Only members of this Team" to "Anyone can send".

Options:  There are many great services for logging; my default tends to be Azure Monitor.  The main players in application & API observability and monitoring include: 

  • Microsoft: Azure Monitor includes Application Insights & Azure Log Analytics
  • Dynatrace (really good if you use multicloud; it also works with AWS CloudWatch).  The Dynatrace SaaS offering runs on AWS, and it can also be run on-prem.  OneAgent is deployed on the compute, i.e. VMs or Kubernetes.  It can import logs from other SIEMs or Azure Monitor, so you can eventually get Azure service logs such as App Service or Service Bus.  It does full-stack monitoring, including code-level, application and infrastructure monitoring, and can also show user monitoring.  Dynatrace offers scalable APIs that sit on Kubernetes.  "Davis" is the AI engine used to help figure out problems.  Alerting is solid.  
(Images: Dynatrace high-level architecture; Dynatrace admin monitoring)
  • AWS: Amazon CloudWatch Synthetics
  • AppDynamics,
  • Datadog (excellent),
  • New Relic,
  • SolarWinds (excellent)
(Image: SolarWinds admin UI from circa 2013/2014)


Thursday 1 October 2020

App Insights - Basic Introduction

Overview: Azure App Insights is a great platform for collecting logs and monitoring cloud-based applications on Azure.  All Azure services can push logging information into App Insights instances.  This can be error, usage or performance logging that in turn is easy to query.  There are SDKs for developers that can be used to add custom logging to applications.  I am a big fan of AppDynamics for logging and monitoring, but for SaaS and on a new project I'd go with App Insights.

Retention:  App Insights can keep up to 730 days' worth of logs.  For long-term storage, "Continuous Export" can be used to push data into storage accounts as soon as it arrives in App Insights.  Retaining the App Insights logs for 90 days has no additional cost, so in most situations the default should be set to at least 90 days.

What is logged and what can be logged:  
  • All Azure services can be configured to send service logs to a specific App Insights instance.
  • Instrumentation packages can be added to services such as IIS or background services to capture logs.  You can also pull telemetry from infrastructure into App Insights, e.g. Docker logs and system events.
  • Custom code can also call the App Insights instance to add logging and hook into exception handling.  There are .NET, Node.js, Python and other SDKs that should be used to add logging, exception capturing, and performance and usage statistics (a sketch follows below).
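
As a rough illustration of that last point, a few lines of C# using the .NET SDK (the event and metric names are my own examples):

using System.Collections.Generic;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

var client = new TelemetryClient(TelemetryConfiguration.CreateDefault());

// Custom usage telemetry: an event with properties you can slice on in KQL.
client.TrackEvent("InvoiceGenerated",
    new Dictionary<string, string> { ["customerTier"] = "gold" });

// Custom metric: pre-aggregated locally before it is sent.
client.GetMetric("QueueDepth").TrackValue(42);

client.Flush(); // push any buffered telemetry before the process exits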

App Insights has a REST API to query the logs.  The "API Explorer" tool is awesome for querying App Insights online.  
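
Here is a minimal C# sketch of calling the REST query API; the application id and API key are placeholders you create under the App Insights "API Access" blade:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class AppInsightsQuery
{
    static async Task Main()
    {
        using var http = new HttpClient();
        http.DefaultRequestHeaders.Add("x-api-key", "<api-key>");   // created under API Access

        var appId = "<application-id>";                             // shown under API Access
        var kql = Uri.EscapeDataString("requests | where success == false | take 10");
        var url = $"https://api.applicationinsights.io/v1/apps/{appId}/query?query={kql}";

        Console.WriteLine(await http.GetStringAsync(url));          // JSON result table
    }
}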


The data below comes from Microsoft Docs.

"What kinds of data are collected?

The main categories are:

  • Web server telemetry - HTTP requests. Uri, time taken to process the request, response code, client IP address. Session id.
  • Web pages - Page, user and session counts. Page load times. Exceptions. Ajax calls.
  • Performance counters - Memory, CPU, IO, Network occupancy.
  • Client and server context - OS, locale, device type, browser, screen resolution.
  • Exceptions and crashes - stack dumps, build id, CPU type.
  • Dependencies - calls to external services such as REST, SQL, AJAX. URI or connection string, duration, success, command.
  • Availability tests - duration of test and steps, responses.
  • Trace logs and custom telemetry - anything you code into your logs or telemetry."
App Insights creates a hierarchy of requests built up from the operation_Id and operation_ParentId fields.

Application Insights is part of Azure Monitor and makes it easy to trace user interaction.  It is independent infrastructure for recording issues and tracing.  App Insights works in three parts: 
  • Collect: Track infra/PaaS via instrumentation (throughput, speed, response times, failure rates, exceptions etc.), and via SDKs (e.g. JavaScript SDK, C#) to add custom logging and tracing.
  • Store: Stores the data.
  • Insights: Alerts, Power BI, live metrics, REST API.
Extending App Insights:
For long-running operations, such as using queues or an ESB, you will need to tie the operations together, and it's really easy to connect these in a hierarchy using distributed tracing (a sketch follows).  
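
Here is a minimal sketch of that pattern in C# (MyMessage, the header names and the operation names are my own invention): the producer stamps the outgoing message with the current operation ids, and the consumer starts its operation parented to them:

using System.Collections.Generic;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

class MyMessage { public Dictionary<string, string> Headers { get; } = new(); }

static class QueueCorrelation
{
    // Producer: record the root operation id and this telemetry item's id on the message.
    public static void Enqueue(TelemetryClient client, MyMessage msg)
    {
        using var op = client.StartOperation<DependencyTelemetry>("EnqueueOrder");
        msg.Headers["RootId"] = op.Telemetry.Context.Operation.Id;
        msg.Headers["ParentId"] = op.Telemetry.Id;
        // ... send msg to the queue/ESB here ...
    }

    // Consumer: start a new operation parented to the ids carried on the message,
    // so its telemetry joins the same end-to-end hierarchy.
    public static void Process(TelemetryClient client, MyMessage msg)
    {
        using var op = client.StartOperation<RequestTelemetry>(
            "ProcessOrder", msg.Headers["RootId"], msg.Headers["ParentId"]);
        client.TrackTrace("Handling queued message"); // correlated automatically
    }
}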

SPA's:  There is a JavaScript SDK, but logging on SPA's needs configuration and understanding, as not every operation is logged uniquely for tracing.

Smart Detection: automatically tries to quickly warn you of problems/abnormalities and their root cause.

Snapshot Debugger/Profiler: VS remote debugging can be hooked up to an issue.  Shows execution traces from your live app.

Transaction Search:  Easy way to query and find data or unique info.


Saturday 22 February 2020

Catch Error in Power Apps and App Insight Logging

Error Handling:
App Insights logging: https://sharepains.com/2019/01/24/powerapps-experimenting-with-error-handling/  Superseded, as Microsoft has built-in telemetry as of 3 Feb 2020.
https://powerapps.microsoft.com/en-us/blog/log-telemetry-for-your-apps-using-azure-application-insights/

Example Error capturing and tracing to Azure AppInsights:
IfError(
    // Perform API call here
    ,
    // Fallback, so log here:
    Trace("Pauls Unique PowerApp", TraceSeverity.Error,
        {UserName: User().Email, Role: gblRole,
         ErrorMsg: ErrorInfo.Message, ErrorControl: ErrorInfo.Control,
         ErrorProperty: ErrorInfo.Property});
    Notify("Err message ..." & ErrorInfo.Message) // Display the error on the UI
)
More detail..

Possible Canvas Apps Error Handling Pattern:
  1. Ensure the App Insights key is added to each canvas app
  2. Use IfError() to check calls and logic
  3. Use the Trace method to write info to App Insights
  4. Decide whether to enable the experimental error-handling features (great for tracing by correlationId)
  5. Consider all Power Automate flows that use Power Apps (ensure you use the V2 connector)
  6. Never use IfError to handle business logic
To review your App Insights logging:
Open your Azure Portal > open your App Insights blade >
click the "Search" navigation option > free-text entry, e.g. "Loyalty PowerApp"
App Insights, finding Traces generated in Power Apps

Monitoring Tool within Power Apps

The Monitor tool in Power Apps is great for debugging and tracing.
Start a monitor on the open Power App.

Monitor Tool - Showing a GET via a custom Connector and the returned response

Function/Code Logging:
Server-side code should log to App Insights or your logging framework.
Ideally, the Trace function within Power Apps explained above is used in conjunction with 3rd-party API calls.

Overview: C# code needs to have logging.  If an error occurs, an appropriate response must be bubbled up to the next layer.

Possible C# Error Handling Pattern:

  1. All catch blocks write the exception to Log Analytics or App Insights. 
  2. Calls to data sources, Azure services, third-party APIs and complex logic should ideally be wrapped in a try/catch that logs the error to App Insights using the C# App Insights SDK. 
  3. The catch blocks ideally return the failure information so the calling code can deal with it in its logic.  If you don't deal with the returned message, simply log the exception and rethrow the error (this needs to be a conscious decision in each catch). 
  4. Catch specific errors: log, and if you don't pass info to the caller, rethrow the error if applicable (bubble), responding accordingly, i.e. catch the specific error first and lastly use a catch-all.  This is heavy, so only add it to existing code where errors happen often or where we are having problems, i.e. be specific.
  5. Don't use try/catch to deal with business logic.

Thought: "Bubble up" means: code must log exceptions and return an appropriate reply to the caller; if you don't send the appropriate reply, rethrow the exception after logging it so the caller has to deal with it.
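
A short C# sketch of points 1-4 above (the service, URL and types are hypothetical, not from a real project):

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;

public class OrderService
{
    private readonly TelemetryClient _telemetry;
    private readonly HttpClient _http = new();

    public OrderService(TelemetryClient telemetry) => _telemetry = telemetry;

    public async Task<string> GetOrderAsync(string id)
    {
        try
        {
            // Third-party API call wrapped in try/catch (point 2).
            return await _http.GetStringAsync($"https://api.example.com/orders/{id}");
        }
        catch (HttpRequestException ex) // specific error first (point 4)
        {
            _telemetry.TrackException(ex); // log to App Insights (point 1)
            throw; // bubble up: nothing useful is returned, so rethrow after logging (point 3)
        }
        catch (Exception ex) // catch-all last (point 4)
        {
            _telemetry.TrackException(ex);
            throw;
        }
    }
}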


Sunday 25 January 2015

Auditing in SharePoint 2013

Overview: SharePoint provides excellent logging capabilities.  To retrieve the audit logs, go to Site Settings > Site Collection Administration > Audit log reports.
Notes:
  • By default, auditing is enabled in SharePoint.  PB: I think this statement is false; none of the farms I review are logging information in the audit logs.
  • Auditing is done at a site collection level.
  • Audit logs are kept for 30 days by default; this can be changed via the UI in the site collection, and the clean-up is controlled by Central Administration (CA).
  • Audit logs are stored within the content database, so watch the size of the auditing logs.  They can take up considerable space in the content database, so don't just audit everything and keep the logs endlessly.
  • Permission changes, check-in/check-out, search queries, edits, document views (not SPO), ... can be audited.
  • Various reports, such as the security settings audit log report, can be downloaded into Excel for slice and dice.
  • Each logged event takes up roughly 2 KB, which lets you calculate the content database storage requirements (worked example below).
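
For example (illustrative numbers, not from a real farm): 50,000 audited events per day × 2 KB ≈ 100 MB per day, so a 30-day retention window adds roughly 3 GB to the content database.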

Audit logs can be shipped to a central storage area and removed from the content database; this is essential for large CDBs that require full auditing where performance is suffering.  AvePoint and Metalogix offer tools as part of their products that perform the audit log storage and removal from the CDB.  Also see Varonis.

References:
https://support.office.microsoft.com/en-us/article/View-audit-log-reports-b37c5869-1b47-4a82-a30d-ea20070fe527?CorrelationId=9139de6c-b33b-45c1-9cc2-d3958a88eab3&ui=en-US&rs=en-001&ad=US
http://sureshpydi.blogspot.co.uk/2013/05/audit-log-reports-in-sharepoint-2013.html
http://sharepoint-works.blogspot.co.uk/2013/07/audit-logging-in-sharepoint-2013.html
Centralised Auditing Products:
LepideAuditor Suite – SharePoint
http://www.lepide.com/sharepoint-audit/
LogBinder SP
https://www.ultimatewindowssecurity.com/sharepoint/logbindersp/Default.aspx

Thursday 16 October 2014

Cross Cutting Concerns for SharePoint 2013

Overview:  Last week I was speaking to a smart chap and he dropped the term "cross-cutting concern" while we were discussing SharePoint Hosted Apps (SPHA) and JavaScript.

Problem:  When creating apps for SharePoint 2013, multiple solutions need to address cross-cutting concerns.  In the past I deployed a SharePoint library with caching, logging, lazy loading and various other "cross-cutting concerns"; now, for Provider Hosted Apps (PHA), SPHA, JS embedded within pages and Single Page Apps (SPA), we need frameworks for clients to address these common components.

Hypothesis:
Caching for Client-Side Code: In JavaScript you can either cache using the client cookie, which is small, or, in HTML5-based browsers, use the JavaScript local store. 
Caching on the Server: All the normal caching options of C# or Azure are available.  Also look at Redis (a minimal sketch follows).
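
A minimal server-side sketch using .NET's in-memory cache (the term-set loader is a made-up stand-in for an expensive SharePoint call):

using System;
using System.Runtime.Caching;

class CachedLookup
{
    static readonly MemoryCache Cache = MemoryCache.Default;

    static string GetTermSet(string key)
    {
        if (Cache.Get(key) is string cached)
            return cached; // served from cache, no expensive round trip

        var value = LoadFromSharePoint(key);                      // hypothetical expensive call
        Cache.Set(key, value, DateTimeOffset.Now.AddMinutes(10)); // absolute 10-minute expiry
        return value;
    }

    static string LoadFromSharePoint(string key) => $"value-for-{key}"; // stand-in
}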

References:
http://en.wikipedia.org/wiki/Cross-cutting_concern
Update 27/01/2015:
http://channel9.msdn.com/blogs/OfficeDevPnP/SharePoint-Apps-and-client-side-caching

Thursday 8 August 2013

Finding Correlation Errors on a SP2013 farm

Background: SP 2013 has rich and expansive logging/tracing capabilities.  Logging is done via the Unified Logging Service (ULS).  This adds logs to the trace logs (often referred to as the ULS logs, ULS trace logs or just ULS; the terminology doesn't matter as long as you understand the ULS service is not only the trace log) and the Windows Event Viewer.  Anything logged in the Event Viewer log will also be in the ULS trace logs.
It is worth checking how logging is set up on your farm.  I change the default location for my ULS trace logs.  Change the logging so it matches your farm's requirements.

On a small farm, it's normally pretty easy to take a correlation Id (the unique GUID generated for the SharePoint request), open the trace log using Notepad and find the error.  The default is to create a trace log every 30 minutes; these log files contain a lot of data on busy production farms, and on a large farm you also have multiple logs to check.  I use Microsoft's unsupported ULSViewer to look at all my logs regardless of farm size.  You can trace the logs in a live format and then filter out what you need.  Another option is to open existing logs to investigate historical issues.  If you know the datetime and server where the error occurred, you open the correct log file (it is labelled with a datetime stamp) and then either filter for the correlationId or look around the time the error occurred.

Lastly, timer jobs ship entries from the ULS logs into the SharePoint logging database (SP_UsageandHealth).  You can query the SP_UsageandHealth database directly using T-SQL (an illustrative C# example follows).
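
A small C# sketch of querying it; note the view and column names below are from my memory of the WSS logging schema, so verify them against your database before relying on this:

using System;
using System.Data.SqlClient;

class QueryUlsImport
{
    static void Main()
    {
        // Connection string is an example only; point it at your logging database.
        using var conn = new SqlConnection("Server=sql01;Database=SP_UsageandHealth;Integrated Security=true");
        conn.Open();

        // dbo.ULSTraceLog is the view populated by the ULS import timer job
        // (name from memory; verify on your farm).
        using var cmd = new SqlCommand(
            "SELECT TOP 20 LogTime, Area, Category, Message " +
            "FROM dbo.ULSTraceLog WHERE CorrelationId = @id ORDER BY LogTime", conn);
        cmd.Parameters.AddWithValue("@id", Guid.Parse("5ca5555c-8555-4555-555b-f555af4d5555"));

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            Console.WriteLine($"{reader["LogTime"]}: {reader["Message"]}");
    }
}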

Tracing Correlation Errors on a SharePoint 2013 Farm
A user passes you a correlation Id and the date/time when the error occurred; find the appropriate ULS trace log.  Open the log using ULSViewer and filter for the correlationId.  If you can reproduce the bug, the developer dashboard can be turned on selectively (performance penalty); there is a new SP2013 tab, "ULS", which will show you the ULS trace snippet relating to the request.

On a big farm you may first want to find out which server in the farm had the error:
Merge-SPLogFile -Path ".\error.log" -Correlation "5ca5555c-8555-4555-555b-f555af4d5555"
Tip: Be aware this is a heavy process, so restrict which logs you will merge.
 
Use ULSViewer to find the correlationId and review the logs.

Use IT tools or Fiddler to examine the HTTP response from SharePoint to get the correlation Id; this is the SPRequestGUID (assuming it is not shown on the error message).

 

More Info:

Tobias Zimmergren has a great post on working with correlation Ids.

http://www.sharepointblog.co.uk/2012/09/logging-capabilities-of-sharepoint/

http://habaneroconsulting.com/insights/An-Even-Better-Way-to-Get-the-Real-SharePoint-Error

List of the ULS viewers


Friday 1 February 2013

PS functionality - logging, xml, local permissions, remote PS

Read config values from an xml file - download zipped files

Download contains:
PS-Logging.ps1 - Allows an optional parameter to set the XML file to read from. If the input XML is not valid or not specified, it reverts to reading a predefined XML file from the same directory that PS-Logging.ps1 is saved in.
PS-LoggingMore.ps1 - An easy way to log the PowerShell console's actions. As it doesn't work with PowerGUI, it checks the IDE first.
AutoInput.xml - An XML file that is used for the configuration of these examples.
PS-Logging.png - Describes the workings of the files (ignore).

Other:
Import External Functions (download) - the ability to write separate PS files and call the functions within them from the main/starting PS file.

Specifying Input Parameters Strict Example:
param(
    [Parameter(Mandatory=$true)][ValidateNotNullOrEmpty()]
    $configLocation  # Path to the XML configuration file; mandatory and must not be empty
)

**************************************************

Ability to Execute PS on a remote trusted computer

The PS below has 2 simple functions to illustrate:
  • running remote PS commands (tip: ensure the PS window is running under a network account that has administrator rights on the 2 remote servers), and
  • adding a domain user account or AD group to the local machine's administrators group.


Invoke-Command -ComputerName sp-srch1, sp-srch2 -ScriptBlock {
 $computer = [ADSI]("WinNT://" + $env:COMPUTERNAME + ",computer")  # Bind to the local machine via ADSI
 $Group = $computer.psbase.children.find("administrators")  # Find the local administrators group
 $acc = "demo/sp_searchservice"  # DOMAIN/account to add
 $Group.Add("WinNT://" + $acc)  # Add the account to the local administrators group
 Restart-Service -Name SPSearchHostController  # Restart the windows services so the change takes effect
 Restart-Service -Name OSearch15
}