Welcome to Command Center documentation!

Indices and tables

Introduction

Welcome to the RLCatalyst Command Center user-guide. This user-guide is designed to provide documentation for users who will be installing, administering and using the Command Center product.

What is RLCatalyst Command Center

RLCatalyst Command Center is a cloud-based software product that can be used to monitor services and their underlying infrastructure. The product provides early detection and warning of problems in the targeted services or infrastructure. The product also provides capabilities to integrate problem tickets and capture incident details which can help to narrow down root cause.

Getting Started

You will be provided the following pieces of information in your starter kit:

URL application url
Company  
User  
Password  

Keep this information handy as you go through this guide and configure your system.

Planning your deployment

RLCatalyst Command Center is capable of multi-tenancy. Using the same instance of the software, you can create several tenants. Each tenant can configure his own machines for monitoring. Each tenant can also configure his own cloud accounts and get an independent view of his cloud assets. The landlord can create new tenants in the system.

Creating your first tenant

To plan the creation of a new tenant, use the planning sheet in Appendix A to collect all the information required upfront. Keep the sheet handy as you go through the following steps.

Open a browser (we recommend Chrome or Firefox). Enter the Application URL provided. The login page should open.

_images/login.png

To register a tenant, click on the Register link which is available on the login page & application will display Register screen to the user.

_images/registeraccount.png

Use details from Appendix A for Company Name, User Name, Email & set the Password as per Password policy. Click on Create Account button. You will see a Thank You screen confirming that a verification email has been sent to the email address registered.

_images/VerificationEmailSent.png

Check the verification email delivered to the registered email address & click on the verification link to activate the account. On successful validation, tenant will be allowed to login into the Command Center.

_images/VerificationEmailConfirmed.png

Logging in as a tenant

Open a browser (we recommend Chrome or Firefox). Enter the application URL provided. The login page should open. On the login page, fill the Company, User and Password fields as captured in Appendix A. Then click the Login button. You will see the landing page.
Business Service Status View – by default this will not show any data. You will need to configure business services following the instructions in this guide.

Service Health – providing a quick way of viewing at a glance, if any of the linked services (across BSM’s) are in alarm state (Yellow & Red). By clicking on critical/warning service card, the system shall navigate to the Services page and should show the Service and Nodes tabs related to selected service.

ServiceNow Ticket Snapshot – by default this will not show any data. You will need to configure a Service Now account following the instructions in this guide.

BOT’s Summary (Total)- We need to configure a Catalyst Account to view the count of Bot’s summary.

_images/DashboardBSMView-new.png

Configuring Cloud Credentials

RLCatalyst Command Centre gives you the ability to view all your cloud assets (spanning across providers and accounts) in one place. These assets include

  • Virtual Machines
  • ELBs
  • Security Groups
  • Networks

Configure your Cloud Account Details in the Command Centre Settings to view all your cloud assets in one place. Command Centre collects the information from the configured cloud account periodically. You can configure the interval in which this information refreshes.

_images/ProvidersSetting.png

In Provider Settings, we have categorized the providers based on their services. Depends on Category selection Provider List will load the available vendors.

_images/AddProvider.png
Command Center will support for following Cloud Account providers.
  • Microsoft Azure
  • AWS
  • Google Cloud
  • DigitalOcean

To configure a Azure cloud account

  1. Click on the Settings icon in the top bar
  2. Click on the Provider Settings tab
  3. Click + button and add your cloud account credentials in Settings with the details captured in Appendix A. Example provided below is for a Microsoft Azure account.
Field Instructions
Account Name Enter a Friendly name
Provider Categories Choose Cloud
Provider Categories Choose Microsoft Azure
Time Zone Choose IST
Authentication Type Choose OAuth
Client ID Enter the Client ID of your Azure application E.g.: 9812d575-dja-4b48-8434-hdgh
Client Secret Enter the Secret key of your Azure Application
Grant Type Enter the text ‘client credentials’
Resource https://management.azure.com/

Enter the Azure subscription ID | +—————————–+———————————————————————————+ | Tenant ID | Enter the Azure Tenant ID | +—————————–+———————————————————————————+ | Schedule | Enter the Time Interval for collecting data from Cloud | +—————————–+———————————————————————————+ | Repeat | Choose the Interval Type – Minutes/Hourly | +—————————–+———————————————————————————+

Note: To get the Client ID and Client Secret key, create an application in Azure and set the Role as Reader. To set the Role, Go to Subscription->Resource Group->Access Control(IAM)->Add>Permissions->Add Reader Permission

_images/AddProviderAzure.JPG

To configure a AWS cloud account

  1. Click on the Settings icon in the top bar
  2. Click on the Provider Settings tab
  3. Click + button and add your cloud account credentials in Settings with the details captured in Appendix A. Example provided below is for a AWS account.
Field Instructions
Account Name Enter a Friendly name
Provider Categories Choose Cloud
Provider List Choose AWS
Time Zone Choose IST
Authentication Type Choose IAM
Access Key Enter the Access Key of your AWS Application
Secret Key Enter the Secret key of your AWS Application
Region Enter the Region of your AWS Application
Account Number Enter the Account Number of AWS Application
Account Number | Enter the Account Number of AWS Application |

Choose the required options | +—————————–+———————————————————————————+ | Schedule | Enter the Time Interval for collecting data from Cloud | +—————————–+———————————————————————————+ | Repeat | Choose the Interval Type – Minutes/Hourly | +—————————–+———————————————————————————+

_images/AddProviderAWS.JPG

To configure a Google cloud account

  1. Click on the Settings icon in the top bar
  2. Click on the Provider Settings tab
  3. Click + button and add your cloud account credentials in Settings with the details captured in Appendix A. Example provided below is for a Google Cloud account.
Field Instructions
Account Name Enter a Friendly name
Provider Categories Choose Cloud
Provider List Google Cloud
Time Zone Choose IST
Authentication Type Choose Google Service Account
Project ID Enter the Project ID of your Google application
Private Key Enter the Private key of your Google Application
Client Email Enter the email of client generated email
Client Certificate URL Enter the client generated URL
Schedule Enter the Time Interval for collecting data from Cloud
Repeat Choose the Interval Type – Minutes/Hourly
_images/AddProviderGoogle.JPG

To configure a DigitalOcean cloud account

  1. Click on the Settings icon in the top bar
  2. Click on the Provider Settings tab
  3. Click + button and add your cloud account credentials in Settings with the details captured in Appendix A. Example provided below is for a DigitalOcean account.
Field Instructions
Account Name Enter a Friendly name
Provider Categories Choose Cloud
Provider List Choose Digital Ocean
Time Zone Choose IST
Authentication Type Choose Token
Token Enter the token of your Digital Ocean application
Schedule Enter the Time Interval for collecting data from Cloud
Repeat Choose the Interval Type – Minutes/Hourly
_images/AddProviderDigitalOcean.JPG

Configuring ITSM Credentials

Add ITSM service in provider settings, It will raise incident when service is down or not available. After adding ITSM, Dashboard and ITSM menu will update with cards. Clicking on the card will display the data regarding the particular card. ITSM will provide the direct link to the ITSM provider as whenever we select any incident it will redirect to the particular ITSM service.

Command Center will support for following ITSM providers.
  • ServiceNow
  • FreshService

To configure a ServiceNow account

  1. Click on the Settings icon in the top bar
  2. Click on the Provider Settings tab
  3. Click + button and add your ITSM account credentials in Settings with the details captured in Appendix A. Example provided below is for a ServiceNow account.
Field Instructions
Account Name Enter a Friendly name
Provider Categories Choose ITSM
Provider List Choose ServiceNow
Time Zone Choose IST
Authentication Type Password
Host URL to your ServiceNow Instance E.g.:ven01746.service-now.com
UserName Enter UserName
Password Enter Password
Schedule Enter the Time Interval for collecting data from Catalyst
Repeat Choose the Interval Type-Minutes/Hourly
_images/AddServiceNow.jpg

To configure a FreshService account

  1. Click on the Settings icon in the top bar
  2. Click on the Provider Settings tab
  3. Click + button and add your ITSM account credentials in Settings with the details captured in Appendix A. Example provided below is for a FreshService account.
Field Instructions
Account Name Enter a Friendly name
Provider Categories Choose ITSM
Provider List Choose FreshService
Time Zone Choose IST
Authentication Type Password
Host URL to your FreshService Instance E.g.https://rlab.freshservice.com
UserName Enter UserName
Password Enter Password
Schedule Enter the Time Interval for collecting data from Catalyst
Repeat Choose the Interval Type-Minutes/Hourly
_images/AddFreshService.JPG

Configuring Business Services

Add Business Services to be monitored in the dashboard view. Each service added will be monitored in the predefined interval. The Business Services will appear as cards in the dashboard each showing the latest status of the service. Clicking on a card will show you a drill down view of the service with the alerts related to the service and the outage trends. Use the Business Services information captured in Appendix A as you follow the steps below.

To configure a business service

  1. Click the + icon in the dashboard view to bring up the Add Service dialog.
  2. Add the Business Service URL (should be accessible from the Command Centre)
  3. Enter an alias or a name of the service. This will be the name displayed on the card in the dashboard.
  4. Provide an email ID to which alerts will be send during Outages. You can provide more than one email ID separated by commas.
  5. A verification e-mail will be sent to each email ID provided above. Clicking on the link in the email will confirm the email ID for receiving emails.
  6. Check the box to get email notifications for linked services
_images/AddBusinessService.jpg

Configuring the Catalyst Account

Configuring a Catalyst account allows you to access the summary of BOT runs on your dashboard page. It also enables the Remediation and Auto-Remediation features.

To configure a catalyst account

  1. Click on the Settings icon in the top bar.
  2. Click on the Provider Settings tab
  3. Click + button and add your catalyst account credentials in Settings with the details
Field Instructions
Account Name Enter a Friendly name
Provider Categories Choose Automation
Provider List Choose RLCatalyst
Time Zone Choose IST
Authentication Type Password
Host URL to your RLCatalyst Instance E.g.:https://neo.rlcatalyst.com/
UserName Enter UserName
Password Enter Password
Schedule Enter the Time Interval for collecting data from Catalyst
Repeat Choose the Interval Type-Minutes/Hourly
_images/AddRLCatalystAccount.jpg

When you add a Catalyst account, BOTs Summary panel will appear on the dashboard.

Installing the Monitoring Agents

RLCatalyst Command Centre uses monitoring agents that run on the individual machines being monitored. Monitoring Agents can be installed manually or via an automated way through RLCatalyst.

Install Agents through RLCatalyst

RLCatalyst installs monitoring agents in the target nodes on which the Business Services are running. This is done via a bootstrapping process which will install system monitoring, app monitoring and services monitoring agents into the instances. Once installed, the real-time monitoring alerts will be available under RLCatalyst Command Centre→Services and RLCatalyst Command Centre→Monitoring Tools.

  1. Login to <customer name>neo.rlcatalyst.com with the given credentials -> Go to Work zone.
  2. Click on the tree on the left to choose the Organization, Business Group, Project and
Environment. By default, there will be
o Organization with the customer name
o Business Group ‘DevOps’
o Project ‘Demo Project’
o Environments - <customer name>_EVL,

<customer name>_DEV, <customer name>_QA,

<customer name>_PROD, <customer name>_DEVOPS
  1. Choose one of the environments
  2. Click on ‘Import’ button. Enter the IP address of the instance, credentials and Import. The agents will be installed automatically when imported.

Note: The checks added for monitoring your services in Consul should be tagged/grouped properly with the business service name that has to be listed in the Dashboard View. RL Team will provide necessary help to get the service checks added

Installing monitoring agents on a Linux machine using a downloaded script Note: Perform the following steps on each machine listed under each Business Service in Appendix A.

Prerequisites

  1. To configure a machine or VM for monitoring with Command Center the following ports need to be opened in the firewall: 8301 ,8302 ,8500,8600, 3030
  2. You need sudo privileges to install the clients
  3. The machine should have a public IP address to communicate with the monitoring servers.

Procedure

  1. Download the agent_ installation.tar.gz file from the following URL: https://s3.us-east-2.amazonaws.com/cookbookslist/v2.6/linux-agent-installation.zip
Parameter1 Service name<A friendly name for the service .This will be your Business Service>
Parameter2 tag application name <Name of this application e.g. MongoDB on which your Business Service depends>
Parameter3 tag tenant id<Company Name for this Tenat>
Parameter4 URL
Parameter5 Checks interval e.g. 60s

You should now have the monitoring agents running on your machine.

Install monitoring agents on a Windows machine through a downloaded script Note: Perform the following steps on each machine listed under each Business Service in Appendix A

Prerequisites

  1. To configure a machine or VM for monitoring with Command Center the following ports need to be opened in the firewall: 8301 ,8302 ,8500,8600, 3030
  2. You need to run PowerShell as Administrator (right-click and choose “Run As Administrator”)
  3. The machine should have a public IP address to communicate with the monitoring servers.

Procedure

  1. Download the agent_ installation.tar.gz file from the following URL: https://s3.us-east-2.amazonaws.com/cookbookslist/v2.6/windows-agent-installation.zip

Install monitoring agents on a Windows machine manually

Prerequisites

  1. To configure a machine or VM for monitoring with Command Center the following ports need to be opened in the firewall: 8301 ,8302 ,8500,8600, 3030
  2. You need Administrator privileges to install the clients
  3. The machine should have a public IP address to communicate with the monitoring servers.

Procedure

  1. Choose the Chef Windows package based on the Operating System (Ex: Windows 2012) & Architecture (Ex: X86_64) from the below link in the required/available windows machine https://downloads.chef.io/chef#windows
_images/InstallingChef.jpg _images/ExtractingAgentInstallers.jpg _images/VerifyingConsulAgent.jpg _images/VerifyingSensuAgent.png

Install monitoring agent(Zabbix) on a Ubuntu machine manually

Procedure

  1. Zabbix apt repositories are available on Zabbix official website. Add the repository to install required packages for Zabbix agent using the following command wget http://repo.zabbix.com/zabbix/3.4/ubuntu/pool/main/z/zabbix-release/zabbix-release_3.4-1%2Bbionic_all.deb

  2. Un-zip the downloaded file using following command

    dpkg -i zabbix-release_3.4-1+trusty_all.deb

  3. As you have successfully added Zabbix apt repositories in your system let’s use the following command to install Zabbix agent using the following command

    sudo apt-get update sudo apt-get install zabbix-agent

  4. After installation of Zabbix agent. Edit Zabbix agent configuration file /etc/zabbix/zabbix_agentd.conf and update Zabbix server IP.

    #Server=[zabbix server ip] #Hostname=[Hostname of client system ]

    Server=192.168.1.10 Hostname=Server2

    Here 192.168.1.10 is the IP of Zabbix server to allow for connection with this Zabbix client.

  5. After adding Zabbix server IP in the configuration file, now restart agent service using below command.

    sudo systemctl start zabbix-agent

Features

Historical BSM Health Indicator

Historical BSM Health Indicator gives you the ability to see the trend of the BSM over last 30 days as a consolidated view. Using this view, the user can then navigate to specific outage view of interest..

The view can be available with a Trend Icon on Top-Left of BSM View and clicking that can show the Consolidated status of all BSM over last 30 days with appropriate status.

There is tab like “Business Services” in the main page.When we click on that tab,it’s navigated to Business services page.In that page we see the bot’s summary information,snow tickets information etc….

images/Navigation_BSM.PNG _images/HistoricalStatusTrendIcon.png _images/HistoricalStatusView.png

Clicking the link of Outage (Red) or Partial Outage available in the Historical Status Dashboard will take the user to the appropriate Outage Drill-down page

_images/HistoricalStatusDrillDownView.png

Multi-level Business Service

Multi-level Business Service feature will show the Dependent Business services(Linked Business Services) of the Business Service. User can provide any Business service as Dependent Business service of another Business Service. Whenever dependent Business Service went down the parent Business Service also shown as Yellow.

Configuration

User will provide the Dependency between Business Service using Yaml file. The Yaml file should follow the following rules.

  • File Name should be in the following order <tenantID>.<BSM Name>.yml.
  • The YAML file should contain Parent BSM and their linked BSM’s
  • Once the YAML is complete it should be added to the topologies folder.
  • Then the scheduler server should be restart to refresh all the topologies picked from the file.
_images/MultiLevelYamlFileConfig.jpg

The dendent Business service will show in the topology(Graphical view and List View) and quick view topology.

_images/MultiLevelGraphical.JPG _images/MultiLevelListView.JPG _images/MultiLevelQuickView.JPG

Viewing Cloud Assets

From the menu at the top left of the top bar, choose CMDB. Cloud assets will be listed once the Cloud Credentials are added in Settings. From the dropdown choose the cloud account and get the summary view and list view as shown in screenshot. The CMDB lists the following:
  1. Virtual machines
  2. Disks
  3. Security Groups
  4. Network Cluster
  5. Compute Databases
  6. Load Balancers

If the assets are tagged, the same information will be fetched into CMDB also. You can filter the CMDB assets view by clicking on buttons “All, Running, Monitoring “ which is available in the right corner just above the table. By default, ALL filter should be selected.

ALL: displays all the nodes (Active & Inactive)

_images/CMDBViewOfCloudAssets(ALL).jpg

Running: displays all the running nodes

_images/CMDBViewOfCloudAssets(Running).jpg

Monitoring: displays the monitoring nodes health services, Node, ELK Log Icons. Clicking on Services, Node & ELK Log Icons shall take the user to respective pages.

_images/CMDBViewOfCloudAssets(Monitoring).jpg

RSS-Feed for cloud providers

A new tab Third party tab will be displayed in Dependency Health panel, which will monitor the cloud provider status which tenant has added in the provider settings.

_images/CC_ThirdPartyAlerts_DependencyHealth.JPG

Aggregated Alerts

Once the services are added and agents are installed, the alerts will be aggregated from multiple monitoring sources by the respective collectors. Alerts are currently aggregated from

  • Ping BOTs – Checks Availability of Services
  • Consul – Monitors Services
  • Sensu – System Monitoring

When the service goes down or if an Outage happens, the corresponding card on the dashboard view will turn Red. When any of the dependent services has a problem related to BSM will be Yellow. Clicking on the card will give details on linked services and the associated nodes

_images/OutagesasRedCardsOnDashBoard.jpg

Here the details of linked services and the associated nodes of a particular BSM is shown in a graphical representation.

_images/Graphicalviewfromcard.png

clicking on each box in the graphical view will pop-up and shows the details of that particular thing in a detailed manner.

_images/Popupdetails.png

At the top right corner, you can see a button which gives us an another option of viewing the details of linked services and asociated nodes for a particular BSm in a listed view.

_images/DrilldownViewfromCard.jpg

Click on the Alerts button to see the detailed Alerts from multiple sources (Pingbot, Consul & Sensu). Alerts aggregated by Node or Service in the Alerts Monitor screen.

Service alerts are shown on the Services tab of the Alert Monitor.

_images/BSMServiceAlertsView.JPG

System alerts are shown in the Nodes tab of the Alert Monitor.

_images/BSMNodeAlertsView.JPG

The dependent services of the Business Service and their health can be viewed under the Linked services section of the same page.

The dependent nodes of the Business Service and their health can be viewed under the Nodes section of the same page.

Click on the Outages tab to get a detailed list of all the outages detected by the system.

_images/OutageDetails.jpg

Incident Communication

Click on the Incident Id to open the associated ServiceNow ticket on the ServiceNow portal. Click on the Incident Communication icon to send out communication about the incident with Root Cause Analysis & Category.

_images/IncidentCommunication.jpg _images/IncidentCommunication(AddSection).jpg

Auto-create Incident Communications for Detection and Resolution :

System automatically creates Incident Communication for application outage detection and resolution.

_images/AutoCreateIncidentCommunication.jpg

Click on the Communications tab to see a timeline of incidents

_images/CommunicationTimeline.jpg

Command Center provides a feature called “Fault Table” to capture known problems related to a service and then uses the information to help the user to categorize the root-cause of any outage that occurs.

User can add fault to “Fault Table” by clicking on + icon which is available in the “Known Faults” table (Menu->Known Faults link-> + icon)

_images/AddKnownFault.jpg

When a Root-cause identified incident communication is entered, the user can link the RCA Incident Communication to an item in the Fault Table associated to the BSM through Add Incident Communication screen.

_images/KnownFaultSelection.jpg

User shall be able to navigate to the Fault Table from any outage which is linked to a fault by clicking on “Fault” link in the Outages screen.

_images/FaultsLink.jpg

User can view the count of outages linked to a fault by clicking on the “Outages Linked” link in the Fault table

_images/OutagesLinked.jpg

Aggregated Alerts for all services are available from the left pane menu ‘Services’.

_images/AggregatedServicesAlertsView.jpg

Aggregated Alerts for all servers/instances are available from the left pane menu ‘Monitoring Tools’

_images/AggregatedSystemAlertsView.jpg

History for all servers/instances are available from the Monitoring Tools->Clients->History

_images/HistoryOfServersandInstances.jpg

Click on History Icon, to view the detailed history information regarding each client

_images/HistoricalDataRelatedtoInstances.jpg

Logging in as a landlord

Open a browser (we recommend Chrome or Firefox). Enter the application URL provided. The login page should open. On the login page, fill the Company, User and Password fields as captured in Appendix A. Then click the Login button. You will see the landing page of the tenant created first and by choosing the tenant be able to view the data of that tenant.

_images/LandlordView.jpg

Remediation

Command Center allows you to restart the service if a problem is encountered either at an underlying Node level or at a dependent service level. This feature is to give L0/L1 level support personnel a quick means of attempting to correct a problem.

When a dependent node/service has a critical alert, you have an option to remediate the problem by clicking on the icon to restart the service which is available in the BSM drilldown view screen. The BOT would then restart the node.

_images/RemediationIcon.jpg

Auto Remediation

Command Center allows you to choose to configure certain Business Services (Managed Nodes) for auto healing. Whenever an outage is detected for a BSM configured with auto-healing, the system shall then kick-off the auto-remediation process. Auto-healing shall be initiated for nodes provided are in warning or critical status.

Manual remediation shall not be available for Nodes under a BSM that is enabled for Auto-healing.

You can opt for Auto-healing option by checking the Checkbox “Enable Auto-Remediation” which is available in the “Add Service” screen.

_images/AutoRemediation.jpg

Planned versus Unplanned outages

The idea of this feature is to provide a capability to plan a down-time so that the availability of the Business Service shall not be affected. CommandCenter has provided a screen to enter a planned outage. This screen shall take a date-time range, the nodes that are affected and the BSMs that are affected. When an outage occurs, check if the outage falls within a planned outage window. If yes, do not consider that outage in the availability calculations.

By clicking on link “Plan Outage” which is available under the menu, application will open “Planned Outage Details “screen. By clicking on + icon you can add Plan outage for the required service.

_images/PlannedOutageDetails.jpg

Contact Person

In an operations center which is using a tool like Command Center, one of the first pieces of information required when a problem is detected is the contact person designated for that Business Service. The idea of this feature is to ensure that Command Center provides an easy way to enter and display this information.

In the Settings screen, provided one more tab called Contact Details.In this screen, we can add Contact Details for each BSM.

_images/ContactDetails.JPG

On the Dashboard screen, each BSM card should show a ContactDetails icon. Clicking this icon should present the Contact details for that BSM in a pop-up screen.

_images/ContactDetailsIcon.png

Impact Tree

Impact tree provides a quick way to view the quick glance where in the topology the problem is.

On the main Dashboard view, if a BSM card appears yellow, the user should be able to click on an icon that shows at a quick glance pop-up.This pop-up should show a tree-view with the BSM at the top, with the Nodes under it and the services under the nodes.Based on the alerts each level will be marked with a Yellow or Red highlight.

_images/impacttreeicon.JPG _images/topologyquickview.JPG

Outage Summary Report

Outage Summary Report will facilitate an operation manager what outages were faced,planned deployments and what early warning was provided by the tool.A report that can be run for different time-periods and which lists the outages and alerts shall be provided.

A new item called “Reports” be added to the application-menu (top-left).

_images/report.JPG

Clicking on the “Reports” menu item shall lead the user to a screen where he can choose the report to run.

_images/reporttorun.JPG
On choosing the report to run from a drop-down menu, the user shall be displayed the input fields which are specific to that report.On choosing “Outage Summary Report” the following input items shall be displayed.
BSM: This will be a drop-down that allows a specific BSM item or All BSMs that will run the report under the logged in tenant.
_images/BSMselection.JPG

Time-period: This will be a drop-down box that allows the user to choose the time-period. Available choices shall be

Yesterday: Will mean the time-period from yesterday 12:00am to 11:59pm. This day: Will mean the time-period from 12:00am of the current date to now. This week: Will mean the time-priod from 12:00am of Monday of the current week to now This month: Will mean the time-period from 12:00am of 1st of the current month to now Last 24 hours: will mean 24 hour period from current time. Last 7 days: will mean 24*7 hour period from current time. Last 30 days: will mean 30*24 hour period from current time.
_images/outagereportinputfields.png

Based on BSM Selection & Duration filter selection,outage summary report shall be generate with the two buttons “Download report as PDF” and “E-mail report”.

_images/createreport.JPG

Health Summary Report

Clicking on the “Reports” menu item shall lead the user to a screen where he can choose the report to run.

_images/HealthSummaryReportRun.JPG
On choosing the report to run from a drop-down menu, the user shall be displayed the input fields which are specific to that report.On choosing “Health Summary Report” the following input items shall be displayed.
BSM: This will be a drop-down that allows a specific BSM item or All BSMs that will run the report under the logged in tenant.
_images/HealthsummaryBSMselection.JPG

Time-period: This will be a drop-down box that allows the user to choose the time-period. Available choices shall be

Yesterday: Will mean the time-period from yesterday 12:00am to 11:59pm. This day: Will mean the time-period from 12:00am of the current date to now. This week: Will mean the time-priod from 12:00am of Monday of the current week to now This month: Will mean the time-period from 12:00am of 1st of the current month to now Last 24 hours: will mean 24 hour period from current time. Last 7 days: will mean 24*7 hour period from current time. Last 30 days: will mean 30*24 hour period from current time.
_images/Healthreportinputfields.png

Based on BSM Selection & Duration filter selection,outage summary report shall be generate with the two buttons “Download report as PDF” and “E-mail report”.

_images/createhealthreport.JPG

Command Center reports available in PDF format

Command center is providing reports in PDF format with graph and data for both Health summary report and Outage summary report. user can download or send as email the pdf report format. report will be available in all the filters.

_images/DownloadAsPDF.JPG

Downloaded report will be same as bellow for Health summary report.

_images/HealthSummaryReportPDF.JPG

Pre-outage Window Analysis

In case of an outage, an operations manager would like to quickly check what alerts have been raised in the time immediately preceding the outage. Pre-outage window analysis feature is to make this information readily available.

In the Outage page under trend-view, shall display the alerts raised against that service or its linked nodes & services on the trend chart as red (error) dots. When the user clicks on a specific outage, screen shall show the alerts in the bottom panel.

_images/outagetrendchart.JPG

In the Outage page under History-view, an Alerts icon shall be displayed to the user under the “Action” column. Clicking on the icon should lead the user to the Alerts Monitor page with the alerts for only that BSM listed with the latest alert being the last alert seen preceding the Outage detection time.

_images/alerticon.JPG _images/BSMServiceAlertsView.JPG

Information pop-up when checks fail

Command Center will show the user warnings or errors for system parameters when certain thresholds are crossed for certain metrics (CPU, disk usage, memory usage). The idea of this feature is to provide the user with helpful information against these warnings.

The user shall be shown an “info” icon against each warning or error for the system checks (CPU, disk usage, memory usage)in the BSM drilldown screen. This icon will be shown under the “Action” column.Clicking on the icon will show the appropriate message to the user.

_images/cpuusagecheck.JPG

Runbook Automation

Operations teams define runbooks so that all team-members have access to precise information regarding routines and procedures that are carried out. These runbooks contain information about the specific systems that are being monitored like server IPs, dependent services and the topology. Runbooks also document the Standard Operating Procedures that are to be followed for specific situations. Runbooks are important repositories of knowledge when team-members are trying to resolve outages or trouble-shooting problems. They also provide a reliable documentation that can be followed to achieve specific outcomes.

RLCatalyst now allows users to automate these runbooks as collections of BOTs and execute specific runbooks against Business Services and the underlying infrastructure or components.

A new item called “Runbooks” link is added to the application-menu (top-left).

_images/Runbookslink.JPG

Clicking on “Runbooks” link will navigate to the Runbooks screen.

_images/RunbooksList.JPG

You can opt for Runbook Automation option by associating the runbook to the BSM by clicking on the “Link Runbook” button which is available under Business Services tab in the Settings screen.

_images/LinkRunbook.JPG

On the Dashboard screen, BSM card should show a Runbook icon when a Runbook has been linked with the Business Service.

_images/LinkRunbookIcon.JPG

Clicking on Runbook icon in the DashBoard screen, should navigate to the Runbooks screen of that Business Service

_images/Runbooksscreen.JPG

You can choose the required BOT from the BOT’s selection menu in the Runbooks screen and click on Next Step button.

_images/BOTSelection.JPG

You can execute BOT by passing required parameters to the IP, Cloud Provider and click on Execute Button.

_images/PassingParameters.JPG

On BOT execution ,user can see a popup message about the BOT execution after that logs of that particular BOT execution as a popup.

_images/BOTTrigger.JPG _images/BOTExecutionlogs.JPG

You can execute BOT from the BSM Drilldown screen by clicking on the “Run BOTS” icon available against nodes.

_images/RunBOTS.JPG

Runbook History

Runbook History will record Success and Failure streams intended to log problems that occur in a runbook. They are written to the Runbook history when a runbook is executed.

A new item called “Runbooks History” will show as an icon in the Runbook screen and it will display the available runbooks history

_images/RunbookHistoryIcon.JPG

Clicking on “Runbooks History” icon in the Runbook screen will navigate to “Runbooks History” screen.

_images/SpecificHistory.JPG

You can view the particular Runbook history by clicking on the “History” icon which is available in the specific Business Service related Runbook screen.

_images/RunbookHistoryScreen.JPG

By clicking on “History” screen you can view the specific Business Service related runbook history.

_images/SpecficRunbookHistory.JPG

Role based access to BOT’s

CommandCenter facilitates the role-based access permissions to the BOT’s. Based on the level defined for the logged-in user, the system will display the BOTs to the user which he is entitled to run. Level 0, Level 1 are the two levels defined in the CommandCenter. You can extend the levels based on need.

BOTs availability for L0 user :

_images/Level0User.JPG

BOTs availability for L1 user :

_images/Level1User.JPG

Event triggered runbook execution

Event triggered runbook execution feature will run the BOT when event is triggered. User can add their own event by editing BOTs factory file. Once the alerts got triggered from sensu/consul/pingbot the respected BOT will run. User can add notifications bot like(SMS_BOT/Email/slack). Once the BOT is triggered user can verify this from CC(Runbook -> Bots history.)

BOT Context

This list consist of the BOT parameters that CC can accept currently to execute Event triggered BOTs. When a BOT is written, it can have parameters only from below list. If some other parameter is passed, BOT will be executed with default parameter given with BOT’s definition.

BOT Parameter Name Description
Account Name Enter a Friendly name
awsInstanceIp IP of AWS instance on which operation is to be performed by BOT
sourceCloud Credential name by which credential is saved Catalyst.
sender_name Name of sender by which SMS to be sent.
message Body of Message.
apiKey API key required to send SMS to user.
number Receivers phone number.

Event triggered BOT

Event triggered runbook execution feature will run the BOT when event is triggered.

User can configure which BOT can execute on which event by editing runbook. Once the event got triggered from server the configured BOT will run.

Event trigger Parameter Name Description
checkID Check ID in server
severity severity of check in server
source Server name
state Check state
botID BOT ID from catalyst.

Once the BOT and Event are created need to sync the Botfactory in Catalyst and then CC.

Refresh Runbook in CC

User can update the cache in CC using “Refresh Runbook” icon.

Refresh Icon in CC:

_images/Cache_Refresh.JPG

History of runbook contains the RunBy column as user can verify the BOTs triggerd information based on tenant or automation BOT.

_images/runByColumn.png

Workflow Monitoring

Workflow monitoring feature is used to monitor the workflows, which are run under the RLCatalyst Workflow Engine. The individual nodes in the workflow are modelled as BOTs.

There is tab like “workflows” in main page.when you clicked on that link, it will navigated to workflow monitoring page.On the workflow dashboard page, each workflow is represented by a card which shows the total number of runs completed and passed and failed outcomes.

_images/workflow.png images/Workflow_Dashboard.png

Initially there are no workflows it shows “No workflows available” message.

_images/No_workflow.png

You can add a workflow by clicking on the ‘+’ button.

_images/Add_workflow.png

We can add the workflow from Workflow Settings tab on the Settings page. You can also edit or delete the workflow from the Workflow settings tab.

_images/Settings.png

The RLCatalyst Command Center pulls the workflow details from the RLCatalyst Workflow Engine. Only those workflows which are added to the RLCatalyst Workflow Engine, will be available for addition through the “Add Workflow” screen. Once all workflows are added for monitoring, clicking the + button shows “All the workflows are already configured” message.

_images/Configure.png

In the workflow Monitoring page navigate to the top right menu there is option called ‘list view’,which shows added workflows are diaplyed in list view.

_images/List_view.png

Filters for daily, weekly and monthly views are available on the workflow dashboard screen. Choosing “Daily” filter will show current date data. Similarly we can choose “Weekly” or “Monthly” filters to see the workflow metrics for corresponding periods.

_images/Filter.png

Clicking on a workflow card navigates to workflow details page. On this page the user can see the history of all the workflow runs. The top of the page shows the topology of the workflow run which has been selected from the table below it.When we click on the node see the status of node like tool-tip message.

_images/Topology_view.png

Workflow history details consists following data:

Start Time Last Job Bot ID | Status End Time Logs

There is ‘i’ button in the logs,which shows logs about BOTS.

We implemented colour codes for nodes and historical runs table:

A workflow run that completed successfully will show in green color in the table. In the topology view, each node that completed successfully will show in green color.

_images/Success.png _images/Success2.png

A workflow run that completed with an error will show in red color in the table. In the topology view, the node which completed with error (and all following nodes) will show in terminated status with dark red color. On failure of a workflow run, an e-mail notification will be sent out to the contacts specified in the “Add workflow” screen.

_images/Error2.png _images/Error.png _images/Terminated.png

A workfow run is in processing user should also cancelled the workflow.In the topology view, the node which completed with cancelled will show in dark green color

images/Cancelled.PNG

When workflow run is inprocessing due to database connectivity or crash happened.In the topology view, it will shows pause with orange color after some time automatically it will restarts the workflow

_images/Resume.png

A workflow run that is delayed beyond the threshold defined in the RLCatalyst Workflow Engine will show in yellow color. In the topology view, the node which is delayed will show in yellow color. When a node execution is delayed beyond its threshold, an e-mail notification will be sent out to the contacts specified in the “Add workflow” screen.

_images/Warning.png _images/Warning2.png

When there is not history of workflow runs, all the nodes in the topology section will be shown in gray color.

_images/Gray.png

When a node is in progress, the color of the node in topology view will be shown in blue color.

_images/Inprogress.png

Analytics Page For CPU Metric

It focuses on visualizing the data that we have in ELK(Elastic logstash Kibana) stack, providing analytical recommendations to the users, for detecting the anomalies . We improved this feature for CPU Metric. It consists Visualize,Advanced analytics,Anomaly Detection tabs.

Navigating to analytics page is choose any BSM which leads to health summary details there you can see nodes information.If warning or error alert triggered for cpu_usages_check . We should see “info” icon in status . It navigactes the user to analytics page.If tenant has ELK based configuration than only we should navigated to analytics page.

_images/Navigation_to_analytics.png _images/Analytics.png

In analytics page we can see visualize tab.It shows top 5 CPU processes whose consume more CPU with percentages and CPU trend for a day in graph model.

_images/visualize.png

In analytics page we can see Advanced Analytics tab.Here threshold values are derived from the statistical analysis of last one month CPU_usage data of particular node. Based on the threshold values we can reduce the noise in the alerts.It will show proper threshold values for CPU metric of a machine in a table format

Node Current critical Threshold Current Warning Threshold Preferred Current Threshold Preferred Warning Threshold
   

 
_images/Advancedanalytics.png

In analytics page we can see Anomaly detection tab.Based on historical data, RLCatalyst Command Center determines if the current alert is an anomaly. An anomaly is an event that does not fit past patterns. In case an anomaly is detected you may want to look at recent changes to the system, unsual processes that are consuming resources etc. to identify the cause of the anomaly

_images/Anomaly_Detection.png

If alert is already resolved than we should see the message like “Alert is closed for check cpu_usages_check and node ip”

_images/closedalert.png

If server is down it shows meassage to user i.e.,”The data is unavailable at this time.please revisit this page later”.

_images/DE_DOWN.png

Analytics Page For MEMORY Metric

It focuses on visualizing the data that we have in ELK(Elastic logstash Kibana) stack, providing analytical recommendations to the users, for detecting the anomalies . It consists Visualize,Advanced analytics tabs.

Navigating to analytics page is choose any BSM which leads to health summary details there you can see nodes information.If warning or error alert triggered for Memory_usages_check we should see “info” icon in status . It navigactes the user to analytics page.If tenant has ELK based configuration than only we should navigated to analytics page.

_images/Navigation_analytics_memory.png _images/Analytics_memory.png

In analytics page we can see visualize tab.It shows top 5 MEMORY processes and MEMORY trend for a day i.e., we can see cache data,free data and used data.

_images/visualize_memory.png

In analytics page we can see Advanced Analytics tab.Here threshold values are derived from the statistical analysis of last one month MEMORY_usage data of particular node. Based on the threshold values we can reduce the noise in the alerts.It will show proper threshold values for MEMORY metric of a machine in a table format

Node Current critical Threshold Current Warning Threshold Preferred Current Threshold Preferred Warning Threshold
   

 
_images/advanced_analytics1.png

If alert is already resolved than we should see the message like “Alert is closed for check Memory_usages_check and node ip”

_images/alertclosed.png

If server is down it shows meassage to user i.e.,”The data is unavailable at this time.please revisit this page later”.

_images/DE_down1.png

In analytics page we can see Anomaly detection tab.Based on historical data, RLCatalyst Command Center determines if the current alert is an anomaly. An anomaly is an event that does not fit past patterns. In case an anomaly is detected you may want to look at recent changes to the system, unusual processes that are consuming resources .

_images/Anomaly_memory.png

Appendix A

Download the below Data Collection template by clicking on the following link .. _a link: https://s3.us-east-2.amazonaws.com/rlcatalyst/templates/CommandCenter_DataCollectionTemplate.xlsx

Registration Information
Name of the tenant This will be used to fill the Customer Name field in the registration form This field will have to be unique for each tenant configured in system
User Name This will be the username with which the tenant will login
Password This will be the initial password allocated to the tenant
Email Address Email ID which will be verified by the system during registration. Ensure you have access to this e-mail ID during registration
Provider Settings
Will an Amazon Web Services account be configured for this tenant?
AWS Access Key  
AWS Secret Key  
AWS Region for this account  
AWS Account Number  
Will a Microsoft Azure account be configured for this tenant?
Azure Client ID  
Azure Client Secret  
Subscription ID  
Tenant ID  
Will a ServiceNow account be configured for this tenant?
Host  
User Name  
Password  
Will a Sensu account be configured for this tenant?
Host  
User Name  
Password  
Business Services
Name <Name of the service as it appears on the dashboard>
URL < URL for the business service >
Linked Services (if any)
< Service1 – IP Address of node it runs on,
Service2 – IP Address of node it runs on,

Service3 – IP Address of node it runs on,

Nodes (VMs or Machines) <FQDN of Node1, FQDN of Node 2, FQDN of Node3>