Technical Blog
How to create a Service Desk "incident" from an "alert" in GoToAssist Monitoring · Mike Suding · 10/19/2012 11:19 am · 1 comment
How to change the company/account of an unattended Remote Support agent · Mike Suding · 10/3/2012 03:54 pm · 0 comments
Alert for high usage on a switch port · Mike Suding · 9/11/2012 04:40 pm · 0 comments
Workstation Alerts - which alerts work on which collection method · Mike Suding · 8/31/2012 03:11 pm · 0 comments
Parameters for the unattended installer · Mike Suding · 8/10/2012 12:10 pm · 0 comments
How to create a "Smart Group" to see/separate the computers you set to "Don't Monitor" · Mike Suding · 4/12/2012 10:08 am · 0 comments
Create a single packaged file to install Remote Support agent even if the user doesn't have admin permissions · Mike Suding · 4/10/2012 01:43 pm · 0 comments
How to install Remote Support agent with imaged/cloned Windows computers · Mike Suding · 3/14/2012 04:00 pm · 0 comments
Running multiple instances of Remote Support so you can see 2+ remote control windows simultaneously · Mike Suding · 2/8/2012 05:14 am · 0 comments
Summary tips and instructions to setup Linux in GoToManage · Mike Suding · 2/8/2012 04:53 am · 0 comments
Summary tips and instructions to setup Macs in GoToManage · Mike Suding · 1/9/2012 12:03 pm · 0 comments
Examples of how to brand the Remote Support connection page by putting it on your website · Mike Suding · 11/22/2011 01:47 pm · 0 comments
How to...automate the install of Remote Support agent (aka 'host') · Mike Suding · 9/25/2011 11:48 am · 3 comments
How to allow users to get into a support session from your web site · Tem Boylan · 9/8/2011 03:17 am · 1 comment

This simple piece of code will allow your clients to input a support key and get into a session from your company web site:

<iframe src="http://www.Fastsupport.com" height="410" width="100%">your browser does not support IFRAMEs</iframe>

Now you can drive more business to your site when providing support. Just another way to make GoToManage YOUR remote support tool!

Enjoy,

-Tem

How to...use lowest possible permissions for Crawler user · Mike Suding · 9/4/2011 02:34 pm · 4 comments
How to...setup a user with lower permissions · Mike Suding · 9/1/2011 12:58 pm · 0 comments
Video How to...alerts and notifications · Mike Suding · 8/12/2011 04:15 pm · 1 comment
Video How to...starting a trial and installing the crawler · Mike Suding · 8/12/2011 11:11 am · 0 comments

This article was moved to a new website/tool. Click here

Video How to...adding users · Mike Suding · 8/5/2011 05:25 pm · 0 comments
Hiding Services in the Server Health app · Scanner · 1/20/2010 03:39 pm · 0 comments

The “Server Health” app displays the status of various services as determined by the “Service Health Check” plugin. Typically the Paglo Crawler will not probe a service, and thus will not report any status for it, if it knows that a device does not offer that service.

However, some services, like DNS, are checked using UDP. This means Paglo cannot differentiate between a device that offers DNS but is not currently answering and a device that simply does not offer DNS service. In these cases the ‘Server Health’ app will show DNS as down for every device that the ‘Service Health Check’ plugin is run against. You can tell the ‘Server Health’ app not to consider that service in the list of services that matter for a device, but what if you simply do not want to see it at all?

As of today the ‘Server Health’ app offers a way to do this. In any list of services, whether it be by device or by service, a new checkbox will appear. It has the label “Hide this service.”

When you ‘hide’ a service on a specific device, it can no longer be selected for either alerting or showing in the status for that device. Hiding a service that has either of those other checkboxes selected will uncheck them before hiding the service. The service will be hidden immediately. If this service previously contributed to the status of the device, that status will not update until the whole page is refreshed.

At the bottom of the list of services there is a link that will either show or hide the currently hidden services:

This feature should help you make the ‘Server Health’ app more useful by eliminating views of data that just are not relevant for specific devices and services.

Using Paglo Alerts to Notify You of Failing Disks · Scanner · 1/19/2010 12:46 pm · 0 comments

One of the cool things about working at Paglo is being able to use the product you are developing to help detect problems before they become serious failures on your own systems. This morning I had the experience of Paglo helping me detect a failing disk drive on one of my systems.

How did Paglo help here? Well, we have all of the system logs from these machines captured by Paglo’s Log system. You can see how to set up Paglo to capture logs here: Configure log collection…

NOTE: To access that page you need a Paglo account. A brief summary of the page: for Windows machines, logs are collected automatically once a day, up to the free 10 MB per day limit. For Unix machines you need to set up syslog-ng or RSyslog to send logs securely to your company’s Paglo account.

Once you are collecting logs there are lots of things you can do to monitor and inspect the goings-on of your systems. In this blog post we are going to focus on log messages that are related to disk or kernel issues. We want to look for log messages that have the words 'kernel' and 'failure' or 'root' and 'ZFS' in them. This may capture more than we want, but we can always refine the query later if it triggers too often.

In Paglo log search you use parentheses, i.e. ‘(’ and ‘)’, to group search clauses. You use boolean logic to join search clauses with words such as AND and OR. This makes our search term: “(kernel failure -user) OR (root ZFS)”.

In order for Paglo log search to differentiate between searching for the word ‘or’ and telling Paglo you want it to use an OR clause, you must use an uppercase “OR”. We have added “-user” to our search to tell Paglo that we want to exclude log messages that have the word “user” in them. Log searches are case insensitive.
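
To make those matching rules concrete, here is a rough, hypothetical emulation of one search group in Python. This is not how Paglo implements log search; it just illustrates the "all words must appear, case-insensitively, minus any excluded words" behavior described above:

import re

def group_matches(message, include, exclude=()):
    # All 'include' words must appear in the message (case-insensitively)
    # and none of the 'exclude' words may appear.
    words = set(re.findall(r"\w+", message.lower()))
    return all(w in words for w in include) and not any(w in words for w in exclude)

line = "Jan 19 09:12:01 host kernel: ad4: FAILURE - device detached"
# Emulates "(kernel failure -user) OR (root ZFS)"
print(group_matches(line, ["kernel", "failure"], exclude=["user"])
      or group_matches(line, ["root", "zfs"]))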

In the case of our machine with a failing disk drive this returns the results:

Turning this into an alert that will notify me via email whenever any matching log messages occur is pretty simple. Go to the “Alerts” app and click the “Create a new alert” link. Then fill in the ‘Generate alert when’ section with our search term, being sure to select the ‘Log search’ type:

I have already created this alert and shared it with the Paglo community. You can find it at: Kernel / ZFS failures

Be sure to add some destinations to the alert if you use it so that Paglo has somewhere to send alert notifications to.

Although this does not remove the sinking feeling you get when you need to replace a failed hard drive, it should at least tell you that a drive is beginning to fail before the whole system goes south.

Detecting Problems with your Printers via Paglo Alerts and the SNMP Get Plugin Crawler Plugin · Scanner · 11/25/2009 03:36 pm · 0 comments

One of the great tools of network and IT management is SNMP. Many devices purchased today have support for some set of SNMP MIBs. The Paglo Crawler has a plugin that can capture specific SNMP OIDs from any device it scans. Coupled with a Paglo Crawler schedule and a Paglo Alert, you can be notified of many error and warning conditions on devices on your network.

In this post we show you how to be notified when there is an exception with a network-attached printer that supports SNMP. (This is just one example of how SNMP and Paglo can work for you.)

First we need to configure the Paglo Crawler’s SNMP Get Plugin to query some parts of the SNMP tree that contain the information we care about. We want to track two additional OIDs: HOST-RESOURCES-MIB::hrDeviceStatus and HOST-RESOURCES-MIB::hrPrinterDetectedErrorState. The first will tell us if the printer is behaving normally, or if something is wrong. The second can tell us what the printer thinks is wrong.

By clicking on the ‘Plugins’ tab for your Paglo Crawler in the ‘Crawlers’ app in your Paglo account you will see the “SNMP Get Plugin” in the list of available plugins. When you select this plugin make sure that Enabled is checked. To monitor the two OIDs we mentioned above you need to add them. Clicking on the Custom… button brings up a dialog where you can enter the OIDs one at a time.

Currently you need to use the numerical representation of the OIDs. For HOST-RESOURCES-MIB::hrPrinterDetectedErrorState this is: “1.3.6.1.2.1.25.3.5.1.2.1” and for HOST-RESOURCES-MIB::hrDeviceStatus it is: “1.3.6.1.2.1.25.3.2.1.5.1”. We also get the HOST-RESOURCES-MIB::hrPrinterStatus OID to differentiate between other types of devices and printers. Its numerical representation is: “1.3.6.1.2.1.25.3.5.1.1.1”.
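
Before wiring these OIDs into the plugin, it can be handy to confirm that your printer actually answers on them. Here is a minimal sketch using the third-party pysnmp library (an assumption on my part; it is not part of Paglo or the Crawler), with a placeholder printer address and the common 'public' community string:

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

# The three OIDs discussed above (instance .1 of each column).
OIDS = [("hrDeviceStatus",              "1.3.6.1.2.1.25.3.2.1.5.1"),
        ("hrPrinterStatus",             "1.3.6.1.2.1.25.3.5.1.1.1"),
        ("hrPrinterDetectedErrorState", "1.3.6.1.2.1.25.3.5.1.2.1")]

def check_printer(host, community="public"):
    error_indication, error_status, _, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData(community, mpModel=1),                    # SNMPv2c
        UdpTransportTarget((host, 161), timeout=2, retries=1),
        ContextData(),
        *[ObjectType(ObjectIdentity(oid)) for _, oid in OIDS]))
    if error_indication or error_status:
        print("SNMP error: %s" % (error_indication or error_status.prettyPrint()))
        return
    for (name, _), var_bind in zip(OIDS, var_binds):
        print("%s = %s" % (name, var_bind[1].prettyPrint()))

check_printer("192.0.2.50")   # replace with your printer's IP address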

The SNMP Get Plugin is a ‘device’ plugin which means it will be run at least once a day automatically against all hosts that the Paglo Crawler discovers. For monitoring a service for problems this is not often enough. So we are going to create a schedule that will run this plugin against our printers every 5 minutes. We also throw in the Service Health Check plugin because it is inexpensive (in terms of resource utilization) to run and can tell you other valuable things about your printer like “is it available?”

Now that we have the Paglo Crawler checking up on our printers every 5 minutes, we can construct an alert to tell us when a printer is unhappy.

This involves delving a little into how Paglo stores data retrieved via the SNMP Get Plugin. Paglo uses a sub-tree named ‘snmp’ rooted at every device. At every level of the OID we create a new level in Paglo. Thus, the hrDeviceStatus OID’s value is stored at /network/device/snmp/oid_1/oid_3/oid_6/oid_1/oid_2/oid_1/oid_25/oid_3/oid_2/oid_1/oid_5/oid_1. The value stored here will be ‘2’ if the device is running normally, ‘3’ if it is in some unspecified warning state, and ‘5’ if it is down, i.e. not operating due to some error.
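
The mapping from a dotted OID to that path is purely mechanical; a tiny illustrative helper (hypothetical, not a Paglo API) makes it explicit:

def oid_to_paglo_path(oid):
    # Each numeric component of the OID becomes an oid_N path segment.
    return "/".join("oid_%s" % part for part in oid.split("."))

# hrDeviceStatus, instance 1:
print("/network/device/snmp/" + oid_to_paglo_path("1.3.6.1.2.1.25.3.2.1.5.1"))
# -> /network/device/snmp/oid_1/oid_3/oid_6/oid_1/oid_2/oid_1/oid_25/oid_3/oid_2/oid_1/oid_5/oid_1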

The heart of a Paglo Alert is a PQL query. (Go to the end of this post if you just want to click on a link and have the Alert created for you.) We can turn any PQL query that returns a result when an error condition is true into an alert. So what we need is a query that will tell us the hrPrinterStatus and hrPrinterDetectedErrorState when any printer is down. For it to be considered a printer it must have the hrPrinterStatus OID. For it to be having a problem the hrDeviceStatus must be '5'.

In the Paglo Alerts app we create an alert and use the following PQL query:

select nvl(../../../../../../../../../system/name, ../../../../../../../../../system/dns_name)||'('||first(../../../../../../../../../interface/inet/ip_address)||')' as "Printer",
oid_2/oid_1/oid_5/oid_1 as "Device Status",
oid_5/oid_1/oid_1/oid_1 as "Printer Status",
oid_5/oid_1/oid_2/oid_1 as "Detected Error State"
from /network/device/snmp/oid_1/oid_3/oid_6/oid_1/oid_2/oid_1/oid_25/oid_3
where oid_2/oid_1/oid_5/oid_1 = '5' and oid_5/oid_1/oid_2/oid_1 != '0x00'

Here is what the results of the above PQL would look like if a printer was offline:

Now you will be alerted whenever any of your Paglo monitored printers is having problems.

One of the nice things about such PQL queries is that you can easily view the history of any of the results. If you click on a label, such as ‘Device Status’ in the above query results you will get a popup menu. From this select History and you will see the history of this value over time as a chart, if the value is numeric.

This alert has already been created as a public Share-It in Paglo’s community: Printer Having Problems .

The Paglo Data Center Hosted Cloud Based Crawler · Scanner · 11/19/2009 03:28 pm · 0 comments

You can now use Paglo’s data center infrastructure to remotely monitor your Web site and applications running on Cloud-based platforms like Amazon.

Last week we introduced a new alert that would monitor the health of your network connection. The key to this new alert was that it was run out of the Paglo Data Center via our Cloud Based Paglo Crawler.

At the time there was no easy way for a user to create a Cloud Based Crawler other than creating an instance of the ‘Internet is Down’ alert. Now we have added to the ‘Crawlers’ app the ability to directly create a Cloud Based Crawler. Each company index may have only one Cloud Based Crawler.

When you go to the ‘Crawlers’ app, if your company does not yet have a Cloud Based Crawler, a link will show up in the upper right hand of the page. Clicking this will cause a Cloud Based Crawler to be created for your company. It may take a couple of minutes for the Cloud Based Crawler to be fully instantiated and running. When the ‘Schedules’ tab shows up, the creation process has completed.

A Cloud Based Paglo Crawler can do a lot more than simply probe the health of your network connection. Using the Service Health Check plugin you can monitor the reliability of a number of internet services, and you can do this from our data center. This is useful if you want to know whether your externally hosted web site is up and running and what sort of latency it may be experiencing.

The list of services probed by the Service Health Check plugin is:

  • ICMP Echo (‘ping’)
  • FTP (port 21, TCP)
  • Telnet (port 23, TCP)
  • SMTP (port 25, TCP)
  • DNS (port 53, UDP)
  • HTTP (port 80, Success if “GET /” returns a 2XX or 3XX response.)
  • POP3 (port 110, TCP)
  • IMAP (port 143, TCP)
  • LDAP (port 389, TCP)
  • HTTPS (port 443, TCP)
  • SQLServer (port 1433, TCP)
  • Oracle (port 1521, TCP)
  • MySQL (port 3306, TCP)
  • eHub (port 9013, TCP)

The Service Health Check plugin will return success for a specific service if we can connect to it. The exceptions are DNS, where we look for any sort of DNS response to a generic query, and HTTP, which must return a 2XX or 3XX response to a ‘GET /’ request.
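
To illustrate those semantics, here is a rough sketch of the two kinds of checks in Python. It is not Paglo's implementation; it just mirrors the "success means we could connect" and "GET / must return a 2XX or 3XX" rules described above:

import socket
import httplib   # Python 2, matching the examples later in this blog; use http.client on Python 3

def tcp_check(host, port, timeout=5):
    # Success simply means a TCP connection could be opened.
    try:
        socket.create_connection((host, port), timeout).close()
        return True
    except (socket.error, socket.timeout):
        return False

def http_check(host, timeout=5):
    # Success means "GET /" returned a 2XX or 3XX response.
    try:
        conn = httplib.HTTPConnection(host, 80, timeout=timeout)
        conn.request("GET", "/")
        status = conn.getresponse().status
        conn.close()
        return 200 <= status < 400
    except (socket.error, httplib.HTTPException):
        return False

print("smtp: %s, http: %s" % (tcp_check("www.example.com", 25), http_check("www.example.com")))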

A number of Paglo users have been interested in our ability to gather Amazon CloudWatch data, but have not wanted to run a Paglo Crawler on their machines. They can now use the AWS CloudWatch plugin in the Cloud Based Crawler to monitor their Amazon EC2 instances.

How do you delete a Cloud Based Crawler? If you go to the ‘Advanced’ tab for the Cloud Based Crawler and click on the ‘Delete’ button, it will not only schedule the entries in the PQL database that represent the crawler for deletion but also delete the associated Cloud Based Crawler instance. NOTE: When you do this, unlike with a Paglo Crawler running on one of your machines, the Cloud Based Crawler is itself also deleted. This means that all schedules, plugin configurations and historical information directly related to this Cloud Based Crawler are also deleted. It will not show up again after you delete it.

NOTE: The Cloud Based Crawler is load balanced across a number of our servers. This means that once you enable it and turn on the ‘Service Health Check’ plugin you are going to be probed from our data center’s IP addresses. Currently the probing will come from crawler01.paglo.com and crawler02.paglo.com although this may grow in the future.

The new 'Internet is down' alert and the Paglo Cloud Based Crawler · Scanner · 11/2/2009 03:50 pm · 0 comments

We have just released a terrific new feature for testing whether your WAN link (or public facing Web site) is up or down. (And it can also be used to monitor your Amazon EC2 instances, but that is a story for another day.)

This functionality gives us a platform to do more sophisticated infrastructure testing from the cloud. We are going to describe how to get started with the WAN link up/down test and Alert feature here and will talk more about the other capabilities in the near future.

To create a Paglo alert to notify you when your internet connection has gone down, go to the ‘My Paglo’ home page by clicking on the ‘Home’ tab in the left navigation bar.

In the box titled ‘Created Alerts’ the ‘Internet is down’ alert now has a text field next to it when you enable it. Into this text field you need to put the external IP address of your router. By default it will be filled in with the IP address that Paglo sees you coming from. You need to enter the external IP address of your network’s router so that Paglo knows what specifically to check for connectivity.

How does this work?
Paglo now has a Cloud Based Crawler that runs out of our data center. It is just like the Paglo Crawler you have installed on your local network except that it does not do network scans and discovery. All it does is run a subset of the Paglo Crawler plugins. It runs the Service Health Check plugin and the Amazon CloudWatch plugin.

When you create the Internet is down alert via the My Paglo ‘Create Alerts’ page we will create a Cloud Based Crawler for you, configure it to probe the IP address you gave when you created the alert using the Service Health Check plugin, and set up an alert to notify you when that IP address is no longer reachable from our data center.

The device with the IP address you entered will also show up in the Server Health application in its default configuration. This is because we will identify that device as a router.

So now you can have both health and performance monitoring behind the firewall and testing from the Internet to your web facing infrastructure. Go at it and let us know what you think of this extension to our IT Management SaaS platform.

The Power of Correlating Data with PQL Queries · Scanner · 8/28/2009 04:35 pm · 0 comments

Using the Server Health app, we had observed a problem where a mail server was not responding to SMTP almost more often than it was up! Although this would not lose mail, it could seriously delay messages being sent through the service, so we had to figure out why this was happening. Looking through the data for the company we started to see a pattern. Now if only we had a way to correlate the different bits of historical data we had gathered.


The red bars show where the service was down. A short red bar indicates that it was only down for a part of that time interval.

One of the big advantages of Paglo is how we gather so many different kinds of data. Since it is all stored in a single tree-structured database for each company, you can correlate different pieces of data. Our SQL-like query language, PQL, allows you to extract disparate pieces of information and plot them together in the same chart.

For example, on unix hosts we can gather the 1 minute, 5 minute, and 15 minute load averages over time. We also gather the status of various network services offered by a host. You can learn more about the Server Health app in our earlier blog post Server Health, the Service Health Crawler Plugin, Alerts, and You.

A service that we monitor is SMTP. One of the most common Mail Service Agent programs on unix is sendmail. sendmail has an anti-denial-of-service feature where, if the machine it is running on is too busy, it will not answer requests, effectively appearing to be down to other hosts. So we should be able to correlate time periods when SMTP on the host is down with its load average. Here we unleash the power of PQL:

select service_id||' on '||../../../../../system/dns_name as caption,
mean(constant(success))*10 as success,
../../../../../system/stats/loadavg_1min
from /network/device[system/dns_name='<your>']/apps/com/paglo/service_health/service
where service_id='smtp'
history from '1 day ago' to 'now'

We are not going to go into too much detail on the specifics of the PQL statement, but in short we are selecting a service id combined with the host’s DNS name, the ‘success’ state of that service as monitored by the Service Health Plugin in the crawler, and the 1 minute load average of that host. We want these values going back in time from now to one day ago.

This query gives us a large table of values. But if we click on the link to convert this to a chart, for a host that is frequently very busy we may see a chart something like this:

The blue line shows the 1 minute load average of the host over the past day. The red line, when it is at ‘10’ represents when the host is accepting mail via SMTP. Whenever the red line dips down to 0, the host is refusing mail.

You can quickly see that whenever the load average is below 10, it accepts mail and when it is above 10 it refuses mail. In many other charting and reporting systems to gather the right data and then generate such a chart would take a fair amount of custom code. With Paglo’s rich data gathering and PQL it only took us a few minutes to come up with a query to chart the correlated data.

For more information on PQL and how to use it consult our PQL documentation

Server Health, the Service Health Crawler Plugin, Alerts, and You · Scanner · 8/13/2009 05:35 pm · 0 comments

We frequently get asked how you can monitor the uptime of various machines and the services that they offer. Immediately following that, people frequently want to be notified when some set of monitored services goes down.

Paglo can do both of these things for you. There are four pieces involved in this: The Service Health Paglo Crawler Plugin, the Inventory app, the Server Health app, and the Alerts app.

The Service Health Paglo Crawler Plugin

The Service Health Plugin scans a variety of network services on the hosts that it is run against. It records the success or failure of these probes to Paglo along with the amount of time the probe took to complete. Currently the list of services that are probed is hard-coded into the Service Health Plugin itself.

The list of services that it probes is ICMP echo (aka ping), HTTP on port 80, HTTPS on port 443, FTP on port 21, telnet, SMTP, POP3, IMAP, LDAP, SQLServer on port 1433, Oracle on port 1521, MySQL on port 3306, and DNS queries via UDP.

By default, as is the same for any other device plugin that the Paglo Crawler runs, this will be run approximately once per day against every device that has been discovered. This is not very useful for actually monitoring services on specific machines. This is where setting up a schedule for the crawler becomes necessary.

On the Paglo site go to the Crawlers app by clicking on the Crawlers link in the left hand column. If you have only one crawler this will take you to that crawler’s configuration. If you have more than one you will need to click on the box for the crawler you wish to configure. The configuration page will start on the Scheduler tab.

Creating a Service Check Schedule

In the Scheduler tab click on the Create New Schedule link. This will present you with a form for your new schedule. You should give it some distinctive name so that you will know what this schedule is doing. For service health check we recommend that you set the Interval to at least 5 minutes. This means for the selected hosts the plugins in this schedule will be run every 5 minutes.

Next you will select the plugins you want this schedule to run. By clicking on the Select… button a dialog will come up with the list of plugins available to run in this schedule. In this case we only want the Service Health Plugin.

The final thing we need is to select which hosts it will be run against. Press the Select… button in the Hosts section and another dialog will come up.

NOTE: This list will be the hosts that the crawler has found. You can click the Custom… button and enter the IP address of any specific host you want.

After you have selected all of the hosts you wish to monitor the services on your form should look something like this:

Do not forget to click on the Save Changes button at the bottom of the page!

Making a Group of the Monitored Hosts

One of the features of Paglo that we are expanding is the ability to create groups of hosts either using various search criteria or a static set of hosts. The Server Health app uses groups to determine what hosts to display. Creating groups is done in the Inventory app.

Once you have entered the Inventory app click on the Configure link in the upper right hand corner of the page, below the Paglo Search bar. On the ‘Configure the Inventory Application’ page, select the tab labeled Groups.

In the Groups tab click the Add Group… button. This will bring up a dialog that lets you create a group. Since we are making a group to track the hosts that we want to view in the Server Health app, we will call this group “Service Health Check”.

Now that you have created the group we can add hosts to it. Currently you do this by navigating in the Inventory app to lists of hosts. Here you can see we are looking at our network appliances and have selected our three Netgear NAS boxes, adding them to the Service Health Check group by selecting it from the Group: Add to Group menu in the top right of the page:

Once you have added all the hosts you wish to monitor in the Server Health app you can browse to the group you created.

The Server Health app

The Server Health app is where you select which services you wish to be alerted about when they stop responding. We created a group in the Inventory app so that we could tell the Server Health app to use that group.

In the Server Health app click on the Configure link in the upper right hand of the page. By default the Server Health app will show you the list of devices that Paglo has classified as either a server, a router, or a switch. In this case, however, we want it to show the list of devices defined in our Service Health Check group. Select the Use Group item and in the pull down menu select the Service Health Check group.

You can then go back to the main page of the Server Health app by clicking either on the app in the left menu, or on the Servers link at the top of the page. On the left side of the page there will be an expandable tree widget. There will be a node for every device you have specified in the Service Health Check group. By clicking on a device name you can see the report for all the services on that device on the right side of the page.

The report will show you an uptime bar for each device. Below each uptime bar will be the probed latency of that service. The longer a device is down, the more red its uptime bar will show. One nice feature is that no matter the time scale of the uptime bar, if a device was down, red will show up so that downtimes are easy to see.

Here you can select which services you care about for this device and additionally which of these services you want to receive a Paglo Alert for.

Going through each device this way and selecting the Device uptime service may be a bit tedious, so you can instead expand the Services node in the tree and then expand the ping node. By clicking on ping, the right side of the Server Health app will display the results of the ping service check for all devices in the Service Health Check group. You can now go through this list and enable the Alert when down checkbox for all the devices you wish to receive an alert about when they stop responding to pings.

Receiving Service Health Check Alerts

The final piece of our tour brings us to the app that started this whole quest: the Alerts app. Select the Alerts app from the list of apps in the left hand menu of the Paglo site. By selecting any service in the Server Health app for Alert when down, the Service Down alert will automatically be created in the Alerts app.

You can now select the Service Down alert. It is important that you do not modify the Name or the PQL statement inside the Generate alert when box. These are tuned specifically to watch for services being down in the Server Health app.

You can now add your email address or twitter account to have messages sent to when any service that you selected with Alert when down in the Server Health app goes down. You can see here where I have added my email address to be notified when any service I am monitoring goes down, as well as every hour while it is still down, and an additional notification when the service comes back up.

This one alert will fire when any service listed in the Server Health app is down, letting you centralize your alerts for services instead of having to create a new alert for every one of dozens of services you may be monitoring.

That concludes our tour: the Service Health Check Paglo Crawler Plugin, setting up a schedule in the Crawlers app, collecting a set of hosts together into a group in the Inventory app, using this group in the Server Health app to select which services to be alerted on, and finally adding a notification address in the Alerts app to send those alerts to.

Using the Paglo bulkSubmit API from Python to Watch Log Files · Scanner · 7/27/2009 06:37 pm · 0 comments

We frequently talk about the Paglo API but do not have much in the way of example code showing how to use it. This post is more an exploration of the Paglo submitFile API than just a way to submit logs to be indexed from Python. For a simpler way to submit logs from unix machines please consult Searching Ruby on Rails Production Log Files with Paglo.

Here I am going to cover how to use the bulkSubmit API from Python to capture log files on unix-based systems and submit them up to Paglo. This article assumes some basic familiarity with Python and Unix administration concepts. Although the actual Python implementation of the bulkSubmit API will work fine under all operating systems that Python runs on, these example scripts were written specifically for a unix environment. As such they run under FreeBSD, Mac OS X, and Linux (Ubuntu). Also note these examples work with Python versions 2.5 and 2.6.

You can find all of the code referred to here in our public Subversion repository at: https://svn.paglo.com/paglo_open_source/paglo_for_python/trunk

You will need to either install the paglo Python module or make sure that it is included in your PYTHONPATH environment variable before you can make use of the scripts and APIs that it provides. A setup.py file is provided so that installing the paglo Python module is a simple matter of running:

python setup.py install

Consult Python’s documentation on the module search path for more information.

The basic problem we want to solve in this post is simple:

Watch log files in an ongoing fashion, and through log file rotation. Submit their contents up to Paglo as new log messages are appended to these files.
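
If you have not built this kind of watcher before, the core of the problem is just tailing a file and reopening it when it is rotated. Here is a minimal illustrative sketch of that idea in Python; it is not the paglo.daemon_utils.LogWatcher implementation described later, just a rough picture of what "watching through rotation" means:

import os
import time

def follow(path, handle_line, poll_interval=1.0):
    # Minimal 'tail -F': call handle_line() for each new line and reopen
    # the file when it is rotated (detected by an inode change).
    f = open(path)
    f.seek(0, os.SEEK_END)
    inode = os.fstat(f.fileno()).st_ino
    while True:
        line = f.readline()
        if line:
            handle_line(line)
            continue
        time.sleep(poll_interval)
        try:
            if os.stat(path).st_ino != inode:
                f.close()
                f = open(path)          # the log was rotated; start over
                inode = os.fstat(f.fileno()).st_ino
        except OSError:
            pass                        # file briefly missing during rotation

# Example: follow("/var/log/system.log", handle_line=some_callback)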

I split the task of actually monitoring the log files from the task of submitting data up to Paglo. This is done by the two scripts in the scripts/ directory in our subversion repository: scripts/paglo_submitter.py and scripts/paglo_logwatcher.py

Paglo Submitter

Let us look at the script which watches a directory for submission files and submits the data up to Paglo. This script is designed to submit two kinds of data: PQL ‘merge’ statements, and log file data. This is done by using the submitFile API Paglo provides. This is the same API that the Paglo Crawler uses when it submits data it has gathered to Paglo. For other APIs that Paglo offers please check our documentation.

The submitFile API for Paglo has two required pieces: the processor parameter, which indicates what subsystem is going to process the submitted data, and a file attachment that is the data being submitted. This API is good for cases where you are bulk submitting data to Paglo that does not require any further interaction from the client once it is submitted.

Along the lines of it being a ‘fire and forget’ API, it can submit a whole file of data at once. The file being submitted can also be compressed, thus saving bandwidth along the way, which is very useful for quickly growing log files.

There are currently only two supported ‘processors’ for bulk submission: bulk_store and log.

The bulk_store processor is for submitting a series of PQL ‘merge’ statements. This populates the PQL database directly. Please see the merge statement documentation for more information about the PQL ‘merge’ statement. There are no other parameters required when submitting a file of ‘merge’ statements to the bulk_store processor.

The log processor is for submitting log file fragments to Paglo. It requires additional arguments such as source and source_type. These are covered below.

To keep the submitter agnostic about all this, we have whatever wants to submit data encode the parameters to pass to submitFile as the first line in the files it drops off. The paglo_submitter.py script will look in a specific directory for files that match a specific pattern every 10 seconds. This pattern is files whose name consists of only the digits 0-9, “_”, “-”, and “.”.
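
Expressed as a regular expression, that file-name pattern is roughly the following (my reading of the description above, so treat the exact character class as an assumption rather than the script's actual code):

import re

# File names made up only of the digits 0-9, "_", "-" and "."
SUBMISSION_FILE = re.compile(r"^[0-9_.-]+$")

for name in ("1248732000.0", "1248732000_01-0", "notes.txt"):
    print("%s -> %s" % (name, bool(SUBMISSION_FILE.match(name))))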

When the paglo_submitter.py script starts up, after doing various logging and daemon program setup, it will create a ‘paglo session object.’ For this we need the Paglo API key for your company’s Paglo Index. To find your API key, go to https://app.paglo.com/user/edit_company and look for the section titled Data Key. By default the paglo_submitter.py script will look for your data key in the file /usr/local/etc/paglo_api_key, which should be protected so that only authorized entities can read its contents.

   submitter = PagloSubmitter(session, options.directory, logger = logger)

After it creates the ‘paglo session object’, paglo_submitter.py will loop forever, sweeping the drop-off directory for new files.

The sweeping process loops through all the files that match the pattern mentioned above. When it finds one it reads the first line of that file. This line is parsed according to CGI parameter encoding rules. We then take the remainder of the file after the first line and submit it using the Paglo session object’s submitFile method, which takes as required parameters the processor to use and either the name of a file to submit, or an open file handle with the read head positioned at the beginning of the data to submit. Any other parameters that were defined in the first line of the file are passed as a Python keyword argument dict to the submitFile method.

   def submit_file(self, file_name):
        # Open the file, read out the first line to find the parameters
        # to submit to Paglo. (We need to turn each value from a list into
        # a single value since we will never have repeated parameters.)
        #
        f = open(file_name, "r")
        params = cgi.parse_qs(f.readline().strip())
        for key in params.keys():
            params[key] = params[key][0]
        processor = params['processor']
        del params['processor']
        try:
            resp = self.session.submitFile(processor, f, **params)
            os.remove(file_name)
...
        finally:
            f.close()
        return

We check for various kinds of errors that may arise from this submission attempt, remove the file if the submission succeeded, and continue with our loop until we have looked at every file that matches our pattern. Sleep, rinse, and repeat.

This provides us a robust way of submitting arbitrary data up to Paglo and only needing to hand our Paglo Data Key to one program to limit exposure. One thing you need to make sure of is that this program has the permission to read the files that are dropped in its directory and delete them after it is done.
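
To make the hand-off concrete, here is a hypothetical example of a producer writing one of these drop-off files for the log processor; the parameter values, file name, and payload are made up for illustration (Python 2, matching the rest of this post):

import urllib   # use urllib.parse.urlencode on Python 3

# First line: CGI-encoded submitFile parameters; the rest: the payload.
params = urllib.urlencode({"processor":   "log",
                           "source":      "/var/log/system.log",
                           "source_type": "syslog",
                           "host_name":   "myhost.example.com"})

drop_file = "/var/tmp/paglo_submitter/1248732000.0"   # matches the name pattern
f = open(drop_file, "w")
f.write(params + "\n")
f.write("Jul 27 18:37:01 myhost kernel: example log line\n")
f.close()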

For more information about how the Paglo session object’s submitFile method works you can check out the source

Paglo Log Watcher

The other half of our dynamic duo is the program that monitors various log files. It also knows how to parse a maillog file and how to read /proc/meminfo for memory statistics, submitting specific statistics as PQL merge statements. So paglo_logwatcher.py shows usage of both the log and bulk_store processors for the submitFile API.

In the main() function of paglo_logwatcher.py, after we setup various daemon properties and logging, we make use of some Paglo provided utility objects. Both of the objects are defined in the module paglo.daemon_utils. You can look at the source in our subversion repository at: https://svn.paglo.com/paglo_open_source/paglo_for_python/trunk/paglo/daemon_utils.py

We create an instance of the paglo.daemon_utils.FileSubmitter to submit a file of data every 60 seconds, if any data has been written. This is used by both the MaillogParser and the MemWatch objects since those are both going to submit PQL merge statements:

   bulk_store_submitter = paglo.daemon_utils.FileSubmitter(\
        options.directory,
        { 'processor'  : 'bulk_store' },
        logger)
    bulk_store_submitter.start_auto_submit(60)

With an instance of the paglo.daemon_utils.FileSubmitter class you specify the parameters to submit to the bulkSubmit API when you create it. You then invoke the write() method of the paglo.daemon_utils.FileSubmitter object to output whole lines. Either at some automatic interval, or when you choose, this file is closed and renamed such that our paglo_submitter.py process will pick it up and submit it to Paglo with the parameters you specified when you created it. A new file for further submissions will automatically be created when you write more data using the write() method. All you need to do is provide the data to write. For example, here is the snippet of code that submits a PQL ‘merge’ statement for updating data gathered from /proc/meminfo:

   def submit(self):
        meminfo = self.get_meminfo()
        # Construct the sub-tree for the values in the 'meminfo' node
        # that we have gleaned from the OS.
        #
        meminfo = ("system",
                   ("meminfo",
                    ([("%s" % k, "%s" % v) for k,v in meminfo.iteritems()])))
        t = paglo.utils.device_merge_prefix(self.intf_info, meminfo)
        self.submitter.write(paglo.tree_builder.render_as_merge(t,datetime.utcnow()) + ";\n")

The other Paglo provided utility object we use is the paglo.daemon_utils.LogWatcher class. The job of this class is to make it easy to watch some file. Whenever a file being watched by a LogWatcher gets a new line of data it will call the provided function. In the case of our log files this is the submitter.process_line() method of the LogSubmitter object defined in paglo_logwatcher.py.

       submitter = LogSubmitter(options.directory, log_file_name, stype,
                                hostname, logger)
        watcher = paglo.daemon_utils.LogWatcher(None,
                                                log_file_name,
                                                submitter.process_line,
                                                logger = logger)

The LogSubmitter class creates an instance of the paglo.daemon_utils.FileSubmitter class so that it can submit data to Paglo. The arguments for submitting a file are a bit different from the previous FileSubmitter we created. We specify processor to be ‘log’, and we provide the additional arguments needed by Paglo’s log processing to categorize the log messages we submit: source indicates the name of the file we are submitting log messages for. source_type is used by Paglo to determine how to parse the log messages, and finally host_name is used to classify what host produced this log message. We also set this FileSubmitter to automatically submit the file to the paglo_submitter.py process every minute if there is any data to submit.

       self.submitter = paglo.daemon_utils.FileSubmitter(directory,
                                       { 'processor'  : 'log',
                                         'source'     : log_file,
                                         'source_type': log_type,
                                         'host_name'  : hostname },
                                       logger)
        self.submitter.start_auto_submit(60)

The LogSubmitter class defines the process_line() method. This is the method that will be invoked by the paglo.daemon_utils.LogWatcher object whenever a watched file has a new line of data.

   def process_line(self, line = None, ign = None):
        # Append our newly gotten line to our file.
        #
        self.submitter.write(line)
        # If the file is larger than <m> bytes submit it to Paglo.
        #
        if self.submitter.size >= self.size_limit:
            self.submitter.submit()
        return

Running the Programs

All that is left is to decide what parameters to run these programs with and set up their environment. For the paglo_submitter.py we need to make sure that an API key file exists, and that a directory for file submissions exists.

The defaults for these are /usr/local/etc/paglo_api_key and /var/tmp/paglo_submitter. You should set the permissions on the submission directory such that paglo_submitter.py can delete files from it while also allowing processes to drop files for submission into it. I use mode 1770, making sure that the directory is owned by the same uid that paglo_submitter.py is running as, and that the directory’s group is the same as what the paglo_logwatcher.py script is running as.
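
If you want to script that setup, a small sketch follows; the path and mode mirror the defaults described above, and you would still chown the directory to the appropriate uid and gid afterwards:

import os

DROP_DIR = "/var/tmp/paglo_submitter"

# Create the drop-off directory with mode 1770, as described above.
if not os.path.isdir(DROP_DIR):
    os.mkdir(DROP_DIR)
os.chmod(DROP_DIR, 0o1770)   # sticky bit plus rwx for owner and group only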

With this you can just run paglo_submitter.py without any arguments.

The paglo_logwatcher.py script requires at least the list of log files to watch. For example, on my OS X Macs I run the command:

/usr/local/sbin/paglo_logwatcher.py  \
--log_files=syslog:/var/log/install.log,syslog:/var/log/secure.log,\
syslog:/var/log/system.log,syslog:/var/log/windowserver.log,\
http_common:/var/log/cups/access_log,http_common:/var/log/cups/error_log,\
generic:/var/log/hdiejectd.log,http_common:/var/log/apache2/access_log,\
http_common:/var/log/apache2/error_log

The format of the log_files parameter is a comma-separated list. Each element in the list is a pair of “log file format” and “log file name” separated by a ‘:’ character.
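
Parsing that parameter is straightforward; here is a hypothetical sketch of the split, not necessarily what paglo_logwatcher.py itself does:

def parse_log_files(arg):
    # "type:path,type:path,..."  ->  [(type, path), ...]
    pairs = []
    for item in arg.split(","):
        source_type, _, path = item.partition(":")
        pairs.append((source_type, path))
    return pairs

print(parse_log_files("syslog:/var/log/system.log,"
                      "http_common:/var/log/apache2/access_log"))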

If you watch the contents of /var/tmp/paglo_submitter/ you will see files appear as the logs you are monitoring change. After a minute you will see these renamed to the pattern that paglo_submitter.py expects, and shortly after that they will vanish, indicating they have been sent up to Paglo for indexing.

The power of indexing these log files lets us do things like see when ‘Time Machine’ recently ran using the query “backupd completed”:

Stay tuned for an upcoming article on how to use the API methods that Paglo provides for your use.

Searching Ruby on Rails Production Log Files with Paglo · Chris · 7/16/2009 03:26 pm · 0 comments

If you are like me, you tend to ignore the logs from your Ruby on Rails applications until something goes wrong. Then, once I find the error message, I want to answer questions like: how frequently has this error happened? Were any customers affected? What actions were being done just before the error? Does the error only happen at a certain time of day? These questions and more are exactly what searching your log files is good for. Searching a few thousand lines of log files with grep is fast enough, but as soon as you want to look at more than a few hours of logs from a production server, using grep can get pretty tedious. And grep can’t help you analyze trends.

Paglo uses inverted index technology (the same technology Google uses to search the Internet) to search through log files blazingly fast. The really nice part is that Paglo is web based/SaaS, so there is no complex software to install.

These instructions are designed for Unix and OS X hosts. They will not work under Windows. I am considering writing a pure-ruby interface to Paglo log search which would work under Windows. Let me know if this would be useful to you.

First, install the SyslogLogger gem:

sudo gem install SyslogLogger

Configure the syslog logger to be used when the application runs in production mode by adding the following lines to RAILS_ROOT/config/environments/production.rb:

require 'syslog_logger'
logger = SyslogLogger.new("application_name")
logger.level = Logger::INFO
config.logger = logger

You can customize the application_name to the name of your application. This name will be attached to each log message and will be used by syslog-ng and Paglo to separate the messages from other applications.

Now you are ready to configure syslog-ng. Note you can do a similar thing with RSyslog; instructions for configuring RSyslog with Paglo are available at https://app.paglo.com/log_manage/configure. I far prefer working with syslog-ng because its configuration files are logical and easy to understand. RSyslog tries to maintain backwards compatibility with the original syslog and its configuration files are cryptic.

You need to be running syslog-ng version 3.0 or newer. You can check what version of syslog-ng you are running using the command:

syslog-ng --version

Syslog-ng is completely controlled by the contents of the syslog-ng.conf file. On my system it is in /etc/syslog-ng/syslog-ng.conf, but I have also seen it in /usr/local/etc/syslog-ng/syslog-ng.conf. We need to add some configuration to the file so it knows what to do with the log messages received from the SyslogLogger we just added to our Rails application. SyslogLogger is sending its messages to the default syslog destination; on Linux this is /dev/log. In my existing configuration file this source shows up near the top of the file and is called s_all:

# all known message sources
source s_all {
        # message generated by Syslog-NG
        internal();
        # standard Linux log source (this is the default place for the syslog()
        # function to send logs to)
        unix-stream("/dev/log");
        # messages from the kernel
        file("/proc/kmsg" log_prefix("kernel: "));
        # use the following line if you want to receive remote UDP logging messages
        # (this is equivalent to the "-r" syslogd flag)
        # udp();
};

We want to separate out the log messages from our Rails application so we define a filter based on the program name.

filter rails_app {
  program("application_name");
};

Now we can configure the destination which will stream the data to Paglo.

destination log_to_paglo {
        syslog("crawler.paglo.com"
                port(6514)
                transport("tls")
                tls( ca_dir("/etc/syslog-ng/ca.d")
                        key_file("/etc/syslog-ng/cert.d/paglo.key")
                        cert_file("/etc/syslog-ng/cert.d/paglo.crt")
                        peer_verify(required-trusted))
        );
};

This configures syslog-ng to stream the logs to crawler.paglo.com using the syslog-over-TLS protocol. The files specified in the tls section are the private key and certificate that authenticate this machine with Paglo, and also allow this machine to ensure that it is talking to the ‘real’ Paglo server and not an imposter. The certificate files are generated by Paglo at https://app.paglo.com/log_manage/configure.

Finally we can link all of these configurations together with a log directive which specifies a logging action:

log {
        source(s_all);
        filter(rails_app);
        destination(log_to_paglo);
};

Save the configuration and tell syslog-ng to reread the configuration file by restarting it or sending a HUP signal. I have found it useful to troubleshoot configuration problems by running syslog-ng from the command line in the foreground like:

syslog-ng --foreground

When run like this it prints error messages directly to the console. If the configuration is good, stop it with Ctrl-C.

If everything is working then the next time the Rails application generates a log message it will be sent to Paglo and automatically indexed. Log messages show up in the Paglo search index about 30 seconds after they were generated. Now I can log into Paglo and search for those pesky error messages.

Since we added the SyslogLogger to the Rails application no logs will be recorded in the standard log/production.log file. This can sometimes be a bit frustrating, but we can use syslog-ng to log the same information to multiple destinations. So we can add a file destination like:

destination rails_production_log {
        file("/apps/my_application/current/log/production.log");
};

And add this destination to the same log directive so the Rails logs go to both places:

log {
    source(s_all);
    filter(rails_app);
    destination(log_to_paglo);
    destination(rails_production_log);
};

Then restart syslog-ng again and you are ready to go.

Log Search Based Alerts using the Paglo Alerts Application · Scanner · 7/15/2009 05:39 pm · 0 comments

At the end of June we rolled out Paglo’s log collection service. Chris has previously discussed how you can use this service for searching and analysis.

I want to cover another topic: alerts. Paglo has an Alerts application. Any query you can construct against your Paglo database can be used to generate an alert. We have extended the Alerts app to allow you to create alerts that trigger when certain log searches return results.

When you create an alert you specify the conditions to generate an alert. You can select whether this alert is triggered by the results of a PQL query, a free text search of your PQL database via an Assert search, or a free text Log search.

Here you can see that we have selected Log search and want an alert generated when either core dumped or ReportCrash appear in any log source.


Note: search terms are not case sensitive, but the logic operators OR, AND, NOT must be uppercase.

We use those terms specifically because on FreeBSD and Mac OS X, which I am monitoring, those are the strings that will appear in a system log file when a program crashes in a generally unhappy way.

The left and right parentheses group the search terms core and dumped together. The word OR will match any log messages where either set of terms appears. So a log message must have the words core and dumped or the word ReportCrash somewhere in it. Note that the logical operator OR must be in uppercase.

There is a final important element to log search based alerts: When the alert is saved, and when the alert is triggered it saves a bookmark of sorts that represents a certain position in your entire database of logs. When the alert is checked it only searches log messages that occurred in time after that bookmark.

Therefore an alert will only fire whenever a log message appears that matches the search terms after the alert has been created. Shortly after, when the alert is next tested, we only look for log messages after the one that caused the alert to fire. This will cause the alert to clear. The alert will then remain clear until another log message appears that matches the search terms.

The email that the Alerts app will generate will have just the log messages that caused the alert to trigger in it so you will have some idea of the specifics of what caused the alert to fire. Here is an example of the email generated by the Alerts app when the alert we defined above fires:

Paglo for monitoring Amazon EC2 instances - Did you know? · Paglo Field Reports · 7/13/2009 05:16 pm · 0 comments

Here are a few questions that recently came in about how Paglo works with the Amazon CloudWatch API.

Question
Does Paglo have auto discovery for Amazon EC2 instances?

Answer
Yes, our CloudWatch plugin (for the Paglo Crawler) will auto-discover every EC2 instance for which you have CloudWatch monitoring turned on.

NOTE: The last bit is an important point: Paglo will only monitor EC2 instances for which you have CloudWatch monitoring turned on via Amazon.
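
If you want to turn that detailed CloudWatch monitoring on programmatically rather than through the AWS console, the EC2 MonitorInstances API call does it. Here is a minimal sketch using the boto3 library (an assumption on my part; any EC2 API client works, and the instance ID below is a placeholder):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Enable detailed CloudWatch monitoring for one instance.
response = ec2.monitor_instances(InstanceIds=["i-0123456789abcdef0"])
print(response["InstanceMonitorings"])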

Question
Do you require a client install on monitored servers?

Answer
The answer here is the dreaded…Yes and No. It requires no install on monitored servers, but you need to be running the Paglo Crawler somewhere. It can run on a monitored server or a Win32 machine at your office.

Learn more about monitoring your Amazon EC2 instances
