|
How to create a "Smart Group" to see/separate the computers you set to "Don't Monitor"
Sometimes you want to set computers or devices to “Don’t Monitor”, most commonly to free up precious server licenses. A device set to “Don’t Monitor” still appears in inventory, but with a gray dot next to the icon in the first column. If you want to see all your “Not monitored” devices in one group, you can create a Smart Group: a group whose members are determined by a query instead of being added manually.
To create this “Smart Group”, click Config > Groups > New and check the “Smart Group” checkbox. A field appears where you can enter the following query:
coalesce(monitored, true) = false
Note: The “test query” button shows the matching members immediately, but the group itself may take up to ~3 hours to update, because that is the interval at which Smart Group membership is recalculated. We plan to make this interval much more frequent soon. Stay tuned.
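To see why the query wraps `monitored` in coalesce, here is a minimal Python sketch of the same logic. It assumes the query language follows SQL-style COALESCE semantics and that a device whose setting was never changed has no stored `monitored` value; the device records are hypothetical. Under SQL-style rules a missing value would match neither `monitored = true` nor `monitored = false`, so coalesce makes the default explicit and only devices actually set to “Don’t Monitor” land in the group.

```python
def coalesce(*values):
    """Return the first argument that is not None (SQL COALESCE semantics)."""
    for value in values:
        if value is not None:
            return value
    return None


def is_not_monitored(device: dict) -> bool:
    # Mirrors the Smart Group query: coalesce(monitored, true) = false
    return coalesce(device.get("monitored"), True) is False


# Hypothetical inventory records; only the "monitored" field matters here.
devices = [
    {"name": "srv-01", "monitored": True},
    {"name": "srv-02", "monitored": False},  # set to "Don't Monitor"
    {"name": "pc-03"},                        # never changed: treated as monitored
]

smart_group = [d["name"] for d in devices if is_not_monitored(d)]
print(smart_group)  # ['srv-02']
```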
|
|
How to install Remote Support agent with imaged/cloned Windows computers
The Remote Support Agent (aka unattended installer) needs to generate a UUID (Universally Unique Identifier) during installation which by definition is different/unique for each computer.
This means it shouldn’t be installed until AFTER Windows has finished being imaged and named etc.
If you are on a domain, I recommend the logon script method (see the article on this blog). If you are not on a domain, I recommend using this helper script and setting it to run at each logon. If it detects that the agent is already installed or running, it exits immediately.
If it detects that the user doesn’t have admin permissions, it uses the credentials you specify (encrypted by compiling). If Windows UAC (User Account Control) is enabled, UAC will prompt “Allow yes/no” (as per UAC design) for this one-time installation.
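The helper itself is a compiled AutoIT script; the Python sketch below only mirrors the decision order just described, under stated assumptions. It treats the agent as installed when a marker file exists (a hypothetical path you would pick from what the installed agent really creates), takes the admin check as a flag instead of detecting it, does not look for a running process, and uses illustrative action names.

```python
from pathlib import Path


def choose_action(agent_marker: Path, user_is_admin: bool) -> str:
    """Mirror the helper's decision order (illustrative, not the AutoIT source).

    agent_marker: hypothetical file whose presence means the agent is
    already installed.
    """
    if agent_marker.exists():
        return "exit"                          # already installed: nothing to do
    if user_is_admin:
        return "install"                       # run the installer as this user
    return "install-with-stored-credentials"   # fall back to the configured account


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "agent_service.exe"  # hypothetical marker file
        print(choose_action(marker, user_is_admin=False))
        marker.touch()
        print(choose_action(marker, user_is_admin=False))
```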
INSTRUCTIONS:
Click here to watch the video
Click here to download the script
Click here to download free AutoIT script editor/compiler
Put the installer file and this compiled helper script somewhere on the hard drive (e.g. c:\temp)
Create this registry entry:
HKLM\software\microsoft\windows\currentVersion\Run
Install-GoToAssist = c:\temp\install-GoToAssist.exe (data type: string, i.e. REG_SZ)
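If you prefer to script that registry entry instead of using regedit, Windows’ built-in `reg add` command can create it. This Python sketch only builds the argument list, so it can be inspected or tested on any OS; the `/f` flag (overwrite an existing value without prompting) is a choice, not a requirement. Run the resulting command from an elevated prompt, since writing under HKLM needs admin rights.

```python
from typing import List

# Key, value name and path are the ones from the steps above.
RUN_KEY = r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run"


def reg_add_command(value_name: str, exe_path: str) -> List[str]:
    """Return argv for: reg add <key> /v <name> /t REG_SZ /d <data> /f"""
    return [
        "reg", "add", RUN_KEY,
        "/v", value_name,   # value name shown under the Run key
        "/t", "REG_SZ",     # the 'string' data type
        "/d", exe_path,     # program Windows launches at each logon
        "/f",               # overwrite without prompting
    ]


cmd = reg_add_command("Install-GoToAssist", r"c:\temp\install-GoToAssist.exe")
print(" ".join(cmd))
# On Windows, from an elevated prompt, you could run it with:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids having to quote the path by hand.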
Please test, of course.
|
|
Running multiple instances of Remote Support so you can see 2+ remote control windows simultaneously
Sometimes it’s helpful to see multiple unattended computers at the same time and now it’s quite common for IT people and others to have 2 monitors which makes this feasible.
I heard from a colleague about this article and I tested and indeed it seems to work.
It uses Sandboxie (from Sandboxie.com) to run applications in a sandbox.
Disclaimer: your mileage may vary.
Currently the expert console allows you to have up to 8 windows, but you need to flip through them, so you can’t see them at the same time.
Also, like many programs, it does not allow multiple instances. We will look into changing this (no promises).
|
|
Summary tips and instructions to setup Linux in GoToManage
The following info can be collected from Linux if you have SSH credentials configured on the crawler. Be sure to also install, configure and test SNMP in order to get network and other info.
Inventory shows the following:
1) Health tab (ping and DNS uptime)
2) Performance tab:
– Uptime
– graphs for Load average for 1, 5, and 15 minutes.
– graphs for Memory Total, Free, Buffers, Cached, swap total, swap free
3) Network tab: Mac address, in/out bytes
4) Users (not currently)
5) Processes tab (similar to the ‘ps aux’ command)
6) Disks tab: name, total size, free
7) Hardware: Processor type, speed
8) Packages: for some Linux distributions, such as Red Hat and CentOS
9) Logs: not currently
10) Configuration: more info soon
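The load-average and memory figures on the Performance tab correspond to what the Linux kernel exposes in /proc/loadavg and /proc/meminfo. We have not confirmed exactly how the crawler’s `w_and_loadavg` and `proc_filesystem` commands gather them, so treat this Python sketch as an illustration of where the numbers come from; the sample text is made up but follows the kernel’s file formats.

```python
def parse_loadavg(text: str) -> tuple:
    """The first three fields of /proc/loadavg are the 1, 5 and 15 minute load averages."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


# The memory fields the Performance tab graphs, as named in /proc/meminfo
MEMINFO_KEYS = ("MemTotal", "MemFree", "Buffers", "Cached", "SwapTotal", "SwapFree")


def parse_meminfo(text: str) -> dict:
    """Return the selected /proc/meminfo fields, in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in MEMINFO_KEYS:
            values[key] = int(rest.split()[0])  # "   8056464 kB" -> 8056464
    return values


# Made-up sample contents in the kernel's format (real files have more lines)
SAMPLE_LOADAVG = "0.42 0.35 0.30 1/213 4567\n"
SAMPLE_MEMINFO = """MemTotal:        8056464 kB
MemFree:         1234560 kB
Buffers:          204800 kB
Cached:          3145728 kB
SwapCached:         1024 kB
SwapTotal:       2097148 kB
SwapFree:        2090000 kB
"""

print(parse_loadavg(SAMPLE_LOADAVG))
print(parse_meminfo(SAMPLE_MEMINFO))
# On a Linux host you could pass open("/proc/loadavg").read() instead.
```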
Phase 2 (in GoToAssist)
1) If you haven’t already, use crawler app on your crawler computer and add SSH credentials for your “crawler” user.
2) Add a schedule to the Crawler (example settings):
Frequency: 5 or 10 minutes (or as desired)
Applies to: Unix computers
plugin: “remote commands”
set parameters of plugin to use these commands:
- disks
- dmesg
- lsb_release
- mbmon
- proc_filesystem
- processes
- w_and_loadavg
As of April 2012 there are a few ‘standard alerts’ for Linux, but you can use PQL or Log Query to create your own; also see the community forums. Linux servers are often used for email, web, database and similar services, which can be covered by alerts for ‘network services’.
|
|
Summary tips and instructions to setup Macs in GoToManage
Summary of steps:
Phase 1 on each Mac (try to visit each Mac only once)
1) Assuming you have a Windows Active Directory (AD) domain, “join” (aka bind) the Mac to your domain.
2) Set your AD group (e.g. “localAdmins”) to be administrators of the Mac.
Since your ‘crawler’ user is in this AD group, it will now have the admin access it needs. I feel this is a much better alternative than creating a local user account with the same password and maintaining it on each Mac.
3) Allow any network user to log on (username & password).
4) Install the Remote Support agent if you are licensed and if desired. This allows you to do the remaining steps remotely.
5) Enable “remote access” (Remote Login in System Preferences > Sharing), which enables the SSH remote logon that’s needed for auto classification/discovery and detailed inventory.
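Before adding SSH credentials to the crawler, you can check that SSH remote logon is really answering on the Mac. An SSH server normally sends an identification line beginning with “SSH-” as soon as the TCP connection is accepted (RFC 4253), so this Python sketch connects to port 22 and reads that line without logging in. It does not test credentials, and the host name in the usage comment is hypothetical.

```python
import socket
from typing import Optional


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the SSH identification line (e.g. 'SSH-2.0-OpenSSH_...') or None.

    Only the first line is read; RFC 4253 lets a server send other lines
    before the 'SSH-' line, which this sketch would report as None.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock, \
                sock.makefile("rb") as stream:
            line = stream.readline(255)
    except OSError:  # refused, unreachable or timed out
        return None
    text = line.decode("ascii", errors="replace").strip()
    return text if text.startswith("SSH-") else None


# Usage (hypothetical host name):
#   print(ssh_banner("mac-lab-01.mydomain.local"))
```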
Phase 2 (in GoToManage)
1) If you haven’t already, use crawler app on your crawler computer and add SSH credentials for your “crawler” user.
2) Add a schedule to the Crawler (example settings):
Frequency: 5 minutes (or as desired)
Applies to: Apple/OSX computers
plugin: “remote command”
set parameters of plugin to use these commands:
- OSX_system_profiler
- processes
- w_and_loadavg
- disks
After several hours (up to 24), each Mac should be discovered and appear in inventory with the SSH icon in the “protocols” column, auto-classified under the built-in ‘smart group’ called “Apple/OSX computers”.
As of Jan 2012, Off-LAN monitoring is not available, but it is on the roadmap this year.
I hope this helps; email me with any questions. mike.suding AT citrix.com
|
|
How to...automate the install of Remote Support agent (aka 'host')
Currently there are 2 ways to automate the installation of the Remote Support agent (aka host or endpoint). Both require that you are using an Active Directory domain. Sorry, it won’t work on a workgroup.
Note: Currently, as of September 2011, converting the exe to an MSI and pushing out via a GPO is not supported. We are planning to address this in a future version. Please use one of these methods instead.
You can use either method or both in combination.
Method #1 uses a utility I wrote called “Push-install”. This method is immediate but the computers must be powered on and connected when you do it. It will work if a user is logged on or not logged on.
Method #2 uses a program I named “RunAs-special.exe” in the logon script, which allows installation when any user logs on. This may be helpful if some computers aren’t currently connected or powered on, for example traveling laptop users or conference rooms. You can specify that the installation is skipped if it’s already installed (i.e. a current file exists). It will install even if the user is not an admin-level user.
Note: I improved this program in February 2012 by storing the password in an encrypted string in a config file instead of specifying it as a parameter where it could be read by a clever person.
It is important to know that this user account should only be in the local “Administrators” group (not “Domain admins” group; see other blog entry for info)
Click here to see the video of method #1
Click here to download Push-install.exe
Click here to see video of method #2
Click here to download RunAs-special.exe
To see the instructions, start this program without any parameters.
Here is an example of the parameters:
script name: RunAs-special.exe
parameters: \\2008R2\share1\g2mg-COL-IT.exe /silent /u:wowie\adminMike /skip-if-present:%programs%\Citrix\GoToAssist Express Customer\309\g2ax_service.exe
I hope you like them!
|
|
How to allow users to get into a support session from your web site
This simple piece of code will allow your clients to input a support key and get into a session from your company web site:
<iframe src="http://www.Fastsupport.com" height="410" width="100%">your browser does not support IFRAMEs</iframe>
Now you can drive more business to your site when providing support. Just another way to make GoToManage YOUR remote support tool!
Enjoy,
-Tem
|
|
How to...setup a user with lower permissions
Below are the 3 methods to set up a user for the GoToManage Crawler to use. This article will focus on method #2.
1) Use a user in the ‘Domain Admins’ group.
2) Create a group that is in the local ‘Administrators’ group on each PC.
3) Set up WMI permissions so that a regular user or group will work (see later blog article).
Normally we suggest method #1 because it’s easiest: “domain admins” are by default included in each computer’s local “Administrators” group, which is normally required for WMI and automation features etc.
Method #2 will set up a user that is in this local administrators group but has fewer permissions than a domain admin.
Most people would agree that, relatively speaking, this grants MUCH less permission (i.e. risk) than “domain admins” membership.
For example, domain admins can do almost anything, including create/delete users and change passwords, whereas a local admin ONLY has permissions on each computer.
INSTRUCTIONS:
- Create a group in your domain called something like “Local-admins” and put the crawler user (e.g. G2mg-crawl) into this group.
- Use Group Policy Management Editor (usually installed by default on all your domain controllers) to edit your “Default Domain Policy”.
- The setting is in this branch: Computer Configuration > Policies > Windows Settings > Security Settings > Restricted Groups
- Right-click “Restricted Groups” > Add Group, and type in or browse to “mydomain\local-admins”.
- Double-click the group “mydomain\local-admins”, click “Add” next to “This group is a member of” at the bottom, and type in “Administrators”.
When your computers receive this policy (automatically, usually within 90 minutes, or at the next reboot), you should see the new group “Local-admins” in the “Administrators” group on each PC.
A screenshot is coming as soon as I can get it working. Sorry.
|
|
Video How to...alerts and notifications
I show how to create alerts and notifications with 3 examples:
1) Disk full on windows computer
2) EventID on Windows computer
3) Use “Cloud Crawler” to alert when a branch/customer office internet connection is down.
Click here to see the video
I hope you like it!
P.S. Here is the PQL query I used for the eventID alert:
source_type::win_event_log host_name::2008r2.wowie.us EventCode=65
|
|
Video How to...adding users
This video shows how to add users for GoToManage Remote Support and/or Monitoring.
Please note that this video shows 200 licenses for Remote Support; your screen will show the number of licenses you have actually purchased.
Click here to see the video
I hope you like it!
|
|
Hiding Services in the Server Health app
In the “Server Health” app the status of various services as determined by the “Server Health Plugin” are displayed. Typically the Paglo Crawler will not probe a service, and thus will not report any status for it, if it knows that that device does not offer that service.
However, some services, like DNS, are checked using UDP. This means Paglo cannot differentiate between a device that offers DNS but whose DNS is currently not working and a device that simply does not offer DNS. In these cases the ‘Server Health’ app will show DNS as down for every device that the ‘Service Health Check’ plugin is run against. You can tell the ‘Server Health’ app not to consider that service in the list of services that matter for a device, but what if you simply do not want to see it at all?
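To see why UDP leaves the crawler guessing, consider what a DNS probe can observe. This Python sketch sends a minimal DNS query (RFC 1035 wire format) and counts any reply carrying the same transaction ID as success, even an error reply. It is an illustration, not the plugin’s code, and the queried name `example.com` is an arbitrary choice. Look at the `False` branch: a timeout looks the same whether the device runs no DNS server at all or runs one that is broken.

```python
import os
import socket
import struct


def build_dns_query(name: str, txid: int) -> bytes:
    """Minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def dns_answers(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """True if any reply with our transaction ID comes back, whatever its content."""
    txid = int.from_bytes(os.urandom(2), "big")
    query = build_dns_query("example.com", txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (host, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # timeout or ICMP "port unreachable": cause unknown
            return False
    return len(data) >= 2 and int.from_bytes(data[:2], "big") == txid
```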
As of today the ‘Server Health’ app offers a way to do this. In any list of services, whether by device or by service, a new checkbox appears with the label “Hide this service.”

When you ‘hide’ a service on a specific device, it cannot also be selected for alerting or for showing in the status for that device; hiding a service that has either of those checkboxes selected will uncheck them before hiding the service. The service is hidden immediately. If the service previously contributed to the status of the device, that status will not update until the whole page is refreshed.
At the bottom of the list of services is a link that shows or hides the currently hidden services:

This feature should help you make the ‘Server Health’ app more useful by eliminating views of data that just are not relevant for specific devices and services.
|
|
Using Paglo Alerts to Notify You of Failing Disks
One of the cool things about working at Paglo is being able to use the product you are developing to help detect problems before they become serious failures on your own systems. This morning I had the experience of Paglo helping me detect a failing disk drive on one of my systems.
How did Paglo help here? Well, we have all of the system logs from these machines captured by Paglo’s Log system. You can see how to setup Paglo to capture logs here: Configure log collection…
NOTE: To access that page you need a Paglo account. In brief: for Windows machines, logs are collected automatically once a day, up to the free limit of 10 MB per day. For Unix machines you need to set up syslog-ng or rsyslog to send logs securely to your company’s Paglo account.
Once you are collecting logs, there are lots of things you can do to monitor and inspect the goings-on of your systems. In this blog post we are going to focus on log messages related to disk or kernel issues. We want to look for log messages that contain the words 'kernel' and 'failure', or 'root' and 'ZFS'. This may capture more than we want, but we can always refine the query later if it triggers too often.
In Paglo log search you use parentheses, i.e. ‘(’ and ‘)’, to group search clauses, and boolean words such as AND and OR to join them. This makes our search term: “(kernel failure -user) OR (root ZFS)”.
To tell Paglo that you want an OR clause, rather than a search for the word ‘or’, you must use an upper case “OR”. We have added “-user” to our search to tell Paglo to exclude log messages that contain the word “user”. Log searches are case insensitive.
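As a rough model of how that search reads, this Python sketch evaluates “(kernel failure -user) OR (root ZFS)” against individual log lines. It assumes whole-word, case-insensitive matching with an implicit AND between the words in a group; Paglo’s real tokenizer may treat punctuation differently, and the sample log lines are invented.

```python
import re


def words(message: str) -> set:
    """Lower-cased word tokens, so matching is case-insensitive."""
    return set(re.findall(r"\w+", message.lower()))


def group_matches(tokens: set, include=(), exclude=()) -> bool:
    """Every `include` word present (implicit AND) and no `exclude` word present."""
    return all(w in tokens for w in include) and not any(w in tokens for w in exclude)


def matches(message: str) -> bool:
    """Hand translation of: (kernel failure -user) OR (root ZFS)"""
    tokens = words(message)
    return (group_matches(tokens, include=("kernel", "failure"), exclude=("user",))
            or group_matches(tokens, include=("root", "zfs")))


# Invented log lines, for illustration only
logs = [
    "kernel: sd 2:0:0:0: [sdb] Unrecovered read error - auto reallocate failure",
    "kernel: failure reported by user space helper",
    "root: ZFS: vdev state changed, pool tank is DEGRADED",
    "sshd[912]: Accepted publickey for root from 10.0.0.5",
]
for line in logs:
    print(matches(line), line)
```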
In the case of our machine with a failing disk drive this returns the results:

Turning this into an alert that will notify me via email whenever any matching log messages occur is pretty simple. Go to the “Alerts” app and click the “Create a new alert” link. Then fill in the ‘Generate alert when’ section with our search term, being sure to select the ‘Log search’ type:

I have already created this alert and shared it with the Paglo community. You can find it at: Kernel / ZFS failures
Be sure to add some destinations to the alert if you use it so that Paglo has somewhere to send alert notifications to.
Although this does not remove the sinking feeling you get when you need to replace a failed hard drive, it should at least tell you that a drive is beginning to fail before the whole system goes south.
|
|
Detecting Problems with your Printers via Paglo Alerts and the SNMP Get Crawler Plugin
One of the great tools of network and IT management is SNMP. Many devices purchased today support some set of SNMP MIBs. The Paglo Crawler has a plugin that can capture specific SNMP OIDs from any device it scans. Coupled with a Paglo Crawler schedule and a Paglo Alert, this lets you be notified of many error and warning conditions on devices on your network.
In this post we show you how to be notified when there is an exception with a network attached printer that supports SNMP. (This is just one example of how SNMP and Paglo can work for you.)
First we need to configure the Paglo Crawler’s SNMP Get Plugin to query the parts of the SNMP tree that contain the information we care about. We want to track two additional OIDs: HOST-RESOURCES-MIB::hrDeviceStatus and HOST-RESOURCES-MIB::hrPrinterDetectedErrorState. The first tells us whether the printer is behaving normally or something is wrong. The second can tell us what the printer thinks is wrong.
Click the ‘Plugins’ tab for your Paglo Crawler in the ‘Crawlers’ app in your Paglo account and you will see “SNMP Get Plugin” in the list of available plugins. When you select this plugin, make sure that Enabled is checked. To monitor the two OIDs mentioned above you need to add them: clicking the Custom… button brings up a dialog where you can enter the OIDs one at a time.
Currently you need to use the numerical representation of the OIDs. For HOST-RESOURCES-MIB::hrPrinterDetectedErrorState this is “1.3.6.1.2.1.25.3.5.1.2.1” and for HOST-RESOURCES-MIB::hrDeviceStatus it is “1.3.6.1.2.1.25.3.2.1.5.1”. We also collect the HOST-RESOURCES-MIB::hrPrinterStatus OID to differentiate printers from other types of devices. Its numerical representation is “1.3.6.1.2.1.25.3.5.1.1.1”.

The SNMP Get Plugin is a ‘device’ plugin which means it will be run at least once a day automatically against all hosts that the Paglo Crawler discovers. For monitoring a service for problems this is not often enough. So we are going to create a schedule that will run this plugin against our printers every 5 minutes. We also throw in the Service Health Check plugin because it is inexpensive (in terms of resource utilization) to run and can tell you other valuable things about your printer like “is it available?”

Now that the Paglo Crawler is checking up on our printers every 5 minutes, we can construct an alert to tell us when a printer is unhappy.
This involves delving a little into how Paglo stores data retrieved via the SNMP Get Plugin. Paglo uses a sub-tree named ‘snmp’ rooted at every device, and every level of the OID becomes a new level in Paglo. Thus, the hrDeviceStatus OID’s value is stored at /network/device/snmp/oid_1/oid_3/oid_6/oid_1/oid_2/oid_1/oid_25/oid_3/oid_2/oid_1/oid_5/oid_1 and will be ‘2’ if the device is running normally, ‘3’ if it is in some unspecified warning state, and ‘5’ if it is down, i.e. not operating due to some error.
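The path layout is mechanical, so it is easy to compute. This Python sketch converts a dotted OID into the `oid_N` path under the ‘snmp’ sub-tree and includes the hrDeviceStatus values defined in HOST-RESOURCES-MIB (which also defines 1 = unknown and 4 = testing). The trailing `.1` in these OIDs selects the row with hrDeviceIndex 1; we assume, as the OIDs above do, that this row is the printer on your devices, which is worth confirming with an SNMP walk.

```python
# hrDeviceStatus values from HOST-RESOURCES-MIB
HR_DEVICE_STATUS = {1: "unknown", 2: "running", 3: "warning", 4: "testing", 5: "down"}

HR_DEVICE_STATUS_OID = "1.3.6.1.2.1.25.3.2.1.5.1"  # hrDeviceStatus, hrDeviceIndex 1


def oid_to_paglo_path(oid: str, root: str = "/network/device/snmp") -> str:
    """Map each OID component N to a path level named oid_N under the snmp sub-tree."""
    return root + "".join(f"/oid_{part}" for part in oid.strip(".").split("."))


def describe_status(value: str) -> str:
    """Turn a stored value such as '5' into its MIB name."""
    return HR_DEVICE_STATUS.get(int(value), "unrecognized")


print(oid_to_paglo_path(HR_DEVICE_STATUS_OID))
print(describe_status("5"))
```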
The heart of a Paglo Alert is a PQL query. (Go to the end of this post if you just want to click on a link and have the Alert created for you.) We can turn any PQL query that returns a result when an error condition is true into an alert. So what we need is a query that will tell us the hrPrinterStatus and hrPrinterDetectedErrorState when any printer is down. For a device to be considered a printer it must have the hrPrinterStatus OID. For it to be having a problem, hrDeviceStatus must be ‘5’.
In the Paglo Alerts app we create an alert and use the following PQL query:
select nvl(../../../../../../../../../system/name, ../../../../../../../../../system/dns_name)||'('||first(../../../../../../../../../interface/inet/ip_address)||')' as "Printer",
oid_2/oid_1/oid_5/oid_1 as "Device Status",
oid_5/oid_1/oid_1/oid_1 as "Printer Status",
oid_5/oid_1/oid_2/oid_1 as "Detected Error State"
from /network/device/snmp/oid_1/oid_3/oid_6/oid_1/oid_2/oid_1/oid_25/oid_3
where oid_2/oid_1/oid_5/oid_1 = '5' and oid_5/oid_1/oid_2/oid_1 != '0x00'
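The “Detected Error State” column is a bit string, so a value other than ‘0x00’ can encode several conditions at once. This Python sketch decodes it using the bit names from HOST-RESOURCES-MIB, where bit 0 is the most significant bit of the first octet. It assumes Paglo returns the value as a hex string such as ‘0x06’, which we infer from the `!= '0x00'` test in the query above.

```python
# hrPrinterDetectedErrorState bit names, in bit order (HOST-RESOURCES-MIB).
# Bit 0 is the most significant bit of the first octet.
ERROR_BITS = [
    "lowPaper", "noPaper", "lowToner", "noToner",
    "doorOpen", "jammed", "offline", "serviceRequested",
    "inputTrayMissing", "outputTrayMissing", "markerSupplyMissing",
    "outputNearFull", "outputFull", "inputTrayEmpty", "overduePreventMaint",
]


def decode_error_state(value: str) -> list:
    """Decode a hex string such as '0x06' or '0x0080' into condition names."""
    hex_digits = value[2:] if value.lower().startswith("0x") else value
    data = bytes.fromhex(hex_digits)
    found = []
    for bit, name in enumerate(ERROR_BITS):
        byte_index, offset = divmod(bit, 8)
        if byte_index < len(data) and data[byte_index] & (0x80 >> offset):
            found.append(name)
    return found


print(decode_error_state("0x00"))  # []
print(decode_error_state("0x06"))  # ['jammed', 'offline']
```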

Here is what the results of the above PQL would look like if a printer was offline:

Now you will be alerted whenever any of your Paglo monitored printers is having problems.
One of the nice things about such PQL queries is that you can easily view the history of any of the results. If you click on a label, such as ‘Device Status’ in the above query results you will get a popup menu. From this select History and you will see the history of this value over time as a chart, if the value is numeric.
This alert has already been created as a public Share-It in Paglo’s community: Printer Having Problems .
|
|
The Paglo Data Center Hosted Cloud Based Crawler
You can now use Paglo’s data center infrastructure to remotely monitor your Web site and applications running on Cloud-based platforms like Amazon.
Last week we introduced a new alert that would monitor the health of your network connection. The key to the ability of this new alert was that it was being run out of the Paglo Data Center via our Cloud Based Paglo Crawler.
At the time there was no easy way for a user to create a Cloud Based Crawler other than creating an instance of the ‘Internet is Down’ alert. Now we have added to the ‘Crawlers’ app the ability to create a Cloud Based Crawler directly. Each company index may have only one Cloud Based Crawler.

When you go to the ‘Crawlers’ app, if your company does not yet have a Cloud Based Crawler, a link will show up in the upper right-hand corner of the page. Clicking it creates a Cloud Based Crawler for your company. It may take a couple of minutes for the Cloud Based Crawler to be fully instantiated and running; the creation process is complete when the ‘Schedules’ tab shows up.

A Cloud Based Paglo Crawler can do a lot more than simply probe the health of your network connection. Using the Service Health Check plugin you can monitor the reliability of a number of internet services from our data center. This is useful if you want to know whether your externally hosted web site is up and running and what sort of latency it may be experiencing.
The services probed by the Service Health Check plugin are:
- ICMP Echo (‘ping’)
- FTP (port 21, TCP)
- Telnet (port 23, TCP)
- SMTP (port 25, TCP)
- DNS (port 53, UDP)
- HTTP (port 80, Success if “GET /” returns a 2XX or 3XX response.)
- POP3 (port 110, TCP)
- IMAP (port 143, TCP)
- LDAP (port 389, TCP)
- HTTPS (port 443, TCP)
- SQLServer (port 1433, TCP)
- Oracle (port 1521, TCP)
- MySQL (port 3306, TCP)
- eHub (port 9013, TCP)
The Service Health Check plugin will return success for a specific service if we can connect to it. The exceptions are: DNS in which we look for any sort of DNS response to a generic query, and HTTP must return a 2XX or 3XX response from a ‘GET /’ request.
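These success rules are simple enough to reproduce when you want to check a service by hand. The Python sketch below is not the plugin’s code: it treats a service as up if a TCP connection opens, and HTTP as up only for a 2XX or 3XX answer to `GET /` (redirects are not followed). ICMP echo is left out because it needs raw sockets, which usually require admin rights, and DNS needs a UDP query instead. The port table simply copies the list above.

```python
import http.client
import socket

# Ports copied from the list above (ICMP and DNS are handled differently)
SERVICE_PORTS = {
    "ftp": 21, "telnet": 23, "smtp": 25, "pop3": 110, "imap": 143, "ldap": 389,
    "https": 443, "sqlserver": 1433, "oracle": 1521, "mysql": 3306, "ehub": 9013,
}


def tcp_service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Success means a TCP connection could be opened; nothing is sent."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_ok(status: int) -> bool:
    """HTTP counts as up only for a 2XX or 3XX response."""
    return 200 <= status < 400


def http_service_up(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Send GET / and apply the 2XX/3XX rule."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return http_ok(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


# Usage (hypothetical host):
#   {name: tcp_service_up("www.example.com", p) for name, p in SERVICE_PORTS.items()}
```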

A number of Paglo users have been interested in our ability to gather Amazon CloudWatch data, but have not wanted to run a Paglo Crawler on their machines. They can now use the AWS CloudWatch plugin in the Cloud Based Crawler to monitor their Amazon EC2 instances.

How do you delete a Cloud Based Crawler? If you go to the ‘Advanced’ tab for the Cloud Based Crawler and click on the ‘Delete’ button it will not only schedule the entries in the PQL database that represent the crawler to be deleted but also delete the associated Cloud Based Crawler instance. NOTE: When you do this, unlike with a Paglo Crawler running on one of your machines, the Cloud Based Crawler is itself also deleted. This means that all schedules, plugin configurations and historical information directly related to this Cloud Based Crawler are also deleted. It will not show up again after you delete it.
NOTE: The Cloud Based Crawler is load balanced across a number of our servers. This means that once you enable it and turn on the ‘Service Health Check’ plugin you are going to be probed from our data center’s IP addresses. Currently the probing will come from crawler01.paglo.com and crawler02.paglo.com although this may grow in the future.
|
|
The new 'Internet is down' alert and the Paglo Cloud Based Crawler
We have just released a terrific new feature for testing whether your WAN link (or public facing Web site) is up or down. (And it can also be used to monitor your Amazon EC2 instances, but that is a story for another day.)
This functionality gives us a platform to do more sophisticated infrastructure testing from the cloud. We are going to describe how to get started with the WAN link up/down test and Alert feature here and will talk more about the other capabilities in the near future.
To create a Paglo alert to notify you when your internet connection has gone down you go to the ‘My Paglo home page’ which you get to by clicking on the ‘Home’ tab in the left navigation bar.
In the box titled ‘Created Alerts’, the ‘Internet is down’ alert now has a text field next to it when you enable it. Into this text field you need to put the external IP address of your network’s router, so that Paglo knows what specifically to check for connectivity. By default it is filled in with the IP address that Paglo sees you coming from.

How does this work?
Paglo now has a Cloud Based Crawler that runs out of our data center. It is just like the Paglo Crawler you have installed on your local network except that it does not do network scans and discovery. All it does is run a subset of the Paglo Crawler plugins. It runs the Service Health Check plugin and the Amazon CloudWatch plugin.
When you create the Internet is down alert via the My Paglo ‘Create Alerts’ page we will create a Cloud Based Crawler for you, configure it to probe the IP address you gave when you created the alert using the Service Health Check plugin, and set up an alert to notify you when that IP address is no longer reachable from our data center.
The device with the IP address you entered will also show up in the Server Health application in its default configuration. This is because we will identify that device as a router.
So now you can have both health and performance monitoring behind the firewall and testing from the Internet to your web facing infrastructure. Go at it and let us know what you think of this extension to our IT Management SaaS platform.
|
|
The Power of Correlating Data with PQL Queries
We had observed a problem, using the Service Health app, where a mail server was not responding to SMTP almost more often than it was up! Although this would not lose mail, it could seriously delay messages being sent through the service, so we had to figure out why this was happening. Looking through the data for the company, we started to see a pattern. If only we had a way to correlate the different bits of historical data we had gathered.

The red bars show where the service was down. A short red bar indicates that it was only down for a part of that time interval.
One of the big advantages of Paglo is that we gather so many different kinds of data. Since it is all stored in a single tree-structured database for each company, you can correlate different pieces of data. Our SQL-like query language PQL allows you to extract disparate pieces of information and plot them together in the same chart.
For example, on unix hosts we can gather the 1 minute, 5 minute, and 15 minute load averages over time. We also gather the status of various network services offered by a host. You can learn more about the Server Health app in our earlier blog post Server Health, the Service Health Crawler Plugin, Alerts, and You.
A service that we monitor is SMTP. One of the most common Mail Transfer Agent programs on unix is sendmail. sendmail has an anti-denial-of-service feature: if the machine it is running on is too busy, it will not answer requests, effectively appearing to be down to other hosts. So we should be able to correlate time periods when SMTP on the host is down with its load average. Here we unleash the power of PQL:
select service_id||' on '||../../../../../system/dns_name as caption,
mean(constant(success))*10 as success,
../../../../../system/stats/loadavg_1min
from /network/device[system/dns_name='<your>']/apps/com/paglo/service_health/service
where service_id='smtp'
history from '1 day ago' to 'now'
We are not going to go in to too much detail on the specifics of the PQL statement but in short we are selecting a service id combined with the host’s DNS name, the ‘success’ state of that service as monitored by the Service Health Plugin in the crawler, and the 1 minute load average of that host. We want these values going back in time from now to one day ago.
This query gives us a large table of values. But if we click on the link to convert this to a chart, for a host that is frequently very busy we may see a chart something like this:

The blue line shows the 1 minute load average of the host over the past day. The red line, when it is at ‘10’ represents when the host is accepting mail via SMTP. Whenever the red line dips down to 0, the host is refusing mail.
You can quickly see that whenever the load average is below 10, it accepts mail and when it is above 10 it refuses mail. In many other charting and reporting systems to gather the right data and then generate such a chart would take a fair amount of custom code. With Paglo’s rich data gathering and PQL it only took us a few minutes to come up with a query to chart the correlated data.
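If you export the two columns from that query, the same relationship can be checked numerically. This Python sketch uses made-up samples of (1-minute load average, 1 if SMTP answered else 0) and reports SMTP availability on either side of a load threshold, plus a Pearson correlation coefficient (with a 0/1 variable this is the point-biserial correlation). The threshold of 10 is taken from the chart described above; the real cut-off depends on how sendmail is configured on your host.

```python
from statistics import mean


def availability_by_load(samples, threshold):
    """Share of samples where SMTP answered, below vs. at/above the load threshold."""
    below = [up for load, up in samples if load < threshold]
    above = [up for load, up in samples if load >= threshold]
    return (mean(below) if below else None, mean(above) if above else None)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Made-up samples: (1-minute load average, 1 if SMTP answered else 0)
samples = [(2.1, 1), (6.5, 1), (9.8, 1), (10.4, 0), (14.0, 0), (8.7, 1), (12.2, 0), (4.3, 1)]

loads = [load for load, _ in samples]
ups = [up for _, up in samples]
print(availability_by_load(samples, threshold=10))
print(round(pearson(loads, ups), 2))
```

The threshold split is usually the more telling number here, since an on/off refusal rule shows up as a clean 1-versus-0 break rather than a gradual trend.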
For more information on PQL and how to use it consult our PQL documentation
|
|
Server Health, the Service Health Crawler Plugin, Alerts, and You
We frequently get asked how you can monitor uptime of various machines and services that they offer. Immediately following that people frequently want to be notified when some set of monitored services go down.
Paglo can do both of these things for you. There are four pieces involved in this: The Service Health Paglo Crawler Plugin, the Inventory app, the Server Health app, and the Alerts app.
The Service Health Paglo Crawler Plugin
The Service Health Plugin scans a variety of network services on the hosts that it is run against. It records the success or failure of these probes to Paglo, along with the amount of time each probe took to complete. Currently the list of services that are probed is hard-coded into the Service Health Plugin itself.
The services it probes are ICMP echo (aka ping), HTTP on port 80, HTTPS on port 443, FTP on port 21, telnet, SMTP, POP3, IMAP, LDAP, SQLServer on port 1433, Oracle on port 1521, MySQL on port 3306, and DNS queries via UDP.
By default, as is the same for any other device plugin that the Paglo Crawler runs, this will be run approximately once per day against every device that has been discovered. This is not very useful for actually monitoring services on specific machines. This is where setting up a schedule for the crawler becomes necessary.
On the Paglo site go to the Crawlers app by clicking on the Crawlers link in the left hand column. If you have only one crawler this will take you to that crawler’s configuration. If you have more than one you will need to click on the box for the crawler you wish to configure. The configuration page will start on the Scheduler tab.

Creating a Service Check Schedule
In the Scheduler tab click on the Create New Schedule link. This will present you with a form for your new schedule. You should give it some distinctive name so that you will know what this schedule is doing. For service health check we recommend that you set the Interval to at least 5 minutes. This means for the selected hosts the plugins in this schedule will be run every 5 minutes.
Next you will select the plugins you want this schedule to run. By clicking on the Select… button a dialog will come up with the list of plugins available to run in this schedule. In this case we only want the Service Health Plugin.

The final thing we need is to select which hosts it will be run against. Press the Select… button in the Hosts section and another dialog will come up.

NOTE: This list contains the hosts that the crawler has found. You can click the Custom… button and enter the IP address of any specific host you want.
After you have selected all of the hosts you wish to monitor the services on your form should look something like this:

Do not forget to click on the Save Changes button at the bottom of the page!
Making a Group of the Monitored Hosts
One of the features of Paglo that we are expanding is the ability to create groups of hosts either using various search criteria or a static set of hosts. The Server Health app uses groups to determine what hosts to display. Creating groups is done in the Inventory app.
Once you have entered the Inventory app click on the Configure link in the upper right hand corner of the page, below the Paglo Search bar. On the ‘Configure the Inventory Application’ page, select the tab labeled Groups.
In the Groups tab click the Add Group… button. This will bring up a dialog that lets you create a group. Since we are making a group to track the hosts that we want to view in the Server Health app, we will call this group “Service Health Check”.

Now that you have created the group, we can add hosts to it. Currently you do this by navigating in the Inventory app to lists of hosts. Here you can see we are looking at our Network appliances and have selected our three Netgear NAS boxes, which we add to the Service Health Check group by choosing it from the Group: Add to Group menu in the top right of the page:

Once you have added all the hosts you wish to monitor in the Server Health app you can browse to the group you created.

The Server Health app
The Server Health app is where you select the services you want to be alerted about when they stop responding. We created a group in the Inventory app so that we could tell the Server Health app to use that group.
In the Server Health app, click the Configure link in the upper right-hand corner of the page. By default the Server Health app shows the list of devices that Paglo has classified as a server, a router, or a switch. In this case, however, we want it to show the devices defined in our Service Health Check group. Select the Use Group item and, in the pull-down menu, select the Service Health Check group.

You can then go back to the main page of the Server Health app by clicking either on the app in the left menu, or on the Servers link at the top of the page. On the left side of the page there will be an expandable tree widget. There will be a node for every device you have specified in the Service Health Check group. By clicking on a device name you can see the report for all the services on that device on the right side of the page.

The report shows an uptime bar for each service, with the probed latency of that service below it. The longer a service is down, the more red its uptime bar shows. Whatever the time scale of the uptime bar, any downtime shows up in red, so outages are easy to spot.
Here you can select which services you care about for this device and, additionally, which of them should trigger Paglo Alerts.
Going through each device this way and selecting the Device uptime service can be tedious, so instead you can expand the Services node in the tree and then expand the ping node. Clicking ping makes the right side of the Server Health app display the results of the ping service check for all devices in the Service Health Check group. You can now go through this list and enable the Alert when down checkbox for every device you want to receive an alert about when it stops responding to pings.

Receiving Service Health Check Alerts
The final piece of our tour brings us to the app that started this whole quest: the Alerts app. Select the Alerts app from the list of apps in the left-hand menu of the Paglo site. When you enable Alert when down for any service in the Server Health app, the Service Down alert is automatically created in the Alerts app.

You can now select the Service Down alert. It is important that you do not modify the Name or the PQL statement inside the Generate alert when box. These are tuned specifically to watch for services being down in the Server Health app.
You can now add your email address or Twitter account so that messages are sent to you when any service you selected with Alert when down in the Server Health app goes down. Here I have added my email address to be notified when any service I am monitoring goes down, every hour while it is still down, and once more when the service comes back up.

This one alert will fire when any service listed in the Server Health app is down, letting you centralize your alerts for services instead of having to create a new alert for every one of dozens of services you may be monitoring.
That concludes our tour: the Service Health Check Paglo Crawler Plugin, setting up a schedule in the Crawlers app, collecting a set of hosts into a group in the Inventory app, using this group in the Server Health app to select which services to be alerted on, and finally adding a notification address in the Alerts app to send those alerts to.
|
|
Using the Paglo bulkSubmit API from Python to Watch Log Files
We frequently talk about the Paglo API but do not have much in the way of example code showing how to use it. This post is more an exploration of the Paglo submitFile API than a way to submit logs to be indexed from Python. For a simpler way to submit logs from Unix machines, please consult Searching Ruby on Rails Production Log Files with Paglo.
Here I am going to cover how to use the bulkSubmit API from Python to capture log files on Unix-based systems and submit them to Paglo. This article assumes some basic familiarity with Python and Unix administration concepts. Although the Python implementation of the bulkSubmit API itself works on any operating system that Python runs on, these example scripts were written specifically for a Unix environment; they run under FreeBSD, Mac OS X, and Linux (Ubuntu). Also note that these examples work with Python versions 2.5 and 2.6.
You can find all of the code referred to here in our public Subversion repository at: https://svn.paglo.com/paglo_open_source/paglo_for_python/trunk
You will need to either install the paglo Python module or make sure that it is included in your PYTHONPATH environment variable before you can make use of the scripts and APIs that it provides. A setup.py file is provided, so installing the paglo Python module is a simple matter of running
python setup.py install
Consult Python’s documentation on the module search path for more information.
The basic problem we want to solve in this post is simple:
Watch log files in an ongoing fashion, and through log file rotation. Submit their contents up to Paglo as new log messages are appended to these files.
I split the task of monitoring the log files from the task of submitting data to Paglo. This is done by the two scripts in the scripts/ directory of our Subversion repository: scripts/paglo_submitter.py and scripts/paglo_logwatcher.py.
Paglo Submitter
Let us look at the script that watches a directory for submission files and submits the data to Paglo. This script is designed to submit two kinds of data: PQL ‘merge’ statements and log file data. It does this using the submitFile API that Paglo provides, the same API the Paglo Crawler uses when it submits the data it has gathered. For the other APIs Paglo offers, please check our documentation.
The submitFile API has two required parts: the processor parameter, which indicates which subsystem is going to process the submitted data, and a file attachment containing the data being submitted. This API is good for bulk submitting data to Paglo when the client needs no further interaction once the data is submitted.
In keeping with this ‘fire and forget’ design, it can submit a whole file of data at once, and the file can be compressed to save bandwidth, which is very useful for quickly growing log files.
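Here is a minimal Python 3 sketch that uses only the standard `gzip` module to compress a log fragment in memory before handing it to submitFile. The post does not say which compression format submitFile expects, or whether an extra parameter must announce it, so treat gzip and the helper name `compress_fragment` as assumptions.

```python
import gzip
import io

def compress_fragment(lines):
    """Gzip a sequence of text log lines into an in-memory binary file.

    The returned file object is rewound to the start of the data, matching the
    post's description of submitFile accepting an open file handle (the gzip
    format itself is an assumption).
    """
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for line in lines:
            gz.write(line.encode("utf-8"))
    buf.seek(0)
    return buf
```

Log lines repeat a lot of structure (timestamps, host names, program names), so they usually compress well. That is why compression pays off for quickly growing logs.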
There are currently only two supported ‘processors’ for bulk submission: bulk_store and log.
The bulk_store processor is for submitting a series of PQL ‘merge’ statements. This populates the PQL database directly. Please see the merge statement documentation for more information about the PQL ‘merge’ statement. There are no other parameters required when submitting a file of ‘merge’ statements to the bulk_store processor.
The log processor is for submitting log file fragments to Paglo. It requires additional arguments such as source and source_type. These are covered below.
To keep the submitter agnostic about all this, whatever wants to submit data encodes the parameters to pass to submitFile as the first line of the files it drops off. Every 10 seconds, the paglo_submitter.py script looks in a specific directory for files whose names consist only of the digits 0-9, “_”, “-”, and “.”.
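The drop-off convention fits in a few lines of Python 3. The first line holds the submitFile parameters, CGI-encoded. The rest of the file is the payload. The file gets a name matching the submitter's pattern only once it is complete, so a sweep never sees a half-written file. The pattern and the close-then-rename step follow this post. The timestamp-based final name, the temporary name, and the helper names `drop_off` and `is_submission_file` are assumptions for illustration.

```python
import itertools
import os
import re
import time
from urllib.parse import urlencode

# Files the submitter picks up: names made only of digits, "_", "-" and ".".
SUBMIT_NAME = re.compile(r"^[0-9_.\-]+$")

_seq = itertools.count()  # keeps names unique within this process

def is_submission_file(name):
    return SUBMIT_NAME.match(name) is not None

def drop_off(directory, params, payload):
    """Write one submission file and return its final path.

    The working name contains letters, so the submitter ignores the file
    until it is complete and renamed into the pickup pattern.
    """
    n = next(_seq)
    tmp_path = os.path.join(directory, "tmp-%d-%d.part" % (os.getpid(), n))
    with open(tmp_path, "w") as f:
        f.write(urlencode(params) + "\n")  # first line: CGI-encoded parameters
        f.write(payload)                   # remainder: the data to submit
    final_name = "%.6f-%d-%d" % (time.time(), os.getpid(), n)
    final_path = os.path.join(directory, final_name)
    os.rename(tmp_path, final_path)
    return final_path
```

For example, `drop_off("/var/tmp/paglo_submitter", {"processor": "log", "source": "/var/log/system.log"}, data)` writes a file whose first line is `processor=log&source=%2Fvar%2Flog%2Fsystem.log`.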
When the paglo_submitter.py script starts up, after doing various logging and daemon setup, it creates a ‘paglo session object.’ For this we need the Paglo API key for your company’s Paglo Index. To find your API key, go to https://app.paglo.com/user/edit_company and look for the section titled Data Key. By default the paglo_submitter.py script looks for your data key in the file /usr/local/etc/paglo_api_key, which should be protected so that only authorized entities can read its contents.
submitter = PagloSubmitter(session, options.directory, logger = logger)
After it creates the ‘paglo session object’ the paglo_submitter.py will loop forever sweeping the drop off directory for new files.
The sweeping process loops through all the files that match the pattern mentioned above. For each one, it reads the first line of the file and parses it according to CGI parameter encoding rules. It then submits the remainder of the file using the ‘Paglo session object’s’ submitFile method, which takes as required parameters the processor to use and either the name of a file to submit or an open file handle positioned at the beginning of the data to submit. Any other parameters defined in the first line of the file are passed to submitFile as Python keyword arguments.
def submit_file(self, file_name):
    # Open the file, read out the first line to find the parameters
    # to submit to Paglo. (We need to turn each value from a list into
    # a single value since we will never have repeated parameters.)
    #
    f = open(file_name, "r")
    params = cgi.parse_qs(f.readline().strip())
    for key in params.keys():
        params[key] = params[key][0]
    processor = params['processor']
    del params['processor']
    try:
        resp = self.session.submitFile(processor, f, **params)
        os.remove(file_name)
        ...
    finally:
        f.close()
    return
We check for the various kinds of errors that may arise from this submission attempt, remove the file if the submission succeeded, and continue with our loop until we have looked at every file that matches our pattern. Sleep, rinse, and repeat.
This gives us a robust way to submit arbitrary data to Paglo while handing our Paglo Data Key to only one program, which limits its exposure. Make sure that this program has permission to read the files dropped in its directory and to delete them when it is done.
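Here is one sweep pass written in Python 3, combining the steps described above. It matches names against the pattern and parses the CGI-encoded first line with `urllib.parse.parse_qs`, which is where Python 3 keeps the `parse_qs` function the original script takes from the `cgi` module. It passes the rest of the open file to a submit callable and deletes the file only if submission succeeded. The `submit` callable stands in for the session object's submitFile method. Treating any exception as "leave the file for the next sweep" is an assumption, because the real script's error handling is elided above.

```python
import os
import re
from urllib.parse import parse_qs

SUBMIT_NAME = re.compile(r"^[0-9_.\-]+$")

def sweep_once(directory, submit, log=print):
    """Submit every matching file in directory; return the names submitted."""
    done = []
    for name in sorted(os.listdir(directory)):
        if not SUBMIT_NAME.match(name):
            continue
        path = os.path.join(directory, name)
        with open(path, "r") as f:
            # First line: CGI-encoded parameters; keep one value per key.
            params = {k: v[0] for k, v in parse_qs(f.readline().strip()).items()}
            try:
                processor = params.pop("processor")
                submit(processor, f, **params)  # f now points at the payload
            except Exception as exc:
                log("submission of %s failed: %s" % (name, exc))
                continue  # keep the file; the next sweep retries it
        os.remove(path)   # only reached when submission succeeded
        done.append(name)
    return done
```

Looping `sweep_once(...)` followed by `time.sleep(10)` gives the sleep, rinse, and repeat behavior described above.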
For more information about how the Paglo session object’s submitFile method works, you can check out the source.
Paglo Log Watcher
The other half of our dynamic duo is the program that monitors log files. It also knows how to parse a maillog file and how to read /proc/meminfo for memory statistics, which it submits as PQL merge statements. So paglo_logwatcher.py shows usage of both the log and bulk_store processors of the submitFile API.
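The post does not show how the memory statistics are read. Here is one way to parse `/proc/meminfo` on Linux in Python 3. Each line there looks like `MemTotal:        2048000 kB`. The function names are hypothetical, and keeping only the integer value (dropping the `kB` unit) is an assumption.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    On Linux, size fields are in kB and a few counters (e.g. HugePages_Total)
    have no unit; the unit is dropped either way.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

def read_meminfo(path="/proc/meminfo"):
    with open(path) as f:
        return parse_meminfo(f.read())
```

A dict like this is the kind of key/value data the `submit()` snippet further down turns into a PQL merge statement.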
In the main() function of paglo_logwatcher.py, after we set up various daemon properties and logging, we make use of some Paglo-provided utility objects. Both objects are defined in the module paglo.daemon_utils. You can look at the source in our Subversion repository at: https://svn.paglo.com/paglo_open_source/paglo_for_python/trunk/paglo/daemon_utils.py
We create an instance of the paglo.daemon_utils.FileSubmitter to submit a file of data every 60 seconds, if any data has been written. This is used by both the MaillogParser and the MemWatch objects since those are both going to submit PQL merge statements:
bulk_store_submitter = paglo.daemon_utils.FileSubmitter(
    options.directory,
    { 'processor' : 'bulk_store' },
    logger)
bulk_store_submitter.start_auto_submit(60)
With an instance of the paglo.daemon_utils.FileSubmitter class, you specify the parameters to pass to the bulkSubmit API when you create it. You then call the object’s write() method to output whole lines. Either at an automatic interval or when you choose, this file is closed and renamed so that our paglo_submitter.py process will pick it up and submit it to Paglo with the parameters you specified. The next time you call write(), a new file for further submissions is created automatically. All you need to do is provide the data to write. For example, here is the snippet of code that submits a PQL ‘merge’ statement with data gathered from /proc/meminfo:
def submit(self):
    meminfo = self.get_meminfo()
    # Construct the sub-tree for the values in the 'meminfo' node
    # that we have gleaned from the OS.
    #
    meminfo = ("system",
               ("meminfo",
                ([("%s" % k, "%s" % v) for k, v in meminfo.iteritems()])))
    t = paglo.utils.device_merge_prefix(self.intf_info, meminfo)
    self.submitter.write(paglo.tree_builder.render_as_merge(t, datetime.utcnow()) + ";\n")
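To make the FileSubmitter behavior described above concrete, here is a hypothetical, much-simplified stand-in written in Python 3. It is not the code in `paglo/daemon_utils.py`. It only mirrors what the post describes: parameters fixed at construction, `write()` appending to a working file that is created on demand, a `size` attribute, and `submit()` closing the file and renaming it into the submitter's pickup pattern. Automatic periodic submission (`start_auto_submit`) is left out. The class name and the file naming scheme are assumptions.

```python
import itertools
import os
import time
from urllib.parse import urlencode

class SimpleFileSubmitter:
    """Hypothetical, simplified stand-in for paglo.daemon_utils.FileSubmitter."""

    _seq = itertools.count()  # keeps final names unique within this process

    def __init__(self, directory, params):
        self.directory = directory
        self.params = dict(params)  # e.g. {'processor': 'bulk_store'}
        self.size = 0               # characters written since the last submit
        self._f = None
        self._path = None

    def write(self, data):
        if self._f is None:
            # Working name contains letters, so the submitter's sweep ignores it.
            name = "working-%d-%d.part" % (os.getpid(), id(self))
            self._path = os.path.join(self.directory, name)
            self._f = open(self._path, "w")
            self._f.write(urlencode(self.params) + "\n")
        self._f.write(data)
        self.size += len(data)

    def submit(self):
        """Close the working file and rename it for pickup; return the new path."""
        if self._f is None:
            return None  # nothing written since the last submit
        self._f.close()
        name = "%d-%d-%d" % (int(time.time()), os.getpid(), next(self._seq))
        final = os.path.join(self.directory, name)
        os.rename(self._path, final)
        self._f = self._path = None
        self.size = 0
        return final
```

A caller follows the same pattern as the process_line() method below: `write(line)`, then `submit()` once `size` reaches a limit.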
The other Paglo-provided utility object we use is the paglo.daemon_utils.LogWatcher class. Its job is to make it easy to watch a file. Whenever a file being watched by a LogWatcher gets a new line of data, it calls the provided function. For our log files this is the submitter.process_line() method of the LogSubmitter object defined in paglo_logwatcher.py.
submitter = LogSubmitter(options.directory, log_file_name, stype,
                         hostname, logger)
watcher = paglo.daemon_utils.LogWatcher(None,
                                        log_file_name,
                                        submitter.process_line,
                                        logger = logger)
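The post does not show how LogWatcher follows a file through log rotation. One common approach is polling, sketched below in Python 3 under that assumption. On each poll it reads any new complete lines. If the path now points at a different inode (the file was renamed away and recreated) or the file has shrunk (it was truncated), it reopens the file from the start. The class name `PollingTail` is hypothetical. The callback receives each new line, in the same role as `submitter.process_line`.

```python
import os

class PollingTail:
    """Hypothetical poll-based follower for a log file that may be rotated.

    Call poll() periodically; each complete new line is passed to callback.
    """

    def __init__(self, path, callback):
        self.path = path
        self.callback = callback
        self._f = None
        self._ino = None
        self._partial = b""
        self._open(seek_end=True)  # start at the end, like `tail -f`

    def _open(self, seek_end=False):
        try:
            self._f = open(self.path, "rb")
        except FileNotFoundError:
            self._f, self._ino = None, None
            return
        self._ino = os.fstat(self._f.fileno()).st_ino
        if seek_end:
            self._f.seek(0, os.SEEK_END)

    def _drain(self):
        # Hand complete lines to the callback; keep any trailing partial line.
        data = self._partial + self._f.read()
        lines = data.split(b"\n")
        self._partial = lines.pop()
        for line in lines:
            self.callback((line + b"\n").decode("utf-8", "replace"))

    def poll(self):
        if self._f is None:
            self._open()  # file appeared after we started: read from the start
            if self._f is None:
                return
        self._drain()  # finish what was appended to the file we hold open
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return  # rotated away and not recreated yet; keep the old handle
        if st.st_ino != self._ino or st.st_size < self._f.tell():
            # Renamed-and-recreated (new inode) or truncated in place: reopen.
            # A trailing partial line from the old file is dropped here.
            self._f.close()
            self._partial = b""
            self._open()
            if self._f is not None:
                self._drain()
```

Polling keeps the sketch portable across FreeBSD, OS X and Linux. A production watcher might use kqueue or inotify instead.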
The LogSubmitter class creates an instance of the paglo.daemon_utils.FileSubmitter class so that it can submit data to Paglo. The arguments for submitting a file are a bit different from the previous FileSubmitter we created. We specify processor to be ‘log’, and we provide the additional arguments needed by Paglo’s log processing to categorize the log messages we submit: source indicates the name of the file we are submitting log messages for. source_type is used by Paglo to determine how to parse the log messages, and finally host_name is used to classify what host produced this log message. We also set this FileSubmitter to automatically submit the file to the paglo_submitter.py process every minute if there is any data to submit.
self.submitter = paglo.daemon_utils.FileSubmitter(
    directory,
    { 'processor'  : 'log',
      'source'     : log_file,
      'source_type': log_type,
      'host_name'  : hostname },
    logger)
self.submitter.start_auto_submit(60)
The LogSubmitter class defines the process_line() method, which the paglo.daemon_utils.LogWatcher object invokes whenever a watched file has a new line of data.
def process_line(self, line = None, ign = None):
    # Append our newly gotten line to our file.
    #
    self.submitter.write(line)
    # If the file has grown to size_limit bytes or more, submit it to Paglo.
    #
    if self.submitter.size >= self.size_limit:
        self.submitter.submit()
    return
Running the Programs
All that is left is to decide what parameters to run these programs with and set up their environment. For the paglo_submitter.py we need to make sure that an API key file exists, and that a directory for file submissions exists.
The defaults for these are /usr/local/etc/paglo_api_key and /var/tmp/paglo_submitter. Set the permissions on the submission directory so that paglo_submitter.py can delete files from it while other processes can still drop files into it for submission. I use mode 1770, making sure that the directory is owned by the uid that paglo_submitter.py runs as, and that the directory’s group is the group that the paglo_logwatcher.py script runs as.
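The directory setup can also be scripted in Python 3. This sketch creates the drop-off directory and sets mode 1770: the sticky bit plus read/write/execute for owner and group, and nothing for others. It calls `os.chmod` explicitly because `os.makedirs` applies the process umask. Changing owner and group needs privileges (typically root), so `os.chown` is an optional step, and the uid/gid values are placeholders you must supply. The function name is hypothetical.

```python
import os
import stat

def make_dropoff_dir(path="/var/tmp/paglo_submitter", uid=None, gid=None):
    """Create the submission directory with mode 1770; return the final mode."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o1770)  # sticky | rwx owner | rwx group | --- other
    if uid is not None or gid is not None:
        # -1 leaves that id unchanged; needs sufficient privileges.
        os.chown(path, -1 if uid is None else uid, -1 if gid is None else gid)
    return stat.S_IMODE(os.stat(path).st_mode)
```

With the sticky bit set, only a file's owner, the directory's owner (the submitter) or root can delete files in the directory. That is the split of permissions described above.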
With this in place, you can run paglo_submitter.py without any arguments.
The paglo_logwatcher.py script requires at least the list of log files to watch. For example, on my OS X Macs I run the command:
/usr/local/sbin/paglo_logwatcher.py \
--log_files=syslog:/var/log/install.log,syslog:/var/log/secure.log,\
syslog:/var/log/system.log,syslog:/var/log/windowserver.log,\
http_common:/var/log/cups/access_log,http_common:/var/log/cups/error_log,\
generic:/var/log/hdiejectd.log,http_common:/var/log/apache2/access_log,\
http_common:/var/log/apache2/error_log
The log_files parameter is a comma-separated list. Each element in the list is a “log file format” followed by a “log file name”, separated by a ‘:’ character.
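For readers adapting the script, here is how such a value can be split into (format, file name) pairs in Python 3. The function name is hypothetical and not taken from paglo_logwatcher.py. Splitting each element on the first ':' only is a choice made here so that a path containing ':' survives intact.

```python
def parse_log_files(value):
    """Split 'fmt:path,fmt:path,...' into a list of (fmt, path) tuples."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        fmt, sep, path = item.partition(":")  # split on the first ':' only
        if not sep or not fmt or not path:
            raise ValueError("expected format:path, got %r" % item)
        pairs.append((fmt, path))
    return pairs
```

For instance, `parse_log_files("syslog:/var/log/system.log,http_common:/var/log/cups/access_log")` returns `[("syslog", "/var/log/system.log"), ("http_common", "/var/log/cups/access_log")]`.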
If you watch the contents of /var/tmp/paglo_submitter/, you will see files appear as the logs you are monitoring change. After a minute, you will see them renamed to the pattern that paglo_submitter.py expects, and shortly after that they vanish, indicating they have been sent to Paglo for indexing.
The power of indexing these log files lets us do things like see when ‘Time Machine’ recently ran using the query “backupd completed”:

Stay tuned for an upcoming article on how to use the API methods that Paglo provides for your use.
|
|
Searching Ruby on Rails Production Log Files with Paglo
If you are like me, you tend to ignore the logs from your Ruby on Rails applications until something goes wrong. Then, once you find the error message, you want to answer questions like: How frequently has this error happened? Were any customers affected? What actions were being done just before the error? Does the error only happen at a certain time of day? These questions and more are exactly what searching your log files is good for. Searching a few thousand lines of log files with grep is fast enough, but as soon as you want to look at more than a few hours of logs from a production server, grep gets pretty tedious. And grep can’t help you analyze trends.
Paglo uses inverted index technology (the same technology Google uses to search the Internet) to search through log files blazingly fast. The really nice part is that Paglo is web based/SaaS, so there is no complex software to install.
These instructions are designed for Unix and OS X hosts. They will not work under Windows. I am considering writing a pure-ruby interface to Paglo log search which would work under Windows. Let me know if this would be useful to you.
First, install the SyslogLogger gem:
sudo gem install SyslogLogger
Configure the syslog logger to be used when the application runs in production mode by adding the following lines to RAILS_ROOT/config/environments/production.rb
require 'syslog_logger'
logger = SyslogLogger.new("application_name")
logger.level = Logger::INFO
config.logger = logger
You can customize the application_name to the name of your application. This name will be attached to each log message and will be used by syslog-ng and Paglo to separate the messages from other applications.
Now you are ready to configure syslog-ng. You can do a similar thing with RSyslog; instructions for configuring RSyslog are available within Paglo at https://app.paglo.com/log_manage/configure. I much prefer working with syslog-ng because its configuration files are logical and easy to understand. RSyslog tries to maintain backwards compatibility with the original syslog, and its configuration files are cryptic.
You need to be running syslog-ng version 3.0 or newer. You can check what version of syslog-ng you are running using the command:
syslog-ng --version
Syslog-ng is completely controlled by the contents of the syslog-ng.conf file. On my system it is /etc/syslog-ng/syslog-ng.conf, but I have also seen it at /usr/local/etc/syslog-ng/syslog-ng.conf. We need to add some configuration to this file so that syslog-ng knows what to do with the log messages received from the SyslogLogger we just added to our Rails application. SyslogLogger sends its messages to the default syslog destination, which on Linux is /dev/log. In my existing configuration file this source shows up near the top of the file and is called s_all:
# all known message sources
source s_all {
# message generated by Syslog-NG
internal();
# standard Linux log source (this is the default place for the syslog()
# function to send logs to)
unix-stream("/dev/log");
# messages from the kernel
file("/proc/kmsg" log_prefix("kernel: "));
# use the following line if you want to receive remote UDP logging messages
# (this is equivalent to the "-r" syslogd flag)
# udp();
};
We want to separate out the log messages from our Rails application so we define a filter based on the program name.
filter rails_app {
program("application_name");
};
Now we can configure the destination which will stream the data to Paglo.
destination log_to_paglo {
syslog("crawler.paglo.com"
port(6514)
transport("tls")
tls( ca_dir("/etc/syslog-ng/ca.d")
key_file("/etc/syslog-ng/cert.d/paglo.key")
cert_file("/etc/syslog-ng/cert.d/paglo.crt")
peer_verify(required-trusted))
);
};
This configures syslog-ng to stream the logs to crawler.paglo.com using the syslog over TLS protocol. The key_file and cert_file in the tls section are the private key and certificate that authenticate this machine with Paglo, and the CA certificates in ca_dir, together with peer_verify(required-trusted), let this machine make sure it is talking to the ‘real’ Paglo server and not an impostor. The certificate files are generated by Paglo at https://app.paglo.com/log_manage/configure.
Finally we can link all of these configurations together with a log directive which specifies a logging action:
log {
source(s_all);
filter(rails_app);
destination(log_to_paglo);
};
Save the configuration and tell syslog-ng to reread the configuration file by restarting it or sending a HUP signal. I have found it useful to troubleshoot configuration problems by running syslog-ng from the command line in the foreground like:
syslog-ng --foreground
When run like this, it prints any error messages directly to the console. If the configuration is good, stop it with Ctrl-C.
If everything is working then the next time the Rails application generates a log message it will be sent to Paglo and automatically indexed. Log messages show up in the Paglo search index about 30 seconds after they were generated. Now I can log into Paglo and search for those pesky error messages.

Since we added the SyslogLogger to the Rails application no logs will be recorded in the standard log/production.log file. This can sometimes be a bit frustrating, but we can use syslog-ng to log the same information to multiple destinations. So we can add a file destination like:
destination rails_production_log {
file("/apps/my_application/current/log/production.log");
};
And add this destination to the same log directive so the Rails logs go to both places:
log {
source(s_all);
filter(rails_app);
destination(log_to_paglo);
destination(rails_production_log);
};
Then restart syslog-ng again and you are ready to go.
|
|
Log Search Based Alerts using the Paglo Alerts Application
At the end of June we rolled out Paglo’s log collection service. Chris has previously discussed how you can use this service for searching and analysis.
I want to cover another topic: alerts. Paglo has an Alerts application. Any query you can construct against your Paglo database can be used to generate an alert. We have extended the Alerts app to allow you to create alerts that trigger when certain log searches return results.
When you create an alert you specify the conditions to generate an alert. You can select whether this alert is triggered by the results of a PQL query, a free text search of your PQL database via an Assert search, or a free text Log search.
Here you can see that we have selected Log search and want an alert generated when either core dumped or ReportCrash appear in any log source.

Note: search terms are not case sensitive, but the logic operators OR, AND, NOT must be uppercase.
We use those terms specifically because, on the FreeBSD and Mac OS X machines I am monitoring, those are the strings that appear in a system log file when a program crashes in a generally unhappy way.
The left and right parentheses group the search terms core and dumped together. The word OR will match any log messages where either set of terms appears. So a log message must have the words core and dumped or the word ReportCrash somewhere in it. Note that the logical operator OR must be in uppercase.
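To pin down the matching rule, here is a small Python 3 model of the query `(core dumped) OR ReportCrash`. Terms are compared case-insensitively, every term in a parenthesized group must appear, and OR accepts any group. It only models the behavior described here. Paglo's real tokenizer and query parser are not shown, and splitting messages into words on non-word characters is an assumption.

```python
import re

def words(message):
    # Assumed tokenizer: lowercase runs of letters, digits and underscores.
    return set(re.findall(r"\w+", message.lower()))

def matches_any(message, groups):
    """True if all terms of at least one group appear in message.

    groups=[["core", "dumped"], ["ReportCrash"]] models (core dumped) OR ReportCrash.
    """
    present = words(message)
    return any(all(term.lower() in present for term in group) for group in groups)

CRASH_QUERY = [["core", "dumped"], ["ReportCrash"]]
```

Under this model, a message that mentions only "core" does not match, while one containing "ReportCrash" in any letter case does.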
There is a final important element to log search based alerts: when the alert is saved, and whenever it is triggered, it saves a bookmark of sorts that represents a certain position in your entire database of logs. When the alert is checked, it only searches log messages that arrived after that bookmark.
Therefore, an alert fires only when a log message matching the search terms appears after the alert has been created. When the alert is next tested, shortly afterwards, we only look for log messages after the one that caused the alert to fire, so the alert clears. It then remains clear until another log message matching the search terms appears.
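The bookmark behavior can be modeled in a few lines of Python 3. In this sketch each log message carries an increasing position. The bookmark starts at the newest position when the alert is saved. Each check looks only at newer messages, moves the bookmark to the last match, and reports the alert as firing only if something matched. The class and attribute names are hypothetical, and representing the bookmark as a plain integer position is an assumption.

```python
class LogSearchAlert:
    """Hypothetical model of a log-search alert that keeps a bookmark."""

    def __init__(self, predicate, current_position):
        self.predicate = predicate        # message -> bool
        self.bookmark = current_position  # set when the alert is saved
        self.firing = False

    def check(self, log):
        """log: list of (position, message) pairs in increasing position order.

        Returns the messages that triggered the alert (empty when clear).
        """
        hits = [(pos, msg) for pos, msg in log
                if pos > self.bookmark and self.predicate(msg)]
        if hits:
            self.bookmark = hits[-1][0]  # next check starts after the last hit
        self.firing = bool(hits)
        return [msg for _, msg in hits]
```

Checking twice in a row with no new matching messages shows the fire-then-clear cycle: the first check returns the match, the second returns an empty list.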
The email that the Alerts app generates contains just the log messages that caused the alert to trigger, so you will have some idea of the specifics. Here is an example of the email generated by the Alerts app when the alert we defined above fires:

|
|
Paglo for monitoring Amazon EC2 instances - Did you know?
Here are a few questions that recently came in about how Paglo works with the Amazon CloudWatch API.
Question
Does Paglo have auto discovery for Amazon EC2 instances?
Answer
Yes, our CloudWatch plugin (for the Paglo Crawler) will auto-discover every EC2 instance for which you have CloudWatch monitoring turned on.
NOTE: That last bit is an important point: Paglo will only monitor EC2 instances for which you have turned on CloudWatch monitoring via Amazon.
Question
Do you require a client install on monitored servers?
Answer
The answer here is the dreaded…Yes and No. It requires no install on monitored servers, but you need to be running the Paglo Crawler somewhere. It can run on a monitored server or a Win32 machine at your office.
|
|
New Paglo Crawler plugin for SNMP get requests
My name is Peder and I’m doing a summer engineering internship here at Paglo. My first assignment has been writing a plugin for the Paglo Crawler for communicating with network devices using SNMP.
SNMP is a super useful (although archaic) protocol for gathering all sorts of important information about the resources on your network devices, be it CPU load or the amount of ink left in the color printer.
I have added a general SNMP plugin for the Paglo Crawler to expand the data that Paglo collects. The plugin uses SNMP GET requests to do this. To use the plugin, you can specify one or more object identifiers (OIDs). An OID identifies a variable that can be read through SNMP. For instance, if you want to request the CPU load over the past 5 minutes from a Cisco router you enter the OID “1.3.6.1.4.1.9.2.1.58.0”.
Each number in the OID maps to a node in a Management Information Base (MIB) tree.
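Decomposing an OID shows the MIB-tree idea. The names below are the standard ones for the top arcs (iso, org, dod, internet, mgmt, mib-2, private, enterprises) plus Cisco's enterprise number 9. Deeper arcs stay numeric because naming them needs the vendor's MIB. The helper and the table are illustrative only, written in Python 3.

```python
# Well-known prefixes of the OID tree: arc path -> name of the last arc.
KNOWN_PREFIXES = {
    (1,): "iso",
    (1, 3): "org",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 4): "private",
    (1, 3, 6, 1, 4, 1): "enterprises",
    (1, 3, 6, 1, 4, 1, 9): "cisco",
}

def describe_oid(oid):
    """Return one label per node on the path to oid: 'name(arc)' or 'arc'."""
    arcs = tuple(int(part) for part in oid.strip(".").split("."))
    labels = []
    for depth in range(1, len(arcs) + 1):
        name = KNOWN_PREFIXES.get(arcs[:depth])
        arc = arcs[depth - 1]
        labels.append("%s(%d)" % (name, arc) if name else str(arc))
    return labels
```

`describe_oid("1.3.6.1.4.1.9.2.1.58.0")` walks iso, org, dod, internet, private, enterprises and cisco before reaching Cisco-specific arcs.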
The first thing you should do when using this plugin is figure out the OIDs you want. A good place to start is www.mibdepot.com. OIDs are usually vendor specific and not all devices support all of them, so it’s also a good idea to install the net-snmp Unix tools and experiment on the command line to find out what you need.
Now, to use this new plugin in Paglo you need to:
=> Log in to Paglo
=> Select the Crawlers application on the left
=> Choose the SNMP Get Plugin under Crawler→Plugins
=> Click the Custom button and add OIDs

After this step, you probably want to add a schedule for the plugin. Go to the Schedule tab (in the same Crawlers application) and create a new schedule. Something like the one shown below should be fine, just remember to add all the hosts you want the plugin to request data from.

Now you can view the data collected by doing queries for it. For example, to get all the SNMP data collected you could do:
select * from /network/device/snmp
or if you want to check against a particular IP you could use the following search:
select * from /network/device[interface/inet/ip_address = '10.10.10.1']/snmp
The value of a node is merged into the PQL (Paglo Query Language) tree every time it changes. You can see the history by clicking on a value (in your search results) and choosing show history. The historical values are presented and you can display them in a chart and move them to a dashboard.
|
|
What Kind of Disk Did I Have Installed In My FreeBSD Server?
We frequently talk here about how Paglo indexes all of your IT infrastructure. We also talk about how to construct various kinds of queries to interrogate that data.
This post is a little different in that I want to cover a happy little use of Paglo to answer a question I had about one of my machines.
I am going to perform a re-install of one of my FreeBSD machines and needed to buy a new disk. But the machine has been around for so long that I could not recall whether it used a SATA or PATA disk. I was not at a convenient location to log in and start poking around; however, a Paglo tab in my browser is always convenient.
Into the Paglo search box I typed the query:
matsubue ata
Where “matsubue” is the name of the host I wanted to check, and I was looking for any asset or log file that matches ‘ata.’ That query might have been too broad but narrowing queries is very easy.
This returned:

From the ‘tool’ menu I selected ‘show details’, which gave me the contents of ‘dmesg.boot’. That looked promising, and there was my answer:
atapci0: <nVidia nForce3 Pro UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf900-0xf90f at device 8.0 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
atapci1: <nVidia nForce3 Pro SATA150 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xf400-0xf40f,0xf300-0xf37f irq 23 at device 10.0 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
...
ad4: 238475MB <Seagate ST3250824AS 3.AAD> at ata2-master SATA150
So this system was using a SATA disk. Question answered.
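The same answer can also be pulled out programmatically. This Python 3 sketch scans FreeBSD dmesg.boot text for `ad` disk attach lines and extracts the model and the transfer mode at the end of the line. The regular expression is written for lines shaped like the `ad4` line above, and treating that shape as general is an assumption. Other disk drivers (for example `da` or `ada`) log differently.

```python
import re

AD_LINE = re.compile(
    r"^(?P<dev>ad\d+): (?P<size_mb>\d+)MB <?(?P<model>[^>]*)> "
    r"at (?P<channel>ata\d+-(?:master|slave)) (?P<mode>\S+)$"
)

def ata_disks(dmesg_text):
    """Return [(device, size_mb, model, mode)] for ad(4) disk attach lines."""
    disks = []
    for line in dmesg_text.splitlines():
        m = AD_LINE.match(line.strip())
        if m:
            disks.append((m["dev"], int(m["size_mb"]), m["model"], m["mode"]))
    return disks

def is_sata(mode):
    # The ad4 line above ends in SATA150; PATA disks report modes such as UDMA133.
    return mode.upper().startswith("SATA")
```

Running `ata_disks` on the output above yields a single entry for `ad4` with mode `SATA150`, and `is_sata` confirms the conclusion reached by eye.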
Sure, this information is available if you can log in to the machine in question and look at the dmesg.boot file yourself. However, that sort of remote access is not always readily available. Paglo’s indexing of my FreeBSD machines, combined with its search index of that data and its web based UI, made this a very quick and painless search.
|
|
Using consolidation queries to analyze large log data sets
My previous post showed how PQL can be used with log searches to perform sophisticated analysis. If you are dealing with up to a few hundred thousand log events, queries can complete within a second or so. With larger log data sets, however, a PQL query that involves tens of millions of events may take many minutes. This makes such queries impractical for dashboards, where you want to see and navigate the data quickly. To support analytics over massive data sets, Paglo supports “consolidation queries”.
A consolidation query is a PQL statement which executes a PQL query and stores the results back into PQL. This is analogous to the practice in SQL of storing intermediate results into a temporary table. The syntax of consolidation queries follows the form:
consolidate into pql_path [keys (key_field, ...)] pql_select_statement
The consolidation query executes the PQL select statement and stores the results at the point in the tree specified by the PQL path. Optional key fields can be specified to create a nested structure. Consolidation queries run periodically (approximately every 15 minutes). Each time a consolidation query runs, history is maintained in the resulting tree so that PQL history statements can be used to see how data has changed over time. As a special case, if one of the columns in the query is named ‘timestamp’, that column is not stored but is instead used to timestamp the data history for each of the other columns. If the select statement includes a log search, the select statement only considers events that have been indexed since the previous time the consolidation query was run.
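The run-by-run behaviour described above can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not Paglo's implementation: events are dicts carrying a hypothetical indexed_at sequence number, the select statement is an ordinary Python function, and the target node is a dict of per-column histories.

```python
from collections import defaultdict

class ConsolidationModel:
    """Toy model of a consolidation query's run-by-run behaviour (hypothetical)."""

    def __init__(self, select):
        self.select = select              # function: list of events -> list of row dicts
        self.last_indexed = -1            # high-water mark of events already consumed
        self.history = defaultdict(list)  # column name -> [(timestamp, value), ...]

    def run(self, all_events, run_time):
        # Only consider events indexed since the previous run.
        new_events = [e for e in all_events if e["indexed_at"] > self.last_indexed]
        if new_events:
            self.last_indexed = max(e["indexed_at"] for e in new_events)
        for row in self.select(new_events):
            # A column named 'timestamp' is not stored; it timestamps the others.
            ts = row.get("timestamp", run_time)
            for column, value in row.items():
                if column != "timestamp":
                    self.history[column].append((ts, value))

def volume_by_bucket(events):
    """Hypothetical select: count events per precomputed 5-minute bucket."""
    counts = defaultdict(int)
    for e in events:
        counts[e["bucket"]] += 1
    return [{"timestamp": b, "total_volume": n} for b, n in sorted(counts.items())]

events = [
    {"indexed_at": 1, "bucket": 0},
    {"indexed_at": 2, "bucket": 0},
    {"indexed_at": 3, "bucket": 300},
]
model = ConsolidationModel(volume_by_bucket)
model.run(events, run_time=900)       # first run sees all three events
events.append({"indexed_at": 4, "bucket": 600})
model.run(events, run_time=1800)      # second run sees only the new event
print(model.history["total_volume"])  # [(0, 2), (300, 1), (600, 1)]
```

In this toy model, a 5-minute bucket whose events straddle two runs gets two separate history entries.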
Some examples might make all of this clearer.
To compute the number of log events received with a resolution of 5 minutes, a query like this could be used:
select timestamp(round(timestamp(timestamp), '5 minutes')) as timestamp, count(*) as total_volume
from %*%
group by timestamp(round(timestamp(timestamp), '5 minutes'))
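The round(timestamp(timestamp), '5 minutes') expression groups events into 5-minute buckets. Here is a rough Python equivalent, assuming timestamps are timezone-aware UTC datetimes and that rounding snaps each event to the start of its bucket (PQL's round may instead round to the nearest boundary; that detail is an assumption here). The sample timestamps are illustrative:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_start(ts):
    """Floor a timezone-aware datetime to the start of its 5-minute bucket."""
    return EPOCH + ((ts - EPOCH) // BUCKET) * BUCKET

def volume_per_bucket(timestamps):
    """Count events per 5-minute bucket, like the GROUP BY above."""
    return Counter(bucket_start(ts) for ts in timestamps)

events = [
    datetime(2009, 7, 1, 2, 58, 35, tzinfo=timezone.utc),
    datetime(2009, 7, 1, 2, 56, 1, tzinfo=timezone.utc),
    datetime(2009, 7, 1, 2, 38, 33, tzinfo=timezone.utc),
]
for start, total_volume in sorted(volume_per_bucket(events).items()):
    print(start.isoformat(), total_volume)
# 2009-07-01T02:35:00+00:00 1
# 2009-07-01T02:55:00+00:00 2
```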
Since counting across tens of millions of events can be time-consuming, we can use a consolidation query to pre-compute the result for us, which also lets us use PQL’s HISTORY syntax to chart the result. This query
consolidate into /consolidation/log_volume
select timestamp(round(timestamp(timestamp), '5 minutes')) as timestamp, count(*) as total_volume
from %*%
group by timestamp(round(timestamp(timestamp), '5 minutes'))
computes the log volume and stores it into the single tree node /consolidation/log_volume/total_volume but with timestamped history every 5 minutes. We can then use another PQL query to see the log volume over time:
select sum(total_volume) as vol
from /consolidation/log_volume
history to 'now' from '1 week ago'
This query only takes a fraction of a second regardless of how many log events are involved because it is dealing with the aggregated data, not the raw events any more. To store the log volume by source we need to key on the source field:
consolidate into /consolidation/log_volume/sources
keys (source)
select timestamp(round(timestamp(timestamp), '5 minutes')) as timestamp, source, count(*) as total_volume
from %*%
group by timestamp(round(timestamp(timestamp), '5 minutes')), source
This will create multiple /consolidation/log_volume/sources nodes, each with source and total_volume subnodes containing the selected values. Then we can use a query like this to get the history of a source:
select sum(total_volume) as vol
from /consolidation/log_volume/sources[source = 'source::/var/log/messages']
history to 'now' from '1 week ago'
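Keying on source can be pictured as one history per distinct source value. Here is a Python sketch of that shape and of the one-week history sum, using a hypothetical in-memory dictionary and made-up volumes rather than Paglo's tree:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# source value -> list of (timestamp, total_volume) history points (hypothetical layout)
sources = defaultdict(list)

def store_rows(rows):
    """Store consolidated rows under one node per value of the 'source' key."""
    for row in rows:
        sources[row["source"]].append((row["timestamp"], row["total_volume"]))

def volume_since(source, start, end):
    """Sum one source's total_volume history between start and end."""
    return sum(v for ts, v in sources[source] if start <= ts <= end)

now = datetime(2009, 7, 1, tzinfo=timezone.utc)
store_rows([
    {"timestamp": now - timedelta(days=2), "source": "source::/var/log/messages", "total_volume": 120},
    {"timestamp": now - timedelta(days=9), "source": "source::/var/log/messages", "total_volume": 80},
    {"timestamp": now - timedelta(days=1), "source": "source::/var/log/auth.log", "total_volume": 40},
])
print(volume_since("source::/var/log/messages", now - timedelta(weeks=1), now))  # 120
```

The point older than a week and the other source's point are both excluded, which is the filtering the history clause and the [source = ...] predicate perform.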
So consolidation queries can be used to distill massive log data sets down to summary information that can be viewed in real-time.
|
|
Analyzing log data with PQL
The results of log searches can be used together with PQL to perform sophisticated analytics on log data. In a PQL query, the FROM clause can be a log query. For example, if we were interested in analyzing which user names were most common in failed logins, we might use a query like:
select *
from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%
This query generates output like:
| event_id | timestamp | length | content | source | source_type | host_name | host_address |
| 635581 | 2009-07-01T02:58:35.000000Z | 97 | Jun 30 19:58:35 kamidake sshd: Failed password for paglo from 64.139.30.162 port 55417 ssh2 | /var/log/auth.log | syslog | kamidake.apricot.com | |
| 552084 | 2009-07-01T02:38:33.000000Z | 93 | Jun 30 19:38:33 kamidake sshd: Failed password for paglo from 10.0.1.3 port 52002 ssh2 | /var/log/auth.log | syslog | kamidake.apricot.com | |
| 531168 | 2009-07-01T02:27:09.000000Z | 97 | Jun 30 19:27:09 kamidake sshd: Failed password for paglo from 64.139.30.162 port 65375 ssh2 | /var/log/auth.log | syslog | kamidake.apricot.com | |
The columns in the output are primarily meta information about the event like its timestamp, source and length. In some cases there will be additional columns based on information extracted from the event (e.g. if it contains key=value formatted information).
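Each term in the search narrows the set of rows in that table. As a rough Python approximation (an illustration only: it assumes phrase matching is case-insensitive and that events are dicts keyed by the column names above; the second and third sample events are made up):

```python
from datetime import datetime, timedelta, timezone

def matches(event, now):
    """Approximate %"failed password for" -"invalid user" source::... from::"3 days ago"%."""
    content = event["content"].lower()  # assumption: phrase matching ignores case
    return (
        "failed password for" in content                     # required phrase
        and "invalid user" not in content                    # -"invalid user" excludes
        and event["source"] == "/var/log/auth.log"           # source::/var/log/auth.log
        and event["timestamp"] >= now - timedelta(days=3)    # from::"3 days ago"
    )

now = datetime(2009, 7, 1, 3, 0, tzinfo=timezone.utc)
events = [
    {"content": "kamidake sshd: Failed password for paglo from 64.139.30.162 port 55417 ssh2",
     "source": "/var/log/auth.log", "timestamp": datetime(2009, 7, 1, 2, 58, 35, tzinfo=timezone.utc)},
    {"content": "kamidake sshd: Failed password for invalid user admin from 10.0.0.9 port 4242 ssh2",
     "source": "/var/log/auth.log", "timestamp": datetime(2009, 7, 1, 2, 50, tzinfo=timezone.utc)},
    {"content": "kamidake sshd: Failed password for root from 10.0.0.9 port 4243 ssh2",
     "source": "/var/log/auth.log", "timestamp": datetime(2009, 6, 20, tzinfo=timezone.utc)},
]
matching = [e for e in events if matches(e, now)]
print(len(matching))  # 1
```

Taking len() of the matching rows is the Python counterpart of select count(*) over the search.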
We can treat the output of the search just as if it were a SQL table. For example, to count the total number of matching events we could write:
select count(*)
from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%
giving:
In this case we want to find the number of failed logins by user name. The user name is available within the content of the event message. Using PQL, it can be extracted with the EXTRACT clause like this:
select user
from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%
extract "password for (?'user'[^ ]+)"
giving:
| user |
| paglo |
| paglo |
| paglo |
| paglo |
| root |
| root |
The EXTRACT clause uses a regular expression (in PCRE syntax) to identify the portion of the content to extract as a pseudo-column in the table. Multiple EXTRACT clauses can be used, and multiple fields can be specified in each EXTRACT clause. Each extracted field is the text captured by a pair of parentheses and is named by the word in single quotes that follows the ?, as in (?'user'...). Here is another example:
select user, source_ip, source_port
from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%
extract "password for (?'user'[^ ]+)"
extract "from (?'source_ip'\d+\.\d+\.\d+\.\d+) port (?'source_port'\d+)"
giving:
| user | source_ip | source_port |
| paglo | 64.139.30.162 | 55417 |
| paglo | 10.0.1.3 | 52002 |
| paglo | 64.139.30.162 | 65375 |
| paglo | 10.0.1.3 | 60944 |
| root | 190.2.0.151 | 49170 |
| root | 190.2.0.151 | 49033 |
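Python's re module can express the same two patterns with one syntax change: PCRE's (?'name'...) named groups are written (?P<name>...) in Python. This sketch applies both patterns to one event; returning None for a field that doesn't match is an assumption of the sketch, not documented PQL behaviour.

```python
import re

# PCRE (?'user'...) becomes (?P<user>...) in Python's re module.
USER = re.compile(r"password for (?P<user>[^ ]+)")
ADDR = re.compile(r"from (?P<source_ip>\d+\.\d+\.\d+\.\d+) port (?P<source_port>\d+)")

def extract(content):
    """Apply both EXTRACT-style patterns; fields that don't match stay None."""
    row = {"user": None, "source_ip": None, "source_port": None}
    for pattern in (USER, ADDR):
        match = pattern.search(content)
        if match:
            row.update(match.groupdict())
    return row

line = "kamidake sshd: Failed password for paglo from 64.139.30.162 port 55417 ssh2"
print(extract(line))
# {'user': 'paglo', 'source_ip': '64.139.30.162', 'source_port': '55417'}
```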
Once the fields are extracted any PQL feature can be applied, such as column formatting, sorting and aggregation. So here is the final query we need to find how often each user name appears in failed logins:
select user as User, count(*) as Count
from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%
extract "password for (?'user'[^ ]+)"
group by user
order by 2 desc
and the answer:
| User | Count |
| root | 79 |
| invalid | 67 |
| paglo | 59 |
| postfix | 2 |
| cyrus | 1 |
| pgsql | 1 |
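The GROUP BY user ... ORDER BY 2 DESC step amounts to counting each user name and sorting by that count. In Python, collections.Counter does both; this sketch uses the six extracted user names shown earlier as its input:

```python
from collections import Counter

def failed_logins_by_user(users):
    """Count each user name and sort by count, highest first (GROUP BY + ORDER BY 2 DESC)."""
    return Counter(users).most_common()

users = ["paglo", "paglo", "paglo", "paglo", "root", "root"]
for user, count in failed_logins_by_user(users):
    print(user, count)
# paglo 4
# root 2
```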
As always, if there is a complex query you need to write, we here at Paglo are happy to help, so just contact support.
|
|
Monitoring your Amazon.com EC2 instances via Amazon.com's CloudWatch with Paglo
Here at Paglo we like to monitor all sorts of things about IT infrastructure (and just about any other data that can be captured and monitored over time). Being able to grok how systems and applications are being used, and where the hot spots are, really gives us a buzz.
You can see this in our recent Paglo Crawler plugin for VMware virtualization monitoring.
You can now also use this plugin to see a VMware guest’s usage not just from inside the device, but also from outside, as the VMware host sees it.
When Amazon recently announced its CloudWatch service, we were excited. In essence, it offered the same sort of view of EC2 instances that the VMware API gave us.
Being no stranger to Amazon’s APIs we decided to whip up a Paglo Crawler plugin that would gather all the statistics that it could from the Amazon CloudWatch API for monitoring EC2 instances.
With our new release of the Crawler, the VMware and Amazon plugins are included automatically. Once you configure the Amazon plugin in the “Crawlers” application with your EC2 credentials, it queries Amazon’s CloudWatch API every 5 minutes, gathering the last 5 minutes of every available statistic.
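For a sense of what one polling cycle requests, here is a Python sketch that builds the parameters for the most recent 5-minute window. It assumes the modern boto3 CloudWatch client's get_metric_statistics call (the Paglo plugin predates boto3, so this is not its code), and it only builds the request, so it runs without AWS credentials:

```python
from datetime import datetime, timedelta, timezone

POLL_INTERVAL = timedelta(minutes=5)
# CloudWatch statistic names matching the samples/sum/average/min/max fields Paglo stores.
STATISTICS = ["SampleCount", "Sum", "Average", "Minimum", "Maximum"]

def polling_window_params(metric_name, now):
    """Build get_metric_statistics keyword arguments covering the last 5 minutes."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "StartTime": now - POLL_INTERVAL,
        "EndTime": now,
        "Period": int(POLL_INTERVAL.total_seconds()),  # aggregate over the whole window
        "Statistics": STATISTICS,
    }

now = datetime(2009, 7, 1, 3, 0, tzinfo=timezone.utc)
params = polling_window_params("CPUUtilization", now)
print(params["StartTime"].isoformat(), params["Period"])
# 2009-07-01T02:55:00+00:00 300
```

With credentials configured, boto3.client("cloudwatch").get_metric_statistics(**params) would issue the request. A 300-second period asks CloudWatch for roughly one aggregated datapoint per polling window.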

And you don’t have to do anything in your Paglo account to handle the new data, because your account’s Paglo index is semi-structured, which lets us gather data from any source and plug it in.
All the statistics gathered from Amazon’s CloudWatch are stored under:
/apps/com/paglo/aws/cloud_watch/
Under here there will be a ‘statistic’ subnode for every stat we gather from CloudWatch.
For every statistic captured, we index all of the possible aggregations provided by CloudWatch. For example, the query:
select * from /apps/com/paglo/aws/cloud_watch
will provide something like:
statistic
namespace AWS/EC2
measure_name networkout
samples 15.0
sum 2.2491354E7
average 1499423.6
unit Bytes
minimum 868.0
maximum 7221048.0
total true
statistic
namespace AWS/EC2
measure_name cpuutilization
samples 5.0
sum 6.14
average 1.228
unit Percent
minimum 0.0
maximum 3.46
instanceid i-c11759a8
statistic
namespace AWS/EC2
measure_name diskwritebytes
samples 15.0
sum 3.3189888E7
average 2212659.2
unit Bytes
minimum 32768.0
maximum 8671232.0
instancetype m1.small
This query shows you (for the last 5 minutes) the number of samples, sum, average, minimum and maximum for NetworkOut in bytes across all instances, CPUUtilization for InstanceID ‘i-c11759a8’, and DiskWriteBytes for all instances with the InstanceType ‘m1.small’.
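The aggregate fields are related: average is sum divided by samples. The Python sketch below shows how such fields could be derived from raw datapoints (the summarize helper is hypothetical) and checks the three nodes above for consistency:

```python
import math

def summarize(samples):
    """Aggregate raw datapoints into the fields a statistic node carries (hypothetical helper)."""
    total = sum(samples)
    return {
        "samples": float(len(samples)),
        "sum": total,
        "average": total / len(samples),
        "minimum": min(samples),
        "maximum": max(samples),
    }

# (measure_name, samples, sum, average) copied from the query output above
NODES = [
    ("networkout", 15.0, 2.2491354e7, 1499423.6),
    ("cpuutilization", 5.0, 6.14, 1.228),
    ("diskwritebytes", 15.0, 3.3189888e7, 2212659.2),
]

for name, samples, total, average in NODES:
    # Each stored average should equal sum / samples.
    print(name, math.isclose(total / samples, average, rel_tol=1e-6))
# networkout True
# cpuutilization True
# diskwritebytes True
```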
Because Paglo keeps the history of indexed values and provides data analytics capabilities, you can immediately begin to graph these statistics over time.
For example, here is some data that we have been tracking: CPUUtilization for our three Amazon EC2 instances over the past six hours:
select instanceid||':'||measure_name||' '||unit as caption,average from /apps/com/paglo/aws/cloud_watch/statistic where instanceid is not null and measure_name in ('cpuutilization') history from '6 hours ago' to 'now'

Network in and out for each of our three instances over the last six hours:
select instanceid||':'||measure_name||' '||unit as caption,average from /apps/com/paglo/aws/cloud_watch/statistic where instanceid is not null and measure_name in ('networkin', 'networkout') history from '6 hours ago' to 'now'
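To see what these chart queries compute, here is a Python sketch of the same steps: skip nodes with a null instanceid, keep the requested measures, build the "instanceid:measure_name unit" caption, and keep history points from the last six hours. The node layout, including the history list and its sample values, is a hypothetical stand-in for Paglo's index:

```python
from datetime import datetime, timedelta, timezone

def caption(node):
    """Mirror instanceid||':'||measure_name||' '||unit from the queries above."""
    return f"{node['instanceid']}:{node['measure_name']} {node['unit']}"

def recent_series(nodes, now, measures, window=timedelta(hours=6)):
    """Group (timestamp, average) history points by caption for the last 6 hours."""
    series = {}
    for node in nodes:
        if node.get("instanceid") is None:        # where instanceid is not null
            continue
        if node["measure_name"] not in measures:  # measure_name in (...)
            continue
        points = [(ts, avg) for ts, avg in node["history"]
                  if now - window <= ts <= now]   # history from '6 hours ago' to 'now'
        series.setdefault(caption(node), []).extend(points)
    return series

now = datetime(2009, 7, 1, 12, 0, tzinfo=timezone.utc)
nodes = [
    {"instanceid": "i-c11759a8", "measure_name": "cpuutilization", "unit": "Percent",
     "history": [(now - timedelta(hours=1), 1.228), (now - timedelta(hours=8), 3.46)]},
    {"instanceid": None, "measure_name": "networkout", "unit": "Bytes",
     "history": [(now - timedelta(hours=1), 1499423.6)]},
]
for name, points in recent_series(nodes, now, {"cpuutilization"}).items():
    print(name, [avg for _, avg in points])
# i-c11759a8:cpuutilization Percent [1.228]
```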

In addition to charts, you can also generate some useful tables, such as when we last gathered statistics for each instance:
select instanceid, measure_name, format(observed_at(instanceid),'timestamp') from /apps/com/paglo/aws/cloud_watch/statistic where instanceid is not null
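Here, observed_at(instanceid) appears to report when a value was last observed. The same idea in Python, as a sketch over hypothetical (instanceid, measure_name, observed_time) tuples rather than Paglo's index:

```python
from datetime import datetime, timezone

def last_observed(observations):
    """Latest observation time per (instanceid, measure_name), skipping null instance ids."""
    latest = {}
    for instanceid, measure_name, observed in observations:
        if instanceid is None:  # where instanceid is not null
            continue
        key = (instanceid, measure_name)
        if key not in latest or observed > latest[key]:
            latest[key] = observed
    return latest

observations = [
    ("i-c11759a8", "cpuutilization", datetime(2009, 7, 1, 11, 55, tzinfo=timezone.utc)),
    ("i-c11759a8", "cpuutilization", datetime(2009, 7, 1, 12, 0, tzinfo=timezone.utc)),
    (None, "networkout", datetime(2009, 7, 1, 12, 0, tzinfo=timezone.utc)),
]
for (instanceid, measure_name), observed in last_observed(observations).items():
    print(instanceid, measure_name, observed.isoformat())
# i-c11759a8 cpuutilization 2009-07-01T12:00:00+00:00
```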

If you want to know the names of all the measures collected:
select distinct measure_name from /apps/com/paglo/aws/cloud_watch/statistic

As you can see, the data provided by Amazon’s CloudWatch API, coupled with Paglo’s data gathering, querying and analytics abilities, makes it easy to monitor your Amazon EC2 instances.
Let us know what you think and what additional data you would like to capture, search, and store via Paglo.
And remember that the Paglo Crawler is released under the GPL which makes it easy for you to add your own plugins too.
|
|