The Paglo Data Center Hosted Cloud Based Crawler
By Scanner, 11/19/2009 at 15:28

You can now use Paglo’s data center infrastructure to remotely monitor your Web site and applications running on Cloud-based platforms like Amazon.

Last week we introduced a new alert that monitors the health of your network connection. What makes this alert possible is that it runs out of the Paglo Data Center via our Cloud Based Paglo Crawler.

At the time, there was no easy way for a user to create a Cloud Based Crawler other than creating an instance of the ‘Internet is Down’ alert. We have now added the ability to create a Cloud Based Crawler directly from the ‘Crawlers’ app. Each company index may have only one Cloud Based Crawler.

When you go to the ‘Crawlers’ app, if your company does not yet have a Cloud Based Crawler, a link will show up in the upper right-hand corner of the page. Clicking this link creates a Cloud Based Crawler for your company. It may take a couple of minutes for the Cloud Based Crawler to be fully instantiated and running. The creation process is complete when the ‘Schedules’ tab shows up.

A Cloud Based Paglo Crawler can do a lot more than simply probe the health of your network connection. Using the Service Health Check plugin, you can monitor the reliability of a number of internet services from our data center. This is useful if you want to know whether your externally hosted web site is up and running and what sort of latency it may be experiencing.

The services probed by the Service Health Check plugin are:

  • ICMP Echo (‘ping’)
  • FTP (port 21, TCP)
  • Telnet (port 23, TCP)
  • SMTP (port 25, TCP)
  • DNS (port 53, UDP)
  • HTTP (port 80, TCP; success if “GET /” returns a 2XX or 3XX response)
  • POP3 (port 110, TCP)
  • IMAP (port 143, TCP)
  • LDAP (port 389, TCP)
  • HTTPS (port 443, TCP)
  • SQLServer (port 1433, TCP)
  • Oracle (port 1521, TCP)
  • MySQL (port 3306, TCP)
  • eHub (port 9013, TCP)

The Service Health Check plugin returns success for a specific service if we can connect to it. There are two exceptions: for DNS, we look for any sort of DNS response to a generic query, and for HTTP, a ‘GET /’ request must return a 2XX or 3XX response.
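To make the success rules above concrete, here is a minimal Python sketch of the two most common cases: a plain TCP connect check and the HTTP ‘GET /’ check. This is not the plugin's actual code. The function names, the timeout, and the way latency is measured are all assumptions made for illustration, and the DNS and ICMP probes are left out.

```python
import http.client
import socket
import time


def check_tcp(host, port, timeout=5.0):
    """Return (ok, latency_seconds); ok means the TCP connect completed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Refused, unreachable, timed out, or the name did not resolve.
        return False, None


def http_status_ok(status):
    """Success rule from the post: any 2XX or 3XX status code."""
    return 200 <= status < 400


def check_http(host, port=80, timeout=5.0):
    """Send 'GET /' and return (ok, latency_seconds)."""
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return False, None
    finally:
        conn.close()
    return http_status_ok(status), time.monotonic() - start
```

For example, `check_tcp("mail.example.com", 25)` would stand in for the SMTP probe and `check_http("www.example.com")` for the HTTP probe; both hostnames are placeholders. The latency value is only the elapsed time seen by the caller, which is one plausible way to get the kind of latency figure mentioned earlier.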

A number of Paglo users have been interested in our ability to gather Amazon CloudWatch data but have not wanted to run a Paglo Crawler on their own machines. They can now use the AWS CloudWatch plugin in the Cloud Based Crawler to monitor their Amazon EC2 instances.

How do you delete a Cloud Based Crawler? Go to the ‘Advanced’ tab for the Cloud Based Crawler and click the ‘Delete’ button. This not only schedules the entries in the PQL database that represent the crawler for deletion but also deletes the associated Cloud Based Crawler instance. NOTE: Unlike a Paglo Crawler running on one of your machines, the Cloud Based Crawler itself is deleted when you do this. All schedules, plugin configurations and historical information directly related to this Cloud Based Crawler are deleted with it, and it will not show up again after you delete it.

NOTE: The Cloud Based Crawler is load balanced across a number of our servers. This means that once you enable it and turn on the ‘Service Health Check’ plugin, you will be probed from our data center’s IP addresses. Currently the probing comes from crawler01.paglo.com and crawler02.paglo.com, although this list may grow in the future.
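If your firewall needs to admit these probes, one approach is to resolve the crawler hostnames to addresses and build an allowlist from the result. The Python sketch below is a rough illustration of that idea, not a Paglo tool. The function name and the injectable `resolver` parameter are assumptions, and the two hostnames are simply the ones quoted above, which may change or stop resolving.

```python
import socket

# Hostnames quoted from the note above; the list may grow or change.
PROBE_HOSTS = ["crawler01.paglo.com", "crawler02.paglo.com"]


def resolve_probe_sources(hostnames, resolver=socket.getaddrinfo):
    """Map each hostname to the sorted list of addresses it resolves to."""
    sources = {}
    for name in hostnames:
        try:
            infos = resolver(name, None)
        except socket.gaierror:
            # Keep the name with an empty list so failures stay visible.
            sources[name] = []
            continue
        # Each entry ends with a sockaddr tuple whose first item is the IP.
        sources[name] = sorted({info[4][0] for info in infos})
    return sources
```

Calling `resolve_probe_sources(PROBE_HOSTS)` returns a dictionary keyed by hostname. Because the set of probing servers can grow, it is worth re-running the lookup periodically rather than hard-coding the addresses.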
