SSPA Blog: Analyzing the analyst

  

I’ve recently been using statistical programming to analyze … statistical programming!

I thought I’d share a couple of thoughts on this topic.

The Unix server that handles statistical programming where I work is used by dozens of programmers, analysts, and statisticians. Over the course of a normal day, there are occasional stretches of slow processing. At least, that’s my perception.

I wondered why this might be. I suspected either an increase in the number of programming jobs, or an increase in the “depth” of particular jobs (e.g., complex statistical models or large simulations).

To analyze what was going on, I needed to:

  1. Collect data
  2. Analyze/Present summaries

To address the first need, I wrote a Unix script that runs over our 12-hour workday (just wanted to make sure you're still reading). The script looks at the server process table and pulls out information such as:

- Current programming usage
- Multiprocessor usage
- Disk access usage

The script sends all these records to a text file.
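For anyone curious what such a collector might look like, here is a minimal sketch. The file path, the sampling interval, and the exact fields pulled from the process table are my own assumptions, not the script I actually run:

    #!/bin/sh
    # Minimal sketch of a process-table sampler.
    # Paths, interval, and fields are assumptions, not the production script.

    OUTFILE=/tmp/server_usage_$(date +%Y%m%d).txt
    END=$(( $(date +%s) + 12 * 60 * 60 ))   # stop after the 12-hour workday

    while [ "$(date +%s)" -lt "$END" ]; do
        STAMP=$(date +%H:%M:%S)

        # Rough count of SAS sessions currently in the process table
        # (the [s] keeps grep from counting itself)
        NJOBS=$(ps -ef | grep -c '[s]as')

        # 1-minute load average from uptime (output format varies slightly by Unix flavor)
        LOAD=$(uptime | sed 's/.*load average[s]*: //' | cut -d, -f1)

        # Multiprocessor and disk figures could be sampled the same way with
        # mpstat/iostat/sar, but their output columns vary by platform.

        echo "$STAMP $NJOBS $LOAD" >> "$OUTFILE"
        sleep 300   # one record every five minutes
    done

Each pass appends one space-delimited record, which keeps the downstream parsing simple.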

At the end of the script, the text file is closed and a statistical analysis script (written in SAS) executes. This script reads in the text file, parses and summarizes the usage data, and produces a variety of plots in HTML format. The plots are placed back on the Unix server, where they are easily viewed in a standard web browser.
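The hand-off at the end of the day can be as simple as a batch call to SAS from the tail of the same Unix script. The program and directory names below are hypothetical, but batch-mode sas and placing the HTML output under a web-served directory are the standard pieces:

    # End-of-day hand-off (program and directory names are hypothetical).
    # Run the SAS report in batch mode; it reads the day's text file
    # and writes its HTML plots.
    sas -sysin /home/perf/usage_report.sas -log /home/perf/usage_report.log

    # Make sure the browser-visible output is readable by the web server
    chmod 644 /export/www/perf/*.html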

The plots reflect:

- System Load vs Time
- Programming Jobs vs Time
- Multiprocessor Load vs Time
- Disk Access vs Time
- System Load vs Programming Jobs
- And more!

What I’ve learned so far is:

  1. Work slows down around 5-6 pm
  2. The System Load never gets too bad throughout the day
  3. On some days, the association between Programming Jobs and System Load appears to be positive (that is, as jobs increase, load increases), but by no means every day
  4. The associations involving Multiprocessor Load and Disk Access are positive, but very small

So, what’s next? The good news is that our server is handling our work just fine. I still believe that there are some other factors out there that play a more significant role in our server performance spikes, but I don’t know what they are – yet.

I did take a look around to see what else is out there, figuring that others have tried to do the same thing. Here are a couple of links for any interested reader to follow:

* This page provides detail on Unix system monitoring commands

* This paper provides a SAS-based approach to monitoring Unix performance

 
