Wednesday 17 October 2012

Monitoring DataStage Jobs

The Monitor window in DataStage Director

The DataStage Job Monitor is accessible through DataStage Director.
The Monitor option appears when you right-click any job name in the DataStage Director client.
OR
Select the job in the Director list window, then go to Tools --> View Monitor.


The Monitor window displays summary information about the relevant stages in a job that is being run or validated. It has a tree structure containing the stages in a job and their associated links. For server jobs, only active stages are shown (active stages are those that perform processing, rather than ones that read or write a data source). For parallel jobs, all stages are shown.

You can display more information about a stage and set the server update interval.

If you are monitoring a parallel job and have not chosen to view instance information, the monitor displays information as follows:

  • If a stage is running in parallel, then x N is appended to the stage name, where N is the number of instances that are running.
  • If a stage is running in parallel, then the Num Rows column shows the total number of rows processed by all instances. The Rows/sec value is derived from this total and shows the combined throughput of all instances.
  • If a stage is running in parallel, then the %CP value may be more than 100 if there are multiple CPUs on the server. For example, on a machine with four CPUs, %CP could be as high as 400 when a stage is occupying 100% of each of the four processors. On the other hand, if the stage is occupying only 25% of each processor, the %CP would be 100.
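
The way the per-instance figures roll up into one summary row can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not DataStage code: `InstanceStats` and `summarize_stage` are hypothetical names, the exact spacing of the "x N" suffix is a guess, and dividing total rows by elapsed time is an assumption about how Rows/sec is derived.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    """Hypothetical per-instance figures for one stage (not a DataStage API)."""
    rows: int           # rows processed by this instance
    cpu_percent: float  # share of one CPU used by this instance, 0-100


def summarize_stage(name, instances, elapsed_seconds):
    """Combine per-instance figures into one monitor-style summary row."""
    total_rows = sum(i.rows for i in instances)
    return {
        # "x N" is appended to the stage name; the spacing here is assumed.
        "stage": f"{name} x {len(instances)}",
        "num_rows": total_rows,
        # Assumption: throughput is total rows divided by elapsed time.
        "rows_per_sec": total_rows / elapsed_seconds,
        # Per-CPU percentages add up, so the total can exceed 100.
        "cpu_percent": sum(i.cpu_percent for i in instances),
    }


# Four instances, each using 25% of its own processor.
light = summarize_stage("Transformer_1", [InstanceStats(1000, 25.0)] * 4, 10.0)
# Four instances, each saturating its own processor.
heavy = summarize_stage("Transformer_1", [InstanceStats(1000, 100.0)] * 4, 10.0)
print(light)
print(heavy)
```

With these made-up inputs, the first case reports a %CP of 100 and the second a %CP of 400, matching the four-CPU example in the last bullet.
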

From the figure above, we see several columns, which are described below:

Stage name:
This column displays the names of stages that perform processing (for example, Transformer stages). Stages that represent data sources or data marts are not displayed.

Link Type:
When you select a link in the tree, this column displays the type of link as follows:
  • <<Pri: primary input link
  • <Ref: reference input link
  • >Out: output link
  • >Rej: output link for rejected rows

Status:
The status of the stage. The possible states are:
  • Aborted. The process finished abnormally at this stage.
  • Finished. All data has been processed by the stage.
  • Ready. The stage is ready to process the data.
  • Running. Data is being processed by the stage.
  • Starting. The processing is starting.
  • Stopped. The processing was stopped manually at this stage.
  • Waiting. The stage is waiting to start processing.

Num rows:
This column displays the number of rows of data processed so far by each stage on its primary input.

Started at:
This column shows the time that processing started on the server.

Elapsed time:
This column shows the elapsed time since processing of the stage began.

Rows/sec:
This column displays the number of rows that are processed per second.

%CP:
This column shows the percentage of CPU the stage is using. You can turn the display of this column on and off from the shortcut menu.




Parallel Job Instance Information:
To monitor instances of parallel jobs individually, choose Show Instances from the shortcut menu. The monitor window then shows each instance of a stage as a sub-branch under the ‘parent’ stage, so the information for all stage instances appears under that ‘parent’ stage. Only the information relevant to each stage instance is shown.




Summary:

The job monitor provides a useful snapshot of a job's performance at a moment of execution, but does not provide thorough performance metrics. That is, a job monitor snapshot should not be used in place of a full run of the job, or a run with a sample set of data. Due to buffering and to some job semantics, a snapshot image of the flow might not be a representative sample of the performance over the course of the entire job.

The CPU summary information provided by the job monitor is useful as a first approximation of where time is being spent in the flow. However, it does not include any sorts or similar components that might be inserted automatically in a parallel job. For these components, the score dump can be of assistance. See "Score Dumps".

A worst-case scenario occurs when a job flow reads from a data set, and passes immediately to a sort on a link. The job will appear to hang, when, in fact, rows are being read from the data set and passed to the sort.
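
A toy model in Python can show why a snapshot looks stalled in this situation. It is a sketch under simple assumptions, not a description of how the parallel engine works internally: `read_data_set` and `sort_rows` are hypothetical helpers, and a "snapshot" is just a pair of counters recorded while the sort is still buffering.

```python
def read_data_set(n, counters):
    """Yield n rows in descending order, counting each row as it is read."""
    for value in range(n, 0, -1):
        counters["read"] += 1
        yield value


def sort_rows(rows, counters, snapshots):
    """Blocking sort: consumes all input before emitting the first row."""
    buffered = []
    for row in rows:
        buffered.append(row)
        # Record a snapshot after every second row, while still buffering.
        if len(buffered) % 2 == 0:
            snapshots.append((counters["read"], counters["sorted_out"]))
    for row in sorted(buffered):
        counters["sorted_out"] += 1
        yield row


counters = {"read": 0, "sorted_out": 0}
snapshots = []
result = list(sort_rows(read_data_set(6, counters), counters, snapshots))
print(snapshots)  # (rows read, rows emitted by the sort) at each snapshot
print(result)
```

Every snapshot taken during the buffering phase shows rows being read but zero rows leaving the sort, which is the "appears to hang" effect described above.
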


The operation of the job monitor is controlled by two environment variables: APT_MONITOR_TIME and APT_MONITOR_SIZE. By default, the job monitor takes a snapshot every five seconds. You can alter the time interval by changing the value of APT_MONITOR_TIME, or you can have the monitor generate a new snapshot each time a set number of rows has been processed by following this procedure:

1. Select APT_MONITOR_TIME on the InfoSphere DataStage Administrator environment variable dialog box, and press the set to default button.
2. Select APT_MONITOR_SIZE and set the required number of rows as the value for this variable.
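
The difference between the two modes can be pictured with a small Python sketch. This is a hypothetical model for illustration only: `snapshot_due` is not a DataStage function, and the rule that a row count, once set, takes over from the time interval is an assumption based on the procedure above.

```python
DEFAULT_MONITOR_TIME = 5  # seconds; the default interval stated above


def snapshot_due(seconds_since_last, rows_since_last,
                 monitor_time=DEFAULT_MONITOR_TIME, monitor_size=None):
    """Return True when a new snapshot would be due under this toy model.

    Assumption: when monitor_size is set, snapshots are row-driven;
    otherwise they are time-driven.
    """
    if monitor_size is not None:
        return rows_since_last >= monitor_size
    return seconds_since_last >= monitor_time


# Time-driven mode (no row count set).
print(snapshot_due(5, 100))
print(snapshot_due(2, 100))
# Row-driven mode with a made-up threshold of 5000 rows.
print(snapshot_due(2, 5000, monitor_size=5000))
print(snapshot_due(60, 100, monitor_size=5000))
```

In the row-driven case, a slow stage may go a long time between snapshots, which is worth keeping in mind when choosing the row count.
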
