Tuesday, 2 September 2014

Importing CFDs in Datastage

To import a CFD:

Procedure

1. Open the Import Meta Data (CFD) dialog box in either of these ways:
   - Choose Import > Table Definitions > COBOL File Definitions from the main menu.
   - Right-click the Table Definitions folder in the repository tree and select Import Table Definition > COBOL File Definitions from the shortcut menu.
2. In the COBOL file description pathname field, type or browse for the path name where...

COBOL File Definitions in Datastage

COBOL File Definitions contain data description statements in a text file that describe a file format in COBOL terms. You can import CFDs into the InfoSphere™ DataStage® repository directly from a COBOL program. A CFD file can contain multiple table definitions, and can be either a COBOL copybook or a COBOL source program. Before you import a COBOL FD, be sure it contains valid COBOL syntax. InfoSphere DataStage supports level...

The Unix Time Command : tips & tricks

If you have a program ./prog.e, then in the bash/ksh shell you can type this command, and the output on the screen details how long the code took to run:

    $ time ./prog.e
    real 24m10.951s
    user 6m2.390s
    sys 0m15.705s

Real time - Elapsed time from beginning to end of program (or wall clock time). The real time is the total time of execution.
CPU time - Divided into User time and System time.
User time - time used by...
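The real/user/sys split reported by time can also be reproduced from a script. The following is a minimal sketch in Python, not part of the original post: the helper name time_command and the sample child command are assumptions chosen for illustration, and the resource module it relies on is only available on UNIX-like systems.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv as a child process and return (real, user, system) seconds.

    Hypothetical helper for illustration; it mirrors the three figures
    that the shell's `time` prints.
    """
    # CPU time already charged to earlier (waited-for) children.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()

    # subprocess.run waits for the child, so its usage is accounted for.
    subprocess.run(argv, check=True)

    real = time.perf_counter() - start  # wall-clock (elapsed) time
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime    # CPU time in user mode
    system = after.ru_stime - before.ru_stime  # CPU time in kernel mode
    return real, user, system


if __name__ == "__main__":
    # Sample workload (an assumption): a short CPU-bound Python one-liner.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    real, user, system = time_command(cmd)
    print(f"real {real:.3f}s")
    print(f"user {user:.3f}s")
    print(f"sys  {system:.3f}s")
```

On a single-threaded, CPU-bound program, user plus sys is usually close to real; a large gap suggests the program spent its time waiting (for I/O or for other processes) rather than computing.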

Wednesday, 26 February 2014

DataSet in DataStage

Inside an InfoSphere DataStage parallel job, data is moved around in data sets. These carry metadata with them, both column definitions and information about the configuration that was in effect when the data set was created. If, for example, you have a stage which limits execution to a subset of available nodes, and the data set was created...

Thursday, 18 October 2012

Basics Of UNIX

The purpose of this post is to provide a single page of frequently used basic commands for getting started with UNIX.

Basic UNIX command line (shell) navigation:
- Directories
- Moving around the file system
- Listing directory contents
- Changing file permissions and attributes
- Moving, renaming, and copying files
- Viewing and editing files

Directories: File and directory paths in UNIX...

Wednesday, 17 October 2012

Performance Tuning in IBM InfoSphere DataStage

Performance is a key factor in the success of any data warehousing project. Optimization and performance should be considered from the inception of the design and development process. Ideally, a DataStage® job should process large volumes of data within a short period of time. For maximum throughput and performance, a well-performing infrastructure is required, or else the tuning of DataStage® jobs will not make...

Monitoring Datastage Jobs

The Monitor window in DataStage Director: the DataStage Job Monitor is accessible through DataStage Director. To open it, right-click any job name in the DataStage Director client. Alternatively, select the job in the Director list window, go to Tools --> View Monitor, and select the job. This displays summary information...

Tuesday, 16 October 2012

ETL vs ELT

One of the many useful articles by Vincent McBurney, comparing ETL and ELT. Every now and then I come across a blog entry that reminds me there are people out there who know a lot more about my niche than I do! This is fortunate as this week it has helped me understand ELT tools. ETL versus ELT and ETLT: The world of data integration has its own Coke versus Pepsi challenge - it's called ETL versus...

Datastage Execution Flow

When you execute a job, the generated OSH and the contents of the configuration file ($APT_CONFIG_FILE) are used to compose a “score”. This is similar to a SQL query optimization plan. At runtime, IBM InfoSphere DataStage identifies the degree of parallelism and node assignments for each operator, and inserts sorts and partitioners as needed...

Saturday, 13 October 2012

Quick Datastage Tips

When designing or coding a job of any kind or level, a few tips can come in very handy. They help you design DataStage jobs efficiently, and the more efficient the design, the better the job performs once executed. These tips also help reduce job design and coding time and make debugging easier. Partitioning Partitioning...