IBM Information Server 8.X (DataStage): architecture and its components

What is DataStage?

  • An ETL tool used to extract, transform, and load data into a data mart or data warehouse
  • Used for data integration projects such as data warehouses and ODS (Operational Data Store) builds, and can connect to major databases such as Teradata, Oracle, DB2, and SQL Server
  • ETL jobs can be migrated across environments, such as Dev, UAT, and Prod, by importing and exporting DataStage components
  • You can manage job metadata
  • You can schedule, run, and monitor jobs in DataStage
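
You can also start a job outside the client tools with the dsjob command-line interface; the short Python sketch below is one way to wrap it. The project, job, and parameter names are placeholders, and the exact dsjob options should be verified against your release.

    import subprocess

    # Placeholder project, job, and parameter names -- replace with your own.
    PROJECT = "DEV_PROJECT"
    JOB = "LoadCustomerDim"

    # dsjob is the DataStage command-line interface that ships with the engine;
    # verify the option names against your version's documentation.
    cmd = [
        "dsjob",
        "-run",
        "-mode", "NORMAL",                 # NORMAL | RESET | VALIDATE
        "-param", "LOAD_DATE=2024-01-31",  # override a job parameter
        "-wait",                           # block until the job finishes
        PROJECT,
        JOB,
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    print("dsjob exit status:", result.returncode)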

DataStage architecture:

DataStage allows us to develop jobs in the Server or Parallel edition. The Parallel edition uses parallel processing capabilities to process data and is well suited to large volumes of data.
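
The degree of parallelism is driven by a configuration file that describes the processing nodes available to the parallel engine (usually pointed to by the APT_CONFIG_FILE environment variable). The sketch below writes a minimal single-node file from Python; the host name and paths are placeholders for your own installation.

    # Minimal sketch of a parallel configuration file; the host name and
    # paths below are placeholders and must match your installation.
    config_text = """{
      node "node1"
      {
        fastname "etlhost"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/tmp/ds_scratch" {pools ""}
      }
    }
    """

    # APT_CONFIG_FILE is the environment variable that normally points the
    # parallel engine at this file.
    with open("/tmp/one_node.apt", "w") as f:
        f.write(config_text)

Adding more node entries (or pointing jobs at a different configuration file) increases the degree of parallelism without changing the job design itself.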

Components:

  • Designer
  • Director
  • Administrator

Administrator:

The following tasks are performed with the Administrator client:

  • Add, delete and move projects
  • Set user permissions for projects
  • Purge job log files
  • Set the timeout interval in the engine
  • Track engine activity
  • Set job parameter defaults
  • Issue WebSphere DataStage Engine commands from the administrative client
  • Configure the parallel processing job settings
  • Create/set environment variables
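
Several of these settings can also be scripted with the dsadmin command-line tool that ships with the engine. The Python sketch below lists and sets a project-level environment variable; the project name is a placeholder, connection/authentication options are omitted, and the dsadmin option names should be verified against your release.

    import subprocess

    PROJECT = "DEV_PROJECT"  # placeholder project name

    # List the environment variables currently defined for the project.
    subprocess.run(["dsadmin", "-listenv", PROJECT], check=True)

    # Point the project's parallel jobs at a specific configuration file.
    subprocess.run(
        ["dsadmin", "-envset", "APT_CONFIG_FILE",
         "-value", "/tmp/one_node.apt", PROJECT],
        check=True,
    )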

Enabling job management on the Director client:

These functions allow WebSphere DataStage operators to release the resources of a job that has aborted or hung, returning the job to a state in which it can be run again.

This procedure enables two commands on the Director menu.

  • Cleanup Resources
  • Clear Status File

Designer:

  • Design and develop jobs using the graphical design tool
  • Various stages, such as General, Database, File, and Processing stages, are used when developing jobs
  • Table definitions can be imported directly from source or data warehouse tables
  • Jobs are compiled with the Designer, which checks main inputs, reference outputs, key expressions, transformations, and so on for compile errors
  • Import and export projects between different environments (see the sketch after this list)
  • Server, mainframe, and parallel jobs can be created using the Designer
  • Define job parameters on the Parameters page of the job properties; they can then be referenced throughout job development
  • You can create custom routines
  • Multiple jobs can be selected for compilation, and a report is provided after the compile completes
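
As a rough sketch of the export/import step, the Windows client installation typically includes dscmdexport and dscmdimport for scripted migration. The host names, credentials, project names, file paths, and exact option syntax below are placeholders to verify against your version.

    import subprocess

    # Export a project from the development server to a .dsx file.
    subprocess.run(
        ["dscmdexport",
         "/H=dev-server", "/U=dsadm", "/P=secret",   # engine host and credentials
         "DEV_PROJECT",                              # source project
         r"C:\exports\dev_project.dsx"],             # export file
        check=True,
    )

    # Import the same .dsx file into the UAT project.
    subprocess.run(
        ["dscmdimport",
         "/H=uat-server", "/U=dsadm", "/P=secret",
         "UAT_PROJECT",                              # target project
         r"C:\exports\dev_project.dsx"],
        check=True,
    )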

Director:

  • Validate, schedule, run, and monitor jobs on the DataStage server
  • The job status view displays the current status of each job, such as Running, Compiled, Finished, Aborted, or Not Compiled
  • The job log displays the log entries for the selected job (see the sketch after this list)
  • Reset a job whose status is Aborted or Stopped before running it again
  • Provides the execution times of jobs
  • Ability to clean up resources (if the administrator has enabled this option)
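
The same status and log information can be pulled from the command line with dsjob, as in the Python sketch below; the project and job names are placeholders and the option names should be confirmed against your release.

    import subprocess

    PROJECT = "DEV_PROJECT"   # placeholder project name
    JOB = "LoadCustomerDim"   # placeholder job name

    # -jobinfo reports the job's current status and last run details.
    info = subprocess.run(["dsjob", "-jobinfo", PROJECT, JOB],
                          capture_output=True, text=True)
    print(info.stdout)

    # -logsum prints a summary of the job log entries (what the Director's
    # job log view shows).
    log = subprocess.run(["dsjob", "-logsum", PROJECT, JOB],
                         capture_output=True, text=True)
    print(log.stdout)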

Along with these jobs, DataStage provides containers (local containers and shared containers) and sequence jobs, which let you specify a sequence of server or parallel jobs to run.
