Overview Of Transformations


1. Aggregator

The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. The Aggregator transformation is unlike the Expression transformation, in that you can use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only. When using the transformation language to create aggregate expressions, you can use conditional clauses to filter records, providing more flexibility than SQL language. The Informatica Server performs aggregate calculations as it reads, and stores necessary data group and row data in an aggregate cache.

After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Informatica Server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.

2. Filter

The Filter transformation provides the means for filtering rows in a mapping. You pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation. In some cases, you need to filter data based on one or more conditions before writing it to targets. For example, if you have a human resources data warehouse containing information about current employees, you might want to filter out employees who are part-time and hourly.

The mapping in passes the rows from a human resources table that contains employee data through a Filter transformation. The filter only allows rows through for employees that make salaries of $30,000 or higher.


3. Joiner

While a Source Qualifier transformation can join data originating from a common source database, the Joiner transformation joins two related heterogeneous sources residing in different locations or file systems. The combination of sources can be varied. You can use the following sources:

a) Two relational tables existing in separate databases
b) Two flat files in potentially different file systems
c) Two different ODBC sources
d) Two instances of the same XML source
e) A relational table and a flat file source
f) A relational table and an XML source

You use the Joiner transformation to join two sources with at least one matching port. The Joiner transformation uses a condition that matches one or more pairs of ports between the two sources.

For example, you might want to join a flat file with in-house customer IDs and a relational database table that contains user-defined customer IDs. You could import the flat file into a temporary database table, and then perform the join in the database. However, if you use the Joiner transformation, there is no need to import or create temporary tables.

If two relational sources contain keys, then a Source Qualifier transformation can easily join the sources on those keys. Joiner transformations typically combine information from two different sources that do not have matching keys, such as flat file sources. The Joiner transformation allows you to join sources that contain binary data.
 The Joiner transformation supports the following join types, which you set in the Properties tab:

1. Normal (Default)
2. Master Outer
3. Detail Outer
4. Full Outer
  
4. Source Qualifier

When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the records that the Informatica Server reads when it runs a session. You can use the Source Qualifier to perform the following tasks:

a) Join data originating from the same source database
You can join two or more tables with primary-foreign key relationships by linking the sources to one Source Qualifier.

b) Filter records when the Informatica Server reads source data If you include a filter condition, the Informatica Server adds a WHERE clause to the default query.

c) Specify an outer join rather than the default inner join If you include a user-defined join, the Informatica Server replaces the join information specified by the metadata in the SQL query.

d) Specify sorted ports
If you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.

 
e) Select only distinct values from the source If you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
For example, you might use a custom query to perform aggregate calculations or execute a stored
procedure.


f) Create a custom query to issue a special SELECT statement for the Informatica Server to read source data


5. Stored Procedure

A Stored Procedure transformation is an important tool for populating and maintaining databases. Database administrators create stored procedures to automate time-consuming tasks that are too complicated for standard SQL statements.

A stored procedure is a precompiled collection of Transact-SQL statements and optional flow control statements, similar to an executable script. Stored procedures are stored and run within the database. You can run a stored procedure with the EXECUTE SQL statement in a database client tool, just as you can run SQL statements. Unlike standard SQL, however, stored procedures allow user-defined variables, conditional statements, and other
powerful programming features. 

Not all databases support stored procedures, and database implementations vary widely on their syntax. You might use stored procedures to:

a) Drop and recreate indexes.
b) Check the status of a target database before moving records into it.
c) Determine if enough space exists in a database.
d) Perform a specialized calculation.

Database
developers and programmers use stored procedures for various tasks within databases, since stored procedures allow greater flexibility than SQL statements. Stored procedures also provide error handling and logging necessary for mission critical tasks. Developers create stored procedures in the database using the client tools provided with the database.
The stored procedure must exist in the database before creating a Stored Procedure
transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server.

You might use a stored procedure to perform a query or calculation that you would otherwise make part of a mapping. For example, if you already have a well-tested stored procedure for calculating sales tax, you can perform that calculation through the stored procedure instead of recreating the same calculation in an Expression
transformation.

 6. Sequence Generator

The Sequence Generator transformation generates numeric values. You can use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.

The Sequence Generator transformation is a connected transformation. It contains two output ports that you can connect to one or more transformations. The Informatica Server generates a value each time a row enters a connected transformation, even if that value is not used. When NEXTVAL is connected to the input port of another transformation, the Informatica Server generates a sequence of numbers. When CURRVAL is connected to the input port of another transformation, the Informatica Server generates the NEXTVAL value plus one.

You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target.

For example, if you have a large input file that you separate into three sessions running in parallel, you can use a Sequence Generator to generate primary key values. If you use different Sequence Generators, the Informatica Server might accidentally generate duplicate key values. Instead, you can use the same reusable Sequence Generator for all three sessions to provide a unique value for each target row.

7. Rank

The Rank transformation allows you to select only the top or bottom rank of data. You can use a Rank transformation to return the largest or smallest numeric value in a port or group. You can also use a Rank transformation to return the strings at the top or the bottom of a session sort order. During the session, the Informatica Server caches input data until it can perform the rank calculations.

The Rank transformation differs from the transformation functions MAX and MIN, in that it allows you to select a group of top or bottom values, not just one value. For example, you can use Rank to select the top 10 salespersons in a given territory. Or, to generate a financial report, you might also use a Rank transformation to identify the three departments with the lowest expenses in salaries and overhead. While the SQL language provides many functions designed to handle groups of data, identifying top or bottom strata within a set of rows
is not possible using standard SQL functions.

You connect all ports representing the same row set to the transformation. Only the rows that fall within that rank, based on some measure you set when you configure the transformation, pass through the Rank transformation. You can also write expressions to transform data or perform calculations.

8. Look Up

Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. Import a lookup definition from any relational database to which both the Informatica Client and Server can connect. You can use multiple Lookup transformations in a mapping.

The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. Use the result of the lookup to pass to other transformations and the target.

You can use the Lookup transformation to perform many tasks, including:

a) Get a related value:

For example, if your source table includes employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
b) Perform a calculation:


Many normalized tables include values used in a calculation, such as gross sales per invoice or
sales tax, but not the calculated value (such as net sales).

c) Update slowly changing dimension tables:

You can use a Lookup transformation to determine whether records already exist in the target.
You can configure the Lookup transformation to perform different types of lookups. You can configure the transformation to be connected or unconnected, cached or uncached:

a) Connected or unconnected:
 Connected and unconnected transformations receive input and send output in different ways.
b) Cached or uncached:
Sometimes you can improve session performance by caching the lookup table. If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts rows into the cache during the session.
Informatica recommends that you cache the target table as the lookup. This enables you to look up values in the target and insert them if they do not exist.

9. Expression

You can use the Expression transformations to calculate values in a single row before you write to the target. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.

Note:
10. Router

A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do
not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.

If you need to test the same input data based on multiple conditions, use a Router Transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient when you design a mapping and when you run a session. For example, to test data based on three conditions, you only need one Router transformation instead of three filter transformations to perform this task. Likewise, when you use a Router transformation in a mapping, the Informatica Server processes the incoming data only once. When you use multiple Filter transformations in a mapping, the Informatica Server processes the incoming data for each transformation.  

11. Update Strategy

When you design your data warehouse, you need to decide what type of information to store in targets. As part of your target table design, you need to determine whether to maintain all the historic data or just the most recent changes. 

For example, you might have a target table, T_CUSTOMERS that contains customer data. When a customer address changes, you may want to save the original address in the table, instead of updating that portion of the customer record. In this case, you would create a new record containing the updated address, and preserve the
original record with the old customer address. This illustrates how you might store historical information in a target table. However, if you want the T_CUSTOMERS table to be a snapshot of current customer data, you would update the existing customer record and lose the original address.

The model you choose constitutes your update strategy, how to handle changes to existing records. In PowerMart and PowerCenter, you set your update strategy at two different levels:
 
a) Within a session:
When you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database operations.

b) Within a mapping:
Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.

To perform calculations involving multiple rows, such as sums or averages, use the Aggregator transformation. Unlike the Expression transformation, the Aggregator allows you to group and sort data. 
  

No comments:

Post a Comment