Analytics & Information Management: Teradata

Teradata Overview

Teradata is an enterprise software company that develops and sells a relational database management system (RDBMS) of the same name. In February 2011, Gartner ranked Teradata as one of the leading companies in data warehousing and enterprise analytics. Teradata was a division of the NCR Corporation, which acquired Teradata on February 28, 1991. Teradata's revenues in 2005 were almost $1.5 billion, with an operating margin of 21%. On January 8, 2007, NCR announced that it would spin off Teradata as an independently traded company. The spin-off was completed on October 1 of the same year, with Teradata trading under the NYSE stock symbol TDC.^[6]

The Teradata product is referred to as a "data warehouse system" and stores and manages data. The data warehouses use a "shared nothing" architecture, which means that each server node has its own memory and processing power. Adding more servers and nodes increases the amount of data that can be stored. The database software sits on top of the servers and spreads the workload among them. Teradata sells applications and software to process different types of data. In 2010, Teradata added text analytics to track unstructured data, such as word processor documents, and semi-structured data, such as spreadsheets.
Teradata's product can be used for business analysis. Data warehouses can track company data such as sales, customer preferences, and product placement.

Teradata is made up of the following components –

Processor Chip – The processor is the BRAIN of the Teradata system. It is responsible for all the processing done by the system. All tasks are carried out under the direction of the processor.

Memory – The memory is known as the HAND of the Teradata system. Data is retrieved from the hard drives into memory, where the processor manipulates, changes, or alters it. Once changes are made in memory, the processor directs the information back to the hard drive for storage.

Hard Drives – These are known as the SPINE of the Teradata system. All the data of the Teradata system is stored on the hard drives. The size of the hard drives reflects the size of the Teradata system.

Teradata has Linear Scalability
One of Teradata's most important assets is its linear scalability. There is no fixed limit on the size of a Teradata system, and it can be grown as far as needed. To double the speed of a Teradata system, double the number of AMPs (Access Module Processors) and PEs (Parsing Engines). This is best explained with an example.

- Teradata takes every table in the system and spreads its rows evenly across the AMPs. Each AMP works only on the portion of the records that it holds.

- Suppose an EMPLOYEE table has 8 different employee IDs. In a 2 AMP system, each AMP holds 4 rows on its DISK to accommodate the total of 8 rows.

2 AMP SYSTEM

At the time of data retrieval, each AMP works on its own DISK and sends its 4 rows to the PE for further processing. If we suppose that one AMP takes 1 microsecond (MS) to retrieve 1 row, then the time taken to retrieve 4 rows is 4 MS. Because the AMPs work in parallel, the two AMPs together retrieve all 8 records in only 4 MS (4 MS for each AMP).

Now we double the number of AMPs in our system, for a total of 4 AMPs. Because Teradata distributes the records evenly among all AMPs, each AMP now stores 2 records of the table.

4 AMP SYSTEM

According to our time scale, each AMP now takes 2 MS to retrieve its 2 records.
So all 4 AMPs, working in parallel, retrieve the 8 records in only 2 MS, compared with 4 MS for the 2 AMP system.

Hence we double our speed by doubling the number of AMPs in our system.

This is the power of parallelism in Teradata. It is also known as the ‘DIVIDE and CONQUER’ approach: the work is divided equally among the AMPs, so the result arrives faster. To achieve the desired speed, we can increase the number of AMPs accordingly.
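
The arithmetic above can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: rows are dealt out round-robin as a stand-in for Teradata's hash-based distribution, each row costs exactly 1 MS to read, and the function names `distribute_rows` and `parallel_retrieval_time` are hypothetical helpers invented for this illustration.

```python
from collections import defaultdict


def distribute_rows(row_ids, num_amps):
    """Spread rows evenly over AMPs.

    Round-robin is an assumption made for illustration; a real
    Teradata system distributes rows by hashing the primary index.
    """
    amps = defaultdict(list)
    for position, row_id in enumerate(row_ids):
        amps[position % num_amps].append(row_id)
    return dict(amps)


def parallel_retrieval_time(amps, time_per_row=1):
    """AMPs work in parallel, so elapsed time is set by the busiest AMP."""
    return max(len(rows) for rows in amps.values()) * time_per_row


employee_ids = list(range(1, 9))  # the 8 employee IDs from the example

for num_amps in (2, 4):
    amps = distribute_rows(employee_ids, num_amps)
    elapsed = parallel_retrieval_time(amps)
    print(f"{num_amps} AMP system: {elapsed} MS")
```

Under these assumptions the 2 AMP system reports 4 MS and the 4 AMP system reports 2 MS, matching the example. The result depends on the rows being spread evenly; a skewed distribution would leave one AMP with more rows and slow the whole retrieval.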

Partitioned Primary Index – Advantages and Disadvantages

Advantages of Partitioned Primary Index –

Partitioned Primary Index (PPI) is one of the distinctive features of Teradata. It groups the rows stored on each AMP into partitions so that they can be retrieved much faster than with a conventional, non-partitioned table.
Teradata allows a maximum of 65,535 partitions (later releases may raise this limit).
It also reduces the overhead of scanning the complete table (a full table scan, or FTS), thus improving performance.
In a PPI table, a row is hashed as usual on the basis of its PI, but on the AMP it is stored only in its respective partition. This means that rows are sorted first by their partition column and then, inside each partition, by their row hash.
PPIs are usually defined on a table to increase query efficiency by avoiding full table scans, without the overhead and maintenance costs of secondary indexes.
Deletes on a PPI table are much faster, particularly when entire partitions are removed.
For range-based queries, a PPI can effectively replace a secondary index (SI), saving the overhead of the SI subtable.
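
The storage order described above, and the way it lets a range query skip most of the table, can be sketched in a few lines. This is an illustration only, not Teradata's implementation: the `orders` rows, the `month` partitioning column, the CRC32-based `row_hash`, and the helper names are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def row_hash(pi_value):
    # Stand-in hash for illustration; not Teradata's hashing algorithm.
    return zlib.crc32(str(pi_value).encode())


def build_ppi_storage(rows, partition_of):
    """Store one AMP's rows by partition, then by row hash within each partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_of(row)].append(row)
    for partition_rows in partitions.values():
        partition_rows.sort(key=lambda r: row_hash(r["order_id"]))
    return dict(sorted(partitions.items()))


def range_query(storage, wanted_partitions):
    """Read only the qualifying partitions; return (rows, rows_scanned)."""
    hits = []
    for partition in wanted_partitions:
        hits.extend(storage.get(partition, []))
    return hits, len(hits)


# 24 hypothetical orders, two per month, partitioned by month.
orders = [{"order_id": i, "month": (i - 1) % 12 + 1} for i in range(1, 25)]
storage = build_ppi_storage(orders, lambda r: r["month"])

# A query for months 3 and 4 touches only those two partitions.
hits, scanned = range_query(storage, range(3, 5))
print(f"scanned {scanned} of {len(orders)} rows")
```

Here the range query reads 4 rows instead of all 24, which is the effect a full table scan or a secondary index would otherwise be needed to approximate.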

Disadvantages of Partitioned Primary Index –

Rows in a PPI table are 2 bytes longer, so the table uses more PERM space.
If an SI is defined on a PPI table, the SI subtable also grows by 2 bytes for each referencing row ID.
PI access can be degraded if the partition column is not part of the PI. For example, a query that specifies a PI value but no value for the partition column must look in every partition of the table, losing the advantage of using the PI in the WHERE clause.
Joins between a PPI table and non-partitioned tables may be degraded. If one of the tables is partitioned and the other is not, a sliding-window merge join takes place.
The PI cannot be defined as UNIQUE when the partitioning columns are not part of the PI.
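
The degraded PI access mentioned above can be made concrete by counting how many partitions a lookup has to probe. This is a minimal sketch with invented data: the `cust_id` primary index, the `month` partitioning column, and the `pi_lookup` helper are hypothetical and do not correspond to any Teradata API.

```python
def pi_lookup(storage, pi_value, partition=None):
    """Return (matching rows, partitions probed) for a primary-index lookup.

    With no partition value supplied, every partition must be searched.
    """
    candidates = list(storage) if partition is None else [partition]
    matches = []
    probed = 0
    for p in candidates:
        probed += 1
        matches.extend(r for r in storage.get(p, []) if r["cust_id"] == pi_value)
    return matches, probed


# Hypothetical table: 12 monthly partitions, two customers in each.
storage = {
    month: [{"cust_id": cust, "month": month} for cust in (101, 102)]
    for month in range(1, 13)
}

_, probes_without = pi_lookup(storage, 101)             # PI value only
_, probes_with = pi_lookup(storage, 101, partition=7)   # PI value plus partition value
print(f"probes without partition value: {probes_without}")
print(f"probes with partition value:    {probes_with}")
```

In this toy table the lookup probes all 12 partitions when only the PI value is given, and a single partition when the partition value is supplied as well.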

Technology and product

Teradata is a massively parallel processing system running a shared-nothing architecture. Its technology consists of hardware, software, database, and consulting. The system moves data to a data warehouse where it can be recalled and analyzed.

The systems can be used as backups for one another during downtime, and in normal operation they balance the workload among themselves.

In 2009, Forrester Research issued a report, "The Forrester Wave: Enterprise Data Warehouse Platform," by James Kobielus, rating Teradata the industry's number one enterprise data warehouse platform in the "Current Offering" category.

Marketing research company Gartner Group placed Teradata in the "leaders quadrant" in its 2009, 2010, and 2012 reports, "Magic Quadrant for Data Warehouse Database Management Systems".

Teradata is the most popular data warehouse DBMS in the DB-Engines database ranking.

In 2010, Teradata was listed in Fortune’s annual list of Most Admired Companies.

Active enterprise data warehouse

Teradata Active Enterprise Data Warehouse is the platform that runs the Teradata Database, with added data management tools and data mining software.
The data warehouse differentiates between “hot and cold” data – meaning that the warehouse puts data that is not often used in a slower storage section. As of October 2010, Teradata uses Xeon 5600 processors for the server nodes.
Teradata Database 13.10 was announced in 2010 as the company’s database software for storing and processing data.
Teradata Database 14 was sold as the upgrade to 13.10 in 2011 and runs multiple data warehouse workloads at the same time. It includes column-store analyses.
Teradata Integrated Analytics is a set of tools for data analysis that resides inside the data warehouse.

Backup, archive, and restore

BAR (backup, archive, and restore) is Teradata’s backup and recovery system.
The Teradata Disaster Recovery Solution provides automation and tools for data recovery and archiving. Customer data can be stored in an offsite recovery center.

Platform family

Teradata Platform Family is a set of products that includes the Teradata Data Warehouse, Database, and a set of analytic tools. The platform family is marketed as smaller and less expensive than the other Teradata solutions.

Friday, 4 October 2013
