Search This Blog

Saturday, January 30, 2010

Hash Partitioning/How Cardinality Affects Distribution

Hash partitioning provides a method of evenly distributing data across a specified number of partitions. Rows are mapped into partitions based on a hash value of the partitioning key and is a better choice than range partitioning when:

a.You do not know beforehand how much data will map into a given range

b.Sizes of range partitions would differ quite substantially

c.Partition pruning and partition-wise joins on a partitioning key are important

d.Range partitioning would cause the data to be undesirably clustered

In 11g hash partitioning can also be used:
a.To enable partial or full parallel partition-wise joins with very likely equi-sized partitions.

b.To distribute data evenly among the nodes of an MPP platform that uses Oracle Real Application Clusters. As a result, you can minimize interconnect traffic
when processing internode parallel statements.

c.To use partition pruning and partition-wise joins according to a partitioning
key that is mostly constrained by a distinct value or value list.

d.To randomly distribute data to avoid I/O bottlenecks if you do not use
a storage management technique that stripes and mirrors across all available
devices.

The concepts of splitting, dropping or merging partitions do not apply to hash partitions. Instead, hash partitions can be added and coalesced and with hash partitioning, pruning is only possible using equality or IN-list predicates.

The number of partitions should always be a power of two (2, 4, 8, and so on)to obtain the most even data distribution.

Cardinality is important for choosing a key for hash partitioning. Oracle recommends that you do not use hash partitioning on key columns with low cardinality (columns in which the number of distinct values is small compared to
the number of rows in the table.)

The hash function works best with a LARGE number of values. Rows are mapped into partitions based on the hash value of the partitioning key. A primary key is a very good hash key because it has a wide range a values.

For example, if you had a hash partitioned table with 8 hash partitions, and a key column with only 10 distinct values, you could encounter the following:

CREATE TABLE dept (deptno NUMBER, deptname VARCHAR(32))
PARTITION BY HASH(deptno) PARTITIONS 8;

Partition DEPTNO Column Values
----------- ---------------------
hash_part1 1,2,3
hash_part2 4,5,6,
hash_part3 7
hash_part4 8
hash_part5
hash_part6
hash_part7 9
hash_part8 10

You would have a scenario where some of your hash partitions would contain a high number of distinct values and your hash partitions would be widely different sizes. It is even possible that some hash partitions could contain zero distinct values and others will contain three or more distinct values. One of the reasons to use hash partitioning is to have evenly sized partitions. That's why we recommend that you do not use hash partitioning when low cardinality is a factor.

In general, the hash-partitioning key should be unique or near unique. Since a table may contain several near-unique keys and thus several candidates for hash
partitioning, the other consideration for choosing the partitioning key is the access profile. Hash partitioning can improve the performance of single-key lookups.

For example:
SELECT * FROM emp WHERE empno = 123 <-----single key lookup SELECT * FROM emp WHERE sal > 5000 <------range lookup

If you create a local index for this table, the database constructs the index so that it is equipartitioned with the underlying table.

The database also ensures that the index is maintained automatically when maintenance operations are performed on the underlying table. The following is an example of creating a local index on the table dept: CREATE INDEX loc_dept_ix ON dept(deptno) LOCAL;

You can optionally name the hash partitions and tablespaces into which the local index partitions are to be stored, but if you do not do so, the database uses the name of the corresponding base partition as the index partition name, and stores the index partition in the same tablespace as the table partition.

Hash partitioned global indexes can improve the performance of indexes where a small number of leaf blocks in the index have high contention in multiuser OLTP
environments. Queries involving the equality and IN predicates on the index partitioning key can efficiently use hash-partitioned global indexes.

The syntax for creating a hash partitioned global index is similar to that used
for a hash partitioned table. For example, the following statement creates a
hash-partitioned global index:
CREATE INDEX hgidx ON tab (c1,c2,c3) GLOBAL
PARTITION BY HASH (c1,c2)
(PARTITION p1 TABLESPACE tbs_1,
PARTITION p2 TABLESPACE tbs_2,
PARTITION p3 TABLESPACE tbs_3,
PARTITION p4 TABLESPACE tbs_4);

No comments:

Post a Comment