Saturday, January 30, 2010

11g feature: Reference Partitioning

Reference partitioning allows the partitioning of a table based on the partitioning scheme of the table referenced in its referential constraint. The partitioning key is resolved through an existing parent-child relationship, enforced by active primary key or foreign key constraints. This means that tables with a parent-child relationship can be logically equi-partitioned by inheriting the partitioning key from the parent table without duplicating the key columns. The logical dependency also automatically cascades partition maintenance operations, making application development easier and less error-prone.

To create a reference-partitioned table, you specify a PARTITION BY REFERENCE clause in the CREATE TABLE statement. This clause specifies the name of a referential constraint and this constraint becomes the partitioning referential constraint that is used as the basis for reference partitioning in the table.

As with other partitioned tables, you can specify object-level default attributes, and can optionally specify partition descriptors that override the object-level defaults on a per-partition basis.
CREATE TABLE orders
(order_id NUMBER(12),
order_date DATE,
order_mode VARCHAR2(8),
customer_id NUMBER(6),
order_status NUMBER(2),
order_total NUMBER(8,2),
sales_rep_id NUMBER(6),
promotion_id NUMBER(6),
CONSTRAINT orders_pk PRIMARY KEY(order_id)
)
PARTITION BY RANGE(order_date)
(PARTITION Q1_2005 VALUES LESS THAN
(TO_DATE('01-APR-2005','DD-MON-YYYY')),
PARTITION Q2_2005 VALUES LESS THAN
(TO_DATE('01-JUL-2005','DD-MON-YYYY')),
PARTITION Q3_2005 VALUES LESS THAN
(TO_DATE('01-OCT-2005','DD-MON-YYYY')),
PARTITION Q4_2005 VALUES LESS THAN
(TO_DATE('01-JAN-2006','DD-MON-YYYY')));

The example above creates a table called ORDERS that is range-partitioned on order_date. It is created with four partitions: Q1_2005, Q2_2005, Q3_2005, and Q4_2005. This table is referenced in the creation of the reference-partitioned table below.
CREATE TABLE order_items
( order_id NUMBER(12) NOT NULL,
line_item_id NUMBER(3) NOT NULL,
product_id NUMBER(6) NOT NULL,
unit_price NUMBER(8,2),
quantity NUMBER(8),
CONSTRAINT order_items_fk
FOREIGN KEY(order_id) REFERENCES orders(order_id)
)
PARTITION BY REFERENCE(order_items_fk);

The reference-partitioned child table ORDER_ITEMS example above is created with four partitions, Q1_2005, Q2_2005, Q3_2005, and Q4_2005, where each partition contains the order_items rows corresponding to orders in the respective parent partition.


If partition descriptors are provided, then the number of partitions described must be exactly equal to the number of partitions or sub-partitions in the referenced table. If the parent table is a composite partitioned table, then the table will have one partition for each sub-partition of its parent; otherwise the table will have one partition for each partition of its parent.

Partition bounds cannot be specified for the partitions of a reference-partitioned table. The partitions of a reference-partitioned table can be named. If a partition is not explicitly named, then it will inherit its name from the corresponding partition in the parent table, unless this inherited name conflicts with one of the explicit names given. In this case, the partition will have a system-generated name.

Partitions of a reference-partitioned table will co-locate with the corresponding partition of the parent table, if no explicit tablespace is specified for the reference-partitioned table's partition.
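
Because maintenance operations cascade from parent to child, adding a quarter to ORDERS automatically creates the matching partition in ORDER_ITEMS. A minimal sketch against the tables above (the new partition name and bound are illustrative):

-- Add a quarter to the parent; the child gets a matching partition automatically
ALTER TABLE orders ADD PARTITION q1_2006
VALUES LESS THAN (TO_DATE('01-APR-2006','DD-MON-YYYY'));

-- Confirm that both tables now show the same partition names
SELECT table_name, partition_name
FROM user_tab_partitions
WHERE table_name IN ('ORDERS', 'ORDER_ITEMS')
ORDER BY table_name, partition_position;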

Hash Partitioning/How Cardinality Affects Distribution

Hash partitioning provides a method of evenly distributing data across a specified number of partitions. Rows are mapped into partitions based on a hash value of the partitioning key. Hash partitioning is a better choice than range partitioning when:

a. You do not know beforehand how much data will map into a given range.

b. Sizes of range partitions would differ quite substantially.

c. Partition pruning and partition-wise joins on a partitioning key are important.

d. Range partitioning would cause the data to be undesirably clustered.

In 11g, hash partitioning can also be used:

a. To enable partial or full parallel partition-wise joins with very likely equi-sized partitions.

b. To distribute data evenly among the nodes of an MPP platform that uses Oracle Real Application Clusters. As a result, you can minimize interconnect traffic when processing internode parallel statements.

c. To use partition pruning and partition-wise joins according to a partitioning key that is mostly constrained by a distinct value or value list.

d. To randomly distribute data to avoid I/O bottlenecks if you do not use a storage management technique that stripes and mirrors across all available devices.

The concepts of splitting, dropping, or merging partitions do not apply to hash partitions. Instead, hash partitions can be added and coalesced. With hash partitioning, pruning is possible only with equality or IN-list predicates.
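
The two maintenance operations look like this. A sketch, assuming an existing hash-partitioned table named sales_hash (a hypothetical name):

-- Add one more hash partition; rows from one existing partition are rehashed into it
ALTER TABLE sales_hash ADD PARTITION;

-- Coalesce: redistribute the contents of one partition among the rest, then drop it
ALTER TABLE sales_hash COALESCE PARTITION;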

The number of partitions should always be a power of two (2, 4, 8, and so on) to obtain the most even data distribution.

Cardinality is important when choosing a key for hash partitioning. Oracle recommends that you do not use hash partitioning on key columns with low cardinality (columns in which the number of distinct values is small compared to the number of rows in the table).

The hash function works best with a LARGE number of values. Rows are mapped into partitions based on the hash value of the partitioning key. A primary key is a very good hash key because it has a wide range of values.

For example, if you had a hash partitioned table with 8 hash partitions, and a key column with only 10 distinct values, you could encounter the following:

CREATE TABLE dept (deptno NUMBER, deptname VARCHAR(32))
PARTITION BY HASH(deptno) PARTITIONS 8;

Partition    DEPTNO Column Values
-----------  ---------------------
hash_part1   1,2,3
hash_part2   4,5,6
hash_part3   7
hash_part4   8
hash_part5   (none)
hash_part6   (none)
hash_part7   9
hash_part8   10

You would have a scenario where some hash partitions contain a high number of distinct values while others contain few, so the hash partitions would be of widely different sizes. It is even possible that some hash partitions contain zero distinct values while others contain three or more. One of the reasons to use hash partitioning is to have evenly sized partitions, which is why we recommend that you do not use hash partitioning when low cardinality is a factor.
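
You can verify the skew directly. A quick sketch, assuming the illustrative partition names above (PARTITIONS 8 actually generates system names such as SYS_P42, which you can look up in USER_TAB_PARTITIONS):

-- Row counts per hash partition expose the uneven spread
SELECT COUNT(*) FROM dept PARTITION (hash_part1);
SELECT COUNT(*) FROM dept PARTITION (hash_part5);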

In general, the hash-partitioning key should be unique or nearly unique. Since a table may contain several near-unique keys, and thus several candidates for hash partitioning, the other consideration for choosing the partitioning key is the access profile. Hash partitioning can improve the performance of single-key lookups.

For example:

SELECT * FROM emp WHERE empno = 123;   <----- single-key lookup
SELECT * FROM emp WHERE sal > 5000;    <----- range lookup

If you create a local index for this table, the database constructs the index so that it is equipartitioned with the underlying table.

The database also ensures that the index is maintained automatically when maintenance operations are performed on the underlying table. The following is an example of creating a local index on the table dept:

CREATE INDEX loc_dept_ix ON dept(deptno) LOCAL;

You can optionally name the hash partitions and tablespaces into which the local index partitions are to be stored, but if you do not do so, the database uses the name of the corresponding base partition as the index partition name, and stores the index partition in the same tablespace as the table partition.
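
For example, the same index could instead be created with explicit partition names and tablespaces. A sketch (the partition and tablespace names are hypothetical, and all eight partitions must be listed once any are named):

CREATE INDEX loc_dept_ix ON dept(deptno) LOCAL
(PARTITION ip1 TABLESPACE tbs_1,
PARTITION ip2 TABLESPACE tbs_2,
PARTITION ip3 TABLESPACE tbs_1,
PARTITION ip4 TABLESPACE tbs_2,
PARTITION ip5 TABLESPACE tbs_1,
PARTITION ip6 TABLESPACE tbs_2,
PARTITION ip7 TABLESPACE tbs_1,
PARTITION ip8 TABLESPACE tbs_2);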

Hash-partitioned global indexes can improve the performance of indexes where a small number of leaf blocks in the index have high contention in multiuser OLTP environments. Queries involving equality and IN predicates on the index partitioning key can efficiently use hash-partitioned global indexes.

The syntax for creating a hash-partitioned global index is similar to that used for a hash-partitioned table. For example, the following statement creates a hash-partitioned global index:
CREATE INDEX hgidx ON tab (c1,c2,c3) GLOBAL
PARTITION BY HASH (c1,c2)
(PARTITION p1 TABLESPACE tbs_1,
PARTITION p2 TABLESPACE tbs_2,
PARTITION p3 TABLESPACE tbs_3,
PARTITION p4 TABLESPACE tbs_4);
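
Such an index helps most when the index partitioning key is constrained by equality. A hypothetical lookup against the tab example above (the bind variable names are illustrative):

-- Equality on (c1, c2) lets the optimizer probe a single index partition
SELECT * FROM tab WHERE c1 = :v1 AND c2 = :v2;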

Sunday, November 29, 2009

Content management databases

There are good content management databases on the market, used widely across companies such as Yahoo!, Facebook, and Amazon.

Thrudb is a set of simple services built on top of the Apache Thrift framework (originally developed at Facebook) that provides indexing and document storage services for building and scaling websites. Its purpose is to offer web developers flexible, fast, and easy-to-use services that can enhance or replace traditional data storage and access layers.


StrokeDB
A minimalistic, modular database engine.

StrokeDB stores a set of versioned documents identified by UUID. A document is a hash-like container of slots: a flat set of values tagged with string keys. Slots store arbitrary serializable values (the most common types are booleans, numbers, strings, arrays, and times).
 
The concept:

1. Every repository is identified by UUID.
2. Every repository writes a log of commits.
3. Commit tuples:
   (timestamp, "store", uuid, version)
   (timestamp, "pull",  repo_uuid, repo_timestamp)
4. When you pull from a repository:
   1. Find the latest timestamp in your history (index: repo_uuid -> repo_timestamp).
   2. If there is no timestamp yet, pull the whole log.
   3. If there is a timestamp for a repository UUID, pull the tail of the log.
   4. For each "store" record, fetch the version.
   5. For each "pull" record, add it to a deferred list of repositories waiting for update.
   6. When the whole log is fetched, fetch the deferred repositories. There are two options here:
      1. Fetch from the same repository we have been fetching from a few moments ago (say, fetch B's log from A).
      2. Or fetch directly from the desired repository (B's log from B's repository).

 MongoDB

MongoDB is an open source document-oriented database written in the C++ programming language.

MongoDB is designed for problems without heavy transactional requirements that aren't easily solved by traditional RDBMSs, including problems which require the database to span many servers.

CouchDB

  • A document database server, accessible via a RESTful JSON API.
  • Ad-hoc and schema-free with a flat address space.
  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
  • Query-able and index-able, featuring a table-oriented reporting engine that uses JavaScript as a query language.

Sunday, November 8, 2009

Oracle flash drive recommendations

EFD-friendly DB workloads:

  • Random reads: B-Tree leaf access, ROWID lookup into a table, access to out-of-line LOBs, access to overflowed rows, index scans over unclustered tables.
  • Compression: increases I/O intensity (IOPS/GB).
  • Serial reads / random writes: row update by primary key, index maintenance, reduced checkpoint interval.
  • TEMP (sort areas and intermediate tables): sequentially read and written, but I/O is done in 1 MB units, which is not enough to amortize seeks; lower latency means "get in, get out."

Not as cost-effective on EFD:

  • Redo log files: sequentially read and written, and commit latency is already handled by the cache in the storage controller.
  • Undo tablespace: sequentially written, randomly read by Flashback; but the reads are for recently written data that is likely to still be in the buffer cache.
  • Large table scans and buffer pools with lots of writes: the mismatch between the read and write latency characteristics of EFDs can cause unwanted "free buffer waits"; buffer pool tuning is necessary after deploying EFDs.

Reference: Oracle OpenWorld 2008 presentation.



Redo Logs on EFDs? (or not)

It is a common misconception that Oracle online redo logs will benefit from being moved onto EFDs; the experimental data indicates the opposite. Testing has shown that moving redo logs onto EFDs yields only a small percentage of improvement. It is better to leave them on write-cache-backed Fibre Channel drives and use the EFDs for other read-intensive parts of the database, such as indexes or data.

Oracle TEMP tablespace on EFDs

Oracle uses this space mainly for data aggregation and sorting. When the database engine cannot fit a sort in memory, it spills to disk to store intermediate results. Oracle typically issues large sequential I/Os against these tablespaces in the context of a single user; when multiple users perform concurrent sorts, the I/O becomes largely random in nature. Even though EFDs do not provide as much benefit for large random I/O as they do for small random operations, they are still far ahead of what regular rotating Fibre Channel drives can deliver. Depending on the availability of space on EFDs, Oracle applications can benefit from moving the temporary tablespaces to them. Temporary tablespace files should be moved to EFDs only after all the more I/O-intensive parts of the database have been moved.
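
If TEMP does move, one way to relocate it is to add a tempfile on EFD-backed storage and drop the old one. A sketch (the file paths and size below are hypothetical):

-- Add a tempfile on an EFD-backed volume
ALTER TABLESPACE temp ADD TEMPFILE '/efd_vol/temp02.dbf' SIZE 8G;

-- Remove the old tempfile on rotating storage once it is no longer in use
ALTER DATABASE TEMPFILE '/fc_vol/temp01.dbf' DROP INCLUDING DATAFILES;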

Database workloads that are the best fit for EFDs

There are no simple, definitive rules that would readily identify applications that best suit the EFDs, but we can follow some guidelines. It is very important to understand the load profile of an application before putting it on the EFDs, taking into consideration the fact that most databases have different workload profiles during different times of the day. The EFDs are suitable for highly read intensive and extremely latency sensitive applications and using these drives against the wrong target may not yield the desired benefit for the investment. It is important to understand the following terminology to help with deciding whether the EFDs are suitable for certain workloads.

Write cache: Most storage systems have a large write-side cache, and all write IOPS from a host are generally absorbed by the cache, incurring no delay from physical disk access. Storage arrays have write caches sized to match the disk count supported by the controller, and they support enabling and disabling the write cache at the LUN level if needed.

Read hit: A read request from a database host can be served by the storage system immediately if the data already exists in the storage cache because of a recent read or write, or due to a prefetch. A read serviced from the storage cache without causing disk access is called a read hit. If the requested data is not available in the storage cache, the array must retrieve it from disk; this is referred to as a read miss.

Short-stroked drives: Some extremely latency sensitive applications use this technique on regular Fibre Channel drives to obtain low latencies. This is a technique where data is laid out on many partially populated disks in order to reduce the spindle head movement to provide high IOPS at a very low latency.

Workloads with high cache read-hit rates are already serviced at memory access speed, and deploying them on flash drive technology may not show a significant benefit. Workloads with low cache read-hit rates that exhibit random I/O patterns, with small I/O requests of up to 16 KB, and that require high transaction throughput will benefit most from the low latency of EFDs.

Database and application managers can easily point to mission-critical applications that directly improve business revenue and productivity when business transaction throughput is increased, along with reduced service latencies. Cognizant of these applications, storage administrators would often resort to “short stroking” more drives to ensure the highest possible I/O service level supported for them. EFDs can provide two very important benefits for such applications.

A single EFD can replace many short-stroked drives by its ability to provide a very high transaction rate (IOPS). This reduces the total number of drives needed for the application, increases power saving by not having to keep many spinning disks, and may reduce floor space in the data center as well.

EFDs provide very low latency, so applications where predictable low response time is critical and not all the data can be kept at the host or storage cache may benefit from using such drives. Because of the absence of rotating media in EFDs, their transfer rate is extremely high and data can be served much faster than the best response time that can be achieved even with a large number of short-stroked hard drives.

Enterprise Flash Drives or Solid State Drives

Enterprise flash drives are designed to dramatically increase the performance of latency-sensitive applications. Enterprise flash drives, also known as solid state drives (SSD), contain no moving parts and appear as standard drives to existing storage management tools, allowing administrators to manage Tier 0 without special processes, custom tools, or extra training. Tier 0 EFDs are ideally suited for applications with high transaction rates and those requiring the fastest possible retrieval and storage of data, such as currency exchange and electronic trading systems, or real-time data acquisition and processing. They can also prove to be extremely good for highly read-intensive workloads like search engine databases.

The EFDs are designed to deliver millisecond application response times and up to 30 times more I/O operations per second (IOPS) than traditional Fibre Channel hard disk drives. Additionally, EFDs consume significantly less energy per IOPS than traditional hard disk drives, providing the opportunity for significantly reduced TCO by shrinking the data center energy and space footprints. Database performance has long been constrained by the I/O capability of hard disk drives (HDD), and the performance of the HDD has been limited by the intrinsic mechanical delays of head seek and rotational latency. EFDs, however, have no moving parts and therefore no seek or rotational latency delays, which dramatically improves their ability to sustain a very high number of IOPS with very low overall response times.

Over the past 25 years, the rotational speed of HDDs has improved from 3,600 rpm to 15,000 rpm, yielding only a fourfold improvement in IOPS, while other computer technologies, such as CPU speed, saw double-digit annual growth. EFD technology represents a significant leap in performance and may sustain up to 30 times the IOPS of traditional HDD technology. Proper use of EFDs can deliver vastly increased performance to a database application when compared with traditional Fibre Channel drives, both in transaction rate per minute and in transaction response time.

Sunday, October 25, 2009

Tuning SQL Queries

About 80 percent of database query performance problems can be solved by adjusting the SQL statement. However, you must understand how the particular DBMS being used processes SQL statements in order to know what to tweak.

The query optimizer is the software component in the RDBMS that analyzes an SQL statement to determine the best way to execute it. Most modern optimizers are cost-based, which means that they estimate the cost of all possible ways to execute a statement and then choose the one with the lowest cost.
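
One way to see which plan the optimizer actually chose is to display the execution plan. A minimal Oracle sketch (emp here is a stand-in table):

EXPLAIN PLAN FOR
SELECT * FROM emp WHERE empno = 123;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);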

Here are some considerations regarding query optimizers:

    * Order of table names: Does the order of the table names in the FROM or JOIN clause have any influence on the order in which tables are accessed when performing the joins? This is more likely the case with a rule-based optimizer. Ideally, the DBMS should access the most selective table (the one that will eliminate the greatest number of rows from the result set) first.

    * Order of search predicates: Does the order of the predicates in the WHERE clause have any influence on the order in which the predicates are evaluated? Ideally, the most restrictive predicate (the one that eliminates the greatest number of rows) should be evaluated first.

    * Lack of statistics: If a cost-based optimizer is being used, what does it do when statistics have not been collected for one or more tables? Some optimizers, such as Oracle's, revert to rule-based behavior, while others assume default values for the required statistics or simply refuse to use any indexes and do full table scans of all the tables. A full table scan is where the DBMS reads every row in the table to find the desired ones, which of course can be a performance disaster for tables with very large numbers of rows.

    * Query rewrites: Are queries rewritten into more efficient forms by the optimizer? For example, many optimizers automatically rewrite subselects into equivalent joins in order to simplify subsequent processing. In some cases, you may find that certain DBMS options must be enabled in order to allow the optimizer to rewrite queries.
    * View definition merges: For queries that use views, at what point does the DBMS merge the view definition (the query that defines the view) into the SQL statement submitted by the database user? This has obvious implications for the optimizer: the sooner it can evaluate the entire SQL statement, the smarter its decision should be.
    * Other criteria: What other criteria influence the optimizer? For example, some optimizers will favor unique indexes over nonunique ones, and some will favor the use of an index to sequence rows over sorting the result set.

Here are some considerations regarding query design:

    * Know your data.  
When writing the SQL statement, you should have some idea of how many rows are in each table, how selective your WHERE predicates are, and how many rows you expect in the result set. The larger the number of rows involved, the more time you should spend thinking about the best way to write the SQL statement.
    * Minimize returned rows.  The fewer the rows in the result set, the more efficiently the query will run.
    * Avoid scans of large tables.  For tables over 1000 rows or so, scanning all the rows in the table instead of using an index can be expensive in terms of resources required. And, of course, the larger the table, the more expensive a table scan becomes. Full table scans occur in the following situations:
    * The query does not contain a WHERE clause to limit rows.
    * None of the columns referenced in the WHERE clause matches the leading column of an index on the table.
    * Index and table statistics have not been updated. Most RDBMS query optimizers use statistics to evaluate available indexes, and without statistics, a table scan may be seen as more efficient than using an index.
    * At least one column in the WHERE clause does match the first column of an available index, but the comparison used obviates the use of an index. These cases include the following (a remedy for the last one is sketched at the end of this post):

          o Use of the NOT operator (for example, WHERE NOT CITY = 'New York'). In general, indexes can be used to find what is in a table, but cannot be used to find what is not in a table.
          o Use of the NOT EQUAL operator (for example, WHERE CITY <> 'New York').
          o Use of a wildcard in the first position of a comparison string (for example, WHERE CITY LIKE '%York%').
          o Use of an SQL function in the comparison (for example, WHERE UPPER(CITY) = 'NEW YORK').

    * Avoid unnecessary columns.  The wider the data in each row in the result set, the more disk space and memory that is required for intermediate operations such as sorts and to hold the result set.
    * Avoid unnecessary tables.  The fewer the tables, the more efficient the query.
    * Avoid sorts of large result sets.  Sorts are expensive, especially when the rows being sorted will not fit in memory. When the result set is expected to be very large, sorts should be avoided. Most optimizers will use an index if it can eliminate the need for a sort, but creating an index solely to avoid a sort of a large number of rows is probably not wise because of the overhead required to maintain the index.
    * Match data types in predicates.  Whether a predicate compares two column values as is done with joins, or a column value and a literal as is done when filtering rows, it is important for the data types to match. When the data types do not match, the DBMS must convert one of them before performing the comparison, and while the work to do this is relatively small, it mounts quickly when it has to be done for each row in a large table.
    * Use IN instead of OR when possible.  The IN operator can be rewritten as a JOIN, but the OR operator often requires multiple queries to be run with the results combined by the DBMS. The former is far more efficient.
    * Use GROUP BY instead of DISTINCT.  In most DBMSs, a GROUP BY is a more efficient way to eliminate duplicate rows compared with the DISTINCT keyword. The reason for this is that a GROUP BY invokes the sort required to find the duplicates earlier in the processing of the query, while a DISTINCT applies the sort as the very last step (applied to the final result set). The sooner the duplicate rows are eliminated, the more efficiently the remainder of the processing on that result set can be performed.
    * Use hints if you must.  Hints are special syntax that can be placed in the SQL statement to direct the optimizer to take certain actions, such as forcing the use of a particular index or a particular method to join tables. While this can be a very attractive option, it should only be used as a last resort because hints are not portable across database vendors, and they sometimes stop working when the DBMS software is upgraded to a newer version. The Oracle, MySQL, and Microsoft SQL Server optimizers all respond to hints, but the syntax accepted by each is different.
    * Temporary tables may help.  Temporary tables can help in some situations, such as assembling large result sets and then adding an index or two to support multiple subsequent queries against the temporary table. However, remember that you're doubling up the reads and writes when you do this because all the data selected from the original (base) tables must be written to the temporary table(s) and then read back from there. In short, there are no free lunches.
    * Views may help.  Views can help because they hide complex operations such as nested aggregate functions. And with DBMSs that don't have an SQL statement cache, views may process more efficiently than ordinary queries because the SQL statement that defines the view has already been parsed and optimized, which means this work does not have to be done every time the view is accessed. But above all, remember that views are also SQL queries, so they are subject to all the tuning considerations you apply to any SQL statement.
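
To illustrate a remedy for the last of the index-defeating cases above, a function-based index stores the computed value so that a function-wrapped predicate can use an index again. A sketch, assuming an Oracle-style DBMS (the CUSTOMER table and index name are hypothetical):

-- A plain index on CITY cannot satisfy UPPER(CITY) = 'NEW YORK',
-- but an index on the uppercased value can:
CREATE INDEX customer_city_upper_ix ON customer (UPPER(city));

-- This predicate can now be resolved with an index range scan
-- instead of a full table scan:
SELECT * FROM customer WHERE UPPER(city) = 'NEW YORK';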