Sunday, November 29, 2009

Content management databases

Content management databases: there are good content management databases on the market, and they are used widely across companies like Yahoo, Facebook, and Amazon.

Thrudb is a set of simple services built on top of the Apache Thrift framework (which came out of Facebook) that provides indexing and document storage services for building and scaling websites. Its purpose is to offer web developers flexible, fast, and easy-to-use services that can enhance or replace traditional data storage and access layers.

Thrudb - High Level Features

Thrudb - Services


StrokeDB
A minimalistic, modular database engine.

StrokeDB stores a set of versioned documents identified by UUID.
A document is a hash-like container of slots: a flat set of values tagged with string keys.
Slots store arbitrary serializable values (the most common types are booleans, numbers, strings, arrays, and times).
 
The concept.

1. Every repository is identified by UUID.
2. Every repository writes a log of commits.
3. Commit tuples:
   (timestamp, "store", uuid, version)
   (timestamp, "pull",  repo_uuid, repo_timestamp)
4. When you pull from a repository:
   1. Find out the latest timestamp in your history (index: repo_uuid -> repo_timestamp).
   2. If there is no timestamp yet, pull the whole log.
   3. If there is a timestamp for a repository UUID, pull the tail of the log.
   4. For each "store" record: fetch the version.
   5. For each "pull" record: add it to a deferred list of repositories waiting for update.
   6. When the whole log is fetched, fetch the deferred repositories. We have two options here:
     1. Fetch from the same repository we have been fetching from a few moments ago (say, fetch B's log from A)
     2. Or, fetch directly from the desired repository (B's log from B's repository)
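
StrokeDB itself is not an SQL database, but to make the commit-log idea concrete, here is a purely illustrative sketch (all names are hypothetical) that models the log as a table and expresses step 4.3, pulling the tail of the log, as a query:

    -- Purely illustrative model of a repository's commit log
    CREATE TABLE commit_log (
        ts      TIMESTAMP   NOT NULL,   -- commit timestamp
        action  VARCHAR(10) NOT NULL,   -- 'store' or 'pull'
        uuid    CHAR(36)    NOT NULL,   -- document or repository UUID
        version CHAR(36),               -- document version, for 'store' records
        repo_ts TIMESTAMP               -- remote timestamp, for 'pull' records
    );

    -- Step 4.3: pull only the tail of the log, i.e. everything newer than
    -- the latest timestamp we already hold for this repository.
    SELECT ts, action, uuid, version, repo_ts
    FROM   commit_log
    WHERE  ts > :last_seen_timestamp
    ORDER  BY ts;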

 MongoDB

MongoDB is an open source document-oriented database written in the C++ programming language.

MongoDB is designed for problems without heavy transactional requirements that aren't easily solved by traditional RDBMSs, including problems which require the database to span many servers.

CouchDB

  • A document database server, accessible via a RESTful JSON API.
  • Ad-hoc and schema-free with a flat address space.
  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
  • Query-able and index-able, featuring a table-oriented reporting engine that uses JavaScript as a query language.

Sunday, November 8, 2009

Oracle flash drive recommendations



EFD-friendly DB workloads

  • Random reads
      - B-Tree leaf access
      - ROWID lookup into the table
      - Access to out-of-line LOBs
      - Access to overflowed rows
      - Index scan over an unclustered table
      - Compression: increases I/O intensity (IOPS/GB)

  • Serial reads and random writes
      - Row update by primary key
      - Index maintenance
      - Reduced checkpoint interval

  • TEMP: sort areas and intermediate tables
      - Sequentially read and written, but I/O is done in 1 MB units: not enough to amortize seeks
      - Lower latency: get in, get out

Not as cost-effective on EFD

  • Redo log files
      - Sequentially read and written, and commit latency is already handled by the cache in the storage controller

  • Undo tablespace
      - Sequentially written, randomly read by Flashback, but the reads are for recently written data that is likely to still be in the buffer cache

  • Large table scans and buffer pools with lots of writes
      - The mismatch between the read and write latency characteristics of EFDs can cause unwanted "free buffer waits"; buffer pool tuning is necessary after deploying EFDs

Reference: Oracle OpenWorld 2008 presentation



Redo Logs on EFDs? (or not)

It is a common misconception that Oracle online redo logs will benefit from being moved onto EFDs; the experimental data indicates the opposite. Testing has shown that moving redo logs onto EFDs yields only a low percentage of improvement. It is better to leave them on write-cache-backed Fibre Channel drives rather than moving them onto EFDs, and to use the EFDs for other, read-intensive parts of the database such as indexes or data.

Oracle TEMP tablespace on EFDs

Oracle uses this space mainly for data aggregation and sorting. When the database engine cannot fit a sort in memory, it spills to disk to store intermediate results. Oracle typically issues large sequential I/Os against these tablespaces in the context of a single user; when multiple users perform concurrent sorts on these tablespaces, the I/O turns out to be largely random in nature. Even though EFDs do not provide as much benefit for large random I/O as they do for small random operations, they are still far ahead of what regular rotating Fibre Channel drives can deliver. Depending on the availability of space on EFDs, Oracle applications will benefit from moving the temporary tablespaces to them. Temporary tablespace files should only be moved to EFDs after all the other I/O-intensive parts of the database have been moved to them.
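
As a rough sketch of how that move might look (the file path and size are hypothetical, and this assumes the EFDs are presented to the database host as a mounted file system): create a new temporary tablespace on the EFD-backed storage and make it the database default.

    -- Create a temporary tablespace on EFD-backed storage (hypothetical path)
    CREATE TEMPORARY TABLESPACE temp_efd
        TEMPFILE '/u02/efd/temp_efd01.dbf' SIZE 8G AUTOEXTEND ON;

    -- Make it the default so new sessions spill their sorts to the EFDs
    ALTER DATABASE DEFAULT TEMPORARY TABLESPACE temp_efd;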

Database workloads that are the best fit for EFDs

There are no simple, definitive rules that would readily identify applications that best suit the EFDs, but we can follow some guidelines. It is very important to understand the load profile of an application before putting it on the EFDs, taking into consideration the fact that most databases have different workload profiles during different times of the day. The EFDs are suitable for highly read intensive and extremely latency sensitive applications and using these drives against the wrong target may not yield the desired benefit for the investment. It is important to understand the following terminology to help with deciding whether the EFDs are suitable for certain workloads.

Write cache: Most storage systems have a large write-side cache, and all write IOPS from a host are generally written to the cache, incurring no delay from physical disk access. Storage arrays have write caches sized to match the disk count supported by the controller and support enabling and disabling the write cache at the LUN level, if needed.

Read hit: A read request from a database host can be served by storage system immediately if it already exists in storage cache because of a recent read or write or due to prefetch. A read serviced from the storage cache without causing disk access is called a read hit. If the requested data is not available in storage cache, the array must retrieve it from disk; this is referred to as a read miss.

Short-stroked drives: Some extremely latency sensitive applications use this technique on regular Fibre Channel drives to obtain low latencies. This is a technique where data is laid out on many partially populated disks in order to reduce the spindle head movement to provide high IOPS at a very low latency.

Workloads with high cache read-hit rates are already serviced at memory access speed, and deploying them on flash drive technology may not show a significant benefit. Workloads with low cache read-hit rates that exhibit random I/O patterns, with small I/O requests of up to 16 KB, and that require high transaction throughput will benefit most from the low latency of EFDs.

Database and application managers can easily point to mission-critical applications that directly improve business revenue and productivity when business transaction throughput is increased, along with reduced service latencies. Cognizant of these applications, storage administrators would often resort to “short stroking” more drives to ensure the highest possible I/O service level supported for them. EFDs can provide two very important benefits for such applications.

A single EFD can replace many short-stroked drives by its ability to provide a very high transaction rate (IOPS). This reduces the total number of drives needed for the application, increases power saving by not having to keep many spinning disks, and may reduce floor space in the data center as well.

EFDs provide very low latency, so applications where predictable low response time is critical and not all the data can be kept at the host or storage cache may benefit from using such drives. Because of the absence of rotating media in EFDs, their transfer rate is extremely high and data can be served much faster than the best response time that can be achieved even with a large number of short-stroked hard drives.

Enterprise Flash Drives or Solid State Drives

Enterprise flash drives are designed to dramatically increase the performance of latency sensitive applications. Enterprise flash drives, also known as solid state drives (SSD), contain no moving parts and appear as standard drives to existing storage management tools, allowing administrators to manage Tier 0 without special processes or custom tools or extra training. Tier 0 EFDs are ideally suited for applications with high transaction rates and those requiring the fastest possible retrieval and storage of data, such as currency exchange and electronic trading systems, or real-time data acquisition and processing. They also can prove to be extremely good for highly read-intensive workloads like search engine databases.

The EFDs are designed to deliver millisecond application response times and up to 30 times more I/O operations per second (IOPS) than traditional Fibre Channel hard disk drives. Additionally, EFDs consume significantly less energy per IOPS than traditional hard disk drives, providing the opportunity for a significantly reduced TCO by shrinking the data center energy and space footprints. Database performance has long been constrained by the I/O capability of hard disk drives (HDD), and the performance of the HDD has been limited by the intrinsic mechanical delays of head seek and rotational latency. EFDs, however, have no moving parts and therefore no seek or rotational latency delays, which dramatically improves their ability to sustain a very high number of IOPS with very low overall response times.

Over the past 25 years, the rotational speeds of HDDs have improved from 3,600 rpm to 15,000 rpm, yielding only four times the improvement in IOPS when the rest of the computer technologies like CPU speeds saw double digit growth. EFD technology represents a significant leap in performance and may sustain up to 30 times the IOPS of traditional HDD technology. Proper use of EFDs can deliver vastly increased performance to the database application when compared to traditional Fibre Channel drives, both in transaction rates per minute as well as transaction response time.

Sunday, October 25, 2009

Tuning SQL Queries

About 80 percent of database query performance problems can be solved by adjusting the SQL statement. However, you must understand how the particular DBMS being used processes SQL statements in order to know what to tweak.

The query optimizer is the software component in the RDBMS that analyzes an SQL statement to determine the best way to execute it. Most modern optimizers are cost-based, which means that they estimate the cost of all possible ways to execute a statement and then choose the one with the lowest cost.
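
Most DBMSs let you see the plan the optimizer chose. As a quick sketch (the table and column names are hypothetical; the Oracle form differs from the PostgreSQL/MySQL form):

    -- PostgreSQL / MySQL: show the chosen execution plan
    EXPLAIN
    SELECT o.order_id, c.name
    FROM   orders o
           JOIN customers c ON c.customer_id = o.customer_id
    WHERE  o.status = 'OPEN';

    -- Oracle: explain into the plan table, then display it
    EXPLAIN PLAN FOR
    SELECT o.order_id, c.name
    FROM   orders o
           JOIN customers c ON c.customer_id = o.customer_id
    WHERE  o.status = 'OPEN';

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);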

Here are some considerations regarding query optimizers:

    * Order of table names: Does the order of the table names in the FROM or JOIN clause have any influence on the order in which tables are accessed when performing the joins? This is more likely the case with a rule-based optimizer. Ideally, the DBMS should access the most selective table (the one that will eliminate the largest number of rows from the result set) first.

    * Order of search predicates: Does the order of the predicates in the WHERE clause have any influence on the order in which the predicates are evaluated? Ideally, the most restrictive predicate (the one that eliminates the largest number of rows) should be evaluated first.

    * Lack of statistics: If a cost-based optimizer is being used, what does it do when statistics have not been collected for one or more tables? Some optimizers, such as Oracle's, revert to rule-based optimization, while others assume default values for the required statistics or simply refuse to use any indexes and do full table scans of all the tables. A full table scan is where the DBMS reads every row in the table to find the desired ones, which of course can be a performance disaster for tables with very large numbers of rows. (See the statistics example after this list.)

    * Query rewrites: Are queries rewritten into more efficient forms by the optimizer? For example, many optimizers automatically rewrite subselects into equivalent joins in order to simplify subsequent processing. In some cases, you may find that certain DBMS options must be enabled in order to allow the optimizer to rewrite queries.
    * View definition merges: For queries that use views, at what point does the DBMS merge the view definition (the query that defines the view) into the SQL statement submitted by the database user? This has obvious implications for the optimizer; the sooner it can evaluate the entire SQL statement, the smarter its decisions should be.
    * Other criteria: What other criteria influence the optimizer? For example, some optimizers will favor unique indexes over non-unique ones, and some will favor using an index to sequence rows over sorting the result set.
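
Gathering statistics is usually a one-line operation. The table name below is hypothetical and the exact syntax varies by DBMS:

    -- PostgreSQL
    ANALYZE customer;

    -- MySQL
    ANALYZE TABLE customer;

    -- Oracle
    BEGIN
        DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CUSTOMER');
    END;
    /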

Here are some considerations regarding query design:

    * Know your data.  
When writing the SQL statement, you should have some idea of how many rows are in each table, how selective your WHERE predicates are, and how many rows you expect in the result set. The larger the number of rows involved, the more time you should spend thinking about the best way to write the SQL statement.
    * Minimize returned rows.  The fewer the rows in the result set, the more efficiently the query will run.
    * Avoid scans of large tables.  For tables over 1000 rows or so, scanning all the rows in the table instead of using an index can be expensive in terms of the resources required. And, of course, the larger the table, the more expensive a table scan becomes. Full table scans occur in the following situations (see the example after this list):
    * The query does not contain a WHERE clause to limit rows.
    * None of the columns referenced in the WHERE clause matches the leading column of an index on the table.
    * Index and table statistics have not been updated. Most RDBMS query optimizers use statistics to evaluate available indexes, and without statistics, a table scan may be seen as more efficient than using an index.
    * At least one column in the WHERE clause does match the first column of an available index, but the comparison used obviates the use of the index. These cases include the following:

          o Use of the NOT operator (for example, WHERE NOT CITY = 'New York'). In general, indexes can be used to find what is in a table, but cannot be used to find what is not in a table.
          o Use of the NOT EQUAL operator (for example, WHERE CITY <> 'New York').
          o Use of a wildcard in the first position of a comparison string (for example, WHERE CITY LIKE '%York%').
          o Use of an SQL function in the comparison (for example, WHERE UPPER(CITY) = 'NEW YORK').

    * Avoid unnecessary columns.  The wider the data in each row in the result set, the more disk space and memory that is required for intermediate operations such as sorts and to hold the result set.
    * Avoid unnecessary tables.  The fewer the tables, the more efficient the query.
    * Avoid sorts of large result sets.  Sorts are expensive, especially when the rows being sorted will not fit in memory. When the result set is expected to be very large, sorts should be avoided. Most optimizers will use an index if it can eliminate the need for a sort, but creating an index solely to avoid a sort of a large number of rows is probably not wise because of the overhead required to maintain the index.
    * Match data types in predicates.  Whether a predicate compares two column values as is done with joins, or a column value and a literal as is done when filtering rows, it is important for the data types to match. When the data types do not match, the DBMS must convert one of them before performing the comparison, and while the work to do this is relatively small, it mounts quickly when it has to be done for each row in a large table.
    * Use IN instead of OR when possible.  The IN operator can be rewritten as a JOIN, but the OR operator often requires multiple queries to be run with the results combined by the DBMS. The former is far more efficient.
    * Use GROUP BY instead of DISTINCT.  In most DBMSs, a GROUP BY is a more efficient way to eliminate duplicate rows compared with the DISTINCT keyword. The reason for this is that a GROUP BY invokes the sort required to find the duplicates earlier in the processing of the query, while a DISTINCT applies the sort as the very last step (applied to the final result set). The sooner the duplicate rows are eliminated, the more efficiently the remainder of the processing on that result set can be performed.
    * Use hints if you must.  Hints are special syntax that can be placed in the SQL statement to direct the optimizer to take certain actions, such as forcing the use of a particular index or a particular method to join tables. While this can be a very attractive option, it should only be used as a last resort because hints are not portable across database vendors, and they sometimes stop working when the DBMS software is upgraded to a newer version. The Oracle, MySQL, and Microsoft SQL Server optimizers all respond to hints, but the syntax accepted by each is different.
    * Temporary tables may help.  Temporary tables can help in some situations, such as assembling large result sets and then adding an index or two to support multiple subsequent queries against the temporary table. However, remember that you're doubling up the reads and writes when you do this because all the data selected from the original (base) tables must be written to the temporary table(s) and then read back from there. In short, there are no free lunches.
    * Views may help.  Views can help because they hide complex operations such as nested aggregate functions. And with DBMSs that don't have an SQL statement cache, views may process more efficiently than ordinary queries because the SQL statement that defines the view has already been parsed and optimized, which means this work does not have to be done every time the view is accessed. But above all, remember that views are also SQL queries, so they are subject to all the tuning considerations you apply to any SQL statement.
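
As a sketch of the full-table-scan cases listed above (a hypothetical CUSTOMER table with an index on CITY):

    -- Hypothetical table with an index on CITY
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        city        VARCHAR(50),
        name        VARCHAR(100)
    );
    CREATE INDEX customer_city_ix ON customer (city);

    -- Each of these predicates typically forces a full table scan:
    SELECT name FROM customer WHERE NOT city = 'New York';
    SELECT name FROM customer WHERE city <> 'New York';
    SELECT name FROM customer WHERE city LIKE '%York%';
    SELECT name FROM customer WHERE UPPER(city) = 'NEW YORK';

    -- This form allows the optimizer to use the index on CITY:
    SELECT name FROM customer WHERE city = 'New York';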

Postgres monitoring 8.4

I like the monitoring enhancements in PostgreSQL 8.4. The following two contrib modules are very useful if you are debugging a performance issue.

pg_stat_statements can give you all the queries and their response times.
  • pg_stat_statements (contrib module)

    • Allows real-time monitoring of summary query statistics, letting you see at a glance which of your normalized queries executes most often and uses the most system time. pg_stat_statements speeds up "bad query" troubleshooting by an order of magnitude. A sample query is shown below.
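
      As a minimal sketch (assuming the module is preloaded via shared_preload_libraries and its contrib SQL script has been installed in the database), the top statements by total execution time can be listed like this:

        -- Top ten statements by total execution time
        SELECT query, calls, total_time, rows
        FROM   pg_stat_statements
        ORDER  BY total_time DESC
        LIMIT  10;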

      auto_explain is good if you are debugging application performance: just enable it for a few hours, collect the SQL plans, and then turn it off.


    • auto_explain (contrib module)

      • Lets you automatically log selected EXPLAIN plans to the PostgreSQL log for later analysis. Will help a great deal with troubleshooting slow stored procedures, as well as queries which perform differently between production and testing. Also allows logging of EXPLAIN plans for nested statements, that is, statements executed from within a function. A minimal configuration sketch follows.
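
      A minimal per-session sketch (the 250 ms threshold is just an example; the module can also be preloaded server-wide via shared_preload_libraries):

        -- Load auto_explain for the current session (superuser) and log the plan
        -- of any statement slower than 250 ms, including nested statements.
        LOAD 'auto_explain';
        SET auto_explain.log_min_duration = 250;          -- milliseconds
        SET auto_explain.log_nested_statements = on;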


    Sunday, October 4, 2009

    MySQL vs Postgres

    We have two good, mature open source databases: one is MySQL and the other is PostgreSQL.

    Both systems have much to offer in terms of stability, flexibility, and performance. MySQL has features that PostgreSQL lacks, and vice versa. However, my primary focus is to help you determine which of the two databases to use in your own development.

    For example, some of the features I use are foreign key references, triggers, and views. They allow me to hide the complexity of the database from the application, thus avoiding the creation of complicated SQL commands. I know many developers who prefer the rich functionality of PostgreSQL's SQL commands. One of the most notable differences between MySQL and PostgreSQL is that you cannot nest subqueries (subselects) in MySQL. PostgreSQL follows many of the ANSI SQL standards, thus allowing the creation of complex SQL commands.
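
    As a small illustration of those features (the tables and names are hypothetical), a foreign key and a view with a correlated subselect can hide this complexity from the application:

        -- Hypothetical schema (PostgreSQL syntax)
        CREATE TABLE department (
            dept_id INTEGER PRIMARY KEY,
            name    VARCHAR(50) NOT NULL
        );

        CREATE TABLE employee (
            emp_id  INTEGER PRIMARY KEY,
            name    VARCHAR(100) NOT NULL,
            salary  NUMERIC(10,2),
            dept_id INTEGER REFERENCES department (dept_id)   -- foreign key
        );

        -- A view using a correlated subselect: employees paid above
        -- their department's average salary.
        CREATE VIEW above_average_pay AS
        SELECT e.name, e.salary, d.name AS department
        FROM   employee e
               JOIN department d ON d.dept_id = e.dept_id
        WHERE  e.salary > (SELECT AVG(e2.salary)
                           FROM   employee e2
                           WHERE  e2.dept_id = e.dept_id);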

                                  POSTGRESQL                MYSQL
    ANSI SQL compliance           Closer to ANSI SQL        Follows some of the ANSI standard
    Performance                   Slower                    Faster
    Sub-selects                   Yes                       No
    Transactions                  Yes                       Yes (InnoDB table type must be used)
    Database replication          Yes                       Yes
    Foreign key support           Yes                       No
    Views                         Yes                       No
    Stored procedures             Yes                       No
    Triggers                      Yes                       No
    Unions                        Yes                       No
    Full joins                    Yes                       No
    Constraints                   Yes                       No
    Windows support               Yes                       Yes
    Vacuum (cleanup)              Yes                       No
    ODBC                          Yes                       Yes
    JDBC                          Yes                       Yes
    Different table types         No                        Yes

    Let me suggest some reasons for using PostgreSQL over MySQL:

    * Complex database design
    * Moving away from Oracle, Sybase, or MSSQL
    * Complex rule sets (i.e., business rules)
    * Use of procedural languages on the server
    * Transactions
    * Use of stored procedures
    * Use of geographical data
    * R-Trees (i.e., used on indexes)

    Let me suggest some reasons for using MySQL over PostgreSQL:

    * MySQL is relatively faster than PostgreSQL.
    * Database design will be simpler.
    * You are building a basic, database-driven Web site.
    * MySQL’s replication has been thoroughly tested.
    * There’s no need for cleanups in MySQL (Vacuum).

    I think this makes the decision easier: if you have a complex system and need rich database features, PostgreSQL is obviously the good choice; if your system is simple and needs fast response times, the first choice is MySQL.

    Sunday, September 27, 2009

    Oracle Database 11g Limitations

    Everyone wants to know the limitations of the Oracle 11g database.

    Here is some information for you to chew on.

    CHAR
    Maximum size: 2000 bytes

    CHAR VARYING
    Maximum size: 4000 bytes

    CLOB
    Maximum size: (4 GB - 1) * DB_BLOCK_SIZE initialization parameter (8 TB to 128 TB). The number of LOB columns per table is limited only by the maximum number of columns per table (that is, 1,000).
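    For example, with an 8 KB DB_BLOCK_SIZE this works out to roughly (2^32 - 1) * 8 KB, or about 32 TB per CLOB value.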

    Literals (characters or numbers in SQL or PL/SQL)
    Maximum size: 4000 characters

    LONG
    Maximum size: 2 GB - 1. Only one LONG column is allowed per table.

    DATE
    Minimum: -4712 (January 1 4712 BC)

    NCHAR
    Maximum size: 2000 bytes

    NCHAR VARYING
    Maximum size: 4000 bytes

    NCLOB
    Maximum size: (4 GB - 1) * DB_BLOCK_SIZE initialization parameter (8 TB to 128 TB). The number of LOB columns per table is limited only by the maximum number of columns per table (that is, 1,000).

    NUMBER
    999...(38 9's) x 10^125 maximum value, which can be represented to full 38-digit precision (the mantissa).
    -999...(38 9's) x 10^125 minimum value, which can be represented to full 38-digit precision (the mantissa).

    Precision
    38 significant digits

    RAW
    Maximum size: 2000 bytes

    VARCHAR
    Maximum size: 4000 bytes

    VARCHAR2
    Maximum size: 4000 bytes

    Columns
    Type Of Limit : Per table
    Limit Value : 1000 columns maximum
    Type Of Limit : Per index (or clustered index)
    Limit Value : 32 columns maximum
    Type Of Limit : Per bitmapped index
    Limit Value : 30 columns maximum

    BFILE
    Maximum size: 4 GB
    Maximum size of a file name: 255 characters
    Maximum size of a directory name: 30 characters
    Maximum number of open BFILEs: The maximum number of BFILEs is limited by the value of the SESSION_MAX_OPEN_FILES initialization parameter, which is itself limited by the maximum number of open files the operating system will allow.

    BLOB
    Maximum size: (4 GB - 1) * DB_BLOCK_SIZE initialization parameter (8 TB to 128 TB). The number of LOB columns per table is limited only by the maximum number of columns per table (that is, 1,000).



    Constraints
    Type Of Limit : Maximum per column
    Limit Value : Unlimited

    Subqueries
    Type Of Limit : Maximum levels of subqueries in a SQL statement
    Limit Value : Unlimited in the FROM clause of the top-level query; 255 subqueries in the WHERE clause


    Partitions
    Type Of Limit : Maximum length of linear partitioning key
    Limit Value : 4 KB - overhead
    Type Of Limit : Maximum number of columns in partition key
    Limit Value : 16 columns
    Type Of Limit : Maximum number of partitions allowed per table or index
    Limit Value : 1024K - 1


    Rows
    Type Of Limit : Maximum number per table
    Limit Value : Unlimited

    Stored Packages
    Type Of Limit : Maximum size
    Limit Value : PL/SQL and Developer/2000 may have limits on the size of stored procedures they can call. The limits typically range from 2000 to 3000 lines of code.

    Trigger Cascade Limit
    Type Of Limit : Maximum value
    Limit Value : Operating system-dependent, typically 32

    Users and Roles
    Type Of Limit : Maximum
    Limit Value : 2,147,483,638

    Tables
    Type Of Limit :Maximum per clustered table
    Limit Value : 32 tables
    Type Of Limit : Maximum per database
    Limit Value : Unlimited


    Locks
    Type Of Limit : Row-level
    Limit Value : Unlimited

    Locks
    Type Of Limit : Distributed Lock Manager
    Limit Value : Operating system dependent

    Advanced Queuing Processes
    Type Of Limit : Maximum per instance
    Limit Value : 10

    Job Queue Processes
    Type Of Limit : Maximum per instance
    Limit Value : 1000

    I/O Slave Processes
    Type Of Limit : Maximum per background process (DBWR, LGWR, etc.)
    Limit Value : 15

    I/O Slave Processes
    Type Of Limit : Maximum per Backup session
    Limit Value : 15

    Sessions
    Type Of Limit : Maximum per instance
    Limit Value : 32 KB; limited by the PROCESSES and SESSIONS initialization parameters

    Global Cache Service Processes
    Type Of Limit : Maximum per instance
    Limit Value : 10

    Shared Servers
    Type Of Limit : Maximum per instance
    Limit Value : Unlimited within constraints set by the PROCESSES and SESSIONS initialization parameters, for instance.

    Dispatchers
    Type Of Limit : Maximum per instance
    Limit Value : Unlimited within constraints set by PROCESSES and SESSIONS initialization parameters, for instance.

    Parallel Execution Slaves
    Type Of Limit : Maximum per instance
    Limit Value : Unlimited within constraints set by the PROCESSES and SESSIONS initialization parameters, for instance.

    Backup Sessions
    Type Of Limit : Maximum per instance
    Limit Value : Unlimited within constraints set by PROCESSES and SESSIONS initialization parameters, for instance.

    GROUP BY clause
    Type Of Limit : Maximum length
    Limit Value : The GROUP BY expression and all of the nondistinct aggregate functions (for example, SUM, AVG) must fit within a single database block.

    Indexes
    Type Of Limit : Maximum per table
    Limit Value : Unlimited
    Type Of Limit : Total size of indexed column
    Limit Value : 75% of the database block size minus some overhead


    Database Blocks
    Type Of Limit : Minimum in initial extent of a segment
    Limit Value : 2 blocks
    Type Of Limit :Maximum per datafile
    Limit Value: Platform dependent; typically 2^22 - 1 (4,194,303) blocks

    Controlfiles
    Limit Type : Number of control files
    Limit Value : 1 minimum; 2 or more (on separate devices) strongly recommended
    Limit Type : Size of a control file
    Limit Value : Dependent on operating system and database creation options; maximum of 20,000 x (database block size)

    Database files
    Limit Type : Maximum per tablespace
    Limit Value : Operating system dependent; usually 1022
    Limit Type: Maximum per database
    Limit Value : 65,533; may be less on some operating systems. Limited also by the size of database blocks and by the DB_FILES initialization parameter for a particular instance.

    Database extents
    Type OF Limit : Maximum per dictionary managed tablespace
    Limit Value : 4 GB * physical block size (with K/M modifier); 4 GB (without K/M modifier)
    Type Of Limit : Maximum per locally managed (uniform) tablespace
    Limit Value : 2 GB * physical block size (with K/M modifier); 2 GB (without K/M modifier)

    Database file size
    Type Of Limit : Maximum
    Limit Value : Operating system dependent. Limited by the maximum operating system file size; typically 2^22 (about 4 million) blocks

    MAXEXTENTS
    Type Of Limit : Default value
    Limit Value : Derived from tablespace default storage or DB_BLOCK_SIZE initialization parameter
    Type Of Limit : Maximum
    Limit Value : Unlimited

    Redo Log Files
    Type Of Limit : Maximum number of logfiles
    Limit Value : Limited by the value of the MAXLOGFILES parameter in the CREATE DATABASE statement. The control file can be resized to allow more entries; ultimately an operating system limit
    Type Of Limit : Maximum number of logfiles per group
    Limit Value : Unlimited

    Redo Log File Size
    Type Of Limit : Minimum size
    Limit Value : 4 MB
    Type Of Limit : Maximum Size
    Limit Value : Operating system limit; typically 2 GB

    Tablespaces
    Type Of Limit : Maximum number per database
    Limit Value : 64 K; the number of tablespaces cannot exceed the number of database files because each tablespace must include at least one file

    Bigfile Tablespaces
    Type Of Limit : Number of blocks
    Limit Value : A bigfile tablespace contains only one datafile or tempfile, which can contain up to approximately 4 billion (2^32) blocks. The maximum size of the single datafile or tempfile is 128 terabytes (TB) for a tablespace with 32 K blocks and 32 TB for a tablespace with 8 K blocks.

    Smallfile (traditional) Tablespaces
    Type Of Limit : Number of blocks
    Limit Value : A smallfile tablespace is a traditional Oracle tablespace, which can contain 1022 datafiles or tempfiles, each of which can contain up to approximately 4 million (2^22) blocks.

    External Tables file
    Type Of Limit : Maximum size
    Limit Value : Dependent on the operating system. An external table can be composed of multiple files.

    Database Analyzer: DB comparison

    Databases cost a lot, and making the right decision for our requirements is very important. Databases continue to expand, and the costs of database software, storage systems, and the servers to support them continue to increase. Overall growth in data and storage volumes is routinely in the range of 30 to 50 percent per year, and may be higher in individual organizations.

    Organizations must deal not only with data growth, but also with increasing database complexity, more
    frequent changes and updates, and the need to deliver new applications to increasingly diverse user groups.

    This post examines the database options in the industry and their features.

    There are several database vendors in the industry; Oracle, DB2, and SQL Server hold the major share of the database market. We cannot rule out Sybase and Teradata from the list either. Among the open source databases, MySQL and Ingres are the famous ones.

    Let's look at the latest stable version from each of our top five database vendors.

    ============================================
    Database Vendor                 Latest Stable Version

    Oracle                                 Oracle 11g R2

    DB2                                    DB2 9.7

    SQL SERVER                    SQL SERVER 2008 (V10.0)

    Sybase                                Sybase 11.0

    Teradata                              TeraData V12

    ============================================


    Which operating systems do our top five support?

    Oracle 11g supports --- Windows, Linux (the most widely installed and developed platform), Unix, and Mac OS X

    DB2 supports --- Windows, Linux, and Unix. No Mac OS X.

    SQL SERVER --- Windows ONLY

    Sybase --- Windows, Mac OS X, Linux, and Unix

    TeraData --- Windows, Linux, and Unix.



    Oracle Database 11g also runs on OpenVMS, HP-UX, and AIX. Mac OS X support is limited to 10g R2.

    All of our top five satisfy the fundamental feature requirements.

    I always like to know the limitations of the product. Every product has its own limitations. Let's look at our top 5 DB's limitations.

    Oracle

    Max DB size Unlimited (4 GB * block size per tablespace)
    Max table size 4 GB * block size (with BIGFILE tablespace)
    Max row size 8kB
    Max columns per row 1000
    Max Blob/Clob size unlimited
    Max CHAR size 4000 B
    Max NUMBER size 126 bits
    Min DATE value -4712 (January 1 4712 BC)
    Max DATE value 9999
    Max column name size 30

    DB2


    Max DB size 512 TB (512 TiB)
    Max table size 512 TB
    Max row size 32,677 B
    Max columns per row 1012
    Max Blob/Clob size 2 GB
    Max CHAR size 32 KB (32 KiB)
    Max NUMBER size 64 bits
    Min DATE value 0001
    Max DATE value 9999
    Max column name size 128

    SQL SERVER

    Max DB size 524,272 TB (32,767 files * 16 TB max file size)
    Max table size 524,272 TB
    Max row size unlimited
    Max columns per row 30000
    Max Blob/Clob size 2 GB
    Max CHAR size 2 GB
    Max NUMBER size 126 bits
    Min DATE value 0001
    Max DATE value 9999
    Max column name size 128

    TERADATA

    Max DB size unlimited
    Max table size unlimited
    Max row size 64 KB wo/lobs (64 GB w/lobs)
    Max columns per row 2048
    Max Blob/Clob size 2 GB
    Max CHAR size 10,000
    Max NUMBER size 64 bits
    Min DATE value Not defined
    Max DATE value 9999
    Max column name size Not defined.


    SYBASE

    Max DB size 104 TB (13 files, each file up to 8 TB (32k pages))
    Max table size Limited by file size
    Max row size Limited by file size
    Max columns per row 45000
    Max Blob/Clob size 2 GB
    Max CHAR size 2 GB
    Max NUMBER size 64 bits
    Min DATE value 0001
    Max DATE value 9999
    Max column name size Not defined.

    Conclusion:

    ORACLE 11g has rich functionality and is stable for business use. Early adoption of Oracle 11g is high because it provides wonderful tools for finding out about performance and behavior ahead of time. Some of the cool features are Database Vault, the bind variable fix, and the OEM advisor for partition tables and partition keys.

    DB2 is catching up with Oracle's functionality, and IBM's huge customer base is helping DB2 installations a lot. I like IBM's data compression; it is a huge storage saving for a company. The level of compression that may be realized is increased in DB2 9.7 with the addition of new index and temporary-space compression capabilities, while a number of functions improve performance.

    SQL SERVER is good if you are a Windows shop. You can definitely save some money if you shop with one vendor. It is really good for small and medium businesses. Some of SQL Server's cool features are PowerShell support, policy-based management, filtered indexes, advanced sparse columns, multithreaded partition access, column-prefix compression, module signing using certificates, and SQL Server Data Services.

    SYBASE is easy to maintain and has all the fundamental features. If you like SQL Server but want to run it on Linux or Unix, then your option is Sybase.

    TERADATA handles huge volumes of data and has a wonderful full-table-scan facility. If you are running or growing a data warehouse system with complex BI reports, then you should take a look at Teradata. Teradata is now attached to big company names.

    I will cover the cool features in detail in my future blogs.