
Sunday, November 29, 2009

Content management databases

Several good content management databases are on the market and are widely used at companies such as Yahoo, Facebook, and Amazon.

Thrudb is a set of simple services built on top of the Apache Thrift framework (originally developed at Facebook) that provides indexing and document storage services for building and scaling websites. Its purpose is to offer web developers flexible, fast, and easy-to-use services that can enhance or replace traditional data storage and access layers.

Thrudb - High Level Features

Thrudb - Services


StrokeDB
Minimalistic modular database engine.
 
StrokeDB stores a set of versioned documents identified by UUID.
A document is a hash-like container of slots: a flat set of values tagged with string keys.
Slots store arbitrary serializable values (the most common types are booleans,
numbers, strings, arrays, and times).
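
As a rough illustration of this data model (StrokeDB itself is not written in Python, and the class and method names below are hypothetical), a versioned document can be sketched as a UUID plus a dictionary of slots, with each save keeping a snapshot:

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical sketch of a versioned, UUID-identified document."""
    # Slots: a flat set of values tagged with string keys.
    slots: Dict[str, Any] = field(default_factory=dict)
    # Every document is identified by a UUID.
    doc_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Saved snapshots; version numbers here are simple counters (an assumption).
    versions: List[Dict[str, Any]] = field(default_factory=list)

    def save(self) -> int:
        """Snapshot the current slots and return the new version number."""
        self.versions.append(dict(self.slots))
        return len(self.versions)


doc = Document(slots={"title": "Hello", "published": False})
first = doc.save()
doc.slots["published"] = True
second = doc.save()
```

Older versions stay readable through `doc.versions`, which is the property the commit log described next relies on.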
 
The concept.

1. Every repository is identified by UUID.
2. Every repository writes a log of commits.
3. Commit tuples:
   (timestamp, "store", uuid, version)
   (timestamp, "pull",  repo_uuid, repo_timestamp)
4. When you pull from a repository:
   1. Find out the latest timestamp in your history (index: repo_uuid -> repo_timestamp)
   2. If there is no timestamp yet, pull the whole log.
   3. If there is a timestamp for a repository UUID, pull the tail of the log.
   4. For each "store" record: fetch the version.
   5. For each "pull" record: add the repository to a deferred list of repositories waiting for update.
   6. When the whole log is fetched, fetch the deferred repositories. We have two options here:
     1. Fetch from the same repository we were fetching from a few moments ago (say, fetch the B log from A)
     2. Or, fetch directly from the desired repository (B log from B repository)
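
The pull steps above can be sketched in a few lines of Python. This is a minimal in-memory model under stated assumptions, not StrokeDB's implementation: the `Repo` class, its integer clock, and the `registry` lookup are hypothetical, and the sketch takes the second option (fetching deferred repositories directly).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Repo:
    """Hypothetical repository: a UUID, a commit log, and a pull index."""
    uuid: str
    log: List[tuple] = field(default_factory=list)            # commit tuples
    seen: Dict[str, int] = field(default_factory=dict)        # repo_uuid -> repo_timestamp
    versions: Set[Tuple[str, int]] = field(default_factory=set)
    clock: int = 0                                             # assumed logical timestamp

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def store(self, doc_uuid: str, version: int) -> None:
        self.versions.add((doc_uuid, version))
        self.log.append((self._tick(), "store", doc_uuid, version))

    def pull(self, other: "Repo", registry: Dict[str, "Repo"]) -> None:
        since: Optional[int] = self.seen.get(other.uuid)       # step 1
        # Steps 2-3: the whole log if never pulled, otherwise only the tail.
        tail = [c for c in other.log if since is None or c[0] > since]
        deferred: List[str] = []
        for ts, kind, a, b in tail:
            if kind == "store":
                self.versions.add((a, b))                      # step 4
            elif kind == "pull":
                deferred.append(a)                             # step 5
        if tail:
            last = tail[-1][0]
            self.seen[other.uuid] = last
            self.log.append((self._tick(), "pull", other.uuid, last))
        for repo_uuid in deferred:                             # step 6, option 2
            if repo_uuid != self.uuid:
                self.pull(registry[repo_uuid], registry)


a, b, c = Repo("A"), Repo("B"), Repo("C")
registry = {r.uuid: r for r in (a, b, c)}
c.store("doc1", 1)
b.store("doc2", 1)
b.pull(c, registry)   # B learns about doc1 and logs a "pull" record for C
a.pull(b, registry)   # A fetches doc2 from B, then defers to C for doc1
```

Recording the last seen timestamp before following deferred repositories keeps the recursion finite: a repository that has already been pulled returns an empty tail.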

MongoDB

MongoDB is an open source document-oriented database written in the C++ programming language.

MongoDB is designed for problems without heavy transactional requirements that aren't easily solved by traditional RDBMSs, including problems which require the database to span many servers.

CouchDB

  • A document database server, accessible via a RESTful JSON API.
  • Ad-hoc and schema-free with a flat address space.
  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
  • Queryable and indexable, featuring a table-oriented reporting engine that uses JavaScript as a query language.
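
To make the "RESTful JSON API" point concrete, here is a small sketch that builds (but does not send) the HTTP requests for storing and reading a document. It assumes a local CouchDB listening on its default port 5984; the database name `blog` and document ID `post-1` are made up for illustration.

```python
import json
from urllib.request import Request

# Assumption: a local CouchDB instance on the default port.
COUCH_URL = "http://localhost:5984"


def put_document(db: str, doc_id: str, doc: dict) -> Request:
    """Build a PUT /{db}/{doc_id} request carrying the document as JSON."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        f"{COUCH_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_document(db: str, doc_id: str) -> Request:
    """Build a GET /{db}/{doc_id} request; the response body is JSON."""
    return Request(f"{COUCH_URL}/{db}/{doc_id}", method="GET")


save_req = put_document("blog", "post-1", {"title": "Hello", "tags": ["couchdb"]})
read_req = get_document("blog", "post-1")
```

Passing either request to `urllib.request.urlopen` would send it to a running server. Because documents are schema-free, the dictionary can hold any JSON-serializable fields.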

Sunday, November 8, 2009

Oracle flash drive recommendations


EFD-friendly DB workloads:

  • Random reads
      • B-Tree leaf access
      • ROWID lookup into table
      • Access to out-of-line LOB
      • Access to overflowed row
      • Index scan over unclustered table
  • Compression: increases I/O intensity (IOPS/GB)
  • Random writes
      • Row update by PK
      • Index maintenance
      • Reduce checkpoint interval
  • TEMP: sort areas and intermediate tables. Sequentially read and written, but I/O is done in 1 MB units: not enough to amortize seeks
  • Lower latency: Get In, Get Out

Not as cost-effective on EFD:

  • Serial reads
  • Redo log files: sequentially read and written, and commit latency is already handled by the cache in the storage controller
  • Undo tablespace: sequentially written, randomly read by Flashback. But reads are for recently written data that is likely to still be in the buffer cache
  • Large table scans
  • Buffer pools with lots of writes: the mismatch between read and write latency characteristics of EFDs can cause unwanted "Free Buffer Waits". Buffer pool tuning is necessary after deploying EFDs



Reference: Oracle OpenWorld 2008 presentation



Redo Logs on EFDs? (or not)

It is a common misconception that Oracle online redo logs benefit from being moved onto EFDs; the experimental data indicates the opposite. Testing has shown that moving redo logs onto EFDs yields only a small percentage improvement. It is better to leave them on Fibre Channel drives backed by write cache and to reserve EFDs for read-intensive parts of the database such as indexes or data.

Oracle TEMP tablespace on EFDs

Oracle uses this space mainly for data aggregations and sorting. When the database engine cannot fit a sort in memory, it spills the intermediate results to disk. In the context of a single user, Oracle typically issues large sequential I/Os against these tablespaces. When multiple users perform concurrent sorts on these tablespaces, the I/O becomes largely random in nature. Even though EFDs do not provide as much benefit for large random I/O as they do for small random operations, they are still far ahead of what regular rotating Fibre Channel drives can deliver. Depending on the space available on EFDs, Oracle applications will benefit from moving the temporary tablespaces to them. Temporary tablespace files should be moved to EFDs only after all the more I/O-intensive parts of the database have been moved.
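
The shift from sequential to random access under concurrency can be shown with a toy model. The sketch below is an illustration only, not a description of Oracle's I/O scheduler: it assumes each sort reads consecutive block numbers and that the device services the streams round-robin, then measures how many back-to-back requests are still contiguous.

```python
from itertools import zip_longest
from typing import List


def sequential_stream(start_block: int, length: int) -> List[int]:
    """Block numbers touched by one sort spilling to TEMP (assumed contiguous)."""
    return list(range(start_block, start_block + length))


def interleave(streams: List[List[int]]) -> List[int]:
    """Round-robin merge: the order in which the device sees the requests."""
    merged: List[int] = []
    for group in zip_longest(*streams):
        merged.extend(b for b in group if b is not None)
    return merged


def contiguous_fraction(blocks: List[int]) -> float:
    """Fraction of requests that directly follow the previous block."""
    if len(blocks) < 2:
        return 1.0
    hits = sum(1 for prev, nxt in zip(blocks, blocks[1:]) if nxt == prev + 1)
    return hits / (len(blocks) - 1)


one_user = interleave([sequential_stream(0, 100)])
four_users = interleave([sequential_stream(i * 10_000, 100) for i in range(4)])
single_fraction = contiguous_fraction(one_user)      # fully sequential
concurrent_fraction = contiguous_fraction(four_users)  # no request follows its predecessor
```

With one stream every request is contiguous; with four interleaved streams none are, which is the pattern a rotating drive pays for in seeks.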

Database workloads that are the best fit for EFDs

There are no simple, definitive rules that readily identify the applications best suited to EFDs, but some guidelines help. It is very important to understand the load profile of an application before putting it on EFDs, keeping in mind that most databases have different workload profiles at different times of the day. EFDs are suitable for highly read-intensive and extremely latency-sensitive applications, and using these drives against the wrong target may not yield the desired benefit for the investment. The following terminology helps in deciding whether EFDs are suitable for a given workload.

Write cache: Most storage systems have a large write-side cache, and all write I/Os from a host are generally written to cache and incur no delay from physical disk access. Storage arrays have write caches sized to match the disk count supported by the controller, and they support enabling and disabling write cache at the LUN level if needed.

Read hit: A read request from a database host can be served by the storage system immediately if the data already exists in the storage cache because of a recent read or write, or due to prefetch. A read serviced from the storage cache without causing disk access is called a read hit. If the requested data is not available in the storage cache, the array must retrieve it from disk; this is referred to as a read miss.
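
A minimal sketch of these two definitions follows. It assumes a least-recently-used eviction policy and leaves out prefetch; real arrays use their own policies, and the `StorageCache` class is hypothetical.

```python
from collections import OrderedDict


class StorageCache:
    """Toy storage cache that counts read hits and read misses (LRU assumed)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: "OrderedDict[int, bool]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _touch(self, block: int) -> None:
        # Mark the block as most recently used, evicting the oldest if full.
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def write(self, block: int) -> None:
        # Writes land in cache, so a later read of the same block can hit.
        self._touch(block)

    def read(self, block: int) -> str:
        if block in self.blocks:
            self.hits += 1
            self._touch(block)
            return "hit"
        # Read miss: the array would go to disk, then keep the block in cache.
        self.misses += 1
        self._touch(block)
        return "miss"

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = StorageCache(capacity=2)
cache.write(1)
results = [cache.read(b) for b in (1, 2, 3, 1)]
```

In this run the first read hits because of the earlier write, and the last read misses because block 1 was evicted in the meantime.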

Short-stroked drives: Some extremely latency-sensitive applications use this technique on regular Fibre Channel drives to obtain low latencies. Data is laid out on many partially populated disks in order to reduce disk head movement and so provide high IOPS at very low latency.

Workloads with high cache read-hit rates are already serviced at memory access speed, and deploying them on flash drive technology may not show a significant benefit. Workloads with low cache read-hit rates that exhibit random I/O patterns, with small I/O requests of up to 16 KB, and that require high transaction throughput will benefit most from the low latency of EFDs.
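
This guideline can be written down as a simple screening check. The sketch below is only a rule of thumb: the 16 KB limit comes from the text, while the 0.5 cut-off for a "low" read-hit rate and all names are assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_SMALL_IO_KB = 16        # "small I/O requests of up to 16 KB" (from the text)
LOW_HIT_RATE_CUTOFF = 0.5   # assumption: the text gives no numeric threshold


@dataclass
class WorkloadProfile:
    read_hit_rate: float          # fraction of reads served from storage cache
    random_io: bool               # True if the access pattern is mostly random
    io_size_kb: float             # typical I/O request size
    needs_high_throughput: bool   # high transaction throughput required


def efd_fit(profile: WorkloadProfile) -> str:
    """Return 'limited', 'strong', or 'unclear' for a workload profile."""
    if profile.read_hit_rate >= LOW_HIT_RATE_CUTOFF:
        # Already served at memory speed; flash may not add much.
        return "limited"
    if (profile.random_io
            and profile.io_size_kb <= MAX_SMALL_IO_KB
            and profile.needs_high_throughput):
        return "strong"
    return "unclear"


oltp = WorkloadProfile(read_hit_rate=0.2, random_io=True,
                       io_size_kb=8, needs_high_throughput=True)
cached = WorkloadProfile(read_hit_rate=0.95, random_io=True,
                         io_size_kb=8, needs_high_throughput=True)
verdicts = (efd_fit(oltp), efd_fit(cached))
```

Since workload profiles change during the day, a check like this is best run against measurements from several time windows rather than a single average.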

Database and application managers can easily point to mission-critical applications that directly improve business revenue and productivity when transaction throughput increases and service latencies drop. Cognizant of these applications, storage administrators often resort to "short stroking" more drives to ensure the highest possible I/O service level for them. EFDs can provide two very important benefits for such applications.

A single EFD can replace many short-stroked drives because it can sustain a very high transaction rate (IOPS). This reduces the total number of drives needed for the application, saves power by not having to keep many disks spinning, and may reduce floor space in the data center as well.

EFDs provide very low latency, so applications for which predictable low response time is critical, and for which not all the data can be kept in the host or storage cache, may benefit from using such drives. Because EFDs have no rotating media, their transfer rate is extremely high and data can be served much faster than the best response time achievable even with a large number of short-stroked hard drives.
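
The drive-count argument in the first benefit is simple arithmetic. In the sketch below, the per-drive figure for a short-stroked Fibre Channel disk and the target load are assumed values chosen only to show the calculation; the 30x multiplier is the "up to 30 times" upper bound quoted for EFDs relative to traditional Fibre Channel drives, so real ratios may be lower.

```python
import math

# Assumed figures for illustration only; measure your own drives.
HDD_IOPS_PER_DRIVE = 250   # hypothetical short-stroked Fibre Channel drive
EFD_SPEEDUP = 30           # upper bound quoted for EFDs ("up to 30 times")
TARGET_IOPS = 20_000       # hypothetical application requirement


def drives_needed(target_iops: int, iops_per_drive: int) -> int:
    """Smallest number of drives whose combined IOPS meets the target."""
    return math.ceil(target_iops / iops_per_drive)


hdd_count = drives_needed(TARGET_IOPS, HDD_IOPS_PER_DRIVE)
efd_count = drives_needed(TARGET_IOPS, HDD_IOPS_PER_DRIVE * EFD_SPEEDUP)
```

Under these assumptions the same load needs 80 short-stroked disks or 3 EFDs. Capacity and redundancy requirements, which this calculation ignores, would raise both counts.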

Enterprise Flash Drives or Solid State Drives

Enterprise flash drives are designed to dramatically increase the performance of latency-sensitive applications. Enterprise flash drives, also known as solid state drives (SSDs), contain no moving parts and appear as standard drives to existing storage management tools, allowing administrators to manage Tier 0 without special processes, custom tools, or extra training. Tier 0 EFDs are ideally suited for applications with high transaction rates and those requiring the fastest possible retrieval and storage of data, such as currency exchange and electronic trading systems, or real-time data acquisition and processing. They can also be extremely good for highly read-intensive workloads like search engine databases.

EFDs are designed to deliver millisecond application response times and up to 30 times more I/O operations per second (IOPS) than traditional Fibre Channel hard disk drives. Additionally, EFDs consume significantly less energy per IOPS than traditional hard disk drives, providing the opportunity for a significantly lower TCO by reducing the data center's energy and space footprints. Database performance has long been constrained by the I/O capability of hard disk drives (HDDs), and HDD performance has been limited by the intrinsic mechanical delays of head seek and rotational latency. EFDs, however, have no moving parts and therefore no seek or rotational latency delays, which dramatically improves their ability to sustain a very high number of IOPS with very low overall response times.

Over the past 25 years, the rotational speeds of HDDs have improved from 3,600 rpm to 15,000 rpm, yielding only about a fourfold improvement in IOPS, while other computer technologies such as CPU speeds saw double-digit growth. EFD technology represents a significant leap in performance and may sustain up to 30 times the IOPS of traditional HDD technology. Proper use of EFDs can deliver vastly increased performance to the database application compared with traditional Fibre Channel drives, both in transactions per minute and in transaction response time.
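
The "fourfold" figure follows directly from the spindle speeds. As a short worked calculation, average rotational latency is the time for half a revolution, so it scales inversely with rpm. Seek time is left out of this sketch, and the function name is ours.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for the target sector: half of one revolution, in ms."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


old_latency = avg_rotational_latency_ms(3_600)    # about 8.33 ms
new_latency = avg_rotational_latency_ms(15_000)   # 2.0 ms
speed_ratio = 15_000 / 3_600                      # about 4.17
```

A drive with no rotating media has no such term at all, which is why its gain is not bounded by the same factor.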