
Sunday, June 19, 2011

MySQL Master/Slave Setup and Rebuilding Steps

MySQL Replication

MySQL Replication, a solution suggested by our resident database guru, is the process of setting up a master/slave relationship between two database servers, where the slave continually and automatically gets fed all MySQL queries from the master machine, resulting in a replication of data across the two machines.

Once replication has been configured and activated, the slave machine will act as a live MySQL database, holding an up-to-the-minute snapshot of all data from the master machine.

What we love about this solution is that it leaves no "holes". No potential for data loss and no need to ever take the database down.

Once the slave machine is ready, with the flip of a switch, we could redirect all queries to the new machine, terminate the master/slave relationship and we'll be up and running with the new machine.

Here's how the actual implementation went:

Step 1: Set up an account on the master server that the slave server can use to connect. This account must be given the REPLICATION SLAVE privilege.


mysql> use mysql;
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%.mydomain.com' IDENTIFIED BY 'slavepass';
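
To confirm the account was created as intended, you can check its privileges:

mysql> SHOW GRANTS FOR 'repl'@'%.mydomain.com';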

Step 2: Turn on binary logging of MySQL transactions. The server cannot act as a replication master unless binary logging is enabled. Locate the my.cnf file on the master database server machine and add these two lines to the file, under the [mysqld] section:

[mysqld]
log-bin=mysql-bin
server-id=1

If those options were not already present (ours were), you will have to restart the server for them to take effect.

Step 3: Flush all the tables and block write statements by executing a FLUSH TABLES WITH READ LOCK statement - this is important so that we can safely copy the database files, with no fear of the files being changed during the copy operation. Make sure you keep the shell window where you issue the commands below open throughout the process, until the tar finishes. Once this window closes, the lock will be automatically released. The WAIT_TIMEOUT settings raise MySQL's idle-connection timeout so the server does not drop the session (and with it, the lock) during the long copy.

mysql> SET GLOBAL WAIT_TIMEOUT=600000;
mysql> SET WAIT_TIMEOUT=600000;
mysql> FLUSH TABLES WITH READ LOCK;


Step 4: Take a snapshot of all data on the master database server. The easiest way to create a snapshot is to use tar. (Make sure you have sufficient storage space on your master server to hold the tar. If unsure, calculate the current size of your database and plan for an additional 70% of available space)

cd /var/db/mysql
tar -cvf /tmp/mysql-snapshot.tar ./ --exclude mysql &

On our system this operation took a little over four hours to complete. Notice the trailing "&" operator - this causes tar to run as a background process. (If you are worried about your shell getting disconnected, prefix the command with nohup so the job survives a hangup.) You'll know it's done by periodically viewing the process list using "ps -ax".

Step 5: While the read lock placed by FLUSH TABLES WITH READ LOCK is in effect, read the value of the current binary log name and offset on the master:

mysql> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------+----------+--------------+------------------+
| mysql-bin.003 |       73 | test         | manual,mysql     |
+---------------+----------+--------------+------------------+

The File column shows the name of the log and Position shows the offset within the file. In this example, the binary log file is mysql-bin.003 and the offset is 73. Record these values. You need them later when you are setting up the slave. They represent the replication coordinates at which the slave should begin processing new updates from the master.

Step 6: Now that you have taken the snapshot and recorded the log name and offset, you can re-enable write activity on the master:

mysql> UNLOCK TABLES;


Step 7: Hop on the new database server machine (slave). Shut down the database, then locate the my.cnf file and add these lines to the file:

[mysqld]
server-id=2

Step 8: Copy and extract the master database snapshot to the slave server under /var/db:

cd /var/db

ftp
open masterdb.com
cd /tmp
get mysql-snapshot.tar
exit

tar --extract --file=mysql-snapshot.tar
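
Depending on your system, the extracted files may need to be owned by the user the MySQL server runs as (commonly mysql; both the user and group name here are assumptions about your setup):

chown -R mysql:mysql /var/db/mysql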

Step 9: Start the slave database and issue these commands, replacing the option values with the actual values relevant to your system:

mysql> CHANGE MASTER TO
MASTER_HOST='master_host_name',
MASTER_USER='replication_user_name',
MASTER_PASSWORD='replication_password',
MASTER_LOG_FILE='recorded_log_file_name',
MASTER_LOG_POS=recorded_log_position;
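
For example, plugging in the replication account from Step 1 and the coordinates recorded in Step 5, with masterdb.com standing in for your master's hostname, the command would look something like this (note that MASTER_LOG_POS takes a number, not a quoted string):

mysql> CHANGE MASTER TO
MASTER_HOST='masterdb.com',
MASTER_USER='repl',
MASTER_PASSWORD='slavepass',
MASTER_LOG_FILE='mysql-bin.003',
MASTER_LOG_POS=73;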

Step 10: Start the slave

mysql> START SLAVE;

If everything went well, your slave database will now connect to the master database, downloading all transactions following the time you took the snapshot. From this moment on, the slave database will continually feed itself with data from the master database, always staying in sync.

Confirm the slave server is up to date by running:

mysql> SHOW SLAVE STATUS;

and comparing the position markers with what the master database reports under:

mysql> SHOW MASTER STATUS;
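
In the slave's output, Master_Log_File and Read_Master_Log_Pos should catch up to the File and Position reported by the master, Slave_IO_Running and Slave_SQL_Running should both say Yes, and Seconds_Behind_Master should trend toward 0. A trimmed sketch of the fields to watch (the values shown are illustrative):

mysql> SHOW SLAVE STATUS\G
Master_Log_File: mysql-bin.003
Read_Master_Log_Pos: 73
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0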

Step 11: Update database-connect functions

Once the slave database is up-to-date, it is now time to update your application code, routing all calls to the new server.

Assuming you have all database access functions encapsulated in a database.php / database.c layer, locate the function that establishes a connection to the database and update the IP address or domain name to point to the slave database.

Voila! There you have it -- migration of MySQL from machine A to machine B without ever taking the database down.

Step 12: Test

Run a tail on the master database's MySQL log file, or run SHOW PROCESSLIST, and make sure no one is accessing the master database any more. Once confirmed, shut down the master database and test for any strange side effects.

I recommend you keep the old database server up for a few days until you're absolutely certain the transfer went through smoothly.

Running a mysqlcheck --all-databases on the new database is highly recommended.
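
For example, from the shell (the credentials are placeholders):

mysqlcheck -uuser -ppassword --all-databases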


MySQL Replication Rebuild


A few assumptions before getting started:
*You are repairing a previously existing master/slave replication process. If not, you'll first have to add a user to the Master that the Slave can use to connect, and configure server-id and binary logging - see Steps 1, 2 and 7 in the setup walkthrough above.
*If your server/database is live and you are worried about performance, use:
"nice" before Linux commands
"--skip-lock-tables" on mysqldumps to prevent locking inserts/reads
"nohup ... &" to execute the Linux/Unix command in the background, unattended

Here is the process I followed:

1) Drop Slave database.
mysql>drop database x;

2) Create a dump of the Master's create-table (schema) info:
nohup mysqldump --no-data --skip-lock-tables --single-transaction -uuser -ppassword db > file1.sql &

3) Dump any tables you know won't change, or that you can stop from changing without problems. Use the same command as in 2), but drop --no-data and list the tables explicitly.

4) **This is the part where you need to lock tables
ON MASTER:
mysql>reset master; **This took me ~9 minutes
mysql>flush tables with read lock; **~17 seconds
mysql>show master status;

(VERY IMPORTANT)
Copy down the File and Position information

(From another terminal, still on the master's server)
linux>nohup mysqldump --skip-lock-tables --single-transaction --no-create-info --routines --ignore-table=db.table_already_dumped -uuser -ppassword db > file2.sql &

Because the dump runs with --single-transaction, it takes a consistent snapshot of the InnoDB tables when it starts, so once it is running you can release the lock:

mysql>UNLOCK TABLES;

(If databases are on different servers)
5) Copy the dumps to the other server
nohup nice scp file1.sql user@IPAddress:/path/ &
nohup nice scp file2.sql user@IPAddress:/path/ &
(Include any other dump(s) from step 3) )

(ON SLAVE)
6) Import the dumps. The dumps above do not include a CREATE DATABASE statement, and the database was dropped in step 1), so recreate it first:
mysql>create database x;
mysql>use x;
mysql>\. file1.sql
mysql>\. file2.sql
*If you want to leave this UNATTENDED, run the import from the shell instead: nohup mysql -uuser -ppassword x < file1.sql &

(Still on Slave)
*The x,y on the change master... command are the File and Position you got from "show master status" in step 4)
7) Start replication
mysql>stop slave;
mysql>reset slave; (This will delete the relay logs, etc)
mysql>change master to MASTER_LOG_FILE='x', MASTER_LOG_POS=y;
mysql>start slave;

mysql>show slave status \G

Confirm that Slave_IO_Running and Slave_SQL_Running both report Yes and that Seconds_Behind_Master is shrinking.



Saturday, April 30, 2011

How to do a successful Oracle 11g upgrade?

I recommend that users upgrading to Oracle 11g check for duplicate SYS/SYSTEM objects before the upgrade. This duplicate check is not part of the 11g upgrade doc or the manual installation doc.

Follow all the steps provided in your installation doc, then use the following procedure to find any duplicates before you proceed with your upgrade.

Oracle should include this step in the upgrade doc to help make upgrades successful.
 
Problem Description: 
==================== 
 
If the same data dictionary object has been created under both user SYS and 
SYSTEM schema then errors will often occur when trying to use the database 
features associated with these objects. 
 
Problem Explanation: 
==================== 
 
During the installation of Oracle many scripts are run which create the 
underlying data dictionary objects. Most of these scripts are run at database 
creation time, but others are only run if specific database features (such as 
replication or shared pool management) are needed. These scripts are usually 
run manually after the database has been created. 
 
Solution Description: 
====================== 
 
In order to clean up the duplicate objects you need to issue a SQL script to 
find out the names of the duplicate objects. You can then manually drop the 
objects or use a 'SQL generating SQL' script to generate a list of drop 
commands. 
 
Below is a SQL*Plus script that will list all objects that have been created 
in both the SYS and SYSTEM schema: 
 
column object_name format a30 
select object_name, object_type 
from dba_objects 
where object_name||object_type in 
   (select object_name||object_type  
    from dba_objects 
    where owner = 'SYS') 
and owner = 'SYSTEM'; 
 
The output from this script will either be 'zero rows selected' or will look 
something like the following: 
 
OBJECT_NAME                    OBJECT_TYPE 
------------------------------ ------------- 
ALL_DAYS                       VIEW 
CHAINED_ROWS                   TABLE 
COLLECTION                     TABLE 
COLLECTION_ID                  SEQUENCE 
DBA_LOCKS                      SYNONYM 
DBMS_DDL                       PACKAGE 
DBMS_SESSION                   PACKAGE 
DBMS_SPACE                     PACKAGE 
DBMS_SYSTEM                    PACKAGE 
DBMS_TRANSACTION               PACKAGE 
DBMS_UTILITY                   PACKAGE 
 
If the select statement returns any rows then this is an indication that at 
least one script has been run as both SYS and SYSTEM. 
 
Since most data dictionary objects should be owned by SYS (see exceptions 
below) you will want to drop the objects that are owned by SYSTEM in order to 
clear up this situation. 
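
For example, to remove one of the duplicates from the sample output above (assuming you have verified it is safe to drop on your system):

SQL> DROP TABLE SYSTEM.CHAINED_ROWS;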
 
EXCEPTION TO THE RULE 
===================== 
THE REPLICATION SCRIPTS (XXX) CORRECTLY CREATE OBJECTS WITH THE SAME NAME IN 
THE SYS AND SYSTEM ACCOUNTS. LISTED BELOW ARE THE OBJECTS USED BY REPLICATION 
THAT SHOULD BE CREATED IN BOTH ACCOUNTS. DO NOT DROP THESE OBJECTS FROM THE 
SYSTEM ACCOUNT IF YOU ARE USING REPLICATION. DOING SO WILL CAUSE REPLICATION 
TO FAIL! 
 
The following objects are duplicates that will show up (and should not be removed)
when running this script in 8.1.x and higher.
 
Without replication installed:
 
INDEX           AQ$_SCHEDULES_PRIMARY
TABLE           AQ$_SCHEDULES
 
If replication is installed by running catrep.sql:
 
INDEX           AQ$_SCHEDULES_PRIMARY
PACKAGE         DBMS_REPCAT_AUTH
PACKAGE BODY    DBMS_REPCAT_AUTH
TABLE           AQ$_SCHEDULES
 
When the database is upgraded to 11g using DBUA, the following duplicate 
objects are also created:
 
 
OBJECT_NAME                    OBJECT_TYPE 
------------------------------ ------------- 
Help                           TABLE 
Help_Topic_Seq                 INDEX
 
The objects created by sqlplus/admin/help/hlpbld.sql must be owned by SYSTEM 
because when SQL*Plus retrieves the help information, it refers to the SYSTEM 
schema only. DBCA runs this script as the SYSTEM user when it creates the 
database, but DBUA runs it as the SYS user when upgrading the database 
(reported as unpublished BUG 10022360). You can drop the ones in the SYS schema.
 
Now that you have a list of duplicate objects you will simply issue the 
appropriate DROP command to get rid of the object that is owned by the SYSTEM user. 
 
If the list of objects is large then you may want to use the following 
SQL*Plus script to automatically generate an SQL script that contains 
the appropriate DROP commands: 
 
set pause off 
set heading off 
set pagesize 0 
set feedback off 
set verify off 
spool dropsys.sql 
select 'DROP ' || object_type || ' SYSTEM.' || object_name || ';' 
from dba_objects 
where object_name||object_type in 
   (select object_name||object_type  
    from dba_objects 
    where owner = 'SYS') 
and owner = 'SYSTEM'; 
spool off 
exit 
 
You will now have a file in the current directory named dropsys.sql that 
contains all of the DROP commands. You will need to run this script as a 
normal SQL script as follows: 
 
$ sqlplus 
SQL*Plus: Release 3.3.2.0.0 - Production on Thu May  1 14:54:20 1997 
Copyright (c) Oracle Corporation 1979, 1994.  All rights reserved. 
Enter user-name: system 
Enter password: manager 
SQL> @dropsys 
 
Note: You may receive one or more of the following errors: 
 
      ORA-2266 (unique/primary keys in table 
      referenced by enabled foreign keys):  
 
      If you encounter this error then some of the tables 
      you are dropping have constraints that prevent the table from being 
      dropped. To fix this problem you will have to manually drop the 
      objects in a different order than the script does. 
 
      ORA-2429 (cannot drop index used for enforcement of 
      unique/primary key): 
 
      This is similar to the ORA-2266 error except that it points to 
      an index. You will have to manually disable the constraint 
      associated with the index and then drop the index. 
 
      ORA-1418 (specified index does not exist): 
 
      This occurs because the table that the index was created on 
      has already been dropped which also drops the index. When the 
      script tries to drop the index it is no longer there and thus 
      the ORA-1418 error. You can safely ignore this error.

Monday, February 21, 2011

Oracle Enterprise Manager 11g for Oracle Database 11g Release 2

What's new in OEM 11g R2


  • Compression Advisor: A Compression Advisor has been added in Oracle Database 11g Release 2 to facilitate choosing the correct compression level for your data. The Compression Advisor is available through the Oracle Enterprise Manager 11g Grid Control console. As part of the existing Advisor framework in Oracle Database 11g, the compression advisor analyzes objects in the database, discovers the possible compression ratios that could be achieved, and recommends optimal compression settings (see the sketch after this list).


  • Instance Caging: Oracle Enterprise Manager 11g introduces a more flexible alternative to server partitioning for resource management. Instance caging allows you to limit the CPU count per instance, providing resource management that behaves more consistently across operating systems than the resource managers built into many of them, with lower administration overhead than traditional virtualization techniques (see the example after this list).


  • Global RAC Diagnostics: Oracle Enterprise Manager 11g extends Automatic Database Diagnostics Monitor (ADDM) for Oracle Database 11gR2 by greatly enhancing cluster-wide performance analysis for Real Application Clusters (RAC) databases. For RAC environments, ADDM analyzes the RAC cluster and reports on issues that are affecting the entire database as well as its individual instances. DBAs can now use ADDM to perform database-wide analysis of global resources, such as high-load SQL, global cache interconnect traffic, network latency issues, skew in instance response times, and I/O capacity.


  • Tuning Standby Databases: Active Session History (ASH) is now available for performance diagnostics on standby databases. Using Grid Control, DBAs can quickly drill down into problem areas that are affecting performance and resolve them with minimal impact.



  • High Availability Console: Accessed through Grid Control, the new High Availability Console gives you a configuration-wide summary view, with charts, of RAC and HA setups instead of individual targets. It provides HA configuration best practices and standards, along with recommendations from the new Maximum Availability Architecture Advisor, to help you implement an HA system. The HA Console also uses EM's powerful workflows and provisioning automation to minimize downtime for things like single-instance to MAA conversions.
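
Outside the Grid Control console, the Compression Advisor functionality surfaces in the database through the DBMS_COMPRESSION package in 11gR2. A minimal sketch of estimating an OLTP compression ratio - SCOTT.EMP and the USERS scratch tablespace are placeholders for your own table and tablespace:

SET SERVEROUTPUT ON
DECLARE
  l_blkcnt_cmp   PLS_INTEGER;   -- block count if compressed
  l_blkcnt_uncmp PLS_INTEGER;   -- block count if uncompressed
  l_row_cmp      PLS_INTEGER;   -- rows per block if compressed
  l_row_uncmp    PLS_INTEGER;   -- rows per block if uncompressed
  l_cmp_ratio    NUMBER;        -- estimated compression ratio
  l_comptype_str VARCHAR2(100); -- description of the compression type
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',  -- scratch tablespace (placeholder)
    ownname        => 'SCOTT',  -- schema to analyze (placeholder)
    tabname        => 'EMP',    -- table to analyze (placeholder)
    partname       => NULL,
    comptype       => DBMS_COMPRESSION.COMP_FOR_OLTP,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Estimated compression ratio: ' || l_cmp_ratio);
END;
/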
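
Instance caging itself boils down to two settings on the database: cap cpu_count and enable a resource manager plan. A minimal example (the limit of 4 CPUs and the shipped DEFAULT_PLAN are illustrative choices, not requirements):

SQL> ALTER SYSTEM SET cpu_count = 4;
SQL> ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';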

Thursday, January 6, 2011

What's new in PostgreSQL 9.0.2

PostgreSQL 9.x brings two standout features: Hot Standby and Streaming Replication, which together mark version 9.0 as a landmark in PostgreSQL's development.
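
A minimal configuration sketch for pairing the two features in 9.0 - the hostnames, the repuser account, and the password are placeholders:

# On the primary - postgresql.conf
wal_level = hot_standby        # write enough WAL detail for a hot standby
max_wal_senders = 3            # allow streaming replication connections
wal_keep_segments = 32         # retain extra WAL so the standby can catch up

# On the primary - pg_hba.conf (let the standby connect for replication)
host  replication  repuser  192.168.1.2/32  md5

# On the standby - postgresql.conf
hot_standby = on               # serve read-only queries during recovery

# On the standby - recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.1 port=5432 user=repuser password=secret'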

Other benefits:
• Avoid failure when trying to report an impending transaction wraparound condition from outside a transaction. This oversight prevented recovery after transaction wraparound got too close, because database startup processing would fail, backpatched to all versions.
• Fix recovery from base backup when the starting checkpoint WAL record is not in the same WAL segment as its redo point, backpatched to all supported versions.
• Force default wal_sync_method to be fdatasync for Linux, backpatched to all supported versions.
• Fix memory leak with ANALYZE and complex index expressions, backpatched to all supported versions.
• Fix memory leak in contrib/xml2 functions, backpatched to all supported versions.
• Fix for segfault in pgcrypto when attempting to access an undefined public key, backpatched to all supported versions.
• Prevent inlining an SQL function with multiple OUT parameters, backpatched to all supported versions.
• Fix bugs in WAL replay logic for GIN indexes, backpatched to 8.2.
• Fix segfault when accept() or a call made right after accepting a connection fails when the server is compiled with GSSAPI support, backpatched to 8.3.
• Fix error handling in tempfile deletion with log_temp_files active, backpatched to 8.3.
• Prevent slowing of autovacuum workers because of leakage in the cost parameters over the life of a worker, backpatched to 8.3.
• Check for too many children before returning "okay to proceed", backpatched to 8.4.
• Improve performance of internal tuplestore structures for window functions, backpatched to 8.4.
The 9.0.2 update also contains several fixes for issues with features introduced or changed in version 9.0:
• Fix low-risk potential denial of service against RADIUS login.
• Make OFF keyword unreserved.
• Fix bug in tracking of the latest removed WAL segment in streaming replication.
• Fix for buffer overrun in pg_upgrade.
• Fix memory leak in autovacuum launcher.
• Fix for race condition in transaction ID assignment and lock acquisition in hot standby.
• Improve performance for parallel restore with many blobs.