Skip to content

HOWTO: PostgreSQL Replication with Automatic Fail-Over , using repmgr a.k.a. replication manager

In our scenario we only have two separate database servers. The OSes are Linux Centos 6.7. The Database Servers are PostgreSQL 9.4. The repmgr is repmgr94.

The Servers was been configured with streaming replication. So, this setup is not from bare-metal, in opposite it’s from production setup.

I stop the standby server, rename the recovery.conf file and start it again. So , now , we can addusers,databases,etc and we can setup the system with repmgr.

On master and slave nodes:

  1. create ssh keys and populated , so every server can connect to each other in the context of postgres user

ON THE MASTER NODE: on the command line

  1. Install repmgr on the master node
hostname# yum -y install repmgr94 repmgr94-debuginfo

…..

2. Just for info (if I forgot something) , I’m showing the  postgresql.conf key-values.

hostname# su - postgres
-bash-4.1$ grep ^[a-zA-Z0-9] /usr/local/pgsql/data/postgresql.conf

 

listen_addresses = ‘0.0.0.0’
port = 5432 # (change requires restart)
max_connections = 2048 # (change requires restart)
tcp_keepalives_idle = 0 # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 0 # TCP_KEEPINTVL, in seconds;
tcp_keepalives_count = 0 # TCP_KEEPCNT;
shared_buffers = 1024MB # min 128kB
temp_buffers = 128MB # min 800kB
work_mem = 16MB # min 64kB
dynamic_shared_memory_type = posix # the default is the first option
max_files_per_process = 1024 # min 25
wal_level = hot_standby
fsync = on # turns forced synchronization on or off
synchronous_commit = on # synchronization level;
checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min # range 30s-1h
archive_mode = on
archive_command = ‘cp %p /usr/local/pgsql/wal/%f’
archive_timeout = 2 # force a logfile segment switch after this
max_wal_senders = 8 # max number of walsender processes
wal_keep_segments = 64 # in logfile segments, 16MB each; 0 disables
wal_sender_timeout = 30s # in milliseconds; 0 disables
log_destination = ‘stderr’ # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = ‘pg_log’ # directory where log files are written,
log_filename = ‘postgresql-%a.log’ # log file name pattern,
log_truncate_on_rotation = on # If on, an existing log file with the
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 0 # Automatic rotation of logfiles will
log_line_prefix = ‘< %m >’ # special values:
log_timezone = ‘Europe/Berlin’
datestyle = ‘iso, mdy’
timezone = ‘Europe/Berlin’
lc_messages = ‘en_US.UTF-8’ # locale for system error message
lc_monetary = ‘en_US.UTF-8’ # locale for monetary formatting
lc_numeric = ‘en_US.UTF-8’ # locale for number formatting
lc_time = ‘en_US.UTF-8’ # locale for time formatting
default_text_search_config = ‘pg_catalog.english’

 

-bash-4.1$ createuser -s repmgr
-bash-4.1$ createdb repmgr -O repmgr
# touch /var/log/repmgr/repmgr-9.4.log
# grep -v ^\# /etc/repmgr/9.4/repmgr.conf

cluster=CLUSTER_NAME
node=1
node_name=MASTERNODE_HOSTNAME
conninfo=’host=127.0.0.1 dbname=repmgr user=repmgr’
loglevel=NOTICE
logfacility=STDERR
logfile=’/var/log/repmgr/repmgr-9.4.log’
pg_bindir=/usr/pgsql-9.4/bin
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic # one of ‘automatic’, ‘manual’
priority=100 # a value of zero or less prevents the node being promoted to master
promote_command=’/usr/pgsql-9.4/bin/repmgr standby promote -f /etc/repmgr/9.4/repmgr.conf’


# repmgr -f /etc/repmgr/9.4/repmgr.conf --verbose master register

[2016-01-21 19:19:07] [NOTICE] opening configuration file: /etc/repmgr/9.4/repmgr.conf
[2016-01-21 19:19:07] [NOTICE] Redirecting logging output to ‘/var/log/repmgr/repmgr-9.4.log’

Execure cluster show to ensure everything is ok until now.

# repmgr -f /etc/repmgr/9.4/repmgr.conf --verbose cluster show

[2016-01-21 19:34:40] [NOTICE] Redirecting logging output to ‘/var/log/repmgr/repmgr-9.4.log’
Role | Connection String
* master | host=MASTERNODE_HOSTNAME dbname=repmgr user=repmgr

 

ON THE SLAVE NODE: on the command line

  1. First, clone the production database from primary/master database server: MASTERNODE_HOSTNAME
# su - postgres
-bash-4.1$ /usr/pgsql-9.4/bin/repmgr -f /etc/repmgr/9.4/repmgr.conf --force --rsync-only -d repmgr -U repmgr -h MASTERNODE_HOSTNAME --verbose  standby clone

 

……

pg_twophase/
pg_xlog/

sent 73062 bytes received 1674435 bytes 1164998.00 bytes/sec
total size is 62768776 speedup is 35.92
receiving incremental file list
pg_control
8192 100% 7.81MB/s 0:00:00 (xfer#1, to-check=0/1)

sent 102 bytes received 236 bytes 676.00 bytes/sec
total size is 8192 speedup is 24.24

 

# /etc/init.d/postgresql start
# su - postgres
-bash-4.1$ /usr/pgsql-9.4/bin/repmgr -f /etc/repmgr/9.4/repmgr.conf --verbose standby register

 

Finally ensure cluster status, with cluster show command:

-bash-4.1$ /usr/pgsql-9.4/bin/repmgr -f /etc/repmgr/9.4/repmgr.conf cluster show

 

[2016-01-21 20:28:41] [NOTICE] Redirecting logging output to ‘/var/log/repmgr/repmgr-9.4.log’
Role | Connection String
* master | host=MASTERNODE_HOSTNAME dbname=repmgr user=repmgr
standby | host=SLAVENODE_HOSTNAME dbname=repmgr user=repmgr

 

That’s All.

HOWTO: Nginx SSL Chain , or private->intermediate->ca

When you issue cert and want to put it on your nginx, there is a little thing to make in mind.

Nginx wants one file (also called ssl bundle), where all your certs are put in order!

The order is:

  1. your_cert (your server crt)
  2. intermediate cert (your ssl provider intermediate ssl)
  3. CA cert (your ssl provider CA)

 

Check is your key, csr and crt match. Basically you do openssl requests on them , then md5sum on the results. The md5 hashesh should be the same!!!

Example:

 openssl x509 -noout -modulus -in certificate.crt | openssl md5
 openssl rsa -noout -modulus -in privateKey.key | openssl md5
 openssl req -noout -modulus -in CSR.csr | openssl md5

Get more info on these sources:
https://www.sslshopper.com/certificate-key-matcher.html
https://www.digicert.com/ssl-certificate-installation-nginx.htm

FIX: MySQL5.6 – MySQL5.7 without changing the code. Errors like: (22003): Out of range value , ERROR 1365 (22012): Division by 0, and etc.

When your code is for mysql5.6 and your admin upgrade to mysql5.7

you can see errors like this:

ERROR 1264 (22003): Out of range value for column 'a' at row 1

ERROR 1365 (22012): Division by 0

ERROR 1406 (22001): Data too long for column 'a' at row 1

ERROR 1292 (22007): Incorrect datetime value: '0000-00-00 00:00:00' for column 'a' at row 1
ERROR 1055 (42000): Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'invoice_line_items.description' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by


Quick tutor:

https://www.digitalocean.com/community/tutorials/how-to-prepare-for-your-mysql-5-7-upgrade 

 

Detailed doc:

http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mode-changes

FIX: jirra with mysql CPU overload when leap-second happens

When leap second happens , our jirra server start overloading. All the java processes eat all the CPU time, the app is responsive, as well as the server. But still it’s overloaded.

restarting jirra -> not helping at all

restarting mysql -> not helping at all

The FIX is , the you need to manually set the date with time 00:00:00 (in our case, in yours may be not important), then manually run ntpdate pool.ntp.org and then start again the ntpd daemon. Without any services/processes restart the server is ok now, the CPU time is back as normal and the app was not disrupted in any way.

 

strace from the set date command:

Read more…

FIX: Centos 6.6 on X8SIE-F, nics links down, all packet counters rise and goes crazy , no networking or: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out

Bug:  new Centos 6.6 install on SuperMicro X8SIE-F, After some time nics links down, all packet counters rise and goes crazy , no networking

     If you succeed to install centos 6.6 via network 🙂

Cause: It’s ALL about the ASPM

Logs:

Jul 10 23:01:41 localhost kernel: Hardware name: X8SIE
Jul 10 23:01:41 localhost kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Jul 10 23:01:41 localhost kernel: Modules linked in: ipv6 iTCO_wdt iTCO_vendor_support serio_raw i2c_i801 i2c_core sg lpc_ich mfd_core e1000e ptp pps_core ext4 jbd2 mbcache raid1
sd_mod crc_t10dif pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

fixing on boot time:

append to grub boot line: pcie_aspm=off

fixing after boot:

 echo “performance” > /sys/module/pcie_aspm/parameters/policy

fixing from the BIOS:

Goto: Advanced – Chipset Configuration ( most probably or try something similar in the main menus )

Set: Active State Power Module = Disabled

Here is the /var/log/messages log:

Read more…

HOWTO: Migrate lvm root partition to new created software raid in LINUX

/dev/sda – > production disk

/dev/sdb -> new disk

/dev/sdb5 -> /dev/mapper/srv0-root  mounted on /

sfdisk -d /dev/sda |sfdisk –force /dev/sdb

reboot or partprobe

pvcreate /dev/sdb5

 

HOWTO: import private keys with MACOS X Bitcoin-QT.app

DONT NEED TO DO THIS!!! IT’S ONLY FOR RPC JSON API CALLS:

PREPARE the bitcoin-qt.app for open in server mode with rpc authentications

1. enter rpcuser=someuser and rpcpassword=somepassword in your ~/Library/Application Support/Bitcoin/bitcoin.conf

2. chmod it to only you (600 or 700)

3. run the server:

# open Bitcoin-Qt.app

Read more…