Brents IT Blog

Random thoughts by an IT GOAT


Upgrading Ceph from Hammer to Jewel then Luminous 12.2.2

In a previous post I go over the issues I had upgrading to Hammer.  I suggest reading that first if you are not currently on Hammer.  We continued to have stability issues under load, so it was decided that we needed to go all the way to the latest LTS ( Luminous 12.2.2 ).  We added some new OSD nodes running Ubuntu 16.04 to the cluster, and I also upgraded the radosgw VMs to 16.04 before upgrading.  I left the monitors and older OSD nodes on Ubuntu 14.04 as a sanity check.  I should note that it was suggested we upgrade the OS of all the servers to 16.04 before moving to Luminous.  It is also suggested that you must upgrade from Hammer 0.94.9+ to Jewel and then to Luminous.  So it's a two-step process; you shouldn't go directly to Luminous ( I guess some folks have tried and were successful, but I am not that adventurous with a production cluster ).

I started upgrading the nodes after reading the release notes, and what I thought would go well did not, mainly because I ignored a stated issue in the notes.  The stated issue was that if you were deploying ceph with ceph-deploy using a user named "ceph", you will run into deployment issues.  Initially, the only problem I hit was my ssh deploy key being wiped each time I ran an upgrade on a host.  After the upgrade was over, however, I found out that I needed to change the permissions on all the OSD drives and set them to the ceph user.  The catch is that the ceph packages expect a ceph user with a very specific user and group id.  Because I had created the user myself on all the servers, it had a uid of 1003, which of course did not match what the packages expected.  So the main point I want to make here is that if you created a user called "ceph" and were using it for previous deployments, delete that user on every host and create a new one, perhaps "cephusr" instead, then create and push out a new ssh key to all the hosts.  Do this before any upgrade past Hammer if you used the "ceph" user for ceph-deploy deployments.
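As a sketch of that cleanup, something like the loop below could generate the per-host commands to recreate the deploy user.  The hostnames, the "cephusr" name, and the key path are all assumptions, and it only echoes the commands ( a dry run ) so nothing runs by accident:

```shell
# Dry-run sketch: prints the per-host cleanup commands instead of running them.
# Hostnames and the replacement user name "cephusr" are assumptions.
recreate_deploy_user() {
    NEW_USER="cephusr"
    for HOST in "$@"; do
        # remove the old "ceph" deploy user, then create the replacement
        echo "ssh root@$HOST 'deluser --remove-home ceph'"
        echo "ssh root@$HOST 'adduser --disabled-password --gecos \"\" $NEW_USER'"
    done
    # afterwards, generate a fresh deploy key and push it to every host
    echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
    echo "ssh-copy-id -i ~/.ssh/id_rsa.pub $NEW_USER@each-host"
}

recreate_deploy_user mon1 osd1 osd2
```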

Ceph-Deploy Reference Commands:
I installed ceph-deploy via pip to ensure I got the latest one from their repository:
pip install ceph-deploy==1.5.39
Then I used the following command to tell ceph-deploy to install the latest stable Hammer release ( <host> stands in for each node's hostname ):
ceph-deploy install --release hammer <host> [<host> ...]
Then the latest stable Jewel release:
ceph-deploy install --release jewel <host> [<host> ...]
Then the latest stable Luminous release:
ceph-deploy install --release luminous <host> [<host> ...]

After I fixed the user issue, the upgrades went without much trouble; upgrading from Jewel to Luminous was easy.

Issues I ran into:
1. The "ceph" username being used prior to the install of Jewel ( correct it before the upgrade to Jewel ).

If you want to fix your existing user rather than recreate it ( this is what I did after the install, to avoid further issues, but before setting permissions on the OSDs ):
usermod -u 64045 ceph
groupmod -g 64045 ceph
more /etc/group ( verify group ID )
awk -F: '{printf "%s:%s\n",$1,$3}' /etc/passwd  ( verify user id )
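To show what that awk check should print once the IDs are fixed, here it is run against a made-up passwd entry ( on a real host you would read /etc/passwd instead ):

```shell
# Sample /etc/passwd entry for illustration only -- the real check reads /etc/passwd.
sample="ceph:x:64045:64045:Ceph storage:/var/lib/ceph:/usr/sbin/nologin"
result=$(echo "$sample" | awk -F: '{printf "%s:%s\n",$1,$3}')
echo "$result"   # ceph:64045
```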

2. After upgrading to Jewel, my radosgw's wouldn't upgrade/install; I needed to run the gatherkeys function, which was broken due to my use of the "ceph" username.  I corrected that and then reran gatherkeys with ceph-deploy on the admin node.
3. The older version of radosgw had you install Apache.  This needed to be removed prior to upgrading to Jewel.  I actually removed Apache and the old radosgw install prior to running the gateway install command.
4. I needed to upgrade ceph-deploy to the latest before installing Jewel because, even though I had upgraded it for the Hammer release, it had no command to create the manager daemons required by Luminous and newer versions of ceph.  Thus when I ran the create manager command, it errored on me.  I also got a complaint about missing keys even after I upgraded ceph-deploy, so I had to run gatherkeys again to create the key file it expects in the /var/lib/ceph folder.
5. There are several flags you need to set after installing; read the docs, as you will see error messages otherwise.  You must also create manager daemons; I set them up on our monitor nodes to avoid setting up new servers.  There is also a health monitoring dashboard you can enable after the managers are set up.
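For reference, the manager setup boils down to two commands; this dry-run sketch just prints them ( the monitor hostnames are placeholders ):

```shell
# Dry-run sketch: echoes the mgr setup commands rather than executing them.
# mon1/mon2/mon3 are placeholder hostnames for your monitor nodes.
setup_mgrs() {
    echo "ceph-deploy mgr create $*"
    # enable the built-in health dashboard once the mgrs are running
    echo "ceph mgr module enable dashboard"
}

setup_mgrs mon1 mon2 mon3
```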
6. There is an option you can add to each OSD server's ceph.conf file to skip past the ceph user ( created by the upgrade to Jewel ) needing ownership of the OSD drive directories.  I used this setting to just get the OSDs online and verify the upgrade worked.  After that, I ran the command to take ownership and then removed the setting from the ceph.conf file.  I did this because I didn't want to push the issue down the road ( and the cluster was offline for maintenance ).  If your cluster is online, you could stop each OSD, run the command, and restart it ( or do a whole server at a time after setting the noout flag, though technically you should already have set that when you started the upgrade ).
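For the online approach, the per-OSD sequence would look roughly like this.  It is a dry-run sketch that only prints the commands; 64045 is the ceph uid/gid on Ubuntu, and OSD id 21 is just an example:

```shell
# Dry-run sketch of fixing ownership on one OSD while the cluster stays online.
# Prints the commands instead of running them; the OSD id is an example.
fix_osd_ownership() {
    ID="$1"
    echo "ceph osd set noout"
    echo "systemctl stop ceph-osd@$ID"
    echo "chown -R 64045:64045 /var/lib/ceph/osd/ceph-$ID"
    echo "systemctl start ceph-osd@$ID"
    echo "ceph osd unset noout"
}

fix_osd_ownership 21
```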

If you need the setting to get around the ceph ownership issue, meaning your stuff is still owned by root, here it is ( put it in the ceph.conf file under [global] and restart the OSDs ):
setuser match path = /var/lib/ceph/$type/$cluster-$id

6b. Setting the permissions on each OSD was not fun; it took about 30 minutes to run on each OSD.  I recommend installing the parallel package so you can run these faster.  Here are the commands I ran to take over the directories, the first taking the 30 minutes and the rest taking seconds to process:

find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -type d|parallel chown -R 64045:64045

for ID in $(ls /var/lib/ceph/osd/ | cut -d '-' -f 2); do
  JOURNAL=$(readlink -f /var/lib/ceph/osd/ceph-${ID}/journal)
  chown 64045 ${JOURNAL}
done

chown -R 64045:64045 /var/lib/ceph
chown 64045:64045 /var/lib/ceph
chown 64045:64045 /var/lib/ceph/*
chown 64045:64045 /var/lib/ceph/bootstrap-*/*

Reference site: 

I also ended up having to take ownership of the journal drive partitions as well before the OSDs would spin up ( this was frustrating, as I will need to do it going forward for any new drive being added ):
chown ceph:disk /dev/sdi1 ( change the 1 to whatever partition you are using )

7. The radosgw goes from Apache to civetweb, so after doing the above to get it reinstalled, I also had to go into the ceph.conf file and change the radosgw section to:
host = radosgw1
log file = /var/log/radosgw/client.radosgw.ukradosgw1.log
rgw_frontends = civetweb port=80 #needed to change the default port so the existing haproxy setup can see them

All of the Apache settings get removed from the configs for all the radosgw servers, which is great ( less crap in my config ), but I had to figure out exactly which settings remained.  I don't think you need the log file setting either, but I left mine in.

8.  The last issue I had was a kick in the balls, really.  Previous releases didn't care so much about having the proper number of PGs per pool, but they started clamping down on the numbers because so many people were doing it wrong.  They did institute a calculator on their page, which is great, but unfortunately for us, we had set this up before that existed and apparently misunderstood the requirements.  This meant our PG counts were too high, and the second the cluster came up, it complained.  After we upgraded to Luminous, the cluster stopped allowing writes, essentially making the cluster read-only.  After several emails to the mailing list, a nice guy pointed me in the right direction to the following settings:

mon max pg per osd = 200
osd max pg per osd hard ratio = 2.0
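To see why the monitors complain, you can work out the average PG count per OSD yourself; the figures below are hypothetical:

```shell
# Hypothetical cluster: 4096 PGs in a pool, 3x replication, 24 OSDs.
total_pgs=4096
size=3
num_osds=24

# Each PG is stored on "size" OSDs, so the average per-OSD count is:
pgs_per_osd=$(( total_pgs * size / num_osds ))
echo "$pgs_per_osd"   # 512 -- well over the 200 default limit above
```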

Reference site: 

The settings shown above are the defaults.  We had so many PGs that I had to bump the mon setting to 4000 and the osd setting to 50.0 ( note the .0 is required, as it is a float variable ).  Then the configuration was pushed out and all the OSD, Mgr, and Mon daemons restarted.  This eliminated the error and allowed the cluster to write data again.  After getting things online, I went back and corrected my pools using the following commands ( I also updated my defaults in the ceph.conf file prior to deploying the new config ):

Luminous pool copy to fix a too-many-PGs setup ( STOP all radosgw's before doing this; there must be no I/O to the pools during this transition ).

This was applied to the following pools: .rgw.root .rgw.gc .rgw.control .rgw.buckets.index .rgw .users .users.uid ( all are radosgw default pools ), and it is assumed they had their PGs increased beyond 16 ( mine were at 4096 due to the defaults I set up in the config ).  The copy didn't break my rgw setup; if you do need to recreate your rgw setup, simply delete all the pools and start the radosgw's again.  You will need to recreate users and buckets after that, though.

#Create new pool ( <new-pool> and <old-pool> are placeholders for the pool names )
ceph osd pool create <new-pool> 16
#Copy old pool contents to new pool
rados cppool <old-pool> <new-pool>
#Delete old pool ( the pool name must be given twice )
ceph osd pool delete <old-pool> <old-pool> --yes-i-really-really-mean-it
#Rename new pool to the old pool's name
ceph osd pool rename <new-pool> <old-pool>
#Set application tag on pool; optional in Luminous, but going forward it will be required for control purposes, and the health output warns without it.
ceph osd pool application enable <old-pool> rgw
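The steps above can be wrapped in a loop over the rgw pools.  This is a dry-run sketch that only prints the commands ( the pool list and the PG count of 16 match my setup; adjust to yours, and remember there must be no I/O while the real commands run ):

```shell
# Dry-run sketch: prints the pool-copy commands instead of executing them.
# Stop all radosgw daemons first -- no I/O to the pools during the copy.
copy_pool() {
    POOL="$1"; PGS="$2"
    echo "ceph osd pool create ${POOL}.new $PGS"
    echo "rados cppool $POOL ${POOL}.new"
    echo "ceph osd pool delete $POOL $POOL --yes-i-really-really-mean-it"
    echo "ceph osd pool rename ${POOL}.new $POOL"
    echo "ceph osd pool application enable $POOL rgw"
}

for POOL in .rgw.root .rgw.gc .rgw.control .rgw.buckets.index .rgw .users .users.uid; do
    copy_pool "$POOL" 16
done
```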

Reference Site ( contains a script to do this minus the last command ): 

9. If you enable the manager daemon's health dashboard and you have more than one manager daemon, the dashboard will move around between them as the master manager node changes.  They recommend you put haproxy in front in order to redirect queries to the page; I really wish it would just automatically redirect you to the master's IP/page.

10. Some useful commands for Ubuntu 16.04 ( each run on the specific host; the targets below are the standard ceph systemd units )
#restart all OSDs on a host
systemctl restart ceph-osd.target
#restart radosgw
systemctl restart ceph-radosgw.target
#restart monitor
systemctl restart ceph-mon.target
#restart manager
systemctl restart ceph-mgr.target
#restart single OSD ( where 21 is the OSD number )
systemctl start ceph-osd@21
#start a specific ceph OSD using the disk name ( I used this when the OSD wouldn't start, after I fixed the journal permissions )
systemctl start ceph-disk@dev-sdb1.service