Brents IT Blog

Random thoughts by an IT GOAT


Ceph Upgrade from Firefly 0.80.10 to Hammer 0.94: Woes and Solution

I just ran into a major pain in the butt, so I thought I would document it, since it took hours to work out.

 

I was asked to upgrade Ceph to fix an instability in an existing cluster, so I rebuilt my local VMware test cluster (I had blown it away when I rebuilt my computer). I installed Ceph per my docs, and just like when I set up the cluster I needed to upgrade, I used the Ubuntu repository for the ceph-deploy install. This wonderful little set of scripts downloads and sets up Ceph with a few keystrokes (once the servers are prepped). Anyhow, it turns out that once Ubuntu moves on to a new LTS release (14.04 to 16.04 in my case), they stop putting out new packages for the older one. So when I checked for upgrades, Firefly 0.80.11 was the only option. I couldn't select Firefly 0.80.10, which is what my production cluster had (annoying), so I had to put 0.80.11 on the test cluster instead. Setting up the test cluster went fine, no issues. Once I had it up, it was time to test the upgrade to Hammer.
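To see which versions a node's configured repositories actually offer, `apt-cache madison` can be asked directly. The small filter below is only a sketch; the function name `madison_versions` is made up for illustration, and it assumes the usual pipe-separated `package | version | source` layout of madison output.

```shell
# madison_versions: read `apt-cache madison <pkg>` output on stdin and
# print each distinct version string (the second pipe-separated column).
madison_versions() {
    awk -F'|' 'NF >= 2 { gsub(/[ \t]/, "", $2); print $2 }' | sort -u
}

# Example (run on a node with the repositories configured):
#   apt-cache madison ceph | madison_versions
```

If the version you want is not in that list, no amount of `apt-get install ceph=<version>` will find it, which is the situation I was in with 0.80.10.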

 

Full disclosure: I know ceph-deploy is finicky (or brittle, as some have described it), but when it works, it's great! So I went to upgrade and, as noted above, the Ubuntu 14.04 repositories stop at the 0.80.11 Firefly packages. I figured I could just add the Ceph repository and go from there. I was able to load ceph-deploy, but when I tried to install (which is how you upgrade), I ran into a dependency conflict. Apparently, the Hammer ceph-common package was conflicting with the 0.80.11 Firefly package. This is of course the fault of the packager for not putting enough metadata in the package to differentiate the 0.80.11 install so that newer versions don't fail. After many hours of trying to get around this stopping point, I found a post that noted the package conflict, but it only vaguely mentioned removing the dependency. Good thing I had snapshotted my installs on my test server, because I had to run through this a few times to figure it out.
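When chasing a conflict like this, it helps to look at only the relationship fields of the packages involved. A minimal sketch, assuming the standard Debian control-field names; the helper name `pkg_relations` is hypothetical:

```shell
# pkg_relations: filter package metadata on stdin down to the fields
# that decide whether two packages can coexist.
pkg_relations() {
    grep -E '^(Package|Version|Depends|Conflicts|Breaks|Replaces):'
}

# Examples (run on the node that fails to upgrade):
#   apt-cache show ceph-common | pkg_relations   # what the repositories offer
#   dpkg -s ceph-common | pkg_relations          # what is installed right now
```

Comparing the installed package against the candidate from the new repository shows which declared relationship apt is tripping over.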

 

The solution was to add the new Ceph repository, install ceph-deploy, then install python-pip and use pip to install ceph-deploy version 1.5.34 on the deployment node. Next was to deploy to the first monitor server (same as the deployment node in my case) using ceph-deploy (the 1.5.34 version). It will then fail, stating the conflict. At that point I had to run "dpkg -r ceph-common" on the server I was installing to, then run "apt-get -f install -y" to reinstall the ceph-common dependency. I was then able to rerun ceph-deploy without an error (though that is not necessary, since the apt-get command finishes the install). The last step was to restart the Ceph monitor.

 

So the moral of the story: don't use the Ubuntu repositories to install ceph-deploy.

 

Recap of the process (in commands):

wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -

echo deb http://download.ceph.com/debian-hammer/ trusty main | sudo tee /etc/apt/sources.list.d/ceph.list

apt-get update && apt-get install python-pip ceph-deploy -y

pip install ceph-deploy==1.5.34

su cephuser

ceph-deploy install --release hammer monhost1

exit

dpkg -r ceph-common

apt-get -f install -y

su cephuser

ceph-deploy install --release hammer monhost1
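The recap stops before the monitor restart. A sketch of that last step follows, assuming Ubuntu 14.04 with upstart-managed Ceph daemons and a monitor id equal to the host's short name; both are assumptions about the setup, and `restart_mon` is a made-up helper name. Run it as root on the monitor host, or prefix the commands with sudo.

```shell
# restart_mon [MON_ID]: restart one monitor via upstart, then show the
# installed ceph version and the cluster status.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_mon() {
    mon_id="${1:-$(hostname -s)}"
    for cmd in "restart ceph-mon id=$mon_id" "ceph --version" "ceph -s"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Unquoted on purpose so the string splits into command + args.
            $cmd || return 1
        fi
    done
}

# Example:
#   DRY_RUN=1 restart_mon monhost1
```

Note that `ceph --version` reports the installed binaries, so it only confirms that the packages were upgraded; `ceph -s` is there to check that the monitor rejoined the quorum after the restart.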

 

On the rest of the ceph nodes, you only need to do this:

From the ceph-deploy server, run the install:

ceph-deploy install --release hammer monhost2

 

 

On the monhost2 server, after the install failure, run:

dpkg -r ceph-common

apt-get -f install -y

 

Then go back to the ceph-deploy server and run the install again:

ceph-deploy install --release hammer monhost2
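With more than a couple of nodes, the install / fix / reinstall cycle can be wrapped in a function and driven from the ceph-deploy server. This is only a sketch: `upgrade_node` is a hypothetical name, and it assumes the deploy user can ssh to each node and use sudo there without a password (which ceph-deploy needs anyway).

```shell
# CEPH_DEPLOY and SSH can be overridden, e.g. to test the flow without
# touching a real cluster.
CEPH_DEPLOY="${CEPH_DEPLOY:-ceph-deploy}"
SSH="${SSH:-ssh}"

# upgrade_node HOST: try the install; if it fails, clear the ceph-common
# conflict on HOST and try again.
upgrade_node() {
    host="$1"
    # First attempt. If it succeeds there is nothing to fix.
    if $CEPH_DEPLOY install --release hammer "$host"; then
        return 0
    fi
    # Install failed: remove the conflicting package on the node, then
    # let apt repair the broken dependencies (this finishes the install).
    $SSH "$host" "sudo dpkg -r ceph-common && sudo apt-get -f install -y" || return 1
    # Optional second pass, just to see it run clean.
    $CEPH_DEPLOY install --release hammer "$host"
}

# Example, one node at a time so the cluster stays up:
#   upgrade_node monhost2
```

I would still do the nodes one by one and restart the daemons on each before moving on, rather than looping over everything at once.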

 

Technically, you don't have to run the install again via ceph-deploy, but I did for the warm-blanket feeling of having it go through without an error. Also, I installed ceph-deploy version 1.5.34 because the newer versions (1.5.35 - 1.5.37) were not adding the repository on the target servers before the install. They were also failing in a different way with each version, so after many tests with different versions, I landed on 1.5.34 for the upgrade. I may even upgrade to Jewel using that version as well.

I still have to decide how I want to handle our older cluster, which is running Firefly 0.80.7. Ceph supports the upgrade to Hammer, but only from 0.80.10 or higher. On top of that, the servers are running Ubuntu 12.04, so I cannot upgrade to Jewel if I stick with 12.04. This means a distro upgrade to 14.04 and then 16.04, then a Ceph upgrade. That is quite a process, with a lot of downtime and maintenance involved to get it done. The cluster is very stable, so I don't see the trigger being pulled on this anytime soon.
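Since the Hammer upgrade needs 0.80.10 or higher, a quick pre-flight check on each node can save some grief. A minimal sketch, assuming GNU `sort -V` is available (it is on Ubuntu); `version_ge` is a hypothetical helper name. A plain string comparison would get this wrong, because "0.80.7" sorts after "0.80.10" as text.

```shell
# version_ge A B: succeed if dotted version A is >= B.
# Sorts the two versions and checks that B comes out first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (assumes `ceph --version` prints "ceph version X.Y.Z (...)"):
#   installed=$(ceph --version | awk '{print $3}')
#   if version_ge "$installed" 0.80.10; then
#       echo "OK to upgrade to Hammer"
#   else
#       echo "Upgrade to Firefly 0.80.10 or later first"
#   fi
```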