Brent's IT Blog

Random thoughts by an IT GOAT


Nimble Storage Review

I get asked all the time by my peers what storage system I am using.  They expect me to have gone out and really done my homework.  Well, the truth is, it's very hard to do homework on storage systems.  There are many variables, and as time goes by, the variables only increase in scope.  So when we chose our new SAN system last year, I went around and did what I thought was due diligence.  I pulled my stats, thought out upcoming projects and usage projections, then compiled a needs list.  I then submitted this list to several of my finalists and asked for quotes based on it.  Of course, each one came back with different questions, since some systems could add thousands to the price depending on my needs.  This meant that comparing apples to apples was near impossible when it came to quotes, but factoring in everything, we chose to go with a Nimble SAN system, more specifically the CS240 model.

The Nimble SAN offers a self-tiering system which boosts your performance significantly based on what you are doing.  Performance is pretty tolerant of wrong configurations, but you need to know what your data will be doing and whether any of it needs priority.  The setup on the unit consisted of a sheet that we filled out with the desired IP addresses and initial configuration.  This was then carried out by an installation technician.  Once all is well (about 1 hour for racking, configuration and a system health check), they move on to the knowledge transfer, which takes about another hour.  The web GUI is pretty intuitive to an IT person and is pretty much your only control point (unless you have VMware and install the module in your vCenter server).

The web interface is quick and easy to use.  I found that Firefox works best with it; IE works, and Chrome allows viewing, but edits are a little wonky if they actually work at all (I was told they were working on that, though).  Since this is of little consequence to me, I keep the site up in Firefox with all my other monitoring sites.  The main page shows you the overall stats of the system, events, and a nice graph of space savings and disk usage.  The next section is the Manage section where you, ya, you guessed it, manage the device's volumes, connectors, performance policies, etc.  There is some chicken/egg setup there, but they do allow you to create the egg after the chicken where needed.  The next section is for monitoring; it basically shows you all sorts of graphs and stats based on what they want to show you.  There is some customization allowed, but it's not going to fit every need.  I can say it's one of the more comprehensive reporting tools I have seen built into a SAN console (so kudos there!).  The next area is the events log (a list of all events), followed by the administration area.  The administration area is where you configure the SAN itself: email alerts, upgrades, support, system defaults, etc.  The final area is the help menu, where you can find links to the help website and manuals.

Moving on...

So after one year of service we have discovered the pluses and minuses of the Nimble SAN.  The pluses amount to the following:

1. Support is on the ball, often before you even know there is a problem.

2. Upgrades are frequent and DO NOT interfere with the operation of the SAN, though as with anything, I suggest you do not upgrade during high loads (we did once for giggles and had no issues).

3. Configuration of new volumes is a cinch!  Once you set up performance policies and protection plans, configuring a volume takes about 1 minute.  Even without those, it only takes 2-3 minutes unless you have to sit there and think about it.

4. Graphs are great and useful; I send them off to other admins and ask if they are jealous (of my boogie)!

5. Compression on this thing is amazing!  You really have to match the volume configuration to the data type to get the best compression possible, but it's worth the effort!  We are currently getting 38% compression on the primary data and 1.62 times compression on the backups (yeah, not sure why one is a percent and one is times...).  Total space saved is 3.4TB.
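The percent-versus-times mix-up can be reconciled with a little arithmetic.  This is a sketch under assumptions: it treats "38% compression" as 38% of the original space saved, and "1.62 times" as original size divided by stored size.  Nimble's console may define these differently, and the function names are hypothetical, not part of any Nimble tool.

```python
def savings_to_ratio(saved_fraction: float) -> float:
    """Fraction of space saved (0.38 = 38%) -> compression ratio (original / stored)."""
    if not 0.0 <= saved_fraction < 1.0:
        raise ValueError("saved_fraction must be in [0, 1)")
    return 1.0 / (1.0 - saved_fraction)


def ratio_to_savings(ratio: float) -> float:
    """Compression ratio (original / stored) -> fraction of space saved."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Assumed definitions as described above; not Nimble's documented formulas.
    print(f"38% saved -> {savings_to_ratio(0.38):.2f}x")          # 1.61x
    print(f"1.62x     -> {ratio_to_savings(1.62):.1%} saved")     # 38.3% saved
```

If those assumptions hold, the two numbers are nearly the same thing: 38% saved works out to about 1.61x, and 1.62x works out to about 38.3% saved.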

6. Snapshots are easy and they just work.  Restores and cloning from them is a cinch.

7. Replication to a secondary unit works great and the management is exactly what you need.  They also have an online service now that lets you skip buying a secondary unit and still replicate.

8. Everything is thin provisioned on the SAN, so there's no need to decide.  When setting up VMware, just set the volumes to thin... and done.

I wanted to state the things I like about the SAN first because 1. it's easy to concentrate on the negatives and 2. the negatives are more specific to my environment (though you might run into the same thing), so here are my negatives:

1. iSCSI with VMware is 1 to 1, so 1 VMNIC to 1 SAN NIC.  This means that for every volume you connect through VMware, you use 4 connections per NIC, so with 4 NICs, you use 16 connections.  Thus when you start getting up there in volumes, you can hit that 1024 connection limit (I was told they are working on this, but for me it's currently a fatal flaw because of the VMware limitation).  The bigger problem I have with this is that Microsoft doesn't support running iSCSI across a VMware setup, so connecting directly from the guest OS is a gamble.  So far I have found it works fine with Windows 2003 and 2008, but not so much with 2012 (still working out the bugs there).  A little math: 4 NICs (16 connections per volume) means a max of 64 volumes, and 3 NICs (12 connections per volume) means a max of 85.  Now if you get 10GbE, then you only have 2 and 2 (4 connections per volume), so 256 volumes (I am assuming only 2 NICs on the hosts for redundancy).  Update: This "issue" is resolved in firmware version 2.0.6.  Install NCM on your VMware hosts and you can manually set the connection limit per volume (which should be equal to the total number of cards you have connected to your storage network).  So instead of the 16 we had, we now have 4 per volume using the NCM plugin, and thus a much higher limit of volumes allowed per host.  Check out my re-invented post for details.
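The connection math above is easy to rerun for your own NIC counts.  A minimal sketch, assuming (as described above) a 1024-connection limit per host and 4 connections per NIC per volume; the function names are made up for illustration and are not part of any Nimble or VMware tool.

```python
# Per-host connection limit referenced in the post.
CONNECTION_LIMIT = 1024
# Connections each host NIC opens per volume, per the post's 1 Gb setup.
CONNECTIONS_PER_NIC = 4


def connections_per_volume(host_nics: int, per_nic: int = CONNECTIONS_PER_NIC) -> int:
    """Connections a single volume consumes on one host."""
    return host_nics * per_nic


def max_volumes(per_volume: int, limit: int = CONNECTION_LIMIT) -> int:
    """Whole volumes that fit under the connection limit (rounded down)."""
    return limit // per_volume


if __name__ == "__main__":
    scenarios = {
        "4 x 1GbE NICs": connections_per_volume(4),                     # 16 per volume
        "3 x 1GbE NICs": connections_per_volume(3),                     # 12 per volume
        "2 x 10GbE, 2 SAN ports": connections_per_volume(2, per_nic=2), # 4 per volume
        "NCM, 4 per volume": 4,
    }
    for name, per_vol in scenarios.items():
        print(f"{name}: {per_vol} connections/volume -> max {max_volumes(per_vol)} volumes")
```

Integer division is used because a volume needs all of its connections; 1024 / 12 is about 85.3, so only 85 whole volumes fit.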

2. The next issue is more of a warning.  You need to make sure they size your system properly up front.  While I handed in my stats, I was given the whole "this system doesn't work like those systems, so don't worry about it" line.  Well, lo and behold, 6 months after going live and dealing with several service reboots that happened randomly but didn't affect service, I was told we needed larger SSDs for cache.  We were killing the CPU because the cache was too small.  It turns out they cache VMware volumes automatically, and since pretty much everything we have is on a VMware volume, we need a lot more cache.  I did tell them about this up front during our technical meetings, but I guess everyone was too new to configuring this box?  I have a problem with forking over more money right after I buy something.  My solution was to move many of the volumes to direct iSCSI, which, of course, I found out later is not supported by Microsoft.  Update: Set up a new volume type that copies their vSphere 5 type with caching turned off, then move the VMware volume to it.  With the NCM tool fixing the volume limitations, using direct iSCSI in the VMware environment is no longer necessary unless you want to use MS Clustering.

3. Creating golden images is a bit of a task, and it is poorly documented.  VMware doesn't handle them well, though they work.  Last but not least, if you replicate golden images, the replicated volumes all expand rather than staying linked to the golden image (so you lose the space on the replicated system).

4. DO NOT try to set up all your snapshots to take place at the same time; the SAN cannot handle it.  Make sure to stagger them every 10-15 minutes.  While this may not really be a negative, I find it to be a minor inconvenience when setting up volumes, as I have to take it into account.
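One way to plan the stagger is to hand out start times programmatically before building the protection schedules.  This is only a planning sketch and does not talk to the SAN: the volume names, the start time and the 15-minute default gap are hypothetical, and the resulting times would still be entered into the schedules by hand.

```python
from datetime import datetime, timedelta


def stagger_snapshots(volumes, first_start, gap_minutes=15):
    """Give each volume its own snapshot start time, gap_minutes apart (HH:MM)."""
    schedule = {}
    for i, volume in enumerate(volumes):
        start = first_start + timedelta(minutes=i * gap_minutes)
        schedule[volume] = start.strftime("%H:%M")  # time of day only; wraps past midnight
    return schedule


if __name__ == "__main__":
    # Hypothetical volume names; the date part is irrelevant, only the time of day is used.
    vols = ["vmfs-prod01", "vmfs-prod02", "sql-data", "exchange-db"]
    for vol, at in stagger_snapshots(vols, datetime(2000, 1, 1, 23, 30)).items():
        print(f"{vol}: {at}")
```

Starting at 23:30 with a 15-minute gap, the four volumes land at 23:30, 23:45, 00:00 and 00:15, so no two snapshots kick off together.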

A few notes:
Make sure to follow their guides closely.  We made the mistake of not setting up round robin on our iSCSI connections, and thus the latency on our datastores went out of sight as we brought more and more volumes online.  There is a nifty command you can run on each of your ESXi hosts to set round robin as the default; I recommend that procedure before adding any volumes.  We had a LeftHand unit from before that used last path, and on top of that we had been upgrading since VMware 3.5 (leftover defaults).
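After changing the default, it is worth confirming that every device actually picked up round robin.  On ESXi 5.x, `esxcli storage nmp device list` prints one block per device, including a "Path Selection Policy:" line.  The sketch below parses saved output from that command (for example, redirected to a text file) and lists devices not using `VMW_PSP_RR`.  It does not set the default itself; use the command from the vendor guide for that.  The output layout is assumed, so check it against your own host; the device IDs and display names in the sample are made up.

```python
# Hypothetical sample shaped like `esxcli storage nmp device list` output (assumed layout).
SAMPLE_OUTPUT = """\
naa.hypothetical0001
   Device Display Name: Hypothetical iSCSI Disk (naa.hypothetical0001)
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: (omitted)

naa.hypothetical0002
   Device Display Name: Hypothetical iSCSI Disk (naa.hypothetical0002)
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: (omitted)
"""


def devices_not_round_robin(nmp_output: str) -> list[str]:
    """Return the device IDs whose Path Selection Policy is not VMW_PSP_RR."""
    offenders = []
    device = None
    for line in nmp_output.splitlines():
        if not line.strip():
            continue  # blank lines separate device blocks
        if not line[0].isspace():
            device = line.strip()  # device IDs start at column 0; details are indented
        elif line.strip().startswith("Path Selection Policy:") and device:
            # "Path Selection Policy Device Config:" does not match this prefix, so it is skipped.
            policy = line.split(":", 1)[1].strip()
            if policy != "VMW_PSP_RR":
                offenders.append(device)
    return offenders


if __name__ == "__main__":
    for dev in devices_not_round_robin(SAMPLE_OUTPUT):
        print(f"{dev} is not using round robin")
```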

Guides also apply when dealing with Exchange and SQL databases; it's very important to pay attention.  Same goes for the networking setup guide.

So there you have it, my review.  I will clean up the spelling and grammar later when I am more sober :)