13. January 2011 16:54
General . networking . SAN
So about a year ago I had this bright idea to start using MS Clustering. We started with the printers. The print servers kept having issues, so I figured cluster services could deal with spool failures and bad drive interactions. So we brought it online and used it.
About two months after that, I decided it was very stable and took pride in the fact that our printer work orders had dropped to company record lows. My guys were able to spend more time doing other things! This was great. Anyhow, it was time to add DHCP to the cluster, so we did. It was probably the easiest thing we had to do, and again it worked perfectly!
We ran that for two months and then decided it was time to move our file shares off a single server to a more central, redundant location. Yes, "TO THE CLUSTER!" After a marathon Friday night, all the files had been moved and all was well with the world. Well, maybe not everything: we had some backup challenges with the cluster setup, but we resolved them quickly.
So two weeks ago I started getting requests from employees saying they were getting profile errors when they logged in. The profile errors turned out to be corruption occurring in the profile when they logged off. As more and more of them occurred, I started to take notice. Then last week users started getting file corruption messages. While we were able to fix most of the files, some would not come back. This was bad, of course. So I made sure the backups were working and planned to move the whole thing to a virtual file server in VMware that Friday.
Well, on Tuesday our SAN dropped offline. While completely unrelated, it took my eyes off this issue. Bottom line: our backup failed that night, and the next day the entire clustered file share failed. It had run for months without issue, and in two days it was dead. Much like my cat, who had died the week prior. Talk about a crappy month. So we quickly built the file server and restored the last good backup to the share. Many hours and a few unhappy users later, we had a new file server. Again, no medical data loss, but still data loss. It wouldn't have been so bad if it had not occurred in the same week as the SAN failure.
So for those of you thinking of using Windows cluster services for file sharing, think again, or at least test it for a long time. Maybe 2008 fixed some stuff, but 2003 is not baked for file sharing. I should also mention we were going to use DFS, but because we store profile data on the share and had heard of other companies having corruption issues with it, we decided against DFS.