Nutanix journey – part 2

The journey continues: the boxes have now been delivered and racked, ready for installation on Monday.

I wrote in part 1 that I would discuss the reasons we chose Nutanix.

An important factor was cost: even though Nutanix is expensive upfront, we believe there are big savings going forward. OPEX savings are more important to us than CAPEX costs.

Eliminating SAN networking, blade chassis configuration, patching, aggregates, volumes, LUNs, and everything involved in maintaining them is a big saving. HA and DR setups are also very complex in traditional environments. Using those resources for automation and cloud adoption instead would be much better.

A metro-cluster, active/active setup between sites also looks much easier to achieve in a Nutanix solution than in a traditional one.

We have not evaluated the platform in as much detail as David Quinney did in this article:



Never mind the placement of the blocks, it's only temporary… ;-)

But I have followed the progress of Nutanix since late 2012 and have been more and more amazed at what they are doing. I have compared them to other vendors and read miles of blogs, websites, tests, tweets, and every other piece of writing I have come across. For me the decision was easy; my problem was convincing management. That was not an easy task, and we have had numerous reference calls with other customers and met with resellers and Nutanix specialists from around the globe. Still, there is a nagging feeling that it's too good to be true. I hope that feeling proves to be wrong in the long run.

Today, installation of our first Nutanix cluster actually began. A reseller came in to handle the initial setup, which requires some special handling. It proved not to be as easy as I had expected.

Apparently the installation requires IPv6 connectivity between the hosts, the Controller VMs, and the installation host. A dedicated switch had been set up and VLAN-tagged to accommodate this, but things didn't work: connectivity could not be established as required. Eventually we realized the switch itself was faulty; after swapping it for another switch, things went much more smoothly.
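The discovery step depends on IPv6 link-local addressing (the fe80::/10 range), which only works when all parties sit on the same layer-2 segment. As a minimal sketch (my own helper, not part of any Nutanix tooling), here is how you could sanity-check that the addresses you see during discovery really are link-local:

```python
import ipaddress

def is_ipv6_link_local(addr: str) -> bool:
    """Return True if addr is an IPv6 link-local address (fe80::/10).

    Zone identifiers such as '%eth0' are stripped before parsing.
    """
    try:
        ip = ipaddress.ip_address(addr.split("%")[0])
    except ValueError:
        return False
    return ip.version == 6 and ip.is_link_local

# Hypothetical addresses, as they might appear during node discovery:
print(is_ipv6_link_local("fe80::21a:2bff:fe3c:4d5e%eth0"))  # True
print(is_ipv6_link_local("192.168.1.10"))                   # False
```

If an address fails this check, the node and the installation host are probably not on the same VLAN, which matches exactly the symptom our faulty switch produced.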

For this POC we had decided not to set up a dedicated 10 GbE switch; we connected everything to one already in use. This worked against us today. While pulling one of the 10 GbE DAC connections, several other ports in that switch decided to go offline, causing trouble for several other systems. It seems that switch is also faulty or has a bug. Installation was halted for some hours while we figured out whether we dared to continue. A dedicated switch is probably a good idea for the future…

After all connectivity issues were sorted out, installation and cluster configuration went really smoothly, and when we stopped tonight we had a five-node cluster up and running. Tomorrow we need to adjust the VLANs on all nodes and configure the rest of the VMware environment. Then a health test and a performance test will be run to verify that the cluster delivers the performance we expect.

After that we will run our own tests to verify that HA works as we expect: pull a network cable, pull a disk, kill a Controller VM, pull a node, and so on. Just so that we know what happens and understand the consequences should something go wrong. This will also give us a better understanding of node, block, and rack awareness, and of how RF2 and RF3 work.
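My working mental model of replication factor and block awareness (a conceptual toy, not the actual Nutanix placement algorithm): with RF2 every write is kept on two nodes and the cluster survives one simultaneous failure, with RF3 three copies and two failures, and block awareness tries to spread the copies across different physical blocks so losing a whole block only costs one copy. A sketch with hypothetical node and block names:

```python
def place_replicas(nodes, rf):
    """Pick rf nodes for a write, preferring distinct blocks.

    nodes: list of (node_name, block_name) tuples.
    Conceptual illustration only, not Nutanix internals.
    """
    if rf > len(nodes):
        raise ValueError("not enough nodes for requested RF")
    chosen, used_blocks = [], set()
    # First pass: at most one replica per block, so losing an
    # entire block takes out only a single copy of the data.
    for node, block in nodes:
        if len(chosen) < rf and block not in used_blocks:
            chosen.append(node)
            used_blocks.add(block)
    # Second pass: if there are fewer blocks than rf, fall back
    # to placing extra copies on already-used blocks.
    for node, block in nodes:
        if len(chosen) < rf and node not in chosen:
            chosen.append(node)
    return chosen

cluster = [("node1", "blockA"), ("node2", "blockA"),
           ("node3", "blockB"), ("node4", "blockB"),
           ("node5", "blockC")]

print(place_replicas(cluster, 2))  # ['node1', 'node3'] - two different blocks
print("RF2 tolerates", 2 - 1, "simultaneous failure(s)")
print("RF3 tolerates", 3 - 1, "simultaneous failure(s)")
```

This is also what the pull-a-node and pull-a-disk tests should confirm in practice: with RF2, any single component can fail without data loss.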

Since we have reached maximum cooling capacity in our data center, we can't power on all Nutanix nodes before we have shut down the servers they will replace. We will start with a small cluster, slowly migrate the VMs over one by one, and then shut down the hosts in the old VMware cluster one by one.




This entry was posted in nutanix.
