Friday, April 30, 2010

Working with Blade 8: Configure VLAN for ESX host on Blade

With blades, networking is a very important aspect if you are planning to roll out ESX on them, because the network design determines how you define everything on the VC Flex-10 module. There are many ways to define it, but I like this layout:

a) Two LOMs go to the Service Console.

b) Two LOMs go to the VM network.

c) Two LOMs are dedicated to vMotion, with the traffic confined within the VC domain.

d) Two LOMs go to NFS/iSCSI.

For the VM network, where you have to define multiple networks, create every VLAN that you want to trunk to the VM network vSwitch under the shared uplink set.

vMotion network using an internal link

1. For vMotion we create an Ethernet network but do not assign any uplink to it.

2. All the other networks should look like this

3. Once this Ethernet network is created and the blade has this profile assigned, the NIC can be used for vMotion. Create a VMkernel port for vMotion and assign an IP address of your choice, but use the same IP addressing scheme for all the vMotion VMkernel ports or else vMotion will not work.
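To sanity-check the "same IP address scheme" rule before creating the VMkernel ports, something like the sketch below could be used. It is only an illustration: the function name, the sample addresses and the /24 prefix length are my own assumptions, not values from this setup.

```python
import ipaddress

def same_subnet(vmk_addresses, prefix_len=24):
    """Return True if every vMotion VMkernel IP falls in one network.

    prefix_len is an assumed default; use the mask of your vMotion network.
    An empty list returns False because there is nothing to compare.
    """
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in vmk_addresses
    }
    return len(networks) == 1

# Hypothetical addresses planned for the vMotion VMkernel ports.
print(same_subnet(["10.0.0.11", "10.0.0.12", "10.0.0.13"]))
print(same_subnet(["10.0.0.11", "10.0.1.12"]))
```

The first call should report that the addresses share one network and the second that they do not, which is the case where vMotion between those hosts would be expected to fail.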

VM network using multiple networks

For the VM network we create multiple VLANs on a single vSwitch. This means the VLANs must be in trunk mode. We also do the tagging at the ESX level.

Click on the tab as shown above and a dialog will pop up.

Here you can select whichever VLANs you want the VM network vSwitch to see.
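The tagging at the ESX level means one port group per VLAN on the VM network vSwitch. As a rough sketch, the helper below only builds the esxcfg-vswitch command lines (it does not run them). The vSwitch name, port group names and VLAN IDs are made-up examples, so treat them as assumptions.

```python
def portgroup_commands(vswitch, vlans):
    """Build esxcfg-vswitch commands that add one tagged port group per VLAN.

    vlans maps a port group name to its VLAN ID. The names and IDs passed
    in are the caller's; nothing here comes from the original environment.
    """
    commands = []
    for name, vlan_id in vlans.items():
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"invalid VLAN ID: {vlan_id}")
        # -A adds the port group, -v with -p sets its VLAN ID.
        commands.append(f'esxcfg-vswitch -A "{name}" {vswitch}')
        commands.append(f'esxcfg-vswitch -v {vlan_id} -p "{name}" {vswitch}')
    return commands

# Hypothetical VLANs trunked through the shared uplink set.
for cmd in portgroup_commands("vSwitch1", {"VM_VLAN100": 100, "VM_VLAN200": 200}):
    print(cmd)
```

The printed lines can be reviewed and then pasted into the service console of each host, which keeps the port group names identical across the cluster.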

SRM Error: Failed to recover datastore

We had set up SRM with 4.0 U1, and the VMs were on ESX 3.5 U4. We set up replication across locations and then decided to simulate DR using the “Test Run” option. It mounted the LUN on the ESX host fine, but when it tried to recover the VMs it failed with the error “Error: Failed to recover datastore: ”.

We then ran the test again with a console session to the NetApp filer open, and this is what we found:

Filer 01> Fri Apr 23 05:04:36 EST [XYZNAP005: wafl.volume.clone.fractional_rsrv.changed:info]: Fractional reservation for    clone 'testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP' was changed to 100 percent because guarantee is set to'file' or 'none'.

Fri Apr 23 05:04:37 EST [XYZNAP005: wafl.volume.clone.created:info]: Volume clone  testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP of volume S_XYZESX013_14_15_16PP was created successfully.Creation of clone volume 'testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP' has completed.

Fri Apr 23 05:04:37 EST [XYZNAP005: lun.newLocation.offline:warning]: LUN /vol/testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP/lun12 has been taken offline to prevent map conflicts after a copy or move operation.

Fri Apr 23 05:04:37 EST [XYZNAP005: lun.newLocation.offline:warning]: LUN /vol/testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP/lun11 has been taken offline to prevent map conflicts after a copy or move operation.

Fri Apr 23 05:04:37 EST [XYZNAP005: lun.newLocation.offline:warning]: LUN /vol/testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP/lun9 has been taken offline to prevent map conflicts after a copy or move operation.

Fri Apr 23 05:04:37 EST [XYZNAP005: lun.newLocation.offline:warning]: LUN /vol/testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP/lun10 has been taken offline to prevent map conflicts after a copy or move operation.

Fri Apr 23 05:04:37 EST [XYZNAP005: wafl.inode.fill.disable:info]: fill reservation disabled for inode 33411686 (vol testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP).

Fri Apr 23 05:04:37 EST [XYZNAP005: wafl.inode.overwrite.disable:info]: overwrite reservation disabled for inode 33411686 (vol testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP).

Fri Apr 23 05:04:38 EST [XYZNAP005: lun.map:info]: LUN /vol/testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP/lun12 was mapped to initiator group srm_esx_host=0

Fri Apr 23 05:04:38 EST [XYZNAP005: app.log.info:info]: AMSVCS001PP: Disaster Recovery SAN Adapter Storage Replication Adapter 1.4: (2) Test-Failover-start Event: Disaster Recovery SAN Adapter executed Test-Failover-start operation with errors from OS major version = 5 ,minor version = 2 ,package = Service Pack 2 and build = 3790

Fri Apr 23 05:04:42 EST [XYZNAP005: iscsi.notice:notice]: ISCSI: New session from initiator iqn.2000-04.com.qlogic:qle4062c.lfc0852h55321.2 at IP addr 10.X.X.X

Fri Apr 23 05:04:48 EST [XYZNAP005: wafl.vol.full:notice]: file system on volume testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP is full

Fri Apr 23 05:04:48 EST [XYZNAP005: scsitarget.write.failureNoSpace:error]: Write to LUN /vol/testfailoverClone_nss_v10745371_S_XYZESX013_14_15_16PP/lun12 failed due to lack of space.

NetApp looked at the errors and told me that the operation was timing out during the retry process and that it did not really look like a space issue, because the aggregate (“aggr”) on which this LUN was mounted had enough space.

I decided to test it myself and created two LUNs of 100 GB and 90 GB. These LUNs held a few VMs and had around 75% free space. I ran SRM in both test and DR mode, and both worked great. This gave me enough reason to believe that the failure was caused by space and not by some bug.

I called NetApp and showed them exactly what I was doing. At this point they ran the following command:

Filer> df -r testfailoverClone_nss_v10745371_S_XYZESX001PP

(This is the cloned volume that SRM was trying to mount.) The output showed that the fractional reserve space was full, which is why the cloned LUNs could not be mounted.

During these tests I learned that if the protected LUN is completely full and you run an SRM test against it (which uses the FlexClone mechanism), you have to make sure that the volume at the recovery site is double the size, because SRM tries to mount the cloned LUN on the same volume.
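The "double the size" rule of thumb can be written down as a small calculation. This is only my simplified model, not a NetApp formula: it assumes the fractional reservation is forced to 100 percent (as the filer log above reports for the clone) and it ignores snapshots and any other space consumers on the volume.

```python
def recovery_volume_size_gb(lun_sizes_gb, fractional_reserve_pct=100):
    """Estimate the recovery-site volume size needed for an SRM test run.

    Simplified assumption: the volume must hold the LUNs plus a reserve
    equal to fractional_reserve_pct of their total size.
    """
    lun_total = sum(lun_sizes_gb)
    reserve = lun_total * fractional_reserve_pct / 100
    return lun_total + reserve

# The two test LUNs from this post: 100 GB and 90 GB.
print(recovery_volume_size_gb([100, 90]))
```

With a 100 percent reserve the estimate is simply twice the total LUN size, which matches what I saw during the tests.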

SRM with NFS: Important considerations when creating an NFS volume

While implementing SRM with NFS, we have to follow certain guidelines when creating the NFS volumes on the filer, or else we won't be able to see those volumes while configuring the array.

a. Exports must be in the /etc/exports file. Manual exports made with the CLI without the -p option will not be discovered by SRM.

b. Exports must use values in the RW (read/write) security field in the /etc/exports file. Exports using the default setting of rw to all hosts will not be discovered by SRM.

Example of an /etc/exports line that would be discovered by SRM:

/vol/srm5 -rw=192.168.2.0/24,root=192.168.2.0/24

Example of an /etc/exports line that would not be discovered by SRM:

/vol/srm5 -rw,anon=0

So if you run the following on the filer, the output should look like this:

Filer101> exportfs -q /vol/nfstest
/vol/nfstest -sec=sys,(ruleid=564),rw=192.168.2.0,root=192.168.2.0

Here 192.168.2.0 is the VMkernel IP created on the protected ESX host.

This is from NetApp TR-3671 (page 14).
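Rule b can be checked quickly against the lines in /etc/exports. The sketch below is a rough heuristic I wrote from the two example lines above; it is not how the SRA actually parses exports, and the function name is my own.

```python
def srm_discoverable(export_line):
    """Return True if an /etc/exports line names explicit rw hosts.

    Heuristic only: looks for an rw=<hosts> option and treats a bare
    "rw" (read/write to all hosts) as not discoverable.
    """
    parts = export_line.split(None, 1)
    if len(parts) < 2:
        return False  # path with no options at all
    options = parts[1].lstrip("-").split(",")
    for opt in options:
        key, sep, value = opt.partition("=")
        if key.strip() == "rw" and sep and value.strip():
            return True
    return False

# The two example lines from this post.
print(srm_discoverable("/vol/srm5 -rw=192.168.2.0/24,root=192.168.2.0/24"))
print(srm_discoverable("/vol/srm5 -rw,anon=0"))
```

The first line passes the check and the second does not, which lines up with the two examples quoted from the NetApp document.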

SRM: Replicated devices could not be matched with data stores

While I was configuring the SRM SRA, I got the following error when the arrays were rescanned:

clip_image001

Someone suggested the following:

For ESX 3.5 hosts you have to enable “LVM.EnableResignature” under the advanced settings of the host configuration, whereas ESX 4 performs this function automatically.

This did not fix my problem, so I then followed KB 1016862, whose resolution says:

“Virtual machine components can be on an array which is replicated to another array.

However, VMware does not support virtual machine components on multiple arrays which replicate to a single array as the VMX configurations do not match in terms of UUID of the datastores.

Ensure that virtual machine components are not on multiple arrays that replicate to a single array.”

Well, in our case it was different. I then started looking at the configuration again and found that the datastores holding the VMs were mounted using a different export IP than the one I had added under the array manager.

In other words, the NFS IP was different for the ESX host and the array manager. I changed the IP and was then able to see the datastore.

Thursday, April 29, 2010

Existing IDE disk is not supported at the moment - Migrating VM from 2.X to 4.X



We had an ESX 2.x host running in our environment with a couple of VMs. It was practically impossible to bring this host into our VC, so we disconnected the host and copied the VMs over to a 4.0 host. We could not add them to the inventory because they have older virtual hardware. We tried creating a new VM and attaching the VMDK, which failed.

We tried Converter Enterprise and then Converter Standalone to export the VMDK, but that did not work. Finally we decided to clone the disk:

vmkfstools -i source.vmdk dest.vmdk

We then created a new VM and tried to attach the cloned VMDK, and we started getting the error message "Adding existing IDE disk is not supported at the moment. IDE disks cannot be hot added or there are no free available IDE Controller slots.".

http://vmfaq.com/kb_upload/Image/vSphere_adding_existing_IDE_disk_error.png

http://vmfaq.com/kb_upload/Image/vmware_disk_of_unknown_size.png

Solution

1. Unregister the VM from Virtual Center.

2. Remove all scsi0:* lines from the VM's config file using a text editor.

3. Open the disk descriptor file in a text editor and replace "legacyESX" with "lsilogic". The descriptor file is the vmdk file that is only a few hundred bytes in size.

4. Repeat step 3 for all the virtual disk files of this VM.

http://vmfaq.com/kb_upload/Image/vmware_legacyESX.png


5. Register the VM again

6. Add disk(s) to the VM

If you do not edit the descriptor file and add the disk to a vSphere VM (with virtual hw v7) it will come up as a working IDE disk.
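If there are many disks, steps 2 to 4 can be scripted instead of done by hand. The sketch below works on the text of the files only; reading and writing the actual files (and taking a backup copy first) is left out, and the function names and sample contents are my own, not taken from a real VM.

```python
def strip_scsi0_lines(vmx_text):
    """Step 2: drop every scsi0:* entry from the text of the VM config file."""
    kept = [
        line for line in vmx_text.splitlines()
        if not line.strip().lower().startswith("scsi0:")
    ]
    return "\n".join(kept) + "\n"

def fix_descriptor(descriptor_text):
    """Step 3: replace the legacyESX adapter type with lsilogic."""
    return descriptor_text.replace("legacyESX", "lsilogic")

# Made-up sample contents, just to show the effect.
sample_vmx = 'memsize = "512"\nscsi0:0.present = "true"\nscsi0:0.fileName = "disk.vmdk"\n'
sample_descriptor = 'ddb.adapterType = "legacyESX"\n'

print(strip_scsi0_lines(sample_vmx), end="")
print(fix_descriptor(sample_descriptor), end="")
```

Note that only the scsi0:* device lines are removed, as in step 2; other lines in the config file are left alone.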

Source


Saturday, April 24, 2010

ILO Error 9005/9009/9008 with HP SIM agent on ESX4.0 U1

One of the readers of this blog dropped me an email with the following message:

I've installed the HP SIM (v8.3.1) onto my ESX v4.0 update 1 host and configured it using a config file. The config file has the appropriate information to send to my SIM server. Every now and then my host will generate 3 alerts

1: Event Name: (SNMP) Remote Insight/ Integrated LightsOut Self Test Error (9005)

URL: https://xxx:2381/

Event originator: xxx

Event Severity: Critical

Event received: 22-Apr-2010, 18:11:52

Event description: Remote Insight/ Integrated Lights-Out Self Test Error. The Remote Insight/ Integrated Lights-Out firmware has detected a Remote Insight self test error.

2: Event Name: (SNMP) Remote Insight mouse cable disconnected (9009)

URL: https://xxx:2381/

Event originator: xxx

Event Severity: Major

Event received: 22-Apr-2010, 18:11:53

Event description: Mouse Cable Disconnected. The Remote Insight mouse cable has been disconnected.

3: Event Name: (SNMP) Remote Insight keyboard cable disconnected (9008)

URL: https://xxx:2381/

Event originator: xxx

Event Severity: Major

Event received: 22-Apr-2010, 18:11:53

Event description: Keyboard Cable Disconnected. The Remote Insight keyboard cable has been disconnected.

I've tried searching HP and VMware, but as you have said, information is very sparse. The closest I've come up with is a mention about the HPSIM certificate in the iLO2 (http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1271902147498+28353475&threadId=1375014 )

My suggestions to him were:

1. Try reinstalling the SIM agent, followed by a host reboot.

2. Engage HP as well, since SIM is a free product from them.

3. Check the iLO (I am not sure which DL version you are using) or the Insight web GUI for an option where you can disable such events.

But here is the actual solution, found by Dave on the following blog. Thanks, Dave, for finding the solution.

To disable the alerts, add the line "exclude cmasm2d" to the file /opt/hp/hp-snmp-agents/cma.conf and then restart the agents with: /etc/init.d/hp-snmp-agents restart
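With many hosts, it helps to make this edit repeatable so that the line is not added twice. The sketch below works on the text of cma.conf only; the function name and the sample file content are my own assumptions, and the agents still have to be restarted afterwards as described above.

```python
def add_exclude(conf_text, agent="cmasm2d"):
    """Append "exclude <agent>" to the cma.conf text unless it is already there."""
    wanted = f"exclude {agent}"
    for line in conf_text.splitlines():
        # Compare on whitespace-split tokens so extra spaces do not matter.
        if line.split() == wanted.split():
            return conf_text
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + wanted + "\n"

# Placeholder content standing in for an existing cma.conf.
original = "# existing settings\n"
updated = add_exclude(original)
print(updated, end="")
print(add_exclude(updated) == updated)
```

Running it a second time leaves the text unchanged, so the same script can be pushed to every host without worrying about duplicates.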

Monday, April 5, 2010

How to set up Linked Mode for Virtual Center 4.0

Today I did a VC upgrade, and here is the blog post for it. During the upgrade, VC did not ask me about setting it up in Linked Mode, which becomes very necessary when you have multiple VCs across the environment.

A few things you should know:

1. The Linked Mode feature is not available if you hold a license for the “Foundation” or “Essentials” edition. It requires the Standard edition of the VC license.

2. It uses ADAM (Active Directory Application Mode) to replicate its configuration between multiple VCs.

3. ADAM stores :

* Connection information (IP addresses and ports)

* Certificates and thumbprints

* License Information

* User Roles

4. Permissions can be configured on a per-VC basis.

5. A single VC can manage 300 ESX hosts with 3000 VMs, but Linked Mode can manage 1000 ESX hosts with 10000 VMs.

More features are available here.

You have to re-run the setup for the VC server installation. It will walk you through the wizard.

It will then detect the VC instance and prompt you for the VC Linked Mode setting.

By default the Linked Mode configuration will be selected.

It will then prompt you for the name of the server to which you are connecting.

Make sure your selection is as shown below.

And that is about it.

Why Virtual machine swapfile location is grayed out?

I was trying to figure out why the virtual machine swapfile location is grayed out, as shown in the pictures below.

What I understood is that this setting is inherited from the cluster. If the cluster has been set to store the swapfile with the VM, the option will be grayed out as in the picture above.

So you need to choose the second option if you want to store the swapfile at a location of your choice. Such a configuration is used in cases where there is a requirement to keep the swapfile at a separate location.

Upgrading Virtual Center from 2.5 to 4.0

This was one of the best upgrades I have done, from Virtual Center 2.5 to Virtual Center 4.0 U1. I have done many upgrades where you have to go through the pain of redoing a lot of work after the upgrade, but the 4.0 U1 upgrade was the smoothest and cleanest one. Just pop in the CD and click Next, Next, period.

Here is how I performed it:

1. Make sure you have exported all the relevant information from VC so that you can handle any issue that comes up during the upgrade. Also make sure you have taken a fresh backup of the SQL Server database used by VC. This is what the VC ISO menu looks like and choose


2. It will then detect the VC instance that is already running.

3. Agree to the license agreement and proceed.

4. Fill in the correct information.

5. It may give a message like this; let your SQL admin know about it.

6. Type the user name with which it can authenticate. This can be your own user name, since it is used only for authentication.

7. Choose the option shown below and check the box that is mentioned, or else it will not let you move forward.

8. Here is the option that sets the account the service runs as. The best practice is to run it with the system account.

9. Leave this at the default.

10. Relax and sit tight till it is done.

11. It will do numerous things during that process.

12. Finally it will show a screen like this.

Congratulations, you have completed the VC upgrade successfully. In my environment of 100+ hosts, I came across only one host that was in a disconnected state. The upgrade also understands VC 2.5 licensing.