Wednesday, September 30, 2009

A few questions about ballooning and configuration maximums

One of the guys asked me the following questions. I just wanted to share them with you all.

1) What is ballooning in VMware?

Ballooning

When the ESX host’s machine memory is scarce or when a VM hits a limit, the kernel needs to reclaim memory and prefers ballooning over swapping. The balloon driver is installed inside the guest OS as part of the VMware Tools installation and is also known as the vmmemctl driver.

When the ESX kernel wants to reclaim memory, it instructs the balloon driver to inflate. The balloon driver then requests memory from the guest OS. When there is enough memory available, the guest OS will return memory from its “free” list. When there isn’t enough memory, the guest OS will have to use its own memory management techniques to decide which particular pages to reclaim and if necessary page them out to its swap- or page-file.

In the background, the ESX kernel frees up the machine memory page that backs each guest physical memory page allocated to the balloon driver. When enough memory has been reclaimed, the balloon driver will deflate after some time, returning physical memory pages to the guest OS again.

This process will also decrease the Host Memory Usage parameter.

Ballooning is only effective if the guest has available space in its swap- or page-file, because used memory pages need to be swapped out in order to allocate the page to the balloon driver. Ballooning can lead to high guest memory swapping. This is guest OS swapping inside the VM and is not to be confused with ESX host swapping, which I will discuss later on.

To view balloon activity we use the esxtop utility again from the COS (see below). From the COS, issue the command “esxtop” and then press “m” to display the memory statistics page. Now press “f” and then “i” to show the vmmemctl (ballooning) columns.

clip_image002

On the top we see the “MEMCTL” counter which shows us the overall ballooning activity. The “curr” and “target” values are the accumulated values of the “MCTLSZ” and “MCTLTGT” as described below. We have to look for the “MCTL” columns to view ballooning activity on a per VM basis:

● “MCTL?”: indicates if the balloon driver is active “Y” or not “N”

● “MCTLSZ”: the amount (in MB) of guest physical memory that is actually reclaimed by the balloon driver

● “MCTLTGT”: the amount (in MB) of guest physical memory that is going to be reclaimed (targeted memory). If this counter is greater than “MCTLSZ”, the balloon driver inflates, causing more memory to be reclaimed. If “MCTLTGT” is less than “MCTLSZ”, then the balloon will deflate. This deflating process runs slowly unless the guest requests memory.

● “MCTLMAX”: the maximum amount of guest physical memory that the balloon driver can reclaim. Default is 65% of assigned memory.

You can limit the maximum balloon size by specifying the “sched.mem.maxmemctl” parameter in the .vmx file of the VM. This value must be in MB.
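To make the relationship between the counters concrete, here is a minimal Python sketch of how I read them. The function names are my own (hypothetical, not part of esxtop or any VMware API), and the 65% default is taken from the description above.

```python
def default_mctlmax_mb(assigned_mb, fraction=0.65):
    """Default balloon ceiling in MB: 65% of the VM's assigned memory,
    as stated above. The 'fraction' parameter is only for illustration."""
    return assigned_mb * fraction


def balloon_action(mctlsz_mb, mctltgt_mb):
    """Interpret the MCTLSZ and MCTLTGT columns for one VM."""
    if mctltgt_mb > mctlsz_mb:
        return "inflate"   # target above current size: more memory gets reclaimed
    if mctltgt_mb < mctlsz_mb:
        return "deflate"   # target below current size: memory goes back to the guest
    return "steady"        # target reached


if __name__ == "__main__":
    # A VM with 4096 MB assigned, 512 MB currently ballooned, 1024 MB targeted.
    print(default_mctlmax_mb(4096))
    print(balloon_action(512, 1024))
```

So for a VM where MCTLTGT is higher than MCTLSZ you should expect MCTLSZ to climb on the next esxtop refreshes, up to at most MCTLMAX.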

2) What is the physical memory limit of a VMware ESX Server host?

3) What is the physical CPU limit of a VMware ESX Server host?

Hardware Processors

Table 2-1 displays the number of physical processors supported per ESX Server host.

Table 2-1. Supported Processor Configurations

                                       Maximum Sockets   Maximum Cores   Maximum Threads
Single core   With hyperthreading      16                16              32
Single core   Without hyperthreading   16                16              16
Dual core     With hyperthreading      8                 16              32
Dual core     Without hyperthreading   16                32              32
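As a quick sanity check on Table 2-1, the thread counts follow from sockets × cores per socket × threads per core. The sketch below assumes two hardware threads per core when hyperthreading is on (my assumption, the table does not spell it out), and the function name is made up for illustration.

```python
def max_threads(sockets, cores_per_socket, hyperthreading):
    """Hardware threads = sockets * cores per socket * threads per core.
    Assumes hyperthreading doubles the threads per core."""
    threads_per_core = 2 if hyperthreading else 1
    return sockets * cores_per_socket * threads_per_core


# (label, sockets, cores per socket, hyperthreading, cores in table, threads in table)
TABLE_2_1 = [
    ("Single core, with hyperthreading",    16, 1, True,  16, 32),
    ("Single core, without hyperthreading", 16, 1, False, 16, 16),
    ("Dual core, with hyperthreading",       8, 2, True,  16, 32),
    ("Dual core, without hyperthreading",   16, 2, False, 32, 32),
]

if __name__ == "__main__":
    for label, sockets, per_socket, ht, cores, threads in TABLE_2_1:
        # Each row of the table should agree with the simple product.
        assert sockets * per_socket == cores
        assert max_threads(sockets, per_socket, ht) == threads
        print(label, "->", threads, "threads")
```

Read this way, the table says the host tops out at 32 hardware threads, which is why dual core with hyperthreading is capped at 8 sockets.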

Virtual Processors

A total of 128 virtual processors in all virtual machines per ESX Server host

Memory

64GB of RAM per ESX Server system

4) What is the maximum number of VMs per ESX Server host? 128

Error during the configuration of the host: Failed to update disk partition information

I was getting the following error when trying to create a VMFS datastore on an ESX host.

Some background:

1. The LUN was created on a FAS3050 filer.

2. The LUN was mapped using software iSCSI.

3. The LUN was visible under the storage adapter.

When I tried creating the datastore, I got this error:

clip_image002

I found the following document on the VMware site which tells how to fix the error message. In a nutshell, I did this:

1. Run fdisk -l. This will give you a list of all of your current partitions. This is important because they are numbered. If you are using SCSI you should see that all partitions start with /dev/sda# where # is a number from 1 to whatever. Remember this list of numbers, as you are going to be adding at least one more and will have to refer to the new partition by its number.

2. Run fdisk /dev/sda. This will allow you to create a partition on the first drive. If you have more than one SCSI drive (usually the case with more than one RAID container) then you will have to type the letter value for the device you wish to create the partition on (sdb, sdc, and so on).

3. You are now in the fdisk program. If you get confused type "m" for menu. This will list all of your options. There are a lot of them. You will be ignoring most of them.

4. Type "n". This will create a new partition. It will ask you for the starting cylinder. Unless you have a very good reason hit "enter" for default. The program will now offer you a second option that says ending cylinder. If you press enter you will select the rest of the space. In most cases this is what you want.

5. Once you have selected the start and end cylinder you should get a success message. Now you must set the partition type, or its ID. This is option "t" on the menu.

6. Type "t". It will ask you for partition number. This is where that first fdisk is useful. You need to know what the new partition number is. It will be one more than the last number on fdisk. Type this number in.

7. You will now be prompted for the hex code for the partition type. You can also type "L" for a list of codes. The code you want is "fb". So type "fb" in the space. This will return that the partition has been changed to fb (unknown). That is what you want.

8. Now that you have configured everything you want to save it. To do so choose the "w" option to write the table to disk and exit.

9. Because the drive is being used by the console OS you will probably get an error that says "WARNING: Re-reading the partition table failed with error 16: device or resource busy." This is normal. You will need to reboot the ESX host.

10. Once you have rebooted you can now format the partition as VMFS. DO NOT do this from the GUI. You must once again log into the console or remote in through PuTTY.

11. Once you have su'd to root you must type in

"vmkfstools -C vmfs3 /vmfs/devices/disks/vmhba0:0:0:#"

Where # is the number of the new partition. You should now get a "successfully created new volume" message. If you get an error you probably chose the wrong partition. Do an fdisk -l and choose the number with the "unknown" partition type. Note: If you have more than one SCSI disk or more than one container the first 0 may need to be a 1 as well.
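The "one more than the last number" rule from step 6 can be sketched in a few lines of Python. This is only an illustration: the sample fdisk -l listing is made up, the function name is hypothetical, and real fdisk output may differ in layout between versions.

```python
import re

# Illustrative fdisk -l output (invented for this example, not from a real host).
SAMPLE_FDISK_OUTPUT = """\
Disk /dev/sda: 146.8 GB, 146815733760 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         650     5116702+  83  Linux
/dev/sda3             651         715      522112+  82  Linux swap
"""


def next_partition_number(fdisk_output, disk="/dev/sda"):
    """Return the number the next new partition on 'disk' would get:
    one more than the highest partition number currently listed."""
    pattern = re.compile(r"^" + re.escape(disk) + r"(\d+)\b", re.MULTILINE)
    numbers = [int(n) for n in pattern.findall(fdisk_output)]
    return max(numbers, default=0) + 1


if __name__ == "__main__":
    print(next_partition_number(SAMPLE_FDISK_OUTPUT))
```

That number is what you type at the "t" prompt in step 6, and it is also the # at the end of the device path in step 11.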

Problem which I faced:

When I ran the last step, "vmkfstools -C vmfs3 /vmfs/devices/disks/vmhba0:0:0:#", I got this error:

“Error: Invalid handle”

I was wondering what I was doing wrong. One discussion on VMTN suggested the following:

Run the command mkfs.ext3. So I ran the command and got the following output:

clip_image004

It basically sat at some inode and never moved forward. The host was in a hung state, so I decided to check the host using iLO and found a PSOD.

clip_image006

I rebooted and tried the above steps again but got the same PSOD.

Finally I logged into the filer and found that the volume presented by the SAN admin was a replicated volume. This volume was a SnapMirror copy of the protected site, used for SRM.

Crap ☺ ……

Not sure why he had mapped an SRM-replicated volume to the recovery site.

Client Installation Wizard Error 00004E25 Could not generate a unique name for this computer

When our service desk was trying to push an image, they were getting this error message:

Client Installation Wizard Error 00004E25 Could not generate a unique name for this computer

The server AXBMMGT02 could not generate a unique name for this computer

Contact your network administrator for further assistance. Administrators should check the remote installation services settings to ensure the automatic computer name option is set correctly. This error may also indicate that the computer policy used to create a unique computer name has reached the maximum number of combinations.

  1. We rebooted the Ghost server.
  2. Tried a Domain Admin ID as suggested in the following post.
  3. Finally I followed this post and it got resolved.

When I opened the OU I found tons of unique names like this.

Finally I deleted them and asked the support engineer to start again. That worked.

Monday, September 21, 2009

Storage VMotion from NFS share to iSCSI LUN?

I had been asked to explore whether we can do Storage VMotion between an NFS share and an iSCSI LUN.

Guess what, we can, and vice versa too. So Storage VMotion is supported between NFS and iSCSI LUNs.

Implementing VCB solution and Performing scripted backup

Guide to VMware standalone Converter

Saturday, September 19, 2009

VMotion keeps failing at 78% with Error bad0007

VMotion was failing at 78% with error message "A general system error occurred: Failed waiting for data. Error bad0007."

This was a brand new cluster and everything was standard as per the other clusters: ESX version, host physical config, etc.

VMotion was failing for all VMs within the cluster at 78%. I tried the following:

  • Tried to VMotion other VMs within this cluster
  • All of the VMs failed at 78%
  • Implemented KB 1003577 by installing Update Manager
  • Created a baseline and updated two of the ESX hosts involved, so both ESX servers are now compliant with the latest patches
  • Created a new cluster called "test Cluster" and added two ESX hosts to it
  • Cold migrated a VM over and tried a VMotion. It was still failing at the same stage (78%)
  • Checked/changed the following variables according to KB 1003577

Migrate.PageInTimeoutResetOnProgress: Set the value to 1.

Migrate.PageInProgress: Set the value to 30, if you get an error after configuring the Migrate.PageInTimeoutResetOnProgress variable.

Toggle the Migrate.enabled setting from 1 to 0, click OK, then flip the value back to 1, click OK.

These variables were both identical to what was recommended in the KB.

Tried a VMotion; this failed at 78%.

  • Tried VMotion of servers on different datastores; this failed too.
  • Swapped the VMotion network to eliminate a problem with the type of card being used; this did not make a difference. It still failed at 78%.
  • Reduced the amount of RAM used by the VM from 4GB to 2GB; it still kept failing.
  • Installed VMware Tools on this VM.

I created a brand new test VM and it worked fine. So it looked like there was something fishy about the VM itself. We started to look at the logs and found the following.

The vmware.log for the virtual machine (VM) being migrated has entries similar to:

---------------------------------------------------------------------------------------------------------------------------------------

May 26 12:06:08.162: vmx Migrate_SetFailure: Now in new log file.

May 26 12:06:08.167: vmx Migrate_SetFailure: Failed to write checkpoint data (offset 33558528, size 16384): Limit exceeded

May 26 12:06:08.186: vmx Msg_Post: Error

May 26 12:06:08.186: vmx [vob.vmotion.write.outofbounds] VMotion [c0a8644e:1243364928717250] failed due to out of bounds write: offset 33558528 or size 16384 is greater than expected

May 26 12:06:08.186: vmx [msg.checkpoint.migration.openfail] Failed to write checkpoint data (offset 33558528, size 16384): Limit exceeded.

May 26 12:06:08.187: vmx ----------------------------------------

May 26 12:06:08.190: vmx MigrateWrite: failed: Limit exceeded

--------------------------------------------------------------------------------------------------------------------------------------

When the video RAM setting was commented out, the following error appeared in the log file for the VM:

---------------------------------------------------------------------------------------------------------------------------------------

Aug 18 08:35:38.194: vmx MKS REMOTE Loading VNC Configuration from VM config file

Aug 18 08:35:38.196: vmx DVGA: Full screen VGA will not be available.

Aug 18 08:35:38.196: vmx Msg_Post: Warning

Aug 18 08:35:38.196: vmx [msg.svgaUI.badLimits] The size of video RAM is currently limited to 4194304 bytes, which is insufficient

for the configured maximum resolution of 3840x1200 at 16 bits per pixel.

Aug 18 08:35:38.196: vmx

Aug 18 08:35:38.196: vmx The maximum resolution is therefore being limited to 1180x885 at 16 bits per pixel.

Aug 18 08:35:38.196: vmx

Aug 18 08:35:38.196: vmx ----------------------------------------

Aug 18 08:35:38.199: vmx SVGA: Truncated max res to VRAM size: 4194304 bytes VRAM, 1180x885

--------------------------------------------------------------------------------------------------------------------------------------

I then checked the .vmx for the problematic server and found that

svga.vramSize = 4194304

whereas a newly created VM has the following setting in its .vmx file:

svga.vramSize = 31457280
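To put those two numbers in perspective, here is a small Python sketch that converts them to MB and compares them with the resolution mentioned in the log. The width × height × bytes-per-pixel estimate is my own simplification of how much memory one frame needs, not VMware's actual formula, and the function names are hypothetical.

```python
MB = 1024 * 1024


def vram_mb(vram_bytes):
    """Convert an svga.vramSize value (bytes) to MB."""
    return vram_bytes / MB


def framebuffer_bytes(width, height, bits_per_pixel):
    """Rough size of one frame: width * height * bytes per pixel.
    This is a simplification, used only for comparison."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    problem_vm = 4194304    # svga.vramSize of the failing VM
    fresh_vm = 31457280     # svga.vramSize of a newly created VM
    needed = framebuffer_bytes(3840, 1200, 16)  # resolution from the log

    print(vram_mb(problem_vm), "MB on the problem VM")
    print(vram_mb(fresh_vm), "MB on the new VM")
    print(needed, "bytes for one 3840x1200 frame at 16 bpp")
    print("fits in problem VM:", needed <= problem_vm)
    print("fits in new VM:", needed <= fresh_vm)
```

So the problem VM had 4 MB of video RAM against 30 MB on a fresh VM, and by this rough estimate 4 MB is not enough for the 3840x1200 resolution it was configured for, which matches the warning in the log.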

After investigating with the application owner, we found that these are desktop modeling workstations that use software from HP. HP had recommended this setting for their software to be functional.

One of the KB articles explains that “Video Ram (VRAM) assigned to the virtual machine is 30MB or less”.

We checked with HP support and they asked me to contact VMware. When we contacted VMware, they told us that it is a limitation of ESX 3.5 and has been taken care of in ESX 4.0 U1.

........................................................................Crap …................................................................................

How to present a 6TB LUN to Windows Server

I attached a 6TB iSCSI LUN to my VCB backup server, and after attaching it this is what I found under diskmgmt.msc.

I then checked whether the volume had been split by the SAN admin, but that was not the case. It was all part of the same volume, so why did Windows Server 2003 show it as two volumes of 2TB and 4TB?

Then I realized that with MBR I can have only 2TB per LUN, and to get all 6TB of the LUN I need to convert the disk to GPT (GUID Partition Table).

How to do it? Just right-click on the disk and you will get the option “Convert to GPT disk”.

What I have read and understood is

MBR is the standard partitioning scheme that's been used on hard disks since the PC first came out. It supports 4 primary partitions per hard drive, and a maximum partition size of 2TB.

GPT disks are new, and are readable only by Windows Server 2003 SP1, Windows Vista (all versions), and Windows XP x64 Edition. The GPT disk itself can support a volume up to 2^64 blocks in length. (For 512-byte blocks, this is 9.44 ZB - zettabytes. 1 ZB is 1 billion terabytes). It can also support theoretically unlimited partitions.

Windows restricts these limits further to 256 TB for a single partition (NTFS limit), and 128 partitions.
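The 2TB and 9.44 ZB figures both come from the same arithmetic: number of addressable blocks times block size. A minimal Python sketch, assuming 512-byte sectors and the 32-bit sector fields of an MBR partition entry (the function name is mine):

```python
SECTOR_BYTES = 512  # assumed sector size


def max_addressable_bytes(lba_bits, sector_bytes=SECTOR_BYTES):
    """Largest size reachable with 'lba_bits' bits of block address."""
    return (2 ** lba_bits) * sector_bytes


if __name__ == "__main__":
    mbr_limit = max_addressable_bytes(32)   # MBR: 32-bit sector counts
    gpt_limit = max_addressable_bytes(64)   # GPT: 64-bit block addresses
    lun_bytes = 6 * 1024 ** 4               # the 6TB LUN from this post

    print("MBR limit in TB (binary):", mbr_limit / 1024 ** 4)
    print("GPT limit in ZB (decimal):", round(gpt_limit / 10 ** 21, 2))
    print("6TB LUN fits in MBR:", lun_bytes <= mbr_limit)
    print("6TB LUN fits in GPT:", lun_bytes <= gpt_limit)
```

This is why the 6TB LUN showed up as 2TB plus a 4TB remainder on an MBR disk: anything past the first 2TB cannot be addressed until the disk is converted to GPT.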

Only Itanium systems running Windows Server 2003 and Windows Vista systems with an EFI BIOS can boot from a GPT disk. The other operating systems mentioned earlier can use GPT disks as data disks but not boot disks.

To find out more about GPT read this article from MS.

Tuesday, September 15, 2009

How to reset the password for root on Ubuntu

Yes, I have started learning Ubuntu. I installed Ubuntu 8.1 desktop, and during installation it asked me to create a user name but never prompted me for a root password. So once I logged in I was not sure what the root password would be.

When I did su, it asked me for a password.
So now I was in a fix about how to change the password for root.

To change/reset the password:
:~$ sudo sh

It will ask for the currently logged-in user's password.

Then you will be at the # prompt.

Type passwd here and it will change the password for root.