Friday, March 13, 2009

NetApp filer error "retrying all the portals again, sincethe portal list got exhausted"

One of our ESX hosts, which is connected to the filer through software iSCSI, was logging the following iSCSI-related messages to the vmkernel log:

Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 cpu0:1076)iSCSI: session 0x35203f90 connect timed out at 38997177
Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 cpu0:1076)<5>iSCSI: session 0x35203f90 iSCSI: session 0x35203f90 retrying all the portals again, sincethe portal list got exhausted
Mar 11 05:10:17 esxhost vmkernel: 4:12:19:31.755 pu0:1076)iSCSI: session 0x35203f90 to iqn.1992-08.com.netapp:sn.118062462 waiting 60 seconds before next login attempt
Mar 11 05:11:17 esxhost vmkernel: 4:12:20:31.756 cpu0:1076)iSCSI: bus 0 target 0 trying to establish session 0x35203f90 to portal 0, address 10.100.200.137 port 3260 group 2000
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu1:1074)iSCSI: login phase for session 0x35203f90 (rx 1076, tx 1075) timed out at 39004677, timeout was set for 9004677
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu0:1076)iSCSI: session 0x35203f90 connect timed out at 39004677
Mar 11 05:11:32 esxhost vmkernel: 4:12:20:46.756 cpu0:1076)<5>iSCSI: session 0x35203f90 iSCSI: session 0x35203f90 retrying all the portals again, sincethe portal list got exhausted
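
To see quickly which portal the host keeps failing against, the messages above can be scanned with a small script. This is only a rough sketch: the regular expressions are written against the wording of the messages quoted here, and the function name `timed_out_portals` is my own, not part of any VMware or NetApp tool.

```python
import re
from collections import Counter

# Patterns follow the wording of the vmkernel messages quoted above.
ATTEMPT = re.compile(
    r"trying to establish session (0x[0-9a-f]+) to portal \d+, "
    r"address (\S+) port (\d+)"
)
TIMEOUT = re.compile(r"session (0x[0-9a-f]+) connect timed out")


def timed_out_portals(log_text):
    """Count connect timeouts per (address, port) portal.

    A timeout is charged to the portal that the same session most
    recently tried; timeouts seen before any attempt are ignored.
    """
    events = []
    for m in ATTEMPT.finditer(log_text):
        portal = (m.group(2), int(m.group(3)))
        events.append((m.start(), "attempt", m.group(1), portal))
    for m in TIMEOUT.finditer(log_text):
        events.append((m.start(), "timeout", m.group(1), None))
    events.sort(key=lambda e: e[0])  # restore the order in the log

    last_portal = {}      # session id -> portal most recently tried
    failures = Counter()  # portal -> number of connect timeouts
    for _, kind, session, portal in events:
        if kind == "attempt":
            last_portal[session] = portal
        elif session in last_portal:
            failures[last_portal[session]] += 1
    return failures
```

Calling `timed_out_portals(open(path).read())` on a saved copy of the log (the path is whatever file you exported it to) returns a counter keyed by portal address and port, so the portals the host cannot reach stand out.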

According to NetApp, this is expected behavior from the array: by default, the filer offers iSCSI access through every portal group to any host that is configured to use it. The sequence looks like this.

The ESX host's iSCSI initiator logs into the filer's target IP address. Using dynamic discovery, the ESX host issues a “SendTargets” command to the filer. The filer responds with all available target portals, which by default include every network interface that has iSCSI enabled. The ESX host takes this list of portals and attempts to establish a connection on each of them. (I don't know whether it picks from the available target portals at random or has some logic for deciding which portal to try first.) When the ESX host attempts to log into a target portal to which it has no network route, the login times out, and that is what generates the errors in the vmkernel log.

To fix this from the filer side, we added an access list for the iSCSI initiator node name of the ESX host. By default, an iSCSI initiator has access to all available portals. To limit the target portals that are reported back to the ESX host, we created an access list entry that states which iSCSI target portals are available to a given iSCSI initiator node name.
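
The effect of the access list on discovery can be modelled in a few lines. This is a simplified illustration of the behavior described above, not the filer's actual implementation; the function name `send_targets`, the interface names `e0a`/`e0b`, the second IP address and the initiator node name are all made up for the example.

```python
def send_targets(portals, accesslist, initiator):
    """Return the portals a filer would report to `initiator`.

    portals:    dict of interface name -> (ip, port) with iSCSI enabled
    accesslist: dict of initiator node name -> set of allowed interfaces
    An initiator with no access list entry is offered every portal,
    which is the default behavior.
    """
    allowed = accesslist.get(initiator)
    if allowed is None:
        return sorted(portals.values())
    return sorted(addr for iface, addr in portals.items() if iface in allowed)


# Hypothetical setup: the host can route to e0a but not to e0b.
portals = {
    "e0a": ("192.0.2.10", 3260),
    "e0b": ("10.100.200.137", 3260),
}
esx = "iqn.1998-01.com.vmware:esxhost-example"  # made-up node name

before = send_targets(portals, {}, esx)             # both portals reported
after = send_targets(portals, {esx: {"e0a"}}, esx)  # only e0a reported
```

Without an entry the host is handed both portals and keeps retrying the one it cannot reach; with the entry in place, only the reachable portal comes back from discovery. On a Data ONTAP 7-mode filer the entry itself is, as far as I know, created with `iscsi interface accesslist add <initiator_node_name> <interface>` and checked with `iscsi interface accesslist show`; verify the exact syntax against the `iscsi` man page for your release.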

2 comments:

goof ! said...

How did you create the ACL on the NetApp? I am having a similar issue where it says a session has been made from the ESXi host, but I cannot see the LUNs.

goof ! said...

How did you create the ACL? I am having a similar issue with our NetApp filer where the session is made from the ESXi host, but I cannot see the LUNs.