As a Consultant

VMware ESXi - Hardware monitoring - vSphere host disconnected

Recently a client of mine experienced a sudden disconnect of one of their hosts from the vSphere Client. A quick check confirmed that the VMs were still running happily along, but the host was "tagged" as disconnected. Forcing a reconnect did not work, so it was time to get my hands dirty and start digging.

I connected to the host in question over SSH and changed to the directory holding the syslog files (you do store your logs on a SAN disk and not on a local drive, right?). After some quick searches, I noticed this entry in the hostd.log file:

2012-12-24T19:01:01.309Z [FFFC2B90 info 'VmkVprobSource'] VmkVprobSource::Post event: (vim.event.EventEx) {
--> dynamicType = <unset>,
--> key = 1348822881,
--> chainId = 761820257,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> datacenter = (vim.event.DatacenterEventArgument) null,
--> computeResource = (vim.event.ComputeResourceEventArgument) null,
--> host = (vim.event.HostEventArgument) {
--> dynamicType = <unset>,
--> name = "nameofesxhost.local",
--> host = 'vim.HostSystem:ha-host',
--> },
--> vm = (vim.event.VmEventArgument) null,
--> ds = (vim.event.DatastoreEventArgument) null,
--> net = (vim.event.NetworkEventArgument) null,
--> dvs = (vim.event.DvsEventArgument) null,
--> fullFormattedMessage = <unset>,
--> changeTag = <unset>,
--> eventTypeId = "esx.problem.visorfs.inodetable.full",
--> severity = <unset>,
--> message = <unset>,
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> dynamicType = <unset>,
--> key = "1",
--> value = "tmp:/auto-backup.17245931/etc/ssh/ssh_host_rsa_key",
--> },
--> (vmodl.KeyAnyValue) {
--> dynamicType = <unset>,
--> key = "2",
--> value = "tar",
--> }
--> ],
--> objectId = "ha-eventmgr",
--> objectType = "vim.HostSystem",
--> objectName = <unset>,
--> fault = (vmodl.MethodFault) null,
--> }
2012-12-24T19:01:01.326Z [FFFC2B90 info 'ha-eventmgr'] Event 46 : The root filesystem's file table is full. As a result, the file tmp:/auto-backup.17245931/etc/ssh/ssh_host_rsa_key could not be created by the application 'tar'.
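If you need to repeat this search across several hosts or a large log, a small script can help. The sketch below assumes Python 3 and a copy of hostd.log that you can read; the function name find_inode_table_events is hypothetical. It simply reports every line containing either the event ID or the "file table is full" message shown above.

```python
# Event ID taken from the hostd.log entry above.
INODE_EVENT = "esx.problem.visorfs.inodetable.full"
# Part of the human-readable message hostd logs for the same event.
INODE_MESSAGE = "file table is full"


def find_inode_table_events(log_path):
    """Return (line_number, line) pairs that mention the inode-table-full problem."""
    hits = []
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if INODE_EVENT in line or INODE_MESSAGE in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, find_inode_table_events("hostd.log") returns a list of (line number, line) pairs, so an empty list means the problem was not logged in that file.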

This made me suspect that a volume was running out of space. However, a couple of quick commands (vdf -h and stat -f /) showed that sufficient space and inodes were available. Google to the rescue: when I searched for "esx.problem.visorfs.inodetable.full", the following VMware knowledge base article popped up:
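The stat -f / check boils down to the statvfs(2) system call, which can also be queried from a script. This sketch assumes Python 3 on a Unix-like system (os.statvfs is not available on Windows), and the helper name space_and_inodes is hypothetical. Keep in mind that in my case stat -f / reported plenty of free inodes even though hostd kept logging the inodetable.full event, so a clean result here does not rule the problem out.

```python
import os


def space_and_inodes(mount_point="/"):
    """Report space and inode counts for a mount point via statvfs(2).

    Roughly the same figures `stat -f /` prints. Uses the "free" counters
    (f_bfree, f_ffree), which is what root sees, rather than the
    unprivileged "available" counters.
    """
    st = os.statvfs(mount_point)
    return {
        # Block counts are in units of the fragment size f_frsize.
        "free_bytes": st.f_bfree * st.f_frsize,
        "total_bytes": st.f_blocks * st.f_frsize,
        "free_inodes": st.f_ffree,
        "total_inodes": st.f_files,
    }
```

On the host you would point it at / and at any other mount points vdf -h lists.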

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2037798

Apparently the hardware monitoring service (sfcbd) floods the /var/run/sfcb directory with files (more than 5000), which exhausts the file table and causes the problem. You could simply run /etc/init.d/sfcbd-watchdog stop, delete the files in /var/run/sfcb and then run /etc/init.d/sfcbd-watchdog start. Instead, I logged in on the console of the host in question and restarted the management agents. That did the trick.
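Before clearing anything, it can be worth confirming that /var/run/sfcb really is the culprit. The sketch below assumes Python 3.6 or later (for os.scandir as a context manager); the helper names count_entries and sfcb_flooded are hypothetical, and the 5000 threshold is just the figure mentioned above, not an official limit.

```python
import os

# Directory from the KB article; threshold is the ">5000" figure from this post (an assumption).
SFCB_RUN_DIR = "/var/run/sfcb"
FILE_THRESHOLD = 5000


def count_entries(directory=SFCB_RUN_DIR):
    """Count directory entries without building a full list of names."""
    with os.scandir(directory) as entries:
        return sum(1 for _ in entries)


def sfcb_flooded(directory=SFCB_RUN_DIR, threshold=FILE_THRESHOLD):
    """Return (count, flooded), where flooded is True when count exceeds threshold."""
    count = count_entries(directory)
    return count, count > threshold
```

It only counts entries. The actual cleanup should still go through stopping sfcbd-watchdog first (or a management agent restart), so the service is not writing new files while you delete the old ones.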
