VMware ESXi - Hardware monitoring - vSphere host disconnected
Recently a client of mine experienced a sudden disconnect of one of their hosts from vSphere Client. A quick check confirmed that the VMs were still running happily along; however, the host was "tagged" as disconnected. Forcing a reconnect did not work, so it was time to get my hands dirty and start digging.
I connected to the host in question with SSH and changed directory to the location of the syslog files (you do store your logs on a SAN disk and not on a local drive, right?). After some quick searches I noticed this entry in the hostd.log file:
2012-12-24T19:01:01.309Z [FFFC2B90 info 'VmkVprobSource'] VmkVprobSource::Post event: (vim.event.EventEx) {
-->    dynamicType = <unset>,
-->    key = 1348822881,
-->    chainId = 761820257,
-->    createdTime = "1970-01-01T00:00:00Z",
-->    userName = "",
-->    datacenter = (vim.event.DatacenterEventArgument) null,
-->    computeResource = (vim.event.ComputeResourceEventArgument) null,
-->    host = (vim.event.HostEventArgument) {
-->       dynamicType = <unset>,
-->       name = "nameofesxhost.local",
-->       host = 'vim.HostSystem:ha-host',
-->    },
-->    vm = (vim.event.VmEventArgument) null,
-->    ds = (vim.event.DatastoreEventArgument) null,
-->    net = (vim.event.NetworkEventArgument) null,
-->    dvs = (vim.event.DvsEventArgument) null,
-->    fullFormattedMessage = <unset>,
-->    changeTag = <unset>,
-->    eventTypeId = "esx.problem.visorfs.inodetable.full",
-->    severity = <unset>,
-->    message = <unset>,
-->    arguments = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          dynamicType = <unset>,
-->          key = "1",
-->          value = "tmp:/auto-backup.17245931/etc/ssh/ssh_host_rsa_key",
-->       },
-->       (vmodl.KeyAnyValue) {
-->          dynamicType = <unset>,
-->          key = "2",
-->          value = "tar",
-->       }
-->    ],
-->    objectId = "ha-eventmgr",
-->    objectType = "vim.HostSystem",
-->    objectName = <unset>,
-->    fault = (vmodl.MethodFault) null,
--> }
2012-12-24T19:01:01.326Z [FFFC2B90 info 'ha-eventmgr'] Event 46 : The root filesystem's file table is full. As a result, the file tmp:/auto-backup.17245931/etc/ssh/ssh_host_rsa_key could not be created by the application 'tar'.
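If you would rather not scroll through a dump like the one above, a small helper can pull out just the event type IDs. This is only a rough sketch: the function name list_event_types is made up for illustration, and it assumes hostd.log writes each field as label = "value" on a single line.

```shell
#!/bin/sh
# Sketch only: summarise which eventTypeId values appear in a hostd log.
# list_event_types is a hypothetical helper name, not a VMware tool.
# Assumes each field is logged on one line as: eventTypeId = "some.id",

list_event_types() {
    # $1 = path to the log file to scan
    # Print the quoted value of every eventTypeId line, then count duplicates.
    sed -n 's/.*eventTypeId = "\([^"]*\)".*/\1/p' "$1" | sort | uniq -c
}
```

On the host you would call it as list_event_types /var/log/hostd.log, where the path is an assumption; point it at wherever your logs actually live.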
The log entry made me suspect that a volume was running out of space. A few quick commands later (vdf -h and stat -f /), it was clear that sufficient space and inodes were available. Google to the rescue: a search for "esx.problem.visorfs.inodetable.full" turned up an article from VMware.
Apparently the hardware monitoring service (sfcbd) was flooding the /var/run/sfcb directory with files (more than 5000) and causing the problem. You could simply run /etc/init.d/sfcbd-watchdog stop, delete the files in /var/run/sfcb, and then run /etc/init.d/sfcbd-watchdog start. Instead, I logged in on the console of the host in question and restarted the management agents. That did the trick.
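For the manual route, the stop / delete / start sequence could be wrapped up roughly like this. Treat it as a sketch, not a tested fix: the function names, the DRY_RUN switch and the 5000 threshold are my own assumptions for illustration, and only the directory and the sfcbd-watchdog commands come from the steps described above. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: check a directory for a flood of files and optionally clean it.
# count_files, sfcb_cleanup and DRY_RUN are hypothetical names for illustration.

count_files() {
    # Count the entries directly inside directory $1 (prints 0 if it is missing).
    ls -A "$1" 2>/dev/null | wc -l | tr -d ' '
}

sfcb_cleanup() {
    # $1 = directory to check, $2 = maximum number of entries considered normal
    n=$(count_files "$1")
    if [ "$n" -le "$2" ]; then
        echo "ok: $n files in $1"
        return 0
    fi
    echo "flooded: $n files in $1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: describe the steps without touching anything.
        echo "would run: /etc/init.d/sfcbd-watchdog stop"
        echo "would delete: $n files in $1"
        echo "would run: /etc/init.d/sfcbd-watchdog start"
        return 0
    fi
    /etc/init.d/sfcbd-watchdog stop
    # Loop instead of 'rm dir/*' so a huge file count cannot overflow the argument list.
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            rm -f "$f"
        fi
    done
    /etc/init.d/sfcbd-watchdog start
}
```

Running sfcb_cleanup /var/run/sfcb 5000 prints the plan; setting DRY_RUN=0 first makes it actually stop the watchdog, delete the files and start it again.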