ESX VMware “File not found” when starting a virtual machine

Experience is something you never have when you need it. Restarting our websites’ SQL server resulted in downtime when the server would not start up. The startup progress bar on the task bar in the VMware Infrastructure Client reached 95%, and then the startup was aborted with a dialog box reading “File not found” and an OK button. I had never used the VMware Infrastructure Client for anything other than starting and stopping machines and taking snapshots; anything else was the VM guy’s role. However, we had a server down and customers getting a bad experience, so some clicking around and a few Google searches got it sorted out.

Seeking a solution

Using Google, I established that there are more comprehensive log files in the same directory as the virtual machine. Reading these logs, I found that the failure occurred while loading one of the virtual disks, because the disk file could not be found (virtual disks are just files on the ESX server's file system).

The log file showed that the file that could not be found had a different path from the virtual machine's other disk files, all of which loaded successfully. This was the clue to what was wrong.

 
Jun 10 10:11:10.988: vmx| DISKLIB-VMFS : "/vmfs/volumes/4908a5f1-67541468-21fa-0016357ea69b/websqlserver01VM/websqlserver01VM-000009-delta.vmdk" : open successful (23) size = 32225215488, hd = 0. Type 8
Jun 10 10:11:10.990: vmx| DISKLIB-VMFS : "/vmfs/volumes/4908a5f1-67541468-21fa-0016357ea69b/websqlserver01VM/websqlserver01VM-000006-delta.vmdk" : open successful (23) size = 32225215488, hd = 0. Type 8
Jun 10 10:11:10.992: vmx| DISKLIB-VMFS : "/vmfs/volumes/4908a5f1-67541468-21fa-0016357ea69b/websqlserver01VM/websqlserver01VM-000003-delta.vmdk" : open successful (23) size = 32225215488, hd = 0. Type 8
Jun 10 10:11:10.994: vmx| DISKLIB-LINK  : "/vmfs/volumes/4901e93d-93a8aeed-12b7-0016357ea69b/websqlserver01VM/websqlserver01VM.vmdk" : failed to open (The system cannot find the file specified).  
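If the log is long, picking out the disk lines by eye is tedious. The Python sketch below scans the text of a downloaded vmware.log for DISKLIB lines and lists the disk files that failed to open. The line pattern is an assumption inferred from the excerpt above, not taken from any VMware documentation, and other ESX versions may word these lines differently; the function names are hypothetical.

```python
import re

# Matches log lines like the excerpt above, for example:
#   ... vmx| DISKLIB-VMFS : "/vmfs/.../disk-000009-delta.vmdk" : open successful ...
#   ... vmx| DISKLIB-LINK  : "/vmfs/.../disk.vmdk" : failed to open (...)
# Assumption: the pattern is inferred from that excerpt only.
DISK_LINE = re.compile(
    r'DISKLIB-\w+\s*:\s*"(?P<path>[^"]+)"\s*:\s*'
    r'(?P<status>open successful|failed to open)'
)


def disk_open_results(log_text):
    """Return (path, opened_ok) tuples for every disk line, in log order."""
    results = []
    for line in log_text.splitlines():
        match = DISK_LINE.search(line)
        if match:
            opened_ok = match.group("status") == "open successful"
            results.append((match.group("path"), opened_ok))
    return results


def failed_disks(log_text):
    """Return the paths of virtual disk files that could not be opened."""
    return [path for path, ok in disk_open_results(log_text) if not ok]
```

For example, reading the downloaded log into a string and passing it to failed_disks returns just the paths reported as “failed to open”.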

The Storage section of the Infrastructure Client shows the location of each data store. It showed that all of the virtual machine's files were in one data store, except the file with issues, which was in another. When I opened that data store in the Infrastructure Client, I found that the folder holding the missing file had been renamed to websqlserver01VM_old, which no longer matched the path in the log file.
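The giveaway is the data store identifier in each path, the long hex string after /vmfs/volumes/. A minimal Python sketch of that comparison, assuming every disk path has the /vmfs/volumes/<datastore>/<folder>/<file> layout seen in the log and that the data store used by most of the disks is the machine's home data store; the helper names are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def split_vmfs_path(path):
    """Split '/vmfs/volumes/<datastore>/<folder>/<file>' into its three parts.

    Assumes the layout seen in the log excerpt; returns None for anything else.
    """
    parts = PurePosixPath(path).parts  # ('/', 'vmfs', 'volumes', ds, folder, file)
    if len(parts) != 6 or parts[1:3] != ("vmfs", "volumes"):
        return None
    return parts[3], parts[4], parts[5]


def paths_off_home_datastore(paths):
    """Return the paths whose data store differs from the one most paths use."""
    split = [(path, split_vmfs_path(path)) for path in paths]
    counts = Counter(parts[0] for _, parts in split if parts)
    if not counts:
        return []
    home, _ = counts.most_common(1)[0]  # assumed home data store
    return [path for path, parts in split if parts and parts[0] != home]
```

Given the four paths from the log excerpt, it returns only the websqlserver01VM.vmdk path on the 4901e93d-93a8aeed-12b7-0016357ea69b data store.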

Solution

Renaming the folder back to its original name allowed the machine to boot. At some point while the machine was running, this folder must have been renamed; the problem only came to light on reboot, when the file was needed again for startup.

At least, having been thrown in at the deep end, I now have a much better understanding of how the ESX server runs, and by the end of the issue I knew the Infrastructure Client well, having scoured it for clues.

Quick Clues for places to look

Find out where the machine is located: select the machine in the tree view in VMware Infrastructure Client, and its data store is listed under Resources on the Summary tab.

Virtual Machine's data store locations

You can double-click the data store to open it and see its files. Navigate to the subfolder of the machine you are interested in; there should be .log files in there. Get the latest one and right-click Download to copy it to your local machine for examination in Notepad.

While you are there, have a look at the .vmx, .vmxf and .vmsd files as well, and check the paths in them for clues.
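The .vmx file is plain text made up of key = "value" lines, and each virtual disk is referenced by an entry such as scsi0:0.fileName. As far as I know, a bare file name is relative to the virtual machine's own folder, while a full /vmfs/volumes/... path points somewhere else, which is exactly the kind of entry worth checking. A rough Python sketch, assuming that simple line format; the functions are hypothetical, and this is not a complete .vmx parser.

```python
import re

# Assumed .vmx line format:  key = "value"   (lines starting with '#' are comments)
VMX_LINE = re.compile(r'^\s*([^=#\s][^=]*?)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse .vmx text into a dict of key -> value (a later duplicate key wins)."""
    entries = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def disk_file_entries(text):
    """Return {key: path} for every '<device>.fileName' entry naming a .vmdk."""
    return {
        key: value
        for key, value in parse_vmx(text).items()
        if key.lower().endswith(".filename") and value.lower().endswith(".vmdk")
    }


def absolute_disk_paths(text):
    """Disk entries given as absolute paths, i.e. not in the VM's own folder."""
    return {k: v for k, v in disk_file_entries(text).items() if v.startswith("/")}
```

Any entry returned by absolute_disk_paths is worth comparing against what actually exists in the data store browser.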

You can find and confirm the data store paths on the ESX server itself: click the ESX server of interest in the Hosts and Clusters tree on the left-hand side, select the Configuration tab and click the data store of interest. Hover over the Location and the path is shown as a tooltip.

On our server the virtual hard disks are broken up into 2GB files, and if you have snapshots there may be many more files than you see in this example. Each file name has a dash and a number showing which disk it belongs to and its position in the sequence.
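A small Python sketch for sorting a folder listing into those groups, assuming names of the form <base>-<six-digit number>-<kind>.vmdk as in the log above. As far as I know, on ESX the six-digit numbers belong to snapshot delta disks and -flat holds the data of a base disk, and I believe -s001-style suffixes are used for 2GB split disks; treat the pattern as an assumption and adjust it to what you actually see in your data store. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming: <base>[-<NNNNNN>][-<kind>].vmdk, e.g. websqlserver01VM-000009-delta.vmdk
VMDK_NAME = re.compile(
    r"^(?P<base>.+?)(?:-(?P<num>\d{6}))?(?:-(?P<kind>delta|flat|s\d{3}))?\.vmdk$"
)


def describe_vmdk(name):
    """Split a .vmdk file name into base name, sequence number and kind."""
    match = VMDK_NAME.match(name)
    if not match:
        return None
    num = match.group("num")
    return {
        "base": match.group("base"),
        "sequence": int(num) if num else 0,  # 0 = no six-digit number (assumed base disk)
        "kind": match.group("kind") or "descriptor",  # plain .vmdk assumed to be a descriptor
    }


def group_by_base(names):
    """Group .vmdk names by base name, each group sorted by sequence number."""
    groups = defaultdict(list)
    for name in names:
        info = describe_vmdk(name)
        if info:
            groups[info["base"]].append((info["sequence"], info["kind"], name))
    return {base: sorted(items) for base, items in groups.items()}
```

Names that do not match the assumed pattern are simply skipped, so anything left out of the groups is worth a closer look.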

I hope this helps someone else who may be losing orders by the hour. If you can, though, get your ESX administrator to help you.
