Recovering a Deleted Volume on Sun/Oracle 2500 or 6000 Storage

Introduction

Often, no amount of fault tolerance can protect against human error. On the Sun/Oracle 2500 and 6000 series storage arrays, it is possible to recover an accidentally deleted volume.

In most cases, a storage volume consists of data blocks and metadata describing the volume. When the volume is deleted, it is usually only the metadata that gets erased; the data blocks remain untouched until they are overwritten or initialized at a later point, such as when a new volume is defined.

Starting with Common Array Manager (CAM) 6.5 for the 2500 and 6000 series storage arrays, a utility is provided to reconstruct the volume metadata without initializing the data blocks.

Background

In the following scenario, VMware ESXi 4.1 was the host, or client, accessing the only volume in the only vdisk in a RAID5 storage pool. The volume was formatted with VMFS and used as a virtual machine datastore containing both VMX configuration files and vmdk disk and snapshot files.

Due to mislabelling, the volume was deleted accidentally while VMware and several virtual machines were running. Surprisingly, VMware did not crash or error out. Most of the VMs hung, while several continued to respond to ping and to provide services where the applicable data was cached in memory. In the vSphere client, the datastore object was grayed out in the host's storage configuration.

NOTE: It is extremely important to keep the host running at this point so that critical details about the missing LUN can be obtained from it (see ‘capacity’ below).

Recovery

CAM 6.5 was installed on a Solaris 10 host. With a default installation, the SUNWsefms package is installed under /opt.
The command to recover the volume was:

/opt/SUNWsefms/bin/service \
-d Seeds2530 \
-c recover \
label=Primus-RAID5_Vol1_VMs \
manager=a \
vdiskID=0 \
drives=t85d01,t85d02,t85d03,t85d04 \
raidLevel=5 \
capacity=898388459520 \
segmentSize=524288 \
offset=0 \
readAhead=0

  • -d device: The name of the registered array on which to operate
  • -c command: The service operation to perform
  • label: The name of the recovered volume. It may be different from the original name
  • manager: The controller to “own” the recovered volume at the time of recovery. It may be different from the original owner
  • vdiskID: If the vdisk is still present, specify its name. Otherwise, use ‘0’ and include the “drives” option below
  • drives: A comma-separated list of disks that made up the deleted vdisk. The order must be the same as in the original vdisk for the volume to work (more on that below). Note that this option is not needed if the vdisk still exists, in which case the vdisk should be specified in the vdiskID option.
  • raidLevel: The RAID level of the original volume
  • capacity: The capacity in bytes of the original volume. This can be obtained from a storage support data printout from before the deletion or, hopefully, from the client. In this case, the ESXi CLI command esxcfg-info --storage revealed the LUN size (not to be confused with the partition size).
  • segmentSize: The segment size of the original volume (e.g. 512 KB) in bytes. This should match that of the storage profile that applied to the deleted volume.
  • offset: ‘0’ for the first volume in a vdisk. If the deleted volume was not the first volume, the original offset must be specified, and it can only be obtained from a storage support data printout generated prior to the deletion.
  • readAhead: Specifies whether the Read Ahead feature is enabled on the recovered volume. The value may be different from the deleted volume.
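
Because segmentSize and capacity are both given in bytes, it can help to double-check the conversions before running the command. The following is a minimal sketch in plain shell arithmetic using the values from this scenario. The variable names are illustrative only and are not part of CAM, and 64-bit shell arithmetic is assumed.

```shell
#!/bin/sh
# Convert a segment size in KB to the byte value expected by segmentSize.
segment_kb=512
segment_bytes=$((segment_kb * 1024))
echo "segmentSize=${segment_bytes}"

# Express the capacity in whole GiB as a rough sanity check against
# the LUN size reported by the host.
capacity_bytes=898388459520
capacity_gib=$((capacity_bytes / 1024 / 1024 / 1024))
echo "capacity=${capacity_bytes} (about ${capacity_gib} GiB)"
```

With the values above, this prints segmentSize=524288 and a capacity of about 836 GiB.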

A note on drive order: The drive order is determined at the time a vdisk is created, usually while creating the first volume. If disks are specified manually, the specified order is used (usually sequential). However, if the administrator allows the storage system to select a number of available disks, the order can be random.

The drive order necessary for vdisk/volume recovery can be obtained from a storage support data printout generated prior to the deletion. Otherwise it can be guessed. Fortunately, getting it wrong will not cause any data loss, only errors while attempting to mount or access the recovered volume. If such errors are encountered, the volume should be deleted and the service command run again with a different drive order. The number of possible permutations is n! (n factorial), where n is the number of drives in the original vdisk.

In this scenario, there were 4 drives in a RAID5, leaving 24 possible permutations. Fortunately, the drives were specified manually at the time the vdisk was originally created, so the first permutation attempted (1, 2, 3, 4) was successful.
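
If the drive order does have to be guessed, a checklist of every candidate order can be generated ahead of time. The sketch below is one possible way to do that in POSIX shell. The permute function is a hypothetical helper, not part of CAM, and it assumes that drive names are unique and contain no whitespace. It never contacts the array; it only prints values that could be substituted into the drives option of the service command shown earlier.

```shell
#!/bin/sh
# Print every ordering of the given drives as a candidate "drives=" value.
# The function body runs in a subshell "( ... )" so that each level of
# recursion gets its own copy of the variables.
permute() (
    prefix=$1       # comma-joined drives already placed
    remaining=$2    # space-separated drives still to place
    if [ -z "$remaining" ]; then
        echo "drives=${prefix}"
    else
        for pick in $remaining; do
            # Build the list of remaining drives without the one just picked.
            rest=
            for d in $remaining; do
                [ "$d" = "$pick" ] || rest="${rest:+$rest }$d"
            done
            permute "${prefix:+${prefix},}${pick}" "$rest"
        done
    fi
)

permute "" "t85d01 t85d02 t85d03 t85d04"
```

For the four drives in this scenario the output is 24 lines, starting with drives=t85d01,t85d02,t85d03,t85d04. Each line can be tried in turn, deleting the recovered volume between attempts as described above.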

Cleanup

After recovery, the volume will appear in the volumes list with a new WWN. It will not have a pool assigned and it will be mapped to the Default Storage Domain. Use CAM to reassign the original pool and to remap the volume back to the original host or host group.

In this case, the VMware host had to be rebooted in order to recognize the updated WWN of the volume. After a reboot, all traces of the missing datastore were gone and most of the VMs in the inventory were marked as Unknown. Using the vSphere client, the following steps were used to restore access to the VMs:
1) Click the ESX host -> Configuration Tab -> Storage -> Add Storage
2) Disk/LUN
3) Choose the recovered LUN
4) Choose to keep the existing signature
5) Finish
At this point, the VMs in the inventory recovered their names and configurations, but they were all powered off. After starting them, several VMs needed to perform filesystem checks, as expected after an unclean shutdown.
