Friday, December 24, 2010

Seriously people, back up your data!

I write today not, as I would like, with a long-overdue update on our road trip (now long over), but with a cautionary tale about data loss. Read and learn from my example! A road trip retrospective will follow later, and I hope it will be much more entertaining. This post is a bit long, but please read all of it; my experience is an important lesson.

So why am I talking about data loss today? Surprisingly, it has nothing to do with my laptop's unexpected death in the middle of our road trip. In fact, this road trip seems to have become something of a lightning rod for technology problems (fortunately none involving the car!). After my laptop died, I kept taking photos and loading them onto the external hard drive I had brought, using Katy's still very much working laptop (thankfully we had both brought one). So even though I was mostly out of touch, things carried on just fine. The real trouble began when I got back.

Naturally, one of the first things I did on my return was load all my photos from the trip (about 10,000) onto my home data storage system. For the past year or so I had been using a Lacie 4big Quadra as my primary mass storage (tens of thousands of photos, documents, projects, work files, and much more). Configured as RAID5, it has a total of 2.7TB of formatted space, and after loading my photos I had about 1.5TB of data on it.

Now, I felt pretty secure about the data on the 4big because it was RAID5. For those not familiar with the technology, RAID5 spreads data across multiple disks along with "parity" (redundancy) information calculated from that data. In theory an entire disk can fail and your data is still fine, because the missing disk's contents can be rebuilt from the other disks. If two disks fail at the same time (or the controller fails), then you have a problem. In theory, however, the chance of a double disk failure is lower than that of a single failure, so one would expect the data to be safer than on a single drive.
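For the technically curious, here's a toy Python sketch of the parity idea. It assumes the simplest textbook form of RAID5 parity, where each stripe's parity block is the byte-wise XOR of that stripe's data blocks; I don't know the exact on-disk layout Lacie's controller uses, and the function names are made up for illustration. What it shows: with one block missing, XOR-ing the survivors gives it back; with two missing, there's nothing left to solve for.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte (the RAID5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe: list[bytes | None]) -> list[bytes]:
    """Fill in at most one missing block (None) in a stripe of data blocks plus parity."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # Two or more lost blocks in one stripe: single parity cannot recover them.
        raise ValueError("RAID5 parity can rebuild only one missing block per stripe")
    rebuilt = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Three data blocks plus their parity, like one stripe on a four-disk array.
data = [b"photo", b"docs!", b"work."]
stripe = data + [xor_blocks(data)]

# Lose "disk 1"; the other three blocks are enough to rebuild it.
print(rebuild_missing([stripe[0], None, stripe[2], stripe[3]])[1])  # b'docs!'
```

The same arithmetic explains the capacity: in a four-disk RAID5, one disk's worth of space goes to parity, so four 1TB disks leave 3 × 10^12 bytes, which Windows reports as roughly 2.7TB because it counts in binary units (3 × 10^12 / 2^40 ≈ 2.73).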

Unfortunately, a double disk failure, or a single disk failure combined with other corruption, can and does happen, as I found out to my dismay. I loaded all my photos onto the unit shortly after my return and began sorting through them, posting new sets every day or two. After a week or so of working on photos off and on, I started having trouble reading certain images. I checked the Windows event log and found a whole series of disk-related errors essentially saying the 4big drive was corrupted and needed to be scanned for errors. I rebooted shortly afterward and a disk scan ran automatically. I've never had much faith in Windows' chkdsk utility, but I soon found out that running it on a RAID is even worse.

Chkdsk ran, found a lot of errors, and "corrected" most of them. When I got back into Windows, I found some images missing (by examining the disk scan log I could see which files it had found "orphaned" or otherwise identified as corrupted and attempted to fix). Fortunately the lost files were fairly few, and things seemed to be working OK. That didn't last long, though. Within hours my 4big unit became unreadable: Windows still recognized it but no longer showed the drive as formatted, and no data was accessible. I shut down the computer and the 4big unit and rebooted. The 4big began showing a series of varying combinations of warning and error lights. Eventually it just showed a total failure, meaning at least 2 disks had failed. I was shocked, but not yet panicked. I've seen odd but correctable issues like this before (though not with this particular unit).

I contacted Lacie technical support, beginning a 2-week odyssey of back and forth during which I received very little useful advice beyond resorting to extremely expensive data recovery. The first tech I spoke with essentially told me that the indicator lights on the unit didn't necessarily indicate anything useful, and that the only way to determine whether it was an actual drive failure rather than, say, a controller failure would be to reinitialize the unit. That meant certainly losing all the data, so it really was not an option. One option I would have liked them to offer was to ship an identical unit without drives so I could move my unit's drives into it and test there. If it still failed, that would indicate a drive issue rather than a controller or other hardware problem. But they weren't willing to do that or offer any other remedy besides shipping the unit to them in Oregon for data recovery at a cost of thousands of dollars.

At this point I was fairly resigned to paying for data recovery, but still hoped for some cheaper alternatives. I didn't relish shipping the unit hundreds of miles up to Oregon, so I started looking into local data recovery options on Yelp and through general searches online. The world of professional data recovery is rather mysterious and almost universally very expensive. Your average computer shop can sometimes handle very basic recovery, but I can do the same things they can at home, so that wasn't a useful option.

After a bit of research I came up with a list of local options and started calling to get an idea of rates, turnaround time, etc. The first company I called, Hard Drive 911 Data Recovery, was extremely helpful, and although I didn't end up going with them, I would still recommend them, if only for their extremely informative and frank reps. The fellow I spoke to told me in plain terms the cost of recovery (a minimum of $1000 for a 4-drive RAID 5, up to $9000!) and the up-front inspection fee (others offer free inspection; 911 credits the inspection fee toward the recovery if you decide to go ahead). Their prices were not much higher than many other options (this is a shockingly expensive service overall, no matter where you go), but their rep was far more helpful. We had a very candid discussion about the costs and value of professional data recovery, and he made clear that they want their customers to feel the service is worth the cost. He gave me a lot of information and links to online resources for attempting my own recovery, and advised me to get back to them if I wanted to pursue professional recovery.

I spent the next 2 weeks attempting recovery myself. My very first task was to image all four 1TB disks. Making a raw 1:1 copy of every sector (the full 1TB per disk) takes a very long time, so that alone took several days. Once I had the raw images I could attempt recovery on them rather than further risk the originals, which I wanted to preserve for others to work on should that become necessary. Note that simply creating the images on my basic home setup over regular SATA could have caused further damage to the disks, so there was some risk inherent in what I was doing. However, I didn't hear any unusual noises from the drives and didn't suspect physical damage, so I felt it was safe.
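If you're wondering what "raw 1:1 imaging" actually involves, here's a minimal Python sketch of the idea, under a few assumptions: a Unix-like system where the disk shows up as a readable file (the /dev/sdb path in the comment is hypothetical), and a destination with at least as much free space as the disk. It copies every block in order and, when a block can't be read, writes zeros in its place so everything after it stays at the same offset, which matters later when the images have to line up as a RAID. Dedicated tools like GNU ddrescue do this far more carefully (retrying, narrowing down bad areas, keeping a map file), so treat this as an illustration, not something to point at a failing disk.

```python
from __future__ import annotations

import os


def _read_fully(src, length: int) -> bytes:
    """Read exactly `length` bytes unless EOF arrives first (raw reads may return short)."""
    parts: list[bytes] = []
    while length > 0:
        part = src.read(length)
        if not part:
            break
        parts.append(part)
        length -= len(part)
    return b"".join(parts)


def image_disk(src_path: str, dst_path: str, block_size: int = 1 << 20) -> list[int]:
    """Copy src_path to dst_path block by block, 1:1.

    Unreadable blocks are replaced with zeros so every later byte keeps its
    original offset in the image; the offsets of those blocks are returned.
    """
    bad_offsets: list[int] = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size in bytes
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                src.seek(offset)
                chunk = _read_fully(src, length)
            except OSError:
                bad_offsets.append(offset)  # remember where the unreadable block was
                chunk = b""
            # Pad short or failed reads so the image stays byte-aligned with the disk.
            dst.write(chunk.ljust(length, b"\0"))
            offset += length
    return bad_offsets


# Example (hypothetical device and destination paths; needs read access to the disk):
# bad = image_disk("/dev/sdb", "/mnt/spare/disk1.img")
# print(len(bad), "unreadable block(s) were zero-filled")
```

One design trade-off: zero-filling a whole 1 MiB block for a single bad sector throws away the good data around it. A smaller `block_size` loses less around each bad spot at the cost of many more read calls.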

Over the next weeks I tried a number of software tools to recover the data. The biggest challenge turned out to be figuring out the RAID parameters so the data could actually be accessed properly. Because RAID distributes data across all the disks, including the parity information, no single disk can be read like a normal disk; you need software that emulates the RAID, and you need the correct RAID parameters (such as the chunk size, the disk order, and how parity rotates across the disks). I didn't have these, and Lacie was extremely reluctant to provide them, which was another frustration I feel was unnecessary. I eventually escalated my support request to a senior staff member who did provide this information, but it ultimately did not help.
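To show why those parameters matter so much, here's a toy Python sketch of how software RAID emulation maps a logical volume onto the member disks. It assumes the four rotating-parity layouts that Linux's md/mdadm calls left-symmetric, left-asymmetric, right-symmetric, and right-asymmetric, with disk images listed in controller order; whether the 4big uses any of these is an assumption on my part, and `stripe_volume`/`assemble` are hypothetical helpers, not any recovery tool's API. Each chunk is read correctly either way; get the layout, chunk size, or disk order wrong and the chunks simply come back in the wrong sequence.

```python
from __future__ import annotations

LAYOUTS = {"left-symmetric", "left-asymmetric", "right-symmetric", "right-asymmetric"}


def parity_disk(stripe: int, n_disks: int, layout: str) -> int:
    """Which member holds parity for a given stripe (row); it rotates row by row."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout: {layout}")
    if layout.startswith("left-"):
        return (n_disks - 1) - (stripe % n_disks)  # parity moves right-to-left
    return stripe % n_disks                        # parity moves left-to-right


def locate_chunk(logical_chunk: int, n_disks: int, layout: str) -> tuple[int, int]:
    """Map a logical data chunk to (member disk, stripe row) on the array."""
    stripe, k = divmod(logical_chunk, n_disks - 1)  # n-1 data chunks per row
    parity = parity_disk(stripe, n_disks, layout)
    if layout.endswith("-symmetric"):
        disk = (parity + 1 + k) % n_disks  # data starts just after parity and wraps
    else:
        disk = k if k < parity else k + 1  # data fills members in order, skipping parity
    return disk, stripe


def stripe_volume(volume: bytes, n_disks: int, chunk_size: int, layout: str) -> list[bytes]:
    """Lay a volume out as RAID5 members with XOR parity (used here to build test data)."""
    rows = len(volume) // (chunk_size * (n_disks - 1))
    disks = [bytearray(rows * chunk_size) for _ in range(n_disks)]
    for logical in range(rows * (n_disks - 1)):
        disk, stripe = locate_chunk(logical, n_disks, layout)
        start = stripe * chunk_size
        chunk = volume[logical * chunk_size:(logical + 1) * chunk_size]
        disks[disk][start:start + chunk_size] = chunk
        p = parity_disk(stripe, n_disks, layout)
        for i, byte in enumerate(chunk):
            disks[p][start + i] ^= byte  # fold this chunk into the row's parity
    return [bytes(d) for d in disks]


def assemble(disk_images: list[bytes], chunk_size: int, layout: str) -> bytes:
    """Rebuild the logical volume from complete member images given in controller order."""
    n_disks = len(disk_images)
    rows = len(disk_images[0]) // chunk_size
    out = bytearray()
    for logical in range(rows * (n_disks - 1)):
        disk, stripe = locate_chunk(logical, n_disks, layout)
        start = stripe * chunk_size
        out += disk_images[disk][start:start + chunk_size]
    return bytes(out)


volume = bytes(range(48))  # a tiny 48-byte "volume" with distinct byte values
members = stripe_volume(volume, n_disks=4, chunk_size=4, layout="left-symmetric")
print(assemble(members, 4, "left-symmetric") == volume)   # True: right parameters
print(assemble(members, 4, "left-asymmetric") == volume)  # False: wrong parity rotation
```

Disk order is a parameter too: if you don't know it, you end up trying every ordering of the images (24 of them for four disks) against each candidate layout and chunk size, which is a lot of guessing without the numbers from the vendor.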

In the end I was unable to recover any meaningful data myself. I feel I was pretty close: I got to the point of being able to list a lot of files and see meaningful file names. But the actual file contents weren't correct, and I suspect either I didn't have enough information about the RAID parameters or (perhaps more likely) there was corruption in the RAID structure itself. It was time to think seriously about professional data recovery again.

By this time almost a month had passed since the failure. The vast majority of my important data, including 5+ years of photos, 10+ years of documents and project files, and much more, had been unavailable the whole time. I didn't realize it until several weeks in, but this was really affecting my happiness and overall mood at a fairly subconscious level. In hindsight that's not surprising, but I truly didn't notice it for weeks. I knew I had to get the data back, and that professional data recovery was likely the only option.

Lacie operates their own in-house recovery service called D2; in fact Lacie support initially indicated I was required to use them in order to maintain my warranty (I later found out there were several other authorized providers). I actually had a great back-and-forth with Patrick, the manager there, in which we discussed the recovery options and costs. Since it's an in-house service, they can also perform warranty service, and their costs were lower than most local options. They also offer free 2-way shipping and free evaluation. The service sounded pretty good. But something in me balked at the idea of giving the same company potentially thousands more of my dollars just to fix a piece of hardware that I felt shouldn't have failed in the way it did anyway (not to mention my relatively negative experience with their tech support earlier). I initially asked to proceed but, after not receiving a response for a few days (they did eventually send me a shipping label), I decided to try an evaluation with a local company first.

I had done price comparison and reputation research on a number of local companies. One of the better-known ones is DriveSavers in Novato. I had initially steered clear of them, partly because of some negative Yelp reviews. Eventually I did call them and had a very nice conversation with one of their reps. We even discussed the Yelp reviews, and having been on the other end of negative Yelp reviews for businesses I work with, I sympathized with her point of view. They were local, so I could drop off the drive on my way to or from work in Penngrove, and they could do a free analysis within a couple of days. So it seemed like a worthwhile option. Their pricing was also comparable with other local options, and their high end ($4410) was lower than several others', so in the worst case I'd pay less than I would for the same problem at another vendor. The high end of the range usually corresponds to significant *physical* damage, and at this point I didn't think that was the case; after all, I had felt pretty close to recovery with my own simple tools. Given the slow response from D2, I decided to go ahead.

I dropped the drive off and waited... A few days later they called back and said they estimated they could recover 90% of the data, at a cost between $3900 and $4400, the top of that range being the maximum they had said it could cost before they even looked at the unit. At this point I was tempted just to get the drive back, as I was fairly sure the level of corruption didn't justify the cost. I called DriveSavers to talk to the tech working on my recovery, asked him a few questions, and found his answers rather condescending and uninformative.

He said there was physical damage on one of the drives, which I had also encountered when I first imaged them: 9 bad sectors, to be precise, which is not much on a 1TB drive. I told him I had images I'd made earlier and offered to bring them by if it would help, but he brushed that off as if it were ridiculous that my earlier images might be of use. Maybe he's right, but his reaction was not particularly nice. Then I asked him why, if only 1 drive had physical damage, it was going to cost so much (and theoretically be so difficult) to recover the data; after all, isn't RAID5 *meant* to recover from a single drive failure? He barely answered the question, saying only that there was other unspecified corruption, possibly caused by chkdsk, and that advanced recovery was necessary.

Thinking back to the Yelp reviews, I recalled some accounts there of dubious information from DriveSavers tech support as well. Tales of "motor failure" and "intermittent hardware failure" sounded similar to the lines I was getting. In several of the reviews people even got their drives back and found other, cheaper ways to recover, discovering that the problem was indeed not as dire or complex as they had been told (for example, a simple power supply replacement). That said, by this time I felt I'd exhausted most of my own simpler diagnoses and recovery options, so even though I didn't trust what I had been told, I wasn't sure there were many other options.

Then I called Patrick at D2 to see whether he felt he could do better on the recovery if I sent the drives up to him; after all, I was still within the free evaluation at DriveSavers. He felt, and I reluctantly agreed, that since DriveSavers already had the drives dismantled in their lab, having them reassemble and box up the drives so I could ship them to D2 would add risk, and I might not get the data back (or might recover less of it).

This is where the mystery of the data recovery industry plays into their hands, theoretically justifying the high costs. After all, they're the experts, they can tell you whatever they want, and who are you to argue? The uncertainty customers feel, and the fear of losing data, can justify almost anything in that moment. The lack of information and transparency, the difficulty in trusting these companies, is the biggest problem I have with the whole industry, to this day. I still felt like they weren't telling me the whole truth, but I didn't have any real facts to back that up, just my own experiences with drive failure in my time in IT. So ultimately I said yes, and the final bill was indeed $4410. I paid $4400 to get my data back. And you know what? It was worth it. Fully worth it.

But I don't trust DriveSavers, I don't trust D2 (who never gave me an actual maximum figure; they just said "I've never seen a recovery cost more than $2500", which is no kind of commitment), and I don't trust anyone else in this industry either, except maybe the guy I talked to at Hard Drive 911, who told me quite clearly that it was reasonable for me to try my own recovery first (once he confirmed I was aware of the risks).

The fact is I just don't know. It's possible my drives were so damaged that the cost was justified. It sure didn't seem that way from any actual evidence I saw. And the DriveSavers tech's reaction was not particularly confidence-inspiring. He seemed more interested in stifling questions than in clearly explaining the complexity of the recovery, and thus why its high cost was justified.

So I don't trust any of them, but I used their services because the alternative wasn't worth it to me. Even if they're full of crap and the recovery difficulty did not justify the cost (which I'm fairly certain of), it was still worth the price to me, and that's what matters. What matters more, though, is that I am going to do my best to never have to pay for something like this again.

As to whether I'd recommend DriveSavers or anyone else? Sure, if you have a really serious data loss and the data is worth thousands of dollars to you, by all means, do it. Not because it's necessarily required, but because the risks outweigh the potential savings. Just resign yourself to paying the maximum quoted amount. If you're OK with that, then it's the right way to go, no questions. DriveSavers is essentially going to treat every recovery as the worst possible case from the get-go, and in some sense they may not be wrong to do that. Sometimes you don't get a second chance: if they power up a drive and it fries itself before they can do anything, they'd have been better off disassembling it from the beginning and running it in a clean room. Just be aware that this is going to be their approach, and don't be fooled by the low end of their price range; you will almost never pay that.

On the other hand, if your data is only worth a few hundred dollars, or nothing at all, then try to do it yourself, or take it to a local computer shop or trusted IT consultant and see what they can do. It's all about how much the data is really worth to you. If you have a potentially life-threatening medical problem and only surgery will determine whether it really is, you don't go to the bargain clinic to get opened up; you go straight to the best, because the risks outweigh the potential savings, plain and simple. I don't like this reality, but it's a simple truth, and it's what props up an industry of chronically bloated fees. I'm now a registered DriveSavers reseller, so theoretically I have some stake in getting them more clients, but I will not be recommending them lightly. I may ultimately refer some people their way if I feel it's justified. But to tell you the truth, I'd much rather spend my time and energy teaching people how to back up their data properly!

As this post is now very long, I'm going to continue in Part II: Backing Up Your Data - Software Recommendations and Strategy.