Storage drives, including hard disk drives (HDDs), solid-state drives (SSDs), and NVMe drives, are among the most failure-prone components in a computer. Unlike CPU or RAM failures, which can sometimes be resolved by reseating or replacing a module, a drive failure often means permanent, unrecoverable data loss. This makes storage troubleshooting one of the highest-stakes disciplines in IT support.
Objective 5.2 requires you to recognize thirteen distinct symptoms of drive and RAID failure, understand their causes, know how to diagnose them, and understand the appropriate response — including when to prioritize data recovery over repair. This section covers all thirteen symptoms in depth, along with the foundational knowledge of RAID levels needed to interpret RAID-specific symptoms.
Data First — Always
The cardinal rule of storage troubleshooting: back up or recover data before performing any repair actions. Running tools like chkdsk on a physically failing drive, or attempting a rebuild on a degraded RAID without a backup, can accelerate data loss. Identify the scope of the problem first. Recover data second. Repair third.
Before diagnosing symptoms, understanding the fundamental differences between storage technologies is essential — because many symptoms are technology-specific.
Exam Focus — HDD vs SSD Symptoms
Several symptoms in this objective (grinding, clicking) are HDD-specific. SSDs have no moving parts and do not produce mechanical sounds, so when the exam describes grinding or clicking from a drive, the drive in question is an HDD. Audible alarms are a separate case: they are electronic beeps from controllers and enclosures, not noises made by the drive itself. Silent failures (missing drive, S.M.A.R.T. errors, data corruption) can occur with either technology.
Several symptoms in this objective (RAID failure, array missing, audible alarms) require a solid understanding of how each RAID level works, how many drive failures it can survive, and what happens when it fails. This section provides a complete reference.
| Level | Min Drives | Fault Tolerance | Usable Capacity | Performance | How it works |
|---|---|---|---|---|---|
| RAID 0 | 2 | None — 0 drive failures | 100% of all drives | Fastest reads/writes | Striping only. Data split across all drives. Zero redundancy. One drive fails = all data lost. |
| RAID 1 | 2 | 1 drive failure | 50% (size of one drive) | Read improved; writes same | Full mirror. Every write goes to both drives simultaneously. Either drive can serve reads. |
| RAID 5 | 3 | 1 drive failure | N-1 drives | Good read; write penalty | Striping with distributed parity. Parity rotates across all drives. One failed drive can be rebuilt from parity + remaining data. |
| RAID 6 | 4 | 2 drive failures | N-2 drives | Good read; larger write penalty | Striping with dual distributed parity. Two separate parity sets. Can survive any two simultaneous drive failures. |
| RAID 10 | 4 | 1 per mirrored pair | 50% | Best overall | Stripe of mirrors (RAID 1+0). Data striped across mirrored pairs. Very high fault tolerance if failures are in different pairs. |
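The capacity column above can be worked through as arithmetic. The following is a minimal sketch, assuming identical drives, a two-drive mirror for RAID 1, and an even drive count for RAID 10; the function name `usable_capacity` is hypothetical and used only for illustration.

```python
def usable_capacity(level: str, drive_count: int, drive_size_tb: float) -> float:
    """Return usable capacity in TB for an array of identical drives."""
    # Minimum drive counts taken from the RAID reference table.
    min_drives = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in min_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < min_drives[level]:
        raise ValueError(f"RAID {level} needs at least {min_drives[level]} drives")
    if level == "10" and drive_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives")

    if level == "0":
        return drive_count * drive_size_tb        # striping only, no redundancy
    if level == "1":
        return drive_size_tb                      # mirror: one drive's worth
    if level == "5":
        return (drive_count - 1) * drive_size_tb  # one drive's worth of parity
    if level == "6":
        return (drive_count - 2) * drive_size_tb  # two drives' worth of parity
    return (drive_count // 2) * drive_size_tb     # RAID 10: half the drives


# Four 2 TB drives under different RAID levels.
for raid_level in ("0", "5", "6", "10"):
    print(raid_level, usable_capacity(raid_level, 4, 2.0))
```

With four 2 TB drives, RAID 5 gives up one drive's worth of space to parity, while RAID 6 and RAID 10 both give up two.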
A RAID array can exist in several states. Understanding these states is essential for interpreting RAID-related symptoms on the exam.
Critical Concept — RAID Is Not a Backup
RAID protects against drive hardware failure only. It does not protect against: accidental file deletion (deleted from one mirror = deleted from both), ransomware encryption (encrypts all drives in the array simultaneously), controller failure (if the RAID controller fails, all drives may be unreadable by a different controller), fire/flood/theft (all drives physically co-located), or software corruption. A proper backup strategy follows the 3-2-1 rule: 3 copies, on 2 different media types, with 1 offsite or offline.
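The 3-2-1 rule can be expressed as a simple check. This is an illustrative sketch only: the `BackupCopy` type, its field names, and the media labels are hypothetical, and the original data is counted as one of the three copies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    media_type: str           # hypothetical labels, e.g. "raid-array", "external-hdd", "cloud"
    offsite_or_offline: bool  # True if the copy is stored offsite or kept offline


def satisfies_3_2_1(copies: List[BackupCopy]) -> bool:
    """Check 3 copies, on 2 different media types, with 1 offsite or offline."""
    media_types = {copy.media_type for copy in copies}
    has_offsite = any(copy.offsite_or_offline for copy in copies)
    return len(copies) >= 3 and len(media_types) >= 2 and has_offsite


# A RAID array alone counts as a single copy, however many drives it contains.
raid_only = [BackupCopy("raid-array", False)]
full_plan = [
    BackupCopy("raid-array", False),
    BackupCopy("external-hdd", False),
    BackupCopy("cloud", True),
]
print(satisfies_3_2_1(raid_only))
print(satisfies_3_2_1(full_plan))
```

The first case shows why RAID by itself never satisfies the rule: every drive in the array holds the same single copy in the same place.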
LED status indicators on drives, RAID controllers, and server hardware provide a real-time visual representation of drive health. Understanding what each color and pattern means is essential for quickly assessing a situation without using software tools — particularly in server environments where drives are hot-swappable.
| LED Color / Pattern | Meaning | Action Required |
|---|---|---|
| Solid green | Drive healthy and operational | None — normal |
| Blinking green | Drive activity (reads/writes occurring) | None — normal activity |
| Solid amber / yellow | Drive has failed or is in a fault state | Replace the drive immediately |
| Blinking amber | Drive is predictively failing (S.M.A.R.T. warning) or array is rebuilding | Back up data; prepare replacement |
| Blinking green + amber (alternating) | RAID rebuild in progress on this drive | Do not remove the drive; wait for rebuild |
| No LED / dark | Drive not detected, not seated, or no power | Reseat drive; check power and data cables |
| Red | Drive failed (some systems use red instead of amber) | Replace the drive |
Context Note
LED behavior is not fully standardized across manufacturers. Dell, HP, and Lenovo server hardware all use slightly different color schemes. The exam tests the general concept — amber/yellow = fault/warning, green = healthy — rather than any specific vendor's implementation. Always consult the system documentation for precise meanings.
Many drives and systems have two separate LEDs: one for activity (read/write operations, expected to blink frequently) and one for health status. A common mistake is to read a solid health LED as a sign that the drive is powered on but idle. Keep the two distinct: a sustained absence of activity-LED blinking during a period when the drive should be busy (such as a system boot or a large file copy) can indicate a problem.
A grinding noise from a computer's storage system is one of the most urgent symptoms a technician can encounter. Unlike the normal operational sounds of an HDD (quiet spinning, occasional soft seek sounds), grinding indicates mechanical destruction occurring inside the drive in real time.
Inside an HDD, the read/write heads float nanometers above the spinning platters on a cushion of air and never physically touch them during normal operation. This clearance is called the flying height (sometimes loosely referred to as the head gap). When the head crashes, meaning it physically contacts the platter surface, the result is catastrophic: the head scrapes the magnetic coating off the platter, destroying both the head and the data.
A head crash produces a distinctive grinding or scraping sound. Other causes of grinding include:
Immediate Response Protocol
Power off the system immediately. Every additional second the drive spins causes more physical damage and more data loss. Do not attempt to run diagnostics on a grinding drive. Do not attempt chkdsk. Contact a data recovery specialist if the data is critical — physical recovery requires a cleanroom environment and is expensive. If the drive is already backed up, simply replace it.
It bears repeating that SSDs and NVMe drives cannot grind — they have no moving parts. A grinding sound is unambiguously an HDD symptom. If a system appears to make a grinding sound but contains only SSDs, investigate fans and other mechanical components.
Clicking sounds from an HDD are a well-known warning sign and should be treated with the same urgency as grinding. The most infamous pattern — a repeated clicking with brief pauses — is colloquially known as the click of death.
ddrescue (Linux) or Recuva to copy data off the drive. Work quickly.
When a system powers on and the firmware cannot locate a valid bootable device, it displays an error message instead of handing off to the OS. Common messages include: "No boot device found", "Boot device not found", "Reboot and select proper boot device", "NTLDR is missing", or "Operating system not found."
This symptom has a wide range of causes — from a simple misconfigured boot order to a completely failed drive — and must be diagnosed methodically.
| Cause | Why It Happens | How to Verify |
|---|---|---|
| Wrong boot device in BIOS | Boot order set to USB or optical drive first; empty drive is selected | Enter BIOS/UEFI; verify boot order lists the OS drive first |
| Drive not detected by BIOS | Loose cable, failed drive, or failed SATA port | Check BIOS storage screen — is the drive listed? |
| Corrupt bootloader | Windows boot files (BCD/MBR) are damaged; drive is healthy but unbootable | Drive visible in BIOS; boot into WinRE and run Startup Repair or bootrec |
| MBR/GPT corruption | Master Boot Record or GUID Partition Table is damaged | Boot from USB; use diskpart to inspect partition table |
| Drive hardware failure | Drive has failed and cannot be read at all | Drive absent from BIOS; no activity LED; listen for sounds |
| Secure Boot conflict | Non-signed bootloader blocked by UEFI Secure Boot | Disable Secure Boot in UEFI temporarily and test |
| SATA mode mismatch | Drive configured as AHCI in BIOS but OS installed in IDE mode (or vice versa) | Check the SATA controller mode setting in BIOS/UEFI |
Exam Focus
The exam frequently presents "bootable device not found" as the result of either a misconfigured BIOS boot order (most common, simplest fix) or a corrupt bootloader. The first step is always to check the BIOS boot order. If the boot order is correct and the drive is visible in BIOS, the next step is Startup Repair from Windows Recovery Environment (WinRE).
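The order of checks described above can be sketched as a small decision function. This is a simplified illustration, not a complete diagnostic procedure: the function name and its two boolean inputs are hypothetical, and the wording of the cable check is drawn from the cause table.

```python
def next_boot_step(boot_order_correct: bool, drive_visible_in_firmware: bool) -> str:
    """Suggest the next step for a 'bootable device not found' error."""
    # Step 1: the boot order is the simplest and most common cause.
    if not boot_order_correct:
        return "Set the OS drive first in the BIOS/UEFI boot order"
    # Step 2: a drive the firmware cannot see points to cabling or hardware.
    if not drive_visible_in_firmware:
        return "Check cables, power, and the SATA port; suspect drive failure"
    # Step 3: the drive is present and selected, so the bootloader is suspect.
    return "Run Startup Repair from WinRE"


print(next_boot_step(boot_order_correct=False, drive_visible_in_firmware=True))
print(next_boot_step(boot_order_correct=True, drive_visible_in_firmware=True))
```

Checking the cheapest, most reversible cause first means nothing is written to the drive until the simple explanations are ruled out.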
Data loss refers to files that disappear entirely. Data corruption refers to files that are present but contain incorrect, unreadable, or garbled content. Both symptoms can originate from the same underlying causes — the difference is simply whether the file system still has a reference to the data (corruption) or has lost it entirely (loss).
chkdsk C: /f /r, where /f fixes file system errors and /r locates bad sectors and attempts to recover readable data. Only use on a drive that is not making physical failure sounds.
sfc /scannow checks and repairs corrupted Windows system files, which is distinct from drive-level corruption.
RAID failure occurs when enough drives in an array have failed to exceed the array's fault tolerance. The result depends on the RAID level and the number of failed drives. A RAID failure can mean the array is degraded (still running, but at risk) or completely offline (data inaccessible).
| RAID Level | Degraded at | Failed (data inaccessible) at | Behavior during degraded state |
|---|---|---|---|
| RAID 0 | N/A — no redundancy | Any 1 drive failure | Immediately offline; all data lost |
| RAID 1 | 1 drive failure | Both drives fail | Runs on surviving drive; can rebuild to new drive |
| RAID 5 | 1 drive failure | 2+ drive failures | Continues running; second failure = total data loss |
| RAID 6 | 1–2 drive failures | 3+ drive failures | Continues running through two failures; after the second failure the array is still degraded, not failed |
| RAID 10 | 1 drive per pair | Both drives in same pair fail | Continues if failures are in different pairs; fails if both drives in one pair fail |
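The table above can be turned into a small classifier. This is a minimal sketch under stated assumptions: RAID 1 is a two-drive mirror, RAID 10 is built from two-drive mirrored pairs, and the function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def array_state(level: str, failed_drives: int = 0,
                failed_per_pair: Optional[Sequence[int]] = None) -> str:
    """Return "healthy", "degraded", or "failed" for a given failure pattern."""
    if level == "10":
        # RAID 10 depends on WHERE the failures are, not just how many.
        if failed_per_pair is None:
            raise ValueError("RAID 10 needs a failure count for each mirrored pair")
        if any(count >= 2 for count in failed_per_pair):
            return "failed"      # both drives in one pair are gone
        if any(count == 1 for count in failed_per_pair):
            return "degraded"
        return "healthy"

    # Number of drive failures each level survives (from the table).
    tolerance = {"0": 0, "1": 1, "5": 1, "6": 2}
    if level not in tolerance:
        raise ValueError(f"unsupported RAID level: {level}")
    if failed_drives == 0:
        return "healthy"
    return "degraded" if failed_drives <= tolerance[level] else "failed"


print(array_state("5", failed_drives=1))
print(array_state("5", failed_drives=2))
print(array_state("10", failed_per_pair=[1, 1]))
print(array_state("10", failed_per_pair=[2, 0]))
```

The last two calls both involve two failed drives in a RAID 10 array, yet only the second loses the array, because both failures land in the same mirrored pair.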
When a failed drive in a redundant RAID is replaced, the controller initiates a rebuild — the process of regenerating the missing data from the remaining drives' parity or mirror data onto the new drive. This process is critical to understand for the exam.
Exam Focus
The exam will test whether you understand what happens when specific drives in specific RAID levels fail. Key facts: RAID 0 has no fault tolerance, so any single failure destroys all data. RAID 5 needs at least 3 drives and can tolerate only 1 failure. A second failure during rebuild is a common scenario question. RAID 10 can tolerate multiple failures as long as no two are in the same mirrored pair.
S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) is a monitoring system built into most modern HDDs, SSDs, and NVMe drives. The drive's firmware continuously monitors dozens of internal health parameters and stores their values. These values can be read by the OS or diagnostic tools to predict and detect drive failures before they become catastrophic.
Each S.M.A.R.T. attribute has a current value, a worst-ever value, a threshold, and a raw value. When the current value falls below the threshold for a critical attribute, the drive reports a S.M.A.R.T. failure — a prediction that the drive is likely to fail soon. The system BIOS may display a warning at boot: "S.M.A.R.T. Failure Predicted on Hard Disk. Immediately back up your data and replace the hard disk drive."
| Attribute ID | Name | What It Measures | Significance |
|---|---|---|---|
| 0x05 (5) | Reallocated Sectors Count | Number of bad sectors that have been remapped to spare sectors | Critical — any non-zero value indicates physical platter damage; rising count = imminent failure |
| 0xC5 (197) | Current Pending Sectors | Sectors flagged as unstable, waiting to be remapped | Critical — these sectors cannot be reliably read right now; data loss may occur on next read |
| 0xC6 (198) | Uncorrectable Sector Count | Sectors that could not be read after multiple attempts | Critical — data in these sectors is already lost; drive replacement urgent |
| 0x01 (1) | Read Error Rate | Frequency of errors when reading data from the platter | Important — rising rate indicates head or platter degradation |
| 0xC2 (194) | Temperature | Current drive operating temperature in Celsius | Above 55°C consistently = thermal stress reducing drive lifespan |
| 0x09 (9) | Power-On Hours | Total hours the drive has been powered on | Context — drives beyond 40,000–50,000 hours are aging; factor in with other attributes |
| 0xF1 (241) | Total LBAs Written (SSD) | Total data written to the SSD over its lifetime | SSD-specific — compare to manufacturer's endurance rating (TBW) |
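The relationship between an attribute's current value, its threshold, and its raw value can be illustrated with a short sketch. The `SmartAttribute` type, the `assess` function, and the sample numbers are hypothetical and do not come from any real drive or tool. The sketch treats a current value at or below the threshold as a predicted failure, which is an assumption about the boundary case, and it flags any non-zero raw count on the three sector attributes marked Critical in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartAttribute:
    attr_id: int
    name: str
    current: int    # normalized health value (higher is healthier)
    worst: int      # lowest normalized value ever recorded
    threshold: int  # failure threshold set by the manufacturer
    raw: int        # raw counter, e.g. a number of sectors

# IDs marked Critical in the table: 5, 197 (0xC5), 198 (0xC6).
CRITICAL_SECTOR_IDS = {5, 197, 198}


def assess(attr: SmartAttribute) -> str:
    """Classify one attribute as a predicted failure, a warning, or ok."""
    if attr.current <= attr.threshold:  # assumption: at-or-below counts as failing
        return "failure predicted"
    if attr.attr_id in CRITICAL_SECTOR_IDS and attr.raw > 0:
        return "warning"                # sectors are already being remapped or lost
    return "ok"


# Made-up sample values for illustration only.
reallocated = SmartAttribute(5, "Reallocated Sectors Count", 95, 95, 36, 12)
power_on = SmartAttribute(9, "Power-On Hours", 80, 80, 0, 17500)
print(assess(reallocated))
print(assess(power_on))
```

In either the "warning" or "failure predicted" case, the response described in this section is the same: back up the data first, then plan the replacement.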
Exam Focus
A S.M.A.R.T. failure message at POST is a prediction, not a current failure — the drive is still working but is forecasted to fail. The correct response is: immediately back up all data, then replace the drive. Do not ignore the warning. Do not simply run CHKDSK. Back up data first, then replace the drive.
When read or write operations that previously completed quickly begin taking significantly longer, the drive is experiencing one of several conditions — all of which warrant investigation. Extended read/write times are one of the earliest and most actionable warning signs of impending drive failure.
chkdsk C: /r identifies and attempts to recover bad sectors. Note: this will be slow if many bad sectors exist, and that slowness is itself diagnostic.
IOPS (Input/Output Operations Per Second) measures how many individual read or write operations a storage device can perform per second. It is the most important performance metric for workloads involving many small, random accesses, such as databases, virtual machines, and operating system boot drives.
| Drive Type | Sequential Read | Typical IOPS (4K Random) | Notes |
|---|---|---|---|
| HDD (7200 RPM) | ~150 MB/s | ~100–200 IOPS | Mechanical seek time is the bottleneck for random access |
| SATA SSD | ~550 MB/s | ~50,000–100,000 IOPS | Flash memory eliminates seek time; dramatically faster for random I/O |
| NVMe SSD (Gen 3) | ~3,500 MB/s | ~300,000–500,000 IOPS | PCIe interface removes SATA controller bottleneck |
| NVMe SSD (Gen 4) | ~7,000 MB/s | ~800,000–1,000,000 IOPS | Current consumer flagship performance tier |
An HDD performing at 50 IOPS instead of its rated 150 IOPS may indicate mechanical degradation — bad sectors causing retries, or head positioning issues. An SSD at 10,000 IOPS instead of 90,000 IOPS suggests severe NAND wear, a failing controller, or thermal throttling.
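The comparison of measured IOPS against rated IOPS is simple arithmetic, sketched below. The function names are hypothetical, and the 50% cutoff in `is_degraded` is an arbitrary illustrative threshold, not a vendor or exam figure.

```python
def iops_ratio(measured_iops: float, rated_iops: float) -> float:
    """Fraction of rated random-I/O performance the drive is delivering."""
    return measured_iops / rated_iops


def throughput_mb_s(iops: float, block_size_bytes: int = 4096) -> float:
    """Convert IOPS at a given block size into MB/s (1 MB = 1,000,000 bytes)."""
    return iops * block_size_bytes / 1_000_000


def is_degraded(measured_iops: float, rated_iops: float, floor: float = 0.5) -> bool:
    """Flag a drive delivering less than `floor` of its rated IOPS (assumed cutoff)."""
    return iops_ratio(measured_iops, rated_iops) < floor


# The two examples from the text: an HDD at 50 of 150 IOPS, an SSD at 10,000 of 90,000.
print(round(iops_ratio(50, 150), 2), is_degraded(50, 150))
print(round(iops_ratio(10_000, 90_000), 2), is_degraded(10_000, 90_000))
# 150 IOPS of 4 KiB random I/O is well under 1 MB/s.
print(throughput_mb_s(150))
```

The throughput conversion also shows why an HDD with a respectable sequential rate can still feel slow under random workloads: at 4 KiB per operation, a few hundred IOPS moves very little data.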
In RAID configurations, low IOPS can indicate a degraded array (operating without parity protection requires extra read operations for RAID 5/6 reconstruction), or a drive in the array that is underperforming and dragging down the entire array's throughput.
Practical Note
IOPS is typically measured with tools like CrystalDiskMark (Windows) or fio (Linux). On the exam, you will not be asked to calculate IOPS — you will be expected to understand that low IOPS is a symptom of drive degradation, and that HDDs have dramatically lower IOPS than SSDs due to mechanical seek time.
A drive that was previously accessible in the operating system has disappeared — it no longer shows up in File Explorer, Disk Management, or Device Manager. The drive may still be physically connected and even visible in BIOS. This symptom requires a structured diagnostic approach because the failure point can be anywhere from a loose cable to a failed file system to a completely dead drive.
| Question | Yes | No |
|---|---|---|
| Is the drive visible in BIOS/UEFI? | Drive hardware is alive — problem is logical or driver | Drive is completely failed, or cable/power issue |
| Is the drive visible in Disk Management? | Likely unformatted, wrong partition, or no drive letter assigned | Driver issue or OS-level detection failure |
| Does it show as "Unallocated" in Disk Management? | Partition table deleted/corrupt — use recovery software | Check for offline status; right-click → Online |
| Is the drive marked "Offline" in Disk Management? | Right-click → Online; may require administrator permission | Check for missing drive letter — assign one |
| Does S.M.A.R.T. show errors? | Physical drive failure — replace and restore from backup | Logical issue — attempt file system repair |
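The questions in the table can be chained into a single decision function. This is a sketch of one reasonable ordering, not a definitive procedure: the function and parameter names are hypothetical, and checking S.M.A.R.T. before any logical repair is a design choice made here so that physical failure is ruled out first.

```python
def diagnose_missing_drive(in_firmware: bool,
                           smart_errors: bool = False,
                           in_disk_management: bool = True,
                           offline: bool = False,
                           unallocated: bool = False) -> str:
    """Walk the missing-drive questions from hardware toward logical causes."""
    if not in_firmware:
        return "Check cable and power; the drive may have failed"
    if smart_errors:
        return "Physical failure: replace the drive and restore from backup"
    if not in_disk_management:
        return "Driver issue or OS-level detection failure"
    if offline:
        return "Bring the disk online in Disk Management"
    if unallocated:
        return "Partition table deleted or corrupt: use recovery software"
    return "Assign a drive letter"


print(diagnose_missing_drive(in_firmware=False))
print(diagnose_missing_drive(in_firmware=True, offline=True))
print(diagnose_missing_drive(in_firmware=True))
```

Ordering the checks from physical to logical keeps repair tools away from a drive that is physically failing, in line with the data-first rule.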
An array missing condition occurs when the RAID controller cannot locate or reconstruct the RAID array. The individual physical drives may be present, but the logical RAID volume they form is not visible to the operating system. This is a more severe condition than a single missing drive because it affects all data on the entire array.
Do Not Initialize
When Windows Disk Management detects disks from a missing array, it may prompt you to "Initialize Disk." Do not do this. Initializing the disk will write a new partition table, overwriting the RAID metadata and potentially making data recovery impossible. Click Cancel and use RAID recovery software instead.
Audible alarms in the context of drive and RAID issues refer to electronic beeping sounds generated by RAID controllers, server management hardware, or UPS (Uninterruptible Power Supply) units — not the mechanical sounds from HDD heads described in Symptoms 02 and 03. These are intentional alert sounds produced by system management electronics to notify an administrator of a fault condition.
| Source | Alarm Pattern | Meaning |
|---|---|---|
| RAID controller | Continuous or repeating beep | Drive failure detected in array; array may be degraded or failed |
| Server IPMI / BMC | Intermittent beeping + LED | Hardware fault detected; check management console for specific error code |
| NAS device | Beeping + front panel indicator | Drive failure, RAID degraded, or network connectivity loss |
| UPS (APC, Eaton) | Continuous beep | Running on battery power (power outage); battery low |
| UPS | 4 beeps repeating | Battery near end of life — replace battery |
| Storage enclosure | Alarm + amber LED | Drive slot has a failed drive; hot-swap replacement required |
In a server or NAS environment, the alarm is designed to be heard and acted upon immediately. The typical response sequence is:
One subtle but important topic is how S.M.A.R.T. interacts with RAID. S.M.A.R.T. is a per-drive metric — it monitors each individual physical drive's health independently. RAID controllers vary in how they expose or suppress this data.
chkdsk /f /r: file system repair and bad sector scan
diskpart: partition table inspection and repair
bootrec /rebuildbcd: boot record repair
Final Exam Reminders
Grinding or clicking = HDD mechanical failure. Power off. Back up. Replace.
S.M.A.R.T. failure at POST = prediction, not current failure. Back up immediately, then replace drive.
Bootable device not found = check BIOS boot order first. Then bootloader. Then cables.
RAID 0 = any single failure = all data lost. Zero redundancy.
RAID 5 degraded = still running but one more failure = total loss. Replace failed drive urgently.
Array missing prompt to initialize = do NOT initialize. Use recovery software.
Audible alarm from RAID = check LED on drive bay, access RAID management console, hot-swap failed drive.
RAID is not a backup = it protects against hardware failure only, not deletion, ransomware, or controller failure.