Physical Storage System

June 28, 2023

Physical Storage System

Metrics

Data Access Speed
Cost per Unit of Data
Reliability => data loss on system crash / physical failure
Persistency => volatile/non-volatile

Storage Hierarchy

primary storage : fastest, volatile (Cache, Memory)
secondary storage : non-volatile, on-line (Disk)
tertiary storage: non-volatile, off-line (CDRom, Tape)

Online : under control of CPU

Magnetic Disk

read-write head
circular track = set of sectors
platter = set of circular tracks => spins continuously => Random Access
cylinder = ith track of all platters

Performance Measurement

Access Time

from when a read or write request is issued to when data transfer begins
Seek Time + Rotational Latency
Seek Time
- time to reposition the arm over the correct track.
- random => avg = 1/2 worst
Rotational Latency
- time for the sector to come under the head
- rotation => avg = 1/2 of hz

Data Transfer Rate

The rate at which data can be retrieved from or stored to the disk.

Mean Time To Failure (MTTF)

The average time the disk is expected to run continuously without any failure.

Optimization

block 단위 data transfer
- larger block => less overhead, but more waste space
disk-arm scheduling
- elevator algorithm: arm을 한쪽 방향으로 움직이면서 그 방향 request 다 처리
file organization
- 한 파일의 데이터를 같은/인접한 block에 저장 => seek time 감소
nonvolatile write buffer
- caching과 같은 원리
log disk

RAID System

Multiple disks shown as a single disk of high speed & reliability

Reliability via Redundancy

Mirroring

Duplicate every disk => MTTF 비약적으로 상승 (둘 다 고장나야 해서)
Write both, Read one

Parity

Mirroring은 reliable한 만큼 too much redundancy
M개의 block을 (M+1)개의 disk에 저장 => 1개 disk failure까지는 복구 가능

Performance via Parallelism

Stripe data across multiple disk => Transfer rate 대략 N배
Bit-level => worse seek time
Block-level => block 단위로 striping, 여러 block access시 빠름

RAID Level

level 0: block striping + no redundancy
level 1: block striping + mirroring
- best write performance
level 5: block striping + block interleaved distributed parity
- lower storage overhead but higher write overhead

Discussion

Online Storage

Q. Why are the following storage mediums not widely used for on-line storage?

Magnetic tape
Optical storage

A. Magnetic tape: no random access, Optical storage: write once, read many

Flash Memory

Q. Given the characteristics of flash memory and the mechanics of data access in disk-based databases, suggest ways in which flash memory can be used to enhance database performance.

A. Write buffer

Parity

Q. Consider the data and parity-block arrangement on four disks:

Parity Block Arrangement

The $B_i$s represent data blocks; the $P_i$s represent parity blocks. Parity block $P_i$is the parity block for data blocks $B_{4i-3}$ to $B_{4i}. What, if any, problem might this arrangement present?

A. 4개의 block과 parity block까지 총 5개의 block 중 두 개 이상이 fail하면 복구가 안된다. 따라서 이 5개의 block은 서로 다른 disk에 저장되어 있어야 한다.

Scrubbing

Q. Data scrubbing is an error correction technique that uses a background task to periodically inspect the storage for errors, then corrects detected errors using redundant data in the form of different checksums or copies of data. Why do you think data scrubbing is used widely in most commercial RAID systems?

A. On-line correction은 user experience에 좋지 않고, correction 너무 안하면 error가 누적되기 때문에 “가끔씩” 주기적으로 해주는 것이 좋다. Correction은 전체 disk scan을 해야 해서 비싸므로 background task로 돌린다.

Back

Share on

Twitter Facebook LinkedIn

Jiho Kang

Physical Storage System

Physical Storage System

Metrics

Storage Hierarchy

Magnetic Disk

Performance Measurement

Access Time

Data Transfer Rate

Mean Time To Failure (MTTF)

Optimization

RAID System

Reliability via Redundancy

Mirroring

Parity

Performance via Parallelism

RAID Level

Discussion

Online Storage

Flash Memory

Parity

Scrubbing

Share on

You may also enjoy

Unittest

Poetry

Pipx

Pyenv