Database Normalization
Redundancy and Normalization
Design Goal
- 속성들 간의 관계가 모두 표현됨
- 불필요한 데이터 중복을 피함
- Facilitate enforcement of database integrity constraints
Anomaly
Data redundancy에 의해 발생하는 inconsistency/error => redundancy 없도록 decompose
- Insert anomaly: 새 데이터를 삽입하기 위해서 불필요한 데이터도 함께 삽입해야 하는 문제
- Deletion anomaly: 튜플 삭제시 다른 필요한 정보까지 삭제되는 문제
- Update anomaly: 중복 데이터 중 일부만 변경하여 consistency가 깨지는 문제
First Normal Form (1NF)
All attribute domains are atomic.
만약 atomic하지 않은 attribute가 있다면 여러 개의 atomic한 attribute로 분해.
Functional Dependency
For any legal relation $r(R)$, $\alpha \rightarrow \beta$ iff $\forall t_1, t_2 \in r, t_{1}[\alpha] = t_{2}[\alpha] \rightarrow t_{1}[\beta] = t_{2}[\beta]$.
$\alpha$ is a determinant of $\beta$ or $\beta$ is dependent to $\alpha$.
Legal relation is a relation that complies every integrity constraint.
Definition of Keys
- Superkey $K$
- $K \rightarrow R$
- Candidate key $K$
- $K \rightarrow R$
- $^\forall \alpha \subset K$, $\neg (\alpha \rightarrow R)$
Trivial FD
Satisfied by all relation $r(R)$. $\alpha \rightarrow \beta$ is trivial iff $\beta \subseteq \alpha$.
Partial/Full FD
$X \rightarrow Y$ is partial FD iff $^\exists \alpha \subset X, \alpha \rightarrow Y$. Otherwise, $X \rightarrow Y$ is full FD.
Transitive FD
$X \rightarrow Z$ is transitive FD iff $^\exists Y, (X \rightarrow Y) \land (Y \rightarrow Z)$.
Closure
The set of all functional dependencies logically implied by $F$ is the closure of $F$ (denoted $F^+$).
Armstrong’s Axiom
- reflexivity: $\beta \subseteq \alpha \Rightarrow \alpha \rightarrow \beta$
- augmentation: $\alpha \rightarrow \beta \Rightarrow \gamma \alpha \rightarrow \gamma \beta$
- transitivity: $(\alpha \rightarrow \beta) \land (\beta \rightarrow \gamma) \Rightarrow \alpha \rightarrow \gamma$
Armstrong’s axiom is sound and complete.
Second Normal Form (2NF)
All attributes are fullly dependent to candidate key.
candidate key의 부분집합이 중복되면 이 부분집합에 종속인 attribute는 당연히 중복된다.
해당 attribute를 추출하여 이 부분집합과 함께 별도의 relation으로 decompose.
Third Normal Form (3NF)
2NF + all attributes are not transitively dependent to candidate key.
2NF이더라도 candidate key가 아닌 attribute set에 종속인 attribute가 있을 수 있다. 이 때 candidate key → attribute set → attribute이므로 transitively dependent. 이 attribute set이 중복되면 attribute 또한 중복된다.
Boyce-Codd Normal Form (BCNF)
All determinants of FDs are superkey unless it is trivial.
Any relation is in 3NF if it is in BCNF.
- 2NF를 위배하는 FD가 있으면 해당 FD의 determinant는 superkey일 수 없으므로 BCNF를 위배
- 3NF를 위배하는 FD가 있으면 $Y$에 해당하는 FD는 superkey일 수 없으므로 BCNF를 위배
만약 BCNF를 위배하는 FD $X \rightarrow Y$가 있으면 Y만 따로 추출하여 별도의 relation으로 decompose.
Decomposition
${R_1, \ldots, R_n}$ is a decomposition of $R$ iff $R = \cup_{i=1}^{n}R_i$.
Lossless-Join Decomposition
For any legal relation, decomposed relation을 join했을 때 원래 relation과 같아짐.
${R_1, R_2}$ is a lossless decomposition if $(R_1 \cup R_2) \rightarrow R_1$ or $(R_1 \cup R_2) \rightarrow R_2$.
예를 들어, 한 relation이 다른 relation의 key를 attribute를 가지는 경우.
Dependency Preservation
Decompose시 각 relation의 FD의 union의 closure가 원래 relation의 FD closure와 같음
BCNF decomposition은 dependency를 preserve하지 못할 수도 있다.
Closure of Attribute Sets
For any closure of $\alpha$ under $F$ (denoted $\alpha^+$), $\beta \subseteq \alpha^+ \iff \alpha \rightarrow \beta \in F^+$.
closure = $\alpha$가 determinant인 가장 큰 attribute set.
알고리즘: $\alpha$에서 시작해서 repeat until no change
- $\forall \beta \rightarrow \gamma \in F$
- if $\beta \subseteq$ result, result $\rightarrow$ result $\cup \gamma$
사용처
- FD test: $\beta \subseteq \alpha^+ \iff \alpha \rightarrow \beta$
- superkey test: $R = \alpha^+ \iff R \subseteq \alpha^+ \iff \alpha \rightarrow R$
- compute $F^+$: $^\forall \gamma \subseteq R$, $\gamma \rightarrow S$ iff $S \subseteq \gamma^+$. $\gamma \rightarrow \gamma^+$라 자명
BCNF Test
- $\alpha \rightarrow \beta$ test = superkey test
- single relation test: test only $F$ rather than $F^+$ (증명 가능)
- 만약 decomposition이라면 dependency preservation이 안 될 수도 있어서 $F^+$ test
- BCNF decomposition할 때 처음 F^+를 기억해야 하는 이유
Denormalization
Peformance 때문에 normalize 안할 수도 있음 / 혹은 view 사용
- faster lookup
- extra space & extra execution time for updates
- extra code work & extra possibility of error
Exercise
Redundancy & Anomaly
- Redundancy: loan마다 branch 정보 (name, city, asset)이 중복됨
- Insert anomaly: loan이 없는 branch는 삽입할 수 없음
- Update anomaly: 한 튜플만 branch 정보를 바꾸면 consistency 깨짐
- Delete anomaly: L-17, L-14 삭제하면 Downtown branch 정보 사라짐
Armstrong
$R = (A, B, C, G, H, I)$, $F = {A \rightarrow B, A \rightarrow C, CG \rightarrow H, CG \rightarrow I, B \rightarrow H}$
$F^+ = {A \rightarrow ABCH, B \rightarrow BH, CG \rightarrow HI}^+$
BCNF
Lending-schema = (b-name, b-city, assets, c-name, loan#, amount) F = { b-name → b-city assets ; loan# → amount b-name }, Key = {c-name, loan#}
Branch = {b-name, b-city, assets} F = {b-name → b-city assets} Loan = {loan#, amount, b-name} F = {loan# → amount b-name} CustLoan = {c-name, loan#}
Discussion
Functional Dependency
Q. List all meaningful* FDs for the following relation schema Student.
Student(ID, Name, Address, Age, Dept, Dept_office, Dept_Chair, College, AdvisorID, AdvName, AdvDept)
A. ID → Name Address Age Dept AdvisorID ; Dept → Dept_office, Dept_Chair, College ; AdvisorID → AdvName AdvDept
BCNF
Q. Is the schema in BCNF? If not decompose the relation into a set of relation schemas, each of which is in BCNF.
A. Student (ID, Name, Address, Age, Dept, AdvisorID) Department (Dept, Dept_office, Dept_Chair) Advisor (AdvisorID, AdvName, AdvDept)
Functional Dependency
Q. Explain how functional dependencies can be used to indicate the following:
- a 1-to-1 relationship exists between students and advisors
- a many-to-1 relationship exists between students and advisors
A. 1-to-1 => {ID → AdvisorID, AdvisorID → ID} many-to-1 => {ID → AdvisorID}
BCNF
Q. Consider the following set of FDs on the relation schema r(A, B, C, D): A → BC, C → D. Give a BCNF decomposition of r.
A. R1 = (A, B, C), R2 = (C, D)
Lossless-Join Decomposition
Consider the following set F of FDs on the relation student(name, dept, college): F = {name → dept , dept → college}
Suppose student is decomposed into two relations stud1(name, dept) & stud2(name, college).
Q. Is this a lossless-join decomposition?
A. Yes
Q. Are relations stud1 and stud2 in BCNF?
A. Yes
Q. Is this a good decomposition?
A. No. It doesn’t preserve dependency dept → college.
Closure
Q. Consider the following set F of FDs on the relation R = (A, B, C, D). Compute F+.
F={ A → B; A → C; BC → D; DC → B}
A. A → R ; BC → BCD ; CD → BCD (trivial, reflexive + transitive 제외하고)
BCNF
Q. Show that if none of the dependencies in F causes a violation of BCNF for R, then none of the dependencies in F+ will cause a violation of BCNF either.
A. Armstrong으로 F에서 F+로 확장할 때 처음으로 나오는 violation을 생각해보자.
- reflexivity로 유도된거면 trivial => non-violation
- augmentation으로 유도된거면 determinant의 subset이 superkey => 모순
- transitivity로 유도된거면 transitive closure라서 determinant가 superkey => 모순