2015-01-10

學校課程/資料庫系統

資料庫複習 Ch.13

Chapter 13

Design Theory for Relational Databases

名詞定義

Functional Dependencies
- 定義
  X -> Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X , then they must also agree on all attributes in set Y .
- 特性
  - X->A₁A₂ … A_n holds for R exactly when each of X->A₁, X->A₂,…, X->A_n hold for R.
    例子 Drinkers(name, addr, beersLiked, manf, favBeer)
    FD : name -> addr favBeer & beersLiked -> manf
    name -> addr favBeer = name -> addr and name -> favBeer .
Keys of Relations
換句話說 superkey 可能是複合的好幾個 key 構成，因此 key 一定也是 superkey，不論 superkey 或者是 key 都可以決定所有的 attribute。特別注意，key 是最小單位，因此 key 的子集不會是 key。
- superkey
  K is a superkey for relation R if K functionally determines all of R.
- key
  K is a key for R if K is a superkey, but no proper subset of K is a superkey.
- 例子 Drinkers(name, addr, beersLiked, manf, favBeer)
  FD : name -> addr favBeer & beersLiked -> manf
  得到 {name, beersLiked} 是一個 superkey，發現 {name} 和 {beersLiked} 不是 superkey，得到 {name, beersLiked} 是一個 key。
Inference Test
如何找到 inferring FD？所謂的 inferring FD 就是藉由遞移關係得到的 FD，可以由原本集合中的幾個 FD 推斷出來，因此這些 inferring FD 不是必要存在的。那就先把 Y -> B 移除，再由剩下的 FD 組合，查看是否存在可以推論出 B 的可能，如果任何情況都推不出 B，則 Y -> B 不是 inferring FD。
We are given FD’s X1 -> A1, X2 -> A2,…, Xn -> An , and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FD’s.
例子 If A -> B and B -> C hold, surely A -> C.
Closure Test
遞移閉包測試。
An easier way to test is to compute the closure of Y , denoted Y+ .
- Basis: Y+ = Y.
- Induction: Look for an FD’s left side X that is a subset of the current Y+ . If the FD is X -> A, add A to Y+ .

Relational Schema Design

schema design 目標是避免異常、冗餘。

Goal of relational schema design is to avoid anomalies and redundancy.

Update anomaly :
更新異常，當一個事實改變時，只會發生一次修改，而不會發生很多次的修改。意即針對一個關聯屬性修改，最多更動一個 row (entity)，不會更動數行的 row。
one occurrence of a fact is changed, but not all occurrences.
Deletion anomaly :
刪除異常，當一個 row 移除時，確定合法的事實被移除。如果發生關聯上屬性移除，但該屬性本應該存在，卻遭到移除而不存在資料庫中，就這麼地憑空消失。
valid fact is lost when a tuple is deleted.
Boyce-Codd Normal Form
無論怎麼找 FD，都會發現到 FD 的 left hand side X 都是 superkey，那麼 R 就是一個 BCNF 。
We say a relation R is in BCNF if whenever X ->Y is a nontrivial FD that holds in R, X is a superkey.
- nontrivial means Y is not contained in X.
- a superkey is any superset of a key (not necessarily a proper superset).
- 例子 Drinkers(name, addr, beersLiked, manf, favBeer)
  FD : name -> addr favBeer & beersLiked -> manf
  In each FD, the left side is not a superkey. Shows Drinkers is not in BCNF.
Decomposition into BCNF

BCNF_Decompose(R, FD)
    find X s.t.: X != (X+) != [all attributes] // X is one of FD's left hand side
    if (not found) then “R is in BCNF”
    let Y = (X+) - X
    let Z = [all attributes] - (X+)
    decompose R into R1(X union Y) and R2(X union Z) 
    continue to decompose recursively R1 and R2

Third Normal Form

接續上面的 Decomposition，將會產生一些問題。拆分上不合乎常理，發生過度的拆分，這問題發生在於給定的 FD 格式。

舉個例子 (street address, city, zip code)

FD : AB -> C and C -> B.
Example: A = street address, B = city, C = zip code.
There are two keys, {A, B} and {A, C} .
C -> B is a BCNF violation, so we must decompose into AC, BC.
顯然地 AC 有很嚴重的問題，既不存在 A -> C 也不存在 C -> A。

避免此問題

An attribute is prime if it is a member of any key. 也就是 attribute 是某一 key 的 subset。
X -> A violates 3NF if and only if X is not a superkey, and also A is not prime.

接續上個例子 (street address, city, zip code)

There are two keys, {A, B} and {A, C} .
Thus A, B, and C are each prime.
Although C -> B violates BCNF, it does not violate 3NF.

properties of a decomposition

最後，來看看得到的 3NF 和 BCNF 給予什麼特性？

There are two important properties of a decomposition:

Lossless Join :
it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original.
Dependency Preservation :
it should be possible to check in the projected relations whether all the given FD’s are satisfied.

資料不失真，保證可以藉由投影、Join 將分解的 schema 組合成原本的資料。同時所有的相依關係仍然保留，可以把原本 FD 都滿足所有的 schema。

Read More +

2015-01-10

學校課程/資料庫系統

資料庫複習 Ch.12

Chapter 12

E/R Model

Entity-Relationship Model 實體關係模型

The E/R model allows us to sketch database schema designs.

協助描繪資料庫 schema 設計，藉由 E/R model 映射到資料庫的 Relation model。

名詞定義

Entity = “thing” or object.
實體，對應到現實生活中的名詞，例如蘋果 mac2013、微軟 win8、 … 等。單指一個、一件實體存在。
Entity set = collection of similar entities.
很多類似的實體產生的集合，例如計算機、員工 … 等實體集合。
Similar to a class in object-oriented languages.
Attribute = property of (the entities of) an entity set.
屬性，通常是一個數值的呈現，例如計算機的運算能力、員工薪資、 … 等。
Attributes are simple values, e.g. integers or character strings, not structs, sets, etc.

繪製 E/R Diagrams 時，呈現方式

Entity set = rectangle.
實體集合使用矩形圖表示
Attribute = oval,
屬性使用橢圓表示，並且連到實體集合，表示這些實體集合都具有這些屬性特徵。
with a line to the rectangle representing its entity set.
Relationships = diamond.
用菱形連接兩個以上的實體集合，表示實體集合之間的關係。
A relationship connects two or more entity sets, with lines to each of the entity sets involved.
Relationship Set
The “value” of a relationship is a relationship set, a set of tuples with one component for each related entity set.
For the relationship Sells , we might have a relationship set like:

Bar	Beer
Joe’s Bar	Bud
Joe’s Bar	Miller
Sue’s Bar	Bud
Sue’s Bar	Pete’s Ale
Sue’s Bar	Bud Lite

Multiway Relationships
Our three binary relationships Likes , Sells , and Frequents do not allow us to make this distinction.

Bar	Drinker	Beer
Joe’s Bar	Ann	Miller
Sue’s Bar	Ann	Bud
Sue’s Bar	Ann	Pete’s Ale
Joe’s Bar	Bob	Bud
Joe’s Bar	Bob	Miller
Joe’s Bar	Cal	Miller
Sue’s Bar	Cal	Bud Lite

Rounded arrow = “exactly one,”
實心箭頭表示確切對應一個關係，如果一個人會喜愛好幾種酒，其中最愛某一種。
i.e., each entity of the first set is related to exactly one entity of the target set.
Roles
Label the edges between the relationship and the entity set with names called roles.
例如丈夫、妻子之間的關係、夥伴之間的關係。

Husband	Wife
Bob	Ann
Joe	Sue

Buddy1	Buddy2
Bob	Ann
Joe	Sue

Subclass = special case = fewer entities = more properties.
從物件導向來說，一個繼承關係，得到的 Subclass 會繼承原本的 Class 的特徵、關係，有可能會具有額外的特徵。
Let us suppose that in addition to all the properties (attributes and relationships) of beers, ales also have the attribute color . Isa triangles ( is a 的三角形) indicate the subclass relationship. Point to the superclass (指向父類別).
- 比較 OO 和 E/R 對 subclass 的定義
  也就是說，OO 只會用 is a 描述一個 subclass 的繼承關係。但在 E/R model 中，isa 相當一個欄位，用來描述它是哪一個子類別，也就是說在紀錄時，如果它在某一個子類別時，那麼在父類別中也會記錄到它。
  - In OO, objects are in one class only. Subclasses inherit from superclasses.
  - In contrast, E/R entities have representatives in all subclasses to which they belong.
    - Rule : if entity e is represented in a subclass, then e is represented in the superclass (and recursively up the tree).
Key
Key 是一個 attribute 的集合構成，得到 f(Key) = at most one entity 。繪製時，使用底線 (underline) 於作為 key attribute 表示。
is a set of attributes for one entity set such that no two entities in this set agree on all the attributes of the key.
Weak Entity Sets
為了去辨識某個實體，必須倚靠一對一、或者多對一的關係，來確切地找到某個實體。
Entity set E is said to be weak if in order to identify entities of E uniquely, we need to follow one or more many-one relationships from E and include the key of the related entities from the connected entity sets.
- 例如 Player(name, nubmer) - Plays_on -> Team(name)。當知道 Team 是哪一個時，才能辨識出 Player，否則名稱跟號碼可能會在不同隊中出現重複的。因此 Player 就是一個 Weak Entity Sets 。
Weak Entity-Set Rules
A weak entity set has one or more many-one relationships to other (supporting) entity sets.
並不是每一個 relationship 都是 weak entity set 需要的支持。然而被需要的 relationship 必然擁有一個 rounded arrow 。
The key for a weak entity set is its own underlined attributes and the keys for the supporting entity sets.
例如在之前的例子，可以使用 (player) number 和 (team) name 作為 Player 的 key。

Design Techniques

Design Techniques
- Avoid redundancy.
  避免冗餘
- Limit the use of weak entity sets.
  限制 weak entity set 的使用
- Don’t use an entity set when an attribute will do.
  如果能使用一個 attribute 表示，則無需增加一個 entity set 單獨表示一個 attribute。

分別對應以下幾點。

Redundancy = saying the same thing in two (or more) different ways
存在用好幾種方法說一件事情，或者提及某個屬性。
Entity Sets Versus Attributes
對應實體集合的屬性，要符合以下兩點，屬性應為很多實體共同擁有的名稱，並且擁有至少一個 nokey 屬性，或者屬於 many-one 或 many-many 關係中，many 那一方的 entity set。
An entity set should satisfy at least one of the following conditions:
- It is more than the name of something; it has at least one nonkey attribute. or
- It is the “many” in a many-one or many-many relationship
Weak Entity Sets 使用須知
- 避免濫用，通常能夠創建一個 unique ID 來表示一個實體。
- 當真的找不到一個普遍性的 unique ID 時，才會逼不得已使用 Weak entity set.

From E/R Diagrams to Relations

關於對應的名詞如下，轉換時找 attribute 時，E/R Diagrams 中的 relationships 也會成為一個 relation ，attribute 的找法，可以從關聯的 entity set 的 key 或者是關係的定義本身。

Entity set -> relation.
Attributes -> attributes.
Relationships -> relations whose attributes are only:
- The keys of the connected entity sets.
  將相連的實體集合的 key 值都拿來作為 attribute。
- Attributes of the relationship itself.
Combining Relations
結合 Relation，減少 Relation 的個數，將多對一的 relationships 表單，而實體集合屬於 many 的那一方，則可以把 one 那一方結合在一起。切記一定要是多對一，如果是多對多則會造成冗餘發生。
- The relation for an entity-set E
- The relations for many-one relationships of which E is the “many.”
- Example: Drinkers(name, addr) and Favorite(drinker, beer) combine to make Drinker1(name, addr, favBeer).

Handling Weak Entity Sets
解決 weak entity set 轉化成 relation 時，必然要為它準備一個 complete key 來辨識不同實體於關連到不同的實體集合。如果一個 supporting relationship 不具有 attribute，而且屬於一個冗餘的紀錄，那麼把 relationship 另一方 “one” 的 entity set 的 key 值加入到 weak entity set 所創建的 relation。
從之前的例子，把球隊名稱 team(name) 加入到 Player(number, name) 中，變成 relation Player(number, team_name, name)，這是在之前的 Combining Relations 中，就已經得到的結果，因此就可以忽略。
- Relation for a weak entity set must include attributes for its complete key (including those belonging to other entity sets), as well as its own, nonkey attributes.
- A supporting relationship is redundant and yields no relation (unless it has attributes).
Subclasses: Three Approaches
- Object-oriented : One relation per subset of subclasses, with all relevant attributes.
  父類別和子類別分開，各自成為一個 relation。
- Use nulls : One relation; entities have NULL in attributes that don’t belong to them.
  把子類別的屬性上提，如果父類別不存在該屬性則填入 null。
- E/R style : One relation for each subclass:
  存在父類別和子類別的 relation，但是共同屬性都會在父類別中，子類別只保留 key 和新增加的屬性。
  - Key attribute(s).
  - Attributes of that subclass.

Read More +

2014-12-30

學校課程/資料庫系統

資料庫正規化 Database BCNF

Problem

stackoverflow : Decomposing a relation into BCNF.

資料庫講到正規化，針對 1NF, 2NF, 3NF 的做法。正規化的目的
－要避免資料重複或相互矛盾的情形，並使資料庫在使用時能更有效率、更容易維護。

關於 1NF, 2NF, 3NF 的細節操作就上網查吧。也就是在還沒正規化前，可能會遭遇到

修改一個 f(x) = y 值時，可能修改非常大量的 row，因為有可能 f(x, z) = y，無用相依關係的 z 很多。
刪除某個元素時，造成 y 值沒有被任何相依，導致 y 的資訊無法存入資料庫。 … 如此類推。

現在給定 relation 和 functional dependencies，正規化它們！

舉個例子：(直接從 stackoverflow 的例子來說)

Relation : R(A, C, B, D, E)
Functional dependencies (FD): A -> B, C -> D

Functional dependencies 簡單來說就是一個決定性的一對多、或者是一對一的相依關係，可以根據 LHS (left hand side) 決策 RHS (right hand side)，當然 function dependencies 的訊息只有部分，也暗示著具有遞移關係。因此例如 A->B, B->C 則可以推論出 A->C 。

定義 X+ 為 X 的遞移閉包，因此假設知道 A->B, B->C 則 {A}+ = {A, B, C} ，假設知道 AB->C 則 {A, B}+ = {A, B, C}

回到例子，每一步要找到違反 BCNF 規則 (violates BCNF) 的 R’ 進行拆分。怎麼樣才算違反 BCNF 規則？可以找到一個 X != X+ != [R’ all attributes] ，就可以對 R’ 進行拆分。

Step 1:
S = {ABCDE}       
    R' = ABCDE, 
    find {A} != {A, B} != {A, B, C, D, E}, 
    R1 = (X+) = {A, B}
    R2 = R' - ((X+) - X) = {A, C, D, E}
Step 2:
S = {ACDE, AB}
    R' = ACDE,
    find {C} != {C, D} != {A, C, D, E}, 
    R1 = (X+) = {C, D}
    R2 = R' - ((X+) - X) = {A, C, E}
Step 3:
S = {ACE, AB, CD}
    // NOT FOUND END.

在程式實作 (非人工) 的時候，如何找到 X 滿足上述的條件很重要，由於 X 是所有變數的 subset 情況，相當於 O(2ⁿ) ，很多項目是無用的。假如 R’ = ABE ，針對 X = AC 是無效的操作，基本上只要拿所有 function dependencies 的 X = LHS 進行測試即可？這貪心的做法應該是對的。

BCNF_Decompose(R)
    find X s.t.: X != (X+) != [all attributes]
    if (not found) then “R is in BCNF”
    let Y = (X+) - X
    let Z = [all attributes] - (X+)
    decompose R into R1(X union Y) and R2(X union Z) 
    continue to decompose recursively R1 and R2

說真的，老師上課講的練習答案還有錯，講算法時也不夠確切，真不知道教授怎麼教的，還是我沒認真上課。當教授發現全班答題有極端的相似錯誤時，到底是我們學不好？還是他沒教好？教授一點都不會反思。

Sample Input

5 2
A->B
C->D
6 3
AC->B
C->D
A->EF
6 6
AB->C
BC->AD
D->E
CF->B
BC->A
BC->D 
5 3
AB->C
DE->C
B->E

Sample Output

A->B
S = {AB, ACDE}
C->D
S = {AB, CD, ACE}
  
C->D
S = {CD, ABCEF}
A->EF
S = {ABC, CD, AEF}
  
AB->C
S = {ABCDE, ABF}
D->E
S = {ABCD, DE, ABF}
  
AB->C
S = {ABD, ABCE}
B->E
S = {ABC, ABD, BE}

Solution

#include <stdio.h>
#include <string.h>
#include <map>
#include <set>
#include <queue>
#include <iostream>
#include <sstream>
#include <algorithm>
#include <assert.h>
using namespace std;
#define INF 0x3f3f3f3f
// stackoverflow Database: Decomposing a relation into BCNF
void parsingStmt(string s, string &l, string &r) {
    size_t pos = s.find("->");
    l = s.substr(0, pos);
    r = s.substr(pos + 2);
}
int toBitmask(string s) {
    int ret = 0;
    for (int i = 0; i < s.length(); i++) {
        assert(s[i] >= 'A' && s[i] <= 'Z');
        ret |= 1<<(s[i] - 'A');
    }
    return ret;
}
void printBitmask(int state) {
    for (int i = 0; (1<<i) <= state; i++) {
        if ((state>>i)&1)
            putchar(i + 'A');
    }
}
vector<int> LHS, RHS;
int findFD(int i) {
    int ss = i;
    for (int j = 0; j < LHS.size(); j++) {
        if ((LHS[j]&i) == LHS[j])
            ss |= RHS[j];
    }
    if (ss != i)
    	return findFD(ss);
    return ss;
}
vector<int> decomposing(vector<int> state, int fdlhs, int fdrhs) {
    vector<int> ret;
    for (int i = 0; i < state.size(); i++) { // X->Y
        int X = fdlhs, Y = fdrhs&state[i];
        if (Y != state[i] && (X&state[i]) == X && Y && X && X != Y) {
            ret.push_back(state[i]^X^Y);
            ret.push_back(Y);
        } else {
            ret.push_back(state[i]);
        }
    }
    sort(ret.begin(), ret.end());
    ret.resize(unique(ret.begin(), ret.end()) - ret.begin());
    return ret;
}
void printResult(vector<int> &ret) {
    printf("S = {");
    for (int i = 0; i < ret.size(); i++) {
        printBitmask(ret[i]);
        if (i != ret.size() - 1) {
            printf(", ");
        }
    }
    puts("}");
}
int main() {
    int n, m;
    string s;
    while (cin >> n >> m) {
        LHS.clear(), RHS.clear();
        for (int i = 0; i < m; i++) {
            cin >> s;
            string lhs, rhs;
            parsingStmt(s, lhs, rhs);
            LHS.push_back(toBitmask(lhs));
            RHS.push_back(toBitmask(rhs));
        }
        vector<int> ret;
        int update = 0;
        ret.push_back((1<<n) - 1);
        do {
            update = (int) ret.size();
            for (int i = 0; i < m; i++) {
                ret = decomposing(ret, LHS[i], findFD(LHS[i]));
                if (ret.size() > update) {	
                    printBitmask(LHS[i]);
                	printf("->");
                	printBitmask(RHS[i]);
                	puts("");
                    printResult(ret);
                	break;
                }            
            }
            update = ret.size() > update;
        } while (update);
    }
    return 0;
}

Read More +

2014-12-17

學校課程/資料庫系統

資料庫系統－劉寶鈞

single value function 的意思？問問 劉寶鈞 老師吧！

課程心得

強烈建議遠離 此課程，分析如下描述， 劉寶鈞 老師強烈建議把定義背得一模一樣，最好不要翻譯，因為很難達到一樣的水平，也很難說好。資料庫歷史悠久，因此歷史課程會持續一陣子，課程內容並沒有相當多的實務經驗。老師秉持著數年專業，保證嚴格批閱所有考卷。

如果沒人抱怨，我來說說這鬼畜的經歷，絕不能讓其消散。

期中考分析

保證題目描述與錯字 100% 複製考卷

傳統檔案系統為何不能共享而資料庫如何做到可以共享？(10 分)
傳統檔案系統的資料存在各自的電腦中，而且格式不一，有相當大的重複資料，由於各自管理所需要的資料，更新時間也會不一致，因此在共享的支援相當不利，共享的結果不一定是最新、不能同時匹配在不同的電腦中的資料。
資料庫系統將資料集中化管理，收集到同一個系統中，並且藉由 SQL 中的 DML 支持使用者進行共享資料的存取、檢索，由系統管理同一時間多名使用者對資料的存取。
上述為零分作答，劉寶鈞老師說明若沒提到 SCHEME DATA 一律零分。
以 Relation Model 為例說明 Data Model 之三要素。(10 分)
略說－有資料表示法、資料的操作、約束條件，舉幾個例子便可完事。
此題作答還算正常，但是沒有舉例子大致上會被扣到慘不忍睹。
比較說明 DDL 及 DML。(10 分)
略說－Data Definition Language、Data Manipulation Language，分別是定義資料庫、資料表用的，另一個是對使用者詢問、操作資料。
此題作答還算正常，但是沒有舉例子大致上會被扣到慘不忍睹。
何謂 3-value logic ？並證明 P OR (NOT P) = 1 在 2-value logic 是成立的，但在 3-value logic is not always true。(10 分)
3-value logic 分別為 true, false, unknown。

在 2-value 中

P	NOT P	P OR (NOT P)
T	F	T
F	T	T

在 3-value 中

P	NOT P	P OR (NOT P)
T	F	T
F	T	T
unknown	unknown	unknown

用 0 表示 false, 1 表示 true, 1/2 表示 unknown，AND = MIN, OR = MAX, NOT = 1 - x。
=> P = 1/2, P OR (NOT P) = MAX(0.5, 1 - 0.5) = 0.5 = unknown。

unknown 不屬於 true，因此 3-value 在 P OR (NOT P) = 1 not always true。
以上作答零分，劉寶鈞老師在考卷上對 unknown 用紅筆寫了 What ? 一開始直接零分，之後才勉為其難拿到五分。投影片上面也這麼寫的，到底在 What ? 什麼勁，你自己拿出來講的東西上都這麼寫，寫下去分數還沒有？

寫出若含 NULL value 五個 single value function 的規則。(10 分)
WHAT the fuck about single value function ?
略列表 AND OR NOT EQUAL PRODUCT 的幾種情況。
上述為零分作答，劉寶鈞老師說明 single value function 的要寫出 aggregate function。我問老師「為什麼不直接寫 aggregate function？」老師回答道「就是故意不這麼寫。」
寫出 SQL query 之 SELECT, FROM, WHERE, GROUP BY, HAVING 之義意。(10 分)
錯字直接按表抄，這一題原本對於 HAVING 的解釋不夠完善，掛掉直接只剩下五分。WTF，五個定義錯一個就直接砍一半分數？對於 HAVING 只有寫提供 WHERE 無法進行 aggregation function 的判斷條件，必須與 GROUP BY 一起使用。這樣難道錯了嗎？GROUP BY 都解釋了，你還說我錯？

結語

我不是肚子裡的肥蟲，一定是我蠢得無可救藥，拿了不及格的成績？

很久沒有衝動想要殺人，這下子又開始想殺人。

助教不替老師改考卷，讓老師這樣改考卷行嗎？

我是個壞學生，這門課真的氣死人，出口罵髒話根本不足以洩憤。

Read More +