Keys
- A Key is one or more columns that is used to identify particular rows in a relation.
- Keys with two or more columns are called composite keys
- Superkey
- From Connolly
- A superkey is an attribute, or set of attributes that uniquely identify a tuple within a relation.
- SKU, SKU_Description, (SKU, SKU_Description), (SKU, Quantity) ... are all superkeys
- Candidate Keys
- A candidate key is a key which determines all of the columns in a relation.
- SKU_DATA has two candidate keys, SKU, SKU_Description
- ORDER_ITEM has just one, (OrderNumber, SKU),
- (Quantity, Price) does not determine the entire table, just ExtendedPrice
- Candidate keys are defined in Connolly as superkeys where no proper subset is a superkey within the relation.
- Candidate keys K of a relation R have two properties
- Uniqueness - each tuple of R is uniquely identified by a value of K
- irreducibility - no proper subset of K has the uniqueness property.
- Connolly states "Identifying a candidate key requires that we know the 'real world' meaning of the attribute(s) involved so that we can decide whether duplicates are possible."
- By design, we know SKUs are unique.
- But we may have two SKU_Descriptions which are not.
- Primary Keys
- A primary key is a candidate key that is selected to identify tuples uniquely within a relation.
- The DBMS uses primary keys to organize data in a relation.
- In the worst case, the entire set of attributes is a primary key
- Generally the smaller the better for primary keys.
- So SKU would be preferred to SKU_Description,
- The uniqueness of a primary key is a entity integrity constraint
- Alternate Keys
- Candidate keys which are not selected as the primary key are known as alternate keys
- It is sometimes useful to note alternate keys
- We will do this in the next chapters.
- Surrogate Keys
- A surrogate key is an artificial column that is added to a relation to serve as the primary key.
- This is used in the case of complex or inappropriate candidate keys.
- Consider STUDENT(FirstName, LastName, DOB, EMail, AdmitDate)
- There is no determinant here.
- So a StudentNumber field would be introduced.
- This would be a surrogate key.
- This can (and should) be generated by the DBMS
- Probably an integer value, which will allow for fast indexing.
- Surrogate keys have several problems
- They are artificial and have no meaning
- They may be different in different databases for the same items.
- The Social Security Number problem. They may be treated as something with meaning, and used outside of the context.
- Since they have no meaning, data will not be organized in any "useful" way, ie SKU will organize data along SKUs, which is what is stored in the data.
- Subtle information security: difference in keys can indicate the number of transactions which have occurred.
- Database users might make assumptions based on key values (Higher keys mean later inserts ....).
- Foreign Keys
- An attribute or set of attributes, within one relation that matches the candidate key of some (possibly same) relation.
- It represents the relationship between tuples in two relations.
- A referential integrity constraint forces the foreign key to match the corresponding value.
- He notes on page 131
- There are three constraints mentions so far
- Domain integrity constraint
- entity integrity constraint
- Referential integrity constraint
- These three create database integrity which means that data in a database is useful and meaningful.