Keys

A Key is one or more columns that is used to identify particular rows in a relation.
Keys with two or more columns are called composite keys
Superkey
- From Connolly
- A superkey is an attribute, or set of attributes that uniquely identify a tuple within a relation.
- SKU, SKU_Description, (SKU, SKU_Description), (SKU, Quantity) ... are all superkeys
Candidate Keys
- A candidate key is a key which determines all of the columns in a relation.
  - SKU_DATA has two candidate keys, SKU, SKU_Description
  - ORDER_ITEM has just one, (OrderNumber, SKU),
    - (Quantity, Price) does not determine the entire table, just ExtendedPrice
  - Candidate keys are defined in Connolly as superkeys where no proper subset is a superkey within the relation.
- Candidate keys K of a relation R have two properties
  - Uniqueness - each tuple of R is uniquely identified by a value of K
  - irreducibility - no proper subset of K has the uniqueness property.
- Connolly states "Identifying a candidate key requires that we know the 'real world' meaning of the attribute(s) involved so that we can decide whether duplicates are possible."
- By design, we know SKUs are unique.
- But we may have two SKU_Descriptions which are not.
Primary Keys
- A primary key is a candidate key that is selected to identify tuples uniquely within a relation.
- The DBMS uses primary keys to organize data in a relation.
- In the worst case, the entire set of attributes is a primary key
- Generally the smaller the better for primary keys.
- So SKU would be preferred to SKU_Description,
- The uniqueness of a primary key is a entity integrity constraint
Alternate Keys
- Candidate keys which are not selected as the primary key are known as alternate keys
- It is sometimes useful to note alternate keys
- We will do this in the next chapters.
Surrogate Keys
- A surrogate key is an artificial column that is added to a relation to serve as the primary key.
- This is used in the case of complex or inappropriate candidate keys.
- Consider STUDENT(FirstName, LastName, DOB, EMail, AdmitDate)
  - There is no determinant here.
  - So a StudentNumber field would be introduced.
  - This would be a surrogate key.
- This can (and should) be generated by the DBMS
- Probably an integer value, which will allow for fast indexing.
- Surrogate keys have several problems
  - They are artificial and have no meaning
  - They may be different in different databases for the same items.
  - The Social Security Number problem. They may be treated as something with meaning, and used outside of the context.
  - Since they have no meaning, data will not be organized in any "useful" way, ie SKU will organize data along SKUs, which is what is stored in the data.
  - Subtle information security: difference in keys can indicate the number of transactions which have occurred.
  - Database users might make assumptions based on key values (Higher keys mean later inserts ....).
Foreign Keys
- An attribute or set of attributes, within one relation that matches the candidate key of some (possibly same) relation.
- It represents the relationship between tuples in two relations.
- A referential integrity constraint forces the foreign key to match the corresponding value.
He notes on page 131
- There are three constraints mentions so far
  - Domain integrity constraint
  - entity integrity constraint
  - Referential integrity constraint
- These three create database integrity which means that data in a database is useful and meaningful.