The ER model describes data as entities, relationships and attributes. The basic concept introduced in ER model is an entity, which is a thing or object in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house or employee) or it may be an object with a conceptual existence (for instance, a company, a job or a university course).
Each entity has attributes - the particular properties that describe it. For example, an EMPLOYEE entity may be described by the employee's name, age, address, job and salary. Each entity (row in a table) will have a value for each of its attributes.
Attribute Types:
Attributes that define an entity may be classified into any one of the following types:
- Simple or composite
- Single-valued or multi-valued
- Stored or Derived
Composite attributes are attributes that can be sub-divided into multiple simple attributes. For example, the Address attribute of the EMPLOYEE entity can be subdivided into Stree_Address, City, State and Zip. On the other hand, attributes that can be divisible further are called simple or atomic attributes.
Single Vs Multi-valued Attributes:
Attributes that can hold single value for describing an entity are called single-valued attributes. Most of the attributes of an entity are of this type. For example, age is a single-valued attribute of a person. In some cases, an attribute can have a set of values for the same entity - for instance, qualification attribute of a person.
One person may not have any qualification, another person may have obtained one degree, and a third person may have two or more degrees; therefore, different people can have different number of values for the qualification attribute. Such attributes are called multivalued attributes. A multivalued attributes may have lower and upper bounds to constrain the number of values allowed for each individual entity.
Stored Vs Derived Attributes:
In some cases, two (or more) attribute values are related - for example, the age and birth_date of a person. For a particular person entity, the value of age can be determined from the current (today's) date and the value of that person's birth_date. The age attribute is hence called a derived attribute and is said to be derivable from the birth_date attribute, which is called a stored attribute.
Some attribute values can be derived from related entities; for example, an attriute number_of_employees of a DEPARTMENT entity can be derived by counting the number of employees related to (working for) that department.
Null Values:
In some cases, a particular entity may not have an applicable value for an attribute. For example, the Apartment_Number attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a College_Degree attribute applies only to people with college degrees. For such situations, a special value called NULL is introduced. The meaning of the value NULL is not applicable in this context.
An address of a single-family home would have NULL for its Apartment_Number attribute, and a person with no college degree would have NULL for College_Degrees. NULL can also be used if we do not know the value of an attribute for a particular entity. For example, if we do not know the home phone number of a person, then it indicates that the data is unknown.
Entity Types and Keys
A database usually contains group of entities that are related to each other. Each entity in a database refers to the table used for storing the details about a particular entity in the real world. An entity is described using a name, a set of attributes and one or more constrains to be imposed on the entity.
An entity type refers to the schema used for describing the structure of an entity in a database. It defines a collection (or set) of entities that have the same attributes. For example, the employee entity in a company database has the schema that defines the structure for storing the information of hundreds of employees. Each employee record in the employee table share the same set of attributes, but each entity (row) has its own value(s) for each attribute.
Keys are nothing but attributes (one or more) of an entity that distinguishes tuples within a given relation based on the values that are stored in them collectively. That is, the values of attributes forming the key uniquely identify the tuple in a relation. In other words, no two tuples in a realation are allowed to have exactly the same value for all attributes used for forming the key.
A superkey is a set of one or more attributes that, taken collectively, allow us to identify uniquely a tuple in the relation. For example, the customer_id attribute of the relation customer is sufficient to distinguish one customer tuple from another. Thus customer_id is a super key. Similarly, the combination of customer_name and customer_id is a superkey for the relation customer. But, the customer_name attribute of customer is not a super key, because several people might have the same name.
Candidate key is a super key for which no proper subset is a super key. A candidate key is also known as minimal super key. It is possible that several distinct sets of attributes could serve as a candidate key. For instance, in customer relation, both {customer_id} and {customer_name, customer_street} are candidate keys. Where as {customer_id, customer_name} is not a candidate key, since the attribute customer_id along is a candidate key.
Primary key is a candidate key that is chosen by the database designer as the principal means of idneityfing tuples within a relation. A key (whether primary, candidate or super) is a property of a relation that imposes certain constraint on the entire realtaion. It makes sure that no two individual tuples have the same value on the key attributes at the same time.
The primary key must be chosen such that its attribute values are never, or very rarely, changed. For instance, the address field of a person should not be part of the primary key, since it is likely to change. Social-security numbers (SSN), on the other hand, are guaranteed to never change. Similarly, Unique Identifiers (UID) generated by enterprises may be used as a primary key of a person within that particular organizataion.
A foreign key in a relation is a super key of antoher realtaion with the help of which realtionship among the tuples of those two realations can be specified. For instance, a relation schema, say r1, may include among its attributes the primary key of antoher relation schema, say r2. This attribute is called a foreign key from r1, referencing r2.
Here is an example of foreign key: the attribute branch_name of Account_schema is a foreign key from Account_schema referencing Bracnch_schema, since branch_name is the primary key of Branch_schema.
Schema Diagram:
It is customary to list the primary key attributes of a relation schema before the other attributes; for example, the branch_name attribute of Branch_schema is listed first, since it is the primary key.
A database schema, along with primary key and foreign key dependencies, can be depicted pictorially, by schema diagram. Here is a schema diagram for the banking enterprise:
In a schema diagram, each relation appears as a box, with the attributes listed inside it and the realation name above it. If there are primary attributes, a horizontal line crosses the box, with the primary attributes listed above the line. Foreign key dependencies appear as arrows from the foreign key attributes of the referencing relation to the primary key of the referenced relation.