Relational Database Normalisation

database schemas

A schema is a collection of components and database objects under the control of a given database user. Each Oracle Portal application maps to an Oracle database schema. The schema stores the components owned by the application. In addition, the schema can store the database objects on which the components are based.

You can map an application to schemas currently in the Oracle database where Oracle Portal is installed. When you create the schema, you must select the Application Schema check box to designate that applications can be built in the schema.

A Short Oracle Tutorial For Beginners (ctd)

What is a relational database?

As mentioned before, a relational database is based on the separation and independence of the the logical and physical representations of the data. This provides enormous flexibility and means you can store the data physically in any way without affecting how the data is presented to the end user. The separation of physical and logical layers means that you can change either layer without affecting the other.

A relational database can be regarded as a set of 2-dimensional tables which are known as “relations” in relational database theory. Each table has rows (“tuples”) and columns (“domains”). The relationships between the tables is defined by one table having a column with the same meaning (but not necessarily value) as a column in another table.

For example consider a database with just 2 tables :

emp(id number
,name varchar2(30)
,job_title varchar2(20)
,dept_id number)

holding employee information and

dept(id number
,name varchar2(30))

holding department information

There is an implied relationship between these tables because emp has a column called dept_id which is the same as the id column in dept. In Oracle this is usually implemented by what’s called a foreign-key relationship which prevents values being stored that are not present in the referenced table.

Relational databases obtain their flexibility from being based on set theory (also known as relational calculus) which enables sets or relations to be combined in various ways, including:

  • join/intersection
  • union (i.e. the sum of 2 sets);
  • exclusive “OR” (i.e. the difference between 2 sets)
  • and outer-join which is a combination of intersecting and exclusive or ing.

The intersection or join between 2 sets (in this case, tables) produces only those elements that exist in both sets.

Therefore, if we join Emp and Dept on department id, we will be left with only those employees who work for a department that is in the dept table and only those departments which have employees who are in the emp table.

The union produces the sum of the tables – meaning all records in Emp and all records in Dept. and this may be with or without duplicates.

Let’s use the following data to provide specific examples:

Emp

Id

Name

Dept Id

1

Bill Smith

3

2

Mike Lewis

2

3

Ray Charles

3

4

Andy Mallory

4

5

Mandy Randall

6

6

Allison White

1

Dept

Id

Name

1

HR

2

IT

3

Marketing

4

Sales

5

Finance

The join of Emp and Dept. on the department id would produce the following result:

Emp.Id

Emp.Name

Dept.Id

Dept.Name

1

Bill Smith

3

Marketing

2

Mike Lewis

2

IT

3

Ray Charles

3

Marketing

4

Andy Mallory

4

Sales

6

Allison White

1

HR

The union of Emp and Dept. would produce the following results

Id

Name

1

Bill Smith

2

Mike Lewis

3

Ray Charles

4

Andy Mallory

5

Mandy Randall

1

HR

2

IT

3

Marketing

4

Sales

5

Finance

The union operator is only allowed when the number and data types of the columns in the 2 sets are the same. It is not normally be used to combine sub sections from one or more tables rather than entire tables.

There are other operators and variations but there isn’t the space or the time to provide full details in this short Oracle tutorial.

The later versions of Oracle (Oracle 8, Oracle 8i, Oracle 9i and Oracle 10g) are support both relational and object-oriented features. The relational features are more prominent at the moment, but this is beginning to change. In this context an object has both attributes and methods (programs stored with the object that performs a certain action or task) and in a true object-oriented database would belong to a class and would allow multilevel inheritance.

Relational Database Normalisation


Normalisation is the process of using a wide table with many columns and few rows and reconstructing that table into several subset tables with fewer columns and more rows. The objective is to use storage space efficiently, eliminate redundant data, reduce or eliminate inconsistent data and ease future maintenance. The most important rule of normalisation is that the normalised data must be reconstructable into the original unnormalised or flat spreadsheet-like view of the data.

Forms of Normalisation

The forms of normalisation are called normal forms.

· First Normal Form – removal of repeating groups.

· Second Normal Form – no non-key attributes must depend on a portion of the primary key.

· Third Normal Form – no attributes must depend on other non-key attributes.

A database must be in first normal form to be in second normal form and in second normal form to be in third normal form. There are additional normal forms but these additional normal forms are rarely applied. In fact it is sometimes best not to apply third and even second normal forms in order to preserve processing speed and efficiency. The concept of a data warehouse is effectively a partially denormalised database. In fact the process of denormalisation into data warehouses exhibits where relational database technology falls short of practicality. This is one possible area of implementation where object databases may become more widely used in the future.

First Normal Form

By definition, First Normal Form eliminates repeating groups. What this means is that a separate table must be made for each set of related attributes and give each table its own primary key. The best way to explain this is by example. Let us say we have a customer database where customers make orders. In a spreadsheet we would have the customer name and address repeated for every order. Application of first normal form would divide the customers and orders into two separate entities or tables. The customer table would contain customer details without order details and the order table would contain order details with only a reference to the appropriate customer.

Second Normal Form

Second Normal Form eliminates redundant data. Thus if an attribute relies on only part of a multi-valued key, that attribute must be removed to a separate table. More specifically the Second normal form requires that no non-key attributes are dependant upon a portion of the primary key. A primary key uniquely identifies a instance within an entity. Second normal form applies to tables where constituents of the primary key effectively refer to repeating groups within the same table. These duplications should be removed to an additional table. For instance, if a student takes a course then the course that the student is taking is uniquely identifed by the student and the course as the primary key. Within this entitiy we also have descriptions of the course. Therefore the course decriptions will make up the contents of the additional entity and the descriptive details of the course would be removed from the table containing the student details. The result would be two tables, one with student details and the other with course details.

Third Normal Form

Third Normal Form eliminates columns not dependant on a table’s primary key. This translates to the fact that if an attribute does not contribute to the description of a key then it must be moved to another table. Third normal form requires that all columns in a table contain data about the entity that is defined by the primary key. In other words each entity should have attributes applicable to itself only.

Fourth and Fifth Normal Forms

Fourth and Fifth Normal Forms are rarely implemented.

Fourth Normal Form

Fourth Normal Form eliminates multiple independant relationships. Thus no table can have more than a single many-to-one or many-to-many relationships which are not directly related.

Fifth Normal Form

Fifth Normal Form requires isolation of logically related many-to-many relationships.

The Truth About Normalisation

1. Using normalisation extensively can seriously affect database performance. What is theoretically elegant and mathematically correct is not always acceptable or practical in a commercial environment.

2. Due to the fact that I have personally had many years experience designing and constructing relational databases I tend to ignore the seperate steps of first, second and third normal form. I usually simply jump to a combination of all three by using all three at once. Normalisation is usually best explained by example. Examples will ensue in following pages.


request new questions/article: send email to freequestionbank at gmail dot com