Arg81
Well-known member
First off - credit for this should go to PC Pro as I've reworded one of their articles.
After replying to, or seeing a lot of the same post here regarding database design, I thought that this may come in handy to readers, and if so, could be made a sticky.
Using a single-table database instead of a relational database
1) Atomic Data
Atomic data is where you try and split the data into as many columns as you can.
For example - Customer Name. In the screenshot below you can see that the first and last names are held in one column.
This isn't very good because if you want to create a query ordered by Surname, you can't use a simple query. Instead you would need to create a query that dismantles the Customer name, manipulate the data so you are left with only the surname, and then sort by this.
...Not very easy, not very user friendly, not very good.
Another error in the screenshot is that the designer is unsure how many products can be ordered. There is no minimum and there is no maximum. Because the table only allows you to record 2 product names and product quantity, if a customer orders 3 products, you have to insert another row into your main table to allow for this.
Now this causes major issues. To find out how many orders have been placed for a customer, a simple query will not suffice - counting the rows will not give the correct answer, as 2 rows in the example here equate to the same orderNo.
Also it's impossible to query how many "bolts" have been sold, because in the example there are 2 product columns, and bolt could exist in either column. You would have to create an advanced query to search multipe columns.
Normalisation
What is normalisation?
Well, it means that data pertaining to distinctly different objects should be stored in seperate tables, but in the example of the single table model, we have customer data alongside product data.
Well what does it mean?
Basically, you end up with large amounts of duplicate data, which can cause problems down the line.
For instance; Each customer name must be entered every time that customer places an order, and product names must be typed in again and again.
Another example, you want to be able to display customers. OK, so having duplicate names in fields you can run a "SELECT DISTINCT" query....but what if someone has taken shortcuts, and instead of writing Gordon Green, instead writes G Green. Are these 2 people actually the same person? Or is it coincidence they share the same surname and starting first name letter?
what about typo's? In the example below, is Anna Langsdon the same person as Anne Langsdon just spelt incorrectly? You would assume so because they share the same telephone number...
Here, your distinct query will bring back incorrect results to the user.
How to solve these problems
simple - do not use a single table database!!! Microsoft Access, SQL, Oracle and the rest are designed as RDS (Relational Database System) which means you can link (relate) multiple tables to one-another.
In our example database, you would create a table for Customers, a table for Products, a table for Orders and a final table for OrderDetails (explained later). Below is an example setup
Customer table
CustomerID | First Name | Surname | Address1 | TownCity | PostCode
Product table
ProductID | ProductName
Order table
OrderID | CustomerID | OrderDate
OrderDetail table
OrderID | ProductID | Quantity
So what is the orderDetail table for? Well, after normalising the data and creating our relations, we see that ONE Customer can make MANY Orders of MANY Products.
This is called MANY-to-MANY relationship, and requires 2 ONE-to-MANY relationships to break it up - we basically need a "middle man" table sat that then links ONE order to MANY orderDetails and ONE Product to MANY orderDetails.
The final relationships now become apparent:
Customer Table (customerID) one --------- many (customerID) Order Table
^^ one customer can have many orders.
Order Table (OrderID) one ---------- many (OrderID) OrderDetail Table
^^ one order can have many details
Product Table (ProductID) one ----------- many (ProductID) OrderDetail Table
^^ one product can appear in many details
Primary Keys & Foreign Keys
The primary key in each table is it's "unique identifier" The PK for customer's is CustomerID, the PK for Orders is OrderID and the PK for Products is ProductID. The OrderDetails table is a "linking" table as described earlier to split a many-to-many relationship, and therefore doesn't require a Primary Key.
However, it does have 2 Foreign Keys. A FK is a column that links the table to another table with the same column name.
As you can see from the relations above, the customer table CustomerID is the Primary Key, whilst the CustomerID in the Order table is the Foreign key.
Once you have your relations in place, you can then add Referencial Integrity
This is applied to the relational "joins" and ensures that the database engine will automatically check every entry made into a foreign key column to ensure that it points to a valid primary key.
For example: an OrderDetail can not have an OrderID = 1, ProductID = 6 when ProductID 6 doesn't even exist in the Product table.
Summary
The relational database model offers major advantages. This include:
- The data is stored as efficiently as possible. I.E By using multiple tables and linking them, each piece of data is stored once and only once
- Integrity of data is protected. As said earlier, an order of ProductID 6 cannot take place if ProductID 6 doesn't appear in the Product table!
- All queries to the database will bring back accurate and valid results.
As I say, a lot of the above is from an article that appears in this month's PC Pro.
I've tried to reword it and add to it so that a lot comes across understandably to the reader of this post.
I hope that people can take this post to help them develop as a Database Administrator / Programmer, and that others will pass this on to those who need to understand the basics.
There are many articles available on the net or in books - the major thing you need to remember is to always always "normalise" your data first, so that you remove duplication, and then use relational tables to link your data together.
Happy database creating / programming!!!
After replying to, or seeing a lot of the same post here regarding database design, I thought that this may come in handy to readers, and if so, could be made a sticky.
Using a single-table database instead of a relational database
1) Atomic Data
Atomic data is where you try and split the data into as many columns as you can.
For example - Customer Name. In the screenshot below you can see that the first and last names are held in one column.
This isn't very good because if you want to create a query ordered by Surname, you can't use a simple query. Instead you would need to create a query that dismantles the Customer name, manipulate the data so you are left with only the surname, and then sort by this.
...Not very easy, not very user friendly, not very good.
Another error in the screenshot is that the designer is unsure how many products can be ordered. There is no minimum and there is no maximum. Because the table only allows you to record 2 product names and product quantity, if a customer orders 3 products, you have to insert another row into your main table to allow for this.
Now this causes major issues. To find out how many orders have been placed for a customer, a simple query will not suffice - counting the rows will not give the correct answer, as 2 rows in the example here equate to the same orderNo.
Also it's impossible to query how many "bolts" have been sold, because in the example there are 2 product columns, and bolt could exist in either column. You would have to create an advanced query to search multipe columns.
Normalisation
What is normalisation?
Well, it means that data pertaining to distinctly different objects should be stored in seperate tables, but in the example of the single table model, we have customer data alongside product data.
Well what does it mean?
Basically, you end up with large amounts of duplicate data, which can cause problems down the line.
For instance; Each customer name must be entered every time that customer places an order, and product names must be typed in again and again.
Another example, you want to be able to display customers. OK, so having duplicate names in fields you can run a "SELECT DISTINCT" query....but what if someone has taken shortcuts, and instead of writing Gordon Green, instead writes G Green. Are these 2 people actually the same person? Or is it coincidence they share the same surname and starting first name letter?
what about typo's? In the example below, is Anna Langsdon the same person as Anne Langsdon just spelt incorrectly? You would assume so because they share the same telephone number...
Here, your distinct query will bring back incorrect results to the user.
How to solve these problems
simple - do not use a single table database!!! Microsoft Access, SQL, Oracle and the rest are designed as RDS (Relational Database System) which means you can link (relate) multiple tables to one-another.
In our example database, you would create a table for Customers, a table for Products, a table for Orders and a final table for OrderDetails (explained later). Below is an example setup
Customer table
CustomerID | First Name | Surname | Address1 | TownCity | PostCode
Product table
ProductID | ProductName
Order table
OrderID | CustomerID | OrderDate
OrderDetail table
OrderID | ProductID | Quantity
So what is the orderDetail table for? Well, after normalising the data and creating our relations, we see that ONE Customer can make MANY Orders of MANY Products.
This is called MANY-to-MANY relationship, and requires 2 ONE-to-MANY relationships to break it up - we basically need a "middle man" table sat that then links ONE order to MANY orderDetails and ONE Product to MANY orderDetails.
The final relationships now become apparent:
Customer Table (customerID) one --------- many (customerID) Order Table
^^ one customer can have many orders.
Order Table (OrderID) one ---------- many (OrderID) OrderDetail Table
^^ one order can have many details
Product Table (ProductID) one ----------- many (ProductID) OrderDetail Table
^^ one product can appear in many details
Primary Keys & Foreign Keys
The primary key in each table is it's "unique identifier" The PK for customer's is CustomerID, the PK for Orders is OrderID and the PK for Products is ProductID. The OrderDetails table is a "linking" table as described earlier to split a many-to-many relationship, and therefore doesn't require a Primary Key.
However, it does have 2 Foreign Keys. A FK is a column that links the table to another table with the same column name.
As you can see from the relations above, the customer table CustomerID is the Primary Key, whilst the CustomerID in the Order table is the Foreign key.
Once you have your relations in place, you can then add Referencial Integrity
This is applied to the relational "joins" and ensures that the database engine will automatically check every entry made into a foreign key column to ensure that it points to a valid primary key.
For example: an OrderDetail can not have an OrderID = 1, ProductID = 6 when ProductID 6 doesn't even exist in the Product table.
Summary
The relational database model offers major advantages. This include:
- The data is stored as efficiently as possible. I.E By using multiple tables and linking them, each piece of data is stored once and only once
- Integrity of data is protected. As said earlier, an order of ProductID 6 cannot take place if ProductID 6 doesn't appear in the Product table!
- All queries to the database will bring back accurate and valid results.
As I say, a lot of the above is from an article that appears in this month's PC Pro.
I've tried to reword it and add to it so that a lot comes across understandably to the reader of this post.
I hope that people can take this post to help them develop as a Database Administrator / Programmer, and that others will pass this on to those who need to understand the basics.
There are many articles available on the net or in books - the major thing you need to remember is to always always "normalise" your data first, so that you remove duplication, and then use relational tables to link your data together.
Happy database creating / programming!!!