How to make a bad database

Arg81

Well-known member
Joined
Mar 11, 2005
Messages
949
Location
Midlands, UK
Programming Experience
1-3
First off - credit for this should go to PC Pro as I've reworded one of their articles.

After replying to, or seeing a lot of the same post here regarding database design, I thought that this may come in handy to readers, and if so, could be made a sticky.

Using a single-table database instead of a relational database

1) Atomic Data
Atomic data is where you try and split the data into as many columns as you can.
For example - Customer Name. In the screenshot below you can see that the first and last names are held in one column.
This isn't very good because if you want to create a query ordered by Surname, you can't use a simple query. Instead you would need to create a query that dismantles the Customer name, manipulate the data so you are left with only the surname, and then sort by this.
...Not very easy, not very user friendly, not very good.

Another error in the screenshot is that the designer is unsure how many products can be ordered. There is no minimum and there is no maximum. Because the table only allows you to record 2 product names and product quantity, if a customer orders 3 products, you have to insert another row into your main table to allow for this.

Now this causes major issues. To find out how many orders have been placed for a customer, a simple query will not suffice - counting the rows will not give the correct answer, as 2 rows in the example here equate to the same orderNo.

Also it's impossible to query how many "bolts" have been sold, because in the example there are 2 product columns, and bolt could exist in either column. You would have to create an advanced query to search multipe columns.

Normalisation
What is normalisation?
Well, it means that data pertaining to distinctly different objects should be stored in seperate tables, but in the example of the single table model, we have customer data alongside product data.

Well what does it mean?
Basically, you end up with large amounts of duplicate data, which can cause problems down the line.
For instance; Each customer name must be entered every time that customer places an order, and product names must be typed in again and again.
Another example, you want to be able to display customers. OK, so having duplicate names in fields you can run a "SELECT DISTINCT" query....but what if someone has taken shortcuts, and instead of writing Gordon Green, instead writes G Green. Are these 2 people actually the same person? Or is it coincidence they share the same surname and starting first name letter?
what about typo's? In the example below, is Anna Langsdon the same person as Anne Langsdon just spelt incorrectly? You would assume so because they share the same telephone number...
Here, your distinct query will bring back incorrect results to the user.


How to solve these problems
simple - do not use a single table database!!! Microsoft Access, SQL, Oracle and the rest are designed as RDS (Relational Database System) which means you can link (relate) multiple tables to one-another.

In our example database, you would create a table for Customers, a table for Products, a table for Orders and a final table for OrderDetails (explained later). Below is an example setup

Customer table
CustomerID | First Name | Surname | Address1 | TownCity | PostCode

Product table
ProductID | ProductName

Order table
OrderID | CustomerID | OrderDate

OrderDetail table
OrderID | ProductID | Quantity


So what is the orderDetail table for? Well, after normalising the data and creating our relations, we see that ONE Customer can make MANY Orders of MANY Products.
This is called MANY-to-MANY relationship, and requires 2 ONE-to-MANY relationships to break it up - we basically need a "middle man" table sat that then links ONE order to MANY orderDetails and ONE Product to MANY orderDetails.

The final relationships now become apparent:

Customer Table (customerID) one --------- many (customerID) Order Table
^^ one customer can have many orders.

Order Table (OrderID) one ---------- many (OrderID) OrderDetail Table
^^ one order can have many details

Product Table (ProductID) one ----------- many (ProductID) OrderDetail Table
^^ one product can appear in many details

Primary Keys & Foreign Keys
The primary key in each table is it's "unique identifier" The PK for customer's is CustomerID, the PK for Orders is OrderID and the PK for Products is ProductID. The OrderDetails table is a "linking" table as described earlier to split a many-to-many relationship, and therefore doesn't require a Primary Key.

However, it does have 2 Foreign Keys. A FK is a column that links the table to another table with the same column name.
As you can see from the relations above, the customer table CustomerID is the Primary Key, whilst the CustomerID in the Order table is the Foreign key.

Once you have your relations in place, you can then add Referencial Integrity
This is applied to the relational "joins" and ensures that the database engine will automatically check every entry made into a foreign key column to ensure that it points to a valid primary key.
For example: an OrderDetail can not have an OrderID = 1, ProductID = 6 when ProductID 6 doesn't even exist in the Product table.


Summary

The relational database model offers major advantages. This include:

- The data is stored as efficiently as possible. I.E By using multiple tables and linking them, each piece of data is stored once and only once

- Integrity of data is protected. As said earlier, an order of ProductID 6 cannot take place if ProductID 6 doesn't appear in the Product table!

- All queries to the database will bring back accurate and valid results.


As I say, a lot of the above is from an article that appears in this month's PC Pro.
I've tried to reword it and add to it so that a lot comes across understandably to the reader of this post.

I hope that people can take this post to help them develop as a Database Administrator / Programmer, and that others will pass this on to those who need to understand the basics.
There are many articles available on the net or in books - the major thing you need to remember is to always always "normalise" your data first, so that you remove duplication, and then use relational tables to link your data together.

Happy database creating / programming!!! :cool:
 

Attachments

  • tempDB.jpg
    tempDB.jpg
    69.1 KB · Views: 37
Its funny.. it sounds almost laughable that people wouldnt know this, but we were having argument #2 last night at work...

The idea was to store all the letters that need printing and merging, in one table, and print them at the end of the day.

We got into discussion about how to handle the variable number of fields on a letter - some letters have just a name and address, others have extra info like card numbers and due amounts on.

I said to handle this, one table with (at least) 3 columns would be needed to store the merge data - letter id, field name and field value - would be best, as this table can record multiple fields per letter (multiple rows per letter)

The other db guy said that one table with as many columns as there were fields on the letter with the highest field count, would be required and not all of them would be filled in, one row per letter


We debated for some time, as i was making the stand that if we did it that way and this were an Orders type table then the table would need as many columns as there were products in the products table..
He was making the point that if we did it "my way", then all databases would ever be is a bunch of 3 column tables - id, name, value, and we dont do databases like that..

Somtimes it is genuinely hard to decide what to do and in this case i chose to fall back to the purpose of a row - to store directly related data that will only ever exist in a 1:1 relationship (as in {1}:{1}, not {1}:{0,1} - to borrow regular expression syntax)

A person will only ever have one age or social security number, so these items will go on one row of one table, and the role of the table is to relate them. In this scenario I was arguing that we were relating very-variable field names with values, rather than relating values with letters. The only way to ensure that the notion of a rows purpose was maintained was to make the varying field name a row data entry rather than a column name.


A wider debate exists about this issue - it's calles O/R Mapping - if our letters were objects in VB, all stemming from a common parent but the fields they exported being the defining difference, how would we map these to a single back end table for long term storage in a scenario such as this? That's one of the aspects of an O/R Impedance Mismatch!

A conclusion was never reached.. :)
 
Back
Top