Question keep database synchronized when I have multiple clients ?

giovaniluigi

Member
Joined
Jan 9, 2011
Messages
16
Programming Experience
5-10
I'm a beginner in databases, and I'm developing a simple application.
I'm using SQL Server in my shared server to hold all the database.

Database is accessed by client app using an internet connection.

My client application is for Windows Forms developed in VB.NET using ADO.NET.
I have inserted some controls using the built-in tools in visual studio 2010 to create the database link.
The connection is done when app starts by a TableAdapter component.
After this, I use a dataset to store the data.

My question is how to update the online database, keeping it synchronized ?
For example: My friend and I open the app at the same time and we start working, how can I do to avoid conflicts and data loss ?
Is there any special feature for doing this ?
 
What you are talking about is called "concurrency" and there are three ways to handle it:

1. Last in wins: This means that the last data to arrive is saved regardless. This can result in data loss if, for example, two users retrieve the same data and user 1 edits column A and saves and then user 2 edits column B and saves. In that case, the original value of column A will be saved again by user 2 and will overwrite the new value saved by user 1, so user 1's changes are lost.

2. Pessimistic concurrency: This means that any data retrieve by a user is locked so no other user can access it, except possibly in a read-only manner. This guarantees that data loss will not occur but it also possibly denies users access to data that will never be edited. For instance, if user 1 retrieve 10,000 records then all those 10,000 will be locked. User 1 may only edit a small number of those records, or possibly none at all, but user 2 is still unable to access them until user 1 releases the lock. This is the default concurrency method of ADO.

3. Optimistic concurrency: This means that all users are given free access to all data with no locks applied, but a check is performed when saving to make sure that the data currently in the database is the same as it was when the records being saved were retrieved. If the data is the same then the changes are saved but if the data is not the same then the behaviour is application-specific, i.e. it's up to you to write code to handle it.

When using ADO.NET, many code generation tools will use optimistic concurrency by default. It involves writing more complex SQL code in your UPDATE statement and then handling any ConcurrencyException that might be thrown when saving data. If you search online for information on optimistic concurrency and ADO.NET then you can find some specifics on how to implement the various options. Usually you would notify the user that the data has changed, reload the data and then let them check, re-edit if required and then save again. When you reload you can replace the data you have or merge the two.
 
Thanks jmcilhinney for your clear explanation.

I just have one doubt.

The "Pessimistic concurrency" is the default according to your explanation, but I can't see it happening in my code.

When I update my local dataset it should be locked for other users right ?
I understood that when I start to update it locks, then I finish to update and it is released correct ? so it will be locked for a very small slice of time.

Imagine a table with the following columns:

Name - Age - Country

Now imagine that I have two records in this table:

1 - John - 25 - USA
2 - Marie - 22 - Germany

Then I'm editing the record #1 while another person is editing the record #2
Each of us has a copy of table in dataset.

Now, how can I lock to avoid the other person to edit record #1 and lock me to edit record #2 ?

I could do it manually by code with a column inside the table to control it but I would like to advise the user when I release the record that I'm editing so this person don't need to manually refresh.
I think that ADO.NET should have something about that already implemented... I just don't know...
Because if I try this example in my code now, what will happen seems like the "Last in wins". Each person will update a new copy of local dataset.
 
The "Pessimistic concurrency" is the default according to your explanation, but I can't see it happening in my code.
I said that pessimistic concurrency is the default in ADO, not in ADO.NET. They are two different things. ADO was the default data access technology in VB6, superseding DAO. ADO.NET is the default data access technology for VB.NET and other .NET languages too.

If you want to implement table- or row-level locking in ADO.NET then you'll have to do it yourself. I would recommend against it unless you're sure that there are likely to be numerous collisions and that data is not going to be locked for long periods of time. If that's the way you want to go then I'm sure that you would be able to find examples online. I've seen it mentioned on this very site but it's not something that I've ever done or plan to do unless it's required by a customer.
 
ok, my bad... I understood now.

Just two more questions... before looking at internet examples...

Imagine that I hired you to develop a very simple database with one table but a lot of users, what strategy you would take to treat the collisions or avoid them, and also keep the system updated for all users ?
(By keeping the system updated I meant that if one user submit an update all the other users get this update automatically)

Do you think that using a Timer to check if there is some lock/release is a good idea ?

Please, its just to capture the idea of a experient person, I will research on internet about this...
Thanks.
 
With regards to concurrency, I would do pretty much what I said before: implement optimistic concurrency in my SQL, handle any ConcurrencyExceptions, notify the user of the issue, reload the data, have the user inspect and re-edit the data as required and then save again.

As for keeping data up to date for all users, there are really only two ways to do that: notification (push) and polling (pull). Notification is not implemented natively in many databases and implementing your own would be tricky. If you're using SQL Server and VB.NET then you can use the SqlDependency, but that is not intended for systems with large numbers of clients, as only a small number of dependencies should be registered at a time.

That basically leaves polling, which would involve, as you suggest, a Timer in the application that queries the database periodically. The thing is, you generally don't want to be retrieving all the data every time when there's a high likelihood that only a small amount, or possibly none will have changed. For that reason, it's a good idea to add a column to each relevant table to contain the last update time. You can then retrieve all the data the first time and then each subsequent query can retrieve only data that has a last updated value that is later than the latest one in your local data.

That also means that you cannot delete any data, because you would not be able to propagate that change to a client. You would need to add a column to flag records that have been logically deleted and then ignore those record in the application. That's a common thing to do anyway. Many systems never delete a record just so that they have an audit record of the data that existed.
 
Well, it seems that the best approach is to use date and time in records and maybe also a Timer to check them and eventually inform user about changes.

Of course that this is my opinion for my application...
But it seems that we have not too much options also so the Date and Time in record seems to be the most reliable.

Thanks again for your help. :victorious:
 

Latest posts

Back
Top