Question: Convert humbly seeks help storing/accessing large dataset

vector07 · Member
Joined: Nov 23, 2009 · Messages: 5 · Programming Experience: 3-5
Hey all,

I've written a stock analysis program in Excel's VBA, but I've decided to port the program to VB.NET. And by port I mean... recreate, more like. A few problems in Excel inspired me to make the switch: (1) dealing with large amounts of data was SLOW; even something as simple as saving my file took about 25 seconds. (2) The methods for downloading files off the internet and importing them were flawed, slow, and unwieldy. Other things too, but I digress.

THE QUESTION
VB.NET is really confusing to me on the subject of data storage. In Excel VBA I used massive arrays to store all my data in memory. However, since ReDim Preserve is gone (or out of vogue?), what kind of data structure in VB.NET is best suited for holding large quantities of dynamically resizable data? So far I've encountered collections (like StringCollection), ArrayList, DataSet (read about on this forum), and a few more. I'm having trouble understanding the differences between all of these; and most importantly, I haven't found any posts on this forum or elsewhere comparing the relative speed of storing data in, and accessing data from, all these structures.
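For context, ReDim Preserve does still exist in VB.NET; it's just rarely the best tool. A minimal sketch, putting the old idiom next to List(Of T), the usual .NET structure for resizable data (the values here are made up for illustration):

```vbnet
Imports System.Collections.Generic

Module GrowthDemo
    Sub Main()
        ' VB6/VBA style still compiles, but each ReDim Preserve
        ' allocates a new array and copies every element over.
        Dim prices() As Double = New Double() {}
        For i As Integer = 0 To 4
            ReDim Preserve prices(i)
            prices(i) = 100.0 + i
        Next

        ' Idiomatic .NET: a strongly typed list that grows on its own.
        Dim priceList As New List(Of Double)()
        For i As Integer = 0 To 4
            priceList.Add(100.0 + i)
        Next

        Console.WriteLine(priceList.Count)   ' 5
    End Sub
End Module
```

Note the difference from ArrayList: ArrayList stores everything As Object, so value types like Double get boxed on every Add and cast back on every read, whereas List(Of Double) is both type-safe and faster.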

MORE INFORMATION
What does the data look like? I'm downloading .csv (OHLC) files from the interwebs that contain ~5-7 columns of data, with 400 to 3600 entries per file; and there will be 500 to 3000 files. On top of that, my program will effectively be adding columns parallel to these files (probably in new files...). Just as an example: I load the basic stock data for MSFT. From that data I generate a new index, the RSI for example (a stock analysis index), and I save the RSI data in a new file. I want to calculate the RSI for every entry of every stock file. Make sense?
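A sketch of the in-memory-loading step described above, using the framework's own TextFieldParser to read one OHLC .csv into a list of records. The column layout (Date, Open, High, Low, Close, Volume) and the header row are assumptions; adjust the indices to match the files actually downloaded:

```vbnet
Imports Microsoft.VisualBasic.FileIO
Imports System.Collections.Generic

' One row of an OHLC file; field names are illustrative.
Public Structure Bar
    Public TradeDate As Date
    Public OpenPrice As Double
    Public High As Double
    Public Low As Double
    Public ClosePrice As Double
    Public Volume As Long
End Structure

Module CsvDemo
    Function LoadBars(path As String) As List(Of Bar)
        Dim bars As New List(Of Bar)()
        Using parser As New TextFieldParser(path)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.ReadLine()   ' assumed header row: skip it
            While Not parser.EndOfData
                Dim f() As String = parser.ReadFields()
                Dim b As Bar
                b.TradeDate = Date.Parse(f(0))
                b.OpenPrice = Double.Parse(f(1))
                b.High = Double.Parse(f(2))
                b.Low = Double.Parse(f(3))
                b.ClosePrice = Double.Parse(f(4))
                b.Volume = Long.Parse(f(5))
                bars.Add(b)
            End While
        End Using
        Return bars
    End Function
End Module
```

At 3600 rows per file this loads in a blink; derived series like the RSI can then be computed from the returned list and written out to a parallel file.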

So you know where I'm coming from--I'd call myself an expert beginner. I've programmed in C, C++, VB6, etc, but never really mastered much of the technical aspects of any language. I'm new to the .NET framework. I *know* what I'm describing is essentially a relational database, but I really would prefer not using SQL or other true database structures.

LASTLY. Honestly, the hardest thing about this transition is the difference in the built-in help files. In Excel VBA everything was extremely well described and documented, with examples. In VB.NET I've run into a lot of poorly documented "namespace/class" style help files that remind me more of old clunky C++ than VB6. What do you all use as a reference?
 
It sounds to me like you need to move to a database. If you have a budget, I'd look at SQL Server; if you don't, I'd look at MySQL.

As far as assistance goes, you already have one invaluable ally in your corner...these forums. :)
 
As I said in the original post, I really really don't want to use SQL. I have no budget, this is just a fun, large scale hobby project.
 
For that I would say use the free version of SQL Server (Express). The files are actually not too large, but in order to store everything permanently and together, it would be best to add it all to a database such as SQL Server, as suggested, or even Access.
 
Let me clarify: I have no intention of using Access, SQL Server, MySQL, or any external database unless it turns out that using built-in VB.NET data structures is incredibly slow. To reiterate, the thrust of my question is:

THE QUESTION
VB.NET is really confusing to me on the subject of data storage. In Excel VBA I used massive arrays to store all my data in memory. However, since ReDim Preserve is gone (or out of vogue?), what kind of data structure in VB.NET is best suited for holding large quantities of dynamically resizable data? So far I've encountered collections (like StringCollection), ArrayList, DataSet (read about on this forum), and a few more. I'm having trouble understanding the differences between all of these; and most importantly, I haven't found any posts on this forum or elsewhere comparing the relative speed of storing data in, and accessing data from, all these structures.
 
I haven't found any posts on this forum or elsewhere espousing the relative speed of storing data to, and accessing from all these structures.
That's because no one uses these structures to store data...they use a database.
 
I see.

My data is stored in these text files; I don't mean to say that those structures store data all the time. The structures I mentioned are for reading part or all of the data into memory when it's used during program runtime. Are you saying that *no one* does this anymore? As one poster above pointed out, this is a lot of data, but it's not *that* much.
 
The files themselves are not large individually, but with up to 500 files, the data should be consolidated in a database, and only the info needed at a time should be queried and kept in memory. You will get much better performance this way.
 

You store all your data in a text file, you read it into memory, you manipulate it, and you save it to a text file

You store all your data in a text file database, you read it into memory, you manipulate it, and you save it to a text file database

?

Isn't your text file thus just a database, except that it lacks any form of sophisticated querying at all, leaving you to read it in and do all the manipulation and addition yourself?

i.e. you're writing your own kind of database - reinventing a wheel that has already been invented very well.

If you don't want to transition to a proper database straight away, try using DataSets: ReadXml/WriteXml will load and save them, and they support some limited data manipulation through the DataTable.Compute method and the DataColumn.Expression property (both in System.Data).
Eventually you'll get tired of how limited they are and move to a full DB. If you're doing financial manipulation I'd use Oracle XE; it's free and IMHO much better than SQL Server. It integrates with the IDE too.
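A minimal sketch of that DataSet route: a table with an expression column, an aggregate via Compute, and a round trip through WriteXml/ReadXml. Column names and the file name are illustrative:

```vbnet
Imports System.Data

Module DataSetDemo
    Sub Main()
        Dim table As New DataTable("Quotes")
        table.Columns.Add("ClosePrice", GetType(Double))
        table.Columns.Add("Volume", GetType(Double))
        ' Expression columns are recalculated automatically per row.
        table.Columns.Add("Turnover", GetType(Double), "ClosePrice * Volume")

        table.Rows.Add(25.1, 1000000)
        table.Rows.Add(25.6, 1250000)

        ' Compute runs an aggregate over the rows matching a filter
        ' (empty filter = all rows).
        Dim avgClose As Object = table.Compute("AVG(ClosePrice)", "")
        Console.WriteLine(avgClose)   ' average close across both rows

        ' Persist and reload without any database engine involved.
        Dim ds As New DataSet("Stocks")
        ds.Tables.Add(table)
        ds.WriteXml("msft.xml", XmlWriteMode.WriteSchema)

        Dim reloaded As New DataSet()
        reloaded.ReadXml("msft.xml")
        Console.WriteLine(reloaded.Tables("Quotes").Rows.Count)   ' 2
    End Sub
End Module
```

Writing the schema along with the data (WriteSchema) is what lets ReadXml restore the column types instead of treating everything as strings.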

For the bulk of my documentation I just Google the method I want and read MSDN. To find the method I want, I Google and read forum posts. I also use IntelliSense and the Object Browser.
 
MySQL is free as well,
but the integration with the VS IDE sucks, which gives it a vertical learning curve. Free does not always come cheap! Try Oracle XE (non-commercial Oracle), SQL Server Express, or Access instead.
 
You are correct, I'm on a course to reinvent the wheel. It's only because, from where I stand, it looks easier to do that than to learn how to operate one of these databases you're talking about. Every tutorial I've found on incorporating a database looks like it takes hours just to set it up; plus, I know zero SQL. Is it not actually that complicated?
 
The difficulty involved in any learning curve is dependent on the individual doing the learning.

Take a look at MySQL, as well as the others mentioned, and you decide.

SQL, the language, is no more or less difficult to learn than any other language. There are a number of online SQL tutorials...take a look at one, and again, you decide.
 