Question: Split, search and append files

Pie Captain (Member; joined Mar 31, 2010; 14 messages; location: Bishops Waltham, Hampshire, United Kingdom; programming experience: 1-3)
Hi All,
I have 270 files that I need to merge.
These files are basically database files containing information strings.

So far, so straightforward. The problem is that some of the information strings are duplicated, so I need to be able to extract these information strings from file2 and make sure they don't already exist in file1 before appending them to file1.

The other problem I have is that there is no common delimiter. All I know is that each information string is 216 bytes long.
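Since every record is exactly 216 bytes, the record length itself can act as the delimiter: read the file into a byte array and cut it every 216 bytes. A minimal sketch follows; the module and function names (RecordSplitter, SplitIntoRecords) are hypothetical, and returning a shorter trailing fragment instead of discarding it is an assumption about how a truncated file should be treated.

```vbnet
Imports System
Imports System.Collections.Generic

Module RecordSplitter
    ' Cuts a raw byte buffer into consecutive fixed-width records.
    ' Assumption: a trailing fragment shorter than recordSize (e.g. a truncated
    ' file) is returned as a final, shorter record so no data is silently lost.
    Public Function SplitIntoRecords(data As Byte(), recordSize As Integer) As List(Of Byte())
        If recordSize <= 0 Then Throw New ArgumentOutOfRangeException("recordSize")
        Dim records As New List(Of Byte())()
        For offset As Integer = 0 To data.Length - 1 Step recordSize
            ' The last record may be shorter than recordSize.
            Dim count As Integer = Math.Min(recordSize, data.Length - offset)
            Dim record(count - 1) As Byte   ' array of exactly "count" bytes
            Array.Copy(data, offset, record, 0, count)
            records.Add(record)
        Next
        Return records
    End Function
End Module
```

Called as `SplitIntoRecords(File.ReadAllBytes(somePath), 216)`, it returns one Byte() per information string.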

I think I need to use a byte array, but I'm really not sure how!

Sorry if this is really simple and has been covered before, but I haven't been able to find it :(

Thanks in advance for any help!
 
Using StreamReaders and string functions (Exists, Contains, Substring, etc.), you should be able to do what you need. If there is a key value you can pull from each "line" of information, you could store it in a Hashtable (key, value) or something similar to ensure uniqueness.
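One catch if the 216-byte records are kept as Byte() arrays: arrays compare by reference, so two arrays with identical contents are still different keys in a Hashtable, Dictionary or HashSet. A sketch, assuming a HashSet(Of String) (.NET 3.5+) in place of the suggested Hashtable, with Base64 text as the key; RecordKey and IsFirstOccurrence are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic

Module RecordKeys
    ' Byte() uses reference equality, so identical bytes in two different arrays
    ' would count as two different keys. A string key compares by content instead.
    Public Function RecordKey(record As Byte()) As String
        ' Base64 is a lossless way to turn arbitrary bytes into text.
        Return Convert.ToBase64String(record)
    End Function

    ' Returns True the first time a record's content is seen, False for repeats.
    Public Function IsFirstOccurrence(record As Byte(), seen As HashSet(Of String)) As Boolean
        ' HashSet.Add returns False when the key is already present.
        Return seen.Add(RecordKey(record))
    End Function
End Module
```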
 
I had a look into StreamReader and found this site:

.NET Streams Explained - O'Reilly Media

From that, I mashed together the code below:


VB.NET:
Imports System.IO
Public Class Form1

    ' Full path of the merged file; it must never be treated as an input.
    Private ReadOnly outputPath As String = Path.Combine(System.Environment.CurrentDirectory, "Output.crd")

    ' On startup, list every .crd file under the current directory except the output file.
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim di As New DirectoryInfo(System.Environment.CurrentDirectory)
        For Each dra As FileInfo In di.GetFiles("*.crd", SearchOption.AllDirectories)
            If dra.FullName <> outputPath Then ListBox1.Items.Add(dra.FullName)
        Next
    End Sub

    ' Append the contents of every listed file to Output.crd, byte by byte.
    ' The Using blocks close both files even if an exception is thrown.
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Using bw As New BinaryWriter(New FileStream(outputPath, FileMode.Append, FileAccess.Write))
            For Each item As String In ListBox1.Items
                Using br As New BinaryReader(New FileStream(item, FileMode.Open, FileAccess.Read))
                    Dim byteRead As Byte
                    Dim j As Integer
                    For j = 0 To br.BaseStream.Length() - 1
                        byteRead = br.ReadByte
                        bw.Write(byteRead)
                    Next
                End Using
            Next
        End Using
    End Sub
End Class

This basically works, but it doesn't check for duplicate data chunks.

I'm not sure how to make it read 216 bytes at a time and search the output file for that chunk.

I'm pretty sure the answer is in here, though:

VB.NET:
            Dim byteRead As Byte
            Dim j As Integer
            For j = 0 To br.BaseStream.Length() - 1
                byteRead = br.ReadByte
                bw.Write(byteRead)
            Next

I just need a bit of a poke in the right direction
 
Is this it?

VB.NET:
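            ' ReadBytes takes a byte COUNT, so pass 216, not 215
            ' (215 is the upper bound used when declaring a 216-element array).
            Const RecordSize As Integer = 216
            ' Loop once per record rather than once per byte.
            Do While br.BaseStream.Position < br.BaseStream.Length
                Dim byteRead As Byte() = br.ReadBytes(RecordSize)   ' the last read may be shorter
                bw.Write(byteRead)
            Loop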

It still seems to write the file correctly.

I just need to put in something like:

If NOT byteRead exists in s2 then bw.Write(byteRead)

Obviously that's just a pseudocode version of what I need...
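One way to make that pseudocode concrete, without searching the output stream at all, is to remember every record already written in an in-memory set and test against that. A minimal sketch; WriteIfNew is a hypothetical helper, and the Base64 key is an assumption (any lossless byte-to-string conversion would do).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module DuplicateFilter
    ' Runnable take on: If NOT byteRead exists in s2 then bw.Write(byteRead)
    ' "written" holds a key for every record already sent to the output,
    ' so the output stream itself never has to be read back.
    Public Function WriteIfNew(record As Byte(), written As HashSet(Of String), bw As BinaryWriter) As Boolean
        Dim key As String = Convert.ToBase64String(record)
        If written.Contains(key) Then Return False   ' duplicate: skip it
        written.Add(key)
        bw.Write(record)                              ' raw bytes, no length prefix
        Return True
    End Function
End Module
```

Inside the read loop, `bw.Write(byteRead)` would become `WriteIfNew(byteRead, written, bw)`, with one `written` set shared across all input files.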

Can I even read a stream that's been opened with FileMode.Append and FileAccess.Write?
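On that last question: not through s2. A FileStream opened with FileAccess.Write reports CanRead = False, and calling Read on it throws NotSupportedException. A workaround is to read the existing output file separately, before opening it for appending. A sketch, assuming the file holds only whole 216-byte records (a trailing fragment is ignored); LoadExistingKeys is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module ExistingOutput
    ' Collects a key for every full record already in the output file.
    ' Call this BEFORE opening the file with FileMode.Append/FileAccess.Write,
    ' because that write-only stream cannot be read.
    Public Function LoadExistingKeys(outputPath As String, recordSize As Integer) As HashSet(Of String)
        Dim keys As New HashSet(Of String)()
        If Not File.Exists(outputPath) Then Return keys   ' first run: nothing written yet
        Dim data As Byte() = File.ReadAllBytes(outputPath)
        ' Only whole records are keyed; any trailing partial record is ignored.
        For offset As Integer = 0 To data.Length - recordSize Step recordSize
            keys.Add(Convert.ToBase64String(data, offset, recordSize))
        Next
        Return keys
    End Function
End Module
```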
 
Here's the documentation for the BinaryReader.ReadBytes method (System.IO).

The second example there shows reading a specified chunk size from the file.

1. You'll want to look into creating an IEnumerable of some sort to store the records from the file (I'd suggest a List(Of String)).

2. Once you've read a record then check to see if the list contains the record. If it doesn't then add it to the list and write it to the file. If the list does contain the record then it's a duplicate.

3. Repeat until you've read every record from the file.
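The three steps might look like this for a single input stream, keeping the suggested List(Of String). A sketch only: CopyUniqueRecords is a hypothetical name, and storing each record as Base64 text is an assumption so that arbitrary bytes survive the conversion to String. List.Contains scans the whole list on every call; a HashSet(Of String) would turn each lookup into a hash lookup instead.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module ListBasedDedup
    ' Copies each recordSize-byte record from input to output unless "records"
    ' already contains it. Returns the number of records written.
    ' The reader/writer are not disposed, so the caller's streams stay open.
    Public Function CopyUniqueRecords(input As Stream, output As Stream,
                                      records As List(Of String), recordSize As Integer) As Integer
        Dim br As New BinaryReader(input)
        Dim bw As New BinaryWriter(output)
        Dim added As Integer = 0
        Do
            Dim record As Byte() = br.ReadBytes(recordSize)   ' step 1: read a record
            If record.Length = 0 Then Exit Do                 ' step 3: stop at end of stream
            Dim key As String = Convert.ToBase64String(record)
            If Not records.Contains(key) Then                 ' step 2: new record?
                records.Add(key)                              ' remember it
                bw.Write(record)                              ' and write it out
                added += 1
            End If
        Loop
        bw.Flush()
        Return added
    End Function
End Module
```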
 
Cheers for that! I did post closer to the time, but I got logged out, it was late and I needed to go to bed!

I ended up cheating, I guess. I read the output file into a list, split into chunks; then, as I read each chunk in from the source file, I updated the output list and the output file if that chunk didn't already exist.
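The approach described above, end to end, might look roughly like this. It is a sketch, not the poster's actual code: MergeFiles is a hypothetical name, whole files are loaded with File.ReadAllBytes (assumed to fit in memory), trailing bytes that do not form a full record are ignored, and sourcePaths must not include the output file itself (Form1_Load filters Output.crd out for the same reason).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module CrdMerger
    ' Appends every record from the source files to outputPath unless an
    ' identical record is already in the output. Returns records appended.
    Public Function MergeFiles(outputPath As String, sourcePaths As IEnumerable(Of String),
                               recordSize As Integer) As Integer
        Dim known As New List(Of String)()
        ' 1. Read the existing output (if any) into the list, chunk by chunk.
        If File.Exists(outputPath) Then
            Dim existing As Byte() = File.ReadAllBytes(outputPath)
            For offset As Integer = 0 To existing.Length - recordSize Step recordSize
                known.Add(Convert.ToBase64String(existing, offset, recordSize))
            Next
        End If
        Dim appended As Integer = 0
        ' 2. Only now open the output for appending (write-only).
        Using bw As New BinaryWriter(New FileStream(outputPath, FileMode.Append, FileAccess.Write))
            For Each source As String In sourcePaths
                Dim data As Byte() = File.ReadAllBytes(source)
                For offset As Integer = 0 To data.Length - recordSize Step recordSize
                    Dim key As String = Convert.ToBase64String(data, offset, recordSize)
                    If Not known.Contains(key) Then
                        known.Add(key)                       ' update the output list
                        bw.Write(data, offset, recordSize)   ' and the output file
                        appended += 1
                    End If
                Next
            Next
        End Using
        Return appended
    End Function
End Module
```

In the original form, `MergeFiles(outputPath, ListBox1.Items.Cast(Of String)(), 216)` (with `Imports System.Linq`) could replace the body of Button1_Click; running it twice appends nothing the second time.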

Simple but effective :)

Cheers for all the help, all!
 