download PDF Stream

developedRowan (New member, joined Mar 16, 2011; messages: 2; programming experience: 5-10)
I need to download a list of PDF files from an Australian Government website. I don't want to download them manually, so I need a program to do it.

The website I need the documents from is: https://www.ebs.tga.gov.au/ebs/picmi/picmirepository.nsf/PICMI?OpenForm&k=A

Below is my current code:
VB.NET:
Public Class Form1
    Public agreementPassed As Boolean = False
    Public hyperLinkList As List(Of String) = New List(Of String)

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        agreementPassed = True
        For Each singleLink As String In hyperLinkList
            WebBrowser1.Name = "pdf"
            WebBrowser1.Navigate(New Uri(singleLink))
        Next
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        WebBrowser1.Navigate(New Uri(hyperLinkList(0)))
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        If agreementPassed Then
            IO.File.WriteAllText("C:\temp.pdf", New IO.StreamReader(WebBrowser1.DocumentStream).ReadToEnd)
        End If
    End Sub
End Class

I populate hyperLinkList from another button that I haven't included, but that part works. Button2 just displays the first page so that I can agree to the conditions, which generates the cookie. When I click Button1 it's meant to save each link. All that happens, though, is that the last URL opens the PDF in Adobe outside the program, and the HTML of the page where you have to click "I Agree" is saved to C:\temp.pdf.

I'm open to any ideas on how to do this, and if I need to get rid of the WebBrowser, I'm happy to do that too.
 
You never let any document but the last one load. You are using a For Each loop in the Button1.Click event handler, so the instant you start navigating to one page you start navigating to the next. Your code is akin to typing an address into a browser and hitting Go, then immediately entering another address and hitting Go again, without waiting for the first page to load in between.

If you want to use a WebBrowser then you can't use a loop. You need to wait for each page to load before navigating to the next, which means you don't navigate to the next page until the DocumentCompleted event has been raised and the page processed.
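One way to structure "navigate, wait for DocumentCompleted, then navigate again" is to keep the remaining links in a queue and only dequeue the next one from the DocumentCompleted handler. The sketch below is an illustration, not the only way to do it: SequentialNavigator, NavigateNext and HasMore are hypothetical names, and the navigation itself is passed in as a delegate so the class does not depend on the form. The commented wiring at the end assumes the WebBrowser1, Button1, agreementPassed and hyperLinkList members from the question's form.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helper: hands out one URL at a time and only moves on when
' asked, so the form can wait for DocumentCompleted before navigating again.
Public Class SequentialNavigator
    Private ReadOnly pending As Queue(Of String)
    Private ReadOnly navigateAction As Action(Of Uri)

    Public Sub New(links As IEnumerable(Of String), navigateAction As Action(Of Uri))
        pending = New Queue(Of String)(links)
        Me.navigateAction = navigateAction
    End Sub

    ' True while some links have not been navigated to yet.
    Public ReadOnly Property HasMore As Boolean
        Get
            Return pending.Count > 0
        End Get
    End Property

    ' Starts navigating to the next link, if there is one; does nothing otherwise.
    Public Sub NavigateNext()
        If pending.Count > 0 Then
            navigateAction.Invoke(New Uri(pending.Dequeue()))
        End If
    End Sub
End Class

' Wiring it into the question's form (assumed members from that form):
'
' Private navigator As SequentialNavigator
'
' Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
'     agreementPassed = True
'     navigator = New SequentialNavigator(hyperLinkList, Sub(u) WebBrowser1.Navigate(u))
'     navigator.NavigateNext()    ' start with the first link only
' End Sub
'
' Private Sub WebBrowser1_DocumentCompleted(sender As Object,
'         e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
'     ' ...process the page that has just loaded...
'     If navigator IsNot Nothing Then navigator.NavigateNext()
' End Sub
```

DocumentCompleted can be raised more than once for a page that contains frames, so a fuller version might compare e.Url with WebBrowser1.Url before moving on.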

You've got other problems too though. You've only got one file path there so, even if you do save every file, you'll be overwriting the previous one anyway.

I don't really think that you should be using a WebBrowser for this anyway. A WebBrowser is for displaying documents to the user. Do you really want the user to see each of these PDF documents? It doesn't appear so. You should just be downloading the file directly, which you can do using a WebClient or the My.Computer.Network object.
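A minimal sketch of the direct approach with WebClient, assuming each entry in the list points straight at a PDF and that the server returns the PDF itself rather than the agreement page (which, for this site, depends on the cookie). DownloadFile writes the response bytes to disk unchanged, whereas reading a PDF through StreamReader.ReadToEnd and File.WriteAllText decodes the binary data as text and can corrupt it. PdfDownloader, DownloadAll and the numbered file names are hypothetical; My.Computer.Network.DownloadFile(address, destinationFileName) is the VB-specific alternative for a single file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net

Public Module PdfDownloader
    ' Downloads each URL straight to disk, one after another, without a
    ' WebBrowser. Runs on the calling thread, so a form will freeze until done.
    Public Sub DownloadAll(urls As IList(Of String), targetFolder As String)
        Directory.CreateDirectory(targetFolder)
        Using client As New WebClient()
            For i As Integer = 0 To urls.Count - 1
                ' Hypothetical naming scheme: one numbered file per link, so
                ' no download overwrites the previous one.
                Dim target As String = Path.Combine(targetFolder, String.Format("document_{0}.pdf", i))
                client.DownloadFile(urls(i), target)
            Next
        End Using
    End Sub
End Module
```

From the form this could be called as DownloadAll(hyperLinkList, "C:\pdfs"), where the folder is only an example.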

You also need to consider how you want the application to behave during the download. If you download the files on the UI thread then the app will freeze while it happens. If you use multi-threading then you need to decide what the user can do while the download takes place and how to notify them when it's done.
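If the form should stay responsive, one option is an Async method built on WebClient.DownloadFileTaskAsync, which needs .NET Framework 4.5 or later and a VB version with Async/Await. The sketch below is one possible shape, not a prescribed pattern: the caller is "notified" when the awaited Task completes, and IProgress reports how many files are done. BackgroundPdfDownloader, DownloadAllAsync, the numbered file names and the commented Button1 handler are all hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Threading.Tasks

Public Module BackgroundPdfDownloader
    ' Downloads the links one at a time without blocking the calling (UI) thread.
    ' After each file completes, the running count is reported through
    ' progress, if one was supplied.
    Public Async Function DownloadAllAsync(urls As IList(Of String),
                                           targetFolder As String,
                                           progress As IProgress(Of Integer)) As Task
        Directory.CreateDirectory(targetFolder)
        Using client As New WebClient()
            For i As Integer = 0 To urls.Count - 1
                ' Hypothetical naming scheme: one numbered file per link.
                Dim target As String = Path.Combine(targetFolder, String.Format("document_{0}.pdf", i))
                Await client.DownloadFileTaskAsync(New Uri(urls(i)), target)
                If progress IsNot Nothing Then progress.Report(i + 1)
            Next
        End Using
    End Function
End Module

' Possible use from the form (hypothetical handler and folder):
'
' Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
'     Button1.Enabled = False    ' decide what the user may do meanwhile
'     Await DownloadAllAsync(hyperLinkList, "C:\pdfs",
'                            New Progress(Of Integer)(Sub(n) Me.Text = n & " downloaded"))
'     Button1.Enabled = True
'     MessageBox.Show("All downloads finished.")
' End Sub
```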
 
Thanks for all that. I can rearrange the code so that it only moves on to the next link in the list after the DocumentCompleted event has fired, which would fix the browser only opening one page.

I only have the one file path there while I develop the application and check that it works; in the end the name will be derived from the URL string. The main problem, however, is that when I read DocumentStream and write it to that file, what gets saved is the HTML of the "I Agree" page, even though the WebBrowser control doesn't display a page and the PDF opens in an Acrobat window outside the application.
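A minimal sketch of deriving the file name from the URL, assuming the PDF links end in a file name such as .../something.pdf. The real links on this site may instead identify the document in the query string, in which case the numbered fallback (or a different rule) applies. PdfFileNames, FileNameFromUrl and the "document_<index>.pdf" fallback are hypothetical.

```vbnet
Imports System
Imports System.IO

Public Module PdfFileNames
    ' Builds a local file name for the link at position index.
    ' Uses the last path segment of the URL when it already names a .pdf file;
    ' otherwise falls back to "document_<index>.pdf" so names never collide.
    Public Function FileNameFromUrl(url As String, index As Integer) As String
        Dim parsed As New Uri(url)
        Dim segments As String() = parsed.Segments
        ' Segments are still percent-encoded, e.g. "my%20file.pdf".
        Dim lastSegment As String = Uri.UnescapeDataString(segments(segments.Length - 1))

        If lastSegment.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) Then
            ' Replace characters the file system does not allow in file names.
            For Each c As Char In Path.GetInvalidFileNameChars()
                lastSegment = lastSegment.Replace(c, "_"c)
            Next
            Return lastSegment
        End If

        Return String.Format("document_{0}.pdf", index)
    End Function
End Module
```

Passing the loop index as well as the URL means two links without a usable file name still get different paths.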

I will be the user of this application, so I'm not too fussed about how it works. I went for this approach so that I could first accept the agreement page, which creates the cookie that stops the agreement page appearing again on later PDF requests (the cookie is created by JavaScript in the page). Can WebClient do this too, with cookie support? I have read that you can submit the form, which could be used to generate the cookie; I'm just not sure whether WebClient supports cookies. To use this, the WebClient would obviously be declared outside the method so that its instance persists across the whole application.
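A plain WebClient does not keep cookies between requests, but a common workaround is to subclass it and attach one CookieContainer to every HttpWebRequest it creates, by overriding GetWebRequest. Two assumptions apply here. First, WebClient does not run JavaScript, so it cannot execute the page script that creates the agreement cookie; if that cookie has a fixed name and value, it can be added to the container by hand (the name and value below are placeholders, not the site's real ones). Second, if accepting the agreement instead posts a form, a call such as UploadValues on the same client would store any Set-Cookie header the server sends back. CookieAwareWebClient is a hypothetical class name.

```vbnet
Imports System
Imports System.Net

' A WebClient that attaches the same CookieContainer to every HTTP request it
' makes, so cookies from earlier responses (or added by hand) are sent again.
Public Class CookieAwareWebClient
    Inherits WebClient

    Private ReadOnly cookieJar As New CookieContainer()

    Public ReadOnly Property Cookies As CookieContainer
        Get
            Return cookieJar
        End Get
    End Property

    Protected Overrides Function GetWebRequest(address As Uri) As WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        ' Only HttpWebRequest has a CookieContainer; other schemes pass through.
        Dim httpRequest As HttpWebRequest = TryCast(request, HttpWebRequest)
        If httpRequest IsNot Nothing Then
            httpRequest.CookieContainer = cookieJar
        End If
        Return request
    End Function
End Class

' Usage sketch with placeholder cookie name/value (read the real ones from
' the script on the "I Agree" page):
'
' Private ReadOnly client As New CookieAwareWebClient()   ' one instance for the form
'
' client.Cookies.Add(New Uri("https://www.ebs.tga.gov.au/"),
'                    New Cookie("AgreementCookieName", "AgreementCookieValue"))
' client.DownloadFile(singleLink, targetPath)
```

Declaring the client at class level, as the post suggests, is what lets the cookie survive from one download to the next.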

I'm not too fussed about how it behaves while it's downloading. My machine has enough virtual machines in the background that I can run it on one and let the application screen freeze, as long as it works.
 