Question How to grab sections of a website and put them into the app.

Haxaro

I want to make a desktop application for MyLifeIsAverage.com.

If anyone is familiar with sites like this (FMyLife etc.), you will know that the stories appear in white bubbles running down the page.

Is it possible to grab just the stories, and ignore everything else?

I have tried this kind of thing:

VB.NET:
Private Sub Grab_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Grab.Click
    ' Take everything from character 397 of the page text onward.
    Dim Actual As String
    Actual = Mid(MLIABrowser.Document.Body.InnerText, 397, 1000000)
    RichTextBox1.Text = Actual
End Sub

This just gets all the text from the 397th character onward, but there are several problems with it:

- The ads on the page are different every time, so there is no constant offset to start from.

- Text from the sides and bottom of the page is still being grabbed.

- I also do not want the "comments" etc. shown, JUST the story.

Any ideas on how to do this?

(There has been no API released for the site yet)
 
When working with the browser's document content you should think in terms of the object tree, the HTML element structure. Looking at the page source, one quickly finds that each quote is contained in a span element, but so is a lot of other irrelevant content. The only attribute the span nodes of interest have is id, and its value is "ls_contents-" followed by a number. If you look up all span elements (GetElementsByTagName) and keep only those whose id starts with that prefix (GetAttribute, StartsWith), you see right away that no other span element has this kind of id; it was a complete match on the first attempt. Then you get the element text (InnerText) and trim the surrounding whitespace (Trim). Here's the code sample:
VB.NET:
For Each span As HtmlElement In Me.WebBrowser1.Document.GetElementsByTagName("span")
    If span.GetAttribute("id").StartsWith("ls_contents-") Then
        Me.ListBox1.Items.Add(span.InnerText.Trim)
    End If
Next
When do you call this code? Once the document is loaded and ready, for example from a button Click event. There is also a DocumentCompleted event that can be used; if you use it, you should filter out incomplete ready states like this:
VB.NET:
If Me.WebBrowser1.ReadyState = WebBrowserReadyState.Complete Then
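A fuller sketch of the DocumentCompleted approach, assuming the controls are named WebBrowser1 and ListBox1 as in the sample above and that the handler sits inside the form class of a Windows Forms project (the handler name is just the designer default, and the Nothing check on InnerText is an extra precaution, not something the page is known to need):
VB.NET:
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    ' DocumentCompleted can fire more than once while a page loads,
    ' so do nothing until the whole document reports Complete.
    If Me.WebBrowser1.ReadyState <> WebBrowserReadyState.Complete Then Return

    ' Start from an empty list so a reload does not duplicate stories.
    Me.ListBox1.Items.Clear()
    For Each span As HtmlElement In Me.WebBrowser1.Document.GetElementsByTagName("span")
        If span.GetAttribute("id").StartsWith("ls_contents-") Then
            Dim story As String = span.InnerText
            ' InnerText may be Nothing for an empty element; skip those.
            If story IsNot Nothing Then Me.ListBox1.Items.Add(story.Trim())
        End If
    Next
End Sub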
 
Hey,
Thanks SO MUCH! That's awesome. But would there be a way (I'm sure there is) of making it grab only one at a time, so that I can click a 'Next' button and it will move on to the next one?

Thanks :)
 
Get all the items and put them into a queue, a Queue(Of String), then Dequeue them one by one.
VB.NET:
Private q As New Queue(Of String)
VB.NET:
q.Enqueue(span.InnerText.Trim)
VB.NET:
If q.Count > 0 Then Me.Label1.Text = q.Dequeue
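Put together, the three pieces look roughly like this. It is only a sketch, written as a console program so it can run without a form: the sample strings are made up and stand in for the span texts, and each pass of the loop stands in for one click on a Next button.
VB.NET:
Imports System
Imports System.Collections.Generic

Module QueueSketch
    Sub Main()
        ' Hypothetical sample data standing in for the span texts from the page.
        Dim stories As String() = {"  First story ", "Second story", " Third story"}

        ' Fill the queue once, after the document has loaded.
        Dim q As New Queue(Of String)
        For Each story As String In stories
            q.Enqueue(story.Trim())
        Next

        ' Each pass plays the role of one Next click: show the oldest item and remove it.
        While q.Count > 0
            Console.WriteLine(q.Dequeue())
        End While
    End Sub
End Module
Note that Dequeue removes the item it returns, so a queue only moves forward. If you also want to step back, keep the items in a list and track the current index instead.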
 
Hey, thanks for that, although I asked that question without really thinking.

I used this:

VB.NET:
If ListBox1.Items.Count <> 0 Then
    Numberinbox = ListBox1.Items.Count
    NumberUpTo = NumberUpTo + 1
    If NumberUpTo < Numberinbox Then
        RichTextBox1.Text = ListBox1.Items.Item(NumberUpTo) & Chr(13) & Chr(13) & "This is story: " & NumberUpTo & "/" & ListBox1.Items.Count
    End If
End If

and a similar thing to go back.

I also put in a random button as well.

but thanks for the help :)
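
For anyone following along, the next/back/random idea described above can be sketched on its own, away from the form. This is not the exact code from the app: the names StepIndex, stories, and rng are made up for illustration, and the sample strings stand in for the ListBox items.
VB.NET:
Imports System

Module NavigationSketch
    ' Moves the current index by delta and clamps it to the range 0..count-1.
    ' Returns -1 when there are no items at all.
    Function StepIndex(ByVal current As Integer, ByVal delta As Integer, ByVal count As Integer) As Integer
        If count = 0 Then Return -1
        Return Math.Max(0, Math.Min(count - 1, current + delta))
    End Function

    Sub Show(ByVal stories As String(), ByVal index As Integer)
        ' index is zero-based, so add 1 for the "story n/total" display.
        Console.WriteLine(stories(index) & "  (story " & (index + 1) & "/" & stories.Length & ")")
    End Sub

    Sub Main()
        Dim stories As String() = {"Story A", "Story B", "Story C"}  ' hypothetical sample data
        Dim rng As New Random()
        Dim index As Integer = 0

        index = StepIndex(index, 1, stories.Length)    ' Next button
        Show(stories, index)

        index = StepIndex(index, -1, stories.Length)   ' Back button
        Show(stories, index)

        index = rng.Next(stories.Length)               ' Random button: 0 <= index < Length
        Show(stories, index)
    End Sub
End Module
Clamping the index keeps it from running past the last story when Next is clicked repeatedly, and from going below the first one on Back.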
 