Download everything from the PDC

You may already have downloaded a thing or two from the official PDC site.
However, it hosts so many PowerPoint presentations, sample-code zips and documents that all the clicking and saving will give you RSI in your arm.

Sean ‘Early’ Campbell & Scott ‘Adopter’ Swigart have come up with a fix for that. They wrote a small piece of code that lets you download everything in one go! 🙂

Unfortunately, it’s VB.NET! 😉
Could someone convert it, perhaps? In other words, remove all the Dim’s and put a ; at the end of every line!? :-p

Option Compare Text
Imports System.Net
Imports System.IO
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
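        ' NOTE: the page URLs (first argument) are missing from this listing;
        ' supply the pages to sweep before running.
        SiteSweep("", "c:\PDC")
        SiteSweep("", "c:\PDC")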
    End Sub
    Public Sub SiteSweep(ByVal source As String, ByVal dest As String)
        ' needed to deal with relative paths
        Dim root As String = Left(source, source.IndexOf("/", 7))
        Dim current As String = Left(source, source.LastIndexOf("/") + 1)
        ' pull page
        Dim w As New WebClient
        Dim sr As New StreamReader(w.OpenRead(source))
        Dim s As String = sr.ReadToEnd()
        ' find hrefs
        Dim r As New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
            RegexOptions.IgnoreCase Or RegexOptions.Compiled)
        ' get rid of dups
        Dim d As New Hashtable
        For Each m As Match In r.Matches(s)
            Dim url As String = m.Groups(1).Value
            ' find only certain file types.  This could have been done with the 
            ' previous regex, except (1) I ripped that regex off of MSDN, and (2)
            ' I plan on running the app all of one time, so who cares.
            If Right(url, 4) = ".ppt" Or Right(url, 4) = ".zip" Or Right(url, 4) = ".doc" Then
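                If Left(url, 7) <> "http://" Then
                    If url.StartsWith("/") Then
                        ' site-relative link: prepend scheme and host
                        url = root & url
                    Else
                        ' page-relative link: prepend the current folder
                        url = current & url
                    End If
                End If
                ' key = full URL, value = bare file name
                d(url) = Right(url, Len(url) - url.LastIndexOf("/") - 1)
            End If
        Next
        ' make sure the destination folder exists
        If Not Directory.Exists(dest) Then
            Directory.CreateDirectory(dest)
        End If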
        ' download each file.  If the download bombs, try again, unless you get
        ' a 415 or 404 because there appears to be a problem with some of the 
        ' files, or they are hrefs that are commented out, and my regex ain't smart
        ' enough to figure that out.
        For Each s In d.Keys
            Dim isDownloaded As Boolean = False
            While Not isDownloaded
                Try
                    Console.WriteLine("Downloading:" & s)
                    If Not File.Exists(dest & "\" & d(s)) Then
                        w.DownloadFile(s, dest & "\" & d(s))
                    End If
                    isDownloaded = True
                Catch exc As Exception
                    ' give up on 415/404 instead of retrying forever
                    If exc.Message.IndexOf("(415)") >= 0 Or exc.Message.IndexOf("(404)") >= 0 Then
                        isDownloaded = True
                    End If
                End Try
            End While
        Next
    End Sub
End Module

2 Responses

  1. According to Pascal it doesn’t work inside LogicaCMG because of the proxy! So try it at home! 🙂

  2. Hmmm, tried it elsewhere, and there it does work. The result is 350MB of PowerPoint presentations with meaningless names such as ARC334R.ppt 🙂
