Automating PASS Downloads

Since the news that PASS will be going offline on January 15, 2021, I've been working on archiving as much material as I can. This post details the techniques I've been using, in the hope that they'll be useful to others in the community.

BIG DISCLAIMER

Please DO NOT redistribute any copyrighted material from PASS.org, especially any videos, even those on their YouTube channel. Any such material you download is for your personal use only. If in any doubt, DO NOT PUBLISH/SHARE/DISTRIBUTE.

Prerequisite software

My examples use the Windows command line and its utilities. There are PowerShell equivalents, but I don't have the skills or the time to convert them. I'm confident in the techniques I'm describing, as I've used them for several years.

For downloading, I'm using the cURL utility for Windows; you can download it from https://curl.haxx.se/download.html. Please be extremely careful about downloading it from other sources. If you are using Linux or OS X/macOS, cURL is already available, though you may need to modify some of the parameters or string delimiters (double quotes vs. single quotes).

cURL can easily automate downloading from URLs that include numeric ranges by enclosing the range in square brackets [x-y]. For instance:

curl -k -o "#1.homepage" "https://www.sqlsaturday.com/[1-10]"

This will enumerate https://www.sqlsaturday.com/1, https://www.sqlsaturday.com/2, and so on through https://www.sqlsaturday.com/10. The -o parameter saves the output to a specific file name; the "#1" portion is replaced by the number used in the range portion of the URL. I'm using the extension ".homepage" to store the HTML of that URL; you can use any extension you like, this is just to categorize each type of page. The -k parameter skips certificate verification, so HTTPS URLs are still retrieved even if there's a certificate error.

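If you'd rather stay in PowerShell, a rough equivalent of the ranged download above looks something like this. This is only a minimal sketch; note that Windows PowerShell 5.1 has no simple equivalent of -k (PowerShell 7's Invoke-WebRequest adds -SkipCertificateCheck), so certificate errors will cause a failure:

foreach ($i in 1..10) {
    try {
        # Save each event's home page as <number>.homepage, mirroring the cURL example
        Invoke-WebRequest -Uri "https://www.sqlsaturday.com/$i" -OutFile "$i.homepage"
    } catch {
        Write-Warning "Event $i failed: $_"
    }
}
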
SQL Saturday Data

The following cURL statements can be used to get the associated pages from SQLSaturday.com. I'm using a range from 1 to 1040, which exceeds the actual number of events. Some events prior to #357 may return no data; you can adjust the range or simply discard those results afterward (a cleanup sketch follows the list below). All files will be uniquely named unless you modify the parameters:

Home Pages: curl -o "#1.homepage" "https://www.sqlsaturday.com/[1-1040]"
Sponsor Plans: curl -o "#1.sponsorplan" "https://www.sqlsaturday.com/[1-1040]/Sponsors/Sponsor-Plan"
Sponsors: curl -o "#1.sponsors" "https://www.sqlsaturday.com/[1-1040]/Sponsors"
Schedule: curl -o "#1.schedule" "https://www.sqlsaturday.com/[1-1040]/Sessions/Schedule"
Submitted: curl -o "#1_page01.submitted" "https://www.sqlsaturday.com/[1-1040]/Sessions/Submitted-Sessions"
XML Feed: curl -o "#1.feed.xml" "http://www.sqlsaturday.com/eventxml.aspx?sat=[1-1040]"

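One possible way to discard the no-data results afterwards is a size filter. The 1 KB cutoff below is only a guess (an event with no data may still render a full page), so spot-check a few of the small files and keep -WhatIf on until you're sure:

# Preview which home pages look too small to contain real event data (threshold is a guess)
Get-ChildItem *.homepage | Where-Object { $_.Length -lt 1KB } | Remove-Item -WhatIf
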
Please note that the Submitted Sessions URL shows only the first page of results. The subsequent pages are accessed by JavaScript-activated clicks on the page. I've been unable to get those additional submissions using cURL, but I've had some success with the technique used by John Morehouse in this blog article: https://sqlrus.com/2018/01/javascript-postback-download-via-powershell/

You'll want your PowerShell script to activate the "dnn$ctr#####$Listing$rptPagersBottom$ctl02$btnPager" controls on the page, where "#####" is an internal ID number of some kind that varies from one SQL Saturday event to another. A rough sketch of that approach follows.

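This is only a minimal sketch of the postback idea, not the exact script from John's article: it re-posts the ASP.NET hidden fields with __EVENTTARGET set to the pager control. The event number (1000), the page-2 control suffix, and the "#####" placeholder are assumptions you'll need to replace after inspecting the saved page source:

$url = "https://www.sqlsaturday.com/1000/Sessions/Submitted-Sessions"   # example event only
$response = Invoke-WebRequest -Uri $url -SessionVariable session

# Copy every input field from page 1 (this carries __VIEWSTATE and __EVENTVALIDATION along)
$fields = @{}
foreach ($field in $response.InputFields) {
    if ($field.name) { $fields[$field.name] = $field.value }
}

# Ask the server to render the next page by naming the pager control as the postback target.
# Replace ##### with the ID found in this event's saved HTML.
$fields["__EVENTTARGET"] = 'dnn$ctr#####$Listing$rptPagersBottom$ctl02$btnPager'
$fields["__EVENTARGUMENT"] = ""

$page2 = Invoke-WebRequest -Uri $url -WebSession $session -Method Post -Body $fields
$page2.Content | Out-File "1000_page02.submitted"
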
Also note that there are links to session details in those HTML pages. You'll have to process the Schedule and Submitted HTML to extract them, but once that's done you can use cURL to download those details as additional HTML files (a rough sketch follows). There's also another way to get those details, which I'll describe next.

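Here's a minimal sketch of that extraction. The href pattern ("Sessions/Details") is an assumption; open one of the saved .schedule or .submitted files and adjust the regular expression to match the real link format:

foreach ($file in Get-ChildItem *.schedule) {
    $html = Get-Content $file.FullName -Raw

    # Collect unique links that look like session-detail pages (pattern is a guess)
    $links = [regex]::Matches($html, 'href="([^"]*Sessions/Details[^"]*)"') |
             ForEach-Object { $_.Groups[1].Value } | Sort-Object -Unique

    $i = 0
    foreach ($link in $links) {
        $i++
        $link = $link -replace '&amp;', '&'   # undo HTML escaping of ampersands
        if ($link -notmatch '^https?://') { $link = "https://www.sqlsaturday.com$link" }
        # Use curl.exe explicitly; in Windows PowerShell "curl" is an alias for Invoke-WebRequest
        & curl.exe -k -L -o ("{0}_session{1:d3}.html" -f $file.BaseName, $i) $link
    }
}
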
All PASS Event Data

PASS uses an OData feed to provide XML output of their events, originally built for the Guidebook application. These URLs also use numeric ranges, but the IDs differ from the ones used for SQL Saturday events. The events include SQL Saturdays, PASS Summit, 24 Hours of PASS, PASS Marathons, BA Conferences, and a few other types. Additionally, there are several APIs that return different data, as shown here:

General Event: http://feeds.pass.org/public/events/OratorDataService.svc/Events(#)
Speaker details for Event: http://feeds.pass.org/public/events/OratorDataService.svc/Events(#)/PASS_OData_Orator_SessionSpeaker
Session details for Event: http://feeds.pass.org/public/events/OratorDataService.svc/Events(#)/PASS_OData_Orator_Session
SessionFiles for Event: http://feeds.pass.org/public/events/OratorDataService.svc/Events(#)/PASS_OData_Orator_SessionFile

The "#" indicates the event ID, and each call returns an XML version of the data, rather than HTML. You can parse that XML as you like, and relate/join the different feed types by event ID. Using cURL as before:

curl -k -o "#1.event" "http://feeds.pass.org/public/events/OratorDataService.svc/Events([1-1200])"
curl -k -o "#1.speakers" "http://feeds.pass.org/public/events/OratorDataService.svc/Events([1-1200])/PASS_OData_Orator_SessionSpeaker"
curl -k -o "#1.sessions" "http://feeds.pass.org/public/events/OratorDataService.svc/Events([1-1200])/PASS_OData_Orator_Session"
curl -k -o "#1.sessionfiles" "http://feeds.pass.org/public/events/OratorDataService.svc/Events([1-1200])/PASS_OData_Orator_SessionFile"

The Event feed has elements "EventID" and "EventName" that describe the event, for example:

<d:EventID m:type="Edm.Int32">1123</d:EventID>
<d:EventName>PASS Virtual Summit 2020</d:EventName>

<d:EventID m:type="Edm.Int32">800</d:EventID>
<d:EventName>SQLSaturday #695 - Guatemala 2018</d:EventName>

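As a minimal sketch of pulling those two elements out of the downloaded .event files (assuming each file holds a single OData Atom entry, and that IDs with no event behind them produce files that won't parse and can be skipped):

foreach ($file in Get-ChildItem *.event) {
    try {
        $xml = [xml](Get-Content $file.FullName -Raw)
    } catch {
        continue   # not valid XML - probably an ID with no event behind it
    }

    $props = $xml.entry.content.properties
    if (-not $props) { continue }   # e.g. an OData error document instead of an entry

    [pscustomobject]@{
        # EventID carries an m:type attribute, so read its inner text; EventName is plain text
        EventID   = $props.EventID.InnerText
        EventName = $props.EventName
    }
}
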
The Speakers, Sessions, and SessionFiles feeds all contain their respective details for each event. Note that these include only the scheduled sessions for the event, not all submissions. There are additional URLs in these XML files pointing to session files and further details; you can parse these out and automate those downloads via cURL or another utility, along the lines of the sketch below.

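The exact element names in those feeds aren't listed here, so as a rough sketch this just pulls anything that looks like an http(s) URL out of each downloaded .sessionfiles document and hands it to cURL; inspect the XML and tighten the pattern before running it at scale:

foreach ($file in Get-ChildItem *.sessionfiles) {
    $text = Get-Content $file.FullName -Raw

    # Grab every unique URL-looking string; refine this once you know the real element names
    $urls = [regex]::Matches($text, 'https?://[^<"\s]+') |
            ForEach-Object { $_.Value } | Sort-Object -Unique

    $i = 0
    foreach ($url in $urls) {
        $i++
        $url = $url -replace '&amp;', '&'   # undo XML escaping of ampersands
        # curl.exe (not the PowerShell alias); -L follows redirects to the actual file
        & curl.exe -k -L -o ("{0}_file{1:d3}" -f $file.BaseName, $i) $url
    }
}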