Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- Exercise 1 – Prepare your data set
- 1. Connect to Redshift cluster XX and table YY. Set a refresh schedule on the dataset.
- Step 1: Go to Manage Data -> New data set -> Redshift (manual connect).
- Step 2: Enter the following to connect to a data source
- Data source name: QS 101 workshop <custom>
- Database server: redshift-cluster-1.cmtrp6jcotm9.us-east-1.redshift.amazonaws.com
- Port: 8192
- Database name: dev
- Username: awsuser
- Password: QSAnalytics123$
- Select this schema: sandbox
- Select this table: home_prices
- Choose ‘Use custom SQL’
- Step 3: Enter the following SQL. We will only need state, county name, regionname, timestamp and price for the analysis.
- Select state, countyname, regionname, sizerank, price, timestamp from sandbox.home_prices
- Choose ‘Import to SPICE for quicker analysis’ and hit ‘Edit/Preview data’
- Step 4: Edit data set schema
- On the top left, ensure the ‘Data source’ selected is SPICE. Make the following changes,
- • change the column name countyname to ‘county’ and data type from string to ‘County’
- • change the column name regionname to ‘city’ and data type to ‘City’
- • change the column name timestamp to ‘date’
- Step 5: Create a geospatial hierarchy
- • Open the sub-menu from the ‘state’ field and select ‘Add to a hierarchy’.
- • Create a new geospatial hierarchy and call it ‘location’, choose ‘This hierarchy is for a single country’ and select ‘United States’.
- • Add state and city to the hierarchy.
- Step 6: Add a calculated field to extract the year from the date field (we will need this later)
- • Go to the fields tab on the left pane.
- • Select ‘New Field’ -> give a field name ‘year’ and enter formula toString(extract("YYYY",{date}))
- • Change the data type to string
Add Comment
Please, Sign In to add comment