Untitled

Exercise 1 – Prepare your data set
1.	Connect to Redshift cluster XX and table YY. Set a refresh schedule on the dataset.
Step 1: Go to Manage Data -> New data set -> Redshift (manual connect).
Step 2: Enter the following to connect to a data source
Data source name: QS 101 workshop <custom>
Database server: redshift-cluster-1.cmtrp6jcotm9.us-east-1.redshift.amazonaws.com
Port: 8192
Database name: dev
Username: awsuser
Password: QSAnalytics123$
Select this schema: sandbox
Select this table: home_prices

Choose ‘Use custom SQL’

Step 3: Enter the following SQL. We will only need state, county name, regionname, timestamp and price for the analysis.
Select state, countyname, regionname, sizerank, price, timestamp from sandbox.home_prices
Choose ‘Import to SPICE for quicker analysis’ and hit ‘Edit/Preview data’
Step 4: Edit data set schema
On the top left, ensure the ‘Data source’ selected is SPICE. Make the following changes,
•	change the column name countyname to ‘county’ and data type from string to ‘County’
•	change the column name regionname to ‘city’ and data type to ‘City’
•	change the column name timestamp to ‘date’
Step 5: Create a geospatial hierarchy

•	Open the sub-menu from the ‘state’ field and select ‘Add to a hierarchy’.
•	Create a new geospatial hierarchy and call it ‘location’, choose ‘This hierarchy is for a single country’ and select ‘United States’.
•	Add state and city to the hierarchy.
Step 6: Add a calculated field to extract the year from the date field (we will need this later)
•	Go to the fields tab on the left pane.
•	Select ‘New Field’ -> give a field name ‘year’ and enter formula toString(extract("YYYY",{date}))
•	Change the data type to string