- # Linear Regression Analysis
- # Mike Kerry - Feb 2021 - acclivity2@gmail.com
- # This was a fun project. We are given the prices of pizzas at 5 different sizes (6, 8, 10, 14 and 18 inches)
- # The task is to fit a straight line through these points, finding the line that minimizes the sum of the
- # squared differences between the predicted and actual prices.
- # Having found this line, use it to predict the price of a 12" pizza
- # I started by finding the slope of the existing graph between each adjacent pair of points.
- # I found the minimum and maximum slope, and I assumed that my final prediction line would lie between these slopes.
- # Using these min and max slopes, I made initial "inspired guesses" at the minimum and maximum price of a 6" pizza
- # on my least-sum-of-squares line.
- # I then computed the sum-of-squares for 500 starting prices (between min and max 6" pizza) and for 500 slopes for
- # each of these starting prices. I recorded the line that gave me the least sum-of-squares of errors in price
- # Using my best-fit line, I computed the price of a 12" pizza. This came out as $13.68 (which was the right answer!)
- # All this was done without using any Python Library.
- # (Interestingly, this is a simplified version of the first computer program I ever wrote,
- # which was a Multiple Regression Analysis written while I was at college in 1961)
- x = [6, 8, 10, 14, 18] # Sizes of pizzas in inches
- y = [7, 9, 13, 17.5, 18] # Price of pizza in $ at each of the given sizes
- num = len(x) # Number of points in the given graph
- slopes = [] # a list of the slopes of the graph for all adjacent points
- for i in range(1, num): # look at each x interval
-     xdiff = x[i] - x[i-1]
-     ydiff = y[i] - y[i-1]
-     slopes.append(ydiff/xdiff) # compute the slope and append to list of slopes
- minslope = min(slopes)
- maxslope = max(slopes)
- # We will step through candidate slopes between these two values
- mid = num // 2 # Find a pizza size and price around the middle of the range
- # Apply our min and max slopes to this mid-point pizza, and hence find a min and max price for a 6" (starting) pizza
- midxdif = x[mid] - x[0]
- maxystart = y[mid] - (midxdif * minslope)
- minystart = y[mid] - (midxdif * maxslope)
- # We will step through candidate starting prices between these two values
- deltastart = (maxystart - minystart) / 500 # We will compute for 500 values of price (Y) at x[0]
- deltaslope = (maxslope - minslope) / 500 # And compute for 500 values of slope per Y start
- leastssq = float('inf') # Least sum-of-squares found so far, over roughly 250,000 (start price, slope) pairs
- bestystart = 0.0
- bestslope = 0.0
- ystart = minystart
- while ystart < maxystart: # step through about 500 starting prices
-     aslope = minslope
-     while aslope <= maxslope: # step through about 500 slopes for this starting price
-         sumssq = 0.0
-         for i in range(num):
-             predicted_y = ystart + (x[i] - x[0]) * aslope
-             sumssq += (predicted_y - y[i]) ** 2
-         if sumssq < leastssq: # compare only once the sum over all points is complete
-             leastssq = sumssq # record the least sum-of-squares so far
-             bestystart = ystart # record the corresponding starting price
-             bestslope = aslope # and the corresponding graph slope
-         aslope += deltaslope # bump to next slope value
-     ystart += deltastart # bump to next starting price
- # ----------------------------------------------------------------------------------
- print('Best ystart = $%.2f Best slope = %.2f Least SSQ = %.2f' % (bestystart, bestslope, leastssq))
- print()
- # compute best fit Y values for each value of X
- # (Not actually required, unless we wanted to plot this line)
- bestylist = []
- for j in range(num):
-     dxj = x[j] - x[0]
-     yj = bestystart + dxj * bestslope
-     bestylist.append(yj)
- # Compute prediction for x = 12 (price for 12" pizza)
- px = 12
- dx = px - x[0]
- predict_y = bestystart + (dx * bestslope)
- print('Predicted price of 12" pizza: $%.2f' % predict_y)
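The grid search above can be cross-checked against the standard closed-form least-squares formulas (slope = Sxy / Sxx, intercept = mean_y - slope * mean_x). What follows is a minimal sketch of that check, not part of the original script. The function name `least_squares_line` and the variable `price_12` are illustrative names introduced here. Like the script, it uses no imports.

```python
# Closed-form ordinary least squares for a single predictor.
# Illustrative cross-check only; the names below are not from the original script.

x = [6, 8, 10, 14, 18]        # pizza sizes in inches (same data as above)
y = [7, 9, 13, 17.5, 18]      # pizza prices in $


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx: sum of squared deviations of x; Sxy: sum of cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # price the line gives at x = 0
    return slope, intercept


slope, intercept = least_squares_line(x, y)
price_12 = intercept + slope * 12         # predicted price of a 12" pizza
print('Closed-form slope = %.4f  intercept = %.4f' % (slope, intercept))
print('Closed-form price of 12" pizza: $%.2f' % price_12)
```

Note that `intercept` here is the price at x = 0, whereas `bestystart` in the script is the price at x[0] = 6, so the two are related by `bestystart ≈ intercept + 6 * slope`. The closed-form prediction for a 12" pizza rounds to $13.68, the same figure quoted in the header comment.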