
Comparing asymmetric semantic search models (example query: "strengths of ARIMA models")

Aug 29th, 2022
Comparison of SBERT models for asymmetric semantic search on course data (to quickly find the documents relevant to a query).

The query is the same for all 5 models: "strengths of ARIMA models".
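The ranking step behind result lists like the ones below can be sketched as follows. Each pre-computed chunk embedding is scored against the query embedding with either a dot-product or a cosine score, and the best chunks are kept. This is only an illustration. The toy 3-dimensional vectors stand in for real SBERT embeddings, and `score_dot`, `score_cos` and `rank_chunks` are hypothetical helpers, not sentence-transformers API. In the library itself, this step would typically be `util.semantic_search(...)` with `score_function=util.dot_score` or the default `util.cos_sim`. The paste does not say which call the original run used.

The split into two score functions matches the models compared here. The sentence-transformers documentation lists msmarco-distilbert-base-tas-b and the `-dot-` models as tuned for dot-product scoring, and the `-cos-` models as tuned for cosine similarity. This is why Search Score runs from about 24 to 170 in some sections and stays below 1 in others. Scores are therefore comparable only within one model, not across models.

```python
import math


def score_dot(q, d):
    """Dot product: unbounded, grows with the vectors' norms."""
    return sum(a * b for a, b in zip(q, d))


def score_cos(q, d):
    """Cosine similarity: dot product of the L2-normalised vectors, in [-1, 1]."""
    return score_dot(q, d) / (math.sqrt(score_dot(q, q)) * math.sqrt(score_dot(d, d)))


def rank_chunks(query_emb, chunks, score_fn, top_k=4):
    """Score every chunk against the query and return the top_k as result dicts.

    Each chunk is a dict with 'emb', 'doc_dir', 'doc_name' and 'id_within_doc'
    keys (a hypothetical layout; doc_text and doc_relative_loc are left out).
    """
    scored = sorted(((score_fn(query_emb, c["emb"]), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    return [
        {"Rank": rank,
         "Search Score": round(score, 4),
         "doc_dir": c["doc_dir"],
         "doc_name": c["doc_name"],
         "id_within_doc": c["id_within_doc"]}
        for rank, (score, c) in enumerate(scored[:top_k], start=1)
    ]


# Toy "embeddings": a long vector at 45 degrees vs. a short one almost aligned with the query.
query = [1.0, 0.0, 0.0]
chunks = [
    {"emb": [10.0, 10.0, 0.0], "doc_dir": "toy", "doc_name": "long_chunk", "id_within_doc": 0},
    {"emb": [1.0, 0.1, 0.0], "doc_dir": "toy", "doc_name": "short_chunk", "id_within_doc": 1},
]
by_dot = rank_chunks(query, chunks, score_dot)
by_cos = rank_chunks(query, chunks, score_cos)
print([r["doc_name"] for r in by_dot], by_dot[0]["Search Score"])  # ['long_chunk', 'short_chunk'] 10.0
print([r["doc_name"] for r in by_cos], by_cos[0]["Search Score"])  # ['short_chunk', 'long_chunk'] 0.995
```

A dot-product score rewards vector length as well as direction, so the same two chunks can swap places depending on the score function. With L2-normalised embeddings the two scores coincide.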

## sentence-transformers/msmarco-distilbert-base-tas-b

[{'Rank': 1,
'Search Score': 104.0884,
'doc_dir': 'TB-forecasting-principles',
'doc_name': 'OCR_Ch-9-ARIMA models-FPP_',
'doc_relative_loc': 70.968,
'doc_text': '0, 1 ) 1, 1, 0 ) 12 of these models, the best is the arima ( 3, '
'0, 1 ) ( 0, 1, 2 ) 12 model ( i. e, it has the smallest aicc '
'value ). ( fit < arima ( h02, order = c ( 3, 0, 1 ), seasonal - '
'c ( 0, 1, 2 ), lambda = 0 ) ) # > series : h0z # > arima ( 3, '
'0, 1 ) ( 0, 1, 2 ) [ 12 ] # > box cox transformation : zambda = '
'0 # > coefficients : # ari ar2 ar3 mal # > 0 160 0. 548 0. 568 '
'0. 383 # > 5. e _ 0. 164 0. 088 0. 094 0. 190 smal sma2 0 5220 '
'177 0. 086 0. 087 # > # > sigma ^ 2 estimated as 0. 00428 : log '
'likelihood - 250 # > aic = - 486 _ 1 aicc = - 485. 5 bic = - '
'463. 3 checkresiduals ( fit, zag - 36 ) reslduals from arima',
'id_within_doc': 66},
{'Rank': 3,
'Search Score': 101.2414,
'doc_dir': 'TS-Analysis-apps-in-R-pgsplit',
'doc_name': 'OCR_5_201_cryer - time series analysis apps in R_',
'doc_relative_loc': 94.231,
'doc_text': 'arima models are just special cases of our general arima '
'models. as such, all of our work on parameter estimation in '
'chapter 7 carries over t0 the seasonal case. exhibit 10. 10 '
'gives the maximum likelihood estimates and their standard '
'errors for the arima ( 0, 1, 1 ) x ( 0, 1, 1912 model for coz '
'levels. exhibit 10. 10 parameter estimates for the coz model '
'coefficient estimate 0. 5792 0. 8206 standard error 0. 0791 0. '
'1137 82 0. 5446 : log - likelihood = 139. 54, aic = 283. 08 ml. '
'co2 - arima ( co2, order - c ( 0, 1, 1 ) seasonal - list ( '
'order - c ( 0, 1, 1 ), period - 12 ) ) ml _ co2 238 seasonal '
'models the coefficient estimates are all highly significant, '
'and we proceed to check further on this model _ diagnostic '
'checking to check the estimated the arima ( o, 1, 1 ) x ( 0, 3 '
'1, 1 ) 12 model, we first look at the time series plot of the '
'residuals. exhibit 10. 11 gives this plot for standardized '
'residuals. other than some strange behavior in the middle of '
'the series,',
'id_within_doc': 98},
{'Rank': 5,
'Search Score': 100.9896,
'doc_dir': 'course-slides',
'doc_name': 'OCR_ATS_Slides_v220216__7',
'doc_relative_loc': 0.0,
'doc_text': 'arima, sarima & garch fitting an arima in r plausible models '
'for the logged oil prices after inspection of acfipacf of the '
'differenced series ( that seems stationary ) : arima ( 1, 1, 1 '
') or arima ( 2, 1, 1 ), the former has lower aic arima ( lop, '
'order - c ( 11, 1 ) ) coefficients : arl mal0. 2987 0. 5700 s. '
'e. 0. 2009 0. 1723 sigma ^ 2 = 0. 006642 : 11 261. 11, a = 518. '
'22 alternative r command with equivalent result : arima ( drop, '
'order - c ( 1, 0, 1 ), include mean - false ) 291 arima, sarima '
'& garch example : residuals for arima ( 1, 1, 1 ) residuals '
'from arima ( 1, 1, 1 ) rwwlimhlwkv wwmmv 5 1990 1995 2000 2005 '
'3 3 3 g 8 3 3 3 2 8 3 3 5 10 15 20 25 30 35 5 10 15 20 25 30 35 '
'lag lag 292 ivrwukv arima, sarima & garch rewriting arima as '
'non - stationary arm',
'id_within_doc': 0},
{'Rank': 6,
'Search Score': 100.7397,
'doc_dir': 'TB-forecasting-principles',
'doc_name': 'OCR_Ch-10-Dynamic regression models-FPP_',
'doc_relative_loc': 61.765,
'doc_text': 'more " wiggly " seasonal pattern and simpler arima models are '
'required to capture other dynamics. the aicc value is minimised '
'for k 5, with a significant jump going from k = 4 to k = 5, '
'hence the forecasts generated from this model would be the ones '
'used : cafe04 < window ( auscafe, start - 2004 ) plots < list ( '
') for ( i in seq ( 6 ) ) { fit < auto. arima ( cafe04, xreg '
'fourier ( cafe04, k = i ), seasonal false, iambda 0 ) plots [ [ '
'i ] ] < autoplot ( forecast ( fit, xreg - fourier ( cafe04, k = '
'i, h = 24 ) ) ) + xlab ( paste ( " k = " 1, aicc = " round ( '
'fit [ [ " aicc " ] ], 2 ) ) ) + ylab ( " " ) + ylim ( 1. 5, 4. '
'7 ) gridextra : : grid. arrange ( plots [ [ 1 ] ], plots [ [ 2 '
'] ], plots [ [ 3 ] ], plots [ [ 4 ] ], plots [ [ 5 ] ], plots [ '
'[ 6',
'id_within_doc': 21}]
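The paste does not explain the `doc_relative_loc` field. Every value in the result lists is consistent with it being the chunk's position as a percentage of its document, computed as 100 × id_within_doc / (number of chunks in that document) and rounded to three decimals. For example, if OCR_Ch-9-ARIMA models-FPP_ has 93 chunks, then chunks 66, 42, 68 and 70 give the reported 70.968, 45.161, 73.118 and 75.269. The chunk counts in this sketch were back-solved from the printed values, not read from the original pipeline, and `relative_loc` is a hypothetical helper.

```python
def relative_loc(id_within_doc, n_chunks):
    """Chunk position as a percentage of its document (hypothetical reconstruction)."""
    return round(100 * id_within_doc / n_chunks, 3)


# (doc_name, id_within_doc, inferred chunk count, doc_relative_loc as printed in this paste)
observed = [
    ("OCR_Ch-9-ARIMA models-FPP_", 66, 93, 70.968),
    ("OCR_Ch-9-ARIMA models-FPP_", 42, 93, 45.161),
    ("OCR_5_201_cryer - time series analysis apps in R_", 98, 104, 94.231),
    ("OCR_Ch-10-Dynamic regression models-FPP_", 21, 34, 61.765),
    ("OCR_ATS_Script_v220214__6", 22, 40, 55.0),
    ("OCR_ATS_Script_v220214__6", 4, 40, 10.0),
]

for name, idx, n, printed in observed:
    print(f"{name} #{idx}: {relative_loc(idx, n)} (printed: {printed})")
```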

## sentence-transformers/msmarco-bert-base-dot-v5

[{'Rank': 1,
'Search Score': 169.94,
'doc_dir': 'TB-forecasting-principles',
'doc_name': 'OCR_Ch-9-ARIMA models-FPP_',
'doc_relative_loc': 45.161,
'doc_text': 'an arima ( 3, 1, 0 ) model along with variations including '
'arima ( 4, 1, 0 ), arima ( 2, 1, 0 ), arima ( 3, 1, 1 ), etc. '
'of these, the arima ( 3, 1, 1 ) has a slightly smaller aicc '
'value. ( fit < arima ( eeadj order - c ( 3, 1, 1 ) ) ) # > '
'series : eeadj # > arima ( 3, 1, 1 ) # > # > coefficients : # > '
'arl ar2 ar3 # > 0. 004 0. 092 0. 370 mal 0. 392 # > 5. e. 0. '
'220 0. 098 0. 067 0. 243 # > # > sigma ^ 2 estimated as 9. 58 : '
'log likelihood = - 492 7 # > aic - 995. 4 aicc = 995. 7 bic = '
'1012 lag lag 6. the acf plot of the residuals from the arima ( '
'3, 1, 1 ) model shows that all autocorrelations are within the '
'threshold limits, indicating that the residuals are behaving '
'like white noise. a portmantea',
'id_within_doc': 42},
{'Rank': 2,
'Search Score': 168.7032,
'doc_dir': 'TB-forecasting-principles',
'doc_name': 'OCR_Ch-13-Some practical forecasting issues-FPP_',
'doc_relative_loc': 76.471,
'doc_text': 'test < arima ( test, model - cafe. train ) accuracy ( cafe. '
'test ) # > me rmse mae mpe mape # > training set 0 002622 0. '
'04591 0. 034130. 07301 1. 002 # > mase acf1 # > train ing set 0 '
'1899 ~ 0. 05704 note that arima ( does not re - estimate in '
'this case. instead, the model obtained previously ( and stored '
'as cafe. train ) is applied to the test data. because the model '
'was not re - estimated, the " residuals " obtained here are '
'actually one - step forecast errors consequently, the results '
'produced from the accuracy ( ) command are actually on the test '
'set ( despite the output saying ( training set " ) 12. 9 '
'dealing with missing values and outliers real data often '
'contains missing values, outlying observations, and other messy '
'features. dealing with them can sometimes be troublesome '
'missing values missing data can arise for many reasons, and it '
'is worth considering whether the missingness will induce bias '
'in the forecasting model. for example, suppose we are studying '
'sales data for a store, and missing values occur on public '
'holidays when the store is closed. the following day may have '
'increased sales as',
'id_within_doc': 26},
{'Rank': 3,
'Search Score': 168.5765,
'doc_dir': 'course-script',
'doc_name': 'OCR_ATS_Script_v220214__6',
'doc_relative_loc': 55.0,
'doc_text': 'most plausible parsimonious integrated models include the arima '
'( 0, 1, 1 ) and the arima ( 1, 1, 1 ). the former cannot '
'reasonably capture the dependencies ; the residuals are still '
'correlated and violate the white noise assumption. the arima ( '
'1, 1, 1 ) is much better in this regard. however, its aic value '
'is worse than the one of the arima ( 2, 0, 1 ) considered '
'previously : we again employ auto. arima ( ) for a non - '
'stepwise grid search over all arima ( p, 1, 4 ) with p, q < 5 '
'and p + q < 5 _ since we want to avoid a drift - term and '
'directly work on the differenced data, we have to set allowmean '
'- false _ fit < auto. arima ( diff ( tdf ) max p - 5, max 9 - '
'5, stationary - true, allow mean - false, stepwise - false, ic '
'= " a " ) 123 lag 6 sarima and garch models fit series : diff ( '
'tdf ) arima ( 2, 0, 1 ) with zero mean coefficients : arl ar2 '
'mal 0. 4219 0. 12490. 961',
'id_within_doc': 22},
{'Rank': 4,
'Search Score': 168.526,
'doc_dir': 'TB-theory-and-methods-1992',
'doc_name': 'OCR_11_Model Building and Forecasting with ARIMA Processes_Time '
'Series Theory and Methods_',
'doc_relative_loc': 5.0,
'doc_text': 'an arima model is the slowly decaying positive sample '
'autocorrelation function seen in figure 9. 1. if therefore we '
'were given only the data and wished to find an appropriate '
'model it would be natural to apply the operator v = 1 b '
'repeatedly in the hope that for some j, { vix, } will have a '
'rapidly decaying sample autocorrelation function compatible '
'with that ofan arma process with no zeroes of the '
'autoregressive polynomial near the unit circle. for the '
'particular time series in this example, one application of the '
'operator produces the realization shown in figure 9. 2, whose '
'sample autocorrelation and partial autocorrelation functions '
'suggest an ar ( l ) model for { vx, } the maximum likelihood '
'estimates of $ and 02 obtained from pest ( under the assumption '
'that e ( vx, ) = 0 ) are. 808 and. 978 respectively, giving the '
'model, 89. 1. arima models for non - stationary time series 277 '
'3 2 ~ 2 5 20 40 60 80 100 ( a ) 120 140 160 180 200 0. 9 0. 8 '
'0. 7 0. 6 0. 5 0. 4 0. 3 0. 2 0. 1 8 & - & 0 _ ~ 0',
'id_within_doc': 6}]

## sentence-transformers/msmarco-distilbert-cos-v5

[{'Rank': 1,
'Search Score': 0.5348,
'doc_dir': 'course-slides',
'doc_name': 'OCR_ATS_Slides_v220216__7',
'doc_relative_loc': 0.0,
'doc_text': 'arima, sarima & garch fitting an arima in r plausible models '
'for the logged oil prices after inspection of acfipacf of the '
'differenced series ( that seems stationary ) : arima ( 1, 1, 1 '
') or arima ( 2, 1, 1 ), the former has lower aic arima ( lop, '
'order - c ( 11, 1 ) ) coefficients : arl mal0. 2987 0. 5700 s. '
'e. 0. 2009 0. 1723 sigma ^ 2 = 0. 006642 : 11 261. 11, a = 518. '
'22 alternative r command with equivalent result : arima ( drop, '
'order - c ( 1, 0, 1 ), include mean - false ) 291 arima, sarima '
'& garch example : residuals for arima ( 1, 1, 1 ) residuals '
'from arima ( 1, 1, 1 ) rwwlimhlwkv wwmmv 5 1990 1995 2000 2005 '
'3 3 3 g 8 3 3 3 2 8 3 3 5 10 15 20 25 30 35 5 10 15 20 25 30 35 '
'lag lag 292 ivrwukv arima, sarima & garch rewriting arima as '
'non - stationary arm',
'id_within_doc': 0},
{'Rank': 3,
'Search Score': 0.5068,
'doc_dir': 'course-script',
'doc_name': 'OCR_ATS_Script_v220214__6',
'doc_relative_loc': 10.0,
'doc_text': 'is at 0. 3056, providing further evidence that the remaining '
'dependence is insignificant : 5. 5. 3 aic - based model choice '
'we have explained above how the order of arma ( p, q ) models '
'can be found by inspecting acf and pacf and complementing this '
'with classical model selection approaches and residual '
'analysis. another alternative is to run a criterion - based '
'model selection. in r, this is conveniently possible by using '
'the function auto arima ( ) from the library ( forecast ). '
'however, handle this with care : the function will always '
'identify a " best fitting " arma ( p, q ) model, but it is, of '
'course, not guaranteed that it fits the data well. moreover, '
'usage of the function is somewhat 112 5 stationary time series '
'models are tricky, as many arguments need to be set. we first '
'address the definition of the information criteria, as they are '
'central to the auto. arima ( ) function : aic = 2log ( l ) + 2 '
'( p + q + k + 1 ) here, the first term measures how well the '
'model fits the training data with the value of the log - '
'likelihood function as the goodness - of - fit measure. the '
'second term penalizes model complexity, where p and',
'id_within_doc': 4},
{'Rank': 5,
'Search Score': 0.4985,
'doc_dir': 'TS-Analysis-apps-in-R-pgsplit',
'doc_name': 'OCR_4_151_cryer - time series analysis apps in R_',
'doc_relative_loc': 91.667,
'doc_text': '( 1, 1 ) model for the color series coefficients : ar1 ma1 '
'intercept 0. 6721 ~ 0. 1467 74. 1730 s. e 0. 2147 0. 2742 2. '
'1357 sigma ^ 2 estimated as 24. 63 : log - likelihood = 105. '
'94, aic = 219. 88 arima ( color order - c ( 1, 0, 1 ) ) as we '
'have noted, any arma ( p, q ) model can be considered as '
'special case of a more general arma model with the additional '
'parameters equal t0 zero. however ; when generalizing arma '
'models, we must be aware of the problem of parameter redundancy '
'or lack of identifiability : to make these points clear ; '
'consider an arma ( 1, 2 ) model : yt = $ y _ 1 + e101e1 - 1 ~ '
'02e, - 2 8. 2. 1 ) now replace t by t _ 1 to obtain yi _ 1 = $ '
'y, _ 2 + e _ 1 ~ 01e2 ~ 02e, - 3 8. 2. 2 ) if we multiply both '
'sides of equation ( 8. 2. 2 ) by any constant c and then '
'subtract it from',
'id_within_doc': 99},
{'Rank': 7,
'Search Score': 0.4933,
'doc_dir': 'course-slides',
'doc_name': 'OCR_ATS_Slides_v220216__4',
'doc_relative_loc': 85.0,
'doc_text': '68 62839 f. arima mle < _ arima ( log ( lynx ) 1 order - c ( 2, '
'0, 0 ) ) coefficients : arl ar2 intercept 1. 37760. 7399 6. 68 '
'63 s. e. 0. 0614 0. 0612 0. 1349 sigma ^ 2 - 0. 271 ; log - '
'likelihood - - 88. 58 ; aic185. 15 while mle by default assumes '
'gaussian innovations, it performs reasonably in coefficient '
'estimation and points predictions for other distributions as '
'long as they are not extremely skewed or have very precarious '
'outliers. however, the standard errors are biased. 186 '
'autoregressive models practical aspects all four estimation '
'methods are asymptotically equivalent, and the differences are '
'usually small, even on finite samples. all four estimation '
'methods are non - robust against outliers and perform best on '
'approximately gaussian data : function arima ( ) provides '
'standard errors for m ; 01, 0 so p that statements about '
'significance become feasible, and confidence intervals for the '
'parameters can be built. ar. ols ( ), ar. yw ( ) & ar burg ( ) '
'allow for a convenient choice of the optimal',
'id_within_doc': 17}]

## sentence-transformers/multi-qa-mpnet-base-dot-v1

[{'Rank': 1,
'Search Score': 23.9894,
'doc_dir': 'TB-time-seriesR-cowpertwait',
'doc_name': 'OCR_10_Non-stationary Models_intro time series in R - '
'cowperwait_',
'doc_relative_loc': 42.5,
'doc_text': 'range of models by a trial - and - error approach involving '
'just editing a command on each trial to see if an improvement '
'in the aic occurs. alternatively ; we could write a simple '
'function that fits a range of arima models and selects the best '
'- fitting model this approach works better when the conditional '
'sum of squares method css is selected in the arima function ; '
'as the algorithm is more robust _ to avoid over parametrisation '
'; the consistent akaike information criteria ( caic ; see '
'bozdogan ; 1987 ) can be used in model selection an example '
'program follows _ get. best arima < function ( x. ts, maxord c '
'( 1, 1, 1, 1, 1, 1 ) ) best aic < 1e8 < length ( x. ts ) for ( '
'p in 0 : maxord [ 1 ] ) for ( d in 0 : maxord [ 2 ] ) for ( q '
'in 0 : maxord [ 3 ] ) for ( p in 0 : maxord [ 4 ] ) for ( d in '
'0 : maxord [ 5 ] ) for ( q in 0 : maxord [ 6 ] ) { fit < arima '
'( x. ts _ order c ( p, d, 9 ) seas list ( order c',
'id_within_doc': 17},
{'Rank': 2,
'Search Score': 23.9238,
'doc_dir': 'TB-forecasting-principles',
'doc_name': 'OCR_Ch-9-ARIMA models-FPP_',
'doc_relative_loc': 75.269,
'doc_text': '0 ) 12 0. 0679 the models chosen manually and with auto. arimal '
') are both in the top four models based on their rmse values. '
'when models are compared using aicc values, it is important '
'that all models have the same orders of differencing : however, '
'when comparing models using a test set, it does not matter how '
'the forecasts were produced the comparisons are always valid '
'consequently, in the table above, we can include some models '
'with only seasonal differencing and some models with both first '
'and seasonal differencing, while in the earlier table '
'containing aicc values, we only compared models with seasonal '
'differencing but no first differencing : none of the models '
'considered here pass all of the residual tests. in practice, we '
'would normally use the best model we could find, even if it did '
'not pass all of the tests. forecasts from the arima ( 3, 0, 1 ) '
'( 0, 1, 2 ) 12 model ( which has the lowest rmse value on the '
'test set, and the best aicc value amongst models with only '
'seasonal differencing ) are shown in figure 8. 26. h0z % > '
'arima ( order - c ( 3, 0, 1 ), seasonal - c',
'id_within_doc': 70},
{'Rank': 3,
'Search Score': 23.4669,
'doc_dir': 'lecture-audio',
'doc_name': 'SC_lecture_7_apr_4_v_2_c_transcription_10',
'doc_relative_loc': 88.679,
'doc_text': "the other hand, it's also not so easy to develop a process that "
"removes this dependency. you'd have to increase the model "
'orders quite a bit and estimate many more certifications, which '
'also brings some disadvantages, so to some extent, one '
'sometimes also accept is certainly a remaining dependency is a '
"lot more disturbing if it's on the first couple of flags rather "
"than besides at the higher lag. it's more tolerated if it's "
"small in magnitude rather than when it's large and magnitude. "
"it's more tolerated when it's only at the single lack, which "
"here, in fact, it is not. there's a second in both, but it's "
"very small. ya. so that's how modeling works. so you always "
'have this tirade off into the complexity of the model. if the '
'larger model does not clean advantages and practical '
'advantages, this is not just removing this, but also practical '
'advantages. one often proceeds with the smaller model oak. so '
"that's at the end of this example, the end of this chapter on "
'armapcu. and well, we go to the first application of these '
'arima processes, which is serious regression at times. so let '
'me try to explain. so time',
'id_within_doc': 47},
{'Rank': 4,
'Search Score': 23.1942,
'doc_dir': 'TB-intro-TS-and-Forecasting-Brockwell',
'doc_name': 'OCR_21_Index_Introduction to Time Series and Forecasting_',
'doc_relative_loc': 40.0,
'doc_text': 'based 0n confidence regions, forecasting arima processes, 173 - '
'177 369 - 370 forecast function, 182 - 183 uniformly most '
'powerful test ; 369 h - step predictor ; 175 mean square error '
'0f, 174 forecast density, 289 forward prediction errors, 130 '
'iarch ( o ) process, 209 fourier frequencies, 107, 109 igarch ( '
'p, q ) process, 208 fourier indices, 11 independent random '
'variables, 30, 36, 214 fractionally integrated arma process, '
'339 identification techniques, 163 - 169 estimation of, 340 for '
'arma processes, 164 422 index identification techniques ( cont '
': ) for ar ( p ) processes, 142 for ma ( q ) processes, 153 for '
'seasonal arima processes, 177 igarch ( p, 4 ) process, 208, 209 '
'iid noise, 6 _ 7, 14 sample acf of, 53 multivariate, 235 '
'innovations, 62, 271 innovations algorithm, 62 - 65, 132 - 137 '
'fitted innovations ma ( m ) model, 133 multivariate, 247 input, '
'45, 112, 333 integrated volatility, 217, 218, 220, 226 '
'intervention analysis, 331 - 334 invertible arma process, 76 '
'multivariate arma process, 244 investment strategy, 221',
'id_within_doc': 10}]

## sentence-transformers/msmarco-MiniLM-L6-cos-v5

[{'Rank': 1,
'Search Score': 0.6511,
'doc_dir': 'TB-time-seriesR-cowpertwait',
'doc_name': 'OCR_10_Non-stationary Models_intro time series in R - '
'cowperwait_',
'doc_relative_loc': 42.5,
'doc_text': 'range of models by a trial - and - error approach involving '
'just editing a command on each trial to see if an improvement '
'in the aic occurs. alternatively ; we could write a simple '
'function that fits a range of arima models and selects the best '
'- fitting model this approach works better when the conditional '
'sum of squares method css is selected in the arima function ; '
'as the algorithm is more robust _ to avoid over parametrisation '
'; the consistent akaike information criteria ( caic ; see '
'bozdogan ; 1987 ) can be used in model selection an example '
'program follows _ get. best arima < function ( x. ts, maxord c '
'( 1, 1, 1, 1, 1, 1 ) ) best aic < 1e8 < length ( x. ts ) for ( '
'p in 0 : maxord [ 1 ] ) for ( d in 0 : maxord [ 2 ] ) for ( q '
'in 0 : maxord [ 3 ] ) for ( p in 0 : maxord [ 4 ] ) for ( d in '
'0 : maxord [ 5 ] ) for ( q in 0 : maxord [ 6 ] ) { fit < arima '
'( x. ts _ order c ( p, d, 9 ) seas list ( order c',
'id_within_doc': 17},
{'Rank': 2,
'Search Score': 0.6476,
'doc_dir': 'TB-forecasting-principles',
'doc_name': 'OCR_Ch-9-ARIMA models-FPP_',
'doc_relative_loc': 73.118,
'doc_text': ': the model can still be used for forecasting, but the '
'prediction intervals may not be accurate due to the correlated '
'residuals. next we will try using the automatic arima algorithm '
': running auto. arimal ) with all arguments left at their '
'default values led to an arima ( 2, 1, 3 ) ( 0, 1, 1 ) 12 '
'model. however ; the model still fails the ljung - box test : '
'sometimes it is just not possible to find a model that passes '
'all of the tests. test set evaluation : we will compare some of '
'the models fitted so far using a test set consisting of the '
'last two years of data : thus, we fit the models using data '
'from july 1991 to june 2006, and forecast the script sales for '
'july 2006 june 2008. the results are summarised in the '
'following table table 8. 2 : rmse values for various arima '
'models applied to the hoz monthly script sales data : model '
'rmse arima ( 3, 0, 1 ) ( 0, 1, 2 ) 12 0. 0622 arima ( 3, 0, 1 ) '
'( 1, 1, 1 ) 12 0. 0630 arima ( 2, 1, 4 ) ( 0, 1, 1 ) 12 0. 0632',
'id_within_doc': 68},
{'Rank': 3,
'Search Score': 0.6426,
'doc_dir': 'course-script',
'doc_name': 'OCR_ATS_Script_v220214__6',
'doc_relative_loc': 72.5,
'doc_text': 'searching for cut - offs. mostly, these are far from evident ; '
'and thus, an often applied alternative is to consider all '
'models with p, 9, p, q < 2 and doing an aic - based grid '
'search, function auto _ arima ( ) may be very handy for this '
'task for our example, the sarima ( 2, 1, 2 2 ) ( 2, 1, 2 ) " 2 '
'has the lowest value and also shows satisfactory residuals, '
'although it seems to perform slightly less well than the sarima '
"( 14, 1, 11 ) 00, 1, 0 )'12 the r - command for the former is : "
'fit < = arima ( log ( beer ) order - c ( 2, 1, 2 ) seasonal = c '
'( 2, 1, 2 ) ) forecast of log ( beer ) with sarima ( 2, 1, 2 ) '
'( 2, 1, 2 ) 3 3 [ 5 3 9 3 1985 wu 1986 1987 1988 time 1989 1990 '
'1991 as it was mentioned in the introduction to this section, '
'one of the main advantages of arima and sarima models is that '
'they allow for quick and convenient forecasting : while this '
'will be discussed in depth later in section 8, we here provide '
'a first example to show the',
'id_within_doc': 29},
{'Rank': 8,
'Search Score': 0.61,
'doc_dir': 'course-slides',
'doc_name': 'OCR_ATS_Slides_v220216__7',
'doc_relative_loc': 0.0,
'doc_text': 'arima, sarima & garch fitting an arima in r plausible models '
'for the logged oil prices after inspection of acfipacf of the '
'differenced series ( that seems stationary ) : arima ( 1, 1, 1 '
') or arima ( 2, 1, 1 ), the former has lower aic arima ( lop, '
'order - c ( 11, 1 ) ) coefficients : arl mal0. 2987 0. 5700 s. '
'e. 0. 2009 0. 1723 sigma ^ 2 = 0. 006642 : 11 261. 11, a = 518. '
'22 alternative r command with equivalent result : arima ( drop, '
'order - c ( 1, 0, 1 ), include mean - false ) 291 arima, sarima '
'& garch example : residuals for arima ( 1, 1, 1 ) residuals '
'from arima ( 1, 1, 1 ) rwwlimhlwkv wwmmv 5 1990 1995 2000 2005 '
'3 3 3 g 8 3 3 3 2 8 3 3 5 10 15 20 25 30 35 5 10 15 20 25 30 35 '
'lag lag 292 ivrwukv arima, sarima & garch rewriting arima as '
'non - stationary arm',
'id_within_doc': 0}]
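At the chunk level, only two hits are retrieved by more than one model in the five lists above. The first is chunk 0 of OCR_ATS_Slides_v220216__7 (msmarco-distilbert-base-tas-b, msmarco-distilbert-cos-v5 and msmarco-MiniLM-L6-cos-v5). The second is chunk 17 of the Cowpertwait chapter (multi-qa-mpnet-base-dot-v1 and msmarco-MiniLM-L6-cos-v5). msmarco-bert-base-dot-v5 shares no chunk with any other model. The sketch below makes this comparison repeatable. It keys each hit by (doc_name, id_within_doc), with keys copied from the results above, and computes the pairwise Jaccard overlap. The short constant names and the `jaccard` helper are illustrative, not part of the original setup.

```python
from itertools import combinations

# Repeated doc_name values, copied from the results above.
CH9 = "OCR_Ch-9-ARIMA models-FPP_"
SLIDES7 = "OCR_ATS_Slides_v220216__7"
SCRIPT6 = "OCR_ATS_Script_v220214__6"
COWP10 = "OCR_10_Non-stationary Models_intro time series in R - cowperwait_"

# Hits per model, keyed by (doc_name, id_within_doc).
hits = {
    "msmarco-distilbert-base-tas-b": {
        (CH9, 66), ("OCR_5_201_cryer - time series analysis apps in R_", 98),
        (SLIDES7, 0), ("OCR_Ch-10-Dynamic regression models-FPP_", 21)},
    "msmarco-bert-base-dot-v5": {
        (CH9, 42), ("OCR_Ch-13-Some practical forecasting issues-FPP_", 26), (SCRIPT6, 22),
        ("OCR_11_Model Building and Forecasting with ARIMA Processes_Time Series Theory and Methods_", 6)},
    "msmarco-distilbert-cos-v5": {
        (SLIDES7, 0), (SCRIPT6, 4),
        ("OCR_4_151_cryer - time series analysis apps in R_", 99), ("OCR_ATS_Slides_v220216__4", 17)},
    "multi-qa-mpnet-base-dot-v1": {
        (COWP10, 17), (CH9, 70), ("SC_lecture_7_apr_4_v_2_c_transcription_10", 47),
        ("OCR_21_Index_Introduction to Time Series and Forecasting_", 10)},
    "msmarco-MiniLM-L6-cos-v5": {(COWP10, 17), (CH9, 68), (SCRIPT6, 29), (SLIDES7, 0)},
}


def jaccard(a, b):
    """Share of distinct hits two models have in common (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b)


# Print only the model pairs that retrieved at least one identical chunk.
for m1, m2 in combinations(hits, 2):
    shared = hits[m1] & hits[m2]
    if shared:
        print(f"{m1} vs {m2}: jaccard={jaccard(hits[m1], hits[m2]):.3f}, shared={sorted(shared)}")
```

Keying by doc_name alone would show more agreement, because OCR_Ch-9-ARIMA models-FPP_ appears in four of the five lists, but with different chunks.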
