pszemraj

Comparing Asymmetric models p2 asymmetric semantic search

Aug 29th, 2022
  1. case 2 | comparison of SBERT models for asymmetric semantic search on course data (for quickly finding relevant docs)
  2.  
  3. the query is the same for all 4 models, QUERY: deciding whether a time series is stationary
  4.  
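note on reading the scores: the rank-1 'Search Score' values in this paste sit on very different scales (171.1836, 0.6858, 105.9459, 27.1127). Going by the model names, the `-dot-` models and tas-b are most likely scored with a dot product and the `-cos-` model with cosine similarity, which would explain why only that one stays below 1. The paste itself does not say which scoring function was used, so treat that as an assumption. The sketch below uses made-up toy vectors (not real embeddings) to show why the two kinds of score cannot be compared across models:

```python
import math


def dot_score(a, b):
    """Dot product: grows with the length (norm) of the vectors."""
    return sum(x * y for x, y in zip(a, b))


def cos_score(a, b):
    """Cosine similarity: dot product of the length-normalised vectors."""
    norm_a = math.sqrt(dot_score(a, a))
    norm_b = math.sqrt(dot_score(b, b))
    return dot_score(a, b) / (norm_a * norm_b)


# Hypothetical 2-d "embeddings", chosen only for easy arithmetic.
query = [3.0, 4.0]
doc = [6.0, 8.0]          # same direction as the query
doc_scaled = [12.0, 16.0]  # same direction, twice as long

print(dot_score(query, doc), cos_score(query, doc))
print(dot_score(query, doc_scaled), cos_score(query, doc_scaled))
```

Scaling the document vector doubles the dot score and leaves the cosine score unchanged, so the scores are only meaningful for ranking within one model, not for comparing one model's 171 against another's 0.69.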
  5. ## sentence-transformers/msmarco-bert-base-dot-v5
  6.  
  7. [{'Rank': 1,
  8. 'Search Score': 171.1836,
  9. 'doc_dir': 'TB-theory-and-methods-1992',
  10. 'doc_name': 'OCR_3_Stationary Time Series_Time Series Theory and Methods_',
  11. 'doc_relative_loc': 28.205,
  12. 'doc_text': ', - 1 ) ( ( + 02 ) oz if h = 0, 0oz if h = + 1, if | hl > 1, 14 '
  13. '1. stationary time series and hence { x, } is stationary. in '
  14. 'fact it can be shown that { x, } is strictly stationary ( see '
  15. 'problem 1. 1 ). example 1. 3. 3. let sx if t is even, x, = ( y '
  16. '+ 1 if t is odd. where { y } is a stationary time series. '
  17. 'although cov ( x, th, x, ) = yr ( h ), { x, } is not stationary '
  18. 'for it does not have a constant mean. example 1. 3. 4 referring '
  19. 'to example 1. 2. 3, let s be the random walk s = x + xz + + x, '
  20. 'where x,, x2, are independent and identically distributed with '
  21. 'mean zero and variance 02. for h > 0, cov ( sth, s ) = cov xi, '
  22. 'ax ) cov xax ) 02 and thus s is not stationary. stationary '
  23. 'processes play a crucial role in the analysis of time series of '
  24. 'course many observed time series ( see section 1. 1 ) are '
  25. 'decidedly nonstationary in appearance. frequently',
  26. 'id_within_doc': 22},
  27. {'Rank': 2,
  28. 'Search Score': 171.1249,
  29. 'doc_dir': 'lecture-audio',
  30. 'doc_name': 'SC_lecture_1_feb_21_v_2_c_transcription_4',
  31. 'doc_relative_loc': 41.818,
  32. 'doc_text': 'examples are stationary or not. so this is the question you ask '
  33. ': this is a time series that is not stationary because here, '
  34. "it's very imped. well, have to be careful. even more so, making "
  35. 'a statement that this time series is not stationary is too '
  36. 'bold. stationary is a property of a time series process of the '
  37. "random reveals here is only see data ; in fact, i don't know "
  38. 'how the serious process is. the time series process which '
  39. "generated these data is either stationary, or it's not. but i "
  40. "don't know the process, and it's hard to make a statement about "
  41. 'it. what i see is data is observations. and of course, if the '
  42. 'data looks like this and it seems highly implausible that the '
  43. 'underlying data generating process is stationary because here '
  44. 'it seems that the means of this observation, so the is of that '
  45. 'one is not identical with the most of this one. it also seems '
  46. 'very unlikely that i could take a snip it out of this time '
  47. 'series and shift it somewhere and exchange this snip and her '
  48. 'with that one so that it would look very puzzling on this '
  49. 'ground ; it seems very implausible that this is generated from '
  50. 'a stationary process. so basically, here, we do',
  51. 'id_within_doc': 23},
  52. {'Rank': 3,
  53. 'Search Score': 170.6889,
  54. 'doc_dir': 'practice-exams',
  55. 'doc_name': 'OCR_e-2020-wi-problems_',
  56. 'doc_relative_loc': 10.0,
  57. 'doc_text': 'series process? a ) yes b ) no ; because the mean is not 0 no '
  58. 'because there is a trend d ) more than one of the above answers '
  59. 'is correct 3 200 400 600 800 1000 1200 time 3 u six _ ( 3 '
  60. 'points ) is it reasonable to model the following time series '
  61. 'with a stationary time series process? a ) yes ; under all '
  62. 'circumstances b ) yes ; but only if the cyclic component is non '
  63. '- deterministic c ) yes ; but only if the cyclic component is '
  64. 'deterministic 8 " 200 400 600 800 time 7 _ ( 3 points ) the r - '
  65. 'functions acf ( ) and pack ( ) are used to compute and '
  66. 'visualize the theoretical acf and pacf of an observed time '
  67. 'series. true b ) false 8 ( 3 points ) time series xt can be '
  68. 'decomposed as xt = mt + st + rt ; where it is the trend ; st '
  69. 'the seasonal component, and rt the remainder term ; only if xt '
  70. 'is a ) white noise a stationary time series 6 ) a non - '
  71. 'stationary time series that can be made stationary taking '
  72. 'differences at appropriate lags d ) none of the above answers '
  73. 'is correct. nine _ ( 4 points ) suppose we are given a '
  74. 'stationary time series xt and that we also consider y',
  75. 'id_within_doc': 3}]
  76.  
  77. ---------------------------------------------------------------------------------------------
  78.  
  79.  
  80. ## sentence-transformers/msmarco-distilbert-cos-v5
  81.  
  82.  
  83. [{'Rank': 1,
  84. 'Search Score': 0.6858,
  85. 'doc_dir': 'TB-theory-and-methods-1992',
  86. 'doc_name': 'OCR_3_Stationary Time Series_Time Series Theory and Methods_',
  87. 'doc_relative_loc': 28.205,
  88. 'doc_text': ', - 1 ) ( ( + 02 ) oz if h = 0, 0oz if h = + 1, if | hl > 1, 14 '
  89. '1. stationary time series and hence { x, } is stationary. in '
  90. 'fact it can be shown that { x, } is strictly stationary ( see '
  91. 'problem 1. 1 ). example 1. 3. 3. let sx if t is even, x, = ( y '
  92. '+ 1 if t is odd. where { y } is a stationary time series. '
  93. 'although cov ( x, th, x, ) = yr ( h ), { x, } is not stationary '
  94. 'for it does not have a constant mean. example 1. 3. 4 referring '
  95. 'to example 1. 2. 3, let s be the random walk s = x + xz + + x, '
  96. 'where x,, x2, are independent and identically distributed with '
  97. 'mean zero and variance 02. for h > 0, cov ( sth, s ) = cov xi, '
  98. 'ax ) cov xax ) 02 and thus s is not stationary. stationary '
  99. 'processes play a crucial role in the analysis of time series of '
  100. 'course many observed time series ( see section 1. 1 ) are '
  101. 'decidedly nonstationary in appearance. frequently',
  102. 'id_within_doc': 22},
  103. {'Rank': 2,
  104. 'Search Score': 0.6795,
  105. 'doc_dir': 'lecture-audio',
  106. 'doc_name': 'SC_lecture_1_feb_21_v_2_c_transcription_4',
  107. 'doc_relative_loc': 43.636,
  108. 'doc_text': 'that this is generated from a stationary process. so basically, '
  109. "here, we don't have stationary. hence, the variant rule is as "
  110. "it's written on that slide variant has to be constant, and the "
  111. 'conjecture of that is that these time series are very unlikely '
  112. 'to be stationary, so be careful about the wording. so the time '
  113. '- serious process underlying these data is unlikely to be '
  114. "stationary's the correct wording. so usually, just say, well, "
  115. "the series is not stationary, but it's a bit of a bold "
  116. 'statement. but this is what i usually say, and you must '
  117. 'understand what it means exactly, so well, time will be up in a '
  118. "few seconds. this one is here. stationary or not, it's a good "
  119. 'question. can we move around snippets or not? if you look at '
  120. 'this feature, for example, in the first place, you may say no, '
  121. "it's so regular, so if we just shift that a little bit, then it "
  122. 'does not look good anymore. but then, if you look at that, it '
  123. 'seems to be that nature. so, we will consider this as great '
  124. 'stationery. okay, so we have to stop here yet. maybe just a '
  125. 'last remark : what we',
  126. 'id_within_doc': 24},
  127. {'Rank': 3,
  128. 'Search Score': 0.6635,
  129. 'doc_dir': 'lecture-audio',
  130. 'doc_name': 'SC_lecture_2_feb_28_v_2_c_transcription_5',
  131. 'doc_relative_loc': 23.404,
  132. 'doc_text': 'because these are simulated data. so here i know the time semi '
  133. '- time series line for the data generation. so i can make a '
  134. 'strict statement about whether the time series process is '
  135. "stationary. however, you only see the data, so you're obviously "
  136. 'in a more difficult position, so you have to make up your mind '
  137. 'with the usual tools. he forgot to prepare many questions, but '
  138. 'maybe, we can clarify quickly by just answering. what do you '
  139. 'think is the torso in the usual form, stationary or not, in '
  140. 'this first example? or maybe we have to vote for man, who is '
  141. "for stationary in this one. so and maybe i can add i don't try "
  142. 'to trick you. i mean, i could always produce something that '
  143. "looks stationary. and then i could say, well, it's not because "
  144. 'i added a tiny epsilon that makes it not detectable stationery. '
  145. "but i don't try to trick you. well, it's fairer. so stationary. "
  146. "who's for stationary at this is the majority, and it's true. "
  147. 'this arises from a stationary process. yes, so what about this '
  148. "one? so if you have to decide who is stationary, it's very few, "
  149. 'only you, but understandably, this is',
  150. 'id_within_doc': 11}]
  151.  
  152.  
  153.  
  154. ---------------------------------------------------------------------------------------------
  155.  
  156.  
  157. ## sentence-transformers/msmarco-distilbert-base-tas-b
  158.  
  159.  
  160. [{'Rank': 1,
  161. 'Search Score': 105.9459,
  162. 'doc_dir': 'TB-time-seriesR-cowpertwait',
  163. 'doc_name': 'OCR_9_Stationary Models_intro time series in R - cowperwait_',
  164. 'doc_relative_loc': 3.333,
  165. 'doc_text': 'in this chapter _ the term stationary was discussed in previous '
  166. 'chapters ; we now give a more rigorous definition. 6. 2 '
  167. 'strictly stationary series a time series model { tt } is '
  168. 'strictly stationary if the joint statistical distribution of '
  169. 'tt1 itn is the same as the joint distribution of tt1 + m ; ttn '
  170. '+ m for all t1, tn and m, s0 that the distribution is unchanged '
  171. 'after an arbitrary time shift _ note that strict stationarity '
  172. 'implies that the mean and variance are constant in time and '
  173. 'that the autocovariance cov ( xt, x $ only depends on lag k = '
  174. 'it s | and can be written ~ ( k ). if a series is not strictly '
  175. 'stationary but the mean and variance are constant in time and '
  176. 'the autocovariance only ps. p cowpertwait and a. v. metcalfe, '
  177. 'introductory time series with r, use r, doi 10. 1007 / 978 - 0 '
  178. '- 387 - 88698 - 5 _ 6, springer science + business media, llc '
  179. "2009 121 122 stationary models depends on the lag ;'then the "
  180. 'series is called second - order stationary : we focus on the '
  181. 'second - order properties in this chapter ; but the stochastic '
  182. 'processes discussed are',
  183. 'id_within_doc': 1},
  184. {'Rank': 2,
  185. 'Search Score': 105.4406,
  186. 'doc_dir': 'TB-theory-and-methods-1992',
  187. 'doc_name': 'OCR_3_Stationary Time Series_Time Series Theory and Methods_',
  188. 'doc_relative_loc': 28.205,
  189. 'doc_text': ', - 1 ) ( ( + 02 ) oz if h = 0, 0oz if h = + 1, if | hl > 1, 14 '
  190. '1. stationary time series and hence { x, } is stationary. in '
  191. 'fact it can be shown that { x, } is strictly stationary ( see '
  192. 'problem 1. 1 ). example 1. 3. 3. let sx if t is even, x, = ( y '
  193. '+ 1 if t is odd. where { y } is a stationary time series. '
  194. 'although cov ( x, th, x, ) = yr ( h ), { x, } is not stationary '
  195. 'for it does not have a constant mean. example 1. 3. 4 referring '
  196. 'to example 1. 2. 3, let s be the random walk s = x + xz + + x, '
  197. 'where x,, x2, are independent and identically distributed with '
  198. 'mean zero and variance 02. for h > 0, cov ( sth, s ) = cov xi, '
  199. 'ax ) cov xax ) 02 and thus s is not stationary. stationary '
  200. 'processes play a crucial role in the analysis of time series of '
  201. 'course many observed time series ( see section 1. 1 ) are '
  202. 'decidedly nonstationary in appearance. frequently',
  203. 'id_within_doc': 22},
  204. {'Rank': 3,
  205. 'Search Score': 105.239,
  206. 'doc_dir': 'lecture-audio',
  207. 'doc_name': 'SC_lecture_3_mar_7_v_2_c_transcription_6',
  208. 'doc_relative_loc': 35.294,
  209. 'doc_text': 'in this first part might be slightly lower than, for example, '
  210. 'in this part. so here, the decision is not clear. certainly, '
  211. 'this series here, as it appears, is not very far from what you '
  212. 'could produce from a stationary time series process. so here '
  213. 'for this series. not a very clear reject of that hypothesis '
  214. "that it's generated from a stationary process. still, and that "
  215. "is by experience to some extent. by well, let's say theoretical "
  216. 'motivation that there are time series processes which work '
  217. 'pretty well if you declare this as being none stationary and '
  218. "requiring another difference. i'll summarize it, but let's just "
  219. 'look at the result. so we say, well, this is cannon stationery, '
  220. 'so it still has some trend in it, and we just do a different '
  221. "step at it one. so this is what we do again. i mean, it's very "
  222. 'useful to write that using these operations with the makeshift '
  223. 'operator. so why is it the seasonally different series, and '
  224. 'then we just make differences at onto one, lack one again to '
  225. 'remove that potential trend? this is the full equation, and '
  226. 'this is the series we obtain. you certainly have a lot more '
  227. 'stationary than the one in i think you can not object',
  228. 'id_within_doc': 18}]
  229.  
  230. ---------------------------------------------------------------------------------------------
  231.  
  232.  
  233. ## sentence-transformers/multi-qa-mpnet-base-dot-v1
  234.  
  235. [{'Rank': 1,
  236. 'Search Score': 27.1127,
  237. 'doc_dir': 'lecture-audio',
  238. 'doc_name': 'SC_lecture_2_feb_28_v_2_c_transcription_5',
  239. 'doc_relative_loc': 8.511,
  240. 'doc_text': 'with the hypotheses, the question arises whether there are any '
  241. 'formal tests for stationary. and while some tests exist, so the '
  242. 'script has some hints and does not have them on the slides, '
  243. 'there are in some kind practically worthless, and the reason '
  244. 'for that is that a particular test for stationary typically '
  245. 'only addresses a very particular violation of the stationary '
  246. 'points. so there is nothing like a global test that would work '
  247. 'in every situation. but mostly, these tests are such that they '
  248. 'are very specific for certain kinds of stationary violations, '
  249. 'and they have little to no power for other violations. so the '
  250. 'only place for those formal tests is if you have some specific '
  251. 'potential violation of the stationary assumption. you know that '
  252. 'the tests you will be using an address that issue, and you '
  253. 'could get an answer. the deviation you observe in the data is '
  254. 'important enough to call it a significant violation of '
  255. "stationary, but globally they don't tend to work. and you get "
  256. 'far further with visual analysis based on the time series plot '
  257. 'and kind of good feeling witches associated with it, then how '
  258. 'does it work? i suggest we take time - series blocks and decide '
  259. "based on them. and the rule is that's the one in the yellow "
  260. 'box, and i think this helps pretty far. so we must',
  261. 'id_within_doc': 4},
  262. {'Rank': 2,
  263. 'Search Score': 26.9956,
  264. 'doc_dir': 'course-slides',
  265. 'doc_name': 'OCR_ATS_Slides_v220216__1',
  266. 'doc_relative_loc': 50.0,
  267. 'doc_text': 'non - stationarity in a stationary series, one can move any '
  268. 'random snippet of the series to any other location of choice. '
  269. 'if that does not seem right, it is unlikely that the underlying '
  270. 'process is stationary. particular violations of stationarity : '
  271. 'trend, i. e. non - constant expected value seasonality, i. e. '
  272. 'deterministic, periodical oscillations, non - constant '
  273. 'variation, i. e. multiplicative error non - constant dependency '
  274. 'structure remark : some periodical oscillations, as in the lynx '
  275. 'data, are stochastic and originate from a stationary process. '
  276. 'however, the boundary between the two is fuzzy. 24 mathematical '
  277. 'concepts strategies for detecting non - stationarity 1 ) time '
  278. 'series plot not being able to move any arbitrary snippet non - '
  279. 'constant expectation ( trendy seasonal effect ) changes in the '
  280. 'dependency structure non - constant variation 2 ) correlogram ( '
  281. 'presented later _ _ ) non - constant expected value ( trendy '
  282. 'seasonal effect ) changes in the dependency structure a ( '
  283. 'sometimes ) useful trick, especially when working with the '
  284. 'correlogram ; is to split up the series in two or more parts, '
  285. 'and producing plots for each of the pieces separately. 25 '
  286. 'mathematical concepts example : simulated time series 1 '
  287. 'simulated time series',
  288. 'id_within_doc': 12},
  289. {'Rank': 3,
  290. 'Search Score': 26.0594,
  291. 'doc_dir': 'TB-forecasting-principles',
  292. 'doc_name': 'OCR_Ch-9-ARIMA models-FPP_',
  293. 'doc_relative_loc': 2.151,
  294. 'doc_text': ') annual number of strikes in the us ; ( d ) monthly sales of '
  295. 'new onefamily houses sold in the us ; ( e ) annual price of a '
  296. 'dozen eggs in the us ( constant dollars ) ; ( f ) monthly total '
  297. 'of pigs slaughtered in victoria, australia ; ( g ) annual total '
  298. 'of lynx trapped in the mckenzie river district of north - west '
  299. 'canada ; ( h ) monthly australian beer production ; ( i ) '
  300. 'monthly australian electricity production : consider the nine '
  301. 'series plotted in figure 8. 1. which of these do you think are '
  302. 'stationary? obvious seasonality rules out series ( d ), ( h ) '
  303. 'and ( i ). trends and changing levels rules out series ( a ), ( '
  304. 'c ), ( e ), ( f ) and ( i ). increasing variance also rules out '
  305. '( i ). that leaves only ( b ) and ( g ) as stationary series. '
  306. 'at first glance, the strong cycles in series ( g ) might appear '
  307. 'to make it nonstationary. but these cycles are aperiodic they '
  308. 'are caused when the lynx population becomes too large for the '
  309. 'available feed, so that they stop breeding ww and the '
  310. 'population falls to low numbers, then the regeneration of their '
  311. 'food sources allows the population to grow again, and s0 on.',
  312. 'id_within_doc': 2}]
  313.  
  314.  
  315. ----------------------------
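quick overlap check: since the scores are not comparable across models, one rough way to compare the four result lists is to look at which chunks they share. The sketch below is an illustration only; the `(doc_name, id_within_doc)` pairs are copied from the top-3 results above, and the helper name `pairwise_overlap` is made up for this example.

```python
from itertools import combinations

# (doc_name, id_within_doc) of the top-3 hits per model, copied from the results above.
top3 = {
    "msmarco-bert-base-dot-v5": [
        ("OCR_3_Stationary Time Series_Time Series Theory and Methods_", 22),
        ("SC_lecture_1_feb_21_v_2_c_transcription_4", 23),
        ("OCR_e-2020-wi-problems_", 3),
    ],
    "msmarco-distilbert-cos-v5": [
        ("OCR_3_Stationary Time Series_Time Series Theory and Methods_", 22),
        ("SC_lecture_1_feb_21_v_2_c_transcription_4", 24),
        ("SC_lecture_2_feb_28_v_2_c_transcription_5", 11),
    ],
    "msmarco-distilbert-base-tas-b": [
        ("OCR_9_Stationary Models_intro time series in R - cowperwait_", 1),
        ("OCR_3_Stationary Time Series_Time Series Theory and Methods_", 22),
        ("SC_lecture_3_mar_7_v_2_c_transcription_6", 18),
    ],
    "multi-qa-mpnet-base-dot-v1": [
        ("SC_lecture_2_feb_28_v_2_c_transcription_5", 4),
        ("OCR_ATS_Slides_v220216__1", 12),
        ("OCR_Ch-9-ARIMA models-FPP_", 2),
    ],
}


def pairwise_overlap(results):
    """Return {(model_a, model_b): set of chunks both models retrieved}."""
    return {
        (a, b): set(results[a]) & set(results[b])
        for a, b in combinations(results, 2)
    }


for (model_a, model_b), shared in pairwise_overlap(top3).items():
    print(f"{model_a} vs {model_b}: {len(shared)} shared chunk(s)")
```

On this query the three msmarco models all retrieve the same textbook chunk (id 22 of the Stationary Time Series chapter), while multi-qa-mpnet-base-dot-v1 shares no chunk with any of them. With only three hits per model and a single query, this is a hint rather than a ranking of the models.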