Untitled

a guest
Mar 23rd, 2025
  1. I got a model I believe to be o1 in the arena's battle mode and attempted to run the entire experiment at once (it takes upwards of 100 tries to get o1, so I wanted to avoid doing it again), but it errored out after sending the 5th set of markets.
  2.  
  3. This unfortunately means I couldn't verify that the model I believed was o1 actually was o1, but I'm pretty confident based on the following:
  4. - Model B's response to "Who are you?" was "I'm ChatGPT, a large language model developed by OpenAI. I'm here to help answer your questions, provide information, and engage in conversation on a wide range of topics. How can I assist you today?", which is a response I've only seen from older 4o models and o1.
  5. - Model A was pretty clearly gpt-4o-2024-11-20 based on the emoji usage.
  6. - There was a 15+ second delay between prompting and getting responses, indicating that Model B is a thinking model (it couldn't have been Model A, since no thinking model uses emojis the way Model A did). That leaves o1 as the only model Model B could have been.
  7. - I have seen very similar responses to the "Who are you?" question from o1, although it doesn't always mention that it's ChatGPT.
  8.  
  9. Logs are here:
  10. Model B conversation (very likely o1): https://pastebin.com/naPiP4Pn
  11. Model A conversation (gpt-4o-2024-11-20; not used for resolution and I ignored its outputs, but might be interesting to compare): https://pastebin.com/V2EqZxVK
  12.  
  13. I'll have to rerun this anyway to get the 5th prediction, so I'll verify that o1 gives the same response to "Who are you?" and outputs a similar style of response before using the first four predictions to resolve the market.
  14.  
  15. There were a lot of issues with this experiment:
  16. - I screwed up phrasing the prompt at the beginning and had to redo it.
  17. - @Bayesian's advice to add "move the market by 5% or more" to the prompt caused the model to move every market exactly 5%, which is not what o1 did in previous experiments without this prompt.
  19. - Running this experiment so long after o1 came out means it's missing critical information on current events, most notably anything relating to Trump. (It did search Trump / Musk articles and update after it was baffled by Manifold's high probability on the layoff market).
  20. - This makes the web searches more critical, and means this market is really measuring the model's Google search skill and how good its priors for current events were.
  21. - I just took random binary markets from the Best feed in Manifold with at least 20 traders, but this caused a lot of low-quality markets at 1% or 99% to be included, limiting the model's options. It also pulled in markets with very long resolution times, and the model placed a trade on a market that resolves in 2030.
  22. - I ran the model calls in sequence, meaning it had access to the results of previous web searches for each subsequent prediction. This will not be true for the 5th prediction.
  23. - I didn't provide market liquidity information, info on limit orders, or market descriptions (except for the second question where the model searched the market description itself).
  24.  
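The selection problem described in the list above (markets stuck at 1% or 99%, and markets resolving years out) could be screened out before prompting the model. The following is only a rough sketch of such a filter. The field names (`is_binary`, `traders`, `probability`, `close_date`), the 5%/95% cutoffs, and the one-year horizon are all assumptions made for illustration, not Manifold's API or anything used in this experiment, and the sample records are made-up placeholders rather than real markets.

```python
from datetime import date

def is_usable(market, today, min_traders=20, min_prob=0.05,
              max_prob=0.95, max_days_to_close=365):
    """Return True if a market passes the (hypothetical) quality screen."""
    days_left = (market["close_date"] - today).days
    return (
        market["is_binary"]
        and market["traders"] >= min_traders                # same floor as the original selection
        and min_prob <= market["probability"] <= max_prob   # drop markets pinned near 1% or 99%
        and 0 <= days_left <= max_days_to_close             # drop very long-dated markets
    )

# Placeholder records for illustration only; these are not real markets.
today = date(2025, 3, 23)
candidates = [
    {"id": "example-extreme", "is_binary": True, "traders": 40,
     "probability": 0.99, "close_date": date(2025, 6, 1)},
    {"id": "example-long", "is_binary": True, "traders": 50,
     "probability": 0.60, "close_date": date(2030, 1, 1)},
    {"id": "example-thin", "is_binary": True, "traders": 5,
     "probability": 0.50, "close_date": date(2025, 9, 1)},
    {"id": "example-ok", "is_binary": True, "traders": 30,
     "probability": 0.47, "close_date": date(2025, 12, 31)},
]

selected = [m["id"] for m in candidates if is_usable(m, today)]
print(selected)
```

The cutoffs are keyword arguments so they can be loosened if the filter leaves too few markets to sample from.
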
  25. With that said, the model just ignored the web search results and went with its initial guess for every market, so I'm not sure how much most of these issues actually affected the results.
  26.  
  27. o1's first four predictions:
  28.  
  29. https://manifold.markets/TonyGao/will-the-fertility-rate-of-south-ko: Move market 47% -> 52%, Ṁ325 on YES (filling limit order) - Ṁ654 payout if market resolves YES. Potential profit: Ṁ329
  30. Will the fertility rate of South Korea increase from 2023 to 2025?
  31. 46% chance. Curious if the martial law attempt will indirectly increase fertility, by potential mechanisms like sense of national crisis or political gender reconciliation. Source will be https://www.index.go.kr/unify/idx-info.do?pop=1&idxCd=5061 Number to beat is 0.72 from 2023. If 2025 is higher than that, resolves Yes, otherwise No.
  32.  
  33.  
  34. https://manifold.markets/TamayBesiroglu/will-ai-be-capable-of-producing-ann: Move market 68% -> 63%, Ṁ2079 on NO (filling limit order) - Ṁ6089 payout if market resolves NO. Potential profit: Ṁ4010
  35.  
  36. https://manifold.markets/Ziddletwix/will-trump-elon-cut-250000-governme: Move market 62% -> 57% (inferred from the model's response; it didn't actually give a percentage to move to), Ṁ1785 on NO (filling limit order) - Ṁ4298 payout if market resolves NO. Potential profit: Ṁ2513
  37.  
  38. https://manifold.markets/WalterMartin/will-the-us-eliminate-the-departmen: Move market 21% -> 16%, Ṁ2877 on NO - Ṁ3525 payout if market resolves NO. Potential profit: Ṁ648
  39.  
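As a sanity check on the four trades above, the stated potential profit should equal payout minus stake. The sketch below recomputes it from the figures listed. It also derives an implied average fill price as stake divided by payout, which relies on the assumption that each winning share pays out Ṁ1; that assumption and the helper names are mine, not part of the experiment.

```python
# Stake, payout, and stated profit copied from the four trades listed above (in Ṁ).
trades = [
    {"market": "will-the-fertility-rate-of-south-ko", "side": "YES",
     "stake": 325, "payout": 654, "stated_profit": 329},
    {"market": "will-ai-be-capable-of-producing-ann", "side": "NO",
     "stake": 2079, "payout": 6089, "stated_profit": 4010},
    {"market": "will-trump-elon-cut-250000-governme", "side": "NO",
     "stake": 1785, "payout": 4298, "stated_profit": 2513},
    {"market": "will-the-us-eliminate-the-departmen", "side": "NO",
     "stake": 2877, "payout": 3525, "stated_profit": 648},
]

def potential_profit(stake, payout):
    """Profit if the trade's side wins: payout minus the amount staked."""
    return payout - stake

def implied_avg_price(stake, payout):
    """Average price paid per share, ASSUMING each winning share pays Ṁ1."""
    return stake / payout

for t in trades:
    profit = potential_profit(t["stake"], t["payout"])
    price = implied_avg_price(t["stake"], t["payout"])
    # Flag any trade whose stated profit does not match payout - stake.
    status = "ok" if profit == t["stated_profit"] else "MISMATCH"
    print(f'{t["market"]}: {t["side"]} profit Ṁ{profit} ({status}), '
          f'avg price {price:.1%}')
```

Under the Ṁ1-per-share assumption, each average price should fall between the starting and ending probability for the side bought (for NO trades, one minus the quoted YES probability).
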
  40. The highest liquidity market was the one that resolves in 2030, so there's a good chance (unless the 5th trade is big and resolves quickly) that this market won't resolve until that one does.
  41.  
  42. In retrospect I feel like this was a bad market idea with too much variance and prompt sensitivity, and I made many mistakes trying to execute it. Open to N/Aing if there's a trader consensus for it.