Jun 22nd, 2017
  3. **UPDATE 1 - RU/m**
  4.  
  5. I read through the [documentation on Request Units][2], and noticed it talked about RU/s and RU/m.
  6.  
  7. Unfortunately it had very little information, and what it did have was very unclear.
  8.  
  9. But there was a second [documentation][3] page which went into more detail.
  10.  
  11. From what I can gather, RU/m provides a way to reliably burst above the RU/s you have set.
  12.  
  13. So, if you have 400 RU/s you can also have 4000 RU/m. If you exceed your RU/s then you can take RU/m to "top up" the throughput. So an 800 RU/s query would consume 400 RU/s and then 400 RU from the RU/m.
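To make that top-up arithmetic concrete, here is a minimal sketch of my reading of it. The `SplitCharge` helper and its parameter names are made up for illustration, and it ignores how the service actually meters the per-minute budget over time.

```csharp
using System;

public static class BurstSketch
{
    // Splits a charge incurred within one second into the part covered by the
    // provisioned RU/s and the part that has to come out of the RU/m budget.
    // Hypothetical helper: it models the description in the text, not the service itself.
    public static (double FromPerSecond, double FromPerMinute, double Uncovered) SplitCharge(
        double charge, double ruPerSecond, double ruPerMinuteRemaining)
    {
        var fromPerSecond = Math.Min(charge, ruPerSecond);
        var overflow = charge - fromPerSecond;
        var fromPerMinute = Math.Min(overflow, ruPerMinuteRemaining);
        return (fromPerSecond, fromPerMinute, overflow - fromPerMinute);
    }

    public static void Main()
    {
        // The example from the text: 800 RU against 400 RU/s with 4000 RU/m available.
        var split = SplitCharge(800, 400, 4000);
        Console.WriteLine(
            $"{split.FromPerSecond} RU from RU/s, {split.FromPerMinute} RU from RU/m, {split.Uncovered} RU uncovered");
    }
}
```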
  14.  
  15. Since I'm seeing performance which is below my RU/s threshold I didn't think adding RU/m would affect my situation, but I tried it anyway.
  16.  
  17. I re-ran my query and got 8728 documents, in 187595.148 ms, using 3917.92 RUs.
  18.  
  19. That's even slower than the first run, though I expect that's due to variations in network and other environment conditions between runs.
  20.  
  21. (So I have now disabled RU/m again since it does cost money but provides no benefit here.)
  22.  
  23. **UPDATE 2 - The Polygon**
  24.  
  25. Here's the polygon I'm using in the query:
  26.  
  27. Area = new Polygon(new List<LinearRing>()
  28. {
  29. new LinearRing(new List<Position>()
  30. {
  31. new Position(1.8567 ,51.3814),
  32.  
  33. new Position(0.5329 ,51.4618),
  34. new Position(0.2477 ,51.2588),
  35. new Position(-0.5329 ,51.2579),
  36. new Position(-1.17 ,51.2173),
  37. new Position(-1.9062 ,51.1958),
  38. new Position(-2.5434 ,51.1614),
  39. new Position(-3.8672 ,51.139 ),
  40. new Position(-4.1578 ,50.9137),
  41. new Position(-4.5373 ,50.694 ),
  42. new Position(-5.1496 ,50.3282),
  43. new Position(-5.2212 ,49.9586),
  44. new Position(-3.7049 ,50.142 ),
  45. new Position(-2.1698 ,50.314 ),
  46. new Position(0.4669 ,50.6976),
  47.  
  48. new Position(1.8567 ,51.3814)
  49. })
  50. })
  51.  
  52. I have also tried reversing it (since ring orientation matters), but the query with the reversed polygon took significantly longer (I don't have the time to hand) and returned 91272 items.
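As a quick way to check which way round a ring runs, the signed area from the shoelace formula can be used. This is only a sketch: it treats longitude/latitude as planar X/Y, which is an approximation that should be fine for a small region like this one, and `RingOrientation` is a name I've made up. As I understand the geospatial docs, points should be in counter-clockwise order and a clockwise ring describes the inverse region, which would fit the 91272 items returned by the reversed polygon.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RingOrientation
{
    // The ring from the query, as (longitude, latitude) pairs; first and last point are equal.
    public static readonly (double Lon, double Lat)[] Ring =
    {
        (1.8567, 51.3814), (0.5329, 51.4618), (0.2477, 51.2588), (-0.5329, 51.2579),
        (-1.17, 51.2173), (-1.9062, 51.1958), (-2.5434, 51.1614), (-3.8672, 51.139),
        (-4.1578, 50.9137), (-4.5373, 50.694), (-5.1496, 50.3282), (-5.2212, 49.9586),
        (-3.7049, 50.142), (-2.1698, 50.314), (0.4669, 50.6976), (1.8567, 51.3814)
    };

    // Shoelace formula over a closed ring.
    // Positive result: counter-clockwise. Negative result: clockwise.
    public static double SignedArea(IReadOnlyList<(double Lon, double Lat)> ring)
    {
        double sum = 0;
        for (var i = 0; i < ring.Count - 1; i++)
        {
            sum += ring[i].Lon * ring[i + 1].Lat - ring[i + 1].Lon * ring[i].Lat;
        }
        return sum / 2;
    }

    public static void Main()
    {
        var reversed = Enumerable.Reverse(Ring).ToArray();
        Console.WriteLine(SignedArea(Ring) > 0 ? "original: counter-clockwise" : "original: clockwise");
        Console.WriteLine(SignedArea(reversed) > 0 ? "reversed: counter-clockwise" : "reversed: clockwise");
    }
}
```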
  53.  
  54. Also, the coordinates are specified as Longitude/Latitude, as [this is how GeoJSON expects them][4] (i.e. as X/Y), rather than the traditional order used when speaking of Latitude/Longitude.
  55.  
  56. > The GeoJSON specification specifies longitude first and latitude second.
  57.  
  58. **UPDATE 3 - Data Size and Connection Speed**
  59.  
  60. The overview on the Azure Portal for my collection is showing 110 MB of data.
  61.  
  62. Running a speed test on my internet connection shows a consistent 65 Mbit/s.
  63.  
  64. So ignoring any other factors (such as how long it takes to collate the data server-side), it should take ~14 s for the entire data set to download in one big chunk.
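For reference, this is the arithmetic behind that estimate, plus the effective rate implied by a given run time. It's a rough sketch that assumes 1 MB = 8 Mbit and ignores protocol overhead; the `TransferEstimate` class is just for illustration.

```csharp
using System;

public static class TransferEstimate
{
    // Ideal transfer time: megabytes -> megabits (x8), divided by the link speed in Mbit/s.
    public static double Seconds(double megabytes, double megabitsPerSecond)
        => megabytes * 8 / megabitsPerSecond;

    // The effective rate achieved if the same data actually took this many seconds.
    public static double EffectiveMegabitsPerSecond(double megabytes, double seconds)
        => megabytes * 8 / seconds;

    public static void Main()
    {
        // 110 MB over a 65 Mbit/s connection.
        Console.WriteLine($"Ideal transfer time: {Seconds(110, 65):F1} s");

        // A run taking 1923 seconds for the same 110 MB.
        Console.WriteLine($"Effective rate: {EffectiveMegabitsPerSecond(110, 1923):F2} Mbit/s");
    }
}
```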
  65.  
  66. Out of interest I tried getting all the documents, and it took 1923070.0002 ms, which is 1923 seconds, or 32 minutes.
  67.  
  68. This is the query I used:
  69.  
  70. var query = client
  71. .CreateDocumentQuery<T>(documentCollectionUri)
  72. .Where(s => s.Type == this.subscriptionType);
  73.  
  74. var documents = await this.QueryTrackingUsedRUsAsync(query);
  75.  
  76. As before, I then passed the query to my `QueryTrackingUsedRUsAsync` method so that the RUs it consumes are tracked.
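The method itself isn't repeated in this update, so here is a minimal sketch of the general pattern rather than my exact code: fetch page after page and add up the charge reported for each. `Page<T>` and `DrainAsync` are hypothetical names, and the page source is a plain delegate so the sketch doesn't depend on the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// One page of results plus the request charge reported for fetching it (hypothetical type).
public sealed class Page<T>
{
    public IReadOnlyList<T> Items { get; set; }
    public double RequestCharge { get; set; }
    public bool HasMore { get; set; }
}

public static class RuTracking
{
    // Fetches pages until the source reports no more results, summing the charge of each page.
    //
    // With the DocumentDB .NET SDK a page would presumably come from
    // query.AsDocumentQuery().ExecuteNextAsync<T>(), whose FeedResponse<T> exposes
    // RequestCharge, with HasMore taken from IDocumentQuery<T>.HasMoreResults.
    public static async Task<(List<T> Items, double TotalCharge)> DrainAsync<T>(
        Func<Task<Page<T>>> fetchNextPage)
    {
        var items = new List<T>();
        double totalCharge = 0;
        Page<T> page;

        do
        {
            page = await fetchNextPage();
            items.AddRange(page.Items);
            totalCharge += page.RequestCharge;
        }
        while (page.HasMore);

        return (items, totalCharge);
    }
}
```

Counting the pages inside the loop as well would show how many round trips a query really makes, which is relevant to the question of whether the results arrive as one big chunk.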
  77.  
  78. **UPDATE 4 - Azure Portal Statistics**
  79.  
  80. Here's the "Max consumed RU/s per physical partition" graph from just after I'd finished the query just above, which is retrieving all documents without the geospatial filter. The query still contains a check against a property which is an enum. This enum indicates the "type" of the document, as in the future more than one "type" of these documents will be stored in the collection.
  81.  
  82. [![Azure Portal - Max RUs][5]][5]
  83.  
  84. It appears the Max RUs never go above ~80. And that seems to be some kind of hard limit.
  85.  
  86. Again, I find this confusing: from what I've read it should be using the full throughput, and one of the Channel 9 videos I watched said that the documents will be collated and then sent as one big chunk.
  87.  
  88. I'm either misunderstanding what should be happening, or I've got some insane client-side code that is screwing things up, or there's something wrong server-side.
  89.  
  90. **UPDATE 5 - Non-Spatial Parts**
  91.  
  92. There are two things I'm doing during the querying which I haven't yet cast much of a critical eye on.
  93.  
  94. The first is the method which counts used RUs. Having that information is very useful, and is very much recommended in the documentation so you can monitor and tune your queries.
  95.  
  96. (As an aside: considering that this information is so important, it's a pain in the arse to access using the .NET SDK.)
  97.  
  98. The second is that there's also a filter based on an enum property. This enum indicates the "document type", and will be used to differentiate documents in the future. (e.g. the collection contains DocumentAs and DocumentBs, and I only care about As.)
  99.  
  100. So I created a new test, similar to my previous test, where all documents are retrieved omitting the spatial query. In addition it also omits the Document Type filter, and it no longer uses the RU tracking.
  101.  
  102. Here's my query:
  103.  
  104. var query = client.CreateDocumentQuery<T>(documentCollectionUri);
  105. var documents = query.ToList();
  106.  
  107. It took 1802902.6053 ms to get all 100,000 items, which is ~1803 seconds, or ~30 minutes.
  108.  
  109. These results seem to indicate that there is an issue with returning large numbers of documents.
  110.  
  111. As mentioned above, the data is 110 MB in total, which should take roughly 14 seconds to transmit at the speed of my internet connection.
  112.  
  113. There is obviously a very large difference between 14 seconds and 30 minutes.
  114.  
  115. **UPDATE 6 - Local VS Cloud**
  116.  
  117. As mentioned, I have been running this code on my local network.
  118.  
  119. Eventually it will be run in the same data center as the CosmosDB, but during development/testing I was willing to take the extra performance hit.
  120.  
  121. From what I understand the main difference between the two environments is the latency of network calls. The bandwidth is less important as it's not very much data.
  122.  
  123. From what I've heard in a Channel 9 video, the data is collected and merged into one big chunk for sending.
  124.  
  125. If it is indeed sent in a single chunk then the latency differences should be almost negligible as it'll only be a single call.
  126.  
  127. If there is a separate call for each of the 100,000 items then the latency would build up a lot more. (100,000 times more heh).
  128.  
  129. Running a few tests, it seems the latency to the Azure data center is roughly 55 ms from my local network, and it should be <10 ms in the data center itself.
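To put numbers on the "one chunk vs one call per item" reasoning, here is a back-of-the-envelope sketch using the 55 ms figure. It only counts sequential round trips and ignores server time and transfer time. The 1,000-per-call case is an arbitrary middle ground I've picked for illustration, and `RoundTripEstimate` is a made-up helper.

```csharp
using System;

public static class RoundTripEstimate
{
    // Total time spent purely on latency if the documents arrive in sequential calls.
    public static double LatencySeconds(int documents, int documentsPerCall, double latencyMs)
    {
        var calls = (documents + documentsPerCall - 1) / documentsPerCall; // ceiling division
        return calls * latencyMs / 1000.0;
    }

    public static void Main()
    {
        const int documents = 100000;
        const double latencyMs = 55;

        // One big chunk, pages of 1,000 (assumed), and one call per document.
        foreach (var perCall in new[] { 100000, 1000, 1 })
        {
            Console.WriteLine(
                $"{perCall} per call: {LatencySeconds(documents, perCall, latencyMs):F3} s of latency");
        }
    }
}
```

One call per document would add up to 5,500 seconds of pure latency, so the way the results are chunked matters far more than the raw bandwidth.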
  130.  
  131. Running it on a VM in the cloud took 1331346.3065 ms, which is ~1331 seconds, or 22 minutes.
  132.  
  133. That's better, but it still seems, well, unusable.
  134.  
  135. The Azure graphs look the same too: the throughput seems to top out at 80 RU/s.
  136.  
  137. **UPDATE 7 - Sample Document**
  138.  
  139. Here's the JSON for one of my documents:
  140.  
  141. {
  142. "GeoTrigger": null,
  143. "SeverityTrigger": -1,
  144. "TypeTrigger": -1,
  145. "Name": "13, LONSDALE SQUARE, LONDON, N1 1EN",
  146. "IsEnabled": true,
  147. "Type": 2,
  148. "Location": {
  149. "$type": "Microsoft.Azure.Documents.Spatial.Point, Microsoft.Azure.Documents.Client",
  150. "type": "Point",
  151. "coordinates": [
  152. -0.1076407397346815,
  153. 51.53970315059827
  154. ]
  155. },
  156. "id": "0dc2c03e-082b-4aea-93a8-79d89546c12b",
  157. "_rid": "EQttAMGhSQDWPwAAAAAAAA==",
  158. "_self": "dbs/EQttAA==/colls/EQttAMGhSQA=/docs/EQttAMGhSQDWPwAAAAAAAA==/",
  159. "_etag": "\"42001028-0000-0000-0000-594943fe0000\"",
  160. "_attachments": "attachments/",
  161. "_ts": 1497973747
  162. }
  163.  
  164. **UPDATE 8 - The Repro Code**
  165.  
  166. I decided it would be best if I shared the majority of my code as that may be part of the problem.
  167. There is one small part I haven't included, which is initially retrieving the addresses for creating the test documents.
  168. It involves proprietary data and some processing that isn't relevant to the DocumentDB interaction.
  169. It can be replaced by generating random coordinates within the UK bounds.
  170.  
  171. These are some of the execution times from yesterday:
  172.  
  173. 1,000 documents: 2278.8806 ms
  174. 10,000 documents: 17516.9564 ms
  175. 100,000 documents: 173262.2865 ms
  176.  
  177. As these times seem pretty linear, I'm dropping from 100,000 documents to 10,000 to improve the rate at which I can iterate revisions.
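A quick way to check the "pretty linear" claim is to divide each time by the document count. This sketch just does that division on the three measurements above; `PerDocumentCost` is an illustrative name.

```csharp
using System;

public static class PerDocumentCost
{
    // Average time per document for a run.
    public static double MsPerDocument(int documents, double totalMs) => totalMs / documents;

    public static void Main()
    {
        var runs = new (int Documents, double TotalMs)[]
        {
            (1000, 2278.8806),
            (10000, 17516.9564),
            (100000, 173262.2865)
        };

        foreach (var run in runs)
        {
            Console.WriteLine(
                $"{run.Documents} documents: {MsPerDocument(run.Documents, run.TotalMs):F2} ms per document");
        }
    }
}
```

The two larger runs come out at roughly 1.7 ms per document and the smallest at roughly 2.3 ms, which is close enough to linear to justify testing with 10,000 documents.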
  178.  
  179. Here's the repro code:
  180.  
  181. using System;
  182. using System.Collections.Generic;
  183. using System.Configuration;
  184. using System.Diagnostics;
  185. using System.Linq;
  186. using System.Runtime.CompilerServices;
  187. using System.Threading;
  188. using System.Threading.Tasks;
  189. using Microsoft.Azure.Documents;
  190. using Microsoft.Azure.Documents.Client;
  191. using Microsoft.Azure.Documents.Spatial;
  192.  
  193. namespace Repro.Cli
  194. {
  195. public class Program
  196. {
  197. static void Main(string[] args)
  198. {
  199. //AJ: Init logging
  200. Trace.AutoFlush = true;
  201. Trace.Listeners.Add(new ConsoleTraceListener());
  202. Trace.Listeners.Add(new TextWriterTraceListener("trace.log"));
  203.  
  204. //AJ: Increase available threads
  205. //AJ: https://docs.microsoft.com/en-us/azure/storage/storage-performance-checklist#subheading10
  206. //AJ: https://github.com/Azure/azure-documentdb-dotnet/blob/master/samples/documentdb-benchmark/Program.cs
  207. var minThreadPoolSize = 100;
  208. ThreadPool.SetMinThreads(minThreadPoolSize, minThreadPoolSize);
  209.  
  210. //AJ: https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips
  211. //AJ: gcServer enabled in app.config
  212. //AJ: Prefer 32-bit disabled in project properties
  213.  
  214. //AJ: DO IT
  215. var program = new Program();
  216.  
  217. Trace.TraceInformation($"Starting @ {DateTime.UtcNow}");
  218. program.RunAsync().Wait();
  219. Trace.TraceInformation($"Finished @ {DateTime.UtcNow}");
  220.  
  221. //AJ: Wait for user to exit
  222. Console.WriteLine();
  223. Console.WriteLine("Hit enter to exit...");
  224. Console.ReadLine();
  225. }
  226.  
  227. public async Task RunAsync()
  228. {
  229. using (new CodeTimer())
  230. {
  231. var client = await this.GetDocumentClientAsync();
  232. var documentCollectionUri = UriFactory.CreateDocumentCollectionUri(ConfigurationManager.AppSettings["databaseID"], ConfigurationManager.AppSettings["collectionID"]);
  233.  
  234. //AJ: Prepare Test Documents
  235. //var documentCount = 10000; //AJ: 10,000
  236. //var documentsForUpsert = this.GetDocuments(documentCount);
  237. //await this.UpsertDocumentsAsync(client, documentCollectionUri, documentsForUpsert);
  238.  
  239. var allDocuments = this.GetAllDocuments(client, documentCollectionUri);
  240.  
  241. var area = this.GetArea();
  242. var documentsInArea = this.GetDocumentsInArea(client, documentCollectionUri, area);
  243. }
  244. }
  245.  
  246. private async Task<DocumentClient> GetDocumentClientAsync()
  247. {
  248. using (new CodeTimer())
  249. {
  250. var serviceEndpointUri = new Uri(ConfigurationManager.AppSettings["serviceEndpoint"]);
  251. var authKey = ConfigurationManager.AppSettings["authKey"];
  252.  
  253. var connectionPolicy = new ConnectionPolicy
  254. {
  255. ConnectionMode = ConnectionMode.Direct,
  256. ConnectionProtocol = Protocol.Tcp,
  257. RequestTimeout = new TimeSpan(1, 0, 0),
  258. RetryOptions = new RetryOptions
  259. {
  260. MaxRetryAttemptsOnThrottledRequests = 10,
  261. MaxRetryWaitTimeInSeconds = 60
  262. }
  263. };
  264.  
  265. var client = new DocumentClient(serviceEndpointUri, authKey, connectionPolicy);
  266.  
  267. await client.OpenAsync();
  268.  
  269. return client;
  270. }
  271. }
  272.  
  273. private List<TestDocument> GetDocuments(int count)
  274. {
  275. using (new CodeTimer())
  276. {
  277. return External.CreateDocuments(count);
  278. }
  279. }
  280.  
  281. private async Task UpsertDocumentsAsync(DocumentClient client, Uri documentCollectionUri, List<TestDocument> documents)
  282. {
  283. using (new CodeTimer())
  284. {
  285. //TODO: AJ: Parallelise
  286. foreach (var document in documents)
  287. {
  288. await client.UpsertDocumentAsync(documentCollectionUri, document);
  289. }
  290. }
  291. }
  292.  
  293. private List<TestDocument> GetAllDocuments(DocumentClient client, Uri documentCollectionUri)
  294. {
  295. using (new CodeTimer())
  296. {
  297. var query = client
  298. .CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions()
  299. {
  300. MaxItemCount = 1000
  301. });
  302.  
  303. var documents = query.ToList();
  304.  
  305. return documents;
  306. }
  307. }
  308.  
  309. private Polygon GetArea()
  310. {
  311. //AJ: Longitude,Latitude i.e. X/Y
  312. //AJ: Ring orientation matters
  313. return new Polygon(new List<LinearRing>()
  314. {
  315. new LinearRing(new List<Position>()
  316. {
  317. new Position(1.8567 ,51.3814),
  318.  
  319. new Position(0.5329 ,51.4618),
  320. new Position(0.2477 ,51.2588),
  321. new Position(-0.5329 ,51.2579),
  322. new Position(-1.17 ,51.2173),
  323. new Position(-1.9062 ,51.1958),
  324. new Position(-2.5434 ,51.1614),
  325. new Position(-3.8672 ,51.139 ),
  326. new Position(-4.1578 ,50.9137),
  327. new Position(-4.5373 ,50.694 ),
  328. new Position(-5.1496 ,50.3282),
  329. new Position(-5.2212 ,49.9586),
  330. new Position(-3.7049 ,50.142 ),
  331. new Position(-2.1698 ,50.314 ),
  332. new Position(0.4669 ,50.6976),
  333.  
  334. //AJ: Last point must be the same as first point
  335. new Position(1.8567 ,51.3814)
  336. })
  337. });
  338. }
  339.  
  340. private List<TestDocument> GetDocumentsInArea(DocumentClient client, Uri documentCollectionUri, Polygon area)
  341. {
  342. using (new CodeTimer())
  343. {
  344. var query = client
  345. .CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions()
  346. {
  347. MaxItemCount = 1000
  348. })
  349. .Where(document => document.Location.Intersects(area));
  350.  
  351. var documents = query.ToList();
  352.  
  353. return documents;
  354. }
  355. }
  356. }
  357.  
  358. public class TestDocument : Resource
  359. {
  360. public string Name { get; set; }
  361. public Point Location { get; set; } //AJ: Longitude,Latitude i.e. X/Y
  362.  
  363. public TestDocument()
  364. {
  365. this.Id = Guid.NewGuid().ToString("N");
  366. }
  367. }
  368.  
  369. //AJ: This should be "good enough". The times being recorded are seconds or minutes.
  370. public class CodeTimer : IDisposable
  371. {
  372. private Action<TimeSpan> reportFunction;
  373. private Stopwatch stopwatch = new Stopwatch();
  374.  
  375. public CodeTimer([CallerMemberName]string name = "")
  376. : this((ellapsed) =>
  377. {
  378. Trace.TraceInformation($"{name} took {ellapsed}, or {ellapsed.TotalMilliseconds} ms.");
  379. })
  380. { }
  381.  
  382. public CodeTimer(Action<TimeSpan> report)
  383. {
  384. this.reportFunction = report;
  385. this.stopwatch.Start();
  386. }
  387.  
  388. public void Dispose()
  389. {
  390. this.stopwatch.Stop();
  391. this.reportFunction(this.stopwatch.Elapsed);
  392. }
  393. }
  394. }
  395.  
  396. This code had a very interesting result ... **it's faster!**
  397.  
  398. Or to put it another way ... it fails at being a reproduction, as it doesn't reproduce the issue.
  399.  
  400. I am happy with this result! It means I now have some "working" code, and some "nonworking" code, and I can slowly apply each difference to the working code until it stops working.
  401.  
  402. This is the log when running locally:
  403.  
  404. Starting @ 22/06/2017 11:02:23
  405. GetDocumentClientAsync took 00:00:01.9866833, or 1986.6833 ms.
  406. GetAllDocuments took 00:00:05.6934298, or 5693.4298 ms.
  407. GetDocumentsInArea took 00:00:00.4367697, or 436.7697 ms.
  408. RunAsync took 00:00:08.3177466, or 8317.7466 ms.
  409. Finished @ 22/06/2017 11:02:32
  410.  
  411. So that's 6 seconds to return all 10,000 records, and 0.5 seconds to run the spatial query! This is the kind of performance I was hoping for.
  412.  
  413. And this is the log when running in the cloud:
  414.  
  415. Starting @ 6/22/2017 11:04:17 AM
  416. GetDocumentClientAsync took 00:00:01.0331417, or 1033.1417 ms.
  417. GetAllDocuments took 00:00:05.9973366, or 5997.3366 ms.
  418. GetDocumentsInArea took 00:00:01.7479158, or 1747.9158 ms.
  419. RunAsync took 00:00:08.7885318, or 8788.5318 ms.
  420. Finished @ 6/22/2017 11:04:26 AM
  421.  
  422. So that's 6 seconds to return all 10,000 records, and 2 seconds to run the spatial query. It's a little odd that the query is slower in the cloud, but I suspect that's due to network/environment conditions. The performance is good enough, and way better than it was.
  423.  
  424. Much, much better than yesterday.
  425.  
  426. Now, to make changes to the working code until I can replicate the poor performance I was seeing yesterday.
  427.  
  428. **UPDATE 9 - The Differences**
  429.  
  430. I'll run through each difference I can find. After each one I'll revert the code back to what's in the Repro update, which should show the impact each difference has independently.
  431.  
  432. As there is a limited difference in runtimes between running locally and in the cloud, I'll only run locally for these tests to save time.
  433.  
  434. The first difference I noticed is that while I was creating the `ConnectionPolicy` object in my old code, I was failing to pass it to the `DocumentClient`. An easy mistake to make, but one I suspected would have a drastic impact, as the `ConnectionMode` will default to `Gateway` and the `ConnectionProtocol` to `Https`.
  435.  
  436. Code Changes:
  437.  
  438. //var client = new DocumentClient(serviceEndpointUri, authKey, connectionPolicy);
  439. var client = new DocumentClient(serviceEndpointUri, authKey);
  440.  
  441. Results:
  442.  
  443. Starting @ 22/06/2017 11:09:25
  444. GetDocumentClientAsync took 00:00:01.3944992, or 1394.4992 ms.
  445. GetAllDocuments took 00:00:05.0646737, or 5064.6737 ms.
  446. GetDocumentsInArea took 00:00:01.5874742, or 1587.4742 ms.
  447. RunAsync took 00:00:08.2558457, or 8255.8457 ms.
  448. Finished @ 22/06/2017 11:09:34
  449.  
  450. 5 seconds / 1.5 seconds.
  451.  
  452. So it seems that the `ConnectionPolicy` has little impact for my specific tests; the difference in times is well within the variance seen between runs.
  453.  
  454.  
  455. Next up, I previously wasn't setting the `MaxItemCount` for the `FeedOptions` when executing a query.
  456.  
  457. Code Changes:
  458.  
  459. //var query = client
  460. // .CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions()
  461. // {
  462. // MaxItemCount = 1000
  463. // });
  464.  
  465. var query = client.CreateDocumentQuery<TestDocument>(documentCollectionUri);
  466.  
  467. //var query = client
  468. // .CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions()
  469. // {
  470. // MaxItemCount = 1000
  471. // })
  472. // .Where(document => document.Location.Intersects(area));
  473.  
  474. var query = client.CreateDocumentQuery<TestDocument>(documentCollectionUri).Where(document => document.Location.Intersects(area));
  475.  
  476. Results:
  477.  
  478. Starting @ 22/06/2017 11:12:01
  479. GetDocumentClientAsync took 00:00:02.2025152, or 2202.5152 ms.
  480. GetAllDocuments took 00:00:09.4601258, or 9460.1258 ms.
  481. GetDocumentsInArea took 00:00:01.2358951, or 1235.8951 ms.
  482. RunAsync took 00:00:13.1350025, or 13135.0025 ms.
  483. Finished @ 22/06/2017 11:12:14
  484.  
  485. 9.5 seconds / 1.2 seconds.
  486.  
  487. Returning lots of results took longer, which is to be expected as they're being returned in a larger number of smaller chunks. But it's still not as bad as yesterday.
  488.  
  489.  
  490. Another difference is that I was using some custom `JsonSerializerSettings` to include type information, and the `Location` property was a `Geometry` type containing a `Point`, rather than directly being a `Point`.
  491.  
  492. (For this test I had to reinsert the test documents, as the JSON serializer output is different.)
  493.  
  494. Code Changes:
  495.  
  496. //AJ: Init Serializer
  497. JsonConvert.DefaultSettings = () =>
  498. {
  499. return new JsonSerializerSettings
  500. {
  501. TypeNameHandling = TypeNameHandling.Auto
  502. };
  503. };
  504.  
  505. //public Point Location { get; set; } //AJ: Longitude,Latitude i.e. X/Y
  506. public Geometry Location { get; set; } //AJ: Longitude,Latitude i.e. X/Y
  507.  
  508. Results:
  509.  
  510. Starting @ 22/06/2017 11:23:20
  511. GetDocumentClientAsync took 00:00:01.9610856, or 1961.0856 ms.
  512. GetDocuments took 00:00:10.4402680, or 10440.268 ms.
  513. UpsertDocumentsAsync took 00:06:30.6676130, or 390667.613 ms.
  514. GetAllDocuments took 00:00:06.9084585, or 6908.4585 ms.
  515. GetDocumentsInArea took 00:00:00.7809068, or 780.9068 ms.
  516. RunAsync took 00:06:51.0916425, or 411091.6425 ms.
  517. Finished @ 22/06/2017 11:30:11
  518.  
  519. 7 seconds / 1 second.
  520.  
  521. It took a lot longer to run, but that's because the initial insert was included. The run time of the queries was still well within the normal variance between runs.
  522.  
  523.  
  524. Yet another difference is that my previous document had a few extra properties, and so was slightly larger.
  525.  
  526. (For this test I had to reinsert the test documents, as the document shape is different.)
  527.  
  528. Code Changes:
  529.  
  530. public enum TestEnum
  531. {
  532. Default = 0,
  533. Foo = 1,
  534. Bar = 2
  535. }
  536.  
  537. public class TestDocument : Resource
  538. {
  539. public object GeoTrigger { get; set; }
  540.  
  541. public TestEnum SeverityTrigger { get; set; }
  542. public TestEnum TypeTrigger { get; set; }
  543. public TestEnum Type { get; set; } = TestEnum.Foo;
  544.  
  545. public bool IsEnabled { get; set; } = true;
  546.  
  547. public string Name { get; set; }
  548. public Point Location { get; set; } //AJ: Longitude,Latitude i.e. X/Y
  549.  
  550. public TestDocument()
  551. {
  552. this.Id = Guid.NewGuid().ToString("N");
  553. }
  554. }
  555.  
  556. Results:
  557.  
  558. Starting @ 22/06/2017 11:41:58
  559. GetDocumentClientAsync took 00:00:01.8769841, or 1876.9841 ms.
  560. GetDocuments took 00:00:10.7117590, or 10711.759 ms.
  561. UpsertDocumentsAsync took 00:06:42.9010966, or 402901.0966 ms.
  562. GetAllDocuments took 00:00:07.7120264, or 7712.0264 ms.
  563. GetDocumentsInArea took 00:00:00.9173274, or 917.3274 ms.
  564. RunAsync took 00:07:04.5086401, or 424508.6401 ms.
  565. Finished @ 22/06/2017 11:49:03
  566.  
  567. 8 seconds / 1 second.
  568.  
  569. Still no real difference.
  570.  
  571.  
  572. I'm starting to run out of ideas, but while looking at the `JsonSerializerSettings` I found I'd also left a custom `ContractResolver` in the original code, so let's try that! The `ContractResolver` isn't needed at all, but maybe it was impacting the performance.
  573.  
  574. Code Changes:
  575.  
  576. //AJ: Init Serializer
  577. JsonConvert.DefaultSettings = () =>
  578. {
  579. return new JsonSerializerSettings
  580. {
  581. ContractResolver = new PropertyNameMapContractResolver(new Dictionary<string, string>()
  582. {
  583. { "ID", "id" }
  584. })
  585. };
  586. };
  587.  
  588. public class PropertyNameMapContractResolver : DefaultContractResolver
  589. {
  590. private Dictionary<string, string> propertyNameMap;
  591.  
  592. public PropertyNameMapContractResolver(Dictionary<string, string> propertyNameMap)
  593. {
  594. this.propertyNameMap = propertyNameMap;
  595. }
  596.  
  597. protected override string ResolvePropertyName(string propertyName)
  598. {
  599. if (this.propertyNameMap.TryGetValue(propertyName, out string resolvedName))
  600. return resolvedName;
  601.  
  602. return base.ResolvePropertyName(propertyName);
  603. }
  604. }
  605.  
  606. Results:
  607.  
  608. Starting @ 22/06/2017 11:56:11
  609. GetDocumentClientAsync took 00:00:02.0503540, or 2050.354 ms.
  610. GetDocuments took 00:00:17.0165692, or 17016.5692 ms.
  611. UpsertDocumentsAsync took 00:10:48.9390534, or 648939.0534 ms.
  612. GetAllDocuments took 00:02:28.1758434, or 148175.8434 ms.
  613. GetDocumentsInArea took 00:00:13.1400179, or 13140.0179 ms.
  614. RunAsync took 00:13:49.6587200, or 829658.72 ms.
  615. Finished @ 22/06/2017 12:10:01
  616.  
  617. 2 minutes 28 seconds / 13 seconds.
  618.  
  619. It looks like we have a winner! I thought that this wouldn't be an issue, but it just goes to show you should check all the possible causes you find, not just the obvious ones!
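One possible explanation, which the tests above don't confirm, is that the `DefaultSettings` delegate builds a brand new resolver every time it is invoked. The Json.NET performance tips recommend creating a contract resolver once and reusing it, because contracts are built with reflection and cached by the resolver. The sketch below keeps the same `PropertyNameMapContractResolver` but shares a single instance. `SerializerSetup` is a made-up name, and whether this actually restores the original timings is an assumption I haven't measured.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

// Same resolver as in the code changes above.
public class PropertyNameMapContractResolver : DefaultContractResolver
{
    private readonly Dictionary<string, string> propertyNameMap;

    public PropertyNameMapContractResolver(Dictionary<string, string> propertyNameMap)
    {
        this.propertyNameMap = propertyNameMap;
    }

    protected override string ResolvePropertyName(string propertyName)
    {
        if (this.propertyNameMap.TryGetValue(propertyName, out string resolvedName))
            return resolvedName;

        return base.ResolvePropertyName(propertyName);
    }
}

public static class SerializerSetup
{
    // Created once, so the contracts it caches can be reused across serializer instances.
    private static readonly IContractResolver SharedResolver =
        new PropertyNameMapContractResolver(new Dictionary<string, string>
        {
            { "ID", "id" }
        });

    public static void Configure()
    {
        // The delegate still returns fresh settings, but they all point at the shared resolver.
        JsonConvert.DefaultSettings = () => new JsonSerializerSettings
        {
            ContractResolver = SharedResolver
        };
    }
}
```

Calling `SerializerSetup.Configure()` at the start of `Main` would replace the inline `JsonConvert.DefaultSettings` assignment.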
  620.  
  621.  
  622. So here's the offending reproduction code in full:
  623.  
  624. using System;
  625. using System.Collections.Generic;
  626. using System.Configuration;
  627. using System.Diagnostics;
  628. using System.Linq;
  629. using System.Runtime.CompilerServices;
  630. using System.Threading;
  631. using System.Threading.Tasks;
  632. using Microsoft.Azure.Documents;
  633. using Microsoft.Azure.Documents.Client;
  634. using Microsoft.Azure.Documents.Spatial;
  635. using Newtonsoft.Json;
  636. using Newtonsoft.Json.Serialization;
  637.  
  638. namespace Repro.Cli
  639. {
  640. public class Program
  641. {
  642. static void Main(string[] args)
  643. {
  644. JsonConvert.DefaultSettings = () =>
  645. {
  646. return new JsonSerializerSettings
  647. {
  648. ContractResolver = new PropertyNameMapContractResolver(new Dictionary<string, string>()
  649. {
  650. { "ID", "id" }
  651. })
  652. };
  653. };
  654.  
  655. //AJ: Init logging
  656. Trace.AutoFlush = true;
  657. Trace.Listeners.Add(new ConsoleTraceListener());
  658. Trace.Listeners.Add(new TextWriterTraceListener("trace.log"));
  659.  
  660. //AJ: Increase available threads
  661. //AJ: https://docs.microsoft.com/en-us/azure/storage/storage-performance-checklist#subheading10
  662. //AJ: https://github.com/Azure/azure-documentdb-dotnet/blob/master/samples/documentdb-benchmark/Program.cs
  663. var minThreadPoolSize = 100;
  664. ThreadPool.SetMinThreads(minThreadPoolSize, minThreadPoolSize);
  665.  
  666. //AJ: https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips
  667. //AJ: gcServer enabled in app.config
  668. //AJ: Prefer 32-bit disabled in project properties
  669.  
  670. //AJ: DO IT
  671. var program = new Program();
  672.  
  673. Trace.TraceInformation($"Starting @ {DateTime.UtcNow}");
  674. program.RunAsync().Wait();
  675. Trace.TraceInformation($"Finished @ {DateTime.UtcNow}");
  676.  
  677. //AJ: Wait for user to exit
  678. Console.WriteLine();
  679. Console.WriteLine("Hit enter to exit...");
  680. Console.ReadLine();
  681. }
  682.  
  683. public async Task RunAsync()
  684. {
  685. using (new CodeTimer())
  686. {
  687. var client = await this.GetDocumentClientAsync();
  688. var documentCollectionUri = UriFactory.CreateDocumentCollectionUri(ConfigurationManager.AppSettings["databaseID"], ConfigurationManager.AppSettings["collectionID"]);
  689.  
  690. //AJ: Prepare Test Documents
  691. var documentCount = 10000; //AJ: 10,000
  692. var documentsForUpsert = this.GetDocuments(documentCount);
  693. await this.UpsertDocumentsAsync(client, documentCollectionUri, documentsForUpsert);
  694.  
  695. var allDocuments = this.GetAllDocuments(client, documentCollectionUri);
  696.  
  697. var area = this.GetArea();
  698. var documentsInArea = this.GetDocumentsInArea(client, documentCollectionUri, area);
  699. }
  700. }
  701.  
  702. private async Task<DocumentClient> GetDocumentClientAsync()
  703. {
  704. using (new CodeTimer())
  705. {
  706. var serviceEndpointUri = new Uri(ConfigurationManager.AppSettings["serviceEndpoint"]);
  707. var authKey = ConfigurationManager.AppSettings["authKey"];
  708.  
  709. var connectionPolicy = new ConnectionPolicy
  710. {
  711. ConnectionMode = ConnectionMode.Direct,
  712. ConnectionProtocol = Protocol.Tcp,
  713. RequestTimeout = new TimeSpan(1, 0, 0),
  714. RetryOptions = new RetryOptions
  715. {
  716. MaxRetryAttemptsOnThrottledRequests = 10,
  717. MaxRetryWaitTimeInSeconds = 60
  718. }
  719. };
  720.  
  721. var client = new DocumentClient(serviceEndpointUri, authKey, connectionPolicy);
  722.  
  723. await client.OpenAsync();
  724.  
  725. return client;
  726. }
  727. }
  728.  
  729. private List<TestDocument> GetDocuments(int count)
  730. {
  731. using (new CodeTimer())
  732. {
  733. return External.CreateDocuments(count);
  734. }
  735. }
  736.  
  737. private async Task UpsertDocumentsAsync(DocumentClient client, Uri documentCollectionUri, List<TestDocument> documents)
  738. {
  739. using (new CodeTimer())
  740. {
  741. //TODO: AJ: Parallelise
  742. foreach (var document in documents)
  743. {
  744. await client.UpsertDocumentAsync(documentCollectionUri, document);
  745. }
  746. }
  747. }
  748.  
  749. private List<TestDocument> GetAllDocuments(DocumentClient client, Uri documentCollectionUri)
  750. {
  751. using (new CodeTimer())
  752. {
  753. var query = client
  754. .CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions()
  755. {
  756. MaxItemCount = 1000
  757. });
  758.  
  759. var documents = query.ToList();
  760.  
  761. return documents;
  762. }
  763. }
  764.  
  765. private Polygon GetArea()
  766. {
  767. //AJ: Longitude,Latitude i.e. X/Y
  768. //AJ: Ring orientation matters
  769. return new Polygon(new List<LinearRing>()
  770. {
  771. new LinearRing(new List<Position>()
  772. {
  773. new Position(1.8567 ,51.3814),
  774.  
  775. new Position(0.5329 ,51.4618),
  776. new Position(0.2477 ,51.2588),
  777. new Position(-0.5329 ,51.2579),
  778. new Position(-1.17 ,51.2173),
  779. new Position(-1.9062 ,51.1958),
  780. new Position(-2.5434 ,51.1614),
  781. new Position(-3.8672 ,51.139 ),
  782. new Position(-4.1578 ,50.9137),
  783. new Position(-4.5373 ,50.694 ),
  784. new Position(-5.1496 ,50.3282),
  785. new Position(-5.2212 ,49.9586),
  786. new Position(-3.7049 ,50.142 ),
  787. new Position(-2.1698 ,50.314 ),
  788. new Position(0.4669 ,50.6976),
  789.  
  790. //AJ: Last point must be the same as first point
  791. new Position(1.8567 ,51.3814)
  792. })
  793. });
  794. }
  795.  
  796. private List<TestDocument> GetDocumentsInArea(DocumentClient client, Uri documentCollectionUri, Polygon area)
  797. {
  798. using (new CodeTimer())
  799. {
  800. var query = client
  801. .CreateDocumentQuery<TestDocument>(documentCollectionUri, new FeedOptions()
  802. {
  803. MaxItemCount = 1000
  804. })
  805. .Where(document => document.Location.Intersects(area));
  806.  
  807. var documents = query.ToList();
  808.  
  809. return documents;
  810. }
  811. }
  812. }
  813.  
  814. public class TestDocument : Resource
  815. {
  816. public string Name { get; set; }
  817. public Point Location { get; set; } //AJ: Longitude,Latitude i.e. X/Y
  818.  
  819. public TestDocument()
  820. {
  821. this.Id = Guid.NewGuid().ToString("N");
  822. }
  823. }
  824.  
  825. //AJ: This should be "good enough". The times being recorded are seconds or minutes.
  826. public class CodeTimer : IDisposable
  827. {
  828. private Action<TimeSpan> reportFunction;
  829. private Stopwatch stopwatch = new Stopwatch();
  830.  
  831. public CodeTimer([CallerMemberName]string name = "")
  832. : this((ellapsed) =>
  833. {
  834. Trace.TraceInformation($"{name} took {ellapsed}, or {ellapsed.TotalMilliseconds} ms.");
  835. })
  836. { }
  837.  
  838. public CodeTimer(Action<TimeSpan> report)
  839. {
  840. this.reportFunction = report;
  841. this.stopwatch.Start();
  842. }
  843.  
  844. public void Dispose()
  845. {
  846. this.stopwatch.Stop();
  847. this.reportFunction(this.stopwatch.Elapsed);
  848. }
  849. }
  850.  
  851. public class PropertyNameMapContractResolver : DefaultContractResolver
  852. {
  853. private Dictionary<string, string> propertyNameMap;
  854.  
  855. public PropertyNameMapContractResolver(Dictionary<string, string> propertyNameMap)
  856. {
  857. this.propertyNameMap = propertyNameMap;
  858. }
  859.  
  860. protected override string ResolvePropertyName(string propertyName)
  861. {
  862. if (this.propertyNameMap.TryGetValue(propertyName, out string resolvedName))
  863. return resolvedName;
  864.  
  865. return base.ResolvePropertyName(propertyName);
  866. }
  867. }
  868. }
  869.  
  870. [1]: https://docs.microsoft.com/en-us/azure/cosmos-db/geospatial#indexing
  871. [2]: https://docs.microsoft.com/en-us/azure/cosmos-db/request-units
  872. [3]: https://docs.microsoft.com/en-us/azure/cosmos-db/request-units-per-minute
  873. [4]: https://docs.microsoft.com/en-us/azure/cosmos-db/geospatial
  874. [5]: https://i.stack.imgur.com/fUYXf.png