Oct 16th, 2015
Easy Profiling for Node.js Applications

There are many third-party tools available for profiling Node.js applications, but in many cases the easiest option is to use the Node.js built-in profiler. The built-in profiler uses the profiler inside V8 (https://developers.google.com/v8/profiler_example), which samples the stack at regular intervals during program execution. It records the results of these samples, along with important optimization events such as JIT compiles, as a series of ticks:

code-creation,LazyCompile,0,0x2d5000a337a0,396,"bp native array.js:1153:16",0x289f644df68,~
code-creation,LazyCompile,0,0x2d5000a33940,716,"hasOwnProperty native v8natives.js:198:30",0x289f64438d0,~
code-creation,LazyCompile,0,0x2d5000a33c20,284,"ToName native runtime.js:549:16",0x289f643bb28,~
code-creation,Stub,2,0x2d5000a33d40,182,"DoubleToIStub"
code-creation,Stub,2,0x2d5000a33e00,507,"NumberToStringStub"

These ticks can be difficult to interpret manually. Luckily, Node.js 4.1.1 recently introduced tools that make it possible to consume this information without separately building V8 from source. Let's see how the built-in profiler can help provide insight into application performance.

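If you want to poke at the raw log yourself, it is plain comma-separated text, with quoted fields for names. Here's a minimal sketch that splits one of the code-creation entries above into fields. The helper names (splitLogLine, parseCodeCreation) and the field labels are our own assumptions, not part of V8 or Node.js; the third numeric field and the second address are deliberately left unlabeled, and reading the trailing marker as "~" for not optimized and "*" for optimized follows the convention the tick processor uses when it prints function names.

```javascript
// Splits a V8 log line on commas, treating double-quoted fields (such as
// function names, which may contain spaces) as single fields.
function splitLogLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (const ch of line) {
    if (ch === '"') {
      inQuotes = !inQuotes; // enter or leave a quoted field; the quote is dropped
    } else if (ch === ',' && !inQuotes) {
      fields.push(current); // an unquoted comma ends the current field
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Labels the parts of a code-creation entry we can read with some
// confidence. Field names are our own; unknown fields are skipped.
function parseCodeCreation(line) {
  const [event, type, , start, size, name, , state] = splitLogLine(line);
  if (event !== 'code-creation') {
    return null;
  }
  return {
    type,                // e.g. 'LazyCompile' or 'Stub'
    start,               // start address of the generated code
    size: Number(size),  // size of the generated code in bytes
    name,                // function name plus source location, or stub name
    state: state || null // '~' or '*' when present, null otherwise
  };
}

const entry = parseCodeCreation(
  'code-creation,LazyCompile,0,0x2d5000a337a0,396,"bp native array.js:1153:16",0x289f644df68,~'
);
console.log(entry.name, entry.size, entry.state); // bp native array.js:1153:16 396 ~
```

The Stub entries have no trailing marker, so their state comes back as null.
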
To illustrate the use of the tick profiler, we will work with a simple Express application. Our application will have two handlers, one for adding new users to our system:

var express = require('express');
var crypto = require('crypto');

var app = express();
var users = {}; // in-memory user store, for illustration only

app.get('/newUser', function (req, res) {
  var username = req.query.username || '';
  var password = req.query.password || '';

  username = username.replace(/[!@#$%^&*]/g, '');

  // Reject empty input and usernames that are already taken.
  if (!username || !password || users[username]) {
    return res.sendStatus(400);
  }

  var salt = crypto.randomBytes(128).toString('base64');
  // 'sha1' was the implicit default digest in Node.js 4.x; newer releases
  // require the digest argument.
  var hash = crypto.pbkdf2Sync(password, salt, 10000, 512, 'sha1');

  users[username] = {
    salt: salt,
    hash: hash
  };

  res.sendStatus(200);
});

and another for validating user authentication attempts:

app.get('/auth', function (req, res) {
  var username = req.query.username || '';
  var password = req.query.password || '';

  username = username.replace(/[!@#$%^&*]/g, '');

  if (!username || !password || !users[username]) {
    return res.sendStatus(400);
  }

  var hash = crypto.pbkdf2Sync(password, users[username].salt, 10000, 512, 'sha1');

  if (users[username].hash.toString() === hash.toString()) {
    res.sendStatus(200);
  } else {
    res.sendStatus(401);
  }
});

// The load test below targets port 8080.
app.listen(8080);

Please note that these are NOT recommended handlers for authenticating users in your Node.js applications; they are used purely for illustration. In general, you should not try to design your own cryptographic authentication mechanisms. It is much better to use existing, proven authentication solutions.

Now assume that we've deployed our application and users are complaining about high latency on requests. We can easily run the app with the built-in profiler:

NODE_ENV=production node --prof app.js

then create a test user with curl and put some load on the server using ab (ApacheBench):

curl -X GET "http://localhost:8080/newUser?username=matt&password=password"
ab -k -c 20 -n 250 "http://localhost:8080/auth?username=matt&password=password"

which gives the following ab output:

...

Concurrency Level: 20
Time taken for tests: 46.932 seconds
Complete requests: 250
Failed requests: 0
Keep-Alive requests: 250
Total transferred: 50250 bytes
HTML transferred: 500 bytes
Requests per second: 5.33 [#/sec] (mean)
Time per request: 3754.556 [ms] (mean)
Time per request: 187.728 [ms] (mean, across all concurrent requests)
Transfer rate: 1.05 [Kbytes/sec] received

...

Percentage of the requests served within a certain time (ms)
50% 3755
66% 3804
75% 3818
80% 3825
90% 3845
95% 3858
98% 3874
99% 3875
100% 4225 (longest request)

From this output, we see that we're only managing to serve about 5 requests per second and that the average request takes just under 4 seconds round trip. In a real-world example, we could be doing lots of work in many functions on behalf of a user request, but even in our simple example, time could be lost compiling regular expressions, generating random salts, generating unique hashes from user passwords, or inside the Express framework itself.

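The headline numbers in ab's report are derived from its raw totals, so they are easy to sanity-check. The sketch below assumes ab's usual definitions: requests per second is completed requests divided by elapsed time, the first "Time per request" is concurrency × elapsed time / completed requests (the latency a single client sees), and the second is elapsed time / completed requests. abSummary is a hypothetical helper, not part of ab.

```javascript
// Recomputes ab's headline figures from its raw totals, assuming ab's usual
// definitions. abSummary is a hypothetical helper, not part of ab.
function abSummary(concurrency, elapsedSeconds, completedRequests) {
  // Wall-clock time spread evenly over every completed request.
  const acrossAllMs = (elapsedSeconds * 1000) / completedRequests;
  return {
    requestsPerSecond: completedRequests / elapsedSeconds,
    // Mean latency seen by one client: `concurrency` requests overlap in time.
    timePerRequestMs: concurrency * acrossAllMs,
    timePerRequestAcrossAllMs: acrossAllMs,
  };
}

const syncFigures = abSummary(20, 46.932, 250);
console.log(syncFigures.requestsPerSecond.toFixed(2));         // 5.33
console.log(syncFigures.timePerRequestMs.toFixed(1));          // 3754.6
console.log(syncFigures.timePerRequestAcrossAllMs.toFixed(1)); // 187.7
```

The small gap between 3754.6 and ab's 3754.556 is presumably because ab works from an unrounded elapsed time rather than the 46.932 seconds it prints.
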
Since we ran our application using the --prof option, a tick file was generated in the directory from which the application was run. It should have a name of the form isolate-0x124353456789-v8.log. To make sense of this file, we need to use the tick processor included in the Node.js source at <path_to_nodejs_src>/tools/v8-prof/tick-processor.js. It is important that the tick processor you run comes from the same version of the Node.js source as the node binary used to generate the isolate file. The raw tick output can be processed with this tool by running:

node <path_to_nodejs_src>/tools/v8-prof/tick-processor.js isolate-0x101804c00-v8.log >processed.txt

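If you profile several times, the working directory fills up with isolate logs, and it is easy to feed the tick processor the wrong one. A small helper along these lines can list them newest first; findIsolateLogs is hypothetical (not part of Node.js), and its file-name pattern is intentionally loose on the assumption that other Node.js versions may add extra segments to the isolate-0x...-v8.log name.

```javascript
const fs = require('fs');
const path = require('path');

// Lists the V8 isolate logs in `dir`, newest first. The pattern follows the
// isolate-0x...-v8.log form, loosely enough to tolerate extra name segments.
function findIsolateLogs(dir) {
  return fs.readdirSync(dir)
    .filter((name) => /^isolate-.+-v8\.log$/.test(name))
    .map((name) => path.join(dir, name))
    .map((file) => ({ file, mtimeMs: fs.statSync(file).mtimeMs }))
    .sort((a, b) => b.mtimeMs - a.mtimeMs) // most recently modified first
    .map(({ file }) => file);
}

// Example: print the most recent log in the current directory, if any.
const latest = findIsolateLogs(process.cwd())[0];
console.log(latest || 'no isolate log found');
```

The first entry of the returned list is the file you would normally pass to the tick processor.
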
Opening processed.txt in your favorite text editor will give you a few different types of information. The file is broken up into sections, which are in turn broken up by language. First, we look at the summary section and see:

[Summary]:
  ticks   total  nonlib  name
     79    0.2%    0.2%  JavaScript
  36703   97.2%   99.2%  C++
      7    0.0%    0.0%  GC
    767    2.0%          Shared libraries
    215    0.6%          Unaccounted

This tells us that 97% of all samples gathered occurred in C++ code and that, when viewing other sections of the processed output, we should pay most attention to work being done in C++ (as opposed to JavaScript). With this in mind, we next find the [C++] section, which contains information about which C++ functions are taking the most CPU time, and see:

[C++]:
  ticks   total  nonlib  name
  19557   51.8%   52.9%  node::crypto::PBKDF2(v8::FunctionCallbackInfo<v8::Value> const&)
   4510   11.9%   12.2%  _sha1_block_data_order
   3165    8.4%    8.6%  _malloc_zone_malloc

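The total and nonlib columns are just tick counts divided by a base. As a rough check, the sketch below assumes the total is the sum of the five [Summary] rows (37771 ticks) and that nonlib simply leaves out the 767 shared-library ticks; under those assumptions it reproduces the C++ percentages printed in both the [Summary] and [C++] sections, along with the combined share of the top three entries.

```javascript
// Tick counts copied from the [Summary] and [C++] sections above.
const summary = { javascript: 79, cpp: 36703, gc: 7, sharedLibraries: 767, unaccounted: 215 };
const cppRows = [
  [19557, 'node::crypto::PBKDF2(v8::FunctionCallbackInfo<v8::Value> const&)'],
  [4510, '_sha1_block_data_order'],
  [3165, '_malloc_zone_malloc'],
];

// Assumptions: the total is the sum of the summary rows, and "nonlib"
// leaves out ticks spent in shared libraries.
const totalTicks = Object.values(summary).reduce((a, b) => a + b, 0); // 37771
const nonlibTicks = totalTicks - summary.sharedLibraries;              // 37004

// Percentage of `base`, rounded to one decimal place like the report.
const pct = (ticks, base) => ((100 * ticks) / base).toFixed(1);

console.log('C++ summary:', pct(summary.cpp, totalTicks), pct(summary.cpp, nonlibTicks)); // C++ summary: 97.2 99.2
for (const [ticks, name] of cppRows) {
  console.log(ticks, pct(ticks, totalTicks), pct(ticks, nonlibTicks), name);
}

// Combined share of the three hottest C++ functions.
const top3 = cppRows.reduce((sum, [ticks]) => sum + ticks, 0);
console.log('top 3:', pct(top3, totalTicks)); // top 3: 72.1
```
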
We see that the top 3 entries account for 72.1% of CPU time taken by the program. From this output, we immediately see that at least 51.8% of CPU time is taken up by a function called PBKDF2, which corresponds to our hash generation from a user's password. However, it may not be immediately obvious how the lower two entries factor into our application (or, if it is, we will pretend otherwise for the sake of the example). To better understand the relationship between these functions, we will next look at the [Bottom up (heavy) profile] section, which provides information about the primary callers of each function. Examining this section, we find:

  ticks  parent  name
  19557   51.8%  node::crypto::PBKDF2(v8::FunctionCallbackInfo<v8::Value> const&)
  19557  100.0%    v8::internal::Builtins::~Builtins()
  19557  100.0%      LazyCompile: ~pbkdf2 crypto.js:557:16

   4510   11.9%  _sha1_block_data_order
   4510  100.0%    LazyCompile: *pbkdf2 crypto.js:557:16
   4510  100.0%      LazyCompile: *exports.pbkdf2Sync crypto.js:552:30

   3165    8.4%  _malloc_zone_malloc
   3161   99.9%    LazyCompile: *pbkdf2 crypto.js:557:16
   3161  100.0%      LazyCompile: *exports.pbkdf2Sync crypto.js:552:30

Parsing this section takes a little more work than the raw tick counts above. Within each of the "call stacks" above, the percentage in the parent column tells you the percentage of samples for which the function in the row above was called by the function in the current row. For example, in the middle "call stack" above for _sha1_block_data_order, we see that _sha1_block_data_order occurred in 11.9% of samples, which we knew from the raw counts above. However, here we can also tell that it was always called by the pbkdf2 function inside the Node.js crypto module. Similarly, _malloc_zone_malloc was called almost exclusively by the same pbkdf2 function. Thus, using the information in this view, we can tell that our hash computation from the user's password accounts not only for the 51.8% from above but for essentially all of the CPU time in the top 3 most sampled functions, since the calls to _sha1_block_data_order and _malloc_zone_malloc were made on behalf of the pbkdf2 function.

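The parent percentages and the attribution argument can be reproduced with a few lines of arithmetic. The sketch below copies the three bottom-up call stacks into arrays (sampled function first, then its callers), recomputes each parent percentage as a row's ticks divided by the ticks of the row above it, and totals the ticks that have the crypto.js pbkdf2 frame on their stack. The helper names are hypothetical, and the 37771-tick total is an assumption (the sum of the [Summary] rows).

```javascript
// The three bottom-up "call stacks" as [ticks, name] rows,
// sampled function first, followed by its callers.
const stacks = [
  [
    [19557, 'node::crypto::PBKDF2(v8::FunctionCallbackInfo<v8::Value> const&)'],
    [19557, 'v8::internal::Builtins::~Builtins()'],
    [19557, 'LazyCompile: ~pbkdf2 crypto.js:557:16'],
  ],
  [
    [4510, '_sha1_block_data_order'],
    [4510, 'LazyCompile: *pbkdf2 crypto.js:557:16'],
    [4510, 'LazyCompile: *exports.pbkdf2Sync crypto.js:552:30'],
  ],
  [
    [3165, '_malloc_zone_malloc'],
    [3161, 'LazyCompile: *pbkdf2 crypto.js:557:16'],
    [3161, 'LazyCompile: *exports.pbkdf2Sync crypto.js:552:30'],
  ],
];

// Parent column: ticks of each caller row as a share of the row above it.
function parentPercentages(stack) {
  return stack.slice(1).map(([ticks], i) => ((100 * ticks) / stack[i][0]).toFixed(1));
}

// Sums, over the given stacks, the ticks of the first row whose name
// matches `pattern`: the samples that had that caller on the stack.
function ticksAttributedTo(pattern, allStacks) {
  let total = 0;
  for (const stack of allStacks) {
    const row = stack.find(([, name]) => pattern.test(name));
    if (row) {
      total += row[0];
    }
  }
  return total;
}

console.log(parentPercentages(stacks[2])); // [ '99.9', '100.0' ]

const attributed = ticksAttributedTo(/pbkdf2 crypto\.js/, stacks);
const assumedTotalTicks = 37771; // assumption: sum of the [Summary] rows
console.log(attributed, ((100 * attributed) / assumedTotalTicks).toFixed(1)); // 27228 72.1
```

In other words, roughly 72% of all samples were taken while the JavaScript pbkdf2 wrapper was on the stack.
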
At this point, it is very clear that the password-based hash generation should be the target of our optimization. Thankfully, you've fully internalized the benefits of asynchronous programming (https://nodesource.com/blog/why-asynchronous) and you realize that the work to generate a hash from the user's password is being done synchronously, thus tying up the event loop. This prevents us from working on other incoming requests while computing a hash.

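The effect of a synchronous hash on the event loop is easy to see outside of Express. In this sketch (hashOrder is a hypothetical helper; the 'sha1' digest and the literal password and salt are assumptions), a setImmediate callback is queued before hashing starts. With pbkdf2Sync, the hash finishes before that callback gets a chance to run; with the asynchronous pbkdf2, the work is handed to libuv's thread pool and the queued callback runs first.

```javascript
const crypto = require('crypto');

// Reports whether a setImmediate callback queued *before* hashing runs
// before or after the PBKDF2 hash completes.
function hashOrder(useSync, done) {
  const events = [];
  setImmediate(() => events.push('immediate'));

  if (useSync) {
    // Blocks the event loop: no callbacks can run until this returns.
    crypto.pbkdf2Sync('password', 'salt', 10000, 512, 'sha1');
    events.push('hash');
    // Report once the already-queued immediate has had its turn.
    setImmediate(() => done(events));
  } else {
    // Offloaded to libuv's thread pool; the event loop keeps running.
    crypto.pbkdf2('password', 'salt', 10000, 512, 'sha1', (err) => {
      if (err) throw err;
      events.push('hash');
      done(events);
    });
  }
}

// Run the synchronous case first, then the asynchronous one.
hashOrder(true, (syncEvents) => {
  console.log('sync: ', syncEvents); // [ 'hash', 'immediate' ]
  hashOrder(false, (asyncEvents) => {
    console.log('async:', asyncEvents); // [ 'immediate', 'hash' ]
  });
});
```

The asynchronous ordering relies on the hash taking longer than one turn of the event loop, which holds comfortably at 10000 iterations.
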
To remedy this issue, you make a small modification to the above handlers to use the asynchronous version of the pbkdf2 function:

app.get('/auth', function (req, res) {
  var username = req.query.username || '';
  var password = req.query.password || '';

  username = username.replace(/[!@#$%^&*]/g, '');

  if (!username || !password || !users[username]) {
    return res.sendStatus(400);
  }

  // The hash is computed on libuv's thread pool; the callback runs once it is ready.
  crypto.pbkdf2(password, users[username].salt, 10000, 512, 'sha1', function (err, hash) {
    if (err) {
      return res.sendStatus(500);
    }

    if (users[username].hash.toString() === hash.toString()) {
      res.sendStatus(200);
    } else {
      res.sendStatus(401);
    }
  });
});

A new run of the ab benchmark above with the asynchronous version of your app yields:

...

Concurrency Level: 20
Time taken for tests: 12.846 seconds
Complete requests: 250
Failed requests: 0
Keep-Alive requests: 250
Total transferred: 50250 bytes
HTML transferred: 500 bytes
Requests per second: 19.46 [#/sec] (mean)
Time per request: 1027.689 [ms] (mean)
Time per request: 51.384 [ms] (mean, across all concurrent requests)
Transfer rate: 3.82 [Kbytes/sec] received

...

Percentage of the requests served within a certain time (ms)
50% 1018
66% 1035
75% 1041
80% 1043
90% 1049
95% 1063
98% 1070
99% 1071
100% 1079 (longest request)

Yay! Your app is now serving about 20 requests per second, roughly 3.7 times as many as it did with the synchronous hash generation. Additionally, the average latency is down from just under 4 seconds to just over 1 second.

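The improvement can be quantified directly from the two ab reports. The sketch below divides the matching figures from each run; all three ratios come out at about 3.65. One plausible reason the gain lands a little under 4x, which we have not verified for this run, is that libuv's thread pool, where crypto.pbkdf2 does its work, defaults to four threads (configurable through the UV_THREADPOOL_SIZE environment variable), so at most four hashes are computed in parallel.

```javascript
// Headline figures from the two ab runs above.
const syncRun = { requestsPerSecond: 5.33, meanLatencyMs: 3754.556, elapsedS: 46.932 };
const asyncRun = { requestsPerSecond: 19.46, meanLatencyMs: 1027.689, elapsedS: 12.846 };

// Ratio of two measurements, rounded to two decimal places.
const ratio = (a, b) => (a / b).toFixed(2);

console.log('throughput gain:', ratio(asyncRun.requestsPerSecond, syncRun.requestsPerSecond)); // 3.65
console.log('latency reduction:', ratio(syncRun.meanLatencyMs, asyncRun.meanLatencyMs));      // 3.65
console.log('total time reduction:', ratio(syncRun.elapsedS, asyncRun.elapsedS));             // 3.65
```
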
Hopefully, through the performance investigation of this (admittedly contrived) example, you've seen how the V8 tick processor can help you gain a better understanding of the performance of your Node.js applications.