Skip to content

url: add urlSearchParams.sort()#11098

Closed
TimothyGu wants to merge 2 commits into
nodejs:masterfrom
TimothyGu:urlsearchparams-sort
Closed

url: add urlSearchParams.sort()#11098
TimothyGu wants to merge 2 commits into
nodejs:masterfrom
TimothyGu:urlsearchparams-sort

Conversation

@TimothyGu

@TimothyGu TimothyGu commented Feb 1, 2017

Copy link
Copy Markdown
Member

The URL Standard requires sort() to be stable, which precludes us from using the V8-native sort() function.

I originally wanted to make the function a simple insertion sort only, but I realized it had the potential for a DoS attack because of its nature of being O(n2).

The algorithm for the function follows the norm: insertion sort for small arrays, divide-and-conquer (in this case, a stable merge sort) for larger arrays. In this specific case, the cutoff is set at 59 items, though if performance improvements are made to either sort the cutoff may be adjusted.

Fixes: #10760
Ref: whatwg/url#26
Ref: whatwg/url#199
Ref: web-platform-tests/wpt#4531

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

url

@nodejs-github-bot nodejs-github-bot added dont-land-on-v4.x whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Feb 1, 2017
@jasnell jasnell requested a review from mscdex February 1, 2017 16:48
@jasnell

jasnell commented Feb 1, 2017

Copy link
Copy Markdown
Member

@mscdex ... I would appreciate you taking a look at this from a perf point of view

@jasnell jasnell left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there

Comment thread doc/api/url.md Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example may be helpful here.

Comment thread lib/internal/url.js Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should look to benchmark this on a number of different systems to see if this number holds up.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also include a test with an empty key? e.g. z=a&=b&c=d

@joyeecheung joyeecheung left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sorting bit looks like C++ code itself really, and it creates two holey arrays, so probably worth just implement those in C++?

Comment thread lib/internal/url.js Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved outside of sort

@TimothyGu

TimothyGu commented Feb 1, 2017

Copy link
Copy Markdown
Member Author

@joyeecheung said:

so probably worth just implement those in C++?

Just tried doing something like that:

C++ diff
diff --git a/src/node_url.cc b/src/node_url.cc
index 0d5e695..3ab7108 100644
--- a/src/node_url.cc
+++ b/src/node_url.cc
@@ -1389,6 +1389,49 @@ namespace url {
                             v8::NewStringType::kNormal).ToLocalChecked());
   }

+  static void SortParams(const FunctionCallbackInfo<Value>& args) {
+    Environment* env = Environment::GetCurrent(args);
+    CHECK_GE(args.Length(), 1);
+    CHECK(args[0]->IsArray());
+
+    Local<Array> src = args[0].As<Array>();
+    uint32_t len = src->Length();
+
+    std::vector<std::string> a;
+    Copy(env, src, &a);
+    std::vector<std::string> tmp(len);
+
+    for (uint32_t step = 2; step < len; step *= 2) {
+      for (uint32_t start = 0; start < len - 2; start += 2 * step) {
+        uint32_t mid = start + step;
+        uint32_t end = mid + step;
+        end = end < len ? end : len;
+        if (mid > end)
+          continue;
+
+        uint32_t l = start;
+        uint32_t r = mid;
+        uint32_t t = start;
+        while (l < mid && r < end) {
+          if (a[l] <= a[r]) {
+            tmp[t++] = a[l++];
+            tmp[t++] = a[l++];
+          } else {
+            tmp[t++] = a[r++];
+            tmp[t++] = a[r++];
+          }
+        }
+        while (l < mid)
+          tmp[t++] = a[l++];
+        while (r < end)
+          tmp[t++] = a[r++];
+      }
+      tmp.swap(a);
+    }
+
+    args.GetReturnValue().Set(Copy(env, a));
+  }
+
   static void Init(Local<Object> target,
                    Local<Value> unused,
                    Local<Context> context,
@@ -1398,6 +1441,7 @@ namespace url {
     env->SetMethod(target, "encodeAuth", EncodeAuthSet);
     env->SetMethod(target, "domainToASCII", DomainToASCII);
     env->SetMethod(target, "domainToUnicode", DomainToUnicode);
+    env->SetMethod(target, "sortParams", SortParams);

 #define XX(name, _) NODE_DEFINE_CONSTANT(target, name);
     FLAGS(XX)

It's much slower than the JS version: in fact, the two Copys alone are several times slower than the JS sort(). I suspect I might be doing something gravely wrong, but I'm not very interested in pushing this further. Feel free to try to improve on this though.

@TimothyGu

Copy link
Copy Markdown
Member Author

@jasnell said:

+    // arbitrary number found through testing
+    if (len < 118) {

We should look to benchmark this on a number of different systems to see if this number holds up.

Agreed. I've made a simple benchmark specifically for this purpose.

I'm unfortunately not too well-versed in data analysis languages, so my method of measuring is rather crude. A regression model is probably better-fit for this purpose.

How to run
# create `merge` and `insertion` binaries, with insertion and merge sort blocks resp.
# commented out

# wait. it's gonna run for a while (~16 minutes)
$ node benchmark/compare.js --old ./insertion --new ./merge --filter rrrr --runs 10 url > sort.csv
$ Rscript benchmark/compare.R < sort
My results
                                                      improvement confidence      p.value
 url/url-searchparams-sort-rrrr.js n=100000 type="02"    -27.63 %        *** 2.033434e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="03"    -36.96 %        *** 5.757754e-07
 url/url-searchparams-sort-rrrr.js n=100000 type="04"    -43.56 %        *** 1.783939e-07
 url/url-searchparams-sort-rrrr.js n=100000 type="05"    -45.19 %        *** 7.799419e-06
 url/url-searchparams-sort-rrrr.js n=100000 type="06"    -50.19 %        *** 1.354454e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="07"    -42.88 %        *** 2.282991e-05
 url/url-searchparams-sort-rrrr.js n=100000 type="08"    -44.04 %        *** 1.352008e-06
 url/url-searchparams-sort-rrrr.js n=100000 type="09"    -54.74 %        *** 4.523282e-05
 url/url-searchparams-sort-rrrr.js n=100000 type="10"    -49.68 %        *** 8.312344e-09
 url/url-searchparams-sort-rrrr.js n=100000 type="12"    -54.26 %        *** 3.827682e-15
 url/url-searchparams-sort-rrrr.js n=100000 type="14"    -46.84 %        *** 7.272478e-06
 url/url-searchparams-sort-rrrr.js n=100000 type="16"    -31.54 %        *** 1.889203e-05
 url/url-searchparams-sort-rrrr.js n=100000 type="18"    -27.55 %        *** 2.764037e-06
 url/url-searchparams-sort-rrrr.js n=100000 type="20"    -26.76 %        *** 4.446811e-05
 url/url-searchparams-sort-rrrr.js n=100000 type="22"    -25.42 %        *** 9.297027e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="24"    -24.46 %         ** 1.730617e-03
 url/url-searchparams-sort-rrrr.js n=100000 type="26"    -16.97 %         ** 2.375029e-03
 url/url-searchparams-sort-rrrr.js n=100000 type="28"    -19.62 %        *** 4.361287e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="30"    -18.92 %          * 2.231800e-02
 url/url-searchparams-sort-rrrr.js n=100000 type="32"    -10.70 %        *** 3.871433e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="34"    -23.76 %        *** 2.304507e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="36"     -5.79 %            2.411327e-01
 url/url-searchparams-sort-rrrr.js n=100000 type="38"    -20.30 %         ** 7.036384e-03
 url/url-searchparams-sort-rrrr.js n=100000 type="40"    -10.63 %            7.832040e-02
 url/url-searchparams-sort-rrrr.js n=100000 type="42"      6.09 %            1.150079e-01
 url/url-searchparams-sort-rrrr.js n=100000 type="44"     -1.04 %            7.316006e-01
 url/url-searchparams-sort-rrrr.js n=100000 type="46"    -14.07 %          * 1.335973e-02
 url/url-searchparams-sort-rrrr.js n=100000 type="48"     -6.45 %          * 4.874718e-02
 url/url-searchparams-sort-rrrr.js n=100000 type="50"     11.42 %        *** 7.072396e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="52"      7.85 %          * 1.622392e-02
 url/url-searchparams-sort-rrrr.js n=100000 type="54"     21.93 %         ** 1.438206e-03
 url/url-searchparams-sort-rrrr.js n=100000 type="56"     16.34 %         ** 9.748436e-03
 url/url-searchparams-sort-rrrr.js n=100000 type="58"      6.05 %          * 3.993576e-02
 url/url-searchparams-sort-rrrr.js n=100000 type="60"     28.51 %        *** 1.892919e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="62"     30.69 %         ** 8.003372e-03
 url/url-searchparams-sort-rrrr.js n=100000 type="64"     56.55 %        *** 1.434437e-06
 url/url-searchparams-sort-rrrr.js n=100000 type="66"     17.48 %        *** 1.588903e-04
 url/url-searchparams-sort-rrrr.js n=100000 type="68"     65.50 %        *** 2.107887e-07
 url/url-searchparams-sort-rrrr.js n=100000 type="70"     38.83 %        *** 3.505777e-07
 url/url-searchparams-sort-rrrr.js n=100000 type="72"      5.23 %            7.135306e-02
 url/url-searchparams-sort-rrrr.js n=100000 type="74"     56.92 %        *** 5.389008e-08
 url/url-searchparams-sort-rrrr.js n=100000 type="76"     74.65 %        *** 5.051242e-10
 url/url-searchparams-sort-rrrr.js n=100000 type="78"     24.76 %        *** 3.956218e-06
 url/url-searchparams-sort-rrrr.js n=100000 type="80"     30.39 %        *** 9.959930e-05

@TimothyGu

Copy link
Copy Markdown
Member Author

@TimothyGu TimothyGu added the semver-minor PRs that contain new features and should be released in the next minor version. label Feb 3, 2017
@TimothyGu TimothyGu force-pushed the urlsearchparams-sort branch from ba709b9 to 7cbbea6 Compare February 3, 2017 05:28
@TimothyGu

Copy link
Copy Markdown
Member Author

Added Web Platform Tests and rebased.

New CI: https://ci.nodejs.org/job/node-test-pull-request/6186/

@jasnell jasnell left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there.

Comment thread doc/api/url.md Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to swap the query[]=123 and query[]=abc positions to show that the sort will not affect the ordering of the abc and 123 positions given that 123 would typically appear before abc

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

@TimothyGu

Copy link
Copy Markdown
Member Author

@mscdex, have you had a chance yet to look at this? Frankly, I'm more than happy with its performance, as my makeshift insertion/merge sort hybrid is able to outperform Array.prototype.sort() (which isn't even stable).

Comment thread lib/internal/url.js Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new Array

@mscdex mscdex Feb 5, 2017

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @targos.

@joyeecheung

Copy link
Copy Markdown
Member

@TimothyGu There is a cost of memory though, Array.prototype.sort is in-place.

Comment thread benchmark/url/url-searchparams-sort.js Outdated

@mscdex mscdex Feb 5, 2017

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be 88 parameters because only unique keys are kept by querystring.parse(). One suggestion is to use unique keys first:

long: 'g&r&t&h&s&rr&d&w&b&n&hh&k&x&m&kk&hhh&o&e&xx&c&cc&gg&ee&bb&' +
      'p&pp&ss&nn&j&bbb&y&z&u&l&oo&rrr&ww&a&uu&ll&mm&f&jj&q&ppp&ff&' +
      'eee&yy&eeee&nnn&eeeee&lll&mmm&www&uuu&wwww&tt&nnnn&ttt&qq&v&' +
      'yyy&ccc&ooo&kkk&fff&jjj&i&llll&mmmm&ggg&jjjj&dd&ii&zz&qqq&pppp&' +
      'xxx&qqqq&qqqqq&ddd&nnnnn&yyyy&wwwww&gggg&iii&vv&rrrr'

and then add this before the timed loop to create duplicates:

if (conf.type === 'long') {
  // Make `array` contain duplicate keys, useful for benchmarking stable-ness
  // of sorting
  for (i = 0; i < array.length; i += 2) {
    if (array[i].length > 1)
      array[i] = array[i][0];
  }
}

This should trigger the current alternate code path that uses merge sort.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but querystring.parse returns something like { g: [ '', '' ] } in this case, and this object is converted to [ 'g', '', 'g', '' ]. Either way, I added a small ad-hoc parser to skip querystring.parse, and to also maintain the order of the input.

@mscdex mscdex Feb 5, 2017

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's just a bug in the current querystring.parse() in master. For example, querystring.parse('a&a&a&a') currently returns { a: [ '', '' ] } instead of { a: [ '', '', '', '' ] }. This bug is probably due to 4e259b2 which may/may not be fixed by #11171. At any rate, using unique keys in the beginning will avoid such possible bugs.

Comment thread benchmark/url/url-searchparams-sort.js Outdated

@mscdex mscdex Feb 5, 2017

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better name here would be 'search parameters' or similar. The term 'elements' can be confusing if you're familiar with the underlying URLSearchParams implementation (e.g. does 'elements' mean number of pairs or params[searchParams].length?).

Comment thread benchmark/url/url-searchparams-sort.js Outdated

@mscdex mscdex Feb 5, 2017

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a little low IMHO. The results came back awfully fast on my machine, which could mean V8 didn't have enough time to properly optimize functions. For example, I used ~5e6 when testing type=long.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to 1e6.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the results come back too fast there is a possibility that this got loop-invariant code-motioned. I usually get alarmed when I see op/s come close to the op/s of an empty loop on my machine.

Comment thread lib/internal/url.js Outdated

@mscdex mscdex Feb 5, 2017

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove these two temporary variables (use the values as-is where needed), you will get a small perf boost.

@mscdex

mscdex commented Feb 5, 2017

Copy link
Copy Markdown
Contributor

Without diving into the multitude of sort algorithms out there to refresh my memory, these changes are probably fine, except for a few minor nits.

One unrelated nit I noticed is that because the whatwg URL module uses querystring, the order of parameters is not preserved. From what I read the spec does not explicitly state the order in which parameters are returned (for example in params.toString()). At least Chrome preserves the original order (as long as .sort() is not used of course), but node's implementation groups duplicate keys together. I just thought I'd throw that out there.

@mscdex

mscdex commented Feb 5, 2017

Copy link
Copy Markdown
Contributor

I should also say I'm a bit confused about the "cutoff" being 59. In the code the conditional is len < 90 which would mean 45 pairs. Either way, 90 or 45 are not 59. It's not obvious which count the "cutoff" is taking into account. @TimothyGu Did you perhaps change the "cutoff" value after your original post?

@TimothyGu

Copy link
Copy Markdown
Member Author

@mscdex, yes I changed the cutoff after moving merge() out of sort(), which helped the performance a bit. Sorry for not making that clear. I'll look at your comments and see if there's anything more I can do.

@TimothyGu

Copy link
Copy Markdown
Member Author

@joyeecheung

There is a cost of memory though, Array.prototype.sort is in-place.

For most of the cases (when there are fewer than 50 queries), the in-place selection sort is used. For >= 50, merge sort at most makes two more copies of the params array, so O(n) space complexity. (It is also the algorithm used by SpiderMonkey's Array.prototype.sort.) At the end of the day, chances are, sort() probably isn't the function using up the most memory in an app either way.

@TimothyGu

TimothyGu commented Feb 5, 2017

Copy link
Copy Markdown
Member Author

@TimothyGu TimothyGu force-pushed the urlsearchparams-sort branch from 9945032 to c0ea7d0 Compare February 5, 2017 18:27
@mscdex

mscdex commented Feb 5, 2017

Copy link
Copy Markdown
Contributor

For >= 50, merge sort at most makes two more copies of the params array, so O(n) space complexity.

You mean O(2n)?

@TimothyGu

Copy link
Copy Markdown
Member Author

You mean O(2n)?

Coefficients are irrelevant in Big O notation 😺

Comment thread benchmark/url/url-searchparams-sort.js Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO () should be added after the constructor name.

@TimothyGu

Copy link
Copy Markdown
Member Author

@jasnell, @targos, @mscdex, can I get some LGTMs please :)

@mscdex

mscdex commented Feb 8, 2017

Copy link
Copy Markdown
Contributor

@TimothyGu TimothyGu force-pushed the urlsearchparams-sort branch from 2c925d2 to 54dde7e Compare February 11, 2017 19:18
@TimothyGu

Copy link
Copy Markdown
Member Author

Rebased. Going to apply on Monday if there are no more objections.

CI: https://ci.nodejs.org/job/node-test-pull-request/6356/

@TimothyGu

Copy link
Copy Markdown
Member Author

Landed in 02d1e32 and 781eb90.

@TimothyGu TimothyGu closed this Feb 14, 2017
TimothyGu added a commit that referenced this pull request Feb 14, 2017
PR-URL: #11098
Fixes: #10760
Ref: whatwg/url#26
Ref: whatwg/url#199
Ref: web-platform-tests/wpt#4531
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
TimothyGu added a commit that referenced this pull request Feb 14, 2017
PR-URL: #11098
Fixes: #10760
Ref: whatwg/url#26
Ref: whatwg/url#199
Ref: web-platform-tests/wpt#4531
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
@TimothyGu TimothyGu deleted the urlsearchparams-sort branch February 14, 2017 04:50
TimothyGu added a commit to TimothyGu/node that referenced this pull request Feb 18, 2017
TimothyGu added a commit to TimothyGu/node that referenced this pull request Feb 18, 2017
italoacasas pushed a commit to italoacasas/node that referenced this pull request Feb 20, 2017
italoacasas pushed a commit to italoacasas/node that referenced this pull request Feb 20, 2017
italoacasas pushed a commit that referenced this pull request Feb 22, 2017
italoacasas pushed a commit that referenced this pull request Feb 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver-minor PRs that contain new features and should be released in the next minor version. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants