One of the fun things about working with big data is that you can often hit weird limits with a system.
I was personally trying to load every 'common' single nucleotide polymorphism in the human genome (from dbSNP) into memory, of which there are over 37 million entries (there are many more uncommon ones), for the purpose of building a custom search index for them [1].
Turns out, you may run into some hard limits. Note that these are all V8-isms and may not apply to all browsers or engines (I was using Node.js for this).
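My actual loading code isn't shown here, but a minimal sketch of the kind of loop that hits the first limit looks roughly like this (the 'rs'-style string keys are just stand-ins for the real SNP IDs):

```js
// Minimal sketch: keep adding string-keyed entries to a Map until something breaks,
// logging progress every 100,000 insertions
const map = new Map()
for (let i = 0; ; i++) {
  if (i % 100000 === 0) {
    console.log(i)
  }
  map.set(`rs${i}`, i)
}
```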
This will crash after adding approximately 16.7M elements and say:
0
100000
200000
...
16400000
16500000
16600000
16700000
Uncaught RangeError: Value undefined out of range for undefined options property undefined
That is a very weird error message. It says "undefined" three times! Much better than your usual TypeError: Can't find property 'lol' of undefined. See https://bugs.chromium.org/p/v8/issues/detail?id=11852 for a bug filed to perhaps help improve the error message.
Now, also interestingly enough, you can try the same thing with a plain Object instead of a Map.
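Here's a sketch of that variant (again, not my exact loading code):

```js
// Same sketch as before, but storing into a plain Object keyed by strings
const obj = {}
for (let i = 0; ; i++) {
  if (i % 100000 === 0) {
    console.log(i)
  }
  obj[`rs${i}`] = i
}
```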
Then it will print...
0
100000
200000
...
8000000
8100000
8200000
8300000
And it will actually just hang there... frozen... no error message though! It stalls at ~8.3M elements. Weird, right? That is roughly half the number of elements as in the 16.7M Map case.
Turns out there is a precise hard limit for the Map case:
- For the Map: 2^24 = 16,777,216
- For the Object it is around 2^23 = 8,388,608. HOWEVER, I can actually add more than this, e.g. 8,388,609 or 8,388,610 or even more, but the operations start taking forever to run; e.g. getting to 8,388,999 was taking many minutes.
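If you want to see the exact Map cutoff for yourself, a quick (if memory-hungry) check is something like this:

```js
// Illustrative check: exactly 2^24 entries fit in a Map, one more throws
const m = new Map()
for (let i = 0; i < 2 ** 24; i++) {
  m.set(i, i)
}
console.log(m.size) // 16777216
m.set('one more', 1) // throws the weird RangeError shown above
```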
Very weird stuff! If you expected me to dig into this and explain it in deep technical detail, well, you'd be wrong. However, this helpful post on Stack Overflow by a V8 engine developer clarifies the Map case!! https://stackoverflow.com/questions/54452896/maximum-number-of-entries-in-node-js-map
V8 developer here. I can confirm that 2^24 is the maximum number of entries in a Map. That's not a bug, it's just the implementation-defined limit.

The limit is determined by:

- The FixedArray backing store of the Map has a maximum size of 1GB (independent of the overall heap size limit)
- On a 64-bit system that means 1GB / 8B = 2^30 / 2^3 = 2^27 ~= 134M maximum elements per FixedArray
- A Map needs 3 elements per entry (key, value, next bucket link), and has a maximum load factor of 50% (to avoid the slowdown caused by many bucket collisions), and its capacity must be a power of 2. 2^27 / (3 * 2) rounded down to the next power of 2 is 2^24, which is the limit you observe.

FWIW, there are limits to everything: besides the maximum heap size, there's a maximum String length, a maximum Array length, a maximum ArrayBuffer length, a maximum BigInt size, a maximum stack size, etc. Any one of those limits is potentially debatable, and sometimes it makes sense to raise them, but the limits as such will remain. Off the top of my head I don't know what it would take to bump this particular limit by, say, a factor of two – and I also don't know whether a factor of two would be enough to satisfy your expectations.
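Restating that arithmetic in code (just the numbers from the answer above):

```js
// 1GB FixedArray backing store, 8 bytes per element on a 64-bit system
const maxFixedArrayElements = 2 ** 30 / 2 ** 3 // = 2^27 ~= 134M
const slotsPerEntry = 3 // key, value, next bucket link
const maxLoadFactor = 1 / 2 // at most 50% of buckets occupied
const raw = (maxFixedArrayElements / slotsPerEntry) * maxLoadFactor // ~22.4M
const mapLimit = 2 ** Math.floor(Math.log2(raw)) // round down to a power of 2
console.log(mapLimit) // 16777216 = 2^24
```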
Great details there. It would also be good to know what the behavior is for the Object, which hits those 100% CPU stalls after ~8.3M elements but doesn't give the same error message...
Another fun note: if I modify the Object code to use only "integer IDs" as keys, the code actually works fine, doesn't hit any errors, and is "blazingly fast", as the kids call it.
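A sketch of what I mean by "integer IDs": keying the Object with plain numbers instead of strings (again, not the exact original code):

```js
// Sketch: plain Object keyed by consecutive integers instead of strings
const obj = {}
for (let i = 0; i < 40_000_000; i++) { // well past both limits above
  if (i % 100000 === 0) {
    console.log(i)
  }
  obj[i] = i
}
console.log('done')
```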
I presume that this code works because V8 detects that I'm using the Object like an array, switches how it represents it internally, and stops using a hash-map-style data structure, so it doesn't hit the same limit. There is still a limit, just a higher one; e.g. 1 billion elements gives "Uncaught RangeError: Invalid array length".
This has been another episode of... The Twilight Zone (other episodes catalogued here: https://github.com/cmdcolin/technical_oddities/)
[1] The final product of this adventure was https://github.com/GMOD/ixixx-js, which creates a search index for a large number of elements.