feat: switch grisu2 float-to-string algorithm to hybrid of xjb & zmij algorithms#3025
feat: switch grisu2 float-to-string algorithm to hybrid of xjb & zmij algorithms#3025JairusSW wants to merge 29 commits into
Conversation
Signed-off-by: Jairus Tanaka <me@jairus.dev>
Signed-off-by: Jairus Tanaka <me@jairus.dev>
Signed-off-by: Jairus Tanaka <me@jairus.dev>
|
Just curious here, why the exception of a trailing .0? |
Max and Dan made that decision so that floats are easily identifiable when converting them to a string. For example, you know that the string |
|
Can you rename xjb.ts to dtoa.ts and in comments mentioned about implementation based on xjb and zmij? |
Co-authored-by: Max Graey <maxgraey@gmail.com>
Co-authored-by: Max Graey <maxgraey@gmail.com>
| let gBcd: u64 = 0; | ||
| let gBcdLen: i32 = 0; |
| export let gDigHi: u64 = 0; | ||
| export let gDigLo: u64 = 0; | ||
| export let gDigNum: i32 = 0; |
There was a problem hiding this comment.
You have a lot of global vars. Does we need all of them? And can we declare all of necessary vars in one place?
Co-authored-by: Max Graey <maxgraey@gmail.com>
| (data $5 (i32.const 1239) "\80\00\00\00\00\00\00\00\a0\00\00\00\00\00\00\00\c8\00\00\00\00\00\00\00\fa\00\00\00\00\00\00@\9c\00\00\00\00\00\00P\c3\00\00\00\00\00\00$\f4\00\00\00\00\00\80\96\98\00\00\00\00\00 \bc\be\00\00\00\00\00(k\ee\00\00\00\00\00\f9\02\95\00\00\00\00@\b7C\ba\00\00\00\00\10\a5\d4\e8\00\00\00\00*\e7\84\91\00\00\00\80\f4 \e6\b5\00\00\00\a01\a9_\e3\00\00\00\04\bf\c9\1b\8e\00\00\00\c5.\bc\a2\b1\00\00@v:k\0b\de\00\00\e8\89\04#\c7\8a\00\00b\ac\c5\ebx\ad\00\80z\17\b7&\d7\d8\00\90\acn2x\86\87\00\b4W\n?\16h\a9\00\a1\ed\cc\ce\1b\c2\d3\a0\84\14@aQY\84\c8\a5\19\90\b9\a5o\a5:\0f \f4\'\8f\cb\ce") | ||
| (data $6 (i32.const 1456) "o\1b\8e(\10T\8e\af\daM\e4^\ae\f0\ec\07J\fb\9f\f4\98\'D\b1\9dwA\df\cf\11\cd\99\07\ef\99\85\0b?\fe\b2\15\aa\b4\dc\e6\a7\1f\86c\beZ\06\0b\a5\bc\b4\aaSkuz\07\ed\0f\08\bf,)Ud\7f\b6C\d5\b1\17L\c8;\1a\fb;\efi\c2\87F\b8B\a7\ee@OQ]=\eb\dd\e4PF\1a\12\ba\13\e4labM\f3\92\ea\af(\b6\ef&\e2\bb\8c6U\n\f7\89\04\89\0f`\cb\05\e9\b8\b6\bd!\c9\c1\bb\87\e9\00T\96_\9a\84x\db\8f\bf4\d0\bdr\04R\98\de\'\8a\92\95\00\9am\c1\94\82\17\0f<\05\b7u\00\00\00\00\00\00P\c3\00\00\00\00\00\00\00\00\05\e3L6\12\197\c5\00\00\00\00\00\00(l\d6\aa\80\9d\ef\f0\"\c7\f6~\b9\b7\d2:MBL\c8q\d5m\93\13\c9\ea8\1e\cd\19:\bc\03\1cU\ab\01\80\0c\t\cb\c6,\07\d3\bf\f5\ad\\\a1\90\08\137h\03\cd\10\8cz\c3\87\a8\db6.\ef\07\12\c2\b2\02\cf\bc\f4\03^\e4g\f9\94\c7\85\d7in\f8\06\d1R\ba\be\01\d763\e1|\a0\1c4\a8E\10\d3Q\a0\t\12\11H\de\1e1Vx\85\fa\a6\1e\d5f\a5>\7f\"t*U3\f1\ca\ba\0f)2\d7\96@\adGy\17|\a9t\088\c7\b1\d8J\d9\bc\"x\ae\81R7\18") | ||
| (data $7 (i32.const 1824) "?6N\n@\18\00\00\00d\00\00@\00 $\00\00\00\00\00\00\00\0c\80\13\c8\82\1f\e0L^\0f\f60\d7\1b\00\00\00\00\00\00\00\fc\ff\f7\cd\d8\01\82n\d1?\cd@\01%d\db\r\r\00\00\00$\04\14@8qS\b4\1dx\11") | ||
| (data $9 (i32.const 2032) "p\\\ea{\ce2~\8f\1a\c7C\c6\b0\b7\96\e5\ae\05\03\05\'\c6\ab\b7\bf7\cf\d0\b8\d1\ef\92\fe%\e5\1a\8eO\19\eb2\ebP\e2\a4?\14\bc\f5\88\r\b5P\99v\96!\dbH\bb\1a\c2\bd\f0\b4\15\07\c9{\ce\97\c0]\11l:\96\0b\13\9a\c7\1b\e0\c3V\df\84\f6\06\e3L6\12\197\c5\9e\b5p+\a8\ad\c5\9d\97\"\81E@|o\fc\dfNg\04\cd\c9\f2\c9\e6\0b\b96\d7\07\8f\a1\85\t\94\f8x9?\81:\0f \f4\'\8f\cb\ce\c8\a5\19\90\b9\a5o\a5\a0\84\14@aQY\84\00\a1\ed\cc\ce\1b\c2\d3\00\b4W\n?\16h\a9\00\90\acn2x\86\87\00\80z\17\b7&\d7\d8\00\00b\ac\c5\ebx\ad\00\00\e8\89\04#\c7\8a\00\00@v:k\0b\de\00\00\00\c5.\bc\a2\b1\00\00\00\04\bf\c9\1b\8e\00\00\00\a01\a9_\e3\00\00\00\80\f4 \e6\b5\00\00\00\00*\e7\84\91\00\00\00\00\10\a5\d4\e8\00\00\00\00@\b7C\ba\00\00\00\00\00\f9\02\95\00\00\00\00\00(k\ee\00\00\00\00\00 \bc\be\00\00\00\00\00\80\96\98\00\00\00\00\00\00$\f4\00\00\00\00\00\00P\c3\00\00\00\00\00\00@\9c\00\00\00\00\00\00\00\fa\00\00\00\00\00\00\00\c8\00\00\00\00\00\00\00\a0\00\00\00\00\00\00\00\80\cd\cc\cc\cc\cc\cc\cc\cc\0b\d7\a3p=\n\d7\a3<\dfO\8d\97n\12\83,e\19\e2X\17\b7\d1$\84G\1bG\ac\c5\a7\b6il\af\05\bd7\86\bdBz\e5\d5\94\bf\d6\fd\cea\84\11w\cc\ab\98\a5\b46A_p\89\bf\d5\ed\bd\ce\fe\e6\db\ff\aa$\cb\0b\ff\eb\af\cc\88Po\t\cc\bc\8c\14\0e\b4KB\13.\e1\10\d8\\\t5\dc$\b4\da\ac\b0:\f7|\1d\90\\\e1M\c4\be\94\95\e6J\b4\a462\aaw\b8\08]\1d\92\8e\ee\92\93\a6a\95\b6}J\1e\ec\eb\1a\11\92d\08\e5\bc\ef{\datP\a0\1d\97\b2,\f7\ba\80\00\c9\f1(\8a\92\95\00\9am\c1S;uD\cd\14\be\9aR\c5\ee\d3\ae\87\96\f7\db\9dXv%\06\12\c6I~\e0\91\b7\d1t\9e\0e\ca\00\83\f2\b5\87\fd?;\9a5\f5\f7\d2\ca2\fc\14^\f7_B\a2\f5\fcCK,\b3\ce\81\bb\949E\ad\1e\b1\cf") | ||
| (data $10 (i32.const 2648) "\"\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$!\"#$\"#$\"#$\"#$!\"#") | ||
| (data $11 (i32.const 2908) ",") | ||
| (data $11.1 (i32.const 2920) "\02\00\00\00\1c\00\00\00I\00n\00v\00a\00l\00i\00d\00 \00l\00e\00n\00g\00t\00h") | ||
| (data $12 (i32.const 2956) "<") | ||
| (data $12.1 (i32.const 2968) "\02\00\00\00&\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00b\00u\00f\00f\00e\00r\00.\00t\00s") | ||
| (data $13 (i32.const 3020) "<") | ||
| (data $13.1 (i32.const 3032) "\02\00\00\00(\00\00\00A\00l\00l\00o\00c\00a\00t\00i\00o\00n\00 \00t\00o\00o\00 \00l\00a\00r\00g\00e") | ||
| (data $14 (i32.const 3084) "<") | ||
| (data $14.1 (i32.const 3096) "\02\00\00\00 \00\00\00~\00l\00i\00b\00/\00r\00t\00/\00i\00t\00c\00m\00s\00.\00t\00s") | ||
| (data $17 (i32.const 3212) "<") | ||
| (data $17.1 (i32.const 3224) "\02\00\00\00$\00\00\00I\00n\00d\00e\00x\00 \00o\00u\00t\00 \00o\00f\00 \00r\00a\00n\00g\00e") | ||
| (data $18 (i32.const 3276) ",") | ||
| (data $18.1 (i32.const 3288) "\02\00\00\00\14\00\00\00~\00l\00i\00b\00/\00r\00t\00.\00t\00s") | ||
| (data $20 (i32.const 3356) "<") | ||
| (data $20.1 (i32.const 3368) "\02\00\00\00\1e\00\00\00~\00l\00i\00b\00/\00r\00t\00/\00t\00l\00s\00f\00.\00t\00s") | ||
| (data $21 (i32.const 3420) "\1c") | ||
| (data $21.1 (i32.const 3432) "\02") | ||
| (data $22 (i32.const 3452) "<") | ||
| (data $22.1 (i32.const 3464) "\02\00\00\00$\00\00\00~\00l\00i\00b\00/\00t\00y\00p\00e\00d\00a\00r\00r\00a\00y\00.\00t\00s") | ||
| (data $23 (i32.const 3516) "<") | ||
| (data $23.1 (i32.const 3528) "\02\00\00\00&\00\00\00~\00l\00i\00b\00/\00s\00t\00a\00t\00i\00c\00a\00r\00r\00a\00y\00.\00t\00s") | ||
| (data $24 (i32.const 3580) ",") | ||
| (data $24.1 (i32.const 3592) "\02\00\00\00\1a\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00.\00t\00s") | ||
| (data $25 (i32.const 3628) "|") | ||
| (data $25.1 (i32.const 3640) "\02\00\00\00^\00\00\00E\00l\00e\00m\00e\00n\00t\00 \00t\00y\00p\00e\00 \00m\00u\00s\00t\00 \00b\00e\00 \00n\00u\00l\00l\00a\00b\00l\00e\00 \00i\00f\00 \00a\00r\00r\00a\00y\00 \00i\00s\00 \00h\00o\00l\00e\00y") | ||
| (data $26 (i32.const 3756) "<") | ||
| (data $26.1 (i32.const 3768) "\02\00\00\00*\00\00\00O\00b\00j\00e\00c\00t\00 \00a\00l\00r\00e\00a\00d\00y\00 \00p\00i\00n\00n\00e\00d") | ||
| (data $27 (i32.const 3820) "<") | ||
| (data $27.1 (i32.const 3832) "\02\00\00\00(\00\00\00O\00b\00j\00e\00c\00t\00 \00i\00s\00 \00n\00o\00t\00 \00p\00i\00n\00n\00e\00d") | ||
| (data $28 (i32.const 3888) "\10\00\00\00 \00\00\00 \00\00\00 ") | ||
| (data $28.1 (i32.const 3912) "\81\08\00\00\01\19\00\00\01\02\00\00$\t\00\00\a4\00\00\00$\n\00\00\02\t\00\00\02A\00\00\00\00\00\00A\00\00\00 ") |
There was a problem hiding this comment.
Wondering if the baseline amount of static memory can be reduced here by computing the parts of it that aren't strictly necessary to have as LUTs?
Co-authored-by: Max Graey <maxgraey@gmail.com>
Co-authored-by: Max Graey <maxgraey@gmail.com>
Co-authored-by: Max Graey <maxgraey@gmail.com>
Co-authored-by: Max Graey <maxgraey@gmail.com>
| const shift = h + 1 + EXTRA_SHIFT; | ||
|
|
||
| loadPow10Xjb64(powExp); | ||
| const pHi = gPow10Hi, pLo = gPow10Lo; |
There was a problem hiding this comment.
| const pHi = gPow10Hi, pLo = gPow10Lo; | |
| let pHi = gPow10Hi; | |
| let pLo = gPow10Lo; |
| const plo64 = pHi * y; | ||
| const lo = plo64 + mulhi64(pLo, y); | ||
| const p_hi = a + u64(lo < plo64); |
There was a problem hiding this comment.
Very inconsistent names plo64 instead pLo64, p_hi instead pHi64
| while (v >= 100) { | ||
| const q = v / 100; | ||
| p -= 4; | ||
| store<u32>(p, load<u32>(DIGITS + (<usize>(v - q * 100) << alignof<u32>()))); | ||
| v = q; | ||
| } |
| @inline function decimalLen15(v: u64): i32 { | ||
| if (v < 100000000) { | ||
| if (v < 10000) { | ||
| if (v < 100) return v < 10 ? 1 : 2; | ||
| return v < 1000 ? 3 : 4; | ||
| } | ||
| if (v < 1000000) return v < 100000 ? 5 : 6; | ||
| return v < 10000000 ? 7 : 8; | ||
| } | ||
| if (v < 1000000000000) { | ||
| if (v < 10000000000) return v < 1000000000 ? 9 : 10; | ||
| return v < 100000000000 ? 11 : 12; | ||
| } | ||
| if (v < 100000000000000) return v < 10000000000000 ? 13 : 14; | ||
| return 15; | ||
| } |
There was a problem hiding this comment.
See similar apreach which slightly more optimal for perf for branch leafs:
https://github.com/AssemblyScript/assemblyscript/blob/main/std/assembly/util/number.ts#L133
| @inline function writeUInt16(buf: usize, value: u64): usize { | ||
| const len = decimalLen16(value); | ||
| let p = buf + (<usize>len << 1); | ||
| let v = value; | ||
| while (v >= 100) { | ||
| const q = v / 100; | ||
| p -= 4; | ||
| store<u32>(p, load<u32>(DIGITS + (<usize>(v - q * 100) << alignof<u32>()))); | ||
| v = q; | ||
| } | ||
| if (v >= 10) { | ||
| store<u32>(buf, load<u32>(DIGITS + (<usize>v << alignof<u32>()))); | ||
| } else { | ||
| store<u16>(buf, 0x30 + <u32>v); | ||
| } | ||
| return buf + (<usize>len << 1); | ||
| } |
There was a problem hiding this comment.
Also did you compared this approach with original?
https://github.com/AssemblyScript/assemblyscript/blob/main/std/assembly/util/number.ts#L221
I think original whould be faster
| if (binSig != 0) return writeNaN(buf); | ||
| return writeInfinity(buf, neg); | ||
| } | ||
| if (binSig == 0) { store<u16>(buf, 0x30); return buf + 2; } // +/-0 -> "0" |
There was a problem hiding this comment.
| if (binSig == 0) { store<u16>(buf, 0x30); return buf + 2; } // +/-0 -> "0" | |
| // +/-0 -> "0" | |
| if (binSig == 0) { | |
| store<u16>(buf, 0x30); | |
| return buf + 2; | |
| } |
Btw I guess it should be "0.0"?
| @inline export function toDigits32(value: u64): void { | ||
| if (ASC_FEATURE_SIMD) return toDigits32Simd(value); | ||
| toBcd8(value); | ||
| gDigHi = gBcd + ZEROS; | ||
| gDigNum = gBcdLen; | ||
| } |
There was a problem hiding this comment.
| @inline export function toDigits32(value: u64): void { | |
| if (ASC_FEATURE_SIMD) return toDigits32Simd(value); | |
| toBcd8(value); | |
| gDigHi = gBcd + ZEROS; | |
| gDigNum = gBcdLen; | |
| } | |
| @inline export function toDigits32(value: u64): void { | |
| if (ASC_FEATURE_SIMD) { | |
| toDigits32Simd(value); | |
| } else { | |
| toBcd8(value); | |
| gDigHi = gBcd + ZEROS; | |
| gDigNum = gBcdLen; | |
| } | |
| } |
| if (hasExtraDigit) store<u16>(buf + 16, <u32>lastDigitChar); | ||
| const sig = 8 + i32(hasExtraDigit); | ||
| const endByte = buf + ((decExp + 1) << 1); | ||
| for (let z = buf + (sig << 1); z < endByte; z += 16) putBlock8(z, ZEROS); |
There was a problem hiding this comment.
| for (let z = buf + (sig << 1); z < endByte; z += 16) putBlock8(z, ZEROS); | |
| for (let z = buf + (sig << 1); z < endByte; z += 16) { | |
| putBlock8(z, ZEROS); | |
| } |
| } | ||
|
|
||
| const n = numDigits + i32(hasExtraDigit); | ||
| const endPos = decExp >= 0 ? (n > decExp + 1 ? n + 1 : decExp + 1) : n; |
|
I think I’ve roughly point style direction. There are still quite a lot of tweaks & nits to make. Ping me when you’re done, and I’ll give it another pass. |
…ript into jairus/switch-to-xjb
Fixes #3012.
Changes proposed in this pull request:
⯈ Switch to xjb-as which is an improvement over zmij-as
⯈ Comply to the ECMA262 Specification with the exception of a trailing
.0It's good quality stuff though, and I'm quite confident in that.
Here's some performance notes. xjb-as/README.md has more extensive notes and stuff. These benches are taken on an AMD 7800x3D.
Lookup-table footprint vs the old grisu2 implementation (omiting shared tables):
dtoa) onlyftoa) only