Cool! One thought:
>Correctly handles Unicode emoji and complex symbols
I'm no JS/TS expert, but this sounds like it could be a case of making something behave like an inexperienced dev would expect at the cost of making it behave unexpectedly to an experienced dev.
JS strings are sequences of UTF-16 code units, and that's part of their public API (e.g. 'split'), right? That's JS's choice (or mistake, depending on your feelings). So I would expect 'chonk' to treat strings as what they are, and to offer grapheme handling behind a flag or a separate method ('chonkGraphemes') instead.
But maybe this kind of "helpfulness" is normal in 3rd party JS APIs?
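To make the distinction concrete, here's a rough sketch of the three ways a JS string can be "split into characters". The helper names (`codeUnits`, `codePoints`, `graphemes`) are made up for illustration and are not chonk's API; the grapheme version assumes a runtime that ships `Intl.Segmenter`.

```typescript
// Hypothetical helpers for illustration only; not chonk's actual API.

// UTF-16 code units: what `length`, indexing, and split("") operate on.
function codeUnits(s: string): string[] {
  return s.split("");
}

// Unicode code points: what string iteration (and Array.from) yields.
function codePoints(s: string): string[] {
  return Array.from(s);
}

// Grapheme clusters: user-perceived characters, via Intl.Segmenter.
function graphemes(s: string): string[] {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(s), (part) => part.segment);
}

// Thumbs-up emoji followed by a medium skin tone modifier.
const thumbsUp = "\u{1F44D}\u{1F3FD}";

console.log(codeUnits(thumbsUp).length);  // 4
console.log(codePoints(thumbsUp).length); // 2
console.log(graphemes(thumbsUp).length);  // 1
```

Chunking on code units can cut that emoji into broken halves, which is presumably why the library goes for graphemes; my point is only that the three views give different answers, so which one you get should be explicit.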
Hi! Great idea. Overall, all the edge cases are handled really well. That said, I think it would be good to add a config for the available processing options (similar to how it’s done in markdown-it, for example) so the behavior matches what users expect. I’ll give it some thought and work on implementing it!
Thanks for your support! I've added separate processing, so the behavior should be more obvious now.