pbronez 1 day ago

I like how the spec defines character classes by just passing the buck to Unicode

=====

Forbidden Characters

Forbidden characters are Unicode scalar values with general category Control, Surrogate, and Unassigned. Forbidden characters must not appear in the source text.

White Space

White space characters are those Unicode characters with the Whitespace property, including line terminators.

hgs3 1 day ago

The "general category" [1] and "White_Space" [2] properties are real character properties defined by Unicode. Referring to them is, ideally, how a language that supports Unicode should do things.

[1] https://www.unicode.org/reports/tr44/#GC_Values_Table

[2] https://www.unicode.org/reports/tr44/#White_Space
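
For anyone who wants to see what those two definitions look like in practice, here is a rough Python sketch. The function names are made up for illustration and are not from the spec. The general category lookup uses the standard `unicodedata` module, so "Unassigned" means unassigned in whatever Unicode version your Python build ships. Python has no direct accessor for the White_Space property (`str.isspace()` is close but not identical), so the sketch hardcodes the code points that Unicode's PropList.txt lists for it.

```python
import unicodedata

# General category short codes: Cc = Control, Cs = Surrogate, Cn = Unassigned.
FORBIDDEN_CATEGORIES = {"Cc", "Cs", "Cn"}

# Code points carrying the Unicode White_Space property (from PropList.txt).
WHITE_SPACE = frozenset([
    *range(0x0009, 0x000E),  # tab, LF, VT, FF, CR
    0x0020,                  # space
    0x0085,                  # next line
    0x00A0,                  # no-break space
    0x1680,                  # ogham space mark
    *range(0x2000, 0x200B),  # en quad .. hair space
    0x2028,                  # line separator
    0x2029,                  # paragraph separator
    0x202F,                  # narrow no-break space
    0x205F,                  # medium mathematical space
    0x3000,                  # ideographic space
])


def is_forbidden(ch: str) -> bool:
    """Hypothetical helper: True if ch is Control, Surrogate, or Unassigned."""
    return unicodedata.category(ch) in FORBIDDEN_CATEGORIES


def is_white_space(ch: str) -> bool:
    """Hypothetical helper: True if ch has the White_Space property."""
    return ord(ch) in WHITE_SPACE
```

One thing this makes visible: tab and line feed have general category Control and also carry White_Space, so read literally the two quoted definitions overlap. The excerpt above doesn't say how the spec resolves that, so the sketch keeps the two predicates independent rather than guessing.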