I like how the spec defines character classes by just passing the buck to Unicode
=====
Forbidden Characters
Forbidden characters are Unicode scalar values with general category Control, Surrogate, and Unassigned. Forbidden characters must not appear in the source text.
White Space
White space characters are those Unicode characters with the Whitespace property, including line terminators.
The "general category" [1] and "whitespace" [2] properties are real character properties defined by Unicode (the latter is formally named White_Space). Deferring to them is, ideally, how a language that supports Unicode should define its character classes.
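
For what it's worth, here is a rough Python sketch of what checking those two classes could look like. It is not taken from the spec or from any implementation of it. The names is_forbidden, is_white_space, FORBIDDEN_CATEGORIES and WHITE_SPACE are made up for illustration. The standard unicodedata module exposes the general category, but as far as I know it does not expose White_Space, so the sketch hard-codes the code points that Unicode's PropList.txt lists for that property.

```python
import unicodedata

# General_Category codes: Cc = Control, Cs = Surrogate, Cn = Unassigned.
FORBIDDEN_CATEGORIES = {"Cc", "Cs", "Cn"}

# Code points carrying the White_Space property, hard-coded here because
# unicodedata has no lookup for it (assumption: list copied from PropList.txt).
WHITE_SPACE = frozenset(
    list(range(0x0009, 0x000E))        # TAB, LF, VT, FF, CR
    + [0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))      # EN QUAD through HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_forbidden(ch: str) -> bool:
    """True if ch has general category Control, Surrogate, or Unassigned."""
    return unicodedata.category(ch) in FORBIDDEN_CATEGORIES


def is_white_space(ch: str) -> bool:
    """True if ch is one of the White_Space code points listed above."""
    return ord(ch) in WHITE_SPACE
```

One thing the sketch makes visible: characters such as "\n" and "\t" have general category Control and also carry White_Space, so both functions return True for them. The excerpt quoted above does not say how that overlap is resolved, and the sketch does not try to guess.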