However, I don't like taking the conservative route. It's always better to implement the full feature so that users don't have to think about the implementation's limitations. The problems now are how to do it and how to stay compatible with the builtin engine. If Unicode is supported, then the SRFI requires, for example, alphabetic to match L, Nl and Other_Alphabetic characters. The current builtin engine only considers ASCII for named character sets (e.g. \w). There are a bunch of options to resolve this, but I think the following are the reasonable ones:
- Support these only in this SRFI
- Builtin engine should support Unicode as well
The worry with the second option is \w: I believe I've already written a bunch of code which depends on the fact that it only matches ASCII characters.
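To make the compatibility worry concrete, here is a minimal sketch that uses Python's re module as a stand-in. It is not the Sagittarius engine, and the helper names are made up for illustration. Python happens to expose both behaviours: with re.ASCII, \w matches only [a-zA-Z0-9_], and without it \w also matches Unicode word characters. Code written against the first behaviour can silently change meaning under the second.

```python
import re

# The same pattern compiled in two contexts (names are hypothetical).
ASCII_WORD = re.compile(r"\w+", re.ASCII)  # \w is [a-zA-Z0-9_] only
UNICODE_WORD = re.compile(r"\w+")          # default for str patterns: Unicode-aware \w


def is_identifier_ascii(s: str) -> bool:
    """True if the whole string consists of ASCII word characters."""
    return ASCII_WORD.fullmatch(s) is not None


def is_identifier_unicode(s: str) -> bool:
    """True if the whole string consists of Unicode word characters."""
    return UNICODE_WORD.fullmatch(s) is not None


# Show where the two contexts disagree.
for s in ["lambda", "λ", "naïve"]:
    print(s, is_identifier_ascii(s), is_identifier_unicode(s))
```

A validator like is_identifier_ascii is exactly the kind of code that would start accepting new input if the default meaning of \w were widened.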
Let's think about #1 first. This should be relatively easy: I just need to convert the named character set to either a Unicode character set or an ASCII character set depending on the expression. The problem, if I dare call it one, is that I need to prepare two versions of the same named character sets, though one of them is already there. So all I need to do is add the full Unicode character sets and switch between them according to the context. Easy, isn't it?
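The switch could look roughly like the sketch below. It is written in Python rather than Scheme, and everything in it (NAMED_SETS, lookup, modelling a character set as a predicate) is a hypothetical illustration, not how the engine is actually organised. Python's unicodedata module only exposes general categories, so the Other_Alphabetic part of the SRFI's definition is left out here.

```python
import unicodedata


def ascii_alphabetic(ch: str) -> bool:
    """ASCII version of the named set: a-z and A-Z only."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def unicode_alphabetic(ch: str) -> bool:
    """Unicode version: general categories L* and Nl.

    Assumption: Other_Alphabetic is NOT covered, because unicodedata
    does not provide that property.
    """
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat == "Nl"


# Two versions of each named set, keyed by context (hypothetical layout).
NAMED_SETS = {
    "alphabetic": {"ascii": ascii_alphabetic, "unicode": unicode_alphabetic},
}


def lookup(name: str, unicode_context: bool):
    """Pick the version of a named set that matches the current context."""
    return NAMED_SETS[name]["unicode" if unicode_context else "ascii"]
```

For example, lookup("alphabetic", False)("λ") is false while lookup("alphabetic", True)("λ") is true, so the same name resolves differently depending only on the context flag.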
Now option #2. The problem is backward compatibility. There are two main places where compatibility could break: one is the regular expression engine itself and the other is SRFI-14. The whole point of this option is to make my life easier at a later stage, when merging Unicode character sets into predefined ones such as char-set:letter. For the regular expression engine, I probably just need to make sure that the default is the ASCII context. However, I'm not even sure whether I have written code with SRFI-14 that depends on the ASCII character sets. (A quick grep showed that I have, so if I merge them then I need to check all of that code...)
So if I take #2, then the following things need to be done:
- Separate the Unicode/ASCII contexts in the builtin regular expression engine
- Add the full Unicode sets to the builtin character sets (including SRFI-14)
- Check whether any code depends on the ASCII character sets
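For the last item, a first pass could be as crude as the quick grep mentioned earlier. Here is a minimal sketch in Python. The search pattern is an assumption (any char-set: name plus regexp classes such as \w), and the sample source text is invented for the demo.

```python
import re

# Assumed list of suspects: predefined char-set:* names and \w-style classes.
SUSPECTS = re.compile(r"char-set:[a-z+\-]+|\\[wWdDsS]")


def find_suspects(source: str):
    """Return (line number, stripped line) pairs that mention a suspect."""
    return [
        (n, line.strip())
        for n, line in enumerate(source.splitlines(), start=1)
        if SUSPECTS.search(line)
    ]


# Invented sample input, not real project code.
sample = r'''(define (word? c) (char-set-contains? char-set:letter c))
(define (add1 x) (+ x 1))
(define rx (regex "\\w+"))
'''

for n, line in find_suspects(sample):
    print(n, line)
```

This only finds candidates. Whether each hit really depends on ASCII-only behaviour still has to be judged by reading the code.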