Syntax highlighter

2014-04-09

Set, bag and comparator

I've implemented new final SRFI 114 (not sure if this is really final state since there was no call for final from author...). You may already know this SRFI is used in SRFI-113. it's like the relationship between SRFI-13 and SRFI-14. If this is related to other SRFI then next step would be implementing SRFI-113 (needs to be finalised first though).

So I've read a bit of SRFI-113 specification and realised that this is very generic and can be applied existing charset. However if I apply it to charset, it would be a big internal change. Even though there is still some time to be finalised, it's better to think about how I should implement.

The very first consition is this;
  • Charset can't be moved to Scheme world
This is because it's used for builtin regular expression and moving it to Scheme world would be serious performance issue. Then implementing strategy would be like this;
  • Make <set> class in C as base clasts
  • <char-set> extends it
  • Implement <set> constructor and some procedures in C
  • Rewrite SRFI-14 implementation
Now there is already a problem with this strategy and which is comparator needs to be implemented in C. As I mentioned above I've already implemented SRFI-114 in Scheme. Implementing in Scheme was really easy but moving to C would be a bit hard. There are 2 considerations;
  1. Performance
  2. Co-operate with Scheme world
#1 is the biggest. The constructor of comparator takes 4 procedures, though 2 of them can be omitted, and created comparator is used for sets to determine whether or not the given elements are in the set. Thus, for charset I need to implement this behaviour in C. It's not possible of course but naive implementation causes performance issue since calling Scheme procedure from C costs a lot. There are at least 2 solution for this; one is calling procedure directly if it's subr (which is C world procedure). the other one is make comparator be able to have both C functions and Scheme procedures. The first one is used for hashtable and the latter one is used for ports. For builtin charset, the first one would be sufficient but not sure if anyone wants to extend it.

#2 is a bit less serious. It's related to class hierarchy. How should class hierarchy of <set> look like? The very naive but safe way would be like this;
  • <abstract-set>
    • <charset>
    • <set>
      • <integer-set>
So <charset> and <set> share the super class but it's not in direct relation. This is more like implementing the same interface. The other one would be like this;
  • <set>
    • <charset>
    • <integer-set>
<set> is the base class of all set related classes. This is more direct. The class hierarchy affects the implementation. If I take the first one, then comparator doesn't have to be in C but second one. However for the first one, it would be very inconvenient/indirect to implement common set APIs. The goal for this merging is sharing the APIs. It doesn't have to support all SRFI-114 APIs but should be able to provide it with common APIs and can be applied for both sets.

Fortunately, there is still time to think about it but not much I think...

No comments:

Post a Comment