Improved C language interface #93
Comments
Hello Stewart, Have you heard anything about Ofront+, the extended translator of the Oberon languages into C? • https://github.com/Oleg-N-Cher/OfrontPlus Some of the suggestions you described here have already been implemented in Ofront+. In my development strategy, I also faced the need to improve the interaction with C libraries. And as a basis, I took mainly those solutions that are implemented in BlackBox. Maybe my steps should be severely criticized, but I would be interested in your opinion about what is missing. And, of course, the VOC team can always look into my commits and sources. So, I will list the most important differences from VOC.
[foreign] --> don't generate C body, only header
This way, even your binding generator can be used to produce interfaces that work with Ofront+ after slight modification. I have plans for further improvements. For example, I would very much like to implement the [union] tag. I didn't consider the option for linking libraries, but only because Ofront+ is not a compiler; it is a pure translator to C, so it doesn't directly call the C compiler and has nothing to do with its command-line options. P.S. I know Norayr as a very conservative person who never likes to make new features, so perhaps you will have to code everything yourself. |
Hello, everyone. Thank you for the feedback. Let me express my first reaction, which is probably predictable for @Oleg-N-Cher. I do not like pragmas of any kind for Oberon. I believe that if we start adding pragmas, we can go on and on until we have more pragmas than language keywords. I even know a fork of ooc, here on GitHub, which introduced (surprise!) a new pragma. Then we'll have to explain why we discriminate against this pragma if we didn't discriminate against others. (: I believe we should not complicate the parser with parsing OS-dependent, compiler-dependent expressions, nor invent these expressions. I think voc, with its Ofront and OP2 heritage, evolved by using code procedures, which are used in Oberon operating systems as, well, listings of machine code instructions, and I think it is a brilliant idea to use an already existing feature to enable linking to foreign libraries. @sgreenhill, you mentioned this notation ooc has:
FPC and Delphi Pascal have "unit", "program" and "library" keywords to describe how the unit should be compiled. In the case of "library", it gets linked as a DLL on Windows. I think voc's parser has been improved a lot, mostly thanks to @dcwbrown's work, and I believe it can be used as the foundation of a 64-bit native code compiler for an Oberon operating system. I think using makefiles to control the compilation is a good idea. So in case of pragmas like
One day we may have compiler feature requests to introduce new features. I believe today we have the luxury of a situation in which Oberon, or voc, is not actively used in industry, so there is no need to solve some problem that developers have with a solution that might not be the best. We have the luxury of discussing ideas and features without putting them into code, and without having to add new stones to the building that we might later realize were not laid the best way, when it might be too late to revert the changes, or too painful to do so.
That was my first reaction on the thought of complicating the parser. Thank you again. I am glad you are keeping Oberon alive. |
Thanks all for your comments. There are a few different issues here. First issue: foreign code interface. John Donne says "No man is an island entire of itself", and clearly in a world where there is now so much great free software available it is important to provide a seamless interface to foreign code. This vastly increases what a developer can achieve with limited time resources. The designers of Component Pascal (who are also some of the Oberon-2 designers) realised this, and almost every point that @Oleg-N-Cher mentions is implemented in that system. The facilities that are currently in VOC, inherited from the original Ofront, are basic and incomplete. The RECORD and ARRAY flags "[1]" protect the GC, but allow many other dangerous operations that could easily crash or corrupt a program. For example, applying LEN to a foreign open array. As it is, it can be hard to avoid doing many "unsafe" operations (eg. type casts via SYSTEM.VAL) in order to use C-implemented objects. @Oleg-N-Cher mentions in point (2) the potential for compatibility problems during compilation. This arises because the "code procedure" idea requires a module compilation to include every include file required by the foreign code on which you depend. So on Mac OS, I am frequently getting this sort of warning:
This is because of the name conflict between Ofront-generated headers (eg. "Math.h") and the headers required by the code procedures (eg. "math.h"). On both Mac OS and Windows it is possible for file-systems to be case-insensitive, so this immediately gives you a portability issue. I have been lucky so far, but only because Oberon tends to capitalise the first letter of the module name, and C uses almost entirely lower case. So the "C library procedure without a body" approach described by @Oleg-N-Cher can sometimes be significantly more robust. It simplifies the declarations, but importantly it saves developers from potential name collisions between Oberon modules and unrelated C headers. In such situations one is forced to either rename the Oberon module, or delve into the compiler and modify the naming scheme for intermediate header files. This should probably be done anyway as it is likely to cause a problem somewhere in the future. Renaming Oberon modules (eg. system modules) could require a cascade of edits to user and library code. So doing the foreign interface properly means:
Second issue: @norayr, you mention the desire to keep the compiler simple, which I accept. Most Oberon users are probably here because they value simplicity. But many software projects involve complexity, and keeping the compiler too simple can push much complexity into the user's code. This has the effect of increasing overall complexity, because the problems are now duplicated countless times, and the individual solutions may be incompatible. For the developer, time is the critical resource, and most users won't accept a solution that requires them to jump through too many hoops. The "system flag" approach is fairly simple, and does not have much impact on the language. In the OP2/ofront implementation, it's pretty hard to decode what the different flags are meant to do, and in some cases they cram different values together (eg. the trailing gap for record alignment is also encoded in the sysflags). All languages must adapt over time to new conditions, or risk dying out. For example, when multi-threading became common "C" introduced "volatile", and recently we have "atomic" in response to the development of SMP CPUs. "volatile" is also important for memory-mapped I/O which is now common on many devices. These things are all necessary to safely exploit modern hardware and operating systems. The fact that a feature is not in the original language does not mean that it should not be added. One of the advantages of the Ofront approach is that pretty much every required concept is already implemented in the C compiler, so something like:
can easily be declared without introducing keywords:
Third issue: makefiles. These might be acceptable for small projects, but when you have a few dozen modules, and are constantly updating makefiles to express module dependencies that are already expressed in the source code, you begin to ask why the compiler is not handling this task. Basic design principle: DRY ("don't repeat yourself"). Adding unnecessary code to a code-base increases the work required to maintain and extend the code. It introduces a potential failure point, since dependencies in the Makefile may become out-of-sync with the Oberon modules. The developer needs the confidence that the build process is correct, even when using modules that may have been written by other developers. So, for example:
always compiles and links modules in the correct sequence, regardless of what you changed, just like:
...or any other modern compiler. After all, the compiler already has all the knowledge required to correctly build the project, so why not use it? Likewise, relying on Makefiles to maintain dependencies between Oberon and external libraries is IMO an unnecessary task that could easily be supported by the compiler. Otherwise, you have to understand every library dependency of every module that you are using, either directly or indirectly, and encode this in the Makefile. This breaks the concept of "blackbox reuse", which is important in any software ecosystem, because modules are not able to fully express their own dependencies. Sorry, I hope this doesn't sound like a rant. Many thanks to you all for your great work. |
It doesn't sound like a rant, it sounds very reasonable. Just a short note on C's atomic and volatile: I understand that there are emerging problems that C tries to address. One of those is how to use SMP efficiently. Another example of this is the "memory fence" functionality added to the C++ language. Corresponding instructions have been added to Intel and several other CPUs, so C++ keeps up with this CPU design. I just want to share here that I have a completely different idea of multitasking, which is basically what the Erlang VM does, but implemented in native code, and without a shared memory model, using messaging between processes instead. The messages can be delivered locally or over the network. So scaling to many machines is much easier than with a shared memory model, and the efficiency under high load is higher. Joe Armstrong has an amazing video about that, called "How do we program multicores", I recommend it a lot. I think Oberon is well suited for that approach. I think we need to avoid shared memory for threads by any means. (: Still, I will reread everything, and write more. You have a point, maybe we cannot leave the FFI like this. Still, I wonder whether it is possible to use a separate tool for that. |
Hi Stewart, welcome back :-), |
Yes, I have a feeling that H2O and voc can work nicely together to solve these FFI issues, but I need time to concentrate and think about it. |
In my experience building and using H2O, the main issue is translating the "style" of the foreign API so that it maps well to Oberon. There are many possible mappings so each API will need some human intervention to define the mapping rules. Within these rules, the actual translation can usually be done mostly automatically. The rules may need to be periodically checked as APIs are updated, so it is useful to have users invested in this process. The more users, the more APIs that can be maintained. Keeping this in mind, I think a sensible approach would be to follow @Oleg-N-Cher's implementation of the Component Pascal standard. That would make any API translation usable on at least three platforms: VOC, OFrontPlus, and Blackbox. Apart from minor syntactic differences, this also conforms to INTERFACE modules in OOC. There would of course still be the existing "code procedure" method, but this is essentially only usable on ofront-derived systems that translate to C. This of course only applies to C-style libraries. Component Pascal actually went further, and implemented an object representation that was binary compatible with Microsoft's COM (Component Object Model). Basically, these are interface objects that use VTABLE dispatch (like in C++). COM objects support a form of introspection via type libraries, so it is possible to automatically handle remote method calls between processes (including over networks), and even dynamically build language bindings on the fly in some scripting environments such as Visual Basic. C++ APIs are a more difficult problem. I had a few ideas about this, but not enough motivation to do much about it. @btreut, Hi and thanks for the link. Good to see that software still exists in some useful form. |
May I throw in a few general considerations?
I think code procedures are brilliant, although I don't know how to use quotation marks within them, since a quotation mark denotes the end. To get the prototypes a tool like H2O might be helpful. I would refrain from changing the language so that unsafe or missing C types, like pointers without size or unions, can be used without glue code. This seems to introduce unsafety and to be the first step towards transforming Oberon into C with a different syntax. If Oberon offers something special it is simplicity and safety; neither should be sacrificed.
|
if we implement so this can be done with an external tool, without placing any extra functionality on the compiler. But still, what if I want to link several modules and distribute those as a library? I think this discussion is also about how we define a programmer. Usual Unix developers know at least one So my understanding is that a developer should not be a person who doesn't want to see beyond their IDE window and only implements beautiful abstractions. And let me stress again, each tool to its need: Today I would not even mind if voc was stripped down to only produce an object file, leaving the programmer the task of linking it (though I was the one who first introduced the automation code in voc). Or if we had a separate tool, which calls the compiler, gets the object files, and links them together. Maybe that tool could also build a dependency tree for the modules used, and be a replacement of the In case of These are thoughts that are not directly related to all parts of this thread, and they probably do not relate to the FFI part. But they express my feelings today about the design of tools. |
Hello, people. Today I was able to reread everything and better understand all the points. I think, in general, the social processes are going in alignment with the demands of societies. That is why it is important to participate in discussions, to influence the formation of those demands. So, I reread the texts, and by the way, POINTER TO ARRAY OF CHAR, which is translated to a struct, came to my attention as well, and I was thinking of documenting that somewhere. I have mixed thoughts. One of those is that the C interface is by definition unsafe. We may represent a mapping to a C struct, but that C struct would be different on another platform, or on the same platform after an upgrade. However hard we try, the C binding will be unreliable. Well, that may be okay for some desktop platform we tested the code on and tell users to run it on, but I would not put that kind of code into something serious. But we do write wrappers to external libraries, so what we can do is try to make those wrappers safe. Because mappings to struct types were mentioned, one idea I would like to share with you was invented (invented, is that the right word?) by @dcwbrown when he was making his improvements. So Ofront has a "struct stat" mapping in Unix.Mod
Instead of mapping each field to get an equivalent RECORD, @dcwbrown encapsulated the data like this
This way we don't have to worry about padding, alignment, and different order of fields on different platforms. As long as the struct has the field, our procedure will return its value. And now I will wish you all the best, and come back later. |
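The accessor idea described above can be sketched in C roughly as follows. This is only an illustration on a POSIX system, not the code from voc: the names StatBuf, StatBuf_Open, StatBuf_Size, StatBuf_Mtime, StatBuf_IsDir and StatBuf_Close are invented for the example. The point is that the C compiler resolves the field offsets of struct stat, so the Oberon side never needs a matching RECORD layout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical wrapper type. In a real binding only the typedef would be
   visible to callers, so the layout of struct stat stays private to C. */
typedef struct StatBuf StatBuf;

struct StatBuf {
    struct stat st;
};

/* Allocate a handle and fill it for `path`; returns NULL on any failure. */
StatBuf *StatBuf_Open(const char *path)
{
    StatBuf *sb;
    assert(path != NULL);
    sb = malloc(sizeof *sb);
    if (sb != NULL && stat(path, &sb->st) != 0) {
        free(sb);
        sb = NULL;
    }
    return sb;
}

/* Accessors: each one returns a single field, converted to a fixed-width
   result, whatever padding or field order the platform uses. */
long long StatBuf_Size(const StatBuf *sb)  { return (long long)sb->st.st_size; }
long long StatBuf_Mtime(const StatBuf *sb) { return (long long)sb->st.st_mtime; }
int       StatBuf_IsDir(const StatBuf *sb) { return S_ISDIR(sb->st.st_mode) ? 1 : 0; }

/* Release a handle obtained from StatBuf_Open. */
void StatBuf_Close(StatBuf *sb) { free(sb); }
```

An Oberon module would then declare one procedure per accessor instead of a RECORD mirroring struct stat, at the cost of a function call per field.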
The choice to include only one module or several modules for creating a dynamic link library on UNIX is an example of the mismatch between Oberon and UNIX. A dynamic link library is a mere collection of routines which can be loaded together. Templ Josef has implemented dynamic loading of modules in his latest version of ofront by generating a dynamic link library for each module. That way he can emulate the behaviour of the Oberon System and in fact he produced an Oberon System which behaves like the original version with respect to module loading and unloading. Probably, the type LIBRARY is helpful on Windows. Basically, the compiler should only do the minimum. A note on complexity: Complexity is not size. Complexity is a function of the interdependency of parts. To keep the complexity low it is a good strategy to have the parts interact only with small bandwidth. Therefore it should not increase the complexity of the compiler if it generates a list of modules that have to be linked in order to resolve all symbolic references, because that task gets only a simple input and produces only a simple output. Likewise the interpretation of flags within comments. The interaction is restricted to parsing one comment and delivering a few boolean values. Having cleanly defined output, i.e. syntactically and contentswise, makes it easy to combine tools. Since UNIX needs a linked program, or references to the dynamic link libraries that have to be included, having a separate tool which generates the linking command is fine. And there is nothing wrong with gluing all that together with a script which executes an Oberon make to detect which modules have to be compiled, calls the compiler for each of these modules and finally calls the link step generator and links the program. To make the user experience as good as possible the script and the tools should be provided. |
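As a rough sketch of how small such a "list of modules to link" step can be, the following C code orders modules so that every import comes before the module that uses it. The Module type and the BuildOrder function are hypothetical and only illustrate the idea; they are not taken from voc or ofront. The sketch relies on the fact that Oberon does not allow cyclic imports.

```c
#include <assert.h>

#define MAX_MODULES 32
#define MAX_IMPORTS 8

/* One module and the indices of the modules it imports directly. */
typedef struct {
    const char *name;
    int imports[MAX_IMPORTS];
    int nimports;
} Module;

/* Depth-first visit: emit every import before the module itself. */
static void visit(const Module *mods, int i, int *seen, int *order, int *n)
{
    int k;
    if (seen[i]) return;
    seen[i] = 1;
    for (k = 0; k < mods[i].nimports; k++)
        visit(mods, mods[i].imports[k], seen, order, n);
    order[(*n)++] = i;
}

/* Fill `order` with the indices of all modules needed to build `root`,
   dependencies first. Returns the number of entries written.
   Modules that `root` does not reach are left out. */
int BuildOrder(const Module *mods, int nmods, int root, int *order)
{
    int seen[MAX_MODULES] = {0};
    int n = 0;
    assert(nmods <= MAX_MODULES && root >= 0 && root < nmods);
    visit(mods, root, seen, order, &n);
    return n;
}
```

The input is just the import lists and the output is a flat sequence, which is the kind of narrow interface argued for above; a script or the compiler driver could consume it equally well.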
Hi,
This is my first post here, so hello and thanks to all the developers of VOC. I have been trying VOC with a few C libraries and have a few suggestions for improvements. I may attempt to implement some of these, so any comments or technical suggestions would be welcome.
A few system flags are already implemented in VOC.
Assigning a system flag [1] to an ARRAY causes it to be not copied when passed as a value parameter. This avoids making useless copies of strings which improves the performance of text I/O. This has the same effect as the NO_COPY flag in OOC. As far as I can see this is only used in Files.WriteString, but there are probably many other places that could benefit.

Assigning a system flag [1] to a RECORD or POINTER causes the object to be untraced by the GC. That is, the pointer (or any pointers in the RECORD) are not recorded as heap pointers in the generated code and type descriptors. Clearly this is required for pointers to any objects allocated outside the Oberon heap. The GC assumes that it can use a record's type descriptor for marking the heap object, and for enumerating embedded pointers, so omitting this flag on a C-allocated object could corrupt the C heap and/or crash the garbage collector. This flag is I think equivalent to the UNTRACED flag in Component Pascal.

OOC defines some useful flags for its C interface:
http://ooc.sourceforge.net/OOCref/OOCref_16.html#SEC150
The important flags are:
• NO_DESCRIPTOR declares that a record type has no type descriptor. This means it cannot be used in type tests and type guards, or the NEW procedure, and cannot be passed as a formal parameter that requires a type tag.
• NO_LENGTH_INFO declares that an open array type has no length information. This means that LEN cannot be used, and it cannot be passed as a formal parameter that requires a length.
• UNION declares that a RECORD is to be translated to a C "union" instead of "struct". These don't occur very often but need to be implemented properly, especially if the variable is allocated by the (Oberon) client code.

The current approach to calling external functions is a bit cumbersome, and requires one to hand-code a "macro" that will translate the Oberon call into a C call. For example:
Since this is a macro (or "code procedure"), it is necessary to explicitly cast between Oberon and C types in order to keep the compiler happy. Also, since the Oberon parameter list contains extra information (array length, type descriptors) the C and Oberon declarations don't always match (eg. the hidden title__len parameter in the above). If the types were properly declared (eg. with NO_DESCRIPTOR and NO_LENGTH_INFO) then it should be possible to declare matching parameter lists, and simply call the functions via C externs. The need to explicitly code the type conversions makes it more difficult to automatically generate such interface declarations (eg. using a tool like H2O).

One existing problem is the representation of open arrays. If you declare the following, as in oocC.string:
string = POINTER TO ARRAY OF CHAR;
The corresponding C code is:
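As a rough illustration only (this is not the code VOC generates, and the type and function names are invented for the example), the following C sketch assumes an open array is stored as a length word immediately followed by the character data. It shows why a plain char * and a pointer to such a block are not interchangeable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-prefixed open array of CHAR:
   the length is stored directly in front of the character data. */
typedef struct {
    long len[1];   /* number of elements, including the terminating 0X */
    char data[];   /* the characters follow in the same allocation */
} OpenCharArray;

/* Allocate a length-prefixed array holding a copy of a C string.
   Returns NULL if the allocation fails. */
OpenCharArray *OpenCharArray_FromC(const char *s)
{
    size_t n;
    OpenCharArray *a;
    assert(s != NULL);
    n = strlen(s) + 1;
    a = malloc(offsetof(OpenCharArray, data) + n);
    if (a != NULL) {
        a->len[0] = (long)n;
        memcpy(a->data, s, n);
    }
    return a;
}

/* What a C function expects is the address of the data,
   which lies sizeof(long) bytes past the start of the block. */
char *OpenCharArray_AsC(OpenCharArray *a) { return a->data; }
```

Under this assumed layout, casting a char * returned by a C library to the pointer type would make the first bytes of the string be read as the length, which is the mismatch that a flag such as NO_LENGTH_INFO is meant to remove.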
With the current representation of the open array, POINTER TO ARRAY points to the length field rather than the data, so you can't simply cast (char *) as (oocC.string) without losing a few bytes of the array. At the moment, the only way to properly access a C array is to declare an array with static length (eg. POINTER TO ARRAY 1 OF CHAR), and then disable run-time bounds checking. The NO_LENGTH_INFO flag would allow this to be done properly. With RECORDs this is less of a problem, as the type information is stored at a negative offset relative to the data.

One last addition that would really help is to declare link libraries for C interface modules. For example, something like this, as can be done in OOC:
MODULE X11 [ LINK LIB "X11" ];
This means that whenever the X11 module is included the correct library dependency (-lX11) will be added to the link command. Currently it looks like the best option for specifying link libraries is via the CFLAGS, but this adds messy dependencies into the makefiles. In some cases one must write a separate link step because gcc does not always allow link libraries to be declared before the object files that depend on them.

I think this should be fairly easy to implement. Currently, each module contains a complete list of all modules that are directly or indirectly imported. The linker uses OPT.Links to enumerate the required object files, and this list is extended via OPT.InLinks for every IMPORT statement, and is saved via OPT.OutLinks when the symbol file is created. Link libraries could be added to this list (maybe with a special flag for the linker), or a separate parallel list could be maintained.
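To make the "parallel list" idea concrete, here is a minimal C sketch of assembling a link command from such a list. It is not how OPT.Links is implemented: LinkEntry, BuildLinkCommand and the fixed "cc -o" prefix are assumptions made for the example. Each module contributes an object file and optionally one library; libraries are de-duplicated and emitted after all object files, which sidesteps the ordering problem with gcc mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LIBS 16

/* Hypothetical per-module record: object file plus an optional link library. */
typedef struct {
    const char *object;   /* e.g. "X11.o" */
    const char *lib;      /* e.g. "X11", or NULL if the module needs none */
} LinkEntry;

/* Write "cc -o <out> <objects...> <-l flags...>" into buf.
   Libraries are de-duplicated and placed after all object files.
   Returns 0 on success, -1 if buf is too small or there are too many libraries. */
int BuildLinkCommand(char *buf, size_t size, const char *out,
                     const LinkEntry *entries, int n)
{
    const char *libs[MAX_LIBS];
    int nlibs = 0, i, j, w;
    size_t used;

    assert(buf != NULL && out != NULL && entries != NULL);
    w = snprintf(buf, size, "cc -o %s", out);
    if (w < 0 || (size_t)w >= size) return -1;
    used = (size_t)w;

    for (i = 0; i < n; i++) {
        /* Append the object file in the order given. */
        w = snprintf(buf + used, size - used, " %s", entries[i].object);
        if (w < 0 || (size_t)w >= size - used) return -1;
        used += (size_t)w;
        if (entries[i].lib == NULL) continue;
        /* Remember each library only once. */
        for (j = 0; j < nlibs && strcmp(libs[j], entries[i].lib) != 0; j++) {}
        if (j == nlibs) {
            if (nlibs == MAX_LIBS) return -1;
            libs[nlibs++] = entries[i].lib;
        }
    }
    /* Libraries go last, after every object that may depend on them. */
    for (j = 0; j < nlibs; j++) {
        w = snprintf(buf + used, size - used, " -l%s", libs[j]);
        if (w < 0 || (size_t)w >= size - used) return -1;
        used += (size_t)w;
    }
    return 0;
}
```

With entries for X11.o (library X11), Xutil.o (library X11) and Main.o (no library), the result would be a single command line that names the three objects followed by one -lX11.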
Any comments or suggestions?