
Generate C interface from C++ source code using Clang libtooling

Two files <code>cwrapper.h</code> and <code>cwrapper.cpp</code> are created to map the Rectangle class to C:
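As a rough illustration of what such a wrapper pair might contain, here is a minimal sketch collapsed into a single file. The <code>Rectangle</code> members (<code>width</code>, <code>height</code>, <code>area</code>) are assumptions made for the example, and the <code>WRectangle</code> handle follows the <code>WclassName</code> naming described later in this post.

```cpp
// Hypothetical C++ class to be exposed to C (its members are assumptions).
class Rectangle {
public:
    Rectangle(int width, int height) : width_(width), height_(height) {}
    int area() const { return width_ * height_; }
private:
    int width_;
    int height_;
};

// --- roughly what cwrapper.h might declare ---
extern "C" {
// Opaque handle: C code only ever sees a pointer to this struct.
struct WRectangle;
typedef struct WRectangle WRectangle;

WRectangle* Rectangle_create(int width, int height);
void Rectangle_destroy(WRectangle* self);
int Rectangle_area(WRectangle* self);
}

// --- roughly what cwrapper.cpp might define ---
extern "C" {
WRectangle* Rectangle_create(int width, int height) {
    // The C side receives the object pointer disguised as a WRectangle*.
    return reinterpret_cast<WRectangle*>(new Rectangle(width, height));
}
void Rectangle_destroy(WRectangle* self) {
    delete reinterpret_cast<Rectangle*>(self);
}
int Rectangle_area(WRectangle* self) {
    return reinterpret_cast<Rectangle*>(self)->area();
}
}
```

A C caller would create the object with <code>Rectangle_create</code>, pass the returned handle to <code>Rectangle_area</code>, and release it with <code>Rectangle_destroy</code>.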

Wrapper functions should be generated only for the public member functions of each class, both static and non-static. There is no point in mapping private member functions to C.

I am not going to map overloaded operators, copy constructors, or move constructors to C, as uThreads classes do not have any of those.

uThreads does not have a namespace yet (one will be added in the future), so there has to be another way of identifying the uThreads classes that I want to map to C. If there were a namespace, I could simply find all the classes under it. For now, I keep the list of class names to map in a <code>std::vector</code>.

Since C does not support function overloading, overloaded functions need distinct names in C. For now I number them: for example, <code>uThread_create</code> and <code>uThread_create_1</code> are both constructors of the <code>uThread</code> class.
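The numbering scheme can be sketched with a counter per function name. This is only an illustration of the idea: <code>uniqueCName</code> is a hypothetical helper, and it assumes the first occurrence keeps the plain name while later ones get <code>_1</code>, <code>_2</code>, and so on, matching the example above.

```cpp
#include <map>
#include <string>

// Returns a C-safe name for a possibly overloaded function.
// `seen` counts how many times each base name has been requested so far.
std::string uniqueCName(std::map<std::string, int>& seen,
                        const std::string& baseName) {
    int count = seen[baseName]++;  // 0 the first time a name shows up
    if (count == 0) {
        return baseName;           // first overload keeps the plain name
    }
    return baseName + "_" + std::to_string(count);
}
```

One consequence of this design is that the suffix depends on the order in which the overloads are visited, so reordering declarations in the C++ header changes the generated C names.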

Template functions are not supported in the current version and will be supported in the future.

Here, all the functions accept only builtin types and references to uThreads objects, which makes it easy to generate the C counterparts. However, if objects from outside libraries, or even from the standard library, are passed to or returned from the functions, additional effort is needed: those classes have to be listed separately, a C struct has to be generated for each of them, and a pointer to the object has to be stored in the C code.

Let’s also take a look at the AST generated by Clang. If you pass <code>-Xclang -ast-dump</code> to clang, it prints the AST it builds. So issuing the following command in the root directory of uThreads prints the AST in a simple text format:

<code>include/uThreads.h</code> is the file that includes all the header files in uThreads, so all uThreads classes and functions are present when the file is passed to clang. Also, since many standard library and system headers are included, the resulting AST is very large, as it covers all the included files as well. Here are the AST nodes for the constructor, the destructor, and the method <code>accept</code> of class <code>Connection</code> in uThreads:

Starting <code>clang-query</code> on the source opens its console, where you can issue <code>help</code> to see the available commands. Let’s use the <code>match</code> command to find every <code>CXXMethodDecl</code> in the AST:

As before, since many files are included in the source code, the output is going to be large and includes all the methods, constructors and destructors from uThreads and all other classes included. Here is part of the output for class <code>Connection</code>:

However, for the C interface we are only interested in public methods. So let’s change the query to reflect that:

This time only the public methods are printed to the output. Also, since we are only interested in classes defined in uThreads library, I am going to use specific class names to limit the output. For example, to see all the public methods from class <code>Connection</code> issue the following commands:

This time I changed the output to show AST dump instead of diagnostics, which results in the following:

We can use the final version of the matcher to find the public methods of interest and generate C source code from them. We start by creating an option category that applies to all command-line options. You can also define new options for the command-line tool using option tables, so that arguments can be passed to your program. I do not use that feature in the current implementation, but I included a <code>-o</code> option just to show how it’s done:

I also create a struct with custom string streams, used to generate the final <code>.h</code> and <code>.cpp</code> wrapper files; a vector with the names of the uThreads classes to explore; and a map from function names to integers, used to keep track of functions and detect overloaded ones.

Next, we write the main function:

Within the main function, we pass the OptionCategory to the <code>CommonOptionsParser</code>, along with argc and argv, to parse the arguments passed to the tool. Next, we create a <code>ClangTool</code> instance and pass the compilation database and the source path list to it. Finally, the ClangTool needs a new FrontendAction for each translation unit it runs on, so it takes a FrontendActionFactory as a parameter. To create a <code>FrontendActionFactory</code> from a given FrontendAction type, we call <code>newFrontendActionFactory&lt;MyFrontendAction&gt;()</code>. <code>MyFrontendAction</code> is an <code>ASTFrontendAction</code>, the base class for AST consumer-based frontend actions; since we are going to rely on AST matchers, that is the one we use.

<code>MyFrontendAction</code> includes an instance of <code>OutputStreams</code> that has two string streams, one for the header file and one for the <code>.cpp</code> file. Both streams are initialized with the include statements and <code>#ifdef</code> statements that belong at the beginning of each file. <code>EndSourceFileAction</code> is a callback invoked at the end of processing a single input; it is used to append the closing strings to the streams and save them to <code>cwrapper.h</code> and <code>cwrapper.cpp</code>. Finally, <code>CreateASTConsumer</code> is called to create the AST consumer object for this action. Here an instance of <code>MyASTConsumer</code> is created and returned, and the <code>OutputStreams</code> instance is passed to it by reference so that it can later be handed to the handlers. The AST consumer, in our case, is responsible for registering the matchers and running them over the AST. The list of classes is used along with the <code>cxxMethodDecl</code> matcher we came up with earlier to match the methods of interest.
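To make the role of the opening and closing strings concrete, here is a small sketch of how a generated header might be framed. It is not the original code: <code>wrapHeader</code> is a hypothetical helper, the guard name <code>UTHREADS_CWRAPPER_H</code> is made up, and plain <code>std::string</code> stands in for the tool's string streams.

```cpp
#include <string>

// Wraps generated declarations in an include guard and an extern "C" block,
// so the same header can be included from both C and C++ code.
// The guard name UTHREADS_CWRAPPER_H is an assumption for this example.
std::string wrapHeader(const std::string& declarations) {
    std::string out;
    // Strings written when the stream is initialized.
    out += "#ifndef UTHREADS_CWRAPPER_H\n";
    out += "#define UTHREADS_CWRAPPER_H\n";
    out += "#ifdef __cplusplus\n";
    out += "extern \"C\" {\n";
    out += "#endif\n";
    // Declarations produced while the matchers run.
    out += declarations;
    // Strings appended once the input has been processed.
    out += "#ifdef __cplusplus\n";
    out += "}\n";
    out += "#endif\n";
    out += "#endif /* UTHREADS_CWRAPPER_H */\n";
    return out;
}
```

In the tool itself the first part would be written when the streams are set up and the last part in <code>EndSourceFileAction</code>, with the matcher callbacks filling in the middle.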

<code>MyASTConsumer</code> has an instance of a <code>classMatchHandler</code> and a <code>MatchFinder</code>. <code>MatchFinder</code> is used to register matchers and handlers, apply the matchers to the AST context, and call the related handler when a match is found. In the constructor, we add all the matchers using <code>addMatcher</code>, and in <code>HandleTranslationUnit</code> we call <code>matchAST</code> to perform the matching and trigger the callbacks. We also bind each match to a string (<code>publicMethodDecl</code>), which is used later to tell the matchers apart. Also, notice that for each class a struct named <code>WclassName</code> is generated, along with a typedef of the same name. As explained earlier, this struct is used to hold a pointer to the instance of the class in C.
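The per-class struct and typedef can be produced with a simple loop over the class list. The sketch below shows the idea with a hypothetical helper <code>emitHandleTypes</code>; the exact text the real tool emits may differ.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Emits an opaque struct declaration and a typedef for every listed class,
// e.g. "struct WConnection;" and "typedef struct WConnection WConnection;".
std::string emitHandleTypes(const std::vector<std::string>& classNames) {
    std::ostringstream header;
    for (const std::string& name : classNames) {
        header << "struct W" << name << ";\n"
               << "typedef struct W" << name << " W" << name << ";\n";
    }
    return header.str();
}
```

Because the struct is only declared and never defined, C code can hold and pass <code>WclassName*</code> pointers but cannot look inside them.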

Class <code>classMatchHandler</code> derives from <code>MatchFinder::MatchCallback</code> and performs the required work after a match is found. We override the member function <code>run</code>, which accepts a <code>MatchResult</code> and is called whenever a match is found. The first step is to check whether the node bound to the string <code>publicMethodDecl</code> is of type <code>CXXMethodDecl</code>.

If so, we create a few string variables that are used to assemble the final C function. <code>className</code> captures the name of the method's parent class, i.e. the name used to find the method in the matcher, by getting the declaration name of the parent class as a string. To map member functions to C, the pointer to the object has to be sent as the first argument, so the string <code>self</code> contains the text of that first parameter: a pointer to the C struct, named <code>self</code>. The rest of the variables are explained later.

As explained earlier, overloaded operators are ignored and not mapped to C, thus:

Next, we check whether the matched method is a constructor:

If <code>cmd</code> can be dynamically cast to <code>CXXConstructorDecl</code>, the method is a constructor. If the constructor is a copy constructor or a move constructor, we do not map it to C, as explained above. The final function name should be <code>className_create</code>, so we set <code>methodName</code> to <code>_create</code> and later append it to the class name. Also, since the constructor creates an instance of the object and returns a pointer to it, the return type should be a pointer to the struct type that represents the object in C. The <code>self</code> string is empty, since there is no instance yet to pass to the constructor. <code>separator</code> is used to separate function arguments in the string; since there is no <code>self</code>, the next argument does not need a comma before it. <code>functionBody</code> is a string stream that holds the function definition. For the constructor, we know that it returns the cast pointer to the object created by <code>new</code>, so the string is generated accordingly. <code>bodyEnd</code> holds the string that completes the function definition; since two parentheses are open at this point, both have to be closed after the arguments are placed between them.

Next, we check whether the method is a destructor:

If the method declaration <code>isa&lt;CXXDestructorDecl&gt;</code>, it means the method is a destructor. Thus, the <code>methodName</code> should be <code>_destroy</code>, <code>returnType</code> should be void, and <code>functionBody</code> just calls <code>delete</code> on the casted pointer.
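Putting the constructor and destructor rules together, the generated text could be assembled roughly as follows. This is a sketch with hypothetical helpers (<code>emitConstructor</code>, <code>emitDestructor</code>) that take already-formatted parameter and argument lists; the whitespace of the real output is an assumption.

```cpp
#include <string>

// Builds the wrapper for a constructor. `params` is the formatted C parameter
// list and `args` the matching argument list passed on to `new`.
std::string emitConstructor(const std::string& className,
                            const std::string& params,
                            const std::string& args) {
    const std::string handle = "W" + className + "*";
    // No `self` parameter: there is no object yet.
    return handle + " " + className + "_create(" + params + "){\n"
           "    return reinterpret_cast<" + handle + ">( new " + className +
           "(" + args + "));\n}\n";  // closes both open parentheses
}

// Builds the wrapper for a destructor: returns void and deletes the object.
std::string emitDestructor(const std::string& className) {
    const std::string handle = "W" + className + "*";
    return "void " + className + "_destroy(" + handle + " self){\n"
           "    delete reinterpret_cast<" + className + "*>(self);\n}\n";
}
```

For a class named <code>Connection</code> this yields a <code>Connection_create</code> function returning <code>WConnection*</code> and a <code>Connection_destroy</code> function taking one.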

If the method is not a constructor, destructor, or overloaded operator, it is a member or static function:

The <code>methodName</code> in this case is the name of the method prefixed with <code>_</code>. The next step is to determine the return type of the method; for that we use <code>getReturnType</code> to obtain the <code>QualType</code>. We then pass the <code>QualType</code> to the helper function <code>determineCType</code>, which works out the C type, whether it needs casting, whether it is a pointer, and whether it is void. Here is the definition of the <code>determineCType</code> function:

This function returns four values: the type that should be returned, the type the return value should be cast from, and two booleans indicating whether the type is a pointer and whether the function needs a <code>return</code> statement before the function call. First, if the type is a builtin type, or a pointer to a builtin type, there is no need to cast it (<code>castType</code> is left empty), and if the builtin type is <code>void</code> there is no need to put the <code>return</code> keyword before the call. Otherwise, if the type is a record type (class or struct), we rely on the fact that all uThreads functions only return record types defined within the uThreads library: by casting the return value to the <code>WclassName</code> struct, we are returning the instance of the object. Finally, if the return type is a pointer or reference to a record type, and the record is one of the listed uThreads classes, the function returns <code>WclassName*</code>; otherwise it returns the pointer itself.

The record-type case can be problematic if the returned object needs to be copied, since the original memory might be freed at the end of the function, leaving the returned pointer invalid. In uThreads, objects are only ever returned by reference or by pointer, so this does not arise, but the code should be modified before it is used in other projects. This case matters mainly for function arguments, when <code>determineCType</code> is later called to determine the types of the passed arguments.
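The decision logic can be illustrated without Clang by modelling a type with a small struct. Everything in this sketch is a stand-in: <code>TypeInfo</code> replaces the information read from <code>QualType</code>, <code>determineCTypeSketch</code> is a hypothetical name, and the record-by-value case is folded into the pointer case for brevity.

```cpp
#include <set>
#include <string>

// Simplified stand-in for what the tool learns from clang::QualType.
struct TypeInfo {
    std::string name;     // e.g. "int", "void", "Connection"
    bool isBuiltin;       // true for int, void, char, ...
    bool isPointerOrRef;  // true for T* and T&
};

// The four values described in the text.
struct CType {
    std::string cType;     // type used in the C signature
    std::string castType;  // C++ type to cast from or to; empty if no cast
    bool isPointer;
    bool shouldReturn;     // false only for a plain void return
};

CType determineCTypeSketch(const TypeInfo& t,
                           const std::set<std::string>& wrappedClasses) {
    CType result{"", "", t.isPointerOrRef, true};
    if (t.isBuiltin) {
        // Builtin types pass through unchanged and need no cast.
        result.cType = t.name + (t.isPointerOrRef ? "*" : "");
        result.shouldReturn = !(t.name == "void" && !t.isPointerOrRef);
    } else if (wrappedClasses.count(t.name) > 0) {
        // Listed class: expose it through its opaque W<Class> handle.
        result.cType = "W" + t.name + "*";
        result.castType = t.name + "*";
        result.isPointer = true;
    } else {
        // Unlisted record type: hand the pointer through as it is.
        result.cType = t.name + "*";
    }
    return result;
}
```

With <code>Connection</code> in the class set, a <code>Connection&amp;</code> return type maps to <code>WConnection*</code> with a cast, while <code>void</code> maps to <code>void</code> with no cast and no <code>return</code>.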

Continuing with the return type: if the <code>shouldReturn</code> flag is true, a <code>return</code> keyword is added before the function call. If the return value requires casting, a <code>reinterpret_cast</code> to the C return type is inserted before the call. Finally, the function call itself is generated: if the function is static, it is called as <code>className::methodName</code>; if not, <code>reinterpret_cast</code> is used to cast the passed <code>self</code> pointer to the object pointer and the function is called as <code>self-&gt;methodName</code>.
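These three rules can be sketched as a single string-building function. <code>emitCall</code> and its parameters are hypothetical names chosen for the example; <code>returnCast</code> is the C return type to cast to, or empty when no cast is needed.

```cpp
#include <string>

// Builds the statement that forwards a wrapper call to the C++ method.
std::string emitCall(const std::string& className,
                     const std::string& methodName,
                     const std::string& args,
                     bool isStatic,
                     bool shouldReturn,
                     const std::string& returnCast) {
    std::string call;
    if (isStatic) {
        // Static functions are called through the class name.
        call = className + "::" + methodName + "(" + args + ")";
    } else {
        // Member functions are called through the cast `self` handle.
        call = "reinterpret_cast<" + className + "*>(self)->" + methodName +
               "(" + args + ")";
    }
    if (!returnCast.empty()) {
        call = "reinterpret_cast<" + returnCast + ">(" + call + ")";
    }
    std::string statement = shouldReturn ? "return " : "";
    return statement + call + ";";
}
```

Note that this sketch casts the result directly, which only makes sense when the C++ method returns a pointer; a method returning a reference would need its address taken first.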

Next, let’s generate the function declaration in C by concatenating the related strings together:

In C, the function declaration has the form <code>returnType className_methodName</code>. In addition, as explained earlier, if the function is overloaded, a number is added to the end of the function name. A map from function names to integers keeps track of how many times a function of the given class has appeared. If it appears more than once, a suffix <code>_n</code> is added to the end of the name, where <code>n</code> is derived from the number of times the function has appeared. Finally, we add an opening parenthesis after the function name, followed by the <code>self</code> string, which is empty when no object pointer is passed to the function, and start generating the argument list:

To generate the argument list in C, we iterate through the method parameters using <code>getNumParams</code> and the <code>parameters</code> array. The C type of each argument is determined by calling <code>determineCType</code>, and depending on the type, the argument might require casting. After generating the argument list, we simply close the parenthesis and write the generated function declaration and body to the header and body string streams:
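The separator handling is the only subtle part of building the parameter list, so here is a small sketch of it. <code>emitParams</code> is a hypothetical helper, and each parameter is given as an already-determined (C type, name) pair rather than being read from Clang.

```cpp
#include <string>
#include <utility>
#include <vector>

// Builds the C parameter list. `hasSelf` says whether the wrapper starts
// with a `self` handle of type `selfType`.
std::string emitParams(
    const std::string& selfType, bool hasSelf,
    const std::vector<std::pair<std::string, std::string>>& params) {
    std::string out;
    std::string separator;  // empty until something has been written
    if (hasSelf) {
        out = selfType + " self";
        separator = ", ";
    }
    for (const auto& p : params) {
        out += separator + p.first + " " + p.second;
        separator = ", ";   // every later parameter needs a comma
    }
    return out;
}
```

Starting with an empty separator when there is no <code>self</code> is what keeps constructors and static functions from getting a stray leading comma.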

As explained earlier, the <code>BodyOS</code> and <code>HeaderOS</code> streams are flushed to their files at the end. To build and install the binary, clone the repo and issue the following commands (cmake and make must be installed):

The binary is installed under <code>/usr/local/bin</code> and you can use it under the uThreads directory as follows:

Another issue with the current approach arises when a class has no public constructor (e.g., the uThread class does not have a public constructor, and uThread instances are created using the uThreads::create function): the generated constructor wrapper is wrong and needs to be updated by hand. A possible fix that would further improve the approach is to annotate the C++ source code with the intended C interface and, wherever an annotation is present, use it instead of generating the function.

Although the C interface generator still has shortcomings when it comes to return types, template functions, and classes from outside uThreads, this exercise shows the potential of the approach for automating the generation of C interfaces from C++ source code, especially for large projects.
