Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BindPython ABI #2796

Merged
merged 21 commits into from
Aug 12, 2024
Merged

BindPython ABI #2796

merged 21 commits into from
Aug 12, 2024

Conversation

Vipul-Cariappa
Copy link
Contributor

No description provided.

@certik
Copy link
Contributor

certik commented Aug 3, 2024

The right solution here is to have an ASR->ASR pass that takes the bind(Python) and replaces it with calls to Python C/API. This should not be hardwired into all the backends.

If you think you can get something working with relatively short code, you can finish the PR, but I think having a pass is a better solution long term.

split one `pass_python_bind` function into
`generate_body`, `native_to_cpython`, `cpython_to_native`,
and `pass_python_bind`
@Vipul-Cariappa
Copy link
Contributor Author

The right solution here is to have an ASR->ASR pass that takes the bind(Python) and replaces it with calls to Python C/API. This should not be hardwired into all the backends.

Yes. I am implementing this using an ASR to ASR pass. The python_bind.py implements the ASR to ASR pass.

@Vipul-Cariappa Vipul-Cariappa changed the title BindPython ABI for LLVM backend BindPython ABI Aug 4, 2024
@Vipul-Cariappa Vipul-Cariappa marked this pull request as ready for review August 5, 2024 04:59
@Vipul-Cariappa
Copy link
Contributor Author

@Shaikh-Ubaid, @certik, Please have a look at this.
This ASR pass currently supports, primitive datatypes, i.e. integers, unsigned integers, floats, and strings.
I will add support for arrays in the following PRs.
I have not tested this in interactive mode. I will also do it in the following PRs.

current_scope = parent_scope;
visit_ImportFrom(*imports);
current_scope = function_scope;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these needed in AST->ASR?

Ideally the initial ASR has nothing to do with Python C/API. Then the ASR->ASR pass translates bind(python) into C/API calls in ASR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, even I was confused in my last meet with Vipul. Looking into the changes, I think I understand what he is doing here.

He has a new runtime module cpython_bindings.py which contains the declarations to the C functions in Python.h that we need. In the above change, he is just creating AST code for the following:

from cpython_bindings import Py_Initialize, Py_IsInitialized, PyRun_SimpleString, Py_DecodeLocale, PySys_SetArgv, Py_FinalizeEx, PyUnicode_FromString, PyUnicode_AsUTF8AndSize, PyImport_Import, Py_DecRef, Py_IncRef, PyObject_GetAttrString, PyTuple_New, PyTuple_SetItem, PyObject_CallObject, PyLong_AsLongLong, PyLong_AsUnsignedLongLong, PyLong_FromLongLong, PyLong_FromUnsignedLongLong, PyFloat_FromDouble, PyFloat_AsDouble, PyBool_FromLong, PyObject_IsTrue

When he visits the AST code, lpython reads/loads the runtime/cpython_bindings.py and automatically generates the declarations for the above BindC functions.

@Vipul-Cariappa Initially this approach sounded interesting to me, but now after going through the changes, it seems like we should instead write the ASR code for these directly in the python_bind pass. I think if we simply extend the current pybind_funcs array with the parameter types and return type info (for example specifying types like here

m_wa.define_func({i64}, {}, {i64, i64, i64, i64}, "print_i64", [&](){
), we should be able to run a loop and generate the ASR nodes for declaring these. Then, we ideally would not be making any changes in the AST->ASR level. And we can also remove the runtime/cpython_bindings.py file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will update the code to generate the required function declarations in the python_bind pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially this approach sounded interesting to me, but now after going through the changes, it seems like we should instead write the ASR code for these directly in the python_bind pass.

Yes, exactly.

@certik
Copy link
Contributor

certik commented Aug 5, 2024

Otherwise it looks like the right approach, it seems you are doing the type translation to/from Python in the ASR->ASR pass already, which is perfect.

@@ -321,6 +321,7 @@ int emit_c(const std::string &infile,
pass_manager.use_default_passes(true);
compiler_options.po.always_run = true;
compiler_options.po.run_fun = "f";
compiler_options.po.c_backend = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have --backend=c which means the backend is c. Another c_backend could seem confusing. I think we should call it something like c_skip_bindpy_pass.

We should also enable this flag in emit_c_to_file().

// backend as it would be handled by this ASR pass.
return;
}
bool bindpython_used = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need not check for bindpython_used here, and just assume that bindpython is used when --enable-cpython is provided.

We skip this whole pass if --enable-cpython is not provided.

Comment on lines 175 to 177
ASR::symbol_t *sym_PyRun_SimpleString = parent_scope.get_symbol("PyRun_SimpleString");
Vec<ASR::call_arg_t> args_PyRun_SimpleString;
s.from_str(al, "import sys; sys.path.append(\".\")");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to run this? I think the current example in integration_tests/bindpy_05.py works without it, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the bindpy_05.py example does not require this. But it is required for other tests example bindpy_01.py.

We append the current path to CPython's module search path. If we do not do this, CPython will fail with ModuleNotFoundError, because it only looks for modules installed using pip or the builtins/CPython standard library.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

argv1: CPtr = Py_DecodeLocale("", empty_c_void_p())
PySys_SetArgv(1, pointer(argv1, i64))
should help. I believe it adds the current directory . to the path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try this.

What does pointer(argv1, i64) do? What does the pointer function do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured it out.

I am no longer using PyRun_SimpleString and appending the path. I am using Py_DecodeLocale and PySys_SetArgv.

// reassignment
f.m_body = body.p;
f.n_body = body.n;
ASRUtils::get_FunctionType(f)->m_abi = ASR::abiType::BindC;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to make the abi as BindC? I think we should be making it Source?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Will change it to Source in the next commit.

Vec<ASR::call_arg_t> args_PyLong_AsLongLong;
args_PyLong_AsLongLong.reserve(al, 1);
args_PyLong_AsLongLong.push_back(al, {f.base.base.loc, exp});
conv_result = ASRUtils::EXPR(ASR::make_Cast_t(al, f.base.base.loc,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do Cast_t here (and the rest of the types below)? Isn't the returned value by the specific function (in this case PyLong_AsLongLong) already of the desired type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of PyLong_AsLongLong is
long long PyLong_AsLongLong(PyObject*). i.e.
i64 PyLong_AsLongLong(CPtr).

Python does not have PyLongAsInt that returns an int (i32).

The Cast_t is required in the following situation.
Assume the following PythonBind function

@pythoncall(module="mypylib")
def f(a: i32) -> i32:
    pass

or

@pythoncall(module="mypylib")
def f(a: i8) -> i8:
    pass

The Cast_t casts the result of PyLong_AsLongLong from i64 to i32 or i8 according to the actual type required.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks.

current_scope = parent_scope;
visit_ImportFrom(*imports);
current_scope = function_scope;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, even I was confused in my last meet with Vipul. Looking into the changes, I think I understand what he is doing here.

He has a new runtime module cpython_bindings.py which contains the declarations to the C functions in Python.h that we need. In the above change, he is just creating AST code for the following:

from cpython_bindings import Py_Initialize, Py_IsInitialized, PyRun_SimpleString, Py_DecodeLocale, PySys_SetArgv, Py_FinalizeEx, PyUnicode_FromString, PyUnicode_AsUTF8AndSize, PyImport_Import, Py_DecRef, Py_IncRef, PyObject_GetAttrString, PyTuple_New, PyTuple_SetItem, PyObject_CallObject, PyLong_AsLongLong, PyLong_AsUnsignedLongLong, PyLong_FromLongLong, PyLong_FromUnsignedLongLong, PyFloat_FromDouble, PyFloat_AsDouble, PyBool_FromLong, PyObject_IsTrue

When he visits the AST code, lpython reads/loads the runtime/cpython_bindings.py and automatically generates the declarations for the above BindC functions.

@Vipul-Cariappa Initially this approach sounded interesting to me, but now after going through the changes, it seems like we should instead write the ASR code for these directly in the python_bind pass. I think if we simply extend the current pybind_funcs array with the parameter types and return type info (for example specifying types like here

m_wa.define_func({i64}, {}, {i64, i64, i64, i64}, "print_i64", [&](){
), we should be able to run a loop and generate the ASR nodes for declaring these. Then, we ideally would not be making any changes in the AST->ASR level. And we can also remove the runtime/cpython_bindings.py file.

Copy link
Collaborator

@Shaikh-Ubaid Shaikh-Ubaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, Vipul! Shared some comments. Rest looks good.

@Vipul-Cariappa Vipul-Cariappa marked this pull request as draft August 10, 2024 14:45
@Vipul-Cariappa Vipul-Cariappa marked this pull request as ready for review August 11, 2024 10:18
Comment on lines 379 to 489
F64,
PTR,
PTR_TO_PTR,
};

struct F {
std::string m_name;
std::vector<enum T> args;
enum T retvar;
};

ASR::expr_t *type_enum_to_asr_expr(Allocator &al, enum T t, Location &l, std::string n, SymbolTable *current_scope, ASR::intentType intent) {
ASR::ttype_t *type = nullptr;

Str s;
s.from_str(al, n);

switch (t) {
case VOID:
return nullptr;
case I1:
type = ASRUtils::TYPE(ASR::make_Logical_t(al, l, 4));
break;
case I8:
type = ASRUtils::TYPE(ASR::make_Integer_t(al, l, 1));
break;
case I32:
type = ASRUtils::TYPE(ASR::make_Integer_t(al, l, 4));
break;
case I64:
type = ASRUtils::TYPE(ASR::make_Integer_t(al, l, 8));
break;
case U8:
type = ASRUtils::TYPE(ASR::make_UnsignedInteger_t(al, l, 1));
break;
case U32:
type = ASRUtils::TYPE(ASR::make_UnsignedInteger_t(al, l, 4));
break;
case U64:
type = ASRUtils::TYPE(ASR::make_UnsignedInteger_t(al, l, 8));
break;
case F32:
type = ASRUtils::TYPE(ASR::make_Real_t(al, l, 4));
break;
case F64:
type = ASRUtils::TYPE(ASR::make_Real_t(al, l, 8));
break;
case STR:
type = ASRUtils::TYPE(ASR::make_Character_t(al, l, 1, -2, nullptr));
break;
case PTR:
type = ASRUtils::TYPE(ASR::make_CPtr_t(al, l));
break;
case PTR_TO_PTR:
type = ASRUtils::TYPE(ASR::make_Pointer_t(al, l, ASRUtils::TYPE(ASR::make_CPtr_t(al, l))));
break;
}
LCOMPILERS_ASSERT(type);
ASR::symbol_t *v = ASR::down_cast<ASR::symbol_t>(ASR::make_Variable_t(al, l, current_scope, s.c_str(al), nullptr,
0, intent, nullptr, nullptr, ASR::storage_typeType::Default,
type, nullptr, ASR::abiType::BindC, ASR::Public,
ASR::presenceType::Required, true)); // intent == ASRUtils::intent_in ? true : false
current_scope->add_symbol(n, v);
return ASRUtils::EXPR(ASR::make_Var_t(al, l, v));
}

void declare_functions(Allocator &al, std::vector<struct F> fns, SymbolTable *parent_scope,
std::string header_name="") {
Location *l = al.make_new<Location>();
l->first = 0;
l->last = 0;

Str s;
char *c_header = nullptr;
if (header_name != "") {
s.from_str(al, header_name);
c_header = s.c_str(al);
}

for (auto i: fns) {
Vec<ASR::expr_t*> args;
args.reserve(al, i.args.size());
s.from_str(al, i.m_name);
SymbolTable *current_scope = al.make_new<SymbolTable>(parent_scope);
int c = 0;
for (auto j: i.args) {
args.push_back(al, type_enum_to_asr_expr(al, j, *l, i.m_name + std::to_string(++c), current_scope,
ASRUtils::intent_in));
}
ASR::expr_t *retval = type_enum_to_asr_expr(al, i.retvar, *l, "_lpython_return_variable", current_scope,
ASRUtils::intent_return_var);
char *fn_name = s.c_str(al);
ASR::asr_t *f = ASRUtils::make_Function_t_util(al, *l, current_scope, fn_name, nullptr, 0, args.p, args.n,
nullptr, 0, retval, ASR::abiType::BindC, ASR::accessType::Public,
ASR::deftypeType::Interface, nullptr, false, false, false, false, false, nullptr, 0,
false, false, false, c_header);

parent_scope->add_symbol(i.m_name, ASR::down_cast<ASR::symbol_t>(f));
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about moving these functions to asr_utils.cpp.
These are generic functions that can be used to create new BindC Interface function declarations.

Any thoughts? @certik @Shaikh-Ubaid.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. Also see/consider if pass_utils can be a better option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel asr_utils.cpp is the better option. In the future, if we need to generate function declarations for using any C function, we could just use these functions. Putting it in asr_utils.cpp will let us use it in the AST to ASR step also (if necessary).

@Vipul-Cariappa
Copy link
Contributor Author

@Shaikh-Ubaid, I believe if the CI passes and these changes look good to you, then we can merge.

fns.push_back({"PyBool_FromLong", {ASRUtils::I32}, ASRUtils::PTR});
fns.push_back({"PyObject_IsTrue", {ASRUtils::PTR}, ASRUtils::I32});

bool included_cpython_funcs = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think included_cpython_funcs is not needed. We can simply call ASRUtils::declare_functions(al, fns, *l, unit.m_symtab, "Python.h") here. We skip the whole python_bind pass if --enable-cpython is not provided.

Also suggested here #2796 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --enable-cpython flag sets CompilerOptions.enable_cpython to true. We do not have access to CompilerOptions within an ASR pass. We only have access to PassOptions.
Should I move enable_cpython to PassOptions then? Or have it in both places in CompilerOptions and PassOptions, so that we don't break any other code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move it to pass options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the changes. Hope I understood it correctly and this is what you wanted.

Copy link
Collaborator

@Shaikh-Ubaid Shaikh-Ubaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared one comment here https://github.com/lcompilers/lpython/pull/2796/files#r1713010953. I think its good to merge after that. Thanks for this @Vipul-Cariappa!

Copy link
Contributor

@certik certik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job. After @Shaikh-Ubaid is happy, this can be merged. It looks clean and small, the pass has less than 460 lines and it does everything. I think this is the way to go.

Is this now used by default for all Python bindings, meaning all our existing tests pass with this new approach?

@Vipul-Cariappa
Copy link
Contributor Author

@certik
The tests that use arrays (i.e. call into a CPython function that accepts arrays or returns an array) are skipped.
Conversion from LPython arrays to CPython arrays (numpy array) needs to be implemented. I will do it in the next PR.

Copy link
Collaborator

@Shaikh-Ubaid Shaikh-Ubaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks Vipul! It looks great! I appreciate it.

@Shaikh-Ubaid Shaikh-Ubaid merged commit 4b06d70 into lcompilers:main Aug 12, 2024
18 checks passed
@Vipul-Cariappa Vipul-Cariappa deleted the BindPython branch August 12, 2024 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants