Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add to_buffer and helper methods in layout builder #2766

Merged
merged 3 commits into from
Oct 23, 2023

Conversation

ManasviGoyal
Copy link
Collaborator

@ManasviGoyal ManasviGoyal commented Oct 23, 2023

Added some methods to be able to the features of LayoutBuilder.h in C where std::map can't be used.

  • to_buffer method for each builder to copy and concatenate the data of each builder buffer to user-defined pointer based on the given node name instead of passing a map.
  • some other helper functions to extract the names, sizes (in bytes) and the number of buffers in a separate namespace.

@codecov
Copy link

codecov bot commented Oct 23, 2023

Codecov Report

Merging #2766 (5e36dac) into main (fa4b2bc) will not change coverage.
The diff coverage is n/a.

Additional details and impacted files

@ManasviGoyal ManasviGoyal temporarily deployed to docs October 23, 2023 13:23 — with GitHub Actions Inactive
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this will make it much more convenient to fill buffers from a C interface, which has to pass each buffer individually. This implementation puts the recursion in the LayoutBuilder classes themselves, rather than outside of them.

Is our C/C++ tidy formatter removing the curly brackets from single-statement if/for bodies? I'm surprised it's not enforcing them. According to https://stackoverflow.com/q/26111162/1623645, it's either named readability-braces-around-statements in clang-tidy or InsertBraces in clang-format.

header-only/layout-builder/awkward/LayoutBuilder.h Outdated Show resolved Hide resolved
header-only/layout-builder/awkward/LayoutBuilder.h Outdated Show resolved Hide resolved
header-only/layout-builder/awkward/LayoutBuilder.h Outdated Show resolved Hide resolved
header-only/layout-builder/awkward/LayoutBuilder.h Outdated Show resolved Hide resolved
header-only/layout-builder/awkward/LayoutBuilder.h Outdated Show resolved Hide resolved
header-only/layout-builder/awkward/LayoutBuilder.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ManasviGoyal - good progress, thanks!

Let's combine these helper functions to fill one map to avoid a potential miss-match between the map key and its value.

Here is how ChatGPT suggests the std::map handling :-)

please, write a Python code and a C function using ctypes that wraps C++ to pass std::map to python

Certainly! To wrap a C++ function using ctypes that passes a std::map to Python, you'll need to create a C interface that can be accessed from Python. Here's an example:

  1. C++ Code (example.cpp):
#include <iostream>
#include <map>

extern "C" {
    typedef std::map<int, int> IntMap;

    IntMap* create_map() {
        return new IntMap();
    }

    void insert_to_map(IntMap* map, int key, int value) {
        (*map)[key] = value;
    }

    void print_map(IntMap* map) {
        for (const auto& entry : *map) {
            std::cout << entry.first << ": " << entry.second << std::endl;
        }
    }

    void delete_map(IntMap* map) {
        delete map;
    }
}
  1. C++ Compilation:
g++ -shared -o example.so -fPIC example.cpp

This will create a shared library example.so that can be loaded by ctypes in Python.

  1. Python Code (example.py):
import ctypes

# Load the shared library
example = ctypes.CDLL('./example.so')

# Define types for C functions
example.create_map.restype = ctypes.c_void_p
example.create_map.argtypes = []

example.insert_to_map.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_int]

example.print_map.argtypes = [ctypes.c_void_p]

example.delete_map.argtypes = [ctypes.c_void_p]

# Create a map
map_ptr = example.create_map()

# Insert some values
example.insert_to_map(map_ptr, 1, 10)
example.insert_to_map(map_ptr, 2, 20)
example.insert_to_map(map_ptr, 3, 30)

# Print the map
example.print_map(map_ptr)

# Clean up memory
example.delete_map(map_ptr)
  1. Run the Python Code:
python example.py

This code will create a std::map in C++, insert some values, print the map, and then clean up the memory.

Please make sure to adjust the compilation and library loading paths as needed. Additionally, this example assumes a Linux-like environment; if you're on a different OS, the compilation and library loading process may be slightly different.

@ianna
Copy link
Collaborator

ianna commented Oct 23, 2023

I can see how this will make it much more convenient to fill buffers from a C interface, which has to pass each buffer individually. This implementation puts the recursion in the LayoutBuilder classes themselves, rather than outside of them.

Is our C/C++ tidy formatter removing the curly brackets from single-statement if/for bodies? I'm surprised it's not enforcing them. According to https://stackoverflow.com/q/26111162/1623645, it's either named readability-braces-around-statements in clang-tidy or InsertBraces in clang-format.

I think the clang-tidy is reinforcing it :-)
Since we do not have a .clang-tidy configuration in the project it uses the following settings:

---
Checks:          'clang-diagnostic-*,clang-analyzer-*'
WarningsAsErrors: ''
HeaderFilterRegex: ''
AnalyzeTemporaryDtors: false
FormatStyle:     none
User:            yana
CheckOptions:
  llvm-else-after-return.WarnOnConditionVariables: 'false'
  modernize-loop-convert.MinConfidence: reasonable
  modernize-replace-auto-ptr.IncludeStyle: llvm
  cert-str34-c.DiagnoseSignedUnsignedCharComparisons: 'false'
  google-readability-namespace-comments.ShortNamespaceLines: '10'
  cert-err33-c.CheckedFunctions: '::aligned_alloc;::asctime_s;::at_quick_exit;::atexit;::bsearch;::bsearch_s;::btowc;::c16rtomb;::c32rtomb;::calloc;::clock;::cnd_broadcast;::cnd_init;::cnd_s
ignal;::cnd_timedwait;::cnd_wait;::ctime_s;::fclose;::fflush;::fgetc;::fgetpos;::fgets;::fgetwc;::fopen;::fopen_s;::fprintf;::fprintf_s;::fputc;::fputs;::fputwc;::fputws;::fread;::freopen;::
freopen_s;::fscanf;::fscanf_s;::fseek;::fsetpos;::ftell;::fwprintf;::fwprintf_s;::fwrite;::fwscanf;::fwscanf_s;::getc;::getchar;::getenv;::getenv_s;::gets_s;::getwc;::getwchar;::gmtime;::gmt
ime_s;::localtime;::localtime_s;::malloc;::mbrtoc16;::mbrtoc32;::mbsrtowcs;::mbsrtowcs_s;::mbstowcs;::mbstowcs_s;::memchr;::mktime;::mtx_init;::mtx_lock;::mtx_timedlock;::mtx_trylock;::mtx_u
nlock;::printf_s;::putc;::putwc;::raise;::realloc;::remove;::rename;::scanf;::scanf_s;::setlocale;::setvbuf;::signal;::snprintf;::snprintf_s;::sprintf;::sprintf_s;::sscanf;::sscanf_s;::strch
r;::strerror_s;::strftime;::strpbrk;::strrchr;::strstr;::strtod;::strtof;::strtoimax;::strtok;::strtok_s;::strtol;::strtold;::strtoll;::strtoul;::strtoull;::strtoumax;::strxfrm;::swprintf;::
swprintf_s;::swscanf;::swscanf_s;::thrd_create;::thrd_detach;::thrd_join;::thrd_sleep;::time;::timespec_get;::tmpfile;::tmpfile_s;::tmpnam;::tmpnam_s;::tss_create;::tss_get;::tss_set;::unget
c;::ungetwc;::vfprintf;::vfprintf_s;::vfscanf;::vfscanf_s;::vfwprintf;::vfwprintf_s;::vfwscanf;::vfwscanf_s;::vprintf_s;::vscanf;::vscanf_s;::vsnprintf;::vsnprintf_s;::vsprintf;::vsprintf_s;
::vsscanf;::vsscanf_s;::vswprintf;::vswprintf_s;::vswscanf;::vswscanf_s;::vwprintf_s;::vwscanf;::vwscanf_s;::wcrtomb;::wcschr;::wcsftime;::wcspbrk;::wcsrchr;::wcsrtombs;::wcsrtombs_s;::wcsst
r;::wcstod;::wcstof;::wcstoimax;::wcstok;::wcstok_s;::wcstol;::wcstold;::wcstoll;::wcstombs;::wcstombs_s;::wcstoul;::wcstoull;::wcstoumax;::wcsxfrm;::wctob;::wctrans;::wctype;::wmemchr;::wpr
intf_s;::wscanf;::wscanf_s;'
  cert-oop54-cpp.WarnOnlyIfThisHasSuspiciousField: 'false'
  cert-dcl16-c.NewSuffixes: 'L;LL;LU;LLU'
  google-readability-braces-around-statements.ShortStatementLines: '1'
  cppcoreguidelines-non-private-member-variables-in-classes.IgnoreClassesWithAllMemberVariablesBeingPublic: 'true'
  google-readability-namespace-comments.SpacesBeforeComments: '2'
  modernize-loop-convert.MaxCopySize: '16'
  modernize-pass-by-value.IncludeStyle: llvm
  modernize-use-nullptr.NullMacros: 'NULL'
  llvm-qualified-auto.AddConstToQualified: 'false'
  modernize-loop-convert.NamingStyle: CamelCase
  llvm-else-after-return.WarnOnUnfixable: 'false'
  google-readability-function-size.StatementThreshold: '800'
...

@jpivarski
Copy link
Member

Not to be down on ChatGPT, but I prefer Manasvi's solution. Instead of creating an temporary object that gets filled and later has to be deleted, Manasvi's solution goes directly to the thing we ultimately want to do: get the buffers to the right depth of the LayoutBuilder tree so that they can be filled.

I'll admit that there are performance benefits to making the temporary std::map. In Manasvi's solution, every name is compared with the name of every node in the tree, and those are string comparisons, whereas with a std::map<std::string, void*>, there's only one tree-traversal for all buffers, and keys of the map are first checked by comparing hashes before string comparisons. At the absolute worst, there can be thousands of nodes in a tree, so we're comparing $\mathcal{O}(n^2)$ string comparisons with $\mathcal{O}(n)$ hashmap lookups, and $n \approx 1000$. A million string comparisons sounds bad, but I'll bet it's not any more than a few milliseconds, and it's only done once per file/Awkward Array. Since this is not a speed-critical part of the code, I would favor a million string comparisons over having to delete a temporary std::map.

But if we go with the temporary std::map (and all of that code would make more sense in the Kaitai-Awkward runtime, rather than here in Awkward Array), then the Python needs to ensure that the map pointer gets deleted, even if there are errors.

# Create a map
map_ptr = example.create_map()

try:
    # Insert some values
    example.insert_to_map(map_ptr, 1, 10)
    example.insert_to_map(map_ptr, 2, 20)
    example.insert_to_map(map_ptr, 3, 30)

    # There's no reason to ever print the map

finally:
    # Clean up memory
    example.delete_map(map_ptr)

That try-finally logic applies to the LayoutBuilder's opaque pointer, too, but it would be a different try-finally because the try block has to encompass all of the code after the pointer is constructed and before it is deleted, and the LayoutBuilder and std::map would be constructed at different times. Even if they're constructed one after the other, the two try blocks would be nested because one is on a line immediately following the other.

(This is the kind of complexity I wanted to avoid by not having intermediate objects.)

@jpivarski
Copy link
Member

We have both a .clang-tidy and a .clang-format in awkward-cpp, right? Do they apply to the header-only code?

@ianna
Copy link
Collaborator

ianna commented Oct 23, 2023

We have both a .clang-tidy and a .clang-format in awkward-cpp, right? Do they apply to the header-only code?

No, I don't think so

@ianna
Copy link
Collaborator

ianna commented Oct 23, 2023

Not to be down on ChatGPT, but I prefer Manasvi's solution. Instead of creating an temporary object that gets filled and later has to be deleted, Manasvi's solution goes directly to the thing we ultimately want to do: get the buffers to the right depth of the LayoutBuilder tree so that they can be filled.

I'll admit that there are performance benefits to making the temporary std::map. In Manasvi's solution, every name is compared with the name of every node in the tree, and those are string comparisons, whereas with a std::map<std::string, void*>, there's only one tree-traversal for all buffers, and keys of the map are first checked by comparing hashes before string comparisons. At the absolute worst, there can be thousands of nodes in a tree, so we're comparing O(n2) string comparisons with O(n) hashmap lookups, and n≈1000. A million string comparisons sounds bad, but I'll bet it's not any more than a few milliseconds, and it's only done once per file/Awkward Array. Since this is not a speed-critical part of the code, I would favor a million string comparisons over having to delete a temporary std::map.

But if we go with the temporary std::map (and all of that code would make more sense in the Kaitai-Awkward runtime, rather than here in Awkward Array), then the Python needs to ensure that the map pointer gets deleted, even if there are errors.

# Create a map
map_ptr = example.create_map()

try:
    # Insert some values
    example.insert_to_map(map_ptr, 1, 10)
    example.insert_to_map(map_ptr, 2, 20)
    example.insert_to_map(map_ptr, 3, 30)

    # There's no reason to ever print the map

finally:
    # Clean up memory
    example.delete_map(map_ptr)

That try-finally logic applies to the LayoutBuilder's opaque pointer, too, but it would be a different try-finally because the try block has to encompass all of the code after the pointer is constructed and before it is deleted, and the LayoutBuilder and std::map would be constructed at different times. Even if they're constructed one after the other, the two try blocks would be nested because one is on a line immediately following the other.

(This is the kind of complexity I wanted to avoid by not having intermediate objects.)

I'm not worried about the performance, but rather a potential issues when a user loads multiple files, gets the sizes first for all files, then gets the names, etc. The results could be out of order. The helpers should be hidden.

And, yes, I think the anonymous helpers would make more sense in an auto-generated part, or outside of the LayoutBuilder header.

@ManasviGoyal
Copy link
Collaborator Author

ManasviGoyal commented Oct 23, 2023

And, yes, I think the anonymous helpers would make more sense in an auto-generated part, or outside of the LayoutBuilder header.

Yes, you are right. The anonymous helpers don't need to be in the LayoutBuilder header as they serve no purpose there. I don't think they will be needed anywhere else later besides the kaitai code. So, they can be moved to the generated code.

@ManasviGoyal
Copy link
Collaborator Author

Is our C/C++ tidy formatter removing the curly brackets from single-statement if/for bodies? I'm surprised it's not enforcing them. According to https://stackoverflow.com/q/26111162/1623645, it's either named readability-braces-around-statements in clang-tidy or InsertBraces in clang-format.

It's a bit uneven. There are few places where it is enforced while in other places it's not.

@jpivarski
Copy link
Member

I'm not worried about the performance, but rather a potential issues when a user loads multiple files, gets the sizes first for all files, then gets the names, etc. The results could be out of order. The helpers should be hidden.

It's already the case that a single LayoutBuilder instance can't be used with multiple files. It's a stateful object that is created when a file is opened, goes through all the sequential steps of snapshot-generation when it's done reading, and then is deleted when the file is closed. That's true in both its C++ and its C interface—even in the C++ interface, the snapshot process involved multiple steps that have to go in order.

And, yes, I think the anonymous helpers would make more sense in an auto-generated part, or outside of the LayoutBuilder header.

I would expect to find std::map generation in the Kaitai-Awkward runtime (not generated, but also not in the Awkward header-only code).

But Manasvi's solution of passing each buffer and name down through the LayoutBuilder tree is most easily expressed in the Awkward header-only code, so if you're going with that implementation, I agree with Manasvi's original idea that it should be here.

I'm in favor of merging this PR if all of the curly brackets are put in.

@ManasviGoyal
Copy link
Collaborator Author

I would expect to find std::map generation in the Kaitai-Awkward runtime (not generated, but also not in the Awkward header-only code).

But Manasvi's solution of passing each buffer and name down through the LayoutBuilder tree is most easily expressed in the Awkward header-only code, so if you're going with that implementation, I agree with Manasvi's original idea that it should be here.

I'm in favor of merging this PR if all of the curly brackets are put in.

I have removed the helper functions for now and fixed the curly braces (there were more in the code). Maybe we can this PR open and discuss further about this in tomorrow. I do have a few questions regarding what should go the the kaitai-awkward-runtime and what in the generated code.

Maybe we can have something like utils.h in the kaitai-awkward-runtime for such code if we decide to put it there.

Copy link
Collaborator

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ManasviGoyal - great! thanks!

@ianna
Copy link
Collaborator

ianna commented Oct 23, 2023

@jpivarski - I'm happy with the PR. Please, feel free to merge it anytime. Thanks!

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end, this just adds a

to_buffer(void* buffer, const char* name)

interface to LayoutBuilder, which is a useful thing to have.

@jpivarski jpivarski merged commit 589b351 into main Oct 23, 2023
36 checks passed
@jpivarski jpivarski deleted the ManasviGoyal/add-methods-to-LayoutBuilder branch October 23, 2023 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants