Commit

[CLIENT-2258] Backport 7.*: Remove auto-serialization and auto-deserialization (#485)

* Return a client error when writing unsupported Python types to the server
* Convert AS_BYTES_PYTHON server types to bytearrays when reading data from the server
* Remove support for serializing booleans as the AS_BYTES_PYTHON server type
* Remove aerospike.SERIALIZER_PYTHON
* client.put(): set serializer parameter's default value to aerospike.SERIALIZER_NONE
* Add extra tests to verify client.operate() and expression behavior with instance-level serializers and deserializers
* client.put(): fix bug where Python bytes bin values can't be used if serializer parameter is set to SERIALIZER_NONE
* Docs: data mapping: add headers
* Add build wheels workflow
* Change send_bool_as default value to AS_BOOL

Co-authored-by: dwelch-spike <[email protected]>
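
For applications that relied on the removed behavior, the following is a minimal migration sketch rather than part of the commit. It assumes the legacy blobs were written with Python's pickle module, and it simulates the `bytearray` a read would now return instead of calling the client. The names `decode_legacy_blob` and `legacy_bin_value` are hypothetical.

```python
import pickle


def decode_legacy_blob(raw):
    """Unpickle a legacy bin value that is now returned as raw bytes.

    Only call this on bins known to hold pickled data from a trusted
    source: unpickling untrusted bytes can execute arbitrary code.
    """
    return pickle.loads(bytes(raw))


# Simulated read result for a legacy AS_BYTES_PYTHON bin (assumption:
# the blob is an ordinary pickle payload).
legacy_bin_value = bytearray(pickle.dumps({"tags": ("a", "b")}))
decoded = decode_legacy_blob(legacy_bin_value)

# Unsupported types such as tuples are no longer pickled automatically,
# so the application serializes them itself before writing.
payload = pickle.dumps(("a", "b"))
```

The resulting `payload` is a plain `bytes` value, which the commit message says can be written even when the serializer is `SERIALIZER_NONE`.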
juliannguyen4 and dwelch-spike authored Aug 17, 2023
1 parent fd58b5d commit 5610071
Showing 26 changed files with 298 additions and 454 deletions.
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-7.1.1
+7.2.0
12 changes: 1 addition & 11 deletions doc/aerospike.rst
@@ -920,18 +920,14 @@ Job Statuses
Serialization Constants
-----------------------
-.. data:: SERIALIZER_PYTHON
-Use the cPickle serializer to handle unsupported types (default)
.. data:: SERIALIZER_USER
Use a user-defined serializer to handle unsupported types. Must have \
been registered for the aerospike class or configured for the Client object
.. data:: SERIALIZER_NONE
-Do not serialize bins whose data type is unsupported
+Do not serialize bins whose data type is unsupported (default)
.. versionadded:: 1.0.47
@@ -942,12 +938,6 @@ Send Bool Constants
Specifies how the Python client will write Python booleans.
-.. data:: PY_BYTES
-Write Python Booleans as PY_BYTES_BLOBs.
-This is Python's native boolean type.
.. data:: INTEGER
Write Python Booleans as integers.
2 changes: 1 addition & 1 deletion doc/client.rst
@@ -92,7 +92,7 @@ Record Operations
.. class:: Client
:noindex:

-.. method:: put(key, bins: dict[, meta: dict[, policy: dict[, serializer=aerospike.SERIALIZER_PYTHON]]])
+.. method:: put(key, bins: dict[, meta: dict[, policy: dict[, serializer=aerospike.SERIALIZER_NONE]]])

Create a new record, or remove / add bins to a record.

35 changes: 19 additions & 16 deletions doc/data_mapping.rst
@@ -6,25 +6,28 @@ Python Data Mappings

.. rubric:: How Python types map to server types

+Default Behavior
+----------------
+
By default, the :py:class:`~aerospike.Client` maps the supported Python types to Aerospike server \
`types <https://docs.aerospike.com/server/guide/data-types/overview>`_. \
-When an unsupported type is encountered by the module, it uses \
-`cPickle <https://docs.python.org/2/library/pickle.html?highlight=cpickle#module-cPickle>`_ \
-to serialize and deserialize the data, storing it in the server as a blob with \
-`'Python encoding' <https://developer.aerospike.com/udf/api/bytes#encoding-type>`_ \
-(`AS_BYTES_PYTHON <https://docs.aerospike.com/apidocs/c/d0/dd4/as__bytes_8h.html#a0cf2a6a1f39668f606b19711b3a98bf3>`_).
-
-The functions :func:`~aerospike.set_serializer` and :func:`~aerospike.set_deserializer` \
-allow for user-defined functions to handle serialization, instead. The user provided function will be run instead of cPickle. \
+When an unsupported type is encountered by the module:
+
+1. When sending data to the server, it does not serialize the type and will throw an error.
+2. When reading `AS_BYTES_PYTHON` types from the server, it returns the raw bytes as a :class:`bytearray`.
+To deserialize this data, the application must use cPickle instead of relying on the client to do it automatically.
+
+Serializers
+-----------
+
+However, the functions :func:`~aerospike.set_serializer` and :func:`~aerospike.set_deserializer` \
+allow for user-defined functions to handle serialization.
The serialized data is stored in the server with generic encoding \
-(`AS_BYTES_BLOB <https://docs.aerospike.com/apidocs/c/d0/dd4/as__bytes_8h.html#a0cf2a6a1f39668f606b19711b3a98bf3>`_). \
+(`AS_BYTES_BLOB <https://docs.aerospike.com/apidocs/c/d0/dd4/as__bytes_8h.html#a0cf2a6a1f39668f606b19711b3a98bf3>`_).
This type allows the storage of binary data readable by Aerospike Clients in other languages. \
The *serialization* config parameter of :func:`aerospike.client` registers an \
instance-level pair of functions that handle serialization.
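
As a rough illustration of a user-defined serializer pair, the sketch below uses JSON so the stored bytes stay readable by other languages. It is not taken from the commit. It assumes each callback receives a single value and returns the converted result, which should be checked against the client documentation. The function names are hypothetical, and the registration calls are left as comments because they need the aerospike package installed.

```python
import json


def json_serializer(value):
    """Encode an otherwise unsupported value as UTF-8 JSON bytes."""
    # Tuples and sets have no JSON form, so they are stored as lists.
    if isinstance(value, (tuple, set)):
        value = list(value)
    return json.dumps(value).encode("utf-8")


def json_deserializer(blob):
    """Decode bytes produced by json_serializer back into Python objects."""
    return json.loads(bytes(blob).decode("utf-8"))


# Module-level registration with the functions named in the text above
# (commented out so the sketch runs without the client installed):
# import aerospike
# aerospike.set_serializer(json_serializer)
# aerospike.set_deserializer(json_deserializer)

round_tripped = json_deserializer(json_serializer(("a", 1)))
print(round_tripped)  # ['a', 1]
```

Note that the round trip is lossy for tuples and sets: they come back as lists, which is the price of a language-neutral encoding.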

-Unless a user specified serializer has been provided, all other types will be stored as Python specific bytes. \
-Python specific bytes may not be readable by Aerospike Clients for other languages.

.. warning::

*Aerospike is introducing a new boolean data type in server version 5.6.*
@@ -34,10 +37,13 @@ Python specific bytes may not be readable by Aerospike Clients for other languag
It is important to consider how other clients connected to the Aerospike database write booleans in order to maintain cross client compatibility.
For example, if there is a client that reads and writes booleans as integers, then another Python client working with the same data should do the same thing.

-``send_bool_as`` can be set so the client writes Python booleans as ``AS_BYTES_PYTHON``, integers, or the new server boolean type.
+``send_bool_as`` can be set so the client writes Python booleans as integers or the Aerospike native boolean type.

All versions before ``6.x`` wrote Python booleans as ``AS_BYTES_PYTHON``.

+Data Mappings
+-------------
+
The following table shows which Python types map directly to Aerospike server types.

+---------------------------------+------------------------+
@@ -71,9 +77,6 @@ The following table shows which Python types map directly to Aerospike server ty

It is possible to nest these datatypes. For example a list may contain a dictionary, or a dictionary may contain a list as a value.

-Unless a user specified serializer has been provided, all other types will be stored as Python specific bytes. \
-Python specific bytes may not be readable by Aerospike Clients for other languages.

.. _integer: https://docs.aerospike.com/server/guide/data-types/scalar-data-types#integer
.. _string: https://docs.aerospike.com/server/guide/data-types/scalar-data-types#string
.. _double: https://docs.aerospike.com/server/guide/data-types/scalar-data-types#double
2 changes: 1 addition & 1 deletion doc/query.rst
@@ -232,7 +232,7 @@ Assume this boilerplate code is run before all examples below:
:param str module: the name of the Lua module.
:param str function: the name of the Lua function within the *module*.
:param list arguments: optional arguments to pass to the *function*. NOTE: these arguments must be types supported by Aerospike See: `supported data types <http://www.aerospike.com/docs/guide/data-types.html>`_.
-If you need to use an unsuported type, (e.g. set or tuple) you can use a serializer like pickle first.
+If you need to use an unsupported type, (e.g. set or tuple) you must use your own serializer.
:return: one of the supported types, :class:`int`, :class:`str`, :class:`float` (double), :class:`list`, :class:`dict` (map), :class:`bytearray` (bytes), :class:`bool`.

.. seealso:: `Developing Stream UDFs <http://www.aerospike.com/docs/udf/developing_stream_udfs.html>`_
2 changes: 1 addition & 1 deletion doc/scan.rst
@@ -44,7 +44,7 @@ Methods
:param str module: the name of the Lua module.
:param str function: the name of the Lua function within the *module*.
:param list arguments: optional arguments to pass to the *function*. NOTE: these arguments must be types supported by Aerospike See: `supported data types <http://www.aerospike.com/docs/guide/data-types.html>`_.
-If you need to use an unsuported type, (e.g. set or tuple) you can use a serializer such as pickle first.
+If you need to use an unsupported type, (e.g. set or tuple) you must use your own serializer.
:return: one of the supported types, :class:`int`, :class:`str`, :class:`float` (double), :class:`list`, :class:`dict` (map), :class:`bytearray` (bytes), :class:`bool`.

.. seealso:: `Developing Record UDFs <http://www.aerospike.com/docs/udf/developing_record_udfs.html>`_
7 changes: 3 additions & 4 deletions src/include/policy.h
@@ -37,16 +37,15 @@
*/

enum Aerospike_serializer_values {
-SERIALIZER_NONE,
-SERIALIZER_PYTHON, /* default handler for serializer type */
+SERIALIZER_NONE, /* default handler for serializer type */
+SERIALIZER_PYTHON,
SERIALIZER_JSON,
SERIALIZER_USER,
};

enum Aerospike_send_bool_as_values {
-SEND_BOOL_AS_PY_BYTES, /* default for writing Python bools */
SEND_BOOL_AS_INTEGER,
-SEND_BOOL_AS_AS_BOOL,
+SEND_BOOL_AS_AS_BOOL, /* default for writing Python bools */
};

enum Aerospike_list_operations {
2 changes: 1 addition & 1 deletion src/main/aerospike.c
@@ -153,7 +153,7 @@ static int Aerospike_Clear(PyObject *aerospike)
MOD_INIT(aerospike)
{

-const char version[8] = "7.1.1";
+const char version[8] = "7.2.0";
// Makes things "thread-safe"
PyEval_InitThreads();
int i = 0;
2 changes: 1 addition & 1 deletion src/main/client/put.c
@@ -172,7 +172,7 @@ PyObject *AerospikeClient_Put(AerospikeClient *self, PyObject *args,
PyObject *py_meta = NULL;
PyObject *py_policy = NULL;
PyObject *py_serializer_option = NULL;
-long serializer_option = SERIALIZER_PYTHON;
+long serializer_option = SERIALIZER_NONE;

// Python Function Keyword Arguments
static char *kwlist[] = {"key", "bins", "meta",
4 changes: 2 additions & 2 deletions src/main/client/type.c
@@ -905,7 +905,7 @@ static int AerospikeClient_Type_Init(AerospikeClient *self, PyObject *args,
self->has_connected = false;
self->use_shared_connection = false;
self->as = NULL;
-self->send_bool_as = SEND_BOOL_AS_PY_BYTES;
+self->send_bool_as = SEND_BOOL_AS_AS_BOOL;

if (PyArg_ParseTupleAndKeywords(args, kwds, "O:client", kwlist,
&py_config) == false) {
@@ -1333,7 +1333,7 @@ static int AerospikeClient_Type_Init(AerospikeClient *self, PyObject *args,
PyObject *py_send_bool_as = PyDict_GetItemString(py_config, "send_bool_as");
if (py_send_bool_as != NULL && PyLong_Check(py_send_bool_as)) {
int send_bool_as_temp = PyLong_AsLong(py_send_bool_as);
-if (send_bool_as_temp >= SEND_BOOL_AS_PY_BYTES &&
+if (send_bool_as_temp >= SEND_BOOL_AS_INTEGER &&
send_bool_as_temp <= SEND_BOOL_AS_AS_BOOL) {
self->send_bool_as = send_bool_as_temp;
}
94 changes: 26 additions & 68 deletions src/main/conversions.c
@@ -64,10 +64,6 @@

static bool requires_int(uint64_t op);

-static as_status py_bool_to_py_bytes_blob(AerospikeClient *self, as_error *err,
-as_static_pool *static_pool,
-PyObject *py_bool, as_bytes **target,
-int serializer_type);
static as_status py_bool_to_as_integer(as_error *err, PyObject *py_bool,
as_integer **target);
static as_status py_bool_to_as_bool(as_error *err, PyObject *py_bool,
@@ -794,15 +790,6 @@ as_status pyobject_to_val(AerospikeClient *self, as_error *err,
PyBool_Check(
py_obj)) { //TODO Change to true bool support post jump version.
switch (self->send_bool_as) {
-case SEND_BOOL_AS_PY_BYTES:;
-as_bytes *bool_bytes = NULL;
-if (py_bool_to_py_bytes_blob(self, err, static_pool, py_obj,
-&bool_bytes,
-serializer_type) != AEROSPIKE_OK) {
-return err->code;
-}
-*val = (as_val *)bool_bytes;
-break;
case SEND_BOOL_AS_AS_BOOL:;
as_boolean *converted_bool = NULL;
if (py_bool_to_as_bool(err, py_obj, &converted_bool) !=
@@ -869,17 +856,14 @@ as_status pyobject_to_val(AerospikeClient *self, as_error *err,
*val = (as_val *)as_geojson_new(geo_value, false);
}
else if (PyByteArray_Check(py_obj)) {
-as_bytes *bytes;
-GET_BYTES_POOL(bytes, static_pool, err);
-if (err->code == AEROSPIKE_OK) {
-if (serialize_based_on_serializer_policy(self, serializer_type,
-&bytes, py_obj,
-err) != AEROSPIKE_OK) {
-return err->code;
-}
-*val = (as_val *)bytes;
-}
-}
+Py_ssize_t str_len = PyByteArray_Size(py_obj);
+as_bytes *bytes = as_bytes_new(str_len);
+
+char *str = PyByteArray_AsString(py_obj);
+as_bytes_set(bytes, 0, (const uint8_t *)str, str_len);
+
+*val = (as_val *)bytes;
+}
else if (PyList_Check(py_obj)) {
as_list *list = NULL;
pyobject_to_list(self, err, py_obj, &list, static_pool,
@@ -1000,15 +984,6 @@ as_status pyobject_to_record(AerospikeClient *self, as_error *err,
PyBool_Check(
value)) { //TODO Change to true bool support post jump version.
switch (self->send_bool_as) {
-case SEND_BOOL_AS_PY_BYTES:;
-as_bytes *bool_bytes = NULL;
-if (py_bool_to_py_bytes_blob(
-self, err, static_pool, value, &bool_bytes,
-serializer_type) != AEROSPIKE_OK) {
-return err->code;
-}
-ret_val = as_record_set_bytes(rec, name, bool_bytes);
-break;
case SEND_BOOL_AS_AS_BOOL:;
as_boolean *converted_bool = NULL;
if (py_bool_to_as_bool(err, value, &converted_bool) !=
@@ -1099,18 +1074,24 @@ as_status pyobject_to_record(AerospikeClient *self, as_error *err,
char *val = PyString_AsString(value);
ret_val = as_record_set_strp(rec, name, val, false);
}
-else if (PyByteArray_Check(value)) {
-as_bytes *bytes;
-GET_BYTES_POOL(bytes, static_pool, err);
-if (err->code == AEROSPIKE_OK) {
-if (serialize_based_on_serializer_policy(
-self, serializer_type, &bytes, value, err) !=
-AEROSPIKE_OK) {
-return err->code;
-}
-ret_val = as_record_set_bytes(rec, name, bytes);
-}
-}
+else if (PyBytes_Check(value)) {
+Py_ssize_t str_len = PyBytes_Size(value);
+as_bytes *bytes = as_bytes_new(str_len);
+
+char *str = PyBytes_AsString(value);
+as_bytes_set(bytes, 0, (const uint8_t *)str, str_len);
+
+ret_val = as_record_set_bytes(rec, name, bytes);
+}
+else if (PyByteArray_Check(value)) {
+Py_ssize_t str_len = PyByteArray_Size(value);
+as_bytes *bytes = as_bytes_new(str_len);
+
+char *str = PyByteArray_AsString(value);
+as_bytes_set(bytes, 0, (const uint8_t *)str, str_len);
+
+ret_val = as_record_set_bytes(rec, name, bytes);
+}
else if (PyList_Check(value)) {
// as_list
as_list *list = NULL;
@@ -2648,29 +2629,6 @@ static bool requires_int(uint64_t op)
op == CDT_CTX_LIST_INDEX_CREATE;
}

-/*
-* py_bool_to_py_bytes_blob serializes py_bool.
-* Target should be a NULL pointer to an as_integer. py_bool_to_py_bytes_blob will get memory for target
-* from the static pool, static_pool. The pool should be destroyed after use, by the caller.
-*/
-static as_status py_bool_to_py_bytes_blob(AerospikeClient *self, as_error *err,
-as_static_pool *static_pool,
-PyObject *py_bool, as_bytes **target,
-int serializer_type)
-{
-GET_BYTES_POOL(*target, static_pool, err);
-if (err->code != AEROSPIKE_OK) {
-return err->code;
-}
-
-if (serialize_based_on_serializer_policy(self, serializer_type, target,
-py_bool, err) != AEROSPIKE_OK) {
-return err->code;
-}
-
-return AEROSPIKE_OK;
-}
-
/*
* py_bool_to_as_integer converts a python object to an as_integer based on its truth value.
* Target should be a NULL pointer to an as_integer. py_bool_to_as_integer will allocate a new
2 changes: 0 additions & 2 deletions src/main/policy.c
@@ -166,11 +166,9 @@ static AerospikeConstants aerospike_constants[] = {
{AS_POLICY_REPLICA_PREFER_RACK, "POLICY_REPLICA_PREFER_RACK"},
{AS_POLICY_COMMIT_LEVEL_ALL, "POLICY_COMMIT_LEVEL_ALL"},
{AS_POLICY_COMMIT_LEVEL_MASTER, "POLICY_COMMIT_LEVEL_MASTER"},
-{SERIALIZER_PYTHON, "SERIALIZER_PYTHON"},
{SERIALIZER_USER, "SERIALIZER_USER"},
{SERIALIZER_JSON, "SERIALIZER_JSON"},
{SERIALIZER_NONE, "SERIALIZER_NONE"},
-{SEND_BOOL_AS_PY_BYTES, "PY_BYTES"},
{SEND_BOOL_AS_INTEGER, "INTEGER"},
{SEND_BOOL_AS_AS_BOOL, "AS_BOOL"},
{AS_INDEX_STRING, "INDEX_STRING"},