-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adds Schema class and modifies schema handling #2081
base: pangea-v1alpha
Are you sure you want to change the base?
Changes from all commits
47474a9
f22246a
7670971
c37be67
aba6c10
83aacdf
7c08ca8
af02937
ef2b95d
cfa609c
1f00e76
39d4c1d
c4f5fd5
7006a31
561f05f
e3f57a6
3280e79
1837f81
94e9e6e
db6ef9e
0b0b85a
ba2795b
67c0182
3fa949c
7c77bfb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -19,7 +19,7 @@ | |||||||||||||||||||||||
import collections | ||||||||||||||||||||||||
import copy | ||||||||||||||||||||||||
import enum | ||||||||||||||||||||||||
from typing import Any, cast, Dict, Iterable, Optional, Union | ||||||||||||||||||||||||
from typing import Any, cast, Dict, Iterable, Optional, Union, List | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
from google.cloud.bigquery import _helpers | ||||||||||||||||||||||||
from google.cloud.bigquery import standard_sql | ||||||||||||||||||||||||
|
@@ -264,11 +264,11 @@ def __init__( | |||||||||||||||||||||||
self._properties["fields"] = [field.to_api_repr() for field in fields] | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@classmethod | ||||||||||||||||||||||||
def from_api_repr(cls, api_repr: dict) -> "SchemaField": | ||||||||||||||||||||||||
def from_api_repr(cls, api_repr: Dict[str, Any]) -> "SchemaField": | ||||||||||||||||||||||||
"""Return a ``SchemaField`` object deserialized from a dictionary. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
api_repr (Mapping[str, str]): The serialized representation | ||||||||||||||||||||||||
api_repr (Dict[str, str]): The serialized representation | ||||||||||||||||||||||||
of the SchemaField, such as what is output by | ||||||||||||||||||||||||
:meth:`to_api_repr`. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
@@ -489,7 +489,7 @@ def __repr__(self): | |||||||||||||||||||||||
|
||||||||||||||||||||||||
|
||||||||||||||||||||||||
def _parse_schema_resource(info): | ||||||||||||||||||||||||
"""Parse a resource fragment into a schema field. | ||||||||||||||||||||||||
"""Parse a resource fragment into a sequence of schema fields. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
info: (Mapping[str, Dict]): should contain a "fields" key to be parsed | ||||||||||||||||||||||||
|
@@ -513,25 +513,33 @@ def _build_schema_resource(fields): | |||||||||||||||||||||||
return [field.to_api_repr() for field in fields] | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
||||||||||||||||||||||||
def _to_schema_fields(schema): | ||||||||||||||||||||||||
"""Coerce `schema` to a list of schema field instances. | ||||||||||||||||||||||||
def _to_schema_fields( | ||||||||||||||||||||||||
schema: Union[Schema, List[Union[SchemaField, Dict[str, Any]]]] | ||||||||||||||||||||||||
) -> Union[Schema, List[SchemaField]]: | ||||||||||||||||||||||||
"""Convert the input to either a Schema object OR a list of SchemaField objects. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
This helper method ensures that the fields in the schema are SchemaField objects. | ||||||||||||||||||||||||
It accepts: | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
* A :class:`~google.cloud.bigquery.schema.Schema` instance: It will | ||||||||||||||||||||||||
convert items that are mappings to | ||||||||||||||||||||||||
:class:`~google.cloud.bigquery.schema.SchemaField` instances and | ||||||||||||||||||||||||
preserve foreign_type_info. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
* A list of | ||||||||||||||||||||||||
:class:`~google.cloud.bigquery.schema.SchemaField` instances. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
* A list of mappings: It will convert each of the mapping items to | ||||||||||||||||||||||||
a :class:`~google.cloud.bigquery.schema.SchemaField` instance. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
schema(Sequence[Union[ \ | ||||||||||||||||||||||||
:class:`~google.cloud.bigquery.schema.SchemaField`, \ | ||||||||||||||||||||||||
Mapping[str, Any] \ | ||||||||||||||||||||||||
]]): | ||||||||||||||||||||||||
Table schema to convert. If some items are passed as mappings, | ||||||||||||||||||||||||
their content must be compatible with | ||||||||||||||||||||||||
:meth:`~google.cloud.bigquery.schema.SchemaField.from_api_repr`. | ||||||||||||||||||||||||
schema: The schema to convert. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Returns: | ||||||||||||||||||||||||
Sequence[:class:`~google.cloud.bigquery.schema.SchemaField`] | ||||||||||||||||||||||||
The schema as a list of SchemaField objects or a Schema object. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Raises: | ||||||||||||||||||||||||
Exception: If ``schema`` is not a sequence, or if any item in the | ||||||||||||||||||||||||
sequence is not a :class:`~google.cloud.bigquery.schema.SchemaField` | ||||||||||||||||||||||||
instance or a compatible mapping representation of the field. | ||||||||||||||||||||||||
ValueError: If the items in ``schema`` are not valid. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
for field in schema: | ||||||||||||||||||||||||
|
@@ -541,6 +549,17 @@ def _to_schema_fields(schema): | |||||||||||||||||||||||
"mapping representations." | ||||||||||||||||||||||||
) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
if isinstance(schema, Schema): | ||||||||||||||||||||||||
schema = Schema( | ||||||||||||||||||||||||
[ | ||||||||||||||||||||||||
field | ||||||||||||||||||||||||
if isinstance(field, SchemaField) | ||||||||||||||||||||||||
else SchemaField.from_api_repr(field) | ||||||||||||||||||||||||
for field in schema | ||||||||||||||||||||||||
], | ||||||||||||||||||||||||
foreign_type_info=schema.foreign_type_info, | ||||||||||||||||||||||||
) | ||||||||||||||||||||||||
return schema | ||||||||||||||||||||||||
Comment on lines
+553
to
+562
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we get a Returning the original object should be sufficient.
Suggested change
This also future-proofs us a bit. In case any additional properties are added to |
||||||||||||||||||||||||
return [ | ||||||||||||||||||||||||
field if isinstance(field, SchemaField) else SchemaField.from_api_repr(field) | ||||||||||||||||||||||||
for field in schema | ||||||||||||||||||||||||
|
@@ -761,8 +780,6 @@ def serde_info(self) -> Any: | |||||||||||||||||||||||
prop = _get_sub_prop(self._properties, ["serDeInfo"]) | ||||||||||||||||||||||||
if prop is not None: | ||||||||||||||||||||||||
prop = StorageDescriptor().from_api_repr(prop) | ||||||||||||||||||||||||
print(f"DINOSAUR prop: {prop}") | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lol, oops! 😆 Could be worth a separate cleanup PR to have something small and easy to approve. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removed. |
||||||||||||||||||||||||
|
||||||||||||||||||||||||
return prop | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@serde_info.setter | ||||||||||||||||||||||||
|
@@ -886,3 +903,131 @@ def from_api_repr(cls, resource: dict) -> SerDeInfo: | |||||||||||||||||||||||
config = cls("") | ||||||||||||||||||||||||
config._properties = copy.deepcopy(resource) | ||||||||||||||||||||||||
return config | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
||||||||||||||||||||||||
class Schema(collections.UserList): | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
Represents a BigQuery schema, defining the structure and types of data. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
This class manages a list of schema fields and provides methods for | ||||||||||||||||||||||||
serialization and deserialization with the BigQuery API. It extends the | ||||||||||||||||||||||||
`collections.UserList` class to allow for list-like behavior. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
fields (Optional[List[Any]], optional): A list of SchemaField objects representing the fields | ||||||||||||||||||||||||
in the schema. Defaults to None, which creates an empty schema. | ||||||||||||||||||||||||
foreign_type_info (Optional[str], optional): Optional type information relevant for foreign | ||||||||||||||||||||||||
systems. Defaults to None. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __init__( | ||||||||||||||||||||||||
self, | ||||||||||||||||||||||||
fields: Optional[List[Any]] = None, | ||||||||||||||||||||||||
foreign_type_info: Optional[str] = None, | ||||||||||||||||||||||||
): | ||||||||||||||||||||||||
self._properties: Dict[str, Any] = {} | ||||||||||||||||||||||||
self._fields = [] if fields is None else list(fields) # Internal List | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to convert the items of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Resolving. I missed that _fields was a property with a setter. I think your code does actually convert Iterable of SchemaField or dict to the correct API representation in _properties There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unresolving. I don't think
Suggested change
And remove the |
||||||||||||||||||||||||
self.foreign_type_info = foreign_type_info | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@property | ||||||||||||||||||||||||
def foreign_type_info(self) -> Optional[str]: | ||||||||||||||||||||||||
return self._properties.get("foreignTypeInfo") | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@foreign_type_info.setter | ||||||||||||||||||||||||
def foreign_type_info(self, value: Optional[str]) -> None: | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
Sets the foreign type information for this schema. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
value (Optional[str]): The foreign type information, can be set to None. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
value = _isinstance_or_raise(value, str, none_allowed=True) | ||||||||||||||||||||||||
self._properties["foreignTypeInfo"] = value | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@property | ||||||||||||||||||||||||
def _fields(self) -> Optional[List[Any]]: | ||||||||||||||||||||||||
return self._properties.get("fields") | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please avoid storing non-REST API representation in |
||||||||||||||||||||||||
|
||||||||||||||||||||||||
@_fields.setter | ||||||||||||||||||||||||
def _fields(self, value: Optional[List[Any]]) -> None: | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
Sets the internal list of field definitions. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
NOTE: In the API representation the 'fields' key points to a sequence of schema fields. | ||||||||||||||||||||||||
To maintain a similarity in names, the 'Schema._fields' attribute points to the | ||||||||||||||||||||||||
'_properties["fields"]' key. Schema class is superclassed by UserList, which requires the | ||||||||||||||||||||||||
use of a '.data' attribute. The decision was made to have both of these attributes point to | ||||||||||||||||||||||||
the same key "fields" in the '_properties' dictionary. Thus '.data' is an alias | ||||||||||||||||||||||||
for '_fields'. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
value (Optional[List[Any]]): A list of schema fields, can be set to None. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
value = _isinstance_or_raise(value, list, none_allowed=True) | ||||||||||||||||||||||||
value = _build_schema_resource(value) | ||||||||||||||||||||||||
self._properties["fields"] = value | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@property | ||||||||||||||||||||||||
def data(self) -> Any: | ||||||||||||||||||||||||
return self._properties.get("fields") | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@data.setter | ||||||||||||||||||||||||
def data(self, value) -> None: | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
Sets the list of schema fields. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
NOTE: In the API representation the 'fields' key points to a sequence of schema fields. | ||||||||||||||||||||||||
To maintain a similarity in names, the 'Schema._fields' attribute points to the | ||||||||||||||||||||||||
'_properties["fields"]' key. Schema class is superclassed by UserList, which requires the | ||||||||||||||||||||||||
use of a '.data' attribute. The decision was made to have both of these attributes point to | ||||||||||||||||||||||||
the same key "fields" in the '_properties' dictionary. Thus '.data' is an alias | ||||||||||||||||||||||||
for '_fields'. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
value (Optional[List[Any]]): A list of schema fields, can be set to None. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
value = _isinstance_or_raise(value, list, none_allowed=True) | ||||||||||||||||||||||||
value = _build_schema_resource(value) | ||||||||||||||||||||||||
self._properties["fields"] = value | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __str__(self) -> str: | ||||||||||||||||||||||||
return f"Schema({self._fields}, {self.foreign_type_info})" | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __repr__(self) -> str: | ||||||||||||||||||||||||
return f"Schema({self._fields!r}, {self.foreign_type_info!r})" | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def to_api_repr(self) -> Dict[str, Any]: | ||||||||||||||||||||||||
"""Build an API representation of this object. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Returns: | ||||||||||||||||||||||||
Dict[str, Any]: | ||||||||||||||||||||||||
A dictionary in the format used by the BigQuery API. | ||||||||||||||||||||||||
If the schema contains SchemaField objects, the fields are | ||||||||||||||||||||||||
also converted to their API representations. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
# If this is a RECORD type, then sub-fields are also included, | ||||||||||||||||||||||||
# add this to the serialized representation. | ||||||||||||||||||||||||
answer = self._properties.copy() | ||||||||||||||||||||||||
if self._fields == []: | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would have to be updated if we remove Also, per go/pystyle#truefalse-evaluations-decision it is preferred to use the falsey nature of empty list if possible.
Suggested change
|
||||||||||||||||||||||||
return answer | ||||||||||||||||||||||||
schemafields = any([isinstance(f, SchemaField) for f in self._fields]) | ||||||||||||||||||||||||
if schemafields: | ||||||||||||||||||||||||
answer["fields"] = [f.to_api_repr() for f in self._fields] | ||||||||||||||||||||||||
Comment on lines
+1014
to
+1016
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems wrong. We should allow people to insert I would expect something more like this:
Suggested change
I'd also like to see some tests where the user replaces all fields with the REST representation of different fields. Something like: def test_schema_to_api_repr_with_rest_entries():
schema = Schema(fields=[SchemaField(), ....])
del schema[:]
rest_fields = [{"name": "...", "type": "..."}, ...]
schema.extend(rest_fields)
got = schema.to_api_repr()
assert got = {"fields": rest_fields} Edit: Might be good to have another test such as above but with mixing and matching REST and SchemaField objects. Perhaps by not deleting all entries but instead inserting some REST fields. |
||||||||||||||||||||||||
return answer | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@classmethod | ||||||||||||||||||||||||
def from_api_repr(cls, resource: Dict[str, Any]) -> "Schema": | ||||||||||||||||||||||||
"""Factory: constructs an instance of the class (cls) | ||||||||||||||||||||||||
given its API representation. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
resource (Dict[str, Any]): | ||||||||||||||||||||||||
API representation of the object to be instantiated. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Returns: | ||||||||||||||||||||||||
An instance of the class initialized with data from 'resource'. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
config = cls([]) | ||||||||||||||||||||||||
config._properties = copy.copy(resource) | ||||||||||||||||||||||||
Comment on lines
+1031
to
+1032
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With using
Suggested change
This makes sure |
||||||||||||||||||||||||
return config |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -70,6 +70,7 @@ | |||||
from google.cloud.bigquery.schema import _build_schema_resource | ||||||
from google.cloud.bigquery.schema import _parse_schema_resource | ||||||
from google.cloud.bigquery.schema import _to_schema_fields | ||||||
from google.cloud.bigquery.schema import Schema | ||||||
from google.cloud.bigquery.external_config import ExternalCatalogTableOptions | ||||||
|
||||||
if typing.TYPE_CHECKING: # pragma: NO COVER | ||||||
|
@@ -453,17 +454,20 @@ def schema(self): | |||||
instance or a compatible mapping representation of the field. | ||||||
""" | ||||||
prop = self._properties.get(self._PROPERTY_TO_API_FIELD["schema"]) | ||||||
if not prop: | ||||||
if not prop: # if empty Schema, empty list, None | ||||||
return [] | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we return an "empty" Schema here for consistency?
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does the following influence your thoughts on this comment? Historically, we have returned an empty My general principal, focused on backwards compatibility is:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I could go either way on this one. One one hand, using On further consideration, I agree with your assessment that avoiding breaking existing users is preferable. +1 on empty list in this case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a related concept. That complicates getting a What are your thoughts on the following approach? (I seem to recall you mention keeping objects in a separate place and ensuring that _properties is a view into the object)
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My worry with Also, there are other cases to consider besides "someone gave us a Alternatively, we introduce separate properties for the pangea fields to |
||||||
else: | ||||||
return _parse_schema_resource(prop) | ||||||
elif isinstance(prop, Schema): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would be concerning if this were ever true. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tswast Question for you. This response relates to setting the In general, I see the reasoning behind maintaining the If the user provides a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If so, I wouldn't put it in Ideally, the objects we provide would just be a "view" on top of the REST resource ( In some cases, we have to store state outside of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This comment still stands. We shouldn't have |
||||||
return prop | ||||||
return _parse_schema_resource(prop) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Presumably we want to return a
Suggested change
Aside: Is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Where possible I -tried- to stay consist about what the user provides versus gets back: if someone handed the library a |
||||||
|
||||||
@schema.setter | ||||||
def schema(self, value): | ||||||
api_field = self._PROPERTY_TO_API_FIELD["schema"] | ||||||
|
||||||
if value is None: | ||||||
self._properties[api_field] = None | ||||||
elif isinstance(value, Schema): | ||||||
self._properties[api_field] = value | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
else: | ||||||
value = _to_schema_fields(value) | ||||||
self._properties[api_field] = {"fields": _build_schema_resource(value)} | ||||||
|
@@ -1394,7 +1398,8 @@ def _row_from_mapping(mapping, schema): | |||||
Raises: | ||||||
ValueError: If schema is empty. | ||||||
""" | ||||||
if len(schema) == 0: | ||||||
|
||||||
if not schema: | ||||||
raise ValueError(_TABLE_HAS_NO_SCHEMA) | ||||||
|
||||||
row = [] | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this check if we get a
Schema
object. Let's move that above.Note: I just
return schema
if given aSchema
. See my comment below.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could call
return Schema.from_api_repr(schema.to_api_repr())
, but I don't see much reason to do so.