Add pydantic curation model, improve merging rules, and add splitting model #3760
    sparsity_overlap: float = 0.75,
    new_id_strategy: str = "append",
    return_new_unit_ids: bool = False,
    format: str = "memory",
One general question that will matter for typing going forward. We have been moving toward writing "append" | "new" for typing, but type analysis programs don't like this, so I assume pydantic won't either. str, however, is not accurate either, because the argument doesn't accept any string, only specific strings. So in this case, should we move the library over to Literal['append', 'new']? I forget the actual argument, so 'new' was just me making something up as an example.
Or does pydantic only accept str and not Literal?
I think pydantic only accepts Literal.
Why do you say that type analysis programs don't like "append" | "new" for typing?
I don't know. In VS Code I only get a warning saying that the types in "append" | "new" are not defined. Others (I think Heberto) have asked why we don't use Literal['append', 'new'], so maybe he is seeing the typing warning too. I just want to make sure we fit the pydantic model while still being useful to the end user. Saying str is not useful to an end user who relies on type hints, because the argument is actually a Literal. I think adding Literal clutters things, but if we are now relying on a tool that expects Literal, then we have to use it, and we should move the whole code base in that direction for consistency.
I think static type analysis programs assume "append" should be a type because we are not specifying that it is a literal. So although Python allows it, static type checkers don't know what to do with it. It is a little similar to the Optional vs. optional debate in Python type hinting.
Actually, I think using "append" | "new" is not supported...
That is exactly what I'm saying!
I prefer it, but it is not supported. So we need to switch! I don't want us to switch to str; I want us to switch to Literal.
Got it! That makes sense to me :)
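To make the outcome of this thread concrete, here is a minimal sketch of how pydantic treats a `Literal` annotation. The model name `MergeOptions` is hypothetical, and `"new"` is the placeholder value from the discussion above, not necessarily a real option of `new_id_strategy`.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class MergeOptions(BaseModel):
    # Hypothetical model for illustration; "new" is a made-up placeholder value.
    new_id_strategy: Literal["append", "new"] = "append"


# An allowed value passes validation.
options = MergeOptions(new_id_strategy="new")

# Any other string is rejected, which a plain `str` annotation would let through.
try:
    MergeOptions(new_id_strategy="prepend")
    rejected = False
except ValidationError:
    rejected = True
```

With `Literal`, the same annotation tells static type checkers which strings are allowed and gives pydantic enough information to validate the value at runtime.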
…e into curation-pydantic
@samuelgarcia this is now only the new pydantic model, including splitting, and a general validation cleanup in the curation format
ready to review
Hi. Then we should have a clear v2 in the format.
Could you also change the curation.rst format?
class LabelDefinition(BaseModel):
    name: str = Field(..., description="Name of the label")
    label_options: List[str] = Field(..., description="List of possible label options", min_length=2)
Do we really need this ... everywhere?
I think it is clear that they are all mandatory except when there is a default, no?
Is this common in pydantic?
The ... is needed for required fields.
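The question about `...` can be checked with a small sketch. It assumes pydantic v2 (suggested by `min_length` on a list field) and reuses the two fields from the snippet above. As far as I understand, passing `...` is the explicit way to mark a field as required, while a `Field()` call with no default leaves the field required too, so the choice is largely one of style.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class LabelDefinition(BaseModel):
    # Explicit form used in the PR: `...` marks the field as required.
    name: str = Field(..., description="Name of the label")
    # Leaving the default out of Field() keeps the field required as well.
    label_options: List[str] = Field(description="List of possible label options", min_length=2)


# Both fields report themselves as required.
required = {name: info.is_required() for name, info in LabelDefinition.model_fields.items()}

# Omitting a required field raises a ValidationError that names it.
try:
    LabelDefinition(name="quality")
    missing = []
except ValidationError as exc:
    missing = [err["loc"][0] for err in exc.errors()]
```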
            "If labels, the split is defined by a list of labels for each spike (`split_labels`). "
        ),
    )
    split_indices: Optional[Union[List[List[int]]]] = Field(default=None, description="List of indices for the split")
Why Union?
How does this play with numpy arrays?
good catch, union is useless here
Mmm, no. It should be Union[List[int], List[List[int]]]: the List[int] is for labels mode, the List[List[int]] for indices mode.
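The two shapes described here can be sketched as follows, assuming pydantic v2. `SplitSketch`, its `data` field, and the `check_shape` validator are hypothetical names chosen for illustration, not the model from the PR. The sketch only shows how one `Union` annotation can accept both shapes while a validator ties the shape to the mode.

```python
from typing import List, Literal, Union

from pydantic import BaseModel, ValidationError, model_validator


class SplitSketch(BaseModel):
    # Hypothetical model: "labels" expects one label per spike,
    # "indices" expects one list of spike indices per new unit.
    mode: Literal["indices", "labels"] = "indices"
    data: Union[List[int], List[List[int]]]

    @model_validator(mode="after")
    def check_shape(self):
        nested = all(isinstance(item, list) for item in self.data)
        if self.mode == "indices" and not nested:
            raise ValueError("indices mode expects a list of lists of spike indices")
        if self.mode == "labels" and nested and self.data:
            raise ValueError("labels mode expects a flat list with one label per spike")
        return self


by_labels = SplitSketch(mode="labels", data=[0, 1, 0, 1])
by_indices = SplitSketch(mode="indices", data=[[0, 2], [1, 3]])

# A flat list in indices mode does not match the expected shape.
try:
    SplitSketch(mode="indices", data=[0, 1])
    mismatch = False
except ValidationError:
    mismatch = True
```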
        return values

    @classmethod
    def check_splits(cls, values):
The structure of split_data should be described per mode here.
The split_data type is unclear.
done
    merge_unit_group: List[Union[int, str]] = Field(..., description="List of groups of units to be merged")
    merge_new_unit_id: Optional[Union[int, str]] = Field(default=None, description="New unit IDs for the merge group")
If we now have a nested dict, why not replace
    merge_unit_group: List[Union[int, str]] = Field(..., description="List of groups of units to be merged")
    merge_new_unit_id: Optional[Union[int, str]] = Field(default=None, description="New unit IDs for the merge group")
with
    group: List[Union[int, str]] = Field(..., description="List of groups of units to be merged")
    new_unit_id: Optional[Union[int, str]] = Field(default=None, description="New unit IDs for the merge group")
because it is obviously a merge, no?
yep!
done, also for splits
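As a rough illustration of the renamed fields, the sketch below assumes pydantic v2. The class name `Merge` and the shortened descriptions are assumptions made for this example. Only the field names `group` and `new_unit_id` and their types come from the suggestion above.

```python
from typing import List, Optional, Union

from pydantic import BaseModel, Field


class Merge(BaseModel):
    # Field names follow the suggestion; the class name is hypothetical.
    group: List[Union[int, str]] = Field(..., description="Units to be merged")
    new_unit_id: Optional[Union[int, str]] = Field(default=None, description="ID of the merged unit")


# Integer and string unit IDs can be mixed; new_unit_id is optional.
merge = Merge(group=[3, 7, "u12"])
as_dict = merge.model_dump()
```

Because the entry already lives under a merge-specific key in the nested structure, the shorter names carry the same information without the `merge_` prefix.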
This PR goes in the direction of adding more structure to the curation format.
By defining a Pydantic model, we can add proper descriptions, types, and validation strategies for the curation.
This will make the format easier for third-party software to validate and adopt.