-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: add taskmaster
env in FOCA
config
#203
Conversation
Reviewer's Guide by SourceryThis pull request adds taskmaster environment properties to the FOCA configuration. It includes new configuration models in File-Level Changes
Tips
|
wrt #200 I have made changes to some files and structure, specifically previously I was taking in the template from the config as well which is unnecessary, so changed that a bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @JaeAeich - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 2 issues found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
) from None | ||
|
||
|
||
def get_taskmaster_template() -> V1Job: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider refactoring the get_taskmaster_template()
function for improved readability
The function is quite long and deals with complex Kubernetes objects. Consider extracting the container configuration into a separate function to improve maintainability.
def get_taskmaster_template() -> V1Job:
return _create_taskmaster_job()
def _create_taskmaster_job() -> V1Job:
# Implementation details here
pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with the general thrust of this comment (although not necessarily with the suggestion for improvement).
Isn't there a way to avoid the hardcoding of all these values? Couldn't you define the models such that you can simply instantiate the outermost model and all those defaults are set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your input, @uniqueg. You raise a valid point about avoiding hardcoding values. One approach to achieve this could be to use configuration files or environment variables to set these defaults, which can then be loaded into the models. This would make the code more flexible and easier to maintain.
For example, you could define a configuration file (e.g., taskmaster_config.yaml
) with all the necessary default values and then load this configuration when creating the V1Job
object. This way, you can easily update the configuration without changing the code.
Here's a rough outline of how you might implement this:
- Define the configuration file:
# taskmaster_config.yaml
job:
apiVersion: "batch/v1"
kind: "Job"
metadata:
name: "taskmaster-job"
spec:
template:
spec:
containers:
- name: "taskmaster-container"
image: "taskmaster-image"
# Add other default values here
- Load the configuration in your function:
import yaml
from kubernetes.client import V1Job
def load_taskmaster_config() -> dict:
with open("path/to/taskmaster_config.yaml", "r") as file:
return yaml.safe_load(file)
def get_taskmaster_template() -> V1Job:
config = load_taskmaster_config()
return V1Job(**config)
This approach separates the configuration from the code, making it easier to manage and update. Let me know if this aligns with your thoughts or if you have any other suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, something like this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad to hear that the suggestion aligns with your thoughts, @uniqueg. If you decide to go with this approach, it will indeed make the code more maintainable and flexible.
To further assist, here are a few additional tips:
-
Validation: Ensure that the configuration file is validated against a schema to catch any errors early. You can use libraries like
pydantic
orcerberus
for this purpose. -
Environment Variables: For sensitive information or values that might change between environments (e.g., development, staging, production), consider using environment variables. You can use a library like
python-dotenv
to load these variables. -
Documentation: Document the configuration file format and the expected environment variables. This will help other developers understand how to configure the system.
-
Fallbacks: Implement sensible defaults in your code as a fallback in case the configuration file or environment variables are missing or incomplete.
Here's a more refined example incorporating these tips:
import os
import yaml
from kubernetes.client import V1Job
from pydantic import BaseModel, ValidationError
class TaskmasterConfig(BaseModel):
apiVersion: str
kind: str
metadata: dict
spec: dict
def load_taskmaster_config() -> dict:
config_path = os.getenv("TASKMASTER_CONFIG_PATH", "path/to/taskmaster_config.yaml")
with open(config_path, "r") as file:
return yaml.safe_load(file)
def get_taskmaster_template() -> V1Job:
try:
config_dict = load_taskmaster_config()
config = TaskmasterConfig(**config_dict)
return V1Job(**config.dict())
except (FileNotFoundError, ValidationError) as e:
raise ConfigNotFoundError(f"Error loading Taskmaster configuration: {e}")
# Example usage
if __name__ == "__main__":
job_template = get_taskmaster_template()
print(job_template)
This example uses pydantic
for validation and includes a fallback for the configuration file path using an environment variable. Let me know if you need any further assistance or have additional questions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the clarification, @JaeAeich. Your points are well taken.
Given that the template is meant to be changed during runtime without external changes, it makes sense to avoid hardcoding values directly in the configuration file. However, we can still aim to improve the readability and maintainability of the get_taskmaster_template()
function.
One approach could be to modularize the function by breaking it down into smaller helper functions. This way, you can keep the flexibility of runtime changes while improving the code structure. Here's an example:
def get_taskmaster_template() -> V1Job:
return V1Job(
api_version="batch/v1",
kind="Job",
metadata=_get_metadata(),
spec=_get_spec()
)
def _get_metadata() -> dict:
return {
"name": "taskmaster-job",
# Add other metadata fields here
}
def _get_spec() -> dict:
return {
"template": {
"spec": {
"containers": [_get_container()],
# Add other spec fields here
}
}
}
def _get_container() -> dict:
return {
"name": "taskmaster-container",
"image": "taskmaster-image",
# Add other container fields here
}
This approach keeps the configuration within the code but organizes it into smaller, more manageable pieces. It also allows for easy updates and modifications to individual parts of the template without affecting the entire function.
Let me know if this approach aligns with your requirements or if you have any other suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, but IMO this is not ideal for this case as we will never need other func even if we did people would usually prefer to change the config like its usually written in manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback, @JaeAeich. I understand your concern about the practicality of breaking down the function into smaller parts, especially if the configuration is typically managed in a manifest-like format.
Given this context, it might be best to keep the function as a single unit but ensure it is well-documented and clearly structured for readability. Here’s a refined approach that maintains the function's integrity while improving readability:
def get_taskmaster_template() -> V1Job:
"""Create and return a Kubernetes Job template for Taskmaster.
Returns:
V1Job: A Kubernetes Job object configured for Taskmaster.
"""
return V1Job(
api_version="batch/v1",
kind="Job",
metadata={
"name": "taskmaster-job",
# Add other metadata fields here
},
spec={
"template": {
"spec": {
"containers": [
{
"name": "taskmaster-container",
"image": "taskmaster-image",
# Add other container fields here
}
],
# Add other spec fields here
}
}
}
)
This approach keeps the function concise and self-contained, while the docstring provides clarity on its purpose and return value. Additionally, you can add inline comments to guide future developers on where to make changes.
Let me know if this strikes a balance between maintainability and practicality, or if you have any other suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit difficult to reason about this because I don't know how this is used later on. What will you do with that template? It looks like this is set up to make sure that the returned template is a valid job that can be executed (under certain circumstances, e.g., if an FTP instance is available). But if that's the case, it seems to me that this is something that should rather be set and used in tests (e.g., via a fixture).
If it's used to get a job template and then replace the values according to the actual needs (because it's easier to change values on an existing template), I don't understand why, e.g., env
needs to be populated at this point (if it's overridden anyway). In other words, we then don't need to make sure the template is valid.
Moreover, if you really want to make it easier to set job params according to needs, why not create a class with setter methods for anything that may possibly be set by the job and user needs.
But again, a description and example of the use case of that function would help to give more specific feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your insightful questions, @uniqueg. Understanding the use case and context of the get_taskmaster_template()
function is indeed crucial for providing more specific feedback.
To clarify, the get_taskmaster_template()
function is designed to generate a Kubernetes Job template that can be customized and executed based on runtime requirements. The primary goal is to provide a base template that can be easily modified to fit specific needs without having to redefine the entire structure each time.
Here are a few points to address your concerns:
-
Use Case: The function is intended to provide a starting point for creating Kubernetes Jobs. It ensures that the basic structure is in place, and developers can then modify specific fields as needed. This approach helps maintain consistency and reduces the risk of errors when defining job templates.
-
Template Modification: While the function provides a complete and valid job template, it is expected that certain fields (like
env
) will be overridden based on the specific requirements of the job. The idea is to have a flexible yet consistent base that can be adapted as needed. -
Class with Setter Methods: Creating a class with setter methods is a great idea for managing job parameters. This would provide a more structured and object-oriented approach to modifying the job template. Here's a rough example of how this could be implemented:
from kubernetes.client import V1Job class TaskmasterJobTemplate: def __init__(self): self.job = V1Job( api_version="batch/v1", kind="Job", metadata={ "name": "taskmaster-job", # Add other metadata fields here }, spec={ "template": { "spec": { "containers": [ { "name": "taskmaster-container", "image": "taskmaster-image", # Add other container fields here } ], # Add other spec fields here } } } ) def set_container_image(self, image: str): self.job.spec['template']['spec']['containers'][0]['image'] = image def set_env_vars(self, env_vars: list): self.job.spec['template']['spec']['containers'][0]['env'] = env_vars def get_job(self) -> V1Job: return self.job # Example usage if __name__ == "__main__": job_template = TaskmasterJobTemplate() job_template.set_container_image("new-taskmaster-image") job_template.set_env_vars([{"name": "ENV_VAR", "value": "value"}]) job = job_template.get_job() print(job)
This class-based approach allows for more granular control over the job template and makes it easier to update specific fields without modifying the entire structure.
- Testing: If the primary use of this function is for testing purposes, it might be beneficial to create fixtures or mock objects that can be used in tests. This would ensure that the job template is valid and can be easily modified for different test scenarios.
I hope this provides a clearer picture of the function's purpose and potential improvements. Let me know if you have any further questions or if there's anything else you'd like to discuss!
from tesk.constants import TeskConstants | ||
|
||
|
||
class FtpConfig(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider flattening the configuration structure to reduce complexity.
The new code introduces additional complexity with three new classes (FtpConfig
, ExecutorSecret
, and TaskmasterEnvProperties
). This increases the cognitive load and makes the configuration harder to understand at a glance. Consider flattening the structure to reduce the number of classes and simplify the configuration. Here’s a simplified version:
from typing import Dict, Optional
from pydantic import BaseModel, Field
from tesk.api.ga4gh.tes.models import Service
from tesk.constants import TeskConstants
class CustomConfig(BaseModel):
"""Custom configuration model for the FOCA app."""
service_info: Service
ftp_secret_name: Optional[str] = Field(
default=None, description="Name of the secret with FTP account credentials"
)
ftp_enabled: bool = Field(
default=False,
description="If FTP account enabled (based on non-emptiness of secretName)",
)
executor_secret_name: Optional[str] = Field(
default=None,
description="Name of a secret that will be mounted as volume to each executor. The same name will be used for the secret and the volume",
)
executor_secret_mount_path: Optional[str] = Field(
default=None,
description="The path where the secret will be mounted to executors",
)
executor_secret_enabled: bool = Field(
default=False, description="Indicates whether the secret is enabled"
)
taskmaster_image_name: str = Field(
default=TeskConstants.taskmaster_image_name,
description="Taskmaster image name",
)
taskmaster_image_version: str = Field(
default=TeskConstants.taskmaster_image_version,
description="Taskmaster image version",
)
filer_image_name: str = Field(
default=TeskConstants.filer_image_name, description="Filer image name"
)
filer_image_version: str = Field(
default=TeskConstants.filer_image_version, description="Filer image version"
)
debug: bool = Field(
default=False,
description="If verbose (debug) mode of taskmaster is on (passes additional flag to taskmaster and sets image pull policy to Always)",
)
environment: Optional[Dict[str, str]] = Field(
default=None,
description="Environment variables, that will be passed to taskmaster",
)
service_account_name: str = Field(
default="default", description="Service Account name for taskmaster"
)
This approach maintains the necessary configuration options while reducing the overall complexity.
@@ -3,6 +3,27 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider simplifying the new code to reduce complexity.
The new code introduces several complexities that could be simplified. Here are some suggestions:
- Imports: The increased number of imports adds cognitive load. Consider reducing them to only what's necessary.
- Functions: While modularity is good, the additional functions (
get_custom_config
,get_taskmaster_template
,get_taskmaster_env_property
) introduce new concepts and dependencies. Simplifying these could make the code easier to understand. - Error Handling: Custom error handling (
ConfigNotFoundError
) adds another layer of complexity. Simplifying error handling can make the code more maintainable. - Configuration Management: The custom configuration management system (
Foca
,CustomConfig
) adds abstraction layers. Simplifying configuration retrieval can help. - Kubernetes API Objects: Constructing Kubernetes API objects directly in the code is complex. Simplifying this part can make the code more readable.
Here's a simplified version of the code that focuses on core functionality and minimizes new concepts:
import os
from pathlib import Path
from kubernetes.client.models import V1Job, V1JobSpec, V1ObjectMeta, V1PodSpec, V1PodTemplateSpec, V1Container, V1EnvVar, V1EnvVarSource, V1SecretKeySelector, V1VolumeMount
def get_config_path() -> Path:
config_path_env = os.getenv("TESK_FOCA_CONFIG_PATH")
if config_path_env:
return Path(config_path_env).resolve()
else:
return (Path(__file__).parents[1] / "deployment" / "config.yaml").resolve()
def get_taskmaster_template() -> V1Job:
return V1Job(
api_version="batch/v1",
kind="Job",
metadata=V1ObjectMeta(name="taskmaster", labels={"app": "taskmaster"}),
spec=V1JobSpec(
template=V1PodTemplateSpec(
metadata=V1ObjectMeta(name="taskmaster"),
spec=V1PodSpec(
service_account_name="default",
containers=[
V1Container(
name="taskmaster",
image="docker.io/elixircloud/tesk-core-taskmaster:v0.10.2",
args=["-f", "/jsoninput/taskmaster_input.gz"],
env=[
V1EnvVar(
name="FTP_SECRET_USERNAME",
value_from=V1EnvVarSource(
secret_key_ref=V1SecretKeySelector(
name="ftp-secret",
key="username",
optional=True,
)
),
),
V1EnvVar(
name="FTP_SECRET_PASSWORD",
value_from=V1EnvVarSource(
secret_key_ref=V1SecretKeySelector(
name="ftp-secret",
key="password",
optional=True,
)
),
),
],
volume_mounts=[
V1VolumeMount(
name="podinfo",
mount_path="/podinfo",
read_only=True,
),
V1VolumeMount(
name="jsoninput",
mount_path="/jsoninput",
read_only=True,
),
],
)
],
volumes=[],
restart_policy="Never",
),
)
),
)
This version reduces complexity and focuses on the core functionality, making it easier to read and maintain.
Returns: | ||
The taskmaster template. | ||
""" | ||
job = V1Job( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable
)
) from None | ||
|
||
|
||
def get_taskmaster_template() -> V1Job: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, something like this...
@uniqueg what I think I should rather do is pu the or better yet in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a deeper look now that I'm on my laptop. I'm not sure about this PR. It seems to me that there is work to do on the config. I don't see why would have multiple places where things can be configured and where defaults are defined. Plus there is not a strict distinction between constants and config params.
And then there's the get_taskmaster_template()
utilitiy function that I'm not sure how it's going to be used.
Perhaps it might be better to only deal with cleaning up and streamlining config management and constants (tesk.custom_config
, tesk.constants.TeskConstants
, tesk.api.kubernetes.constants.Constants
, tesk.api.kubernetes.constants.K8sConstants
could be merged into just tesk.constants
and tesk.custom_config
, IMO, with all of the config params being made accessible for modification via deployment/config.yaml
to have everything in one place).
Then add the utility functions in future PRs as and when they are actually used by the application code - it's otherwise impossible to properly review them.
storeLogs: | ||
execution_trace: True | ||
execution_trace: True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to custom config and document via schema? Or alternatively, this may be something we could add to FOCA's logging config.
class TaskmasterEnvProperties(BaseModel): | ||
"""Taskmaster environment properties model for the TESK.""" | ||
|
||
imageName: str = Field( | ||
default=TeskConstants.taskmaster_image_name, | ||
description="Taskmaster image name", | ||
) | ||
imageVersion: str = Field( | ||
default=TeskConstants.taskmaster_image_version, | ||
description="Taskmaster image version", | ||
) | ||
filerImageName: str = Field( | ||
default=TeskConstants.filer_image_name, description="Filer image name" | ||
) | ||
filerImageVersion: str = Field( | ||
default=TeskConstants.filer_image_version, description="Filer image version" | ||
) | ||
ftp: FtpConfig = Field(default=None, description="Test FTP account settings") | ||
debug: bool = Field( | ||
default=False, | ||
description="If verbose (debug) mode of taskmaster is on (passes additional " | ||
"flag to taskmaster and sets image pull policy to Always)", | ||
) | ||
environment: Optional[Dict[str, str]] = Field( | ||
default=None, | ||
description="Environment variables, that will be passed to taskmaster", | ||
) | ||
serviceAccountName: str = Field( | ||
default="default", description="Service Account name for taskmaster" | ||
) | ||
executorSecret: Optional[ExecutorSecret] = Field( | ||
default=None, description="Executor secret configuration" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this could be structured better?
We could have an object images
that holds taskmaster
and filer
, and each of these are instances of the same model Image
with properties name
and version
.
Similarly, we could have an object secrets
that holds executor
, storage
(with ftp
, s3
and so on).
filerImageVersion: str = Field( | ||
default=TeskConstants.filer_image_version, description="Filer image version" | ||
) | ||
ftp: FtpConfig = Field(default=None, description="Test FTP account settings") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we want to read from (or write to) multiple FTP storage instances?
executorSecret: Optional[ExecutorSecret] = Field( | ||
default=None, description="Executor secret configuration" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be cases where it would make sense to support multiple executor secrets?
) | ||
|
||
|
||
class TaskmasterEnvProperties(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply name this Taskmaster
?
enabled: bool = Field( | ||
default=False, | ||
description="If FTP account enabled (based on non-emptiness of secretName)", | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this
based on non-emptiness of secretName
Does this imply that this is automatically set based on the value of secretName
? If so, why do we need it?
More generally, why do we need a mechanism to disable/enable a secret when we could just remove the secret? Goes for the executorSecret
, too.
from tesk.constants import TeskConstants | ||
|
||
|
||
class FtpConfig(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The properties of this model don't appear to me to be specific to FTP (only the descriptions). Could this then be abstracted to just Secret
maybe?
|
||
from tesk.api.ga4gh.tes.models import Service | ||
from tesk.constants import TeskConstants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they are defaults, not constants, right? Because the user can change them via the config. So perhaps better to rename the module defaults
and the
Also, if they were constants, note that it is Python convention to use ALL_CAPS names.
Finally, I'm still not sure why we don't keep the defaults where we define the models, because that's where we also keep the documentation of the corresponding params.
V1VolumeMount, | ||
) | ||
|
||
from tesk.api.kubernetes.constants import Constants, K8sConstants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty much the same comment applies to these as to TeskConstants
. Not all of these appear to me to be constants, but might be changed by the user/admin, in which case I believe that they are config params and should be called as such.
Some of them might indeed be constants (TESK would fail if they were ever changed). In that case, the var names should be in ALL_CAPS.
Possibly it would make sense to define:
- A class
Constants
withALL_CAPS
attributes that should never change. Alternatively, perhaps better/more idiomatic, they could also be globals in a moduleconstants
. - A class
K8sConfig
that is defined inCustomConfig
and can be configured viadeployment/config.yaml
, perhaps in a sectioncustom.kubernetes
(just like all the other configurable stuff). - One or more classes (e.g.,
JobConfig
) to define the remaining config params. Also defined inCustomConfig
and configurable by users/admins through sections incustom
indeployment/config.yaml
.
) from None | ||
|
||
|
||
def get_taskmaster_template() -> V1Job: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit difficult to reason about this because I don't know how this is used later on. What will you do with that template? It looks like this is set up to make sure that the returned template is a valid job that can be executed (under certain circumstances, e.g., if an FTP instance is available). But if that's the case, it seems to me that this is something that should rather be set and used in tests (e.g., via a fixture).
If it's used to get a job template and then replace the values according to the actual needs (because it's easier to change values on an existing template), I don't understand why, e.g., env
needs to be populated at this point (if it's overridden anyway). In other words, we then don't need to make sure the template is valid.
Moreover, if you really want to make it easier to set job params according to needs, why not create a class with setter methods for anything that may possibly be set by the job and user needs.
But again, a description and example of the use case of that function would help to give more specific feedback.
Yeah now that I have move context too I see where I can improve, ill create another PR keeping all the reviews from here. Not closing for that reason. |
closing in favor of #206 |
Summary by Sourcery
Add support for Taskmaster environment properties in the FOCA configuration, including new functions to retrieve custom configuration and Taskmaster template.
New Features:
taskmaster_env_properties
in the custom configuration model to manage Taskmaster environment properties.Enhancements: