Adding your algorithm to devex_sdk is easy with the following steps.
It is assumed that you have already cloned the devex_sdk main branch to your local machine. If you have not done this already, the process is covered in the README file. As is standard with git, you need to have the main branch cloned in order to create a branch.
Perform the following commands within your devex_sdk repository:
- Get a fresh pull of the devex_sdk repository:
git pull
- Next, create a branch from main on your local machine by executing the following command, replacing the fields with your first name and the name of your algorithm:
git branch firstname/name_of_algo
For example, if Vinny put together a brand-new algorithm for creating buckets with EKS data, his branch and command would look as follows:
git branch vinny/bucketization
- Now, you are still working in the main branch on your local machine. This can be confirmed by issuing the branch command which will show the current branch as highlighted:
git branch
- You will now need to switch your branch to the new branch you created. To do this, execute the following command, replacing firstname/name_of_algo with your previously defined branch name:
git switch firstname/name_of_algo
- Confirm you are now working in your branch by issuing the branch command:
git branch
- Now, you will need to push your current branch and set it as the upstream equivalent. This will add your branch held on your local machine to the repository online:
git push --set-upstream origin firstname/name_of_algo
Adding your algorithm to the devex_sdk library is a relatively easy process; however, it is important to follow the steps below closely to ensure the new algorithm builds correctly into the existing framework. To make things easy to follow, an example algorithm, circles, is used throughout.
1. Create a New Directory Under the devex_sdk Sub-Directory
Navigate to the devex_sdk subdirectory within the devex_sdk repository:
> devex_sdk
    > devex_sdk
        __init__.py
        > [OtherPackages]
Here, create a new directory. The directory should be named with your algorithm name, the same name you included in the branch you checked out above. For the circles sub-package, the new directory would be titled as follows:
> devex_sdk
    > devex_sdk
        __init__.py
        > circles
        > [OtherPackages]
This new folder is the sub-package folder for your algorithm.
This directory is where your algorithm will be stored. You will also need an init file in this directory; this __init__.py file allows access to your Python file when building the package. So, following the circles sub-package, the directory structure would look as follows:
> devex_sdk
    > devex_sdk
        __init__.py
        > circles
            __init__.py
            CirclesClass.py
        > [OtherPackages]
This step is very important for ensuring your sub-package is able to interact correctly with the rest of the devex_sdk structure and be available for use when the package is installed. For this step, work from the sub-package level (in the example, the init file within the circles directory) up to the root of the devex_sdk repository.
This example is following a simple class that is being added to the package. You as the developer make the decision on what functions and classes of your code should be available for use with the package and what should not be. You will also need to make the decision on what level of dot notation will be needed in order to access specific aspects of your code. This is covered in detail below.
Sub-Package init File:
The init file is used in Python to give access to aspects of your code from a higher level. The general process is to explicitly elevate the functions and classes that should be accessible at the level of the directory in which the init file resides. Here is an example to illustrate this point:
Following the circles example from earlier, I have written a useful class called Circle located in CirclesClass.py. In the Python file, there are several class methods as well as a function outside the class for describing circles:
import math

class Circle:
    def __init__(self, radius, color):
        self.radius = radius
        self.color = color

    def get_radius(self):
        return self.radius

    def get_color(self):
        return self.color

    def area(self):
        return math.pi*(self.radius*self.radius)

    def perimeter(self):
        return 2*math.pi*self.radius

def describe(circle):
    print(f"This is a {circle.color} circle with a radius of {circle.radius}.")
When I contribute this code to devex_sdk, my hope is to have both the class Circle and the function describe available for use at a minimum level of the circles sub-package directory. This means that if I am to import the devex_sdk library as follows, I should be able to access both the class and function with the following syntax:
Desired Syntax
import devex_sdk
# this is how I want to be able to use Circle and describe
ci = devex_sdk.circles.Circle(radius, color)
devex_sdk.circles.describe(ci)
But since there is currently nothing in the init file, the following notation would be needed to access the describe function and Circle class:
Current Syntax
import devex_sdk
# this is how you currently access Circle and describe
ci = devex_sdk.circles.CirclesClass.Circle(radius, color)
devex_sdk.circles.CirclesClass.describe(ci)
As you can see, with the syntax as it currently stands, using the class and function requires quite verbose calls. That is because the describe function and Circle class reside in CirclesClass.py and are not directly present in the circles directory. Importing them in the init file effectively adds them there, which eliminates the need to include CirclesClass in the dot notation.
As a reminder, we are adding these to the inner init file within the circles directory:
> devex_sdk
    > devex_sdk
        __init__.py
        > circles
            __init__.py
            CirclesClass.py
        > [OtherPackages]
To elevate Circle and describe to the circles directory, add the following to the init file:
# devex_sdk/devex_sdk/circles/__init__.py
from .CirclesClass import Circle
from .CirclesClass import describe
Next, we need to add some information to the outer init file:
Package Level init file:
To complete the initial init file process, the following must be added to the init file within /devex_sdk/devex_sdk:
# devex_sdk/devex_sdk/__init__.py
from .circles import CirclesClass
With the above imports included in the respective init files, the Desired Syntax from above can now be used:
import devex_sdk
# this syntax will work now
ci = devex_sdk.circles.Circle(radius, color)
devex_sdk.circles.describe(ci)
Now, looking at the existing syntax, I may feel that the syntax is still too verbose in order to make a call to the class Circle; however, looking at the describe function, I feel like this syntax is appropriate (I want people to know that describe is only to be used on instances of Circle, so leaving the circles dot notation is appropriate). With this line of thinking, I would like to change the syntax to the following:
New Desired Syntax:
import devex_sdk
# I want to shorten the syntax for creating an instance of Circle
ci = devex_sdk.Circle(radius, color)
# and keep the syntax the same for calling describe
devex_sdk.circles.describe(ci)
To do this, I simply add the following to the Package Level init file:
# devex_sdk/devex_sdk/__init__.py
# existing imports:
from .circles import CirclesClass
# add this line to shorten the syntax to devex_sdk.Circle(radius, color):
from .circles import Circle
This effectively pulls only the class Circle up to the level of devex_sdk while leaving the describe function at the level of the circles directory. So once these imports in the init files are completed, the following syntax will be valid:
import devex_sdk
ci = devex_sdk.Circle(radius, color)
devex_sdk.circles.describe(ci)
Here are the example init files for completeness and clarity:
init file in devex_sdk directory:
# devex_sdk/devex_sdk/__init__.py
from .circles import CirclesClass
from .circles import Circle
init file in circles directory:
# devex_sdk/devex_sdk/circles/__init__.py
from .CirclesClass import Circle
from .CirclesClass import describe
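The effect of the two init files can be checked without touching devex_sdk itself. The following is a minimal sketch, assuming a throwaway package named demo_sdk (a hypothetical name used only for illustration) written to a temporary directory with the same layout as the circles example. Here describe returns its message rather than printing it, purely so the result is easy to inspect:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway package name, chosen so it cannot clash with devex_sdk.
PKG = "demo_sdk"

def build_demo_package(root: Path) -> None:
    """Write a tiny package that mirrors the circles layout."""
    sub = root / PKG / "circles"
    sub.mkdir(parents=True)
    (sub / "CirclesClass.py").write_text(
        "class Circle:\n"
        "    def __init__(self, radius, color):\n"
        "        self.radius = radius\n"
        "        self.color = color\n"
        "\n"
        "def describe(circle):\n"
        "    return f'{circle.color} circle, radius {circle.radius}'\n"
    )
    # Sub-package init: lift Circle and describe up to demo_sdk.circles.
    (sub / "__init__.py").write_text(
        "from .CirclesClass import Circle\n"
        "from .CirclesClass import describe\n"
    )
    # Package init: lift only Circle up to demo_sdk.
    (root / PKG / "__init__.py").write_text("from .circles import Circle\n")

with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        demo = importlib.import_module(PKG)
        ci = demo.Circle(2, "red")               # short form, via the package init
        message = demo.circles.describe(ci)      # describe stays one level down
        has_top_level_describe = hasattr(demo, "describe")
    finally:
        sys.path.remove(tmp)

print(message)
print(has_top_level_describe)
```

Only the names imported in an init file become available at that level, which is why describe is reachable through demo_sdk.circles but not directly on demo_sdk.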
2. Add Dependencies to requirements.txt
Add all dependencies included in your algorithm / sub-package in the requirements.txt file. You must also include the package version number for your dependency. This locks down the version you are using and prevents version conflicts if present.
Add your dependencies in the following format, which is taken directly from the existing requirements.txt file:
Note: make sure to remember what dependencies you added to the requirements.txt file as you will need to add these same dependencies to the setup.py file under install_requires in the next step.
pandas==1.4.3
numpy==1.23.1
tqdm==4.64.0
If your version specific dependencies are already included in the list, do not duplicate them.
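Because every entry is expected to carry an exact version, a quick check before committing can catch unpinned lines. The helper below is a minimal sketch and not part of devex_sdk; the function name unpinned_requirements is hypothetical, and it only recognizes the simple name==version form shown above:

```python
import re

# Matches lines of the simple form "name==version" (assumed format, as in the list above).
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.*+!_-]+$")

def unpinned_requirements(text):
    """Return the requirement lines that are not pinned with '=='."""
    bad = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line and not PINNED.match(line):
            bad.append(line)
    return bad

sample = "pandas==1.4.3\nnumpy==1.23.1\ntqdm\n"
print(unpinned_requirements(sample))
```

To check the real file, pass in the contents of requirements.txt, for example with Path("requirements.txt").read_text().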
3. Modify the setup.py File
There are two points in the setup.py that will need to be edited to incorporate your sub-package into devex_sdk.
First, you will add your package directory to the find_packages field. To do this, add your package name with the format of devex_sdk.PackageDirectoryName to the include list. Simply add a comma after the last item and include the formatted package name as a string as specified above.
For example if the packages=find_packages field is currently set with the following list:
packages=find_packages(include=['devex_sdk', 'devex_sdk.extras']),
and I want to add the circles sub-package, I would make the following addition to the list:
packages=find_packages(include=['devex_sdk', 'devex_sdk.extras', 'devex_sdk.circles'])
Second, you must add any dependencies you added to requirements.txt to the install_requires list. Only the package name is required here; the version number is not:
install_requires=['pandas',
                  'numpy',
                  'tqdm',
                  ]
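Since the same dependencies have to appear in both requirements.txt and install_requires, it can help to compare the two lists. The following is a rough sketch under that assumption; the helper names requirement_names and missing_from_setup are hypothetical and not part of devex_sdk:

```python
import re

def requirement_names(requirements_text):
    """Extract bare package names from requirements.txt-style text."""
    names = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            # Keep everything before the first version operator, extras marker, or space.
            names.append(re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0])
    return names

def missing_from_setup(requirements_text, install_requires):
    """Return names listed in requirements.txt but absent from install_requires."""
    listed = {name.lower() for name in install_requires}
    return [n for n in requirement_names(requirements_text) if n.lower() not in listed]

requirements = "pandas==1.4.3\nnumpy==1.23.1\ntqdm==4.64.0\n"
install_requires = ["pandas", "numpy"]
print(missing_from_setup(requirements, install_requires))
```

An empty list means every pinned dependency also appears in install_requires.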
In order to maintain the integrity of the devex_sdk library, grow the set of algorithms sustainably, and future-proof the code with increased maintainability, all new algorithms are required to include unit tests. Unit testing is the process of testing each aspect/function of the code individually to gain insight into how each part of the code is performing and to make debugging much easier.
In devex_sdk the pytest framework is used to create a simple and scalable testing environment. Below are examples of how to implement unit tests for your algorithm based on the circles example above.
All unit tests must be stored in the tests directory at the root of the devex_sdk structure. In this directory, create a Python file with the naming format as follows:
> devex_sdk
    > devex_sdk
    > tests
        test_<AlgoName>.py
Replace AlgoName with the name of your algorithm. This should match the algorithm name you included in your branch name.
For the circles example, the test file looks as follows:
> devex_sdk
    > devex_sdk
    > tests
        test_circles.py
Note: The test_ prefix to the Python test file is crucial to ensure successful testing with Pytest.
Next, you will need to write the unit tests in the test_circles.py file:
The idea of unit tests is to test each aspect of your code. With this in mind, the unit tests for the circles sub-package of devex_sdk are written as follows:
from devex_sdk import circles

ci = circles.Circle(5, 'red')

def test_describe(capsys):
    # describe() prints and returns None, so check the captured stdout
    circles.describe(ci)
    assert capsys.readouterr().out == 'This is a red circle with a radius of 5.\n'

def test_getters():
    assert ci.get_radius() == 5
    assert ci.get_color() == 'red'

def test_area():
    assert ci.area() == 78.53981633974483

def test_perimeter():
    assert ci.perimeter() == 31.41592653589793
The general process is as follows:
- Pull in the sub-package you are testing (i.e. your sub-package included in devex_sdk)
- For each class and function in your sub-package, create a test function
- The test function should contain the same test_ prefix
- Within the test function, an assertion must be made to test against
The test function format can be generalized as follows:
def test_FunctionInPackage():
    assert FunctionInPackage(input) == ExpectedOutPut
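Two cases do not fit the plain == pattern well: functions that return floats and functions that print instead of returning a value. The sketch below shows one way to handle both using only the standard library. It defines a local stand-in for Circle and describe so that it runs on its own; in a real test file you would import them from devex_sdk instead. With pytest, pytest.approx and the capsys fixture serve the same purposes.

```python
import contextlib
import io
import math

class Circle:
    """Local stand-in for devex_sdk.circles.Circle so the sketch runs on its own."""
    def __init__(self, radius, color):
        self.radius = radius
        self.color = color

    def area(self):
        return math.pi * self.radius ** 2

def describe(circle):
    print(f"This is a {circle.color} circle with a radius of {circle.radius}.")

ci = Circle(5, "red")

def test_area():
    # Compare floats with a tolerance instead of exact equality.
    assert math.isclose(ci.area(), 78.53981633974483, rel_tol=1e-9)

def test_describe():
    # describe() prints and returns None, so capture stdout and compare the text.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = describe(ci)
    assert result is None
    assert buffer.getvalue() == "This is a red circle with a radius of 5.\n"

# pytest would discover and call these automatically; they are called here
# only so the sketch can be run as a plain script.
test_area()
test_describe()
```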
When your test functions are complete, run the test cases by navigating to the devex_sdk root and executing pytest. Add the --slow flag to also run the slow tests:
pytest
pytest --slow
To run the tests locally, change bucket_name and folder_name to point to either local data or S3-stored data.
These settings live in the file at dish-devex-sdk -> devex_sdk -> tests -> commons.py.
By default, the tests run in the GitHub runner, so they point to an S3 bucket.
The pytest command must be run from the directory that contains the tests directory. Once run, pytest will report the result of each test.
It is time for a test build! If you've followed the steps above, and your unit tests are all passing, then it's time to test-build devex_sdk with your sub-package on your local machine. This involves the following steps:
- Uninstall devex_sdk from your pip package manager
- Compile a new .whl file including your sub-package
- Install the new wheel file using pip
- Test the features of your sub-package
Issue the following command in your terminal to remove the existing install of devex_sdk:
pip uninstall devex_sdk
Navigate to the root of the devex_sdk directory you have been working in on your local host. Then issue the following command:
python setup.py bdist_wheel --version <VERSION_NUMBER>
This will generate the .whl file in the dist directory at the root of the devex_sdk file structure.
NOTE: the <VERSION_NUMBER> only affects your local build. You can use any version number you like. This can be helpful in testing prior to submitting a pull request. Alternatively, you can exclude the --version <VERSION_NUMBER> flag, and the .whl file will be named devex_sdk-VERSION_PLACEHOLDER-py3-none-any.whl.
Next, from the same directory, execute the following command:
pip install dist/*.whl
Note: in Windows, you will need to hit tab prior to executing the above command, this will autocomplete the name of the .whl file.
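If wildcard expansion is not available in your shell, the wheel can also be located and installed from Python. This is a minimal sketch, assuming it is run from the root of the repository where the dist directory was generated; the helper names newest_wheel and install_wheel are hypothetical and not part of devex_sdk:

```python
import subprocess
import sys
from pathlib import Path

def newest_wheel(dist_dir="dist"):
    """Return the most recently modified .whl file in dist_dir."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no .whl files found in {dist_dir}")
    return wheels[-1]

def install_wheel(wheel_path):
    """Install the wheel with the pip that belongs to the current interpreter."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", str(wheel_path)],
        check=True,
    )
```

From the repository root, calling install_wheel(newest_wheel()) installs the latest build.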
Now navigate to your home directory to get out of the devex_sdk folder. This ensures that you are testing the pip-installed version of devex_sdk, not the devex_sdk directory:
cd ~
Now, enter a Python environment and test your sub-package. Try various levels of imports, test all the features, and ensure everything is behaving as it should. For example, the circles sub-package would be tested as follows:
>>> from devex_sdk import circles as ci
>>> mycircle = ci.Circle(2, "red")
>>> mycircle.radius
2
>>> mycircle.color
'red'
>>> ci.describe(mycircle)
This is a red circle with a radius of 2.
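To confirm that the session is using the pip-installed package rather than the source directory, you can check where Python would load it from. A small sketch using the standard library; the helper name package_origin is hypothetical:

```python
import importlib.util

def package_origin(name):
    """Return the file a top-level package would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# For an installed build, this path should sit under site-packages,
# not inside your working copy of the repository.
print(package_origin("devex_sdk"))
```

If this prints None, the package is not installed in the current environment.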
If everything in your algorithm is functioning as expected, then it's time to submit a pull request to have your code included in the next release of devex_sdk!
Each subpackage contributed to devex_sdk must have a README.md file included. This tells other users how to successfully use your functions, the use cases for each function, and the expected outputs. The README.md files should be included in the following location:
> dish-devex-sdk
    > devex_sdk
        __init__.py
        > circles
            __init__.py
            CirclesClass.py
        --> README.md
        > [OtherPackages]
The minimum topics for inclusion in the README file are as follows:
- Detailed description of each input
- Detailed description of the expected output
- An example call for each function you are contributing
Once all of the above steps are completed, your unit tests are all passing, and you are able to successfully build devex_sdk on your local machine, commit and push your branch. It's now time to submit a pull request.