Write Pytest Tests in Databricks
Hello folks, this blog helps you set up and run Pytest with code coverage in Databricks, so that you can run all your tests at once.
We see Databricks users writing unit tests inside Databricks notebooks. The issue is that they cannot trigger all the tests in their repository at once; they have to run every notebook individually to run each test.
Instead of notebooks, we can write all the tests and Python scripts in Databricks as Python files, created with the ‘Create → File’ option. The option is seen below.
By keeping Python scripts as files rather than notebooks, we can trigger all the tests at once and get the test results and coverage report using Pytest. This approach is also useful for running all unit tests at once in CI/CD pipelines.
An example is given below, which you can adapt to your requirements.
Let’s say we have a Python module by the name ‘greetings.py’ which contains a function ‘greet’.
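The original listing is not reproduced here, so the following is only a minimal sketch of what ‘greetings.py’ could look like. The body of ‘greet’ (its parameter and the returned string) is an assumption made for illustration.

```python
# greetings.py (illustrative sketch; the function body is assumed)


def greet(name: str) -> str:
    """Return a greeting for the given name."""
    return f"Hello, {name}!"
```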
To write unit tests for this function, let us create another Python module, ‘test_greetings.py’, which contains the tests for ‘greetings.py’.
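As a rough sketch, a test module for ‘greet’ might look like the following. In the project, the function would be imported from ‘greetings.py’ (the exact import path depends on your folder layout). Here a stand-in ‘greet’ with an assumed body is defined inline so the example runs on its own.

```python
# test_greetings.py (illustrative sketch)

# In the real project this would be an import from greetings.py.
# The stand-in below, including its return value, is an assumption.
def greet(name: str) -> str:
    return f"Hello, {name}!"


def test_greet_returns_greeting():
    # The function name starts with "test", so Pytest collects it.
    assert greet("Databricks") == "Hello, Databricks!"


def test_greet_includes_name():
    assert "World" in greet("World")
```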
The ‘run_pytest.py’ script below is run to execute the tests.
Once the ‘run_pytest.py’ script above is run using the Run option in Databricks (the play button at the top left of the image above), all the test results are shown in the terminal, along with the coverage report.
My folder structure for this project in Databricks is as below.
Though I had only one test file, you could have any number of test modules in the ‘tests’ folder and execute them all at once by running the ‘run_pytest.py’ file. Just edit the path in ‘run_pytest.py’: point it at a specific test file to run only that file, or stop at the folder name to run all the tests in that folder automatically.
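A runner script along these lines could be sketched as follows. This is not necessarily the author's exact script: the helper names ‘build_pytest_args’ and ‘run_tests’, the default ‘tests’ path, and the choice of options are assumptions. It relies on ‘pytest.main’, which runs Pytest in-process, and on the ‘--cov’ and ‘--cov-report’ options provided by pytest-cov.

```python
# run_pytest.py (illustrative sketch; paths and option choices are assumptions)
import sys

# Avoid writing .pyc files next to the sources, since the workspace
# folder may not allow them.
sys.dont_write_bytecode = True


def build_pytest_args(test_path="tests", cov_target="."):
    """Build the argument list handed to pytest.main()."""
    return [
        test_path,                    # file or folder to collect tests from
        "-v",                         # list each test with its outcome
        "-p", "no:cacheprovider",     # do not create a .pytest_cache folder
        f"--cov={cov_target}",        # pytest-cov: measure coverage of this path
        "--cov-report=term-missing",  # pytest-cov: print uncovered line numbers
    ]


def run_tests(test_path="tests", cov_target="."):
    """Run Pytest in-process and return its exit code (0 means all tests passed)."""
    import pytest  # imported here so the helper above works without pytest installed

    return int(pytest.main(build_pytest_args(test_path, cov_target)))
```

Ending the file with a call such as run_tests("tests") makes the Run button execute every test under the ‘tests’ folder. Passing a single file path instead limits the run to that file.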
Prerequisites to run Pytest:
By default, Pytest automatically detects and runs tests only if
a. The Python module name starts with ‘test_’ or ends with ‘_test’
ex: test_greetings.py or greetings_test.py
b. Test function names inside the test module start with the word ‘test’
ex: a function named ‘test_greet’
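The naming rules above can be illustrated with a small helper. This is not Pytest's own collection code, only a sketch that mimics its default file patterns (‘test_*.py’ and ‘*_test.py’) and the ‘test’ function prefix. The helper names are hypothetical.

```python
from fnmatch import fnmatch

# Pytest's default file patterns and function prefix.
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")
DEFAULT_FUNCTION_PREFIX = "test"


def is_test_file(filename: str) -> bool:
    """True if the file name matches one of the default patterns."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_FILE_PATTERNS)


def is_test_function(name: str) -> bool:
    """True if the function name starts with the default prefix."""
    return name.startswith(DEFAULT_FUNCTION_PREFIX)


for candidate in ("test_greetings.py", "greetings_test.py", "greetings.py"):
    print(candidate, is_test_file(candidate))
```

With this check, ‘greetings.py’ is not treated as a test module, which is why the tests live in a separate ‘test_greetings.py’ file.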
Install Pytest and pytest-cov using the commands below.
‘pytest-cov’ is a plugin that produces the coverage report.
a. pip install pytest
b. pip install pytest-cov
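To confirm that both packages are available before running the tests, a quick check like the one below can help. It is only a sketch: the helper name ‘missing_packages’ is hypothetical, and it assumes the import names ‘pytest’ and ‘pytest_cov’ (the module installed by pytest-cov).

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found."""
    return [name for name in import_names if find_spec(name) is None]


missing = missing_packages(["pytest", "pytest_cov"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("pytest and pytest-cov are available")
```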
I have used a relative import statement in the Python file ‘test_greetings.py’. Make sure you have an ‘__init__.py’ file in every Python package for this relative import to work.
You can also view or download the above code from my GitHub.
Hope this blog was of some help to you. Share your comments and follow for more blogs on Python and data engineering.
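To make the package layout concrete, the sketch below builds a throwaway project in a temporary folder and imports the test module through its package path. The folder name ‘demo_project’, the file contents, and the import ‘from ..greetings import greet’ are assumptions for illustration, not the layout of the original repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create demo_project/ with a tests/ subpackage (hypothetical layout)."""
    package = root / "demo_project"
    tests = package / "tests"
    tests.mkdir(parents=True)
    # An __init__.py in each folder marks it as a package.
    (package / "__init__.py").write_text("")
    (tests / "__init__.py").write_text("")
    (package / "greetings.py").write_text(
        "def greet(name):\n    return f'Hello, {name}!'\n"
    )
    (tests / "test_greetings.py").write_text(
        "from ..greetings import greet\n\n"
        "def test_greet():\n"
        "    assert greet('Databricks') == 'Hello, Databricks!'\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Importing through the package path lets '..greetings' resolve.
        module = importlib.import_module("demo_project.tests.test_greetings")
        module.test_greet()
        relative_import_ok = True
    finally:
        sys.path.remove(tmp)
```

The ‘__init__.py’ files matter for Pytest in particular: in its default import mode it uses them to work out the package a test file belongs to, and without them the test file is imported as a top-level module, so the relative import fails.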