Python Code Best Practices
This page outlines best practices for writing Python code in 2024.
You will learn how to expose public interfaces, format and lint code, and use type hints.
Define public interfaces
Python codebases should expose end-user functionality via clearly defined public interfaces.
The public interfaces should be easy to import. Let's take a look at how the pandas library does this, for example:
import pandas as pd
This import makes it easy to access public functions like pd.read_parquet or pd.read_csv.
Python codebases must only expose the public interface in the documentation. Don't make the mistake of improperly structuring your code and not clearly defining a public interface.
Python code autoformatting tools
Python codebases can be autoformatted with Black or Ruff.
Repos should contain clear instructions on how to autoformat the code via text editor integrations, pre-commit hooks, or CI. The code should be set up so that autoformatting is straightforward for all developers.
This saves the whole team the wasted effort of debating code style and leaving formatting nits in PR reviews.
Python code linting
Ruff is a good tool for linting code.
Code linters check that code complies with PEP 8 and flag common mistakes such as unused imports.
You should lint all your code and prevent new code from getting merged unless it is properly linted.
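To make this concrete, here is a small sketch of the kind of code a linter flags. F401 (unused import) and E711 (comparison to None) are the codes Ruff uses for these two checks; the functions themselves are hypothetical.

```python
import os  # unused import: reported by Ruff as F401


def is_missing(value):
    # Comparison to None with `==`: reported by Ruff as E711.
    return value == None


def is_missing_fixed(value):
    # The lint-clean version uses an identity check instead.
    return value is None


print(is_missing_fixed(None))  # True
```

Both functions run, which is why a linter is useful here: the problems are about correctness risk and readability, not syntax errors.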
Python type hints
Modern Python code should use type hints.
Here's an example of a function without type hints:
TODO
Here's the same function with type hints:
TODO
Type hints have a variety of advantages:
- make it easier to invoke the function
- help prevent the function from being invoked with improper arguments (a type checker flags mismatched calls)
- provide higher quality documentation for end users with clear inputs and outputs
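As one possible illustration of the comparison above, here is a hypothetical rectangle_area function written both ways. The function name and parameters are invented for this sketch.

```python
# Without type hints: callers must read the body to learn what to pass.
def rectangle_area(width, height):
    return width * height


# With type hints: the signature documents the inputs and the output.
def rectangle_area_typed(width: float, height: float) -> float:
    return width * height


print(rectangle_area_typed(2.0, 3.5))  # 7.0
```

A static type checker such as mypy would reject a call like rectangle_area_typed("2", 3.5). Python itself does not enforce the hints at runtime.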
Python build tools
There are many popular build tools for Python projects like Poetry and TODO.
Poetry has nice features and encourages coding best practices:
- users can specify different dependency groups
- dependencies can be listed in the pyproject.toml file
- the exact versions of all dependencies and transitive dependencies are specified in the poetry.lock file to allow for deterministic builds
- nice developer quality-of-life features like single commands to build wheel files and publish to PyPI
Using Poetry to properly specify dependencies is better than listing dependencies in a requirements.txt file for the following reasons:
- requirements.txt files often don't pin exact versions of dependencies and transitive dependencies, so builds are not deterministic
- no quality-of-life helper commands
You should build your Python projects with a proper build tool.
Python documentation
You should have well-documented instructions to generate documentation for the public interface of your project.
The public facing documentation should provide examples and give users a solid understanding of the functionality provided by your library.
The best approach is a human-written user guide that gives a high-level overview of the project, combined with programmatically generated API documentation that has easily accessible details for each component of the public interface.
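API documentation generators typically build their output from docstrings, so a documented public function might look like the sketch below. The function is hypothetical, and the Args/Returns layout is just one common docstring convention, assumed here for illustration.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit.

    Args:
        celsius: Temperature in degrees Celsius.

    Returns:
        The temperature in degrees Fahrenheit.

    Example:
        >>> celsius_to_fahrenheit(100.0)
        212.0
    """
    return celsius * 9 / 5 + 32


print(celsius_to_fahrenheit(100.0))  # 212.0
```

Putting a short example inside the docstring means the generated API page shows users how to call the function, not just its signature.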
Limiting Python dependencies
You should limit the number of dependencies and transitive dependencies in your projects, especially if you're building a library.
Remember that if you depend on one library which in turn depends on 10 other libraries, then you have 11 dependencies, not just one.
Dependencies can cause dependency hell for end users, so you always need to thoroughly vet all dependencies. You also need to analyze all the transitive dependencies that are pulled in and make sure those projects are well maintained, for the long-term viability of your codebase.
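The arithmetic above can be sketched as a small traversal over a dependency graph. The graph here is made up (my_project, lib_a, and the lib_a_dep_* names are placeholders); a real project would get this information from its lock file or packaging tool.

```python
def count_all_dependencies(graph, package):
    """Count direct and transitive dependencies of `package`.

    `graph` maps each package name to the list of packages it depends on.
    """
    seen = set()
    stack = list(graph.get(package, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            # Follow this dependency's own dependencies as well.
            stack.extend(graph.get(dep, []))
    return len(seen)


# Hypothetical graph: my_project -> lib_a -> ten further libraries.
graph = {
    "my_project": ["lib_a"],
    "lib_a": [f"lib_a_dep_{i}" for i in range(10)],
}

print(count_all_dependencies(graph, "my_project"))  # 11
```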
Optional Python dependencies
Some dependencies in Python projects can be specified as optional if the functionality is only relevant for a subset of users.
Use optional dependencies whenever possible rather than requiring all users to install all dependencies, even when they're not needed.
pip has a nice installation syntax for optional dependencies, known as extras. Here's an example of how to install Polars with the optional deltalake dependency:
pip install 'polars[deltalake]'
This optional dependency powers the pl.read_delta and pl.scan_delta functionality. It's only required for Polars users who want to read Delta Lake tables.
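On the library side, one common pattern is to import the optional dependency only when the feature is used and to raise a helpful error if it is missing. The sketch below is a generic illustration, not how Polars implements it; the package name mylib and the helper require_optional are hypothetical.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or explain how to install it.

    `extra` is the name of the extras group in a hypothetical package
    called mylib.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'mylib[{extra}]'"
        ) from exc


# json ships with Python, so this import succeeds.
json_module = require_optional("json", "demo")
print(json_module.dumps({"ok": True}))  # {"ok": true}
```

Users who never call the feature never pay for the dependency, and users who do get an actionable message instead of a bare import failure.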
Supported Python versions
Your Python library should support Python versions applicable to your users.
You can't just support the latest Python version because that will make your code inaccessible to users running older versions of Python.
This unfortunately means that Python library developers generally can't use the latest features of Python (unless they've been backported to earlier Python versions).
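One way library authors cope with this is to gate a newer feature on the interpreter version and provide a fallback. The sketch below uses itertools.pairwise (added in Python 3.10) purely as an example; the fallback function is written for this illustration.

```python
import itertools
import sys

if sys.version_info >= (3, 10):
    # itertools.pairwise was added in Python 3.10.
    pairwise = itertools.pairwise
else:
    def pairwise(iterable):
        """Fallback for older interpreters: yield overlapping pairs."""
        first, second = itertools.tee(iterable)
        next(second, None)  # advance the second iterator by one item
        return zip(first, second)


print(list(pairwise([1, 2, 3])))  # [(1, 2), (2, 3)]
```

The rest of the codebase can then call pairwise without caring which Python version is running.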
Python code performance
TODO
Python code style
There are various Python code style guides that recommend an object-oriented or functional programming style, a maximum number of lines per function, and a maximum number of lines per file.
You can even quantify the code quality with tools like TODO.
Nice code style is preferable, but it's comparatively less important than a clean public interface, a proper number of dependencies, and code that meets performance needs.