Fix slowness and add profiling
This MR does two things:
- Add profiling with silk:
  - There is now a Django setting `ENABLE_PROFILING`, which toggles the profiling-related runtime settings (such as adding the correct URLs for the profiler dashboard).
  - There is a decorator/wrapper for the `silk_profile` decorator (which adds detailed profiling on views). This wrapper checks `ENABLE_PROFILING` to determine whether to profile or not.
  - There are settings files for local development (`dev_profiling`) and for dev/prod (`docker_sdc_profiling`).

  This combination also prevents warning/error messages regarding profiling. For local development, just run with `dev_profiling`; on dev/prod, change the settings env var from `sdc_docker` to `sdc_docker_profiling` and restart the application (e.g., redeploy).
Example:
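A minimal sketch of what the wrapper could look like (illustrative only: the decorator name and the commented-out view are made up, and the actual implementation in this MR may differ):

```python
# Hypothetical sketch -- the real wrapper in this MR may be named/structured differently.
import functools

from django.conf import settings


def conditional_silk_profile(name=None):
    """Wrap a view with silk's profiler only when ENABLE_PROFILING is True."""
    def decorator(view_func):
        # Profiling disabled: return the view untouched, silk is never imported.
        if not getattr(settings, "ENABLE_PROFILING", False):
            return view_func

        # Imported lazily so silk only needs to be installed when profiling is on.
        from silk.profiling.profiler import silk_profile

        @functools.wraps(view_func)
        def wrapped(request, *args, **kwargs):
            with silk_profile(name=name or view_func.__name__):
                return view_func(request, *args, **kwargs)

        return wrapped

    return decorator


# Usage (illustrative view name):
# @conditional_silk_profile(name="dataproduct detail")
# def dataproduct_detail(request, pk):
#     ...
```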
- Solve the slowness on the create/detail page by optimizing the call to determine distinct dataproduct filter types.
The problem lies in retrieving distinct dataproduct fields (e.g., location, activity, dataproduct_type) from a table with 10+ million records. In reality, across millions of records, only a handful of distinct values exist. Some sort of caching/memoization solution is in order.
Two solutions were considered: adding an index on the relevant dataproduct columns, or caching the distinct values directly. The latter was chosen.
Two major cons of adding an index:
- There are only a few distinct values, but an index on these fields would 'cache' a value for every record (potentially 10+ million). This is wasteful and costs a lot of disk space.
- Postgres is not good at efficient DISTINCT queries because it lacks a decent index skip scan; emulating one requires complicated querying (see the sketch after this list).
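For illustration, emulating a skip scan in Postgres usually means a recursive CTE along the following lines; the table and column names here are assumptions based on the fields mentioned above, not the project's actual schema:

```python
# Illustration of the "complicated querying" needed to work around Postgres'
# missing index skip scan. Table/column names are assumptions.
from django.db import connection

# Recursive CTE ("loose index scan"): repeatedly jump to the next larger value,
# so only a handful of index probes are needed instead of scanning 10M+ rows.
# Requires a btree index on dataproduct_type to be effective.
SKIP_SCAN_SQL = """
WITH RECURSIVE distinct_types AS (
    (SELECT dataproduct_type AS val
       FROM dataproduct
      ORDER BY dataproduct_type
      LIMIT 1)
    UNION ALL
    SELECT (SELECT d.dataproduct_type
              FROM dataproduct d
             WHERE d.dataproduct_type > t.val
             ORDER BY d.dataproduct_type
             LIMIT 1)
      FROM distinct_types t
     WHERE t.val IS NOT NULL
)
SELECT val FROM distinct_types WHERE val IS NOT NULL;
"""


def distinct_dataproduct_types():
    with connection.cursor() as cursor:
        cursor.execute(SKIP_SCAN_SQL)
        return [row[0] for row in cursor.fetchall()]
```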
The chosen caching solution is based on Django's cache framework: https://docs.djangoproject.com/en/4.1/topics/cache/
It is used in conjunction with Memcached, which is spun up in a Docker container via the docker compose configuration.
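A minimal sketch of how this could look; the backend configuration, service name, helper, key, and timeout below are illustrative, not the exact values used in this MR. In the settings module, the default cache points at the Memcached container:

```python
# Illustrative settings snippet; the compose service name/port are assumptions.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "memcached:11211",
    }
}
```

The distinct values are then memoized with the low-level cache API, so the 10M+ row table is only queried on a cache miss:

```python
# Illustrative helper; names are made up and Dataproduct stands in for the real model.
from django.core.cache import cache


def cached_distinct_values(model, field, timeout=15 * 60):
    """Return the distinct values of `field`, memoized in the default cache."""
    key = f"distinct:{model._meta.label_lower}:{field}"
    values = cache.get(key)
    if values is None:
        # Only hit the large table on a cache miss.
        values = list(
            model.objects.order_by(field).values_list(field, flat=True).distinct()
        )
        cache.set(key, values, timeout)
    return values


# Usage (illustrative):
# filter_types = cached_distinct_values(Dataproduct, "dataproduct_type")
```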