
D1. Stage the data on a given resource (in a more user-friendly way)

We want to stage data on a given compute resource in a way that:

  • It can handle a shopping basket with 1000+ rows
  • It does not require an extra login, but reuses the user's existing credentials
  • It does not require manual actions during runtime (such as running Jupyter Notebook commands or interacting with a plugin)
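One possible way to meet the 1000+ rows requirement without manual steps is to split the basket into fixed-size batches that a background worker handles one after another. The sketch below assumes the basket is available as a Python sequence; the helper name `batched` and the batch size are illustrative choices, not part of the proposal.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(rows: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` rows (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        # Slicing keeps the original order of the basket rows.
        yield list(rows[start:start + size])


if __name__ == "__main__":
    basket = [f"dataset-{i}" for i in range(1234)]
    print([len(batch) for batch in batched(basket, 500)])  # [500, 500, 234]
```

Batching keeps each transfer request small, so a failure only affects one batch instead of the whole basket.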

To achieve this, an asynchronous worker could be started through a generic interface that takes:

  • Metadata describing which data to store
  • Metadata describing which software to use
  • Metadata describing where to run it (compute infrastructure)
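A minimal sketch of such an interface, assuming Python's `asyncio` for the worker: each request bundles the three kinds of metadata, and a background task consumes requests from a queue without any user interaction. All class and field names (`DataSpec`, `SoftwareSpec`, `ComputeSpec`, `StagingRequest`, `endpoint`, `image`) are hypothetical, and `stage_item` only simulates the actual transfer.

```python
import asyncio
import contextlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataSpec:
    """Which data to store: identifiers of the basket rows (hypothetical)."""
    items: Tuple[str, ...]


@dataclass(frozen=True)
class SoftwareSpec:
    """Which software to use, e.g. a container image name (hypothetical)."""
    image: str


@dataclass(frozen=True)
class ComputeSpec:
    """Where to run it: the target compute infrastructure (hypothetical)."""
    endpoint: str


@dataclass(frozen=True)
class StagingRequest:
    """One request bundles all three kinds of metadata."""
    request_id: str
    data: DataSpec
    software: SoftwareSpec
    compute: ComputeSpec


async def stage_item(item: str, compute: ComputeSpec) -> str:
    """Placeholder transfer step; a real worker would copy `item` to `compute`."""
    await asyncio.sleep(0)  # stands in for non-blocking I/O
    return f"{compute.endpoint}/{item}"


async def worker(queue: asyncio.Queue, results: Dict[str, List[str]]) -> None:
    """Process queued requests in the background until cancelled."""
    while True:
        request: StagingRequest = await queue.get()
        try:
            results[request.request_id] = [
                await stage_item(item, request.compute) for item in request.data.items
            ]
        finally:
            queue.task_done()


async def run(requests: List[StagingRequest]) -> Dict[str, List[str]]:
    """Submit requests, wait until all are processed, then stop the worker."""
    queue: asyncio.Queue = asyncio.Queue()
    results: Dict[str, List[str]] = {}
    task = asyncio.create_task(worker(queue, results))
    for request in requests:
        queue.put_nowait(request)
    await queue.join()  # returns once every request has been marked done
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return results


if __name__ == "__main__":
    request = StagingRequest(
        request_id="basket-1",
        data=DataSpec(items=tuple(f"row-{i}" for i in range(1000))),
        software=SoftwareSpec(image="example/analysis:latest"),
        compute=ComputeSpec(endpoint="hpc.example.org/scratch"),
    )
    print(len(asyncio.run(run([request]))["basket-1"]))  # 1000
```

Keeping the three kinds of metadata as separate types means the same worker could, in principle, be pointed at different software or infrastructure without changing its code.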

An initial diagram of this workflow can be seen here: https://drive.google.com/file/d/1slRgVOwpkuvBulcou9eRnoGEFIIb66yA/view?usp=sharing

Topics to discuss:

  • Authorization: Simply reuse the user's token? Obtain certificates from IAM? Use a "trusted" worker/plugin?
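Whichever authorization option is chosen, the worker could stay independent of it by depending only on a small credential interface. The sketch below assumes HTTP bearer tokens and covers two of the options; the class names and the `X-On-Behalf-Of` header are hypothetical, and an IAM certificate option would need its own provider.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class CredentialProvider(Protocol):
    """Anything that can produce request headers for the compute resource."""

    def headers(self) -> Dict[str, str]: ...


@dataclass(frozen=True)
class UserTokenProvider:
    """Option "use the token from the user": forward it as a bearer token."""

    user_token: str

    def headers(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self.user_token}"}


@dataclass(frozen=True)
class TrustedWorkerProvider:
    """Option "trusted worker": authenticate as the worker and name the user.

    The "X-On-Behalf-Of" header is hypothetical; the real mechanism is still open.
    """

    service_token: str
    user_id: str

    def headers(self) -> Dict[str, str]:
        return {
            "Authorization": f"Bearer {self.service_token}",
            "X-On-Behalf-Of": self.user_id,
        }


def request_headers(provider: CredentialProvider) -> Dict[str, str]:
    """The worker only depends on the protocol, not on the chosen option."""
    return dict(provider.headers())
```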
Edited by Klaas Kliffen