Commit 73429aa5 authored by John Swinbank

Add requirements analysis

parent 6f0d228a
Pipeline #42143 passed
@@ -163,7 +163,7 @@ As such, limited computational resources can always be made available using the
Upon choosing an appropriate workflow and analysis service, the user is redirected to the notebook environment.
In that environment, a Python library --- initially developed specifically to address Zooniverse classification data, but now adapted to a wide range of data types --- makes it possible for them to access their \pgls{ESAP} shopping basket, and hence to download or otherwise manipulate the data that they have selected.
-In addition, the “Data-Lake-as-a-Service” system, developed in conjunction with \gls{ESCAPE} \gls{WP}2 (\Acrshort{DIOS}), provides integration between the interactive analysis environment and bulk storage offered through the \gls{ESCAPE} Data Lake.
+In addition, the \gls{DLaaS} system, developed in conjunction with \gls{ESCAPE} \gls{WP}2 (\Acrshort{DIOS}), provides integration between the interactive analysis environment and bulk storage offered through the \gls{ESCAPE} Data Lake.
Although the current \gls{IDA} system focuses on Jupyter notebooks, we expect to extend this service to address other forms of interaction in future work; refer to \cref{sec:future:ida} for further discussion.
......
\section{Requirements Analysis}
\label{sec:requirements}
\gls{ESCAPE} deliverable \citetitle{ESCAPE-D5.2} \autocite{ESCAPE-D5.2} defines a series of functional requirements on the \pgls{ESAP} system.
Not all of these requirements are met within the context of the funded \gls{ESCAPE} project; some require future development (\cref{sec:future}) and/or ongoing work by \glspl{ESFRI}.
Nevertheless, this section summarizes the current progress towards meeting the requirements.
\renewcommand{\arraystretch}{1.5}
\begin{longtable}{l|p{0.43\textwidth}|p{0.43\textwidth}}
\textbf{ID} & \textbf{Description} & \textbf{Current Status} \\
\hline
\endhead
R-1 &
Users should be able to get a list of available data, searchable by different criteria, including keyword, science domain, institute, data type, etc. &
This capability is provided by the Data Discovery system described in \cref{sec:delivered:data}.
\\
R-2 &
Users should be able to get a list of known (\gls{VO} \& other) tools and software for users \& publishers. &
This capability is partially provided through the \gls{OSSR} integration described in \cref{sec:delivered:ida}.
By default, only Jupyter notebooks are exposed to end users as other tools are not supported by the \gls{ESAP} \gls{IDA} system; we expect this to be addressed in future work (\cref{sec:future:ida}).
\\
R-3 &
Users should be able to, for a given project \& dataset, query for metadata and aggregate information (i.e.\ find location of data). &
This capability is provided by the Data Discovery system described in \cref{sec:delivered:data}.
\\
R-4 & Users should be able to stage a given dataset at the appropriate facility. &
This capability is provided by the \gls{DLaaS} system described in \cref{sec:delivered:ida}.
\\
R-5 &
Users should be able to execute a job on a given dataset, including but not limited to: batch or real-time queries \& pipelines, depending on the capabilities of the facility, which need to be made clear to the user. &
These capabilities are provided by the \gls{IDA} system described in \cref{sec:delivered:ida} and the Batch Processing system described in \cref{sec:delivered:batch}.
\\
R-6 &
The platform needs to accommodate restricted data access, so that groups of authorised users are the only ones that are able to access a given private data set, shared to them via the platform. &
This requirement is handled through access restrictions at the service level.
\\
R-7 &
Users should be able to select from an existing list of Workflows (Notebooks) and either download, or deploy on available facilities. &
This capability is provided by the \gls{IDA} system as described in \cref{sec:delivered:ida}.
\\
R-8 &
Users should be able to assign \glspl{PID} to every digital object that is part of a Research Object. &
This capability is not yet available; see \cref{sec:future:pid}.
\\
R-9 &
User generated data needs to be queryable via \gls{ESAP}. &
\pgls{ESAP} provides access to all data which is available through configured archives, including user-generated data.
\\
R-10 &
Users should be able to ingest advanced data products generated from data processing and/or data analysis back to the project data archive. &
This is an interaction between the analysis service and the archive, which is not currently directly mediated by \pgls{ESAP}.
However, using the \gls{IDA} environment and \gls{DLaaS} system described in \cref{sec:delivered:ida}, the user should be able to arrange whatever ingest is required.
\\
R-11 &
Users should be able to select computing facilities on the basis of their capacity. E.g.\ an \gls{HPC} resource with a specific acceleration (such as a \gls{GPU}) might be needed because the software to be run requires it. &
This capability is not yet available; see \cref{sec:future:metadata}.
\\
R-12 &
Less experienced users (e.g.\ citizen scientists) should be able to filter the list of available software tools to include only those deemed pertinent to the data that they have selected. &
The system described in \cref{sec:delivered:ida} makes it possible for users to search and filter the list of tools available from the \gls{OSSR}.
Future evolution of this work could provide additional assistance to users in locating software that is directly relevant to their data; see \cref{sec:future:metadata}.
\\
R-13 &
Users should be able to schedule computational tasks at regular intervals e.g.\ to periodically retrieve new classification data from a citizen science experiment. &
This capability is not yet available; see \cref{sec:future:periodic}.
\\
\end{longtable}
\renewcommand{\arraystretch}{1}
In summary, of the thirteen requirements described, seven have been met (R-1, R-2, R-3, R-4, R-5, R-7, R-12), three require future development effort (R-8, R-11, R-13), and three are satisfied by the integration of \pgls{ESAP} with other \gls{ESCAPE} infrastructure (R-6, R-9, R-10).
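The tally in the summary above can be cross-checked mechanically. The following is a minimal sketch in Python, not part of the \pgls{ESAP} codebase: the dictionary keys are illustrative labels chosen here, and the requirement IDs are copied from the summary sentence. It checks that the three groups are disjoint and together cover R-1 through R-13.

```python
# Status groups copied from the summary sentence above.
# The group labels are illustrative, not terms defined by ESAP.
status_groups = {
    "met": ["R-1", "R-2", "R-3", "R-4", "R-5", "R-7", "R-12"],
    "future development": ["R-8", "R-11", "R-13"],
    "via other ESCAPE infrastructure": ["R-6", "R-9", "R-10"],
}

# Flatten every group into a single list of requirement IDs.
all_ids = [rid for ids in status_groups.values() for rid in ids]

# The table lists requirements R-1 through R-13.
expected = [f"R-{n}" for n in range(1, 14)]

# Count the requirements in each group.
counts = {status: len(ids) for status, ids in status_groups.items()}

# True only if each requirement appears exactly once across all groups.
is_partition = sorted(all_ids, key=lambda rid: int(rid[2:])) == expected
```

Note that R-2 is counted as met here, following the summary, although the table describes it as only partially provided.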
@@ -2,6 +2,19 @@
\label{sec:future}
\subsection{Advanced Metadata Management}
\label{sec:future:metadata}
This work would include matching data to \gls{IDA} tasks, matching tasks to appropriate compute hardware, etc.
\subsection{Periodic Execution}
\label{sec:future:periodic}
See requirement R-13.
\subsection{Issue Persistent Identifiers}
\label{sec:future:pid}
See requirement R-8.
\subsection{Containerized Interactive Analysis Tooling}
\label{sec:future:ida}
......