Add requirements analysis

Commit 73429aa5 authored Jan 7, 2023 by John Swinbank
--- a/contents/3-delivered.tex
+++ b/contents/3-delivered.tex
@@ -163,7 +163,7 @@ As such, limited computational resources can always be made available using the
 Upon choosing an appropriate workflow and analysis service, the user is redirected to the notebook environment.
 In that environment, a Python library --- initially developed specifically to address Zooniverse classification data, but now adapted to a wide range of data types --- makes it possible for them to access their \pgls{ESAP} shopping basket, and hence to download or otherwise manipulate the data that they have selected.
-In addition, the “Data-Lake-as-a-Service” system, developed in conjunction with \gls{ESCAPE} \gls{WP}2 (\Acrshort{DIOS}), provides integration between the interactive analysis environment and bulk storage offered through the \gls{ESCAPE} Data Lake.
+In addition, the \gls{DLaaS} system, developed in conjunction with \gls{ESCAPE} \gls{WP}2 (\Acrshort{DIOS}), provides integration between the interactive analysis environment and bulk storage offered through the \gls{ESCAPE} Data Lake.
 Although the current \gls{IDA} system focuses on Jupyter notebooks, we expect to extend this service to address other forms of interaction in future work; refer to \cref{sec:future:ida} for further discussion.

--- a/contents/4-requirements.tex
+++ b/contents/4-requirements.tex
 \section{Requirements Analysis}
 \label{sec:requirements}
+\gls{ESCAPE} deliverable \citetitle{ESCAPE-D5.2} \autocite{ESCAPE-D5.2} defines a series of functional requirements on the \pgls{ESAP} system.
+Not all of these requirements are met within the context of the funded \gls{ESCAPE} project; some require future development (\cref{sec:future}) and/or ongoing work by \glspl{ESFRI}.
+Nevertheless, this section summarizes the current progress towards meeting the requirements.
+\renewcommand{\arraystretch}{1.5}
+\begin{longtable}{l|p{0.43\textwidth}|p{0.43\textwidth}}
+\textbf{ID} & \textbf{Description} & \textbf{Current Status} \\
+\hline
+\endhead
+R-1 &
+Users should be able to get a list of available data, searchable by different criteria, including keyword, science domain, institute, datatype, etc. &
+This capability is provided by the Data Discovery system described in \cref{sec:delivered:data}.
+\\
+R-2 &
+Users should be able to get a list of known (\gls{VO} \& other) tools and software for users \& publishers. &
+This capability is partially provided through the \gls{OSSR} integration described in \cref{sec:delivered:ida}.
+By default, only Jupyter notebooks are exposed to end users as other tools are not supported by the \gls{ESAP} \gls{IDA} system; we expect this to be addressed in future work (\cref{sec:future:ida}).
+\\
+R-3 &
+Users should be able to, for a given project \& dataset, query for metadata and aggregate information (i.e.\ find location of data). &
+This capability is provided by the Data Discovery system described in \cref{sec:delivered:data}.
+\\
+R-4 & Users should be able to stage a given dataset at the appropriate facility. &
+This capability is provided by the \gls{DLaaS} system described in \cref{sec:delivered:ida}.
+\\
+R-5 &
+Users should be able to execute a job on a given dataset, including but not limited to: batch or real-time queries \& pipelines, depending on the capabilities of the facility, which need to be made clear to the user. &
+These capabilities are provided by the \gls{IDA} system described in \cref{sec:delivered:ida} and the Batch Processing system described in \cref{sec:delivered:batch}.
+\\
+R-6 &
+The platform needs to accommodate restricted data access, so that groups of authorised users are the only ones that are able to access a given private data set, shared to them via the platform. &
+This requirement is handled through access restrictions at the service level.
+\\
+R-7 &
+Users should be able to select from an existing list of Workflows (Notebooks) and either download, or deploy on available facilities. &
+This capability is provided by the \gls{IDA} system as described in \cref{sec:delivered:ida}.
+\\
+R-8 &
+Users should be able to assign \glspl{PID} to every digital object that is part of a Research Object. &
+This capability is not yet available; see \cref{sec:future:pid}.
+\\
+R-9 &
+User-generated data needs to be queryable via \gls{ESAP}. &
+\pgls{ESAP} provides access to all data which is available through configured archives, including user-generated data.
+\\
+R-10 &
+Users should be able to ingest advanced data products generated from data processing and/or data analysis back to the project data archive. &
+This is an interaction between the analysis service and the archive, which is not currently directly mediated by \pgls{ESAP}.
+However, using the \gls{IDA} environment and \gls{DLaaS} system described in \cref{sec:delivered:ida}, the user should be able to arrange whatever ingest is required.
+\\
+R-11 &
+Users should be able to select computing facilities on the basis of their capacity. E.g.\ an \gls{HPC} resource with a specific acceleration (such as a \gls{GPU}) might be needed because the software to be run requires it. &
+This capability is not yet available; see \cref{sec:future:metadata}.
+\\
+R-12 &
+Less experienced users (e.g.\ citizen scientists) should be able to filter the list of available software tools to include only those deemed pertinent to the data that they have selected. &
+The system described in \cref{sec:delivered:ida} makes it possible for users to search and filter the list of tools available from the \gls{OSSR}.
+Future evolution of this work could provide additional assistance to users in locating software that is directly relevant to their data; see \cref{sec:future:metadata}.
+\\
+R-13 &
+Users should be able to schedule computational tasks at regular intervals e.g.\ to periodically retrieve new classification data from a citizen science experiment. &
+This capability is not yet available; see \cref{sec:future:periodic}.
+\\
+\end{longtable}
+\renewcommand{\arraystretch}{1}
+In summary, of the thirteen requirements described, seven have been met (R-1, R-2, R-3, R-4, R-5, R-7, R-12), three require future development effort (R-8, R-11, R-13), and three are satisfied by the integration of \pgls{ESAP} with other \gls{ESCAPE} infrastructure (R-6, R-9, R-10).
--- a/contents/7-future.tex
+++ b/contents/7-future.tex
@@ -2,6 +2,19 @@
 \label{sec:future}
 \subsection{Advanced Metadata Management}
+\label{sec:future:metadata}
+This includes matching data to \gls{IDA} tasks and matching tasks to appropriate compute hardware; see requirements R-11 and R-12.
+\subsection{Periodic Execution}
+\label{sec:future:periodic}
+See requirement R-13.
+\subsection{Issue Persistent Identifiers}
+\label{sec:future:pid}
+See requirement R-8.
 \subsection{Containerized Interactive Analysis Tooling}
 \label{sec:future:ida}