\documentclass{astron}
\usepackage{hyperref}
\usepackage[nonumberlist,nogroupskip]{glossaries}

\input{meta}

\setDocTitle{SDC Software Maintenance in an Agile Environment}
\setDocNumber{SDC-001}
\setDocRevision{\vcsRevision}
\setDocDate{\vcsDate}
\setDocClass{Limited}
\setDocProgram{SDC}

\setDocChangeRecord{
  \addChangeRecord{\vcsRevision}{\vcsDate}{Internal draft}
}

\setDocAuthors{
  \addPerson{John Swinbank}{ASTRON}{\vcsDate}
  \addPerson{Jasper Annyas}{ASTRON}{}
}

\newacronym{AO}{A\&O}{Astronomy and Operations}
\newacronym{ICA}{ICA}{Internal Cooperation Agreement}
\newacronym{IS}{I\&S}{Innovation and Systems}
\newacronym{SBe}{SBe}{Smart Backend}
\newacronym{SDCO}{SDCO}{Science Data Centre Operations}
\newacronym{SDC}{SDC}{Science Data Centre}
\newacronym{SD}{SD}{Software Delivery}
\newacronym{TO}{TO}{Telescope Operations}

\makeglossaries

\begin{document}
\maketitle

%\renewcommand{\glossarypreamble}{\glsfindwidesttoplevelname[\currentglossary]}
%\glsfindwidesttoplevelname
\setglossarystyle{altlist}
\printglossary[title=List of abbreviations]
\clearpage

\section{Introduction}

This document proposes a mechanism for incorporating support and maintenance tasks into the \gls{SDC} development cycle.
It originates from discussion at a “Portfolio Management” planning session held on 11 November 2020.
Ultimately, it will be proposed as part of the \gls{SDC} Program Management Plan.

\section{Context}

The \gls{SDC} will be developed and operated in a complex, matrixed management structure.
Broadly, the \gls{SDC} Program will coordinate the efforts of teams drawn from competence and focus groups across ASTRON --- primarily, but not exclusively, the \gls{SD} and \gls{SBe} competence groups within the \gls{IS} division --- to deliver software releases to the \gls{SDCO} group within the \gls{AO} division.
\gls{SDCO} is, in turn, responsible for deploying and operating this functionality to provide services to end users.

While the detailed roles, responsibilities, authorities, and accountabilities of the various groups described above, and their management structure, are still being defined at the time of writing, it is clear that we will adopt an \emph{agile}, and likely \emph{sprint}-based, approach to this work.
That is, we expect the development teams working under the aegis of the \gls{SDC} Program to plan their work as a series of short increments, each lasting no more than a few weeks and each resulting in a releasable product\footnote{Note that the mechanism by which these products might, in practice, be released to the \gls{SDCO} team has not yet been established.}.

However rigorous the test procedures adopted by the \gls{SDC} Program and its development teams, and however careful the acceptance procedures used by \gls{SDCO}, it is clear that all software products need ongoing maintenance to fix emergent issues and ensure their long term functionality and stability.
This document discusses the mechanisms by which this maintenance may be requested and carried out.

\section{Definitions}
\label{sec:definitions}

We begin by establishing the meaning of key terms:

\begin{description}

  \item[Maintenance]\hfill\\
    The process of modifying a software system after delivery to correct faults, improve performance, or adapt to a changed environment.

  \item[Support]\hfill\\
    A service provided to respond to and mitigate problem reports received from the users of software.
    A support request may result in changes to the underlying software system (i.e., maintenance), or it may be resolved by other means such as suggesting alternative procedures, providing more information about the correct operation of the software, etc.

  \item[Emergent Work]\hfill\\
    Maintenance or other activities that could not reasonably have been planned in advance.
    For example, responding to a bug report would constitute emergent work: although the existence of bugs in the abstract might reasonably have been anticipated, the details of any particular bug could not have been.
    In contrast, a planned migration to a new framework\footnote{For example, transitioning from Python 2 to Python 3.} is not emergent: the migration was signposted well in advance, and can be properly scheduled as part of our regular plan.

\end{description}

Further, we identify four separate regimes under which maintenance may be required:

\begin{itemize}

  \item{The \gls{SDCO} team is unable to correctly operate the \gls{SDC} due to a software error.
        Work is blocked until the error can be resolved.}

  \item{The \gls{SDCO} team is unhappy with some aspect of the way the \gls{SDC} software is operated: it requires manual intervention, or is otherwise slow or awkward.
        However, adequate procedures are in place to enable operations to continue.}

  \item{The \gls{SDCO} team is happy with the current performance of the \gls{SDC} software.
        However, they are aware of future changes to the operational environment --- for example, migration to a different underlying platform --- which will require updates to the software.}

  \item{The \gls{SDC} \emph{Program} team is unhappy with some aspect of the software.
        That is, although it delivers all the functionality that \gls{SDCO} have requested, it has, for example, accumulated substantial technical debt which causes development velocity to be reduced, or it relies on old or obsolete frameworks and libraries which the team would like to replace.}

\end{itemize}

\section{Axioms}
\label{sec:axioms}

We make the following assumptions:

\begin{itemize}

  \item{The primary source of expertise about the \gls{SDC} software lies with its developers.
        That is, given the scale of ASTRON as an institute, it is not practical to assume separate “development” and “maintenance” teams, each with the in-depth expertise required to resolve problems when the software fails.}

  \item{Developers will perform best when working as part of a single, coherent team, rather than when pulled in multiple directions.
        That is, any given developer should be expected to participate in only one sprint at any given time (they are not split across multiple teams), and, where possible, the team should proceed from sprint to sprint with minimal changes.}

  \item{Where possible, each sprint will focus on a coherent set of goals.
        That is, it is preferable to have a sprint which delivers a single, complete feature, rather than a sprint which makes incremental progress on a number of fronts without completing any of them.}

\end{itemize}

\section{Proposal}

\subsection{\Acrlongpl{ICA}}

The support interface between the \gls{SDC} Program and \gls{SDCO} will be defined by one or more \glspl{ICA}.
These specify the services which the \gls{SDC} Program provides to \gls{SDCO}.
For example, an \gls{ICA} describes mechanisms by which members of the \gls{SDCO} team may request support from the Program, and sets expectations about how those requests will be handled.
The \gls{ICA} may also make provisions for requests from the Program to \gls{SDCO}.

There is some prior art within ASTRON for \glspl{ICA} of this type; see, for example, the \gls{ICA} between \gls{SD} and \gls{TO}\footnote{\url{https://support.astron.nl/confluence/display/TO/ICA+Software+Delivery}}.

Note that there is still some discussion to be had here regarding the signatories to the \glspl{ICA}.
Specifically, it is not immediately clear whether the agreement is between individual competence or focus groups and \gls{SDCO}, or between the Program as a whole and \gls{SDCO}.
The latter may be more convenient, as it will provide a single point of contact for \gls{SDCO} requests and facilitate cross-team working within the Program.
This discussion must be resolved as part of reaching a consensus about the larger-scale structure of \gls{SDC} management.

\subsection{Scheduling emergent work}
\label{sec:proposal:emergent}

When support requests are received by the Program, they are triaged to assess their impact.
Some effort must be reserved for this, but it should primarily fall on management and product owners, rather than on the development team itself.
During triage, the issue may be:

\begin{enumerate}

  \item{Immediately resolved: the individual responsible for triage can identify and directly rectify a mistake or misapprehension on the part of the person requesting support.}
  \item{Identified as an urgent issue: substantial \gls{SDCO} activities are blocked until the issue is resolved.}
  \item{Identified as a lower priority issue: it should be resolved at some point, but an urgent response is not required.}

\end{enumerate}

These three cases are handled as follows:

\subsubsection{Immediate resolution}

In case 1, no further action is required.
It may be appropriate for the individual responsible for triage to add an issue to the backlog requesting a documentation or interface update to avoid the same issue recurring in future.

\subsubsection{Urgent issue}
\label{sec:proposal:emergent:urgent}

In case 2, the issue is added to a \emph{current} sprint.
The responsible product owner will work with members of the team to ensure that the issue is resolved promptly (for example, by raising it at the next standup meeting).
In exceptional cases, it may be appropriate for management to intervene and ensure that the team prioritises this issue even if it means an interruption to their current activities and runs the risk of failing to meet the sprint goals.

It follows from the above that \emph{the sprint should not be fully loaded during sprint planning}.
That is, some spare capacity should be available so that it is possible for one or more emergent issues to be addressed without imperilling the sprint goal.
It is impossible to know \emph{a priori} how much time to leave unscheduled at the start of the sprint, but this is something that we can learn and adapt to with experience.
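
As a purely illustrative sketch, the reserve could be estimated from recent history; the symbols and the simple averaging rule used here are assumptions introduced for illustration, not an agreed procedure.
Let $C$ denote the capacity of the team for a sprint and $E_i$ the effort spent on urgent emergent issues in each of the previous $n$ sprints, all measured in the same units.
The planned load $P$ and the reserve $R$ would then be
\[
  P = C - R, \qquad R = \frac{1}{n} \sum_{i=1}^{n} E_i .
\]
For example, with $C = 40$ person-days and emergent effort of 4, 6, and 8 person-days over the last three sprints, $R = 6$ and $P = 34$ person-days.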

\subsubsection{Lower priority issues}

In case 3, the issue is added to the product backlog, but is not scheduled in the current sprint.
It may be accepted into a future sprint during the corresponding sprint planning meeting, based on the product owner's prioritisation of this issue vis-à-vis other items in the backlog.

\subsection{Longer-term activities}

It follows from the material in \S\S\ref{sec:definitions} \& \ref{sec:proposal:emergent} that the backlog will gradually accumulate both scheduled work (e.g., transition to a new framework) and lower-priority emergent issues.
We propose addressing this in three ways:

\begin{itemize}

  \item{On occasion, items on the backlog will naturally group together with the product owner's priorities for an upcoming sprint.
       For example, minor bugs and technical debt in some part of the codebase might be conveniently addressed when that code is being updated to add new functionality.
       The product owner will work with team members to identify these commonalities and define appropriate sprint goals.}

  \item{Per \S\ref{sec:proposal:emergent:urgent}, each sprint will be slightly under-scheduled relative to the total amount of work we expect the team to perform.
       This slack time is expected to be filled with high-priority issues, but these issues will not always emerge.
       In this case, the sprint can be “padded” by selecting further issues from the backlog.}

  \item{Periodically, we expect to schedule sprints which are devoted to maintenance activities.
        In principle, these sprints should focus on major activities with concrete deliverables which will fully occupy the team (for example, adapting the codebase to a new framework).
        However, it may sometimes be appropriate to “sweep” the backlog looking for accumulated technical debt and other low-priority items to address in a single sprint.
        This second option is contrary to the third axiom identified in \S\ref{sec:axioms}: it may be occasionally necessary, but we prefer to avoid it in general.}


\end{itemize}

\section{Conclusions}

The material above presents some ideas about how support requests and maintenance work can be addressed by the \gls{SDC} Program.
It is currently provisional, and is expected to serve as the basis for further discussion, both independently (addressing the narrow context of this document) and within the wider framework of developing the \gls{SDC} Program Management Plan.

\end{document}