This lesson is still being designed and assembled (Pre-Alpha version)

Sustainable Software Development

Introduction

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • What kind of code should we (not) write?

Objectives
  • Learn how to make the software packages we write more accessible to our colleagues.

What is a programming language?

“Programs must be written for people to read, and only incidentally for machines to execute.”

-Abelson & Sussman, Structure and Interpretation of Computer Programs1


We work in a large collaboration, so we read and use each other’s code very often. Sad but true: discussing code is part of our lives (at least mine). No matter what is going on in the world, we talk (complain) about software often. Again, at least for me:

My lunch conversation compositions:

OK, that is a bit exaggerated, but software is indeed important for us. We gathered feedback from many people and extracted several simple axioms. It is interesting that we all agreed on those simple things despite our various backgrounds. In this mini section we are not going to tell you everything you need to know about writing elegant code; rather, we will share our experience of working with software in ATLAS. Hopefully, by the end, we will have convinced you that it is worth thinking about this.

Why care about good code?

We are busy people with dozens of things to pay attention to in our work. Why should we put in additional effort to produce quality code? Why not just settle for code that works?

Focusing on quality code is actually one of the best ways to save you and your team tremendous amounts of time in the future. Clean and elegant code is much easier to understand, maintain, and extend.

Code quality is a broad measure of how useful code is. This obviously includes whether or not the code behaves as it’s intended to, but also includes non-functional aspects such as robustness or maintainability. Whether or not a piece of code is high-quality can depend on the industry or team it’s written for, and is often subjective even within that team. As we gather collective experience as developers, however, we can define some broad code quality goals that are useful as guiding principles.

Code quality goals2

These are a sample of typical code quality goals one might adhere to. These are not exhaustive, but rarely should you write software without keeping these goals in mind.

What other goals?

Can you think of other non-functional code quality goals that might not be included in this list?

Examples

Security is another common goal, though one we don’t often encounter in ATLAS analysis.

Efficiency is another goal that is more important in some cases than others.


This lesson will introduce practices and habits that will help you achieve these goals. Some techniques are good for software development in general, and some are specific to the coding we do in ATLAS.


Key Points

  • We are not professional software engineers but software is important.

  • Our code is very likely to be shared with our colleagues.


Useful Documents

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • Where could I get guidance?

Objectives
  • Know where to seek help within the collaboration.

Materials from the ATLAS software team

The ATLAS computing team maintains a lot of useful documentation (twiki pages, slides, notes, etc.). Here is a list of resources recommended by software experts:

  1. A nice twiki page on software quality
  2. A note on ATLAS C++ guidelines
  3. A set of slides summarizing best C++ practice

Of course there are more available. If you have any questions you can also ask via the mailing lists: hn-atlas-PATHelp@cern.ch (Physics Analysis Tools) and hn-atlas-offlineSWHelp@cern.ch (mainly Athena).

Sidebar

When is following this guide required?

If you do find yourself doing any Athena development (e.g. simulation, reconstruction, derivation) you are required to adhere to the style guide. Merge request shifters will hopefully enforce these rules.

Combined Performance and Analysis software packages are typically less restrictive with their requirements, but a few still maintain contribution guidelines that encourage high-quality code. For example, EGamma CP closely follows the Google C++ and Python style guides.

Let’s be honest though. We don’t expect that everyone is going to open those links and read the full style guides. The best way to learn these things is incrementally.

Internet

There are answers to almost any coding problem on the Internet. We hope this bootcamp has expanded your ATLAS-related coding vocabulary so that you can find those answers more easily and quickly.

Key Points

  • Start paying attention to good coding practice.

  • There are experts willing to help!


Use GitLab Wisely

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What usually causes trouble when using git?

  • How can we avoid it?

Objectives
  • Highlight non-optimal practices that occur very often.

As illustrated by the previous instructors, Git (GitLab) provides very convenient functions to maintain our repositories. We should make the most of them.

It is not Dropbox

The first recipe I got after joining ATLAS was:

"Copy my folder on lxplus to your directory and follow the README there. 
Do not use the version on GitLab, it does not work"

We hope this will not be the recipe you will share with your colleagues:).

Pull often, commit often

Have you done this before:

mv project project_backup
git clone ssh://git@gitlab.cern.ch:7999/project.git

Because of merge conflicts? Running git commit/push/pull often while developing your code saves the time you would otherwise spend fixing individual files again.

Informative commit messages

I browsed the commit messages I made six years ago:

Why was I so apologetic?

I wish my younger self had just written down what the mistake was rather than apologizing.

Nothing really?

This is rather ambiguous, as it could literally have been nothing or a reflection of my state of mind. It turned out that I had fixed a dumb bug that I introduced myself.

There are many articles defining how and why to write a good commit message (here’s one) but the basics are:

Merge requests and code reviews

Merge requests

Merge requests in GitLab (and pull requests in GitHub) are some of the most powerful tools for collaboration and ensuring good practices are followed. You should always make a merge request when submitting a significant change to a code base used by multiple people. Merge requests give other developers the opportunity to comment on your code, make suggestions, and review your work.

Code reviews

Code reviews are one of the most effective techniques for any software project. Not only do they keep bugs from making it into the code base, but they are great for sharing knowledge. They are one of the best tools for mentoring new developers. Merge requests are the core structures for code reviews. They enable comments on individual lines of code, and keep a record of discussion about how a decision was reached. You are encouraged to a) review any code that someone writes to your repository and b) request that any code you push to a repository is looked over by a colleague.

Formatting Rules

An excellent way to learn about the ATLAS codebase is to sign up for a merge review shift! As a level-1 shifter you mostly follow a checklist and can raise any confusing changes to the level-2 shifters. Try it out!

Key Points

  • Commit often.

  • Write informative commit messages.


We Like Portable Code

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • Why should we make our code portable?

Objectives

Usually we do our work on certain machines. People based at CERN may prefer lxplus, while people based elsewhere may prefer their local clusters or personal machines. Each individual may also have a different shell setup. It is very rare that the person who writes the code is its sole user. As a result, it is important to make sure the code can easily be used by others on a different machine or setup. It is really worth taking the time to think about this: in the end, if your code does not work for someone, you will likely receive an email or message asking for help:)

Avoid user/machine specific setups

Hard-coded, user- or machine-specific paths can be avoided by using relative paths or environment variables. This also makes our software easier to run on the grid, as the grid cannot access your local machine.
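
A minimal sketch of this idea in Python. The environment variable `MY_ANALYSIS_DATA` and the `data` sub-directory are hypothetical names chosen for illustration, not part of any ATLAS package:

```python
import os
from pathlib import Path


def resolve_data_dir(env_var="MY_ANALYSIS_DATA", default_subdir="data"):
    """Return the input directory without hard-coding a user or machine path.

    MY_ANALYSIS_DATA and "data" are hypothetical names used for illustration.
    """
    override = os.environ.get(env_var)
    if override:
        # The user (or a job wrapper) decides where the inputs live.
        return Path(override).expanduser()
    try:
        # Fall back to a path relative to this file, not to someone's home area.
        base = Path(__file__).resolve().parent
    except NameError:
        # __file__ is undefined in an interactive session; use the working directory.
        base = Path.cwd()
    return base / default_subdir
```

A colleague on another machine can then set the variable in their shell before running the code, and nobody has to edit the source to point it at their own directory.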

Test it often

Infrastructure and software keep evolving (slc5 -> slc6 -> centos7, Athena Rel 20.7 -> Rel 21). The packages involved can change very often as well. We should test our code often.

Getting the code to compile is not the end goal; ultimately we want the results to be meaningful. So it is better to test the whole workflow.
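
As a rough illustration, a whole-workflow test can run every step on a tiny input whose answer can be checked by hand. The functions `select_events`, `count_per_bin` and `run_workflow` below are hypothetical stand-ins for a real analysis chain, and plain dictionaries stand in for real events:

```python
def select_events(events, pt_min):
    """Keep events whose transverse momentum passes the threshold."""
    return [event for event in events if event["pt"] > pt_min]


def count_per_bin(values, edges):
    """Count values in half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def run_workflow(events, pt_min, edges):
    """The full chain: selection followed by histogramming."""
    selected = select_events(events, pt_min)
    return count_per_bin([event["pt"] for event in selected], edges)


def test_workflow_end_to_end():
    # A tiny, hand-checkable input stands in for a real ntuple.
    events = [{"pt": 5.0}, {"pt": 25.0}, {"pt": 45.0}, {"pt": 55.0}]
    counts = run_workflow(events, pt_min=20.0, edges=[0, 30, 60])
    # Check the final answer, not only that the code ran.
    assert counts == [1, 2]
```

The point of the sketch is that the assertion is on the final result of the chain, so a change in any intermediate step that alters the answer is caught.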

It feels really great when everything just works out of the box after following the recipes in the README.

CI saves the day!

The best tool for this is continuous integration (CI) testing on GitLab (or GitHub). Using Docker images, you can perform tests on many emulated platforms. Remember to look back at the CI/CD tutorial if you need a reminder of how to get started with this.

Key Points

  • Use relative paths and environment variables.

  • Test it often.

  • Use CI.


We Like Short Code

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How can we make our code easier for others to use?

Objectives

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”

- Bill Gates

We work in a large collaboration, so when we need some function in our code, it is very likely that someone has already written it, and it may even exist in a well-maintained package.

ROOT has a lot of stuff

Question:

"Have you created some functions and later on realized that ROOT already has them?"

Nowadays there are more well-maintained packages to use, so look around! Start with a look at the ROOT Tutorials to get a sense of what is possible.

But be careful of the pitfalls:

Elegance!

Functions are better than copy/paste

Proof of concept:

Loop!
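
A hypothetical illustration of these two points, a function instead of copy/paste and a loop instead of repeated calls. Plain dictionaries stand in for histogram objects, and the sample names and colors are invented for the example:

```python
def style_histogram(hist, color, label):
    """Apply one consistent style; `hist` is a plain dict standing in for a histogram."""
    hist["line_color"] = color
    hist["line_width"] = 2
    hist["label"] = label
    return hist


# Instead of copy/pasting the three styling lines once per sample,
# describe the samples as data and loop over them.
samples = {
    "data": ("black", "Data"),
    "ttbar": ("red", "t-tbar"),
    "zjets": ("blue", "Z+jets"),
}

histograms = {name: {} for name in samples}
for name, (color, label) in samples.items():
    style_histogram(histograms[name], color, label)
```

Adding a fourth sample is then a one-line change to `samples`, and a change of line width happens in exactly one place.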

We can use multiple files.

"Would you rather fight 100 500-line C++ scripts or 10 5000-line C++ scripts?"

Refactoring

Another habit that’s good to form is refactoring your code. Refactoring is the process of restructuring existing computer code without changing its external behavior. It is intended to improve its non-functional attributes while preserving its functional behavior.

This can include:

Do you think that the first time you write some code it is going to be the most correct or the most readable? Me neither. Refactoring is an important process of making sure the code is high-quality. It also is often how you catch unnoticed bugs!

Testing is a very important part of refactoring! The more automated tests you have, the more confident you can be that your changes didn’t unexpectedly break anything.
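
A small sketch of what this can look like in practice. Both functions are invented for illustration, and events are assumed to be simple `(pt, weight)` tuples. The test checks that the refactored version behaves exactly like the original:

```python
def mean_pt_before(events):
    # Original version: works, but the intent is buried in index bookkeeping.
    s = 0
    n = 0
    for i in range(len(events)):
        if events[i][1] > 0:
            s = s + events[i][0]
            n = n + 1
    if n == 0:
        return 0.0
    return s / n


def mean_pt_after(events):
    """Refactored version: same external behavior, clearer names and structure."""
    selected_pts = [pt for pt, weight in events if weight > 0]
    if not selected_pts:
        return 0.0
    return sum(selected_pts) / len(selected_pts)


def test_refactoring_preserves_behavior():
    cases = [
        [],
        [(10.0, 1.0)],
        [(10.0, 1.0), (30.0, 0.0), (50.0, 2.0)],
    ]
    for events in cases:
        assert mean_pt_before(events) == mean_pt_after(events)
```

Once the test passes, the old version can be deleted. The test stays behind to protect the next round of changes.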

Want to know more?

Refactoring Guru has some excellent lessons on how to refactor your code and why it’s useful.

Key Points

  • Use existing libraries if we can.

  • Create common functions if we need them very often.

  • Split if certain blocks become monstrous.


Design Your Package

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • When and how should we design a package efficiently?

Objectives
  • Understand when designing a program can be useful.

  • Learn that UML exists

Fast solutions vs sustainable solutions

This is a rather unique challenge for us. Usually our performance is evaluated by the results we have produced, not by the code behind the scenes. As a result, fast solutions are naturally appealing, and there are many of them to lure us. For instance, we can open a TBrowser in ROOT and make the plots beautiful by hand. There are also many unexpected studies when doing an analysis: we often have to perform a very specific study that goes beyond the current scope of our software. In this case, having something simple and fast to tackle these one or two issues is indeed the most efficient way.

However, we also have frequent updates from various groups (combined performance groups, data preparation groups, physics groups, etc) and some of them can involve substantial changes. It is not surprising to find out that a script we wrote before does not work any more.

A sustainable solution takes those possible changes into account and is flexible enough to incorporate them. Such solutions are better integrated into the whole workflow and can save us a lot of time in the long run. Given the facts mentioned above, building them is also challenging.

Plan and organize the package

Blueprints are important. It is hard to make sure all small pieces fit together nicely if we do not think ahead. Draw some diagrams on paper and think about what the upstream/downstream software is and who the expected users are. Try to figure out which parts of the package need to be rigid (or are OK to be rigid) and which parts definitely need to be flexible.

Let’s have a quick brainstorming session:

"What are the most common requests or comments one would receive during meetings that may require software change?"

Change the binning? Extend the binning? Add a histogram? Update CP recommendations? Add a ratio panel? Submitting grid jobs? Babysitting grid jobs?…..

Well, if you have a list, let’s try to make our software able to do those things easily.
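
One possible way to prepare for such requests is to keep the things that change often (binning, which histograms exist, whether there is a ratio panel) in a small configuration, separate from the plotting code. The class `HistogramSpec` and the variable names below are hypothetical, a sketch rather than a recommended ATLAS interface:

```python
from dataclasses import dataclass


@dataclass
class HistogramSpec:
    """Settings a meeting request is likely to change, kept out of the plotting code."""

    variable: str
    n_bins: int
    low: float
    high: float
    ratio_panel: bool = False

    def edges(self):
        """Uniform bin edges from low to high."""
        width = (self.high - self.low) / self.n_bins
        return [self.low + i * width for i in range(self.n_bins + 1)]


# Hypothetical configuration; in practice this could live in a JSON or YAML file.
HISTOGRAMS = [
    HistogramSpec("lepton_pt", n_bins=4, low=0.0, high=200.0),
    HistogramSpec("met", n_bins=5, low=0.0, high=500.0, ratio_panel=True),
]
```

With this layout, "change the binning" or "add a histogram" becomes an edit to `HISTOGRAMS` rather than a hunt through the plotting code.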

Live with it or change it

Unless we start a project from scratch, we will probably be given some existing packages. If everything is up to date, nicely written and well organized, CONGRATULATIONS! But often this is not the case. In this situation, we can either live with it and try to get the work done, or try to change it. When we face such a dilemma, we should think about how long we are going to use the packages and how many new things we are supposed to do with them. Oftentimes it is better to revamp. Do not let the argument “Please do not re-invent the wheel” stop you: you are crafting better wheels.

Good wheel!

Before you start writing code

Take a deep breath… and get a blank sheet of paper

Software design

Modeling

Modeling is the designing of software applications before coding.

The software we create is complex and involves lots of abstract ideas that need to work together. The classes and functions you write can quickly grow to more than you can keep in your head all at once. Large or small projects can all benefit from a careful consideration of how your software should work. If you want to build a house, your best first tool is a pencil, not a hammer.

Sidebar

An excellent way to begin writing software is to draw a diagram. These can be diagrams that describe the inheritance of classes and interfaces, the ownership of objects, the logical flow of one or more functions, etc. All of these are useful thoughts to put on paper. Of course software is not a house; code is free. Your design can change as you learn new things about your data or your requirements.

Unified Modeling Language

A powerful standard for producing these designs is called Unified Modeling Language, or UML. UML consists of many specifications that are widely adopted for creating designs for your software. These diagrams can be grouped into two broad categories: structural diagrams and behavioral diagrams.

You may find some of these diagrams more useful than others. Here is a very nice UML tutorial that gives a concise description of how to use the most common diagrams.

Here are a few specific examples of diagrams that you may want to try drawing for your code:

Class diagram

The class diagram is a structural diagram. The purpose of a class diagram is to model the static view of an application. Class diagrams are the only diagrams which can be directly mapped to object-oriented languages and are thus widely used at the time of construction.1
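
To make that mapping concrete, here is a sketch of how a small, hypothetical class diagram could translate into Python: one interface box (`Selection`), two boxes that inherit from it (`PtCut`, `EtaCut`), and one box (`Cutflow`) that owns a list of selections. All class names are invented for illustration:

```python
from abc import ABC, abstractmethod


class Selection(ABC):
    """Box 'Selection' in the diagram: an interface with one operation."""

    @abstractmethod
    def passes(self, event):
        """Return True if the event passes this selection."""


class PtCut(Selection):
    """Inheritance arrow in the diagram: PtCut is a Selection."""

    def __init__(self, threshold):
        self.threshold = threshold

    def passes(self, event):
        return event["pt"] > self.threshold


class EtaCut(Selection):
    """Inheritance arrow in the diagram: EtaCut is a Selection."""

    def __init__(self, max_abs_eta):
        self.max_abs_eta = max_abs_eta

    def passes(self, event):
        return abs(event["eta"]) < self.max_abs_eta


class Cutflow:
    """Ownership line in the diagram: a Cutflow holds a list of Selection objects."""

    def __init__(self, selections):
        self.selections = list(selections)

    def passes(self, event):
        return all(selection.passes(event) for selection in self.selections)
```

Each box, arrow and ownership line on paper corresponds to a class, a base class or an attribute in the code, which is why sketching the diagram first makes the code easier to lay out.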


Interaction diagram

The purpose of interaction diagrams is to visualize the interactive behavior of the system. Visualizing the interaction is a difficult task. Hence, the solution is to use different types of models to capture the different aspects of the interaction.1

One of these models is the sequence diagram, which captures the time sequence of the message flow from one object to another.



You don’t have to learn how these diagrams work right now, but it’s good to know that they exist. When you’re staring at a blank piece of paper trying to figure out how to design your code, take a look at the UML tutorial because chances are, there’s a useful model to follow.

Key Points

  • Modeling is the designing of software applications before coding.

  • Lay out the diagrams on paper.

  • Think about the expected users.

  • Consolidate standalone scripts.



Subtle Things That Make Our Lives Better

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • What prevents others from understanding our code?

Objectives
  • Start applying some simple but very useful practices.

Here are some suggestions from ATLAS colleagues about practices they think are particularly useful for our collaboration.

We need comments

Quotes from a colleague:

"I did something quick and dirty in my code, wrote it down in my notebook (physical notebook). Now I have problems when using my code and I want to check what I did before. Guess what, I left my notebook at CERN and I could not go back!"

An Instagram post from a friend when she was on VACATION:

Art is everywhere?

A snapshot from a piece of code written by a paranoid programmer:

Art is everywhere!

We should add comments where we are not sure whether what we are doing is correct (FIXME), or at the end of a loop/block to mark where it concludes. Also, if we think we might be the only person on this planet writing such a block of code, we should probably add some comments.
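
A short sketch of what such comments might look like. The function and the scale factor are made up for illustration and are not an official recommendation:

```python
def scale_jet_pts(jet_pts_gev, scale=1.02):
    # FIXME: 1.02 is a placeholder scale factor, not an official
    # recommendation; replace it once the calibration is confirmed.
    scaled = []
    for pt in jet_pts_gev:
        if pt <= 0:
            # Zero or negative pT can only come from a broken input; skip it
            # rather than silently scaling nonsense.
            continue
        scaled.append(pt * scale)
    # end of loop over jets
    return scaled
```

The comments record why something is done and what is still uncertain, which is exactly the information that otherwise ends up in a notebook left at CERN.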

Use easy to understand variable/file names

It is quite frustrating to look for a variable named “m” and figure out what it is doing, unless it is an index, a counter or something similar.

We physicists like acronyms. They can be funny, but we should make sure they are understandable when we use them in our code.

Following a consistent naming convention can help a lot.
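
For illustration, here is the same (invented) function written twice. The naming convention in the second version, lower-case words joined by underscores with the unit as a suffix, is just one possible choice:

```python
# Hard to follow: what are l, c and m?
def f(l, c):
    m = []
    for x in l:
        if x > c:
            m.append(x)
    return m


# Easier to follow: the names carry the meaning, and a short name
# is kept only for the loop variable.
def select_jets_above_pt(jet_pts_gev, pt_threshold_gev):
    selected_pts_gev = []
    for pt in jet_pts_gev:
        if pt > pt_threshold_gev:
            selected_pts_gev.append(pt)
    return selected_pts_gev
```

Both functions do exactly the same thing; only the second one can be understood without reading the body line by line.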

Testing

Write automated tests for the libraries and tools you are developing. For example pytest is a great library to help with testing. To take a somewhat strong position: any code that’s not tested should be assumed to not work.
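
As a minimal sketch, here is a small helper together with two tests written in the plain-function style that pytest collects. The function `invariant_mass` is written for this example and is not taken from an ATLAS package:

```python
import math


def invariant_mass(e1, p1, e2, p2):
    """Invariant mass of two particles from energies and (px, py, pz) momenta.

    Natural units (c = 1) are assumed; all inputs share one energy unit.
    """
    energy = e1 + e2
    px, py, pz = (a + b for a, b in zip(p1, p2))
    mass_squared = energy**2 - (px**2 + py**2 + pz**2)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(mass_squared, 0.0))


def test_back_to_back_massless_particles():
    # Two massless particles of 45 GeV each, back to back: m = 90 GeV.
    mass = invariant_mass(45.0, (0.0, 0.0, 45.0), 45.0, (0.0, 0.0, -45.0))
    assert math.isclose(mass, 90.0)


def test_collinear_massless_particles_have_zero_mass():
    mass = invariant_mass(10.0, (10.0, 0.0, 0.0), 20.0, (20.0, 0.0, 0.0))
    assert math.isclose(mass, 0.0, abs_tol=1e-9)
```

If this is saved in a file whose name starts with `test_` (the name is up to you), running `pytest` in that directory should pick up and run every function whose name starts with `test_`.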

For more reading, Atlassian has a nice post on the different kinds of testing in software.

Documentation

The never ending battle of documenting your code… Documentation is very important not just for your future colleagues, but also for yourself. “The most likely person to read your code is you six months from now, and unfortunately, past you doesn’t respond to emails” (no idea where I heard that).

Like the several levels of testing, there are several levels of documentation. Code can be seen as self-documenting if it’s written with clear variable names. It also benefits from sensible inline comments. And it’s always a good idea to leave method docstrings in Python or Doxygen-style comments in C++ to clarify what your functions are doing. These are also used for automated documentation parsing.
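
A sketch of a Python docstring combined with clear names and one inline comment. The function is invented for the example, and the `Args`/`Returns` layout is one common convention, chosen here as an assumption rather than a collaboration requirement:

```python
import math


def delta_phi(phi1, phi2):
    """Return the azimuthal angle difference wrapped into [-pi, pi).

    Args:
        phi1: First azimuthal angle in radians.
        phi2: Second azimuthal angle in radians.

    Returns:
        The difference phi1 - phi2 in radians, in the range [-pi, pi).
    """
    # Shift by pi, wrap into [0, 2*pi), then shift back.
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
```

Calling `help(delta_phi)` in an interactive session prints the docstring, so the documentation travels with the code.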

On top of these low-level pieces of documentation, each software package you write should have a README file that explains what it does, how it’s used, and who to ask for help.

Key Points

  • Add comments in your code.

  • Use variable/function/file names that are easy to interpret.


Summary

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • Is it worth our time?

Objectives
  • It is worth the time!

It pays off

Of course, paying attention to those points requires some extra time. But it pays off well. It saves both our time and our colleagues’ time. We can then have more interesting lunch conversations:)

My ideal lunch conversation compositions:

Start early, benefit early

This extensive bootcamp has exposed you to a wide range of topics, and we know we have all learned a lot. This might be your first investment in software. Keep doing it and let it grow. Like a long-term investment, small but continuous efforts can result in a great fortune. Unlike most investments, this one is much safer.

Key Points

  • Long-term gain.