Docker Installation and Requirements Guide

This document provides guidance on Docker, outlining its purpose, installation, and execution for research computing. It includes definitions of key terms, best practices for writing Dockerfiles, and information on managing images, containers, and networking. Additionally, it addresses data privacy and security considerations, as well as sharing Docker images and Dockerfiles within the research community.

Uploaded by

jebarajben123
Copyright
© All Rights Reserved

Insert Laboratory Specific Name Here

Docker Container Requirements, Installation, and Execution

1.0 Purpose
The purpose of this document is to provide guidance on Docker, which is the most commonly used
container program in research computing. Here the aim is to do the following:
• Orient first-time users to Docker resources
• Define and explain key terms
• Summarize Docker features and functions
• Assess the program’s data privacy and security protections
Containerization is addressed in the Containerization Purpose and Approaches document, which should be consulted if you are unfamiliar with the purpose of containers and approaches to their creation and use.

2.0 Scope
This document describes the requirements, installation, and execution of Docker, and it contains
information about Docker resources. Broad topics within Docker’s functions and options are summarized,
and important terms are defined.

3.0 Related Documents


Title Document Control Number
Containerization Purpose and Approaches

4.0 Definitions
Term – Definition
bind mount – A file or directory on the host machine that is mounted into a container, for example to store container output
container – A running (or stopped) instance of an image with its own writable layer; changes made in a container are not saved to the image unless they are committed
Dockerfile – A text file that contains the instructions Docker follows to build an image
image – A read-only template, made up of layers, that packages the files (applications, libraries, and data) and settings needed to create a container
base image – An image built from the minimal image “scratch” (i.e., from a Dockerfile that starts with “FROM scratch”)
parent image – The image named by FROM in a Dockerfile, which is modified to create the new image
layer – One of the stacked sets of filesystem changes that make up an image, each produced by a Dockerfile instruction such as RUN, COPY, or ADD
port – A numbered network endpoint through which a process or service on a computer sends and receives information
registry – A storehouse of images, which can be either public (e.g., Docker Hub, Google, Amazon Web Services (AWS)) or private (cloud or on-site, using Docker Trusted Registry or home-built)
volume – A filesystem managed by Docker and used to store container output, sometimes shared by multiple containers; volumes can be “named” (given a name by the user) or anonymous (given a random name by Docker), and both persist after the containers using them are removed, unless a container was started with the --rm option, which also removes its anonymous volumes
tarball – A jargon term for a tar archive (.tar file), which is a group of files collected together as one
Document #: Revision #: Effective Date: Page 1 of 18

5.0 Installation and Initial Guidance


5.1 Docker Engine is open source, and local versions of Docker can be installed on Linux, Windows, or Mac operating systems. Requirements, instructions, and downloads can all be accessed here: [Link]/get-docker.
a. On Linux, “Docker Engine” is installed directly. This is the program that runs containers, and it is bundled with other Docker programs in the Docker Desktop program for macOS and Windows.
b. Docker Engine packages that are available with some Linux distributions are not maintained by Docker.
c. Docker Desktop for Windows requires Windows 10 (Pro, Enterprise, Education, or Home edition), a 64-bit processor, 4 GB of system RAM, and BIOS-level hardware virtualization. The Pro, Enterprise, and Education editions can use the Windows Hyper-V and Containers features, whereas the Home edition relies on the Windows Subsystem for Linux 2 (WSL 2).
d. Docker Desktop for Mac requires a 2010 or later computer model, macOS 10.13 or later, and at least 4 GB of RAM.
5.2 The home page of Docker Docs ([Link]) offers a comprehensive collection of entry points to
other resources, including program downloads, installation guides, FAQs, videos, links, community
platforms, manuals, news, and developer guides.
5.3 Docker Labs
a. Accessed by right-clicking on the Docker icon in one’s System Tray and then choosing the “Learn” menu option.
b. Opens a tutorial in a Docker console, with one-click functionality for downloading a Dockerfile, using it to build an image, running that image as a container in which a tutorial program runs, and pushing the built image to Docker Hub.
c. The container is then shown in the Docker console and can be opened in a browser.
d. In this “Getting Started” container one is then led through the steps to create, share, and run another container, this one holding a JavaScript program that generates an interactive To-Do list manager.
5.4 The State Public Health Bioinformatics (StaPH-B) community has written a User Guide for Docker,
which can be found here: [Link]

6.0 Dockerfiles
A Dockerfile is a simple text file (saved without an extension and usually named simply “Dockerfile”) that instructs Docker on the software, dependencies, and data needed to replicate a particular computing environment. Docker executes the Dockerfile to build another, much larger file, called an “image,” which houses everything specified by the Dockerfile. The environment generated by the image can thus be used in a new setting, such as on a different machine or on the same machine at a later time.
Specifically, the Dockerfile instructs the creation of layers and intermediate images; when the image is run, a new writable layer, also known as the container layer, is added on top. For example, the following Dockerfile starts from the Python 3 parent image, adds a layer containing the script (script_1.py), adds a layer by installing pyBio (a Python library required by the script) with pip, and finally sets the default command that runs the script when a container is started from the image.
FROM python:3
ADD script_1.py /
RUN pip install pyBio
CMD [ "python", "./script_1.py" ]

Dockerfiles use a limited number of instructions, and only RUN, COPY, and ADD create layers that increase
the size of a build. The most commonly used instructions are as follows:
• FROM – Every Dockerfile must begin with this instruction (only parser directives, comments, and ARG instructions may precede it), which specifies a parent image that is subsequently modified to create the new image. In some cases one may want to specify the complete contents of the starting image, in which case the Dockerfile begins with the minimal Docker image “scratch”; a Dockerfile that begins with “FROM scratch” builds what is called a “base image.”
• WORKDIR – sets the working directory for the instructions that follow it and for the running container
• VOLUME – creates a mount point in the container for data kept in a volume; the host location itself cannot be set in the Dockerfile and is chosen at runtime (e.g., with docker run -v)
• ENV – sets environment variables
• COPY – copies files and directories from the host (build context) to the container filesystem
• ADD – like COPY, but can also download files from URLs and automatically extracts local tar archives into the container filesystem
• LABEL – adds metadata to an image
• EXPOSE – documents the port(s) on which the container listens; a port is actually published to the host at runtime (e.g., with docker run -p)
• USER – sets the user ID that runs the container (and any later RUN, CMD, and ENTRYPOINT instructions)
• ENTRYPOINT – specifies the executable to be run when the container is started
• RUN – executes a command in a new layer
• CMD – sets the default command (or default arguments for ENTRYPOINT) run when the container starts, which can be overridden from the command line
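To illustrate which of these instructions add layers, the short Python sketch below reads the text of a Dockerfile, joins lines continued with a backslash, skips blank lines and comments, and lists the instructions that create layers (RUN, COPY, and ADD, as noted above). It is not a Docker tool, the function names are hypothetical, and real Dockerfile parsing (parser directives, heredocs, a custom escape character) involves more rules than this.

```python
# Sketch: list the Dockerfile instructions that add filesystem layers.
# Simplified parsing; this is not Docker's own parser.

# Per Docker's best-practice guidance, only these instructions add layers.
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def logical_lines(dockerfile_text: str) -> list[str]:
    """Join backslash-continued lines and drop blank lines and comments."""
    lines: list[str] = []
    current = ""
    for raw in dockerfile_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        if stripped.endswith("\\"):
            # Continuation: keep the text, drop the trailing backslash.
            current += stripped[:-1].rstrip() + " "
            continue
        lines.append(current + stripped)
        current = ""
    if current:
        lines.append(current.strip())  # file ended on a continuation line
    return lines


def layer_instructions(dockerfile_text: str) -> list[str]:
    """Return the logical lines whose instruction creates a layer."""
    return [
        line
        for line in logical_lines(dockerfile_text)
        if line.split(maxsplit=1)[0].upper() in LAYER_INSTRUCTIONS
    ]


if __name__ == "__main__":
    example = (
        "FROM python:3\n"
        "ADD script_1.py /\n"
        "RUN pip install pyBio\n"
        'CMD [ "python", "./script_1.py" ]\n'
    )
    for line in layer_instructions(example):
        print(line)
```

Applied to the four-line example Dockerfile earlier in this section, it reports the ADD and RUN lines, i.e., two layers added on top of the python:3 parent image; FROM and CMD add none.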
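A file named “.dockerignore” can be placed in the root directory of the build context (usually alongside the Dockerfile) to indicate which files and directories should be excluded when building the image. This file accepts wildcards, which helps when asking Docker to ignore all files of a particular type: *.pyc matches such files in the top-level directory of the build context, while **/*.pyc matches them in any subdirectory as well.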
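The matching rules can be approximated in a few lines of Python. This is a sketch, not Docker's implementation: it assumes the main rules described in Docker's documentation (patterns are relative to the root of the build context, * does not cross a /, ** matches any number of directories, later lines override earlier ones, and a leading ! re-includes a path), and it ignores details such as character classes. The function names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one .dockerignore pattern into a regular expression.

    Simplified rules assumed here:
      **/  matches zero or more leading directories
      **   matches any characters, including '/'
      *    matches any characters except '/'
      ?    matches one character except '/'
    Patterns are relative to the root of the build context.
    """
    pattern = pattern.strip().strip("/")
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    # Excluding a directory also excludes everything inside it.
    return re.compile("".join(out) + "(?:/.*)?")


def is_ignored(path: str, dockerignore_lines: list[str]) -> bool:
    """Return True if `path` would be left out of the build context.

    Later lines override earlier ones; a leading '!' re-includes paths
    excluded by an earlier pattern.
    """
    ignored = False
    for line in dockerignore_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        negate = line.startswith("!")
        pattern = line[1:] if negate else line
        if pattern_to_regex(pattern).fullmatch(path):
            ignored = not negate
    return ignored
```

With the rules ["**/*.pyc", "*.log", "data/", "!data/keep.txt"], src/module.pyc and data/raw.csv are excluded, data/keep.txt is kept, and logs/run.log is kept because *.log matches only at the top level.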

6.1 Best Practices for Writing Dockerfiles (per Docker Docs)
Docker focuses on speed, storage efficiency, and mobility in the use of containers, which informs its suggested best practices for creating Dockerfiles:
a. Create ephemeral containers – use Dockerfiles to create containers that can be “stopped and destroyed, then rebuilt and replaced,” as much as possible.
b. Understand build context – “Build context” refers to the set of files at the path given to docker build (often the current working directory, “.”), which is sent to Docker when an image is built. The Dockerfile is usually located there, but it can be specified elsewhere with the -f option.
c. Pipe Dockerfile through “stdin” – Standard input (stdin) is an input stream, and Docker can build an image without a Dockerfile on disk: giving a dash (-) in place of the path (docker build -) tells Docker to read the Dockerfile instructions from stdin, for example when they are typed or piped in at the command line.
d. Exclude undesired files and directories with .dockerignore.
e. Use multi-stage builds – Multi-stage builds (available since Docker 17.05) create an efficient workflow in one Dockerfile with several FROM stages, whereby the needed output from one stage is copied into a later stage that continues the build; previously, this required two Dockerfiles and a more elaborate build command.
f. Do not install unnecessary packages.
g. Decouple applications – Putting different applications into different containers allows the containers to be reused, facilitating container management and modularity.
h. Minimize the number of layers – By using multi-stage builds and using RUN, COPY, and ADD only as necessary, the size of builds is minimized.
i. Sort multi-line arguments – Putting several arguments on different lines and alphabetizing them helps one avoid duplicating packages and makes Dockerfiles easier to review.
j. Leverage build cache – Unless instructed otherwise (e.g., with the --no-cache option), Docker reuses layers from earlier builds when an instruction and its inputs have not changed, which speeds up rebuilds. Once one instruction changes, the cache is invalidated for every instruction after it, so instructions that change rarely (such as installing system packages) are best placed early in the Dockerfile.
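Practices h and i can be combined when installing system packages. The Python sketch below (the function name is hypothetical) writes a single RUN instruction in the style of the apt-get commands used in the Dockerfiles in the appendices, with packages de-duplicated, sorted, and listed one per line.

```python
# Sketch: build one RUN instruction that installs sorted, de-duplicated
# apt packages. The function name is hypothetical.


def apt_install_run(packages: list[str]) -> str:
    """Return a single RUN instruction that installs `packages` with apt-get.

    Duplicates are removed and names are sorted, one per line, so repeated
    or missing packages are easy to spot; one RUN instruction adds one layer.
    """
    unique = sorted(set(packages))
    lines = ["RUN apt-get update && \\", "    apt-get -y install \\"]
    lines += [f"        {name} \\" for name in unique]
    lines.append("    && apt-get clean")
    return "\n".join(lines)


if __name__ == "__main__":
    print(apt_install_run(["wget", "zip", "unzip", "build-essential", "wget"]))
```

For the example list, the generated instruction names build-essential, unzip, wget, and zip once each, in alphabetical order, and installs them all in one layer.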

7.0 Images
Dockerfiles are executed through the “build” command to produce a reproducible copy of a desired computing environment, packaged as an “image.” An image is similar to a virtual machine disk image, but it does not contain its own operating system kernel (containers share the kernel of the host). Because images contain applications, dependencies, and possibly data, they can be very large.
The command “docker build” uses the file named “Dockerfile” in the build context by default (a differently named file can be given with “-f”), and the resulting image is named using “-t”. A variety of other options can be specified, including the removal of intermediate containers, setting image metadata, setting memory limits, and designating an output location.
One can use the “docker image history” command to see the Dockerfile instruction used to create each layer within the image, and the “--no-trunc” flag prevents truncation of longer lines. The final image can now be run as a container or shared to be run elsewhere.
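The way these options fit together on the command line can be sketched in Python. The helper below is hypothetical and covers only a few real docker build options (-t, -f, --label, and --no-cache); the build context path always comes last. Only docker_build actually runs Docker, through the standard subprocess module.

```python
import subprocess
from typing import Optional


def docker_build_args(
    tag: str,
    context: str = ".",
    dockerfile: Optional[str] = None,
    labels: Optional[dict[str, str]] = None,
    no_cache: bool = False,
) -> list[str]:
    """Return the argument list for a `docker build` command.

    tag        -- image name, optionally with ':tag' (e.g. 'mash2.2:new')
    context    -- build context; its file named 'Dockerfile' is used by default
    dockerfile -- a differently named Dockerfile, passed with -f
    labels     -- image metadata, one '--label key=value' per entry
    no_cache   -- rebuild every layer instead of reusing the build cache
    """
    args = ["docker", "build", "-t", tag]
    if dockerfile:
        args += ["-f", dockerfile]
    for key, value in (labels or {}).items():
        args += ["--label", f"{key}={value}"]
    if no_cache:
        args.append("--no-cache")
    args.append(context)  # the build context path is the last argument
    return args


def docker_build(tag: str, **options) -> None:
    """Run the build; requires Docker to be installed and running."""
    subprocess.run(docker_build_args(tag, **options), check=True)
```

For example, docker_build_args("mash2.2", dockerfile="Dockerfile-2") gives the arguments of docker build -t mash2.2 -f Dockerfile-2 ., the command used in Appendix A.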

8.0 Running Containers and Saving Output
An image is run using the “docker run” command, with further arguments to specify (among other options) the image and, if necessary, a port. A running image is known as a “container.” Containers can be interactive, and new data generated in a container is written to its writable layer. The container can be committed to a new image (docker commit), which adds the writable layer as a new layer, and that image can be saved as a tar file (also called a tarball) with docker save; new layers and data increase the size of the file. A saved image keeps the layer history and metadata of the image it was built from. If the container is instead exported to a tarball (docker export), the resulting file is a single flattened filesystem without history or metadata, so the layers of the original image cannot be recovered. This functionality is demonstrated in the walk-through exercise in Appendix A below.
The preferred way to keep data after a container is closed is in a “volume.” A volume is a filesystem completely managed by Docker that exists outside of a container. Docker also supports “bind mounts,” an older mechanism that mounts a file or directory from the host machine into a container. Volumes, however, can be stored on remote hosts or cloud providers through volume drivers, encrypted (depending on the driver), managed using Docker commands, and more safely shared between multiple containers. “Named” volumes are given a name by the user, which makes them easy to attach to new containers, whereas “anonymous” volumes are given a random name by Docker. Both kinds persist after the containers that use them are removed, unless a container was started with the --rm option, in which case its anonymous volumes are removed with it.
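The difference between these storage types shows up in the -v option of docker run: a host path before the colon gives a bind mount, a name gives a named volume, and a container path alone gives an anonymous volume. The Python sketch below classifies -v arguments under those simplified rules; the function name and the example volume name mash_db are hypothetical, and Docker's own parser also handles the longer --mount syntax and other edge cases.

```python
import re

# A Windows host path such as C:\Users\me\data contains a colon of its own.
_WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def classify_volume_spec(spec: str) -> str:
    """Classify a `docker run -v` argument under simplified rules.

    '/db'                    -> anonymous volume (container path only)
    'mash_db:/db'            -> named volume (a volume name, then a container path)
    '/home/me/data:/data:ro' -> bind mount (an absolute host path first)
    """
    if _WINDOWS_DRIVE.match(spec):
        return "bind mount"
    source, separator, _rest = spec.partition(":")
    if not separator:
        return "anonymous volume"
    if source.startswith("/"):
        return "bind mount"
    return "named volume"


if __name__ == "__main__":
    for spec in ["/db", "mash_db:/db", "/home/me/data:/data:ro"]:
        print(spec, "->", classify_volume_spec(spec))
```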

9.0 Networking
Docker creates networks through which containers can communicate with each other, the host machine, and the outside world. These can be listed with the command “docker network ls,” and the details of any one network can be examined with “docker network inspect [network name or ID].” By default, containers are attached to Docker’s “bridge network,” where they can communicate with each other using their IP addresses, and containers can be added to this network as needed. Users can also define their own networks (with “docker network create” or with the Docker Compose tool), and containers on such a user-defined network can reach each other by name or alias. (More detail on this is presented in Appendix B below.)
If you want to use Docker’s advanced networking capabilities, be aware of certain Docker settings, which can be seen in the Settings panel of the Docker Desktop Dashboard. The first is the “File Sharing” list, which lets one add host machine locations that can be mounted by Docker containers. The second is the “Expose daemon on tcp://localhost:XXXX without TLS” option, which lets programs connect to the Docker daemon without encryption or authentication; it is a security risk and is unchecked by default.
Networking designs can still be hampered by firewalls and a lack of admin privileges, but where these barriers are low, certain useful functionality is possible. For example, GUI (graphical user interface) applications can be run in containers, but this requires sharing the host computer’s display environment and running the container on the host network.

10.0 Sharing
To fulfill their intended purpose of facilitating the replication of computational research projects, images
need to be shared. This is best done by adding them to a “registry,” which is one’s collection of images
accessible by colleagues via the internet.
There are several registry options. With an account on Docker Hub one can easily “push” images to it and
organize them into various “repositories.” However, there are other public registry services, such as Google
and AWS, and in many cases Docker images need to be private and stored behind a firewall. For this one can
establish a protected cloud-based registry or build one on-premises. For the latter, Docker provides assistance through its on-site registry management program, Docker Trusted Registry.
This is the Docker Hub site where images built by members of the StaPH-B community are shared:
• [Link]/u/staphb
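Images pushed to a registry are referred to as registry/repository:tag, where the registry defaults to Docker Hub and the tag defaults to latest; a one-part name such as python refers to an official image in Docker Hub’s library namespace. The Python sketch below splits such references under those rules. It is simplified (digests written with @sha256: are not handled), the function name is hypothetical, and staphb/mash:2.2 and registry.example.org/lab/tool are used only as illustrative references.

```python
def parse_image_reference(ref: str) -> tuple[str, str, str]:
    """Split an image reference into (registry, repository, tag).

    Simplified rules: the tag follows the last ':' only if no '/' comes after
    it; the first path component is a registry if it contains '.' or ':' or
    is 'localhost'; otherwise the image is on Docker Hub (docker.io), and a
    one-part name lives in Docker Hub's 'library' namespace.
    Digests (name@sha256:...) are not handled.
    """
    name, tag = ref, "latest"
    colon = ref.rfind(":")
    if colon > ref.rfind("/"):  # a ':' before a '/' is a registry port
        name, tag = ref[:colon], ref[colon + 1:]
    first, slash, rest = name.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    repository = name if slash else f"library/{name}"
    return "docker.io", repository, tag


if __name__ == "__main__":
    for ref in ["staphb/mash:2.2", "python:3", "localhost:5000/mash:2.2"]:
        print(ref, "->", parse_image_reference(ref))
```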
There may be hurdles to sharing full images, as they may contain proprietary software and/or sensitive data.
For these reasons, it is common to simply share Dockerfiles (sometimes also called “builds”).
This is the Github site where Dockerfiles written by members of the StaPH-B community are shared:
• [Link]/StaPH-B/docker-builds
Still, images built from the same Dockerfile at different times can differ in the versions of the software they install, so only full images or saved tarball files should be used to replicate an original computing environment most faithfully.
Docker is not considered appropriate for use on shared high-performance computing (HPC) systems: the Docker daemon runs with root privileges, and users who can run containers can effectively gain root access to the host, which makes Docker an exploitable pathway for malfeasance. In addition, Docker was not developed with large-scale computing in mind, and coordinating multiple containers on cluster computing hardware requires additional applications. However, Docker images are easily ingested and converted by Singularity and other applications designed for HPC.

11.0 General Docker Best Practices (Boettiger, 2015)


Boettiger (2015) listed best practices for Docker, which are summarized below. They emphasize
reproducibility using Docker throughout workflow development and archiving containers regularly.
11.1 Use Docker containers during development
a. If a researcher begins the creation of a workflow from within a container, code will appear to run
natively, but the computational environment and processes can be reproduced, or imaged and
shared, with only a few commands.
11.2 Write Dockerfiles instead of installing interactive sessions
11.3 Add tests or checks to the Dockerfile
a. Dockerfiles are usually used to describe the installation of software, but they can also contain
commands for executing it once installed. This acts as a check that installations have been
successful, and the software is ready to use.
11.4 Use and provide appropriate base images
a. Docker is highly amenable to modular workflows, and when successful environments have been
established and containerized, it is efficient to re-use them as needed for new projects.
11.5 Share Docker images and Dockerfiles
11.6 Archive tarball snapshots
a. Although a saved image preserves its layer history, it cannot recover earlier versions of those layers, such as the software versions installed by previous builds of images from the same Dockerfile. Thus saving containers as .tar files (i.e., tarballs) at different times is important for testing whether software updates change results.

12.0 Appendices
12.1 Appendix A – Running Mash 2.2 in Docker on Windows
12.2 Appendix B – Running Three Networked Containers with a Shared Host Folder

13.0 References
1. Boettiger, C., 2015. An introduction to Docker for reproducible research. ACM SIGOPS Operating
Systems Review, 49(1), pp.71-79.
Revision History

Rev # DCR # Change Summary Date

Appendices
Appendix A – Running Mash 2.2 in Docker on Windows
Below is a detailed description of the steps needed to build and run a Docker container on a local machine with Windows 10. It is assumed that Docker is already installed, a process that is described on the Docker website ([Link]) and that may require permissions from your IT administrator.
In this walk-through exercise, you will do the following:
• Download a Dockerfile for the program Mash v. 2.2 from the StaPH-B GitHub site
• Download data from the Mash website
• Build an image from the downloaded Dockerfile
• Test that the image build was successful
• Run the image to create an interactive container
• Copy downloaded data from your host machine to the container
• Conduct a simple analysis in Mash inside the container
• Copy this output to your host machine
• Commit the container, with its data and output, to a new image in Docker
• Save the new image with data as a tar file on your machine
• Create an image from the tar file, run the image, and confirm that the data and output are there
Mash is a bioinformatics program that estimates the distance, or degree of difference, between two genomes. It is designed for metagenomics and for the massive sequence collections now available, and it allows efficient searching and clustering of sequences. It does this in part by first creating “sketches” of genomes, which drastically reduces the sizes of genome files and speeds genome comparisons.
1. Download the Mash 2.2 Dockerfile from the StaPH-B community site on GitHub
a. [Link]
b. The code here can be copied into a text file named “Dockerfile” (without an extension), or
one can right-click on the “Raw” button and save the file without an extension.
c. The contents of the Dockerfile are below (with the maintainer information redacted). Comments are indicated by the pound sign (#), and the file starts with a Linux (Ubuntu) parent image.
# base image
FROM ubuntu:xenial
# metadata
LABEL [Link]="ubuntu:xenial"
LABEL [Link]="1"
LABEL software="Mash"
LABEL [Link]="2.2"
LABEL description="Fast genome and metagenome distance estimation using MinHash"

LABEL website="[Link]"
LABEL license="[Link]"
LABEL maintainer="Maintainer Name"
LABEL [Link]="username@[Link]"
# install dependencies
RUN apt-get update && \
apt-get -y install wget && \
apt-get clean
RUN wget [Link] && \
tar -xvf [Link] && \
rm -rf [Link] && \
mkdir /data
# add mash to path, and set perl locale settings
ENV PATH="${PATH}:/mash-Linux64-v2.2" \
LC_ALL=C
WORKDIR /data
# make db dir. Store db there. Better to have db's added in the last layers
RUN mkdir /db && \
cd /db && \
wget [Link] && \
gunzip [Link]
2. Download data from the Mash website.
a. [Link]
b. [Link]
3. Open Docker
a. Search for Docker Desktop among your applications and open it.
b. Find the Docker icon in the System Tray.
i. Right-click on the Docker icon and open the Dashboard.
4. Open Command Prompt (a.k.a., Terminal or Console).
5. Build an image from the Dockerfile.
a. In Terminal, navigate to the folder containing the Dockerfile.
b. Enter docker build -t mash2.2 . (including the final period).
i. The period at the end of the command sets the current directory as the build context, so Docker uses the file named “Dockerfile” there. If the Dockerfile has a different name, the option -f can be used to specify it (e.g., docker build -t mash2.2 -f Dockerfile-2 .).
ii. This builds an image named “mash2.2” and stores it with Docker’s other images.
iii. The virtual hard drive that Docker itself uses can be found in the settings of the Docker Dashboard, and it is usually in a hidden folder in the root (e.g., C:\ProgramData\DockerDesktop\vm-data).
iv. Docker images are stored inside Docker’s virtual hard drive.
6. See the new Docker image by entering docker images.
a. The image you just built is listed with “mash2.2” under “REPOSITORY,” which serves as its name, and “latest” under “TAG.” You could have given the image a different tag by putting it after a colon in the -t option when building the image (e.g., docker build -t mash2.2:new .).
b. The image has an ID, which can also be used to run or save the image.
c. Adding -a after the images command shows the intermediate images created while building the layers of the final image (e.g., docker images -a).
d. Images can be removed using the command docker rmi and the image name or ID. For example, to remove the image you just built, you would enter docker rmi mash2.2.
7. Run the image by entering docker run -it mash2.2. If no tag is specified, Docker will default to “mash2.2:latest,” but if you gave it the tag “new” as in step 6a above, you need to enter docker run -it mash2.2:new.
a. This creates a container in which you can use Mash.
b. All containers can be seen in the Docker Dashboard, where they can also be run, stopped, deleted, opened in a port, or interacted with in a command line window. Currently running containers are green.
c. The container is given a randomly generated name by Docker unless you use the --name option, e.g., docker run -it --name mash2.2container mash2.2. You will need to use “mash2.2container” to refer to this container from here.
d. The -it option runs the container interactively: -i keeps the container’s standard input open and -t gives it a terminal, so you get a command line interface for giving it instructions and seeing its output. This particular container can only run interactively and immediately stops running without this option.
e. A command line for the container now starts in Terminal (root@[containerID]:/data#), or you can use the command line available through the Docker Dashboard (the “CLI” button to the right of the container). To return to the host computer in Terminal, you can type exit (which stops the container) or stop the container in the Dashboard and restart it. To run the image such that you retain control of your Terminal for other commands, use the -d (detached) option with docker run (e.g., docker run -it -d --name mash2.2container mash2.2); you can then open a command line in the running container with docker exec -it mash2.2container bash.
f. Commands in this container are Linux/Unix commands, unlike Terminal on the Windows host, which uses Command Prompt commands.
g. All containers can also be seen in Terminal by entering docker ps -a.
i. Omitting the -a option shows only running containers.
8. Test that the Mash image was built properly by checking that the large file “[Link]” in the /db directory was downloaded intact. The program “md5sum” calculates a checksum (a short fingerprint) of the file’s contents, and md5sum -c recalculates the checksum and compares it with a stored value. Enter the following commands inside the container:
md5sum /db/[Link] > hash.md5
md5sum -c hash.md5
a. This should result in the following message: /db/[Link]: OK.
b. Because hash.md5 is created here from the downloaded file itself, this check mainly confirms that the file is present and readable; a full integrity check compares the checksum with one published by the file’s provider.
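The same check can be done on the host in Python, which also makes it easy to compare a file against a checksum published by its provider. The sketch below uses the standard hashlib module; the function names and the script name check_md5.py are hypothetical, and the expected value must come from a trusted source such as the provider’s website.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, published: str) -> bool:
    """Compare a file's MD5 digest with a published value.

    `published` may be a bare digest or a line in md5sum format
    ("<digest>  <file name>"); only the first field is compared.
    """
    expected = published.split()[0].lower()
    return md5_of_file(path) == expected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_md5.py <file> <published digest>
    print("OK" if verify_md5(sys.argv[1], sys.argv[2]) else "FAILED")
```

Reading the file in chunks keeps memory use small even for large genome databases.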
9. Copy the Mash data downloaded earlier to the container by entering the following command in Terminal on the host, replacing “mash2.2container” with the name of your container if it differs (whether given by the --name option when running the image, as you did earlier, or the random name created by Docker), and repeating it for the second data file ([Link]):
docker cp Desktop/app/[Link] mash2.2container:/db/[Link]
10. Now compute the distance between the first two genomes using Mash. These are Escherichia coli genomes, and they can be compared using this command: mash dist [Link] [Link].
a. The output will look like this:
[Link] [Link] 0.0222766 0 456/1000
It shows the names of the two files being compared, then a standardized distance measure called the “Mash distance,” a p-value for seeing this distance by chance, and the number of matching hashes (pseudo-random identifiers) out of the total compared in the sketches.
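The output line can be read programmatically, and the Mash distance can be recomputed from the matching-hash count. The Python sketch below assumes the formula given by the Mash authors, D = -(1/k) ln(2j/(1 + j)), where j = matching hashes / sketch size estimates the Jaccard index and k is the k-mer size (21 by default in Mash). File names without spaces are assumed, and genome1.fna and genome2.fna are placeholder names.

```python
import math
from dataclasses import dataclass


@dataclass
class MashDistance:
    reference: str
    query: str
    distance: float
    p_value: float
    shared_hashes: int
    sketch_size: int


def parse_mash_dist_line(line: str) -> MashDistance:
    """Parse one line of `mash dist` output (file names must not contain spaces)."""
    reference, query, distance, p_value, hashes = line.split()
    shared, total = hashes.split("/")
    return MashDistance(reference, query, float(distance), float(p_value),
                        int(shared), int(total))


def mash_distance_from_jaccard(jaccard: float, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)) for a Jaccard estimate j."""
    if not 0 < jaccard <= 1:
        raise ValueError("the Jaccard estimate must be in (0, 1]")
    return -math.log(2 * jaccard / (1 + jaccard)) / k


if __name__ == "__main__":
    record = parse_mash_dist_line("genome1.fna\tgenome2.fna\t0.0222766\t0\t456/1000")
    estimate = mash_distance_from_jaccard(record.shared_hashes / record.sketch_size)
    print(f"reported {record.distance}, recomputed {estimate:.7f}")
```

With 456 of 1000 hashes shared, the recomputed distance is about 0.0222766, the same value Mash reported above.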
11. Now “sketch” the genomes first to speed the comparison. This generates “.msh” files and stores them in the container’s directory “db.”
a. Enter the following commands inside the container, which should produce the same output
as above, only slightly faster:
mash sketch [Link]
mash sketch [Link]
mash dist [Link] [Link]
b. Now check that the .msh files have been generated by moving out of the “data” directory
(the default directory where the command line started), and then navigating down to the
“db” directory. This is done by entering the following commands:
cd ..
cd db
dir
12. These .msh files are in the container’s /db directory, inside Docker’s virtual hard drive, so they cannot be shared unless they are copied to the host machine. This is done using the following command in Terminal on the host (repeated for each file): docker cp mash2.2container:/db/[Link] /Users/username/
a. The .msh files are binary and cannot be read on the host machine without Mash, but you can see that they are there and have some content. The files can now easily be shared with colleagues.

13. If you stop and delete this container now, you will lose the imported data and the sketches produced
from them unless they have been copied to the host machine; they will not appear when the image
you built is run again. However, you can commit the changes you have produced to a new image by
running the following command: docker commit mash2.2container
mash2.2_with_data. As always, the container name can be replaced by its ID, which can be seen
by entering docker ps -a in the Terminal. The new image name is your choice, and here we’ve
simply added “_with_data.” The new image will now appear in your list of images (seen in Terminal
with docker images), and it can be run again like any other image, except it will have the data
and output generated earlier.
14. You can now save the new image with the added data as a tar file, or tarball, on your host machine, where you can share it. Images are inside the Docker virtual hard drive, but docker save writes the tar file to the location you give it, relative to the folder you are in on the host (e.g., your User folder). This is done by entering the following command in Terminal: docker save -o mash_w_data.tar mash2.2_with_data. If you are in your User folder (/Users/your_user_name), type dir to see its contents, including your new tar file.
15. If you receive a tar file generated by someone else, it needs to be brought into the available images in Docker before it can be run, and this is done using the load command. The image in the tar file will be loaded into Docker’s virtual hard drive with the other images and given its original image name. You can test this with the tar file you just made using the following steps:
a. Delete your previous containers. Containers can be easily stopped and deleted in the Docker Dashboard using the stop and trash icons, or by entering docker rm --force [container name] in Terminal.
b. Delete your images using the docker rmi [image name] command (see step 6d above) in Terminal. Confirm this by entering docker images.
c. Enter the command docker load -i mash_w_data.tar.
d. Enter docker images. This will show you that “mash2.2_with_data” is now an available image.
e. Run this image (docker run -it mash2.2_with_data), and inside the running container repeat step 11b above. This will show that the imported data and generated outputs are in the db folder as before.
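Because docker save keeps image metadata, a saved tar file can be inspected before it is loaded. The Python sketch below assumes the archive contains the manifest.json file that docker save writes (listing each image’s tags and layer files); a tar file made with docker export is a flattened filesystem without this manifest, so the sketch raises KeyError for it. The function names are hypothetical.

```python
import json
import sys
import tarfile


def read_saved_image_manifest(tar_path: str) -> list[dict]:
    """Return the entries of manifest.json from a tar file made by `docker save`."""
    with tarfile.open(tar_path) as archive:
        member = archive.extractfile("manifest.json")  # KeyError if absent
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        return json.load(member)


def summarize_saved_image(tar_path: str) -> list[tuple[list[str], int]]:
    """List (tags, number of layer files) for each image in the archive."""
    return [
        (entry.get("RepoTags") or [], len(entry.get("Layers", [])))
        for entry in read_saved_image_manifest(tar_path)
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    for tags, layers in summarize_saved_image(sys.argv[1]):
        print(", ".join(tags) or "<untagged>", f"{layers} layers")
```

Run on mash_w_data.tar, it should list the mash2.2_with_data tag together with the number of layers in the committed image.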

Appendix B – Running Three Networked Containers with a Shared Host Folder

Below is an exercise that will use Docker Compose to run three containers in a networked environment, analyze
a small DNA sequence data set, and use a shared host directory to pass data and results among the host and
containers.
In this walk-through exercise, you will do the following:
• Create a directory structure to store data, Dockerfiles, and output
• Download DNA sequences of 31 SARS-CoV-2 isolates
• Download Dockerfiles for MAFFT (DNA alignment), RAxML (phylogenetics), and FigTree (visualization)
• Create a docker-compose file to build the images and connect a host folder to each container
• Build the docker-compose network
• Run the docker-compose network
• Align the DNA sequences, find their evolutionary relationships, and output a PDF of the tree to the host folder
• Shut down and remove the networked containers

A common analysis of DNA sequence data is to build a tree of the sequences’ historical relationships. The first step is to align the sequences, which can be done with the program MAFFT: it introduces gaps in the sequences as needed to make them all the same length and put homologous bases at the same position in each sequence. The aligned sequences can then be read by the program RAxML, which searches a large number of possible trees connecting the sequences to find the optimal (maximum likelihood) evolutionary tree. The output of RAxML is a text file that describes the best tree using nested parentheses. It also outputs other tree files, including ones with statistical measures of support and alternate topologies with the same likelihood. A large number of programs can convert these parenthetical trees into graphical trees, and FigTree can do this via a simple command line.
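Because RAxML describes trees with nested parentheses (the Newick format), simple checks can be scripted. The Python sketch below lists the tip labels in a Newick string, for example to confirm that a tree contains all 31 isolates. It is a simplified parser (it assumes unquoted labels and no bracketed comments), and the function name and example labels are hypothetical.

```python
def newick_tip_labels(newick: str) -> list[str]:
    """Return the tip (leaf) labels of a Newick tree string, left to right.

    Simplified: assumes unquoted labels and no [bracketed] comments. A label
    that directly follows ')' names an internal node (often a support value)
    and is skipped.
    """
    tips: list[str] = []
    token = ""
    after_close = False
    for char in newick.strip():
        if char in "(),;":
            label = token.split(":")[0].strip()  # drop any :branch_length
            if label and not after_close:
                tips.append(label)
            token = ""
            after_close = char == ")"
        else:
            token += char
    return tips


if __name__ == "__main__":
    tree = "((isolate_A:0.01,isolate_B:0.02)95:0.005,isolate_C:0.03);"
    tips = newick_tip_labels(tree)
    print(len(tips), tips)
```

For the example tree, the support value 95 is skipped and the three isolate labels are returned in order.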
1. On your Desktop create a folder named “cova,” and within that folder create a subfolder named
“shared.”
2. Download the full zipped “cova” directory from GitHub here: [Link]/A-Farhan/cova:
a. On the page click on the “Code” button and choose “Download ZIP.”
b. Move the file to the Desktop and expand it.
c. Open Terminal and navigate to the “cova/shared” folder created in Step 1 above.
d. Move the file “[Link]” from the expanded “cova-master” directory to “cova/shared”:
i. On Windows use the move command like this: move C:\Users\<username>\Desktop\cova-master\cova-master\datasets\example\[Link] C:\Users\<username>\Desktop\cova\shared
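On macOS or Linux, Steps 1 and 2d can be done from Terminal instead. The function below is a minimal sketch. It assumes the ZIP was expanded on the Desktop into the same cova-master/cova-master layout used in the Windows command. The function name stage_input is hypothetical, and you must pass it the actual file name from Step 2d.

```shell
#!/usr/bin/env bash
# Sketch: macOS/Linux equivalent of Steps 1 and 2d.
# stage_input is a hypothetical helper; pass the real example file name from Step 2d.

# Usage: stage_input DESKTOP_DIR FASTA_NAME
stage_input() {
  local desktop="$1" fasta_name="$2"
  local src="$desktop/cova-master/cova-master/datasets/example/$fasta_name"
  local dest="$desktop/cova/shared"

  # Step 1: create cova and its shared subfolder (-p: no error if they already exist)
  mkdir -p "$dest" || return 1

  # Stop with a clear message if the expanded ZIP is not where we expect it
  if [[ ! -f "$src" ]]; then
    echo "not found: $src" >&2
    return 1
  fi

  # Step 2d: move the example sequences into cova/shared
  mv "$src" "$dest/"
}
```

For example, stage_input "$HOME/Desktop" followed by the file name from Step 2d. Because mkdir -p is idempotent, running the function again only repeats the move.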
3. Download (or copy-paste) MAFFT and RAxML Dockerfiles from the StaPH-B GitHub page and put them in
“Desktop/cova.” Dockerfile content can be pasted into a new text file and saved in Desktop/cova without a file extension, using the names given in Steps 3b and 4b below. Your text editor may still insist on adding a file extension, so check the saved file and remove the extension if necessary.
a. Scroll down on the StaPH-B GitHub Builds page to find the folder containing the Dockerfile:
[Link]/StaPH-B/docker-builds.
b. Rename the Dockerfiles with the program name at the end:
i. Dockerfile_mafft
ii. Dockerfile_raxml
c. The MAFFT Dockerfile should look similar to this (with additional metadata labels):

FROM ubuntu:bionic

RUN apt-get update && apt-get install -y wget

RUN wget [Link] && \
    dpkg -i mafft_7.450-1_amd64.deb && \
    mkdir /data

WORKDIR /data

d. The RAxML Dockerfile should look similar to this (with additional metadata labels):

FROM ubuntu:bionic

RUN apt-get update && \
    apt-get -y install build-essential \
    wget \
    zip \
    unzip && \
    apt-get clean

RUN wget [Link] && \
    tar -xvf [Link] && \
    rm -rf [Link] && \
    cd standard-RAxML-8.2.12/ && \
    make -f [Link] && \
    rm *.o && \
    make -f [Link] && \
    rm *.o && \
    make -f [Link] && \
    rm *.o && \
    make -f [Link] && \
    rm *.o && \
    make -f [Link] && \
    rm *.o && \
    make -f [Link] && \
    rm *.o

RUN wget [Link]ng_v0.9.0_linux_x86_64.zip && \
    unzip raxml-ng_v0.9.0_linux_x86_64.zip && \
    mkdir raxml_ng && \
    mv raxml-ng raxml_ng/ && \
    rm -rf raxml-ng_v0.9.0_linux_x86_64.zip

ENV PATH="${PATH}:/standard-RAxML-8.2.12:/raxml_ng"

WORKDIR /data

e. Notice that each Dockerfile directs its container to start in the directory named “data,” which
you will see when you open the running container later.
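Before connecting the images with Docker Compose, you may find it useful to build each image on its own and confirm that the program inside starts. The sketch below assumes it is run from Desktop/cova with the Dockerfile names from Step 3b. The helpers run and build_and_check are hypothetical, and setting DRY_RUN=1 prints the commands instead of executing them. The version checks use the programs' own flags: mafft --version and raxmlHPC -v.

```shell
#!/usr/bin/env bash
# Sketch: build one image from a named Dockerfile, then run a quick version check.
# run and build_and_check are hypothetical helpers; DRY_RUN=1 prints instead of executing.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: build_and_check TAG DOCKERFILE VERSION_COMMAND...
build_and_check() {
  local tag="$1" dockerfile="$2"
  shift 2
  # -f selects the Dockerfile; "." (the current folder, Desktop/cova) is the build context
  run docker build -t "$tag" -f "$dockerfile" . || return 1
  # --rm deletes the throwaway container once the version check finishes
  run docker run --rm "$tag" "$@"
}
```

For example, run build_and_check mafft Dockerfile_mafft mafft --version and build_and_check raxml Dockerfile_raxml raxmlHPC -v. The tags mafft and raxml match the image names used in the docker-compose file, so docker-compose up should reuse these images rather than build them again.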
4. Download (or copy-paste) the FigTree Dockerfile from the BioContainers GitHub page and save it to
“Desktop/cova.”
a. [Link]/BioContainers/containers/blob/master/figtree/1.4.4-3-deb/Dockerfile
b. Rename the file with the program name:
i. Dockerfile_figtree
c. The FigTree Dockerfile should look like this (with additional metadata labels):

FROM biocontainers/biocontainers:vdebian-buster-backports_cv1

USER root

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update && (apt-get install -t buster-backports -y figtree || apt-get install -y figtree) && \
    apt-get clean && apt-get purge && rm -rf /var/lib/apt/lists/* /tmp/*

USER biodocker
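Steps 3 and 4 warn that a text editor may add an extension when saving. If that happens, a short loop can remove it. This is a sketch that assumes the unwanted extension is .txt, which is common but not guaranteed. The function name strip_txt is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: remove a ".txt" extension that a text editor may have added to
# Dockerfile_mafft, Dockerfile_raxml, or Dockerfile_figtree.
# strip_txt is a hypothetical helper; it assumes the unwanted extension is .txt.

# Usage: strip_txt DIR
strip_txt() {
  local dir="$1" f
  for f in "$dir"/Dockerfile_*.txt; do
    # If nothing matches, the glob stays literal; skip it
    [[ -e "$f" ]] || continue
    # Never overwrite a correctly named file that already exists
    if [[ -e "${f%.txt}" ]]; then
      echo "skipped (target exists): ${f##*/}" >&2
      continue
    fi
    # ${f%.txt} drops the trailing .txt
    mv "$f" "${f%.txt}"
    echo "renamed: ${f##*/} -> $(basename "${f%.txt}")"
  done
}
```

Run it as strip_txt "$HOME/Desktop/cova". Files that already have no extension are left untouched.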

5. Create a Docker Compose file in a text editor.
a. Open a blank document, save it in Desktop/cova as “[Link],” and paste in the following content:

version: "3.7"

services:
  mafft:
    image: mafft
    build:
      context: .   # the folder containing this file (Desktop/cova)
      dockerfile: Dockerfile_mafft
    stdin_open: true
    volumes:
      - ./shared:/shared
  raxml:
    image: raxml
    build:
      context: .
      dockerfile: Dockerfile_raxml
    stdin_open: true
    volumes:
      - ./shared:/shared
  figtree:
    image: figtree
    build:
      context: .
      dockerfile: Dockerfile_figtree
    stdin_open: true
    volumes:
      - ./shared:/shared

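b. Notice the “version” key at the top of the file. This is the version of the Compose file format, not the version of the docker-compose program reported by docker-compose version. Each file format version requires a minimum Docker Engine release; format 3.7, for example, requires Docker Engine 18.06.0 or later. Only the major and minor numbers are given (e.g., 3.7). Specifying a format version that your installation does not support can prevent Docker Compose from creating the networked volumes, sometimes without an informative error message. Docker Compose V2 (invoked as docker compose) treats this key as obsolete and ignores it.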
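c. The instruction “stdin_open: true” keeps each container's standard input open. This keeps the containers running and available for interactive control. It is equivalent to the -i option when using docker run for individual containers; the -t option corresponds to the separate instruction “tty: true.”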
d. The docker-compose file will be executed from the Desktop/cova folder. Relative paths in the file are resolved from the folder that contains it, so that folder can be written simply as a period and slash (“./”). The “volumes” entries therefore tell Docker to map the host folder “cova/shared” to a folder named “shared” in each container. The colon separates the host path on the left from the container path on the right. If a folder specified in the docker-compose file does not yet exist on the host or in the container, Docker will create it automatically.
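A mismatch between a Dockerfile's saved name and the name in the “dockerfile:” entries will make the build fail. The sketch below checks for this before anything is built. It assumes the simple “dockerfile: NAME” lines used in Step 5, with no quotes and one name per line. The function name check_dockerfiles is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm that every Dockerfile named in the docker-compose file exists
# in the same folder, catching name mismatches before any build starts.
# check_dockerfiles is a hypothetical helper; it assumes simple
# "dockerfile: NAME" lines (no quotes, one name per line).

# Usage: check_dockerfiles COMPOSE_FILE
check_dockerfiles() {
  local compose="$1" dir name missing=0
  dir=$(dirname "$compose")
  # awk splits on whitespace, so YAML indentation does not matter
  while read -r name; do
    if [[ -f "$dir/$name" ]]; then
      echo "ok: $name"
    else
      echo "missing: $name" >&2
      missing=1
    fi
  done < <(awk '$1 == "dockerfile:" {print $2}' "$compose")
  return "$missing"
}
```

Pass it the path of the docker-compose file saved in Step 5a. A non-zero return status means at least one Dockerfile is missing or misnamed.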
6. Check your docker-compose file by entering docker-compose config in Terminal. If Docker can parse the file, it will display the resulting configuration on the screen. If not, it will report the first error it finds, with the line and character number. The format of .yml (or .yaml) files is important, and improper indenting can cause errors. Docker-compose files can contain a large amount of information, especially when describing complex networks, but the best practice is to start simply and add new components one at a time, checking with the config command after each.
a. After the images have been built (by docker-compose build, or automatically by the up command in Step 7), they can be listed by entering docker images.
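The validation in Step 6 can be scripted together with an explicit build. This is a minimal sketch. It relies on docker-compose config options that exist in both Compose V1 and V2: --quiet validates without printing, and --services lists the service names. The helpers run and check_and_build are hypothetical. COMPOSE defaults to docker-compose; set it to "docker compose" for Compose V2. DRY_RUN=1 prints the commands only.

```shell
#!/usr/bin/env bash
# Sketch: validate the Compose file quietly, list its services, then build the images.
# run and check_and_build are hypothetical helpers; DRY_RUN=1 prints only.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

check_and_build() {
  # Split COMPOSE into words so "docker compose" becomes two arguments
  local -a compose
  read -r -a compose <<< "${COMPOSE:-docker-compose}"
  # --quiet validates the file without printing the resolved configuration
  run "${compose[@]}" config --quiet || return 1
  # --services prints one service name per line (here mafft, raxml, figtree)
  run "${compose[@]}" config --services || return 1
  # Build all three images now instead of waiting for the up command
  run "${compose[@]}" build
}
```

Run check_and_build from Desktop/cova, then confirm the images with docker images.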
7. Execute your docker-compose file with the up command in detached (background) mode as follows: docker-compose up -d. Docker Compose will automatically use the file named [Link] in the working directory to build any images that do not yet exist and then set up the containers and network.
a. The resulting network can be seen on the Docker Desktop dashboard.
i. The network will have the name of the folder from which the docker-compose file
was executed, and by clicking on this, you can see the three containers that are
running inside it.

ii. The container names are prefixed with the project name (the name of the folder) and suffixed by a number. For example, the MAFFT container is named “cova_mafft_1.” Docker Compose V2 (invoked as docker compose) uses hyphens instead, e.g., “cova-mafft-1.” The number suffix allows multiple containers for the same service to run on the same network.
iii. Each of the network containers can be accessed from the dashboard by opening its command-line window (via the “CLI” cursor icon to the right of the container name).
1. Enter the container “cova_figtree_1” and explore the directory structure. Use pwd to see the current working directory (it should be “/data”), cd .. to move up to the parent directory, and ls (or dir) to list the directory contents.
a. Confirm that there is a folder named “shared” and that it contains “[Link].”
2. Check that the FigTree container can communicate with the other containers by
pinging them. For example, enter ping -c 1 cova_raxml_1 and confirm
that the RAxML container has returned a packet.
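The same checks can be run from Terminal with docker exec instead of the dashboard. This is a sketch that assumes the Compose V1 container names used above; with Compose V2, pass cova-figtree-1 and cova-raxml-1 instead. The helpers run and check_network are hypothetical, and DRY_RUN=1 prints the commands only. If ping is not installed in the FigTree image, that check will fail even when networking is working.

```shell
#!/usr/bin/env bash
# Sketch: Step 7 checks run from the host with docker exec.
# run and check_network are hypothetical helpers; DRY_RUN=1 prints only.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: check_network [FIGTREE_CONTAINER] [TARGET_CONTAINER]
check_network() {
  local figtree="${1:-cova_figtree_1}" target="${2:-cova_raxml_1}"
  # List the running containers whose names contain "cova"
  run docker ps --filter "name=cova" --format '{{.Names}}' || return 1
  # The shared folder should be mounted and contain the example sequences
  run docker exec "$figtree" ls /shared || return 1
  # Send one ping packet to another container on the same Compose network
  run docker exec "$figtree" ping -c 1 "$target"
}
```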
8. The network that connects the three containers in “cova” can be seen by entering docker network ls. This lists the networks, and there should be one named “cova_default” that uses Docker's “bridge” driver.
a. Inspect “cova_default” by entering docker network inspect cova_default. This displays details about the network, including the three containers attached to it and their network (IP) addresses. Our docker-compose file did not publish any ports; published ports are not listed by network inspect but can be seen with docker ps.
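The full output of docker network inspect is long JSON. The --format (-f) option accepts a Go template that extracts only the fields you need. The sketch below prints one line per attached container, with its name and address. The helpers run and list_network are hypothetical, and DRY_RUN=1 prints the command only.

```shell
#!/usr/bin/env bash
# Sketch: print container names and addresses on a Compose network.
# run and list_network are hypothetical helpers; DRY_RUN=1 prints only.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: list_network [NETWORK]   (defaults to cova_default)
list_network() {
  local network="${1:-cova_default}"
  # {{range .Containers}} walks the attached containers; println ends each line
  run docker network inspect -f '{{range .Containers}}{{println .Name .IPv4Address}}{{end}}' "$network"
}
```

Each output line should show a container name, such as cova_mafft_1, followed by its address in CIDR form.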
9. From the Terminal, enter each container in turn using the exec (“execute”) command and produce a tree of the DNA sequences in “[Link].” With exec, use the interactive option (-i) and the pseudo-TTY (pseudo-terminal) option (-t), and give the command to run inside the container, usually a shell such as bash. The commands in each container will produce output files used by the next container.
a. In MAFFT, you will let the program automatically choose an appropriate alignment strategy for the [Link] file (the --auto option). It will then align the sequences so that they are the same length and have homologous bases at the same position in each sequence. The aligned sequences will be saved in a file named “[Link].”
i. Enter docker exec -it cova_mafft_1 bash.
1. Docker allows single-letter options to be combined in any order, so -it is equivalent to -ti, -i -t, and -t -i.
ii. Enter mafft --auto /shared/[Link] > /shared/[Link].
iii. Enter exit.
b. In RAxML you will ask the program to read the aligned sequences (-s), run a rapid bootstrap analysis together with a search for the best-scoring tree (-f a, with 100 bootstrap replicates set by -N and a bootstrap seed set by -x), and write the tree files to the “shared” folder (-w). Each file name will end in “1,” the run name set by -n, which can be used to identify the output from different runs of the same search. The remaining options set the substitution model (-m GTRGAMMA), the random-number seed for the parsimony starting trees (-p), and the number of threads (-T).

i. Enter docker exec -it cova_raxml_1 bash.
ii. Enter raxmlHPC -m GTRGAMMA -p 12345 -s /shared/[Link] -n 1 -w /shared -f a -x 12345 -N 100 -T 12.
iii. Enter exit.
c. In FigTree any one of the text trees output by RAxML can be read and transformed into
graphical trees and output in various file formats. Here you will read the file from RAxML that
contains the “best tree” and output it as a pdf of a tree illustration with sequence identifiers at
the terminals.
i. Enter docker exec -it cova_figtree_1 bash.
ii. Enter figtree -graphic PDF -width 500 -height 800 /shared/RAxML_bestTree.1 /shared/[Link].
iii. Confirm that the pdf of a tree is now in the “/cova/shared” folder on your Desktop.
iv. Enter exit.
d. Dismantle the network and its containers by entering docker-compose down.
i. Confirm that the network “cova_default” is gone by entering docker network ls.
ii. Confirm that the containers are now gone by entering docker ps.
iii. Confirm that the images are still available by entering docker images.
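Step 9 can also be run from the host without opening a shell in each container, which is convenient when repeating the analysis. This is a minimal sketch. It assumes the Compose V1 container names and the /shared mount from Step 5. Its three arguments are placeholders for the file names used in Steps 9a and 9c, and the helpers run and run_pipeline are hypothetical. Because MAFFT's output redirect (>) must happen inside the container, that command is wrapped in sh -c. DRY_RUN=1 prints the commands only.

```shell
#!/usr/bin/env bash
# Sketch: Step 9 (MAFFT -> RAxML -> FigTree) driven from the host with docker exec.
# run and run_pipeline are hypothetical helpers; DRY_RUN=1 prints only.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: run_pipeline INPUT_FASTA ALIGNED_FASTA OUTPUT_PDF   (names inside /shared)
run_pipeline() {
  local input="$1" aligned="$2" pdf="$3"
  # 9a: the redirect runs inside the MAFFT container, so wrap it in sh -c
  run docker exec cova_mafft_1 sh -c "mafft --auto '/shared/$input' > '/shared/$aligned'" || return 1
  # 9b: -m model, -p parsimony seed, -s alignment, -n run name, -w output folder,
  #     -f a rapid bootstrap + best-tree search, -x bootstrap seed, -N replicates, -T threads
  run docker exec cova_raxml_1 raxmlHPC -m GTRGAMMA -p 12345 -s "/shared/$aligned" \
    -n 1 -w /shared -f a -x 12345 -N 100 -T 12 || return 1
  # 9c: draw the best-scoring tree as a PDF
  run docker exec cova_figtree_1 figtree -graphic PDF -width 500 -height 800 \
    /shared/RAxML_bestTree.1 "/shared/$pdf"
}
```

The -T 12 value is copied from Step 9b; you may want to lower it on a machine with fewer cores. RAxML will not overwrite the output files of an earlier run with the same run name. Before re-running, remove the old RAxML_*.1 files from cova/shared or change -n, and update the FigTree input name to match.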
