Troubleshooting for EESSI

Have you tried turning it off and on again?

Note

In this section, we will continue to use the EESSI CernVM-FS repository software.eessi.io as a running example, but the troubleshooting guidelines are by no means specific to EESSI.

Make sure you adjust the example commands to the CernVM-FS repository you are using, if needed.

Typical problems

Error messages

The error messages that you may encounter when accessing a CernVM-FS repository are often quite cryptic, especially if you are not very familiar with CernVM-FS, or with file systems and networking on Linux systems in general.

Here are a couple of examples:

  • The CernVM-FS repository may not be known (yet) on your system, which will result in a (clear) error message like this when you try to access it:

    $ ls /cvmfs/software.eessi.io
    ls: cannot access '/cvmfs/software.eessi.io': No such file or directory
    

  • You may see error messages that suggest network connectivity problems, like:

    Failed to discover HTTP proxy servers (23 - proxy auto-discovery failed)
    

  • Other problems may be quite specific to the internals of CernVM-FS, rather than being configuration or networking issues. Examples include:

    Failed to initialize root file catalog (16 - file catalog failure)
    
    Failed to transfer ownership of /var/lib/cvmfs/shared to cvmfs
    
    ls: cannot open directory '/cvmfs/config-repo.cern.ch': Too many levels of symbolic links
    
    Transport endpoint is not connected
    
    The last error message indicates that FUSE has failed.

We will give some advice below on how you might figure out what is wrong when seeing error messages like these.

General approach

In general, it is recommended to take a step-by-step approach to troubleshooting:

  • WIP;

Common problems

CernVM-FS installation

Make sure that CernVM-FS is actually installed (correctly).

Check whether both the /cvmfs directory and the cvmfs service account exist on the system:

ls /cvmfs
id cvmfs

Either of these errors would be a clear indication that CernVM-FS is not installed, or that the installation was not completed:

ls: cannot access '/cvmfs': No such file or directory
id: ‘cvmfs’: no such user

You can also check whether the cvmfs2 command is available, and working:

cvmfs2 --help

which should produce output that starts with:

The CernVM File System
Version 2.11.2
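
The checks above can be combined into one small shell function. This is only a sketch: the function name check_cvmfs_install and its two optional arguments (mount root and account name) are hypothetical, added here so the checks can also be pointed at non-default locations.

```shell
# Hypothetical helper: report which parts of a CernVM-FS installation are missing.
# Arguments (both optional): mount root (default /cvmfs), service account (default cvmfs).
check_cvmfs_install() (
    root="${1:-/cvmfs}"
    user="${2:-cvmfs}"
    status=0
    if [ ! -d "$root" ]; then
        echo "missing directory: $root"
        status=1
    fi
    if ! id "$user" > /dev/null 2>&1; then
        echo "missing user: $user"
        status=1
    fi
    if ! command -v cvmfs2 > /dev/null 2>&1; then
        echo "missing command: cvmfs2"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "CernVM-FS installation looks complete"
    fi
    # The function body runs in a subshell, so 'exit' only ends the function.
    exit "$status"
)
```

Running check_cvmfs_install without arguments checks the default locations; a non-zero exit code means at least one of the three checks failed.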

CernVM-FS configuration

A common issue is an incorrect CernVM-FS configuration, for example due to a silly mistake in a configuration file.

Reloading

Don't forget to reload the CernVM-FS configuration after you've made changes to it:

sudo cvmfs_config reload

Note that changes to specific configuration settings, in particular those related to FUSE, will not be reloaded with this command, since they require remounting the repository.

Show configuration

Verify the configuration via cvmfs_config showconfig:

cvmfs_config showconfig software.eessi.io

Using the -s option, you can trim the output to only show non-empty configuration settings:

cvmfs_config showconfig -s software.eessi.io

We strongly advise combining this command with grep to check for specific configuration settings, like:

$ cvmfs_config showconfig software.eessi.io | grep CVMFS_SERVER_URL
CVMFS_SERVER_URL='http://aws-eu-central-s1.eessi.science/cvmfs/software.eessi.io;http://azure-us-east-s1.eessi.science/cvmfs/software.eessi.io'    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/eessi.io.conf

Be aware that cvmfs_config showconfig will read the configuration files as they are currently, but that does not necessarily mean that those configuration settings are currently active.
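
When scripting such checks, it can be convenient to strip the quotes and the trailing "# from ..." annotation from the output. The sketch below assumes the output format shown in the example above; the function name get_cvmfs_setting is hypothetical.

```shell
# Hypothetical helper: print the bare value of one setting, reading
# 'cvmfs_config showconfig' output from standard input.
get_cvmfs_setting() {
    # $1 is the setting name, for example CVMFS_SERVER_URL.
    # First sed: keep what follows 'NAME=' on the matching line.
    # Second sed: drop the trailing '# from <file>' note, then the single quotes.
    sed -n "s/^$1=//p" \
        | sed -e "s/[[:space:]]*# from .*$//" -e "s/^'\(.*\)'$/\1/"
}
```

It could be used as: cvmfs_config showconfig software.eessi.io | get_cvmfs_setting CVMFS_SERVER_URL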

Non-existing repositories

Keep in mind that cvmfs_config does not check whether the specified repository is actually known at all. Try for example querying the configuration for the fictional vim.or.emacs.io repository:

cvmfs_config showconfig vim.or.emacs.io

Inspect active configuration

Inspect the active configuration that is currently used by talking to the running CernVM-FS service via cvmfs_talk.

Note

This requires that the specified CernVM-FS repository is currently mounted.

ls /cvmfs/software.eessi.io > /dev/null  # to trigger mount if not mounted yet
sudo cvmfs_talk -i software.eessi.io parameters

cvmfs_talk can also be used to query other live aspects of a particular repository (see the output of cvmfs_talk --help). For example:

  • The current revision of repository contents (via revision);
  • Information on the Stratum 1 replica server being used (via host ...);
  • Information on the proxy server being used (via proxy ...);
  • Information on the CernVM-FS client cache (via cache ...);

Non-mounted repositories

If running cvmfs_talk fails with an error like "Seems like CernVM-FS is not running", try triggering a mount of the repository first by accessing it (with ls), or by running:

cvmfs_config probe software.eessi.io

If the latter succeeds but accessing the repository does not, there may be an issue with the (active) configuration, or there may be a connectivity problem.

Repository public key

In order for CernVM-FS to access a repository the corresponding public key must be available, in a domain-specific subdirectory of /etc/cvmfs/keys, like:

$ ls /etc/cvmfs/keys/cern.ch
cern-it1.cern.ch.pub  cern-it4.cern.ch.pub  cern-it5.cern.ch.pub

or in the active CernVM-FS config repository, like for EESSI:

$ ls /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/keys/eessi.io
eessi.io.pub
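
A minimal sketch for looking up the key files in both locations is given below. The function name list_repo_keys is hypothetical, and it assumes that the domain is the repository name without its first label (software.eessi.io becomes eessi.io), as in the examples above.

```shell
# Hypothetical helper: list public key files for the domain of a repository.
# $1: repository name; remaining arguments: key base directories to search
# (defaults to the two locations mentioned above).
list_repo_keys() (
    repo="$1"
    shift
    # Assumption: the domain is the repository name without its first label,
    # e.g. software.eessi.io -> eessi.io
    domain="${repo#*.}"
    if [ "$#" -eq 0 ]; then
        set -- /etc/cvmfs/keys /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/keys
    fi
    found=1
    for base in "$@"; do
        for key in "$base/$domain"/*.pub; do
            if [ -f "$key" ]; then
                echo "$key"
                found=0
            fi
        done
    done
    # Exit code 0 if at least one key file was found, 1 otherwise.
    exit "$found"
)
```

For example, list_repo_keys software.eessi.io prints the paths of any key files found, and returns a non-zero exit code if there are none.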

Connectivity issues

There could be various issues related to network connectivity, for example a firewall blocking connections.

CernVM-FS uses plain HTTP as data transfer protocol, so basic tools can be used to investigate connectivity issues.

You should make sure that the client system can connect to the Squid proxy and/or Stratum-1 replica server(s) via the required ports.

Determine proxy server

First figure out if a proxy server is being used via:

sudo cvmfs_talk -i software.eessi.io proxy info

This should produce output that looks like:

Load-balance groups:
[0] http://PROXY_IP:3128 (PROXY_IP, +6h)
[1] DIRECT
Active proxy: [0] http://PROXY_IP:3128

(to protect the innocent, the actual proxy IP was replaced with "PROXY_IP" in the output above)

The last line indicates that a proxy server is indeed being used currently.

DIRECT would mean that no proxy server is being used.
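
To pick out just the active proxy in a script, the last line of this output can be filtered. This is a sketch that assumes the output format shown above; the function name active_proxy is hypothetical.

```shell
# Hypothetical helper: print the active proxy from 'cvmfs_talk ... proxy info'
# output on standard input.
active_proxy() {
    # The line of interest looks like: Active proxy: [0] http://PROXY_IP:3128
    # The last field on that line is the proxy URL (or DIRECT).
    awk '/^Active proxy:/ { print $NF }'
}
```

It could be used as: sudo cvmfs_talk -i software.eessi.io proxy info | active_proxy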

Access to proxy server

If a proxy server is used, you should check whether it can be accessed at port 3128 (default Squid port).

For this, you can use standard networking tools (if available):

  • nc (netcat), or ncat, a reimplementation of netcat:
    nc -vz PROXY_IP 3128
    
  • telnet:
    telnet PROXY_IP 3128
    
  • tcptraceroute:
    sudo tcptraceroute PROXY_IP 3128
    

You will need to replace "PROXY_IP" in the commands above with the actual IP (or hostname) of the proxy server being used.
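
Since the proxy is usually reported as a URL, while the tools above expect a separate host and port, a small helper can do the splitting. The function name proxy_host_port is hypothetical, and this sketch does not handle IPv6 literals.

```shell
# Hypothetical helper: split a proxy URL into "host port".
# The port falls back to 3128 (the default Squid port) if none is given.
proxy_host_port() (
    url="${1#*://}"      # drop the scheme, if any
    url="${url%%/*}"     # drop any path
    host="${url%%:*}"    # everything before the first colon
    case "$url" in
        *:*) port="${url##*:}" ;;
        *)   port=3128 ;;
    esac
    echo "$host $port"
)
```

One way to use it: set -- $(proxy_host_port "http://PROXY_IP:3128"); nc -vz "$1" "$2"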

Determine Stratum 1

Check which Stratum 1 servers are currently configured:

cvmfs_config showconfig software.eessi.io | grep CVMFS_SERVER_URL

Determine which Stratum 1 is currently being used by CernVM-FS:

$ sudo cvmfs_talk -i software.eessi.io host info
  [0] http://aws-eu-central-s1.eessi.science/cvmfs/software.eessi.io (unprobed)
  [1] http://azure-us-east-s1.eessi.science/cvmfs/software.eessi.io (unprobed)
Active host 0: http://aws-eu-central-s1.eessi.science/cvmfs/software.eessi.io

In this case, the public Stratum 1 for EESSI in AWS eu-central is being used: aws-eu-central-s1.eessi.science.
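
The hostname of the active Stratum 1 can be extracted from this output for use in follow-up checks. The sketch below assumes the output format shown above; the function name active_stratum1_host is hypothetical.

```shell
# Hypothetical helper: print the hostname of the active Stratum 1, reading
# 'cvmfs_talk ... host info' output from standard input.
active_stratum1_host() {
    # Line of interest: Active host 0: http://<hostname>/cvmfs/<repository>
    # awk picks the URL, sed strips the scheme and everything after the hostname.
    awk '/^Active host/ { print $NF }' \
        | sed -e 's|^[a-z]*://||' -e 's|[:/].*$||'
}
```

It could be used as: sudo cvmfs_talk -i software.eessi.io host info | active_stratum1_host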

Access to Stratum 1

If no proxy is being used (CVMFS_HTTP_PROXY is set to DIRECT, see also above), you should check whether the active Stratum 1 is directly accessible at port 80.

Again, you can use standard networking tools for this:

nc -vz aws-eu-central-s1.eessi.science 80
telnet aws-eu-central-s1.eessi.science 80
sudo tcptraceroute aws-eu-central-s1.eessi.science 80

Download from Stratum 1

To see whether a Stratum 1 replica server can be used to download repository contents from, you can use curl to check whether the .cvmfspublished file is accessible (this file must exist in every repository):

S1_URL="http://aws-eu-central-s1.eessi.science"
curl --head ${S1_URL}/cvmfs/software.eessi.io/.cvmfspublished

If CernVM-FS is configured to use a proxy server, you should let curl use it too:

P_URL="http://PROXY_IP:3128"
S1_URL="http://aws-eu-central-s1.eessi.science"
curl --proxy ${P_URL} --head ${S1_URL}/cvmfs/software.eessi.io/.cvmfspublished

or equivalently via the standard http_proxy environment variable that curl picks up on:

S1_URL="http://aws-eu-central-s1.eessi.science"
http_proxy="PROXY_IP:3128" curl --head ${S1_URL}/cvmfs/software.eessi.io/.cvmfspublished

Make sure you replace "PROXY_IP" in the commands above with the actual IP (or hostname) of the proxy server.

If you see a 200 HTTP return code in the first line of output produced by curl, access is working as it should:

HTTP/1.1 200 OK

If you see 403 as return code, then something is blocking the connection:

HTTP/1.1 403 Forbidden

In this case, you should check whether a firewall is being used, or whether an ACL in the Squid proxy configuration is the culprit.

If you see 404 as return code, you made a typo in the curl command 😉:

HTTP/1.1 404 Not Found

Maybe you forgot the '.' in .cvmfspublished?
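
The three return codes discussed above can be turned into a quick diagnosis in a script. This is a sketch only: the function name explain_http_status and the wording of its messages are hypothetical.

```shell
# Hypothetical helper: interpret the status line of 'curl --head' output
# read from standard input, following the cases discussed above.
explain_http_status() (
    # Take the status code from the first line, e.g. "HTTP/1.1 200 OK".
    code=$(head -n 1 | awk '{ print $2 }')
    case "$code" in
        200) echo "200: access is working" ;;
        403) echo "403: blocked, check firewall and Squid ACLs" ;;
        404) echo "404: not found, check the URL and repository name" ;;
        *)   echo "${code:-no response}: unexpected, inspect the full curl output" ;;
    esac
)
```

It could be used as: curl --silent --head ${S1_URL}/cvmfs/software.eessi.io/.cvmfspublished | explain_http_status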

Note

A Stratum 1 server does not provide access to all possible CernVM-FS repositories.

Network latency & bandwidth

To check the network latency and bandwidth, you can use iperf3 and tcptraceroute.

Mounting problems

autofs

Keep in mind that (by default) CernVM-FS repositories are mounted via autofs.

Hence, you should not rely on the output of ls /cvmfs to determine which repositories can be accessed with your current configuration, since they may not be mounted currently.

You can check whether a specific repository is available by trying to access it directly:

ls /cvmfs/software.eessi.io

Currently mounted repositories

To check which CernVM-FS repositories are currently mounted, run:

cvmfs_config stat

Probing

To check whether a repository can be mounted, you can try to probe it:

$ cvmfs_config probe software.eessi.io
Probing /cvmfs/software.eessi.io... OK

Manual mounting

If you cannot get access to a repository via auto-mounting by autofs, you can try to manually mount it, since that may reveal specific error messages:

mkdir -p /tmp/cvmfs/eessi
sudo mount -t cvmfs software.eessi.io /tmp/cvmfs/eessi

You can even try using the cvmfs2 command directly to mount a repository:

mkdir -p /tmp/cvmfs/eessi
sudo /usr/bin/cvmfs2 -d -f \
    -o rw,system_mount,fsname=cvmfs2,allow_other,grab_mountpoint,uid=$(id -u cvmfs),gid=$(id -g cvmfs),libfuse=3 \
    software.eessi.io /tmp/cvmfs/eessi

which prints lots of information for debugging (option -d).

Insufficient resources

Keep in mind that the problems you observe may be the result of a shortage in resources, for example:

  • Lack of sufficient memory, for example for the kernel file system cache, which will typically lead to degraded (start-up) performance;
  • Lack of sufficient disk space, for the CernVM-FS client cache, for the proxy server, or for the private Stratum 1 replica server;
  • Network latency issues, either within the local network (to the proxy server or Stratum 1 replica server), or to the outside world (public Stratum 1 replica servers) – see also the Connectivity section;
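
For the disk space item in the list above, a quick check of how full the relevant file system is can be sketched as follows. The function name disk_usage_percent is hypothetical; it relies on the POSIX output format of df -P, and assumes the file system name contains no spaces.

```shell
# Hypothetical helper: print how full (in percent) the file system holding
# a given directory is, using POSIX 'df -P' output.
disk_usage_percent() {
    # The second line of 'df -P' describes the file system;
    # column 5 is the capacity in use, with a trailing '%' that is removed here.
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}
```

For example, disk_usage_percent /var/lib/cvmfs prints the usage of the file system that holds the default client cache location.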

Caching woes

CernVM-FS assumes that the local cache directory is trustworthy.

Although unlikely, problems you are observing could be caused by some form of corruption in the CernVM-FS client cache, for example due to problems outside of the control of CernVM-FS (like a disk partition running full).

Even in the absence of problems it may still be interesting to inspect the contents of the client cache, for example when trying to understand performance-related problems.

Checking cache usage

To check the current usage of the client cache across all repositories, you can use:

cvmfs_config stat -v

You can get machine-readable output by not using the -v option (which is for getting human-readable output).

To only get information on cache usage for a particular repository, pass it as an extra argument:

cvmfs_config stat -v software.eessi.io

To check overall cache size, use du on the cache directory (determined by CVMFS_CACHE_BASE):

$ sudo du -sh /var/lib/cvmfs
1.1G    /var/lib/cvmfs

Inspecting cache contents

To inspect which files are currently included in the client cache, run the following command:

sudo cvmfs_talk -i software.eessi.io cache list

Checking cache consistency

To check the consistency of the CernVM-FS cache, use cvmfs_fsck:

sudo time cvmfs_fsck -j 8 /var/lib/cvmfs/shared

This will take a while, depending on the current size of the cache and on the number of cores used (specified via the -j option).

Clearing client cache

To start afresh, you can clear the CernVM-FS client cache:

sudo cvmfs_config wipecache

Logs

By default CernVM-FS logs to syslog, which usually corresponds to either /var/log/messages or /var/log/syslog.

Scanning these logs for messages produced by cvmfs2 may help to determine the root cause of a problem.
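
A minimal sketch of such a scan is shown below. The function name recent_cvmfs_messages is hypothetical; it simply filters for lines that mention cvmfs2 in the first readable file among the candidates it is given.

```shell
# Hypothetical helper: print the most recent cvmfs2 messages from the first
# readable log file among the given candidates.
# $1: number of lines to show; remaining arguments: candidate log files.
recent_cvmfs_messages() (
    count="$1"
    shift
    for log in "$@"; do
        if [ -r "$log" ]; then
            grep 'cvmfs2' "$log" | tail -n "$count"
            exit 0
        fi
    done
    echo "no readable log file among: $*" >&2
    exit 1
)
```

It could be used as: recent_cvmfs_messages 20 /var/log/messages /var/log/syslog. On many systems reading these files requires elevated privileges, in which case the function needs to run in a shell that has them.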

Debug log

For obtaining more detailed information, CernVM-FS provides the CVMFS_DEBUGLOG configuration setting:

CVMFS_DEBUGLOG=/tmp/cvmfs-debug.log

CernVM-FS will log more information to the specified debug log file after reloading the CernVM-FS configuration (supported since CernVM-FS 2.11.0).

Debug logging is a bit like a firehose - use with care!

Note that with debug logging enabled every operation performed by CernVM-FS will be logged, which quickly generates large files and introduces a significant overhead, so it should only be enabled temporarily when trying to obtain more information on a particular problem.

Make sure that the debug log file is writable!

Make sure that the cvmfs user has write permission to the path specified in CVMFS_DEBUGLOG.

If not, you will not only get no debug logging information, but it will also lead to client failures!

For more information on debug logging, see the CernVM-FS documentation.

Logs via extended attributes

An interesting source of information for mounted CernVM-FS repositories is the extended attributes that CernVM-FS uses, which can be accessed via the attr command (see also the CernVM-FS documentation).

Of particular interest is the logbuffer attribute, which contains the last log messages for that particular repository, and which can be accessed without the special privileges that are required to access log messages emitted to /var/log/*.

For example:

$ attr -g logbuffer /cvmfs/software.eessi.io
Attribute "logbuffer" had a 283 byte value for /cvmfs/software.eessi.io:
[3 Dec 2023 21:01:33 UTC] switching proxy from (none) to http://PROXY_IP:3128 (set proxies)
[3 Dec 2023 21:01:33 UTC] switching proxy from (none) to http://PROXY_IP:3128 (cloned)
[3 Dec 2023 21:01:33 UTC] switching proxy from http://PROXY_IP:3128 to DIRECT (set proxies)

Other tools

General check

To verify whether the basic setup is sound, run:

sudo cvmfs_config chksetup

which should print "OK".

If something is wrong, it may report a problem like:

Warning: autofs service is not running

You can also use cvmfs_config to perform a status check, and verify that the command has exit code zero:

$ sudo cvmfs_config status
$ echo $?
0
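
Exit codes like this make it easy to chain several checks in a script. The sketch below uses a hypothetical run_check helper that runs any command quietly and reports whether it succeeded.

```shell
# Hypothetical helper: run a command quietly and report OK or FAILED.
# $1: label to print; remaining arguments: the command to run.
run_check() (
    label="$1"
    shift
    if "$@" > /dev/null 2>&1; then
        echo "OK      $label"
    else
        echo "FAILED  $label"
        exit 1
    fi
)
```

For example: run_check "status" sudo cvmfs_config status, followed by run_check "probe" cvmfs_config probe software.eessi.io. Since the output of the command is discarded, rerun a failing command by hand to see its error messages.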