Unconventional use of nfs-ganesha on Ceph

by Lee, Chien-Pang | April 16, 2022

Despite the trend toward highly scalable object storage in the cloud computing world, filesystem-based storage remains irreplaceable in many respects. Its hierarchical structure, human-friendly file names and, above all, mature applications built on prevalent protocols such as NFS and Samba/CIFS are firmly rooted in practical IT building blocks and daily usage. The Ceph File System (CephFS) comes with a built-in mechanism to export a filesystem over the network and mount it as remote storage, functioning much like NFS. As built-in as it sounds, though, the underlying protocol is “ceph”:

    sh-4.4# mount -t ceph 10.1.0.1:/ /mnt/ceph
    sh-4.4# mount | grep ceph
    10.1.0.1:/ on /mnt/ceph type ceph (rw,relatime,seclabel,acl)
    sh-4.4# df -h /mnt/ceph
    Filesystem      Size  Used Avail Use% Mounted on
    10.1.0.1:/      140G  4.7G  135G   4% /mnt/ceph

That’s no sweat, and many people are happy with it. Nonetheless, here we answer to those who have to run NFS on top of Ceph for some reason, be it a legacy setup, a current IT restriction, old client software, or something more complex, as in our case, where CephFS is bound to a dedicated storage network that is not directly reachable by the VMs spawned on compute nodes in a different network. Whatever the reason, nfs-ganesha is here for anyone who needs their Ceph to speak NFS.

Because of the complexity of setting up a Ceph storage cluster, most documents one can find online propose using Kubernetes or Docker to ease the deployment effort; as a result, nfs-ganesha is mostly recommended to be installed in containers, so that cluster HA can be achieved with the help of a K8s cluster.

We do it differently here. Let’s run Ganesha as a native systemd service and manage HA with haproxy in Active/Active mode. First things first, install the needed yum repository:

    sh-4.4# dnf -y install centos-release-nfs-ganesha4


Then come the two major packages, which give us the daemon, the rados-ng/rados-kv recovery backends and the config files:

    sh-4.4# dnf -y install nfs-ganesha-ceph.x86_64 nfs-ganesha-rados-grace


The dnf package manager resolves the dependencies for me, and I end up with the following installed; Samba libraries show up as well.

    (1/16): nfs-ganesha-selinux-4.0-1.el8s.noarch.rpm                        299 kB/s |  38 kB     00:00    
    (2/16): libtalloc-2.3.3-1.el8.x86_64.rpm                                 343 kB/s |  49 kB     00:00    
    (3/16): nfs-ganesha-rados-grace-4.0-1.el8s.x86_64.rpm                    368 kB/s |  58 kB     00:00    
    (4/16): lmdb-libs-0.9.24-1.el8.x86_64.rpm                                352 kB/s |  58 kB     00:00    
    (5/16): libtevent-0.11.0-0.el8.x86_64.rpm                                1.3 MB/s |  50 kB     00:00    
    (6/16): libtdb-1.4.4-1.el8.x86_64.rpm                                    238 kB/s |  59 kB     00:00    
    (7/16): logrotate-3.14.0-4.el8.x86_64.rpm                                340 kB/s |  86 kB     00:00    
    (8/16): libldb-2.4.1-1.el8.x86_64.rpm                                    563 kB/s | 188 kB     00:00    
    (9/16): samba-common-4.15.5-5.el8.noarch.rpm                             658 kB/s | 224 kB     00:00    
    (10/16): samba-common-libs-4.15.5-5.el8.x86_64.rpm                       624 kB/s | 177 kB     00:00    
    (11/16): libntirpc-4.0-1.el8s.x86_64.rpm                                 420 kB/s | 137 kB     00:00    
    (12/16): nfs-ganesha-4.0-1.el8s.x86_64.rpm                               1.1 MB/s | 734 kB     00:00    
    (13/16): libcephfs2-12.2.7-9.el8.x86_64.rpm                              733 kB/s | 486 kB     00:00    
    (14/16): nfs-ganesha-ceph-4.0-1.el8s.x86_64.rpm                           90 kB/s |  59 kB     00:00    
    (15/16): libwbclient-4.15.5-5.el8.x86_64.rpm                             137 kB/s | 124 kB     00:00    
    (16/16): samba-client-libs-4.15.5-5.el8.x86_64.rpm                       3.8 MB/s | 5.5 MB     00:01    
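
Before touching any configuration, it can be handy to see which files those packages actually dropped on disk: the daemon binary, the Ceph FSAL plug-in and the sample configs. A quick check (package names taken from the install step above):

    sh-4.4# rpm -ql nfs-ganesha nfs-ganesha-ceph nfs-ganesha-rados-grace | grep -E '^/etc|^/usr/bin'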


In /etc/ganesha/ceph.conf we modify a couple of values:

    NFS_CORE_PARAM
    {
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 4;
        Bind_addr = 10.1.0.1;
    }

    NFSv4
    {
        RecoveryBackend = rados_ng;
        Minor_Versions = 1,2;
        Lease_Lifetime = 10;
        Grace_Period = 20;
    }

    MDCACHE {
        Dir_Chunk = 0;
    }

    EXPORT
    {
        Export_ID=100;
        Protocols = 4;
        Transports = TCP;
        Path = /;
        Pseudo = /;
        Access_Type = RW;
        Attr_Expiration_Time = 0;
        Squash = root;
        FSAL {
                Name = CEPH;
        }
    }

    CEPH
    {
        Ceph_Conf = /etc/ceph/ceph.conf;
    }

    RADOS_KV
    {
        pool = "cephfs_data";
    }


Bind_addr is fixed to 10.1.0.1. If it is not explicitly set to a fixed IP, the default is 0.0.0.0, i.e. all network interfaces, which would conflict with haproxy binding port 2049 to the VIP. The example here is built on the following cluster layout; note that only Bind_addr changes from node to node, as sketched after the list.

  • VIP: 10.1.0.100
  • Control node 1 (c1): 10.1.0.1
  • Control node 2 (c2): 10.1.0.2
  • Control node 3 (c3): 10.1.0.3
  • Compute node x 3 (not relevant here)
  • Storage node x 5 (not relevant here)
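
The /etc/ganesha/ceph.conf shown above belongs to c1; the only per-node difference is Bind_addr. A sketch of the NFS_CORE_PARAM block on c2 (and analogously Bind_addr = 10.1.0.3 on c3):

    NFS_CORE_PARAM
    {
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 4;
        Bind_addr = 10.1.0.2;
    }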

Start nfs-ganesha on all three control nodes.

    # systemctl start nfs-ganesha
    # systemctl status nfs-ganesha
    ● nfs-ganesha.service - NFS-Ganesha file server
       Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
       Active: active (running) since Fri 2022-04-15 18:40:08 CST; 7h ago
         Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
     Main PID: 16201 (ganesha.nfsd)
        Tasks: 45 (limit: 179687)
       Memory: 54.5M
       CGroup: /system.slice/nfs-ganesha.service
               └─16201 /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
    Apr 15 18:40:08 c1 systemd[1]: Starting NFS-Ganesha file server...
    Apr 15 18:40:08 c1 systemd[1]: Started NFS-Ganesha file server.


Now we have three A/A nfs-ganesha services, one on each control node. Next comes the HA configuration with haproxy. In /etc/haproxy/haproxy.cfg, add the section below.

    listen ceph_nfs_ganesha
      bind 10.1.0.100:2049
      balance  source
      option  tcpka
      option  tcplog
      server c1 10.1.0.1:2048 check inter 2000 rise 2 fall 5
      server c2 10.1.0.2:2048 check inter 2000 rise 2 fall 5
      server c3 10.1.0.3:2048 check inter 2000 rise 2 fall 5


where we tell haproxy to forward incoming connections on our VIP 10.1.0.100 and port 2049 (the standard NFS port) to any of the active NFS servers running on c1, c2 and c3 on port 2048. The last question is who decides which of the 3 control nodes the VIP (Virtual IP) runs on. The answer is pacemaker/corosync. Below is a snippet creating the VIP resource with pacemaker.

    # pcs resource create vip ocf:heartbeat:IPaddr2 ip="10.1.0.100" op monitor interval="30s"
    # pcs status
    Cluster name: cube-8kWwZRbkBPcR6xk3
    Cluster Summary:
      * Stack: corosync
      * Current DC: c1 (version 2.1.2-4.el8-ada5c3b36e2) - partition with quorum
      * Last updated: Sat Apr 16 02:36:41 2022
      * Last change:  Fri Apr 15 20:08:14 2022 by root via crm_resource on c3
      * 6 nodes configured
      * 9 resource instances configured

    Node List:
      * Online: [ c1 c2 c3 ]
      * RemoteOnline: [ p1 p2 p3 ]

    Full List of Resources:
      * vip (ocf::heartbeat:IPaddr2):        Started c1
      * haproxy     (systemd:haproxy-ha):    Started c1
      * cinder-volume       (systemd:openstack-cinder-volume):       Started c2
      * Clone Set: ovndb_servers-clone [ovndb_servers] (promotable):
        * Masters: [ c1 ]
        * Slaves: [ c2 c3 ]
      * p1  (ocf::pacemaker:remote):         Started c1
      * p2  (ocf::pacemaker:remote):         Started c1
      * p3  (ocf::pacemaker:remote):         Started c1
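
Since haproxy only does its job on the node that currently holds the VIP, it is natural to let pacemaker keep the two resources together. A hedged sketch with pcs, assuming the resource names vip and haproxy from the status output above:

    # pcs constraint colocation add haproxy with vip INFINITY
    # pcs constraint order vip then haproxy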


To verify, we can mount the export with the standard command, and HA can be tested by stopping a couple of the nfs-ganesha services on the control nodes (a sketch follows the mount output below).

    sh-4.4# mount -t nfs 10.1.0.100:/ /mnt/nfs
    sh-4.4# mount | grep nfs
    10.1.0.100:/ on /mnt/nfs type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.0.254,local_lock=none,addr=10.1.0.100)
    sh-4.4# df -h /mnt/nfs
    Filesystem      Size  Used Avail Use% Mounted on
    10.1.0.100:/    139G  4.7G  134G   4% /mnt/nfs
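
For the HA test mentioned above, a minimal sketch: keep some I/O running against the NFS mount, stop nfs-ganesha on one control node, and watch the client carry on through haproxy and the remaining back ends once the NFSv4.1 grace period has passed. The test file name below is purely illustrative:

    sh-4.4# systemctl stop nfs-ganesha                                              # run on one control node, e.g. c1
    sh-4.4# dd if=/dev/zero of=/mnt/nfs/ha-test.bin bs=1M count=100 oflag=direct    # run on the NFS client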