Backup and Restore#

Overview#

Backup and Restore is a loaded topic that means different things to different people. The general principle that has driven IPA to date is to run several masters as a way to ensure that data is preserved in case of catastrophic failure. So backup has meant continuity of service by keeping several copies of the data on multiple servers.

Backup can also mean offline backups, for which our answer has been to fully back up a machine. We have been resistant to providing a list of files to back up because this has been a moving target as new services are added and others are enhanced (like switching from the MIT LDAP backend to our own).

This design will focus on the following scenarios:

  1. Critical IPA system backup and restoration

  2. LDAP data backup and restoration

To be done later:

  1. Partial data restoration (pick and choose entries to restore)

Full metal backup is left as an exercise for the system administrator. The only requirement is that the backup be done with the IPA services offline.

Use Cases#

Catastrophic hardware failure on a machine.#

  • Backup would be full-metal or VM snapshot

  • Restore would be full-metal or restore VM snapshot

Basically, the machine dies. You use a full system backup, done using your preferred method.

An optional method would be:

  • Reinstall the OS from scratch

  • Configure with same FQDN, hostname

  • Install IPA packages

  • Restore from an IPA full backup

This carries some limited risk that the packages do not match exactly, or that the administrator forgets to install some optional packages like bind, bind-dyndb-ldap, samba4-*, etc.

Once restored, replication will handle applying any missing changes. The replication protocol will detect if a replica is too out of date. In this case a re-init would be required.

Failed upgrade on an isolated machine.#

The OS is still fine but the IPA data is somehow corrupted in such a way that you want to restore to a known good state.

This one is a bit more complicated, as it is not predictable what an upgrade might touch. One could in theory simply do a restoration of the files that we back up and the IPA data but this wouldn’t cover, for example, any new services that were enabled as part of the upgrade.

In other words, we could only restore the files we know about. The data is another matter; see Returning to a known good state.

Restore Accidentally deleted data#

This will need to be covered separately. There are currently no tools or processes for restoring individual LDAP entries. This will be investigated with help from the 389-ds team. The issues are:

  • A tool to pick the entries to be restored

  • How to deal with uuid, modifiedby, creator, etc.

  • How to restore a deleted entry w/o the changelog intervening

  • How to automatically deal with membership and other relationships

  • What to do when restoring managed entries

Returning to a known good state#

Because we lack the ability to restore individual entries, the most that one can do is restore to a last known good state. This requires that ALL masters be restored at the same time, and all offline, until they are all restored and ready to go.

The issue is the 389-ds changelog. If one server is left with the corrupted data it will replay that to the other servers after they have been restored.

The restore program will disable all replication agreements on all masters before restoring the data.

The act of re-initializing a master will re-enable its agreement.

Design#

Basic Process#

As stated previously, there are four basic types of backup/restore, of which we will provide scripts/process for two.

The four possibilities are:

  • Full system backup/restore

  • Offline full IPA server backup/restore (e.g. monthly)

  • Online full LDAP data backup/restore (e.g. daily or weekly)

  • Individual LDAP data backup/restore

This design covers only the middle two.

Full IPA backup#

Files for full IPA backup#

IPA touches hundreds of files. We will need to back up a mix of specific files and all files within a set of directories (e.g. certmonger, files needed for IPA uninstall, etc).

It will be incumbent upon developers to maintain this list as new files are modified/added to IPA.

Logs will be optionally backed up and restored.

The full list is included at the end of this document.

Full System Backup Process (offline)#

This is a raw file-based backup which is why it is done offline. All IPA services need to be stopped in order to ensure a safe backup.

This will include the LDAP DB files so this is a standalone backup. A scripted process would include this:

# ipactl stop
# tar --xattrs --selinux -czf /path/to/backup FILES...
# ipactl start

Note that this is a simplified view and doesn’t include the metadata we will package as well.
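The stop/archive/start sequence above can be sketched in Python as follows. This is a minimal illustration, not the actual ipa-backup implementation: the function name offline_backup and the injectable run parameter are assumptions made here so that the ordering can be exercised without root or a real IPA install. The point it shows is that ipactl start should run even when tar fails, so a failed backup does not leave the services down.

```python
import subprocess


def offline_backup(archive_path, paths, run=subprocess.check_call):
    """Stop IPA, archive `paths` into `archive_path`, then restart IPA.

    `run` is a hypothetical hook: it receives each command as a list and
    defaults to subprocess.check_call, which raises if a command fails.
    """
    run(["ipactl", "stop"])
    try:
        # Same tar flags as the simplified command in the text; `paths`
        # is the list of files and directories to archive.
        run(["tar", "--xattrs", "--selinux", "-czf", archive_path] + list(paths))
    finally:
        # Bring the services back even if tar raised an error.
        run(["ipactl", "start"])
```

Passing a recording function as run (for example a list's append method) shows the three commands in order without executing anything.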

Full System Restore Process#

# ipactl stop
# cd / && tar --xattrs --selinux -xzf /path/to/backup
# ipactl start
# service sssd restart

Note that this is a simplified view and doesn’t include the metadata we will package as well.

We will not verify in advance of restoration that the system services match the data (e.g. IPA is configured to start bind but the bind packages aren’t installed). This can be a future enhancement.

LDAP#

The db2bak method should be used to perform the LDAP backup. This will also back up (and can restore) the changelog. The scripts provided by 389-ds are not adequate for our purposes because they require an operator either to provide the DM password or to store it in a file.

The perl equivalents will do online backup/restore, which is what we want, but we want it to be seamless. The solution is to provide our own scripts which use ldapi and autobind, allowing password-less backup/restore.

The scripts will need to be robust enough to automatically handle the case of multiple instances (PKI-IPA and the IPA-REALM) as well as the case of a single instance. For the case of a single instance we will need to provide the list of backends to back up.

The 389-ds team recommends an LDIF backup as well because it is easier to move to another machine and is human readable. Therefore we will perform a db2ldif -r backup at the same time, and store the LDIF with the backup files. The restore command will provide an option to extract the LDIF.

Instances#

Depending on the upgrade path of IPA, there will be one or two 389-ds instances: one for IPA and one for the CA. Both will be backed up in all cases. An option for ipa-restore will allow one to conditionally restore instance data if needed. The possible instances are slapd-REALM and slapd-PKI-IPA.
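One way a script could discover which instances exist is to look for slapd-* directories under /etc/dirsrv, which is where the file list in this document places the per-instance configuration. The sketch below assumes that layout; the function name find_ds_instances is hypothetical and this is not necessarily how the IPA tools detect instances.

```python
import os


def find_ds_instances(config_dir="/etc/dirsrv"):
    """Return the sorted 389-ds instance names found under config_dir.

    An instance is assumed to be any directory named 'slapd-<name>';
    the returned value is the '<name>' part.
    """
    prefix = "slapd-"
    names = []
    for entry in os.listdir(config_dir):
        full = os.path.join(config_dir, entry)
        if entry.startswith(prefix) and os.path.isdir(full):
            names.append(entry[len(prefix):])
    return sorted(names)
```

On a heavily upgraded master this would be expected to return two names, and on a fresh install only one.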

Backends#

Depending on the upgrade path of IPA, the CA data is stored either in its own 389-ds instance or as a second backend alongside the IPA data. Both backends will be backed up in all cases. An option for ipa-restore will allow one to conditionally restore backend data if needed. The possible backends are userRoot (basically $SUFFIX) and ipaca (o=ipaca).

Data Backup & Restore Process (online)#

The connection will be made over the ldapi socket. By using autobind as root we will not be required to provide the DM password.

The default should be to back up and restore both instances (if installed) or both the IPA (userRoot) and dogtag (ipaca) backends.
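An ldapi connection is addressed by a URL whose host part is the percent-encoded path of the server's UNIX socket. The sketch below builds such a URL. It rests on two assumptions that are not stated in this design: that the instance name is the realm with dots replaced by dashes, and that the socket lives at /var/run/slapd-INSTANCE.socket. The function name ldapi_url is hypothetical.

```python
from urllib.parse import quote


def ldapi_url(realm, run_dir="/var/run"):
    """Build the ldapi:// URL for the 389-ds instance serving `realm`."""
    # Assumption: instance name is the realm with '.' replaced by '-',
    # and the socket is <run_dir>/slapd-<instance>.socket.
    instance = realm.replace(".", "-")
    socket_path = "%s/slapd-%s.socket" % (run_dir, instance)
    # The whole path is percent-encoded, including the slashes.
    return "ldapi://" + quote(socket_path, safe="")
```

An LDAP client running as root would then connect to this URL and perform a SASL EXTERNAL bind, which is what lets autobind map the local root user to the Directory Manager without a password.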

File Naming Convention#

Files will be stored in /var/lib/ipa/backup

Full Backups#

ipa-full-%Y-%m-%d-%H-%M-%S.bak

Data Backups#

ipa-data-%Y-%m-%d-%H-%M-%S.bak
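The naming patterns above are strftime templates, so generating a file name is straightforward. The following is a minimal sketch; the function name backup_filename and the use of local time are assumptions, since the design does not say which time zone the file name uses.

```python
from datetime import datetime

BACKUP_DIR = "/var/lib/ipa/backup"


def backup_filename(kind, when=None):
    """Return the backup path for kind 'full' or 'data'."""
    if kind not in ("full", "data"):
        raise ValueError("kind must be 'full' or 'data'")
    # Assumption: local time is used when no timestamp is supplied.
    when = when or datetime.now()
    stamp = when.strftime("%Y-%m-%d-%H-%M-%S")
    return "%s/ipa-%s-%s.bak" % (BACKUP_DIR, kind, stamp)
```

Because every field is zero-padded and ordered from year to second, sorting the names alphabetically also sorts the backups chronologically.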

Version wrapper#

We will want to maintain some metadata with each backup file. I propose we include:

  • Date/time of backup, store as Generalized Time in GMT

  • FQDN

  • IPA version number (ipapython.version.NUM_VERSION)

  • A backup format version number (start with 1)

  • List of services configured for the master. This may help us now or in the future verify that the system can be properly restored.

The format of this file will be a set of name/value pairs separated by =. Extra white space will be ignored. The comment character is #.
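A header in this format is simple to write and parse. The sketch below is illustrative only: the function names and the field names used in the usage note are hypothetical, and it assumes that comments occupy whole lines (the design does not say whether trailing comments are allowed). It also shows one way to render the backup time as Generalized Time in GMT.

```python
from datetime import datetime, timezone


def generalized_time(when):
    """Format a timezone-aware datetime as Generalized Time in GMT."""
    return when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


def format_header(fields):
    """Serialize a dict as one 'name = value' line per field."""
    return "".join("%s = %s\n" % (k, v) for k, v in fields.items())


def parse_header(text):
    """Parse name=value lines into a dict.

    Blank lines and lines starting with '#' are skipped, and white
    space around names and values is ignored.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a name=value line: %r" % line)
        fields[name.strip()] = value.strip()
    return fields
```

For example, parse_header(format_header({"host": "ipa.example.com", "format": "1"})) gives back the same dictionary, with every value as a string.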

We will prevent full restores on a different host.

Data restore should be allowed on any host.
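The two rules above amount to a small check against the FQDN recorded in the backup header. A minimal sketch follows; the function name check_restore_allowed is hypothetical, and in practice the local name might come from something like socket.getfqdn().

```python
def check_restore_allowed(backup_host, local_host, data_only):
    """Raise ValueError if a full backup is restored on a different host."""
    if data_only:
        # Data restores are allowed on any host.
        return
    # Host names are case-insensitive; a trailing dot is ignored too.
    if backup_host.rstrip(".").lower() != local_host.rstrip(".").lower():
        raise ValueError(
            "full backup from %s cannot be restored on %s"
            % (backup_host, local_host))
```

The function returns silently when the restore may proceed, so a caller only needs to handle the exception.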

Restore Validation#

Backing up is easy, restoring is hard, especially verifying that you actually backed everything up (and restored it properly).

Full restoration#

  1. Run full backup

  2. ipa-server-install --uninstall -U

  3. ipa-server-install

  4. Restore backup

Data restoration#

In the case of a single IPA master you can:

  1. Back up data

  2. Delete some data

  3. Restore from backup

Confirm that the data was restored. This will not automatically sync the restored data to other masters. Any pending changes will not be applied to the restored master but similarly any changes restored will not be sent out to the other masters. After the restoration the other masters will need to be reinitialized from the restored master:

# ipa-replica-manage re-initialize --from=

Replication#

Because we are going to back up and restore the changelog we should be ok when it comes to replication.

Agreements#

A very big issue will be what agreements exist at the time that a backup is made and restored.

For example, let’s say you have a single IPA server. You add a bunch of records and then take a backup.

You add a replica, maybe even delete some records (oops).

So you do a restore.

Your data is back but your agreement is now gone because you restored a backup from prior to the agreement! The remote server will need to be uninstalled and re-installed (no re-initialize is possible because the restored server doesn’t know about the replica at all).

This could potentially strand a number of servers.

External Impact#

The sssd service will need a restart. If the assumption is that the server is not in a known good state then it would be good practice to restart this service after restoring its files.

In fact, we may want to consider recommending a reboot to be sure things are in a good state, or we may need to think about extending ipactl to include other daemons.

More on partial restores#

Quite a bit of infrastructure is required to be able to pick and choose what to restore from a backup. In order to provide per-entry restoration we would need the backup in a more readable form, say LDIF, then provide a means to search for, pick and then execute restoration.

The restoration may take the form of:

  • an entry

  • a subtree

  • attributes within an entry, e.g. membership

Restoration of an entry may trigger other things to happen. Take the case where a group is accidentally removed. Not only does the group need to be restored but its membership needs to be recovered as well. Members of the group will be managed automatically but since we handle nested groups and groups can be members of other objects (HBAC, sudo, etc) we need to restore that as well.

Qualifying#

Here is a list of some things to test:

  • Run IPA unit tests

  • Create a new replica

  • Manage existing replicas

  • Enroll a client

  • Unenroll a client

  • Verify that replication is still working, and working with dogtag as well

Open Questions#

Size of backup?#

Should we attempt to predict the resulting file size and try to determine if there is adequate space before starting the backup? We may be able to stat each file, sum the size, and check. It would just take a bit of time and I/O.
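The stat-and-sum idea could look roughly like the sketch below. The function names are hypothetical. Because the archive is compressed, the sum of the file sizes is an upper bound on the backup size rather than a prediction, so the check is conservative.

```python
import os
import shutil


def total_size(paths):
    """Sum the sizes in bytes of regular files in `paths`.

    Directories are walked recursively; symlinks and missing paths
    are skipped.
    """
    total = 0
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                for name in files:
                    full = os.path.join(root, name)
                    if os.path.isfile(full) and not os.path.islink(full):
                        total += os.path.getsize(full)
        elif os.path.isfile(path) and not os.path.islink(path):
            total += os.path.getsize(path)
    return total


def has_room(paths, target_dir):
    """True if target_dir has at least total_size(paths) bytes free."""
    return shutil.disk_usage(target_dir).free >= total_size(paths)
```

The cost is one stat call per file, which matches the "bit of time and I/O" noted above.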

Encrypt backup files?#

Should we prompt for and/or encrypt with gpg the backup files? Yes

Should I delete everything before doing a restore?#

For example, if you have a single master, you do a backup, then you add a replica. If you then restore the backup and try to create another replica it will fail because the changelog directory already exists. Who knows what other problems might be lurking.

I’m inclined to suggest/force uninstalling the server first. We just may not be in any position to do that depending on how hosed things are.

The other alternative is to create a list of these corner cases and test for them on reinstall.

Implementation#

Full Restore#

If you do a full backup without the logs and do a restoration into a FS that doesn’t have an installed IPA server then tomcat will not start. This is because the log files needed by the CA are created on-the-fly by the instance creation process. If the directory structure is created manually then things will work.

Uninstall#

The backup files are NOT removed on uninstall. When it comes to data, I prefer not to delete things automatically.

Development notes (semi-interesting testing)#

As part of developing the backups I tried a couple of fairly outlandish things. Here are those things and the outcomes. I’m not sure if these will ever be interesting or helpful, but I don’t want to lose anything.

Backup, uninstall, reinstall, restore JUST the LDAP server#

So I wanted to verify that the restoration actually worked, so what I did was:

#. ipa-server-install ...
#. kinit admin
#. ipa user-add tuser1
#. ipa user-add tuser2
#. db2bak
#. ipa-server-install --uninstall -U
#. ipa-server-install (same options as above)
#. bak2db
#. ipa-getkeytab -k /etc/dirsrv/ds.keytab -p ldap/`hostname` -s
   `hostname` -D 'cn=directory manager' -w password
#. service dirsrv restart
#. kdestroy
#. kinit admin
#. ipa-getkeytab -k /etc/httpd/conf/ipa.keytab -p HTTP/`hostname` -s
   `hostname`
#. ipa-getkeytab -k /etc/krb5.keytab -p host/`hostname` -s `hostname`
#. ipactl restart
#. service sssd restart
#. ipa user-show admin
#. ipa user-find (confirm that I have the 2 new users)
#. id tuser1 (to confirm that sssd is working)

So what does this do? Well, it replaces the CA for one. And it invalidates all certificates.

It is also, if all you have is the data backup, a way to restore an IPA system.

A lot more work would be needed to actually make things work. All clients and services would need new certs.

And we overwrote the 389-ds and Apache server certs when we reimported the data, so those would need to be re-issued.

For the most part, any certificates in the data should be deleted because they are for a CA that no longer exists, so revocation will fail.

There may be quite a bit of certmonger rework needed, or it could be that certmonger could fix all the certs for us using: ipa-getcert resubmit.

References#

Feature Management#

UI

The backup/restore commands will need to be executed as root so it is unlikely that system backup/recovery can be managed from the UI. It could also represent a chicken-and-egg problem on restoration.

CLI

There will be two basic, standalone commands:

ipa-backup OPTIONS
   --data                      Back up just the data. Default is full system backup.
   --gpg                       Encrypt the backup
   --gpg-keyring=GPG_KEYRING   The gpg key name to be used (or full path)
   --logs                      Include logs in backup
   --online                    Perform the LDAP backups online, for data only.

We will only encrypt the payload. The header will be in the clear.

ipa-restore OPTIONS /path/to/backup
   --data                      If the backup is a full backup, restore only the data
   --extract                   Extract the backup files, do not restore (including the LDIF)
   --gpg-keyring=GPG_KEYRING   The key name to be used by gpg
   --online                    Perform the LDAP restores online, for data only.
   --instance=INSTANCE         The 389-ds instance to restore (defaults to all found)
   --backend=BACKEND           The backend to restore within the instance or instances
   --no-logs                   Do not restore log files from the backup
   -U, --unattended            Unattended restoration never prompts the user

ipa-restore will detect if the backup file provided contains only the data, but if provided a full backup it should be able to restore just the data component.
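To make the option set concrete, here is how the ipa-restore options listed above could be declared with Python's standard argparse module. This is only a sketch of the interface: the function name build_restore_parser is hypothetical, and the real command may be built on a different option-parsing framework.

```python
import argparse


def build_restore_parser():
    """Return an argparse parser mirroring the ipa-restore options."""
    p = argparse.ArgumentParser(prog="ipa-restore")
    p.add_argument("backup", help="path to the backup")
    p.add_argument("--data", action="store_true",
                   help="restore only the data from a full backup")
    p.add_argument("--extract", action="store_true",
                   help="extract the backup files, do not restore")
    p.add_argument("--gpg-keyring", metavar="GPG_KEYRING",
                   help="the key name to be used by gpg")
    p.add_argument("--online", action="store_true",
                   help="perform the LDAP restores online, for data only")
    p.add_argument("--instance", help="the 389-ds instance to restore")
    p.add_argument("--backend", help="the backend to restore")
    p.add_argument("--no-logs", action="store_true",
                   help="do not restore log files from the backup")
    p.add_argument("-U", "--unattended", action="store_true",
                   help="never prompt the user")
    return p
```

Parsing ["--data", "--instance=PKI-IPA", "/path/to/backup"] yields data set to True, instance set to "PKI-IPA", and the remaining flags at their defaults.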

There are also common options:

--version             show program's version number and exit
-h, --help            show this help message and exit
-p PASSWORD, --password=PASSWORD
                    Directory Manager password
-v, --verbose       print debugging information
-d, --debug         alias for --verbose (deprecated)
-q, --quiet         output only errors
--log-file=FILE     log to the given file

Full list of files and directories to back up#

Directories#

  • /usr/share/ipa/html

  • /etc/pki-ca

  • /etc/httpd/alias

  • /var/lib/pki-ca

  • /var/lib/ipa-client/sysrestore

  • /var/lib/sss/pubconf/krb5.include.d

  • /var/lib/authconfig/last

  • /var/lib/certmonger

  • /var/lib/ipa

  • /var/run/dirsrv

  • /var/lock/dirsrv

Files#

  • /etc/named.conf

  • /etc/sysconfig/pki-ca

  • /etc/sysconfig/dirsrv

  • /etc/sysconfig/ntpd

  • /etc/sysconfig/krb5kdc

  • /etc/sysconfig/pki/ca/pki-ca

  • /etc/sysconfig/authconfig

  • /etc/resolv.conf

  • /etc/pki/nssdb/cert8.db

  • /etc/pki/nssdb/key3.db

  • /etc/pki/nssdb/secmod.db

  • /etc/nsswitch.conf

  • /etc/krb5.keytab

  • /etc/sssd/sssd.conf

  • /etc/openldap/ldap.conf

  • /etc/security/limits.conf

  • /etc/httpd/conf/password.conf

  • /etc/httpd/conf/ipa.keytab

  • /etc/httpd/conf.d/ipa-pki-proxy.conf

  • /etc/httpd/conf.d/ipa-rewrite.conf

  • /etc/httpd/conf.d/nss.conf

  • /etc/httpd/conf.d/ipa.conf

  • /etc/ssh/sshd_config

  • /etc/ssh/ssh_config

  • /etc/krb5.conf

  • /etc/group

  • /etc/passwd

  • /etc/ipa/ca.crt

  • /etc/ipa/default.conf

  • /etc/named.keytab

  • /etc/ntp.conf

  • /etc/dirsrv/ds.keytab

  • /etc/sysconfig/dirsrv-REALM

  • /etc/sysconfig/dirsrv-PKI-IPA

  • /root/ca-agent.p12

  • /root/cacert.p12

  • /var/kerberos/krb5kdc/kdc.conf

  • /etc/dirsrv/slapd-REALM

  • /var/lib/dirsrv/scripts-REALM

  • /var/lib/dirsrv/slapd-REALM

  • /usr/lib64/dirsrv/slapd-PKI-IPA

  • /etc/dirsrv/slapd-PKI-IPA

  • /var/lib/dirsrv/slapd-PKI-IPA

Logs#

This is a mix of files and directories

  • /var/log/pki-ca

  • /var/log/dirsrv/slapd-REALM

  • /var/log/dirsrv/slapd-PKI-IPA

  • /var/log/httpd

  • /var/log/ipaserver-install.log

  • /var/log/kadmind.log

  • /var/log/pki-ca-install.log

  • /var/log/messages

  • /var/log/ipaclient-install.log

  • /var/log/secure

  • /var/log/ipaserver-uninstall.log

  • /var/log/pki-ca-uninstall.log

  • /var/log/ipaclient-uninstall.log

  • /var/named/data/named.run

GPG encryption#

The backup can be optionally encrypted using GPG. To create a key you can run:

# cat >keygen <<EOF
     %echo Generating a standard key
     Key-Type: RSA
     Key-Length: 2048
     Name-Real: IPA Backup
     Name-Comment: IPA Backup
     Name-Email: root@example.com
     Expire-Date: 0
     %pubring /root/backup.pub
     %secring /root/backup.sec
     %commit
     %echo done
EOF
# gpg --batch --gen-key keygen
# gpg --no-default-keyring --secret-keyring /root/backup.sec \
      --keyring /root/backup.pub --list-secret-keys

This will create the keyring files /root/backup.pub and /root/backup.sec. The keyring can be passed to ipa-backup using:

# ipa-backup --gpg --gpg-keyring=/root/backup ...

Troubleshooting#

gpg2 now requires an external program to enter pins to make it “easier” for desktop folks.

To run purely from a console add "pinentry-program /usr/bin/pinentry-curses" to .gnupg/gpg-agent.conf before generating a key.

How to Test#

General test outline#

  • Install server

  • Do an LDAP search for uid=admin,cn=users,cn=accounts,$SUFFIX. Note the result.

  • Verify that the commands ipa user-show admin, id admin, ipa cert-find, host $HOSTNAME localhost, kinit admin work. This checks basic functionality of IPA client, PAM, CA, DNS and Kerberos. Note the output of these commands.

  • (Do backup & restore)

  • Do an LDAP search on admin again; check that all attributes except krbLastSuccessfulAuth match

  • Run the above commands again, check that they are successful and the output matches.

  • Uninstall server
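The before/after comparison of the admin entry can be automated along the lines of the sketch below. It assumes each entry has already been fetched into a dictionary mapping attribute names to lists of values; the function name differing_attributes is hypothetical. Attribute names are compared case-insensitively and value order is ignored, since LDAP guarantees neither.

```python
def differing_attributes(before, after, ignore=("krbLastSuccessfulAuth",)):
    """Return the sorted, lowercased names of attributes that differ.

    `before` and `after` map attribute names to lists of values.
    Attributes named in `ignore` are left out of the comparison.
    """
    ignored = {name.lower() for name in ignore}
    old = {k.lower(): sorted(v) for k, v in before.items()}
    new = {k.lower(): sorted(v) for k, v in after.items()}
    names = (set(old) | set(new)) - ignored
    return sorted(n for n in names if old.get(n) != new.get(n))
```

An empty result means the restored entry matches the original apart from the ignored attributes.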

Test Full Backup and Restore#

The “Do backup & restore” steps are:

  • ipa-backup -v

  • Uninstall server

  • ipa-restore $BACKUP_PATH

Test Backup and Restore with Removed Users#

The “Do backup & restore” steps are:

  • ipa-backup -v

  • Uninstall server

  • Remove users dirsrv and pkiuser

  • Add system user ipatest_user1 (to claim the UID of a removed user)

  • ipa-restore $BACKUP_PATH

At the end of the test, remove user ipatest_user1

Test Backup and Restore with SELinux Booleans Off#

The “Do backup & restore” steps are:

  • ipa-backup -v

  • Uninstall server

  • Turn SELinux booleans httpd_can_network_connect and httpd_manage_ipa off

  • ipa-restore $BACKUP_PATH

After restoring, check that the above booleans are on.
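The final check can be scripted by parsing the output of getsebool, which prints one "name --> on" or "name --> off" line per boolean. The sketch below works on captured output text so it can be tried without SELinux; the function names are hypothetical.

```python
REQUIRED = ("httpd_can_network_connect", "httpd_manage_ipa")


def parse_getsebool(output):
    """Parse 'name --> on|off' lines into a dict of name -> bool."""
    states = {}
    for line in output.splitlines():
        name, sep, state = line.partition("-->")
        if sep:
            states[name.strip()] = state.strip() == "on"
    return states


def booleans_not_on(output, required=REQUIRED):
    """Return the required booleans that are off or missing."""
    states = parse_getsebool(output)
    return [name for name in required if not states.get(name, False)]
```

The test passes when booleans_not_on returns an empty list for the output captured after the restore.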

Test Backup and Restore from heavily upgraded instance#

Start with a master that has been upgraded in place since the time when there was a separate 389-ds instance for the CA.

  • ipa-backup -v

  • Uninstall server

  • ipa-restore ...

Backup and Restore is NOT a method of eliminating that extra instance.

Data backup only#

  • ipa-backup --data

  • Add a new user

  • ipa-restore /var/lib/ipa/backup/ipa-data-...

  • Ensure that the new user is gone

Online data restore#

  • ipa-backup --data

  • Add a new user

  • ipa-restore --online /var/lib/ipa/backup/ipa-data-...

  • Ensure that the new user is gone

  • Ensure IPA is still functioning properly

Encryption/decryption of Backup files#

  • ipa-backup --gpg --gpg-keyring=/path/to/keyring

  • ipa-server-install --uninstall -U

  • ipa-restore --gpg-keyring=/path/to/keyring /var/lib/ipa/backup/ipa-...

Client/Replica installation with restored Master#

  • ipa-backup

  • ipa-restore

  • Create new replica

  • Enroll client