Base resilient stack ==================== This stack is meant to be extended by SR profiles, or other stacks, that need to provide automated backup/restore, election of backup candidates, and instance failover. As reference implementations, both stack/lamp and stack/lapp define resilient behavior for MySQL and Postgres respectively. This involves three different software_types: * pull-backup * {mysoftware}_export * {mysoftware}_import where 'mysoftware' is the component that needs resiliency (can be postgres, mysql, erp5, and so on). pull-backup ----------- This software type is defined in http://git.erp5.org/gitweb/slapos.git/blob/HEAD:/stack/resilient/instance-pull-backup.cfg.in?js=1 and there should be no reason to modify or extend it. An instance of type 'pull-backup' will receive data from an 'export' instance and immediately populate an 'import' instance. The backup data is automatically used to build an historical, incremental archive in srv/backup/pbs. export ------ example: http://git.erp5.org/gitweb/slapos.git/blob/HEAD:/stack/lapp/postgres/instance-postgres-export.cfg.in?js=1 This is the *active* instance - the one providing live data to the application. A backup is run via the bin/exporter script: it will 1) run bin/{mysoftware}-backup and 2) notify the pull-backup instance that data is ready. The pull-backup, upon receiving the notification, will make a copy of the data and transmit it to the 'import' instances. You should provide the bin/{mysoftware}-exporter script, see for instance http://git.erp5.org/gitweb/slapos.git/blob/HEAD:/slapos/recipe/postgres/__init__.py?js=1#l207 http://git.erp5.org/gitweb/slapos.git/blob/HEAD:/slapos/recipe/mydumper.py?js=1#l71 By default, as defined in http://git.erp5.org/gitweb/slapos.git/blob/HEAD:/stack/resilient/pbsready-export.cfg.in?js=1#l27 the bin/exporter script is run every 60 minutes. import ------ example: http://git.erp5.org/gitweb/slapos.git/blob/HEAD:/stack/lapp/postgres/instance-postgres-import.cfg.in?js=1 This is the *fallback* instance - the one that can be activated and thus become active. Any number of import instances can be used. Deciding which one should take over can be done manually or through a monitoring + election script. You should provide the bin/{mysoftware}-importer script, see for instance http://git.erp5.org/gitweb/slapos.git/blob/HEAD:/slapos/recipe/postgres/__init__.py?js=1#l233 http://git.erp5.org/gitweb/slapos.git/blob/HEAD:/slapos/recipe/mydumper.py?js=1#l71 In practice ----------- Add resilience to your software Let's say you already have a file instance-mysoftware.cfg.in that instantiates your software. In which there is a part [mysoftware] where there is the main recipe that instantiates the program. You need to create two new files, instance-mysoftware-import.cfg.in and instance-mysoftware-export.cfg.in, following this layout: IMPORT: [buildout] extends = ${instance-mysoftware:output} ${pbsready-import:output} parts += mysoftware import-on-notification [importer] recipe = YourImportRecipe wrapper = $${rootdirectory:bin}/$${slap-parameter:namebase}-importer backup-directory = $${directory:backup} ... EXPORT: [buildout] extends = ${instance-mysoftware:output} ${pbsready-export:output} parts += mysoftware cron-entry-backup [exporter] recipe = YourExportRecipe wrapper = $${rootdirectory:bin}/$${slap-parameter:namebase}-exporter backup-directory = $${directory:backup} ... In the [exporter] / [importer] part, you are free to do whatever you want, but you need to dump / import your data from $${directory:backup} and specify a wrapper. I suggest you only add options and specify your export/import recipe. Checking that it works ---------------------- To check that your software instance is resilient you can proceed this way: Once all instances are successfully deployed, go to your export instance, connect as the instance user and run: $ ~/bin/exporter It is the script responsible for triggering the resiliency stack on your instance. After doing a backup of your data, it will notify the pull-backup instances of a new backup, triggering the transfer of this data to the import instances. Once this script is run successfully, go to your import instance, connect as its instance user and check ~/srv/backup/"your sofwtare"/, the location of the data you wanted to receive. The last part of the resiliency is up to your import script. DEBUGGING: Here is a partial list of things you can check to understand what is causing the problem: - Check that your import script does not fail and successfully places your data in ~/srv/backup/"your software" (as the import instance user) by runnig: $ ~/bin/"your software"-exporter - Check the export instance script is run successfully as this instance user by running: $ ~/bin/exporter - Check the pull-instance system did its job by going to one of your pull-backup instance, connect as its user and check the log : ~/var/log/equeue.log ----------------------------------------------------------------------------------------- Finally, instance-mysoftware-import.cfg.in and instance-mysoftware-export.cfg.in need to be downloaded and accessible by switch_softwaretype, and you need to extend stack/resilient/buildout.cfg and stack/resilient/switchsoftware.cfg to download the whole resiliency bundle. Here is how it's done in the mariadb case for the lamp stack: ** buildout.cfg ** extends = ../resilient/buildout.cfg [instance-mariadb-import] recipe = slapos.recipe.template url = ${:_profile_base_location_}/mariadb/instance-mariadb-import.cfg.in output = ${buildout:directory}/instance-mariadb-import.cfg md5sum = ... mode = 0644 [instance-mariadb-export] recipe = slapos.recipe.template url = ${:_profile_base_location_}/mariadb/instance-mariadb-export.cfg.in output = ${buildout:directory}/instance-mariadb-export.cfg md5sum = ... mode = 0644 ** instance.cfg.in ** extends = ../resilient/switchsoftware.cfg [switch-softwaretype] ... mariadb = ${instance-mariadb:output} mariadb-import = ${instance-mariadb-import:output} mariadb-export = ${instance-mariadb-export:output} ... Then, in the .cfg file where you want to instantiate your software, you can do, instead of requesting your software * template-resilient.cfg.in * [buildout] ... parts += {{ parts.replicate("Name","3") }} ... [...] ... [ArgLeader] ... [ArgBackup] ... {{ replicated.replicate("Name", "3", "mysoftware-export", "mysoftware-import", "ArgLeader","ArgBackup", slapparameter_dict=slapparameter_dict) }} and it'll expend into the sections require to request Name0, Name1 and Name2, backuped and resilient. The leader will expend the section [ArgLeader], backups will expend [ArgBackup]. slapparameter_dict is the dict containing the parameters given to the instance. If you don't need to specify any options, you can omit the last three arguments in replicate(). Since you will compile your template with jinja2, there should be no $${}, because it is not yet possible to use jinja2 -> buildout template. To compile with jinja2, see jinja2's recipe. Deploying your resilient software --------------------------------- You can provide sla parameters to each request you make (a lot: for export, import and pbs). example: Here is a small example of parameters you can provide to control the deployment (case of a runner): <?xml version='1.0' encoding='utf-8'?> <instance> <parameter id="-sla-1-computer_guid">COMP-GRP1</parameter> <parameter id="-sla-pbs1-computer_guid">COMP-PBS1</parameter> <parameter id="-sla-2-computer_guid">COMP-GROUP2</parameter> <parameter id="-sla-runner2-computer_guid">COMP-RUN2</parameter> <parameter id="-sla-2-network_guid">NET-2</parameter> <parameter id="-sla-runner0-computer_guid">COMP-RUN0</parameter> </instance> Consequence on sla parameters by request: * runner0: computer_guid = COMP-RUN0 (provided directly) * runner1: computer_guid = COMP-GRP1 (provided by group 1) * runner2: computer_guid = COMP-RUN2 (provided by group 2 but overided directly) network_guid = NET-2 (provided by group 2) * PBS 1: computer_guid = COMP-PBS1 (provided by group 1 but overided directly) * PBS 2: computer_guid = COMP-GRP2 (provided by group 2) network_guid = NET-2 (provided by group 2) Parameters are analysed this way: * If it starts with "-sla-" it is not transmitted to requested instance and is used to do the request as sla. * -sla-foo-bar=example (foo being a magic key) will be use for each request considering "foo" as a key to use and the sla parameter is "bar". So for each group using the "foo" key, sla parameter "bar" is used with value "example" About magic keys: We can find 2 kinds of magic keys: * id : example, in "-sla-2-foo" 2 is the magic key and the parameter will be used for each request with id 2 (in case of kvm: kvm2 and PBS 2) * nameid : example, in "-sla-kvm2-foo", foo will be used for kvm2 request. Name for pbs is "pbs" -> "-sla-pbs2-foo". IMPORTANT NOTE: in case the same foo parameter is asked for the group, the nameid key prevail