Controlling when a backup happens on a node

Answered

  • Official comment
    Geir Ove Grønmo

    Sesam subscriptions (not self-hosted) currently make backups at a randomized time within the 22:00-03:00 UTC window. It is currently not possible to override this.

    What is the use-case for needing to control the backup time?

  • Jon Bryndorf

    We have another database that needs to be in sync with the Sesam database. If we need to restore our database, we need to choose a backup that matches a Sesam backup in time (it must match down to the minute); otherwise we will get conflicts that are close to impossible to resolve.

  • Geir Ove Grønmo

    It sounds to me like that is a next-to-impossible approach to take. Is Sesam writing data to that database, or is Sesam reading data from the database? What is the reason for the conflicts? Can't you have Sesam process all the data a second time? It would be helpful if you could explain the use case in more detail.

  • Jon Bryndorf

    Thanks for your reply

    The use case is the following:

    Our system sends a request for information from a third-party solution through Sesam. Sesam checks whether the information already exists. If it is not found, the request is processed and the response is saved both in Sesam and in our local database.

    If a crash happens and Sesam ends up two hours ahead of our restored backup, and the same request is sent again, then Sesam will say that the information exists, but it does not exist in our system. These are the kind of conflicts we want to avoid.

    We can take a meeting to explain in further detail.
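
The conflict described in this comment can be reproduced with a small simulation. This is only a sketch in Python: the `Store` class, `handle_request` function, and the `req-1` key are hypothetical stand-ins, not Sesam APIs, and the two stores are plain in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """A toy key-value store standing in for either Sesam or the local database."""
    responses: dict = field(default_factory=dict)


def handle_request(key, sesam, local, fetch):
    """Fetch from the third party only if Sesam has not seen the key before."""
    if key in sesam.responses:
        # Sesam believes the answer was already delivered, so nothing is fetched.
        return "skipped"
    response = fetch(key)
    sesam.responses[key] = response
    local.responses[key] = response
    return "fetched"


def fetch(key):
    """Hypothetical third-party call."""
    return f"response-for-{key}"


sesam, local = Store(), Store()

# The local backup is taken here, while both stores are still empty.
local_backup = dict(local.responses)

first = handle_request("req-1", sesam, local, fetch)

# The local database crashes and is restored from the older backup,
# while Sesam keeps its newer state.
local.responses = dict(local_backup)

second = handle_request("req-1", sesam, local, fetch)
missing_locally = "req-1" in sesam.responses and "req-1" not in local.responses
```

After the restore, the repeated request is skipped because Sesam still holds the response, while the local database no longer does.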

  • Jon Bryndorf

    Hi Sesam

    Are there any updates? We need clarification soon since we go into production in September.

  • Geir Ove Grønmo

    Thank you for the explanation and sorry for the late reply.

    One way to work around this problem is to have Sesam pull in the state from your local database. That pipe needs to perform a full sync of all the data it needs every so often. It could do it on every run, every nth run or on a schedule. The pipe that receives the data from the third-party system can then hop[1] to the dataset that contains the data from your local database to check if the information is found. To prevent the second pipe from processing the data too soon, you can use the completeness feature[2] to make it read its input data only after the state has been synced from your local database.

    [1] https://docs.sesam.io/DTLReferenceGuide.html#hops-dtl-function
    [2] https://docs.sesam.io/product-features.html#completeness

  • Geir Ove Grønmo

    The two pipes would look roughly something like this:

    [
        {
            "_id": "local-database-state",
            "type": "pipe",
            "source": {
                "type": "sql",
                "system": "local-database",
                "primary_key": ["the_id"],
                "schema": "dbo",
                "table": "some_table_or_view"
            }
        },
        {
            "_id": "processing-pipe",
            "type": "pipe",
            "source": {
                "type": "dataset",
                "dataset": "third-party-requests",
                "completeness": true
            },
            "transform": {
                "type": "dtl",
                "rules": {
                    "default": [
                        [...filter_or_some_other_logic...,
                            ["hops", {
                                "datasets": ["local-database-state s"],
                                "where": [
                                    ["eq", "_S.some_id", "s.the_same_id"]
                                ]
                            }]
                        ]
                    ]
                }
            }
        }
    ]
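
The intent of the two pipes can be sketched outside Sesam as well. The Python below is a simplified model, not Sesam's implementation: the `process_requests` function, the `updated` field, and the `synced_up_to` timestamp are assumptions made for illustration. The membership check plays the role of the hop, and the timestamp comparison plays the role of the completeness gate.

```python
def process_requests(requests, local_state, synced_up_to):
    """Return (to_fetch, already_known, deferred) for the queued requests.

    requests:     list of dicts with "some_id" and "updated" (a sortable timestamp)
    local_state:  list of dicts with "the_same_id", as synced from the local database
    synced_up_to: timestamp of the last completed full sync of local_state
    """
    known_ids = {row["the_same_id"] for row in local_state}
    to_fetch, already_known, deferred = [], [], []
    for request in requests:
        if request["updated"] > synced_up_to:
            # Completeness gate: the synced state may be stale for this request,
            # so it is left for a later run.
            deferred.append(request["some_id"])
        elif request["some_id"] in known_ids:
            # Equivalent of a hop that found a matching entity.
            already_known.append(request["some_id"])
        else:
            to_fetch.append(request["some_id"])
    return to_fetch, already_known, deferred


requests = [
    {"some_id": "a", "updated": 10},
    {"some_id": "b", "updated": 20},
    {"some_id": "c", "updated": 40},
]
local_state = [{"the_same_id": "a"}]

result = process_requests(requests, local_state, synced_up_to=30)
```

Here only `b` is fetched from the third party: `a` is already in the local state, and `c` arrived after the last sync, so it waits until the state has been synced again.
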
  • Jon Bryndorf

    Hi Sesam

    Thanks for the good solution, but we have a specific system where syncing with the proposed solution would take days, and that is not acceptable.

    So we cannot use the proposed solution.

  • Geir Ove Grønmo

    I see. Could you write the response only to the local database and then sync it back to Sesam? Then you could use the same kind of technique as described above.

  • Jon Bryndorf

    Hi Geir

    That would defeat the purpose of using Sesam as a backup, since we would then also have our own database backup and end up with a double backup: our own and yours.

    Is it possible to do something custom for our nodes so that you (Sesam) control the time when the backups happen?

  • Geir Ove Grønmo

    Sesam's backup system is an implementation detail, and it is implemented to support disaster recovery. There are technical reasons why we cannot control or expose the exact time when the backup runs. For stronger guarantees against data loss, the durable data feature should be enabled.

    The use-case you describe is when the local database is restored and it is thus in a state that is older than the state in Sesam. Given that there is a processing queue, then another workaround is to rewind the pipe that processes that queue. The pipe should be rewound to the point in time right before the timestamp of the backup that was used to restore the local database. Would that work?
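
Choosing the rewind point can be sketched as a lookup over the queue's timestamps. This Python example is an assumption-laden illustration: the `rewind_index` function and the sample dates are hypothetical, and it does not show how the pipe itself is rewound in Sesam. It only shows which queue entries fall at or after the restored backup.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def rewind_index(entry_times, restored_backup_time):
    """Index of the first queue entry written at or after the restored backup.

    entry_times must be sorted ascending. Everything from the returned index
    onward has to be processed again after the restore.
    """
    return bisect_left(entry_times, restored_backup_time)


utc = timezone.utc
# Made-up timestamps for three queued requests.
entry_times = [
    datetime(2020, 8, 1, 20, 0, tzinfo=utc),
    datetime(2020, 8, 1, 23, 30, tzinfo=utc),
    datetime(2020, 8, 2, 1, 15, tzinfo=utc),
]
# Made-up timestamp of the local backup that was restored.
restored_backup_time = datetime(2020, 8, 1, 22, 0, tzinfo=utc)

start = rewind_index(entry_times, restored_backup_time)
to_reprocess = entry_times[start:]
```

Using `bisect_left` means an entry stamped exactly at the backup time is reprocessed too, which errs on the side of processing a request twice rather than losing it. That is only safe if reprocessing a request is harmless.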

  • Jon Bryndorf

    Thank you for the reply.

    It would work. We will take it from here.

    Thanks for your good replies and help.

