Scaling of endpoint-pipes with huge datasets

Answered

  • Official comment
    Geir Ove Grønmo

    One way to solve this is to partition the endpoint pipe: split it into N parts and use a subset for each of those pipes. Use a hash function on _id to produce the subset values so that entities are partitioned consistently. You can then run the N endpoint pipes in parallel, as in the sketch below. This is a nice way to scale out pipes.
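    A minimal Python sketch of the hashing idea, assuming N = 4 partitions; the hash choice, partition count, and pipe names are illustrative assumptions, not Sesam's actual mechanism:

    ```python
    import hashlib

    NUM_PARTITIONS = 4  # assumed N; one endpoint pipe per partition

    def partition_for(entity_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
        """Map an entity _id to a stable partition in [0, num_partitions).

        md5 is used only for its deterministic, well-dispersed output,
        not for security; any stable hash with good distribution works.
        """
        digest = hashlib.md5(entity_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % num_partitions

    # Hypothetical entities; each is routed to the same pipe on every run.
    for entity in [{"_id": "order-1"}, {"_id": "order-2"}, {"_id": "order-3"}]:
        part = partition_for(entity["_id"])
        print(entity["_id"], "-> endpoint-pipe-part-%d" % part)
    ```

    Because the partition depends only on _id, re-running the pipes never moves an entity between partitions, which is what keeps the N parallel endpoint pipes consistent.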

    This partitioning has to be done manually for now, but we are considering adding support for it through a templating feature in the config.
