Temporal Blog, September 30

 

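The saga pattern is a design pattern for distributed systems, suited to scenarios that involve multiple steps and where partial execution is unacceptable. Trip planning, for example, requires buying a plane ticket, reserving a hotel, and arranging activities, and these steps depend on one another. If one of the steps fails, the saga pattern can undo the steps that already completed so the system stays in a consistent state. Temporal simplifies implementing the saga pattern: it automatically saves state and retries failed operations, so developers only need to write the compensation logic.

🔍 The saga pattern suits scenarios that involve multiple steps and where partial execution is unacceptable, such as trip planning, order processing, and supply chain management. It breaks a task into a series of local transactions and supplies compensation logic on failure, keeping the system state consistent.

🔄 A saga has two parts: compensation logic and state management. Compensation logic undoes steps that already completed, while state management records program progress so it can recover after a failure. Temporal handles state management and retries automatically, which simplifies the implementation.

🔑 Make sure every step is idempotent, meaning that executing it multiple times produces the same result. Use a unique identifier (an idempotency key) to tell transactions apart and prevent problems caused by repeated execution. For example, use a client ID or a UUID as the identifier so that a hotel booking is performed only once.

🚀 By automatically saving state and retrying failed operations, Temporal simplifies implementing the saga pattern. Developers only write compensation logic and do not need to worry about the details of state management and retries, which makes the saga pattern more practical and efficient in a microservices architecture.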

Saga Pattern Made Easy#

Trip planning with sagas but without the baggage#

So the last time your family went to the local park everyone was talking about the Saga Design Pattern, and now you want to know what it is, if you should make it part of your own distributed system design, and how to implement it. As we all know, software design is all about fashionable trends.


The case for sagas#

If you’re wondering if the saga pattern is right for your scenario, ask yourself: does your logic involve multiple steps, some of which span machines, services, shards, or databases, for which partial execution is undesirable? Turns out, this is exactly where sagas are useful. Maybe you are checking inventory, charging a user’s credit card, and then fulfilling the order. Maybe you are managing a supply chain. The saga pattern is helpful because it basically functions as a state machine storing program progress, preventing multiple credit card charges, reverting if necessary, and knowing exactly how to safely resume in a consistent state in the event of power loss.

A common life-based example used to explain the way the saga pattern compensates for failures is trip planning. Suppose you are itching to soak up the rain in Duwamish territory, Seattle. You’ll need to purchase an airplane ticket, reserve a hotel, and get a ticket for a guided backpacking experience on Mount Rainier. All three of these tasks are coupled: if you’re unable to purchase a plane ticket there’s no reason to get the rest. If you get a plane ticket but have nowhere to stay, you’re going to want to cancel that plane reservation (or retry the hotel reservation or find somewhere else to stay). Lastly if you can’t book that backpacking trip, there’s really no other reason to come to Seattle so you might as well cancel the whole thing. (Kidding!)

Figure: a simplistic model of compensating in the face of trip planning failures.

There are many “do it all, or don’t bother” software applications in the real world. If you successfully charge the user for an item but your fulfillment service reports that the item is out of stock, you’re going to have upset users if you don’t refund the charge. If you have the opposite problem and accidentally deliver items “for free,” you’ll be out of business. If the machine coordinating a machine learning data processing pipeline crashes but the follower machines carry on processing data with nowhere to report their results, you may have a very expensive compute bill on your hands. In all of these cases, some sort of “progress tracking” plus compensation code for these “do-it-all-or-don’t-do-any-of-it” tasks is exactly what the saga pattern provides. In saga parlance, these “all or nothing” tasks are called long-running transactions. This doesn’t necessarily mean such actions run for a “long” time, just that they require more steps in logical time than something running locally and interacting with a single database.

How do you build a saga?#

A saga is composed of two parts:

    1. Defined behavior for “going backwards” if you need to “undo” something (i.e., compensations)
    2. Behavior for striving towards forward progress (i.e., saving state to know where to recover from in the face of failure)

The avid reader of this blog will remember I recently wrote a post about compensating actions. As you can see from above, compensations are but one half of the saga design pattern. The other half, alluded to above, is essentially state management for the whole system. The compensating actions pattern helps you know how to recover if an individual step (or in Temporal terminology, an Activity) fails. But what if the whole system goes down? Where do you start back up? Since not every step has a compensation attached, you’d be forced to make your best guess based on the compensations that were stored. The saga pattern keeps track of where you currently are so that you can keep driving towards forward progress.
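To make the “state management” half concrete, here is a minimal sketch in plain Go of the kind of progress tracking you would otherwise have to build by hand. It is an illustration under stated assumptions, not how Temporal works internally: the `progress` type, the `run` function, and the `crashBefore` parameter are hypothetical, and the “durable” store is just an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

// progress is a stand-in for durable storage of completed step names
// (hypothetical; kept in memory so the sketch is self-contained).
type progress struct {
	done map[string]bool
}

// run executes every step that is not yet recorded as done and records it
// once it succeeds. crashBefore simulates the process dying before that step.
func run(p *progress, steps []string, crashBefore string, executed *[]string) error {
	for _, step := range steps {
		if p.done[step] {
			continue // finished in an earlier run; skip it
		}
		if step == crashBefore {
			return errors.New("crashed before " + step)
		}
		*executed = append(*executed, step)
		p.done[step] = true
	}
	return nil
}

func main() {
	p := &progress{done: map[string]bool{}}
	steps := []string{"bookHotel", "bookFlight", "bookExcursion"}
	var executed []string

	fmt.Println(run(p, steps, "bookFlight", &executed)) // first run dies after the hotel
	fmt.Println(run(p, steps, "", &executed))           // restart resumes at the flight
	fmt.Println(executed)                               // each step appears once
}
```

Note the gap between executing a step and recording it: a crash in between means the step runs again on restart, which is why each step also needs to be safe to repeat.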

So how do I implement sagas in my own code?#

I’m so glad you asked.

leans forward

whispers in ear

This is a little bit of a trick question because by running your code with Temporal, you automatically get your state saved and retries on failure at any level. That means the saga pattern with Temporal is as simple as coding up the compensation you wish to take when a step (Activity) fails. The end.

The reason behind this magic is that Temporal, by design, automatically keeps track of the progress of your program and can pick up where it left off in the face of catastrophic failure. Additionally, Temporal will retry Activities on failure without you needing to add any code beyond specifying a Retry Policy, e.g.:

    RetryOptions retryoptions = RetryOptions.newBuilder()
        .setInitialInterval(Duration.ofSeconds(1))
        .setMaximumInterval(Duration.ofSeconds(100))
        .setBackoffCoefficient(2)
        .setMaximumAttempts(500)
        .build();
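To get a feel for what those numbers mean, here is a small plain-Go sketch that computes the wait before each retry. It assumes the usual exponential backoff rule (start at the initial interval, multiply by the coefficient after each retry, cap at the maximum interval); `backoffSchedule` is a hypothetical helper written for this illustration, not an SDK function, and the real scheduler may differ in detail.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the wait before each of the first `retries` retries,
// assuming the interval starts at initial, is multiplied by coefficient after
// every retry, and never exceeds maxInterval.
func backoffSchedule(initial, maxInterval time.Duration, coefficient float64, retries int) []time.Duration {
	schedule := make([]time.Duration, 0, retries)
	next := float64(initial)
	for i := 0; i < retries; i++ {
		if next > float64(maxInterval) {
			next = float64(maxInterval) // cap reached: stay at the maximum
		}
		schedule = append(schedule, time.Duration(next))
		next *= coefficient
	}
	return schedule
}

func main() {
	// Same values as the retry policy above: 1s initial, 100s cap, coefficient 2.
	for i, wait := range backoffSchedule(time.Second, 100*time.Second, 2, 9) {
		fmt.Printf("retry %d after %v\n", i+1, wait)
	}
}
```

With these settings the waits double from 1 second up to 64 seconds, then stay at the 100-second cap for every later retry until the maximum number of attempts is used up.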

To learn more about how this automagic works, stay tuned for my upcoming post on choreography and orchestration, the two common ways of implementing sagas.

So to express the high-level logic of my program with both the vacation booking steps plus compensations I wish to take on failure, it would look like the following in pseudocode:

    try:
        registerCompensationInCaseOfFailure(cancelHotel)
        bookHotel
        registerCompensationInCaseOfFailure(cancelFlight)
        bookFlight
        registerCompensationInCaseOfFailure(cancelExcursion)
        bookExcursion
    catch:
        run all compensation activities

In Java, the Saga class keeps track of compensations for you:

    @Override
    public void bookVacation(BookingInfo info) {
        Saga saga = new Saga(new Saga.Options.Builder().build());
        try {
            saga.addCompensation(activities::cancelHotel, info.getClientId());
            activities.bookHotel(info);

            saga.addCompensation(activities::cancelFlight, info.getClientId());
            activities.bookFlight(info);

            saga.addCompensation(activities::cancelExcursion, info.getClientId());
            activities.bookExcursion(info);
        } catch (TemporalFailure e) {
            // Run every registered compensation, then surface the failure.
            saga.compensate();
            throw e;
        }
    }

In other language SDKs you can easily write the addCompensation and compensate functions yourself. Here's a version in Go:

    import "go.temporal.io/sdk/workflow"

    // Compensations holds the registered compensation activities and their
    // arguments. The field types are inferred from how the methods use them.
    type Compensations struct {
        compensations []any
        arguments     [][]any
    }

    // AddCompensation registers an activity to run if the workflow must be undone.
    func (s *Compensations) AddCompensation(activity any, parameters ...any) {
        s.compensations = append(s.compensations, activity)
        s.arguments = append(s.arguments, parameters)
    }

    // Compensate runs the registered compensations, sequentially or in parallel.
    func (s Compensations) Compensate(ctx workflow.Context, inParallel bool) {
        if !inParallel {
            // Compensate in Last-In-First-Out order, to undo in the reverse
            // order that activities were applied.
            for i := len(s.compensations) - 1; i >= 0; i-- {
                errCompensation := workflow.ExecuteActivity(ctx, s.compensations[i], s.arguments[i]...).Get(ctx, nil)
                if errCompensation != nil {
                    workflow.GetLogger(ctx).Error("Executing compensation failed", "Error", errCompensation)
                }
            }
        } else {
            // Start every compensation, then wait for all of them to finish.
            selector := workflow.NewSelector(ctx)
            for i := 0; i < len(s.compensations); i++ {
                execution := workflow.ExecuteActivity(ctx, s.compensations[i], s.arguments[i]...)
                selector.AddFuture(execution, func(f workflow.Future) {
                    if errCompensation := f.Get(ctx, nil); errCompensation != nil {
                        workflow.GetLogger(ctx).Error("Executing compensation failed", "Error", errCompensation)
                    }
                })
            }
            for range s.compensations {
                selector.Select(ctx)
            }
        }
    }

The high level Go code of steps and compensations will look very similar to the Java version:

    import (
        "time"

        "go.temporal.io/sdk/temporal"
        "go.temporal.io/sdk/workflow"
    )

    func TripPlanningWorkflow(ctx workflow.Context, info BookingInfo) (err error) {
        options := workflow.ActivityOptions{
            StartToCloseTimeout: time.Second * 5,
            RetryPolicy:         &temporal.RetryPolicy{MaximumAttempts: 2},
        }
        ctx = workflow.WithActivityOptions(ctx, options)

        var compensations Compensations
        defer func() {
            if err != nil {
                // activity failed, and workflow context is canceled
                disconnectedCtx, _ := workflow.NewDisconnectedContext(ctx)
                compensations.Compensate(disconnectedCtx, true)
            }
        }()

        compensations.AddCompensation(CancelHotel)
        err = workflow.ExecuteActivity(ctx, BookHotel, info).Get(ctx, nil)
        if err != nil {
            return err
        }

        compensations.AddCompensation(CancelFlight)
        err = workflow.ExecuteActivity(ctx, BookFlight, info).Get(ctx, nil)
        if err != nil {
            return err
        }

        compensations.AddCompensation(CancelExcursion)
        err = workflow.ExecuteActivity(ctx, BookExcursion, info).Get(ctx, nil)
        if err != nil {
            return err
        }

        return err
    }

The high-level sequence of code above is called a Temporal Workflow. As mentioned before, by running with Temporal we don’t have to implement any of the bookkeeping to track our progress via event sourcing, or add retry and restart logic, because that all comes for free. So when writing code that runs with Temporal, you only need to worry about writing compensations.
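If you want to see the register-then-run control flow in isolation, without a Temporal server, here is a minimal plain-Go sketch. It deliberately has no persistence and no retries, so it is not a substitute for the Workflow above; the `saga` type, `bookVacation`, and the `failAt` parameter are hypothetical names made up for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// saga collects undo functions as steps are attempted
// (hypothetical helper, not part of any Temporal SDK).
type saga struct {
	compensations []func()
}

// addCompensation registers an undo function for a step that is about to run.
func (s *saga) addCompensation(undo func()) {
	s.compensations = append(s.compensations, undo)
}

// compensate runs the registered undo functions in last-in-first-out order.
func (s *saga) compensate() {
	for i := len(s.compensations) - 1; i >= 0; i-- {
		s.compensations[i]()
	}
}

// bookVacation mirrors the pseudocode: register the compensation first, then
// attempt the step. failAt names the step that should fail ("" for none).
func bookVacation(failAt string, log *[]string) error {
	var s saga
	for _, name := range []string{"Hotel", "Flight", "Excursion"} {
		name := name // per-iteration copy so each closure keeps its own step name
		s.addCompensation(func() { *log = append(*log, "cancel"+name) })
		if name == failAt {
			s.compensate()
			return errors.New("booking failed: " + name)
		}
		*log = append(*log, "book"+name)
	}
	return nil
}

func main() {
	var log []string
	err := bookVacation("Excursion", &log)
	fmt.Println(log, err)
}
```

Because each compensation is registered before its step runs, the compensation for the failed step runs too. That covers the case where a step took effect but reported failure, and it means a compensation has to tolerate being asked to cancel something that was never booked.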

Idempotency#

Well, okay, there is a second thing to “worry about.” As you may recall, sagas consist of two parts, the first part being those compensations we coded up previously. The second part, “striving towards forward progress,” involves potentially retrying an Activity in the face of failure. Let’s dig into one of those steps, shall we? Temporal does all the heavy lifting of retrying and keeping track of your overall progress. However, because code can be retried, you, the programmer, need to make sure each Temporal Activity is idempotent. This means the observed result of bookFlight is the same, whether it is called one time or many times.

To make this a little more concrete, a function that sets some field foo=3 is idempotent because afterwards foo will be 3 no matter how many times you call it. The function foo += 3 is not idempotent because the value of foo depends on the number of times your function is called. Non-idempotency can sometimes look more subtle: if you have a database that allows duplicate records, a function that calls INSERT INTO foo (bar) VALUES (3) will blithely create as many records in your table as times you call it and is therefore not idempotent. Naive implementations of functions that send emails or transfer money are also not idempotent by default.
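The foo examples can be run directly. In this small Go sketch, the `record` type and its two methods are hypothetical stand-ins for an Activity's side effect, and the loop simulates the same Activity being retried three times.

```go
package main

import "fmt"

// record stands in for some piece of state an Activity modifies.
type record struct{ foo int }

// setFoo is idempotent: calling it once or many times leaves foo at 3.
func (r *record) setFoo() { r.foo = 3 }

// addFoo is not idempotent: the result depends on how many times it ran.
func (r *record) addFoo() { r.foo += 3 }

func main() {
	var a, b record
	for i := 0; i < 3; i++ { // simulate three executions of the same step
		a.setFoo()
		b.addFoo()
	}
	fmt.Println(a.foo, b.foo) // prints "3 9"
}
```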

If you’re backing away slowly right now because your Real World Application does a lot more complex things than set foo=3, take heart. There is a solution. You can use a distinct identifier, called an idempotency key (or sometimes a referenceId or something similar), to uniquely identify a particular transaction and ensure the hotel booking transaction occurs effectively once. How this idempotency key is defined depends on your application’s needs. In the trip planning application, clientId, a field in BookingInfo, is used to uniquely identify transactions.

    type BookingInfo struct {
        Name     string
        ClientId string
        Address  string
        CcInfo   CreditCardInfo
        Start    date.Date
        End      date.Date
    }

You also probably saw the clientId used to register the compensation in the above Java workflow code:

saga.addCompensation(activities::cancelHotel, info.getClientId());

Using clientId as our key, however, prevents a particular person from booking more than one vacation at once. This is probably what we want. Still, some business applications may choose to build an idempotency key by combining the clientId and the workflowId, which allows a client to have more than one vacation booking in progress at once. If you wanted a truly unique idempotency key you could pass in a UUID to the workflow. The choice is up to you based on your application’s needs.

Many third-party APIs that handle money already accept idempotency keys for this very purpose. If you need to implement something like this yourself, use atomic writes to keep a record of the idempotency keys you’ve seen so far, and don’t perform an operation if its idempotency key is in the “already seen” set.
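As a rough illustration of the “already seen” set, here is a plain-Go sketch that builds a key from the client ID and workflow ID and checks it under a lock. Everything here is an assumption made for the example: `keyStore`, `firstTime`, and this `bookHotel` are hypothetical, and the in-memory map stands in for durable storage with an atomic insert (such as a unique-key constraint in a database).

```go
package main

import (
	"fmt"
	"sync"
)

// keyStore remembers idempotency keys in memory (hypothetical; a real service
// would need durable storage so the set survives restarts).
type keyStore struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newKeyStore() *keyStore { return &keyStore{seen: make(map[string]bool)} }

// firstTime atomically records key and reports whether it was new.
func (s *keyStore) firstTime(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen[key] {
		return false
	}
	s.seen[key] = true
	return true
}

// bookHotel performs the booking only the first time it sees a given key.
// The booking itself is simulated by incrementing a counter.
func bookHotel(store *keyStore, clientID, workflowID string, bookings *int) {
	key := clientID + ":" + workflowID // one key per client per workflow
	if !store.firstTime(key) {
		return // retry or duplicate delivery: already handled
	}
	*bookings++
}

func main() {
	store := newKeyStore()
	bookings := 0
	for i := 0; i < 3; i++ { // the same Activity retried three times
		bookHotel(store, "client-1", "workflow-a", &bookings)
	}
	bookHotel(store, "client-1", "workflow-b", &bookings) // a second vacation
	fmt.Println(bookings) // prints "2"
}
```

One caveat with this sketch: the key is recorded before the booking happens, so a crash between the two would lose the booking. In a real system you would record the key and the effect of the operation in the same atomic write.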

Benefits vs Complexity#

The saga pattern does add complexity to your code, so it’s important to not implement it in your code just because you have microservices. However, if you need to complete a task (like booking a trip with an airfare and hotel) that involves multiple services and partial execution is not actually a success, then a saga will be your friend. Additionally, if you find your saga getting particularly unwieldy, it may be time to reconsider how your microservices are divided up, and roll up the ol’ sleeves to refactor. Overall, Temporal makes implementing the saga pattern in your code comparatively trivial since you only need to write the compensations needed for each step. Stay tuned for my next post, where I dig into sagas and subscription scenarios, where Temporal particularly shines in reducing complexity when working with sagas.

The full repository that uses the code mentioned in this article can be found on GitHub:

If you want to see other tutorials of sagas using Temporal, please check out the following resources:

Additionally one of my colleagues, Dominik Tornow, gave an intro to sagas on YouTube.

Learn more about Temporal in our courses, tutorials, docs, and videos.
