In Search of Resiliency and High Availability – Strategies for Incremental Roll Out

So we migrated to the cloud. We listed our products, reorganized and restructured teams, broke down the monolith, and built microservices. We have been using Kubernetes for the past few years, and we automated our builds and deployments with CI/CD pipelines. We have a lot to celebrate. Here are some things that seem routine today but were very different a few years ago.

  1. We deploy during business hours. Deployments used to happen at night, when traffic to our apps and services was low, because a deployment could affect customers and, if something went wrong, cause an outage. Kubernetes rolling updates have made deployment a smooth process with no downtime for apps and services.
  2. Automated build and deployment pipelines allow us to deploy multiple times a day, instead of once every two weeks or even once a month.
  3. The ability to roll back quickly and automatically also gives us more confidence in deployments.
  4. Disaster recovery capabilities and redundant data centers have increased availability. If something goes wrong during a deployment to one data center, we can switch to the other.
  5. Small, incremental changes and deployments help developers keep a clear picture of what changed and make impact analysis easier.
  6. Urgent requirements can be handled quickly by pushing out changes as soon as development work is done, without waiting on a deployment schedule or a manual deployment process. (This does not happen often, but there are good examples, such as government mandates during the pandemic.)

These are all good and we love them. But now let's look at the challenges. Being able to build and deploy fast gives us a high deployment frequency. With a microservice architecture and many interconnected pieces, a small change on one side can affect another part of the system, and a naive change can cause the other side to collapse. And how much testing is enough? Unit tests, automated integration tests, automated functional tests: we do all of that, and we keep doing it. Software is critical. We cannot afford even a few minutes of outage, because it interrupts the business and impacts our customers.

(There is a saying that if you have never caused an outage, you are not a real software engineer.)

Okay. So we keep pushing changes to the system, and we can do so frequently. But how can we make the system more stable? Of course we try to eliminate mistakes at every possible failure point: clear requirements, pair programming, code review, impact analysis, automated testing, random manual testing, acceptance tests, automated CI/CD pipelines, automated rollback pipelines, disaster recovery capabilities, rich monitoring and alerting, better communication and knowledge sharing, empowered teams, and so on. Is that enough? We know it never is; each of these always has room for improvement. Let's be honest: software bugs will still happen, and we need to prepare for the system to fail. My team has been on a journey to stabilize the system and improve the overall process. We want to share that journey and invite the community to join us in finding and adopting best practices, patterns, and principles.

A word about my team: Passenger Check-in at American Airlines. As an airline, my company helps about half a million passengers travel each day, which is also roughly my team's daily check-in count. Our services are called about 1,000 times per minute. Suppose some failure occurs, whether a naive bug that slipped through all the checks or an upstream code change that affects the passenger check-in flow. Then every minute the issue is present means 1,000 failed check-ins, and five minutes means 5,000.
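To make the arithmetic concrete, here is a small sketch of the impact estimate above. The 1,000 calls per minute and the five-minute window come from the text; the `exposed_fraction` parameter (the share of traffic that reaches the faulty change) is an illustrative assumption, showing how limiting exposure shrinks the count.

```python
def failed_checkins(calls_per_minute: int, minutes: float, exposed_fraction: float = 1.0) -> int:
    """Estimate failed check-ins while a faulty change is live.

    Assumes every call routed to the faulty code fails, as in the
    back-of-the-envelope estimate above. exposed_fraction is a hypothetical
    parameter: the share of traffic reaching the faulty code (1.0 = everyone).
    """
    return round(calls_per_minute * minutes * exposed_fraction)


# Full rollout: 1,000 calls per minute for 5 minutes.
print(failed_checkins(1000, 5))        # 5000
# Hypothetical incremental rollout exposing only 10% of traffic for the same 5 minutes.
print(failed_checkins(1000, 5, 0.10))  # 500
```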

This is what we believe and expect: issues need to be self-reported. Waiting for customers to report issues is not acceptable, so early detection is key to fixing them. Rolling back the deployment quickly or switching traffic to the other data center can resolve an issue. Each takes only a few minutes, but we want to improve on both to avoid impacting those 5,000 passengers. Travel can be stressful until you reach your destination; imagine how much more stressful it is when the software you rely on is not working.

We looked at different aspects of our incident response time, in particular detection and communication. That means an automatic alerting system, telemetry and observability for the system, and time spent setting up all kinds of alerts, so that after a deployment we do not depend on a developer checking each dashboard. We have many APIs, and some errors may not show up on a dashboard at all.
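As a rough illustration of replacing manual dashboard checks with an automatic rule, the sketch below flags a deployment when the post-deployment error rate exceeds both an absolute ceiling and a multiple of the pre-deployment baseline. The function names, the thresholds, and the baseline comparison are assumptions for illustration; in practice a rule like this would usually live in the monitoring and alerting system rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts observed over a fixed time window."""
    total: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Treat an empty window as error-free rather than dividing by zero.
        return self.errors / self.total if self.total else 0.0


def should_alert(baseline: WindowStats, current: WindowStats,
                 max_rate: float = 0.01, max_ratio: float = 2.0) -> bool:
    """Hypothetical post-deployment alert rule.

    Fires only when the current error rate is above max_rate AND more than
    max_ratio times the baseline rate, so a service with a normally noisy
    baseline does not page on every deployment.
    """
    rate = current.error_rate
    if rate <= max_rate:
        return False
    if baseline.error_rate == 0:
        return True
    return rate > max_ratio * baseline.error_rate


before = WindowStats(total=1000, errors=2)   # 0.2% before the deployment
after = WindowStats(total=1000, errors=40)   # 4% after the deployment
print(should_alert(before, after))  # True
```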

The other thing we are trying to strengthen is impact analysis, from the start of development work onward, so that there is a clear plan for what to check, test, and communicate. Since our APIs are called by multiple clients, any contract-breaking change will certainly break them. Even a naive change, such as returning a different value for a field than before (a new enum value, for example), can break clients.
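The enum case is easy to miss. As a minimal sketch (the `CheckinStatus` type and its values are hypothetical), a client that parses enum values strictly fails as soon as the server returns a value it has never seen, while a tolerant client maps unknown values to a fallback:

```python
from enum import Enum


class CheckinStatus(Enum):
    # Values the client knew about when it was built (hypothetical).
    CHECKED_IN = "CHECKED_IN"
    NOT_CHECKED_IN = "NOT_CHECKED_IN"
    UNKNOWN = "UNKNOWN"  # fallback used by the tolerant parser


def parse_strict(value: str) -> CheckinStatus:
    # Raises ValueError for any value the client has never seen.
    return CheckinStatus(value)


def parse_tolerant(value: str) -> CheckinStatus:
    # Maps unrecognized values to UNKNOWN instead of failing the whole response.
    try:
        return CheckinStatus(value)
    except ValueError:
        return CheckinStatus.UNKNOWN


new_value = "PENDING_DOCUMENTS"  # hypothetical value added on the server side
print(parse_tolerant(new_value))  # CheckinStatus.UNKNOWN
try:
    parse_strict(new_value)
except ValueError as err:
    print("strict client broke:", err)
```

The tolerant parser only helps if the rest of the client handles `UNKNOWN` sensibly, which is why a new enum value still belongs in the impact analysis and client communication.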

The main strategy we are pursuing is to use Kubernetes deployment templates and Istio, together with GitHub Actions, to make incremental rollout of every change a routine process. The goal is to reduce the blast radius of any change: monitor in production and detect failures before they have a big impact on users. The approach is canary deployment using Istio weighted routing, with the whole process automated.
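The automated canary flow can be sketched as a loop that shifts weight to the new version step by step and rolls back as soon as a health check fails. Everything here is an assumption for illustration: the step percentages and the `set_weight` and `is_healthy` callables, which in a real pipeline would update an Istio VirtualService and query monitoring after a bake period.

```python
from typing import Callable, Sequence


def run_canary(steps: Sequence[int],
               set_weight: Callable[[int], None],
               is_healthy: Callable[[int], bool]) -> bool:
    """Shift traffic to the canary step by step; roll back on the first failure.

    steps: canary weights in percent, ending at 100 for full promotion.
    set_weight(w): hypothetical hook that routes w% of traffic to the canary.
    is_healthy(w): hypothetical hook that checks metrics after the step settles.
    Returns True if the canary was fully promoted, False if it was rolled back.
    """
    for weight in steps:
        set_weight(weight)
        if not is_healthy(weight):
            set_weight(0)  # send all traffic back to the stable version
            return False
    return True


# Simulated run: the canary becomes unhealthy once it takes 50% of traffic.
history: list[int] = []
promoted = run_canary([5, 25, 50, 100], history.append, lambda w: w < 50)
print(promoted, history)  # False [5, 25, 50, 0]
```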

During our research and experiments, we considered a few options. One works at the deployment/pod level: the Service's selector matches the pods of both the old and the new Deployment, and we scale the new Deployment up and the old one down so traffic gradually moves from the old to the new version. The benefits are that both deployments stay active, traffic can be switched back instantly if needed, and the approach is simple because it works entirely at the pod and Deployment layer. The drawback is that you cannot precisely control the rollout percentage; it depends on the number of pods running the new code versus the old code.
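The precision limit of the pod-based approach is easy to quantify. Assuming requests are spread roughly evenly across the ready pods a Service selects, the share reaching the new version is new_pods / (new_pods + old_pods), so the achievable percentages depend on the replica count. A small sketch:

```python
from fractions import Fraction


def canary_share(new_pods: int, old_pods: int) -> Fraction:
    """Approximate share of traffic reaching the new version.

    Assumes requests are spread evenly across all ready pods selected by
    the Service's label selector (old and new Deployments alike).
    """
    total = new_pods + old_pods
    if total == 0:
        raise ValueError("no pods selected")
    return Fraction(new_pods, total)


def achievable_percentages(total_pods: int) -> list[float]:
    """Every canary percentage reachable while keeping total_pods replicas."""
    return [round(float(canary_share(n, total_pods - n)) * 100, 1)
            for n in range(total_pods + 1)]


# With 4 replicas, the smallest canary step is already 25% of traffic.
print(achievable_percentages(4))  # [0.0, 25.0, 50.0, 75.0, 100.0]
```

Getting a 1% canary this way would need about 100 pods in total, which is why weighted routing at the mesh layer is the more precise tool.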

For routing traffic, most of our teams were already using Istio as their service mesh, and we use Istio to route traffic to our APIs, so introducing weighted routing was just one more step. Automation is also key: once we figured out how to do it, we built it into the pipeline. A reusable workflow makes the solution scalable and lets many teams adopt it easily.
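One way a reusable pipeline step can stay generic is to render the VirtualService from parameters instead of hand-editing YAML for each service. The sketch below is hypothetical (the function name, its parameters, and the `stable`/`canary` subset names are assumptions); it builds the same structure as an Istio VirtualService with two weighted destinations and prints it as JSON, which `kubectl apply -f -` also accepts.

```python
import json


def weighted_virtual_service(name: str, host: str, canary_weight: int,
                             stable_subset: str = "stable",
                             canary_subset: str = "canary") -> dict:
    """Build an Istio VirtualService that splits traffic between two subsets.

    The subsets are assumed to be defined in a matching DestinationRule.
    The two weights always add up to 100.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": stable_subset},
                     "weight": 100 - canary_weight},
                    {"destination": {"host": host, "subset": canary_subset},
                     "weight": canary_weight},
                ]
            }],
        },
    }


# Render the manifest for a 25% canary step; a pipeline step could pipe this to kubectl.
print(json.dumps(weighted_virtual_service("reviews", "reviews", 25), indent=2))
```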

Strategies for Building Scalable and Adaptive Teams


Scalability for teams refers to the ability of a team to efficiently and effectively grow, adapt, and handle increased demands. Here are key considerations and strategies for achieving scalability in a team context:

1. Flexible Team Structure:

– Design a flexible team structure that can easily accommodate new members and changing roles.

– Implement modular team components that can scale independently.

2. Clear Communication Channels:

– Establish clear communication channels to ensure seamless information flow as the team expands.

– Document and communicate processes, workflows, and expectations.

3. Scalable Processes:

– Develop scalable processes that can handle increased workload and complexity.

– Automate repetitive tasks to streamline workflows and reduce manual effort.

4. Knowledge Sharing:

– Implement knowledge-sharing mechanisms to facilitate the transfer of information among team members.

– Use collaboration tools and platforms to centralize documentation and resources.

5. Cross-Functional Teams:

– Foster a cross-functional team environment where members have diverse skills.

– This enables teams to adapt to different tasks and responsibilities as needed.

6. Skill Development Programs:

– Establish ongoing skill development programs to ensure that team members can acquire new skills required for scalability.

– Encourage a culture of continuous learning.

7. Onboarding Processes:

– Develop efficient onboarding processes to integrate new team members quickly.

– Provide comprehensive training and mentorship to accelerate the learning curve.

8. Resource Planning:

– Regularly assess resource needs and plan for scalability in terms of manpower, technology, and infrastructure.

– Anticipate potential bottlenecks and address them proactively.

9. Agile Methodologies:

– Adopt agile methodologies to enhance flexibility and responsiveness.

– Break down projects into smaller, manageable tasks that can be tackled by smaller teams.

10. Performance Metrics:

– Establish key performance indicators (KPIs) to measure team efficiency and identify areas for improvement.

– Use metrics to make data-driven decisions for optimizing team scalability.

11. Collaboration Tools:

– Utilize collaboration tools and platforms that facilitate remote work and collaboration.

– Ensure that the tools can scale along with the team’s growth.

12. Succession Planning:

– Implement succession planning to identify and groom future leaders within the team.

– Ensure that there is a pipeline of talent to fill critical roles as the team expands.

13. Cultural Alignment:

– Foster a strong team culture that aligns with the organization’s values and goals.

– Ensure that the team’s culture can scale with the addition of new members.

14. Client and Stakeholder Management:

– Develop strategies for managing increased client or stakeholder interactions.

– Ensure that customer support and communication channels are scalable.

15. Regular Evaluations and Adjustments:

– Conduct regular assessments of team scalability and make adjustments based on lessons learned and changing requirements.

– Embrace a mindset of continuous improvement.

Achieving scalability for teams is an ongoing process that involves strategic planning, adaptability, and a commitment to fostering a culture that supports growth. The ability to scale effectively is crucial for meeting the demands of evolving projects, markets, and organizational goals.

Fostering Comprehensive Learning and Growth: Strategies for Individual and Team Development

Promoting both individual and team learning and growth is essential for the success of any development team. Here are some strategies to achieve this while addressing the concern of limited learning opportunities within a specific technology stack or problem domain:

1. Structured Learning Paths:

– Define clear career paths and learning objectives for team members, from junior to senior levels.

– Create structured learning paths that include both technical and soft skills development.

– Establish milestones and goals for each level within the team.

2. Cross-Training:

– Encourage cross-training where team members can learn about other roles within the team.

– Organize knowledge-sharing sessions where team members present on topics they’ve recently learned or mastered.

3. Mentorship Programs:

– Pair junior developers with senior mentors to facilitate knowledge transfer and skill development.

– Implement a mentorship program where team members can seek guidance on both technical and career-related matters.

4. Rotation Programs:

– Introduce rotation programs that allow team members to work on different aspects of the product or in different teams.

– Temporary rotations can provide exposure to new technologies and problem domains.

5. Training Budgets:

– Allocate budgets for training and development, allowing team members to attend conferences, workshops, or enroll in online courses.

– Provide resources for obtaining certifications related to the technology stack or new skills.

6. Hackathons and Innovation Time:

– Organize regular hackathons or innovation time where team members can explore new technologies or work on side projects that interest them.

– This fosters a culture of continuous learning and creativity.

7. Encourage Networking:

– Facilitate networking opportunities with professionals from other teams, departments, or even external organizations.

– Attend meetups, conferences, or industry events to broaden perspectives and bring new ideas to the team.

8. Continuous Feedback and Reviews:

– Conduct regular performance reviews and provide constructive feedback to help team members identify areas for improvement and growth.

– Use feedback sessions to discuss individual career aspirations and align them with team goals.

9. Leadership Development Programs:

– Offer leadership development programs for senior developers interested in high-level technical decision-making and leadership roles.

– Provide opportunities for them to lead projects or mentor junior team members.

10. External Learning Opportunities:

– If possible, support team members in attending external workshops, courses, or training programs to gain exposure to technologies outside the current stack.

11. Open Communication Channels:

– Foster a culture of open communication where team members feel comfortable expressing their career aspirations and discussing potential opportunities for growth.

12. Create a Learning Culture:

– Instill a learning culture within the team, emphasizing the importance of continuous improvement and staying updated on industry trends.

While it’s true that some team members may eventually need to move on to different teams or organizations to experience new challenges, creating a supportive learning environment can significantly enhance both individual and team growth without the necessity of leaving the current team. Regularly assess the effectiveness of these strategies and adjust them based on the evolving needs of the team and individuals.

Changing Roles – The T-Shaped Software Engineer

In the fast-paced landscape of digital transformation, companies are increasingly recognizing the value of versatile and adaptable teams. One key concept gaining traction is the idea of T-shaped skills, where professionals possess a broad array of competencies (the horizontal bar of the “T”) alongside deep expertise in specific areas (the vertical stem). This approach fosters a more collaborative and innovative environment, aligning well with the demands of a rapidly evolving digital landscape. Let’s explore the benefits, strategies to foster T-shaped skills, potential pitfalls, and the feasibility of applying this model to entire teams.

Benefits of T-Shaped Skills:

1. Versatility and Collaboration:

T-shaped engineers can seamlessly collaborate with individuals from diverse backgrounds. Their ability to adapt to different roles within a team enhances overall collaboration and communication.

2. Agility and Flexibility:

In dynamic environments, T-shaped professionals are more adaptable to changing project requirements. Their flexibility allows for efficient task-switching and quick adaptation to evolving needs.

3. Holistic Understanding:

T-shaped individuals possess a broader understanding of the entire development process. This comprehensive view facilitates smoother communication and collaboration across different functional areas.

4. Innovation and Problem-Solving:

The diverse skill set of T-shaped professionals fosters innovation. Their deep expertise in specific areas allows for creative and effective solutions to unique challenges.

Strategies to Foster T-Shaped Skills:

1. Training and Development:

Encourage continuous learning and provide resources for skill development in both technical and non-technical domains. This ensures that team members stay abreast of emerging trends and technologies.

2. Cross-Functional Collaboration:

Promote collaboration between team members with different expertise. Initiatives such as cross-functional projects can provide valuable opportunities for knowledge exchange.

3. Skill Rotation:

Allow team members to rotate through different roles periodically. This exposure to various aspects of the development process enhances their adaptability and skill set.

4. Mentorship:

Establish mentorship programs to facilitate knowledge transfer and skill development. Experienced team members can guide others in areas where they have deep expertise.

Things to Watch Out For:

1. Balancing Act:

Ensure that individuals maintain deep expertise in at least one area. Avoid spreading skills too thin, as this could lead to a lack of proficiency in any domain.

2. Specialization Needs:

Recognize situations where deep specialization is crucial, and do not compromise on expertise when it is essential for project success.

3. Individual Preferences:

Consider individual preferences and strengths when encouraging T-shaped skills. Acknowledge that not everyone may be interested in or suited for a broad range of tasks.

Feasibility and Considerations for the Entire Team:

1. Team Dynamics:

Assess the dynamics of the team and project. Some projects may benefit more from specialized roles, while others may require a more flexible and collaborative approach.

2. Project Requirements:

Consider the specific requirements of the projects the team is working on. Some projects may demand deep expertise in certain areas, and it’s essential to balance T-shaped skills with project needs.

3. Continuous Evaluation:

Regularly evaluate the team’s skill set and adjust strategies accordingly. This ensures that the team remains effective, adaptable, and aligned with the evolving goals of the organization.

Encouraging T-shaped skills doesn’t mean sacrificing deep expertise. It’s about striking a balance that allows for flexibility, collaboration, and innovation while still maintaining proficiency in key areas. By being mindful of team dynamics, project requirements, and individual preferences, organizations can leverage the benefits of T-shaped skills in their digital transformation journey.

Cultivating a Culture of Honest Feedback in Software Development Teams

In the dynamic and ever-evolving landscape of software development, fostering a culture of honest feedback is paramount to achieving continuous improvement. Whether it’s about software products, architectural decisions, DevOps practices, or interactions among team members and upper management, a culture that values open communication lays the foundation for innovation and success. Here are key characteristics that contribute to cultivating an environment where honest feedback is not only welcomed but encouraged.

1. Psychological Safety

Creating a psychologically safe space is fundamental to fostering open communication. Team members should feel secure expressing their thoughts and ideas without the fear of reprisal. Embrace a mindset that views mistakes as opportunities for learning rather than as grounds for blame.

2. Open Communication

Establish and maintain open channels of communication within the team. Regular team meetings, collaborative platforms, and forums can serve as effective tools to facilitate communication. Actively encourage team members to share their perspectives and insights.

3. Constructive Criticism

Promote a culture of constructive criticism where feedback is geared towards improvement rather than blame. Encourage specificity in feedback, and always provide suggestions for enhancement along with identified issues.

4. Lead by Example

Leaders and managers should model the behavior they wish to see in their teams. Actively seek and accept feedback, demonstrating humility and a genuine willingness to learn from others.

5. Regular Retrospectives

Institute regular retrospectives to reflect on the team’s performance and identify areas for improvement. Turn feedback into actionable items and track progress over time. This process not only encourages feedback but actively incorporates it into the team’s growth strategy.

6. Feedback Loops

Integrate feedback loops within development and operational processes. These loops, such as code reviews, automated testing, and monitoring systems, catch issues early and contribute to continuous improvement.

7. Anonymous Feedback Mechanisms

Provide channels for anonymous feedback to overcome potential barriers to open communication. Surveys, suggestion boxes, or other anonymous tools allow team members to express themselves freely without fear of personal repercussions.

8. Recognition of Efforts

Acknowledge and celebrate the efforts and contributions of team members. Positive reinforcement creates a conducive environment for open communication, where both successes and failures are recognized as integral parts of the learning process.

9. Training and Skill Development

Invest in training programs that enhance the team’s skills and knowledge. This investment not only improves the team’s capabilities but also contributes to a culture where continuous learning is valued.

10. Clearly Defined Goals and Expectations

Ensure that team members understand the goals and expectations of their work. Clear criteria for success enable individuals to evaluate their own performance and offer feedback to others with a shared understanding of expectations.

11. Encourage Solution-Oriented Feedback

Move beyond identifying problems and encourage team members to propose solutions. Shifting the focus from blame to collaborative problem-solving fosters a culture of improvement and innovation.

In conclusion, building a culture of honest feedback in software development teams requires intentional efforts across various dimensions. By prioritizing psychological safety, open communication, constructive criticism, and other key characteristics, teams can create an environment where feedback is not only expected but embraced as a crucial component of their journey towards excellence. This culture sets the stage for continuous improvement in software products, architecture, DevOps practices, and team dynamics.

Crafting the Ideal Team Structure

In the dynamic world of software development, team structure plays a pivotal role in shaping the success and innovation of projects. Striking the right balance between experience levels, team size, and diversity is essential for fostering creativity, collaboration, and effective problem-solving. In this article, we’ll delve into the considerations for team structure, emphasizing the importance of diversity and the benefits of a two-pizza team size.

Team Size: The Two-Pizza Principle: Attributed to Amazon's Jeff Bezos, the two-pizza team principle suggests that a team should be small enough to be fed by two pizzas, typically 6 to 8 members. A team of this size tends to communicate efficiently, collaborate well, and stay agile. A compact team also makes it easier for every member to have a meaningful role, which builds a sense of ownership and accountability.

Experience Distribution: Striking the Right Balance: Deciding on the proportion of senior, mid-level, and junior developers is a critical aspect of team structure. While there is no one-size-fits-all formula, a balanced mix can bring diverse perspectives, mentorship opportunities, and a healthy learning environment.

Suggested Distribution:

Senior Developers (30-40%): Provide leadership, mentorship, and guide architectural decisions.

Mid-Level Developers (40-50%): Execute tasks, contribute to decision-making, and bridge the gap between senior and junior members.

Junior Developers (20-30%): Bring fresh perspectives, energy, and a hunger to learn. They benefit from mentorship while infusing the team with new ideas.

Randomized Diversity: Fostering Creativity and Consideration: Randomly diversifying teams ensures a rich mix of skills, experiences, and perspectives, enhancing creativity and problem-solving. By avoiding homogeneity, teams can better tackle challenges from different angles, leading to more robust solutions. Diverse teams are also better equipped to understand and cater to a broad user base, considering various needs and preferences.

Challenges and Mitigations: While diversity is valuable, it can also present challenges such as communication barriers and differing work styles. Encouraging open communication, fostering a culture of respect, and providing team-building activities can help mitigate potential issues and build a cohesive unit.

Conclusion:

Crafting an ideal team structure for a software development team involves a delicate balance of team size, experience levels, and diversity. The two-pizza team size, combined with a thoughtful mix of senior, mid-level, and junior developers, provides a foundation for efficient collaboration and knowledge transfer. Randomized diversity amplifies creativity and consideration, essential elements for tackling the complex challenges of software development. By prioritizing these aspects, teams can cultivate an environment that not only produces high-quality software but also nurtures the professional growth and satisfaction of its members.

Career Path For Software Developers

Embarking on a career in software development is akin to entering a vast and dynamic landscape, rich with possibilities and potential directions. As developers gain experience and expertise, they often find themselves at crossroads, contemplating whether to ascend the leadership ladder, delve into architectural roles, or remain hands-on as developers. This article explores the diverse career paths within the software development realm, emphasizing the importance of individual interests and strengths in making career decisions.

The Fork in the Road: People Leadership vs. Architectural Roles: As software developers progress in their careers, two prominent paths emerge: people leadership and architectural roles. People leaders guide and manage teams, fostering collaboration and ensuring project success. Architects, on the other hand, design and shape the technical landscape, which requires a balance of technical prowess and effective communication.

Pros and Cons:

People Leadership:

Pros: Opportunity to influence team culture, mentorship, and overall project success.

Cons: Potential detachment from hands-on coding, increased managerial responsibilities.

Architectural Roles:

Pros: In-depth technical involvement, shaping the project’s technological direction.

Cons: Balancing technical depth with people skills, potential isolation from day-to-day team dynamics.

The Role of Coding in Architectural Positions: Architects, while not always hands-on with coding in the traditional sense, benefit from maintaining a connection to the codebase. This involvement ensures a practical understanding of the project's intricacies, facilitates effective communication with the development team, and helps in making informed architectural decisions.

Pros and Cons:

Pros: Grounded decision-making, easier collaboration with the development team.

Cons: Time constraints, potential divergence from coding practices.

Staying Grounded as a Software Developer: The Pros and Cons:

Some developers opt to remain focused on coding, rejecting the managerial or architectural path. This choice offers its own set of advantages and challenges.

Pros and Cons:

Pros: Deep technical expertise, constant coding engagement, potential for specialization.

Cons: Limited influence on high-level decisions, potentially slower career progression.

The Importance of Self-Awareness and Personal Interests: Ultimately, the best career path is one aligned with an individual's strengths, interests, and aspirations. Developers should reflect on whether they find joy in guiding teams, shaping architectures, or diving into intricate code. Continuous self-assessment and a keen awareness of personal preferences are crucial in making informed career choices.

Embracing a Hybrid Approach: Some software professionals find fulfillment in a hybrid model, where they balance leadership or architectural responsibilities with hands-on coding. This allows for a diversified skill set, ensuring both technical depth and broader organizational impact.

Conclusion:

The software developer’s career path is a unique journey shaped by individual choices, interests, and strengths. Whether one chooses the leadership, architectural, or coding-centric route, success lies in self-awareness and a commitment to continuous growth. Architects can benefit from maintaining a hands-on connection to coding, and developers who choose to stay in coding roles should not perceive it as a limitation but as a pathway to deep expertise. In the ever-evolving landscape of software development, there is no one-size-fits-all solution, and the key lies in embracing a career path that aligns with one’s passion and vision for professional fulfillment.

Command and Order Vs. Empower the Team

Leadership style plays an important role in software development team culture. There's the traditional "Command and Order" approach, where a centralized authority dictates tasks and decisions. On the other side, there's the more modern "Empower the Team" approach, which emphasizes collaboration, autonomy, and shared decision-making. We all want to empower the team, and given the benefits there is no doubt about it, but that doesn't mean "Command and Order" never comes into play. Let's explore both.

In a crisis, or during regular operations when an incident happens, quick decisions and clear direction are essential to get the team to respond quickly and resolve issues. It is not the time to experiment and test team members' abilities, even though incidents are very good learning opportunities for the team.

The drawback is that whenever something happens, the team always looks to one or two key people for direction and cannot, or is not confident enough to, come up with its own resolutions in a timely manner. It is important to let team members grow in handling these situations. For incidents whose severity is not very high, the key people on the team should step back a little, staying in the loop but giving others a chance to resolve issues on their own.

Another time the team needs direction or guidance instead of being left alone is the early stage of a project, so the team doesn't waste time going in the wrong direction. But again, it depends on whether going in the wrong direction can be considered a step toward learning. Always giving the team direction has two costs: first, the key people become a bottleneck for the team's operation; second, it doesn't help the team grow in the long term.

Empowering the team not only lets it grow; its decisions can also be better, because the people closest to the details of the work tend to know more about it than anyone else. It also increases job satisfaction when team members can make their own decisions. Letting the team grow is the only way it will be able to adapt quickly to change.

However, an empowered team culture has pitfalls. Too many differing opinions from the team can result in indecision, so there needs to be a balance between autonomy and someone providing guidance and direction.

For the same reason, when communication is open but there is no proper structure for decision-making, an overly empowered team may struggle to coordinate its efforts.

Another thing to watch out for is leaders' personalities. Some leaders lean toward a command-and-control style because of their personality, and in that case it is hard for the team to push back. Culture belongs to the whole team, but leaders play an important role in shaping it.

Strategies for Rolling Out Service Changes Gradually and Supporting Backward Compatibility

In today's microservice world, we deploy frequently: make minimal changes, then push to prod. However, sometimes we still want to roll out a feature gradually to reduce the impact if anything goes wrong.

I. Scenario: We are changing the implementation inside a service. For example, instead of calling one API, the service now calls a different third-party API.

Or the service used to call a provider's SOAP service, and that external service has since been migrated to REST, so our service is changing to call the REST service instead. In this scenario there is no contract change and no endpoint change for our API, so clients do not need to make any changes. Still, things could go wrong: although we do a lot of testing, something may not be covered.

So how do we roll it out gradually to reduce risk?

Solution a: Istio – weight-based routing that sends a percentage of traffic to each deployment/service

The pipeline needs to be tweaked to deploy the new version of the code as a separate deployment with its own name, so that both versions run side by side.

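The following example VirtualService sends 75% of traffic to v1 and 25% of traffic to v2. The v1 and v2 subsets it refers to are defined in a separate DestinationRule, usually by matching a version label on the pods. Shifting more traffic to v2 is then a matter of changing the weights.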

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 75
    - destination:
        host: reviews
        subset: v2
      weight: 25

Solution b: Code – put a condition at the top level (controller/handler) that routes each request to either the old or the new implementation.

A simple random number can decide where each request goes, so that roughly 20% of the traffic randomly goes to the new implementation while the rest still goes to the old one.
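A minimal Python sketch of Solution b follows. The functions call_old_provider and call_new_provider are hypothetical stand-ins for the old (SOAP) and new (REST) implementations; nothing here is a real provider API. The percentage is a plain constant for illustration; in practice it would likely come from configuration so it can be raised without a redeploy.

```python
import random
from typing import Callable

# Hypothetical stand-ins for the two implementations in Scenario I:
# the old one calls the provider's SOAP service, the new one its REST service.
def call_old_provider(order_id: str) -> dict:
    return {"order_id": order_id, "source": "soap"}

def call_new_provider(order_id: str) -> dict:
    return {"order_id": order_id, "source": "rest"}

# Share of requests (in percent) routed to the new implementation.
NEW_IMPL_PERCENT = 20

def route_request(order_id: str, rng: Callable[[], float] = random.random) -> dict:
    """Controller-level switch: the API contract is unchanged, only the backing call differs."""
    # rng() returns a float in [0.0, 1.0), so this branch is taken for about 20% of requests.
    if rng() * 100 < NEW_IMPL_PERCENT:
        return call_new_provider(order_id)
    return call_old_provider(order_id)

if __name__ == "__main__":
    random.seed(7)
    hits = sum(route_request("A1")["source"] == "rest" for _ in range(10_000))
    print(f"{hits} of 10000 requests went to the new implementation")
```

Because each request is decided independently, the same user can hit both implementations. Hashing a stable key such as a user ID instead of drawing a random number would keep each user on one side, which can make problems easier to trace.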

II. Scenario: The API contract has small changes.

a. Add new fields to the response – this does not break clients. It simply makes new fields available, and each client can decide when to consume them.

b. Add new fields to the request – as long as these fields are optional (not marked as required, and allowed to be null or absent), existing client requests still reach the API logic. Whether this requires coordination with clients is decided case by case.

c. Delete or remove fields from the request or response – this will break clients, so coordination is needed.

d. Rename a field – this breaks clients.

e. Change a field's type – this breaks clients. For example, a field that used to be a boolean is changed to a String.
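The rules in (a) to (e) can be written down as a small checker. This is a hypothetical Python sketch, not a real schema tool: each contract is modeled as a dict from field name to a Field with a type name and a required flag. A rename shows up as a removal plus an addition, which is why (d) is caught by the same check as (c).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    kind: str              # type name, e.g. "boolean" or "string"
    required: bool = False

def breaking_changes(old: dict[str, Field], new: dict[str, Field], is_request: bool) -> list[str]:
    """List reasons why moving from `old` to `new` would break existing clients."""
    problems = []
    for name, old_field in old.items():
        if name not in new:
            # (c) and (d): a deleted field and a renamed field both disappear from `new`.
            problems.append(f"'{name}' removed or renamed")
        elif new[name].kind != old_field.kind:
            # (e): same name, different type.
            problems.append(f"'{name}' changed type {old_field.kind} -> {new[name].kind}")
    if is_request:
        for name, new_field in new.items():
            # (b): only a new *required* request field rejects requests from old clients.
            if name not in old and new_field.required:
                problems.append(f"new required request field '{name}'")
    # (a): new response fields and new optional request fields are not reported.
    return problems
```

An empty list means the change falls under (a) or the optional case of (b) and can usually ship without coordinating with clients.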

III. Scenario: A huge structural change in the API contract

This could be a completely refactored request/response structure and set of fields. Let me put it this way: do whatever you can at the beginning to have a formal contract, such as a Swagger (OpenAPI) YAML file. Scrutinize the request and response. Mercilessly refactor them if needed before the API goes to prod, or before more than one client is using it. Any contract change after that point is painful and requires a lot of coordination with clients, unless your API has only one client and your team also works on that client.

IV. Strategies for dealing with API contract changes that could break clients

Just saying: if the contract change is that big because of a business requirement, why not make it a new API instead of updating the old one? That would be a simple approach. In most cases, this type of change is very infrequent; it likely happens only once every several years.

An API can have many clients, whether different apps or other services that call it. When a new version of the API introduces contract changes that break clients, those clients keep using the old contract until they are ready to switch to the new version. Until the last client has moved to the new contract, or until the deadline the API owner set for ending support of the old version, we need to keep supporting the older contract structure. This is called backward compatibility for the API.
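The retirement rule above (keep the old contract until the last client has moved or the announced deadline has passed) can be made concrete with a small Python sketch. It assumes, hypothetically, that each request can be attributed to a client ID and an API version, for example from access logs; the function names are illustrative only.

```python
from datetime import date
from typing import Iterable

def clients_still_on(version: int, calls: Iterable[tuple[str, int]]) -> set[str]:
    """Return the clients seen calling `version`; `calls` holds (client_id, version) pairs."""
    return {client for client, v in calls if v == version}

def can_retire(version: int, calls: Iterable[tuple[str, int]], support_ends: date, today: date) -> bool:
    """The old contract can go once no client uses it, or once its end-of-support date arrives."""
    return not clients_still_on(version, calls) or today >= support_ends
```

The set returned by clients_still_on doubles as the list of teams to follow up with before the deadline.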

Then how do we support backward compatibility?

a. API versioning. This is the oldest and still essential way to deal with breaking changes.

This is commonly done with a different endpoint that includes the version number in the path, for example: /api/myawesomeapi/v2. There are other ways to specify the version, such as a query parameter or a header.
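Whichever option is used, a controller or gateway has to pull the requested version out of the request. The Python sketch below checks the path, then a query parameter, then a header, falling back to version 1. The parameter name `version`, the header name `X-API-Version`, and that order of precedence are assumptions for illustration, not a standard.

```python
import re
from typing import Mapping

# Matches a path segment like "/v2" that is followed by "/" or the end of the path.
PATH_VERSION = re.compile(r"/v(\d+)(?=/|$)")
DEFAULT_VERSION = 1  # unversioned requests get the original contract

def resolve_version(path: str, query: Mapping[str, str], headers: Mapping[str, str]) -> int:
    match = PATH_VERSION.search(path)
    if match:
        return int(match.group(1))
    if "version" in query:  # hypothetical query parameter, e.g. ?version=2
        return int(query["version"])
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-api-version" in lowered:  # hypothetical header name
        return int(lowered["x-api-version"])
    return DEFAULT_VERSION
```

Defaulting unversioned requests to version 1 keeps clients that predate versioning working without changes.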

This type of versioned backward compatibility can be supported in the following ways:

a.1. Separate deployments for each version

In Kubernetes terms, make the new version a separate deployment with its own pods, while the older version's deployment keeps running in the same cluster. Each endpoint routes requests to its corresponding version.

Clients then choose which version to point to, and the service supports both versions simultaneously, with two or more deployments running in the cluster.

Routing each endpoint to the correct deployment can be done with either Istio or Akamai.

a.2. Use code to do the routing

Since it is still the same API, the different versions share a lot at the core. At a higher level of the code, usually the controller, requests are sent to their version-specific handlers. An adapter pattern can then be used so that all versions eventually reach the same core of the API.
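A Python sketch of a.2, under a hypothetical breaking change in which v2 renamed the `membership` field to `membershipLevel`. The controller picks a handler by version; each handler is an adapter that translates its own contract into one shared User model, so register_user (a stand-in for the core logic) is written only once.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class User:
    first_name: str
    last_name: str
    membership: str

def register_user(user: User) -> dict:
    """Shared core logic (hypothetical stand-in); every API version ends up here."""
    return {"registered": f"{user.first_name} {user.last_name}", "membership": user.membership}

# Adapters: translate each version's request contract into the core model.
# Hypothetical breaking change for illustration: v2 renamed "membership" to "membershipLevel".
def handle_v1(body: dict) -> dict:
    return register_user(User(body["firstName"], body["lastName"], body["membership"]))

def handle_v2(body: dict) -> dict:
    return register_user(User(body["firstName"], body["lastName"], body["membershipLevel"]))

HANDLERS: dict[int, Callable[[dict], dict]] = {1: handle_v1, 2: handle_v2}

def controller(version: int, body: dict) -> dict:
    """Top-level controller: pick the handler for the version the client asked for."""
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported API version: {version}")
    return handler(body)
```

Retiring v1 later means deleting handle_v1 and its entry in HANDLERS; the core logic is untouched.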

b. Using code to support more than one version without endpoint changes

This technique only works well when the breaking change is small, such as a change to one or two fields or a small structural change.

For a small structural change in the request, support both the old and the new format.

old request

{
    "firstName": "John",
    "lastName": "Smith",
    "dateOfBirth": "1990-01-16",
    "membership": "Premium"
}

new request format

{
    "users": [
        {
            "firstName": "John",
            "lastName": "Smith",
            "dateOfBirth": "1990-01-16",
            "membership": "Premium"
        },
        {
            "firstName": "Mary",
            "lastName": "Smith",
            "dateOfBirth": "1990-01-16",
            "membership": "Premium"
        }
    ]
}

For backward compatibility, we can add the new structure to the existing request without removing the old fields, so the API can handle both types of request. Remove the old fields only when all clients are using the new structure.

{
    "firstName": "John",
    "lastName": "Smith",
    "dateOfBirth": "1990-01-16",
    "membership": "Premium",
    "users": [
        {
            "firstName": "John",
            "lastName": "Smith",
            "dateOfBirth": "1990-01-16",
            "membership": "Premium"
        },
        {
            "firstName": "Mary",
            "lastName": "Smith",
            "dateOfBirth": "1990-01-16",
            "membership": "Premium"
        }
    ]
}

The adapter pattern can still be used here: take a request in the old format and convert it into a list of users as well, so both versions go through the same core logic.
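A minimal Python sketch of that adapter, using the field names from the JSON examples above. register_users is a hypothetical stand-in for the core logic. When a request carries a non-empty `users` list, this sketch treats it as the new format and ignores the old top-level fields; that precedence is an assumption, and a real API might instead reject requests that mix the two.

```python
from dataclasses import dataclass

@dataclass
class User:
    first_name: str
    last_name: str
    date_of_birth: str
    membership: str

def _to_user(d: dict) -> User:
    return User(d["firstName"], d["lastName"], d["dateOfBirth"], d["membership"])

def to_users(body: dict) -> list[User]:
    """Adapter: normalize either contract into the list the core logic expects."""
    if body.get("users"):       # new contract: a list of users
        return [_to_user(u) for u in body["users"]]
    if "firstName" in body:     # old contract: wrap the single user in a list
        return [_to_user(body)]
    raise ValueError("request has neither 'users' nor the old single-user fields")

def register_users(users: list[User]) -> int:
    """Hypothetical core logic; here it just reports how many users were processed."""
    return len(users)

def handle_request(body: dict) -> int:
    return register_users(to_users(body))
```

Once every client sends `users`, the old-contract branch in to_users is the only code to delete.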

This way, we can support both versions without providing a different endpoint. Whether a client sends a request in the old contract format or the new one, both are supported and handled by the upper layer of the code.

This only works when the contract changes are very limited and easy to keep track of. Once all clients have moved to the new contract, the old contract's fields and the code supporting them can be cleaned up so that only the new contract is supported.

V. Final Words

Whichever strategy you use to handle contract changes, supporting more than one version of a contract is a hassle. For public APIs, or APIs called directly by mobile apps or other types of apps, there is no choice, because it is hard to control when clients will move to the newer version. Having separate deployments keeps the code isolated and provides more stability to the API.

For a company's own service APIs, where other internal teams are the callers, or where there is only one client, it is much easier to collaborate with those clients and help them move to the new contract. In other words, these cases are similar to trunk-based development: it is better to have just a single version. If another version is temporarily needed, keep it short-lived so the API can return to a single version of the contract.

Striking the Balance: Standardization vs. Flexibility in Managing Multiple Squads

A product team with multiple squads, or an organization with many teams, often faces the challenge of finding the right balance between standardization and flexibility. Each approach has its own pros and cons, affecting team efficiency, communication, and adaptability. Weighing the tradeoffs and finding the optimal balance between standardization and flexibility is an art, and it is also a case-by-case decision.

Standardization brings consistency. It makes decision making efficient: teams don't have to spend a lot of time researching on their own or debating which process or technology to adopt. It also makes it easier to move people between teams and squads, because when teams follow common standards, members moving to a new team quickly find familiar ground.

However, over-standardization can result in rigidity, hinder teams' ability to find their own solutions, and curb their creativity and innovative thinking.

Giving teams flexibility in choosing their processes and technologies empowers them to make their own judgments based on their specific needs and preferences. It fosters the team's ability to decide and to adapt rapidly to changing requirements and priorities.

As a result, however, each team becomes different from the others. That is supposed to be a good thing, but it can lead to a lack of consistency across teams. And every new member of a team, even one from the same organization, has to go through a learning curve to adapt to the new team.

There is no single perfect answer for what should be standardized and what should be decided by each team; it is probably a matter of constant adjustment. But a culture that empowers teams, encourages sharing information among teams, and provides an easy feedback loop and a safe environment for team members to speak up will help get to the optimal point.