Organisational transformation often results in large software integration projects, in which several systems exchange data to provide services to users. While on the surface this may seem to be a technical challenge, the main problems lie outside of pure point-to-point data exchange. They are related to organisation structure and delivery maturity, which can vary between different parts of the system.
One of the typical approaches to solving complex integration problems is the introduction of an ESB (Enterprise Service Bus). Open source or proprietary integration engines like ESBs cover the technical aspects of the solution, including message/request routing, data transformations, error handling and monitoring. The ESB is configured, extended and managed by a central team. It becomes the heart of the integrated system.
The centralisation process may seem to provide several benefits, but it can undermine delivery effectiveness by creating a single point of failure and a delivery bottleneck, as a central team controls the way software is developed and released.
Key challenges in integration
- Domain model complexity and functional limitations of the apps
Calling an endpoint exposed through REST, SOAP or another protocol is usually a simple activity, but how do you ensure that all applications work correctly in the environment? The data may be passed through synchronous, asynchronous and batch processes. It can be produced/consumed through different channels at the same time. Solving the technical challenges around integration does not ensure that apps will behave correctly. The flow of data throughout many systems may have surprising results.
Building large environments which include all of the components is not cost-effective and requires significant effort to manage. Strong centralisation results in high coupling. You end up building more and more environments and coordinating deployments.
- Test data
With complex domains like financial, healthcare or business-sensitive information, it’s very challenging to create good test data sets. Many issues that show up during integration or testing are not related purely to the interface specification but to data caveats which are not defined at the interface level.
- Vendor/department defensiveness
In many situations, the transformation of the organisation may require working in a hostile environment with conflicting interests. It’s vital to create a shared goal and understanding. The problem of collecting test data or connecting to a particular service may be far from a technical one, and the priority of the project may not be shared between parties.
- Quality Assurance
Complex work environments have a significant impact on testing. The QA process is deferred and executed on a unified, integrated environment. The common agreement is that the sooner you catch an issue, the cheaper it is to resolve, so a lack of good early tests results in many issues during end-to-end integration and slows down delivery. Coupling between systems, teams and releases has a severe impact on cycle time and overall quality.
- Software releases
Each component which is part of the integration will have its own release lifecycle and versioning scheme. Releasing software is hard, and integrated systems suffer from higher complexity. Failures to release software result in practices which do not solve the problem: more control, more coordination, scheduled releases. It means that you will go only as fast as the slowest link.
Conway’s law — understanding communication
The list of challenges is not exhaustive, and we could dive into a discussion about error handling, transactions, security, synchronous vs asynchronous data exchange or protocols, but at the most basic level you should start with the organisation and not with the software.
“Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations”
– Melvin Conway
I could take any project I have worked on in the past and trace design decisions which were clearly a result of the intra-organisation communication chain. An ESB is an example of such a solution: you have disconnected systems managed by departments in isolation, and you put in place a central team which is supposed to connect them. This kind of approach won’t handle the challenges that you will see on the way.
Only an organisational shift will allow you to fully achieve your goals.
What does good look like?
There may be specific reasons to use products like ESBs or specialised integration engines, but they tend to solve technical problems while reinforcing challenges at the organisational level. Integration is never a plug-and-play activity. If you create a work environment which is highly controlled and there is high coupling between the teams, the chances that the project will fail are greater.
Now let’s look at it from a different perspective. What does good look like?
You want to have stable environments with a clear path to production.
The maturity/quality of each application will differ. You can encounter all kinds of limitations, from lack of knowledge, through inadequate licensing, to missing CI/CD pipelines or even basic tests. It is not reasonable to assume that all of those issues will be resolved. Quite the opposite: those issues at the service level will amplify the problems during integration.
Typical microservice architecture requires isolated deployment pipelines. This is a basic building block required for the stability of delivery and one of the key elements required to deliver the integration at a high pace. Instead of centralising and gating deployment at the environment level, ensure that each service can be delivered in a repeatable way in isolation.
This has to be set as a basic expectation of each service that is part of the larger integrated ecosystem, as teams may have completely different work ethics.
If you want to minimise the number of issues, test and catch them as early as possible.
A fully integrated system can be tested end-to-end only when all components are working together. So how can we push testing to an earlier phase? One of the techniques used by teams delivering microservices is Consumer-Driven Contracts (CDC). We won’t be able to test all services together before deployment, but it’s possible to enforce that both the consumers and producers of an integration cover contract testing before releasing software.
In many integration projects, the contracts at the interface level are the only thing exchanged between the teams. If you can ensure that the team exposing an interface and the teams that consume the API share test cases and execute them in their CI/CD pipelines, then you have much greater confidence that all parties understand how the integration will work.
- Test data is shared between consumer/producer
- Contracts can be validated before testing on the end-to-end environment
- Some basic stubs/mocks can be used to help with development
Techniques like these allow you to validate early instead of pushing QA to the end of the cycle, and they promote close cooperation between teams.
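A minimal sketch of the contract-testing idea, with hypothetical field names and a hand-rolled check (real projects typically use dedicated tooling such as Pact): the consumer publishes the fields it relies on, and the producer validates its real responses against that contract in its CI pipeline.

```python
# Hypothetical consumer-driven contract: the fields the consumer
# relies on, with the types it expects.
consumer_contract = {
    "id": int,
    "email": str,
    "active": bool,
}

def validate_against_contract(payload: dict, contract: dict) -> list[str]:
    """Return the list of contract violations found in a producer payload."""
    violations = []
    for field, expected_type in contract.items():
        if field not in payload:
            violations.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            violations.append(f"wrong type for {field}")
    return violations

# The producer runs this in its CI pipeline against an actual response.
producer_response = {"id": 42, "email": "a@example.com", "active": True, "extra": "ok"}
assert validate_against_contract(producer_response, consumer_contract) == []

# A breaking change (e.g. renaming "email") fails the build before release.
broken_response = {"id": 42, "mail": "a@example.com", "active": True}
assert validate_against_contract(broken_response, consumer_contract) == ["missing field: email"]
```

Note that extra fields in the response do not break the contract; the consumer only pins down what it actually uses, which leaves the producer free to evolve.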
You want to ensure that integrated components can be deployed independently without causing issues in related applications.
If you look at a traditional ESB deployment, it has a central bus which expects that the systems are connected. If you are lucky, you are connecting existing integration points, but you can end up in a situation where you build bridges between shores that do not exist yet. Another problem is the stability of some of the components and the time required to build a fully connected solution.
The solution must be built with resilience in mind. Stability patterns which are very popular in microservice architectures, such as circuit breakers, will protect the integration engine or point-to-point integration against cascading failures. This is good enough to handle error situations gracefully, but normal operation requires processes that decouple the release of features from the release of a particular component.
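The circuit breaker is the canonical stability pattern here. A minimal sketch, with illustrative thresholds (production systems would use a library or a service mesh): after a number of consecutive failures the breaker opens and callers fail fast instead of piling load onto a struggling downstream system.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after max_failures consecutive errors,
    fail fast while open, allow one trial call after reset_timeout."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```

The caller gets a fast, explicit error instead of a slow timeout, which is what stops one failing component from cascading through the integrated system.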
By using techniques like feature toggling with resilient services, you can avoid a big bang at the end of the integration project and gradually roll out integrated systems. It’s really important to define not only how the systems will be connected but how the features will be rolled out. Each team may have issues delivering some of its commitments. Those scenarios have to be handled gracefully, or they will impact the overall delivery.
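A feature toggle can be as simple as a flag checked at the integration point. In this sketch the flag name, functions and payload are illustrative; the point is that the new integration code ships dark, and the roll-out is controlled by flipping the flag, independently of any component release.

```python
# Hypothetical feature flags, e.g. loaded from config or environment.
FLAGS = {"use_new_billing_feed": False}

def is_enabled(flag: str) -> bool:
    return FLAGS.get(flag, False)

def publish_invoice(invoice: dict) -> str:
    """Route the invoice via the new integration only when the flag is on."""
    if is_enabled("use_new_billing_feed"):
        return f"sent {invoice['id']} to new billing feed"
    return f"sent {invoice['id']} via legacy export"

print(publish_invoice({"id": "INV-1"}))  # legacy path while the flag is off
FLAGS["use_new_billing_feed"] = True     # flip the flag to roll out, no redeploy
print(publish_invoice({"id": "INV-1"}))  # new integration path
```

If the downstream team slips, the flag simply stays off and the release proceeds; flipping it back is also the graceful fallback when the new path misbehaves.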
You want to deliver apps/integrations as fast as possible with teams working as effectively as possible.
There are plenty of factors when it comes to team efficiency. The usual common-sense practices like small, autonomous teams still apply, but the real challenge is around the setup of the teams and how they interact with each other.
The techniques described so far focus on making delivery as asynchronous as possible, using tooling and approaches that allow teams to deliver without coordination. Setting similar expectations across teams and giving them space to deliver is key. Too much coordination and control, without setting high standards, won’t allow them to go as fast as possible.
The post is about architecture — so where is it?
Evolving the organisation, acknowledging that this is the right direction, and setting the expectation to follow a sensible set of practices will allow you to build a more reliable solution. It’s almost impossible to provide a simple answer that will be right in all scenarios, especially because of differences in tooling, skills, cloud adoption, and so on. Let’s try to define a common pattern that we’ve used on projects.
Fact-based decision making rather than command and control.
If you think about the lowest possible coupling between systems, then one idea stands out: storing events in a queue/streaming solution. An event represents a fact about what actually happened in the system. It’s just data representing some small change. Immutable data flowing through a solution like Apache Kafka is great for decoupling systems and leveraging the autonomy and effectiveness of the teams.
This central system is not responsible for any of the things ESB would handle besides allowing systems to exchange the data in a reliable way. For example, if you already have information about users stored on the stream and you need to add a new system, you can reuse the existing data flow without introducing other changes. All of the responsibilities are passed to the new system which allows smooth evolution of the solution.
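The user-data example above can be sketched with an in-memory append-only log standing in for a topic in a system like Apache Kafka (the event shapes and names are illustrative): the producer keeps writing immutable facts, and a system added later simply replays the log from the beginning to build its own view, with no change to anyone else.

```python
# In-memory stand-in for an append-only event stream/topic.
event_log = []

def publish(event: dict) -> None:
    event_log.append(event)  # events are immutable facts, never updated in place

# An existing producer has been emitting user events for some time.
publish({"type": "user_created", "id": 1, "email": "a@example.com"})
publish({"type": "user_created", "id": 2, "email": "b@example.com"})

# A new system added later replays the whole log and builds its own state;
# the producer and the other consumers are untouched.
def build_user_view(log: list) -> dict:
    view = {}
    for event in log:
        if event["type"] == "user_created":
            view[event["id"]] = event["email"]
    return view

assert build_user_view(event_log) == {1: "a@example.com", 2: "b@example.com"}
```

The responsibility for interpreting the data sits entirely with each consumer, which is exactly the inversion relative to an ESB: the central piece only stores and delivers facts.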
Although introducing the concept of a message queue solves many problems, it is important to understand that the integration of several systems at a large scale is more about organisational change than a technical solution, especially if you want to minimise the risk of failure and focus on the pace of delivery. It is also worth noting that this post is focused on larger-scale integrations and may not be applicable to all situations.
In conclusion, adopting agile techniques with cloud-native tooling, and focusing on making sure that the teams involved in the process are empowered and autonomous, will yield much better results than picking a product that moves data between systems.