Data Synchronization Patterns

The way to build better client-server apps

A software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. A design pattern is not a finished design that can be transformed directly into source or machine code. Rather, it is a description or template for solving a problem that can be used in many different situations. Design patterns are formalized best practices that programmers can use to solve common problems when designing an application or system. In this article, I will show you one family of such patterns.

The Data Synchronization Patterns catalog was introduced in 2012 by Zach McCormick and Douglas Schmidt in a paper titled “Data Synchronization Patterns in Mobile Applications Design”. A lot has changed since then: operating systems such as Nokia's Symbian and Microsoft's Windows Mobile are no longer supported. They were replaced by Android with its Java SDK and iOS with its Objective-C SDK. Almost a decade has passed, and progress has not stood still. Today you no longer need to know Java, Objective-C, or Swift to develop mobile applications, because we have NativeScript and Flutter; nor do you need C#, Java, or C++ to develop desktop applications, because we have Electron and Proton Native. And that is not the limit: Progressive Web Applications need nothing more than a browser.

But along with this wealth of opportunities, front-end developers have also inherited problems from related areas, and in this article we will review them.

Data Synchronization Mechanism Patterns

There are actually only a few data synchronization patterns, and they fall into three categories: synchronization mechanisms, data storage and availability, and data transfer. Each category answers its own question and offers several options for solving it.

In my opinion, the simplest category is data synchronization mechanisms. The patterns in this category answer the question “when should an application synchronize data?”. The question seems trivial, but the answer depends directly on the context in which the application is used.

Asynchronous Data Synchronization

One of the main challenges we face as developers today is fast access to data. Responsiveness and latency are two key factors that determine how quickly you, as a user, can access the data. An unresponsive or slow-responding app leads to a poor user experience. However, even if the application responds quickly to user input, the user will be frustrated if they have to wait a long time for data to load. Therefore, it is important to make sure that the application is not blocked while data is being synchronized.

As you can see in the state diagram, once the application is ready to use (that is, once the user can interact with it), a sync event is triggered and the application immediately returns to a working state.

The main advantage of this solution is obvious: the application stays available during data synchronization, because synchronization happens in the background. But do not forget its pitfalls: concurrent access to a shared data set can lead to inconsistencies, and since the user does not know how much data is being transferred, background synchronization may cause network congestion.
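
A minimal TypeScript sketch of asynchronous synchronization, assuming a hypothetical `fetchRemote` function that stands in for a real network call. `triggerSync` starts the request and returns immediately, so the UI keeps working with the last known data; the in-flight guard is one simple way to stop two overlapping syncs from racing on the same data set, not a complete answer to concurrent access.

```typescript
// Hypothetical network call; a real app would wrap fetch() or a vendor SDK.
type Fetcher<T> = () => Promise<T>;

class AsyncSynchronizer<T> {
  data: T; // last known data, always readable by the UI
  private readonly fetchRemote: Fetcher<T>;
  private inFlight: Promise<void> | null = null;

  constructor(fetchRemote: Fetcher<T>, initialData: T) {
    this.fetchRemote = fetchRemote;
    this.data = initialData;
  }

  get isSyncing(): boolean {
    return this.inFlight !== null;
  }

  // Starts a background sync and returns immediately, so nothing is blocked.
  triggerSync(): void {
    // Skip if a sync is already running, so two syncs never race on `data`.
    if (this.inFlight) return;
    this.inFlight = this.fetchRemote()
      .then((fresh) => {
        this.data = fresh; // swap in fresh data once it arrives
      })
      .catch((err: unknown) => {
        // Keep the stale data; a real app might schedule a retry here.
        console.warn("Background sync failed:", err);
      })
      .finally(() => {
        this.inFlight = null;
      });
  }

  // Resolves once the current background sync, if any, has finished.
  whenIdle(): Promise<void> {
    return this.inFlight ?? Promise.resolve();
  }
}
```

The UI can read `data` at any moment, and `isSyncing` is enough to drive a small, non-blocking progress indicator.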

In fact, this category is the hardest to illustrate, because it is so context-dependent, but of course I will give a few examples. The most striking one, which came to us from the world of mobile development, is error logging: when a critical error occurs in the application, it is sent in the background to an error logging system.
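
Background error reporting can be sketched as a local queue that is flushed in batches without blocking the code that reported the error. The `send` transport is hypothetical (in a browser it could wrap `fetch` or `navigator.sendBeacon`), and the batch size and the "put failed batches back in the queue" policy are illustrative choices, not the behaviour of any particular logging product.

```typescript
// Minimal shape of an error report; real tools attach much more context.
interface ErrorReport {
  message: string;
  timestamp: number;
}

// Hypothetical transport that delivers one batch to a logging backend.
type Transport = (batch: ErrorReport[]) => Promise<void>;

class BackgroundErrorReporter {
  private queue: ErrorReport[] = [];
  private flushing = false;
  private readonly send: Transport;
  private readonly batchSize: number;

  constructor(send: Transport, batchSize = 10) {
    this.send = send;
    this.batchSize = batchSize;
  }

  // Called from error handlers; returns immediately and never throws.
  report(error: Error): void {
    this.queue.push({ message: error.message, timestamp: Date.now() });
    void this.flush();
  }

  // Sends queued reports in batches; a failed batch is kept for a later attempt.
  async flush(): Promise<void> {
    if (this.flushing) return; // the running flush will pick up new reports
    this.flushing = true;
    try {
      while (this.queue.length > 0) {
        const batch = this.queue.splice(0, this.batchSize);
        try {
          await this.send(batch);
        } catch {
          this.queue.unshift(...batch); // keep the reports, stop for now
          break;
        }
      }
    } finally {
      this.flushing = false;
    }
  }

  get pending(): number {
    return this.queue.length;
  }
}
```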

As a more contrived example, imagine a small e-commerce application consisting of a tree of categories and products. The category structure is fairly static and can be stored at the application level, while products are more dynamic data. Whether products are loaded page by page within a category or with an “infinite scroll” approach, it is more appropriate to load them asynchronously.

Synchronous Data Synchronization

But there is another side to it. For some applications, it is critical to have a specific set of data before the user is allowed to interact with the application. This can be the core dataset the application is built on, or data that must be accurate in real time.

This synchronization mechanism is also depicted in the state diagram, which shows that after reaching a component that requires synchronization, the application enters a state in which the user must wait for synchronization to finish before interaction becomes available again.

The trade-off here is just as obvious: such a state loop is easy to manage, but it naturally degrades the user's interaction with the application.

A good example of synchronous synchronization is an application that checks user access rights. Let’s say we have an application with a monthly subscription. Before granting full access to its functionality, it makes sense to block the application until we have confirmed that the subscription is active.
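
A sketch of the subscription example as synchronous synchronization: the application sits in a blocking "syncing" state, and features are unlocked only after the check completes. `checkSubscription` is a hypothetical server call, and failing closed (treating an error as "no access") is one possible policy, not a requirement of the pattern.

```typescript
type GateState = "syncing" | "granted" | "denied";

// Hypothetical server call that reports whether the subscription is active.
type SubscriptionCheck = (userId: string) => Promise<boolean>;

class SubscriptionGate {
  state: GateState = "syncing";
  private readonly checkSubscription: SubscriptionCheck;

  constructor(checkSubscription: SubscriptionCheck) {
    this.checkSubscription = checkSubscription;
  }

  // The app awaits this before rendering anything interactive.
  async unlock(userId: string): Promise<GateState> {
    this.state = "syncing"; // the UI shows a blocking loader in this state
    try {
      const active = await this.checkSubscription(userId);
      this.state = active ? "granted" : "denied";
    } catch {
      this.state = "denied"; // fail closed if the check cannot complete
    }
    return this.state;
  }

  canUse(): boolean {
    return this.state === "granted";
  }
}
```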

Returning to our online store, an example of such an action is the administrator's work with products and orders, since accuracy is critical in that part of the application.

Data Storage and Availability Patterns

So, we have figured out how the application interacts with the user while synchronizing data. But besides the question “when to synchronize data”, there is an equally important one: “how to store data” so that it is as available as possible. Some applications need only a small set of data to work, for others the data set is so large that it is simply impossible to store it on the device, and others are in a completely different situation because they manipulate real-time data.

The data storage and availability category is also fairly simple from an architectural point of view and, again, consists of only two patterns.

Partial Storage

The main problem of every client application is resource constraints, such as network bandwidth, physical storage, or platform limits on resource use. On the server side, we could easily solve these issues by increasing the bandwidth of the communication channel or adding memory and processor resources, but on the client side we simply do not have that option.

This diagram shows the sequence for accessing data that is not yet stored in the application. Once it has received the data, the Data Access Object no longer makes requests to the server and instead reads the data from internal storage.
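
The sequence can be sketched as a small Data Access Object. This assumes a hypothetical `fetchFromServer` function and uses an in-memory `Map` as a stand-in for the app's internal storage (a real app might use IndexedDB or SQLite). The first `get` for a key goes to the network; later calls for the same key are served locally.

```typescript
// Hypothetical remote call that loads one item (e.g. a category or a map tile).
type RemoteLoader<V> = (key: string) => Promise<V>;

class PartialStorageDao<V> {
  // Stand-in for internal storage; only data that was requested ends up here.
  private readonly store = new Map<string, V>();
  private readonly fetchFromServer: RemoteLoader<V>;

  constructor(fetchFromServer: RemoteLoader<V>) {
    this.fetchFromServer = fetchFromServer;
  }

  async get(key: string): Promise<V> {
    if (this.store.has(key)) {
      return this.store.get(key) as V; // served locally, no network request
    }
    const value = await this.fetchFromServer(key); // load on demand
    this.store.set(key, value);
    return value;
  }

  get size(): number {
    return this.store.size;
  }
}
```

Because each key is fetched independently, the granularity of synchronization is simply the granularity of the keys you choose.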

The benefits of this pattern are obvious when storage resources are limited, and it has the advantage of being able to synchronize at different levels of granularity. The main problems with this approach are network-related: the number of requests needed to receive the data and the data transfer rate, both of which can noticeably affect the user experience.

An example of this pattern is, again, our online store with its previously saved categories. Another good example is a GIS application such as Google Maps, which loads map tiles as it is used and keeps them for the duration of the session.

Complete Storage

The opposite of partial storage is, of course, complete storage. Even with broadband connections widely available, there are times when a network connection is simply not possible. Since partial storage loads data on demand, any data that has not been loaded yet is unavailable offline. In addition, low network bandwidth can make the application hard to use.

This sequence diagram gives an overview of the Complete Storage pattern. There is a clear difference between the two types of actions: synchronization and data retrieval. In Complete Storage, the “sync” action makes a network request to synchronize all data, while the “get” action returns only local data.
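
A minimal sketch of the two actions, assuming a hypothetical `downloadAll` call that returns the whole data set (for example, all of a user's notes). Only `sync` touches the network; `get` and `getAll` read local data, so they keep working offline after the first successful sync. For simplicity, `sync` here downloads everything; any of the data transfer patterns discussed later could be used instead.

```typescript
interface Note {
  id: string;
  text: string;
}

// Hypothetical call that downloads the complete data set from the server.
type DownloadAll = () => Promise<Note[]>;

class CompleteStorage {
  private local = new Map<string, Note>();
  private readonly downloadAll: DownloadAll;

  constructor(downloadAll: DownloadAll) {
    this.downloadAll = downloadAll;
  }

  // Network action: replace the local copy with the server's full data set.
  async sync(): Promise<void> {
    const all = await this.downloadAll();
    this.local = new Map(all.map((n): [string, Note] => [n.id, n]));
  }

  // Local actions: never touch the network, so they work offline.
  get(id: string): Note | undefined {
    return this.local.get(id);
  }

  getAll(): Note[] {
    return Array.from(this.local.values());
  }
}
```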

The obvious benefit of complete storage is less dependence on network availability, but this solution is not without drawbacks either. Firstly, you need to take the size of the data into account so that the application can actually store it on the device. Secondly, transferring all the data in one go, even if only once, puts a heavy load on the communication channel.

There are many examples of such applications; the most fitting are file hosting and note-taking clients that can work offline, such as Dropbox or Evernote.

Data Transfer Patterns

In my opinion, data transfer patterns are the most interesting category. Almost all modern applications exchange data, and technology is advancing by leaps and bounds. 5G networks have already been rolled out, and there are devices supporting fifth-generation mobile communications, from phones to cars. But this does not change the fact that many countries lag behind, both technologically and in mobile network coverage.

This poses a natural problem for application developers: how to optimize the data transfer process.

Full Transfer

The name of this pattern fully describes its essence. This block diagram shows the simplicity of a full transfer: the application simply initiates a transfer to receive all data.
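
Full Transfer needs almost no client logic, as this sketch shows. It assumes a hypothetical `fetchEverything` request that returns every current item (for example, all fresh news); the client discards what it had and keeps whatever the server sent. The redundancy count is added only to make the cost of the pattern visible: it counts received items the client already had unchanged.

```typescript
interface Article {
  id: number;
  title: string;
}

// Hypothetical request for the complete data set; the client sends no state.
type FetchEverything = () => Promise<Article[]>;

class FullTransferClient {
  items: Article[] = [];
  private readonly fetchEverything: FetchEverything;

  constructor(fetchEverything: FetchEverything) {
    this.fetchEverything = fetchEverything;
  }

  // Replaces the local data wholesale and returns how many received items
  // were already present unchanged, i.e. the redundancy this pattern accepts.
  async refresh(): Promise<number> {
    const received = await this.fetchEverything();
    const known = new Set(this.items.map((a) => `${a.id}:${a.title}`));
    const redundant = received.filter((a) => known.has(`${a.id}:${a.title}`)).length;
    this.items = received;
    return redundant;
  }
}
```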

The advantage of this approach is its simplicity, but that simplicity comes at the price of potentially redundant data transfer.

There are many examples of this pattern, from news sites, where it really is easier to download the entire set of fresh news than to devise a more complex synchronization algorithm, to the file hosting services mentioned earlier, where replacing the entire file is the correct or even the only option.

Timestamp Transfer

Given the network limitations we have mentioned more than once, the amount of transmitted data should be minimized. A full transfer wastes too many resources: if the data has not changed since the last synchronization, the whole data set is transferred again for nothing.

This block diagram shows more complex logic for data transfer using a timestamp. The client initiates a request and attaches a timestamp to it, which is processed by the server to determine if any data should be returned.

The advantage of this approach is that it uses fewer resources than a full transfer, but close attention should be paid to the source of the timestamp: if it comes from the client's clock rather than the server's, clock skew can cause changes to be missed or transferred twice. Deletion is another problem with this method, since a deleted record simply stops appearing in the results, and it may not be obvious how to tell the client that it is gone.
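
A sketch of Timestamp Transfer with both sides simulated in one file. The assumptions: every record carries an `updatedAt` value assigned by the server, and server timestamps strictly increase, so no two changes share one. Deletions are kept as “tombstone” records with a `deleted` flag instead of being removed, which is one common way to handle the deletion problem described above. `changesSince` and the record shape are hypothetical.

```typescript
interface Entry {
  id: string;
  value: string;
  updatedAt: number; // assigned by the server clock, never by the client
  deleted: boolean; // tombstone flag, so deletions can be synced too
}

// Server side: return only entries changed after the client's last sync.
function changesSince(serverData: Entry[], since: number): Entry[] {
  return serverData.filter((e) => e.updatedAt > since);
}

class TimestampSyncClient {
  readonly entries = new Map<string, Entry>();
  lastSync = 0; // highest server timestamp seen so far; sent with each request

  // Applies a delta received from the server.
  apply(delta: Entry[]): void {
    for (const e of delta) {
      if (e.deleted) {
        this.entries.delete(e.id); // tombstone: remove the local copy
      } else {
        this.entries.set(e.id, e);
      }
      this.lastSync = Math.max(this.lastSync, e.updatedAt);
    }
  }
}
```

On each sync the client sends `lastSync`, and the server answers with the result of `changesSince`, so unchanged records are never transferred again.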

An example of this pattern would be applications that work with historical data, such as habit trackers or diet and activity diaries. This mechanism is also often used in social networks to load part of the news feed.

Mathematical Transfer

But what if a full transfer does not suit you and, just as importantly, not all of your data structures can be synchronized with a timestamp?

This flowchart is similar to the timestamp one, but adds a separate process for calculating the differences between data sets. In Mathematical Transfer, this calculation can be significant enough that it should be treated as a separate process.
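
One way to picture the separate “calculate differences” step is a keyed diff. In this sketch, the client sends a compact summary of what it holds (a hypothetical map from record ID to a version hash), and the server computes exactly which records need to be sent or removed. It illustrates the idea only; real mathematical transfers are usually far more specialized to their data.

```typescript
// Compact per-record summary: record ID -> version hash (hypothetical format).
type Summary = Map<string, string>;

interface Delta {
  upserts: string[]; // IDs the client is missing or holds in an old version
  removals: string[]; // IDs the client holds but the server no longer has
}

// Server-side difference calculation; only the records in the delta are transferred.
function computeDelta(client: Summary, server: Summary): Delta {
  const upserts: string[] = [];
  server.forEach((hash, id) => {
    if (client.get(id) !== hash) upserts.push(id);
  });
  const removals: string[] = [];
  client.forEach((_hash, id) => {
    if (!server.has(id)) removals.push(id);
  });
  return { upserts, removals };
}
```

The transfer itself stays small, but the summary format and the diff logic are tied to this particular data, which is where the extra cost of the pattern comes from.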

Although this method potentially has less synchronization overhead than the previous two, its main stumbling blocks are the high cost of implementing it and its dependence on context, which makes the code hard to reuse.

There are many real-life examples, from the GIS applications mentioned earlier, in which the identifiers of the map tiles to be displayed can act as the token, to truly complex ones in video streaming, such as the “sum of absolute differences” or the “sum of squared differences” used to determine the optimal coding scheme for a new frame.
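
The two video-coding metrics named above are simple to compute. For two equally sized blocks of pixel intensities, SAD sums |a[i] - b[i]| and SSD sums (a[i] - b[i])^2; an encoder can compare a block against candidate blocks and prefer the one with the lowest score. Representing blocks as flat number arrays is a simplifying assumption, and `bestMatch` is a hypothetical helper, not part of any codec API.

```typescript
// Sum of absolute differences between two equally sized pixel blocks.
function sad(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("blocks must have the same size");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum;
}

// Sum of squared differences: penalizes large per-pixel errors more heavily.
function ssd(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("blocks must have the same size");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Returns the index of the candidate block that best matches the current one.
function bestMatch(current: number[], candidates: number[][], metric = sad): number {
  let bestIndex = -1;
  let bestScore = Infinity;
  candidates.forEach((candidate, i) => {
    const score = metric(current, candidate);
    if (score < bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  });
  return bestIndex;
}
```

For the current block [10, 20, 30, 40], the candidate [13, 23, 33, 43] scores SAD 12 and SSD 36, while [10, 20, 30, 50] scores SAD 10 and SSD 100. SAD therefore prefers the second candidate and SSD the first, because SSD punishes the single large error.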

Afterword

As Uncle Ben said, with great power comes great responsibility, so you should know how to use that power. In my opinion, theoretical knowledge is as important as practical skills, and every developer who wants to become better should know how to handle work tasks in the simplest way.

I’ll be glad if this article helps someone, and feel free to ask me if you have any questions.

Software Engineer. Tech Lead. Software Architect