Definition of distributed software
Distributed computing is software that is designed from inception to run in a parallel and cooperative manner on multiple computers.
Traditional software was designed with the assumption that it would run on a single computer. Some traditional software has evolved such that it can run in a highly available manner, such as one hot version and one cold version, but that is not what we mean by distributed computing. Hence we state in our definition designed from inception.
Traditional software can run multiple instances on different computers, such as a word processor on different desktops, but each instance works in isolation of every other instance. Distributed software, on the other hand, does not work in isolation. Hence we state in our definition in a cooperative manner.
Microservices and Hadoop Distributed File System (HDFS) are an examples of distributed software.
With big data, streaming data and the Internet of things, the challenge is to scale services reliability. There are a four rules to consider.
- Every computer has finite resources. There is only so many cores and so much memory in a single computer. Thus, to truly scale, a service must be able to run in parallel over multiple computers.
- All computers eventually fail. The disk drive will crash, or the computer will lose power or network connectivity. Something can always go wrong.
- Service usage never stays constant. Services have spike usage, either during a certain time of day or during the business cycle or during a critical external event. Services also grow or shrink in overall usage over time.
- Every firm has finite resources and wants to contain costs. Traditionally one had to maintain enough computing power to meet the peak usage of a service, which meant that a large portion of the computing power went used. No one wants to pay for unused computing power.
These four rules drive us towards building distributed, scalable services.
- A service should run in parallel as multiple instances over multiple computers. That enables us to overcome the limitations of computer resources and reliability.
- A service should add and remove instances as the usage grows and shrinks. That enables us to address the changing usage while conserving resources.
Our purpose here is to address the challenges of building distributed, scalable services for stateful computation.
In order for software to cooperate across multiple computers, the software instances must communicate with each other over a network. Without communication there is no cooperation. Without cooperation there is no need to communicate.
There are three general modes of communication:
- Batch (request-notification)
The most common mode is request-response. Representational state transfer (REST) is a request-response mode of communication. One service sends a request to another service and expects a response back within a certain amount of time. If no response arrives within the expected amount of time, called the timeout window, then the request is assumed to have failed. Most microservices operate using REST or similar request-response modes.
The time necessary for most services to process a request is usually measured in milliseconds. In such cases, a one or two second timeout window is relatively large. However, some services can take minutes or hours to complete. A common example is a large map-reduced job. Another example is human interaction. When a service expects a response from a human, it may need to wait longer than a couple of seconds.
The batch or request-notification mode is necessary for long-running requests. The requestor initiates a process. The responder replies with an ID for the request results. The responder can then poll to see if the results are available. Alternatively, the requestor can provide the responder a means to notify it when the request is complete. (See for example Asynchronous Service Access Protocol.) Batch communication is the least formalized mode currently used in distributed computing.
The third mode is streaming. In this mode, a service receives a continuous stream of messages or events from other services and likewise produces messages and events. A service does not need to be aware of what services produced the events that it consumes, nor does the service need to be aware of what other services are consuming the events that it produces.
The decoupling of producers and consumers is the key differentiator in streaming versus request-response. In REST, the requestor sends the request to a known responder. The infrastructure may enable multiple services to act as the responder through network routing and proxies, but requestor must still have a name for the responder. In streaming, the producer simply publishes the event without knowing which service if any will consume it.
Note that a message bus (or message oriented middleware) and streaming are not synonymous. One can use a message bus to implement streaming, but traditionally message busses expect the producer to provide a name for the consumer.
Some computations by their very nature require no state. There is no need to consider what happened before. For instance, if I ask what the temperature is, the service reads the thermometer sensor and sends the value. However, if I ask how much has the temperature changed in the past 10 minutes, then the compute must know the previous temperature. That is state.
Stock trading provides another example of stateless and stateful computing. Like temperature, a stock ticker price is stateless. It simply indicates the last price at which the stock was sold. Position and profit calculations, on the other hand, are by nature stateful. A position is ownership of a particular entity of a particular stock at a particular time. The position is the sum of all the buys and sells of the stock by the entity over time.
State must be maintained for stateful computations. The question is where to maintain the state.
Stateless versus stateful services
A stateless service is one that does not store or maintain state. It is blissfully unaware of what has happened before. It merely reacts to events as they arrive. There are several advantages to stateless services when building distributed, scalable solutions. For instance, one can add and remove instances of the service with impunity.
A stateful service, on the other hand, is one that maintains the state within the service, that is, within the service container (ie Docker instance) or in the service process (ie JVM). Stateful services pose challenges when building distributed, scalable systems. For instance, one must insure that the state is recreated when adding a new instance of the service. Likewise, when removing an instance of the service, one must insure the state is migrated to another instance. There is also the challenge of two service instances attempting to change the same state simultaneously, known as a race condition.
In the next post we will discuss patterns for solving the challenges of implementing stateful streaming services.