Distributed architecture

Distributed AIS have become an everyday reality. Many corporate AIS use distributed databases. Techniques for distributing data and managing distributed data have been worked out, along with architectural approaches that ensure system scalability, implementing the principles of multi-tier client-server architecture and middle-tier architecture.

Mobile architectures are beginning to be applied in practice. This applies to both database systems and Web applications.

An approach to building distributed systems based on a peer-to-peer architecture is being revived. In contrast to the client-server architecture that dominates distributed systems today, the roles of the interacting parties in a peer-to-peer network are not fixed: they are assigned depending on the situation in the network and on the workload of its nodes.

In connection with the intensive development of communication technologies, mobile AIS are actively developing, and the technical means and software for creating them have matured. This has led to the development of mobile database systems. Many research teams are studying the specific features of such systems and creating various prototypes. Java technologies have become an important tool for mobile software development.

A Wireless Application Protocol (WAP) standard has been created and is already supported by some cell phone models. Based on XML, the WAP Forum developed WML (Wireless Markup Language), a markup language for wireless communications.

In the development of AIS, more attention has begun to be paid to metadata. Steps are being taken in two directions: standardizing the representation of metadata and ensuring its support in the system.

AIS use a variety of ways and means of representing metadata (various kinds of metadata repositories). The lack of unification in this area significantly complicates application mobility, the reuse and integration of information resources and information technologies, and AIS reengineering.

To overcome these difficulties, metadata standards focused on various information technologies are being actively developed. A number of international, national, and industry standards already define the representation and exchange of metadata in AIS, and some of them have acquired the status of de facto standards. We will limit ourselves here to mentioning only the most significant of them.

Probably the first de facto standard in this category was the CODASYL data description language for network databases. Other standards worth naming include: the SQL query language standard for relational databases, which defines the so-called information schema, a set of views of relational database schemas; the component of the ODMG object database standard that describes the interfaces of the object schema repository; and the international IRDS (Information Resource Dictionary Systems) standard, which describes systems for creating and maintaining directories of an organization's information resources.

Next, mention should be made of the Common Warehouse Metamodel (CWM) standard for representing data warehouse metadata, developed by the OMG consortium on the basis of the earlier, broader-purpose OIM (Open Information Model) standard from the MDC (Meta Data Coalition) consortium.

The new XML technology platform for the Web also includes metadata representation standards. Metadata support is one of the most important innovations of the Web, radically changing the technology for managing its information resources. While metadata support was required in database technologies from the outset, the first-generation Web did not support metadata at all.

Web metadata standards include a subset of the XML language used to describe the logical structure of a given type of XML document. Such a description is called a DTD (Document Type Definition). In addition, the XML platform includes the XML Schema standard, which offers more advanced capabilities for describing XML documents. The Resource Description Framework (RDF) standard defines a simple knowledge representation language for describing the content of XML documents. Finally, the emerging OWL (Web Ontology Language) standard defines a formal ontology description language for the Semantic Web.

The Unified Modeling Language (UML) standard, which provides metadata representation for CASE visual object analysis and design tools, was developed by the OMG consortium. This language is supported in many CASE software products. The OMG also created the XML Metadata Interchange (XMI) standard for exchanging metadata between CASE tools using the UML.

Mention should also be made here of the Dublin Core (DC) standard, a set of metadata elements for describing the content of documents of different nature. This standard quickly gained popularity and found, in particular, widespread use in the Web environment (see Section 3.3).
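A minimal sketch of how such metadata can be consumed in code: extracting Dublin Core elements from an XML record with Python's standard library. The sample record and its field values are invented for illustration; only the namespace URI is the standard one.

```python
# Sketch: extracting Dublin Core metadata elements from an XML record.
# The sample document below is illustrative, not taken from any real system.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # the standard Dublin Core namespace

record = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Distributed Systems Overview</dc:title>
  <dc:creator>J. Doe</dc:creator>
  <dc:date>2004-06-01</dc:date>
</record>"""

def dublin_core_elements(xml_text):
    """Return a dict mapping Dublin Core element names to their text values."""
    root = ET.fromstring(xml_text)
    prefix = "{%s}" % DC_NS          # ElementTree qualifies tags as {ns}name
    return {el.tag[len(prefix):]: el.text
            for el in root if el.tag.startswith(prefix)}

print(dublin_core_elements(record))
```

Because the elements are namespace-qualified, the same extraction works regardless of which prefix (dc:, dublin:, etc.) the producing system chose.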

Work on the development of existing and creation of new standards for the presentation of metadata for AIS continues. More detailed information about the standards in question can be found in the encyclopedia.

Nowadays, virtually all large software systems are distributed. A distributed system is a system in which information processing is not concentrated on a single computer but is distributed among several computers. Designing distributed systems has much in common with designing any other software, but there are a number of specific features to consider. Some of these were already mentioned in the introduction to Chapter 10, where we looked at the client/server architecture; they are discussed in more detail here.

Since distributed systems are widespread these days, software developers should be familiar with the specifics of their design. Until recently, all large systems were largely centralized, running on a single host computer (mainframe) with terminals attached to it. The terminals practically did not process information - all calculations were performed on the host machine. The developers of such systems did not have to think about the problems of distributed computing.

All modern software systems can be divided into three broad classes.

1. Application software systems designed to work only on one personal computer or workstation. These include word processors, spreadsheets, graphics systems, and the like.

2. Embedded systems designed to run on a single processor or on an integrated group of processors. These include control systems for household appliances, various appliances, etc.

3. Distributed systems in which software runs on a weakly integrated group of parallel processors that are linked through a network. These include ATM systems owned by a bank, publishing systems, shared software systems, etc.

At present there are clear boundaries between these classes of software systems, but in the future they will become increasingly blurred. Over time, as high-speed wireless networks become widely available, it will become possible to dynamically integrate devices with embedded software, such as electronic organizers, into more general systems.

There are six main characteristics of distributed systems.

1. Resource sharing. Distributed systems allow the sharing of hardware and software resources linked through a network, such as hard drives, printers, files, and compilers. Resource sharing is of course also possible in multi-user systems, but there a central computer is responsible for providing and managing the resources.

2. Openness. This is the ability to expand the system by adding new resources. Distributed systems are open systems that connect hardware and software from different manufacturers.

3. Parallelism. In distributed systems, multiple processes can run concurrently on different computers on a network. These processes can (but need not) interact with each other while they are running.

4. Scalability. In principle, all distributed systems are scalable: to meet new requirements, the system can be expanded by adding new computing resources. In practice, however, growth may be limited by the network connecting the individual computers in the system: if many new machines are connected, the network bandwidth may prove insufficient.

5. Fault tolerance. The presence of multiple computers and the ability to replicate information mean that distributed systems are resistant to certain hardware and software failures. Most distributed systems can, as a rule, maintain at least partial functionality when a fault occurs; a complete failure of the system happens only in the case of a network failure.

6. Transparency. This property means that users are given completely transparent access to resources and at the same time information about the distribution of resources in the system is hidden from them. However, in many cases, specific knowledge of the organization of the system helps the user to make better use of resources.
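The fault-tolerance characteristic above can be sketched in a few lines: a client retries a request against replicated servers until one of them answers. The replica functions and the failure model (a ConnectionError standing in for a real network fault) are illustrative assumptions, not part of any particular system.

```python
# Sketch: fault tolerance through replication -- try each replica in turn
# until one answers. Replica behavior and errors are simulated locally.
def query_with_failover(replicas, request):
    """Return the first successful reply; raise only if every replica fails."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:   # stand-in for a real network failure
            last_error = exc             # remember the error and move on
    raise RuntimeError("all replicas failed") from last_error

def broken(_request):
    raise ConnectionError("replica down")

def healthy(request):
    return "reply to %s" % request

print(query_with_failover([broken, healthy], "ping"))
```

The system as a whole keeps partial functionality as long as at least one replica survives, which is exactly the property described in point 5.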

Of course, distributed systems have a number of disadvantages.

Complexity. Distributed systems are more complex than centralized ones. It is much more difficult to understand and evaluate the properties of distributed systems in general, and also to test them. For example, system performance here depends not on the speed of one processor but on the network bandwidth and the speeds of the various processors. Moving resources from one part of the system to another can drastically affect performance.

Security. Typically, the system can be accessed from several different machines, and messages on the network can be viewed or intercepted. It is therefore much more difficult to maintain security in a distributed system.

Controllability. The system can consist of different types of computers on which different versions of operating systems can be installed. Errors on one machine can propagate to other machines with unpredictable consequences. Therefore, much more effort is required to manage and maintain the system in working order.

Unpredictability. As all Web users know, the response of distributed systems to certain events is unpredictable and depends on the overall load on the system, its organization, and the network load. Since all these parameters can change constantly, the time taken to execute a user request may vary significantly from one moment to the next.

When discussing the advantages and disadvantages of distributed systems, a number of critical design problems for such systems are identified (Table 9.1).

Table 9.1. Distributed Systems Design Problems

Design problem — Description

Resource identification — Resources in a distributed system are located on different computers, so the resource naming scheme should be designed so that users can easily access and refer to the resources they need. An example is the Uniform Resource Locator (URL) scheme, which defines the addresses of Web pages. Without an easily understood and universal identification scheme, most resources will be inaccessible to the system's users.
Communications — The universal availability of the Internet and the efficient implementation of TCP/IP protocols make the Internet, for most distributed systems, an example of the most effective way of organizing communication between computers. However, where special requirements are imposed on performance, reliability, and the like, alternative communication methods can be used.
Quality of system service — The quality of service offered by a system reflects its performance, availability, and reliability. It is influenced by a number of factors: the distribution of system processes, resource allocation, system and network hardware, and the system's adaptability.
Software architecture — The software architecture describes the distribution of system functions among system components, as well as the distribution of these components across processors. If high quality of system service must be maintained, choosing the right architecture is a decisive factor.
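The resource-identification row can be made concrete with the standard-library URL parser; the URL itself is a made-up example.

```python
# Sketch: resource identification with URLs, the naming scheme cited in
# Table 9.1, decomposed with Python's standard-library parser.
from urllib.parse import urlparse

url = "http://example.com:8080/docs/report.html?lang=en"
parts = urlparse(url)

print(parts.scheme)    # protocol used to reach the resource
print(parts.hostname)  # machine that holds the resource
print(parts.port)      # service endpoint on that machine
print(parts.path)      # location of the resource on that machine
```

Each component answers one question a distributed naming scheme must answer: how to reach the resource, on which machine, and where on that machine it lives.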

The challenge for distributed system designers is to design software or hardware to provide all the required characteristics of a distributed system. This requires knowing the advantages and disadvantages of various distributed systems architectures. Two related types of distributed system architectures stand out here.

1. Client/server architecture. In this model, the system can be thought of as a set of services provided by servers to clients. In such systems, servers and clients differ significantly from each other.

2. Distributed object architecture. In this case, there are no differences between servers and clients: the system can be thought of as a set of interacting objects whose location does not really matter, and no distinction is made between service providers and their users.
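A rough sketch of the distributed-object idea: the caller talks to a proxy and never learns where the object actually lives. Here a dictionary plays the role of the naming service and the "remote" call is local, standing in for real marshalling over the network; all class and object names are invented.

```python
# Sketch: a proxy that hides an object's location from its caller, in the
# spirit of a distributed object architecture. A dict stands in for the
# naming/location service; a local call stands in for network marshalling.
class RemoteObjectProxy:
    def __init__(self, registry, object_id):
        self._registry = registry      # stand-in for a naming service
        self._object_id = object_id

    def call(self, method, *args):
        # In a real system this would serialize the call and send it over
        # the network; the caller never learns where the object lives.
        target = self._registry[self._object_id]
        return getattr(target, method)(*args)

class Account:
    """An example service object; it could live on any node."""
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
        return self.balance

registry = {"account-42": Account(100)}
proxy = RemoteObjectProxy(registry, "account-42")
print(proxy.call("deposit", 50))
```

If the object were moved to another node, only the registry entry would change; the caller's code stays identical, which is the location transparency the model promises.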

In a distributed system, different system components can be implemented in different programming languages and run on different types of processors. Data models, information representation, and communication protocols are not necessarily all of the same type. A distributed system therefore needs software that can manage these diverse parts and guarantee their interaction and data exchange. Middleware is precisely this class of software: it sits, as it were, in the middle, between the different parts of the system's distributed components.

Distributed systems are usually designed with an object-oriented approach. These systems are created from loosely integrated parts, each of which can interact directly both with the user and with other parts of the system. As far as possible, these parts should react to independent events. Software objects built on these principles are natural components of distributed systems.

In large holdings, tens of thousands of users work in subsidiaries. Each organization has its own internal business processes: approval of documents, issuance of instructions, and so on. At the same time, some processes go beyond the boundaries of one company and affect the employees of another. For example, the head of the head office issues an order to a subsidiary, or an employee of a subsidiary sends an agreement for approval by the lawyers of the parent company. This requires a complex architecture using multiple systems.

Moreover, within one company many systems are used to solve different problems: an ERP system for accounting operations, separate installations of ECM systems for organizational and administrative documentation, for design estimates, etc.

The DIRECTUM system will help to ensure the interaction of different systems both within the holding and at the level of one organization.

DIRECTUM provides convenient tools for building a managed distributed architecture and solving the following tasks:

  • organizing end-to-end business processes and synchronizing data between several systems of one company or within a holding;
  • providing access to data from different installations of ECM systems, for example, searching for a document in several specialized systems: one with financial documentation, one with design and estimate documentation, etc.;
  • administering many systems and services from a single point of management and creating a comfortable IT infrastructure;
  • conveniently distributing development to distributed production systems.

Components of a Managed Distributed Architecture

Interconnection Mechanisms (DCI)

DCI mechanisms are used to organize end-to-end business processes and synchronize data between different systems within one or several organizations (holding).


The solution connects local business processes existing in companies into a single end-to-end process. Employees and their managers work with the already familiar interface of tasks, documents and reference books. At the same time, the actions of employees are transparent at every stage: they can see the text of the correspondence with a related company, see the status of document approval with the parent organization, etc.

Various DIRECTUM installations and other classes of systems (ERP, CRM, etc.) can be connected to DCI. As a rule, installations are divided by areas of business, taking into account the territorial or legal location of organizations and other factors.

Together with DCI, development components are supplied with a detailed description and code examples, thanks to which a developer can create an algorithm for the business processes of his organization.

DCI mechanisms are capable of transmitting large amounts of data and withstand peak loads. In addition, they provide fault tolerance in the event of communication failures and the protection of transmitted data.

Federated search

With federated search, you can find the tasks or documents you need at once in all individual DIRECTUM systems. For example, start a search simultaneously in the working system and in the system with archived documents.


Federated search allows you to:

  • view through the web client the progress of approval of an outgoing document in a subsidiary;
  • find agreements concluded with a counterparty in all subsidiaries, for example, for the preparation of negotiations. In this case, you can go to the tasks in which the contracts are enclosed;
  • check the status of execution of the order sent from the parent organization to the subsidiary, or documents and tasks created on it;
  • find documents simultaneously in several systems with different specializations, for example, with organizational and administrative documents and with contracts;
  • find primary accounting documents for audit or reconciliation with a counterparty immediately in the working system and in the system with an archive of documents;
  • exchange links to search results with colleagues.

The administrator can change standard searches, add new ones, and also customize which systems will be visible to the user.
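The fan-out behind federated search can be sketched as a query sent to several independent systems whose hits are merged and tagged with their origin. This is an illustrative model, not DIRECTUM's actual implementation; the system names and documents are invented.

```python
# Sketch: federated search fans a query out to several independent systems
# and merges the hits, remembering which system each hit came from.
def federated_search(systems, query):
    """Query every system and return (system_name, document) pairs."""
    results = []
    for name, documents in systems.items():
        hits = [doc for doc in documents if query.lower() in doc.lower()]
        results.extend((name, hit) for hit in hits)
    return results

systems = {
    "contracts": ["Supply contract 2023", "Lease agreement"],
    "archive":   ["Supply contract 2019", "Staffing order"],
}

print(federated_search(systems, "supply contract"))
```

In a real deployment each entry in `systems` would be a network call to a separate installation, but the merge-and-tag shape of the result stays the same.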

DIRECTUM Services Administration Center

The DIRECTUM system solves many different tasks: employee interaction, document storage, and so on. This is possible due to the reliable operation of its services. In large companies, entire installations of the DIRECTUM system, with their own sets of services, are allocated to a specific task, for example, storing archival documents. Installations and services are deployed on multiple servers, and this infrastructure needs to be administered.

The DIRECTUM Services Administration Center is a single administrative entry point for configuring, monitoring, and managing DIRECTUM services and systems. The Center is a site for management tools for Session Server, Workflow Service, Event Service, File Storage Service, Input and Transform Services, Federated Search, and Web Help.


Convenient visual configuration of remote systems and services simplifies the administrator's work: there is no need to go to each server and manually edit configuration files.

Services are stopped and started in one click, and the status of each service is instantly displayed on the screen.

The list of settings can be extended and filtered. By default, the site displays only the basic settings, and every setting has a tip with recommendations on how to fill it in.

The DIRECTUM system effectively organizes the work of distributed organizations and provides users with a transparent exchange of documents, tasks and directory records.

Each component of a Managed Distributed Architecture can be used separately, but together they can bring greater business value to your organization.

AggreGate is one of the few IoT platforms in the world that truly supports a distributed architecture. This provides unlimited scalability to balance and segregate all AggreGate server operations at different tiers. Such an architecture can be the basis both for solving current problems and for meeting the needs of the future.

Unlike a failover cluster, AggreGate servers in a distributed architecture are completely independent. Each server has its own database, local user accounts and associated permissions.

AggreGate's distributed architecture is extremely flexible. Technically, it is based on the formation of peer-to-peer connections between servers and attaching parts of a single data model of some servers ("suppliers") to others ("consumers").

Goals of Distributed Operations

The main goals of a distributed architecture are:

  • Scalability. Lower-level servers can be heavily loaded, collecting data from and managing large numbers of devices in near real time. In practice, the number of devices one server can service is limited to several thousand, so when scaling a system to a larger number of devices it is wise to set up multiple servers and combine them into a distributed installation.
  • Load balancing. Each server in a distributed installation solves its own problem: network management servers check the availability and performance of the network infrastructure, while access control servers process requests from door controllers and turnstiles. Control operations, such as generating reports and distributing them by mail, can be performed on a central server.
  • Intrusion protection. Secondary probe servers can be installed at remote locations and connected to a central server. System operators connect only to the central server, eliminating the need to configure VPNs and port forwarding to the remote servers.
  • Centralization. Secondary servers can operate in fully automatic mode, while their configuration and monitoring are carried out through the main server installed in the central control room.

Server Role Distribution

In this simple scenario, two servers are combined into a distributed infrastructure. The system operators are constantly connected to the monitoring server, performing their daily duties. The company's management connects to the reporting and analytics server when it needs to get a slice of the data. Regardless of the amount of data and the load on the server, this operation will not affect the work of operators.

Large Scale Cloud IoT Platform

Telecom and cloud service providers offer IoT services in IaaS/PaaS/SaaS models. In these cases we are talking about millions of devices owned by thousands of users. Maintaining such a huge infrastructure requires hundreds of AggreGate servers, most of which fall into two groups:

  • Servers that store the register of users and their devices, redirecting connections of operators and devices to lower-level servers, as well as aggregating data for subsequent analysis of information with the participation of lower-level servers
  • Servers that monitor and manage devices, as well as receive, store and process data

The user and device management servers are also responsible for interacting with the cloud management system, which is responsible for deploying and monitoring new storage and analytics servers.

Data storage and processing servers use resources (alarms, models, workflows, dashboards, etc.) received from template servers, which in turn store master copies of these resources.

Layered IoT Infrastructure

Thanks to AggreGate's distributed infrastructure, any solution can include many servers of different levels. Some can run on IoT gateways, collecting data; others can store and process information; and the rest can carry out high-level aggregation and distributed computing.

Field equipment such as sensors and actuators can be connected to servers directly, through agents, through gateways, or through a combination of these.

Smart city management

This is an example of AggreGate-based layered architecture for complex automation of a large group of buildings:

  • Level 1: physical equipment (network routers, controllers, industrial equipment, etc.)
  • Level 2: management servers (network monitoring servers, access control servers, building automation servers, and others)
  • Level 3: building control-center servers (one server per building, collecting information from all Level 2 servers)
  • Level 4: city district servers (final destination for escalating lower level alerts, real-time monitoring, integration with Service Desk systems)
  • Level 5: servers of the head office (control of district servers, collection and synthesis of reports, notifications)

Any of the above servers can be a multi-node failover cluster.

Multi-segment network management

AggreGate Network Manager is built on the AggreGate platform and is a typical use case for a distributed architecture. Large segmented networks of corporations and telecom operators cannot be monitored from a single center due to routing restrictions, security policies, or bandwidth limitations of communication channels with remote network segments.

Thus, a distributed monitoring system usually consists of the following components:

  • A primary or central server that collects information from all network segments
  • Secondary or probe servers that poll devices in isolated segments
  • Specialized servers, such as traffic analysis servers processing billions of NetFlow events per day

Secondary and specialized servers act as providers of information for the primary server, exposing part of their data model to the control center. The exposed part could be:

  • All contents of the context tree of the probe server, which allows complete control of the configuration from a central server. In this case, the probe server is simply used as a proxy to overcome the network segmentation problem.
  • Alerts generated by the probe server. In this case, 99% of workplaces can be remote, and the operator of the central server will immediately receive notifications from the secondary servers.
  • Custom datasets from probe servers, such as real-time information about the status of critical devices or summarized reports. All related work will be done on the secondary server, allowing for load balancing.
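The idea of exposing only part of a probe server's data model can be sketched as a simple prefix filter over context paths. This is an illustrative model, not AggreGate's actual API; the context names and values are invented.

```python
# Sketch: a probe server exposes only selected parts of its data model
# ("contexts") to the central server; everything else stays local.
def exposed_view(data_model, exported_prefixes):
    """Return the subset of contexts the probe agrees to share upstream."""
    return {path: value for path, value in data_model.items()
            if any(path.startswith(p) for p in exported_prefixes)}

probe_model = {
    "alerts/high/router-1": "link down",
    "alerts/low/switch-3": "high latency",
    "config/snmp/community": "secret",   # sensitive; never exported
}

print(exposed_view(probe_model, ["alerts/high"]))
```

Choosing the prefixes is exactly the design decision the list above describes: export everything (full proxy control), only alerts, or only custom summarized datasets.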

High performance event management

Some use cases for the AggreGate platform, such as centralized incident management, require a significant number of events to be received, processed, and permanently stored in a structured format. Sometimes the streams reach millions of events per second, received from different sources.

In such cases, one AggreGate server cannot cope with the entire flow of events. A distributed architecture will help organize event handling:

  • Several local servers are installed at the sites generating events in order to process them. Several sources (probes) can connect to one processing server.
  • A dedicated storage server or multi-server big data storage cluster is bound to each local processing server. The number of cluster nodes can vary depending on the rate at which events are generated.
  • All local storage servers perform pre-filtering, deduplication, correlation (using rules that apply to locally attached probes), enrichment, and event storage.
  • Local storage servers connect to a central aggregation server, which is responsible for correlating important events throughout the system.
  • Operators of the central server can browse the entire event database, while live data lookups are distributed among the storage servers. This makes it possible to create centralized reporting and alerts based on a database of all events.
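The pre-filtering and deduplication steps above can be sketched as follows. The event tuple shape (severity, source, message) and the severity threshold are assumptions made for illustration, not AggreGate's actual event format.

```python
# Sketch: the pre-filtering and deduplication stage of a local
# event-processing server, collapsing repeated events into counters.
def process_events(events, drop_below=2):
    """Drop low-severity events and collapse duplicates, counting repeats."""
    seen = {}
    for severity, source, message in events:
        if severity < drop_below:          # pre-filtering
            continue
        key = (source, message)
        seen[key] = seen.get(key, 0) + 1   # deduplication with a counter
    return seen

events = [
    (3, "fw-1", "port scan detected"),
    (1, "fw-1", "heartbeat"),              # below threshold, filtered out
    (3, "fw-1", "port scan detected"),     # duplicate, merged
    (2, "db-1", "slow query"),
]

print(process_events(events))
```

Running this stage locally is what lets the central aggregation server receive a compact, already-cleaned stream instead of the raw flood.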

Digital enterprise

AggreGate can act as a coordinating platform for the digital enterprise. Each of the AggreGate servers can perform various functions, ranging from monitoring and managing remote objects to high-level services such as business intelligence or, for example, incident management.

All servers in a digital enterprise are connected to each other through a distributed infrastructure. Lower-level servers provide access to some of the contexts of a single data model to upper-level servers, allowing you to create a situational center for the entire enterprise.

Heterogeneous multicomputer systems

Most distributed systems in existence today are built as heterogeneous multicomputer systems. This means that the computers in such a system can vary widely, for example, in processor type, memory size, and I/O performance. In practice, the role of some of these computers can be played by high-performance parallel systems, for example, multiprocessor or homogeneous multicomputer systems.

The network connecting them can also be highly heterogeneous.

An example of heterogeneity is the creation of large multicomputer systems from existing networks and channels. For instance, university distributed systems consisting of the local networks of various faculties, interconnected by high-speed channels, are not unusual. In global systems, the various stations may in turn be connected through public networks, such as the network services offered by commercial telecom operators, for example, SMDS or Frame Relay.

Unlike the systems discussed in the previous sections, many large-scale heterogeneous multicomputer systems require a global approach. This means that an application cannot assume that a certain level of performance or certain services will always be available to it.

Turning to the scaling issues inherent in heterogeneous systems, and to the global approach most of them require, we note that creating applications for heterogeneous multicomputer systems requires specialized software. Distributed systems address this problem: so that application developers need not worry about the hardware being used, they provide a software shell that shields applications from what happens at the hardware level (that is, they provide transparency).

The earliest and most fundamental distributed architecture is "client-server", in which one of the parties (client) initiates the exchange of data by sending a request to the other party (server). The server processes the request and, if necessary, sends a response to the client (Fig. 2.7).

Fig. 2.7. Client-server interaction model

Interaction within the client-server model can be either synchronous, when the client waits for the server to process its request, or asynchronous, when the client sends a request to the server and continues execution without waiting for the server's response. The client-server model can serve as a basis for describing various interactions. Consider the interaction of the components of the software that forms a distributed system.
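The two interaction styles can be sketched in a few lines; the "server" here is a local function standing in for a process across the network, and the delay is simulated.

```python
# Sketch: synchronous vs. asynchronous client-server interaction.
# The server is a local function; a real one would sit across the network.
from concurrent.futures import ThreadPoolExecutor
import time

def server(request):
    time.sleep(0.05)                    # pretend the request takes a while
    return "response to " + request

# Synchronous: the client blocks until the server replies.
reply = server("req-1")
print(reply)

# Asynchronous: the client submits the request and keeps working,
# collecting the reply only when it is needed.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(server, "req-2")
    # ... the client can do other work here while the server is busy ...
    print(future.result())              # fetch the reply when needed
```

The choice between the two styles is visible only on the client side; the server code is identical in both cases.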



Fig. 2.8. Application logic levels

Consider a typical application which, in accordance with modern concepts, can be divided into the following logical layers (Fig. 2.8): user interface (UI), application logic (AL), and data access (DA), which works with the database (DB). The user of the system interacts with it through the user interface; the database stores the data describing the application domain; and the application logic layer implements all the algorithms related to the domain.

Since, in practice, different users of a system are usually interested in accessing the same data, the simplest way to divide the functions of such a system between several computers is to split the application's logical layers between a single server part, responsible for data access, and client parts located on several computers that implement the user interface. The application logic can be assigned to the server, to the clients, or shared between them (Fig. 2.9).

Fig. 2.9. Two-tier architecture

The architecture of applications built on this principle is called client-server, or two-tier. In practice, such systems are often not classified as distributed, but formally they can be considered the simplest representatives of distributed systems.

A development of the client-server architecture is the three-tier architecture, in which the user interface, application logic, and data access are separated into independent system components that can run on independent computers (Fig. 2.10).

Fig. 2.10. Three-tier architecture

A user request in such systems is processed sequentially by the client part of the system, the application logic server, and the database server. However, a distributed system is usually understood to mean a system with a more complex architecture than three-tier.
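The three-layer split can be sketched as follows; the pricing rule, item names, and in-memory "database" are invented for illustration, and in a three-tier deployment each function would run on its own machine.

```python
# Sketch: the three logical layers of a typical application -- user
# interface, application logic, and data access -- as separate functions.
DATABASE = {"item-1": 250, "item-2": 120}      # stand-in for the DB tier

def data_access(item_id):
    """Data access layer: the only code that touches the database."""
    return DATABASE[item_id]

def application_logic(item_id, quantity):
    """Application logic layer: domain rules live here, not in the UI."""
    price = data_access(item_id)
    discount = 0.1 if quantity >= 10 else 0.0  # illustrative bulk discount
    return round(price * quantity * (1 - discount), 2)

def user_interface(item_id, quantity):
    """UI layer: formatting only, no domain rules."""
    return "Total: %.2f" % application_logic(item_id, quantity)

print(user_interface("item-1", 10))
```

A request flows through the layers exactly as described above: the UI calls the application logic, which in turn calls the data access layer, and no layer skips over its neighbor.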

Fig. 2.11. Distributed retail system

With regard to enterprise automation applications, distributed systems usually mean systems whose application logic is distributed among several components, each of which can run on a separate computer. For example, the application logic of a retail sales system must make queries to the application logic of third parties, such as suppliers of goods, electronic payment systems, or banks that provide consumer loans (Fig. 2.11).

Another example of a distributed system is a network for direct data exchange between clients (a peer-to-peer network). Whereas the previous example had a "tree" architecture, direct exchange networks are organized in a more complex way (Fig. 2.12). Such systems are at the moment probably among the largest existing distributed systems, uniting millions of computers.

Fig. 2.12. System of direct data exchange between clients

