Decentralization of computing systems has several attractions: performance enhancements due to increased parallelism; resource sharing; and increased reliability and availability of data due to redundant copies of the data. Providing these characteristics in a decentralized system requires proper organization of the system. For increasing the reliability of a system, one model that has proven successful is the object/action model, in which tasks performed by the system are organized as sequences of atomic operations. Because the system can determine which operations have been performed completely, it can maintain itself in a consistent state. This dissertation describes the design and a prototype implementation of a storage management system for an object-oriented, action-based decentralized kernel. The storage manager is responsible for providing reliable secondary storage structures. First, the dissertation shows how the storage manager supports the object model at the lowest levels of the kernel. It also describes how storage management facilities are integrated into the virtual memory management provided by the kernel to support the mapping of objects into virtual memory. All input and output to secondary storage is done via virtual memory management. The dissertation discusses the role of the storage management system in locating objects, and a technique intended to short-circuit searches whenever possible by avoiding unnecessary secondary storage queries at each site. It also presents a series of algorithms that support two-phase commit of atomic actions and argues that these algorithms do indeed provide consistent recovery of object data. These algorithms use virtual memory management information to provide recovery, and relieve the action management system of maintaining stable storage.
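The dissertation's own commit algorithms draw on virtual memory management information, which is not reproduced here. As a point of reference, the following is a minimal sketch of the generic two-phase commit pattern such algorithms build on, assuming a simple in-memory list as a stand-in for stable storage. The names `Participant` and `two_phase_commit` are hypothetical, not taken from the Clouds implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    """A site holding object data touched by an action (hypothetical model)."""
    name: str
    can_commit: bool = True  # whether this site will vote yes in phase 1
    state: str = "active"    # active -> prepared -> committed or aborted
    log: List[str] = field(default_factory=list)  # stand-in for stable storage

    def prepare(self) -> bool:
        # Phase 1: record "prepared" before voting yes, so the vote survives a crash.
        if not self.can_commit:
            self.abort()
            return False
        self.log.append("prepared")
        self.state = "prepared"
        return True

    def commit(self) -> None:
        self.log.append("commit")
        self.state = "committed"

    def abort(self) -> None:
        if self.state != "aborted":
            self.log.append("abort")
            self.state = "aborted"


def two_phase_commit(participants: List[Participant], coordinator_log: List[str]) -> str:
    """Run both phases for one atomic action and return the outcome."""
    # Phase 1: every participant votes; a single "no" aborts the action.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # The decision is durable once the coordinator has logged it.
    coordinator_log.append(decision)
    # Phase 2: propagate the decision to every participant.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return "committed" if decision == "commit" else "aborted"
```

For simplicity the sketch asks every participant to prepare even after a "no" vote; a real coordinator could stop early. Recovery after a crash would replay each site's log, which this sketch does not model.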
The goal of the Clouds project at Georgia Tech is the implementation of a fault-tolerant distributed operating system based on the notions of objects, actions, and processes, to provide an environment for the construction of reliable applications. The Aeolus programming language developed from the need for an implementation language for those portions of the Clouds system above the kernel level. Aeolus has evolved with these purposes:
* to provide the power needed for systems programming without sacrificing readability or maintainability;
* to provide abstractions of the Clouds notions of objects, actions, and processes as features within the language;
* to provide access to the recoverability and synchronization features of the Clouds system; and
* to serve as a testbed for the study of programming methodologies for action-object systems such as Clouds.
In this paper, the language features that support readability and maintainability in systems programming are described briefly, as is the rationale underlying their design. Considerably more detail is devoted to the features that support object and action programming. Finally, an example using advanced features for action programming is presented, and the current status of the language and its use in the Clouds project is described.
This paper is intended as an introduction to the internal structures of the Clouds kernel. We will be constructing an experimental Clouds system during the next few years using dedicated minicomputers and personal computers. Further descriptions of the Clouds kernel will follow as this experimental system continues to be designed and constructed.
The goal of the Clouds project at Georgia Tech is the implementation of a fault-tolerant distributed operating system based on the notions of objects, actions, and processes, which will provide an environment for the construction of reliable applications. The Aeolus programming language developed from the need for an implementation language for those portions of the Clouds system above the kernel level. Aeolus has evolved with these purposes:
* to provide the power needed for systems programming without sacrificing readability or maintainability;
* to provide abstractions of the Clouds notions of objects, actions, and processes as features within the language;
* to provide access to the recoverability and synchronization features of the Clouds system; and
* to serve as a testbed for the study of programming methodologies for action-object systems such as Clouds.
Thus the main interest of Aeolus lies not in the language itself, but in what may be done with the language. We have avoided providing high-level features for programming actions with the intention of evolving designs for such features out of our experience with programming in Aeolus. These features will then be incorporated into an application language for the Clouds system.
The Clouds project is research directed toward producing a reliable distributed computing system. The initial goal of the project is to produce a kernel that provides a reliable environment on which a distributed operating system can be built. The Clouds kernel consists of a set of replicated sub-kernels, each of which runs on a machine in the Clouds system. Each sub-kernel is responsible for managing the resources on its machine; the sub-kernels communicate to provide the cooperation necessary to meld the various machines into one kernel.
The goal of the Clouds project at Georgia Tech is the implementation of a fault-tolerant distributed operating system based on the notions of objects and actions, which will provide an environment for the construction of reliable applications. As part of the Clouds project, we are designing and implementing a high-level language in which those levels of the Clouds system above the kernel level will be implemented. The Aeolus language provides access to the synchronization and recovery features of Clouds. It also provides a framework with which to study programming methodologies suitable for action-object systems such as Clouds. This paper provides a brief introduction to the features of the Clouds system that support the programming of objects and actions, and describes how these features are made available in the Aeolus language. We also present an example of an Aeolus object, drawn from our initial studies in programming methodologies for Clouds, that demonstrates the use of these features for programming recoverable objects.
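Aeolus syntax and the actual Clouds recovery primitives are not given here, so the following Python sketch only illustrates the general idea of a recoverable object: updates made on behalf of an action stay invisible to other actions and are either installed on commit or discarded on abort. It assumes a shadow-copy technique, which may differ from the mechanism Clouds actually uses; the class and method names are hypothetical.

```python
import copy
from typing import Any, Dict, Optional


class RecoverableObject:
    """Hypothetical recoverable object: an action's updates go to a shadow
    copy and replace the permanent state only if that action commits."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.permanent = state
        self.shadow: Optional[Dict[str, Any]] = None
        self.owner: Optional[str] = None  # action currently updating the object

    def _enter(self, action_id: str) -> None:
        # The first write by an action takes a private copy of the state;
        # any other action is refused until the owner commits or aborts.
        if self.owner is None:
            self.shadow = copy.deepcopy(self.permanent)
            self.owner = action_id
        elif self.owner != action_id:
            raise RuntimeError(f"object is being updated by action {self.owner}")

    def write(self, action_id: str, key: str, value: Any) -> None:
        self._enter(action_id)
        self.shadow[key] = value

    def read(self, action_id: str, key: str) -> Any:
        # The owning action sees its own uncommitted updates; others see
        # only committed state.
        if self.owner == action_id:
            return self.shadow[key]
        return self.permanent[key]

    def commit(self, action_id: str) -> None:
        if self.owner == action_id:
            self.permanent = self.shadow
            self.shadow, self.owner = None, None

    def abort(self, action_id: str) -> None:
        # Discarding the shadow copy undoes every update the action made.
        if self.owner == action_id:
            self.shadow, self.owner = None, None
```

Refusing a second writer outright keeps the sketch short; a language with built-in action support would more likely block the second action on a lock.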
Clouds is a native operating system for a distributed environment. In this paper we give an overview of the main ideas behind Clouds as well as some of the reasons that prompted us to design a new Clouds kernel. The new kernel, called Ra, builds on the experience obtained from the first Clouds kernel and provides a general framework for implementing a variety of distributed operating systems. We describe the new kernel in detail and show how Clouds can be built from the Ra primitives.
A distributed operating system is a control program running on a set of computers that are interconnected by a network. This control program unifies the different computers into a single integrated compute and storage resource. Depending on the facilities it provides, a distributed operating system is classified as general purpose, real time, or embedded.
The need for distributed operating systems stems from rapid changes in the hardware environment in many organizations. Hardware prices have fallen rapidly in the last decade, resulting in the proliferation of workstations, personal computers, data and compute servers, and networks. This proliferation has underlined the need for efficient and transparent management of these physically distributed resources.
This article presents a paradigm for structuring distributed operating systems, the potential and implications this paradigm has for users, and research directions for the future.
The goal of constructing reliable programs has led to the introduction of transaction (action) software into programming environments. The further goal of constructing reliable programs in a distributed environment has led to the extension of transaction systems to operate in a more decentralized environment.
We present the design of a transaction manager that is integrated within the kernel of a decentralized operating system: the Clouds kernel. This decentralized action management system supports nested actions, action-based locking, and efficient facilities for supporting recovery. The recovery facilities have been designed to support a systems programming language that recognizes the concept of an action. We also present a search protocol to locate objects in this distributed environment.
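The abstract does not spell out the locking rules for nested actions. A common formulation, assumed here rather than taken from the Clouds design, is that a subaction may acquire a lock held by one of its ancestors, that a committing subaction passes its locks to its parent, and that an aborting subaction gives them back to whoever held them before. The sketch below keeps a stack of holders per object to make that last rule work; all names are hypothetical, and it assumes subactions finish before their parents.

```python
from typing import Dict, List, Optional


class Action:
    """A (possibly nested) action; top-level actions have no parent."""

    def __init__(self, name: str, parent: Optional["Action"] = None) -> None:
        self.name = name
        self.parent = parent

    def is_descendant_of(self, other: "Action") -> bool:
        # True if `other` is this action or one of its enclosing actions.
        a: Optional[Action] = self
        while a is not None:
            if a is other:
                return True
            a = a.parent
        return False


class NestedLockManager:
    """Exclusive locks for nested actions (hypothetical sketch).

    Each object maps to a stack of holders in which every entry is an
    ancestor of the entry above it, so aborting a subaction restores the
    previous holder's lock.
    """

    def __init__(self) -> None:
        self.holders: Dict[str, List[Action]] = {}

    def acquire(self, obj: str, action: Action) -> bool:
        stack = self.holders.setdefault(obj, [])
        if not stack or action.is_descendant_of(stack[-1]):
            if not stack or stack[-1] is not action:
                stack.append(action)
            return True
        return False  # held by an unrelated action: caller waits or aborts

    def commit(self, action: Action) -> None:
        # A committing subaction passes its locks to its parent; a committing
        # top-level action releases them.
        for obj in list(self.holders):
            stack = self.holders[obj]
            if stack and stack[-1] is action:
                stack.pop()
                parent = action.parent
                if parent is not None and (not stack or stack[-1] is not parent):
                    stack.append(parent)
            if not stack:
                del self.holders[obj]

    def abort(self, action: Action) -> None:
        # An aborting action hands its locks back to the previous holder.
        for obj in list(self.holders):
            stack = self.holders[obj]
            if stack and stack[-1] is action:
                stack.pop()
            if not stack:
                del self.holders[obj]
```

Passing locks upward on commit is what keeps a subaction's effects isolated until the whole top-level action commits.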
Orphans, disjoint parts of actions that have aborted, are identified and eliminated using a time-driven orphan detection scheme which requires a clock synchronization protocol; we present the facilities necessary to generate a system-wide global clock to support that protocol.
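The abstract names a time-driven orphan detection scheme but not its rules. A minimal sketch, assuming a lease-style formulation: each action's remote work carries a deadline in global-clock time that a live action keeps extending; a site kills its work once its clock passes the deadline, and the action's home site waits an extra bound on clock skew before assuming no orphans remain. `ActionLease`, the helper functions, and the `MAX_SKEW` value are all hypothetical.

```python
from dataclasses import dataclass

MAX_SKEW = 0.5  # assumed bound on clock disagreement between any two sites


@dataclass
class ActionLease:
    """Deadline attached to an action's remote work (hypothetical sketch)."""
    action_id: str
    deadline: float  # in global-clock time units


def refresh(lease: ActionLease, now: float, period: float) -> None:
    """A live action periodically pushes its deadline forward."""
    lease.deadline = now + period


def is_orphan(lease: ActionLease, local_clock: float) -> bool:
    """A remote site treats work as orphaned once the deadline passes
    without a refresh, and eliminates it."""
    return local_clock >= lease.deadline


def safe_to_assume_gone(lease: ActionLease, home_clock: float) -> bool:
    """After an abort, the home site waits until every site's clock must have
    passed the deadline, given MAX_SKEW, before assuming orphans are gone."""
    return home_clock >= lease.deadline + MAX_SKEW
```

The skew allowance is why such a scheme depends on a clock synchronization protocol: without a known bound on disagreement, the home site could not tell when remote orphans are guaranteed to have stopped.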
The design goal of this implementation has been to achieve the performance necessary to support an experimental testbed which can serve as the basis for further work in the area of decentralized systems.
This paper is a brief exposition of a subsystem design that enhances the fault-tolerance characteristics of the Clouds operating system. We use a distributed, probe-based monitoring system that keeps track of the status of various system components, both hardware and software. The monitoring system is tied to the reconfiguration system to provide enhanced fault tolerance for the Clouds system.
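The probing policy is not described in the abstract. The sketch below assumes one plausible policy: a component is declared failed after a fixed number of consecutive unanswered probes, and a reconfiguration hook is called once per failure. `ProbeMonitor`, `probe_round`, and the `threshold` parameter are hypothetical names, and the probe itself is passed in as a function so the sketch stays independent of any transport.

```python
from typing import Callable, Dict, Set


class ProbeMonitor:
    """Hypothetical probe-based monitor. A component is declared failed after
    `threshold` consecutive unanswered probes, and the reconfiguration hook
    runs once for each such failure."""

    def __init__(self, probe: Callable[[str], bool],
                 on_failure: Callable[[str], None], threshold: int = 3) -> None:
        self.probe = probe            # returns True if the component answered
        self.on_failure = on_failure  # e.g. hand the component to reconfiguration
        self.threshold = threshold
        self.misses: Dict[str, int] = {}
        self.failed: Set[str] = set()

    def watch(self, component: str) -> None:
        self.misses[component] = 0

    def probe_round(self) -> None:
        for comp in self.misses:
            if self.probe(comp):
                self.misses[comp] = 0
                self.failed.discard(comp)  # a failed component has come back
            else:
                self.misses[comp] += 1
                if self.misses[comp] >= self.threshold and comp not in self.failed:
                    self.failed.add(comp)
                    self.on_failure(comp)
```

Requiring several consecutive misses trades detection latency for fewer false alarms from a single lost probe.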
Clouds is a native operating system for a distributed environment. The Clouds operating system is built on top of a kernel called Ra. Ra is a second-generation kernel derived from our experience with the first version of the Clouds operating system. Ra is a minimal, flexible kernel that provides a framework for implementing a variety of distributed operating systems.
This paper presents the Clouds paradigm and a brief overview of its first implementation. We then present the details of the Ra kernel, the rationale for its design, and the system services that constitute the Clouds operating system.
The Clouds project at Georgia Tech was initiated to conduct research into failure-resistant, efficient distributed architectures and operating systems. The project used state-of-the-art techniques to design a distributed operating system kernel that can be supported on conventional, unreliable hardware and be more reliable than the underlying electronics. Several approaches to the problem were considered, and after substantial research and construction effort, the current design emerged. This design combines simplicity and efficiency with advanced concepts. The resulting system is quite versatile and can be adapted easily to suit most requirements of reliable distributed computing, in many different hardware configurations. The design is largely independent of both the hardware and the system configuration.
This report describes the object- and action-based approach to building operating systems as incorporated in Clouds. We also describe in some detail the salient features of the system and the research directions that the project is expected to take.