Tideway Foundation Data Store

Foundation uses an advanced proprietary object database. This document describes the main features of the data store and the rationale behind those features.

Introduction

When discussing Tideway Foundation, one of the most common questions is why it uses a proprietary database rather than an off-the-shelf relational database (RDBMS). The quick answer is that the structure of data stored in Foundation is quite unusual, and so the database used to store it is unusual too.

Foundation stores information about lots of objects (known as nodes) and a very large number of interconnections between those nodes. Because the relationships between nodes are so important, relationships are a core concept in the data store. Relationships do not just link two nodes - they also indicate the way in which they are related, and the role that each node is playing within the relationship.

In this example, an instance of VMware is related to two different host computers - one is the computer it is running on; the other is the virtual host it is providing:

[!Images^VMware.png!]

Once all the different kinds of relationships are taken into account, the number of interconnections becomes very large, such as in this view of some aspects of a simple application in the model:

[!Images^applicationview.png!]

This kind of highly interconnected data structure is known as a graph. Relational databases are not designed for storing graphs, and although almost anything can be modeled as a set of relational tables, the SQL queries needed to traverse the interconnected data quickly become extremely cumbersome and inefficient. Foundation's data store natively represents the graph structure, and provides a query language designed for traversing the graph structure, meaning that it is much more convenient to use, and more efficient at manipulating data within the graph.

Another aspect of Foundation's data is that the schema for it cannot be fully defined up-front. In the example above, the node representing VMware shares attributes such as type and version with nodes representing other software products, but it also has a number of VMware-specific attributes such as vm_name and vm_uuid which are used by the TPL patterns to link the virtual hosts together. The data store imposes no limits on the number and type of the data attributes on a node, meaning that patterns have the flexibility to store whatever data they require for their operation.

Other systems that need to store flexible, interconnected data use similar storage paradigms. Examples include the Google App Engine datastore and Facebook's data API.

More depth

This section describes the key features of the data store in more depth.

Full text index

All textual data is indexed in a full text index, so nodes of all kinds can rapidly be found based on a few key words. There is no need to specifically index particular attributes for them to be available in the index. As a result, Foundation works well as a search engine for IT information.
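
As a rough illustration of the idea (not Foundation's actual implementation), a full text index can be sketched as an inverted index that maps each word to the nodes containing it. The class name FullTextIndex, the node identifiers and the sample attribute values below are all assumptions made for the example:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FullTextIndex:
    """Toy inverted index: word -> set of node ids (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_node(self, node_id, attributes):
        # Every textual attribute is indexed; nothing is configured per attribute.
        for value in attributes.values():
            if isinstance(value, str):
                for word in tokenize(value):
                    self._postings[word].add(node_id)

    def search(self, query):
        """Return the ids of nodes that contain all of the query words."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = FullTextIndex()
index.add_node("host-1", {"name": "db01", "os_type": "Solaris 10", "ram": 4096})
index.add_node("si-1", {"type": "Oracle Database", "version": "10g"})
index.add_node("si-2", {"type": "VMware VM", "vm_name": "oracle test box"})

print(index.search("oracle"))
print(index.search("oracle database"))
```

Because every string attribute is tokenized on insertion, the extension attribute vm_name is searchable without any extra declaration, which is the property described above.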

Nodes, Roles and Relationships

Data about entities in the environment is stored in Nodes. Each node has a kind, such as Host to represent a host computer or SoftwareInstance to represent a running piece of software. Information about the item is stored in named attributes, such as os_class and processor_type.

Nodes are connected to each other in the graph structure with Relationships. Relationships share all the characteristics of nodes (in computing terms, Relationship is a subclass of Node), meaning that they also have a kind, such as HostedSoftware and Ownership, and have named attributes representing details of the relationship.

Relationships are connected to nodes via named Roles. Roles represent the role that each node is acting in within the relationship. If we were to model relationships between people, for example, the Person nodes could be related in many ways. In a Parenthood relationship, one Person would have the role Parent and the other would have the role Child; in an Employment relationship, one Person would have the role Manager, the other Employee. On the other hand, in a Friendship relationship, the roles at both ends would be Friend:

[!Images^relationshipsview.png!]

In Foundation's data, this kind of complex scenario is commonplace. For example, SoftwareInstance nodes can have Containment relationships, where one has the role SoftwareContainer, the other ContainedSoftware; they can have Communication relationships where one is Client and the other Server, or where both have role Peer, and they can have Dependency relationships where one is the Dependant and the other DependedUpon.
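
The node, role and relationship model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Foundation API: the class and function names are invented for the example, and the people Bob and Carol are hypothetical stand-ins for whatever the diagram shows.

```python
class Node:
    """A node has a kind and free-form named attributes."""

    def __init__(self, kind, **attributes):
        self.kind = kind
        self.attributes = dict(attributes)
        self.links = []  # (role name, relationship) pairs this node takes part in


class Relationship(Node):
    """A relationship is itself a node; it attaches to nodes through named roles."""

    def __init__(self, kind, **attributes):
        super().__init__(kind, **attributes)
        self.ends = []  # (role name, node) pairs

    def attach(self, role, node):
        self.ends.append((role, node))
        node.links.append((role, self))


def relate(kind, role_a, node_a, role_b, node_b, **attributes):
    """Create a relationship of the given kind between two nodes."""
    rel = Relationship(kind, **attributes)
    rel.attach(role_a, node_a)
    rel.attach(role_b, node_b)
    return rel


def traverse(node, from_role, rel_kind, to_role, to_kind):
    """Follow from_role:rel_kind:to_role:to_kind starting at one node."""
    found = []
    for role, rel in node.links:
        if role != from_role or rel.kind != rel_kind:
            continue
        for other_role, other in rel.ends:
            if other is not node and other_role == to_role and other.kind == to_kind:
                found.append(other)
    return found


alice = Node("Person", name="Alice")
bob = Node("Person", name="Bob")             # hypothetical employee
carol = Node("Person", name="Carol", age=7)  # hypothetical child
relate("Employment", "Manager", alice, "Employee", bob)
relate("Parenthood", "Parent", bob, "Child", carol)
relate("Friendship", "Friend", alice, "Friend", bob)

employees = traverse(alice, "Manager", "Employment", "Employee", "Person")
children = [c for e in employees
            for c in traverse(e, "Parent", "Parenthood", "Child", "Person")]
print([c.attributes["name"] for c in children])
```

Storing the role on each end of the relationship is what lets the same pair of Person nodes be linked as Manager and Employee in one relationship and as Friend and Friend in another.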

Free-form data and the Taxonomy

Nodes can have any number of named data attributes. Most attributes contain simple small text or numerical values, but the system also supports data items of any size and complex structured data types. This is useful for storing the contents of configuration files, for example.

The expected structure of the Foundation model is declared in the Taxonomy (which is a slight misnomer). For each kind of node, the Taxonomy lists the known attributes and their types (text string, integer, etc.), and the known relationships to other nodes. However, the Taxonomy is only a guide to the structure of the data - the attributes and relationships that can be stored are not constrained by it, so nodes can be arbitrarily extended as required. TPL patterns can therefore add data attributes for their own purposes without the overhead of extending a schema and migrating the data to it. In a relational database, such extensions would have to be implemented in a separate extensions table, which would make queries more complex. Promoting a former extension into the schema would then involve a complex and expensive data migration, and modification of all existing queries.

Attributes and relationships defined in the Taxonomy are flagged as either expected or optional. The presence or absence of expected attributes is used to drive a 'data completeness' measure that provides an indication of whether all expected values were successfully set for a particular node.
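
The text does not give the formula behind the 'data completeness' measure, so the following is only a sketch under the assumption that it is the fraction of expected attributes that have a value. The taxonomy fragment, the flag strings and the rack_label extension attribute are all hypothetical:

```python
# Hypothetical taxonomy fragment: attribute name -> (type, "expected" or "optional").
HOST_TAXONOMY = {
    "name": (str, "expected"),
    "os_class": (str, "expected"),
    "os_version": (str, "expected"),
    "processor_type": (str, "optional"),
}


def completeness(attributes, taxonomy):
    """Fraction of expected attributes that are set (present and not None)."""
    expected = [name for name, (_type, flag) in taxonomy.items() if flag == "expected"]
    if not expected:
        return 1.0
    present = sum(1 for name in expected if attributes.get(name) is not None)
    return present / len(expected)


def undeclared(attributes, taxonomy):
    """Attributes stored on the node that the taxonomy does not list."""
    return sorted(set(attributes) - set(taxonomy))


# The node carries an extension attribute and lacks one expected attribute.
host = {"name": "db01", "os_class": "Solaris", "rack_label": "R12"}

print(completeness(host, HOST_TAXONOMY))
print(undeclared(host, HOST_TAXONOMY))
```

Note that the undeclared attribute is reported rather than rejected: in this sketch, as in the data store, the Taxonomy guides the data but does not constrain it.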

Foundation ships with a core Taxonomy containing the definition of the standard model. The Taxonomy can be easily extended and modified to accommodate custom model extensions for situations where the extensions are universal rather than on an ad-hoc node-by-node basis.

Searching / reporting language

The data store has a comprehensive query language designed for manipulating the highly-interconnected graph model. For simple queries based on a single node kind, it is similar in use and features to SQL, as used in relational databases. For example, to find all Host nodes and show some operating system details:

  SEARCH Host ORDER BY name SHOW name, os_type, os_version

Because it is designed for the graph model, the search language has specific support for performing traversals across the graph structure, including the ability to recursively expand a traversal. For example, to find the children of all Alice's employees in the diagram above:

  SEARCH Person WHERE name="Alice"
  TRAVERSE Manager:Employment:Employee:Person
  TRAVERSE Parent:Parenthood:Child:Person
  SHOW name, age

Recursive expansion allows a query to start at a container node, traverse to the nodes contained in it, then to the nodes contained in those, and so on until the whole tree of nodes has been reached. That kind of multiple-hop traversal would be extremely hard to represent in SQL. For example, to find all the software containing Oracle databases running on Solaris hosts, including applications contained inside other applications, the following query could be used:

  SEARCH Host WHERE os_class = 'Solaris'
  TRAVERSE Host:HostedSoftware:RunningSoftware:SoftwareInstance
  WHERE type = 'Oracle Database'
  EXPAND ContainedSoftware:SoftwareContainment:SoftwareContainer:
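
Conceptually, a recursive expansion repeats a single traversal step until no new nodes turn up. The sketch below shows that idea as a breadth-first walk in Python. It is not how the query engine is implemented; it assumes that the starting nodes are kept in the result, and the containment data is hypothetical:

```python
from collections import deque


def expand(start_nodes, neighbours):
    """Return start_nodes plus everything reachable by repeating one traversal.

    neighbours(node) returns the nodes one hop away. The visited set stops
    cyclic structures from looping forever.
    """
    seen = list(dict.fromkeys(start_nodes))  # de-duplicate, keep order
    visited = set(seen)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in neighbours(node):
            if nxt not in visited:
                visited.add(nxt)
                seen.append(nxt)
                queue.append(nxt)
    return seen


# Hypothetical containment data: contained software -> the software containing it.
CONTAINERS = {
    "oracle-db": ["billing-app"],
    "billing-app": ["finance-suite"],
    "finance-suite": [],
}

result = expand(["oracle-db"], lambda n: CONTAINERS.get(n, []))
print(result)
```

The depth of the walk is not fixed in advance, which is what makes this kind of query awkward to express as a fixed series of relational joins.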

The query language makes use of the Taxonomy definitions to know the default ordering and field ordering for nodes but, like the rest of the data store, the queries that can be performed are not constrained by the Taxonomy. The query syntax for retrieving an attribute value is the same whether the attribute is declared in the Taxonomy or not, meaning that extension attributes can easily be accessed, and queries continue to work unchanged if attributes that were previously in use as extensions are added to the Taxonomy. For example, this query finds all VMware instances and retrieves the standard name and version attributes, and the extension vm_name attribute:

  SEARCH SoftwareInstance WHERE type = 'VMware VM' SHOW name, version, vm_name

History

The data store automatically maintains the historical state of all nodes and relationships. For any node, it is possible to retrieve its state at any time in the past, and see the Foundation user or subsystem responsible for making changes to it. When nodes are removed, they are marked as destroyed, but their state is still available until they are explicitly purged.
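
One way to picture this behaviour is a per-node list of timestamped versions, searched by time. The sketch below is an illustration only: the class VersionedNode, its method names and the integer timestamps are assumptions, not the data store's real interface or storage layout.

```python
import bisect


class VersionedNode:
    """Keeps every past state of a node, with who changed it (illustrative)."""

    def __init__(self):
        self._times = []     # change timestamps, ascending
        self._versions = []  # (state dict, or None once destroyed; changed_by)

    def record(self, timestamp, state, changed_by):
        if self._times and timestamp < self._times[-1]:
            raise ValueError("changes must be recorded in time order")
        self._times.append(timestamp)
        self._versions.append((None if state is None else dict(state), changed_by))

    def destroy(self, timestamp, changed_by):
        # Mark the node as destroyed; earlier states stay retrievable until purged.
        self.record(timestamp, None, changed_by)

    def state_at(self, timestamp):
        """Return (state, changed_by) in effect at timestamp, or None before creation."""
        i = bisect.bisect_right(self._times, timestamp)
        return self._versions[i - 1] if i else None

    def purge(self):
        """Explicitly discard all history."""
        self._times.clear()
        self._versions.clear()


host = VersionedNode()
host.record(100, {"name": "db01", "os_version": "9"}, "discovery")
host.record(200, {"name": "db01", "os_version": "10"}, "alice")
host.destroy(300, "discovery")

print(host.state_at(150))
print(host.state_at(350))
```

In this sketch a destroyed node is just another version whose state is None, so queries about earlier times still succeed until purge is called.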

Data manipulation API

At a programmatic level, the data store is accessed through an object-oriented API, rather than merely through its query language. That means that data can be manipulated directly, avoiding the overheads of constructing and parsing query strings.

Export

Foundation has a comprehensive export system that allows subsets of the data to be exported to a number of targets, including direct insert into relational databases and construction of CSV files. This makes it easy to bridge the gap between the graph-oriented view of the model and a more traditional tabular view of those parts of the data where that is appropriate.
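
As a simple illustration of flattening free-form nodes into a tabular view, the sketch below writes nodes to CSV with Python's standard csv module. It is not the export framework itself (which the Implementation section says is written in Java); the function name export_csv and the sample data are assumptions for the example.

```python
import csv
import io


def export_csv(nodes, columns):
    """Write one CSV row per node; attributes a node lacks become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # value used for missing attributes
        extrasaction="ignore",  # attributes outside the chosen columns are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for attributes in nodes:
        writer.writerow(attributes)
    return buffer.getvalue()


# Hypothetical nodes with differing attribute sets.
nodes = [
    {"name": "vm-a", "version": "6.5", "vm_name": "build01", "vm_uuid": "1234"},
    {"name": "ora1", "version": "10g"},
]

text = export_csv(nodes, ["name", "version", "vm_name"])
print(text)
```

The chosen column list plays the role of the fixed schema that a tabular target needs, while the nodes themselves remain free-form.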

Implementation

The implementation details of the data store are invisible to Foundation users, and indeed to the other software subsystems within Foundation. Internally, the data is stored using Oracle's Berkeley DB embedded database. Berkeley DB provides a high performance API-based database supporting transactional access to simple tables of key-value pairs. Foundation's proprietary node and relationship model and its query language are layered on top of Berkeley DB.
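
To give a feel for what layering a graph on simple key-value tables can look like, here is a minimal sketch. It uses an in-memory Python dictionary as a stand-in for a table and does not use the Berkeley DB API; the key layout, the class names and the identifiers are all assumptions, and transactions are left out entirely.

```python
import json


class KeyValueTable:
    """Stand-in for one table of byte-string key-value pairs."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def keys_with_prefix(self, prefix):
        return sorted(k for k in self._data if k.startswith(prefix))


class GraphStore:
    """Nodes and relationships layered on two key-value tables (illustrative)."""

    def __init__(self):
        self.nodes = KeyValueTable()  # id -> JSON record {kind, attributes}
        self.links = KeyValueTable()  # "id|role|other id" -> empty value

    def put_node(self, node_id, kind, attributes):
        # Relationships are stored the same way, since they are nodes too.
        record = json.dumps({"kind": kind, "attributes": attributes})
        self.nodes.put(node_id.encode(), record.encode())

    def get_node(self, node_id):
        raw = self.nodes.get(node_id.encode())
        return None if raw is None else json.loads(raw.decode())

    def link(self, rel_id, role, node_id):
        # Store the link in both directions so either end can be looked up.
        self.links.put(f"{node_id}|{role}|{rel_id}".encode(), b"")
        self.links.put(f"{rel_id}|{role}|{node_id}".encode(), b"")

    def neighbours(self, item_id):
        """(role, other id) pairs directly linked to a node or relationship."""
        prefix = f"{item_id}|".encode()
        return [tuple(k.decode().split("|")[1:])
                for k in self.links.keys_with_prefix(prefix)]


store = GraphStore()
store.put_node("h1", "Host", {"name": "db01"})
store.put_node("s1", "SoftwareInstance", {"type": "Oracle Database"})
store.put_node("r1", "HostedSoftware", {})
store.link("r1", "Host", "h1")
store.link("r1", "RunningSoftware", "s1")

print(store.neighbours("h1"))
print(store.neighbours("r1"))
```

The point of the sketch is only that a graph model and a traversal primitive can be expressed entirely in terms of key lookups and prefix scans, which is the kind of access a key-value database provides.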

The data store is implemented in a combination of C++ and Python; the export framework is implemented in Java.

Further reading

Much more information about Foundation, the data model and related topics can be found on the Documentation Resources page.
