
The Challenge of Moving Multi-Dimensional Data

Most companies no longer have simple data. The days when data could be easily captured in rows and columns are gone, and businesses increasingly have to deal with more complex data structures. Yet when it comes to moving that data, those same businesses often revert to converting their complex data into a flat, two-dimensional structure so it can be more easily managed. This is an outdated approach to data movement. For data with complex inter-relationships, the burning question is: how do you move that data between systems and applications so that it can be used in its original structure?

In 1970, Edgar Codd formalized the concept of the relational database as a way of improving data access. In this approach, data is represented in a table (a relation) consisting of rows and columns – a two-dimensional representation. With relational databases came a data access language, eventually called SQL (Structured Query Language), and the promise of easier access to data.

During the 1990s, relational databases became the standard for processing corporate information such as invoices and purchase orders, and Online Transaction Processing (OLTP) entered common usage. Business users who wanted to get information out of their corporate databases could use SQL to access large volumes of data, but they were often frustrated by the lack of flexibility offered by relational database technology and SQL query-building tools.

The Problem of Two-Dimensional Data

Relational databases are excellent for storing data and retrieving detailed transactions, but they are hard to use when you want to aggregate, calculate, and create complex views of data. It was the data warehouse pioneer Ralph Kimball who pointed out the problem of normalized data in two-dimensional tables. He presented an example of business data structured in the relational entity-relationship format and showed how difficult it was to understand from a business perspective. Using an approach he called 'dimensional modeling', he showed that representing multiple two-dimensional tables as three-dimensional cubes made the data far easier to understand.

[Figure: an entity-relationship model compared with a dimensional model. Source: Ralph Kimball, The Data Warehouse Toolkit]
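
Kimball's own diagrams aren't reproduced here, but a minimal Python sketch (with made-up tables and names) shows the shape of a dimensional model: a central fact table of measures whose keys point into small, descriptive dimension tables, so a business question like "revenue by region" becomes a simple aggregation.

```python
from collections import defaultdict

# Dimension tables: descriptive attributes, keyed by surrogate id.
dim_product = {1: {"name": "Widget", "category": "Hardware"},
               2: {"name": "Gadget", "category": "Hardware"}}
dim_store = {10: {"city": "Denver", "region": "West"},
             11: {"city": "Boston", "region": "East"}}

# Fact table: one row per sale, holding dimension keys plus measures.
fact_sales = [
    {"product_id": 1, "store_id": 10, "units": 3, "revenue": 30.0},
    {"product_id": 2, "store_id": 10, "units": 1, "revenue": 25.0},
    {"product_id": 1, "store_id": 11, "units": 5, "revenue": 50.0},
]

# "Revenue by region" is a single pass over the fact table,
# with one lookup per row into the store dimension.
revenue_by_region = defaultdict(float)
for row in fact_sales:
    region = dim_store[row["store_id"]]["region"]
    revenue_by_region[region] += row["revenue"]

print(dict(revenue_by_region))  # {'West': 55.0, 'East': 50.0}
```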

From Two-Dimensional to Multi-Dimensional

By using the dimensional approach, companies could build complex three-dimensional constructs of their data, or multi-dimensional data structures. This reflected the insight that most business people think about their businesses in dimensional terms.

The resulting growth of multi-dimensional databases made it possible to achieve flexible, high-performance access and analysis of large volumes of complex and interrelated data. Data manipulation that required minutes to run in a relational environment in many cases required only seconds to run in a multi-dimensional environment.
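
Much of that speed-up typically comes from pre-aggregation: a multi-dimensional engine can materialize totals for combinations of dimension values ahead of time, turning an analytical query into a lookup rather than a scan. The sketch below is illustrative only (made-up data, not how any particular engine is implemented):

```python
from itertools import product as cartesian

# Build a tiny "cube" once at load time: total revenue for every
# (region, category) combination, including 'ALL' roll-ups.
sales = [
    ("West", "Hardware", 30.0),
    ("West", "Software", 20.0),
    ("East", "Hardware", 50.0),
]

cube = {}
for region, category, revenue in sales:
    for r, c in cartesian((region, "ALL"), (category, "ALL")):
        cube[(r, c)] = cube.get((r, c), 0.0) + revenue

# At query time an aggregate is a dictionary lookup, not a table scan.
print(cube[("West", "ALL")])      # 50.0
print(cube[("ALL", "Hardware")])  # 80.0
print(cube[("ALL", "ALL")])       # 100.0
```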

The Challenge of Integrating and Transferring Multi-Dimensional Data

Data teams often want to transfer or integrate their multi-dimensional data to another system or platform for different purposes, such as smaller, function-specific data marts. Although most data integration tools receive the data as XML or JSON, their first step is to flatten it into a two-dimensional layout, on the grounds that this 'well formed' format makes the data easier to convert back into its original structure later. This creates other problems:

  • The process of flattening and then un-flattening, using tools such as XPath, adds processing time
  • Links have to be created to handle index keys
  • All of this creates a larger volume of data, in multiple tables, to transfer (the sketch below makes these steps concrete)
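
To see where the extra work comes from, here is a minimal sketch of flattening and un-flattening a single nested order. The record shape and field names are hypothetical, and no particular tool's API is used:

```python
# A multi-dimensional record as it typically arrives: one order
# containing a nested customer and a nested list of line items.
order = {
    "order_id": "A-100",
    "customer": {"name": "Acme", "region": "West"},
    "items": [{"sku": "W-1", "qty": 3}, {"sku": "G-2", "qty": 1}],
}

# Flattening splits the document into two-dimensional tables and
# invents index keys (order_id, item_no) so the pieces can be
# re-linked later.
orders_table = [{"order_id": order["order_id"],
                 "customer_name": order["customer"]["name"],
                 "customer_region": order["customer"]["region"]}]
items_table = [{"order_id": order["order_id"], "item_no": i,
                "sku": it["sku"], "qty": it["qty"]}
               for i, it in enumerate(order["items"])]

# Un-flattening at the destination reverses the work: re-join on the
# generated keys to rebuild the original nested structure.
rebuilt = [{
    "order_id": o["order_id"],
    "customer": {"name": o["customer_name"],
                 "region": o["customer_region"]},
    "items": [{"sku": it["sku"], "qty": it["qty"]}
              for it in sorted(items_table, key=lambda r: r["item_no"])
              if it["order_id"] == o["order_id"]],
} for o in orders_table]

assert rebuilt[0] == order  # the round trip works, but the cost is paid twice
```

Every one of those steps disappears if the structure never has to be taken apart in the first place.
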
The Synatic Solution

Synatic's Hybrid Integration Platform (HIP) eliminates the need to convert multi-dimensional data into two-dimensional tables.

  • Multi-dimensional data can be received in JSON or XML
  • The unflattened structure can be stored in Synatic's database
  • Doing this retains the ability to work with the data in its native form without losing access performance
  • The data can be easily and rapidly published to API endpoints, as the sketch below illustrates
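
Synatic's internal API isn't shown in this post, so the sketch below uses a hypothetical in-memory document store to illustrate the idea: the record is stored and queried in its native nested form, with no flattening step and no generated index keys.

```python
# Conceptual sketch only: a generic in-memory document store stands
# in for the idea of keeping the record in its native nested form.
store = {}

def save(doc_id, doc):
    store[doc_id] = doc  # stored as-is: no flattening, no index keys

def get(doc_id):
    return store[doc_id]

order = {
    "order_id": "A-100",
    "customer": {"name": "Acme", "region": "West"},
    "items": [{"sku": "W-1", "qty": 3}, {"sku": "G-2", "qty": 1}],
}

save(order["order_id"], order)

# Nested fields stay directly addressable -- the shape an API
# endpoint would publish is the shape that was stored.
doc = get("A-100")
print(doc["customer"]["region"])            # West
print(sum(i["qty"] for i in doc["items"]))  # 4
```

The point is the contrast with the earlier sketch: nothing is split apart, so nothing has to be joined back together before the data is published.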

Unlike standard ETL solutions, Synatic keeps the data in its original form, retaining its complex structure through transformation and storage. This lets you access and work with your data more quickly.

The volume of data is increasing, and various forecasts expect it to grow to over 180 zettabytes (a zettabyte is 10²¹ bytes) by 2025. A significant proportion of this growth will be multi-dimensional data, created by mobile data traffic, cloud-computing traffic, and developments in artificial intelligence. This data is useless unless it can be analyzed and used for prediction and decision-making, so it follows that the use of multi-dimensional databases will increase. Using a data transfer solution (like standard ETL) that forces the data into two dimensions wastes time and increases the lag before the data is ready for analysis and prediction. Adopting the Synatic HIP saves time and effort and ensures analytical data is available sooner for decision-making.

Jamie Peers
June 26, 2022