Unlocking Big Value from Time Series Data with Dynamic Schema

It’s common today for IT professionals to confront two competing priorities: the need to embrace distributed systems while providing real-time analytics about the business and its systems. InfluxData’s Russ Savage explains how time series data with a dynamic schema can fill this gap.

Tags: database, DevOps, dynamic schema, InfluxData, IoT, monitoring, time series

Russ Savage
Director of Product Management
InfluxData


"Evidence continues to mount to show users can benefit from using time series with the flexibility of a dynamic schema."


Companies are discovering the utility of time series data at a rapid pace. Time-stamped data provides granular detail about change over time for specific systems, sensors, and processes. Companies need to gain insight into how these function in isolation. But even more value comes from the ability to aggregate this data and see how it influences a larger ecosystem.


For developers, time series data provides a new lens through which to think about, view, and analyze a wide range of practices and problems.

Getting Acquainted with ROI from Time Series Data

For those coming from the world of SQL databases, time series data can be challenging at first.

 

Perhaps the most significant challenge is the sheer volume of data that systems generate. A traditional SQL database may be able to handle time series data, but at scale the additional maintenance and operational burden starts to outweigh the value of the data.

An SQL database also requires you to define the schema of the data you plan to store up front. Is the data a string or an integer? Does the data conform to a specific character length? There are countless permutations of these questions that go into defining a schema. Of course, the benefit of a defined schema is that it allows an SQL database to process queries quickly.
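For contrast, here is a minimal sketch of what that up-front definition looks like in a relational database, using Python’s built-in sqlite3 module; the table and column names are invented for illustration:

```python
import sqlite3

# With a relational database, every column name and type has to be
# declared before a single reading can be stored.
conn = sqlite3.connect("metrics.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS cpu_metrics (
        recorded_at  TEXT NOT NULL,   -- ISO 8601 timestamp
        host         TEXT NOT NULL,   -- which server produced the reading
        usage_user   REAL,            -- percent CPU spent in user space
        usage_system REAL             -- percent CPU spent in the kernel
    )
""")
conn.execute(
    "INSERT INTO cpu_metrics VALUES (?, ?, ?, ?)",
    ("2022-01-27T12:00:00Z", "server01", 23.1, 4.2),
)
conn.commit()
```

Any reading that arrives with a field those columns don’t describe is rejected until the schema itself is changed.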


The need to define a schema beforehand is problematic for time series data. Often there are unknowns going into a project that involves time-stamped data, so your system needs to be flexible enough to meet evolving needs.

For example, you may need to collect this type of data from many different sources, all of which produce different-shaped outputs. On top of that, you may not even know the actual number of endpoints your database will collect metrics from when you get started.


With all of these issues, it may sound like time series data is a burden. It’s not.


All you need are the proper tools to be able to easily unlock the huge value and deeper insights that time series data provides.


A time series database is one that is purpose-built to handle time-stamped data. Using a time series database that utilizes a dynamic schema approach (also called implicit schema or schema-on-write) fixes these issues right out of the box.
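As a rough illustration, here is a minimal sketch of what writing to such a database can look like, assuming InfluxDB 2.x and its influxdb-client Python library; the URL, token, org, and bucket values are placeholders:

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details -- substitute your own.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# No CREATE TABLE step: the measurement, tags, and fields are defined
# by the data itself at write time.
write_api.write(
    bucket="my-bucket",
    record=Point("cpu").tag("host", "server01").field("usage_user", 23.1),
)

# A later write can carry a field the first one never mentioned;
# the schema simply grows to accommodate it.
write_api.write(
    bucket="my-bucket",
    record=Point("cpu")
    .tag("host", "server02")
    .field("usage_user", 18.7)
    .field("usage_iowait", 1.3),
)
```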


Another time-saving benefit is that this type of database can accept any data that conforms to configurable ingestion protocols, which is a low threshold for valid data.
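In InfluxDB’s case, for instance, that protocol is line protocol: a plain-text format of measurement, tag set, field set, and an optional timestamp. The sketch below, with invented measurement and tag names and the same placeholder connection details as above, shows a few valid lines being written:

```python
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Each line follows: measurement,tag_set field_set [timestamp in nanoseconds]
# Anything that parses like this is accepted -- no table or schema to declare first.
lines = [
    "cpu,host=server01,region=us-west usage_user=23.1,usage_system=4.2",
    "mem,host=server01 used_percent=61.4",
    "temperature,device=sensor-17 celsius=21.8 1643284800000000000",
]
write_api.write(bucket="my-bucket", record=lines)
```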


Users can streamline this process just by configuring an agent to collect data from each component of their system. This step ensures that the metrics adhere to the ingestion protocol. Once you configure the collection agent and metrics are ingested into the database, the schema is generated. As more data hits the time series database, the schema continually gets updated as necessary.
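In practice that agent is usually something off the shelf (Telegraf, in InfluxDB deployments), but a hand-rolled loop makes the idea concrete. This hedged sketch assumes the psutil and influxdb-client packages and the same placeholder connection details; it gathers a few host metrics on an interval, and the schema is generated from whatever it sends:

```python
import time

import psutil
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

while True:
    # Collect a handful of host metrics; adding another .field() later
    # requires no schema change on the database side.
    point = (
        Point("host_metrics")
        .tag("host", "server01")
        .field("cpu_percent", psutil.cpu_percent())
        .field("mem_percent", psutil.virtual_memory().percent)
        .field("disk_percent", psutil.disk_usage("/").percent)
    )
    write_api.write(bucket="my-bucket", record=point)
    time.sleep(10)  # report every ten seconds
```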


Just be sure to choose a time series database with an ingestion agent that can integrate with your systems.

Advantages to Dynamic Schema with Time Series

Evidence continues to mount to show users can benefit from using time series with the flexibility of a dynamic schema. Here are a pair of popular examples.

In a DevOps monitoring situation, you may not know all the different pieces of information coming out of a particular server, system, or infrastructure. In this situation, you may end up with hundreds or thousands of metrics, and defining a schema for them would be a very time-consuming process. A dynamic schema system removes this planning step and therefore accelerates the development process.


The IoT space is another area where a dynamic schema presents an advantage. Let’s say you’re deploying hardware into the field, and that hardware has its own set of sensors or systems that generate metrics. You can push remote updates to that hardware that change the type of information the device sends back to your database.


If you need to spend time coordinating changes in the incoming data with your data stores, that adds significant complexity and time to your update process. When you tweak and refine the software on your field IoT devices, and an update starts collecting new metrics or renames an existing one, you don’t want data to get lost or dropped from the database.


With a dynamic schema, you continue to collect whatever metrics your devices generate, and the database adapts them to the schema as they come in.
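To make that concrete, here is a hedged sketch of one write path handling readings from two firmware versions of a hypothetical field device; the measurement name, field names, and connection details are all invented for illustration:

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

def report(device_id, reading):
    """Write whatever fields the device sent, whichever firmware it runs."""
    point = Point("device_telemetry").tag("device_id", device_id)
    for name, value in reading.items():
        point = point.field(name, value)
    write_api.write(bucket="my-bucket", record=point)

# Firmware v1 reports temperature and battery level.
report("unit-42", {"temperature_c": 21.8, "battery_pct": 87.0})

# After a remote update, firmware v2 adds humidity and renames the battery field.
# Nothing is dropped and no migration is coordinated; the new fields simply
# appear in the schema as they arrive.
report("unit-42", {"temperature_c": 22.1, "humidity_pct": 40.5, "battery_percent": 86.0})
```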

In both of the above examples, the pace of development and deployment with a dynamic schema is dramatically faster than with an SQL database. It can take weeks for a company to add a single field to an SQL database schema, and that can involve multiple meetings with database administrators and infrastructure engineers.
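For comparison, the mechanical change on the SQL side is a one-line migration like the sqlite3 sketch below (table and column names hypothetical); the weeks come from coordinating that change across every writer, reader, and environment before it can ship:

```python
import sqlite3

conn = sqlite3.connect("metrics.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS device_telemetry (
        recorded_at   TEXT NOT NULL,
        device_id     TEXT NOT NULL,
        temperature_c REAL,
        battery_pct   REAL
    )
""")

# Each new metric means a migration like this, reviewed and scheduled
# before any writer is allowed to send the new value.
conn.execute("ALTER TABLE device_telemetry ADD COLUMN humidity_pct REAL")
conn.commit()
```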

Final Thoughts on Benefits of Dynamic Schema with Time Series

If you’re looking to build applications that leverage time series data, use a database that supports a dynamic schema for the best results.


Not only will this accelerate development by allowing you to deploy faster at the start of your project, but it will also make your applications more flexible as they mature. That’s because you can add new measurements on the fly, so you can quickly build out new features, generate more robust data, and create better user experiences. In short, a time series database with a dynamic schema allows you to build awesome applications in almost no time at all.



Russ Savage is the Director of Product Management at InfluxData, where he focuses on enabling DevOps for teams using InfluxDB and the TICK Stack. He has a background in computer engineering and has focused on various aspects of enterprise data for the past 10 years.



