StreamSets Embeds InfluxData’s Time Series Technology To Deliver DataOps for Cloud-Native, Multi-Cloud Projects

StreamSets is partnering with InfluxData to bring time series database monitoring and management to its multi-cloud DataOps platform. By embedding InfluxDB, the partnership brings developers more fine-grained and iterative capabilities in DevOps builds and operations.

Tags: DataOps, DevOps, cloud, InfluxDB pipeline, StreamSets, time series


StreamSets offers a multi-cloud DataOps platform for modern data integration, designed to help companies continuously flow big, streaming and traditional data to their data scientists and data-intensive applications.


InfluxDB is a time series database designed to handle high write and query loads. It is an integral component of the TICK stack. InfluxDB is meant to be used as a backing store for any use case involving large amounts of timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.


Under the partnership, StreamSets’ Dataflow Performance Manager is shipping with the InfluxDB time series database embedded in its product to provide a default data platform to handle time series workloads for cloud-native data movement. By embedding InfluxDB, StreamSets will let users remain agile and more efficiently develop and operate data movement across multi-platform hybrid infrastructures, according to StreamSets’ director of engineering Harikiran Nayak.

The StreamSets/InfluxData partnership brings developers more fine-grained and iterative build-operate processes for data movement and data integration.


Working together, these capabilities are key to delivering on the emerging category of DataOps, often defined as the application of DevOps practices to data analytics, Nayak added. They also provide a powerful combination to assist with projects that depend on large volumes of data streams, loosely coupled API integrations and microservices, according to executives from both firms.


“As enterprises continuously deploy and scale out their cloud-native architectures, there is a geometric increase in the variety of data sources and the volume of data in motion,” Nayak said in a statement. “In InfluxData, we have integrated a first-class system to handle this relentless, ever-growing stream of time series data in order to maximize performance for our StreamSets users.”


Brian Mullen, InfluxData’s vice president of business development, said of the StreamSets partnership, “We are thrilled to be working with their world-class product team to serve this massive market opportunity, bringing simplicity and performance for time series data to StreamSets developers.”


A Closer Look at How StreamSets, InfluxData Will Improve DataOps

The StreamSets DataOps Platform helps users build and operate batch and streaming dataflow architectures while iterating with agility to address new sources, infrastructure and analytic requirements.


StreamSets’ DataOps approach combines the open source StreamSets Data Collector (for execution of any-to-any pipelines) with cloud-native StreamSets Control Hub (for the design, monitoring and performance management of multi-pipeline topologies). StreamSets Dataflow Performance Manager (DPM) offers advanced dataflow management, including Data SLAs for availability, accuracy and data privacy, as well as tracking of historical performance across data pipeline versions.


StreamSets’ approach to DataOps aims to allow adopters to create continuous data movement architectures that iterate in response to rapid changes to data sources, infrastructure and analytics requirements.  


With what it calls “ever-on flexibility,” StreamSets aims to overcome the brittleness of traditional data integration tools, and is designed to handle “data drift,” frequent and unexpected changes to data that break pipelines and damage data integrity, Nayak added.


Combined, StreamSets and InfluxData technologies support developers who work with DevOps practices and modern provisioning, testing and scaling of complex dataflows.


InfluxData offers an open source platform built specifically to capture, monitor and set events for time series data. The InfluxData Platform is designed to handle all time series data in real time, whether it comes from apps, APIs, microservices, humans, sensors, or machines.


The InfluxData architecture supports collecting, storing and visualizing time series data, then turning it into insight and further into triggers, events and action. Built for fast deployment and fast performance, InfluxData has three major product offerings: InfluxCloud (a fully managed and hosted service), InfluxEnterprise (software that can run on-premises or on any cloud provider), and an open source Time Series Platform.


Time series data is increasingly being seen as a key component in cloud-native application infrastructure workloads.

In fact, time series databases have been the fastest growing database category for the last two years, according to DB-Engines. This growth is being fueled by two major industry trends: IoT and cloud-native applications and services, all of which are being instrumented for real-time visibility and control.


Time series databases deal with specific workloads and requirements, such as ingesting millions of data points per second. They also deliver better results for real-time queries and can perform complex time-bound queries across such large data sets, importantly in a non-blocking manner. In addition, they allow users to downsample and evict low-value data, and to optimize data storage to reduce costs.
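
To make downsampling and eviction concrete, here is a minimal sketch in plain Python. It is not InfluxDB's or StreamSets' implementation. The function names, the 60-second window and the sample CPU readings are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_s):
    """Average (timestamp, value) points into fixed windows of window_s seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def evict(points, now, max_age_s):
    """Keep only points no older than max_age_s seconds relative to now."""
    return [(ts, v) for ts, v in points if now - ts <= max_age_s]


# Hypothetical readings: (seconds since start, CPU percent).
raw = [(0, 10.0), (10, 20.0), (20, 30.0), (60, 40.0), (70, 60.0), (130, 90.0)]

rollup = downsample(raw, window_s=60)       # one averaged point per minute
recent = evict(raw, now=130, max_age_s=60)  # raw points from the last minute only

print(rollup)
print(recent)
```

In this sketch the per-minute rollup would be kept for the long term, while raw points older than a minute are dropped. That is the storage trade-off the paragraph above describes.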


The Value of Time Series for DevOps, DataOps

IDN spoke with InfluxData’s head of product marketing Navdeep Sidhu to learn more about why DevOps and DataOps require companies to modernize their monitoring approach, including the use of time series data. The following is an excerpt from a Q&A with Sidhu during IDN’s Application Architecture Summit online event held last fall.



As DevOps, AI/ML, IoT and other newer app trends become more prevalent, many enterprise developers are feeling the pressure, Sidhu told IDN.   

“They’re struggling. They are using Hadoop, MySQL or traditional RDBMS. But that is like putting the square peg in the round hole. These are major reasons companies are looking to replace their existing monitoring solution – many of which were built decades ago – with time series,” he said.

“Customers want 99.99 percent uptime but are being constrained by existing monitoring tools because they would only provide information at one second intervals. Customers want [to monitor] faster than that,” Sidhu added.


“Also, legacy monitoring tools (even with plug-ins) often have trouble coping with many characteristics of today’s modern apps,” Sidhu said, “such as high data volumes, rapid sampling, deep granularity or the ability to capture and use metrics at sub-second response level.”


“InfluxData’s founder Paul Dix was a developer who encountered many of these issues with monitoring,” Sidhu added. “After looking for a solution, Dix realized there was no easy way for developers to effectively handle time series data with traditional tools, and so he built a purpose-built platform for time series,” he said.

Sidhu also highlighted some of the proven results InfluxData users have reported since adopting its time series platform.


“We have seen that when developers are deploying the build, they are always interested in certain parameters,” Sidhu said. “Every microservice that is being used within the app can now be monitored. You can also measure the efficiency of the whole DevOps process as well,” he said. Users can see that a build was completed in 30 seconds, and how that compares to earlier builds. Users can also see the relative performance of every build, he added.


“Every deployment that you do essentially generates a lot of metrics that you, as an app developer, own and have. But you may not necessarily publish those. But if you open that up, you can see how good a job you are doing now looking at all those different metrics,” Sidhu said.
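
The build comparison Sidhu describes can be sketched in a few lines of Python. This is an illustrative assumption about how such metrics might be compared, not a description of any InfluxData or StreamSets feature. The function name and the earlier build durations are hypothetical; only the 30-second figure comes from the example above.

```python
from statistics import mean


def compare_build(history_s, latest_s):
    """Compare the latest build duration with the mean of earlier builds."""
    baseline = mean(history_s)
    return {
        "baseline_s": baseline,
        "latest_s": latest_s,
        "delta_s": latest_s - baseline,  # negative means the build got faster
    }


# Hypothetical durations, in seconds, of three earlier builds.
earlier_builds_s = [42.0, 38.0, 40.0]

report = compare_build(earlier_builds_s, latest_s=30.0)
print(report)
```

Stored as a time series, one point per build, the same comparison could be run against any window of earlier builds.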


StreamSets is available with the InfluxData platform for handling time series data, using either the open source InfluxDB or the commercial InfluxEnterprise product.