StreamSets Builds Out Enterprise-Class Native Cloud DataOps Platform

DataOps provider StreamSets continues to expand to help enterprises more efficiently develop and operate agile data pipelines. IDN looks at the latest updates to the StreamSets DataOps Platform.

Tags: Cloud, data science, DevOps, integration, pipeline, StreamSets

The StreamSets DataOps Platform is adding new certifications and partners to expand its reach for cloud-native enterprise data integration.


StreamSets’ latest cloud initiatives and partnerships aim to help customers simplify hybrid and cloud projects that combine streaming and traditional data for data science, deeper insights, and analytics applications.


“There is an increased urgency for adopting the cloud in the enterprise as we see rapidly changing needs for data storage and data delivery taking place outside of the workplace and data centers,” said Jobi George, StreamSets general manager for cloud and partnerships.


“Our goal is to help companies efficiently develop and operate data pipelines that can remain agile in the face of change, so helping our customers move to the cloud swiftly and securely has been a priority in recent months,” George added. “We also work closely with our customers once they are in the cloud to keep data anonymized and secure while so many are working from home.”


With StreamSets DataOps Platform, data workers can build pipelines in minutes using visual, full-lifecycle tools to design, operate, manage and optimize data pipelines across the enterprise.


Specifically, StreamSets provides a single view across all data operations, on-premises or in the cloud. This lets users run development and production projects on multiple platforms without rework, which helps optimize decision-making and support different business requirements, George noted. It also keeps data fresh and synchronized across those platforms, he added.


Also, StreamSets DataOps Platform abstracts away the complexity of modern data to deliver unmatched resiliency. The platform lets users build “smart pipelines with data drift handling,” along with end-to-end visibility. 
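
To make the idea of data drift concrete, here is a minimal Python sketch of one way a pipeline stage might flag records whose structure no longer matches what it expects. It does not use the StreamSets API; the `detect_drift` function, the schema format and the sample record are all hypothetical and chosen only for illustration.

```python
from typing import Any


def detect_drift(expected: dict[str, type], record: dict[str, Any]) -> dict[str, list[str]]:
    """Compare one record against an expected schema (hypothetical format).

    Returns the field names that were added, are missing, or changed type.
    """
    # Fields present in the record but unknown to the schema.
    added = sorted(k for k in record if k not in expected)
    # Fields the schema expects but the record does not carry.
    missing = sorted(k for k in expected if k not in record)
    # Fields present in both whose value is not of the expected type.
    retyped = sorted(
        k
        for k, expected_type in expected.items()
        if k in record
        and record[k] is not None
        and not isinstance(record[k], expected_type)
    )
    return {"added": added, "missing": missing, "retyped": retyped}


# Illustrative data: the source started sending "amount" as text
# and added a new "currency" field.
expected_schema = {"id": int, "email": str, "amount": float}
incoming = {"id": 7, "email": "a@example.com", "amount": "19.99", "currency": "USD"}

report = detect_drift(expected_schema, incoming)
print(report)
```

A pipeline could use a report like this to route drifted records to an error stream or to update the destination schema, rather than failing outright.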


Working with AWS, StreamSets combines DataOps agility with the AWS ecosystem through its cloud-native integration.


With StreamSets and AWS, customers can move massive amounts of data very quickly from any legacy or on-premises data source into Amazon S3 (Simple Storage Service) or Amazon Redshift. Also, users can migrate data while managing their AWS resources via StreamSets Transformer, which includes bulk data loading and ingestion, cluster management for Amazon Redshift and Amazon Elastic MapReduce (EMR), and enterprise control for Amazon S3.


StreamSets has also achieved the AWS Data & Analytics Competency as an AWS Partner. This signifies that StreamSets has demonstrated specialized technical proficiency and proven customer success for data migration projects.


For cloud projects, StreamSets also partnered with Intel to offer optimized machine instances for both AWS and Microsoft Azure. The StreamSets/Intel partnership aims to provide higher performance and processing speeds without high costs.


StreamSets also partnered with HPE Ezmeral Container Platform to combine features of a hybrid DataOps platform with container support and open-source Kubernetes, George noted.

Inside the Components of StreamSets DataOps Platform

The StreamSets DataOps Platform features the StreamSets Data Collector. This component simplifies data ingestion pipelines and lets DataOps teams build streaming data pipelines from any source to any destination, George noted. The collector serves as an easy-to-use, modern execution engine for fast data ingestion and light transformations.


To promote simplicity, StreamSets comes with more than 100 pre-built connectors to get pipelines up and running fast -- without special skills, according to the company. 


Connectors are available for many popular environments, including AWS, Azure, Cloudera, Oracle, Salesforce and Redis.


This wide variety of connectors is meant to give users simpler ways to complete time-consuming tasks, George noted. With them, users can:

  • Design pipelines for streaming, batch and change data capture (CDC) in minutes;
  • Trigger CDC operations to keep data fresh and protected; and
  • Monitor data in flight and handle data drift with fully instrumented pipelines.
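
For readers unfamiliar with change data capture, the Python sketch below shows the general idea of keeping a copy of a table fresh by replaying a stream of change events against it. It is not StreamSets code; the event shape (`op`, `key`, `row`) and the `apply_change` helper are assumptions made for this example.

```python
from typing import Any


def apply_change(target: dict[int, dict[str, Any]], event: dict[str, Any]) -> None:
    """Apply one change event (hypothetical format) to an in-memory replica."""
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        # Upsert: the event carries the full new version of the row.
        target[key] = event["row"]
    elif op == "delete":
        # Remove the row; ignore deletes for keys we never saw.
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


# Illustrative change stream captured from a source table.
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "plan": "pro"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 2},
]

replica: dict[int, dict[str, Any]] = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

Because only the changes are shipped, the replica stays in step with the source without reloading the whole table on every run.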

StreamSets DataOps Platform also has other components, including StreamSets Transformer, which leverages Apache Spark for making ETL pipelines easier to build; and StreamSets Control Hub to help teams design, deploy, monitor, and manage smart data pipelines at scale.