Cloudera, ParallelM Partnership To Boost Power of Machine Learning in Production

Cloudera, in partnership with ParallelM, aims to bring large-scale machine learning capabilities to operations and production. The partnership comes a month after the Cloudera / Hortonworks merger became official, as the newly combined company’s vision for AI/ML and new-gen analytics begins to take shape.

Tags: AI/ML, Apache, CDSW, cloud, Cloudera, Hortonworks, MCenter, model, ParallelM



Inside the ML-Drivers for the Cloudera / ParallelM Partnership

The Cloudera / ParallelM partnership brings together Cloudera Data Science Workbench (CDSW), the company’s popular machine learning development environment, with ParallelM’s MCenter machine learning operationalization (MLOps) technology.


CDSW will work with ParallelM’s MCenter via a new model deployment connector. With the connector, CDSW models can be imported into MCenter, where they immediately enter an automated production process. They can then be managed in any cloud or on-premises environment, according to company execs.


ParallelM’s MCenter is powered by a set of patent-pending technologies for production model management and collaboration. These technologies are designed to automate the delivery of analytics across the model lifecycle, covering both orchestration and prediction quality.


Under the covers, MCenter supports batch, real-time, and streaming deployments, with integrated model health monitoring and model governance that tracks every detail of a model’s use in production.
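The article does not describe how MCenter's health monitoring works internally. As a generic illustration of what a model health check can look like, the sketch below flags drift when the mean of recent production predictions moves too far from a training-time baseline. The function name `prediction_drift`, the threshold, and the sample values are all assumptions made for this example and are not part of MCenter or CDSW.

```python
from statistics import mean, stdev
from typing import Sequence


def prediction_drift(
    baseline: Sequence[float],
    production: Sequence[float],
    threshold: float = 3.0,
) -> bool:
    """Return True when the production mean sits more than `threshold`
    standard errors away from the baseline mean (hypothetical check)."""
    base_mean = mean(baseline)
    base_std = stdev(baseline)
    if base_std == 0:
        # A constant baseline has no spread, so any change counts as drift.
        return mean(production) != base_mean
    # Standard error of the production mean under the baseline spread.
    standard_error = base_std / len(production) ** 0.5
    z_score = abs(mean(production) - base_mean) / standard_error
    return z_score > threshold


# Illustrative values only: scores seen at training time vs. in production.
training_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
steady_scores = [0.30, 0.31, 0.29, 0.30]
shifted_scores = [0.90, 0.95, 0.92, 0.91]

print(prediction_drift(training_scores, steady_scores))
print(prediction_drift(training_scores, shifted_scores))
```

A production system would likely compare full distributions rather than means alone; the single-statistic check is used here only to keep the idea visible.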


In addition, the CDSW/MCenter integrated environment provides centralized management of ML models, while taking advantage of existing infrastructure investments. The integrated environment lets users easily optimize new models (and model updates) by using actual production stats. Users also gain control over a full end-to-end model lifecycle management capability – from development to production, execs noted.


“Automating the deployment and management of machine learning models in production is a key part of our strategy to industrialize AI. Our customers are asking for full model lifecycle management including advanced model health monitoring,” said Hilary Mason, Cloudera’s general manager for machine learning, in a statement. “Our partnership with ParallelM delivers that and more while leveraging our customers’ investments in Cloudera to help build the environment for the enterprise AI factories of the future.”


ParallelM CEO Sivan Metzger added, “Cloudera has a clear vision of what the future looks like for AI-driven companies that leverage investments in big data and put them to work in actual ML-based applications using Cloudera tools for data science and machine learning. We are excited to be part of this vision and to put our industry-leading MLOps platform, MCenter, to work for Cloudera’s customers.”


How ParallelM’s MCenter Works: A Review of Its Major Components

MCenter works by moving ML pipelines into production, automating the orchestration, and guaranteeing machine learning performance 24/7. It is the single solution where data scientists, IT operations, and business analysts come together to automate, scale, and optimize machine learning across the enterprise.


Among MCenter’s major components are:  

MCenter Workspace: The MCenter workspace enables collaboration between operations & data science teams to ensure ML success in production. It includes intuitive dashboard views, ML Health indicators, KPIs, and advanced visualization and diagnostic tools.


MCenter Server: The MCenter server orchestrates ML Applications and pipelines via the MCenter agents. It executes policies, manages configuration, and sends data to the MCenter console. The MCenter server enables automation of all the key tasks related to deployment and management of ML.


MCenter Agents: The MCenter agents trigger analytics engines and manage local ML pipelines. They provide visibility into pipeline activity and send alerts, events, and statistics to the MCenter server. They are compatible with popular analytics engines including Spark, TensorFlow, and Flink.


Flexible Deployment Options: MCenter can be deployed in the cloud, on-premises, or in hybrid scenarios. It also works across distributed computing architectures that combine diverse, interoperating analytics engines (Spark, TensorFlow, Flink, PyTorch).
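The server and agent roles described above follow a familiar orchestration pattern: agents run pipelines locally and report statistics and alerts to a central server. The sketch below illustrates that pattern only. The `Server`, `Agent`, and `Event` classes are hypothetical names invented for this example; they are not MCenter's API, which the article does not document.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    agent: str      # name of the reporting agent
    kind: str       # "stat" or "alert"
    payload: dict   # free-form details


@dataclass
class Server:
    """Hypothetical central server that collects what agents report."""
    events: List[Event] = field(default_factory=list)

    def receive(self, event: Event) -> None:
        self.events.append(event)

    def alerts(self) -> List[Event]:
        return [e for e in self.events if e.kind == "alert"]


@dataclass
class Agent:
    """Hypothetical agent that runs a local pipeline and reports back."""
    name: str
    server: Server

    def run_pipeline(
        self,
        pipeline: Callable[[List[float]], List[float]],
        batch: List[float],
    ) -> List[float]:
        try:
            predictions = pipeline(batch)
        except Exception as exc:
            # A failed pipeline run is reported to the server as an alert.
            self.server.receive(Event(self.name, "alert", {"error": str(exc)}))
            return []
        mean_prediction = sum(predictions) / len(predictions) if predictions else 0.0
        self.server.receive(
            Event(self.name, "stat", {"rows": len(batch), "mean_prediction": mean_prediction})
        )
        return predictions


server = Server()
agent = Agent("node-1", server)
agent.run_pipeline(lambda xs: [2 * x for x in xs], [1.0, 2.0, 3.0])  # reports a stat
agent.run_pipeline(lambda xs: [1 / x for x in xs], [0.0])            # reports an alert
print(len(server.events), len(server.alerts()))
```

In a real deployment the agents and server would communicate over a network and the pipelines would run on an engine such as Spark; in-process calls stand in for both here.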

Other Notable Cloudera AI/ML Initiatives, Developments

Here’s a quick review of other Cloudera AI/ML initiatives, covering both shipping technologies and roadmap plans:


CDSW models deployed in MCenter can also use Cloudera’s Distribution Including Apache Hadoop (CDH) infrastructure for processing, including dynamic retraining for streaming or batch predictions.


The Cloudera / ParallelM partnership will also extend to the upcoming cloud-native Cloudera Machine Learning platform, unveiled in December 2018.


The upcoming Cloudera Machine Learning is engineered to deliver fast provisioning and autoscaling as well as containerized, distributed processing on heterogeneous compute. Cloudera Machine Learning also aims to ensure secure data access with a unified experience across on-premises, public cloud, and hybrid environments.


In detail, Cloudera Machine Learning capabilities include:

  • Seamless portability across private cloud, public cloud, and hybrid cloud powered by Kubernetes
  • Rapid cloud provisioning and autoscaling
  • Scale-out data engineering and machine learning with seamless dependency management provided by containerized Python, R, and Spark-on-Kubernetes
  • High velocity deep learning powered by distributed GPU scheduling and training
  • Secure data access across HDFS, cloud object stores, and external databases


AI/ML is a Major Focus for the Newly-Merged Cloudera / Hortonworks

The merged Cloudera / Hortonworks will have AI / ML as a keen focus, according to execs.


“At Cloudera we see winning organizations embedding machine learning and AI across their business to improve customer experience, automate operations, reduce risk and create real value,” Cloudera’s Mason told Datanami last month. “This isn’t about building one application or model. Machine learning excellence requires the team, organization, and infrastructure to build and manage hundreds or even thousands of applications and models. At Cloudera, we refer to this trend as the industrialization of AI . . . [and] it’s our strategic focus.”


In fact, AI appears to have been a focus even before the merger – if not a driving factor.


When the Cloudera / Hortonworks merger was announced last October, Cloudera CEO Tom Reilly said: “Our businesses are highly complementary and strategic. By bringing together Hortonworks’ investments in end-to-end data management with Cloudera’s investments in data warehousing and machine learning, we will deliver the industry’s first enterprise data cloud from the edge to AI. This vision will enable our companies to advance our shared commitment to customer success in their pursuit of digital transformation.”


Among notable AI/ML activities in the Cloudera / Hortonworks merger roadmap are:


Cloudera is blending features from Hortonworks Data Platform (HDP) V3 and CDH V6 in a unified edition.


CDSW is integrating with HDP, and a version is available expressly for HDP clusters. It aims to help data scientists perform collaborative data exploration, visualization, model development, and deployment using popular open source tools (Python, R, and Spark). It also provides on-demand access to data and compute in HDP, or to data anywhere.


Cloudera’s other support for AI/ML also includes:    

  • Building, training, and deploying scalable ML and AI applications on Cloudera Enterprise Data Hub
  • Speeding time to results with Cloudera Fast Forward Labs applied ML advising and research
  • Putting strategy and execution on the fast track with Cloudera Fast Forward Labs strategic advising and ML application development services


Also notable for new-gen analytics, Hortonworks DataFlow (HDF) is being reborn as Cloudera DataFlow (CDF), an analytics-focused data-in-motion platform.


In a blog post, Cloudera tech evangelist Dinesh Chandrasekhar described CDF this way:

CDF is a scalable, real-time streaming data platform that collects, curates, and analyzes data so customers gain key insights for immediate actionable intelligence. It meets the challenges faced with data-in-motion, such as real-time stream processing, data provenance, and data ingestion from IoT devices and other streaming sources. Built on 100% open source technology, CDF helps. . . deliver a better customer experience, boost. . . operational efficiency and stay ahead of the competition. 

CDF is engineered to work with Apache NiFi (and its sub-project MiNiFi) and Kafka.