Qubole Adds Crucial Features To Maximize Data Lakes' Business Impact, ROI in 2020

Qubole’s cloud-native data management platform for analytics is adding new critical features to help companies maximize their ROI and business impact from data lakes. IDN explores Qubole's new data privacy support, deep Tableau integration and cost tracking with CEO Ashish Thusoo.

Tags: ACID, analytics, BI, cloud, data lakes, data management, integration, Qubole, Tableau, transactions

Ashish Thusoo
CEO and co-founder

"With Qubole ACID support for transactions, the data lake can now support changing data."

Qubole’s cloud-native data management platform for analytics is adding features to help companies prep for using data lakes in 2020.


The timing is notable: with the California Consumer Privacy Act (CCPA) set to take effect in January 2020, Qubole's updates help companies efficiently ensure their data lakes comply with data privacy requirements.


Qubole’s new ACID capabilities will ensure that once a consumer requests the deletion of their data, it will be deleted across multiple engines (Hive, Presto and Spark) via a single platform.


The addition of Qubole ACID is not only efficient; it changes the rules for how admins work with and update data lakes, Qubole CEO and co-founder Ashish Thusoo told IDN.


“In old data lakes, the paradigm was write once and read multiple times. If you had any updates, you had to update the whole data set,” Thusoo said.  “That means, without Qubole, companies have to reprocess all the data and republish the entire set all over again. Now, because Qubole supports transactions on data lakes with Qubole ACID, data lakes no longer need to write once and read multiple times.” 


ACID stands for four traits of database transactions: 

  • Atomicity (an operation either succeeds completely or fails; it does not leave partial data), 
  • Consistency (once an application performs an operation, the results of that operation are visible to it in every subsequent operation), 
  • Isolation (an incomplete operation by one user does not cause unexpected side effects for other users), and 
  • Durability (once an operation is complete, it will be preserved even in the face of machine or system failure). 
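
To make the atomicity trait concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in transactional store. It is not Qubole ACID or any data lake engine; the `events` table, its columns and the `delete_user` helper are assumptions made up for illustration. A delete that fails before commit leaves every row in place, while a successful one removes all of the user's rows together.

```python
import sqlite3

# In-memory database standing in for a transactional store (assumption:
# this is NOT Qubole ACID, only a generic illustration of atomicity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", "click"), ("u2", "view"), ("u1", "purchase")],
)
conn.commit()


def delete_user(conn, user_id, fail_midway=False):
    """Delete all rows for one user inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("DELETE FROM events WHERE user_id = ?", (user_id,))
            if fail_midway:
                raise RuntimeError("simulated failure before commit")
    except RuntimeError:
        pass  # the rollback has already restored the deleted rows


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


delete_user(conn, "u1", fail_midway=True)
rows_after_failure = count_rows(conn)  # all rows still present

delete_user(conn, "u1")
rows_after_success = count_rows(conn)  # only the other user's row remains
```

The same all-or-nothing guarantee is what lets a deletion request either take effect completely or not at all, rather than leaving a user's data half-removed.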

Qubole’s new ACID ability will prove especially important for cost-effective compliance with CCPA (and GDPR), Thusoo added.


“With GDPR, for example, if a user requested his information be deleted, the entire data set might have to be updated and then re-built. That’s because with GDPR and CCPA, you need to forget ALL the interactions that the user has done,” he said. 


That can mean time-intensive and costly updates to a data lake as well as logs. Because CCPA or GDPR requests may come in daily or weekly, running timely updates can quickly prove difficult, if not untenable.


“With Qubole ACID, and our support for transactions, the data lake can now support changing data and all these compliances [in a] very efficient way, without reprocessing all the data,” Thusoo said. This means companies can delete data in the data lake much more easily, he added.
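
The difference between the two approaches can be sketched with a toy model. This is a conceptual illustration only and does not reflect how Qubole ACID is implemented; the `ToyTable` class, its methods and the sample data are all hypothetical. It compares how many rows must be written when one user's data is deleted by rebuilding the whole data set versus by recording only the deleted row IDs and filtering them out at read time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical table: write-once base rows plus a set of deleted row ids."""

    base: list  # (row_id, user_id, payload) tuples
    deleted_ids: set = field(default_factory=set)

    def delete_user_full_rewrite(self, user_id):
        """Write-once style: rebuild the entire base. Returns rows written."""
        self.base = [row for row in self.base if row[1] != user_id]
        return len(self.base)

    def delete_user_with_delta(self, user_id):
        """Transactional style: record only the ids to hide. Returns rows written."""
        ids = {
            row[0]
            for row in self.base
            if row[1] == user_id and row[0] not in self.deleted_ids
        }
        self.deleted_ids |= ids
        return len(ids)

    def read(self):
        """Readers see the base minus anything marked as deleted."""
        return [row for row in self.base if row[0] not in self.deleted_ids]


# 100,000 events spread evenly over 1,000 made-up users.
rows = [(i, f"user{i % 1000}", "event") for i in range(100_000)]
rewrite_table = ToyTable(list(rows))
delta_table = ToyTable(list(rows))

rows_rewritten = rewrite_table.delete_user_full_rewrite("user7")
rows_in_delta = delta_table.delete_user_with_delta("user7")
same_result = rewrite_table.read() == delta_table.read()
```

Both tables return the same rows afterward, but the rewrite touches every surviving row while the delta records only the handful being removed. With deletion requests arriving daily or weekly, that gap is what makes the reprocessing approach expensive.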


“We’re thinking ahead of the curve to ensure that our customers have full confidence in their data integrity and financial discipline, no matter where their data is stored,” Thusoo said.


Qubole’s addition of ACID capabilities comes as at least one survey suggests just 52% of businesses expect to be CCPA-compliant by the January 2020 deadline. 


Beyond Qubole ACID, the company also released several other critical updates, including a “native” connector to Tableau and a way for companies to better assess the ROI from their data lake projects.


Qubole, Tableau Partner To Simplify Data Lake Access for BI, Analytics  

Qubole and Tableau have partnered on a new solution that lets companies' BI workers more easily tap into their immense and growing cloud-based data assets. 


The Qubole / Tableau cooperation has resulted in a new “native” connector, now available, that allows Tableau users to access such cloud data seamlessly. 


Qubole’s Tableau Connector gives data teams the ability to run visual analytics on data lakes with simpler access, more choice and petabyte-level querying. Equally valuable, the “native” approach offers high performance and auto-scaling, and it can support queries that benefit from machine learning. 


Even though Qubole has long worked with Tableau, the new native connector (and native driver) drastically simplifies Tableau user access to data lake analytics via Qubole, Thusoo said.  


“Without this, the user had a multi-stage process. Users had to install a driver, come back to Tableau, connect the driver and then work with Qubole. Now, they can go into Tableau and voila, they are in Qubole,” Thusoo told IDN. “They don’t have to write SQL code or anything technical like that. They all go to Qubole seamlessly.”


Other benefits from Qubole's ‘native’ Tableau connector include: 

  • Increased openness and flexibility: Tableau customers have choice and flexibility, as Qubole’s Connector allows querying of unstructured or semi-structured data on any data lake regardless of the storage file format – CSV, JSON, AVRO or Parquet.
  • Performance boost for queries: Leveraging the power of optimized Presto on Qubole – a high-performance, distributed SQL query engine – Tableau users can query multiple big data sources in industry-leading response times, without changing their typical workflow.
  • Abstraction from administrative complexity: Qubole manages cloud infrastructure automatically based on workloads, eliminating the need for manual administration or rebalancing of compute clusters with changing BI needs.
  • Financial governance: With Qubole’s native workload-aware autoscaling and intelligent cluster management capabilities, Tableau customers avoid data processing cost overruns with guaranteed compute resources for their queries at all times.

Combining Tableau’s reliability and scalability with Qubole’s performance and auto-scaling for analytics and machine learning enables faster, simpler petabyte-level querying of big data, he added. 


The release of Qubole’s Tableau Connector also reflects Qubole's growing adoption and popularity among the Tableau community. As Thusoo put it: “This only happened with the pull from the market. Tableau let customers submit requests and Qubole was right up there among the top requests.” 


Qubole Cost Explorer Lets Admins Track Costs, ROI of Analytic Workload

Qubole also released its Qubole Cost Explorer, which delivers real-time reports and insights on job, cluster and user-level costs. The goal is to offer execs and teams granular insight into their spending on a minute-by-minute basis, Thusoo said. 


“We heard loud and clear from clients; they want a better way to justify costs. The client will say, ‘We run workloads and then I get hit with a bill. But I don’t know what the ROI from all that is,’” he said. “With Qubole Cost Explorer, we now enable the admins to understand, at a very fine-grained workload level, how much cost that workload incurs. And they can see if there is enough ROI from that data lake task.”
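
As a rough illustration of what workload-level cost tracking involves, the sketch below rolls up made-up usage records into a cost per workload and compares each against an estimated business value. It is not Qubole Cost Explorer's data model or API; the record layout, prices, value estimates and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical usage records: (workload, user, node_hours, price_per_node_hour).
usage = [
    ("churn_model", "alice", 120.0, 0.50),
    ("churn_model", "bob", 30.0, 0.50),
    ("daily_report", "carol", 10.0, 0.25),
]

# Hypothetical estimate of the value each workload returns, in the same currency.
estimated_value = {"churn_model": 60.0, "daily_report": 20.0}


def cost_by_workload(records):
    """Sum compute cost (node-hours times price) for each workload."""
    totals = defaultdict(float)
    for workload, _user, node_hours, price in records:
        totals[workload] += node_hours * price
    return dict(totals)


def roi_by_workload(costs, values):
    """Return (value - cost) / cost per workload; negative means a net loss."""
    return {
        workload: (values.get(workload, 0.0) - cost) / cost
        for workload, cost in costs.items()
        if cost > 0
    }


costs = cost_by_workload(usage)
returns = roi_by_workload(costs, estimated_value)
losing_workloads = sorted(w for w, r in returns.items() if r < 0)
```

In this made-up data, the model workload costs more than its estimated value while the report pays for itself several times over, which is the kind of comparison a fine-grained cost breakdown makes possible.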


Cost can be a real issue for enterprises, Thusoo told IDN.


“As soon as the data lake is productionized, there can be 1000s of people on the data lake. So, with that kind of scale you need some governance model where you can inspect that all the [expense from] compute is being justified,” he told IDN.


Qubole’s new cost-tracking capability is not meant to stifle experimenting with new data models, Thusoo said. It’s simply a way to let admins track whether the model is panning out once it moves to production. 


“The bulk of compute expense goes into the production, and a small percentage into exploration. So it’s always best to optimize what expense is in production. 


“Once you create your new model and you spin it into production, after a month or two months, you want to figure out if that model may or may not be paying off. In this stage, you may find all these computations you’re running [to execute the model with real data] in production are actually costing me much more than what it is bringing in terms of ROI,” he told IDN. 


When combined with Qubole’s workload-aware autoscaling, which automatically upscales, downscales and rebalances clusters depending on the amount of compute power a customer is using at a given time, Cost Explorer helps customers significantly decrease cloud costs.


Qubole also announced Apache Ranger support across multiple open source engines (Hive, Spark and Presto). Apache Ranger is a popular framework for data security, and this support provides granular data access controls based on user privileges, all within the Qubole data platform.


These updates are available to all Qubole customers today. 


Qubole’s Tableau Connector is available through Tableau Desktop and Server.