3 Reasons Why SQL Code Conversion Doesn’t Live Up to Its Promise

Many enterprises are scrambling to modernize their data warehouses for the cloud. Datometry’s Mike Waas lays out some limitations to SQL code conversion, and suggests database virtualization can.

Tags: code conversion, data, migration, SQL, virtualization, warehouse,

Mike Waas, Datometry
Mike Waas

"Database migration is very much an 80-20 problem. The first 80% consumes only 20% of resources. The remaining 20% is where most migrations die."

Application Architecture Summit
Modern Application Development for Digital Business Success
September 23,
Virtual Summit

It’s no secret that enterprises are looking to replace their existing data warehouse with modern technology in the public cloud.


IT leaders have a difficult decision to make - what is the best way to transition? They can either virtualize the existing data warehouse onto the cloud or go for a manual migration.


Manual migrations where many, if not all, applications need to be rewritten are often regarded as a clean-slate approach. “Out with the old, in with the new” appeals to decision makers.


However, manual migrations often seem to spiral out of control. They run into the millions and take several years. On top of that, they may damage an IT leader’s career and tarnish the team.


In response to this challenge, a few vendors started offering software tools that aim at accelerating this process. They promise to automatically translate SQL code — as best they can. To use these tools, extract SQL from the application, run the converter, and then re-insert the translated code into the application. It seems like a helpful approach at first. However, at close inspection, several significant deficiencies become apparent.


Let’s examine the areas where SQL code conversion is applicable -- and where it fails to live up to its promise.

Code Converters Save Less Than 20% of Cost

First off, for quite a number of cases, code conversion for SQL does work. These are typically the “easy” cases. Every workload has them. However, few workloads if any consist only of these easy cases. Rather, the difficult cases are everywhere and cannot be separated from the easy ones.


By their own admission, code converters can confidently translate about 60-70% of a workload. This limitation is due to the fact that for each statement they need equivalent language in the destination data warehouse. For many complex constructs this is clearly not the case, hence, the discrepancy.


Database migration is very much an 80-20 problem.


The first 80% consumes only 20% of the time, funds, and resources of a project. Progress is swift and implementers are optimistic to finish soon. Then comes the remaining 20%. The tail end of the project typically devours 80% of budget, time and resources. This is where most migrations die: seemingly close to the finish line, stalled by insurmountable problems.


What does this mean for code conversion? Solving 60-70% clearly addresses the majority of the easy cases. If the first 80% make up only 20% of the project’s cost, code conversion can contribute at best 20% of savings. It’s a start but it may fall distinctly short of what was advertised. This less than stellar return on investment comes as quite a surprise even to practitioners.

Automatically-Generated Code is Hard to Maintain

Maintaining machine-generated code is difficult, in general. This applies to just about any system that emits code. SQL generators are no exception. Their output is not necessarily meant to be human readable. Nor is it meant to be debugged or modified by humans.


When new client requirements surface after putting the translated SQL into production, changes to the machine-generated code are needed. Analyzing any kind of SQL code can be challenging. However, machine-generated code is in a class of its own.


Modifying this kind of code confidently is not for the faint of heart. It requires understanding the principles of the original code generation, first. Then, implementers need to decide if additions to the code should follow the same principles. Frustrated, they will often choose to rewrite the code completely at that point, voiding the original benefit of the conversion.


But there is one more dimension to consider:


Cloud data warehouse technology is changing rapidly. The entire discipline is still nascent and all vendors have been frantically revving their software to keep up with feature requests.


Code conversion initially delivers what might be appropriate code for the then current version of the data warehouse.  But, as the data warehouse evolves, opportunities for more efficient translations will be missed. 


Hence another downside to code conversion is that it commits the enterprise to the state of the art of a specific point in time. It neither anticipates nor facilitates changes over time.

Operationalizing Code Conversion Does Not Scale

Code conversion may have limited applicability and all the characteristics of a quick patch. Does it at least get the project off the ground quickly and deliver results quickly?


Alas, the process of applying code conversion to a complex environment around an enterprise data warehouse defies quick wins. To apply converters, the code in question needs to be extracted first. Most of it sits in integrated ETL or BI and reporting systems or, worse, custom applications. Extraction is a non-trivial task and manual. Re-insertion into the environment is even more difficult.


The converted code needs to be copied into the client system and validated. This procedure offers plenty of room for mistakes and ultimately evolves into “trial and error”. Because of its reliance on manual labor, the approach does not scale easily. For very large systems, it might be prohibitively costly and constrained by personnel.

Database Virtualization (Not ‘Data Virtualization’) Offers an Alternative

While code conversation may offer some options for some specific use cases, IT leaders should know they can choose alternative solutions.


Recently emerging technology to virtualize a database promises to be a comprehensive solution. This is not to be confused with data virtualization.


Database virtualization is quite different from “classic” data virtualization. Like more established virtualization technology, database virtualization connects host and application layer through a hypervisor. The hypervisor emulates the functionality of the legacy database in real-time.


Put differently, database virtualization adds a powerful compatibility mode to the new destination data warehouse. In addition to its own query language, the destination system now also “speaks” legacy languages. Instead of converting any SQL at all, this technique lets existing applications run instantly on the destination cloud data warehouse. No rewrites are required.


Unlike code conversion, database virtualization is not limited by the expressiveness of the destination data warehouse. Rather, it dynamically compensates for the gaps between legacy and new destination systems. Database virtualization makes up for missing data types, functions, and even control flow concepts like stored procedures, SET tables, or Global Temporary Tables.


So, how does database virtualization compare to static code conversion?


Database virtualization typically achieves 99.5% compatibility and eliminates the cost of a migration almost entirely. Since it does not freeze code to a point in time, database virtualization can grow with the destination database: as the database adds new features, the hypervisor leverages them transparently to emulate the existing code more effectively.


And lastly, the per-application effort is almost entirely limited to repointing applications to the new stack, which makes for a highly scalable implementation process.


With database virtualization, IT leaders have a strong alternative to manual migrations and automatic code conversion.


Mike Waas is the founder and CEO of Datometry, a SaaS database virtualization platform to enable existing applications to run natively on modern cloud data management systems without being rewritten. Mike has held senior engineering positions at Microsoft, Amazon, EMC, and Pivotal.