How Python Boosts App Integration -- Part II
In this week's Part II, Martelli offers insights on Python's different approach to working with Win32, Java, .NET, and even C/C++ apps. Specifically, he looks at the technical differences, including syntax, dynamic typing and Python's tendency to treat most everything as "first-class objects."
In Part I, Martelli discussed the opportunities for using Python with Win32, Java and .NET, highlighting the similar skills and approaches that developers can use.
Read the interview:
An IDN Interview
with Alex Martelli
author, "Python in a Nutshell"
IDN: You note the similarities between Python and .NET, Java and C. Are there major differences, as well, that might trip up some developers?
Martelli: Yes, there are major differences, as well. But let me first dwell on another issue where Python does not differ from the languages you just mentioned, but does differ from other very high-level languages (such as Ruby and Perl) that in many ways do enjoy power similar to Python's. The gist of it is, those other languages tend to be monolithic, embedding all sorts of powerful tools (such as regular expressions) into the language proper, while Python, again with the same philosophy as C and its descendants, is more modular, keeping the language proper to a minimum and offering most powerful tools through a standard library of modules that come with the language.
Powerful tools such as regular expressions play a completely different role than language fundamentals such as assignment, argument passing, raising and handling exceptions, returning values from functions. Python includes in the language proper only those concepts and functionalities that really belong there, including some powerful general-purpose datatypes such as strings, lists (which, despite their name, are more akin to the dynamic vectors or arrays of other languages) and dictionaries (completely general and highly optimized data tables).
In Python (much like in C, C++, Java, and .NET), everything that can most reasonably be supplied via a library module is kept in a library module, always available for import as and when needed, but not complicating the language itself needlessly. Regular expressions, for example, are in the Python standard library module named re. If and when a developer needs their power, it's all there -- but often simple string manipulation will suffice, because Python strings are quite rich and powerful, too.
Consider, for example, the recently released Python 2.3 version. If you need to determine whether a certain string (referred to by the name needle) is part of another string (referred to by the name haystack), there's no need to pull out regular expressions in all their power -- just code:
if needle in haystack:
    print 'found'
else:
    print 'not found'
Python makes simple things simple, and complicated things not too complicated. If you need the search to be case-insensitive and anchored on word boundaries, then it's time to whip out regular expressions:
import re
re_needle = re.compile(r'\b%s\b' % re.escape(needle), re.IGNORECASE)
if re_needle.search(haystack):
    print 'found with all the trimmings'
else:
    print 'not found'
Of course, there is some learning involved here (especially if you've never used regular expressions). But you need only tackle that learning if and when you need regular expressions' powerful string-searching capabilities; many application areas might never need them at all.
IDN: What are some basic Python skills that would help a developer get started with his/her first project?
Martelli: You can get started (and become very productive, indeed) with Python without all the trimmings, beginning by using just the language proper and its core built-ins, and gradually adding to your know-how those modules of the standard library that you need, as and when you need them.
For example, you won't need to dabble in regular expressions until and unless you require advanced text-parsing abilities, just as you won't need to dip into other modules of the Python standard libraries (such as those that let you easily implement web servers, web clients, e-mail handling, XML-RPC distributed processing servers and clients and the like) until and unless you require those specific functionalities.
In this, Python is very different from other high-level scripting languages that put huge amounts of built-in functionality (such as regular expressions) in the language itself, rather than in appropriate library modules, thus forcing a very steep learning curve. Python is very easy and fast to learn and become highly productive in -- by design!
IDN: Do developers need to know object-oriented programming to take advantage of Python?
Martelli: No, Python is quite suitable for traditional procedural programming just as it is for object-oriented programming. In fact, Python is probably the best way for a "legacy programmer" to learn object-oriented programming in the first place, because it can offer a gradual, smooth transition from procedural to OO programming.
Built-in Python objects, such as strings, lists, files and dictionaries, offer methods -- i.e., named functions that you can call on to obtain access to some of their functionality. For example, given a list in a variable named foo, you can call foo.sort() to sort the list in-place; and you can call foo.index(23) to find out the index in the list of number 23 (this will raise an exception if number 23 is not present in the list).
Any programmer will easily catch on to the idea of calling such methods, e.g., somelist.index(anitem) to find an item in a list -- then, from there, you can move on quite naturally to writing your own object classes, implementing similar methods (and quite possibly subclassing existing types to add or tweak functionality). Python offers the programmer object-orientation not as a hurdle to be negotiated, but rather as one more useful tool to get the job done.
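A minimal sketch of that progression, written in present-day Python 3 syntax: first the built-in list methods mentioned above, then a small subclass that adds one method of its own. The class name SafeList and its index_or method are hypothetical, chosen only for illustration.

```python
# A plain list and two of its built-in methods.
foo = [42, 7, 23, 15]
foo.sort()                    # sorts in place; foo is now [7, 15, 23, 42]
position = foo.index(23)      # index of 23 in the sorted list

# index() raises ValueError when the item is absent.
try:
    foo.index(99)
    missing = False
except ValueError:
    missing = True


class SafeList(list):
    """Hypothetical list subclass that adds one method of its own."""

    def index_or(self, item, default=-1):
        # Reuse the inherited index() and turn the exception into a default.
        try:
            return self.index(item)
        except ValueError:
            return default


bar = SafeList(foo)
print(position, missing, bar.index_or(23), bar.index_or(99))
```

The subclass inherits everything a list already does, so calling code that only knows about lists keeps working with it unchanged.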
IDN: Could you highlight some of the more notable parallels and differences between working in Python versus other languages?
Martelli: In many ways, Python is a lot like Java. First of all, Python's typical implementation, like Java's, is based on a compiler that turns source into bytecode, and a virtual machine that runs the bytecode. In the Classic Python version, the bytecode and VM are specially designed and optimized for Python's own needs, while in the Jython version, Python piggybacks onto Java's own bytecode. But the main difference in this respect between Python and Java is that, with Python, you never need to explicitly ask Python to compile your source files into bytecode files: Python will do that automatically and implicitly if and when it sees no bytecode file corresponding to a source file for a module it's being asked to import, or when the bytecode file exists but is older than the source file. In a sense, it's like having an implicit, automatic make built in (make is a popular facility on Unix, but also on Windows, in versions such as the nmake one that is part of Microsoft's Visual Studio product), so you never need to worry about compilation, per se.
More Python-Java parallels: Everything in Python inherits from 'object', just like in Java everything inherits from 'Object'. As I have already mentioned, Python also has consistent object-reference semantics, just like Java and, in fact, even more than Java (in that the same semantics also apply to numbers, which in Python are immutable, just like strings). Moreover, Python, like Java, comes with an extremely large and rich standard library, and has built-in and/or library support for such rich functionality as introspection, serialization and multi-threaded operation.
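The points in this paragraph can be checked in a few lines. The sketch below uses present-day Python 3 and only the standard pickle module; all variable names are illustrative.

```python
import pickle

# Every value, including numbers, functions and classes, is an object.
values = [23, 'spam', [1, 2], len, int]
all_objects = all(isinstance(v, object) for v in values)

# Names hold references: both names refer to one and the same list.
first = [1, 2, 3]
second = first
second.append(4)
shared = first is second      # first now also shows the 4

# Numbers are immutable: "changing" n just rebinds the name to a new object.
n = 10
m = n
n += 1                        # m still refers to 10

# Introspection: ask an object what it is and what it offers.
type_name = type(first).__name__
has_sort = hasattr(first, 'sort')

# Serialization: round-trip a data structure through the pickle module.
record = {'name': 'needle', 'sizes': [1, 2, 3]}
restored = pickle.loads(pickle.dumps(record))
```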
In many ways, Python is a lot like C++. First of all, Python is multi-paradigm, like C++: Unlike Java, Python and C++ don't force you to program in object-oriented ways -- they fully support OO programming, but you're also allowed to program procedurally when you believe that is more appropriate. Python, like C++'s templates, hinges on signature-based polymorphism -- in other words, despite the hugely different surface syntax, Python and C++'s templates have a deep commonality: You can substitute objects based strictly on the methods and operations they implement, without needing to introduce inheritance artificially just in order to achieve polymorphism.
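As a rough illustration of signature-based polymorphism (in present-day Python 3), the function below accepts any object that offers a write method. The class UpperCaseWriter and the function emit_report are hypothetical names; the class shares no base class with io.StringIO beyond object.

```python
import io


class UpperCaseWriter:
    """Hypothetical class: unrelated to file objects by inheritance,
    but it implements the one method the function below relies on."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text.upper())


def emit_report(out, lines):
    # Works with any object that has a write(str) method.
    for line in lines:
        out.write(line + '\n')


buffer = io.StringIO()
shouter = UpperCaseWriter()
emit_report(buffer, ['alpha', 'beta'])
emit_report(shouter, ['alpha', 'beta'])
```

Neither object had to inherit from a common "Writer" interface for the substitution to work; having the right method was enough.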
More Python-C++ parallels: Python, like C++, supports multiple inheritance and operator overloading. Most of all, Python, like C++, offers you a sometimes dazzling variety of choices in terms of supporting technologies: For example, for both classic Python and C++, there isn't one single "official" GUI toolkit or two, but a choice of dozens, both free and commercial, both platform-specific and cross-platform. Of course, out of such dazzling variety, a few "best of breed" choices emerge; for example, the outstanding choices for cross-platform GUI toolkits for C++ are wxWindows if you want a free toolkit, and Qt if you want a commercial one. Unsurprisingly, the outstanding choices for Python are wxPython and PyQt, respectively, the exact counterparts of their C++ equivalents. (The GUI toolkit that comes with Python, Tkinter which I cover in Python in a Nutshell, is widely used because it's widespread and quite easy to use, but I wouldn't recommend it for commercial-quality applications, because I think wxPython and PyQt are better).
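Multiple inheritance and operator overloading, the first two parallels in this paragraph, might look like the following in present-day Python 3. The classes Vector2, Describable and NamedVector are hypothetical and exist only for this sketch.

```python
class Vector2:
    """Hypothetical 2-D vector that overloads + and ==."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # type(self) keeps the result in the same (sub)class as the left operand.
        return type(self)(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)


class Describable:
    """Hypothetical mix-in contributing one method."""

    def describe(self):
        return '%s(%s, %s)' % (type(self).__name__, self.x, self.y)


class NamedVector(Vector2, Describable):
    """Inherits from both bases at once."""


total = NamedVector(1, 2) + NamedVector(3, 4)
description = total.describe()
```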
In some ways, Python is surprisingly like C -- indeed, Python is closer in spirit to C than either C++ or Java, despite the fact that the surface syntax of both C++ and Java is close to C, while Python's syntax is cleaner and more lucid. The ANSI C rationale expressed "The Spirit of C" very well and explicitly in five points:
- Trust the programmer.
- Don't prevent the programmer from doing what needs to be done.
- Keep the language small and simple.
- Provide only one way to do an operation.
- Make it fast, even if it is not guaranteed to be portable.
Python matches the first four of these five points strongly, and can thus be said to adhere to at least 80% of "The Spirit of C" -- which is far more than can be said of either of C's alleged successors, Java and C++. Admittedly, speed is not a focus of Python anywhere near as strongly as it is for C (and C++), so the fifth point is only matched weakly. But 4 out of 5 ain't too bad.
In many ways, of course, Python is drastically different from each of C, C++ and Java. The key difference, I would say, is that C and C++ are optimized for machine performance, while Python is optimized for programmer performance -- Python programs will most often not be as fast as ones you could carefully craft in C, say, but it may take you 1/10th of the time to code the same functionality by programming in Python as it would take were you programming in C.
These differences are reflected in three technical ways:
- Clean syntax -- Minimizing "chart junk" ("pixels that carry no information," according to E. Tufte's memorable books on visual display of information): Blocks have no braces -- just indentation -- nor do if or while need parentheses around the conditions, for example;
- Strong but dynamic typing -- Objects have types (strong ones, not weakish as in C or C++ -- in Python, as in Java, you can't just use a cast to "forcibly interpret this bunch of bits as if it were of type X rather than Y") but names do not -- names are just names and can be attached to objects of different types; therefore, there are no "declarations," just executable statements (each piece of code does something, rather than existing just to "tell" the compiler about some characteristic of the program); and
- Most everything is a first-class object -- This includes, for example, classes, functions, methods, modules, packages -- so you can have each and any of these as, for example, arguments to functions, items in a list, return values from methods -- the amount of "design patterns," boilerplate and general horsing-around that is obviated thanks to this incredibly simple and powerful design choice is nearly unbelievable.
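The second and third points above can be sketched briefly in present-day Python 3. The dictionary operations and the function apply_all are hypothetical names used only here; math.sqrt, str and str.upper are standard.

```python
import math

# Names have no type: the same name can refer to objects of different types.
thing = 23
thing = 'twenty-three'

# Objects do have strong types: no implicit reinterpretation takes place.
try:
    'abc' + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Functions, methods and classes are ordinary objects, so a plain dictionary
# can serve as a dispatch table without extra scaffolding.
operations = {
    'root': math.sqrt,        # a function from a module
    'text': str,              # a class
    'shout': str.upper,       # a method
}


def apply_all(table, plan):
    # plan is a list of (operation name, argument) pairs.
    return [table[name](arg) for name, arg in plan]


results = apply_all(operations, [('root', 16), ('text', 42), ('shout', 'hi')])
```

In a language without first-class functions and classes, a table like this would typically call for a small hierarchy of command or factory objects.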
IDN: In your book, you're very keen on the Python DBAPI. Can you provide a short example of how a programmer/DBA might use that technique to solve a common database access problem?
Martelli: Python's Database API defines a uniform interface to allow Python programs to access relational databases, and many third-party modules implement this interface while connecting to all kinds of different database engines, including commercial ones (such as Oracle, IBM DB/2, Sybase, ...), Open Source ones (such as PostgreSQL, MySQL, SAP/DB, ...), and lightweight relational DB implementations such as Gadfly (which is coded in Python) and SQLite (a free library that lets you embed SQL functionality directly in any program of your own).
The ways in which a programmer developing applications, and a DBA developing utilities to ease his or her job, would exploit this interface (and the modules that implement it), are rather different.
From a programmer's point of view, Python's DBAPI just eases a task that should always be undertaken, whatever programming language one chooses to embed SQL access functionality in: develop a database-independence layer for one's application programs, so that, even if at some point in time, the programs rely on one specific database engine or DB implementation, it remains possible and reasonably easy to port or migrate the application to different engines or DBs in the future. The DBAPI can't do it all, because, despite the existence of SQL standards, just about each DB engine or implementation still has its own quirks and "SQL dialect" issues.
But the task is nevertheless important, because just about every successful DB-centered application will eventually present a need or opportunity to port or migrate to using some other DB engine or implementation instead of, or in addition to, the original one; and while the main enabling factor in this task is to architect and factorize one's application properly, the language-level uniformity afforded by the DBAPI substantially eases the design and coding parts of the task (including the likely need to re-architect, re-factor, and re-code, if, as is often the case, the database independence issue had not been properly considered in the application's early architecting and design phases).
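One possible shape for such a database-independence layer is sketched below in present-day Python 3. It assumes the standard sqlite3 module as the DBAPI-compliant engine; the class Store, its methods, and the parts table are hypothetical. Only two of the DBAPI parameter styles are handled, which is a simplification.

```python
import sqlite3

# Placeholder text for the DBAPI parameter styles this sketch handles.
PLACEHOLDERS = {'qmark': '?', 'format': '%s'}


class Store:
    """Hypothetical database-independence layer: the rest of the
    application calls these methods and never imports a DB module."""

    def __init__(self, db_module, *connect_args, **connect_kwargs):
        self.mark = PLACEHOLDERS[db_module.paramstyle]
        self.connection = db_module.connect(*connect_args, **connect_kwargs)

    def setup(self):
        cursor = self.connection.cursor()
        cursor.execute('CREATE TABLE parts (name VARCHAR(40), qty INTEGER)')
        self.connection.commit()

    def add_part(self, name, qty):
        cursor = self.connection.cursor()
        sql = 'INSERT INTO parts (name, qty) VALUES (%s, %s)' % (self.mark, self.mark)
        cursor.execute(sql, (name, qty))
        self.connection.commit()

    def quantity_of(self, name):
        cursor = self.connection.cursor()
        cursor.execute('SELECT qty FROM parts WHERE name = %s' % self.mark, (name,))
        row = cursor.fetchone()
        return None if row is None else row[0]


# Swapping engines means changing only this line (and the connect arguments).
store = Store(sqlite3, ':memory:')
store.setup()
store.add_part('widget', 12)
```

SQL dialect differences would still have to be handled inside the layer, but application code above it stays the same.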
IDN: Based on that backdrop, what elements of Python would be interesting to a DBA looking to build some database sharing functions with Python?
Martelli: The program-level uniformity of the DBAPI will of course prove very useful in any case in which the DBA needs to exchange data between two different databases -- not all that widespread a need, but slowly growing as various enterprises merge or split their operations.
Also, even where a DBA needs only worry about one specific DB engine, the DBAPI's simplicity and power will help -- in this case, by getting the problem of "how do I connect to the DB and query and update it" reasonably well out of the way, reducing it to coding the appropriate SQL statements -- once that's done, the DBA will be able to use Python's power to process, display, manipulate or alter the data in any needed way, with conditional or repetitive processing, summaries, cross-correlation tables, statistical analyses and the like.
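A sketch of both DBA scenarios, in present-day Python 3: copying rows between two databases, then summarizing them in Python. Two in-memory sqlite3 databases stand in for two different engines here, which is an assumption made only to keep the example self-contained; the orders table and its contents are invented sample data.

```python
import sqlite3
from collections import defaultdict

# Two separate in-memory databases stand in for two different engines.
source = sqlite3.connect(':memory:')
target = sqlite3.connect(':memory:')

src = source.cursor()
src.execute('CREATE TABLE orders (region TEXT, amount INTEGER)')
src.executemany('INSERT INTO orders VALUES (?, ?)',
                [('north', 10), ('south', 5), ('north', 7)])
source.commit()

dst = target.cursor()
dst.execute('CREATE TABLE orders (region TEXT, amount INTEGER)')

# Copy: fetch rows as plain Python tuples from one connection, insert them
# through the other.
src.execute('SELECT region, amount FROM orders')
rows = src.fetchall()
dst.executemany('INSERT INTO orders VALUES (?, ?)', rows)
target.commit()

# Summarize in Python once the data is in ordinary Python objects.
totals = defaultdict(int)
dst.execute('SELECT region, amount FROM orders')
for region, amount in dst.fetchall():
    totals[region] += amount
```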
Using a module compliant with the DBAPI boils down to a few simple steps, once the proper modules for one's database needs have been selected and installed.
IDN: Could you outline how a developer might build a program using one of these "DBAPI-friendly" modules?
Martelli: They're called "DBAPI-compliant modules," actually, and yes, I can. First, it's helpful to understand that such a module supplies a 'connect' function that takes some arguments (such as 'database' for the name of the specific database to connect, 'user' and 'password' for authentication, and the like) and returns an object that is an instance of the Connection class.
Methods on the Connection instance control transactions (commit and rollback) and create 'cursors'; i.e., instances of class Cursor (the DBAPI ensures that such cursors can be obtained even for databases that normally don't supply them, such as MySQL). In turn, methods on the Cursor instance execute SQL statements (both queries such as SELECT, and DB updating operations such as UPDATE), optionally taking parameter values from Python-supplied data, and fetch results, converted into ordinary Python-usable data of appropriate types.
So, a typical sequence of operation steps might be, for example:
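import [insert appropriate module here] as DB
connection = DB.connect( [insert appropriate parameters here] )
cursor = connection.cursor()
# placeholder style (%s here) varies from module to module
cursor.execute('SELECT a, b FROM table WHERE d=%s', (datum,))
while True:
    try: row = cursor.fetchone()
    except DB.Error: break    # stop on a database error
    if row is None: break     # stop when no rows remain
    a, b = row
    # ... work with a and b here ...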
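For a version of these steps that runs as-is, the sketch below substitutes the sqlite3 module from today's Python 3 standard library, which follows the DBAPI. The items table and its rows are invented sample data, and sqlite3's '?' placeholder replaces the '%s' used by some other modules.

```python
import sqlite3 as DB          # any DBAPI-compliant module could take this place

connection = DB.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE items (a INTEGER, b TEXT, d TEXT)')
cursor.executemany('INSERT INTO items VALUES (?, ?, ?)',
                   [(1, 'one', 'x'), (2, 'two', 'y'), (3, 'three', 'x')])
connection.commit()           # make the inserts permanent

datum = 'x'
# sqlite3 uses '?' placeholders; other modules may use '%s' or another style.
cursor.execute('SELECT a, b FROM items WHERE d=? ORDER BY a', (datum,))
found = []
while True:
    row = cursor.fetchone()
    if row is None:           # fetchone() returns None when no rows remain
        break
    a, b = row
    found.append((a, b))

# Transactions: an uncommitted change disappears on rollback.
cursor.execute("UPDATE items SET b='changed' WHERE a=1")
connection.rollback()
cursor.execute('SELECT b FROM items WHERE a=1')
after_rollback = cursor.fetchone()[0]
connection.close()
```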
IDN: How might a developer begin to leverage some Python attributes to help support certain web services (XML, portals, database integration, custom GUI) projects?
Martelli: I would say that the web-service projects where Python will shine most brightly may be the most ambitious ones, the ones based on custom network servers and clients with high targets of scalability and availability. Python supports particularly well the XML-RPC protocol for web services, although SOAP support is also available (not in the standard Python library yet, but in free third-party modules).
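To give a feel for the XML-RPC support, here is a small sketch using the present-day Python 3 module names, xmlrpc.server and xmlrpc.client (in the Python 2 series the equivalent modules were SimpleXMLRPCServer and xmlrpclib). The add function is a hypothetical service, and the server runs on the local loopback interface in a background thread purely so the example is self-contained.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(x, y):
    # Hypothetical service function exposed over XML-RPC.
    return x + y


# Port 0 asks the operating system for any free port on the loopback interface.
server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
server.register_function(add, 'add')
port = server.server_address[1]

worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()

try:
    proxy = ServerProxy('http://127.0.0.1:%d/' % port)
    answer = proxy.add(2, 3)   # looks like a local call, travels as XML over HTTP
finally:
    server.shutdown()
    server.server_close()
```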
For any Python network server or client application, a developer would be well advised to consider the free Twisted Matrix package, which lets you put together specialized, custom pure-Python network servers with ease, flexibility, and excellent performance and scalability. Indeed, the reason that Python is likely to shine in the hardest projects is just because the intrinsic difficulty of such projects may well require all of Python's power, boosted by Twisted, advanced XML processing tools, interfaces to databases, customized GUIs and the like.
IDN: What features in Twisted Matrix are so valuable to web services developers?
Martelli: Twisted offers a rich palette of standard implementations for just about all existing network protocols, and, when needed, you can specialize and customize those implementations by subclassing Twisted-provided classes and overriding some of their methods appropriately.
The whole Twisted architecture hinges on the key concept of asynchronous (also known as event-driven) operations (although threading and multi-processing can also be smoothly integrated when needed), which is really the secret weapon for getting very good performance and scalability without any sacrifice in flexibility and power -- the implementation of the event loop (technically known as a "reactor") at the core of the scheme can in turn be chosen among a palette of implementations that optimize integration with other systems, such as GUIs for client-side operations, or pure server performance, depending on the underlying operating system.
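The event-driven idea can be illustrated without Twisted itself. The sketch below deliberately swaps in asyncio, the event loop in today's Python 3 standard library, so it is not Twisted's API; the function handle_request is hypothetical and the sleeps merely stand in for waiting on the network.

```python
import asyncio


async def handle_request(name, delay, log):
    # Stand-in for one client conversation; the sleep represents waiting on
    # the network, during which the event loop is free to run other handlers.
    log.append('start ' + name)
    await asyncio.sleep(delay)
    log.append('done ' + name)
    return name.upper()


async def main():
    log = []
    replies = await asyncio.gather(
        handle_request('slow', 0.05, log),
        handle_request('fast', 0.01, log),
    )
    return replies, log


replies, log = asyncio.run(main())
```

A single thread serves both conversations: the slow one starts first, yet the fast one finishes first, because neither blocks the loop while it waits.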
However, if the Python parts of the network server must integrate with existing infrastructure, and in particular with an existing web server, it's also possible to forego Twisted in favor of different architectures where Python piggybacks on the web server in question (of course, in this case, the performance and flexibility will probably not be as good as Twisted could allow, but nevertheless there may be good reasons, such as existing legacy services that aren't written in Python and thus can't be easily ported to Twisted, or the need to exploit existing arrangements for system administration and updates that are inextricably tied to existing web servers).
Whether you choose a free-standing (Twisted) architecture or one based on hosting of Python code in an existing server and framework, you still get all of Python's power available for your custom processing needs, and a host of library modules to help you, both standard ones that come with Python and powerful third-party add-ons when necessary.
A completely separate possibility is Zope, which some consider to be Python's "killer application" -- an Open Source application server, optimized for content management, portals and custom applications, entirely coded in Python and easily extensible, in Python, in any direction you may need. Zope's connection with Python is very strong -- at this time, many of Python's core developers, including Python's chief architect, Guido van Rossum, are employed by Zope Corporation.
However, I'm not really a Zope expert, myself, so I can't supply particularly useful advice about it, beyond that of taking a look at it if you think you may need such a rich, complete approach for some of your enterprise's networking needs. The Zope project's own documentation should tell you all about it.
IDN: What about the benefits that Python can provide in traditional Win32 environments? Can you put any of them in context with ADO, COM and ASP techniques?
Martelli: Python, on a Win32 platform, can integrate particularly well with COM -- and, through that royal road, with just about everything that runs on Win32, including applications, free and commercial components, system services and infrastructure. For that, you need "Classic Python" and the win32all extensions add-on (you can freely download Classic Python, in a win32 build that already integrates win32all and other useful components, from ActiveState, or you can choose to start with the standard Python win32 distribution and then get and install win32all separately).
Once you have Classic Python with the win32all extensions, you get complete access to COM and all of the lower-level Win32 APIs -- plus all of Python's other powerful possibilities, of course. For example, you can get a module that respects the DBAPI interface and is internally implemented on top of ADO -- so you can choose to program to ADO-interfaced data sources without losing the portability and simplicity of Python's DBAPI; alternatively, you can instead choose to use ADO directly from your own Python code (if you don't care about portability, of course).
Similarly, you can keep using Python's typical cross-platform GUI toolkits (the somewhat poor but very simple Tkinter, the rich, powerful and free wxPython, or the incredibly powerful commercial PyQt), or you can choose to use a Python interface to Microsoft's classic MFC (known as PythonWin, and included in win32all) -- you can integrate ActiveX Controls in the interface as you might in Visual Basic, and so on.
If you program server-side web logic in Python, you can choose to integrate with IIS via ASP (since, with win32all, Python implements Microsoft's "Active Scripting" standard), instead of using Python's own more powerful approaches such as the Webware toolkit (which can integrate with IIS, Apache or other servers, and gives you a wealth of choices in terms of architectures for your Python-coded web pages, including ASP-like embedding of Python code directly inside HTML, "servlets" similar to those Java made familiar, and/or templating approaches that often provide the solution that will be easiest to maintain and extend for non-programmer website designers).
When feasible, I would normally suggest choosing Python-specific architectures (such as Webware for websites, PyQt for GUIs, native DBAPI modules for SQL interfacing, etc), but Python's uncanny ability to infiltrate into every technological niche may allow its use in situations based on the need to extend existing systems, where technology-choice and architecture-choice constraints preclude the use of languages bent on providing their own preferred architectures and technologies.