I've been working on a data warehouse project lately, in Python, to support the different kinds of data analysis I am developing as part of my current work. I decided to use SQLAlchemy as the ORM, so I can quickly move from my development version, which uses a SQLite database, to production, which uses MySQL or MSSQL.
SQLAlchemy is also one of those amazing ORMs that support sharding. Needless to say, that matters a lot when you develop a tool that will import, format, process and analyze gigabytes of data.
Also, since I work with a lot of data types that I need to register with my ORM instance and persist to a database, my software has to be able to quickly generate an object representing each data type: a particular instance of the object. Developers usually create factories in order to create instances of objects. The main idea is to delegate the instantiation of the object to a third-party object. In most factories, we specify the type of object that we want to create: give me an instance of a pizza with mushrooms, tomatoes and ham.
This last point, having to ask for a particular type (or sub-type) of object, was the main limitation for my use. Most of my types are related in some way, but without strong inheritance (Dish > Pie > Pizza). Another important point is the maintainability of code in which I would have to list every type of object my factory needs to create. I wanted something more generic: a data-driven factory.
A data-driven factory is a factory that produces an instance based on the data sent to the factory object's constructor. A simple example: get an instance of a Margherita pizza when I give certain ingredients (tomatoes, mozzarella and parmesan), or a Neapolitan if I add anchovies.
This type of factory, which depends only on the data passed as parameters, is possible in Python thanks to the language's class inspection capabilities. The implementation I propose requires two things. First, you register each class to be constructed with the factory, which analyzes its constructor arguments (and default arguments) for matching later on. Second, you pass the "type" of each data field (basically, the argument names) when asking for a class. The factory will then get the appropriate object for you.
Side note: the factory returns a class rather than an instance of an object for performance reasons. I get the class from the factory once, store it, and then loop through the instantiation over millions of data records...
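To make the idea concrete, here is a minimal sketch of how such a factory could be written. It is not the code from dd_factory.py: it assumes Python 3 (it relies on inspect.signature), the storage layout of the registrar is simplified, the RAD_MAX and RAD_SMALL values are placeholders, and the tie-break rule (prefer the class with the fewest unused constructor arguments) is my own assumption about how to choose between several matching classes.

```python
import inspect


class DDFactory(object):
    """Sketch of a data-driven factory: maps field names to a registered class."""

    def __init__(self):
        # str(cls) -> (cls, required argument names, all accepted argument names)
        self.registrar = {}

    def register(self, cls):
        # Inspect the constructor; drop 'self' and any *args / **kwargs.
        params = list(inspect.signature(cls.__init__).parameters.values())[1:]
        named = [p for p in params
                 if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)]
        accepted = {p.name for p in named}
        required = {p.name for p in named if p.default is p.empty}
        self.registrar.setdefault(str(cls), (cls, required, accepted))

    def get(self, fields):
        """Return the best-matching class (not an instance), or None."""
        fields = set(fields)
        candidates = [
            (len(accepted - fields), cls)
            for cls, required, accepted in self.registrar.values()
            if required <= fields <= accepted
        ]
        if not candidates:
            return None
        # Assumed tie-break: prefer the class with the fewest unused arguments.
        return min(candidates, key=lambda c: c[0])[1]


RAD_MAX, RAD_SMALL = 10.0, 1.0  # placeholder values, not from the original code


class Circle(object):
    def __init__(self, center, radius=RAD_MAX):
        self.center, self.radius = center, radius


class DiskHole(object):
    def __init__(self, center, radius, small_radius=RAD_SMALL):
        self.center, self.radius, self.small_radius = center, radius, small_radius


factory = DDFactory()
factory.register(Circle)
factory.register(DiskHole)

# Look the class up once, then instantiate in a tight loop.
rows = [((0, 0), 1.0), ((1, 2), 2.5), ((3, 4), 0.5)]
circle_cls = factory.get(['center', 'radius'])
circles = [circle_cls(center=c, radius=r) for c, r in rows]
```

Because get() returns the class itself, the lookup cost is paid once per data type rather than once per record, which is the point of the side note above.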
Example of use:
# RAD_MAX and RAD_SMALL are assumed to be constants defined elsewhere.
class Shape(object):
    pass

class Circle(Shape):
    def __init__(self, center, radius=RAD_MAX):
        ...

class DiskHole(Shape):
    def __init__(self, center, radius, small_radius=RAD_SMALL):
        ...

factory = DDFactory()
factory.register(Shape)
factory.register(Circle)
factory.register(DiskHole)
print(factory.get(['center', 'radius']))  # returns the 'Circle' constructor (the class)
print(factory.get(['center', 'radius', 'small_radius']))  # returns the 'DiskHole' constructor
You can access this factory here: dd_factory.py
In the distributed code, I assume that each object to create has a
__tablename__ class member that tells which database
table is the eventual target (which is my case, using SQLAlchemy declarative
objects). This is easy to change by replacing the factory's register method with
something like this:
def register(self, cls):
    if hasattr(cls, '__init__'):
        s_cls = str(cls)
        args, defaults_dict = DDFactory.defaults_values(cls)
        if s_cls not in self.registrar:
            self.registrar[s_cls] = {'class': cls, 'args': args, 'defaults': defaults_dict}
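The register method above relies on a defaults_values helper that is not shown here. As a rough illustration only, this is one way such a helper might be written in Python 3. The return shape (a list of argument names plus a dict of defaults) is inferred from how register uses it; the body is an assumption and may differ from the real dd_factory.py, and the 0.5 default is a placeholder.

```python
import inspect


def defaults_values(cls):
    """Return (args, defaults_dict) for cls.__init__, excluding 'self'."""
    params = list(inspect.signature(cls.__init__).parameters.values())[1:]
    # Keep named parameters only; *args and **kwargs cannot be matched by name.
    named = [p for p in params
             if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)]
    args = [p.name for p in named]
    defaults_dict = {p.name: p.default for p in named if p.default is not p.empty}
    return args, defaults_dict


class DiskHole(object):
    def __init__(self, center, radius, small_radius=0.5):  # 0.5 is a placeholder
        pass


args, defaults_dict = defaults_values(DiskHole)
print(args)           # ['center', 'radius', 'small_radius']
print(defaults_dict)  # {'small_radius': 0.5}
```

Arguments that appear in args but not in defaults_dict are the ones a data row must provide for the class to be a candidate.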


Last comments