Code generation - here I come, once again.
Once upon a time I had to create a cms system. The system was quite large and I had to do it in about one month. What is more, the main reason I was creating the new system was to fix the issues we had with the old one. The point is that I had really not enough time.
However, I was able to create quite a decent application. I was rather satisfied with the result and the system didn't have a lot of code repetitions in its core.
Most of the logic was handled by business classes and they implemented good abstraction, i.e. versioning, cache and such stuff were implemented in base classes. However, the project had similar controllers and models (models in framework I used were something between a classic model and controller) for every module such as article, news, banner, etc. Due to a short deadline I was literally copying the files and changing names in them using replace function in IDE. Then, when I had some time to work on that code I changed all literal strings and other things specific to modules to constants that pointed to configuration in particular modules. My aim was to create classes that didn't force me to think while I was replacing text in source files, e.g. instead the sentence "Article has been added successfully" in template file I started using something like $moduleConfig[ARTICLE][TEXT_ADD_SUCCESS]. Well, that was my poor man's code generation. It had all the features of code generation besides the generator ;) At least I didn't have automatic code generation but I was manually copying files from other modules that were my templates.
To be honest, we've never encountered any serious problems with this approach and it worked quite well. However, I always thought that if I were given a week or so to work on this code and refactor it to extract base classes for models and controllers, the project would be so much better. The project was crippled with the fact that the whole code for these controller and model classes was duplicated. Unfortunately, I was never given the time to fix it before I left my ex-company. Everything was prepared, base classes should implement all the logic and child classes would know who they were and what constants they should use.
That story showed me that code generation, when used improperly, may seem an easy solution to a problem that should be addressed in another way.
Every tiny bit of logic that is generated shows that it should be placed in another element in design or higher in inheritance hierarchy.
First observation - IMHO, no application logic should be automatically generated.
But what about classes that give us strong typing for a cost of code generation? I'm not talking about typed DataSets in .NET because they suck so bad that they are not even worth mentioning but let's talk for example about classes generated from SubSonic. In the simplest approach generated classes reflect tables in database and they have strongly typed fields that represent columns in tables.
What are the advantages?
- you won't misspell a column in a table
- you also won't assign e.g. a string to an integer
But what are the disadvantages?
- you end up having lots of code that doesn't implement any logic at all
- every time you change database schema you need to regenerate your database classes
- if you generate CRUD from it, you will have problem achieving good level of code coverage
- if you change field name in database and want to keep property name in class, you may get confused using different names for same thing
Sometimes I think that we are paying too much for strong typing. Maybe the IDE should check the database and automagically check source code? For example, when I was writing a simple program in F#, the IDE checked if I specified correct format in literal string that was used in the method. When I specified in string representing format that I was expecting int and the corresponding argument was string, the IDE was smart enough to discover this problem. What is more, even if you misspell some fields you must have not tested the application at all if you didn't find this error.
Some frameworks let you write code that goes pretty well without the need to generate code (e.g. PHP frameworks); others (like ASP.NET that needs to see every control as a variable with a type, name and all that stuff) may try to force you to generate some code. Of course, I'm not a big fan of the latter.
And one final remark. I don't consider operations performed by a compiler to be code generation. Generated code may occur only in source files that developers are maintaining. We don't maintain byte code or low-level assembly instructions. The same goes with different forms of syntactic sugar that are handled by compiler and translated to "normal code" behind the scenes.
My friend wrote a note about on his blog: http://binary.freeperspective.net/countzero/2009/08/14/code-generators/. He says that he will come back to this issue. I'm eager to read his thoughts about the topic of code generation.