All blog entries reflect the opinions of the author and have not been expressly endorsed by the Ivan Allen College of Liberal Arts or the Georgia Institute of Technology.
Here in the Ivan Allen College of Liberal Arts (IAC), we have at least 150 websites to take care of when you include all of the test and dev sites, along with a few servers that house custom web applications. Of that number, 35 are Drupal sites, 22 of which I have some level of direct responsibility for maintaining.
Even at twenty-two sites, it's very challenging to keep up with the required regular maintenance, and because I am the first IAC college-wide web developer, there were no standards in place before I arrived. So, I kind of had my work cut out for me. On one hand, I had something of a clean slate on which to develop plans for handling everything into the future. On the other hand, I had to do a lot of groundwork just to get a good sense of what was out there and what state it was in.
My first few blog posts here are going to be about lessons learned and my recommendations for managing large numbers of sites. You may not always agree with my ideas, and that's perfectly fine -- a lot of web development is about learning to agree to disagree, as there are many ways to accomplish goals in this field, and often there isn't a single method that works best in every situation.
My first topic is cataloging. Once I realized just how many sites IAC had, I knew immediately that I needed to create a catalog if I was ever going to keep track of so much information coming at me so quickly. I thought about a spreadsheet, but realized that I needed a database, and if I was going to build a database, it may as well be a web application. I have a custom framework for PHP web apps that I've been refining for nearly a decade, with its major focus on rapid development of applications that are very lightweight (i.e., little overhead). I created a Georgia Tech version of the framework some time back, adding the Georgia Tech theme to it along with built-in support for CAS and LDAP, which means that I have everything I need for a Georgia Tech themed application that lets users log in with their GT Accounts. In about one day's time, I had the core of my website cataloging application built and was already populating it with data.
But that was only the first goal. What I really wanted was an extension to the app that would catalog all of the modules and key installation details for all of our Drupal websites. For this, I began by writing a very simple Drupal module that initializes a core module data structure and then exports the whole thing as a JSON object. Access is protected by a simple shared secret that is enough to keep the bad guys from easily getting our site module lists. With the module installed on each Drupal site, it was then just a matter of extending the cataloging application to reach out to every site I had flagged as having Drupal, request the module information, and then store the details in the cataloging database.
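To make the collection step a little more concrete, here is a rough sketch of what the catalog side of that exchange might look like. It is written in Python purely for illustration (my actual application is PHP), and the endpoint path `/catalog/export`, the `secret` query parameter, the JSON shape, and the `site_modules` table are all hypothetical stand-ins rather than the real names.

```python
import json
import sqlite3
import urllib.parse
import urllib.request


def fetch_export(base_url, secret):
    """Request the module list from one site (hypothetical URL layout)."""
    query = urllib.parse.urlencode({"secret": secret})
    with urllib.request.urlopen(f"{base_url}/catalog/export?{query}", timeout=10) as resp:
        return resp.read().decode("utf-8")


def store_module_export(conn, site, payload):
    """Replace the stored module list for one site with a fresh JSON export.

    Assumed payload shape: {"views": {"version": "7.x-3.20", "status": 1}, ...}
    """
    modules = json.loads(payload)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS site_modules ("
        "site TEXT, module TEXT, version TEXT, enabled INTEGER, "
        "PRIMARY KEY (site, module))"
    )
    rows = [
        (site, name, info.get("version"), int(bool(info.get("status"))))
        for name, info in sorted(modules.items())
    ]
    with conn:  # one transaction: old rows out, new rows in
        conn.execute("DELETE FROM site_modules WHERE site = ?", (site,))
        conn.executemany("INSERT INTO site_modules VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Deleting a site's old rows before inserting the new ones means a module that has been removed from a site also disappears from the catalog on the next poll, instead of lingering as stale data.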
The net result is that I was able to write a number of report generators to let me browse through the details on the sites either by site (showing all modules installed) or by module (showing all sites with that module). The latter is very helpful when a security alert comes out, as with just a few clicks I can find out which sites have the module in question and what version of the module they are running. I can also generate a compact report of all sites that have known out-of-date modules. Currently, this is based on comparing version numbers and considering the highest number to be the 'current' version of that module. Eventually, I'd like to poll the drupal.org site (like Drupal itself does) to grab real-time version information, thus allowing the system to always know the most recent version of each module.
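The "highest number wins" comparison described above can be sketched in a few lines. Again, this is illustrative Python rather than my actual PHP code, and the function names and the `(site, module, version)` record layout are assumptions made for the example. The one real subtlety is that version strings have to be compared numerically: as plain text, "7.x-3.3" sorts above "7.x-3.20".

```python
import re
from collections import defaultdict


def version_key(version):
    """Turn a version string such as '7.x-3.20' into (7, 3, 20) for comparison.

    Simplification: suffixes like '-beta1' are treated as just another number.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version))


def sites_by_module(records):
    """Index (site, module, version) records as {module: {site: version}}."""
    index = defaultdict(dict)
    for site, module, version in records:
        index[module][site] = version
    return index


def outdated_report(records):
    """List (site, module, installed, newest_seen) for every install that lags
    behind the highest version of that module found anywhere in the catalog."""
    report = []
    for module, installs in sorted(sites_by_module(records).items()):
        newest = max(installs.values(), key=version_key)
        for site, version in sorted(installs.items()):
            if version_key(version) < version_key(newest):
                report.append((site, module, version, newest))
    return report
```

Because "current" here only means the newest version seen on any cataloged site, a module that is out of date everywhere will not show up in the report, which is exactly the gap that polling drupal.org would close.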
While I'm not sure when or if I might make my cataloging application available to other Georgia Tech units, I do hope to release the Drupal module fairly soon, as it is very simple, yet should be a great starting point for anyone wanting to implement their own cataloging system. I should add that the cataloging tool is an immense help well beyond just its Drupal features. I implemented a number of data fields for storing both technical and non-technical details about every site cataloged, such as the owners / main contacts for the site, the hosting location of the site (OIT, IAC VM, IAC Virtual Host, etc.), SSH user account, method for accessing the site's database, etc. Yes, it's annoying to gather some of this data and get it into the catalog, but it pays off so nicely when I can run a quick search and see, for example, which of our websites live on a particular OIT web-plesk server.
That's all for now. Stay tuned for Lessons Learned #2: Development Server Organization Skills.
Update: My WordPress cataloging modules are now available on the downloads page (available on-campus or via the campus VPN only). The Drupal cataloging module is available in our campus-only Drupal module repository.