Sunday, April 24, 2011

The Intersections of Metadata, eDiscovery, Taxonomy, and Records Management

Designing and implementing the systems that manage content beyond creating, reviewing, and approving it (metadata, eDiscovery, taxonomy, and records management) can be a challenge if done in a vacuum. Each of these systems of content description and rules has intersection points with the others.

I have witnessed what happens when one system is designed without taking the others into account: change management nightmares. Think about freezing a file plan for RM and then having to change it… Add to this CMIS and other web services that try to perform similar actions on content, and we have a web of interactions that collide and push and pull on each other.

Why do monolithic ECM companies have to apply layer upon layer of abstraction, rules, and XML configuration (the applications mentioned above) to do very core things to content? Because it sells new products and wins market share from other vendors: it allows ECM companies to grow and to please their shareholders. So where do the specialized products and the do-it-all products meet? At the following components of description and action:

Tip: Plan for developing ways to migrate large amounts of information into the repository.
A mechanism to get content into the repository and to describe it as table entries in a database that point to the file location.
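A minimal sketch of that mechanism, using SQLite; the table and column names are illustrative, not any vendor's actual schema:

```python
import sqlite3

# The repository "imports" a file by recording a row of metadata that
# points at the file's location on the content store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dm_document (
        object_id   INTEGER PRIMARY KEY,
        object_name TEXT NOT NULL,
        doc_type    TEXT,
        file_path   TEXT NOT NULL      -- where the bytes actually live
    )
""")

def import_content(name, doc_type, file_path):
    """Register one piece of content as a table entry pointing to its file."""
    cur = conn.execute(
        "INSERT INTO dm_document (object_name, doc_type, file_path) "
        "VALUES (?, ?, ?)",
        (name, doc_type, file_path),
    )
    return cur.lastrowid

# Bulk migration is just many such rows inside one transaction.
with conn:
    for i in range(3):
        import_content(f"invoice_{i}.pdf", "invoice", f"/content/store/{i:08x}.pdf")

count = conn.execute("SELECT COUNT(*) FROM dm_document").fetchone()[0]
print(count)  # 3
```

Planning the bulk path (one transaction, many rows) up front is what makes large migrations into the repository feasible later.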

Tip: Plan for inevitable change with a metadata repository.
Describe the important, pertinent aspects of the content for the purposes of discovery, 21 CFR Part 11, ISO 15489, MoReq2, SAS 70, etc. Describe for the rules and regulations, not for the applications. Describe for the audit. Describe for the user trying to process invoices through approval and payment.
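As a sketch of what "describe for the audit" means in practice, here is a hypothetical metadata record; the field names are made up, loosely modeled on what 21 CFR Part 11 / ISO 15489 audits tend to ask for:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RecordMetadata:
    title: str
    record_class: str          # ties the item to the retention schedule
    created_by: str
    created_at: datetime
    retention_years: int
    audit_trail: list = field(default_factory=list)

    def log(self, actor, action):
        # Every change is appended, never overwritten: this is what the
        # auditor reads, regardless of which application made the change.
        self.audit_trail.append((datetime.now(timezone.utc), actor, action))

inv = RecordMetadata("ACME invoice 4711", "AP-INVOICE", "jdoe",
                     datetime.now(timezone.utc), retention_years=7)
inv.log("jdoe", "submitted for approval")
inv.log("asmith", "approved for payment")
print(len(inv.audit_trail))  # 2
```

The point is that none of these fields belongs to any one application; they describe the record for the regulation, the audit, and the user at once.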

Tip: Google is Google for a reason; don’t try to copy them.
Searching for content, especially lots of content, is a major challenge for large repositories. For lawyers trying to find content pertinent to a class action suit, this can be good or bad depending on the company’s strategy and how it weighs the fines for not auditing correctly against finding self-damaging evidence. The key is how results are handled, and that discipline is still in its infancy.
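A toy sketch of why repository search forces these decisions: even a minimal inverted index makes you choose what gets indexed and how results combine before anyone argues about relevance or privilege:

```python
from collections import defaultdict

index = defaultdict(set)          # term -> set of document ids

def add_document(doc_id, text):
    # Naive tokenization; a real repository must decide about stemming,
    # metadata fields, OCR text, and so on -- all policy choices.
    for term in text.lower().split():
        index[term].add(doc_id)

def search(*terms):
    """Return ids of documents containing ALL the terms (boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

add_document(1, "invoice approved for payment")
add_document(2, "class action settlement draft")
add_document(3, "invoice disputed payment withheld")

print(sorted(search("invoice", "payment")))  # [1, 3]
```

Everything interesting (ranking, filtering, review workflow) happens after this point, which is exactly the "how to handle results" problem.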

Tip: Think about the ramifications of completely changing the folder structure or overlaying it with multiple ways of “seeing” the information.
Many companies discount how powerful the folder hierarchy metaphor still is. They throw content into repositories and hope they can find it through searching. Only later do they figure out that folders can be thought of as virtual, in the sense that their structure and labeling can change without disrupting other ways to find the content they need.
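A sketch of that "virtual folder" idea: the folder path is just an attribute on the content, so renaming or restructuring folders is a metadata update that never moves the underlying files. All names here are illustrative:

```python
documents = {
    "doc-001": {"file": "/store/aa01.pdf", "folder": "/Finance/Invoices/2010"},
    "doc-002": {"file": "/store/aa02.pdf", "folder": "/Finance/Invoices/2011"},
}

def relabel_folder(old_prefix, new_prefix):
    """Re-root a folder subtree by rewriting the folder attribute only."""
    for doc in documents.values():
        if doc["folder"].startswith(old_prefix):
            doc["folder"] = new_prefix + doc["folder"][len(old_prefix):]

relabel_folder("/Finance/Invoices", "/Accounting/AP")
print(documents["doc-001"]["folder"])  # /Accounting/AP/2010
print(documents["doc-001"]["file"])    # /store/aa01.pdf  (unchanged)
```

Because search, retention rules, and links key off the stable file location or object id, the hierarchy can be reshaped freely.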

Tip: Be careful of file names and deep folder paths.
Get the content and metadata out of the system for discovery, migration, or long-term storage. The challenges are pulling the attributes and audit trails together and maintaining the context of the content: locations, original modification dates, user data, and validation.
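A minimal sketch of such an export: copy the bytes and write a manifest that preserves the context (original location, modification date, owner) alongside them. The paths and manifest fields are hypothetical:

```python
import json
import shutil
import tempfile
import time
from pathlib import Path

def export_document(src, export_dir, owner):
    src = Path(src)
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)

    dest = export_dir / src.name
    shutil.copy2(src, dest)           # copy2 keeps the modification time

    manifest = {
        "original_path": str(src),
        "original_mtime": src.stat().st_mtime,
        "owner": owner,
        "exported_at": time.time(),
    }
    (export_dir / (src.name + ".manifest.json")).write_text(json.dumps(manifest))
    return dest

# Usage: export one file into a temporary directory.
work = Path(tempfile.mkdtemp())
src_file = work / "invoice.pdf"
src_file.write_text("dummy content")
out = export_document(src_file, work / "export", owner="jdoe")
print(out.exists())  # True
```

Without the manifest, the exported files lose exactly the context (who, where, when) that discovery and validation depend on.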

Tip: Here’s where poor metadata and lazy content management really cost a company huge bucks in maintaining backups of worthless content.
Delete unwanted content, period. If, like some pharma or financial companies, you send all of your old content to Iron Mountain, you are a hoarder and should take a serious look at your retention policy.
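A retention sweep is conceptually simple once the metadata above exists; here is a sketch with an illustrative (not real) retention schedule:

```python
from datetime import date

# Hypothetical schedule: record class -> years to keep.
RETENTION_YEARS = {"AP-INVOICE": 7, "MARKETING-DRAFT": 1}

inventory = [
    {"id": "doc-1", "record_class": "AP-INVOICE",      "created": date(2002, 3, 1)},
    {"id": "doc-2", "record_class": "MARKETING-DRAFT", "created": date(2010, 6, 1)},
    {"id": "doc-3", "record_class": "AP-INVOICE",      "created": date(2009, 1, 1)},
]

def sweep(inventory, today):
    """Split the inventory into content to keep and content to destroy."""
    keep, destroy = [], []
    for doc in inventory:
        years = RETENTION_YEARS.get(doc["record_class"], 0)
        expiry = doc["created"].replace(year=doc["created"].year + years)
        (destroy if expiry <= today else keep).append(doc["id"])
    return keep, destroy

keep, destroy = sweep(inventory, today=date(2011, 4, 24))
print(destroy)  # ['doc-1']
```

The hard part is not the sweep; it is having a record class and creation date on every item so the sweep has something to decide on. Without them, everything goes to the warehouse.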

Saturday, April 16, 2011

Parsing the xCP Buzz Words

Taking all of the buzz words out of the Documentum xCP pitch we’re left with “accelerated”, “content” and “platform”.

If a client would accept building a TaskSpace application from a napkin sketch in their production environment, then I’d say this accelerates the application build process, but no client’s IT department would allow it. TaskSpace apps are built in development and deployed to test and prod via Composer; quick, right? Well, what about the requirements and functional aspects of the solution? Are those made quicker? No, and here’s why:

Let’s say we have a solution where we scan/capture invoices, process them, and finally report on them. Easy, just like the slick end-to-end demo that EMC sales did, right? Install InputAccel and you’re done? Install Forms Builder and Process Builder and slap together a workflow? I don’t think so.

The problem with smoke and mirrors is that we as solution architects and developers get blamed for how long it takes to build a solution that the sales guys touted as a piece of cake: three months to build, max. The three-month schedule should be more like six to nine months. By then the sales guys are long gone, the customer is annoyed and starts covering its own ass, and the bean counters are tapping their fingers.

These products might be easier to use for cookie-cutter solutions, but what about the 20% of a solution that doesn’t fit the mold? You need requirements, which take time; you need functional specs to set up the configurations for scanning, forms, processes, use cases, etc. This takes more time than is usually allotted. This is not accelerated.

What is going to happen with the old content? The legacy content needs to be migrated. Where are the requirements for this? What are the new attributes and object model for the new system? What is the mapping of old attributes to new? The sales guys didn’t talk about this. This is not part of the acceleration.
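The attribute-mapping question alone deserves a deliverable. A sketch of what such a mapping looks like, with made-up attribute names:

```python
# Hypothetical mapping from legacy attribute names to the new object model,
# plus defaults for attributes the old system never had.
ATTRIBUTE_MAP = {
    "docno":      "object_name",
    "doctype":    "document_type",
    "entered_by": "creator_name",
}
NEW_DEFAULTS = {"record_class": "UNCLASSIFIED"}  # absent from legacy data

def migrate(legacy_row):
    """Map one legacy row onto the new model; collect anything unmapped."""
    new_row = dict(NEW_DEFAULTS)
    unmapped = {}
    for old_key, value in legacy_row.items():
        if old_key in ATTRIBUTE_MAP:
            new_row[ATTRIBUTE_MAP[old_key]] = value
        else:
            unmapped[old_key] = value   # surface these in the requirements
    return new_row, unmapped

new_row, unmapped = migrate({"docno": "INV-4711", "doctype": "invoice",
                             "legacy_flag": "Y"})
print(new_row["object_name"], sorted(unmapped))  # INV-4711 ['legacy_flag']
```

The `unmapped` bucket is the whole point: every legacy attribute with no home in the new model is a requirements question nobody budgeted for.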

The “platform” is still a mashed-up combination of export connections (InputAccel) and XML integrations between Forms Builder, TaskSpace, and the Content Server. One application has variables, the other has attributes. One can parse scanned pages, the other reads a whole document. To put the whole solution together you have to be part developer, part UI designer, and part lucky. The reporting aspect of this platform is an afterthought, and with BAM it can grind the whole Java Method Server to a halt.

Next generation of xCP
The next generation of xCP needs to address the following: 

  • Better coupling between requirements and functional specs, configuration, and validation of configurations. 
  • A smoother ride when developing/configuring the pieces of the solution puzzle, in terms of a common computing vocabulary as well as consistent nomenclature in manuals and tutorials. 
  • Build on open source platforms that are in common use; take a tip from Alfresco. 
  • Slowly eliminate the configuration bottlenecks. For example, on a large project each product has its own experts assigned to work on their one piece, yet they always seem to hurry up and wait for others in the configuration chain to finish or make changes.