Saturday, September 24, 2011

DCTM Outage Scenarios


Outage Windows
Typically, weekend outage times are acceptable to users of the system. These times will be used to deploy most fixes or upgrades.

Upgrades
Each server has specific requirements for OS upgrades as well as application upgrades. These upgrades may require downtime. Each upgrade will need a separate evaluation against the risk matrix to determine how many integration dependencies it affects.

Server Failures and VM Clones
Services on servers fail for a variety of reasons. Each server should have a recovery policy associated with it. For example, a clone of each server could be maintained for fast recovery of that particular server.

Routine Maintenance
Occasionally, patches will be applied to DCTM software installations. These patches may require restarting the services.

Break/Fix
For problems with individual applications on servers, the procedure is to fix the issue in Development, test it in Validation, and then deploy it to Production.

Failover
The SLA required by the GxP rules states that an outage of up to 4 hours is acceptable. This means that HA for DCTM is not required.
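
The reasoning above only holds if a failed server can actually be restored inside the allowed window. A minimal sketch of that check follows. The 4-hour figure is the only number taken from the text; the restore-time estimates and the function names are hypothetical and should be replaced with figures measured in a recovery drill.

```python
# Allowed outage window from the text, in hours.
SLA_HOURS = 4.0

# Hypothetical restore-time estimates in hours (assumptions, not measurements).
ESTIMATED_RESTORE_HOURS = {
    "restore from VM clone": 1.0,
    "rebuild server and restore from backup": 6.0,
}


def meets_sla(restore_hours, sla_hours=SLA_HOURS):
    """A recovery method is acceptable if it finishes within the SLA window."""
    return restore_hours <= sla_hours


def ha_required(estimates, sla_hours=SLA_HOURS):
    """HA is only needed when no available recovery method fits the window."""
    return not any(meets_sla(hours, sla_hours) for hours in estimates.values())


if __name__ == "__main__":
    for method, hours in ESTIMATED_RESTORE_HOURS.items():
        verdict = "within SLA" if meets_sla(hours) else "exceeds SLA"
        print(f"{method}: {hours}h, {verdict}")
    print("HA required:", ha_required(ESTIMATED_RESTORE_HOURS))
```

With these assumed numbers, the VM clone path fits the window, so HA is not required. If the only recovery path were a full rebuild, the conclusion would flip.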

Thursday, September 22, 2011

The risks for service outage


The risks for service outage can be broken down into three categories:

Server: Each server has services which are vulnerable to outage. These servers are the Content Server, Index Server, Application Server, Database Server, and the Storage Server.
Systemic: The integrations between servers are vulnerable to outage. For example, if the content server goes down, the application will be out; if the database or storage goes down, the content server goes down with it.
Disaster: The whole server room is down. In this scenario the DR system would sync and start up.
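
The systemic category can be sketched as a small dependency graph: take one failed service and work out everything that goes down with it. This is only an illustration. The service names and the `DEPENDS_ON` map are assumptions based on the examples above, not a description of any particular installation.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it needs.
# The edges follow the examples in the text; adjust them to your environment.
DEPENDS_ON = {
    "application": ["content"],
    "index": ["content"],
    "content": ["database", "storage"],
    "database": ["storage"],
    "storage": [],
}


def affected_by(failed, depends_on=DEPENDS_ON):
    """Return every service that goes down, directly or indirectly, when `failed` is out."""
    # Invert the map: service -> services that depend on it.
    dependents = {name: [] for name in depends_on}
    for service, needs in depends_on.items():
        for needed in needs:
            dependents[needed].append(service)

    # Breadth-first walk outward from the failed service.
    down = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents[current]:
            if service not in down:
                down.add(service)
                queue.append(service)
    return sorted(down)


if __name__ == "__main__":
    for service in DEPENDS_ON:
        print(service, "->", affected_by(service))
```

Under these assumed edges, a storage failure takes out every other service, while an application server failure affects nothing else.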

The risks of services going down are real, and outages happen most often at the server level. Users tend to complain when performance is slow, which may be a sign that a service is in trouble. Integrations between DCTM and other services are often risky because it is assumed that the other services are always up. If a company is growing, the network will change, databases will stumble, and even electrical circuits will blow, so keep all of this in mind in your recovery plans, regardless of assurances that this "will never happen".

Risk Matrix by Server


| Scope    | Server      | Outage Description         | Integration Dependency        | Risk Level              | Monitoring         |
|----------|-------------|----------------------------|-------------------------------|-------------------------|--------------------|
| Systemic | Storage App | Storage Services           | Database, Content, Index, App | Low (If HA, redundancy) | monitoring scripts |
| Systemic | Oracle      | Database Server            | Content Server                | Low (If HA, redundancy) | monitoring scripts |
| Systemic | LDAP Server | LDAP                       | App/Content Server            | Low (If HA, redundancy) | monitoring scripts |
| Systemic | DNS Server  | DNS                        | All Servers                   | Low (If HA, redundancy) | monitoring scripts |
| Server   | DCTM        | Repository Services        | App Servers, Index Servers    | Med (If standalone)     | monitoring scripts |
| Server   | DCTM        | Java Method Server (JBoss) | Index agents, Jobs, workflow  | Med (If standalone)     | monitoring scripts |
| Server   | Application | Tomcat                     |                               | Med (If standalone)     | monitoring scripts |
| Server   | Index       | xPlore Servers and Agents  | App Server Search             | Med (If standalone)     | monitoring scripts |

Disaster Recovery systems are replicated systems, so a disaster-level outage is a low but real risk.
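
Every row of the matrix lists "monitoring scripts" without saying what such a script might look like. One minimal sketch is a TCP reachability check. The host names and ports below are placeholders and must be replaced with the values for your environment; `port_is_open` and `run_checks` are hypothetical helper names.

```python
import socket

# Hypothetical hosts and ports (assumptions); substitute your own.
CHECKS = {
    "Database (Oracle listener)": ("db.example.internal", 1521),
    "LDAP": ("ldap.example.internal", 389),
    "Application (Tomcat)": ("app.example.internal", 8080),
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def run_checks(checks, timeout=3.0):
    """Map each service name to "UP" or "DOWN" based on a TCP connect."""
    return {
        name: "UP" if port_is_open(host, port, timeout) else "DOWN"
        for name, (host, port) in checks.items()
    }
```

Calling `run_checks(CHECKS)` from a scheduled job and alerting on any "DOWN" result is one way to use it. A successful connect only shows that something is listening on the port, not that the service is healthy, so a real script would likely add an application-level check such as a test query or a test login.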

Tuesday, September 6, 2011

In the Aftermath of EMC Sales and Sales Engineers


Any consultant who has landed a project after EMC sales and sales engineers have "sold" the DCTM software suite knows that resetting the client's expectations can be a challenge. The motivations of sales and implementation are two completely different animals. EMC Sales wants licenses and commissions; consultants want to design, develop, and deploy the best possible solution (ideally). Caught between these two perspectives is the customer, who, more often than not, ends up feeling deceived and cheated.

So how do we accommodate the claims of EMC sales? First, accept that the client will want more than the software can deliver. For example, if the sales engineer said InputAccel for invoices can learn automatically how to pick up line items from an invoice, then you need to immediately explain in fuller detail what validation means and the steps taken for IA to actually “learn” the layout of an invoice.

Another example is the claim that it takes only a few weeks to implement an enterprise-wide content management solution. Maybe, if you installed the vanilla products and walked away, but the client would be left with a car, no idea how to drive it, and no roads to follow.

Second, do not make promises that you know you can’t keep. If you bid low to get a project, get ready to face the consequences. Be honest and as comprehensive as possible. Show the client in detail where they will have to pay more to accomplish what EMC sales had envisioned for them. The client wants a great deal and everything for free, but it is your job to bring them back to reality.