Storage Professionals

The New Storage Model

 

The current storage model of blocks, objects, files, and directories doesn't really have any idea of the content of what is being stored. Expedite takes a new approach: it implements a storage model that knows what business information is being stored.

 

The new model is based upon the Information Asset. This new structure is the core foundation of the model. See "Introduction To Information Asset Management" for a summary and background of this important new concept.

 

The presentation given at the SNIA Storage Innovation Conference is a good overview (click here to see the presentation - PDF).

 

A New User Community

 

Instead of going to the IT community, Expedite takes a different approach: it works with the owners and stewards of the business information. These are the people who are actually trying to run their organizations using unstructured data. This new user community has a much different view of unstructured data and a much more stringent set of requirements. See the white paper "Business User Requirements" for a list of over 100 requirements that differ from what today's storage products provide.

 

The Reason For The New Model

 

The reason for the new model is that there are currently a large number of problems in just about every area of unstructured data management that are unsolvable due to the limitations of the current model. For a sample, see "50 problems with file servers (PDF)" and "100 costs incurred when running your business using a file server (PDF)".

 

Backup

 

Another example is backup. The backup industry has suffered greatly from the limitations of the existing storage model. One of the most important issues is backup's inability to automatically restore a file that becomes corrupted: it is up to a user to notice the problem, manually run a restore, and know the exact path and name of the file, which copy needs to be restored, and so on. To see just how liberating the new storage model can be, see "Ways Backup Can Be Improved Using Expedite (PDF)". It identifies over 30 improvements that can be made to backup, including automatic file restoration, if integrated with Expedite.

 

ILM

 

Saving infrequently accessed files to some robotic storage library has been attempted countless times. Every few years, a new name is given to the same old technology and it is tried, yet again, in the marketplace. As predicted, the same response comes back from the market, ranging from "not interested" to "get this stuff out of here". Granted, there are a couple of niche markets where it has taken a tiny foothold, but in all of these cases the entire library is limited to a single application, and often a single user. See the white paper "What we have learned about ILM" to see just what we should have learned.

 

The potential benefits of ILM to just about every data center are enormous. The often-quoted statistic that "60 to 80 percent of unstructured data hasn't even been looked at in over a year" means that customers are buying 3-5 times the online storage they need. Then why hasn't this worked? What is the problem?

 

The real issue with ILM is that the data management problem must be solved first; only then can the data access problem be addressed. People have been doing ILM in the paper world for centuries! As one manager told me when I started out, "Even our worst secretary knows how to do this!" And he was right. ILM does, however, require this new information model in order to be implemented correctly. For further details, see the "ILM" presentation.

 

All the Other Unstructured Data Management Functions

 

Just about every management function associated with unstructured data is now limited by the old storage model. These include, but are not limited to, the following:

 

Searching - Users want to search on attributes of information assets and have information assets returned, not just files. One of the biggest complaints about full-text searching is the impact it has on the storage: the indexing process traverses the directory structure over and over looking for something to do. Expedite, on the other hand, controls which files are indexed and when, so the indexing overhead is kept to an absolute minimum. Expedite also controls exactly which files are indexed, so users don't see temp files or files in other locations that have no validity in the search. The interesting thing about full-text indexing is that if you give users access to the trusted master copy of data and ways to locate the information based upon its context, the need for and value of full-text search nearly disappear. (There are, however, two use cases where full-text indexing is desired: converting unknown piles of files into some semblance of information assets, and the audit process.)
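
The idea of controlled indexing can be sketched in a few lines. This is a minimal illustration, not an Expedite API: the indexer is handed an explicit list of registered asset files instead of crawling the whole tree, and temp or scratch files are excluded up front.

```python
import fnmatch

# Illustrative patterns for files that should never reach the index.
TEMP_PATTERNS = ["*.tmp", "~*", "*.bak"]

def files_to_index(registered: list[str], already_indexed: set[str]) -> list[str]:
    """Return only the registered files that still need indexing."""
    work = []
    for path in registered:
        name = path.rsplit("/", 1)[-1]
        if any(fnmatch.fnmatch(name, p) for p in TEMP_PATTERNS):
            continue  # scratch files have no validity in a search
        if path in already_indexed:
            continue  # no repeated crawling of finished work
        work.append(path)
    return work
```

Because the work list comes from the asset manager, indexing overhead is bounded by what actually changed, not by the size of the directory tree.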

 

File Permissions - Current configurations of file servers end up allowing anyone to trash the files. Setting the permissions correctly as an asset's state changes can protect these important assets from unauthorized modification or destruction.
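
As a minimal sketch of state-driven permissions (the state names "final" and "draft" are hypothetical, not Expedite terms), a file can be locked down the moment its business process marks it complete:

```python
import os
import stat

def apply_state_permissions(path: str, state: str) -> None:
    """Tighten or relax file permissions as the asset's state changes."""
    if state == "final":
        # Read-only for everyone: content can no longer be trashed.
        os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    elif state == "draft":
        # Owner may edit; everyone else gets read-only access.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
    else:
        raise ValueError(f"unknown state: {state}")
```

The point is that the permission change is triggered by a business event, not left to whoever happens to remember.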

 

Mirroring and replication - Both propagate corruption and can destroy the last remaining good copy. They have no way to know whether they should propagate a change or restore the primary. Expedite knows the difference and can protect files from corruption or loss.
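
The distinction can be illustrated with a checksum recorded at the asset's last valid business state. This is a sketch under that assumption (changes are only authorized through the business process, which updates the known-good checksum); the function names are illustrative:

```python
import hashlib
import shutil

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def replicate_or_restore(primary: str, mirror: str, known_good: str) -> str:
    """Propagate a legitimate change, or repair silent corruption."""
    if sha256_of(primary) == known_good:
        shutil.copyfile(primary, mirror)  # content is valid: mirror it
        return "replicated"
    shutil.copyfile(mirror, primary)      # corruption: restore the primary
    return "restored"
```

Plain mirroring lacks the known-good reference, so it can only ever do the first branch.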

 

Archiving - Knowing what to actually archive is the biggest limitation here. Expedite can drive archiving directly from the business process, so no one has to figure this out manually on a periodic basis.

 

Preservation - The biggest limitation to preservation is the cost of saving everything. In order to preserve a collection, it has to be organized and the redundant and unnecessary data removed; this is called curation. Unfortunately, curation is usually far too expensive to perform because it is a manual process. Expedite keeps things organized, so the preservation step can be automated and its costs controlled.

 

Deletion - No one actually deletes anything. Abandoned copies and the fear that something might be needed someday are the problems. If the deletion process is defined and controlled, the entire data management approach changes from an ever-growing pile into a pipeline of data.
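
A defined, controlled deletion process can be sketched as a disposition sweep. The asset structure and field names here are hypothetical: each asset carries a disposition date set by its business process, and a periodic pass disposes of everything that has aged out.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Asset:
    name: str
    dispose_after: date  # set once by the governing business process

def sweep(assets: list[Asset], today: date) -> tuple[list[Asset], list[str]]:
    """Split assets into those to keep and the names of those to dispose."""
    keep = [a for a in assets if a.dispose_after >= today]
    disposed = [a.name for a in assets if a.dispose_after < today]
    return keep, disposed
```

Because the date comes from a defined policy rather than individual fear, the pipeline actually drains.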

 

Continuous Data Protection, Snapshots, etc. - These are all based upon storage events, not business events. For example, "Which snapshot holds the version we sent to the customer?" Answering that question is almost impossible. Also, not every write needs to be stored. In 2008, Strategic Research did a study and found that 324 companies were investing in continuous data protection, with a total investment exceeding a billion dollars. Today, how many people are actually running continuous data protection? Very few. The problem is that it is not based upon the business requirements of the data. Compare that with the Continuous Asset Protection supported by Expedite, which can keep things mirrored and actually restore a file upon corruption, all with very few resources.

 

DeDuplication - Customers want a trusted master copy of data. Performing deduplication on primary storage may limit the capacity requirements, but it still maintains all the duplicate references to the file. Which one should be used? If one is changed, how can anyone know? Sometimes it is also necessary to have an actual duplicate copy: remote access, remote offices, and offline access are all reasons to keep duplicates. What customers really want is a system that maintains a master copy and automatically controls, repairs, and rescinds the copies as required by the business processes. This is exactly what Expedite does.
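
The master-copy idea can be sketched as a small registry. The class and method names are hypothetical, not an Expedite interface: one copy per asset is designated the master, and any managed duplicate can be checked against it rather than guessed at.

```python
import hashlib

class MasterRegistry:
    """Track one trusted master copy per asset and verify duplicates."""

    def __init__(self) -> None:
        self.masters: dict[str, bytes] = {}  # asset id -> master content

    def register_master(self, asset_id: str, content: bytes) -> str:
        """Record the master copy; return its checksum for auditing."""
        self.masters[asset_id] = content
        return hashlib.sha256(content).hexdigest()

    def check_copy(self, asset_id: str, copy_content: bytes) -> str:
        """Report whether a duplicate still matches the master."""
        master = self.masters[asset_id]
        return "in sync" if copy_content == master else "needs repair"
```

A duplicate flagged "needs repair" can then be refreshed from, or reconciled with, the master instead of quietly diverging.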

 

eDiscovery - Locating relevant information is always a problem. It is much easier, however, if controlled, identified master copies are maintained. Being able to locate the history, context, and users involved can make responding to an eDiscovery request much easier.

 

Cloud Access - Keeping data in the cloud is sometimes useful, sometimes necessary, and sometimes something that should be prevented. How should that be controlled? Handling the access patterns, performance, caching, and redundancy of cloud access is a real challenge. Some cloud storage gateways help with the performance but have no way to control what data goes to the cloud and what doesn't. That should be defined and managed from the requirements of the business process.

 

Object Storage - Object storage is great in that it can scale to huge numbers of objects and large capacities, but these are achieved at a cost: nearly all applications can't use it directly. It must be accessed through a file system gateway, as a directly addressable storage device from the application itself, or via some type of file system migration device. Expedite can support object storage as a migration target, allowing data to be moved to and from on-site primary file systems, which is very useful for supporting cloud storage.

 

Security/Information Assurance - In order to secure something correctly, don't you have to know what it is?  Does next year's marketing plan need different protections than last year's bowling league videos?

 

Classification - Many companies have attempted to take a customer's "piles of files" and make something of them. What users are really asking for is to convert those files into information assets by creating the missing business context. Unfortunately, the business context isn't even stored in the computer, so no amount of scanning, processing, or big data analytics will really provide customers with what they want. These classification tools sometimes produce information useful in data management, but their success rate has been greatly limited. Users are not excited about having to redo the analysis periodically because they were never given a real solution that keeps them out of the mess in the first place. Expedite, on the other hand, provides a long-term strategy that allows the customer to never lose the context in the first place. This greatly improves the long-term outlook and can also provide a structure to help users make sense of, and control, the pile of files they do have.

 

Emails - Not every email is important; anyone who has had an email account for more than a day knows that. Information assets can, however, involve emails to and from external users that should be saved as part of the asset. Expedite provides a way, through its attachment capability, to store key emails with the asset itself. For example, keeping an email from a customer approving a pricing change in a contract can be very valuable should there be a dispute at some future date.

 

The list could go on and on, but you should now get the picture: just about every aspect of storage management is limited by the old model, and just about every one can be improved with the new Expedite storage model.