Automating information governance

When it first came into vogue a few years ago, ‘information governance’ (IG) was often considered to be just an updated form of records management (RM), extended to take account of the US legal discovery rules and, to an extent, Freedom of Information (FOI) requests. If all electronically stored information can be requested prior to a court case or FOI application, not just content that has been specifically declared as a record, then work-in-progress, content on laptops and mobiles, back-ups and, in particular, email archives are all discoverable, and need to be governed.
However, in the past twelve months, data leaks and security breaches, most especially the Edward Snowden disclosures, have brought the security and privacy elements of information governance strongly into play. Metadata has become front-page news, and heads of state discuss individuals’ rights to data privacy and information deletion. Meanwhile, massive leaks of personal information have damaged corporate reputations and hardened already strong views in some jurisdictions.
Volume, Velocity and Variety
We therefore need to work harder to protect live content and preserve content records, but the volume, velocity and variety of content generation make it nearly impossible to manually maintain and enforce the policies we so earnestly set. Computers are more consistent than humans, but you still have to teach them – and trust them. We seem to be at an adoption tipping-point for automating real-time compliance processes, and for machine audit of existing content for metadata accuracy, content security, and de-duplication. Is this a silver bullet?

In a recent survey, AIIM explored the current issues around information governance, and the early adopter experience of automated classification.
Over the last 10 years, records management has been migrating from handling just physical or paper records to dealing with electronic documents, and most recently many other types of electronic content. Traditionally though, records management only kicks in at the point where you declare a record as needing to be kept, for a certain length of time or retention period, in order to meet legal requirements. Those records will need to be secured, with controlled levels of access and may be needed as part of a legal discovery or information request process.

But in recent years the rules and the risks have changed, and we now need to keep all electronically stored information securely, compliantly and available to the compliance process, whether it’s work-in-progress documents, emails, collaboration tools, or any other repository of content.
Governance versus Management
One of the questions we asked people in our recent survey was: “Has the perception of information governance in your organisation progressed from management of declared records, to an IG way of thinking across all electronically stored information for privacy, security and e-discovery?” Up until a year or so ago, only 15 per cent in the public sector took this much wider view, but another 14 per cent have adjusted their view quite recently, and a further 27 per cent plan to do so in the next 12 months. That is a very strong movement: in total, more than half (56 per cent) have adopted, or plan to adopt, the wider outlook.
Of course, the basis of good information governance is a sound and solid information governance policy. It needs to be comprehensive and to cover different types of content, including content-in-motion – on USB sticks, in the cloud, on mobiles, and so on. Creating such a policy is not easy. The biggest difficulty reported in our survey was getting senior management endorsement, but there is also the problem of getting the right people at the table and freeing up time from their day jobs. Information governance is best driven by a cross-departmental team, but only 17 per cent of government organisations do this, with 28 per cent relying on the existing records management or compliance department to take the lead, and 25 per cent expecting each department to manage its own records.
Unfortunately, although the public sector is a little further forward in this respect than other sectors, only 14 per cent are prepared to say that they have an IG policy in place which is communicated and enforced. One has to feel for the 20 per cent who have done all the work to create and agree a policy and then see that it’s largely un-referenced and un-audited. However, there is hope, as 53 per cent are working hard to achieve an organisation-wide view or have an effective IG policy in at least some departments.
Risks and Benefits
Our government sector survey participants ranked the inability to respond to requests (such as FOI) as the biggest risk of a failure of information governance, followed by loss of customer/citizen confidence, or bad publicity from a data loss. Across all sectors, excess litigation costs or damages resulting from poor records-keeping rated as the number one risk. Even in the public sector, this was ranked as number three, just ahead of the loss of intellectual property or confidential information.
On the benefits side, three key issues came out strongly: exploiting and sharing knowledge resources; faster response to events, accidents, press activities and FOI enquiries; and reduction of storage and infrastructure costs. This last aspect has soared in importance over the last few years as content volumes increase remorselessly. Without sound policies for retention periods and admissible deletion, there is little that can be done to contain them. Indeed, half of the survey respondents admit that their strategy for managing increasing content volume is simply to buy more discs.

Computers, not Humans
So how do we overcome the problem of getting an information governance regime that works? Well, the problem is with humans. We have a habit of storing too much stuff, the ROT – redundant, obsolete or trivial. In the survey, 60 per cent of stored content was deemed to have no business value. We are bad at deleting stuff, often because we’re unsure of what is safe to delete. And we’re bad at classifying or tagging things, so we don’t get sufficient metadata to find stuff in the future. The way to fix all this, of course, is to let the computer do it for us.
As long as we have the correct metadata attached to a piece of content, whether it’s an email, an office document, or a scanned inbound letter, we can manage it. We can allocate it to a type, with its origination date, and that will be sufficient to set its retention period. We may have to delve a little into the content of a document to tell if it contains sensitive personal information, but having done so, we can add a security classification, and from that we can set an appropriate level of access. And by picking out keywords within the content we can tag it for search, improving findability for the future.
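To make the chain from metadata to policy concrete, here is a minimal Python sketch of the three steps just described: content type plus origination date gives a retention period, a look inside the content gives a security classification, and keyword matching gives search tags. Everything specific in it is an assumption made for illustration, not something taken from the survey or from any particular product: the retention schedule, the National Insurance-style pattern used to spot personal data, the keyword list and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule: content type -> years to keep.
RETENTION_YEARS = {"email": 2, "office_document": 7, "inbound_letter": 5}

# Hypothetical marker of sensitive personal data: a string shaped like
# a UK National Insurance number (two letters, six digits, A-D).
PERSONAL_DATA = re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b")

# Hypothetical keywords worth tagging for search.
KEYWORDS = ("invoice", "complaint", "planning")


@dataclass
class Classification:
    delete_after: date   # end of the retention period
    security: str        # "restricted" or "general"
    tags: list[str]      # keywords found in the content


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def classify(content_type: str, created: date, text: str) -> Classification:
    # 1. Type + origination date are enough to set the retention period.
    delete_after = add_years(created, RETENTION_YEARS[content_type])
    # 2. Delve into the content to decide the security classification.
    security = "restricted" if PERSONAL_DATA.search(text) else "general"
    # 3. Pick out keywords to improve findability.
    lowered = text.lower()
    tags = [word for word in KEYWORDS if word in lowered]
    return Classification(delete_after, security, tags)


result = classify(
    "inbound_letter",
    date(2014, 3, 1),
    "Complaint about invoice, NI number AB123456C",
)
print(result)
```

A real classifier would draw its rules from the agreed governance policy and would use far richer content analysis than a single pattern, but the shape of the decision is the same.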

In the past we have expected humans to do all this, but given the volume of content coming at them, the likelihood of achieving accurate and, above all, consistent classification is slim. At the end of the day, we all hate filing, and with the current analytic capabilities of the computer, we are better off automating it.

Revitalise your ECM/RM System
As a result of poor initial planning and policy setting, many ECM and records management system projects reach a point where users have lost faith, and the content within the system is as chaotic as it was before. This is particularly true of SharePoint implementations, which have in the past been mostly IT-led, with little thought for defined classification schemes or fileplans, and no real concept of retention policies. Many of the automated classification concepts described above can be applied as batch agents or filters to existing content, detecting and removing duplicate files, correcting or adding metadata, re-allocating security levels, and deleting content beyond its useful life or its statutory retention period. Often these data-cleaning processes will be part of a migration exercise to a new or consolidated ECM or RM system.
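As an illustration of one such batch agent, the Python sketch below detects exact duplicate files by hashing their content, so that copies are found even when they have been renamed. It is only a sketch under stated assumptions: the function name and the in-memory mapping of file names to bytes are hypothetical, and a real agent would read content from the repository and would also need a strategy for near-duplicates, which an exact hash cannot catch.

```python
import hashlib
from collections import defaultdict


def find_duplicates(contents: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose content is byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in contents.items():
        # Identical content always yields the same SHA-256 digest,
        # whatever the file is called or wherever it is stored.
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by more than one file indicate duplicates.
    return [names for names in by_digest.values() if len(names) > 1]


# Hypothetical sample content standing in for a repository scan.
sample = {
    "minutes.doc": b"Minutes of the March meeting",
    "budget.xls": b"Budget figures",
    "Copy of minutes.doc": b"Minutes of the March meeting",
}
print(find_duplicates(sample))
```

Reporting the groups rather than deleting anything keeps the decision about which copy to retain with the governance policy, not with the script.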
Once the data is in a better state, automated classification can be injected into various places in the system – on capture or ingestion of inbound content, as part of the business process, or at the point of archive. Here there may be a choice – let the machine take care of it completely, or use the rules-based intelligence to prompt the operator for appropriate tagging and metadata.
In our survey, only 11 per cent of public sector respondents are already using automated classification, but a further 35 per cent are just getting started, or have plans in place. We asked those early users if the computer can be relied on to be accurate. All but 10 per cent had their expectations met or exceeded for electronic documents, and the results for scanned documents were even better.
We are inclined to agree with the 47 per cent in the survey who feel that automated classification is the only way to keep up with information volumes – but remember, you still need to set the rules for the computer, and to do that you need to have a sound and agreed information governance policy in place.
