There is considerable disagreement about the nature of dark data and what it entails.
Gartner, a leading information technology research and advisory company, defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.”
In plain English, dark data is information that an organisation (and this applies to almost all organisations) comes across daily in the course of everyday business activity but does not use to its full value. Often the data serves a single purpose and its value for other purposes goes unseen. In other cases it is stored only because “we have to.” Storing data that is never used is a waste of money, yet there is a good chance the data holds real value. In essence, dark data is information that an organisation may not understand or may not know how to use.
Where is the dark data?
Dark data is usually found in server log files, customer call detail records, mobile data, data repositories and so forth. Emails are also a large source of dark data. Valuable information is often left on the page of a document as well, in the form of annotations, notes, scribbles and signatures. These annotations contain keywords that may unlock hidden information.
How to use dark data
Optical character recognition (OCR) and intelligent character recognition (ICR) can be used to scan documents and extract their text. The extracted text can then be analysed by keyword in order to classify the data and store it in the appropriate location. Signatures are another example: they can be verified for accuracy and authenticity. Annotations that are present on a document but are not part of its structure may provide information that helps with identification, classification and storage.
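The keyword analysis step can be illustrated with a small Python sketch. It assumes the text has already been extracted from a document by OCR or ICR, and it only shows one simple way to classify that text by counting keyword matches. The category names, the keyword lists and the `classify` function are hypothetical and chosen purely for illustration; a real system would define its own and would likely use something more robust than plain word counts.

```python
import re
from collections import Counter

# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "payment", "due", "amount"},
    "contract": {"agreement", "party", "signature", "terms"},
    "complaint": {"complaint", "refund", "faulty", "unhappy"},
}


def classify(text, default="unclassified"):
    """Return the category whose keywords occur most often in the text."""
    # Count every lowercase word in the extracted text.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each category by the total number of keyword occurrences.
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default when no keyword matched at all.
    return best if scores[best] > 0 else default


# Example: a handwritten note assumed to have been extracted by OCR/ICR.
note = "Customer unhappy, faulty unit returned. Refund approved."
print(classify(note))  # complaint
```

The returned category could then decide where the document is stored, so that an annotation that would otherwise sit unread becomes something that can be searched for and used.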