All too often there will be a number of anomalies that need to be managed. These include duplicates extracted from the source systems, where documents have been copied multiple times to different locations, often within the same environment. These duplicates need to be normalised so that you can decide exactly what you want to migrate, e.g. only the original document, only the latest version, all versions, or none. There will also be "near duplicates", i.e. documents that have the same content but in a different format, e.g. a PDF of an MS Word document. Decisions will need to be made here too, such as whether to discard the PDF, regenerate it on import as a new rendition of the original, or manage the two as separate documents.
Some content may not be processable because it is in a format that modern search technology cannot read, as it has no "indexable" data layer, e.g. a scanned document. Such content may need to be reprocessed to convert the image data into searchable text. This is where our Intelligent Document Capture capability can help significantly, by retrieving a variety of information from the document that can then be used to index it.
An informed decision is only possible after you have carried out the appropriate detailed analysis.
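As a starting point for that analysis, exact duplicates can usually be found by comparing a hash of each file's content rather than its name or location. The following is a minimal sketch in Python, assuming the extracted content sits in a local folder; the function names `file_digest` and `group_exact_duplicates` are illustrative and not part of any product.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large documents are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose bytes are identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of byte-identical copies, from which you can then choose what to migrate. Note that this approach only catches exact copies: a PDF rendition of an MS Word document has different bytes, so near duplicates need a different comparison, for example on extracted text.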