Jan 19, 2021

What is the Data Migration process about? (Data Migration 101)

I decided to write this post because, over the many projects I have done in recent years, I keep receiving the same questions: what information needs to be converted from a Legacy System into SAP, what strategy we should follow, and how to approach the different objects.

I haven't found much written on the subject, so I decided to write a post about it.

Data conversion is the process of moving and translating the data from your Legacy System (any type or brand of legacy system) into your Target System during a system implementation, so that the new system can take over operations with its data already loaded. This process is known as ETL (Extract, Transform, Load).


Process

You will extract all the different data objects from your Legacy System database. The extraction can target multiple formats: flat files (ASCII or text), Excel files, an Access or SQL database or any other type of database, or staging tables. You could also use SAP Data Services (SAP's product for data transformation and loading into SAP). Your data source is not necessarily a database either; it could be Excel spreadsheets too.
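To make the "extract into a flat file" option concrete, here is a minimal Python sketch. It assumes a hypothetical legacy vendor table, simulated with an in-memory SQLite database, and writes the query result out as CSV text with a header row. The table, columns and the `extract_to_csv` helper are all made up for illustration; a real extraction would run against your actual legacy database or tool.

```python
import csv
import io
import sqlite3


def extract_to_csv(conn: sqlite3.Connection, query: str) -> str:
    """Run an extraction query and return the result as flat-file (CSV) text."""
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]  # column names from the query
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


# Hypothetical legacy vendor table, held in an in-memory database for the example.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE vendors (vendor_no TEXT, name TEXT, country TEXT)")
legacy.executemany(
    "INSERT INTO vendors VALUES (?, ?, ?)",
    [("V001", "Acme Supplies", "US"), ("V002", "Globex", "DE")],
)

flat_file = extract_to_csv(
    legacy, "SELECT vendor_no, name, country FROM vendors ORDER BY vendor_no"
)
print(flat_file)
```

Keeping the header row in the file makes the later transformation step less fragile, since columns can be picked by name rather than by position.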

Once the data has been extracted, you will start transforming it. This is the step where you build the dictionary that helps you understand and translate data values and records between your Legacy System and your new system. Why? Because your new system will have different fields, new fields, and fields that do not mean the same thing or are not used the same way as in your old system. So you will have to build correspondence (mapping) tables between each old value and its new value. Some values will have a 1:1 relationship, others n:1 or 1:n, and you have to build mapping rules for all of them.

Ex. Your Customer or Vendor numbers might need to change from one system to the other, your General Ledger account numbers might be different, or your old system might not have Cost Centers while your new system does.

You could list hundreds of similar examples across the different areas / modules of the system where old values will need to be replaced by new ones.
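As an illustration of the 1:1, n:1 and 1:n cases, here is a minimal Python sketch of a GL account mapping. The account numbers, the department codes and the `map_gl_account` function are hypothetical. The point it shows is that a 1:n mapping cannot be resolved from the old value alone; it needs a second attribute (here, a legacy department) to decide which new value applies.

```python
# Hypothetical GL account mapping table: legacy account -> new account.
#   1:1 -> "1000" maps only to "100000"
#   n:1 -> "4000" and "4010" both collapse into "400000"
GL_MAP = {
    "1000": "100000",
    "4000": "400000",
    "4010": "400000",
}

# 1:n -> legacy "6000" is split across several new accounts, so a second
# attribute (a hypothetical legacy department code) decides the target.
GL_SPLIT_MAP = {
    ("6000", "SALES"): "610000",
    ("6000", "ADMIN"): "620000",
}


def map_gl_account(legacy_account: str, department: str) -> str:
    """Translate a legacy GL account into the new chart of accounts."""
    if legacy_account in GL_MAP:
        return GL_MAP[legacy_account]
    key = (legacy_account, department)
    if key in GL_SPLIT_MAP:
        return GL_SPLIT_MAP[key]
    # No rule found: better to fail loudly than to load a wrong account.
    raise KeyError(f"No mapping for account {legacy_account!r} / department {department!r}")


print(map_gl_account("4010", "ADMIN"))  # n:1 case
print(map_gl_account("6000", "SALES"))  # 1:n case
```

Raising an error on a missing rule, instead of passing the old value through, keeps unmapped values from silently reaching the new system.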

These mapping tables are not static; some of them will evolve during the course of your implementation project, so you need to put a mechanism in place to keep them up to date. Otherwise, records will be rejected by your new system because a mapping was not maintained. Some will even need to be maintained up until the very last minute, e.g. Customers and Vendors.
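One simple mechanism is to check every extract against the current mapping table before loading. The Python sketch below is an assumption about how such a check could look: the `find_unmapped` helper, the customer numbers and the invoice records are all hypothetical. It lists the distinct legacy values that have no mapping yet, so the table can be updated before the load instead of after the rejections.

```python
def find_unmapped(records, field, mapping):
    """Return the distinct values of `field` that have no entry in `mapping`."""
    return sorted({rec[field] for rec in records if rec[field] not in mapping})


# Hypothetical legacy -> new customer number mapping
customer_map = {"C100": "1000100", "C101": "1000101"}

# Hypothetical open invoices extracted from the legacy system
open_invoices = [
    {"invoice": "INV-1", "customer": "C100"},
    {"invoice": "INV-2", "customer": "C101"},
    {"invoice": "INV-3", "customer": "C102"},  # customer created after the last mapping update
]

missing = find_unmapped(open_invoices, "customer", customer_map)
print(missing)  # ['C102']
```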

Once you have all the data mapped, you will be ready to start your Load process. This is where you migrate and save (load) the data into your new system. This is an extremely important step that directly affects the data quality and accuracy of your new system. There is a saying that we repeat over and over on projects: "garbage in, garbage out". What does it mean? If you load garbage into your system, you will get garbage results out of it. And of course you do not want this to happen after having spent several million dollars and several months implementing a new system.


Data Cleansing

One of the most important aspects to take into consideration throughout this process, and one that contributes most to its success, is Data Cleansing. All Legacy Systems, with no exception (even an old SAP one), will contain garbage data, or data that you do not want in your new system because you no longer need it. You will have duplicate records, incomplete records, or records that were created by mistake. In some systems you might even have corrupted data that can no longer be used or repaired. All this data needs to be reviewed and cleaned. Ideally it should be removed from the Legacy System before extraction; otherwise it should not be extracted at all, or it should be cleaned up after extraction. But it should never make it into your new system. This should not be negotiable.
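Here is a minimal Python sketch of two basic cleansing checks: incomplete records (required fields empty) and likely duplicates. The required fields and the vendor data are hypothetical, and matching on a normalized name plus country is only a simple heuristic; real duplicate detection usually needs fuzzier matching and a human review of the candidates.

```python
REQUIRED_FIELDS = ("vendor_no", "name", "country")  # hypothetical required fields


def normalize(name: str) -> str:
    """Upper-case, drop punctuation and collapse spaces so near-identical names compare equal."""
    return " ".join(name.upper().replace(".", "").replace(",", "").split())


def cleanse(records):
    """Split records into clean, incomplete and duplicate lists."""
    clean, incomplete, duplicates = [], [], []
    seen = set()
    for rec in records:
        if any(not rec.get(f) for f in REQUIRED_FIELDS):
            incomplete.append(rec)
            continue
        key = (normalize(rec["name"]), rec["country"])
        if key in seen:
            duplicates.append(rec)  # first occurrence is kept, later ones are flagged
            continue
        seen.add(key)
        clean.append(rec)
    return clean, incomplete, duplicates


vendors = [
    {"vendor_no": "V001", "name": "Acme Supplies Inc.", "country": "US"},
    {"vendor_no": "V002", "name": "ACME SUPPLIES, INC", "country": "US"},
    {"vendor_no": "V003", "name": "Globex", "country": ""},
    {"vendor_no": "V004", "name": "Initech", "country": "US"},
]
clean, incomplete, duplicates = cleanse(vendors)
print([v["vendor_no"] for v in clean], [v["vendor_no"] for v in incomplete], [v["vendor_no"] for v in duplicates])
```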


Going back in time

You will also need to establish cut-off criteria and decide how far back in time you want to go when bringing your data over. As a rule, you could establish that you will not transfer Customers or Vendors that you have not done business with over the last 18 or 24 months. Anyone beyond that point should be disregarded and not transferred into the new system. The same applies to any other data objects that you will be migrating. This decision will be influenced by the type of business and industry that you are in. It is not the same for a retailer that sells through POS (Point of Sale) machines and does not manage its Customers by name as it is for a Utility company or the Government, which manage millions of Customer records and might even be subject to strict regulations on data retention periods.

These cut-off criteria will also be managed on an object-by-object basis, as they might differ for Customers, Vendors, Materials and so on.
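A cut-off rule per object can be expressed as a small filter. The Python sketch below assumes a hypothetical go-live date, hypothetical cut-off periods (24 months for Customers, 18 for Vendors) and a `last_activity` date on each record. `months_before` steps back whole calendar months and clamps the day at month end (e.g. March 31 minus one month gives February 28).

```python
import calendar
from datetime import date

# Hypothetical cut-off periods, set per object as part of the migration strategy
CUTOFF_MONTHS = {"customer": 24, "vendor": 18}


def months_before(d: date, months: int) -> date:
    """Go back a number of calendar months, clamping the day at month end."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def within_cutoff(records, object_type, go_live):
    """Keep only records with business activity on or after the cut-off date."""
    cutoff = months_before(go_live, CUTOFF_MONTHS[object_type])
    return [r for r in records if r["last_activity"] >= cutoff]


go_live = date(2021, 1, 19)  # hypothetical go-live date
customers = [
    {"id": "C100", "last_activity": date(2020, 11, 2)},
    {"id": "C101", "last_activity": date(2018, 12, 31)},
    {"id": "C102", "last_activity": date(2019, 1, 19)},
]
vendors = [
    {"id": "V001", "last_activity": date(2019, 5, 1)},
    {"id": "V002", "last_activity": date(2020, 3, 15)},
]
kept_customers = within_cutoff(customers, "customer", go_live)
kept_vendors = within_cutoff(vendors, "vendor", go_live)
print([c["id"] for c in kept_customers], [v["id"] for v in kept_vendors])
```

Keeping the periods in one table per object makes the ground rule visible and easy to change once the business agrees on it.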

These are all ground rules that need to be established as part of your Data Migration Strategy.


Reconciliation

During each step of the E-T-L process, you will also need to establish reconciliation processes to ensure data consistency and accuracy and to avoid "losing" records between steps and over the process as a whole. Ex. If you are transferring all your Vendor Invoices, you need to know the actual total amount in your Legacy System, the total after extraction, the total before loading and, finally, the total after loading into the new system. They should all be the same; if they are not, you should reconcile the differences and be able to explain and/or remediate them. Every single converted object should be subject to the same reconciliation process.
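The stage-by-stage comparison can be sketched in Python as follows. The stage names and invoice amounts are hypothetical. Each stage is compared against the legacy source on record count and total amount, and amounts are kept as `Decimal` (built from strings) rather than floats so totals compare exactly to the cent.

```python
from decimal import Decimal


def stage_summary(records):
    """Record count and total amount for one stage of the ETL."""
    return len(records), sum((Decimal(r["amount"]) for r in records), Decimal("0"))


def reconcile(stages):
    """stages: ordered (stage_name, records) pairs; the first is the legacy source.

    Returns (stage_name, count_difference, amount_difference) for every stage
    that does not match the source.
    """
    _, source_records = stages[0]
    source_count, source_total = stage_summary(source_records)
    issues = []
    for name, records in stages[1:]:
        count, total = stage_summary(records)
        if count != source_count or total != source_total:
            issues.append((name, count - source_count, total - source_total))
    return issues


# Hypothetical vendor invoices at each stage; one invoice is lost during the load
legacy_invoices = [{"amount": "1500.00"}, {"amount": "249.99"}, {"amount": "80.01"}]
stages = [
    ("legacy", legacy_invoices),
    ("extracted", list(legacy_invoices)),
    ("before_load", list(legacy_invoices)),
    ("after_load", legacy_invoices[:2]),
]
print(reconcile(stages))
```

Here only the `after_load` stage is reported, with one record and 80.01 missing, which is exactly the difference you would then have to explain or remediate.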

Depending on the number and type of records that you are converting, you might want to establish different approaches, such as random checks, spot checks, statistical samples or a 100% record check. The first check to apply is based on the number of records. Then, if you are dealing with quantities or currency amounts, those should (or must) all balance to the penny.
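A random spot check and a 100% record check can share the same logic; only the sample size changes. In this Python sketch the records are hypothetical and assumed to be already keyed by the new vendor number and in the same shape in both systems. A fixed random seed is used so the same sample can be re-checked after fixes.

```python
import random


def spot_check(legacy_by_key, loaded_by_key, sample_size, seed=2021):
    """Compare a random sample of records; a sample size >= the record count is a 100% check."""
    keys = sorted(legacy_by_key)
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    sample = rng.sample(keys, min(sample_size, len(keys)))
    mismatches = sorted(k for k in sample if loaded_by_key.get(k) != legacy_by_key[k])
    return sample, mismatches


# Hypothetical vendor records, already keyed by the new vendor number
legacy = {
    "100001": {"name": "Acme Supplies", "country": "US"},
    "100002": {"name": "Globex", "country": "DE"},
    "100003": {"name": "Initech", "country": "US"},
    "100004": {"name": "Umbrella", "country": "MX"},
}
loaded = dict(legacy)
loaded["100003"] = {"name": "Initech", "country": "MX"}  # a value changed during the load

sample, mismatches = spot_check(legacy, loaded, sample_size=2)
print("spot check:", sample, mismatches)
_, full_mismatches = spot_check(legacy, loaded, sample_size=len(legacy))
print("100% check:", full_mismatches)  # ['100003']
```

Note that a small sample can easily miss the bad record, which is why counts and amounts should still be checked in full.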


Mock loads

This process cannot and should not be done only once. It is an iterative process that will take several attempts, with increasing target accuracy, until you reach 100% (or almost). During the course of your project you will repeat this end-to-end process several times. Depending on the size of the project, it could be 2, 3 or even up to 4 times, and you will have to establish different, increasing accuracy levels for each. These attempts are called "Mock data loads" (Mock 1, 2, 3, 4), and they will be really big milestones in your project.
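The increasing accuracy levels can be tracked with something as simple as the Python sketch below. The targets per mock and the load counts are hypothetical examples, not recommended values; each project sets its own.

```python
# Hypothetical accuracy targets per mock; each project sets its own.
MOCK_TARGETS = {1: 0.80, 2: 0.95, 3: 0.99, 4: 1.00}


def evaluate_mock(mock_no: int, loaded: int, attempted: int) -> tuple[float, bool]:
    """Return the load success rate and whether it meets the mock's target."""
    rate = loaded / attempted
    return rate, rate >= MOCK_TARGETS[mock_no]


# Hypothetical results for one conversion object across two mocks
results = {
    2: evaluate_mock(2, loaded=9_600, attempted=10_000),
    3: evaluate_mock(3, loaded=9_890, attempted=10_000),
}
for mock_no, (rate, passed) in results.items():
    status = "met" if passed else "not met"
    print(f"Mock {mock_no}: {rate:.1%} loaded, target {MOCK_TARGETS[mock_no]:.0%}, {status}")
```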


Build

While you are working on and building each step of the ETL process individually, you will make many separate tries and attempts, isolated from the whole process.

You will build your Extraction program or query and while at it, you will tweak it and refine it until it meets all the requirements that you established for it.

You will work on your Transformation process: build it, map the data from the source to the target data structure, and test it.

And finally you will work on your loading program(s). You will attempt to load 1 record (happy path or sunny day), clean up all the data errors and solve any issues that your program might have. Then you will attempt to load 5 or 10 records, analyze the issues and rejections, fix them and load again. Then you will attempt to load specific and complex data scenarios (rainy day). Repeat until it is working correctly. Finally you will go for the volume. This is where you should expect everything to blow up and produce tons of errors; that is normal. You have to work on them until every single record passes.
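The progression from one record to full volume can be sketched in Python as a series of load stages that each collect rejections for analysis. `load_record` is a hypothetical stand-in for your real load program with two made-up validation rules, and the vendor data is invented. In practice, fixing the errors and reloading between stages is manual work; the sketch only shows how the stages and rejection lists are organized.

```python
def load_record(rec):
    """Hypothetical stand-in for the real load program: returns an error message, or None if accepted."""
    if not rec.get("name"):
        return "name missing"
    if rec.get("country") not in {"US", "DE", "MX"}:
        return f"invalid country {rec.get('country')!r}"
    return None


def run_stage(stage_name, records):
    """Attempt to load a batch and collect the rejections for analysis."""
    rejected = []
    for rec in records:
        error = load_record(rec)
        if error:
            rejected.append((rec["id"], error))
    print(f"{stage_name}: {len(records) - len(rejected)} loaded, {len(rejected)} rejected")
    return rejected


# Hypothetical extract of 50 vendor records, two of which carry data errors
records = [{"id": i, "name": f"Vendor {i}", "country": "US"} for i in range(1, 51)]
records[24]["country"] = "XX"
records[39]["name"] = ""

# Hypothetical complex ("rainy day") scenarios picked out on purpose
rainy_day = [
    {"id": 900, "name": "Müller GmbH", "country": "DE"},
    {"id": 901, "name": "", "country": "MX"},
]

stages = [
    ("Happy path (1 record)", records[:1]),
    ("Small batch (10 records)", records[:10]),
    ("Rainy day scenarios", rainy_day),
    ("Volume", records),
]
rejections = {name: run_stage(name, batch) for name, batch in stages}
```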

Once you have all of that, you would think you have a solid process. This is the point where you run the whole ETL process end to end for the specific object that you are working on, and of course adjust and fix.

Finally, you will attempt to load all the different conversion objects (Customers, Vendors, Materials, GL Accounts, Inventory, etc.). All the steps described above are cogs in a big and complex machine: your whole data migration project.

As mentioned before, you will do several Mock runs, which are nothing more than dress rehearsals, all leading up to the big event: your final cut-over, when you will go from your Legacy System to your new system.


Environments

During this iterative process of building your whole ETL, you will generate a lot of throwaway data. This will pollute your database and could impact other people who want to test programs, processes and reports that your data could distort. That is why all these attempts should be done in environments, instances and/or databases that are separate from the ones used by the rest of the project team.

For this you will have to plan to have separate instances and/or Databases that will need to be refreshed / wiped out several times during the course of the project. 


Dependencies

In the overall plan of this Data Conversion work, you need to schedule each conversion object and put it in the right order of execution. This cannot be done on the day of the cut-over; it needs to be done early on in the project. You need to establish the right sequence of events and loads. For this you will build something like an MS Project plan, Gantt chart or PERT diagram, and establish the relationships and dependencies between tasks (Start-to-Start, Start-to-Finish, Finish-to-Start, etc.).

Examples 

  • If you want to load your GL Balances, you will need to have previously loaded your Chart of Accounts that contains the list of GL Accounts. Otherwise your system will not have your GL accounts to load your Balances.
  • If you want to load your Bill of Materials (BOM), a pre-requisite (dependency) will be to have loaded your Material Master records.

Every conversion object can have one or more predecessors and successors.
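A load sequence that respects these dependencies is a topological ordering of the objects. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical dependency list, and groups the objects into "waves" that could, in principle, be loaded in parallel. It only models Finish-to-Start dependencies; the other relationship types would need a real scheduling tool. A circular dependency raises `graphlib.CycleError`, which is a useful early warning in itself.

```python
from graphlib import TopologicalSorter

# Hypothetical conversion objects, each mapped to the objects that must be loaded first.
dependencies = {
    "GL Accounts": set(),
    "GL Balances": {"GL Accounts"},
    "Material Master": set(),
    "Bill of Materials": {"Material Master"},
    "Inventory": {"Material Master", "GL Accounts"},
    "Customers": set(),
    "Vendors": set(),
    "Open AR Items": {"Customers", "GL Accounts"},
    "Open AP Items": {"Vendors", "GL Accounts"},
}


def load_waves(deps):
    """Group objects into waves: each wave only depends on objects from earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(load_waves(dependencies), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```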


Data Snapshot

When doing your Mock rehearsals, it is extremely important that the data you extract be taken out of the Legacy System (or systems, if there are more than one) all at the same time (or almost). Why? For consistency and reconciliation purposes. If there is a time gap between the extractions (snapshots) of different modules or systems, and new records are created or updated in the meantime in another system that uses related data, you risk inconsistencies that make your reconciliation efforts much harder or, in some cases, almost impossible.

Example

  • Inventory values are handled in your Finance system, which has an interface with your Warehouse system that handles quantities. If you do not extract both at the same time and there are days in between, your amounts will not balance with your quantities.
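The inventory example can be turned into two quick checks, sketched here in Python with hypothetical timestamps, materials and prices: first, whether the extraction times of the related systems fall within an agreed tolerance, and second, whether the Finance value of each material still equals the Warehouse quantity times its unit price. The one-hour tolerance and the simple value = quantity x price rule are assumptions for illustration.

```python
from datetime import datetime, timedelta
from decimal import Decimal

# Hypothetical extraction timestamps for the two related systems
extract_times = {
    "Finance (inventory values)": datetime(2021, 1, 15, 18, 0),
    "Warehouse (inventory quantities)": datetime(2021, 1, 18, 9, 30),
}
MAX_GAP = timedelta(hours=1)  # hypothetical tolerance agreed for the mock

gap = max(extract_times.values()) - min(extract_times.values())
print(f"Snapshot gap: {gap} (within tolerance: {gap <= MAX_GAP})")


def inventory_mismatches(values, quantities, prices):
    """Materials whose Finance value does not equal Warehouse quantity x unit price."""
    return sorted(m for m in values if values[m] != quantities.get(m, 0) * prices[m])


values = {"M1": Decimal("50.00"), "M2": Decimal("50.00")}  # Finance extract
quantities = {"M1": 10, "M2": 7}  # Warehouse extract, days later: 3 units of M2 issued meanwhile
prices = {"M1": Decimal("5.00"), "M2": Decimal("5.00")}  # hypothetical unit prices
print(inventory_mismatches(values, quantities, prices))  # ['M2']
```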


History

During many of the projects that I have done in the past, a lot of clients wanted us to migrate historical transactional data into the system. In 99% of the cases this is not possible. In an integrated system, posting transactional data has consequences and impacts that cannot be avoided. Ex. If I want to post all historical inventory movements, every inventory transaction that I post will have its corresponding accounting impact, which will then be reflected in my Balance Sheet and/or P&L. So if I do that, I will have to find a way to counteract the effect of all those inventory transactions on my accounting books. For that reason, it is almost impossible to post many of the historical transactions, and we can only migrate a snapshot at a given date and time.

Companies therefore have to take this into consideration and provide read-only access to their old Legacy Systems for a certain period of time for traceability, investigation, reference and audit purposes. In most countries, tax authorities and governments can go back "X" number of years and ask you for information. That is why you might have to keep those systems alive, dump the information into tables, or use any other method that allows you to trace back information and provide it to the authorities.


In my next post, I will talk about the individual conversion objects in SAP Finance, with their particularities and strategy.

SAP Finance conversion objects, all about them


If your company and/or project needs to implement this, or any of the functionalities described in my blog, or needs advice about them, do not hesitate to reach out to me; I will be happy to offer my services.
