Contract Data Migration: Everything You Need to Know

Featuring Samir Bhatia

Aired on: February 24, 2021


On today's episode of The Contract Lens Podcast Allison Martyn, a Business Analyst at Malbek, talks with Samir Bhatia, CEO at Brightleaf Solutions, about everything you need to know when migrating your legacy contracts.

Samir walks through the practical steps involved when you're staring at boxes of paper contracts or files in a share drive and need to get them into a new CLM system. He explains what OCR is and why contract migration will never involve just passing a bunch of files to a consultant and getting back a spreadsheet. He also gives his best recommendation for when to start thinking about contract migration and explains why data extraction is the most cumbersome part of the whole process. You'll want to listen to this gem of an episode, full of great advice for any company contemplating contract migration, especially for Samir's story about migrating a handwritten contract in cursive from 1862! So grab a glass of wine, and let's talk contracts!

Becky: Welcome to the Contract Lens podcast brought to you by Malbek. In this podcast, we have conversations with contract management thought leaders and practitioners about everything contracts and its ecosystem. On today's episode, we dive into the huge and often overwhelming topic of contract data migration. Leading the discussion is Allison Martyn, a business analyst at Malbek and former CLM user at several previous companies. She is joined by Samir Bhatia, CEO at Brightleaf Solutions, a technology-powered service to extract information from contracts. Brightleaf uses a combination of their own proprietary technology, a team of lawyers, and Six Sigma process to deliver accurate contract data extraction. So now it's time to relax, grab a glass of wine, and let's talk contracts.

Allison: All right, so, Samir, thank you so much for joining me today. We wanted to talk today a little bit about the migration process and where some of our customers and prospective customers might be thinking about their legacy contracts and about how they would want to move those legacy contracts into a nice digital organized system like Malbek. That can be really overwhelming for a lot of customers. I've been a customer myself and I've stood in the rooms full of paper that is completely non digitized and felt very overwhelmed with exactly what that process was going to entail. So I wanted to get some feedback and some thoughts from you on how would you recommend that somebody would get started if they're standing in a room full of paper that has not been digitized or if they already have their contracts digitized, at least in digital form, that they can easily access that data. Where do you even start?

Samir: Excellent question. Excellent question. And so if you look at the word "migration," people call it legacy migration. It's the full process. Like you mentioned, there are a lot of contracts, paper contracts in boxes, so first you have to take those boxes. You've got to scan in the documents so that they become, at least, digital contracts in PDF, et cetera. There are best practices around that, which I'll talk to. Once you scan them in, you need to organize them into different buckets of MSAs and NDAs and, you know, service orders, et cetera, which itself is time consuming. Then you've got to de-duplicate them. You've got to then match up masters and addendums. Only after you do that do you decide what are the metadata elements, or attributes, as we call them, that you want to extract out of those contracts. People call that tagging. People also call that extraction. They even call it OCR. The term OCR is loosely used. OCR is actually just optical character recognition. When you get a scanned document, it's a picture, which is pixels, and you convert those pictures and pixels into text. That process is known as OCR. There is a lot of off-the-shelf software available for that. You have to go through that process. Then you have to define what elements from each contract type you need to track, and the way you think about it is: what do you want to report on across a set of contracts? If you go to any lawyer, they will say, I want to know everything that's within one contract. That's not the purpose of legacy contract migration. It is reporting, not knowing what's within one contract. So you have to define what are the elements that you want to report on from NDAs. It might be only like five or seven elements. Party? Counter-party? Term? Is it auto-renew? Is it perpetual? Is there a materials destruction clause? Is there a non-solicitation clause? So you define this list of elements from each contract type. MSAs will have a lot more.
Maybe service orders will have different ones. And then you have to configure the software to the specifications of how you define each of these attributes, which each client changes, even for the simplest attributes. And then you configure the software and run the contracts through the AI software. Then comes the most cumbersome part: the only way to get accuracy out of extraction is to have lawyers check every single attribute. Very important, because you don't know how bad the original contract is. One of the examples I can give you was for a Class One railroad company. We did a batch of thirty thousand contracts. We've done many, many more for them. But one of the batches was their track maintenance agreements. They had millions of miles of track, railway track across the US, but it sits on somebody else's property. And the maintenance agreement spells out whose responsibility and obligation it was to maintain that mileage of track. We were asked to extract from the maps the mileage marker that the track started on, the mileage marker that the maps ended on, and the length of track in miles. This is all from maps. One of the contracts that we did was from 1862.

Allison: Wow! Cursive handwriting.

Samir: There's no software.

Allison: That's amazing that you could actually manage that at all. That's very impressive.

Samir: And, you know, we were discussing with the client, is this an S? Is this a five? You know, and nobody knows the answer. We had to make that call, but there's no software that can work on handwritten stuff, even though there is OCR and other sorts of stuff. I can't read my own handwriting. So there's no software that can. So, you know, it is a combination of software, lawyers, and a thorough process of checking each and every attribute which can get you to accuracy. And what cannot be done by software needs to be done manually. What can be done by software needs to be checked, especially in legacy contracts when dates are missing, months are missing. What happens if, in the effective date, somebody just forgets the year or forgets a month or forgets a date? What rules do you apply so that you get consistency of extraction? You know the old adage: when you give one contract to ten lawyers, you get eleven answers. How do you get one answer? So we follow stringent processes with the clients so that the client defines every single attribute, documents that in a manual, and then signs off on the manual, which becomes the playbook, and only then do we start the bulk extraction. And we've done that with Malbek too for many clients already. So migration is the full process. Extraction is the most cumbersome, but there's a ton of work to be done even before you get to extraction, sorting through the legacy contracts to put them into a decent folder structure.

Allison: Yeah, and something that I've run across and you can tell me if you've run across this as well, is that there's sort of a preconception from clients that this is sort of a set it and forget it type process that I can just hand you over my documents and then you'll just sort it out and give me back the data. But in reality, there's quite a lot of customer engagement that needs to be considered.

Samir: If any client comes to me and says, I'm going to give you files and you give me back a spreadsheet, I never take that work. Because each lawyer will say the other lawyer is wrong. And it happened with a company called Alorica. The VP at that time was Scott Wilson, and he said, Samir, I thought I'd hand you a bunch of contracts and you would give me back a spreadsheet. But the process your team made me go through made me understand how to look at my data and configure my CLM very well. So I know exactly what's in there. And even today, this is three, four years later, he's so appreciative of it, because I told him, if you're not involved, I'm not taking the work. And even today, every client calls this phase of working with us enlightening.

Allison: Those things really go hand in hand, don't they? Understanding what your contract is and what your contracts say and the data that you need to understand and report on helps you configure that CLM because then you know your important data going forward as well.

Samir: I'll give you an example. What if the client says, from my statements of work, extract the start date? How many SOWs say, this engagement will start on? If it doesn't say that, somebody might leave it blank. Somebody might say it's the effective date. Somebody might say it's the date of signature by the counter-party. Somebody might say it's the date of signature by your company, and some might take the date of the file. Nothing is wrong, but it's not consistent. So you have to go through level setting each and every attribute with the client on how they want to interpret this. What if the end date is missing but in Malbek it's configured as a mandatory field? You can't leave it blank. We can't put "not available." It's a text field or a date configuration. So then the client sometimes says, OK, let's pick February 20th of 2059 so that I know it was blank. But there's a date. So there are these different rules you have to go through with the client for even the simplest attribute.

Allison: Yeah, and I've run into that exact situation several times with various clients that we have. We're doing the migration right now, right this second, I am uploading the documents and I find that I have a required text field, like a start date, that just has no data in it for at least some of them. And you have to decide what are you going to do? Are we going to stop this whole process right now and then go back and figure out those dates? Are we going to put in a false date? Are we going to make that field not required and then upload it that way? All of those are solutions that we've come up with. But really, in the end, your data is not as consistent as you were hoping for once you put it into the system.

Samir: Right, and you don't do this at the back end. You do it at the front end: you define everything, so you don't have to go back into the contract to make these decisions.
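The front-end rules Samir describes can be written down as a small playbook before bulk extraction starts. The following is only a minimal sketch in Python: the field names, the precedence order, and the helper functions are illustrative assumptions, not Brightleaf's or Malbek's actual configuration. The only value taken from the conversation is the placeholder end date of February 20, 2059.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Placeholder mentioned in the conversation: a far-future date that marks
# "end date was blank in the source contract" in a mandatory date field.
SENTINEL_END_DATE = date(2059, 2, 20)


@dataclass
class ExtractedDates:
    """Dates a reviewer might find in one SOW (all names are hypothetical)."""
    stated_start: Optional[date] = None
    effective_date: Optional[date] = None
    counterparty_signature: Optional[date] = None
    company_signature: Optional[date] = None
    end_date: Optional[date] = None


# Assumed precedence; each client would agree its own order in the playbook.
START_DATE_PRECEDENCE = (
    "stated_start",
    "effective_date",
    "counterparty_signature",
    "company_signature",
)


def resolve_start_date(dates: ExtractedDates) -> Tuple[Optional[date], str]:
    """Return the start date plus the name of the rule that produced it."""
    for field_name in START_DATE_PRECEDENCE:
        value = getattr(dates, field_name)
        if value is not None:
            return value, field_name
    return None, "missing"


def resolve_end_date(dates: ExtractedDates) -> Tuple[date, bool]:
    """Return the end date and a flag saying whether it is the placeholder."""
    if dates.end_date is not None:
        return dates.end_date, False
    return SENTINEL_END_DATE, True
```

Because every reviewer applies the same ordered rule, two people reading the same SOW arrive at the same start date, and the recorded rule name shows later which interpretation was used.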

Allison: Which leads me to my next question: with so much effort going into your data extraction and into preparing your migration, at what point should a client start thinking, Oh, I have contracts I need to migrate. I've got to do something about that.

Samir: As soon as they decide. I think it's a chicken and the egg. As soon as they know they don't have a handle on the contracts, the first thing that comes to their mind is, I need a system. At the time they decide they need a system, they need to organize their contracts. Let me talk a little bit about one of the processes to organize their contracts. So, you know, contracts are lying on somebody's desktop, somebody's laptop, some SharePoint, some network drives, something, and different groups are there, sometimes in worldwide companies in different locations. So in the webinar that we did a little while ago, and the white paper, we said: first have a project manager who owns this. It should not be just one person's responsibility to get a handle around all the contracts. That should be the person who dispatches it. So they identify one person in each group who is the head of getting the contracts together. So the purchasing group, who is the person responsible? If purchasing group US is different from purchasing group EMEA, then you'll have two people. You create a shared network drive. You have a folder for each group. Under the folder you put in the contract types, MSAs, NDAs, etc. Whatever work you can do up front saves a lot of work when you're actually doing the extraction. So now if you have a folder structure of the group folders saying all contracts, group one EMEA, group two EMEA, group one purchasing group US, etc., you have all these folders, and people can then put their names under the folders and start dumping their folders as sub-folders within that. So then you're starting to collate all the contracts in one place. Once you have this massive folder done, then a company like Brightleaf can take over and then go through de-duplication, weed out unnecessary documents, non-signed documents and more, match up masters and addendums, et cetera. But whatever work the client can do, of course, saves effort, and effort is money, for a company like us.
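One piece of the clean-up described above, de-duplication, can be partly automated once everything sits in a single folder tree. The sketch below is an assumption-laden illustration in Python, not Brightleaf's method: it only finds byte-identical files by hashing their contents, so two different scans of the same paper contract would not be caught, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large PDFs are not loaded at once."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> List[List[Path]]:
    """Group files under root that have identical contents.

    Returns one list per group of two or more identical files.
    """
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("all_contracts"))` on the shared drive (the folder name here is made up) would list the copies that different groups dropped into their own sub-folders, leaving near-duplicates and unsigned drafts for human review.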

Allison: Yeah. So the more organization that the customer does on the front end, the more likely they are to save time and money.

Samir: That's correct. On the back end. And the two main things are: one person cannot do it all, but there has to be one person responsible for the whole task. And there has to be commitment from all the groups that they're going to do this.

Allison: Yeah, and it sounds like a pretty significant commitment because you're going to be putting a lot of time and effort into it. I think from my perspective, when people start talking about a CLM, it's just they kind of think like, Oh, and then we'll just migrate the data. And I think that there's probably not enough consideration of what exactly that process is. If you want to have good, solid, reliable data at the end, you need to put effort into it upfront.

Samir: Absolutely. And you asked when they should start. As I said, as soon as they know they need to get a system, that is the time they start, because in parallel, while they're trying to identify the system, identify the vendor, Malbek, and then configure Malbek, all this is being done in parallel so that when Malbek is configured and ready to go, at least some of this work might be done, extracted, and started to load. Typically, when there are a lot of contracts, the extraction part takes longer than the deployment, especially in a nice system like Malbek, which is really good out of the box. It doesn't require you to customize it and then deploy it. So our part of extraction and sorting through the contracts might take longer, but they don't need to wait till the end. What they do is tell us the priority of contracts, maybe MSAs first, maybe this is the list of clients which are important. And then we would do that first, and then we can upload in batches, which is what we did with our joint clients. So really their first call needs to be to Brightleaf and the second call needs to be to Malbek. Actually either order is fine, to be honest with you, but it should be around the same time so that both can get going on what they have.

Allison: Yeah. And we can be coordinated from the beginning, which certainly makes things easier all the way through.

Samir: That's right. Don't wait for one to finish for the other. It can be done in parallel.

Allison: You mentioned something earlier that I kind of wanted to go back to, so you mentioned the number of elements that are getting reported on and you said maybe like five to seven. I'm curious, what is the average number of elements that you see people pulling out of their contract? So as an example, I worked for a client or worked for a company and we were doing data extraction and we had about 30 some odd fields that we were pulling. And as we spoke with various CLMs, we learned that that was a really high number. So I'm curious, what is the average or what is the recommended number of data elements that you mostly see from your clients?

Samir: I've yet to come across a client prospect who hasn't asked me that question, and there is no one answer. The way that they need to think about it is, what do they want to report on across a set of contracts, which I said earlier. I'll give you an example. And this is a loose example. The jurisdiction. The only time you want to look at jurisdiction is when the vendor is not providing you the service properly or something and you want to take them to court. Why would you extract it from nine thousand nine hundred ninety nine other contracts? If it happens once a year, open the contract and see the jurisdiction. We don't need it for reporting purposes. But if you go to the lawyer, they'll say, Yeah, I want to know jurisdiction. And of course, on the bill, you're paying for the extraction of jurisdiction from nine thousand nine hundred ninety nine contracts when it's going to be used for just the one contract a year. Right. So that way you can decrease the bill as well as get things done much faster. So they have to look at the extraction from a reporting standpoint.

Allison: That's a great point and made me wonder, do lawyers have different requirements than normal business unit users might sometimes have?

Samir: Absolutely. So lawyers typically want to know everything within one contract. I did a webinar with one of our other clients, Nasdaq. It's on our website. And the VP at Nasdaq said that the hard part she had with the lawyers was convincing them that they don't want to know everything about one contract. Trying to move the mindset from one contract to a sixty-thousand-foot level of reporting was a tough challenge for them. But once they understood that, then things became more clear for them. So, yes, it is always a challenge, but of course, if it's more of the administrative lawyers that are there, they understand, because they always ask for the reports: give me all the contracts that have LOL, for example.

Allison: Yeah, I was at a company that was going through bankruptcy and we had that exact situation where we were having to provide copies of our records to the courts and they always wanted to know, send me all of your contracts that say this. And at the time, we did not have any kind of recording system. We did not have any CLM. We didn't even have really any data extraction. We didn't have anything. It wasn't organized. So literally, every time one of those requests came through, we just had to start grabbing files and flipping through pages, trying to find all the contracts that had that. They have since gotten a CLM.

Samir: You can't think of all the things that'll ever come up. You cannot. Cannot. No, no, you can't. You cannot. But you try to cover whatever comes up on a daily operational basis, and the case you mentioned might be exceptional. And in that case, you'll have to either come back to us and we can do the extraction of those extra elements again, or you can do it yourself if you haven't thought of it. But at least think of everything that you do on an operational basis. And the more you do now, the more you pay now, but a lot is saved when it's necessary. So coming back to the number of elements, I've got clients for which I'm doing two, three thousand elements because we extract every obligation. This party will do that. This party shall do that. They're asking us to extract: what clause number? What is the clause text? What is the clause title? Whose obligation is it? Is it triggered? Is it time bound? What is the trigger? So it ranges from that many all the way down to some who say only like five or eight elements, which are the basic ones which are mandatory in Malbek. You would need to have the counter-party because that's the account you need. You need to know whether it's current at any given time. In order to know if a contract is current, you need to extract the term. Is it perpetual? Is it auto-renew? Is there an expiration date? Is it evergreen? So there are multiple things that go into trying to see if a contract is current at a given time. So you never say, give me all the current contracts as of today, because if you do it just for today, yes or no, current as of today, tomorrow it goes out of sync. So you've got to extract these five or six base elements to find out, at any given time, is the contract current. You see, those are the basic elements that you want to extract, and some clients say, OK, if I have those, excellent. The other thing I preach is this.
If you know this is the set of contracts which are old and expired, and those clients don't exist, but you need to move them from your old system, you can create an account within Malbek which says old expired contracts and put all of them into one account, so at least they have one central repository. One level deeper is to extract at least the counter-party names associated with that client. Right. And then you can say the next level is, here are the contracts for the 80 percent of the clients who generate only 10 percent of the revenue, so I just extract the basic elements out of those. But here are my most important clients, and let me go to the deepest level. So you can even save your money on extraction based on some of these criteria.
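The point about not storing a "current as of today" flag can be illustrated with a small sketch: keep the base term elements and compute the status for whatever date a report needs. This is a simplified Python illustration under stated assumptions. The field names are hypothetical, termination is not modeled, and treating every auto-renewing contract as current is a simplification that a real playbook would refine.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ContractTerm:
    """Base term elements extracted once per contract (names are hypothetical)."""
    effective_date: date
    expiration_date: Optional[date] = None
    perpetual: bool = False
    evergreen: bool = False
    auto_renew: bool = False


def is_current(term: ContractTerm, as_of: date) -> Optional[bool]:
    """Compute whether a contract is current on a given date.

    Returns True or False when the extracted elements decide the answer,
    and None when they do not (no expiration date and no renewal flags).
    """
    if as_of < term.effective_date:
        return False
    # Assumption: perpetual, evergreen, and auto-renewing contracts stay
    # current until terminated, and termination is not modeled here.
    if term.perpetual or term.evergreen or term.auto_renew:
        return True
    if term.expiration_date is None:
        return None
    return as_of <= term.expiration_date
```

Because the status is derived from the stored elements, the same data answers "is it current?" for today, for tomorrow, or for any past reporting date without re-extraction.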

Allison: That's a great idea, and from a money saving standpoint, to pick your hundred busiest clients and then extract 90 percent of what you need from there and then do a little bit less for your inactive contracts or for your lower-priority clients.

Samir: Right.

Allison: Great idea. You mentioned processing files in batches. How big does a batch tend to be?

Samir: Well, it's fine to make it whatever size, you know, but you have to see the whole project, and it comprises two dimensions: what is the number of elements, and how many documents there are. So sometimes, like for British Telecom, which was the first client for extracting every obligation, three thousand data points, we do a contract a week or something, or two contracts in a week or two weeks, because it takes a lot of time to process that. Right. Because there are a lot of data points in there, the batches are one or two. But typically, if it's a standard project, then, you know, if it's ten thousand contracts, you can make batches of a thousand, two thousand, three thousand, whatever the client is looking for, because they still want to just spot check and see if everything's OK or not. We customize that for every client and also work with Malbek to see what they want us to ingest, so it's a three-way conversation that happens.
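The two dimensions Samir mentions, number of elements and number of documents, multiply into the total number of data points to be checked, and the batch size then decides how often the client gets something to spot check. A minimal Python sketch follows; the function names are made up for illustration, and the figures in the usage note are the round numbers from the conversation, not actual project data.

```python
from typing import Iterator, List, Sequence


def total_data_points(num_documents: int, num_elements: int) -> int:
    """Project size: every element is extracted and checked for every document."""
    return num_documents * num_elements


def make_batches(documents: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size documents.

    The input is assumed to be ordered by the client's priority already
    (for example MSAs first), so earlier batches hold higher-priority work.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield list(documents[start:start + batch_size])
```

For example, ten thousand NDAs with seven elements each is `total_data_points(10_000, 7)`, or seventy thousand data points, which could be delivered in ten batches of a thousand, while a single contract with three thousand data points is a batch on its own.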

Allison: And so if I'm a customer and I've contracted with you to do some data extraction and I get all my documents together and I send them over, who is actually reading my documents? Is that a software program that is looking for specific data points or is that an actual person?

Samir: It's both. So the first thing we do is configure the software to the specifications the client wants. You know, some of the attributes are already done, but the client tweaks the definition. So we create our configuration parameters. In order to do that, our legal team, which is lawyers, look at the contract, they look at what tweaks need to be made to the configuration, and then we pump the contracts through it, and our lawyers, on their screen in our AI software, see the original contract on half of the screen and the extracted metadata on the other half. And we force them to check visually and click on a button called verify. Then we know they've checked it. We call it QC one, which is one level of QC, checking every single element. We can also have an independent lawyer do the same thing and check every single element, and we call it two levels of QC. And of course the cost goes up, but the accuracy goes up, and we can go up to three levels of QC to get Six Sigma results. Some clients use that, like the Class One railroad company, because it was revenue recovery. They found, by the way, 8.2 million dollars because they were maintaining the tracks versus the counter-party. They did three levels of QC, but typically NDAs they do at one level, and other important agreements they'll do at two levels of QC. But you force them to check every single element. So the lawyers see the contracts, and they have to visually check them. But it has gone through the software first. The software sees it, and then the lawyers.

Allison: You mentioned AI software. That must be a fairly recent innovation that has come to your industry. How has that software evolved and changed over time and where do you see that going in the future?

Samir: Brightleaf started because of AI software. We developed the software and then we got the clients. And of course we've had the team of twenty, twenty-five people for seven years, so it is very mature software. We are on what we call internally the third generation of the software, and we can go on and enhance it. So, we are one of the earlier people, we just never licensed it. We've never licensed it. We've just provided it as a technology-enabled service. And the software always had all these process controls to make sure that every activity was checked by the lawyers. We even keep statistics of how each lawyer is checking it, what their accuracy is, if they're making mistakes when they're checking. So we've got a lot of stuff in the software.

Allison: From a data security standpoint. You hear a lot in the media about how there's data breaches and this data was collected by Google or Amazon or whomever and kept on their systems and then privacy agreements and all that sort of thing. How does Brightleaf deal with that if you're dealing with a lot of very sensitive contractual information?

Samir: So from day one, we went in for ISO 27001 certification. We are ISO 27001 certified. We've got SOC 2 attestation also. So we are hyper, hyper secure in our processes. Physical security, logical security, server security. We do pen tests. We have external agencies trying to break in. We do everything possible to make sure everything is secure. And I'll give you an example. We always go through audits with our clients. Our clients are Nasdaq and big banks and money managers and whoever. And the contracts are the most sensitive information for any client. So we have been very cognizant of that from day one. For one client, we went through thirteen hundred and fifty, thirteen fifty-one or thirteen fifty-two questions that we answered.

Allison: Wow.

Samir: So we go through audits all the time and, of course, pass with flying colors. Some clients come in with special requirements, like one client said, I want a special room for me, a clean room. And nobody else can sit in that room except the one that processes. We can entertain that. Not a problem. We go through hyper security. And what happens with that same client that had 1300 questions? The first thing the client's business team does is email us a contract. Most insecure way. The worst way to do that. So there is such a divide between the infosec group of the clients and the business teams.

Allison: That probably happens a lot because there's a big difference between security and convenience.

Samir: Exactly the words I use for my team. Convenience does not, pardon the term I'm using, trump security.

Allison: Yeah, exactly.

Samir: Yeah, we run into that a lot, too. And that is entrenched into our DNA from day one.

Allison: Yeah, security is a big deal for us as well. When you're housing customer contracts, as you said, it's their most sensitive information. It is the most important thing that they have and their data is the most important thing and the most valuable asset that their company has. And yeah, when you're housing that, it requires a lot of security and a lot of attention to detail on that. And the software has all sorts of stuff built into it.

Samir: None of the data within our software is raw data. It's always encrypted. Even if somebody can get to us, there's nothing they can do. It's all encrypted stuff within the database, outside the database, the file itself, everything's encrypted. So you've got hyper stuff built into our software.

Allison: Fantastic. I'm sure your clients definitely feel better about that.

Samir: But each one has the same question that you're asking, and we have to prove ourselves every single time because it's a new client, and we're happy to do it.

Allison: So if I have decided, OK, I want to start working towards a CLM. My data is a mess. I need to contact someone to do that. How would I be able to reach out to you or to your team?

Samir: I've got two people on US soil, which is why our prices are the lowest they can be, and the whole team of lawyers and everybody else is in India. The downside to that is we cannot do defense contracts and stuff, because that requires US lawyers on US soil, so we can't do that. The other thing we cannot do is non-English contracts, because we don't have non-English-speaking people in India. So those are the two restrictions that we do have, to be upfront about that. But other than that, all they have to do is just send me an email or my sales guy, who is Daniel Berdichevsky or just send it to It comes to us and, yeah, hyper responsive.

Allison: Well, Samir, thank you so much for spending this time with us today to discuss contract migration and how our clients can start thinking about migration and especially the importance that they need to put on their data extraction of migration and that those expectations might be something different than what they're starting out with.

Samir: Pleasure to be on. Thank you for having me.