Data ModelTechnical DebtTips & Tricks

Cleaning a Salesforce Org

How we deleted 2,600,000 records from Salesforce

Data is such a big focus for anyone working with a CRM, Salesforce is no exception.  Previously we looked at migrating data into Salesforce.  But what happens when you need to remove data?  Cleaning a Salesforce org can present a few challenges.

Say you need to clean your Salesforce org and delete/archive, because…  you might have gone over your Salesforce data allowance?

This is what happened to us recently.  One of the orgs my team manages went over its allocated Data Space, and we started getting emails / calls from Salesforce to remind us that we have used up our allowance.

We had a choice, buy more storage (at Salesforce’s very inflated data prices).  I mean c’mon it is 2017, 1TB with Dropbox/Google/Onedrive/etc is only around $100/year…

As you can probably guess, this wasn’t our “go to” option, so we had to find out what we had in Salesforce vs what we needed, and then make a decision…

(c) Dilbert

Legacy org & technical debt

The org in question was what I deem a “legacy org”, and has had presented a few challenges over the years.  It has been an active Salesforce org since 2003. And without constant love and attention has built up so much technical debt because it was never actively managed/improved/developed…

Even relatively simple things like enabling and rolling out Chatter were never done – even years after Chatter launched.

Clean Your Salesforce Org: a balloon waiting to burst
Clean Your Salesforce Org: Our org was a balloon waiting to burst

As the company has grown, obviously so has the data stored within the org.

Add into the mix that new apps installed in the org which have driven a sudden increase in the volumes of records being created (eg telephony integration and training people to log calls).  And hey-presto data storage & how we are using it is suddenly a priority.

Where is the data?

For an org which has grown from around 50-100 people in 2003, and a very simple business processes.  The question was how could we be using up out data storage suddenly?

Why now?  How do we get to the bottom of what is happening?

Admittedly it wasn’t something we had kept an eye on.  So the first port of call was the setup menu.

As you might know Salesforce offer a section in the Setup menu called ‘Storage Usage’, which is quite basic but gives you a snapshot of where your data and file storage is used.  To use it, go to Setup -> in Quick Find, and search for ‘Storage Usage’.

Boom, there it was.

The org with over 23,000,000 records…

The org had ballooned to over 23,000,000 records.  Shocking as our number of accounts are a fraction of that overall volume.

What was even more shocking for us was that it was two objects consuming almost 70% of the total storage!

Salesforce Data Storage
Yes those numbers are real… We needed to clean our Salesforce org!

The two objects in question: Tasks & Email Messages…

<sarcasm> Oh joy! </sarcasm>

Not all objects are equal

Why was I so *not* excited that it was Tasks & Email Messages using up our storage?

Salesforce Tasks & Archiving

Activities in Salesforce have a unique feature which means they get archived by Salesforce after a set number of days.  This is typically around 365 days of being closed (but there are a few caveats to that), and can also be extended if you request it from Salesforce.

This is an issue, as once archived you can no longer use the standard Salesforce reporting to analyse.

And due to the sheer volume of records (over 12,500,000), it was even crashing Dataloader/Workbench/Developer Console when trying to export the data.

When I did manage to get the export, by trying to filter by created year, the file was still too big to view in Excel.  Also we were limited to just Excel, which meant we hit a brick wall.

Email to Case & Email Messages

data storageThis org heavily relies on Email-to-Case.  And when received, the email is stored in the EmailMessage object.

Additionally all auto-response emails are also saved against the case, in addition to any replies from the customer.

Great for keeping track of all communication.  But once again creates some difficulties when trying to report and analyse.

Also as an email gets saved against a case, it also creates a task.  So we end up with a sort of duplication, with a task and an email message created and linked to the same case.

Getting the data out of Salesforce

In the end we contacted Salesforce Support, as we couldn’t use normal methods to export and clean the data.

The only suggestion that Salesforce could provide was to schedule a data export of the objects we wanted to export and analyse.  Simple enough…

Anyone who has used this feature will know the output of this is zip files, which contain CSV files inside.  Great it was going to be small enough to work with!

CSV SplitterNope… Each CSV was still over 1,000,000 rows.

Excel still was too unusable to analyse the data.  At this point I really was wishing for something like Access/MySQL to load the files into.

Enter CSVSplitter, a really simple tool that allowed us to split the CSV files down into smaller more manageable chunks.

Once they were broken into the smaller files, then we were able to start analysing the tasks.

Analysing the data

Inspector Gadget time!
Inspector Gadget time!

The road to cleaning a Salesforce org is paved with lots of data to analyse!

You need to understand what you have, before you can understand what you need.

So we started analysing the data.  And we dissected the data in many different ways to understand what was driving the volume we were seeing.

We looked at tasks created by month and year.  Were there specific users who created more than others?  Were there common subject lines – which might point to auto-generated tasks.

Our Salesforce, had never ever been cleaned.  And we have used tasks in the past to drive system automation within the business.  So where relevant, these records could go!

Archiving the data

So we had analysed the data, now to archive it somewhere in case we actually needed to reverse the process. (FYI – while researching this post, I found this useful guide to creating a Archiving Policy for your company).

Though storing the records outside of Salesforce and then trying to restore in the case of a profile would be painful if we had to.  But at least we had a fall back plan, and if needed the business could still use in reports.

In addition a lot of care was taken in being conservative with what we are removing and working with various stakeholders to ensure the different departments were on board with our plan.

Now we could start the actual cleanup!

Cleaning a Salesforce Org: 2,600,000 records deleted (so far)…

I have to admit, this part ended up actually feeling strangely cathartic.

Being able to delete over 2.6 million records from an org was also a first for me 🙂

We essentially identified the records to be deleted, creating a CSV file of the IDs of the records we wanted to delete, and then use Dataloader to remove the record.

Once we started with the tasks, we were able to then also move on to other records and start an overall clean out of Salesforce.  Opportunities, Accounts, Cases… All are now in scope and we have created a data clean-up roadmap and are making great headway.

We have a long way to go still, but at least we can start to let go of the legacy data past.

And most importantly, make Salesforce a focused CRM for the sales and customer service teams, so it is easier for them to use.

Salesforce Data Storage - Before
Still a long way to go… But we shall prevail!

 

Got your own ‘lesson learned’?  Share your tips…

I have worked in org’s where hitting the data storage limit was expected and almost required.  As we deployed tools like FinancialForce which create a lot of records (and they need to).  So we simply bought more data.  But it really depends on your scenario, as every orgs needs are a little different.

Have you been in a similar situation?  How did you decide what to archive?  Did you use any specific tools to help you?  Share them in the comments below or in our Lessons Learned forum!

As part of this issue, we were able to make a business case for getting tools like DemandTools (paid app).

And I am currently investigating Passage Tech’s ‘Storage Helper‘ and ‘Rollup Helper‘ (both have a limited free version) to see if they can help profile accounts to then identify what records can still be removed and archived from Salesforce.  But I will save the details for another post later 😉

Tags

Adam Gill

I have been working with the Salesforce platform for over seven years now. My background includes a variety of CRM projects and experiences (both greenfield and legacy), covering a variety of industries. I currently have 4x certified, though always looking to increase that count! :-)

Related Articles