When we get the data spreadsheet, after understanding what data is included in the spreadsheet and its meaning, we now have to ensure that data is complete and proper for further analysis.
Consider few things that can make your task easier and will ensure that you don't miss any important thing.
- Size of the data and number of spreadsheets
- Number of columns and its content
- General observation of what this data is about
- Are you able to quickly see anything related in these columns?
- Missing data (Empty cell)
- Unformatted data (Extra spaced in cell)
- Different data type (Example: 75 in one row and sixty five in another)
In EXCEL, we can follow some basic steps to ensure that data is aligned in each cells.
- Find empty cells
- Remove duplicates
- Update date format
- Text to column
- Use COUNTIF to find outliers or abnormal values
- Use LEN to ensure character lengths are in order
- Use LEFT or RIGHT to separate text as per requirement
- Use MID to separate middle characters from alpha numeric characters
- Use CONCATENATE to join text in 2 different columns
- Use TRIM to remove extra spaces from cells
- Use spelling check function to ensure text is spelled correctly
- Plotting- Sometimes, when you want to take a high level view of the numbers presented in the data, highlight the column and plot it. This way if there is any odd number out there, you would know immediately and then can ask the data source to check if the value is right or not.
- Use Data Mapping if you have data in 2 different spreadsheets and you need to combine them together. Make sure that both have similar columns...For example, in one spreadsheet, city is included in the address column along with street number and address. In another spreadsheet, city is in a separate column and address column has street number and name. You need to use CONCATENATE or text to column and make sure both spreadsheet have similar columns before combining the data.
- Also, one spreadsheet has province column that has short abbreviation of province (ON) but another spreadsheet has province name written in full(Ontario)
No comments:
Post a Comment