Rattenbury 2015 (†872) Rattenbury, Tye. "Six Core Data Wrangling Activities" (Trifacta, 2 October 2015).
- data wrangling : Data wrangling [is] a process that includes six core activities. . . . 1. Discovering is something of an umbrella term for the entire process; in it, you learn what is in your data and what might be the best approach for productive analytic explorations. 2. Structuring is needed because data comes in all shapes and sizes. 3. Cleaning involves taking out data that might distort the analysis. 4. Enriching allows you to take advantage of the wrangling you have already done to ask yourself: “Now that I have a sense of my data, what other data might be useful in this analysis?” Or, “What new kinds of data can I derive from the data I already have?” 5. Validating is the activity that surfaces data quality and consistency issues, or verifies that they have been properly addressed by applied transformations. 6. Publishing refers to planning for and delivering the output of your data wrangling efforts for downstream project needs (like loading the data in a particular analysis package) or for future project needs (like documenting and archiving transformation logic). (†2617)
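
The six activities quoted above can be pictured as stages of a small pipeline. The following is a minimal sketch for illustration only and is not taken from Rattenbury's article: the sample data, the function names, the plausibility thresholds, and the Fahrenheit-to-Celsius derivation are all assumptions chosen to give each activity one concrete step.

```python
import csv
import io
import json

# Hypothetical raw input: one missing reading and one implausible reading.
RAW = """name,temp_f,city
Ann,68,Boston
Bob,,Austin
Cy,9999,Denver
"""


def discover(text):
    """Discovering: learn what is in the data (columns, missing values)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}
    return rows, missing


def structure(rows):
    """Structuring: turn raw strings into typed fields."""
    return [
        {
            "name": r["name"],
            "city": r["city"],
            "temp_f": float(r["temp_f"]) if r["temp_f"] else None,
        }
        for r in rows
    ]


def clean(rows, low=-100.0, high=150.0):
    """Cleaning: drop records that might distort the analysis.

    The bounds are assumed plausibility limits, not values from the source.
    """
    return [
        r for r in rows
        if r["temp_f"] is not None and low <= r["temp_f"] <= high
    ]


def enrich(rows):
    """Enriching: derive new data from the data already present."""
    return [
        {**r, "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
        for r in rows
    ]


def validate(rows, low=-100.0, high=150.0):
    """Validating: surface remaining quality issues; an empty list means none."""
    problems = []
    for r in rows:
        if r["temp_f"] is None or not (low <= r["temp_f"] <= high):
            problems.append(f"bad temp_f for {r['name']}")
        if "temp_c" not in r:
            problems.append(f"missing temp_c for {r['name']}")
    return problems


def publish(rows):
    """Publishing: deliver the output in a form downstream tools can load."""
    return json.dumps(rows, indent=2)


def wrangle(text):
    """Run all six activities in order and return the published JSON."""
    rows, _missing = discover(text)
    rows = enrich(clean(structure(rows)))
    problems = validate(rows)
    if problems:
        raise ValueError(problems)
    return publish(rows)
```

Calling `wrangle(RAW)` returns a JSON string containing only the record that survives cleaning. In practice the activities are iterative rather than strictly sequential, so a real workflow would likely revisit earlier stages as validation surfaces new issues.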