DataOps is NOT DevOps
DataOps: Why It Shouldn't Be Treated Like DevOps, and Why You Should Be Using CRISP-DM
DataOps should be treated as a continuous, iterative cycle: at any phase you may need to step back to an earlier one because you have learned more about the data and must change course. CRISP-DM (Cross-Industry Standard Process for Data Mining) describes that cycle in six phases.
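The back-and-forth between phases can be sketched as a small transition map. This is only an illustration: the names `Phase`, `ALLOWED_MOVES`, `can_move`, and `count_steps_back` are made up for this post, and the allowed moves assume the arrows of the commonly drawn CRISP-DM diagram.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Assumption: moves follow the arrows of the commonly drawn CRISP-DM diagram.
ALLOWED_MOVES = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.DATA_PREPARATION, Phase.BUSINESS_UNDERSTANDING},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.EVALUATION, Phase.DATA_PREPARATION},
    Phase.EVALUATION: {Phase.DEPLOYMENT, Phase.BUSINESS_UNDERSTANDING},
    # Assumption: after deployment, the next cycle starts from the top.
    Phase.DEPLOYMENT: {Phase.BUSINESS_UNDERSTANDING},
}


def can_move(current, target):
    """Return True if the process may go from `current` to `target`."""
    return target in ALLOWED_MOVES[current]


def count_steps_back(path):
    """Validate a sequence of phases and count the moves to an earlier phase."""
    order = list(Phase)
    steps_back = 0
    for current, target in zip(path, path[1:]):
        if not can_move(current, target):
            raise ValueError(f"{current.value} -> {target.value} is not an allowed move")
        if order.index(target) < order.index(current):
            steps_back += 1
    return steps_back
```

A project that reaches Modeling, discovers a data problem, and returns to Data Preparation before continuing would register one step back. A linear DevOps-style pipeline has no equivalent of that move.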
--Business Understanding
This initial phase focuses on understanding the project objectives and requirements from a business perspective. The goal here is to convert this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.
--Data Understanding
This phase starts with data collection and continues with activities that build familiarity with the data: identifying data quality issues, discovering first insights, and detecting interesting subsets that suggest hypotheses about hidden information. It involves exploring the data with a range of techniques to understand its patterns, relationships, and structure.
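As a rough illustration of a first pass over the data, the sketch below profiles a list of records for missing values, distinct values, and numeric ranges. The `profile` function and the sample columns (`customer_id`, `age`, `region`) are hypothetical; a real project would more likely use a dataframe library.

```python
def profile(records):
    """Summarise each column: missing count, distinct count, numeric min/max."""
    columns = sorted({key for row in records for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in records]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return report


# Hypothetical sample data with two typical quality issues:
# a missing age and an implausible one.
sample = [
    {"customer_id": 1, "age": 34, "region": "north"},
    {"customer_id": 2, "age": None, "region": "south"},
    {"customer_id": 3, "age": 29, "region": ""},
    {"customer_id": 4, "age": 290, "region": "north"},
]

for column, stats in profile(sample).items():
    print(column, stats)
```

A maximum age of 290 is the kind of finding that can send you back to Business Understanding to ask how the field is actually captured.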
--Data Preparation
The data preparation phase covers all activities needed to construct the final dataset from the initial raw data. This may include table, record, and attribute selection, as well as data cleaning and transformation tasks. The aim is to develop a dataset that is suitable for the modeling tools.
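A minimal sketch of those preparation steps, assuming records are plain dictionaries. The helpers `prepare` and `min_max_scale` and the column names are invented for illustration, and min-max scaling is just one possible transformation.

```python
def prepare(records, keep, required):
    """Select records and attributes, then clean string values."""
    prepared = []
    for row in records:
        # Record selection: drop rows that lack a required attribute.
        if any(row.get(col) in (None, "") for col in required):
            continue
        # Attribute selection: keep only the columns the model needs.
        clean = {col: row.get(col) for col in keep}
        # Cleaning: normalise whitespace and case in text fields.
        for col, value in clean.items():
            if isinstance(value, str):
                clean[col] = value.strip().lower()
        prepared.append(clean)
    return prepared


def min_max_scale(records, column):
    """Transformation: rescale a numeric column to the range [0, 1]."""
    if not records:
        return []
    values = [row[column] for row in records]
    low, high = min(values), max(values)
    span = high - low
    return [
        {**row, column: (row[column] - low) / span if span else 0.0}
        for row in records
    ]


# Hypothetical raw data.
raw = [
    {"age": 20, "region": " North ", "note": "x"},
    {"age": None, "region": "south", "note": "y"},
    {"age": 40, "region": "SOUTH", "note": "z"},
]
dataset = min_max_scale(prepare(raw, keep=["age", "region"], required=["age"]), "age")
```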
--Modeling
In this phase, various modeling techniques are selected and applied to the prepared data. Depending on the problem, several techniques might be necessary to achieve the best model. This stage often requires selecting the appropriate modeling techniques, setting parameters, and iteratively running models to test and refine them.
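To make "setting parameters and iteratively running models" concrete, here is a toy sketch that sweeps the parameter `k` of a tiny nearest-neighbour classifier and scores each setting on a validation set. The technique, the function names, and the data are assumptions chosen for brevity, not a recommendation.

```python
from collections import Counter


def knn_predict(train, point, k):
    """Majority label among the k training points nearest to `point` (one feature)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, validation, k):
    """Share of validation points that are classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


def sweep(train, validation, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best = max(scores, key=scores.get)  # first candidate with the top score
    return best, scores


# Hypothetical data: (feature, label) pairs, with one noisy "high" point at 2.2.
train = [
    (1.0, "low"), (1.5, "low"), (2.0, "low"), (2.2, "high"),
    (4.0, "high"), (4.5, "high"), (5.0, "high"),
]
validation = [(1.2, "low"), (4.8, "high"), (2.4, "low")]

best_k, scores = sweep(train, validation, candidates=[1, 3, 5])
```

With this data, `k=1` is misled by the noisy point while larger values are not, which is the kind of signal that drives another pass through the parameters or back to Data Preparation.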
--Evaluation
After modeling, this phase evaluates the model or models against the business objectives set in the first phase. It is crucial to assess the model thoroughly before deployment, confirming that it answers the intended questions and achieves the desired outcomes without overfitting or other issues.
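One simple way to express such a check in code is to compare the score on training data with the score on held-out data, and the held-out score with a target agreed during Business Understanding. The function `evaluate`, the threshold `max_gap=0.05`, and the example scores are all assumptions for illustration.

```python
def evaluate(train_score, holdout_score, business_target, max_gap=0.05):
    """Check a model against the business target and a simple overfitting signal."""
    gap = train_score - holdout_score
    meets_target = holdout_score >= business_target
    # Assumption: a large train/holdout gap is treated as a sign of overfitting.
    overfitting_suspected = gap > max_gap
    return {
        "meets_target": meets_target,
        "overfitting_suspected": overfitting_suspected,
        "ready_to_deploy": meets_target and not overfitting_suspected,
    }


# Hypothetical scores: a model that memorised the training data,
# and one that generalises.
memorised = evaluate(train_score=0.99, holdout_score=0.80, business_target=0.85)
generalises = evaluate(train_score=0.90, holdout_score=0.88, business_target=0.85)
```

A result that is not ready to deploy is not a failure of the process; it is the cue to move back to an earlier phase.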
--Deployment
The final phase of the process involves deploying the data mining solution to the business. This could be as simple as generating a report or as complex as implementing a repeatable data mining process. Depending on the requirements, the deployment phase could involve the integration of the data mining outputs into the business process to make routine decisions and actions.