Data Version Control Explained

In today's data-driven world, machine learning experts and data scientists work with large numbers of datasets, files, and metrics in their day-to-day operations. As experiments are run on these artifacts over multiple iterations, their different versions need to be tracked and managed. Data Version Control is a great practice for managing numerous datasets, machine learning models, and files while keeping a record of each iteration: when, why, and what was changed.

This is a companion discussion topic for the original entry at

Does DVC run on all platforms: Windows, Linux, and macOS?

Are there alternatives to DVC?

Is DVC cost-effective for startups?

How much time does it take to implement an effective DVC system?

Yes, there are a number of alternatives and competitors to DVC, such as Pachyderm, MLflow, and SVN (Subversion).