Kaishi (開始)
Kaishi is a toolkit from KUNGFU.AI that accelerates the initial phases of exploratory data analysis and enables rapid dataset preparation for downstream tasks.
More simply, Kaishi helps you automate steps to get from a raw dataset (in the form of a directory of files) to something that’s usable for machine learning (or any other task you may need a clean dataset for).
Examples of common operations include:
Filtering duplicate files
Standardizing image sizes
Detecting similar data that may not add value to machine learning tasks
Deduplicating table rows
Many more…
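To make the first operation in the list concrete, here is a minimal sketch of what filtering duplicate files in a directory can look like. It uses only the Python standard library and is not Kaishi's API; the function name `find_duplicate_files` and the choice of SHA-256 content hashing are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def find_duplicate_files(directory):
    """Return (duplicate_path, original_path) pairs for files with identical content.

    Hypothetical helper for illustration only; it is not part of Kaishi.
    """
    seen = {}        # content digest -> first path found with that content
    duplicates = []  # (duplicate_path, original_path) pairs
    # Sort so that the "original" of each group is chosen deterministically.
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        # Reads the whole file into memory, which is fine for a sketch.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return duplicates
```

Calling `find_duplicate_files("my_dataset")` on a hypothetical dataset directory returns the files that could be dropped before further processing. Byte-identical hashing only catches exact copies; detecting merely similar data, as mentioned in the list, would need a different comparison.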