Kaishi (開始, "start")

Kaishi is a toolkit from KUNGFU.AI used to accelerate the initial phases of exploratory data analysis, as well as to enable rapid dataset preparation for downstream tasks.

More simply, Kaishi helps you automate the steps that take a raw dataset (a directory of files) to one that is usable for machine learning, or for any other task that needs a clean dataset.

Examples of common operations include:

  • Filtering duplicate files
  • Standardizing image sizes
  • Detecting similar data that may not add value to machine learning tasks
  • Deduplicating table rows
  • And many more…
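To illustrate what two of these operations involve, here is a minimal, standalone sketch in plain Python. It is not Kaishi's API: the functions `file_digest`, `find_duplicate_files`, and `dedupe_rows` are hypothetical names chosen for this example. It assumes "duplicate files" means byte-identical files, detected by comparing SHA-256 content hashes. Detecting *similar* (near-duplicate) data would require a different technique, such as perceptual hashing, and is not shown here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(directory: str) -> list[Path]:
    """Hypothetical helper: return files whose bytes match an earlier file.

    Files are visited in sorted path order, so the first file with a given
    digest is kept and later byte-identical copies are reported.
    """
    seen: dict[str, Path] = {}
    duplicates: list[Path] = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue  # skip subdirectories themselves
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates


def dedupe_rows(rows: list[tuple]) -> list[tuple]:
    """Hypothetical helper: drop repeated rows, keeping first occurrences in order."""
    seen: set[tuple] = set()
    unique: list[tuple] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


if __name__ == "__main__":
    # Build a tiny throwaway dataset with one exact duplicate.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "a.txt").write_text("hello")
        Path(d, "b.txt").write_text("hello")
        Path(d, "c.txt").write_text("world")
        print([p.name for p in find_duplicate_files(d)])  # ['b.txt']
    print(dedupe_rows([(1, "x"), (2, "y"), (1, "x")]))  # [(1, 'x'), (2, 'y')]
```

Hashing contents rather than comparing file names or sizes means renamed copies are still caught, while reading in fixed-size chunks keeps memory use flat for large files.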