kaishi.tabular.file_group

Class definition for a group of tabular files.

Module Contents

class kaishi.tabular.file_group.TabularFileGroup(source: str, recursive: bool, use_predefined_pipeline: bool = False, out_dir: str = None)

Bases: kaishi.core.file_group.FileGroup

Object containing a group of kaishi.tabular.file.File objects, with methods to perform common operations on them.

_get_indexes_with_valid_dataframe(self)

Get a list of the indexes of files that have a valid dataframe.

Returns

indexes of the files that have a valid dataframe

Return type

list
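As a rough, stdlib-only sketch of what this helper computes: assume (hypothetically) that each file object exposes a `df` attribute that is `None` when no valid dataframe could be loaded. `StubFile`, its `df` attribute, and `get_indexes_with_valid_dataframe` are illustrative names, not part of the documented kaishi API.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class StubFile:
    """Hypothetical stand-in for a kaishi tabular file object."""
    name: str
    df: Optional[Any] = None  # assumption: None means no valid dataframe was loaded


def get_indexes_with_valid_dataframe(files: List[StubFile]) -> List[int]:
    """Return the positions of files whose dataframe loaded successfully."""
    return [i for i, f in enumerate(files) if f.df is not None]


files = [StubFile("a.csv", df=[{"x": 1}]), StubFile("broken.csv"), StubFile("b.csv", df=[])]
print(get_indexes_with_valid_dataframe(files))  # [0, 2]
```

Under this assumption an empty table still counts as valid; only a missing dataframe is excluded.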

_get_valid_dataframes(self)

Get a list of valid dataframe objects.

Returns

valid dataframes

Return type

list[pandas.core.frame.DataFrame]
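A minimal sketch of the same idea, again assuming a hypothetical `df` attribute that is `None` for files without a valid dataframe. Strings stand in for `pandas.DataFrame` objects so the example needs only the standard library; `StubFile` and `get_valid_dataframes` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class StubFile:
    """Hypothetical stand-in for a kaishi tabular file object."""
    name: str
    df: Optional[Any] = None  # a pandas DataFrame in practice; any object here


def get_valid_dataframes(files: List[StubFile]) -> List[Any]:
    """Return the dataframe objects of files that loaded successfully, in order."""
    valid_indexes = [i for i, f in enumerate(files) if f.df is not None]
    return [files[i].df for i in valid_indexes]


files = [StubFile("a.csv", df="df_a"), StubFile("broken.csv"), StubFile("b.csv", df="df_b")]
print(get_valid_dataframes(files))  # ['df_a', 'df_b']
```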

save(self, out_dir: str, file_format: str = 'csv')

Save the processed dataset as individual files or as one file with all the data.

Parameters
  • out_dir (str) – The path of the output directory. If the directory does not exist, it will be created.

  • file_format (str) – The format of output files. Currently only supports “csv”.
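The two output modes described above (one file per table, or a single file holding all the data) can be sketched with the standard `csv` module. This is not kaishi's code: `save_tables`, its `combine` switch, and the `all_data.csv` filename are hypothetical, and lists of row dicts stand in for dataframes.

```python
import csv
import os
import tempfile
from typing import Dict, List


def save_tables(tables: Dict[str, List[dict]], out_dir: str,
                file_format: str = "csv", combine: bool = False) -> List[str]:
    """Write tables as CSV; `tables` maps a file stem to a list of row dicts.

    `combine` is a hypothetical switch: False writes one file per table,
    True writes a single all_data.csv (hypothetical name) with every row.
    """
    if file_format != "csv":
        raise ValueError("only 'csv' is supported")
    os.makedirs(out_dir, exist_ok=True)  # create the output directory if missing

    if combine:
        groups = {"all_data": [row for rows in tables.values() for row in rows]}
    else:
        groups = tables

    written = []
    for stem, rows in groups.items():
        path = os.path.join(out_dir, f"{stem}.csv")
        # Union of column names in first-seen order; missing cells are written as "".
        fieldnames = list(dict.fromkeys(k for row in rows for k in row))
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = save_tables({"a": [{"x": 1}], "b": [{"x": 2, "y": 3}]}, os.path.join(tmp, "out"))
    print([os.path.basename(p) for p in paths])  # ['a.csv', 'b.csv']
```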

load_all(self)

Load all files from the source directory.
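A sketch of the file-discovery step, assuming (not confirmed by this reference) that the constructor's `recursive` flag decides whether subdirectories of `source` are searched. `list_source_files` is a hypothetical helper; parsing the files into dataframes is omitted.

```python
import os
import tempfile
from typing import List


def list_source_files(source: str, recursive: bool) -> List[str]:
    """Hypothetical helper: collect file paths under `source`, sorted for determinism."""
    if recursive:
        # os.walk visits `source` and every subdirectory below it.
        paths = [os.path.join(root, name)
                 for root, _dirs, names in os.walk(source)
                 for name in names]
    else:
        # Only regular files directly inside `source`.
        paths = [os.path.join(source, name)
                 for name in os.listdir(source)
                 if os.path.isfile(os.path.join(source, name))]
    return sorted(paths)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.csv", os.path.join("sub", "b.csv")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("x\n1\n")
    print([os.path.relpath(p, tmp) for p in list_source_files(tmp, recursive=False)])  # ['a.csv']
    print([os.path.relpath(p, tmp) for p in list_source_files(tmp, recursive=True)])
```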

run_pipeline(self, verbose: bool = False)

Run the pipeline as configured.

Parameters

verbose (bool) – whether to print verbose output while the pipeline runs
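Conceptually, running a configured pipeline means applying its steps in order, each one receiving the output of the previous one. The sketch below assumes that model; `run_pipeline` here is a standalone function, and the steps `drop_empty_rows` and `dedupe_rows` are hypothetical examples rather than kaishi pipeline components.

```python
from typing import Callable, List

Rows = List[dict]


def run_pipeline(data: Rows, steps: List[Callable[[Rows], Rows]],
                 verbose: bool = False) -> Rows:
    """Apply each configured step in order; print step names when verbose."""
    for step in steps:
        if verbose:
            print(f"Running {step.__name__}...")
        data = step(data)
    return data


def drop_empty_rows(rows: Rows) -> Rows:
    """Hypothetical step: drop rows whose values are all None or ""."""
    return [r for r in rows if any(v not in (None, "") for v in r.values())]


def dedupe_rows(rows: Rows) -> Rows:
    """Hypothetical step: keep the first occurrence of each identical row."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))  # keys are unique, so only keys get compared
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


rows = [{"x": 1}, {"x": None}, {"x": 1}]
print(run_pipeline(rows, [drop_empty_rows, dedupe_rows], verbose=True))
# Running drop_empty_rows...
# Running dedupe_rows...
# [{'x': 1}]
```

Passing the data through each step, rather than mutating shared state, keeps the step order explicit and makes each step testable on its own.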

report(self)

Print a report of the dataset in its current state.
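The exact contents of the report are not specified here. As a hypothetical illustration only, a summary of a dataset's current state might count files, valid tables, and rows; `format_report` and its input shape (a file name mapped to a list of row dicts, or `None` when the file has no valid table) are assumptions.

```python
from typing import Dict, List, Optional


def format_report(files: Dict[str, Optional[List[dict]]]) -> str:
    """Build a plain-text summary; None marks a file without a valid table."""
    valid = {name: rows for name, rows in files.items() if rows is not None}
    lines = [
        f"Files: {len(files)}",
        f"Valid dataframes: {len(valid)}",
        f"Total rows: {sum(len(rows) for rows in valid.values())}",
    ]
    invalid = sorted(set(files) - set(valid))
    if invalid:
        lines.append("Invalid: " + ", ".join(invalid))
    return "\n".join(lines)


print(format_report({"a.csv": [{"x": 1}, {"x": 2}], "bad.csv": None}))
```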