arche.data_quality_report module
class arche.data_quality_report.DataQualityReport(items: arche.readers.items.Items, schema: Dict[str, Dict[str, Union[str, bool, int, float, None, List[T]]]], report: arche.report.Report, bucket: Optional[str] = None)
Bases: object
coverage_by_categories(df, tagged_fields)
Makes tables that show the number of items per category, for fields marked with a category tag.
Parameters
df – a dataframe of items
tagged_fields – a dict of tags
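The idea behind coverage_by_categories can be illustrated with a minimal sketch. This is not arche's implementation: it uses plain Python dicts instead of a dataframe, the helper name coverage_by_categories_sketch is hypothetical, and it assumes tagged_fields maps a tag name such as "category" to a list of field names, which this page does not confirm.

```python
from collections import Counter


def coverage_by_categories_sketch(items, tagged_fields):
    """Count items per value for every field tagged as a category.

    Hypothetical helper for illustration only; assumes tagged_fields
    looks like {"category": ["brand", ...]}.
    """
    tables = {}
    for field in tagged_fields.get("category", []):
        # Items that lack the field are skipped rather than counted.
        tables[field] = Counter(
            item[field] for item in items if field in item
        )
    return tables


# Made-up sample items, not real scraped data.
items = [
    {"name": "a", "brand": "x"},
    {"name": "b", "brand": "x"},
    {"name": "c", "brand": "y"},
]
tables = coverage_by_categories_sketch(items, {"category": ["brand"]})
print(tables["brand"].most_common())
```

Each entry in tables plays the role of one per-category table: the keys are the category values and the counts are the number of items carrying that value.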
create_appendix(schema)
create_figures(items, items_dicts)
drop_service_columns(df)
job_summary_table(job)
plot_html_to_stream()
plot_to_notebook()
rules_summary_table(df, no_of_validation_warnings, name_field, url_field, no_of_checked_duplicated_items, no_of_duplicated_items, unique, no_of_checked_skus, no_of_duplicated_skus, price_field, price_was_field, no_of_checked_price_items, no_of_price_warns, **kwargs)
save_report_to_bucket(project_id, spider, bucket)
score_table(quality_estimation, field_accuracy)
scraped_fields_coverage(job, df)
scraped_items_history(job_no, job_numbers, date_items)