Deequ

Deequ is an open-source library developed by Amazon, designed to assess and validate data quality in large datasets using Apache Spark. It allows users to define “unit tests” for data, ensuring that data meets specified quality constraints before it’s used in analytics or machine learning applications.
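The idea of "unit tests" for data can be illustrated with a minimal plain-Python sketch. It does not use Deequ or Spark: the names `Constraint`, `is_complete`, `is_unique`, and `run_checks` are hypothetical and only mirror the concept of declaring constraints up front and evaluating them against a dataset before it is used downstream.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Constraint:
    """A named data-quality rule (hypothetical helper, not Deequ's API)."""
    name: str
    predicate: Callable[[List[Row]], bool]


def is_complete(column: str) -> Constraint:
    # Passes only if no row has a missing (None) value in the column.
    return Constraint(
        f"is_complete({column})",
        lambda rows: all(r.get(column) is not None for r in rows),
    )


def is_unique(column: str) -> Constraint:
    # Passes only if no value in the column appears more than once.
    def check(rows: List[Row]) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))

    return Constraint(f"is_unique({column})", check)


def run_checks(rows: List[Row], constraints: List[Constraint]) -> Dict[str, bool]:
    # Evaluate every constraint and report pass/fail by constraint name.
    return {c.name: c.predicate(rows) for c in constraints}


# Toy dataset: one missing name and one duplicated id.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None},
    {"id": 2, "name": "c"},
]

results = run_checks(
    rows,
    [is_complete("id"), is_complete("name"), is_unique("id")],
)
print(results)
```

In this toy run the `id` column is complete, while the completeness check on `name` and the uniqueness check on `id` both fail. A pipeline could inspect such a result and refuse to publish the data; Deequ applies the same pattern to Spark DataFrames at scale.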

Latest Releases

  • 2.0.11
    What’s Changed:
      • Add AnalyzerOptions to Analyzer serialize / deserialize logic by @kchaturvedi in #597
      • Refine row count retrieval to skip redundant Size() scans by @lawofcycles in #605
      • Updated version in…
  • 2.0.10
    New Features:
      • Are unique check by @eycho-am in #599
      • add DQDL parser dependency by @happy-coral in #603
      • scaffolding for checking data quality against DQDL rulesets by @happy-coral in #604
      • Implement…
  • 2.0.9
    Maintenance / Fixes:
      • Fix row level bug when composing outcome #594
    Full Changelog: 2.0.8…2.0.9
  • Comprehensive Overview of Deequ: A Data Quality Tool

    Introduction: Deequ is an open-source library developed by AWS Labs to address the critical need for data quality in large-scale datasets. Designed for data engineers and analysts, it simplifies the process of evaluating, monitoring, and enforcing data quality constraints in ETL pipelines and other workflows. Built on Apache Spark, it leverages the distributed computing power…

  • How to Install Deequ Locally

    Introduction: Deequ is an open-source data quality library developed by AWS for validating large datasets efficiently using Spark. It enables data engineers to define and enforce quality constraints on datasets, ensuring reliability in ETL pipelines. Installing Deequ locally is useful for testing, debugging, and developing custom quality checks before deploying them in production. This guide…
