Quick: 4] Pricing POC_Prod - Purchase Info Record

Report name: Purchase Info Record

>> Field Mapping:

# Fix:

Find the corresponding source fields in the BW table.

# LOC:


# Code:

CAST(NULL AS STRING) AS `Last Price Update`,           -- A017.DATAB

CAST(NULL AS STRING) AS `Last Close PO Delivery Date`, -- EKET.EINDT

CAST(NULL AS STRING) AS `Last Close PO Price`          -- EKPO.NETPR

# Comment:

Three Fields=>

Last Price Update [ A017 - DATAB ]

Last Close PO Delivery Date [ EKET - EINDT ]

Last Close PO Price [ EKPO - NETPR ]
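
# Rough sketch (once the BW sources land):

# Last Price Update could be derived from the latest A017 validity start
# date. This is a sketch only: the landing path (bronze.sap.a017) and the
# lowercase column names are assumptions, not the actual layout.
# Last Close PO Delivery Date (EKET.EINDT) and Last Close PO Price
# (EKPO.NETPR) would follow the same latest-record pattern against the
# PO extracts.

last_price_update = spark.sql("""
    SELECT
        lifnr,                          -- vendor
        matnr,                          -- material
        ekorg,                          -- purchasing organization
        werks,                          -- plant
        MAX(datab) AS last_price_update -- latest validity start (A017.DATAB)
    FROM bronze.sap.a017
    GROUP BY lifnr, matnr, ekorg, werks
""")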





Rough=>

Planned Delivery Time (in Days):

Fix: Hardcoded precision may not match SAP. Verify the SAP EINE-APLFZ field type.


Net Price:

Fix: Hardcoded precision/scale may not match SAP. Verify the SAP EINE-NETPR field type.


Price Unit:

Fix: Hardcoded precision may not match SAP. Verify the SAP EINE-PEINH field type.
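
A minimal sketch of the typed casts for these three fields, assuming the SAP dictionary types APLFZ DEC(3,0), NETPR CURR(11,2) and PEINH DEC(5,0) (verify in SE11 before adopting); the source column names and table path below are also assumptions:

# Code (sketch):

typed = spark.sql(f"""
    SELECT
        CAST(aplfz AS DECIMAL(3,0))  AS `Planned Delivery Time (in Days)`, -- EINE-APLFZ
        CAST(netpr AS DECIMAL(11,2)) AS `Net Price`,                       -- EINE-NETPR
        CAST(peinh AS DECIMAL(5,0))  AS `Price Unit`                       -- EINE-PEINH
    FROM {gold_catalog_name}.master_data.purchase_info_record
""")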


goods_invoice_receipt:

Fix: 


Last Price Update

Fix:


Last Close PO Delivery Date

Fix:


Last Close PO Price

Fix: 


>> Join Validation 

Join Name: JOIN product_plant

# ============================================================

# JOIN VALIDATION FIX - product_plant

# ------------------------------------------------------------

# Issue:

# The join with product_plant assumes that each combination of

# (product_id, plant, source_system_identifier) is unique.

# If duplicates exist, it can create duplicate rows (data explosion)

# and lead to incorrect reporting.


# Original Join:

# LEFT JOIN product_plant pp

# ON gir.product_id = pp.product_id 

# AND gir.plant = pp.plant

# AND gir.source_system_identifier = pp.source_system_identifier


# Fix Implemented:

# Added a pre-validation check to ensure no duplicate records exist

# for the join keys before executing the join.


# Code:

dup_check = spark.sql(f"""

SELECT product_id, plant, source_system_identifier, COUNT(*) as cnt

FROM {gold_catalog_name}.master_data.product_plant

GROUP BY product_id, plant, source_system_identifier

HAVING COUNT(*) > 1

""")


if dup_check.count() > 0:

    raise ValueError("Duplicate records found in product_plant for join keys")


# Benefit:

# Prevents incorrect data due to duplicate joins and ensures data accuracy.

# ============================================================
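
# Alternative (sketch):
# If duplicates turn out to be legitimate and the run should not fail,
# deduplicate product_plant on the join keys before joining. The constant
# ordering below is a placeholder assumption; use a real tie-breaker
# (e.g. a load timestamp) if one exists.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

pp = spark.table(f"{gold_catalog_name}.master_data.product_plant")

w = Window.partitionBy("product_id", "plant", "source_system_identifier").orderBy(F.lit(1))
pp_dedup = pp.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")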


Join Name: JOIN product_warehouse_supplier

# ============================================================

# JOIN VALIDATION FIX - product_warehouse_supplier

# ------------------------------------------------------------

# Issue:

# The join with product_warehouse_supplier assumes that each

# combination of (product_id, warehouse, supplier_id,

# source_system_identifier) is unique.

# If duplicates exist, it can create duplicate rows and lead

# to incorrect reporting.


# Original Join:

# LEFT JOIN product_warehouse_supplier pws

# ON gir.product_id = pws.product_id

# AND gir.plant = pws.warehouse

# AND gir.vendor = pws.supplier_id

# AND gir.source_system_identifier = pws.source_system_identifier


# Fix Implemented:

# Added a pre-validation check to ensure no duplicate records exist

# for the join keys before executing the join.


# Code:

dup_check_pws = spark.sql(f"""

SELECT product_id, warehouse, supplier_id, source_system_identifier, COUNT(*) as cnt

FROM {gold_catalog_name}.master_data.product_warehouse_supplier

GROUP BY product_id, warehouse, supplier_id, source_system_identifier

HAVING COUNT(*) > 1

""")


if dup_check_pws.count() > 0:

    raise ValueError("Duplicate records found in product_warehouse_supplier")


# Benefit:

# Prevents duplicate join issues and ensures accurate data output.

# ============================================================
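
# Related check (sketch):
# Beyond uniqueness, the join success rate shows how many
# goods_invoice_receipt rows actually find a partner in
# product_warehouse_supplier using the same keys as the join above.
# The gir table path is an assumption.

gir = spark.table(f"{gold_catalog_name}.transactional.goods_invoice_receipt")
pws = spark.table(f"{gold_catalog_name}.master_data.product_warehouse_supplier")

unmatched = gir.join(
    pws,
    (gir.product_id == pws.product_id)
    & (gir.plant == pws.warehouse)
    & (gir.vendor == pws.supplier_id)
    & (gir.source_system_identifier == pws.source_system_identifier),
    "left_anti",  # rows with no match in pws
)

total = gir.count()
if total:
    print(f"Join success rate: {(total - unmatched.count()) / total:.2%}")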


>> Filter Validation

a) Date Range Filter

b) Validity Date Range


a) Date Range Filter
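
A minimal sketch of a date range sanity check; the document date column name and the lower bound are assumptions to be replaced with the real filter values:

# Code (sketch):

out_of_range = spark.sql(f"""
    SELECT COUNT(*) AS cnt
    FROM {gold_catalog_name}.transactional.goods_invoice_receipt
    WHERE document_date NOT BETWEEN DATE'2020-01-01' AND current_date()
""").first()["cnt"]

if out_of_range > 0:
    raise ValueError(f"{out_of_range} rows fall outside the expected date range")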








>> Data Quality Checks

a) NULL Value Validation on Key Fields (see the sketch after this list)

b) Referential Integrity (Join Success Rate)

c) Duplicate Record Detection

d) Data Completeness Check
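
A hedged sketch of check (a), assuming the key fields are the join keys used in the validations above:

# Code (sketch):

key_cols = ["product_id", "plant", "source_system_identifier"]
select_list = ", ".join(
    f"SUM(CASE WHEN {c} IS NULL THEN 1 ELSE 0 END) AS {c}_nulls" for c in key_cols
)
null_counts = spark.sql(
    f"SELECT {select_list} FROM {gold_catalog_name}.master_data.product_plant"
).first().asDict()

bad = {col: n for col, n in null_counts.items() if n}
if bad:
    raise ValueError(f"NULL values found in key fields: {bad}")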
