Pricing POC_Prod - Purchase Info Record
Report name: Purchase Info Record
>> Field Mapping:
# Fix:
Find the source fields in the BW table.
# LOC:
# Code:
'' AS `Last Price Update`,                             -- A017.DATAB
CAST(NULL AS STRING) AS `Last Close PO Delivery Date`, -- EKET.EINDT
CAST(NULL AS STRING) AS `Last Close PO Price`          -- EKPO.NETPR
# Comment:
Three fields to source:
Last Price Update           [ Z017 - DATAB ]
Last Close PO Delivery Date [ EKET - EINDT ]
Last Close PO Price         [ EKPO - NETPR ]
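Until the source tables are wired in, the mapping above can be captured programmatically. A minimal sketch (the `select_clause` helper is a hypothetical illustration, not the report's actual build step; note the code comment references A017 while the note says Z017, so verify which table applies):

```python
# Field-to-source mapping taken from the notes above; A017 vs Z017 for
# Last Price Update still needs to be confirmed against SAP.
field_mapping = {
    "Last Price Update": ("A017", "DATAB"),
    "Last Close PO Delivery Date": ("EKET", "EINDT"),
    "Last Close PO Price": ("EKPO", "NETPR"),
}

def select_clause(mapping):
    """Render one SELECT expression per mapped field."""
    return ",\n".join(
        f"{table}.{column} AS `{alias}`"
        for alias, (table, column) in mapping.items()
    )

print(select_clause(field_mapping))
```

Once confirmed, these expressions would replace the `''` / `CAST(NULL AS STRING)` placeholders.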
Rough notes:
Planned Delivery Time (in Days):
Fix: Hardcoded precision may not match SAP; verify the SAP EINE-APLFZ field type.
Net Price:
Fix: Hardcoded precision/scale; verify the SAP EINE-NETPR field type.
Price Unit:
Fix: Hardcoded precision may not match SAP; verify the SAP EINE-PEINH field type.
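To make the precision concern concrete: casting to a hardcoded scale can silently round values. A standalone sketch (treating NETPR as DECIMAL with 2 decimal places is an assumption to verify against SAP):

```python
from decimal import Decimal, ROUND_HALF_UP

# If the SAP source carries more scale than the hardcoded target type,
# the cast rounds silently; a scale of 2 for NETPR is an assumption here.
source_value = Decimal("1234.567")
cast_to_scale_2 = source_value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(cast_to_scale_2)  # 1234.57, the trailing thousandths are lost
```

The same effect applies to Spark SQL `CAST(x AS DECIMAL(p, s))` when `s` is smaller than the source scale, which is why the field types should be confirmed first.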
goods_invoice_receipt:
Fix:
Last Price Update
Fix:
Last Close PO Delivery Date
Fix:
Last Close PO Price
Fix:
>> Join Validation
Join Name: JOIN product_plant
# ============================================================
# JOIN VALIDATION FIX - product_plant
# ------------------------------------------------------------
# Issue:
# The join with product_plant assumes that each combination of
# (product_id, plant, source_system_identifier) is unique.
# If duplicates exist, it can create duplicate rows (data explosion)
# and lead to incorrect reporting.
# Original Join:
# LEFT JOIN product_plant pp
# ON gir.product_id = pp.product_id
# AND gir.plant = pp.plant
# AND gir.source_system_identifier = pp.source_system_identifier
# Fix Implemented:
# Added a pre-validation check to ensure no duplicate records exist
# for the join keys before executing the join.
# Code:
dup_check = spark.sql(f"""
    SELECT product_id, plant, source_system_identifier, COUNT(*) AS cnt
    FROM {gold_catalog_name}.master_data.product_plant
    GROUP BY product_id, plant, source_system_identifier
    HAVING COUNT(*) > 1
""")
if dup_check.count() > 0:
    raise ValueError("Duplicate records found in product_plant for join keys")
# Benefit:
# Prevents incorrect data due to duplicate joins and ensures data accuracy.
# ============================================================
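The duplicate-key check above can also be unit-tested without a cluster; this plain-Python sketch mirrors the same GROUP BY / HAVING logic on made-up rows:

```python
from collections import Counter

# Made-up product_plant rows: (product_id, plant, source_system_identifier)
rows = [
    ("P001", "1000", "SAP_A"),
    ("P002", "1000", "SAP_A"),
    ("P001", "1000", "SAP_A"),  # duplicate join key
]

# Equivalent of GROUP BY join keys HAVING COUNT(*) > 1
duplicates = {k: c for k, c in Counter(rows).items() if c > 1}
if duplicates:
    print(f"Duplicate join keys found: {duplicates}")
```

Useful for verifying the check's logic before running it against the gold catalog.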
Join Name: JOIN product_warehouse_supplier
# ============================================================
# JOIN VALIDATION FIX - product_warehouse_supplier
# ------------------------------------------------------------
# Issue:
# The join with product_warehouse_supplier assumes that each
# combination of (product_id, warehouse, supplier_id,
# source_system_identifier) is unique.
# If duplicates exist, it can create duplicate rows and lead
# to incorrect reporting.
# Original Join:
# LEFT JOIN product_warehouse_supplier pws
# ON gir.product_id = pws.product_id
# AND gir.plant = pws.warehouse
# AND gir.vendor = pws.supplier_id
# AND gir.source_system_identifier = pws.source_system_identifier
# Fix Implemented:
# Added a pre-validation check to ensure no duplicate records exist
# for the join keys before executing the join.
# Code:
dup_check_pws = spark.sql(f"""
    SELECT product_id, warehouse, supplier_id, source_system_identifier, COUNT(*) AS cnt
    FROM {gold_catalog_name}.master_data.product_warehouse_supplier
    GROUP BY product_id, warehouse, supplier_id, source_system_identifier
    HAVING COUNT(*) > 1
""")
if dup_check_pws.count() > 0:
    raise ValueError("Duplicate records found in product_warehouse_supplier")
# Benefit:
# Prevents duplicate join issues and ensures accurate data output.
# ============================================================
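If duplicates in product_warehouse_supplier turn out to be legitimate (e.g. overlapping validity periods), an alternative to failing the load is to keep exactly one row per join key. A plain-Python sketch of a keep-latest rule (the `valid_from` field is an assumption):

```python
# Made-up rows; valid_from is an assumed column used to pick the latest record
rows = [
    {"product_id": "P001", "warehouse": "W1", "supplier_id": "S1",
     "source_system_identifier": "SAP_A", "valid_from": "2023-01-01"},
    {"product_id": "P001", "warehouse": "W1", "supplier_id": "S1",
     "source_system_identifier": "SAP_A", "valid_from": "2024-06-01"},
]

def join_key(r):
    return (r["product_id"], r["warehouse"], r["supplier_id"],
            r["source_system_identifier"])

latest = {}
for r in rows:
    k = join_key(r)
    if k not in latest or r["valid_from"] > latest[k]["valid_from"]:
        latest[k] = r

deduped = list(latest.values())  # exactly one row per join key
```

In Spark this would typically be a `row_number()` window over the join keys ordered by the validity date, keeping rank 1.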
>> Filter Validation
a) Date Range Filter
b) Validity Date Range
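A date-range filter can be validated by asserting that no retained row falls outside the window. A minimal sketch on made-up rows (`posting_date` and the bounds are assumptions):

```python
from datetime import date

# Made-up rows; posting_date and the window bounds are illustrative only
rows = [
    {"id": 1, "posting_date": date(2024, 1, 15)},
    {"id": 2, "posting_date": date(2024, 2, 1)},
    {"id": 3, "posting_date": date(2023, 12, 31)},  # outside the window
]

start, end = date(2024, 1, 1), date(2024, 12, 31)
out_of_range = [r for r in rows if not (start <= r["posting_date"] <= end)]
print(f"{len(out_of_range)} row(s) outside [{start}, {end}]")
```

The same assertion applies to the validity date range: after the filter runs, the out-of-range count should be zero.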
>> Data Quality Checks
a) NULL Value Validation on Key Fields
b) Referential Integrity (Join Success Rate)
c) Duplicate Record Detection
d) Data Completeness Check
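The four checks above can be prototyped locally before porting to Spark SQL. A compact sketch on made-up rows (field names and the lookup set are assumptions):

```python
rows = [
    {"product_id": "P001", "plant": "1000", "net_price": 12.5},
    {"product_id": None,   "plant": "1000", "net_price": 9.0},
    {"product_id": "P002", "plant": "1000", "net_price": None},
]

# a) NULL value validation on key fields
key_fields = ["product_id", "plant"]
null_key_rows = [r for r in rows if any(r[f] is None for f in key_fields)]

# b) Referential integrity: share of rows whose key resolves in a lookup
lookup_keys = {"P001", "P002"}
resolvable = [r for r in rows if r["product_id"] in lookup_keys]
join_success_rate = len(resolvable) / len(rows)

# c) Duplicate record detection on the key fields
keys = [(r["product_id"], r["plant"]) for r in rows]
has_duplicates = len(keys) != len(set(keys))

# d) Data completeness: non-NULL ratio per column
completeness = {
    col: sum(r[col] is not None for r in rows) / len(rows)
    for col in rows[0]
}
```

Each of these translates directly to a `spark.sql` aggregation over the gold tables once the thresholds are agreed.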