Pricing POC_Prod - Purchase Info Record
Report name: Purchase Info Record
>> Field Mapping:
# Fix:
Find the source fields in the BW table.
# LOC:
# Code:
'' AS `Last Price Update`,                             -- A017.DATAB
CAST(NULL AS STRING) AS `Last Close PO Delivery Date`, -- EKET.EINDT
CAST(NULL AS STRING) AS `Last Close PO Price`          -- EKPO.NETPR
# Comment:
Three fields to source:
Last Price Update           [ Z017 - DATAB ]
Last Close PO Delivery Date [ EKET - EINDT ]
Last Close PO Price         [ EKPO - NETPR ]
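Until the source tables are wired in, the mapping above can be captured programmatically. A minimal sketch (the `select_clause` helper is a hypothetical illustration, not the report's actual build step; note the code comment references A017 while the note says Z017, so verify which table applies):

```python
# Field-to-source mapping taken from the notes above; A017 vs Z017 for
# Last Price Update still needs to be confirmed against SAP.
field_mapping = {
    "Last Price Update": ("A017", "DATAB"),
    "Last Close PO Delivery Date": ("EKET", "EINDT"),
    "Last Close PO Price": ("EKPO", "NETPR"),
}

def select_clause(mapping):
    """Render one SELECT expression per mapped field."""
    return ",\n".join(
        f"{table}.{column} AS `{alias}`"
        for alias, (table, column) in mapping.items()
    )

print(select_clause(field_mapping))
```

Once confirmed, these expressions would replace the `''` / `CAST(NULL AS STRING)` placeholders.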
Rough notes:
Planned Delivery Time (in Days):
Fix: Hardcoded precision may not match SAP; verify the SAP EINE-APLFZ field type.
Net Price:
Fix: Hardcoded precision/scale; verify the SAP EINE-NETPR field type.
Price Unit:
Fix: Hardcoded precision may not match SAP; verify the SAP EINE-PEINH field type.
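To make the precision concern concrete: casting to a hardcoded scale can silently round values. A standalone sketch (treating NETPR as DECIMAL with 2 decimal places is an assumption to verify against SAP):

```python
from decimal import Decimal, ROUND_HALF_UP

# If the SAP source carries more scale than the hardcoded target type,
# the cast rounds silently; a scale of 2 for NETPR is an assumption here.
source_value = Decimal("1234.567")
cast_to_scale_2 = source_value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(cast_to_scale_2)  # 1234.57, the trailing thousandths are lost
```

The same effect applies to Spark SQL `CAST(x AS DECIMAL(p, s))` when `s` is smaller than the source scale, which is why the field types should be confirmed first.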
goods_invoice_receipt:
Fix:
Last Price Update
Fix:
Last Close PO Delivery Date
Fix:
Last Close PO Price
Fix:
>> Join Validation
Join Name: JOIN product_plant
# ============================================================
# JOIN VALIDATION FIX - product_plant
# ------------------------------------------------------------
# Issue:
# The join with product_plant assumes that each combination of
# (product_id, plant, source_system_identifier) is unique.
# If duplicates exist, it can create duplicate rows (data explosion)
# and lead to incorrect reporting.
# Original Join:
# LEFT JOIN product_plant pp
# ON gir.product_id = pp.product_id
# AND gir.plant = pp.plant
# AND gir.source_system_identifier = pp.source_system_identifier
# Fix Implemented:
# Added a pre-validation check to ensure no duplicate records exist
# for the join keys before executing the join.
# Code:
dup_check = spark.sql(f"""
    SELECT product_id, plant, source_system_identifier, COUNT(*) AS cnt
    FROM {gold_catalog_name}.master_data.product_plant
    GROUP BY product_id, plant, source_system_identifier
    HAVING COUNT(*) > 1
""")
if dup_check.count() > 0:
    raise ValueError("Duplicate records found in product_plant for join keys")
# Benefit:
# Prevents incorrect data due to duplicate joins and ensures data accuracy.
# ============================================================
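The duplicate-key check above can also be unit-tested without a cluster; this plain-Python sketch mirrors the same GROUP BY / HAVING logic on made-up rows:

```python
from collections import Counter

# Made-up product_plant rows: (product_id, plant, source_system_identifier)
rows = [
    ("P001", "1000", "SAP_A"),
    ("P002", "1000", "SAP_A"),
    ("P001", "1000", "SAP_A"),  # duplicate join key
]

# Equivalent of GROUP BY join keys HAVING COUNT(*) > 1
duplicates = {k: c for k, c in Counter(rows).items() if c > 1}
if duplicates:
    print(f"Duplicate join keys found: {duplicates}")
```

Useful for verifying the check's logic before running it against the gold catalog.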
Join Name: JOIN product_warehouse_supplier
# ============================================================
# JOIN VALIDATION FIX - product_warehouse_supplier
# ------------------------------------------------------------
# Issue:
# The join with product_warehouse_supplier assumes that each
# combination of (product_id, warehouse, supplier_id,
# source_system_identifier) is unique.
# If duplicates exist, it can create duplicate rows and lead
# to incorrect reporting.
# Original Join:
# LEFT JOIN product_warehouse_supplier pws
# ON gir.product_id = pws.product_id
# AND gir.plant = pws.warehouse
# AND gir.vendor = pws.supplier_id
# AND gir.source_system_identifier = pws.source_system_identifier
# Fix Implemented:
# Added a pre-validation check to ensure no duplicate records exist
# for the join keys before executing the join.
# Code:
dup_check_pws = spark.sql(f"""
    SELECT product_id, warehouse, supplier_id, source_system_identifier, COUNT(*) AS cnt
    FROM {gold_catalog_name}.master_data.product_warehouse_supplier
    GROUP BY product_id, warehouse, supplier_id, source_system_identifier
    HAVING COUNT(*) > 1
""")
if dup_check_pws.count() > 0:
    raise ValueError("Duplicate records found in product_warehouse_supplier")
# Benefit:
# Prevents duplicate join issues and ensures accurate data output.
# ============================================================
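If duplicates in product_warehouse_supplier turn out to be legitimate (e.g. overlapping validity periods), an alternative to failing the load is to keep exactly one row per join key. A plain-Python sketch of a keep-latest rule (the `valid_from` field is an assumption):

```python
# Made-up rows; valid_from is an assumed column used to pick the latest record
rows = [
    {"product_id": "P001", "warehouse": "W1", "supplier_id": "S1",
     "source_system_identifier": "SAP_A", "valid_from": "2023-01-01"},
    {"product_id": "P001", "warehouse": "W1", "supplier_id": "S1",
     "source_system_identifier": "SAP_A", "valid_from": "2024-06-01"},
]

def join_key(r):
    return (r["product_id"], r["warehouse"], r["supplier_id"],
            r["source_system_identifier"])

latest = {}
for r in rows:
    k = join_key(r)
    if k not in latest or r["valid_from"] > latest[k]["valid_from"]:
        latest[k] = r

deduped = list(latest.values())  # exactly one row per join key
```

In Spark this would typically be a `row_number()` window over the join keys ordered by the validity date, keeping rank 1.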
>> Filter Validation
a) Date Range Filter
b) Validity Date Range
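A date-range filter can be validated by asserting that no retained row falls outside the window. A minimal sketch on made-up rows (`posting_date` and the bounds are assumptions):

```python
from datetime import date

# Made-up rows; posting_date and the window bounds are illustrative only
rows = [
    {"id": 1, "posting_date": date(2024, 1, 15)},
    {"id": 2, "posting_date": date(2024, 2, 1)},
    {"id": 3, "posting_date": date(2023, 12, 31)},  # outside the window
]

start, end = date(2024, 1, 1), date(2024, 12, 31)
out_of_range = [r for r in rows if not (start <= r["posting_date"] <= end)]
print(f"{len(out_of_range)} row(s) outside [{start}, {end}]")
```

The same assertion applies to the validity date range: after the filter runs, the out-of-range count should be zero.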
>> Data Quality Checks
a) NULL Value Validation on Key Fields
b) Referential Integrity (Join Success Rate)
c) Duplicate Record Detection
d) Data Completeness Check
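The four checks above can be prototyped locally before porting to Spark SQL. A compact sketch on made-up rows (field names and the lookup set are assumptions):

```python
rows = [
    {"product_id": "P001", "plant": "1000", "net_price": 12.5},
    {"product_id": None,   "plant": "1000", "net_price": 9.0},
    {"product_id": "P002", "plant": "1000", "net_price": None},
]

# a) NULL value validation on key fields
key_fields = ["product_id", "plant"]
null_key_rows = [r for r in rows if any(r[f] is None for f in key_fields)]

# b) Referential integrity: share of rows whose key resolves in a lookup
lookup_keys = {"P001", "P002"}
resolvable = [r for r in rows if r["product_id"] in lookup_keys]
join_success_rate = len(resolvable) / len(rows)

# c) Duplicate record detection on the key fields
keys = [(r["product_id"], r["plant"]) for r in rows]
has_duplicates = len(keys) != len(set(keys))

# d) Data completeness: non-NULL ratio per column
completeness = {
    col: sum(r[col] is not None for r in rows) / len(rows)
    for col in rows[0]
}
```

Each of these translates directly to a `spark.sql` aggregation over the gold tables once the thresholds are agreed.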