Database Population Guide for Walmart Task

python language

Part 1: Get the Data

Steps:
Fork and clone the repo:

bash

git clone [Link]
cd forage-walmart-task-4
You should see:

A data/ folder with Spreadsheet 0, 1, 2 (probably CSV or XLSX).

An SQLite database (likely [Link] or similar).
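
Before writing any insert code, it helps to confirm which tables and columns the database actually contains. The following is a minimal sketch using only the standard library; the `describe_schema` helper and the demo `Location` table are illustrative assumptions, not part of the repo. Run it against the real database file instead of the in-memory demo.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [column names]} for every user table."""
    cur = conn.cursor()
    cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    tables = [row[0] for row in cur.fetchall()]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        cur.execute(f'PRAGMA table_info("{table}")')
        schema[table] = [col[1] for col in cur.fetchall()]
    return schema


# Demo on an in-memory database with an assumed table layout.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE Location (id INTEGER PRIMARY KEY, name TEXT, zip_code TEXT)"
)
print(describe_schema(demo))
```

The printed mapping shows the exact table and column names to use in the insert statements.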

Part 2: Populate the Database


Files Overview:
Spreadsheet 0: Self-contained data that can be inserted directly into a single table (likely a locations, products, or customers table).

Spreadsheet 1: Shipment items — each row = a product in a shipment.

Spreadsheet 2: Shipping metadata — contains shipment_id, origin, destination, etc.
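
Since Spreadsheet 1 has one row per product and Spreadsheet 2 has one row per shipment, the two have to be joined on shipment_id before anything can be inserted. The sketch below illustrates that join on made-up miniature frames; every column name and value here is a hypothetical stand-in for the real files.

```python
import pandas as pd

# Hypothetical miniature versions of Spreadsheets 1 and 2.
items = pd.DataFrame({
    "shipment_id": ["s1", "s1", "s2"],
    "product_name": ["apples", "pears", "apples"],
    "quantity": [3, 5, 2],
})
meta = pd.DataFrame({
    "shipment_id": ["s1", "s2"],
    "origin_name": ["Depot A", "Depot B"],
    "destination_name": ["Store X", "Store Y"],
})

# how="left" keeps every item row; validate="many_to_one" raises
# if a shipment_id appears more than once in the metadata frame.
merged = items.merge(meta, on="shipment_id", how="left", validate="many_to_one")
print(merged)
```

Each item row now carries its shipment's origin and destination, which is the shape the insert loop needs.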

Python Script Template (populate_db.py)

import sqlite3
import pandas as pd

# Paths
DB_PATH = '[Link]'  # set this to the SQLite file in the repo
SPREADSHEET_0 = 'data/spreadsheet_0.csv'
SPREADSHEET_1 = 'data/spreadsheet_1.csv'
SPREADSHEET_2 = 'data/spreadsheet_2.csv'

# Connect to the database
conn = sqlite3.connect(DB_PATH)
cur = conn.cursor()

# === Step 1: Insert data from Spreadsheet 0 ===
df0 = pd.read_csv(SPREADSHEET_0)
# Update 'some_table_name' to the matching table in the actual schema
df0.to_sql('some_table_name', conn, if_exists='append', index=False)

# === Step 2: Load spreadsheet 1 and 2 ===
df1 = pd.read_csv(SPREADSHEET_1)  # product-level shipment data
df2 = pd.read_csv(SPREADSHEET_2)  # shipment meta (origin/destination)

# === Step 3: Merge shipment metadata ===
merged_df = df1.merge(df2, on='shipment_id')

# === Step 4: Insert into shipments and shipment_products ===
for _, row in merged_df.iterrows():
    # Insert or get origin location ID
    cur.execute("SELECT id FROM Location WHERE name = ? AND zip_code = ?",
                (row['origin_name'], row['origin_zip']))
    origin = cur.fetchone()
    if not origin:
        cur.execute("INSERT INTO Location (name, zip_code) VALUES (?, ?)",
                    (row['origin_name'], row['origin_zip']))
        origin_id = cur.lastrowid
    else:
        origin_id = origin[0]

    # Insert or get destination location ID
    cur.execute("SELECT id FROM Location WHERE name = ? AND zip_code = ?",
                (row['destination_name'], row['destination_zip']))
    dest = cur.fetchone()
    if not dest:
        cur.execute("INSERT INTO Location (name, zip_code) VALUES (?, ?)",
                    (row['destination_name'], row['destination_zip']))
        dest_id = cur.lastrowid
    else:
        dest_id = dest[0]

    # Insert shipment (ignored if this shipment_id already exists)
    cur.execute("""
        INSERT OR IGNORE INTO Shipment (shipment_id, origin_location_id,
                                        destination_location_id, shipment_date)
        VALUES (?, ?, ?, ?)
    """, (row['shipment_id'], origin_id, dest_id, row['shipment_date']))

    # Insert product-shipment link; fail clearly if the product is unknown
    cur.execute("SELECT id FROM Product WHERE name = ?", (row['product_name'],))
    product = cur.fetchone()
    if product is None:
        raise ValueError(f"Unknown product: {row['product_name']!r}")
    product_id = product[0]

    cur.execute("""
        INSERT INTO ShipmentProduct (shipment_id, product_id, quantity)
        VALUES (?, ?, ?)
    """, (row['shipment_id'], product_id, row['quantity']))

# Commit and close
conn.commit()
conn.close()
print("Database populated successfully.")
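
The origin and destination lookups in the loop above repeat the same select-then-insert pattern, so they could be factored into one helper. This is a sketch only: `get_or_create_location` is a hypothetical name, and the `Location` table definition in the demo is an assumption based on the columns the script queries.

```python
import sqlite3


def get_or_create_location(cur, name, zip_code):
    """Return the id of the matching Location row, inserting it if absent."""
    cur.execute(
        "SELECT id FROM Location WHERE name = ? AND zip_code = ?",
        (name, zip_code),
    )
    row = cur.fetchone()
    if row is not None:
        return row[0]
    cur.execute(
        "INSERT INTO Location (name, zip_code) VALUES (?, ?)",
        (name, zip_code),
    )
    return cur.lastrowid


# Demo on an in-memory database with an assumed table layout.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE Location (id INTEGER PRIMARY KEY, name TEXT, zip_code TEXT)"
)
demo_cur = demo_conn.cursor()
first = get_or_create_location(demo_cur, "Depot A", "72712")
second = get_or_create_location(demo_cur, "Depot A", "72712")  # reuses the row
other = get_or_create_location(demo_cur, "Store X", "10001")   # inserts a new row
```

With this helper, the loop body would call it once for the origin columns and once for the destination columns.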

Notes:
You might need to update the table names (some_table_name, etc.) to match the
actual DB schema.

Use pandas.read_excel() if the files are .xlsx.


Always inspect the first few rows with df.head() to understand column structure.
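
The last two notes can be combined into a small loader that picks the reader from the file extension and then previews the result. This is a hedged sketch: `load_table` is a hypothetical helper and the sample file is invented for the demo. Note that pandas.read_excel() needs an Excel engine such as openpyxl to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path):
    """Load a spreadsheet as a DataFrame, choosing the reader by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        # Requires an Excel engine (for example openpyxl) to be installed.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {path.suffix}")


# Demo with a throwaway CSV file holding invented data.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("shipment_id,quantity\ns1,3\ns2,5\n")
    df = load_table(sample)

print(df.head())          # first rows
print(list(df.columns))   # exact column names to use in the script
```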
