✅ Part 1: Get the Data
Steps:
Fork and clone the repo:
bash
git clone [Link]
cd forage-walmart-task-4
You should see:
A data/ folder with Spreadsheet 0, 1, 2 (probably CSV or XLSX).
An SQLite database (likely [Link] or similar).
Part 2: Populate the Database
Files Overview:
Spreadsheet 0: Direct data insert (likely a locations, products, or customers
table).
Spreadsheet 1: Shipment items — each row = a product in a shipment.
Spreadsheet 2: Shipping metadata — contains shipment_id, origin, destination, etc.
Python Script Template (populate_db.py)
import sqlite3
import pandas as pd
# Paths
DB_PATH = '[Link]'
SPREADSHEET_0 = 'data/spreadsheet_0.csv'
SPREADSHEET_1 = 'data/spreadsheet_1.csv'
SPREADSHEET_2 = 'data/spreadsheet_2.csv'
# Connect to the database
conn = [Link](DB_PATH)
cur = [Link]()
# === Step 1: Insert data from Spreadsheet 0 ===
df0 = pd.read_csv(SPREADSHEET_0)
df0.to_sql('some_table_name', conn, if_exists='append', index=False) # Update
'some_table_name'
# === Step 2: Load spreadsheet 1 and 2 ===
df1 = pd.read_csv(SPREADSHEET_1) # product-level shipment data
df2 = pd.read_csv(SPREADSHEET_2) # shipment meta (origin/destination)
# === Step 3: Merge shipment metadata ===
merged_df = [Link](df2, on='shipment_id')
# === Step 4: Insert into shipments and shipment_products ===
for _, row in merged_df.iterrows():
# Insert or get origin location ID
[Link]("SELECT id FROM Location WHERE name = ? AND zip_code = ?",
(row['origin_name'], row['origin_zip']))
origin = [Link]()
if not origin:
[Link]("INSERT INTO Location (name, zip_code) VALUES (?, ?)",
(row['origin_name'], row['origin_zip']))
origin_id = [Link]
else:
origin_id = origin[0]
# Insert or get destination location ID
[Link]("SELECT id FROM Location WHERE name = ? AND zip_code = ?",
(row['destination_name'], row['destination_zip']))
dest = [Link]()
if not dest:
[Link]("INSERT INTO Location (name, zip_code) VALUES (?, ?)",
(row['destination_name'], row['destination_zip']))
dest_id = [Link]
else:
dest_id = dest[0]
# Insert shipment
[Link]("""
INSERT OR IGNORE INTO Shipment (shipment_id, origin_location_id,
destination_location_id, shipment_date)
VALUES (?, ?, ?, ?)
""", (row['shipment_id'], origin_id, dest_id, row['shipment_date']))
# Insert product-shipment link
[Link]("SELECT id FROM Product WHERE name = ?", (row['product_name'],))
product_id = [Link]()[0]
[Link]("""
INSERT INTO ShipmentProduct (shipment_id, product_id, quantity)
VALUES (?, ?, ?)
""", (row['shipment_id'], product_id, row['quantity']))
# Commit and close
[Link]()
[Link]()
print("Database populated successfully.")
Notes:
You might need to update the table names (some_table_name, etc.) to match the
actual DB schema.
Use pandas.read_excel() if the files are .xlsx.
Always inspect the first few rows with [Link]() to understand column structure.