ETL Pipeline: ERP → NocoDB Daily Sales
Goal
Replace the current client-side ERP fetching (which downloads hundreds of MBs of raw transactions to the browser) with a server-side ETL pipeline that aggregates ERP data into NocoDB. The dashboard reads pre-aggregated data from NocoDB — fast and lightweight.
Data Flow
Daily (2am cron):
ERP API → Server (fetch + aggregate) → NocoDB "DailySales" table
On page load:
NocoDB "DailySales" → Dashboard client (small payload, fast)
NocoDB "DailySales" Table
One row per date/museum/channel combination. Flat — no lookup tables needed.
| Column | Type | Example |
|---|---|---|
| Date | string | 2025-03-01 |
| MuseumName | string | Revelation Exhibition |
| Channel | string | HiHala Website/App |
| Visits | number | 702 |
| Tickets | number | 71 |
| GrossRevenue | number | 12049.00 |
| NetRevenue | number | 10477.40 |
Museums are derived from product descriptions using a priority-ordered keyword mapping (46 products → 6 museums). Channels are derived from OperatingAreaName with display labels (e.g. B2C → "HiHala Website/App").
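A minimal sketch of the priority-ordered keyword matching described above. The rule list and the "Other" fallback are hypothetical placeholders, not the project's real 46-product mapping. Only the B2C label and the "Revelation Exhibition" name come from this document.

```typescript
// Hypothetical rule shape: keyword is matched case-insensitively
// against ProductDescription.
interface MuseumRule {
  keyword: string;
  museum: string;
}

// Order matters: the first matching rule wins, so more specific
// keywords must come before more general ones.
const MUSEUM_RULES: MuseumRule[] = [
  { keyword: "revelation combo", museum: "Combo Museum (placeholder)" },
  { keyword: "revelation", museum: "Revelation Exhibition" },
];

const CHANNEL_LABELS: Record<string, string> = {
  B2C: "HiHala Website/App", // the one mapping given in this document
};

function getMuseumFromProduct(description: string): string {
  const text = description.toLowerCase();
  const rule = MUSEUM_RULES.find(r => text.includes(r.keyword));
  return rule ? rule.museum : "Other"; // fallback label is an assumption
}

function getChannelLabel(operatingAreaName: string): string {
  // Unknown areas fall back to their raw name (assumed behaviour).
  return CHANNEL_LABELS[operatingAreaName] ?? operatingAreaName;
}
```

Because the first match wins, a product named "Revelation Combo Ticket" resolves to the combo placeholder rather than to "Revelation Exhibition".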
Server Architecture
New files
| File | Responsibility |
|---|---|
| server/src/config/museumMapping.ts | Product → museum mapping, channel labels (moved from client) |
| server/src/types.ts | Server-side ERP types (ERPSaleRecord, ERPProduct, ERPPayment, AggregatedRecord) |
| server/src/services/nocodbClient.ts | NocoDB table discovery (via process.env, NOT import.meta.env) + paginated read/write |
| server/src/services/etlSync.ts | Orchestrate: fetch ERP → aggregate → write NocoDB |
| server/src/routes/etl.ts | POST /api/etl/sync endpoint (protected by secret token) |
Modified files
| File | Change |
|---|---|
| server/src/config.ts | Add NocoDB config (process.env.NOCODB_*) |
| server/src/index.ts | Mount ETL route |
| server/.env | Add NOCODB_* and ETL_SECRET vars |
| server/.env.example | Add NOCODB_* and ETL_SECRET placeholders |
| src/services/dataService.ts | Revert to NocoDB fetch with paginated reads for DailySales |
Removed files
| File | Reason |
|---|---|
| server/src/routes/erp.ts | Client no longer calls ERP directly |
| src/services/erpService.ts | Client no longer aggregates transactions |
| src/config/museumMapping.ts | Moved to server |
ETL Sync Endpoint
POST /api/etl/sync?mode=full|incremental
Authorization: Bearer <ETL_SECRET>
Protected by a secret token (ETL_SECRET env var). Requests without a valid token get 401. The cron passes it: curl -H "Authorization: Bearer $ETL_SECRET" -X POST ....
- incremental (default): fetch current month from ERP, aggregate, upsert into NocoDB. Used by daily cron.
- full: fetch all months from 2024-01 to now, clear and replace all NocoDB DailySales data. Used for initial setup or recovery.
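The request checks could be sketched as two small pure functions that a route handler calls before starting a sync. The function names are hypothetical, and rejecting an unrecognised mode is an assumption, since this document only specifies the 401 for a bad token.

```typescript
type SyncMode = "full" | "incremental";

// Returns true only for a header of the exact form "Bearer <secret>".
// An empty secret never authorizes, so a missing ETL_SECRET fails closed.
function isAuthorized(authHeader: string | undefined, secret: string): boolean {
  if (!authHeader || secret.length === 0) return false;
  const prefix = "Bearer ";
  return authHeader.startsWith(prefix) && authHeader.slice(prefix.length) === secret;
}

// A missing mode defaults to incremental; anything unrecognised yields null
// so the caller can reject the request (assumed behaviour).
function parseMode(raw: string | undefined): SyncMode | null {
  if (raw === undefined || raw === "incremental") return "incremental";
  return raw === "full" ? "full" : null;
}
```

The plain `===` comparison keeps the sketch dependency-free. A constant-time comparison such as Node's `crypto.timingSafeEqual` would be a reasonable hardening step.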
Incremental date range
The current month is defined as:
- startDate: YYYY-MM-01T00:00:00 (first of current month)
- endDate: YYYY-{MM+1}-01T00:00:00 (first of next month, exclusive)
This matches the convention already used in erpService.ts month boundary generation.
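One detail the `{MM+1}` notation hides is the December rollover, where the end date falls in January of the following year. A minimal sketch of the boundary computation, using hypothetical names `monthRange` and `DateRange`:

```typescript
interface DateRange {
  startDate: string; // inclusive
  endDate: string;   // exclusive
}

function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

// month is 1-based (1 = January, 12 = December).
function monthRange(year: number, month: number): DateRange {
  // December rolls over to January of the next year.
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  return {
    startDate: `${year}-${pad2(month)}-01T00:00:00`,
    endDate: `${nextYear}-${pad2(nextMonth)}-01T00:00:00`,
  };
}
```

For example, `monthRange(2024, 12)` ends at `2025-01-01T00:00:00` rather than at an invalid month 13.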
Response:
{
"status": "ok",
"mode": "incremental",
"transactionsFetched": 12744,
"recordsWritten": 342,
"duration": "8.2s"
}
Aggregation Logic
For each ERP transaction:
- Extract date from TransactionDate (split on space, take first part)
- Map OperatingAreaName → channel label via getChannelLabel()
- For each product in Products[]:
  - Map ProductDescription → museum name via getMuseumFromProduct() (priority-ordered keyword matching)
  - Accumulate into composite key date|museum|channel:
    - visits += PeopleCount
    - tickets += UnitQuantity
    - GrossRevenue += TotalPrice
    - NetRevenue += TotalPrice - TaxAmount
Refunds arrive as negative quantities, so plain summation nets them out with no special handling.
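The steps above can be sketched as a single pass over the transactions with a Map keyed on date|museum|channel. The types are trimmed to the fields the aggregation reads, and the two mapping functions are injected as parameters. The name `aggregate` and the sample values in the usage note are assumptions.

```typescript
interface ERPProduct {
  ProductDescription: string;
  UnitQuantity: number;
  PeopleCount: number;
  TaxAmount: number;
  TotalPrice: number;
}

interface ERPSaleRecord {
  TransactionDate: string; // assumed "YYYY-MM-DD HH:MM:SS"
  OperatingAreaName: string;
  Products: ERPProduct[];
}

interface AggregatedRecord {
  Date: string;
  MuseumName: string;
  Channel: string;
  Visits: number;
  Tickets: number;
  GrossRevenue: number;
  NetRevenue: number;
}

function aggregate(
  sales: ERPSaleRecord[],
  getMuseum: (description: string) => string,
  getChannel: (area: string) => string
): AggregatedRecord[] {
  const byKey = new Map<string, AggregatedRecord>();
  for (const sale of sales) {
    const date = sale.TransactionDate.split(" ")[0];
    const channel = getChannel(sale.OperatingAreaName);
    for (const product of sale.Products) {
      const museum = getMuseum(product.ProductDescription);
      const key = `${date}|${museum}|${channel}`;
      let rec = byKey.get(key);
      if (!rec) {
        rec = { Date: date, MuseumName: museum, Channel: channel,
                Visits: 0, Tickets: 0, GrossRevenue: 0, NetRevenue: 0 };
        byKey.set(key, rec);
      }
      // Refund lines carry negative values and simply reduce the totals.
      rec.Visits += product.PeopleCount;
      rec.Tickets += product.UnitQuantity;
      rec.GrossRevenue += product.TotalPrice;
      rec.NetRevenue += product.TotalPrice - product.TaxAmount;
    }
  }
  return Array.from(byKey.values());
}
```

A sale of two tickets followed by a one-ticket refund on the same date, museum, and channel collapses into one row with Tickets equal to 1. Revenue sums use floating-point arithmetic, so rounding to two decimals before writing may be worth considering.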
NocoDB Upsert Strategy
For incremental sync:
- Delete all rows in DailySales where Date falls within the fetched month range
- Insert the newly aggregated rows
For full sync:
- Delete all rows in DailySales
- Insert all aggregated rows
This avoids duplicate detection complexity — just replace the month's data.
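A minimal sketch of the delete-then-insert flow against a hypothetical `DailySalesStore` interface. The real nocodbClient.ts API is not shown in this document, so the method names are assumptions, and the in-memory class exists only to exercise the logic.

```typescript
interface Row {
  Date: string; // "YYYY-MM-DD"
  MuseumName: string;
  Channel: string;
}

// Hypothetical storage interface standing in for the NocoDB client.
interface DailySalesStore {
  deleteByDateRange(start: string, endExclusive: string): Promise<void>;
  deleteAll(): Promise<void>;
  insertMany(rows: Row[]): Promise<void>;
}

// With a range: incremental sync (replace one month).
// Without a range: full sync (replace everything).
async function replaceRows(
  store: DailySalesStore,
  rows: Row[],
  range?: { start: string; endExclusive: string }
): Promise<void> {
  if (range) {
    await store.deleteByDateRange(range.start, range.endExclusive);
  } else {
    await store.deleteAll();
  }
  await store.insertMany(rows);
}

// In-memory stand-in, used only to exercise the sketch.
class MemoryStore implements DailySalesStore {
  rows: Row[] = [];
  async deleteByDateRange(start: string, endExclusive: string): Promise<void> {
    // Date-only ISO strings compare correctly as plain strings.
    this.rows = this.rows.filter(r => r.Date < start || r.Date >= endExclusive);
  }
  async deleteAll(): Promise<void> {
    this.rows = [];
  }
  async insertMany(rows: Row[]): Promise<void> {
    this.rows.push(...rows);
  }
}
```

The range bounds here are date-only strings such as "2025-03-01". Comparing a date-only column against the ERP-style "2025-03-01T00:00:00" bound as strings would leave the first day of the month undeleted, so the bounds should be trimmed to the date part first.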
Race condition note
During the delete/insert window, dashboard reads may see incomplete data. Mitigations:
- The sync runs at 2am when traffic is minimal
- The client's localStorage cache (7-day TTL) means most page loads never hit NocoDB
- The client checks if fetched data is suspiciously small (< 10 rows) and prefers cached data over a likely-incomplete NocoDB read
- For full syncs, the window is larger (~2-5 minutes). If this becomes a problem, a shadow-table swap pattern can be added later.
Client Changes
dataService.ts
Revert to reading from NocoDB. The DailySales table is flat, so no joins needed. Must use paginated fetch (NocoDB defaults to 25 rows per page, max 1000). The existing fetchNocoDBTable() helper already handles pagination — reintroduce it.
async function fetchFromNocoDB(): Promise<MuseumRecord[]> {
  const tables = await discoverTableIds();
  const rows = await fetchNocoDBTable<NocoDBDailySale>(tables['DailySales']);
  return rows.map(row => ({
    date: row.Date,
    museum_name: row.MuseumName,
    channel: row.Channel,
    visits: row.Visits,
    tickets: row.Tickets,
    revenue_gross: row.GrossRevenue,
    revenue_net: row.NetRevenue,
    year: row.Date.substring(0, 4),
    quarter: computeQuarter(row.Date),
  }));
}
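The existing fetchNocoDBTable() helper is not reproduced in this document. As a rough illustration of the pagination it has to perform, here is a sketch with an injected page fetcher. The name `fetchAllRows` is hypothetical, and the `list` / `pageInfo.isLastPage` response shape is assumed to match NocoDB's list-records response.

```typescript
// Assumed shape of one page of results.
interface Page<T> {
  list: T[];
  pageInfo: { isLastPage: boolean };
}

// fetchPage is injected so the loop can run without a NocoDB server.
async function fetchAllRows<T>(
  fetchPage: (offset: number, limit: number) => Promise<Page<T>>,
  pageSize: number = 1000
): Promise<T[]> {
  const rows: T[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage(offset, pageSize);
    rows.push(...page.list);
    // Stop on the last page; the empty-list check guards against
    // looping forever if the server never reports isLastPage.
    if (page.pageInfo.isLastPage || page.list.length === 0) break;
    offset += page.list.length;
  }
  return rows;
}
```

In a real client, `fetchPage` would wrap an HTTP request that passes `offset` and `limit` as query parameters. Advancing by the number of rows actually returned keeps the loop correct if the server caps `limit` below the requested page size.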
Add a NocoDBDailySale type to src/types/index.ts:
export interface NocoDBDailySale {
  Id: number;
  Date: string;
  MuseumName: string;
  Channel: string;
  Visits: number;
  Tickets: number;
  GrossRevenue: number;
  NetRevenue: number;
}
No Districts, Museums, or DailyStats tables needed — just DailySales and PilgrimStats.
Suspicious data check
In fetchData(), if NocoDB returns fewer than 10 rows and a cache exists, prefer the cache:
if (data.length < 10 && cached) {
  console.warn('NocoDB returned suspiciously few rows, using cache');
  return { data: cached.data, fromCache: true, cacheTimestamp: cached.timestamp };
}
Server Environment
Add to server/.env:
NOCODB_URL=http://localhost:8090
NOCODB_TOKEN=<token>
NOCODB_BASE_ID=<base_id>
ETL_SECRET=<random-secret-for-cron>
Note: Client .env.local retains its existing VITE_NOCODB_* vars — the client still reads NocoDB directly for both DailySales and PilgrimStats.
Update server/.env.example with the same keys (placeholder values).
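On the server these values would be read from process.env in server/src/config.ts. A minimal sketch of a loader that fails fast when a variable is missing follows. The `ServerConfig` shape and the `loadServerConfig` name are hypothetical, and the environment is passed in as a parameter so the function stays testable.

```typescript
interface ServerConfig {
  nocodbUrl: string;
  nocodbToken: string;
  nocodbBaseId: string;
  etlSecret: string;
}

function loadServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const missing: string[] = [];
  // Collect every missing name so one error reports them all.
  const read = (name: string): string => {
    const value = env[name];
    if (!value) missing.push(name);
    return value ?? "";
  };
  const config: ServerConfig = {
    nocodbUrl: read("NOCODB_URL"),
    nocodbToken: read("NOCODB_TOKEN"),
    nocodbBaseId: read("NOCODB_BASE_ID"),
    etlSecret: read("ETL_SECRET"),
  };
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return config;
}
```

At startup this would be called as `loadServerConfig(process.env)`. Failing at boot keeps a missing ETL_SECRET from silently leaving the sync endpoint unprotected.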
Server-Side Types
ERP types are re-declared in server/src/types.ts (not imported from the client src/types/index.ts):
export interface ERPProduct {
  ProductDescription: string;
  SiteDescription: string | null;
  UnitQuantity: number;
  PeopleCount: number;
  TaxAmount: number;
  TotalPrice: number;
}

export interface ERPSaleRecord {
  SaleId: number;
  TransactionDate: string;
  CustIdentification: string;
  OperatingAreaName: string;
  Payments: Array<{ PaymentMethodDescription: string }>;
  Products: ERPProduct[];
}

export interface AggregatedRecord {
  Date: string;
  MuseumName: string;
  Channel: string;
  Visits: number;
  Tickets: number;
  GrossRevenue: number;
  NetRevenue: number;
}
Cron
0 2 * * * curl -s -H "Authorization: Bearer $ETL_SECRET" -X POST http://localhost:3002/api/etl/sync
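Runs daily at 2am. The incremental mode fetches only the current month (~15-25K transactions), aggregates server-side, and writes ~300-500 rows to NocoDB. Cron jobs do not inherit the login shell's environment, so ETL_SECRET must be defined in the crontab itself or loaded by a wrapper script; otherwise the header is sent with an empty token and the request gets 401.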
What's NOT Changing
- PilgrimStats still fetched from NocoDB by the client (unchanged)
- Client .env.local retains VITE_NOCODB_* vars (still needed for client reads)
- All dashboard UI components (Dashboard, Comparison) stay as-is
- Channel and museum filters stay as-is
- Cache/offline fallback logic stays as-is (enhanced with suspicious-data check)
- Dark mode, i18n, accessibility — all unchanged