# ETL Pipeline: ERP → NocoDB Daily Sales

## Goal

Replace the current client-side ERP fetching (which downloads hundreds of MBs of raw transactions to the browser) with a server-side ETL pipeline that aggregates ERP data into NocoDB. The dashboard then reads pre-aggregated data from NocoDB, which keeps page loads fast and lightweight.

## Data Flow

```
Daily (2am cron):
  ERP API → Server (fetch + aggregate) → NocoDB "DailySales" table

On page load:
  NocoDB "DailySales" → Dashboard client (small payload, fast)
```

## NocoDB "DailySales" Table

One row per date/museum/channel combination. The table is flat, so no lookup tables are needed.

| Column | Type | Example |
|--------|------|---------|
| Date | string | `2025-03-01` |
| MuseumName | string | `Revelation Exhibition` |
| Channel | string | `HiHala Website/App` |
| Visits | number | `702` |
| Tickets | number | `71` |
| GrossRevenue | number | `12049.00` |
| NetRevenue | number | `10477.40` |

Museums are derived from product descriptions using a priority-ordered keyword mapping (46 products → 6 museums). Channels are derived from `OperatingAreaName` with display labels (e.g. B2C → "HiHala Website/App").

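A minimal sketch of how a priority-ordered keyword mapping could behave. Only the function names `getMuseumFromProduct()` and `getChannelLabel()`, the `Revelation Exhibition` museum, and the B2C label come from this document; the keyword rules, the placeholder museums, and both fallback behaviours are assumptions made for illustration.

```typescript
// Hypothetical rules: earlier entries win, so more specific keywords go first.
const MUSEUM_RULES: Array<{ keyword: string; museum: string }> = [
  { keyword: "revelation", museum: "Revelation Exhibition" },
  { keyword: "example combo", museum: "Example Museum A" }, // placeholder rule
  { keyword: "example", museum: "Example Museum B" },       // placeholder rule
];

// Only the B2C entry is taken from this document.
const CHANNEL_LABELS: Record<string, string> = {
  B2C: "HiHala Website/App",
};

function getMuseumFromProduct(description: string): string {
  const text = description.toLowerCase();
  for (const rule of MUSEUM_RULES) {
    // First matching keyword decides the museum.
    if (text.includes(rule.keyword)) return rule.museum;
  }
  return "Unknown"; // assumed fallback; the real mapping may differ
}

function getChannelLabel(operatingAreaName: string): string {
  // Assumed fallback: show the raw ERP name when no label is configured.
  return CHANNEL_LABELS[operatingAreaName] ?? operatingAreaName;
}
```

Because the list is scanned in order, a product such as "Example Combo Pass" resolves to the first, more specific rule even though the broader `example` keyword also matches.
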
## Server Architecture

### New files

| File | Responsibility |
|------|----------------|
| `server/src/config/museumMapping.ts` | Product → museum mapping, channel labels (moved from client) |
| `server/src/types.ts` | Server-side ERP types (`ERPSaleRecord`, `ERPProduct`, `ERPPayment`, `AggregatedRecord`) |
| `server/src/services/nocodbClient.ts` | NocoDB table discovery (via `process.env`, NOT `import.meta.env`) + paginated read/write |
| `server/src/services/etlSync.ts` | Orchestrate: fetch ERP → aggregate → write NocoDB |
| `server/src/routes/etl.ts` | `POST /api/etl/sync` endpoint (protected by secret token) |

### Modified files

| File | Change |
|------|--------|
| `server/src/config.ts` | Add NocoDB config (`process.env.NOCODB_*`) |
| `server/src/index.ts` | Mount ETL route |
| `server/.env` | Add `NOCODB_*` and `ETL_SECRET` vars |
| `server/.env.example` | Add `NOCODB_*` and `ETL_SECRET` placeholders |
| `src/services/dataService.ts` | Revert to NocoDB fetch with paginated reads for DailySales |

### Removed files

| File | Reason |
|------|--------|
| `server/src/routes/erp.ts` | Client no longer calls ERP directly |
| `src/services/erpService.ts` | Client no longer aggregates transactions |
| `src/config/museumMapping.ts` | Moved to server |

## ETL Sync Endpoint

```
POST /api/etl/sync?mode=full|incremental
Authorization: Bearer <ETL_SECRET>
```

Protected by a secret token (`ETL_SECRET` env var). Requests without a valid token get a 401 response. The cron job passes the token: `curl -H "Authorization: Bearer $ETL_SECRET" -X POST ...`.

- **incremental** (default): fetch the current month from ERP, aggregate, and upsert into NocoDB. Used by the daily cron.
- **full**: fetch all months from 2024-01 to now, then clear and replace all NocoDB DailySales data. Used for initial setup or recovery.

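The token check and mode handling described above could be sketched as plain functions that a route handler calls before starting a sync. The names `isAuthorized`, `parseMode`, and `checkSyncRequest` are hypothetical, and returning 400 for an unrecognised `mode` is an assumption; this document only specifies the 401 case and the incremental default.

```typescript
type SyncMode = "incremental" | "full";

type SyncCheck =
  | { status: 401 | 400 }
  | { status: 200; mode: SyncMode };

// True only when the header is exactly "Bearer <secret>" and a secret is configured.
function isAuthorized(authHeader: string | undefined, secret: string): boolean {
  if (!secret || !authHeader) return false;
  return authHeader === `Bearer ${secret}`;
}

// A missing mode falls back to incremental; unknown values yield null.
function parseMode(raw: string | undefined): SyncMode | null {
  if (raw === undefined || raw === "") return "incremental";
  return raw === "incremental" || raw === "full" ? raw : null;
}

// Combines both checks into the status the endpoint would answer with.
function checkSyncRequest(
  authHeader: string | undefined,
  rawMode: string | undefined,
  secret: string,
): SyncCheck {
  if (!isAuthorized(authHeader, secret)) return { status: 401 };
  const mode = parseMode(rawMode);
  if (mode === null) return { status: 400 }; // assumed behaviour for bad input
  return { status: 200, mode };
}
```

Keeping these checks free of framework types makes them easy to unit test; a production handler might additionally prefer a constant-time comparison such as Node's `crypto.timingSafeEqual`.
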
### Incremental date range

The current month is defined as:

- `startDate`: `YYYY-MM-01T00:00:00` (first of current month)
- `endDate`: `YYYY-{MM+1}-01T00:00:00` (first of next month, exclusive; December rolls over to January of the following year)

This matches the convention already used in the `erpService.ts` month boundary generation.

Example response from the sync endpoint:

```json
{
  "status": "ok",
  "mode": "incremental",
  "transactionsFetched": 12744,
  "recordsWritten": 342,
  "duration": "8.2s"
}
```

## Aggregation Logic

For each ERP transaction:

1. Extract the date from `TransactionDate` (split on space, take the first part)
2. Map `OperatingAreaName` → channel label via `getChannelLabel()`
3. For each product in `Products[]`:
   - Map `ProductDescription` → museum name via `getMuseumFromProduct()` (priority-ordered keyword matching)
   - Accumulate into the composite key `date|museum|channel`:
     - `Visits += PeopleCount`
     - `Tickets += UnitQuantity`
     - `GrossRevenue += TotalPrice`
     - `NetRevenue += TotalPrice - TaxAmount`

Refunds appear as negative quantities, so plain summation nets them out with no special handling.

## NocoDB Upsert Strategy

For **incremental** sync:

1. Delete all rows in DailySales where `Date` falls within the fetched month range
2. Insert the newly aggregated rows

For **full** sync:

1. Delete all rows in DailySales
2. Insert all aggregated rows

This avoids duplicate-detection complexity: the sync simply replaces the month's data.

### Race condition note

During the delete/insert window, dashboard reads may see incomplete data. Mitigations:

- The sync runs at 2am, when traffic is minimal
- The client's localStorage cache (7-day TTL) means most page loads never hit NocoDB
- The client checks whether the fetched data is suspiciously small (< 10 rows) and prefers cached data over a likely-incomplete NocoDB read
- For full syncs, the window is larger (~2-5 minutes). If this becomes a problem, a shadow-table swap pattern can be added later.

## Client Changes

### dataService.ts

Revert to reading from NocoDB. The `DailySales` table is flat, so no joins are needed.

**Must use paginated fetch** (NocoDB defaults to 25 rows per page, max 1000). The existing `fetchNocoDBTable()` helper already handles pagination, so reintroduce it.

```typescript
// Return type is inferred from the mapped row shape.
async function fetchFromNocoDB() {
  const tables = await discoverTableIds();
  // fetchNocoDBTable() pages through the table until every row is read.
  const rows = await fetchNocoDBTable(tables['DailySales']);
  return rows.map(row => ({
    date: row.Date,
    museum_name: row.MuseumName,
    channel: row.Channel,
    visits: row.Visits,
    tickets: row.Tickets,
    revenue_gross: row.GrossRevenue,
    revenue_net: row.NetRevenue,
    year: row.Date.substring(0, 4),
    quarter: computeQuarter(row.Date),
  }));
}
```

Add a `NocoDBDailySale` type to `src/types/index.ts`:

```typescript
export interface NocoDBDailySale {
  Id: number;
  Date: string;
  MuseumName: string;
  Channel: string;
  Visits: number;
  Tickets: number;
  GrossRevenue: number;
  NetRevenue: number;
}
```

No `Districts`, `Museums`, or `DailyStats` tables are needed, just `DailySales` and `PilgrimStats`.

### Suspicious data check

In `fetchData()`, if NocoDB returns fewer than 10 rows and a cache exists, prefer the cache:

```typescript
if (data.length < 10 && cached) {
  console.warn('NocoDB returned suspiciously few rows, using cache');
  return { data: cached.data, fromCache: true, cacheTimestamp: cached.timestamp };
}
```

## Server Environment

Add to `server/.env`:

```
NOCODB_URL=http://localhost:8090
NOCODB_TOKEN=
NOCODB_BASE_ID=
ETL_SECRET=
```

**Note:** Client `.env.local` retains its existing `VITE_NOCODB_*` vars, because the client still reads NocoDB directly for both DailySales and PilgrimStats.

Update `server/.env.example` with the same keys (placeholder values).

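One way `server/src/config.ts` might read these variables is sketched below. The `loadConfig` and `requireEnv` helpers, the `ServerConfig` shape, the fail-fast behaviour on missing values, and the trailing-slash trimming are all assumptions; this document only names the four environment keys.

```typescript
interface NocoDBConfig {
  url: string;
  token: string;
  baseId: string;
}

interface ServerConfig {
  nocodb: NocoDBConfig;
  etlSecret: string;
}

type Env = Record<string, string | undefined>;

// Reads a required variable and fails fast with a clear message if it is unset or blank.
function requireEnv(env: Env, key: string): string {
  const value = env[key]?.trim();
  if (!value) throw new Error(`Missing required environment variable: ${key}`);
  return value;
}

function loadConfig(env: Env): ServerConfig {
  return {
    nocodb: {
      // Strip trailing slashes so later path joins stay predictable.
      url: requireEnv(env, "NOCODB_URL").replace(/\/+$/, ""),
      token: requireEnv(env, "NOCODB_TOKEN"),
      baseId: requireEnv(env, "NOCODB_BASE_ID"),
    },
    etlSecret: requireEnv(env, "ETL_SECRET"),
  };
}
```

On the server this would be called once at startup as `loadConfig(process.env)`, so a missing `ETL_SECRET` stops the process instead of leaving the sync endpoint unprotected.
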
## Server-Side Types

ERP types are re-declared in `server/src/types.ts` (not imported from the client `src/types/index.ts`):

```typescript
export interface ERPProduct {
  ProductDescription: string;
  SiteDescription: string | null;
  UnitQuantity: number;
  PeopleCount: number;
  TaxAmount: number;
  TotalPrice: number;
}

export interface ERPSaleRecord {
  SaleId: number;
  TransactionDate: string;
  CustIdentification: string;
  OperatingAreaName: string;
  Payments: Array<{ PaymentMethodDescription: string }>;
  Products: ERPProduct[];
}

export interface AggregatedRecord {
  Date: string;
  MuseumName: string;
  Channel: string;
  Visits: number;
  Tickets: number;
  GrossRevenue: number;
  NetRevenue: number;
}
```

## Cron

```bash
0 2 * * * curl -s -H "Authorization: Bearer $ETL_SECRET" -X POST http://localhost:3002/api/etl/sync
```

Runs daily at 2am. The incremental mode fetches only the current month (~15-25K transactions), aggregates server-side, and writes ~300-500 rows to NocoDB.

## What's NOT Changing

- PilgrimStats still fetched from NocoDB by the client (unchanged)
- Client `.env.local` retains `VITE_NOCODB_*` vars (still needed for client reads)
- All dashboard UI components (Dashboard, Comparison) stay as-is
- Channel and museum filters stay as-is
- Cache/offline fallback logic stays as-is (enhanced with suspicious-data check)
- Dark mode, i18n, accessibility: all unchanged