Data Source Connect
Replace the demo unicorns data with user's own data source. Supports two modes:
- Database URL - Connect to an existing PostgreSQL database
- CSV File - Import a CSV file and create a new table
Mode 1: Database URL
When user provides a PostgreSQL connection URL:
Step 1: Update Environment
Update .env file:
POSTGRES_URL="<user-provided-connection-string>"
Step 2: Discover Schema
Connect to the database and retrieve schema information:
SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
Step 3: Update Actions
Edit app/actions.ts - update the schema in these locations:
-
generateQueryfunction (~line 18): Replace the unicorns schema in the system prompt with the new table schema. Include:- Table name and column definitions
- Data type hints (e.g., "valuation is in billions")
- Available enum values for categorical fields
- Query patterns specific to the data
-
explainQueryfunction (~line 115): Replace the unicorns schema in the system prompt. -
runGenerateSQLQueryfunction (~line 93): Update the table name in error handling.
Step 4: Generate Sample Prompts
Create 3-5 sample natural language queries for the new data to help users understand what they can ask.
Mode 2: CSV File
When user provides a CSV file:
Step 1: Analyze CSV Structure
Read the CSV file and analyze:
- Column names (header row)
- Data types (infer from sample values)
- Null/empty value handling requirements
Step 2: Generate Schema
Create SQL schema based on CSV structure. Type inference rules:
- Numbers with decimals ->
DECIMAL(10, 2) - Integers ->
INTEGER - Dates (various formats) ->
DATE - Boolean values ->
BOOLEAN - Everything else ->
VARCHAR(255)orTEXTfor long content
Step 3: Update seed.ts
Edit lib/seed.ts:
- Update
CREATE TABLEstatement with new schema - Update CSV parsing logic for new column names
- Add type conversion for each column
- Handle missing/null values with appropriate defaults or skip logic
Example seed.ts structure:
export async function seed() {
await pool.query(`
CREATE TABLE IF NOT EXISTS <table_name> (
id SERIAL PRIMARY KEY,
<column1> <type1>,
<column2> <type2>,
...
);
`);
// Parse CSV
const results: any[] = [];
const csvFilePath = path.join(process.cwd(), '<filename>.csv');
await new Promise((resolve, reject) => {
fs.createReadStream(csvFilePath)
.pipe(csv())
.on('data', (data) => results.push(data))
.on('end', resolve)
.on('error', reject);
});
// Insert with null handling
for (const row of results) {
// Skip rows with critical missing data or use defaults
const value1 = row['ColumnName'] || null;
await pool.query(
`INSERT INTO <table_name> (...) VALUES (...) ON CONFLICT DO NOTHING`,
[value1, ...]
);
}
}
Step 4: Update Actions
Same as Database URL Mode Step 3 - update schema references in app/actions.ts.
Step 5: Run Seed
pnpm run seed
Step 6: Generate Sample Prompts
Create 3-5 sample natural language queries for the new data to help users understand what they can ask. Examples should cover:
- Basic filtering (e.g., "Show all records where X > Y")
- Aggregations (e.g., "What is the total/average X by Y?")
- Time-based queries if applicable (e.g., "Show trends over time")
- Top/bottom queries (e.g., "Top 10 by X")
Critical Files to Update
| File | What to Update |
|------|----------------|
| .env | POSTGRES_URL connection string |
| lib/seed.ts | Table schema, CSV parsing, insert logic |
| app/actions.ts | Schema in generateQuery and explainQuery prompts |
Schema Prompt Template
When updating the schema in app/actions.ts, use this template:
system: `You are a SQL (postgres) and data visualization expert. Your job is to help the user write a SQL query to retrieve the data they need. The table schema is as follows:
<table_name> (
id SERIAL PRIMARY KEY,
<column_name> <data_type> <constraints>,
...
);
Only retrieval queries are allowed.
[Add data-specific hints here, e.g.:]
- Use ILIKE for case-insensitive text search
- <field_name> values include: value1, value2, value3
- <numeric_field> is in <units> (e.g., thousands, millions)
EVERY QUERY SHOULD RETURN QUANTITATIVE DATA THAT CAN BE PLOTTED ON A CHART!
`,
Null Value Handling
For CSV imports with missing data:
- Skip row: When critical identifier is missing
- Default value: Use sensible defaults (0 for numbers, 'Unknown' for strings)
- Allow null: For truly optional fields
微信扫一扫