Pinecone Integration

Pinecone is the vector database powering the RAG (Retrieval-Augmented Generation) system for AI email generation.

Architecture (RAG v2)

A single Pinecone index holds vectors plus small metadata. The full HTML lives in the Supabase `rag_documents` table (service-role only, no size cap) and is fetched by ID after retrieval; Pinecone never stores document bodies.

Store	Holds
Pinecone v2 index	Embedding vector + small metadata + `doc_id`
Supabase `rag_documents`	Full HTML, metadata, embedding text (source of truth)
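A minimal TypeScript sketch of this split, assuming a simplified row shape. `RagDocumentRow`, `PineconeVectorRecord`, and `toPineconeRecord` are hypothetical names, and reusing the document ID as the vector ID is an assumption. It illustrates that the HTML body stays in the Supabase row, and only the embedding, the small metadata, and `doc_id` go to Pinecone.

```typescript
// Hypothetical, simplified shape of a row in the Supabase `rag_documents` table.
interface RagDocumentRow {
  id: string;                                   // e.g. "html-1720000000000-abc123"
  html: string;                                 // full body: stored only in Supabase
  embeddingText: string;                        // text that was embedded
  metadata: Record<string, string | string[]>;  // small filtering metadata
}

// Shape of a vector record as upserted to Pinecone.
interface PineconeVectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string | string[]>;
}

// Build the Pinecone record for a row: vector + small metadata + doc_id.
// The HTML body is deliberately left out (assumption: vector ID = document ID).
function toPineconeRecord(row: RagDocumentRow, embedding: number[]): PineconeVectorRecord {
  return {
    id: row.id,
    values: embedding,
    metadata: { ...row.metadata, doc_id: row.id },
  };
}
```

Keeping `doc_id` in the metadata means a query result alone is enough to look up the full row later.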

Embeddings use OpenAI text-embedding-3-large (3072 dims).

Configuration

PINECONE_API_KEY=your-api-key
PINECONE_INDEX_NAME_V2=your-v2-index    # 3072-dim, text-embedding-3-large
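A hedged sketch of fail-fast configuration loading. `loadRagConfig` and `assertEmbeddingDims` are hypothetical helpers. The environment is passed in as a plain map (in Node you would pass `process.env`) so the sketch stays self-contained, and the 3072 check reflects the text-embedding-3-large dimension stated above.

```typescript
// Dimension of OpenAI text-embedding-3-large vectors, as configured for the v2 index.
const EMBEDDING_DIMS = 3072;

interface RagConfig {
  apiKey: string;
  indexName: string;
}

// Read both variables from an env-like map and fail fast if either is missing or empty.
function loadRagConfig(env: Record<string, string | undefined>): RagConfig {
  const apiKey = env["PINECONE_API_KEY"];
  const indexName = env["PINECONE_INDEX_NAME_V2"];
  if (!apiKey) throw new Error("PINECONE_API_KEY is not set");
  if (!indexName) throw new Error("PINECONE_INDEX_NAME_V2 is not set");
  return { apiKey, indexName };
}

// Guard against upserting or querying with a vector from a different embedding model.
function assertEmbeddingDims(vector: number[]): void {
  if (vector.length !== EMBEDDING_DIMS) {
    throw new Error(`expected ${EMBEDDING_DIMS} dims, got ${vector.length}`);
  }
}
```

A dimension mismatch would otherwise surface only as an error from the index, so checking locally gives a clearer message.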

Vector Metadata

Each vector stores small filtering metadata (never the HTML itself). The structure varies by content type:

HTML Code Examples (`type: 'html'`)

{
  "type": "html",
  "doc_id": "html-1720000000000-abc123",
  "description": "Product showcase with tabbed navigation",
  "technique": "tabs",
  "complexity": "intermediate",
  "htmlType": "complete",
  "emailPurpose": "ecommerce",
  "exampleType": "positive",
  "keyFeatures": ["lightswitch", "mobileResponsive"],
  "bestPracticeTags": ["tableStructure", "msoConditionals"],
  "submittedAt": "2026-01-15T..."
}

AMP Code Examples (`type: 'amp'`)

{
  "type": "amp",
  "doc_id": "amp-1720000000000-def456",
  "description": "AMP tabbed product showcase",
  "technique": "tabs",
  "complexity": "intermediate",
  "htmlType": "complete",
  "ampComponents": ["amp-selector", "amp-bind"],
  "ampValidator": "pass",
  "submittedAt": "2026-01-15T..."
}

Blog Articles (`type: 'blog'`)

{
  "type": "blog",
  "doc_id": "blog-1720000000000-ghi789",
  "contentFocus": "kinetic",
  "blogTitle": "Building Accessible Tab Interfaces in Email",
  "blogTopic": "kinetic-techniques",
  "learningLevel": "intermediate",
  "techniquesCovered": ["tabs", "accessibility"],
  "keyTakeaways": "Summary of key learnings...",
  "submittedAt": "2026-01-15T..."
}

The `contentFocus` field (`kinetic` | `amp` | `general`, defaulting to `general`) tags which build type a blog article applies to. Blog and concept prose is excluded from the code-generation context: a "build me tabs" query pulls the tabs module, never a blog paragraph about tabs.
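One way to model these metadata shapes is a TypeScript discriminated union on `type`. The sketch types only a subset of the fields shown above, and the helper names `contentFocusOf` and `codeContextOnly` are illustrative, not the project's API. It shows the `general` default for `contentFocus` and the exclusion of blog prose from the code-generation context.

```typescript
// Subset of the metadata fields above, discriminated on `type`.
interface HtmlMeta {
  type: "html";
  doc_id: string;
  technique: string;
  exampleType?: string; // e.g. "positive"
}
interface AmpMeta {
  type: "amp";
  doc_id: string;
  technique: string;
  ampComponents?: string[];
}
interface BlogMeta {
  type: "blog";
  doc_id: string;
  contentFocus?: "kinetic" | "amp" | "general";
  techniquesCovered?: string[];
}
type RagMetadata = HtmlMeta | AmpMeta | BlogMeta;

// A blog without an explicit contentFocus is treated as "general".
function contentFocusOf(meta: BlogMeta): "kinetic" | "amp" | "general" {
  return meta.contentFocus ?? "general";
}

// Keep only code examples for the code-generation context; blog prose is dropped.
function codeContextOnly(metas: RagMetadata[]): (HtmlMeta | AmpMeta)[] {
  return metas.filter((m): m is HtmlMeta | AmpMeta => m.type !== "blog");
}
```

The type guard in `codeContextOnly` lets later steps read `technique` without re-checking `type`.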

Retrieval Pipeline

Two queries run in parallel for every RAG-enabled generation:

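import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index(process.env.PINECONE_INDEX_NAME_V2!);
// Run both concurrently; `vector` is the query embedding, `detectedTechnique` e.g. 'tabs'
const [filtered, broad] = await Promise.all([
  // Query A: technique-filtered (best code for the specific technique)
  index.query({ vector, topK: 10, includeMetadata: true,
    filter: { technique: { $eq: detectedTechnique } } }),
  // Query B: broad unfiltered (cross-technique patterns)
  index.query({ vector, topK: 15, includeMetadata: true }),
]);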

Results are merged and deduplicated, gated at a 50% similarity score (positive examples only), re-ranked by Claude down to the top 7, and then hydrated — the full HTML for surviving matches is fetched from Supabase rag_documents by ID.
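The post-query steps can be sketched as plain functions. This is an illustration under several assumptions: the Claude re-rank and the Supabase lookup are injected as the hypothetical callbacks `rerank` and `fetchHtml`; duplicates keep their highest score; "(positive examples only)" is read as dropping matches whose `exampleType` is anything other than `positive`, with a missing `exampleType` (as on the AMP and blog metadata) treated as positive; and matches whose body is missing from Supabase are dropped.

```typescript
// Minimal match shape, mirroring the id/score/metadata fields of a query result.
interface Match {
  id: string;
  score: number;
  metadata: { doc_id: string; exampleType?: string };
}

interface HydratedMatch extends Match {
  html: string;
}

const SCORE_GATE = 0.5; // the "50% similarity score" gate
const FINAL_TOP_N = 7;  // re-rank keeps the top 7

// Merge both result lists, keeping the highest-scoring copy of each id.
function mergeDedupe(a: Match[], b: Match[]): Match[] {
  const best = new Map<string, Match>();
  for (const m of [...a, ...b]) {
    const prev = best.get(m.id);
    if (!prev || m.score > prev.score) best.set(m.id, m);
  }
  return Array.from(best.values());
}

// Drop low-similarity matches and anything not marked positive
// (assumption: a missing exampleType counts as positive).
function gate(matches: Match[]): Match[] {
  return matches.filter(
    (m) => m.score >= SCORE_GATE && (m.metadata.exampleType ?? "positive") === "positive",
  );
}

async function retrieve(
  filtered: Match[],
  broad: Match[],
  rerank: (ms: Match[]) => Promise<Match[]>,                     // hypothetical stand-in for the Claude re-rank
  fetchHtml: (docIds: string[]) => Promise<Map<string, string>>, // hypothetical stand-in for the Supabase lookup
): Promise<HydratedMatch[]> {
  const ranked = await rerank(gate(mergeDedupe(filtered, broad)));
  const survivors = ranked.slice(0, FINAL_TOP_N);
  const bodies = await fetchHtml(survivors.map((m) => m.metadata.doc_id));
  const hydrated: HydratedMatch[] = [];
  for (const m of survivors) {
    const html = bodies.get(m.metadata.doc_id);
    if (html !== undefined) hydrated.push({ ...m, html }); // skip rows missing from Supabase
  }
  return hydrated;
}
```

Injecting `rerank` and `fetchHtml` keeps the ordering and gating logic testable without any network calls.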

Content Management

Admin endpoints manage RAG content:

Endpoint	Action
`POST /api/admin/submit-content`	Embed + store (Supabase row + Pinecone vector)
`POST /api/admin/update-content`	Re-embed and update existing content
`POST /api/admin/delete-content`	Remove content by ID
`GET /api/admin/list-content`	Browse the full library via `listPaginated` (no topK cap)
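Because a paginated listing returns IDs one page at a time, browsing the full library means following continuation tokens until none remain. A minimal sketch with a hypothetical `listAllIds` helper: the `ListPage` shape models the `vectors` and `pagination.next` fields of the SDK's list response, and the page fetcher is injected so it could wrap something like `index.listPaginated({ paginationToken })` (check the SDK reference for the exact options).

```typescript
// One page of a paginated ID listing (modelled on the SDK's list response).
interface ListPage {
  vectors?: { id?: string }[];
  pagination?: { next?: string };
}

// Follow continuation tokens until the listing is exhausted, collecting every ID.
async function listAllIds(
  fetchPage: (paginationToken?: string) => Promise<ListPage>,
): Promise<string[]> {
  const ids: string[] = [];
  let token: string | undefined = undefined;
  do {
    const page: ListPage = await fetchPage(token);
    for (const v of page.vectors ?? []) {
      if (v.id) ids.push(v.id);
    }
    token = page.pagination?.next;
  } while (token);
  return ids;
}
```

Unlike a similarity query, this walk is not bounded by `topK`, which is why it suits a full-library browse.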

See RAG Overview for the full retrieval pipeline.
