Installation guide
This guide covers complete installation of Finance Agent, including database setup, data ingestion, and production deployment.
System requirements
Minimum requirements
Python: 3.9 or higher
PostgreSQL: 12+ with pgvector extension
RAM: 4GB minimum, 8GB recommended
Disk: 10GB for application + data
OS: Linux, macOS, or Windows with WSL2
Optional components
Redis: For session caching and WebSocket management
DuckDB: For financial screener (included in requirements)
AWS S3: For storing full transcript and filing documents
Installation methods
Local development
Production deployment
Docker
Step 1: Install Python and dependencies
# Check Python version (3.9+ required)
python --version
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Clone repository
git clone https://github.com/kamathhrishi/stratalensai.git
cd stratalensai
# Install dependencies
pip install -r requirements.txt
Step 2: Install PostgreSQL with pgvector
# Install PostgreSQL
sudo apt update
sudo apt install postgresql postgresql-contrib
# Install pgvector extension
sudo apt install postgresql-server-dev-all
cd /tmp
git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
# Create database
sudo -u postgres createdb stratalens
sudo -u postgres psql stratalens -c "CREATE EXTENSION vector;"
# Install PostgreSQL via Homebrew
brew install postgresql@14
brew services start postgresql@14
# Install pgvector
brew install pgvector
# Create database
createdb stratalens
psql stratalens -c "CREATE EXTENSION vector;"
# Run PostgreSQL with pgvector using Docker
docker run -d \
--name finance-agent-postgres \
-e POSTGRES_DB=stratalens \
-e POSTGRES_PASSWORD=changeme \
-p 5432:5432 \
pgvector/pgvector:pg14
# Verify pgvector extension
docker exec -it finance-agent-postgres psql -U postgres -d stratalens -c "CREATE EXTENSION IF NOT EXISTS vector;"
Step 3: Install Redis (optional)
# Ubuntu/Debian
sudo apt install redis-server
sudo systemctl start redis
# macOS
brew install redis
brew services start redis
# Docker
docker run -d --name finance-agent-redis -p 6379:6379 redis:7-alpine
Step 4: Configure environment
Copy the example environment file and edit .env with your configuration:
# ========================================
# AI Model API Keys (REQUIRED)
# ========================================
OPENAI_API_KEY=sk-your-openai-api-key-here
CEREBRAS_API_KEY=your-cerebras-api-key-here
API_NINJAS_KEY=your-api-ninjas-key-here
# Which LLM to use (cerebras | openai | auto)
RAG_LLM_PROVIDER=cerebras
# Optional: Real-time news search
TAVILY_API_KEY=your-tavily-api-key-here
# ========================================
# Database Configuration
# ========================================
DATABASE_URL=postgresql://postgres:changeme@localhost:5432/stratalens
# ========================================
# Application Settings
# ========================================
ENVIRONMENT=development
PORT=8000
HOST=0.0.0.0
BASE_URL=http://localhost:8000
# ========================================
# Authentication (Production)
# ========================================
# Get these from Clerk Dashboard: https://dashboard.clerk.com
CLERK_SECRET_KEY=sk_test_your-clerk-secret-key
CLERK_PUBLISHABLE_KEY=pk_test_your-clerk-publishable-key
# Frontend (Vite requires VITE_ prefix)
VITE_CLERK_PUBLISHABLE_KEY=pk_test_your-clerk-publishable-key
# Auth bypass for development (set to false in production)
AUTH_DISABLED=true
# ========================================
# Optional Services
# ========================================
REDIS_URL=redis://localhost:6379
LOGFIRE_TOKEN=your-logfire-token-here  # For observability
# ========================================
# Logging
# ========================================
LOG_LEVEL=INFO
RAG_DEBUG_MODE=false  # Set to true for detailed agent reasoning
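The required variables can be sanity-checked at startup so a missing key fails fast instead of surfacing mid-request. A minimal sketch; the `check_env` helper is illustrative, not part of Finance Agent:

```python
import os

# Variables marked "Yes" in the configuration reference below
REQUIRED = ["OPENAI_API_KEY", "API_NINJAS_KEY", "DATABASE_URL"]
# A few documented defaults
DEFAULTS = {"PORT": "8000", "ENVIRONMENT": "development", "RAG_LLM_PROVIDER": "cerebras"}

def check_env(env):
    """Return (settings, missing) for a mapping of environment variables."""
    missing = [key for key in REQUIRED if not env.get(key)]
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return settings, missing

settings, missing = check_env(os.environ)
if missing:
    print("Missing required variables: " + ", ".join(missing))
```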
Step 5: Verify installation
# Test database connection
python -c "import psycopg2; conn = psycopg2.connect('postgresql://postgres:changeme@localhost:5432/stratalens'); print('Database OK')"
# Test dependencies
python -c "import fastapi, openai, langchain, sentence_transformers; print('Dependencies OK')"
# Start server
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000
Access at http://localhost:8000
Railway deployment (recommended)
Finance Agent is optimized for Railway deployment:
Create Railway project
Visit railway.app
Click “New Project”
Select “Deploy from GitHub repo”
Connect your forked repository
Add PostgreSQL service
Click “New Service” → “Database” → “PostgreSQL”
Railway automatically configures DATABASE_URL
Connect to database and install pgvector:
CREATE EXTENSION IF NOT EXISTS vector;
Add Redis service
Click “New Service” → “Database” → “Redis”
Railway automatically configures REDIS_URL
Configure environment variables
Add these to your Railway service:
# AI APIs
OPENAI_API_KEY=sk-...
CEREBRAS_API_KEY=...
API_NINJAS_KEY=...
TAVILY_API_KEY=...  # Optional
# Application
ENVIRONMENT=production
BASE_URL=https://your-app.railway.app
# Auth (from Clerk Dashboard)
CLERK_SECRET_KEY=sk_live_...
CLERK_PUBLISHABLE_KEY=pk_live_...
VITE_CLERK_PUBLISHABLE_KEY=pk_live_...
AUTH_DISABLED=false  # Enable auth in production
# Optional
LOGFIRE_TOKEN=...  # For monitoring
Deploy
Railway automatically builds and deploys on git push. Your app will be available at https://your-app.railway.app
Configuration file
Railway deployment uses railway.toml:
[build]
builder = "nixpacks"
[deploy]
startCommand = "uvicorn app.main:app --host 0.0.0.0 --port $PORT"
restartPolicyType = "on-failure"
restartPolicyMaxRetries = 10
Using Docker Compose
Create docker-compose.yml:
version: '3.8'
services:
  postgres:
    image: pgvector/pgvector:pg14
    environment:
      POSTGRES_DB: stratalens
      POSTGRES_PASSWORD: changeme
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  app:
    build: .
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql://postgres:changeme@postgres:5432/stratalens
      REDIS_URL: redis://redis:6379
    env_file:
      - .env
    depends_on:
      - postgres
      - redis
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
volumes:
  postgres_data:
Create Dockerfile:
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
gcc \
postgresql-client \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Run with:
docker compose up -d
Python dependencies
The requirements.txt includes:
# Core web framework
fastapi
uvicorn[standard]
starlette
# Database and ORM
asyncpg==0.30.0  # PostgreSQL async driver
psycopg2-binary  # PostgreSQL sync driver
SQLAlchemy  # ORM
# Authentication and security
PyJWT[crypto]  # JWT verification (Clerk integration)
python-jose[cryptography]
passlib[bcrypt]==1.7.4
# AI and LLM
openai  # OpenAI API client
cerebras-cloud-sdk  # Cerebras API client
langchain==0.3.18  # LLM framework
langchain-community==0.3.17
langchain-core==0.3.34
langchain-openai
langchain-text-splitters==0.3.6
sentence-transformers==3.4.1  # Embeddings (all-MiniLM-L6-v2)
tiktoken==0.8.0  # Token counting
# Data processing
pandas
numpy==2.2.5
python-multipart
python-dotenv
# Caching and HTTP
redis==5.2.1
requests
httpx
websockets==15.0.1
# Utilities
aiofiles==24.1.0
pydantic==2.10.6
pydantic-settings==2.7.1
tavily==1.1.0  # Real-time news search
tenacity==9.0.0  # Retry logic
logfire[fastapi]  # Observability (optional)
boto3>=1.35.0  # AWS S3 for document storage
Data ingestion
Data ingestion is optional for testing. You can use the live platform at stratalens.ai which already has data loaded.
To set up your own data:
Earnings transcripts
Download transcripts
python agent/rag/data_ingestion/download_transcripts.py
This downloads earnings call transcripts from API Ninjas.
Ingest to database
# Ingest specific company and years
python agent/rag/data_ingestion/ingest_with_structure.py \
--ticker AAPL \
--year-start 2020 \
--year-end 2025
# Batch ingest multiple companies
for ticker in AAPL MSFT GOOGL NVDA TSLA; do
python agent/rag/data_ingestion/ingest_with_structure.py \
--ticker $ticker \
--year-start 2020 \
--year-end 2025
done
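The shell loop above can also be driven from Python, which makes it easier to record which tickers fail instead of aborting the whole batch. A sketch under the assumption that the script accepts the flags documented above; `build_command` and `ingest_all` are illustrative helpers:

```python
import subprocess
import sys

def build_command(ticker, year_start=2020, year_end=2025):
    """Build the ingestion command line for one ticker."""
    return [
        sys.executable, "agent/rag/data_ingestion/ingest_with_structure.py",
        "--ticker", ticker,
        "--year-start", str(year_start),
        "--year-end", str(year_end),
    ]

def ingest_all(tickers, run=subprocess.run):
    """Run ingestion per ticker, collecting failures instead of stopping."""
    failed = []
    for ticker in tickers:
        result = run(build_command(ticker))
        if result.returncode != 0:
            failed.append(ticker)
    return failed
```

The `run` parameter is injectable so the driver can be tested without touching the database.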
Generate embeddings
python agent/rag/data_ingestion/create_and_store_embeddings.py
This creates vector embeddings using sentence-transformers (all-MiniLM-L6-v2).
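Retrieval later ranks chunks by cosine similarity between the query embedding and stored embeddings. The idea in miniature, with toy 4-dimensional vectors standing in for the real 384-dimensional ones:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.3, 0.5, 0.2]
chunk_close = [0.1, 0.3, 0.5, 0.2]   # same direction -> similarity ~1.0
chunk_far = [0.5, -0.3, 0.1, -0.2]   # different direction -> much lower
print(cosine_similarity(query, chunk_close))
print(cosine_similarity(query, chunk_far))
```

Note that pgvector's `<=>` operator returns cosine *distance* (1 - similarity), so lower values mean closer matches.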
SEC 10-K filings
Download 10-K filings
# Download S&P 500 companies' 10-K filings
python agent/rag/data_ingestion/ingest_sp500_10k.py
Process and ingest
python agent/rag/data_ingestion/ingest_10k_to_database.py \
--ticker AAPL \
--year 2024
This:
Parses SEC filings into sections (Item 1, Item 7, Item 8, etc.)
Extracts financial statement tables
Generates embeddings for text chunks
Stores in PostgreSQL with pgvector
Database schema
The ingestion process creates these tables:
-- Earnings transcript chunks
CREATE TABLE transcript_chunks (
    id SERIAL PRIMARY KEY,
    chunk_text TEXT,
    embedding VECTOR(384),  -- all-MiniLM-L6-v2 embeddings
    ticker VARCHAR(10),
    year INTEGER,
    quarter INTEGER,
    metadata JSONB
);
-- SEC 10-K text chunks
CREATE TABLE ten_k_chunks (
    id SERIAL PRIMARY KEY,
    chunk_text TEXT,
    embedding VECTOR(384),
    ticker VARCHAR(10),
    fiscal_year INTEGER,
    sec_section VARCHAR(50),  -- item_1, item_7, item_8, etc.
    sec_section_title TEXT,
    is_financial_statement BOOLEAN,
    metadata JSONB
);
-- SEC 10-K financial tables
CREATE TABLE ten_k_tables (
    id SERIAL PRIMARY KEY,
    ticker VARCHAR(10),
    fiscal_year INTEGER,
    content JSONB,  -- Structured table data
    statement_type VARCHAR(50),  -- income_statement, balance_sheet, cash_flow
    is_financial_statement BOOLEAN,
    metadata JSONB
);
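A typical retrieval query against these tables orders by pgvector's cosine-distance operator `<=>` while filtering on the metadata columns. A sketch that only builds the parameterized SQL; the exact query Finance Agent runs may differ:

```python
def build_search_query(table="transcript_chunks", filters=("ticker",), k=5):
    """Build a parameterized cosine-distance search over a chunk table."""
    where = " AND ".join(f"{col} = %s" for col in filters)
    return (
        f"SELECT chunk_text, embedding <=> %s::vector AS distance "
        f"FROM {table} WHERE {where} "
        f"ORDER BY distance LIMIT {int(k)}"
    )

sql = build_search_query(filters=("ticker", "year"))
# Execute with e.g. cur.execute(sql, (query_embedding, "AAPL", 2024))
```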
Configuration reference
Environment variables
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | Yes | - | OpenAI API key for embeddings and LLM |
| CEREBRAS_API_KEY | Recommended | - | Cerebras API key for fast inference |
| API_NINJAS_KEY | Yes | - | API Ninjas key for earnings transcripts |
| TAVILY_API_KEY | Optional | - | Tavily key for real-time news search |
| DATABASE_URL | Yes | - | PostgreSQL connection string |
| REDIS_URL | Optional | redis://localhost:6379 | Redis connection string |
| ENVIRONMENT | No | development | Environment (development/production) |
| PORT | No | 8000 | Server port |
| BASE_URL | No | http://localhost:8000 | Base URL for the application |
| RAG_LLM_PROVIDER | No | cerebras | LLM provider (cerebras/openai/auto) |
| RAG_DEBUG_MODE | No | false | Enable detailed agent logging |
| AUTH_DISABLED | No | true | Bypass authentication (dev only) |
| CLERK_SECRET_KEY | Production | - | Clerk auth secret key |
| CLERK_PUBLISHABLE_KEY | Production | - | Clerk auth publishable key |
| LOG_LEVEL | No | INFO | Logging level |
LLM provider configuration
Choose between OpenAI and Cerebras:
# Cerebras (default - fast and cost-effective)
RAG_LLM_PROVIDER=cerebras
CEREBRAS_API_KEY=your-key
# OpenAI (fallback)
RAG_LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-key
# Auto (uses Cerebras if available, else OpenAI)
RAG_LLM_PROVIDER=auto
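The `auto` setting implies a simple selection rule: prefer Cerebras when its key is present, otherwise fall back to OpenAI. A sketch of how it might resolve; the real logic lives in Finance Agent's configuration, this is illustrative:

```python
def resolve_provider(env):
    """Resolve RAG_LLM_PROVIDER: 'auto' prefers Cerebras when its key is set."""
    provider = env.get("RAG_LLM_PROVIDER", "cerebras")
    if provider != "auto":
        return provider
    return "cerebras" if env.get("CEREBRAS_API_KEY") else "openai"
```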
Models used:
Cerebras: qwen-3-235b-a22b-instruct-2507 (fast inference)
OpenAI: gpt-5-nano-2025-08-07 (fallback)
Embeddings: all-MiniLM-L6-v2 (384 dimensions)
Database connection pool
Production vs. development settings (from config.py):
# Production
min_size: 5
max_size: 30
command_timeout: 20
timeout: 15
# Development
min_size: 10
max_size: 50
command_timeout: 30
timeout: 20
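Expressed as code, picking pool settings by environment might look like the following; the values mirror the lists above, while the `pool_settings` helper name is illustrative:

```python
# Pool settings as documented in config.py (asyncpg-style kwargs)
POOL_SETTINGS = {
    "production": {"min_size": 5, "max_size": 30, "command_timeout": 20, "timeout": 15},
    "development": {"min_size": 10, "max_size": 50, "command_timeout": 30, "timeout": 20},
}

def pool_settings(environment):
    """Return pool kwargs for the given ENVIRONMENT value (default: development)."""
    return POOL_SETTINGS.get(environment, POOL_SETTINGS["development"])
```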
Troubleshooting
pgvector extension not found
# Ubuntu/Debian
sudo apt install postgresql-server-dev-all
cd /tmp
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
# macOS
brew install pgvector
# Then in PostgreSQL:
psql -d stratalens -c "CREATE EXTENSION vector;"
Memory errors during data ingestion
Process data in smaller batches:
# Ingest one year at a time
for year in {2020..2025}; do
python agent/rag/data_ingestion/ingest_with_structure.py \
--ticker AAPL \
--year-start $year \
--year-end $year
done
If you hit rate limits during ingestion:
OpenAI: Upgrade to higher tier or add delays
API Ninjas: Free tier has limits, consider paid plan
Cerebras: Contact for higher limits
Add retry logic with tenacity:
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def api_call():
    # Your API call here
    pass
Slow vector search queries
Create indexes on frequently queried columns:
-- Index for vector similarity search
CREATE INDEX ON transcript_chunks USING ivfflat (embedding vector_cosine_ops);
CREATE INDEX ON ten_k_chunks USING ivfflat (embedding vector_cosine_ops);
-- Indexes for filtering
CREATE INDEX idx_ticker ON transcript_chunks(ticker);
CREATE INDEX idx_year_quarter ON transcript_chunks(year, quarter);
Frontend build
cd frontend
# Install dependencies
npm install
# Build
npm run build
# Development mode
npm run dev
Production checklist
Before deploying to production:
Next steps
Quickstart Run your first query in 5 minutes
Agent system Understand the RAG architecture
API reference Explore endpoints and integration
Data ingestion Deep dive into the ingestion pipeline