Project

13F Biotech Intelligence Tool

In production at Bluestar Global Management · Q1 2026 onward

A tool that turns WhaleWisdom 13F screenshots into a fully formatted biotech hedge fund tracker, collapsing what was a multi-hour manual workflow into a single command. Twenty-six biotech-focused funds are automatically ingested, filtered, deduplicated, and cross-referenced for consensus signal every quarter, in about an hour of wall-clock time.

What it does

Every quarter, biotech hedge funds file 13F disclosures revealing their long positions. Aggregating and analyzing these across a coverage universe is meaningful work — but the manual version of it is tedious, error-prone, and almost entirely transcription. The tool removes that layer.

Given a folder of screenshots from WhaleWisdom, it extracts every position, applies a configurable policy filter, handles edge cases (exited positions, sub-fund consolidation, pagination artifacts), and produces a polished Excel workbook with one tab per fund plus a master list and a consensus view. A companion auditor verifies the output against the same source screenshots, a self-consistency check that currently passes for 26 of 26 funds on Q1 2026 data.

The pipeline

screenshots/
    ↓
shared/extractor.py       ← Claude Sonnet 4.6 vision extraction (cached by file hash)
    ↓
shared/policy_filter.py   ← three-rule filter: is_equity, is_healthcare, meets_threshold
    ↓
    ├── generator/  → polished Excel: MASTER LIST · CONSENSUS · per-fund tabs
    └── auditor/    → verification against source screenshots

By the numbers

26 / 26 · Funds passing self-consistency check

~$2 · Per-quarter API cost

~1 hr · End-to-end runtime

Engineering decisions worth surfacing

Vision extraction over scraping
WhaleWisdom's interface resists clean scraping and changes layout frequently. Extracting from screenshots via Claude Sonnet 4.6 trades a few cents per page for resilience to UI changes — and unlocks structured extraction from a source no scraper would touch.
A single policy filter, three rules
Every row passes through one filter that enforces three criteria: equity-only (no warrants, options, or convertible bonds), healthcare-only, and a magnitude threshold of |%Q1| > 0.5 or |change| > 0.25pp, so a row survives if either its Q1 weight or its quarter-over-quarter change is large enough. Centralizing this logic in one module, rather than scattering it across extractor and writer, makes the rules auditable and turns policy changes into one-line edits.
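
A minimal sketch of how the three rules might compose in a single module. Only the rule names (is_equity, is_healthcare, meets_threshold) and the two thresholds come from the description above; the Row class, its field names, and the reading of 0.5 as a percentage of the portfolio are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical row shape; the real extractor's schema is not shown on this page.
@dataclass(frozen=True)
class Row:
    ticker: str
    security_type: str  # e.g. "equity", "warrant", "option", "convertible"
    sector: str         # e.g. "healthcare"
    pct_q1: float       # Q1 portfolio weight, in percent (assumed unit)
    change_pp: float    # quarter-over-quarter change, in percentage points

PCT_THRESHOLD = 0.5      # |%Q1| > 0.5
CHANGE_THRESHOLD = 0.25  # |change| > 0.25pp

def is_equity(row: Row) -> bool:
    # Rejects warrants, options, and convertible bonds.
    return row.security_type.lower() == "equity"

def is_healthcare(row: Row) -> bool:
    return row.sector.lower() == "healthcare"

def meets_threshold(row: Row) -> bool:
    # Either condition is enough, so a fully exited position (weight 0)
    # can still pass on the size of its change.
    return abs(row.pct_q1) > PCT_THRESHOLD or abs(row.change_pp) > CHANGE_THRESHOLD

def passes_policy(row: Row) -> bool:
    return is_equity(row) and is_healthcare(row) and meets_threshold(row)

def apply_policy(rows):
    """Return only the rows that satisfy all three rules, in input order."""
    return [row for row in rows if passes_policy(row)]
```

With the constants at module level, changing a threshold is a one-line edit, which is the property the section describes.
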
Fund-specific deduplication
Some funds file under multiple legal sub-entities (Foresite Capital Management III, IV, V, VI). The default deduplication strategy keeps the row with the largest position; for known sub-fund cases it sums shares and market values, recomputes percentages from combined values, and takes the maximum prior weight across entities. The strategy is configurable per fund.
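
The two strategies could look roughly like the sketch below. The Position class, the STRATEGY_BY_FUND mapping, and the function names are hypothetical. Two details are assumptions rather than facts from this page: "largest position" is read as largest market value, and the recomputed percentage is taken as combined market value over a combined portfolio value supplied by the caller.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical position record; field names are illustrative only.
@dataclass
class Position:
    ticker: str
    shares: int
    market_value: float
    pct_current: float  # current-quarter weight, percent
    pct_prior: float    # prior-quarter weight, percent

def dedupe_keep_largest(rows):
    """Default strategy: per ticker, keep the row with the largest market value."""
    best = {}
    for row in rows:
        kept = best.get(row.ticker)
        if kept is None or row.market_value > kept.market_value:
            best[row.ticker] = row
    return list(best.values())

def dedupe_sum_subfunds(rows, combined_portfolio_value):
    """Sub-fund strategy: sum shares and values, recompute the weight from the
    combined totals, and take the maximum prior weight across entities."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.ticker].append(row)
    merged = []
    for ticker, group in groups.items():
        shares = sum(r.shares for r in group)
        market_value = sum(r.market_value for r in group)
        pct_current = 100.0 * market_value / combined_portfolio_value
        pct_prior = max(r.pct_prior for r in group)
        merged.append(Position(ticker, shares, market_value, pct_current, pct_prior))
    return merged

# Hypothetical per-fund configuration; funds not listed use the default.
STRATEGY_BY_FUND = {"Foresite Capital Management": "sum_subfunds"}

def dedupe(fund, rows, combined_portfolio_value=None):
    if STRATEGY_BY_FUND.get(fund, "keep_largest") == "sum_subfunds":
        if not combined_portfolio_value:
            raise ValueError("sum_subfunds needs a positive combined portfolio value")
        return dedupe_sum_subfunds(rows, combined_portfolio_value)
    return dedupe_keep_largest(rows)
```
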
Exited positions are a separate code path
When a fund fully exits a position, WhaleWisdom shows blank cells rather than zeros — and the extractor was initially misreading these. A dedicated post-processing rule now detects exited rows (zero shares, zero market value) and labels them explicitly, preserving the signal value of a high-conviction position closed in the most recent quarter.
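
As an illustration of such a post-processing rule, the sketch below normalizes blank cells to zero and then flags rows with zero shares and zero market value. The dictionary keys, the `exited` flag, and both helper names are hypothetical; the page does not show the actual row schema or label text.

```python
def parse_number(cell):
    """Convert an extracted cell to a float, reading blank cells as 0.

    Handles None, empty strings, a lone "-", and simple formatting
    such as thousands separators, "$", and "%".
    """
    if cell is None:
        return 0.0
    text = str(cell).strip().replace(",", "").replace("$", "").replace("%", "")
    if text in ("", "-"):
        return 0.0
    return float(text)

def label_exited(row):
    """Return a copy of the row with numeric fields normalized and an
    explicit `exited` flag (zero shares and zero market value)."""
    shares = parse_number(row.get("shares"))
    market_value = parse_number(row.get("market_value"))
    labeled = dict(row, shares=shares, market_value=market_value)
    labeled["exited"] = shares == 0 and market_value == 0
    return labeled
```

Flagging the row instead of dropping it keeps the exit visible downstream, which matches the stated goal of preserving the signal.
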
Caching by content hash
Extracted rows are cached as JSON files keyed by the MD5 of the source screenshot. Re-runs on unchanged data are free; only modified screenshots trigger fresh API calls. The cache is shared between the generator and the auditor.
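
A content-hash cache of this kind can be sketched with the standard library alone. The function names, the cache directory layout, and the `extract_fn` callback (a stand-in for the vision-extraction call) are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path

def file_md5(path):
    """MD5 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def cached_extract(screenshot, cache_dir, extract_fn):
    """Return extracted rows for a screenshot, calling extract_fn only on a cache miss.

    extract_fn is a hypothetical callable that takes the screenshot path and
    returns JSON-serializable rows.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{file_md5(screenshot)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    rows = extract_fn(screenshot)
    cache_file.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return rows
```

Because the key depends on file contents and not on the filename, renaming a screenshot does not invalidate its entry, while any change to the image bytes does.
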
Verification as a first-class concern
The auditor is a separate tool that reads any generated workbook and re-extracts the source screenshots, comparing field-by-field. When the generator and auditor agree, the pipeline is internally consistent. This isn't a substitute for a domain-expert review, but it catches the entire class of silent transcription errors that would otherwise require manual spot-checks.
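
The field-by-field comparison at the heart of such an auditor might look like the sketch below. It assumes both sides have already been loaded into lists of dictionaries keyed by ticker; the function name, the key and field names, and the mismatch tuple format are hypothetical.

```python
def compare_rows(workbook_rows, extracted_rows, key="ticker",
                 fields=("shares", "market_value")):
    """Compare two row sets field by field.

    Returns a list of (key, field, workbook_value, extracted_value) tuples,
    one per disagreement. An empty list means the two sides agree.
    """
    workbook = {row[key]: row for row in workbook_rows}
    extracted = {row[key]: row for row in extracted_rows}
    mismatches = []
    for k in sorted(workbook.keys() | extracted.keys()):
        if k not in workbook:
            mismatches.append((k, "<row>", "missing", "present"))
        elif k not in extracted:
            mismatches.append((k, "<row>", "present", "missing"))
        else:
            for field in fields:
                left = workbook[k].get(field)
                right = extracted[k].get(field)
                if left != right:
                    mismatches.append((k, field, left, right))
    return mismatches
```

A fund would pass the self-consistency check when this list is empty for all of its rows.
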

What it produces

Each quarter, the deliverable is a single Excel workbook with three sheet types: a master list of all positions across the coverage universe, a consensus tab with cross-fund aggregates (number of funds holding, average weight, signal counts), and one tab per fund. Every tab shares a common ten-column layout, with formulas for quarter-over-quarter change, action labeling (New / Added / Trimmed / Exited / No change), and threshold flagging.
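
To make the consensus aggregates concrete, here is a rough sketch that derives them from master-list rows. The row keys (`fund`, `ticker`, `weight_pct`, `action`) and the function name are hypothetical. Reading "signal counts" as a per-ticker tally of action labels, and excluding exited rows from the holder count and average weight, are both assumptions.

```python
from collections import Counter, defaultdict

def build_consensus(master_rows):
    """Aggregate master-list rows into one consensus record per ticker."""
    by_ticker = defaultdict(list)
    for row in master_rows:
        by_ticker[row["ticker"]].append(row)

    consensus = []
    for ticker, rows in by_ticker.items():
        # Assumption: a fund that exited no longer counts as a holder.
        holders = [r for r in rows if r["action"] != "Exited"]
        weights = [r["weight_pct"] for r in holders]
        consensus.append({
            "ticker": ticker,
            "funds_holding": len({r["fund"] for r in holders}),
            "avg_weight_pct": sum(weights) / len(weights) if weights else 0.0,
            "signals": dict(Counter(r["action"] for r in rows)),
        })
    # Most widely held names first, ties broken alphabetically.
    consensus.sort(key=lambda c: (-c["funds_holding"], c["ticker"]))
    return consensus
```
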

Charts and analysis derived from a recent run are published in the Q1 2026 Biotech Positioning Brief.

Built with

Python, Claude Code, Anthropic API (Sonnet 4.6 for vision), openpyxl for Excel writing. Source remains internal to Bluestar; this page describes architecture and decisions, not implementation.