In production at Bluestar Global Management · Q1 2026 onward
A tool that turns WhaleWisdom 13F screenshots into a fully-formatted biotech hedge fund tracker, collapsing what was a multi-hour manual workflow into a single command. Twenty-six biotech-focused funds, automatically ingested, filtered, deduplicated, and cross-referenced for consensus signal — every quarter, in about an hour of wall-clock time.
Every quarter, biotech hedge funds file 13F disclosures revealing their long positions. Aggregating and analyzing these across a coverage universe is meaningful work — but the manual version of it is tedious, error-prone, and almost entirely transcription. The tool removes that layer.
Given a folder of screenshots from WhaleWisdom, it extracts every position, applies a configurable policy filter, handles edge cases (exited positions, sub-fund consolidation, pagination artifacts), and produces a polished Excel workbook with three tabs per fund plus a master list and a consensus view. A companion auditor verifies the output against the same source screenshots — a self-consistency check that currently passes for 26 of 26 funds on Q1 2026 data.
WhaleWisdom's interface resists clean scraping and changes layout frequently. Extracting from screenshots via Claude Sonnet 4.6 trades a few cents per page for resilience to UI changes — and unlocks structured extraction from a source no scraper would touch.
Every row passes through one filter that enforces three criteria: equity-only (no warrants, options, or convertible bonds), healthcare-only, and a magnitude threshold of |%Q1| > 0.5 or |change| > 0.25pp. Centralizing this logic in one module — rather than scattering it across extractor and writer — makes the rules auditable and policy changes one-line edits.
Some funds file under multiple legal sub-entities (Foresite Capital Management III, IV, V, VI). The default deduplication strategy keeps the row with the largest position; for known sub-fund cases it sums shares and market values, recomputes percentages from combined values, and takes the maximum prior weight across entities. The strategy is configurable per fund.
When a fund fully exits a position, WhaleWisdom shows blank cells rather than zeros — and the extractor was initially misreading these. A dedicated post-processing rule now detects exited rows (zero shares, zero market value) and labels them explicitly, preserving the signal value of a high-conviction position closed in the most recent quarter.
Extracted rows are cached as JSON files keyed by the MD5 of the source screenshot. Re-runs on unchanged data are free; only modified screenshots trigger fresh API calls. The cache is shared between the generator and the auditor.
The auditor is a separate tool that reads any generated workbook and re-extracts the source screenshots, comparing field-by-field. When the generator and auditor agree, the pipeline is internally consistent. This isn't a substitute for a domain-expert review, but it catches the entire class of silent transcription errors that would otherwise require manual spot-checks.
Each quarter, the deliverable is a single Excel workbook with three sheet types: a master list of all positions across the coverage universe, a consensus tab with cross-fund aggregates (number of funds holding, average weight, signal counts), and one tab per fund. Every tab shares a common ten-column layout, with formulas for quarter-over-quarter change, action labeling (New / Added / Trimmed / Exited / No change), and threshold flagging.
Charts and analysis derived from a recent run are published in the Q1 2026 Biotech Positioning Brief.
Python, Claude Code, Anthropic API (Sonnet 4.6 for vision), openpyxl for Excel writing. Source remains internal to Bluestar; this page describes architecture and decisions, not implementation.