CogSpace is an open-source organization at the intersection of computational neuroscience and artificial intelligence. We build tools that model how the brain represents, predicts, and monitors its own knowledge of the world. Our work is grounded in established neuroscience theory and driven by a commitment to reproducible, transparent science.
All code, data, and methods are publicly available. Science advances faster when it's shared.
Every experiment is designed to be replicated. Rigorous methods, versioned code, documented pipelines.
We translate neuroscience theories into computational models, and use AI to test cognitive hypotheses.
Predictive Representation for Introspective Spatial Monitoring
PRISM is the first computational test of the hippocampal meta-map thesis: the idea that the brain's spatial mapping system also monitors its own knowledge. Using the successor representation as a unified substrate, PRISM builds an uncertainty map that enables calibrated self-monitoring and directed exploration.
A predictive map of expected future states, inspired by hippocampal place cells. The foundation for both navigation and self-monitoring.
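For a known transition model, the successor representation has a closed form. A minimal sketch, assuming a toy 4-state ring environment and discount factor chosen for illustration (this is not PRISM's actual environment or code):

```python
import numpy as np

# Hypothetical 4-state ring world under a random-walk policy.
T = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
])
gamma = 0.95

# Successor representation: M = (I - gamma * T)^-1.
# M[s, s'] is the expected discounted future occupancy of s' starting from s.
M = np.linalg.inv(np.eye(4) - gamma * T)

# Sanity check: each row sums to 1 / (1 - gamma), the total discounted
# time mass, because T is a proper stochastic matrix.
print(M.sum(axis=1))
```

Because each row of M is a predictive profile over future states, downstream quantities (values, and in PRISM's framing, uncertainty) can be read off the same matrix.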
A parallel map that tracks what the agent knows and doesn't know, derived from the structure of the SR itself.
Evaluated with psychophysics tools: reliability diagrams, calibration error, and a continuous "I don't know" signal.
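The calibration side of that evaluation can be sketched with a standard binned expected calibration error. This is a generic implementation of the metric, not PRISM's evaluation code, and the toy data is illustrative:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - confidence| across confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:
            mask |= confidences == 0.0  # include exact zeros in the first bin
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of samples
    return ece

# Perfectly calibrated toy data: stated confidence matches accuracy per bin.
conf = np.array([0.25] * 4 + [0.75] * 4)
corr = np.array([1, 0, 0, 0, 1, 1, 1, 0])
print(expected_calibration_error(conf, corr))  # 0.0
```

The same per-bin (confidence, accuracy) pairs are what a reliability diagram plots; ECE is just their weighted gap.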
Wager-based Assessment of Grounded Epistemic Reasoning
WAGER is a metacognition benchmark for LLMs. The model answers a question and wagers 0-100 points on its own correctness. Correct = +wager, wrong = -wager. Total profit is the single metacognitive score. 292 items across 5 categories test factual knowledge, reasoning, inference, awareness of ignorance, and self-knowledge.
A simple, interpretable protocol: the model risks points on its own answer. Profit separates models that know what they know from those that don't.
Factual knowledge, multi-step reasoning, post-cutoff inference, true unknowables (March 2026), and self-knowledge about own capabilities.
Claude, GPT, Gemini, and Mistral compared head-to-head. Accuracy and metacognition turn out to be orthogonal capabilities.
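The scoring rule above is simple enough to state in a few lines. A sketch of the profit computation as described (the input format and function name are hypothetical, not WAGER's actual harness):

```python
def wager_score(results):
    """Total profit: +wager for a correct answer, -wager for a wrong one.

    `results` is a list of (correct: bool, wager: int) pairs, 0 <= wager <= 100.
    """
    for correct, wager in results:
        if not 0 <= wager <= 100:
            raise ValueError(f"wager out of range: {wager}")
    return sum(wager if correct else -wager for correct, wager in results)

# Two models with identical accuracy (2/3) but different metacognition:
calibrated = [(True, 90), (True, 80), (False, 5)]      # hedges when unsure
overconfident = [(True, 90), (True, 80), (False, 90)]  # always all-in
print(wager_score(calibrated), wager_score(overconfident))  # 165 80
```

The example shows why profit, not accuracy, is the metacognitive score: both models answer the same questions correctly, but only one knows which answer to doubt.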
Exploring LLM Uncertainty, Calibration & Internal Directions
LUCID is an exploratory line of work on how large language models represent, express, and regulate their own uncertainty. Each experiment takes one concrete question (Can we train a model to hedge better? What does fine-tuning actually change inside it?) and examines it through both behavioural and mechanistic lenses.
A supervised fine-tuning method that doubles the hedging-accuracy correlation of Qwen2.5-7B without collapse, and transfers unexpectedly to numerical wager calibration.
Linear probes on base and fine-tuned activations show the base model already has strong latent metacognition (AUC 0.83). TEMPER does not amplify this latent signal; it only re-routes it.
If TEMPER's gain is a routing redistribution, adversarial prompts that force an assertive format should undo the calibration. Next experiments test and generalise this prediction.
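The probing methodology behind these results can be illustrated with a minimal NumPy-only linear probe: fit logistic regression on "activation" vectors labelled by correctness, then score it with AUC. Everything here is synthetic (dimensions, data, training setup); it is a sketch of the technique, not the LUCID analysis code:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 400

# Synthetic "activations": noise plus a hidden correctness direction.
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=n)  # 1 = the model's answer was correct
X = rng.normal(size=(n, d)) + 1.5 * np.outer(labels, direction)

# Logistic-regression probe trained by plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - labels) / n)
    b -= 0.5 * (p - labels).mean()

# AUC: probability a random positive scores above a random negative.
# (Evaluated on the training set for brevity; a real probe study
# would use held-out data.)
scores = X @ w + b
pos, neg = scores[labels == 1], scores[labels == 0]
auc = (pos[:, None] > neg[None, :]).mean()
print(round(auc, 3))
```

A high AUC here means the linear direction recovers the latent correctness signal, which is exactly what the base-model probe result claims; comparing probes before and after fine-tuning is what distinguishes "amplified" from "re-routed".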