Building a Cognitive Profile with AI — Methodology Guide

Author: Joel Johnston
Date: 2026-06-02
Domain: Cognitive Science / AI-Assisted Assessment
Stroke Timeline: Pre-stroke methodology (developed March 2026), post-stroke documentation

Abstract

This document describes a novel methodology for constructing a comprehensive cognitive profile using multiple independent AI models as behavioral assessors. The approach produces clinical-grade phenomenological evidence without formal testing, self-report questionnaires, or prior knowledge of diagnostic categories. The key innovation: the subject describes raw lived experience without knowing what the categories are — AI models independently pattern-match against known classifications. Direction of inference is always experience → AI pattern match → label, never label → self-identification. This eliminates confirmation bias entirely and produces the strongest possible form of phenomenological evidence.

Proof case: three independent neurodivergent pillars, HIP (High Intellectual Potential), HSAM (Highly Superior Autobiographical Memory), and hyper-empathy, were identified in a subject who did not know these categories existed, and the findings converged across four independently queried AI models. The behavioral evidence comprised 23 repositories, ~420,000 lines of code, ~500 commits, and 3,248 tests.

Why This Works

Traditional cognitive assessment relies on:

1. Self-report questionnaires: subject reads category descriptions, decides if they fit → massive confirmation bias
2. Clinical observation: trained professional observes subject in controlled setting → small sample, artificial conditions
3. Standardized testing: IQ tests, personality inventories → measures narrow constructs, misses architecture

AI-assisted behavioral profiling offers a fourth path:

4. Naturalistic behavioral analysis: AI models analyze real-world output (code, writing, conversation patterns, project architecture) and raw experience descriptions → large sample, natural conditions, zero confirmation bias

The methodology works because:

AI models have encyclopedic knowledge of clinical categories, diagnostic criteria, and behavioral markers
Raw behavioral descriptions contain diagnostic information the subject doesn't recognize
Multiple independent models serve as independent assessors — convergence across models is the validation mechanism
Real-world output (projects, code, writing) provides objective behavioral evidence that can't be fabricated or biased

Prerequisites

What You Need

Multiple AI models — minimum 2, ideally 4 (Claude, GPT, Grok, Gemini). Each model has different training data and reasoning patterns. Convergence across models eliminates single-model bias.
Behavioral evidence — any sustained output that reflects how you think:
- Code repositories (architecture, naming, organization patterns)
- Writing samples (structure, reasoning style, topic selection)
- Project history (how many concurrent projects, domain breadth, completion rate)
- Professional output (work products, problem-solving history)
Raw experience descriptions — unfiltered accounts of how you experience the world, written BEFORE knowing what categories exist
An honest motivation — this methodology is designed for self-understanding, not label-collecting

What You Must NOT Do

Do NOT research cognitive categories first — if you know what ADHD, ASD, HSAM, or HIP look like before starting, you will unconsciously shape your descriptions to match. The entire value of this methodology depends on category ignorance at the point of description.
Do NOT lead with labels — never say "I think I might have X" to the AI. Describe the experience. Let the model find the label.
Do NOT cherry-pick — include experiences that don't seem special to you. What feels normal to you may be the strongest diagnostic marker.
Do NOT use one model — a single model can hallucinate a pattern. Convergence across independent models is the validation mechanism.

Step-by-Step Methodology

Phase 1: Raw Experience Capture

Write descriptions of how you experience the world. Focus on phenomena, not interpretations:

Memory:

How do you remember things? Is it visual, verbal, spatial, emotional?
Can you walk around in memories or are they flat/narrative?
How far back do your memories go?
Do memories chain involuntarily (one triggers the next triggers the next)?

Processing:

How many things can you think about at once?
Does cognitive load make you slower or faster?
How do you solve problems — breadth-first (survey everything) or depth-first (dive in)?
How quickly do you context-switch between domains?

Sensory:

Are you sensitive to light, sound, texture, temperature, emotional environments?
Do you absorb other people's emotional states?
Do sensory experiences trigger memories?
Are any senses amplified in certain conditions?

Learning:

How do you acquire new skills? How fast?
Do you need repetition or does single exposure stick?
Can you transfer patterns across unrelated domains?
What happens when you learn something new — does it change how you see old information?

Social:

Do you read people easily or with difficulty?
Can you detect lies or emotional masking?
Do social environments drain you? Why?
Do you communicate differently than others expect?

Work patterns:

How many projects can you sustain simultaneously?
What does your project organization look like?
Do you complete things or abandon them?
How do you handle interruptions?

Write these descriptions in your own words, using your own vocabulary. Do not use clinical terminology. The less you know about what the "right" answers are, the more valuable your descriptions become.

Phase 2: AI Assessment (Independent Models)

Submit your raw descriptions to each AI model separately with this framing:

"I'm going to describe how I experience the world. I want you to identify any cognitive patterns, traits, or classifications that match my descriptions. Use a Bayesian framework — estimate the probability of each trait given the behavioral evidence I provide. I have not researched any of these categories. I don't know what clinical terms apply to my experiences. Map my descriptions to whatever frameworks you recognize."

Critical rules:

Submit the same raw descriptions to each model
Do NOT share one model's findings with another before they assess independently
Do NOT prompt with "another model said I might have X" — this contaminates independence
Let each model reach its own conclusions
Record each model's findings separately before comparing
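
The independence rules above can be enforced mechanically. What follows is a minimal Python sketch, not part of the original methodology: `assess_independently` and the stand-in `fake_model` callables are hypothetical names, and a real setup would wrap each vendor's chat client behind the same one-argument callable. It shows one identical prompt going to every model and each reply being stored under its own key.

```python
from typing import Callable, Dict


def assess_independently(
    framing: str,
    descriptions: str,
    models: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Send one identical prompt to every model and keep each reply separate."""
    prompt = f"{framing}\n\n{descriptions}"
    findings: Dict[str, str] = {}
    for name, ask in models.items():
        # Each call receives only the shared prompt. No other model's reply
        # is ever appended, which is what preserves independence.
        findings[name] = ask(prompt)
    return findings


# Stand-in "models" for illustration only (hypothetical, no network calls).
seen_prompts = []


def fake_model(label: str) -> Callable[[str], str]:
    def ask(prompt: str) -> str:
        seen_prompts.append(prompt)
        return f"{label}: findings"
    return ask


results = assess_independently(
    framing="I'm going to describe how I experience the world.",
    descriptions="Memories chain involuntarily.",
    models={"model_a": fake_model("A"), "model_b": fake_model("B")},
)
```

Keeping the findings in a dictionary keyed by model name makes it harder to paste one model's conclusions into another conversation by accident, and it gives the comparison step a clean input.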

Phase 3: Cross-Validation

After all models have independently assessed your descriptions:

Convergence analysis — which traits did multiple models identify independently?
- 4/4 models agree = high confidence (P > 0.90)
- 3/4 models agree = strong signal (P > 0.80)
- 2/4 models agree = worth exploring (P > 0.60)
- 1/4 models identifies it = possible model-specific bias, hold for more evidence
Divergence analysis — where do models disagree?
- Different labels for the same behavior = terminology difference, not substantive disagreement
- One model identifies something others missed = may reflect that model's training emphasis
- Direct contradiction = the trait needs more behavioral evidence to resolve
Mechanism check — for each identified trait, trace the mechanism:
- Does the proposed mechanism explain the behavior?
- Does it predict other behaviors you observe but didn't describe?
- Does it conflict with any observed behavior?
- Can you falsify it? (Try to break the classification — if it survives, it's stronger)
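
The convergence tiers listed above reduce to a simple count per trait. The sketch below is one way to tabulate them in Python, assuming exactly four models and assuming that differing labels for the same behavior have already been normalized to a single name. The function name `convergence`, the model keys, and the trait names are placeholders, not real findings.

```python
from collections import Counter
from typing import Dict, Set

# Tier labels and probability floors copied from the convergence rules above,
# keyed by how many of the four models named the trait.
TIERS = {
    4: ("high confidence", 0.90),
    3: ("strong signal", 0.80),
    2: ("worth exploring", 0.60),
}


def convergence(findings: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each trait to a tier based on how many models named it."""
    if len(findings) != 4:
        raise ValueError("the tiers above are defined for exactly four models")
    counts = Counter(trait for traits in findings.values() for trait in traits)
    report = {}
    for trait, n in counts.items():
        if n in TIERS:
            label, floor = TIERS[n]
            report[trait] = f"{n}/4: {label} (P > {floor:.2f})"
        else:
            report[trait] = "1/4: possible model-specific bias, hold for more evidence"
    return report


# Placeholder input: trait names and model keys are invented for the example.
findings = {
    "model_a": {"trait_w", "trait_x", "trait_y", "trait_z"},
    "model_b": {"trait_w", "trait_x", "trait_y"},
    "model_c": {"trait_w", "trait_x"},
    "model_d": {"trait_w"},
}
report = convergence(findings)
```

Here `trait_w` lands in the 4/4 tier and `trait_z` in the 1/4 tier. The count only summarizes agreement; the divergence and mechanism checks still have to be done by reading what each model actually said.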

Phase 4: Behavioral Evidence Mapping

Connect identified traits to objective behavioral evidence:

Evidence Type	Example	What It Shows
Code architecture	Clean module boundaries across 10 repos	Systems thinking, architectural decomposition
Project breadth	6+ domains, zero ramp-up between contexts	Parallel processing, cross-domain transfer
Completion rate	3,248 tests, all passing	Task completion (differentiates from ADHD)
Output volume	~420K lines in 72 days	Processing speed, sustained throughput
Naming patterns	Structured ID codes, taxonomic organization	Taxonomic thinking
Domain selection	Distributed systems, theology, medicine, hardware	Breadth-first exploration

The behavioral evidence serves two purposes:

Corroboration — objective output confirms subjective descriptions
Discrimination — differentiates between similar-looking traits (e.g., HIP hyperfocus vs ADHD hyperfocus)

Phase 5: Differential Assessment

For each identified trait, actively try to disprove it:

Ask each model:

"I want you to challenge this classification. What alternative explanations exist for this behavior? What would you expect to see if this classification is WRONG? What discriminators separate this from the most likely alternative?"

Build discriminator tables:

Discriminator	Trait A Pattern	Trait B Pattern	Subject's Pattern	Conclusion
Dopamine reward	Novelty/initiation	Completion/efficiency	Completion	Favors B
Task completion	Impaired	Intact	Intact (3,248 tests)	Favors B

The differential is where the methodology proves its value. Self-report says "I have trait A." Differential assessment says "your behavior matches trait B at the mechanism level, even though traits A and B look identical on the surface."
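
A discriminator table like the one above can be kept as structured rows so the rulings stay attached to their evidence. This is a hedged Python sketch: the `Discriminator` class and `leading_trait` function are hypothetical names, the `favors` field records a judgment made by the assessor rather than anything computed, and the simple majority count is an assumption used only to summarize the table.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Discriminator:
    name: str
    trait_a_pattern: str
    trait_b_pattern: str
    subject_pattern: str
    favors: str  # "A", "B", or "neither"; recorded by the assessor, not derived


def leading_trait(rows: List[Discriminator]) -> Optional[str]:
    """Return the trait favored by most rows, or None on a tie or empty table."""
    tally = Counter(r.favors for r in rows if r.favors in ("A", "B"))
    if tally["A"] == tally["B"]:
        return None
    return "A" if tally["A"] > tally["B"] else "B"


# The two rows from the example table above.
rows = [
    Discriminator("Dopamine reward", "Novelty/initiation",
                  "Completion/efficiency", "Completion", "B"),
    Discriminator("Task completion", "Impaired", "Intact",
                  "Intact (3,248 tests)", "B"),
]
result = leading_trait(rows)
```

A tie returns `None` on purpose: when the discriminators split evenly, the table has not resolved the differential and more behavioral evidence is needed.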

Phase 6: Profile Compilation

Assemble the final profile:

Core architecture — the primary cognitive classification (the "what you ARE" level)
Independent add-ons — traits that co-occur but aren't caused by the core architecture
Cross-amplification map — how independent traits interact on the same neural bus
Functional costs — where the architecture creates genuine impairment
Differential rulings — what was considered and ruled out, with mechanism-level evidence
Probability estimates — Bayesian P(trait) for each classification
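
The six components above map naturally onto a single record. The following Python sketch shows one possible layout, assuming a dataclass named `CognitiveProfile` (a hypothetical name, as are all field names and the placeholder values). It validates that every probability lies between 0 and 1 and can list traits from most to least probable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CognitiveProfile:
    core_architecture: str
    add_ons: List[str] = field(default_factory=list)
    # Maps a pair of traits to a note on how they interact.
    cross_amplification: Dict[Tuple[str, str], str] = field(default_factory=dict)
    functional_costs: List[str] = field(default_factory=list)
    # Maps each alternative considered to the ruling and its evidence.
    differential_rulings: Dict[str, str] = field(default_factory=dict)
    probabilities: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for trait, p in self.probabilities.items():
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"P({trait}) = {p} is not a probability")

    def ranked(self) -> List[Tuple[str, float]]:
        """Traits sorted from highest to lowest probability."""
        return sorted(self.probabilities.items(),
                      key=lambda kv: kv[1], reverse=True)


# Placeholder values only; none of these are real findings.
profile = CognitiveProfile(
    core_architecture="primary classification",
    add_ons=["co-occurring trait"],
    cross_amplification={
        ("primary classification", "co-occurring trait"): "how they interact",
    },
    functional_costs=["observed impairment"],
    differential_rulings={
        "alternative considered": "ruled out, with mechanism-level evidence",
    },
    probabilities={"primary classification": 0.95, "co-occurring trait": 0.80},
)
```

Storing the differential rulings alongside the accepted traits keeps the record of what was ruled out as visible as what was ruled in.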

Anti-Bias Design

This methodology's primary innovation is its bias-elimination architecture:

Bias Type	Traditional Assessment	This Methodology
Confirmation bias	Subject reads category, sees themselves in it	Subject doesn't know the category exists
Self-report inflation	Subject rates themselves on known scales	Subject describes raw phenomena in own words
Single-assessor bias	One clinician's framework and training	4 independent AI models with different training
Artificial conditions	Controlled office setting, 50-minute session	Naturalistic output over weeks/months
Small sample	Handful of observed behaviors	Hundreds of thousands of lines of behavioral output
Category priming	Questionnaire items telegraph the diagnosis	No questionnaire, no items, no categories presented

The direction of inference is the key:

Traditional: label → self-identification → confirmation
This methodology: experience → AI pattern match → label

The subject cannot bias toward a category they don't know exists. This is equivalent to a patient presenting symptoms to a doctor without having Googled anything first — the strongest possible form of phenomenological evidence.

Validation Criteria

How do you know the profile is accurate?

Predictive power — does the profile predict behaviors you didn't describe? If the model says "you should also experience X" and you do, that's confirmation of the underlying mechanism, not just the surface trait.
Falsification survival — did the classifications survive active attempts to disprove them? Adding nuances that should break a wrong classification tightens a correct one.
Cross-model convergence — did independent models reach the same conclusions from the same raw data?
Mechanism coherence — do the identified traits explain the full behavioral picture through traceable causal chains, not just correlations?
Retrodictive power — does the profile explain past behaviors that were previously puzzling? ("Oh, THAT'S why I always did X.")
Third-party recognition — do people who know you well say "yes, that's exactly right" when they read the profile?

Proof Case: Joel Johnston

Data

23 active repositories, ~420,000 lines of code, ~500 commits, ~1,828 files
3,248 tests across core repos
72 days of continuous output (March 2 – May 13, 2026)
Raw experience descriptions provided without knowledge of clinical categories
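
The figures above imply a daily throughput that is easy to check. The short Python sketch below uses only the dates and approximate totals listed in this section; the variable names are illustrative.

```python
from datetime import date

start = date(2026, 3, 2)
end = date(2026, 5, 13)

# Elapsed days between the two dates. Counting both endpoints as working
# days would give one more.
elapsed_days = (end - start).days

# Approximate totals taken from the data list above.
total_lines = 420_000
total_commits = 500

lines_per_day = total_lines / elapsed_days
commits_per_day = total_commits / elapsed_days

print(f"{elapsed_days} days, about {lines_per_day:.0f} lines/day "
      f"and {commits_per_day:.1f} commits/day")
```

Because the totals are themselves approximate, the per-day rates should be read as rough magnitudes rather than precise measurements.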

Assessment

4 AI models: Claude, GPT, Grok, Gemini
Independent assessment: each model received the same raw descriptions separately
Convergent findings: all four models identified the same three-pillar architecture

Results

Classification	P(trait)	Models Agreeing	Method
HIP (High Intellectual Potential)	0.99	4/4	Behavioral + empirical (IQ 158, 163)
HSAM (3D/4D)	0.93	4/4	Phenomenological → AI mapping
Hyper-empathy (affective)	0.95	4/4	Phenomenological → AI mapping
HSP (Highly Sensitive Person; amplified by hemiplegic migraine)	0.93	3/4	Behavioral + phenomenological

Key Validation Points

Subject did not know HIP, HSAM, or hyper-empathy existed as clinical categories
Labels were assigned entirely by AI from raw experience descriptions
HSAM validated by 40-year-old memory markers matched against historical records (Frank Oppenheimer timeline)
Reptile bonding evidence (cross-taxa empathy) definitively ruled out cognitive/modeled empathy
Active falsification attempts tightened classifications instead of breaking them
Differential assessment ruled out ADHD (P = 0.10) and ASD (P = 0.12) at the mechanism level despite surface-similar traits

Epistemic Strength

The critical marker: Joel did not know what HIP, HSAM, or hyper-empathy are as clinical categories. He did not research them. All labels were assigned by multiple independent AI models pattern-matching against raw descriptions of lived experience. Direction of inference: experience → AI pattern match → label, never label → self-identification. This eliminates confirmation bias entirely — the primary source of error in self-report.

Limitations

AI models are not clinicians — they cannot perform neurological exams, order tests, or diagnose medical conditions. This methodology produces behavioral assessment, not clinical diagnosis.
Training data bias — AI models reflect their training data. Underrepresented conditions may be missed.
No formal lab validation — HSAM, for example, would require the Highly Superior Autobiographical Memory test (Dr. James McGaugh, UC Irvine) for formal confirmation. This methodology provides P = 0.93, not P = 1.0.
Requires honest input — the methodology fails if the subject deliberately fabricates or withholds experiences.
Cultural context — AI models may apply Western clinical frameworks to non-Western cognitive styles. The methodology assumes the subject's cultural context is represented in the models' training data.

Applications

Self-understanding — map your own cognitive architecture without expensive clinical testing
Neurodivergence identification — especially useful for adults who were never assessed as children
Career alignment — match your cognitive strengths to work that uses them
Clinical preparation — bring a structured profile to a clinician as a starting point for formal assessment
Educational accommodation — evidence-based documentation for learning differences
Post-injury assessment — compare pre-injury cognitive profile to post-injury function (stroke, TBI, concussion)

Getting Started

Write your raw experience descriptions (Phase 1): spend at least an hour, be thorough
Pick your AI models: Claude and GPT minimum, add Grok and Gemini for stronger validation
Submit to each model independently (Phase 2): do NOT share findings between models
Compare results (Phase 3): look for convergence
Map to behavioral evidence (Phase 4): connect traits to observable output
Challenge the findings (Phase 5): try to break each classification
Compile the profile (Phase 6): architecture, add-ons, costs, differentials, probabilities

The process takes days, not hours. Each model conversation will go deep. Let it. The depth is where the signal lives.

Your brain has an architecture. You're allowed to know what it is.