πŸ“„ PUBLIC TECHNICAL DOCUMENTATION

THE BOYLE SYSTEM

AI-Assisted Documentary Infrastructure for Scientific Reproducibility

Technical reference for researchers, institutional partners, executive education programs, think tanks, and schools: architecture, MVAL protocol, corpus management, adaptive instructional design, and implementation roadmap.

"The knowledge that disappears is the knowledge that never existed."

Version 1.1 | March 2026

Medhavy LLC  |  Bear Brown LLC  |  Humanitarians AI

bear@bearbrown.co  |  bear@humanitarians.ai

Table of Contents

Presented by: Medhavy LLC
In association with: Bear Brown LLC
Research partner: Humanitarians AI (501(c)(3))
Contact: bear@bearbrown.co | bear@humanitarians.ai
PART I: SYSTEM OVERVIEW
1. Problem Statement & Motivation
2. Historical Foundations (Boyle)
3. System Architecture
4. Three-Role AI Partnership
PART II: MVAL PROTOCOL
5. Minimum Viable Analytical Log
6. Field Specifications
7. Failure Artifact Protocol
PART III: CORPUS MANAGEMENT
8. Source Ingestion & Formats
9. Ouroboros Technique
10. Source Stitching
11. Notebook Segmentation
PART IV: ADAPTIVE INSTRUCTION
12. The Five Learning Modes
13. Multi-Armed Bandit Architecture
14. Reward Modeling & Context Vectors
15. GAMBITTS & LLM Integration
PART V: CROSS-SYSTEM ANALYSIS
16. RAG vs. Long-Context
17. Grounded vs. Non-Grounded LLM
18. Technical Debt Registry
PART VI: OPERATIONS
19. Target Deployment Contexts
20. Active Deployments (Pilot)
21. Integration & Automation
22. Security & Privacy
PART VII: ROADMAP
23. Prioritized Improvements
24. Open Questions
25. Future Feature Roadmap

PART I: SYSTEM OVERVIEW

1. Problem Statement & Motivation

A client flags a number on a dashboard built six months ago. The data is still there. The pipeline is still running. The dashboard is still live. But the analyst who built it is gone. The metric definition was never written down. Is the number wrong? Nobody knows. Nobody can know. This is not a rare disaster. It is Tuesday.

Modern AI research occurs within virtualized, elastic cloud environments engineered for rapid instantiation and immediate abandonment. This architecture facilitates the "vanishing laboratory" β€” where the intricate web of dependencies, library versions, hardware configurations, and environmental variables that produced a result evaporates the moment a virtual machine is decommissioned.

The Vanishing Laboratory

Virtual machines dissolve. Library versions, dataset checksums, and hardware configurations evaporate. The result survives. The conditions do not.

The Documentation Gap

Critical decisions happen in undocumented threads, ephemeral terminal sessions, and local notebooks never committed to a repository. The "why" disappears with each personnel transition.

ℹ️ Core Insight The reproducibility crisis in machine learning is not primarily a problem of statistical methodology. It is a problem of vanishing laboratories. The Boyle System is a structural intervention β€” making the right behavior the natural one, not the effortful one.

2. Historical Foundations

Robert Boyle (1627–1691) understood that for an experiment to be scientifically valid, it had to be verifiable by others. Because the physical laboratory was private, Boyle developed a style of reporting so detailed that readers could become "virtual witnesses." The Boyle System applies this same philosophy to cloud credentials, API keys, library versions, and instructional design choices.

Documentation Dimension | Aristotelian (Pre-Boyle) | Boyle's Empirical Approach | The Boyle System (Modern)
Primary Methodology | Abstract logic and reasoning | Observation and experimentation | Grounded AI synthesis via RAG
Documentation Depth | Minimal; focused on final truths | Extensive; focused on conditions | Mandatory MVAL fields (all six)
Role of Failure | Ignored as an error of logic | Recorded as essential data | Logged as a first-class artifact
Verification Mechanism | Internal consistency of argument | "Virtual witnessing" via narrative | Citation-backed source grounding
Social Structure | Individual philosopher | Royal Society "matter of fact" | Collaborative AI research labs & classrooms

3. System Architecture

3.1 Technical Core: Retrieval-Augmented Generation

The Boyle System is powered by NotebookLM's Source-Grounded RAG pipeline. Unlike standard LLMs that generate from pre-trained patterns, the system can only "know" what has been uploaded to its corpus β€” its limitation is its superpower.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        RESEARCHER / LEARNER INPUT                    β”‚
β”‚   Project Charter Β· Degree Requirements Β· Boyle Principles Β·        β”‚
β”‚   MVAL Entries Β· Cloud Configs Β· Failed Experiment Logs              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚ Upload / Ingest
                                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     NOTEBOOKLM CORPUS (RAG)                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Document        β”‚   β”‚ Gemini Embedding β”‚   β”‚ Vector Index    β”‚   β”‚
β”‚  β”‚ Ingestion       │──▢│ Model           │──▢│ (Nearest        β”‚   β”‚
β”‚  β”‚ (Chunking)      β”‚   β”‚ (Vectorization) β”‚   β”‚  Neighbor)      β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚ Cosine Similarity Retrieval
                                                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚               THREE-ROLE AI PARTNER + ADAPTIVE INSTRUCTOR            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  TUTOR   β”‚  β”‚  CRITIC  β”‚  β”‚  GUIDE   β”‚  β”‚  MAB PEDAGOGY      β”‚ β”‚
β”‚  β”‚ Context- β”‚  β”‚Challengesβ”‚  β”‚  Cloud   β”‚  β”‚  ENGINE (5 Modes)  β”‚ β”‚
β”‚  β”‚ aware    β”‚  β”‚  vague   β”‚  β”‚  infra   β”‚  β”‚  SocraticΒ·Scaffold  β”‚ β”‚
β”‚  β”‚ guidance β”‚  β”‚  entries β”‚  β”‚  logging β”‚  β”‚  DirectΒ·Apprentice  β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  Metacognitive     β”‚ β”‚
β”‚                                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚ Cited, Grounded, Personalized Response
                                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         MVAL LOG ENTRY                               β”‚
β”‚         What Β· Why Β· How Β· Environment Β· Results Β· Questions         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
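The retrieval step in the diagram reduces, at its core, to nearest-neighbor search over embedded chunks. A minimal sketch, using toy three-dimensional vectors in place of real Gemini embeddings (the `retrieve` helper and the sample chunks are illustrative, not platform API):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, index, k=2):
    """Return the k chunk texts whose embeddings are closest to the query.
    `index` is a list of (chunk_text, embedding) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy corpus: in the real pipeline these vectors come from the embedding model
index = [
    ("MVAL entry: retry logic for 429 responses", [1.0, 0.0, 0.0]),
    ("Project charter: documentation standards",  [0.0, 1.0, 0.0]),
    ("MVAL entry: backoff tuning notes",          [0.9, 0.1, 0.0]),
]
print(retrieve([1.0, 0.0, 0.0], index, k=2))
```

The retrieved chunks, not the whole corpus, are what the model conditions on; this is why every claim can carry a citation back to a specific passage.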

3.2 Platform Capacity

Resource | Limit | Notes
Notebooks per account | 100 | Segment by project / domain / cohort
Sources per notebook | 50 | Managed via Ouroboros + stitching strategies
Words per source | 500,000 | Maximized via source stitching
Total corpus per notebook | ~25 million words | Equivalent to ~25 large technical monographs
Context window (Gemini 1.5 Pro) | 1M tokens | Near-perfect recall (>99.7%) up to this limit

4. Three-Role AI Partnership

πŸŽ“ Role 1: Tutor

Function: Context-aware documentation guidance grounded in the researcher's or learner's actual project charter, degree requirements, and institutional standards.

Example: A researcher asks how to document a Python web-scraping project. A generic AI returns README advice. The Boyle System returns guidance specific to the team's standards, citing page references from the uploaded project charter and compliance requirements from the institutional protocol document.

Key behavior: Cannot give generic advice β€” has no generic context to draw from.

πŸ” Role 2: Critic

Function: Continuous audit of log entries. Surfaces vague outcomes, implicit assumptions, and missing failure records.

Example prompts generated by the Critic (illustrative):

- "The Results field says the run 'mostly worked.' Which checks passed, and which failed?"
- "This entry assumes the upstream data schema is unchanged. Where was that verified?"
- "No failures have been logged for this approach. Were there none, or were they unrecorded?"

Key behavior: Combats "interpretive drift" β€” the gradual transformation of nuanced observations into unsupported factual declarations.

βš™οΈ Role 3: Operational Guide

Function: Treats cloud credentials, API keys, library versions, and environment variables as first-class research artifacts integrated into every log entry.

Key behavior: Transforms administrative overhead into a reproducible infrastructure artifact β€” the "matter of fact" of the cloud laboratory.

PART II: MVAL PROTOCOL

5. Minimum Viable Analytical Log

ℹ️ MVAL is not a form to fill out after the work is done. It is the structure through which the work gets done. Every log entry within the Boyle System must address all six fields before the entry is considered complete.

6. Field Specifications

WHAT
The specific task or experiment attempted. Must describe the operational goal in granular detail. Avoid: "worked on pipeline." Require: "Implemented retry logic for the ATS scraper to handle 429 rate-limit responses."
WHY
The underlying reasoning for the chosen approach, including alternatives considered and rejected. This is the field most often lost during personnel transitions. It captures institutional logic behind a decision.
HOW
Precise methodology: code logic, data transformations, API calls, tool configurations, and exact steps. Should be reproducible from this field alone. The "virtual witnessing" passage.
ENVIRONMENT
Runtime configuration, library versions, cloud infrastructure, credentials used (names/roles β€” never raw keys), dataset identifiers and checksums. The cloud laboratory must be rebuildable from this field.
RESULTS
Actual outcome β€” including failures. A failed pipeline is logged with the same rigor as a successful result. Include error messages verbatim, stack traces, and unexpected outputs. Failures are not mistakes; they are data.
QUESTIONS
A record of uncertainties, open hypotheses, and follow-up threads generated by this session. Prevents the closure illusion β€” the false sense that a completed task means all related questions are resolved.
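The six-field contract can be enforced mechanically. A minimal sketch (the class and field names are illustrative, not a shipped schema) of an entry object that does not count as complete until every field is filled:

```python
from dataclasses import dataclass, fields

@dataclass
class MVALEntry:
    """One log entry; all six fields must be non-empty before it is complete."""
    what: str          # specific task or experiment attempted
    why: str           # reasoning, including rejected alternatives
    how: str           # reproducible methodology
    environment: str   # versions, infra, credential names (never raw keys)
    results: str       # actual outcome, failures logged verbatim
    questions: str     # open uncertainties and follow-up threads

    def missing_fields(self) -> list:
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self) -> bool:
        return not self.missing_fields()

entry = MVALEntry(
    what="Implemented retry logic for the ATS scraper to handle 429 responses",
    why="Exponential backoff chosen over fixed delay; fixed delay kept tripping limits",
    how="Retry decorator, base 2s, max 5 attempts",
    environment="python 3.11, requests 2.31, us-east-1 worker",
    results="",   # incomplete: Results not yet logged
    questions="Does the 429 budget differ per endpoint?",
)
print(entry.missing_fields())  # a submission gate would reject this entry
```

A hard gate like this is exactly what BD-006 (Part V) notes is missing at the platform level today.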

7. Failure Artifact Protocol

Event Type | MVAL Treatment | Required Fields
Successful pipeline run | Standard MVAL entry | All six fields
Failed pipeline run | Standard MVAL entry (identical rigor) | All six + error verbatim in Results
Partial / ambiguous result | Standard MVAL with explicit uncertainty | All six + explicit uncertainty in Questions
Abandoned approach | MVAL entry logging the rejection reasoning | Why (critical) + Results (why stopped)
Undocumented prior decision | Retroactive MVAL reconstruction | Why + How + note that the entry is reconstructed

PART III: CORPUS MANAGEMENT

8. Source Ingestion & Format Performance

Format | Retrieval Quality | Technical Consideration | Recommendation
Markdown / Plain Text | β– β– β– β– β–  Highest | No layout noise; ideal for RAG chunking | Primary target format
Google Docs / Word | β– β– β– β–  High | Structured formatting facilitates parsing | Acceptable; export to Markdown if possible
Text-Based PDF | β– β– β–  Strong | Multi-column layouts may cause chunking errors | Usable; convert to Markdown for critical sources
Scanned PDF | β– β–  Mixed | Sensitive to scan resolution and lighting | Apply OCR preprocessing before ingestion
Handwritten Notes (OCR) | β–  Variable | Cursive notation reduces reliability | Hybrid pipeline: OCR + Gemini self-correction
Audio (MP3 Overview) | β– β–  High abstraction | Multi-modal, conversational perspective | Track lineage; avoid multi-generation re-upload
Website URLs | β– β– β–  Variable | Dynamic content may not index correctly | Prefer static pages; exclude dynamic URL patterns

9. The Ouroboros Technique

OUROBOROS WORKFLOW

Research Session 1–N
      β”‚
      β–Ό
Accumulated MVAL Entries + AI Responses
      β”‚
      β–Ό (Select notes in NotebookLM UI)
"Convert to Source" β†’ New Dense Source Document
      β”‚
      β”œβ”€β”€β”€ βœ“ Delete original bulky source files (free slots)
      β”‚
      └─── ⚠️  REQUIRED before conversion:
                Manually embed key metadata:
                - Original citation references
                - Author / date / document title
                - Source page numbers
                (Conversion strips inline citations)
🚨 BD-002: Citation Loss on Ouroboros Conversion Converting notes to sources strips original inline citations. Mandate: manually embed original citation metadata before every conversion.
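The required metadata embedding can be scripted so it is never skipped. A sketch, assuming a hypothetical `embed_metadata` helper run on the note text before "Convert to Source":

```python
def embed_metadata(note_text, *, title, author, date, citations):
    """Prepend provenance to a note before 'Convert to Source'.

    Conversion strips inline citations (BD-002), so the metadata must live
    inside the text itself to survive the round trip.
    """
    header = [
        f"SOURCE TITLE: {title}",
        f"AUTHOR: {author}",
        f"DATE: {date}",
        "ORIGINAL CITATIONS:",
        *[f"  - {c}" for c in citations],
        "-" * 40,
    ]
    return "\n".join(header) + "\n" + note_text

prepared = embed_metadata(
    "Retry logic reduced 429 failures to zero across 3 runs.",
    title="ATS Scraper MVAL Digest, Sessions 1-9",
    author="J. Doe",
    date="2026-02-14",
    citations=["Project Charter, p. 12", "MVAL entry 2026-02-03 #4"],
)
```

The output string, not the raw note, is what gets converted; the citations then survive as plain text retrievable by the RAG index.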

10. Source Stitching

Strategy | Mechanism | Benefit | Risk
Source Stitching | Combining multiple PDFs into one file | Bypasses the 50-source count limit | Slightly slower retrieval of specific passages
Ouroboros (Note β†’ Source) | Converting AI-generated notes into a new source | Distills knowledge and clears source slots | Loss of inline citations if metadata is not preserved
Audio as Source | Re-uploading Audio Overview MP3s | Multi-modal perspective | Errors creep in across generational summaries
Metadata Tagging | Including authors/titles in the text flow | Improves citation accuracy and retrieval | Manual overhead in document preparation
Notebook Segmentation | Splitting corpus by content type | 64% retrieval improvement (benchmarked) | Requires disciplined categorization at ingestion
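Source stitching itself is mechanical for text formats. A sketch for Markdown and plain-text sources (the `stitch_sources` helper and section markers are illustrative) that tags each stitched section with its filename so retrieval can still attribute passages, and refuses to exceed the 500,000-word per-source cap:

```python
from pathlib import Path

WORD_LIMIT = 500_000  # per-source word cap from the platform capacity table

def stitch_sources(paths, out_path):
    """Concatenate text sources into one stitched file. Each section is
    tagged with its original filename; raises if the stitched file would
    exceed the per-source word limit. Returns the total word count."""
    parts, words = [], 0
    for p in paths:
        text = p.read_text(encoding="utf-8")
        words += len(text.split())
        parts.append(f"===== SOURCE: {p.name} =====\n{text}")
    if words > WORD_LIMIT:
        raise ValueError(f"stitched source would be {words} words (> {WORD_LIMIT})")
    out_path.write_text("\n\n".join(parts), encoding="utf-8")
    return words
```

The filename markers serve the same purpose as metadata tagging above: they keep attribution recoverable after the originals are merged.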

11. Notebook Segmentation Strategy

Notebook Type | Recommended Contents | Rationale
Project Charter Notebook | Charters, standards, institutional protocols | Grounds the Tutor role; isolated from research data
Active Research Notebook | MVAL entries, experiment logs, pipeline docs | Primary working notebook; updated continuously
Literature Notebook | Academic papers, stitched research surveys | Separates authoritative external sources from internal logs
Handoff Notebook | Distilled MVAL summaries, onboarding guides | Designed for personnel transition; optimized for new readers
Failure Archive | Failed experiment logs, dead-end documentation | Searchable record prevents duplicate negative work

PART IV: ADAPTIVE INSTRUCTIONAL ARCHITECTURE

12. The Five Learning Modes

The Boyle System integrates five evidence-based instructional theories as discrete, selectable pedagogical modes. Each mode is calibrated to a specific learner state, cognitive load condition, and desired learning outcome. Together they address the Assistance Dilemma: providing enough support to facilitate progress without inducing reliance that undermines long-term retention.

ℹ️ The Assistance Dilemma Too much assistance β†’ high immediate success but shallow cognitive structures. Too little assistance β†’ impasse-driven learning only if the learner has sufficient self-regulation; otherwise, disengagement. The five-mode system navigates this dynamically.
πŸ’¬ Mode 1: Socratic Questioning

Operational definition: Iterative probing using leading questions and progressive hints that elicit latent knowledge from the learner rather than delivering information directly.

Theoretical basis: Active retrieval and schema refinement. Knowledge elicited is retained longer than knowledge delivered.

Best for: Learners with adequate foundational schemas; integrative or synthesis tasks; executive education case discussions.

Caution: Can induce frustration and cognitive overload when foundational schemas are absent. The bandit engine detects this via rising response latency without accuracy gains.

πŸ—οΈ Mode 2: Scaffolding

Operational definition: Reducing degrees of freedom by removing distractors, pre-filling procedural steps, or providing structured templates that allow the learner to focus on the core knowledge component.

Theoretical basis: Vygotsky's Zone of Proximal Development (ZPD) β€” the system maintains the learner at the edge of their capability without exceeding it.

Best for: High cognitive load conditions; new procedural skills; onboarding scenarios in executive education.

Caution: Expertise Reversal Effect β€” once mastery is achieved, continued scaffolding actively impedes fluency. The bandit detects this transition and reduces scaffold weight.

πŸ“‹ Mode 3: Direct Instruction

Operational definition: Explicit delivery of facts, definitions, or procedural rules. No elicitation; information is provided directly and efficiently.

Theoretical basis: Cognitive Load Theory β€” minimizes extraneous cognitive load when the learner lacks prerequisite schemas, enabling rapid acquisition of new Knowledge Components (KCs).

Best for: Prerequisite concept introduction; low-energy or high-stress learner states; situations where exploratory modes would cause disengagement.

Caution: Risk of passive dependency if used exclusively. The system enforces a minimum exploration rate across all other modes (fairness constraint).

πŸ”¬ Mode 4: Cognitive Apprenticeship

Operational definition: Modeling expert processes via worked examples, "think-aloud" demonstrations, or "first letter" hints that reveal the structure of expert reasoning without completing the task for the learner.

Theoretical basis: Observational learning and expert visualization. Learners acquire procedural fluency and strategy adoption by watching expert processes made visible.

Best for: Complex multi-step procedures; professional practice domains (consulting, research methodology, case analysis); think tank workflows.

Caution: High LLM generation cost. The IC-Cache optimization routes apprenticeship requests to cached high-quality examples where possible.

🧠 Mode 5: Meta-cognitive Feedback

Operational definition: Prompts for reflection, strategy evaluation, and self-monitoring. Rather than providing content, the system asks the learner to evaluate their own approach, predict their performance, or identify their gaps.

Theoretical basis: Self-Regulated Learning (SRL) theory. Learners who can monitor and regulate their own cognition perform significantly better on transfer tasks.

Best for: Advanced learners approaching mastery; post-task review; program-level reflection in executive education; doctoral and research training contexts.

Caution: Ineffective for novices who lack the foundational knowledge to evaluate their own performance accurately.

12.1 Mode Selection Summary

Mode | Theoretical Basis | Optimal Learner State | Primary Risk
Socratic Questioning | Active retrieval, schema refinement | Moderate–high prior knowledge | Frustration if schemas absent
Scaffolding | Zone of Proximal Development | Low–moderate; high cognitive load | Expertise Reversal Effect
Direct Instruction | Cognitive Load Theory | Novice; low energy; high stress | Passive dependency
Cognitive Apprenticeship | Observational learning | Intermediate; procedural tasks | High generation cost
Meta-cognitive Feedback | Self-Regulated Learning | Advanced; near or post-mastery | Ineffective for novices

13. Multi-Armed Bandit Architecture

Each of the five instructional modes is treated as a discrete "arm" of a Multi-Armed Bandit (MAB). The bandit engine selects which mode to apply at each instructional moment, balancing exploration (trying modes with uncertain effectiveness for this learner) against exploitation (using the mode currently estimated to be most effective).

13.1 Thompson Sampling

Bayesian Mode Selection

For each instructional mode a ∈ {1,...,5}, the system maintains a belief state modeled as a Beta distribution Beta(αₐ, βₐ) for binary rewards, or a Gaussian distribution N(μₐ, σₐ²) for continuous learning progress metrics.

Thompson Sampling draws a sample from each mode's posterior and selects the mode with the highest sample. This naturally produces high exploration early in a session (when uncertainty is high) and converges on the most effective personalized strategy as evidence accumulates.
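A minimal sketch of this loop for binary rewards (mode names from Β§12; the class is illustrative, not the production engine):

```python
import random

class ThompsonModeSelector:
    """Beta-Bernoulli Thompson Sampling over the five instructional modes."""
    MODES = ["socratic", "scaffolding", "direct", "apprenticeship", "metacognitive"]

    def __init__(self):
        # Beta(1, 1) uniform priors; the expert warm-start (Phase 1) would
        # seed these with informed values instead
        self.alpha = {m: 1.0 for m in self.MODES}
        self.beta = {m: 1.0 for m in self.MODES}

    def select(self):
        # draw one sample per posterior, pick the mode with the highest draw
        draws = {m: random.betavariate(self.alpha[m], self.beta[m])
                 for m in self.MODES}
        return max(draws, key=draws.get)

    def update(self, mode, reward):
        # binary reward: 1 = learning progress observed, 0 = none
        self.alpha[mode] += reward
        self.beta[mode] += 1 - reward

bandit = ThompsonModeSelector()
mode = bandit.select()
bandit.update(mode, reward=1)
```

Early in a session all posteriors are wide, so draws vary and every mode gets tried; as evidence accumulates, the posterior for the best mode tightens and dominates the draws.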

13.2 Contextual Bandit: The Boyle Context Vector

A context-free bandit cannot achieve true personalization. The Contextual MAB (CMAB) incorporates a feature vector xβ‚œ representing the learner's current state:

E[rβ‚œ | xβ‚œ, a] = xβ‚œα΅€ θₐ

Where:
  xβ‚œ  = learner context vector at time t
  θₐ  = learned weight vector for instructional mode a
  rβ‚œ  = expected reward (learning progress)
Feature Category | Features Included | Role in Bandit
Surface-Level (Stable) | Baseline education level, prior academic performance, domain background | Sets initial priors; "warm start" for new learners
Deep-Level (Dynamic) | Current Knowledge Component mastery, error distributions, response latency | Primary signal for real-time mode switching
Affective State | Estimated mood, energy level, stress indicators | Temporarily biases toward lower-load modes (Direct, Scaffolding)
Knowledge Tracing (DKT/BKT) | Mastery probability per skill from sequence of prior responses | Detects Expertise Reversal; triggers mode drift
ℹ️ Response Latency as Cognitive Load Proxy An increase in response latency without a corresponding increase in accuracy is a high-fidelity signal that the current instructional mode is failing to provide adequate support. The bandit treats this pattern as a negative reward signal and shifts toward more structured modes.
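The linear model E[rβ‚œ | xβ‚œ, a] = xβ‚œα΅€ θₐ can be sketched with one weight vector per mode, fit online. This is a simplified epsilon-greedy stand-in for the contextual machinery (a production CMAB would use LinUCB or linear Thompson Sampling with proper uncertainty estimates):

```python
import random

class LinearContextualBandit:
    """Per-mode linear reward model, estimated reward = x . theta_a,
    fit by online gradient descent on squared prediction error."""
    def __init__(self, modes, dim, lr=0.1, epsilon=0.1):
        self.theta = {a: [0.0] * dim for a in modes}
        self.modes, self.lr, self.epsilon = modes, lr, epsilon

    def predict(self, x, a):
        # estimated reward for mode a in learner context x
        return sum(xi * ti for xi, ti in zip(x, self.theta[a]))

    def select(self, x):
        if random.random() < self.epsilon:      # minimum exploration floor
            return random.choice(self.modes)
        return max(self.modes, key=lambda a: self.predict(x, a))

    def update(self, x, a, r):
        # one SGD step on the squared prediction error
        err = r - self.predict(x, a)
        self.theta[a] = [t + self.lr * err * xi
                         for t, xi in zip(self.theta[a], x)]
```

Here `x` would carry the context features from the table above (mastery estimates, latency, affective indicators); the learned `theta` vectors are what let the same engine recommend different modes to different learners in the same session.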

13.3 Three Implementation Phases

Phase 1: Expert-Guided Initialization (Cold Start)

Before the system has enough data to personalize, it uses expert knowledge to seed the bandit's priors. Direct Instruction is the default mode for prerequisite concepts; Socratic Questioning is prioritized for integrative tasks. This warm-start mechanism prevents detrimental random exploration in early sessions.

Phase 2: Online Adaptation and Clustering

As learners interact with the system, the bandit refines its models. Local Clustering in Bandits (LOCB) groups learners by preference parameters θₐ. New learners whose initial behavior matches an existing cluster inherit that cluster's learned policy β€” dramatically accelerating personalization without requiring extended individual observation.

This collaborative filtering approach scales intelligence across entire cohorts in executive education programs and research training environments.

Phase 3: Non-Stationary Drift (Expertise Reversal Management)

Learning is inherently non-stationary. Sliding Window UCB or Discounted Thompson Sampling gives more weight to recent observations. As deep-level features indicate higher competence, rewards for Direct Instruction and heavy Scaffolding naturally decline, while rewards for Socratic Questioning and Meta-cognitive feedback increase. The bandit policy drifts with the learner β€” a seamless transition from guided structure to open-ended exploration.
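Discounted Thompson Sampling is a small modification of the Β§13.1 sampler: accumulated evidence decays toward the prior on every update, so the policy can follow the learner. A sketch with an illustrative discount factor:

```python
import random

class DiscountedThompson:
    """Thompson Sampling with evidence decay: old observations lose weight,
    letting the policy drift as the learner's competence changes."""
    def __init__(self, modes, gamma=0.9):
        self.gamma = gamma  # per-step discount on accumulated evidence
        self.alpha = {m: 1.0 for m in modes}
        self.beta = {m: 1.0 for m in modes}

    def select(self):
        return max(self.alpha,
                   key=lambda m: random.betavariate(self.alpha[m], self.beta[m]))

    def update(self, mode, reward):
        # decay every arm's evidence toward the Beta(1, 1) prior, then
        # credit the played arm with the fresh observation
        for m in self.alpha:
            self.alpha[m] = 1.0 + self.gamma * (self.alpha[m] - 1.0)
            self.beta[m] = 1.0 + self.gamma * (self.beta[m] - 1.0)
        self.alpha[mode] += reward
        self.beta[mode] += 1 - reward
```

With a discount of 0.9 the effective memory is on the order of ten recent observations, so a mode that stops paying off (e.g., heavy Scaffolding after mastery) loses its advantage within a handful of sessions rather than being locked in by its history.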

14. Reward Modeling

14.1 Learning Progress as Reward Signal

Simple correctness rewards create a perverse incentive: the bandit maximizes help to guarantee "success," producing over-assistance. The Boyle System uses Learning Progress (LP) as its primary reward signal:

r = cα΅’(t) - cα΅’(t-1)

Where cα΅’(t) = probability of mastery for Knowledge Component i at time t

If a learner already knows a concept: cα΅’(t) - cα΅’(t-1) β‰ˆ 0
β†’ Bandit shifts to more challenging content or Meta-cognitive mode

If progress is rapid: reward is high
β†’ Bandit reinforces the current instructional mode

14.2 Composite Reward Function

Reward Component | Metric | Purpose
Immediate Success | P(Correct|Mode) | Maintains learner motivation and "flow"
Knowledge Gain | Ξ”Mastery | Ensures the mode is actually teaching
Efficiency | 1 / Time-on-Task | Penalizes unnecessarily verbose modes
Persistence | Session completion rate | Encourages modes that sustain long-term engagement
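Combining the table's four components into a single scalar might look like the following sketch; the weights are illustrative, not calibrated system values:

```python
def composite_reward(p_correct, mastery_t, mastery_prev, time_on_task_s,
                     completed, weights=(0.2, 0.5, 0.1, 0.2)):
    """Weighted composite of the four reward components.
    Weights are illustrative: knowledge gain dominates so the bandit
    cannot 'win' by over-assisting (see Section 14.1)."""
    w_success, w_gain, w_eff, w_persist = weights
    learning_progress = mastery_t - mastery_prev    # r = c_i(t) - c_i(t-1)
    efficiency = 1.0 / max(time_on_task_s, 1.0)     # avoid division by zero
    return (w_success * p_correct
            + w_gain * learning_progress
            + w_eff * efficiency
            + w_persist * (1.0 if completed else 0.0))
```

Because the mastery delta carries the largest weight, a session where the learner already knew the material scores near zero on the dominant term even if every answer was correct.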

15. GAMBITTS & LLM Integration

When instructional content is generated by LLMs in real time, the bandit selects an action (e.g., "provide a Socratic hint") but the treatment delivered to the learner is the stochastic output of the LLM. The GAMBITTS framework (Generator-Mediated Bandit-Thompson Sampling) explicitly models this action-treatment split.

GAMBITTS PIPELINE

Bandit Agent
  └─ Selects: Instructional mode A + prompt template P
              (e.g., "Use Socratic questioning to explain concept X")
                          β”‚
                          β–Ό
              LLM Generator (stochastic)
  └─ Produces: Specific text string Gβ‚œ
                          β”‚
                          β–Ό
              Embedding Projection
  └─ Projects: High-dim text Gβ‚œ β†’ Low-dim embedding Zβ‚œ
              (Enables bandit to detect when different outputs deliver same pedagogy)
                          β”‚
                          β–Ό
              Reward Signal
  └─ Learner response β†’ LP reward β†’ Update θₐ posterior
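The pipeline above can be sketched end to end with stub components. The hash-based `embed` below is a stand-in for a learned text-embedding model, and the bandit is the minimal Β§13.1 sampler; none of this is the GAMBITTS reference implementation:

```python
import hashlib
import random

class BetaBandit:
    """Minimal Thompson bandit over instructional modes (see Section 13.1)."""
    def __init__(self, modes):
        self.a = {m: 1.0 for m in modes}
        self.b = {m: 1.0 for m in modes}
    def select(self):
        return max(self.a, key=lambda m: random.betavariate(self.a[m], self.b[m]))
    def update(self, m, r):
        self.a[m] += r
        self.b[m] += 1 - r

def embed(text, dim=8):
    """Stand-in for the projection G_t -> Z_t. A real system would use a
    learned text embedding so the bandit can recognize when two different
    generations deliver the same pedagogy."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def gambitts_step(bandit, generate, observe_reward, concept):
    mode = bandit.select()                          # bandit action: mode + template
    prompt = f"Use {mode} questioning to explain {concept}"
    treatment = generate(prompt)                    # stochastic LLM output G_t
    z = embed(treatment)                            # low-dim treatment embedding Z_t
    r = observe_reward(treatment)                   # learning-progress reward
    bandit.update(mode, r)                          # posterior update for theta_a
    return mode, z, r
```

The key property the sketch preserves is the action-treatment split: the bandit chooses the mode, but the reward attaches to whatever text the generator actually produced.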

15.1 Architectural Optimization (IC-Cache)

System Component | Optimization Strategy | Pedagogical Impact
Example Selector | Caches high-utility request-response pairs from larger models | Enables smaller, faster models to emulate Cognitive Apprenticeship
Request Router | Routes simple queries to small models, complex ones (Socratic) to large models | Maintains low latency during critical "flow" states
Example Manager | Continuously refines cached examples based on learner rewards | Ensures scaffolding remains current with pedagogical best practices

15.2 Algorithmic Fairness Constraints

Diversity-Aware Exploration

If a bandit observes that a demographic subgroup has historically responded well to Direct Instruction (potentially due to prior educational disadvantage), it may permanently route those learners into a Direct Instruction loop β€” denying them access to higher-order modes like Socratic Questioning or Meta-cognitive Feedback.

The Boyle System enforces fairness constraints: a minimum exploration rate across all five instructional modes for all learner demographics. Every learner is regularly given the opportunity to succeed with more challenging, exploratory modes, regardless of initial cluster assignment. The system's decisions must not mirror existing social biases in the training data.
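The exploration floor can be implemented as a small wrapper around any mode selector. A sketch with an illustrative 5% per-mode floor (not a system constant):

```python
import random

def fair_select(posterior_draws, floor=0.05):
    """Select a mode while guaranteeing every mode a minimum selection
    probability, so no learner cluster is permanently routed away from
    exploratory modes. `posterior_draws` maps mode name -> sampled value."""
    modes = list(posterior_draws)
    # with probability floor * len(modes), pick uniformly: each mode
    # therefore keeps at least `floor` probability regardless of history
    if random.random() < floor * len(modes):
        return random.choice(modes)
    return max(posterior_draws, key=posterior_draws.get)
```

The floor caps how confident the policy can ever become: even a learner whose history strongly favors Direct Instruction is regularly given Socratic and Meta-cognitive attempts, generating the evidence needed to detect when they have outgrown the easier mode.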

PART V: CROSS-SYSTEM ANALYSIS

16. RAG vs. Long-Context Window

Dimension | Long-Context Window | Source-Grounded RAG (Boyle System)
Data location | Entire document in active working memory | Semantic index; chunks retrieved per query
Citation precision | Low; reasoning is holistic | High; specific passage linked to every claim
Hallucination risk | Higher; model may blend sources | Lower; constrained to retrieved chunks
Audit trail | Difficult; cannot trace specific claim to passage | Built-in; inline citation to exact text
Best use case | Holistic synthesis of a single large document | Precise retrieval across 50+ diverse sources
Regulatory suitability | Limited; hard to satisfy audit requirements | Strong; every claim traceable

17. Grounded vs. Non-Grounded LLM Performance

Metric | Non-Grounded LLM | NotebookLM (Boyle System)
Hallucination rate | ~40% | ~13% (0% on specific queries)
Citation precision | Low / variable | 95% in audited clinical tasks
Context window | Pre-trained knowledge (static) | ~25 million words per notebook (dynamic)
Update frequency | Requires retraining or fine-tuning | Instantaneous upon document upload
Data privacy | Often shared for training | Private; no sharing under enterprise agreement
Specificity of response | Generic; drawn from broad pre-training | Context-specific; only what has been uploaded

18. Technical Debt Registry

BD-001: Source Slot Ceiling
HIGH | Corpus
50-source limit per notebook constrains long-running projects. Mitigated by Ouroboros and stitching, but adds manual overhead and citation risk.
Recommendation: Source slot monitoring with automated alerts at 40-source threshold.
BD-002: Citation Loss on Ouroboros Conversion
CRITICAL | Corpus
Converting notes to sources strips original inline citations. Risk escalates with each cycle.
Recommendation: Mandatory metadata checklist before every conversion.
BD-003: No Native Python Execution
HIGH | Core
NotebookLM cannot run code or perform mathematical calculations. May return confident but incorrect quantitative answers.
Recommendation: Integration protocol with Vertex AI Workbench or Colab; quantitative outputs logged back to MVAL.
BD-004: Knowledge-Based Poisoning Vulnerability
HIGH | Core
Malicious or corrupted documents can bias outputs. Zero-width Unicode characters are invisible to human reviewers but readable by the AI.
Recommendation: Mandatory source validation workflow before ingestion.
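A pre-ingestion pass for the invisible-character vector is straightforward to script. A sketch (one check among several a real validation workflow would need):

```python
import unicodedata

# Common zero-width characters, named for readability; the category check
# below also catches them, plus other invisible format characters.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def suspicious_characters(text):
    """Flag zero-width and other Unicode 'format' (Cf) characters that a
    human reviewer cannot see but the model will read. Returns a list of
    (position, codepoint) pairs for review before ingestion."""
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text)
            if ch in ZERO_WIDTH or unicodedata.category(ch) == "Cf"]
```

Any document with a non-empty result would be quarantined for human review rather than uploaded to the corpus.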
BD-005: No Automated Hallucination Scoring
MEDIUM | Core
Hallucination auditing is currently manual.
Recommendation: Planned β€” passage-level verification with automated reliability score (see roadmap).
BD-006: MVAL Not Enforced at Platform Level
CRITICAL | MVAL
MVAL compliance depends entirely on researcher discipline. No hard field validation or submission gate exists. This is the structural gap most likely to undermine the system's core mission.
Recommendation: Structured intake form (Google Form β†’ auto-ingested Doc) or custom lightweight web front-end with required fields.
BD-007: MAB Cold-Start Data Dependency
MEDIUM | Adaptive
The bandit requires interaction data to personalize. New institutional deployments begin in expert-guided Phase 1 with limited personalization capability.
Recommendation: Pre-load institutional cluster priors from similar cohort profiles where available.

PART VI: OPERATIONS

19. Target Deployment Contexts

The Boyle System is designed for institutional contexts where reproducibility, knowledge transfer, and structured learning are high-value requirements. The following represent primary partnership targets.

Context | Primary Value Proposition | Key Features Used
Business School Executive Education | Preserve case analysis reasoning across cohorts; structure participant documentation; reduce facilitator gap-filling | MVAL (Why/Decisions field critical), Cognitive Apprenticeship mode, Handoff Notebook
Think Tanks & Policy Research Organizations | Document research lineage; prevent institutional memory loss at analyst transitions; enable audit-ready citation trails | Source-grounded RAG, Failure Archive, Passage-Level Citation, CRITIQ integration
Graduate & Professional Schools | Replace ad hoc research documentation; shift advisor meetings from gap-filling to strategy; train reproducibility habits | Full MVAL protocol, Project Charter Notebook, Pre-Meeting Brief Generation, MAB pedagogy engine
Independent & Private Schools (STEM programs) | Build structured research documentation habits early; scaffold inquiry-based learning; track student progress longitudinally | Scaffolding + Direct Instruction modes, simplified MVAL template, Notebook Segmentation
Applied AI Research Labs | Solve the vanishing laboratory problem in cloud-native ML research; enable reproducible experiment infrastructure | Environment field (MVAL), Failure Artifact Protocol, MCP integration, Python execution bridge

20. Active Deployments (Pilot)

20.1 Humanitarians AI Fellows Program

Program | Research Domain | Primary Boyle Use Case | Status
AI Skunkworks (Partner University) | Applied AI / Data Science | Cloud pipeline documentation, inference reproducibility | Live
Lyrical Literacy | Music, neuroscience, language acquisition | Software dev logs, neural connectivity tracking | Live
Botspeak | AI fluency and human-AI task delegation | Strategic delegation logs, ethical boundary records | Live
Fellows Program (general) | Multi-domain applied AI (~150 volunteers) | Onboarding documentation, project handoff infrastructure | Live

20.2 Pilot Metrics

Measured Outcomes: Active Pilot
Metric | Before Boyle System | After Boyle System
Mentor meeting time on gap-review | ~60% | ~20%
Mentor meeting time on strategic discussion | ~40% | ~80%
Onboarding time for new team members | Baseline | Target: >50% reduction
Duplicate work incidents | Frequent | Target: near zero

21. Integration & Automation

Integration Method | Technical Mechanism | Key Capability | Stability
Python SDK (notebooklm-py) | Browser automation via Playwright | Full access to chat, sources, and artifacts | ⚠ Unofficial; brittle
MCP Server | Model Context Protocol | Integration with Claude Desktop / Claude Code | ⚠ Unofficial; promising
Discovery Engine API | Official GCP REST endpoints | Enterprise-grade notebook management | ✓ Official (enterprise)
Typer CLI | Command-line interface | Human-operated automation from terminal | ⚠ Unofficial

22. Security & Privacy

Data Class | Standard NotebookLM | Workspace / Enterprise
Public research papers, documentation | ✓ Permitted | ✓ Permitted
Internal project charters, MVAL logs | ⚠ Assess risk | ✓ Permitted
Personal health records (HIPAA) | ✗ Prohibited | ⚠ Requires BAA
Financial records | ✗ Prohibited | ⚠ Assess compliance
Export-controlled data (ITAR/EAR) | ✗ Prohibited | ✗ Prohibited
IRB-adjacent human subjects data | ✗ Prohibited | ⚠ Consult IRB first

PART VII: ROADMAP & OPEN QUESTIONS

23. Prioritized Improvements

23.1 Critical Priority

AI-001: MVAL Enforcement Mechanism
CRITICAL | Effort: 3–5 days | MVAL
Design and implement structural enforcement of MVAL field completion. Options: Google Form with required fields → auto-ingested Google Doc; Markdown template with required section headers; custom lightweight web front-end.
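A minimal sketch of the Markdown-template option: a submission gate that rejects an entry whose template is missing a required section or leaves it empty. The field names below are illustrative assumptions; the authoritative list is the MVAL specification in Part II.

```python
import re

# Hypothetical required MVAL section headers (the real list comes from the
# MVAL protocol). Each must appear as a '## Header' line with non-empty body.
REQUIRED_SECTIONS = ["Why/Decisions", "Environment", "Inputs", "Outputs", "Failures"]

def validate_mval(markdown_text):
    """Return a list of missing or empty required sections (empty list = pass)."""
    problems = []
    for section in REQUIRED_SECTIONS:
        # Match the section header, capture body text up to the next header.
        pattern = rf"^##\s*{re.escape(section)}\s*$(.*?)(?=^##\s|\Z)"
        match = re.search(pattern, markdown_text, re.MULTILINE | re.DOTALL)
        if match is None:
            problems.append(f"missing section: {section}")
        elif not match.group(1).strip():
            problems.append(f"empty section: {section}")
    return problems

entry = """## Why/Decisions
Chose batch size 32 after out-of-memory failure at 64.
## Environment
Vertex AI Workbench, Python 3.11.
## Inputs
Training shard list from the shared project drive.
## Outputs
## Failures
"""
print(validate_mval(entry))
# ['empty section: Outputs', 'empty section: Failures']
```

The same check could run as a pre-ingestion hook, so that a log entry never reaches the notebook corpus with silently blank fields.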
AI-002: Ouroboros Citation Preservation Protocol
CRITICAL | Effort: 2 days | Corpus
Mandatory metadata checklist and standard template for pre-conversion documentation.

23.2 High Priority

AI-003: Python Execution Integration
HIGH | Effort: 3–5 days | Core
Define protocol for routing quantitative queries to Vertex AI Workbench or Colab. Outputs logged back to MVAL as artifacts.
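The logging half of that protocol can be sketched as follows. A local subprocess stands in for the Vertex AI Workbench / Colab execution backend, and the JSONL artifact format and file name are assumptions for illustration.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

def run_quantitative_task(code, mval_log=Path("mval_artifacts.jsonl")):
    """Execute a quantitative snippet and append its result to an MVAL log.

    Stand-in for the planned Workbench/Colab bridge: the snippet runs in a
    local Python subprocess; only the artifact-logging contract matters here.
    """
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=60)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code": code,
        "stdout": result.stdout,
        "stderr": result.stderr,
        "returncode": result.returncode,
    }
    # Append-only JSONL keeps every run, including failures, as an artifact.
    with mval_log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

rec = run_quantitative_task("print(sum(range(10)))")
```

Because failed runs are logged with their stderr and return code rather than discarded, the same mechanism doubles as input to the Failure Artifact Protocol.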
AI-004: MAB Phase 1 Priors for Executive Education
HIGH | Effort: 5 days | Adaptive
Develop expert-seeded prior configurations for executive education, think tank, and graduate school cohort profiles. Reduces cold-start period for institutional deployments.
AI-005: Data Classification Governance Document
HIGH | Effort: 2 days | Partners
One-page data classification guide for all institutional partners. Cover: what can go in standard NotebookLM, what requires enterprise, what is prohibited in any cloud system.

23.3 Medium Priority

AI-006: Notebook Taxonomy Standard
MEDIUM | Effort: 1 day
Publish recommended notebook segmentation taxonomy. Standardize naming conventions across all deployments.
AI-007: MCP Server Deployment
MEDIUM | Effort: 3–5 days | Core
Configure and document MCP server integration for Claude Code / Claude Desktop. Publish configuration template.
AI-008: Fairness Audit Protocol
MEDIUM | Effort: 3 days | Adaptive
Implement monitoring to verify that minimum exploration rates across all five modes are maintained across learner demographics. Flag pigeonholing patterns before they solidify.
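One possible shape for that monitoring, assuming a hypothetical 5% exploration floor and a bandit selection log of (demographic group, selected mode) pairs:

```python
from collections import Counter, defaultdict

MODES = ("direct", "socratic", "scaffolding", "apprenticeship", "discovery")
MIN_EXPLORATION_RATE = 0.05  # hypothetical floor per mode, per group

def audit_exploration(assignments):
    """Flag (group, mode, rate) triples where a group's selection rate for a
    mode falls below the exploration floor.

    `assignments` is an iterable of (demographic_group, selected_mode) pairs
    drawn from the bandit's selection log.
    """
    by_group = defaultdict(Counter)
    for group, mode in assignments:
        by_group[group][mode] += 1
    flags = []
    for group, counts in by_group.items():
        total = sum(counts.values())
        for mode in MODES:
            rate = counts[mode] / total
            if rate < MIN_EXPLORATION_RATE:
                flags.append((group, mode, round(rate, 3)))
    return flags
```

Run periodically against the selection log, this surfaces pigeonholing early: a group that the bandit has effectively stopped exposing to a mode shows up as a zero-rate flag before the pattern solidifies into the learned policy.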

24. Open Questions for Partners

Architecture

  1. MVAL enforcement path: What is the lowest-friction mechanism for hard field validation that won't create overhead that prompts partners to circumvent it? Google Form vs. Markdown template vs. custom UI?
  2. Quantitative integration: What is the preferred path for quantitative tasks? Vertex AI Workbench sidebar, a Colab integration, or a separate notebook layer that feeds outputs back to MVAL as artifacts?
  3. MAB deployment scope: Is the full five-mode bandit appropriate for executive education contexts, or is a simplified two-mode system (Direct vs. Socratic) more appropriate for program-level adoption?

Institutional Deployment

  1. Enterprise vs. Workspace: For graduate school and think tank partners, does Google Workspace for Education / Workspace for Organizations satisfy data protection requirements, or does full GCP Enterprise become necessary?
  2. EU AI Act compliance: The EU AI Act becomes fully applicable August 2026. For European institutional partners, does the current regional deployment model (EU multi-region via Discovery Engine) satisfy governance documentation requirements?
  3. Executive education MVAL variant: The standard MVAL includes an Environment field designed for cloud computing contexts. What is the appropriate adaptation of this field for executive education or policy research contexts where "environment" means organizational and analytical context rather than compute infrastructure?

Measurement

  1. Pilot instrumentation: The target metrics are gap-review <20%, onboarding reduction >50%, and duplicate work near zero. How are these currently measured, and what is the instrumentation plan for formal partner deployments?
  2. MAB reward calibration: How should the composite reward function be weighted differently for executive education (where persistence and engagement may outweigh raw mastery gain) versus research training (where knowledge gain is paramount)?
  3. CRITIQ integration: Could CRITIQ's peer review protocol run against MVAL entries as a structured critic layer, automatically flagging statistical integrity issues or reproducibility gaps?

25. Future Feature Roadmap

Feature | Mechanism | Impact | Status
Passage-Level Verification | Block outputs lacking direct cited evidence | Eliminates interpretive overreach and drift | Planned
Hallucination Detector | Post-hoc corpus auditing with reliability score | Quantitative documentation quality metric per entry | Planned
Full MAB Engine (5 Modes) | Thompson Sampling + CMAB + IC-Cache | Real-time personalized instructional mode selection | Planned
GAMBITTS Integration | LLM treatment embedding + bandit policy learning | Robust learning despite stochastic LLM output | Planned
MVAL Web Interface | Required-field form → auto-ingests to notebook | Structural enforcement of documentation standard | Planned
CRITIQ × Boyle Integration | Peer review protocol applied to MVAL entries | Automated statistical integrity flagging | Planned
Executive Education MVAL Variant | Adapted field definitions for non-technical contexts | Extends the Boyle System to business school and policy contexts | Planned
OPT / Visa-Transition Handoff Template | MVAL variant optimized for personnel transition documentation | Preserves institutional knowledge across team changes | Planned
Diagram Generation | Multimodal visualization of experimental setups | Improves legibility of complex workflows | Planned
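The Passage-Level Verification mechanism in the table above could take roughly this shape; the bracketed-index citation convention and sentence-level granularity are assumptions for illustration, not the planned implementation.

```python
import re

def verify_citations(answer, passages):
    """Return the sentences of `answer` that lack a valid passage citation.

    A citation is assumed to be a bracketed index like [2] referring to the
    retrieved passage list. An empty return value means the answer passes the
    gate; a non-empty one means the output would be blocked or flagged.
    """
    rejected = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        # Reject uncited sentences and citations pointing outside the corpus.
        if not cited or any(i >= len(passages) for i in cited):
            rejected.append(sentence)
    return rejected

passages = ["Boyle logged air-pump failures.", "MVAL has required fields."]
answer = "Failures were logged [0]. Fields are required [1]. This claim is unsupported."
print(verify_citations(answer, passages))
# ['This claim is unsupported.']
```

A production gate would verify that the cited passage actually entails the sentence, not merely that an index exists; this sketch covers only the structural check.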

25.1 Ecosystem Tool Integration

Tool | Function | Boyle Integration
CRITIQ | Peer review: manuscript evaluation, statistical integrity | Planned
SOCRIT | Socratic prompt evaluation (Paul-Elder framework) | Planned
Popper | Assertion verification: flags factual claims for review | Planned
Bookie the Bookmaker | Chapter drafting for domain-specific textbooks | Planned
Eddy the Editor | Article review: structure, line edit, SEO, publish strategy | Planned
Medhavi Platform | AI-assisted textbook delivery and student documentation | Roadmap TBD