πŸ“„ PUBLIC TECHNICAL DOCUMENTATION

THE BOYLE SYSTEM

AI-Assisted Documentary Infrastructure for Scientific Reproducibility

Technical reference for researchers, institutional partners, executive education programs, think tanks, and schools: architecture, MVAL protocol, corpus management, adaptive instructional design, and implementation roadmap.

"The knowledge that disappears is the knowledge that never existed."

Version 1.1 | March 2026

Medhavy LLC  |  Bear Brown LLC  |  Humanitarians AI

bear@bearbrown.co  |  bear@humanitarians.ai

Table of Contents

Presented by: Medhavy LLC
In association with: Bear Brown LLC
Research partner: Humanitarians AI (501(c)(3))
Contact: bear@bearbrown.co | bear@humanitarians.ai
PART I: SYSTEM OVERVIEW
1. Problem Statement & Motivation
2. Historical Foundations (Boyle)
3. System Architecture
4. Three-Role AI Partnership
PART II: MVAL PROTOCOL
5. Minimum Viable Analytical Log
6. Field Specifications
7. Failure Artifact Protocol
PART III: CORPUS MANAGEMENT
8. Source Ingestion & Formats
9. Ouroboros Technique
10. Source Stitching
11. Notebook Segmentation
PART IV: ADAPTIVE INSTRUCTION
12. The Five Learning Modes
13. Multi-Armed Bandit Architecture
14. Reward Modeling & Context Vectors
15. GAMBITTS & LLM Integration
PART V: CROSS-SYSTEM ANALYSIS
16. RAG vs. Long-Context
17. Grounded vs. Non-Grounded LLM
18. Technical Debt Registry
PART VI: OPERATIONS
19. Target Deployment Contexts
20. Active Deployments (Pilot)
21. Integration & Automation
22. Security & Privacy
PART VII: ROADMAP
23. Prioritized Improvements
24. Open Questions
25. Future Feature Roadmap

PART I: SYSTEM OVERVIEW

1. Problem Statement & Motivation

A client flags a number on a dashboard built six months ago. The data is still there. The pipeline is still running. The dashboard is still live. But the analyst who built it is gone. The metric definition was never written down. Is the number wrong? Nobody knows. Nobody can know. This is not a rare disaster. It is Tuesday.

Modern AI research occurs within virtualized, elastic cloud environments engineered for rapid instantiation and immediate abandonment. This architecture facilitates the "vanishing laboratory" β€” where the intricate web of dependencies, library versions, hardware configurations, and environmental variables that produced a result evaporates the moment a virtual machine is decommissioned.

The Vanishing Laboratory

Virtual machines dissolve. Library versions, dataset checksums, and hardware configurations evaporate. The result survives. The conditions do not.

The Documentation Gap

Critical decisions happen in undocumented threads, ephemeral terminal sessions, and local notebooks never committed to a repository. The "why" disappears with each personnel transition.

ℹ️ Core Insight The reproducibility crisis in machine learning is not primarily a problem of statistical methodology. It is a problem of vanishing laboratories. The Boyle System is a structural intervention β€” making the right behavior the natural one, not the effortful one.

2. Historical Foundations

Robert Boyle (1627–1691) understood that for an experiment to be scientifically valid, it had to be verifiable by others. Because the physical laboratory was private, Boyle developed a style of reporting so detailed that readers could become "virtual witnesses." The Boyle System applies this same philosophy to cloud credentials, API keys, library versions, and instructional design choices.

Documentation Dimension | Aristotelian (Pre-Boyle) | Boyle's Empirical Approach | The Boyle System (Modern)
Primary Methodology | Abstract logic and reasoning | Observation and experimentation | Grounded AI synthesis via RAG
Documentation Depth | Minimal; focused on final truths | Extensive; focused on conditions | Mandatory MVAL fields (all six)
Role of Failure | Ignored as an error of logic | Recorded as essential data | Logged as a first-class artifact
Verification Mechanism | Internal consistency of argument | "Virtual witnessing" via narrative | Citation-backed source grounding
Social Structure | Individual philosopher | Royal Society "matter of fact" | Collaborative AI research labs & classrooms

3. System Architecture

3.1 Technical Core: Retrieval-Augmented Generation

The Boyle System is powered by NotebookLM's Source-Grounded RAG pipeline. Unlike standard LLMs that generate from pre-trained patterns, the system can only "know" what has been uploaded to its corpus β€” its limitation is its superpower.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        RESEARCHER / LEARNER INPUT                    β”‚
β”‚   Project Charter Β· Degree Requirements Β· Boyle Principles Β·        β”‚
β”‚   MVAL Entries Β· Cloud Configs Β· Failed Experiment Logs              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚ Upload / Ingest
                                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     NOTEBOOKLM CORPUS (RAG)                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Document        β”‚   β”‚ Gemini Embedding β”‚   β”‚ Vector Index    β”‚   β”‚
β”‚  β”‚ Ingestion       │──▢│ Model           │──▢│ (Nearest        β”‚   β”‚
β”‚  β”‚ (Chunking)      β”‚   β”‚ (Vectorization) β”‚   β”‚  Neighbor)      β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚ Cosine Similarity Retrieval
                                                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚               THREE-ROLE AI PARTNER + ADAPTIVE INSTRUCTOR            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  TUTOR   β”‚  β”‚  CRITIC  β”‚  β”‚  GUIDE   β”‚  β”‚  MAB PEDAGOGY      β”‚ β”‚
β”‚  β”‚ Context- β”‚  β”‚Challengesβ”‚  β”‚  Cloud   β”‚  β”‚  ENGINE (5 Modes)  β”‚ β”‚
β”‚  β”‚ aware    β”‚  β”‚  vague   β”‚  β”‚  infra   β”‚  β”‚  SocraticΒ·Scaffold  β”‚ β”‚
β”‚  β”‚ guidance β”‚  β”‚  entries β”‚  β”‚  logging β”‚  β”‚  DirectΒ·Apprentice  β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  Metacognitive     β”‚ β”‚
β”‚                                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚ Cited, Grounded, Personalized Response
                                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         MVAL LOG ENTRY                               β”‚
β”‚         What Β· Why Β· How Β· Environment Β· Results Β· Questions         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
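The retrieval step in the diagram reduces, at its core, to nearest-neighbor search over embedded chunks. A minimal sketch, using toy three-dimensional vectors in place of real Gemini embeddings (the `retrieve` helper and the sample chunks are illustrative, not platform API):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, index, k=2):
    """Return the k chunk texts whose embeddings are closest to the query.
    `index` is a list of (chunk_text, embedding) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy corpus: in the real pipeline these vectors come from the embedding model
index = [
    ("MVAL entry: retry logic for 429 responses", [1.0, 0.0, 0.0]),
    ("Project charter: documentation standards",  [0.0, 1.0, 0.0]),
    ("MVAL entry: backoff tuning notes",          [0.9, 0.1, 0.0]),
]
print(retrieve([1.0, 0.0, 0.0], index, k=2))
```

The retrieved chunks, not the whole corpus, are what the model conditions on; this is why every claim can carry a citation back to a specific passage.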

3.2 Platform Capacity

Resource | Limit | Notes
Notebooks per account | 100 | Segment by project / domain / cohort
Sources per notebook | 50 | Managed via Ouroboros + stitching strategies
Words per source | 500,000 | Maximized via source stitching
Total corpus per notebook | ~25 million words | Equivalent to ~25 large technical monographs
Context window (Gemini 1.5 Pro) | 1M tokens | Near-perfect recall (>99.7%) up to this limit

4. Three-Role AI Partnership

πŸŽ“ Role 1: Tutor

Function: Context-aware documentation guidance grounded in the researcher's or learner's actual project charter, degree requirements, and institutional standards.

Example: A researcher asks how to document a Python web-scraping project. A generic AI returns README advice. The Boyle System returns guidance specific to the team's standards, citing page references from the uploaded project charter and compliance requirements from the institutional protocol document.

Key behavior: Cannot give generic advice β€” has no generic context to draw from.

πŸ” Role 2: Critic

Function: Continuous audit of log entries. Surfaces vague outcomes, implicit assumptions, and missing failure records.

Example prompts generated by the Critic (illustrative):

- "The Results field says the run 'mostly worked.' Which checks passed, and which failed?"
- "This entry assumes the upstream data schema is unchanged. Where was that verified?"
- "No failures have been logged for this approach. Were there none, or were they unrecorded?"

Key behavior: Combats "interpretive drift" β€” the gradual transformation of nuanced observations into unsupported factual declarations.

βš™οΈ Role 3: Operational Guide

Function: Treats cloud credentials, API keys, library versions, and environment variables as first-class research artifacts integrated into every log entry.

Key behavior: Transforms administrative overhead into a reproducible infrastructure artifact β€” the "matter of fact" of the cloud laboratory.

PART II: MVAL PROTOCOL

5. Minimum Viable Analytical Log

ℹ️ MVAL is not a form to fill out after the work is done. It is the structure through which the work gets done. Every log entry within the Boyle System must address all six fields before the entry is considered complete.

6. Field Specifications

WHAT
The specific task or experiment attempted. Must describe the operational goal in granular detail. Avoid: "worked on pipeline." Require: "Implemented retry logic for the ATS scraper to handle 429 rate-limit responses."
WHY
The underlying reasoning for the chosen approach, including alternatives considered and rejected. This is the field most often lost during personnel transitions. It captures institutional logic behind a decision.
HOW
Precise methodology: code logic, data transformations, API calls, tool configurations, and exact steps. Should be reproducible from this field alone. The "virtual witnessing" passage.
ENVIRONMENT
Runtime configuration, library versions, cloud infrastructure, credentials used (names/roles β€” never raw keys), dataset identifiers and checksums. The cloud laboratory must be rebuildable from this field.
RESULTS
Actual outcome β€” including failures. A failed pipeline is logged with the same rigor as a successful result. Include error messages verbatim, stack traces, and unexpected outputs. Failures are not mistakes; they are data.
QUESTIONS
A record of uncertainties, open hypotheses, and follow-up threads generated by this session. Prevents the closure illusion β€” the false sense that a completed task means all related questions are resolved.
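The six-field contract can be enforced mechanically. A minimal sketch (the class and field names are illustrative, not a shipped schema) of an entry object that does not count as complete until every field is filled:

```python
from dataclasses import dataclass, fields

@dataclass
class MVALEntry:
    """One log entry; all six fields must be non-empty before it is complete."""
    what: str          # specific task or experiment attempted
    why: str           # reasoning, including rejected alternatives
    how: str           # reproducible methodology
    environment: str   # versions, infra, credential names (never raw keys)
    results: str       # actual outcome, failures logged verbatim
    questions: str     # open uncertainties and follow-up threads

    def missing_fields(self) -> list:
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self) -> bool:
        return not self.missing_fields()

entry = MVALEntry(
    what="Implemented retry logic for the ATS scraper to handle 429 responses",
    why="Exponential backoff chosen over fixed delay; fixed delay kept tripping limits",
    how="Retry decorator, base 2s, max 5 attempts",
    environment="python 3.11, requests 2.31, us-east-1 worker",
    results="",   # incomplete: Results not yet logged
    questions="Does the 429 budget differ per endpoint?",
)
print(entry.missing_fields())  # a submission gate would reject this entry
```

A hard gate like this is exactly what BD-006 (Part V) notes is missing at the platform level today.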

7. Failure Artifact Protocol

Event Type | MVAL Treatment | Required Fields
Successful pipeline run | Standard MVAL entry | All six fields
Failed pipeline run | Standard MVAL entry (identical rigor) | All six + error verbatim in Results
Partial / ambiguous result | Standard MVAL with explicit uncertainty | All six + explicit uncertainty in Questions
Abandoned approach | MVAL entry logging the rejection reasoning | Why (critical) + Results (why stopped)
Undocumented prior decision | Retroactive MVAL reconstruction | Why + How + note that the entry is reconstructed

PART III: CORPUS MANAGEMENT

8. Source Ingestion & Format Performance

Format | Retrieval Quality | Technical Consideration | Recommendation
Markdown / Plain Text | β– β– β– β– β–  Highest | No layout noise; ideal for RAG chunking | Primary target format
Google Docs / Word | β– β– β– β–  High | Structured formatting facilitates parsing | Acceptable; export to Markdown if possible
Text-Based PDF | β– β– β–  Strong | Multi-column layouts may cause chunking errors | Usable; convert to Markdown for critical sources
Scanned PDF | β– β–  Mixed | Sensitive to scan resolution and lighting | Apply OCR preprocessing before ingestion
Handwritten Notes (OCR) | β–  Variable | Cursive notation reduces reliability | Hybrid pipeline: OCR + Gemini self-correction
Audio (MP3 Overview) | β– β–  High abstraction | Multi-modal, conversational perspective | Track lineage; avoid multi-generation re-upload
Website URLs | β– β– β–  Variable | Dynamic content may not index correctly | Prefer static pages; exclude dynamic URL patterns

9. The Ouroboros Technique

OUROBOROS WORKFLOW

Research Session 1–N
      β”‚
      β–Ό
Accumulated MVAL Entries + AI Responses
      β”‚
      β–Ό (Select notes in NotebookLM UI)
"Convert to Source" β†’ New Dense Source Document
      β”‚
      β”œβ”€β”€β”€ βœ“ Delete original bulky source files (free slots)
      β”‚
      └─── ⚠️  REQUIRED before conversion:
                Manually embed key metadata:
                - Original citation references
                - Author / date / document title
                - Source page numbers
                (Conversion strips inline citations)
🚨 BD-002: Citation Loss on Ouroboros Conversion Converting notes to sources strips original inline citations. Mandate: manually embed original citation metadata before every conversion.
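The required metadata embedding can be scripted so it is never skipped. A sketch, assuming a hypothetical `embed_metadata` helper run on the note text before "Convert to Source":

```python
def embed_metadata(note_text, *, title, author, date, citations):
    """Prepend provenance to a note before 'Convert to Source'.

    Conversion strips inline citations (BD-002), so the metadata must live
    inside the text itself to survive the round trip.
    """
    header = [
        f"SOURCE TITLE: {title}",
        f"AUTHOR: {author}",
        f"DATE: {date}",
        "ORIGINAL CITATIONS:",
        *[f"  - {c}" for c in citations],
        "-" * 40,
    ]
    return "\n".join(header) + "\n" + note_text

prepared = embed_metadata(
    "Retry logic reduced 429 failures to zero across 3 runs.",
    title="ATS Scraper MVAL Digest, Sessions 1-9",
    author="J. Doe",
    date="2026-02-14",
    citations=["Project Charter, p. 12", "MVAL entry 2026-02-03 #4"],
)
```

The output string, not the raw note, is what gets converted; the citations then survive as plain text retrievable by the RAG index.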

10. Source Stitching

Strategy | Mechanism | Benefit | Risk
Source Stitching | Combining multiple PDFs into one file | Bypasses the 50-source count limit | Slightly slower retrieval of specific passages
Ouroboros (Note β†’ Source) | Converting AI-generated notes into a new source | Distills knowledge and clears source slots | Loss of inline citations if metadata is not preserved
Audio as Source | Re-uploading Audio Overview MP3s | Multi-modal perspective | Errors creep in across generational summaries
Metadata Tagging | Including authors/titles in the text flow | Improves citation accuracy and retrieval | Manual overhead in document preparation
Notebook Segmentation | Splitting corpus by content type | 64% retrieval improvement (benchmarked) | Requires disciplined categorization at ingestion
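Source stitching itself is mechanical for text formats. A sketch for Markdown and plain-text sources (the `stitch_sources` helper and section markers are illustrative) that tags each stitched section with its filename so retrieval can still attribute passages, and refuses to exceed the 500,000-word per-source cap:

```python
from pathlib import Path

WORD_LIMIT = 500_000  # per-source word cap from the platform capacity table

def stitch_sources(paths, out_path):
    """Concatenate text sources into one stitched file. Each section is
    tagged with its original filename; raises if the stitched file would
    exceed the per-source word limit. Returns the total word count."""
    parts, words = [], 0
    for p in paths:
        text = p.read_text(encoding="utf-8")
        words += len(text.split())
        parts.append(f"===== SOURCE: {p.name} =====\n{text}")
    if words > WORD_LIMIT:
        raise ValueError(f"stitched source would be {words} words (> {WORD_LIMIT})")
    out_path.write_text("\n\n".join(parts), encoding="utf-8")
    return words
```

The filename markers serve the same purpose as metadata tagging above: they keep attribution recoverable after the originals are merged.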

11. Notebook Segmentation Strategy

Notebook Type | Recommended Contents | Rationale
Project Charter Notebook | Charters, standards, institutional protocols | Grounds the Tutor role; isolated from research data
Active Research Notebook | MVAL entries, experiment logs, pipeline docs | Primary working notebook; updated continuously
Literature Notebook | Academic papers, stitched research surveys | Separates authoritative external sources from internal logs
Handoff Notebook | Distilled MVAL summaries, onboarding guides | Designed for personnel transition; optimized for new readers
Failure Archive | Failed experiment logs, dead-end documentation | Searchable record prevents duplicate negative work

PART IV: ADAPTIVE INSTRUCTIONAL ARCHITECTURE

12. The Five Learning Modes

The Boyle System integrates five evidence-based instructional theories as discrete, selectable pedagogical modes. Each mode is calibrated to a specific learner state, cognitive load condition, and desired learning outcome. Together they address the Assistance Dilemma: providing enough support to facilitate progress without inducing reliance that undermines long-term retention.

ℹ️ The Assistance Dilemma Too much assistance β†’ high immediate success but shallow cognitive structures. Too little assistance β†’ impasse-driven learning only if the learner has sufficient self-regulation; otherwise, disengagement. The five-mode system navigates this dynamically.
πŸ’¬ Mode 1: Socratic Questioning

Operational definition: Iterative probing using leading questions and progressive hints that elicit latent knowledge from the learner rather than delivering information directly.

Theoretical basis: Active retrieval and schema refinement. Knowledge elicited is retained longer than knowledge delivered.

Best for: Learners with adequate foundational schemas; integrative or synthesis tasks; executive education case discussions.

Caution: Can induce frustration and cognitive overload when foundational schemas are absent. The bandit engine detects this via rising response latency without accuracy gains.

πŸ—οΈ Mode 2: Scaffolding

Operational definition: Reducing degrees of freedom by removing distractors, pre-filling procedural steps, or providing structured templates that allow the learner to focus on the core knowledge component.

Theoretical basis: Vygotsky's Zone of Proximal Development (ZPD) β€” the system maintains the learner at the edge of their capability without exceeding it.

Best for: High cognitive load conditions; new procedural skills; onboarding scenarios in executive education.

Caution: Expertise Reversal Effect β€” once mastery is achieved, continued scaffolding actively impedes fluency. The bandit detects this transition and reduces scaffold weight.

πŸ“‹ Mode 3: Direct Instruction

Operational definition: Explicit delivery of facts, definitions, or procedural rules. No elicitation; information is provided directly and efficiently.

Theoretical basis: Cognitive Load Theory β€” minimizes extraneous cognitive load when the learner lacks prerequisite schemas, enabling rapid acquisition of new Knowledge Components (KCs).

Best for: Prerequisite concept introduction; low-energy or high-stress learner states; situations where exploratory modes would cause disengagement.

Caution: Risk of passive dependency if used exclusively. The system enforces a minimum exploration rate across all other modes (fairness constraint).

πŸ”¬ Mode 4: Cognitive Apprenticeship

Operational definition: Modeling expert processes via worked examples, "think-aloud" demonstrations, or "first letter" hints that reveal the structure of expert reasoning without completing the task for the learner.

Theoretical basis: Observational learning and expert visualization. Learners acquire procedural fluency and strategy adoption by watching expert processes made visible.

Best for: Complex multi-step procedures; professional practice domains (consulting, research methodology, case analysis); think tank workflows.

Caution: High LLM generation cost. The IC-Cache optimization routes apprenticeship requests to cached high-quality examples where possible.

🧠 Mode 5: Meta-cognitive Feedback

Operational definition: Prompts for reflection, strategy evaluation, and self-monitoring. Rather than providing content, the system asks the learner to evaluate their own approach, predict their performance, or identify their gaps.

Theoretical basis: Self-Regulated Learning (SRL) theory. Learners who can monitor and regulate their own cognition perform significantly better on transfer tasks.

Best for: Advanced learners approaching mastery; post-task review; program-level reflection in executive education; doctoral and research training contexts.

Caution: Ineffective for novices who lack the foundational knowledge to evaluate their own performance accurately.

12.1 Mode Selection Summary

Mode | Theoretical Basis | Optimal Learner State | Primary Risk
Socratic Questioning | Active retrieval, schema refinement | Moderate–high prior knowledge | Frustration if schemas absent
Scaffolding | Zone of Proximal Development | Low–moderate; high cognitive load | Expertise Reversal Effect
Direct Instruction | Cognitive Load Theory | Novice; low energy; high stress | Passive dependency
Cognitive Apprenticeship | Observational learning | Intermediate; procedural tasks | High generation cost
Meta-cognitive Feedback | Self-Regulated Learning | Advanced; near or post-mastery | Ineffective for novices

13. Multi-Armed Bandit Architecture

Each of the five instructional modes is treated as a discrete "arm" of a Multi-Armed Bandit (MAB). The bandit engine selects which mode to apply at each instructional moment, balancing exploration (trying modes with uncertain effectiveness for this learner) against exploitation (using the mode currently estimated to be most effective).

13.1 Thompson Sampling

Bayesian Mode Selection

For each instructional mode a ∈ {1,...,5}, the system maintains a belief state modeled as a Beta distribution Beta(αₐ, βₐ) for binary rewards, or a Gaussian distribution N(μₐ, σₐ²) for continuous learning progress metrics.

Thompson Sampling draws a sample from each mode's posterior and selects the mode with the highest sample. This naturally produces high exploration early in a session (when uncertainty is high) and converges on the most effective personalized strategy as evidence accumulates.
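A minimal sketch of this loop for binary rewards (mode names from Β§12; the class is illustrative, not the production engine):

```python
import random

class ThompsonModeSelector:
    """Beta-Bernoulli Thompson Sampling over the five instructional modes."""
    MODES = ["socratic", "scaffolding", "direct", "apprenticeship", "metacognitive"]

    def __init__(self):
        # Beta(1, 1) uniform priors; the expert warm-start (Phase 1) would
        # seed these with informed values instead
        self.alpha = {m: 1.0 for m in self.MODES}
        self.beta = {m: 1.0 for m in self.MODES}

    def select(self):
        # draw one sample per posterior, pick the mode with the highest draw
        draws = {m: random.betavariate(self.alpha[m], self.beta[m])
                 for m in self.MODES}
        return max(draws, key=draws.get)

    def update(self, mode, reward):
        # binary reward: 1 = learning progress observed, 0 = none
        self.alpha[mode] += reward
        self.beta[mode] += 1 - reward

bandit = ThompsonModeSelector()
mode = bandit.select()
bandit.update(mode, reward=1)
```

Early in a session all posteriors are wide, so draws vary and every mode gets tried; as evidence accumulates, the posterior for the best mode tightens and dominates the draws.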

13.2 Contextual Bandit: The Boyle Context Vector

A context-free bandit cannot achieve true personalization. The Contextual MAB (CMAB) incorporates a feature vector xβ‚œ representing the learner's current state:

E[rβ‚œ | xβ‚œ, a] = xβ‚œα΅€ θₐ

Where:
  xβ‚œ  = learner context vector at time t
  θₐ  = learned weight vector for instructional mode a
  rβ‚œ  = expected reward (learning progress)
Feature Category | Features Included | Role in Bandit
Surface-Level (Stable) | Baseline education level, prior academic performance, domain background | Sets initial priors; "warm start" for new learners
Deep-Level (Dynamic) | Current Knowledge Component mastery, error distributions, response latency | Primary signal for real-time mode switching
Affective State | Estimated mood, energy level, stress indicators | Temporarily biases toward lower-load modes (Direct, Scaffolding)
Knowledge Tracing (DKT/BKT) | Mastery probability per skill from sequence of prior responses | Detects Expertise Reversal; triggers mode drift
ℹ️ Response Latency as Cognitive Load Proxy An increase in response latency without a corresponding increase in accuracy is a high-fidelity signal that the current instructional mode is failing to provide adequate support. The bandit treats this pattern as a negative reward signal and shifts toward more structured modes.
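The linear model E[rβ‚œ | xβ‚œ, a] = xβ‚œα΅€ θₐ can be sketched with one weight vector per mode, fit online. This is a simplified epsilon-greedy stand-in for the contextual machinery (a production CMAB would use LinUCB or linear Thompson Sampling with proper uncertainty estimates):

```python
import random

class LinearContextualBandit:
    """Per-mode linear reward model, estimated reward = x . theta_a,
    fit by online gradient descent on squared prediction error."""
    def __init__(self, modes, dim, lr=0.1, epsilon=0.1):
        self.theta = {a: [0.0] * dim for a in modes}
        self.modes, self.lr, self.epsilon = modes, lr, epsilon

    def predict(self, x, a):
        # estimated reward for mode a in learner context x
        return sum(xi * ti for xi, ti in zip(x, self.theta[a]))

    def select(self, x):
        if random.random() < self.epsilon:      # minimum exploration floor
            return random.choice(self.modes)
        return max(self.modes, key=lambda a: self.predict(x, a))

    def update(self, x, a, r):
        # one SGD step on the squared prediction error
        err = r - self.predict(x, a)
        self.theta[a] = [t + self.lr * err * xi
                         for t, xi in zip(self.theta[a], x)]
```

Here `x` would carry the context features from the table above (mastery estimates, latency, affective indicators); the learned `theta` vectors are what let the same engine recommend different modes to different learners in the same session.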

13.3 Three Implementation Phases

Phase 1: Expert-Guided Initialization (Cold Start)

Before the system has enough data to personalize, it uses expert knowledge to seed the bandit's priors. Direct Instruction is the default mode for prerequisite concepts; Socratic Questioning is prioritized for integrative tasks. This warm-start mechanism prevents detrimental random exploration in early sessions.

Phase 2: Online Adaptation and Clustering

As learners interact with the system, the bandit refines its models. Local Clustering in Bandits (LOCB) groups learners by preference parameters θₐ. New learners whose initial behavior matches an existing cluster inherit that cluster's learned policy β€” dramatically accelerating personalization without requiring extended individual observation.

This collaborative filtering approach scales intelligence across entire cohorts in executive education programs and research training environments.

Phase 3: Non-Stationary Drift (Expertise Reversal Management)

Learning is inherently non-stationary. Sliding Window UCB or Discounted Thompson Sampling gives more weight to recent observations. As deep-level features indicate higher competence, rewards for Direct Instruction and heavy Scaffolding naturally decline, while rewards for Socratic Questioning and Meta-cognitive feedback increase. The bandit policy drifts with the learner β€” a seamless transition from guided structure to open-ended exploration.
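Discounted Thompson Sampling is a small modification of the Β§13.1 sampler: accumulated evidence decays toward the prior on every update, so the policy can follow the learner. A sketch with an illustrative discount factor:

```python
import random

class DiscountedThompson:
    """Thompson Sampling with evidence decay: old observations lose weight,
    letting the policy drift as the learner's competence changes."""
    def __init__(self, modes, gamma=0.9):
        self.gamma = gamma  # per-step discount on accumulated evidence
        self.alpha = {m: 1.0 for m in modes}
        self.beta = {m: 1.0 for m in modes}

    def select(self):
        return max(self.alpha,
                   key=lambda m: random.betavariate(self.alpha[m], self.beta[m]))

    def update(self, mode, reward):
        # decay every arm's evidence toward the Beta(1, 1) prior, then
        # credit the played arm with the fresh observation
        for m in self.alpha:
            self.alpha[m] = 1.0 + self.gamma * (self.alpha[m] - 1.0)
            self.beta[m] = 1.0 + self.gamma * (self.beta[m] - 1.0)
        self.alpha[mode] += reward
        self.beta[mode] += 1 - reward
```

With a discount of 0.9 the effective memory is on the order of ten recent observations, so a mode that stops paying off (e.g., heavy Scaffolding after mastery) loses its advantage within a handful of sessions rather than being locked in by its history.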

14. Reward Modeling

14.1 Learning Progress as Reward Signal

Simple correctness rewards create a perverse incentive: the bandit maximizes help to guarantee "success," producing over-assistance. The Boyle System uses Learning Progress (LP) as its primary reward signal:

r = cα΅’(t) - cα΅’(t-1)

Where cα΅’(t) = probability of mastery for Knowledge Component i at time t

If a learner already knows a concept: cα΅’(t) - cα΅’(t-1) β‰ˆ 0
β†’ Bandit shifts to more challenging content or Meta-cognitive mode

If progress is rapid: reward is high
β†’ Bandit reinforces the current instructional mode

14.2 Composite Reward Function

Reward Component | Metric | Purpose
Immediate Success | P(Correct|Mode) | Maintains learner motivation and "flow"
Knowledge Gain | Ξ”Mastery | Ensures the mode is actually teaching
Efficiency | 1 / Time-on-Task | Penalizes unnecessarily verbose modes
Persistence | Session completion rate | Encourages modes that sustain long-term engagement
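Combining the table's four components into a single scalar might look like the following sketch; the weights are illustrative, not calibrated system values:

```python
def composite_reward(p_correct, mastery_t, mastery_prev, time_on_task_s,
                     completed, weights=(0.2, 0.5, 0.1, 0.2)):
    """Weighted composite of the four reward components.
    Weights are illustrative: knowledge gain dominates so the bandit
    cannot 'win' by over-assisting (see Section 14.1)."""
    w_success, w_gain, w_eff, w_persist = weights
    learning_progress = mastery_t - mastery_prev    # r = c_i(t) - c_i(t-1)
    efficiency = 1.0 / max(time_on_task_s, 1.0)     # avoid division by zero
    return (w_success * p_correct
            + w_gain * learning_progress
            + w_eff * efficiency
            + w_persist * (1.0 if completed else 0.0))
```

Because the mastery delta carries the largest weight, a session where the learner already knew the material scores near zero on the dominant term even if every answer was correct.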

15. GAMBITTS & LLM Integration

When instructional content is generated by LLMs in real time, the bandit selects an action (e.g., "provide a Socratic hint") but the treatment delivered to the learner is the stochastic output of the LLM. The GAMBITTS framework (Generator-Mediated Bandit-Thompson Sampling) explicitly models this action-treatment split.

GAMBITTS PIPELINE

Bandit Agent
  └─ Selects: Instructional mode A + prompt template P
              (e.g., "Use Socratic questioning to explain concept X")
                          β”‚
                          β–Ό
              LLM Generator (stochastic)
  └─ Produces: Specific text string Gβ‚œ
                          β”‚
                          β–Ό
              Embedding Projection
  └─ Projects: High-dim text Gβ‚œ β†’ Low-dim embedding Zβ‚œ
              (Enables bandit to detect when different outputs deliver same pedagogy)
                          β”‚
                          β–Ό
              Reward Signal
  └─ Learner response β†’ LP reward β†’ Update θₐ posterior
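The pipeline above can be sketched end to end with stub components. The hash-based `embed` below is a stand-in for a learned text-embedding model, and the bandit is the minimal Β§13.1 sampler; none of this is the GAMBITTS reference implementation:

```python
import hashlib
import random

class BetaBandit:
    """Minimal Thompson bandit over instructional modes (see Section 13.1)."""
    def __init__(self, modes):
        self.a = {m: 1.0 for m in modes}
        self.b = {m: 1.0 for m in modes}
    def select(self):
        return max(self.a, key=lambda m: random.betavariate(self.a[m], self.b[m]))
    def update(self, m, r):
        self.a[m] += r
        self.b[m] += 1 - r

def embed(text, dim=8):
    """Stand-in for the projection G_t -> Z_t. A real system would use a
    learned text embedding so the bandit can recognize when two different
    generations deliver the same pedagogy."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def gambitts_step(bandit, generate, observe_reward, concept):
    mode = bandit.select()                          # bandit action: mode + template
    prompt = f"Use {mode} questioning to explain {concept}"
    treatment = generate(prompt)                    # stochastic LLM output G_t
    z = embed(treatment)                            # low-dim treatment embedding Z_t
    r = observe_reward(treatment)                   # learning-progress reward
    bandit.update(mode, r)                          # posterior update for theta_a
    return mode, z, r
```

The key property the sketch preserves is the action-treatment split: the bandit chooses the mode, but the reward attaches to whatever text the generator actually produced.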

15.1 Architectural Optimization (IC-Cache)

System Component | Optimization Strategy | Pedagogical Impact
Example Selector | Caches high-utility request-response pairs from larger models | Enables smaller, faster models to emulate Cognitive Apprenticeship
Request Router | Routes simple queries to small models, complex ones (Socratic) to large models | Maintains low latency during critical "flow" states
Example Manager | Continuously refines cached examples based on learner rewards | Ensures scaffolding remains current with pedagogical best practices

15.2 Algorithmic Fairness Constraints

Diversity-Aware Exploration

If a bandit observes that a demographic subgroup has historically responded well to Direct Instruction (potentially due to prior educational disadvantage), it may permanently route those learners into a Direct Instruction loop β€” denying them access to higher-order modes like Socratic Questioning or Meta-cognitive Feedback.

The Boyle System enforces fairness constraints: a minimum exploration rate across all five instructional modes for all learner demographics. Every learner is regularly given the opportunity to succeed with more challenging, exploratory modes, regardless of initial cluster assignment. The system's decisions must not mirror existing social biases in the training data.
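The exploration floor can be implemented as a small wrapper around any mode selector. A sketch with an illustrative 5% per-mode floor (not a system constant):

```python
import random

def fair_select(posterior_draws, floor=0.05):
    """Select a mode while guaranteeing every mode a minimum selection
    probability, so no learner cluster is permanently routed away from
    exploratory modes. `posterior_draws` maps mode name -> sampled value."""
    modes = list(posterior_draws)
    # with probability floor * len(modes), pick uniformly: each mode
    # therefore keeps at least `floor` probability regardless of history
    if random.random() < floor * len(modes):
        return random.choice(modes)
    return max(posterior_draws, key=posterior_draws.get)
```

The floor caps how confident the policy can ever become: even a learner whose history strongly favors Direct Instruction is regularly given Socratic and Meta-cognitive attempts, generating the evidence needed to detect when they have outgrown the easier mode.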

PART V: CROSS-SYSTEM ANALYSIS

16. RAG vs. Long-Context Window

Dimension | Long-Context Window | Source-Grounded RAG (Boyle System)
Data location | Entire document in active working memory | Semantic index; chunks retrieved per query
Citation precision | Low; reasoning is holistic | High; specific passage linked to every claim
Hallucination risk | Higher; model may blend sources | Lower; constrained to retrieved chunks
Audit trail | Difficult; cannot trace specific claim to passage | Built-in; inline citation to exact text
Best use case | Holistic synthesis of a single large document | Precise retrieval across 50+ diverse sources
Regulatory suitability | Limited; hard to satisfy audit requirements | Strong; every claim traceable

17. Grounded vs. Non-Grounded LLM Performance

Metric | Non-Grounded LLM | NotebookLM (Boyle System)
Hallucination rate | ~40% | ~13% (0% on specific queries)
Citation precision | Low / variable | 95% in audited clinical tasks
Context window | Pre-trained knowledge (static) | ~25 million words per notebook (dynamic)
Update frequency | Requires retraining or fine-tuning | Instantaneous upon document upload
Data privacy | Often shared for training | Private; no sharing under enterprise agreement
Specificity of response | Generic; drawn from broad pre-training | Context-specific; only what has been uploaded

18. Technical Debt Registry

BD-001: Source Slot Ceiling
HIGH | Corpus
50-source limit per notebook constrains long-running projects. Mitigated by Ouroboros and stitching, but adds manual overhead and citation risk.
Recommendation: Source slot monitoring with automated alerts at 40-source threshold.
BD-002: Citation Loss on Ouroboros Conversion
CRITICAL | Corpus
Converting notes to sources strips original inline citations. Risk escalates with each cycle.
Recommendation: Mandatory metadata checklist before every conversion.
BD-003: No Native Python Execution
HIGH | Core
NotebookLM cannot run code or perform mathematical calculations. May return confident but incorrect quantitative answers.
Recommendation: Integration protocol with Vertex AI Workbench or Colab; quantitative outputs logged back to MVAL.
BD-004: Knowledge-Based Poisoning Vulnerability
HIGH | Core
Malicious or corrupted documents can bias outputs. Zero-width Unicode characters are invisible to human reviewers but readable by the AI.
Recommendation: Mandatory source validation workflow before ingestion.
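A pre-ingestion pass for the invisible-character vector is straightforward to script. A sketch (one check among several a real validation workflow would need):

```python
import unicodedata

# Common zero-width characters, named for readability; the category check
# below also catches them, plus other invisible format characters.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def suspicious_characters(text):
    """Flag zero-width and other Unicode 'format' (Cf) characters that a
    human reviewer cannot see but the model will read. Returns a list of
    (position, codepoint) pairs for review before ingestion."""
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text)
            if ch in ZERO_WIDTH or unicodedata.category(ch) == "Cf"]
```

Any document with a non-empty result would be quarantined for human review rather than uploaded to the corpus.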
BD-005: No Automated Hallucination Scoring
MEDIUM | Core
Hallucination auditing is currently manual.
Recommendation: Planned β€” passage-level verification with automated reliability score (see roadmap).
BD-006: MVAL Not Enforced at Platform Level
CRITICAL | MVAL
MVAL compliance depends entirely on researcher discipline. No hard field validation or submission gate exists. This is the structural gap most likely to undermine the system's core mission.
Recommendation: Structured intake form (Google Form β†’ auto-ingested Doc) or custom lightweight web front-end with required fields.
BD-007: MAB Cold-Start Data Dependency
MEDIUM | Adaptive
The bandit requires interaction data to personalize. New institutional deployments begin in expert-guided Phase 1 with limited personalization capability.
Recommendation: Pre-load institutional cluster priors from similar cohort profiles where available.

PART VI: OPERATIONS

19. Target Deployment Contexts

The Boyle System is designed for institutional contexts where reproducibility, knowledge transfer, and structured learning are high-value requirements. The following represent primary partnership targets.

Context | Primary Value Proposition | Key Features Used
Business School Executive Education | Preserve case analysis reasoning across cohorts; structure participant documentation; reduce facilitator gap-filling | MVAL (Why/Decisions field critical), Cognitive Apprenticeship mode, Handoff Notebook
Think Tanks & Policy Research Organizations | Document research lineage; prevent institutional memory loss at analyst transitions; enable audit-ready citation trails | Source-grounded RAG, Failure Archive, Passage-Level Citation, CRITIQ integration
Graduate & Professional Schools | Replace ad hoc research documentation; shift advisor meetings from gap-filling to strategy; train reproducibility habits | Full MVAL protocol, Project Charter Notebook, Pre-Meeting Brief Generation, MAB pedagogy engine
Independent & Private Schools (STEM programs) | Build structured research documentation habits early; scaffold inquiry-based learning; track student progress longitudinally | Scaffolding + Direct Instruction modes, simplified MVAL template, Notebook Segmentation
Applied AI Research Labs | Solve the vanishing laboratory problem in cloud-native ML research; enable reproducible experiment infrastructure | Environment field (MVAL), Failure Artifact Protocol, MCP integration, Python execution bridge

20. Active Deployments (Pilot)

20.1 Humanitarians AI Fellows Program

Program | Research Domain | Primary Boyle Use Case | Status
AI Skunkworks (Partner University) | Applied AI / Data Science | Cloud pipeline documentation, inference reproducibility | Live
Lyrical Literacy | Music, neuroscience, language acquisition | Software dev logs, neural connectivity tracking | Live
Botspeak | AI fluency and human-AI task delegation | Strategic delegation logs, ethical boundary records | Live
Fellows Program (general) | Multi-domain applied AI (~150 volunteers) | Onboarding documentation, project handoff infrastructure | Live

20.2 Pilot Metrics

Measured Outcomes: Active Pilot
Metric | Before Boyle System | After Boyle System
Mentor meeting time on gap-review | ~60% | ~20%
Mentor meeting time on strategic discussion | ~40% | ~80%
Onboarding time for new team members | Baseline | Target: >50% reduction
Duplicate work incidents | Frequent | Target: near zero

21. Integration & Automation

Integration Method | Technical Mechanism | Key Capability | Stability
Python SDK (notebooklm-py) | Browser automation via Playwright | Full access to chat, sources, and artifacts | ⚠ Unofficial; brittle
MCP Server | Model Context Protocol | Integration with Claude Desktop / Claude Code | ⚠ Unofficial; promising
Discovery Engine API | Official GCP REST endpoints | Enterprise-grade notebook management | ✓ Official (enterprise)
Typer CLI | Command-line interface | Human-operated automation from terminal | ⚠ Unofficial

22. Security & Privacy

Data Class | Standard NotebookLM | Workspace / Enterprise
Public research papers, documentation | ✓ Permitted | ✓ Permitted
Internal project charters, MVAL logs | ⚠ Assess risk | ✓ Permitted
Personal health records (HIPAA) | ✗ Prohibited | ⚠ Requires BAA
Financial records | ✗ Prohibited | ⚠ Assess compliance
Export-controlled data (ITAR/EAR) | ✗ Prohibited | ✗ Prohibited
IRB-adjacent human subjects data | ✗ Prohibited | ⚠ Consult IRB first

PART VII: ROADMAP & OPEN QUESTIONS

23. Prioritized Improvements

23.1 Critical Priority

AI-001: MVAL Enforcement Mechanism
CRITICAL | Effort: 3–5 days | MVAL
Design and implement structural enforcement of MVAL field completion. Options: Google Form with required fields → auto-ingested Google Doc; Markdown template with required section headers; custom lightweight web front-end.
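A minimal sketch of the Markdown-template option: a submission gate that rejects an entry whose template is missing a required section or leaves it empty. The field names below are illustrative assumptions; the authoritative list is the MVAL specification in Part II.

```python
import re

# Hypothetical required MVAL section headers (the real list comes from the
# MVAL protocol). Each must appear as a '## Header' line with non-empty body.
REQUIRED_SECTIONS = ["Why/Decisions", "Environment", "Inputs", "Outputs", "Failures"]

def validate_mval(markdown_text):
    """Return a list of missing or empty required sections (empty list = pass)."""
    problems = []
    for section in REQUIRED_SECTIONS:
        # Match the section header, capture body text up to the next header.
        pattern = rf"^##\s*{re.escape(section)}\s*$(.*?)(?=^##\s|\Z)"
        match = re.search(pattern, markdown_text, re.MULTILINE | re.DOTALL)
        if match is None:
            problems.append(f"missing section: {section}")
        elif not match.group(1).strip():
            problems.append(f"empty section: {section}")
    return problems

entry = """## Why/Decisions
Chose batch size 32 after out-of-memory failure at 64.
## Environment
Vertex AI Workbench, Python 3.11.
## Inputs
Training shard list from the shared project drive.
## Outputs
## Failures
"""
print(validate_mval(entry))
# ['empty section: Outputs', 'empty section: Failures']
```

The same check could run as a pre-ingestion hook, so that a log entry never reaches the notebook corpus with silently blank fields.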
AI-002: Ouroboros Citation Preservation Protocol
CRITICAL | Effort: 2 days | Corpus
Mandatory metadata checklist and standard template for pre-conversion documentation.

23.2 High Priority

AI-003: Python Execution Integration
HIGH | Effort: 3–5 days | Core
Define protocol for routing quantitative queries to Vertex AI Workbench or Colab. Outputs logged back to MVAL as artifacts.
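The logging half of that protocol can be sketched as follows. A local subprocess stands in for the Vertex AI Workbench / Colab execution backend, and the JSONL artifact format and file name are assumptions for illustration.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

def run_quantitative_task(code, mval_log=Path("mval_artifacts.jsonl")):
    """Execute a quantitative snippet and append its result to an MVAL log.

    Stand-in for the planned Workbench/Colab bridge: the snippet runs in a
    local Python subprocess; only the artifact-logging contract matters here.
    """
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=60)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code": code,
        "stdout": result.stdout,
        "stderr": result.stderr,
        "returncode": result.returncode,
    }
    # Append-only JSONL keeps every run, including failures, as an artifact.
    with mval_log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

rec = run_quantitative_task("print(sum(range(10)))")
```

Because failed runs are logged with their stderr and return code rather than discarded, the same mechanism doubles as input to the Failure Artifact Protocol.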
AI-004: MAB Phase 1 Priors for Executive Education
HIGH | Effort: 5 days | Adaptive
Develop expert-seeded prior configurations for executive education, think tank, and graduate school cohort profiles. Reduces cold-start period for institutional deployments.
AI-005: Data Classification Governance Document
HIGH | Effort: 2 days | Partners
One-page data classification guide for all institutional partners. Cover: what can go in standard NotebookLM, what requires enterprise, what is prohibited in any cloud system.

23.3 Medium Priority

AI-006: Notebook Taxonomy Standard
MEDIUM | Effort: 1 day
Publish recommended notebook segmentation taxonomy. Standardize naming conventions across all deployments.
AI-007: MCP Server Deployment
MEDIUM | Effort: 3–5 days | Core
Configure and document MCP server integration for Claude Code / Claude Desktop. Publish configuration template.
AI-008: Fairness Audit Protocol
MEDIUM | Effort: 3 days | Adaptive
Implement monitoring to verify that minimum exploration rates across all five modes are maintained across learner demographics. Flag pigeonholing patterns before they solidify.
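One possible shape for that monitoring, assuming a hypothetical 5% exploration floor and a bandit selection log of (demographic group, selected mode) pairs:

```python
from collections import Counter, defaultdict

MODES = ("direct", "socratic", "scaffolding", "apprenticeship", "discovery")
MIN_EXPLORATION_RATE = 0.05  # hypothetical floor per mode, per group

def audit_exploration(assignments):
    """Flag (group, mode, rate) triples where a group's selection rate for a
    mode falls below the exploration floor.

    `assignments` is an iterable of (demographic_group, selected_mode) pairs
    drawn from the bandit's selection log.
    """
    by_group = defaultdict(Counter)
    for group, mode in assignments:
        by_group[group][mode] += 1
    flags = []
    for group, counts in by_group.items():
        total = sum(counts.values())
        for mode in MODES:
            rate = counts[mode] / total
            if rate < MIN_EXPLORATION_RATE:
                flags.append((group, mode, round(rate, 3)))
    return flags
```

Run periodically against the selection log, this surfaces pigeonholing early: a group that the bandit has effectively stopped exposing to a mode shows up as a zero-rate flag before the pattern solidifies into the learned policy.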

24. Open Questions for Partners

Architecture

  1. MVAL enforcement path: What is the lowest-friction mechanism for hard field validation that won't create overhead that prompts partners to circumvent it? Google Form vs. Markdown template vs. custom UI?
  2. Quantitative integration: What is the preferred path for quantitative tasks? Vertex AI Workbench sidebar, a Colab integration, or a separate notebook layer that feeds outputs back to MVAL as artifacts?
  3. MAB deployment scope: Is the full five-mode bandit appropriate for executive education contexts, or is a simplified two-mode system (Direct vs. Socratic) more appropriate for program-level adoption?

Institutional Deployment

  1. Enterprise vs. Workspace: For graduate school and think tank partners, does Google Workspace for Education / Workspace for Organizations satisfy data protection requirements, or does full GCP Enterprise become necessary?
  2. EU AI Act compliance: The EU AI Act becomes fully applicable August 2026. For European institutional partners, does the current regional deployment model (EU multi-region via Discovery Engine) satisfy governance documentation requirements?
  3. Executive education MVAL variant: The standard MVAL includes an Environment field designed for cloud computing contexts. What is the appropriate adaptation of this field for executive education or policy research contexts where "environment" means organizational and analytical context rather than compute infrastructure?

Measurement

  1. Pilot instrumentation: The target metrics are gap-review <20%, onboarding reduction >50%, and duplicate work near zero. How are these currently measured, and what is the instrumentation plan for formal partner deployments?
  2. MAB reward calibration: How should the composite reward function be weighted differently for executive education (where persistence and engagement may outweigh raw mastery gain) versus research training (where knowledge gain is paramount)?
  3. CRITIQ integration: Could CRITIQ's peer review protocol run against MVAL entries as a structured critic layer, automatically flagging statistical integrity issues or reproducibility gaps?

25. Future Feature Roadmap

Feature | Mechanism | Impact | Status
Passage-Level Verification | Block outputs lacking direct cited evidence | Eliminates interpretive overreach and drift | Planned
Hallucination Detector | Post-hoc corpus auditing with reliability score | Quantitative documentation quality metric per entry | Planned
Full MAB Engine (5 Modes) | Thompson Sampling + CMAB + IC-Cache | Real-time personalized instructional mode selection | Planned
GAMBITTS Integration | LLM treatment embedding + bandit policy learning | Robust learning despite stochastic LLM output | Planned
MVAL Web Interface | Required-field form → auto-ingests to notebook | Structural enforcement of documentation standard | Planned
CRITIQ × Boyle Integration | Peer review protocol applied to MVAL entries | Automated statistical integrity flagging | Planned
Executive Education MVAL Variant | Adapted field definitions for non-technical contexts | Extends the Boyle System to business school and policy contexts | Planned
OPT / Visa-Transition Handoff Template | MVAL variant optimized for personnel transition documentation | Preserves institutional knowledge across team changes | Planned
Diagram Generation | Multimodal visualization of experimental setups | Improves legibility of complex workflows | Planned
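The Passage-Level Verification mechanism in the table above could take roughly this shape; the bracketed-index citation convention and sentence-level granularity are assumptions for illustration, not the planned implementation.

```python
import re

def verify_citations(answer, passages):
    """Return the sentences of `answer` that lack a valid passage citation.

    A citation is assumed to be a bracketed index like [2] referring to the
    retrieved passage list. An empty return value means the answer passes the
    gate; a non-empty one means the output would be blocked or flagged.
    """
    rejected = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        # Reject uncited sentences and citations pointing outside the corpus.
        if not cited or any(i >= len(passages) for i in cited):
            rejected.append(sentence)
    return rejected

passages = ["Boyle logged air-pump failures.", "MVAL has required fields."]
answer = "Failures were logged [0]. Fields are required [1]. This claim is unsupported."
print(verify_citations(answer, passages))
# ['This claim is unsupported.']
```

A production gate would verify that the cited passage actually entails the sentence, not merely that an index exists; this sketch covers only the structural check.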

25.1 Ecosystem Tool Integration

Tool | Function | Boyle Integration
CRITIQ | Peer review: manuscript evaluation, statistical integrity | Planned
SOCRIT | Socratic prompt evaluation (Paul-Elder framework) | Planned
Popper | Assertion verification: flags factual claims for review | Planned
Bookie the Bookmaker | Chapter drafting for domain-specific textbooks | Planned
Eddy the Editor | Article review: structure, line edit, SEO, publish strategy | Planned
Medhavi Platform | AI-assisted textbook delivery and student documentation | Roadmap TBD