Publiclear Simulation Studio Implementation Spec

Single handoff document for frontend, backend, and realtime infrastructure. Covers screen flow, data contracts, streaming events, and acceptance tests.

Updated: February 22, 2026 (Asia/Kolkata)

OpenAPI Contracts | TypeScript Interfaces | Realtime Streaming | Acceptance Tests

0) Spec Index

Product Architecture
Screen Map and Route Flow
Ingestion State Machine
Personality Calibration Schema
Voice Training Contract
Avatar Upload Contract
REST API Endpoints
OpenAPI Snippets
TypeScript Shared Interfaces
Realtime Streaming Events
Frontend Realtime Handling Rules
Acceptance Test Matrix
Performance and SLO Targets
Security and Compliance Controls

1) Product Architecture

Frontend

React + TypeScript SPA.
Zustand or equivalent store for multi-step studio state.
WebRTC/WebSocket realtime transport.
UI modules: Demo, Studio, Portal, Admin/Owner controls.

Backend

API service for CRUD and auth.
Worker queue for ingestion and model jobs.
Vector database for memory retrieval.
Object storage for audio/photo/docs.

Core Services

simulation-service: workspace lifecycle, persona, relationships, access links.
ingestion-service: parse docs/chats, chunk, embed, index.
voice-service: quality checks, training orchestration, speaker profiles.
avatar-service: photo validation, render jobs, output manifests.
realtime-orchestrator: turn execution, RAG, LLM response, TTS, visemes.

2) Screen Map and Route Flow

Screen	Route	Goal	Primary UI	API Calls	Done Condition
S0	`/studio/new`	Create workspace	Name, language, consent checkbox	`POST /v1/simulations`	`simulation_id` exists
S1	`/studio/:id/ingest`	Add source memories	Upload queue, stage progress, retry	`POST /v1/uploads/presign`, `POST /v1/simulations/:id/ingestions`, `GET /v1/jobs/:job_id/events`	At least one ingestion complete
S2	`/studio/:id/personality`	Calibrate personality	Sliders, values, catchphrases, taboo topics	`PUT /v1/simulations/:id/persona`	Persona validation pass
S3	`/studio/:id/relationships`	Relationship behavior map	Relation cards and tone overrides	`PUT /v1/simulations/:id/relationships`	One relation profile created
S4	`/studio/:id/voice`	Train voice model	Recorder, noise checks, segment quality report	`POST /v1/simulations/:id/voice/samples`, `POST /v1/simulations/:id/voice/train`	Voice status `ready`
S5	`/studio/:id/avatar`	Generate avatar	Photo upload, quality checks, preview	`POST /v1/simulations/:id/avatar/upload`, `POST /v1/simulations/:id/avatar/render`	Avatar status `ready`
S6	`/studio/:id/preview`	End-to-end test	Realtime chat + avatar + timeline	`POST /v1/realtime/sessions`	Audio, text, viseme streams healthy
S7	`/studio/:id/publish`	Go live	Access rules, share link, slug setup	`POST /v1/simulations/:id/publish`, `POST /v1/simulations/:id/access-links`	Portal link active
S8	`/portal/:slug`	Family usage	Avatar call, chat transcript, memory timeline	`POST /v1/realtime/sessions`, `POST /v1/conversations/:id/turns`	Stable session and persisted transcript
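
The Done Condition column can drive step gating in the studio. The sketch below assumes the setup screens S0 to S5 are completed in order, which the table implies but does not state. `StudioProgress`, its field names, and `nextScreen` are hypothetical and exist only for illustration.

```typescript
// Hypothetical progress snapshot; field names are illustrative, not part of the spec.
interface StudioProgress {
  simulationId?: string;
  completedIngestions: number;
  personaValid: boolean;
  relationshipCount: number;
  voiceStatus: string;
  avatarStatus: string;
}

type SetupScreen = "S0" | "S1" | "S2" | "S3" | "S4" | "S5";

// One check per row of the Done Condition column, in screen order.
const doneChecks: Array<[SetupScreen, (p: StudioProgress) => boolean]> = [
  ["S0", (p) => Boolean(p.simulationId)],
  ["S1", (p) => p.completedIngestions >= 1],
  ["S2", (p) => p.personaValid],
  ["S3", (p) => p.relationshipCount >= 1],
  ["S4", (p) => p.voiceStatus === "ready"],
  ["S5", (p) => p.avatarStatus === "ready"],
];

// Returns the first setup screen whose done condition is unmet,
// or "S6" (preview) once every setup step passes.
function nextScreen(p: StudioProgress): SetupScreen | "S6" {
  for (const [screen, isDone] of doneChecks) {
    if (!isDone(p)) return screen;
  }
  return "S6";
}
```

A router guard could redirect any deeper `/studio/:id/...` route to the screen returned by `nextScreen`.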

3) Ingestion State Machine

{
  "states": [
    "idle",
    "uploading",
    "queued",
    "extracting_text",
    "chunking",
    "embedding",
    "indexing",
    "complete",
    "failed"
  ],
  "metrics": {
    "progress_pct": "number",
    "chunks_done": "number",
    "chunks_total": "number",
    "token_count": "number"
  }
}
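
The state list does not spell out which transitions are legal. As a minimal sketch, assume stages advance one step at a time in the listed order, any unfinished stage may move to `failed`, and a retry moves `failed` back to `queued`. These transition rules and the `canTransition` helper are assumptions, not part of the contract.

```typescript
const STAGES = [
  "idle", "uploading", "queued", "extracting_text",
  "chunking", "embedding", "indexing", "complete", "failed",
] as const;

type Stage = (typeof STAGES)[number];

// Assumed rules: linear forward progress, failure from any unfinished stage,
// and retry as the only way out of "failed".
function canTransition(from: Stage, to: Stage): boolean {
  if (from === "complete") return false;          // terminal
  if (from === "failed") return to === "queued";  // retry (assumption)
  if (to === "failed") return true;               // any unfinished stage can fail
  return STAGES.indexOf(to) === STAGES.indexOf(from) + 1;
}
```

Rejecting out-of-order stage updates on the client keeps a late or duplicated job event from moving a progress bar backwards.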

UI Rules During Ingestion

Show the current stage text, percent, and chunk counters at all times.
Process sources in parallel; one slow or failed source must not block the rest of the pipeline.
Allow retry on a failed item; leave successful items untouched.
Show a close-page warning while jobs are running.
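
Two of the rules above reduce to small pure functions: deciding whether to warn on page close, and selecting only failed items for retry. The sketch below assumes an item counts as running when its stage is anything other than `idle`, `complete`, or `failed`. `ItemView`, `hasRunningJobs`, and `retryTargets` are hypothetical names.

```typescript
// Minimal view of an ingestion item; mirrors the stage values of the state machine.
interface ItemView {
  ingestion_id: string;
  stage: string;
}

// Stages in which no work is in flight (assumption).
const NOT_RUNNING = new Set(["idle", "complete", "failed"]);

// True when at least one item is still being uploaded or processed.
function hasRunningJobs(items: ItemView[]): boolean {
  return items.some((item) => !NOT_RUNNING.has(item.stage));
}

// Only failed items are retried; completed and in-flight items are left alone.
function retryTargets(items: ItemView[]): string[] {
  return items
    .filter((item) => item.stage === "failed")
    .map((item) => item.ingestion_id);
}
```

In the browser, a `beforeunload` listener can call `event.preventDefault()` whenever `hasRunningJobs` returns true, which triggers the native confirmation dialog.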

4) Personality Calibration Schema

{
  "display_name": "string (2-80)",
  "primary_language": "en-IN | hi-IN | ...",
  "secondary_languages": ["string"],
  "speech_style": {
    "warmth": 0,
    "humor": 0,
    "formality": 0,
    "directness": 0,
    "spirituality": 0
  },
  "catchphrases": ["string"],
  "core_values_ranked": ["family", "discipline", "kindness"],
  "taboo_topics": ["string"],
  "favorite_topics": ["string"],
  "memory_cards": [
    {
      "year": 1983,
      "title": "World Cup memory",
      "people": ["brother", "neighbors"],
      "context": "street radio listening",
      "tone": "joyful"
    }
  ]
}
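
The acceptance matrix expects field-level errors when required sliders are missing. A minimal client-side validation sketch follows. It checks only what the schema states (a `display_name` of 2 to 80 characters and the five `speech_style` sliders); the slider range is not specified, so it is not checked. Trimming the name before measuring it, the error messages, and the `validatePersona` helper are assumptions.

```typescript
const SLIDERS = ["warmth", "humor", "formality", "directness", "spirituality"] as const;

interface PersonaDraft {
  display_name?: string;
  speech_style?: Partial<Record<(typeof SLIDERS)[number], number>>;
}

// Returns a map of field path to error message; an empty map means the draft passes.
function validatePersona(draft: PersonaDraft): Record<string, string> {
  const errors: Record<string, string> = {};

  const name = (draft.display_name ?? "").trim();
  if (name.length < 2 || name.length > 80) {
    errors["display_name"] = "Must be 2 to 80 characters";
  }

  for (const key of SLIDERS) {
    const value = draft.speech_style?.[key];
    if (typeof value !== "number" || !Number.isFinite(value)) {
      errors[`speech_style.${key}`] = "Required";
    }
  }
  return errors;
}
```

The save button for `PUT /v1/simulations/:id/persona` can stay disabled while the returned map is non-empty.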

5) Voice Training Contract

Validation Requirements

Total voiced duration: at least 120 seconds.
Segment duration: 8 to 25 seconds each.
Sample rate: at least 16 kHz.
SNR: at least 18 dB.
Silence ratio: less than 35%.
Single speaker confidence: at least 0.85.

Quality Failure Example

{
  "status": "quality_failed",
  "report": {
    "snr_db": 13.2,
    "clipping_pct": 0.4,
    "silence_pct": 42.1,
    "single_speaker_confidence": 0.91
  },
  "issues": [
    {"code": "LOW_SNR", "message": "Move to quieter room"},
    {"code": "TOO_MUCH_SILENCE", "message": "Speak continuously"}
  ]
}
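
The report-level thresholds can be checked with a small function that maps a report to issue codes. In the sketch below, `LOW_SNR` and `TOO_MUCH_SILENCE` and their messages come from the example above, while `MULTIPLE_SPEAKERS`, its message, and the `evaluateVoiceReport` helper are hypothetical. Duration and sample-rate checks are omitted because they are not part of the report shown.

```typescript
interface VoiceReport {
  snr_db: number;
  silence_pct: number;
  single_speaker_confidence: number;
}

interface VoiceIssue {
  code: string;
  message: string;
}

// Thresholds from the validation requirements.
const MIN_SNR_DB = 18;
const MAX_SILENCE_PCT = 35;
const MIN_SPEAKER_CONFIDENCE = 0.85;

function evaluateVoiceReport(report: VoiceReport): VoiceIssue[] {
  const issues: VoiceIssue[] = [];
  if (report.snr_db < MIN_SNR_DB) {
    issues.push({ code: "LOW_SNR", message: "Move to quieter room" });
  }
  if (report.silence_pct >= MAX_SILENCE_PCT) {
    issues.push({ code: "TOO_MUCH_SILENCE", message: "Speak continuously" });
  }
  if (report.single_speaker_confidence < MIN_SPEAKER_CONFIDENCE) {
    // Hypothetical code; the spec does not name one for this check.
    issues.push({ code: "MULTIPLE_SPEAKERS", message: "Record one speaker at a time" });
  }
  return issues;
}
```

Applied to the example report (SNR 13.2 dB, silence 42.1%, confidence 0.91), this yields the same two issues shown in the payload.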

6) Avatar Upload Contract

Formats: jpg, png, webp.
Max size: 10 MB.
Resolution: at least 1024x1024 (recommended 1536+).
Single face only; no heavy occlusion.
Pose limits: yaw, pitch, and roll each within roughly 15 degrees of frontal.
Face area should occupy 25% to 70% of frame.

{
  "status": "accepted",
  "quality_report": {
    "face_count": 1,
    "occlusion_score": 0.08,
    "yaw_deg": 6.1,
    "pitch_deg": 4.2,
    "lighting_score": 0.82
  }
}
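
The upload rules above can be expressed as a single check that returns issue codes. This is a sketch under several assumptions: 10 MB is read as 10 * 1024 * 1024 bytes, the pose limit is applied to the absolute value of each angle, and `roll_deg`, `face_area_ratio`, the pixel and byte fields, and every issue code are hypothetical because the example report does not include them. Occlusion is left out since no threshold is given.

```typescript
// Hypothetical input shape combining file metadata and detector output.
interface PhotoCandidate {
  format: string;
  size_bytes: number;
  width_px: number;
  height_px: number;
  face_count: number;
  yaw_deg: number;
  pitch_deg: number;
  roll_deg: number;
  face_area_ratio: number; // fraction of the frame covered by the face, 0 to 1
}

const ALLOWED_FORMATS = ["jpg", "png", "webp"];
const MAX_BYTES = 10 * 1024 * 1024; // assumption: binary megabytes
const MIN_SIDE_PX = 1024;
const MAX_POSE_DEG = 15;

function checkAvatarPhoto(p: PhotoCandidate): string[] {
  const issues: string[] = [];
  if (!ALLOWED_FORMATS.includes(p.format.toLowerCase())) issues.push("UNSUPPORTED_FORMAT");
  if (p.size_bytes > MAX_BYTES) issues.push("FILE_TOO_LARGE");
  if (p.width_px < MIN_SIDE_PX || p.height_px < MIN_SIDE_PX) issues.push("LOW_RESOLUTION");
  if (p.face_count !== 1) issues.push(p.face_count > 1 ? "MULTI_FACE" : "NO_FACE");
  const pose = [p.yaw_deg, p.pitch_deg, p.roll_deg];
  if (pose.some((deg) => Math.abs(deg) > MAX_POSE_DEG)) issues.push("POSE_OUT_OF_RANGE");
  if (p.face_area_ratio < 0.25 || p.face_area_ratio > 0.7) issues.push("FACE_AREA_OUT_OF_RANGE");
  return issues;
}
```

An empty result corresponds to the `accepted` status; running the same checks in the browser before upload gives faster feedback, but the server remains the authority.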

7) REST API Endpoints

Method	Path	Purpose	Auth
POST	`/v1/simulations`	Create simulation workspace	Bearer
GET	`/v1/simulations/:id`	Read workspace metadata	Bearer
POST	`/v1/uploads/presign`	Create upload URL	Bearer
POST	`/v1/simulations/:id/ingestions`	Start ingestion job	Bearer
GET	`/v1/jobs/:job_id/events`	Stream job progress	Bearer
PUT	`/v1/simulations/:id/persona`	Save personality config	Bearer
PUT	`/v1/simulations/:id/relationships`	Save relationship maps	Bearer
POST	`/v1/simulations/:id/voice/samples`	Register voice segments	Bearer
POST	`/v1/simulations/:id/voice/train`	Start voice training	Bearer
POST	`/v1/simulations/:id/avatar/upload`	Upload avatar source image	Bearer
POST	`/v1/simulations/:id/avatar/render`	Start avatar render job	Bearer
POST	`/v1/realtime/sessions`	Create realtime session token	Bearer
POST	`/v1/simulations/:id/publish`	Publish simulation portal	Bearer
POST	`/v1/simulations/:id/access-links`	Create family share links	Bearer
POST	`/v1/conversations/:id/turns`	Persist chat turns	Bearer

8) OpenAPI Snippets (YAML)

openapi: 3.1.0
info:
  title: Publiclear API
  version: 1.0.0
paths:
  /v1/simulations:
    post:
      summary: Create simulation
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateSimulationRequest'
      responses:
        '201':
          description: Created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Simulation'
  /v1/realtime/sessions:
    post:
      summary: Create realtime session
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/RealtimeSessionRequest'
      responses:
        '200':
          description: Session token
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RealtimeSession'
components:
  schemas:
    CreateSimulationRequest:
      type: object
      required: [name, primary_language, consent]
      properties:
        name: {type: string, minLength: 2, maxLength: 80}
        primary_language: {type: string}
        consent: {type: boolean}
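    Simulation:
      type: object
      properties:
        simulation_id: {type: string}
        owner_id: {type: string}
        name: {type: string}
        primary_language: {type: string}
        status: {type: string, enum: [draft, ready, published]}
        created_at: {type: string}
        updated_at: {type: string}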
    RealtimeSessionRequest:
      type: object
      required: [simulation_id, relationship_id]
      properties:
        simulation_id: {type: string}
        relationship_id: {type: string}
        mode: {type: string, enum: [audio, text, multimodal]}
    RealtimeSession:
      type: object
      properties:
        session_id: {type: string}
        token: {type: string}
        expires_at: {type: string, format: date-time}
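
A client call for the create-simulation operation might be prepared as in the sketch below. It builds the URL and request options for `POST /v1/simulations` using the `CreateSimulationRequest` schema and Bearer auth from the endpoint table. The `prepareCreateSimulation` helper, the `PreparedRequest` shape, and the base URL in the usage note are hypothetical; the length check mirrors `minLength`/`maxLength` in the schema.

```typescript
interface CreateSimulationRequest {
  name: string;
  primary_language: string;
  consent: boolean;
}

// Hypothetical container for arguments that can be passed to fetch(url, init).
interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function prepareCreateSimulation(
  baseUrl: string,
  token: string,
  body: CreateSimulationRequest,
): PreparedRequest {
  // Mirror the schema constraint so obvious errors fail before the network call.
  if (body.name.length < 2 || body.name.length > 80) {
    throw new Error("name must be 2 to 80 characters");
  }
  return {
    url: `${baseUrl}/v1/simulations`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

A caller would pass the result to `fetch(url, init)` and treat any status other than `201` as a failure, since the snippet defines only the `201` response.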

9) TypeScript Shared Interfaces

export interface Simulation {
  simulation_id: string;
  owner_id: string;
  name: string;
  primary_language: string;
  status: "draft" | "ready" | "published";
  created_at: string;
  updated_at: string;
}

export interface IngestionItem {
  ingestion_id: string;
  simulation_id: string;
  source_type: "file" | "whatsapp" | "manual" | "email_export";
  stage:
    | "idle"
    | "uploading"
    | "queued"
    | "extracting_text"
    | "chunking"
    | "embedding"
    | "indexing"
    | "complete"
    | "failed";
  progress_pct: number;
  chunks_done: number;
  chunks_total: number;
  token_count: number;
  error_code?: string;
  error_message?: string;
}

export interface RelationshipProfile {
  person_id: string;
  name: string;
  relation: string;
  closeness: 1 | 2 | 3 | 4 | 5;
  address_style: string;
  inside_jokes: string[];
  avoid_topics: string[];
  response_tone_override?: {
    warmth?: number;
    formality?: number;
    humor?: number;
    directness?: number;
  };
}

export interface TurnRequest {
  session_id: string;
  simulation_id: string;
  relationship_id: string;
  input: {
    modality: "audio" | "text";
    text?: string;
    language: string;
    audio_ref?: string;
  };
  options: {
    stream: boolean;
    temperature: number;
    max_output_tokens: number;
  };
}

export interface TurnResponse {
  turn_id: string;
  output_text: string;
  memory_citations: string[];
  latency_ms: {
    retrieval: number;
    first_token: number;
    total: number;
  };
}
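
`response_tone_override` overlaps with four of the five `speech_style` sliders in the persona schema. One plausible reading, sketched below, is that an override replaces the persona baseline for the keys it sets and leaves the rest untouched. Replacement (rather than blending) is an assumption, and `SpeechStyle`, `ToneOverride`, and `applyToneOverride` are hypothetical names.

```typescript
interface SpeechStyle {
  warmth: number;
  humor: number;
  formality: number;
  directness: number;
  spirituality: number;
}

type ToneOverride = Partial<Pick<SpeechStyle, "warmth" | "formality" | "humor" | "directness">>;

const OVERRIDABLE = ["warmth", "formality", "humor", "directness"] as const;

// Returns a new style object; the baseline is never mutated.
function applyToneOverride(base: SpeechStyle, override?: ToneOverride): SpeechStyle {
  const merged: SpeechStyle = { ...base };
  for (const key of OVERRIDABLE) {
    const value = override?.[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

The merged style can then be resolved per `relationship_id` when a turn is executed, so that an override set on S3 shows up in preview responses.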

10) Realtime Streaming Events

Event Types

turn.started
input_transcript.partial
input_transcript.final
retrieval.ready
response.output_text.delta
response.output_text.done
response.output_audio.delta
response.output_audio.done
avatar.viseme.delta
turn.completed
turn.error

{
  "type": "response.output_text.delta",
  "turn_id": "turn_001",
  "seq": 12,
  "delta": "In school, we loved",
  "timestamp": "2026-02-22T10:15:17.311Z"
}

{
  "type": "avatar.viseme.delta",
  "turn_id": "turn_001",
  "seq": 12,
  "audio_offset_ms": 1640,
  "visemes": [
    {"t_ms": 0, "v": "AA", "w": 0.7},
    {"t_ms": 90, "v": "M", "w": 0.6},
    {"t_ms": 170, "v": "EH", "w": 0.8}
  ]
}
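
The viseme payload does not say how `t_ms` relates to `audio_offset_ms`. The sketch below assumes each `t_ms` is relative to the event's `audio_offset_ms`, so a viseme becomes active at `audio_offset_ms + t_ms` on the audio playback clock, and that visemes arrive sorted by `t_ms`. Both points are assumptions, and `activeViseme` is a hypothetical helper.

```typescript
interface Viseme {
  t_ms: number; // offset within this event (assumed relative to audio_offset_ms)
  v: string;    // viseme label, e.g. "AA"
  w: number;    // weight
}

interface VisemeDelta {
  audio_offset_ms: number;
  visemes: Viseme[];
}

// Returns the latest viseme whose start time is at or before the playback position,
// or undefined if playback has not reached this event yet.
function activeViseme(event: VisemeDelta, playbackMs: number): Viseme | undefined {
  let current: Viseme | undefined;
  for (const viseme of event.visemes) {
    if (event.audio_offset_ms + viseme.t_ms <= playbackMs) {
      current = viseme;
    } else {
      break; // visemes are assumed sorted by t_ms
    }
  }
  return current;
}
```

With the example event, a playback position of 1740 ms falls between the `M` viseme (starting at 1730 ms) and the `EH` viseme (starting at 1810 ms), so `M` is active.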

11) Frontend Realtime Handling Rules

Append each response.output_text.delta to the active bubble immediately.
Route audio chunks through a jitter buffer of 80 ms to 120 ms before playback.
Schedule visemes against the same clock that drives audio playback.
If visemes are delayed, fall back to amplitude-based mouth animation.
Treat a turn as complete only after both response.output_text.done and response.output_audio.done have arrived.
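
The text and completion rules above fit a small reducer over the event stream. This sketch handles only three event types and assumes the `done` events carry at least `type` and `turn_id`; their full payload is not specified. `TurnView`, `reduceTurn`, and `isTurnComplete` are hypothetical names.

```typescript
type StreamEvent =
  | { type: "response.output_text.delta"; turn_id: string; seq: number; delta: string }
  | { type: "response.output_text.done"; turn_id: string }
  | { type: "response.output_audio.done"; turn_id: string };

interface TurnView {
  text: string;
  textDone: boolean;
  audioDone: boolean;
}

const emptyTurn: TurnView = { text: "", textDone: false, audioDone: false };

// Pure reducer: returns a new view for each incoming event.
function reduceTurn(state: TurnView, event: StreamEvent): TurnView {
  switch (event.type) {
    case "response.output_text.delta":
      return { ...state, text: state.text + event.delta };
    case "response.output_text.done":
      return { ...state, textDone: true };
    case "response.output_audio.done":
      return { ...state, audioDone: true };
  }
}

// A turn is complete only when both done events have been seen.
function isTurnComplete(state: TurnView): boolean {
  return state.textDone && state.audioDone;
}
```

Keeping the reducer pure makes it usable from a Zustand store or any equivalent state container without change.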

12) Acceptance Test Matrix

Area	Test Case	Expected Result
S0 Create	Create simulation with consent unchecked	Submit blocked with inline validation
S1 Ingestion	Upload mixed files (pdf, txt, invalid)	Valid files proceed, invalid file shows actionable error
S1 Ingestion	Interrupt network during embedding	Progress resumes or retry starts from last persisted checkpoint
S2 Persona	Save with missing required sliders	Field-level errors and save prevented
S3 Relationships	Add tone override for one relationship	Override reflected in preview responses
S4 Voice	Audio with high noise	Quality fail with exact issue codes and recapture guidance
S5 Avatar	Upload photo with two faces	Rejected with multi-face error and guidance
S6 Preview	Run multimodal turn	Text delta, audio delta, viseme delta all stream successfully
S7 Publish	Create share link with expiry	Link generated, expiry enforced server-side
S8 Portal	Concurrent family sessions	No cross-session memory leakage

13) Performance and SLO Targets

P95 first text token under 700ms for warm sessions.
P95 end-to-end voice turn under 2200ms.
P95 websocket disconnect rate under 1% per day.
Ingestion completion success above 99% excluding corrupted uploads.
Portal availability target: 99.9% monthly.
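
To check these targets against measured data, the P95 values and the availability budget need a concrete definition. The sketch below uses the nearest-rank percentile method and a 30-day month; both are assumptions, since the spec does not name a method or a window length. `percentile` and `downtimeBudgetMinutes` are hypothetical helpers.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Minutes of allowed downtime for a given availability target over a window of days.
function downtimeBudgetMinutes(availabilityPct: number, days: number): number {
  return days * 24 * 60 * (1 - availabilityPct / 100);
}

// Example: first-token latencies in milliseconds for a batch of warm sessions.
const firstTokenMs = [310, 420, 380, 650, 720, 400, 390, 450, 500, 610];
const meetsTarget = percentile(firstTokenMs, 95) < 700;
```

Under these assumptions, a 99.9% monthly target leaves roughly 43 minutes of downtime in a 30-day month.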

14) Security and Compliance Controls

Consent record required before training or publish actions.
Encryption in transit (TLS 1.2+) and at rest.
Role-based access for owner, editor, and family viewer roles.
Signed URL uploads with short expiry windows.
Audit log for model changes, publication, and share link events.
Explicit AI disclosure inside portal UI and metadata responses.
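
Two of the controls above lend themselves to simple server-side guards: requiring a consent record before training or publish actions, and enforcing share link expiry. The sketch below is illustrative only; `ConsentRecord`, `AccessLink`, their fields, the `CONSENT_REQUIRED` error code, and both helpers are hypothetical, since the spec does not define these shapes.

```typescript
// Hypothetical records; the spec does not define these shapes.
interface ConsentRecord {
  simulation_id: string;
  granted_at: string; // ISO 8601 timestamp
}

interface AccessLink {
  slug: string;
  expires_at?: string; // ISO 8601 timestamp; absent means no expiry (assumption)
}

// Throws unless a consent record exists for the simulation.
function assertConsent(simulationId: string, records: ConsentRecord[]): void {
  const found = records.some((record) => record.simulation_id === simulationId);
  if (!found) throw new Error("CONSENT_REQUIRED");
}

// A link is active until its expiry instant; checked against server time.
function isLinkActive(link: AccessLink, now: Date): boolean {
  if (link.expires_at === undefined) return true;
  return now.getTime() < Date.parse(link.expires_at);
}
```

Calling `assertConsent` at the start of the voice training and publish handlers, and `isLinkActive` when resolving `/portal/:slug`, keeps both checks on the server rather than in the UI.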