Project Legacy Oracle

LegacyOracle Super Agent: Definitive Specification & Implementation Blueprint (v6.6.2)

Document Version: 6.6.2
Date: October 27, 2023 (Document Finalization Date)
Status: Definitive, Self-Contained Handover Specification

Table of Contents:

  1. Introduction
    • 1.1 Vision (Sentient OS v6.6.2)
    • 1.2 Goals
    • 1.3 Scope
    • 1.4 Guiding Principles
  2. Evolution History & Design Rationale
    • 2.1 Conceptual v1 -> v6.6.2 Summary
  3. Architectural Style
    • 3.1 Hybrid Approach Explained
  4. System Overview
    • 4.1 High-Level Architecture Diagram (v6.6.2)
    • 4.2 Component Interaction Flow Summary
  5. Technology Stack (Specific Libraries & Versions)
  6. Environment Setup (Mandatory & Explicit Steps)
    • 6.1 Hardware Requirements
    • 6.2 Operating System Configuration (Windows 11 Pro)
    • 6.3 Core Software Installation (Python 3.10.12, Git, Docker Desktop)
    • 6.4 GPU Drivers & Libraries (AMD ROCm 5.6 / DirectML)
    • 6.5 LM Studio Setup
    • 6.6 Automatic1111 WebUI Setup (Image Gen Service)
    • 6.7 ROCm Generative Services Setup (Video & Audio FastAPI Wrappers)
    • 6.8 ByteCraft Setup
    • 6.9 Amuse Implementation Setup
    • 6.10 OCR/Object Detection Setup
    • 6.11 Core Agent Python Environment (requirements_agent.txt)
    • 6.12 WebDrivers Setup
    • 6.13 VS Code & Roo Code Setup (Optional)
    • 6.14 Initial System Configuration (settings.yaml)
  7. Project Structure (Detailed Folder Layout)
  8. Core Agentic Flow & Philosophy
    • 8.1 Contrast with Static Approaches
    • 8.2 The Dynamic Agentic Flow Explained (Detailed Steps)
    • 8.3 Role of the Orchestrating Model (phi-4-mini default)
    • 8.4 Dynamic Model Selection (ModelSelector Logic Explained)
    • 8.5 MoA (Mixture-of-Agents) Cross-Checking/Refinement Logic
    • 8.6 Benefits Summary
  9. Component Deep Dive (Full Specifications)
    • 9.1 UI (AI Studio GUI – PyQt6)
    • 9.2 Agent Core (OpenManus Framework – core/)
    • 9.3 Skills Layer (skills/ & plugins/)
    • 9.4 Sub-Agents (agents/subagents/) & Conductor Agent Interaction
    • 9.5 Reasoning Models (reasoning_models/)
    • 9.6 Backend Clients (clients/)
    • 9.7 External Services (API Contracts Detailed)
    • 9.8 Task Automation Framework (GUI Macros)
    • 9.9 Web Automation Framework
    • 9.10 Flow Automation Framework (WorkflowEngine Logic & Error Handling)
    • 9.11 Perception “View Window”
  10. Configuration & Data Management
    • 10.1 Configuration Files (model_config.py – FULL CONTENT, settings.yaml – FULL SCHEMA EXAMPLE, user_config.yaml – FULL SCHEMA EXAMPLE, tasks.json/web_tasks.json – JSON Schemas & Examples)
    • 10.2 Data Storage (agent_data.db – FULL SQL SCHEMAS, Cache Strategy Details)
    • 10.3 Versioning Strategy Details
  11. Key Workflows (Detailed Component Interactions & Data Flow)
  12. Cross-Cutting Concerns (Detailed Implementation Strategies)
    • 12.1 Security
    • 12.2 Performance
    • 12.3 Error Handling
    • 12.4 Maintainability
    • 12.5 Globalization
    • 12.6 Concurrency
  13. Testing Strategy (Detailed Plans)
    • 13.1 Unit Testing
    • 13.2 Integration Testing
    • 13.3 Service Testing
    • 13.4 End-to-End (E2E) Testing
    • 13.5 UI Testing
    • 13.6 Security Testing
    • 13.7 Performance Testing
  14. Deployment & Operations
    • 14.1 docker-compose.yaml (Complete & Verified Example)
    • 14.2 Manual Setup Guide (Exhaustive Step-by-Step)
    • 14.3 Service Management (Detailed nssm/Task Scheduler Examples)
    • 14.4 Health Checks & Monitoring (OpsManager Implementation Details)
    • 14.5 First-Run Setup Wizard (UI_SetupWizard Detailed Flow)
    • 14.6 Updating Procedures (Agent & Services)
  15. Development Plan / Phasing (Detailed Task Breakdown)
  16. Developer Integration Steps (Checklist)
  17. Future Considerations (Post v6.6.2 – Consolidated List)
  18. Glossary (Comprehensive Definitions)
  19. API Documentation (Consolidated & Detailed Specs)
  20. Tutorials / Examples (Outline)
  21. Final Handover Statement

1. Introduction

1.1 Vision (Sentient OS v6.6.2)

LegacyOracle v6.6.2 is the definitive blueprint for an intelligent, autonomous layer fused with the Windows 11 OS. It serves as a Sentient Software Sovereign—a proactive, autonomous co-pilot providing unparalleled perception (multimodal inputs, system state, visual screen understanding), dynamic local reasoning (LM Studio), decisive action (native OS control, application automation, VS Code integration), rich multimodal generation (Image, Video, Audio, SWF via local services optimized for AMD), adaptive assistance, self-improvement, and knowledge management. It prioritizes robustness, user customization, advanced AI adaptability, maintainability, globalization, performance optimization, security, and local processing, all presented through the personalized “AI Studio GUI”.

1.2 Goals

Deep Native Windows Integration: Utilize core Windows APIs/tools (WMI, UI Automation, PowerShell, WSL, Docker, Registry, Events, Power Mgmt, PerfCounters, Search Index) for comprehensive context and control.

Dynamic & Optimized Local Reasoning: Dynamically select/load/configure optimal LM Studio models (incl. Logic tasks) via ModelSelector, manage resources via ResourceGovernor, stream reasoning steps.

Comprehensive & Extensible Skillset (OpenManus): Implement modular skills (System Admin, Automation, Coding, Search, Code Exec, Vision, Knowledge Acq., Gen AI clients, Logic, OS Control, Macros, Few-Shot) expandable via a PluginManager.

Service-Oriented Multimodal Generation: Orchestrate stable local services (A1111 API, ROCm FastAPI Wrappers for Video/Audio) for AMD hardware-accelerated Image, Video, Audio generation; include ByteCraft SWF generation via CLI and Amuse via dedicated implementation/client.

Proactive & Adaptive Intelligence: Analyze user behavior/system/visual context for suggestions, automate workflows (UserConfigManager), adapt agent persona.

Autonomy & Self-Improvement (LEGOPROMIND): Implement scheduled tasks, reflection, secure self-updating, introspective memory, performance logging/optimization (LearningAgent), RL optimization (RLAgent), continual learning.

Advanced UI (“AI Studio GUI”): Deliver the specified multi-panel dashboard (PyQt6) with customizable themes, animated face (#ffdc96), reasoning stream ([Reasoning: ...]), green-text terminal (#00FF00), context management, system vitals, generative controls/viewers (Ruffle), live overlay, native notifications (win11toast), System Tray integration (pystray), workflow editor. Maintain base background colors (#f0f0f0, #ffffff).

Secure Privilege Management: Handle elevated actions via SecurityManager, UAC prompting, configurable ACLs.

Knowledge Acquisition: Actively retrieve, document, structure (SQLite/VectorDB), and integrate learned knowledge.

Multi-Agent Thought Fabric: Employ ConductorAgent managing specialized sub-agents.

Operational Robustness: Implement automated service recovery, load balancing, and health monitoring (OperationsManager).

Maintainability: Integrate automated dependency checks (DependencyManager), plugin support, versioning.

Globalization: Support multiple languages (LocalizationManager, gettext).

Performance Optimization: Utilize model quantization/pruning/caching (ModelOptimizer, AMD GAIA).

Privacy & Security First: Enforce local processing, sandboxing, permission awareness, user control.

1.3 Scope

Definitive architecture and implementation guide for v6.6.2, detailing all components, interfaces, APIs, data structures, specific setup instructions with defined URLs/paths, testing, phasing, and operational procedures for Windows 11 with AMD RX 7800 XT (16GB VRAM) / 128GB RAM target hardware. This document is self-contained and supersedes all previous versions.

1.4 Guiding Principles

Modularity, Separation of Concerns, Asynchronicity, Security-by-Design, Configurability, Local-First, Extensibility, Robustness, User Control, Performance Awareness.

2. Evolution History & Design Rationale

The LegacyOracle Super Agent has evolved through several key stages:

v1: Basic CLI task executor with a static knowledge base. Focused on predefined enterprise workflows. Limitations included lack of flexibility, no generative capabilities, and minimal user interaction.

v2-v4: Introduced modularity via an OpenManus-style skill system and event-driven architecture. A basic GUI (initially PyQt, then standardized on PyQt6) was added. Performance improved with asyncio, and a logging system was implemented. Decision-making was enhanced with rule engines, and the knowledge base became more dynamic, integrating external data.

v5.x (incl. v5.0-v5.5 concepts): Marked a significant leap towards the “Super Agent” concept. Key additions included: Integration of local LLMs via LM Studio; Adoption of a Service-Oriented Architecture (SOA) for heavy generative tasks (Image, Video, Audio) focusing on AMD hardware (ROCm/DirectML); Implementation of dynamic model selection (ModelSelector); Deep OS Integration concepts (WMI, pywin32); Multi-Agent ideas (ConductorAgent); Proactive features (ProactiveManager) and Autonomy Loops (Reflection, Self-Update); Advanced “AI Studio GUI” design (animated face, streaming reasoning); Specific tool integrations like ByteCraft and Amuse; Introduction of core managers for Security/Privileges, Resources, Configuration, Errors, Operations, User Config, Plugins, Localization, Dependencies, and AI Learning/Optimization; Addition of Knowledge Acquisition, Web/GUI Automation Frameworks (including Macro/Web Recording).

v6.x (This Document – v6.6.2): Consolidates all previous designs and enhancements into this single, definitive blueprint. It refines component interactions based on review feedback (sub-agent delegation, matrix format, config scope, Amuse API, flow error handling), provides explicit API contracts, data schemas, detailed setup instructions (including specific URLs/Paths/Matrix data), addresses specific developer transition points (Section 8), includes the specific implementation details for features like the “Record Macro” capability, clarifies advanced AI concepts like RL and Few-Shot learning, incorporates AMD GAIA optimization concepts, and presents a complete, actionable plan for building the most advanced version envisioned.

3. Architectural Style

Hybrid: Modular Core (OpenManus) + Service-Oriented Generation + Multi-Agent System + Deep OS Integration Layer (Windows Native) + Extensible Plugin & Automation (GUI/Web) System

4. System Overview

4.1 High-Level Architecture Diagram (v6.6.2)

```mermaid
graph TD
subgraph User Interface (AI Studio GUI - PyQt6)
UI_MainWindow[Main Window: Panels, Menus (#f0f0f0)] --> UI_Ctrl[UI Controller (Handles Signals/Slots)]
UI_Ctrl <--> AC_CommInterface %% Bi-directional Protocol
UI_ChatPanel[Chat Panel (#ffffff, Streams Reasoning, Feedback Buttons)]
UI_ContextPanel[Context/Workspace Panel (Checkboxes, Token Count)]
UI_SettingsPanel[Settings (Models, Services, Proactive, Theme, Priorities, Workflows, Agents, Privilege Mode, Language)]
UI_StatusPanel[Status Bar (Agent State, Active Model, System Vitals)]
UI_GenTabs[Generative Tabs (Params, Media Viewers/Players)]
UI_FaceController --> UI_FaceWidget[Animated Face Widget (#ffdc96, Code-Drawn)]
UI_OverlayController --> UI_OverlayWidget[(Opt.) Floating Transparent Overlay]
UI_TerminalPanel[Terminal Log Panel (#ffffff, Green Text #00FF00)]
UI_NativeIntegration[Native Notifications (win11toast) & System Tray (pystray)]
UI_MediaPlayers[Embedded Ruffle, Image Viewer, Video/Audio Players]
UI_SetupWizard[First-Run Setup Wizard]
UI_PluginManagerUI[Plugin Management UI]
UI_WorkflowEditor[Workflow Editor UI (GUI & Web Steps)]
UI_LanguageSelector[Language Selector]
UI_TaskMgmtPanel[Task Management Panel] --> UI_RecordMacroBtn[Record GUI Macro Button]
UI_TaskMgmtPanel --> UI_RecordWebBtn[Record Web Task Button]
UI_TaskMgmtPanel --> UI_LoadTaskBtn[Load Task JSON Button]
UI_TaskMgmtPanel --> UI_TaskSelector[Execute Task Dropdown]
UI_TaskMgmtPanel --> UI_ExecuteTaskBtn[Execute Task Button]
UI_RecordMacroDialog[Record GUI Macro Dialog (pynput listener)]
UI_RecordWebDialog[Record Web Task Dialog]
UI_ViewWindowPanel[Perception View Window Panel]
end
subgraph Agent Core & Orchestration (Python - OpenManus Framework)
AC_Agent[Agent Orchestrator (`agent.py`)] --> AC_CommInterface[UI Communication Interface]
AC_StateManager[State Manager (State, Emotion)]
AC_MemoryManager[Episodic & Knowledge Memory (SQLite/VectorDB)]
AC_ReasoningCtrl[Reasoning Controller (Selects Strategy, Streams)]
AC_ModelSelector[LM Studio Model Selector (w/ Logic, Preloading, Resource Checks)]
AC_ProactiveMgr[Proactive Manager (Logger, Analyser, Notifier)]
AC_Scheduler[Task Scheduler (APScheduler Interface)]
AC_SkillDispatcher[Skill Dispatcher (Handles Built-in, Plugins & Automation Tasks, Uses Priorities)]
AC_AsyncTaskMgr[Async Task Manager (Polling Gen Services, Background Jobs)]
AC_ResourceGovernor[Resource Governor (Dynamic VRAM/RAM Tracking)]
AC_ConductorAgent[Conductor Agent (Manages Sub-Agents)]
AC_SecurityManager[Security & Permissions Manager]
AC_ConfigManager[Configuration Manager]
AC_ErrorHandler[Central Error Handler]
AC_LearningAgent[Learning Agent (Performance Logging, Matrix Update, Continual Learning)]
AC_RLAgent[Reinforcement Learning Agent]
AC_UserConfigMgr[User Config Manager]
AC_OpsManager[Operations Manager (Health, Recovery, Load Balance)]
AC_PluginMgr[Plugin Manager]
AC_LocalizationMgr[Localization Manager (i18n/l10n)]
AC_DepManager[Dependency Manager]
AC_ModelOptimizer[Model Optimizer (Quant/Prune/Cache, GAIA Integration)]
AC_TaskManager[Task Manager (Loads/Manages GUI & Web Task Definitions)]
AC_WorkflowEngine[Workflow Engine (Executes Multi-Step Flows)]
end
subgraph Reasoning Models (`reasoning_models/`)
RM_Base[BaseReasoningStrategy (Streams Steps)]
RM_Code[CodeReasoning]
RM_General[GeneralReasoning]
RM_Creative[CreativeReasoning]
RM_SelfCritique[SelfCritiqueReasoning]
RM_Visual[Visual Reasoning Strategy]
RM_Logic[Logic Reasoning Strategy]
end
subgraph Skills Layer (Python Modules - `skills/` + `plugins/`)
Skill_Base[BaseSkill Interface]
Skill_APIClientBase[Base API Client Skill]
Skill_RooCode[RooCode Client Skill]
Skill_FileSystem[FileSystem Skill]
Skill_WebSearch[WebSearch Skill]
Skill_CodeInterpreter[Code Interpreter Skill]
Skill_Vision[Vision Skill (Capture, OCR, Detect, Classify, View Window Feed)]
Skill_ByteCraft[ByteCraft Skill]
Skill_Autonomous[Autonomous Tasks Skill]
Skill_NLQuery[Natural Language Skill]
Skill_ImageGenClient[Image Gen API Client Skill]
Skill_VideoGenClient[Video Gen API Client Skill]
Skill_AudioGenClient[Audio Gen API Client Skill]
Skill_AmuseGen[Amuse Image Gen Skill]
Skill_SystemMonitor[System Monitoring Skill]
Skill_UserContext[User Context Skill]
Skill_OSControl[OS Control "Manus" Skill (Adv. UI Auto, Cred Mgmt, Power, Config)]
Skill_KnowledgeAcq[Knowledge Acquisition Skill]
Skill_EnvironmentAwareness[Environment Awareness Skill]
Skill_MacroExecution[GUI Macro Execution Skill]
Skill_WebAutomation[Web Automation Skill]
Skill_FewShot[Few-Shot Learning Skill]
Skill_Logic[Logic Skill]
%% Dynamically loaded plugin skills via AC_PluginMgr
end
subgraph Sub-Agents (`agents/subagents/`)
SubAgent_Base[Base Sub-Agent Class]
SubAgent_CodeMaster[CodeMaster Agent]
SubAgent_VisionAnalyst[Vision Analyst Agent]
SubAgent_LogicSolver[Logic Solver Agent]
SubAgent_WebNavigator[Web Navigator Agent]
end
subgraph Backend Clients (`clients/`)
LMS_Client[LM Studio Client (Inference & Model Mgmt)]
VSC_Client[VS Code API Client (Roo Code)]
SearchAPI_Client[Web Search API Client (e.g., DuckDuckGo)]
Sandbox_Client[Docker/Sandbox Client]
BC_CLI_Client[ByteCraft CLI Wrapper (`subprocess`)]
Git_Client[Git CLI Wrapper (`subprocess`/`gitpython`)]
Email_Client[Email Client (`smtplib`/`imaplib`)]
Input_Sim_Client[Input Simulation Client (`pyautogui`/`keyboard`/`pynput`)]
GenService_Image_Client[A1111 API Client (`httpx`)]
GenService_Video_Client[Video Gen FastAPI Client (`httpx`)]
GenService_Audio_Client[Audio Gen FastAPI Client (`httpx`)]
Amuse_Client[Amuse Implementation Client (`subprocess`)]
OS_Client_Monitor[OS Monitoring Client (WMI, PerfCounters, Event Logs)]
OS_Client_Context[OS User Context Client (pywin32 Window/Clipboard/Search API)]
OS_Client_Advanced[OS Control Client (pywin32/ctypes/uiautomation/Power)]
OS_Client_PSExecutor[PowerShell Executor Client (`subprocess`)]
OS_Client_EnvAwareness[Env. Awareness Client (WMI, Registry, PS Commands)]
Docker_Client[Docker CLI/SDK Client]
OCR_Client[OCR Client (Tesseract/PaddleOCR)]
ObjectDetect_Client[Object Detection Client (ONNX/YOLO - GAIA Optimized Runtime)]
WebDriver_Client[WebDriver Client (Selenium/Playwright)]
CredentialManager_Client[Credential Manager Client (`keyring`)]
end
subgraph External Services & Runtimes
Ext_LMS[LM Studio Server (Local)]
Ext_VSC[VS Code + Roo Code Ext (Local)]
Ext_A1111[A1111 WebUI Service w/ API (Local)]
Ext_VideoSvc[Video Gen FastAPI Service (Local, ROCm Env)]
Ext_AudioSvc[Audio Gen FastAPI Service (Local, ROCm Env)]
Ext_Docker[(Opt.) Docker Daemon (Local)]
Ext_Git[Git Executable]
Ext_OS_APIs[Windows APIs (WMI, UI Automation, User32, Kernel32, PowerCfg, Search Index, Event Log, Registry, App Mgmt APIs)]
Ext_PowerShell[PowerShell Runtime]
Ext_WSL[WSL Runtime & CLI]
Ext_DockerDesktop[Docker Desktop & CLI/API]
Ext_ByteCraft[ByteCraft CLI Installation]
Ext_AmuseImpl[Amuse Implementation Python Env & Scripts]
Ext_AmuseApp[Amuse Software Installation]
Ext_WebDriverBinaries[ChromeDriver/GeckoDriver etc.]
Ext_Browsers[Chrome/Firefox/Edge]
WinCredMgr[Windows Credential Manager]
Ext_OCR[Local OCR Engine Installation]
Ext_ObjDetect[Local Object Detection Model/Engine Files]
Ext_Gettext[Gettext MO/PO Files (`locale/`)]
Ext_GaiaRuntime[(Opt.) AMD GAIA Optimized ONNX Runtime]
end
subgraph Data Storage (`data/`)
Data_SQLite[SQLite DB (`agent_data.db`)]
Data_Config[Config Files (`settings.yaml`, `user_config.yaml`, `model_config.py`, `tasks.json` / `web_tasks.json`)]
Data_Logs[Log Files (`logs/`)]
Data_Outputs[Generated Media/Files (`outputs/`)]
Data_ModelCache[Model Output Cache (`data/cache/`)]
Data_PluginStore[Plugin Storage/Registry (`plugins/`)]
end
%% --- Key Interactions ---
UI_MainWindow -->|User Input| UI_Ctrl -->|Agent Request| AC_CommInterface --> AC_Agent
AC_Agent -->|State Updates| AC_CommInterface -->|UI Updates| UI_MainWindow
AC_Agent --> AC_ModelSelector --> AC_ResourceGovernor & LMS_Client
AC_Agent --> AC_ReasoningCtrl --> Reasoning Models %% Passes model/temp
AC_ReasoningCtrl -->|Stream Steps| UI_ChatPanel
AC_Agent --> AC_ConductorAgent --> SubAgents
AC_Agent --> AC_SkillDispatcher --> Skills Layer
Skills Layer --> Backend Clients --> External Services & Runtimes
AC_OpsManager --> Backend Clients %% Health Checks
Skills Layer -->|Performance Data| AC_LearningAgent
Backend Clients --> Ext_GaiaRuntime %% Optimized Inference
AC_ModelOptimizer --> Ext_GaiaRuntime %% Optimization Tools
```

4.2 Component Interaction Flow Summary

User interacts with AI Studio GUI. UI Controller sends requests to Agent Core via AC_CommInterface. Core Orchestrator analyzes request (using Orchestrator Model), selects execution model (ModelSelector potentially influenced by RLAgent), determines plan (ReasoningController). Simple tasks dispatched via SkillDispatcher to Skills. Complex tasks delegated to ConductorAgent managing Sub-Agents. Skills/Sub-Agents use Backend Clients to interact with External Services (LM Studio, A1111, ROCm Services), OS APIs, or run external tools (ByteCraft, Amuse). Local ONNX inference clients (OCR_Client, ObjectDetect_Client) utilize AMD GAIA optimized runtimes if configured. Results flow back to Core, then UI. Proactive Manager monitors activity and triggers suggestions. OpsManager monitors service health. ResourceGovernor manages system load. SecurityManager handles privileges. LearningAgent logs performance, updates empirical matrix, potentially trains RLAgent. ModelOptimizer handles caching and optional GAIA-powered quantization/pruning.

5. Technology Stack (Specific Libraries & Versions)

Core Language: Python 3.10.12

UI: PyQt6 6.5.0+, PyQt6-WebEngine 6.5.0+

Core Framework: APScheduler 3.10.4

LLM Interface: litellm 1.36.7+, httpx[http2] 0.25.2+

Generative Services: FastAPI 0.104.1+, Uvicorn 0.24.0+

AMD Acceleration: ROCm 5.6+ enabled PyTorch (e.g., 2.0.1+rocm5.6), DirectML (via A1111/ONNX)

OS Interaction: psutil 5.9.6+, pywin32 306+, wmi 1.5.1+, uiautomation 2.0.17+, pystray 0.19.5+, win11toast-pyqt6 0.3.2+ (or winrt-notifications), python-registry/winreg, keyboard 0.13.5+, pyautogui 0.9.54+

Vision: mss 6.1.0+, Pillow 10.1.0+, pytesseract 0.3.10+, paddleocr 2.7.0+, paddlepaddle 2.5.2+, onnx 1.15.0+, onnxruntime-directml 1.16.3+

Data: sqlite3 (stdlib), PyYAML 6.0.1+, chromadb 0.4.18+ (Opt.), faiss-cpu 1.7.4+ (Opt.)

Sandboxing: docker 7.0.0+, subprocess (stdlib)

Internationalization: gettext (stdlib)

Model Optimization: optimum[onnxruntime] 1.16.1+, torch, gaia-toolbox (Install via AMD GAIA repo)

Reinforcement Learning: stable-baselines3[extra] 2.2.0+, numpy 1.26.2+

Dependency Check: pipdeptree 2.13.2+

Web Automation: selenium 4.15.2+, playwright 1.40.0+

Credential Management: keyring 24.3.0+

Other: requests 2.31.0+, pygments 2.17.2+, pyperclip 1.8.2+, gitpython 3.1.40+, python-dotenv 1.0.0+, watchdog 3.0.0+, diskcache 5.6.3+, pynput 1.7.6+

6. Environment Setup (Mandatory & Explicit Steps)

Target System: Windows 11 Pro, 128GB RAM, AMD RX 7800 XT 16GB VRAM, Admin privileges. Internet connection required for initial downloads. Assumes project root is D:\legacy_oracle. Adjust paths as necessary.

6.3 Core Software Installation (Python 3.10.12, Git, Docker Desktop):

Python 3.10.12: Download 64-bit installer from python.org. During installation: Check “Add Python 3.10 to PATH”. Verify: Open Command Prompt (cmd) and run python --version. Output should be Python 3.10.12.

Git: Download installer from git-scm.com. Install with default options, ensuring Git is added to PATH. Verify: git --version.

Docker Desktop: Download from docker.com. Install. During setup or in Docker Desktop Settings > General: Ensure “Use the WSL 2 based engine” is checked. Start Docker Desktop. Verify: docker --version. Allocate resources (Settings > Resources): Recommend >=8GB RAM, adjust CPU cores as needed.

6.4 GPU Drivers & Libraries (AMD ROCm 5.6 / DirectML):

AMD Adrenalin Drivers: Go to amd.com/en/support, auto-detect or manually select RX 7800 XT for Windows 11, download the latest recommended/WHQL Adrenalin Edition driver package. Run the installer, choose “Full Install”. Restart if prompted.

ROCm for Windows (v5.6 or compatible): CRITICAL STEP. Follow the exact instructions from AMD’s official ROCm documentation for Windows: https://rocm.docs.amd.com/en/latest/deploy/windows/index.html. This typically involves installing specific components via the driver installer or separate packages. Ensure HIP SDK is installed. Verify: Open a “Command Prompt (Admin)” or “ROCm Command Prompt” (if created by installer) and run rocminfo. It should list your RX 7800 XT details without errors. Note the installed ROCm version (e.g., 5.6.x).

DirectML: Usually part of Windows/DirectX. Verify onnxruntime-directml installation later.

6.5 LM Studio Setup:

Download installer from https://lmstudio.ai/.

Run LM-Studio-Setup-X.Y.Z.exe. Install to default location.

Launch LM Studio.

Download Models: Use search bar (magnifying glass). Download GGUF format models, prioritizing quantizations like Q4_K_M or Q8_0 balancing performance and quality within 16GB VRAM. Required models (examples):

microsoft/Phi-3-mini-4k-instruct-gguf (Choose Q4_K_M or similar) – Alternative for phi-3-mini

MaziyarPanahi/phi-4-mini-instruct-reasoning-ita-sft-cold-start-claude-3.7-distillation-v2-GGUF (Q8_0) – Orchestrator

NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF (Q8_0) – Alternative for Mistral-Small if VRAM is tight

deepseek-ai/deepseek-coder-6.7b-instruct-GGUF (Q6_K) – Alternative for deepseek-coder-v2-lite

Qwen/Qwen1.5-7B-Chat-GGUF (Q8_0) – Alternative for qwen2.5-7b-instruct

Qwen/Qwen1.5-0.5B-Chat-GGUF (Q4_K_M) – Alternative for qwen2.5-0.5b-instruct

google/gemma-7b-it-gguf (Q8_0) – Alternative for gemma-12b

ibm/Granite-13B-Chat-v2-GGUF (Select appropriate quant) – Alternative for granite-8b

VQGAN/Granite-Vision-3.2-2B-GGUF (Q8_0) – Confirm exact repo/filename

Ensure models covering Logic/Math (like Phi variants) and Coding (like Deepseek Coder) are present.

Start Server: Go to Local Server tab (<>). Select one model (e.g., phi-4-mini...) from dropdown. Click “Start Server”. Check logs for confirmation.

Verify API: Open http://localhost:1234/v1/models in browser. Confirm JSON response. Default URL: http://localhost:1234.
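
A quick scripted version of the same check (a minimal sketch; assumes the default http://localhost:1234 base URL and the httpx dependency from Section 5):

```python
# check_lms.py - verify the LM Studio OpenAI-compatible API is reachable.
import httpx

def list_lm_studio_models(base_url: str = "http://localhost:1234") -> list[str]:
    """Return the IDs of the models LM Studio currently exposes via /v1/models."""
    resp = httpx.get(f"{base_url}/v1/models", timeout=10.0)
    resp.raise_for_status()
    return [m["id"] for m in resp.json().get("data", [])]

if __name__ == "__main__":
    print(list_lm_studio_models())
```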

6.6 Automatic1111 WebUI Setup (Image Gen Service):

Open Git Bash or Command Prompt.

cd D:\legacy_oracle (or your chosen parent dir)

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git external/stable-diffusion-webui

cd external/stable-diffusion-webui

Create dedicated env: python -m venv .venv

Activate: .\.venv\Scripts\activate

Install dependencies (primarily the DirectML-related PyTorch packages, which require careful version matching): follow A1111’s specific instructions; this usually means running webui-user.bat once, which attempts the installation itself. Ensure pip install torch-directml succeeds if needed, then install the remaining requirements: pip install -r requirements_versions.txt.

Download SD Model(s): Download Stable Diffusion checkpoints (e.g., sd_xl_base_1.0.safetensors or v1-5-pruned-emaonly.safetensors) from Hugging Face or Civitai. Place .safetensors or .ckpt files in D:\legacy_oracle\external\stable-diffusion-webui\models\Stable-diffusion.

Edit webui-user.bat: Open in text editor. Find line set COMMANDLINE_ARGS= and change it to:
```bat
set COMMANDLINE_ARGS=--api --listen --use-directml --no-half-vae --opt-sub-quad-attention --opt-split-attention-v1
```
(Adjust DirectML flags based on A1111 wiki/recommendations for AMD)

Run: Double-click webui-user.bat. Wait for it to start and provide URL.

Verify API: Open http://localhost:7860/docs in browser. Default URL: http://localhost:7860.
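
For a scripted smoke test beyond opening /docs, something like the following exercises the stock txt2img endpoint (a sketch; the payload shows only a few of the many accepted fields, and the output filename is arbitrary):

```python
# a1111_smoke_test.py - minimal txt2img call against the A1111 API.
import base64
import httpx

payload = {"prompt": "a lighthouse at dusk", "steps": 20, "width": 512, "height": 512}
resp = httpx.post("http://localhost:7860/sdapi/v1/txt2img", json=payload, timeout=300.0)
resp.raise_for_status()
# A1111 returns generated images as a list of base64-encoded PNGs.
with open("smoke_test.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```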

6.7 ROCm Generative Services Setup (Video & Audio FastAPI Wrappers):

Prerequisite: Working ROCm installation (Step 6.4).

Get Service Code: (Assuming hypothetical repos, replace with actual source)
```bash
cd D:\legacy_oracle\external
git clone https://github.com/YourOrg/rocm-video-service.git
git clone https://github.com/YourOrg/rocm-audio-service.git
```

Create Service Environments (use an Anaconda Prompt with ROCm paths configured):

```bash
conda create -n rocm_video_env python=3.10 -y
conda activate rocm_video_env
# CRITICAL: Install ROCm PyTorch matching your ROCm version (e.g., 5.6)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
# Install specific service deps (Developer must create requirements_video_svc.txt)
pip install -r D:\legacy_oracle\external\rocm-video-service\requirements_video_svc.txt

conda create -n rocm_audio_env python=3.10 -y
conda activate rocm_audio_env
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
# Install specific service deps (Developer must create requirements_audio_svc.txt)
pip install -r D:\legacy_oracle\external\rocm-audio-service\requirements_audio_svc.txt
```

Run Services:

Video: conda activate rocm_video_env && cd D:\legacy_oracle\external\rocm-video-service && uvicorn main:app --host 0.0.0.0 --port 8001

Audio: conda activate rocm_audio_env && cd D:\legacy_oracle\external\rocm-audio-service && uvicorn main:app --host 0.0.0.0 --port 8002

Verify: Check http://localhost:8001/docs and http://localhost:8002/docs. Default URLs: http://localhost:8001, http://localhost:8002.
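
Since the service repos above are placeholders, the exact API is not fixed; the sketch below illustrates the kind of submit-then-poll contract the agent's AC_AsyncTaskMgr expects (all endpoint and field names are assumptions, not a published interface):

```python
# main.py - illustrative skeleton of a ROCm generation service (hypothetical contract).
import uuid
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
jobs: dict[str, dict] = {}  # In-memory job store; a real service would persist this.

class GenerateRequest(BaseModel):
    prompt: str
    duration_seconds: float = 4.0

@app.post("/generate")
async def generate(req: GenerateRequest) -> dict:
    """Accept a job and return an ID the agent can poll."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result_path": None}
    # A real implementation would hand the job to a ROCm/PyTorch worker here.
    return {"job_id": job_id}

@app.get("/status/{job_id}")
async def status(job_id: str) -> dict:
    if job_id not in jobs:
        raise HTTPException(status_code=404, detail="unknown job")
    return jobs[job_id]
```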

6.8 ByteCraft Setup:

Clone repo (replace with actual URL): git clone https://github.com/YourOrg/bytecraft.git external/bytecraft

Set up a dedicated env (optional): cd external/bytecraft && python -m venv .venv && .\.venv\Scripts\activate

Install deps: pip install -r requirements.txt (assuming it exists).

Note Script Path: D:\legacy_oracle\external\bytecraft\run_bytecraft.py.

6.9 Amuse Implementation Setup:

Install Amuse: Run Amuse installer, target D:\Program Files\Amuse.

Place Implementation: Clone/copy files to D:\legacy_oracle\external\amuse_implementation\.

Env: cd D:\legacy_oracle\external\amuse_implementation && python -m venv .venv && .\.venv\Scripts\activate

Deps: pip install psutil pillow numpy pywin32 pyautogui (or from requirements_amuse_impl.txt).

Configure (Run in Amuse Env):

mkdir config (if needed)

python -c "from amuse_generator import ConfigManager; ConfigManager('config/amuse_config.json').save_config()"

python -c "from amuse_generator import AmuseWrapper; wrapper = AmuseWrapper(amuse_path='D:\\Program Files\\Amuse\\Amuse.exe'); wrapper.config_manager.save_config()"

mkdir output (if needed)

python -c "from amuse_generator import AmuseWrapper; wrapper = AmuseWrapper(); wrapper.set_output_directory(r'%cd%\output')"

Note Paths: Python: D:\legacy_oracle\external\amuse_implementation\.venv\Scripts\python.exe, Script: D:\legacy_oracle\external\amuse_implementation\amuse_generator.py.

Test: python example_usage.py --prompt "test" (in Amuse env).

6.10 OCR/Object Detection Setup:

Tesseract: Download the Windows installer from the UB Mannheim repo. Install, ensuring tesseract.exe is added to the system PATH. Install the eng language data. Verify: tesseract --version.

ONNX Model: Download yolov8n.onnx (e.g., from Ultralytics releases) to D:\legacy_oracle\models\detection\.

Runtime: Installed later with core agent deps (onnxruntime-directml).
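
Once the core dependencies are installed, a short script can confirm that onnxruntime-directml actually loads the model on the GPU (a sketch; assumes the yolov8n.onnx path above and omits real pre/post-processing):

```python
# check_onnx_dml.py - load the detection model on the DirectML execution provider.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    r"D:\legacy_oracle\models\detection\yolov8n.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],  # CPU as fallback
)
print("Active providers:", session.get_providers())

# YOLOv8n expects a 1x3x640x640 float32 tensor; zeros suffice to prove inference runs.
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
print("Output shape:", outputs[0].shape)
```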

6.11 Core Agent Python Environment (requirements_agent.txt):

Navigate to D:\legacy_oracle.

Create/Activate venv: python -m venv .venv, .\.venv\Scripts\activate.

Create requirements_agent.txt:
```txt
# D:\legacy_oracle\requirements_agent.txt
PyQt6>=6.5.0
PyQt6-WebEngine>=6.5.0
APScheduler>=3.10.4
litellm>=1.36.7
httpx[http2]>=0.25.2
psutil>=5.9.6
pywin32>=306
wmi>=1.5.1
uiautomation>=2.0.17
pystray>=0.19.5
win11toast-pyqt6>=0.3.2  # Or winrt-notifications
python-registry>=1.3.1  # Or use built-in winreg
keyboard>=0.13.5
pyautogui>=0.9.54
mss>=6.1.0
Pillow>=10.1.0
pytesseract>=0.3.10
paddleocr>=2.7.0
paddlepaddle>=2.5.2  # Check CPU/GPU version needed
onnx>=1.15.0
onnxruntime-directml>=1.16.3  # For DirectML inference
PyYAML>=6.0.1
chromadb>=0.4.18  # Optional Vector DB
# faiss-cpu>=1.7.4  # Optional Vector DB (CPU) - GPU version needs specific build
docker>=7.0.0
optimum[onnxruntime]>=1.16.1
stable-baselines3[extra]>=2.2.0
numpy>=1.26.2
pipdeptree>=2.13.2
selenium>=4.15.2
playwright>=1.40.0
keyring>=24.3.0
requests>=2.31.0
Pygments>=2.17.2
pyperclip>=1.8.2
GitPython>=3.1.40
python-dotenv>=1.0.0
watchdog>=3.0.0
diskcache>=5.6.3
pynput>=1.7.6
asyncqt>=0.7.0  # For integrating asyncio with PyQt
```

Install Deps: pip install -r requirements_agent.txt

6.12 WebDrivers Setup:

With the core env activated, run playwright install. This downloads the browsers/drivers needed by Playwright. Selenium may require a manual chromedriver.exe (etc.) download placed on PATH or configured in settings.yaml if selenium-manager fails.

6.13 VS Code & Roo Code Setup (Optional): Install VS Code. Install the Roo Code extension. Configure if needed.

6.14 Initial System Configuration (settings.yaml):

Copy config/settings.yaml.example to config/settings.yaml.

Edit config/settings.yaml and fill in ALL correct service URLs and PATHS (LM Studio, A1111, Video Svc, Audio Svc, ByteCraft script, Amuse python exe & script, ONNX model). Use absolute paths (e.g., D:\legacy_oracle\...) for clarity unless relative paths from project root are guaranteed to work.

Copy config/user_config.yaml.example to config/user_config.yaml.


7. Project Structure (Detailed Folder Layout)

The proposed project structure organizes the codebase into logical modules, facilitating development, testing, and maintenance. The root directory will be legacy_oracle.

```
legacy_oracle/
├── config/ # Configuration files (Static & User Editable)
│ ├── model_config.py # Static: Model capabilities matrix, temps, categories
│ ├── settings.yaml # Static Defaults: System settings, paths, URLs, base ACLs
│ ├── user_config.yaml # User Overrides: Theme, language, priorities, workflows, privilege mode
│ ├── tasks.json # User Defined: GUI Automation Task Definitions (alternative to DB)
│ ├── web_tasks.json # User Defined: Web Automation Task Definitions (alternative to DB)
│ └── flows.json # User Defined: Workflow Definitions (alternative to DB)

├── core/ # Agent core logic (OpenManus style implementation)
│ ├── __init__.py # Includes __version__ = "1.0.0-alpha"
│ ├── agent.py # Main Agent Orchestrator Class (LegacyOracleAgent)
│ ├── conductor_agent.py # Multi-agent coordination Logic
│ ├── state_manager.py # Manages agent's internal state (idle, thinking, emotion)
│ ├── memory_manager.py # Interface to SQLite database (agent_data.db)
│ ├── reasoning_controller.py # Selects and executes Reasoning Strategies
│ ├── model_selector.py # Implements dynamic LM Studio model selection logic
│ ├── proactive_manager.py # Handles activity logging, pattern analysis, suggestions
│ ├── scheduler.py # Wrapper for APScheduler background tasks
│ ├── skill_dispatcher.py # Routes requests to appropriate Skills/Plugins/Tasks
│ ├── async_task_manager.py # Manages long-running async tasks (e.g., polling)
│ ├── resource_governor.py # Monitors and throttles based on system resources
│ ├── security_manager.py # Handles privilege mode, ACL checks, UAC interface
│ ├── config_manager.py # Loads and provides access to settings.yaml, user_config.yaml
│ ├── error_handler.py # Centralized error logging and reporting
│ ├── learning_agent.py # Logs performance, updates empirical matrix, continual learning hooks
│ ├── rl_agent.py # Reinforcement Learning logic (initial structure)
│ ├── user_config_manager.py # Manages reading/writing user_config.yaml
│ ├── ops_manager.py # Service health checks, recovery, load balancing logic
│ ├── plugin_manager.py # Loads and manages plugins from plugins/ directory
│ ├── localization_manager.py # Manages gettext translations
│ ├── dependency_manager.py # Checks Python package dependencies
│ ├── model_optimizer.py # Implements caching, quantization, pruning logic (incl. GAIA)
│ ├── task_manager.py # Loads/saves/provides GUI & Web Task Definitions (from JSON/DB)
│ ├── workflow_engine.py # Parses and executes multi-step flows
│ └── automation/ # Task Automation Framework specific code
│ ├── __init__.py
│ ├── gui_macro/
│ │ ├── task_definition.py # TaskDefinition, AutomationStep dataclasses
│ │ ├── engine.py # TaskAutomationEngine (GUI - uses InputSimClient etc.)
│ │ └── recorder.py # GUI Macro Recorder Logic (uses pynput)
│ └── web_automation/
│ ├── __init__.py
│ ├── task_definition.py # WebTaskDefinition, WebAutomationStep dataclasses
│ ├── engine.py # WebAutomationEngine (uses WebDriverClient etc.)
│ └── recorder.py # Web Recorder Logic (coordinates with UI/browser)

├── skills/ # Modular skills implementations (Built-in)
│ ├── __init__.py
│ ├── base_skill.py # Abstract Base Class for all skills
│ ├── clients/ # Internal: Skills that primarily act as API clients
│ │ ├── __init__.py
│ │ ├── base_api_client.py # Base class for API client skills
│ │ ├── image_gen_client.py # Interfaces with A1111 API via GenService_Image_Client
│ │ ├── video_gen_client.py # Interfaces with ROCm Video Service via GenService_Video_Client
│ │ └── audio_gen_client.py # Interfaces with ROCm Audio Service via GenService_Audio_Client
│ ├── roo_code.py # Skill for interacting with VS Code via VSC_Client
│ ├── file_system.py # Skill for local file operations
│ ├── web_search.py # Skill for performing web searches via SearchAPI_Client
│ ├── code_interpreter.py # Skill for executing code via Sandbox_Client
│ ├── vision.py # Skill for screen capture, OCR, detection via respective clients & LMS_Client_Vision
│ ├── bytecraft.py # Skill for controlling ByteCraft via BC_CLI_Client
│ ├── autonomous.py # Skill containing self-update, reflection logic (uses Git_Client, Scheduler)
│ ├── natural_language.py # Skill for advanced NLP tasks via LMS_Client_Reasoning
│ ├── amuse_skill.py # Skill for controlling Amuse via Amuse_Client
│ ├── system_monitor.py # Skill for querying system state via OS_Client_Monitor
│ ├── user_context.py # Skill for getting user env info via OS_Client_Context
│ ├── os_control.py # Skill for OS actions, UI automation via OS_Client_Advanced, InputSimClient, SecurityManager
│ ├── knowledge_acq.py # Skill for finding and storing knowledge (uses WebSearch, NLQuery, MemoryManager)
│ ├── env_awareness.py # Skill for checking system env via OS_Client_EnvAwareness, Docker_Client
│ ├── macro_execution.py # Skill using core/automation/gui_macro/engine.py
│ ├── web_automation.py # Skill using core/automation/web_automation/engine.py
│ ├── few_shot.py # Skill implementing few-shot learning via LMS_Client_Reasoning
│ └── logic_skill.py # Skill for dedicated logical reasoning via LMS_Client_Reasoning

├── agents/ # Sub-agent implementations
│ ├── __init__.py
│ ├── base_sub_agent.py # Base class for sub-agents
│ └── subagents/ # Specific sub-agent implementations
│ ├── __init__.py
│ ├── code_master.py # Specializes in coding tasks
│ ├── vision_analyst.py # Specializes in interpreting visual data
│ ├── logic_solver.py # Specializes in logical puzzles
│ └── web_navigator.py # Specializes in complex web interactions

├── reasoning_models/ # Reasoning strategy implementations
│ ├── __init__.py
│ ├── base_strategy.py # Base class defining interface (must support streaming)
│ ├── code_reasoning.py
│ ├── general_reasoning.py
│ ├── creative_reasoning.py
│ ├── self_critique_reasoning.py
│ ├── visual_reasoning.py
│ └── logic_reasoning.py

├── ui/ # AI Studio GUI (PyQt6)
│ ├── __init__.py
│ ├── main_window.py # Main application window class
│ ├── controller.py # Handles UI logic, signals/slots connecting to Core
│ ├── widgets/ # Custom UI widgets directory
│ │ ├── __init__.py
│ │ ├── animated_face.py # QPainter widget for face animation
│ │ ├── chat_panel.py # Displays chat history, reasoning stream, feedback buttons
│ │ ├── terminal_panel.py # Displays green text logs
│ │ ├── context_panel.py # Workspace/context management widget
│ │ ├── settings_panel.py # Widget for all settings
│ │ ├── status_panel.py # Displays system/agent vitals
│ │ ├── generative_tabs.py # Container for specific generative UIs
│ │ ├── media_players.py # Widgets for Image, Video, Audio, SWF (Ruffle) display
│ │ ├── task_mgmt_panel.py # Contains buttons/dropdowns for automation tasks
│ │ ├── record_macro_dialog.py # Dialog for GUI macro recording
│ │ ├── record_web_dialog.py # Dialog for Web automation recording
│ │ ├── workflow_editor.py # UI for creating/editing workflows
│ │ ├── view_window.py # Panel for displaying VisionSkill output
│ │ ├── plugin_manager_ui.py # UI for managing plugins
│ │ └── language_selector.py # Widget for selecting language
│ ├── resources/ # Icons (icon.png), QSS stylesheets (theme.qss)
│ ├── locale/ # Compiled .mo translation files (e.g., en/LC_MESSAGES/legacy_oracle.mo)
│ ├── setup_wizard.py # Multi-page dialog for first-run configuration
│ └── native_integration.py # Handles System Tray icon (pystray) & Notifications (win11toast)

├── clients/ # Wrappers for external libs/APIs/CLI
│ ├── __init__.py
│ ├── base_client.py # Abstract base class for clients
│ ├── lm_studio.py # Interacts with LM Studio HTTP API
│ ├── vscode.py # Interacts with Roo Code API (Implementation TBD)
│ ├── search_api.py # Wrapper for DuckDuckGoSearch or other search libs
│ ├── sandbox.py # Wrapper for Docker SDK or secure subprocess
│ ├── bytecraft_cli.py # Wrapper for ByteCraft subprocess call
│ ├── git.py # Wrapper for GitPython or Git CLI subprocess
│ ├── email.py # Wrapper for smtplib/imaplib
│ ├── input_sim.py # Wrapper for pyautogui/keyboard/pynput
│ ├── a1111.py # Wrapper for A1111 HTTP API (uses httpx)
│ ├── generative_service.py # Base class for ROCm service clients (uses httpx)
│ ├── amuse_client.py # Wrapper for Amuse implementation subprocess call
│ ├── os_monitor.py # Wrapper for WMI, PerfCounters, Event Logs
│ ├── os_context.py # Wrapper for pywin32 Window/Clipboard/Search
│ ├── os_advanced.py # Wrapper for pywin32/ctypes/uiautomation/Power
│ ├── os_ps_executor.py # Wrapper for PowerShell subprocess execution (incl. elevation)
│ ├── os_env_awareness.py # Wrapper for WMI/Registry/PS for env info
│ ├── docker.py # Wrapper for Docker SDK/CLI
│ ├── ocr.py # Wrapper for pytesseract/paddleocr
│ ├── object_detect.py # Wrapper for ONNX Runtime inference
│ ├── webdriver.py # Wrapper for Selenium/Playwright
│ └── credential_manager.py # Wrapper for keyring

├── plugins/ # Directory for dynamically loaded third-party plugins
│ └── example_plugin/ # Example plugin package structure
│ └── __init__.py # Contains plugin registration and Skill implementation

├── data/ # Persistent storage
│ ├── agent_data.db # Main SQLite database file
│ └── cache/ # Directory for diskcache

├── logs/ # Log files directory
│ └── agent.log # Main application log file

├── outputs/ # Default directory for generated media/files
│ ├── images/
│ │ ├── a1111/
│ │ └── amuse/
│ ├── videos/
│ ├── audio/
│ ├── swf/
│ └── temp/ # Temporary files

├── models/ # Local non-LLM models (downloaded/placed here)
│ └── detection/
│ └── yolov8n.onnx # Example object detection model

├── external/ # Cloned/Installed external tools & service codebases
│ ├── stable-diffusion-webui/
│ ├── rocm-video-service/ # Contains FastAPI app & scripts for video service
│ ├── rocm-audio-service/ # Contains FastAPI app & scripts for audio service
│ ├── bytecraft/
│ └── amuse_implementation/ # Contains Amuse control scripts & its own venv

├── locale/ # Source .po files for translation (managed by build process)
│ └── en/LC_MESSAGES/
│ └── legacy_oracle.po

├── scripts/ # Helper/utility scripts
│ ├── setup_check.py # Verifies environment setup prerequisites
│ ├── run_agent.ps1 # Example PowerShell script to launch agent
│ ├── manage_services.ps1 # Example script to start/stop external services
│ └── compile_locale.py # Script to compile .po to .mo files

├── tests/ # Automated tests directory
│ ├── __init__.py
│ ├── unit/ # Unit tests per module
│ ├── integration/ # Integration tests for component interactions
│ └── e2e/ # End-to-end workflow tests

├── main.py # Application entry point (Initializes Core, UI, Starts loop)
├── requirements_agent.txt # Core agent Python dependencies for pip
├── requirements_video_svc.txt # Video service Python dependencies
├── requirements_audio_svc.txt # Audio service Python dependencies
├── requirements_amuse_impl.txt # Amuse implementation Python dependencies
├── Dockerfile.agent # Dockerfile for building core agent container
├── Dockerfile.rocm_video # Dockerfile for building video service container
├── Dockerfile.rocm_audio # Dockerfile for building audio service container
├── docker-compose.yaml # Docker Compose file for running all services
└── README.md # Project overview, setup guide, contribution guidelines
```

8. Core Agentic Flow & Philosophy

This section details the fundamental operational flow of the LegacyOracle Super Agent (v6.6.2 onwards), contrasting it with simpler static approaches and explaining the rationale behind the dynamic, multi-step process.

8.1 Contrast with Static Approaches

Simpler agent designs or earlier prototypes often utilize a static mapping strategy. In such systems:

  • A predefined dictionary (like a basic MODEL_CATEGORIES) directly links a task description (e.g., “coding”) to a single, predetermined model (e.g., “qwen2.5-coder-14b-instruct”).
  • Task identification relies on simple keyword matching within the user prompt or basic intent classification rules.
  • A single selected model is typically responsible for handling the entire request from interpretation to final output generation.
  • Configuration might only involve setting a global default model or a small set of task-specific models.

While straightforward, this static approach lacks the necessary flexibility and intelligence to handle the diverse and complex tasks expected of LegacyOracle. It cannot effectively leverage the specialized strengths of different models, adapt to varying resource availability (VRAM/RAM), manage complex multi-step workflows, or incorporate quality control mechanisms like cross-checking.

8.2 The Dynamic Agentic Flow Explained (v6.6.2 Detailed Steps)

LegacyOracle v6.6.2 implements a sophisticated, dynamic, multi-step agentic flow designed for optimal performance, flexibility, and quality:

User Input Reception (UI -> Core): The user submits a request (text, optionally with file attachments; voice input is a future consideration) via the AI Studio GUI. The UI_Controller processes this and sends a structured request to the AC_CommInterface of the Agent Core.

Orchestration – Task Understanding (AgentOrchestrator + Orchestrator LLM):

The AgentOrchestrator receives the request.

It utilizes the designated Orchestrating Model (phi-4-mini-instruct-reasoning…v2 by default, loaded via LMS_Client) for initial analysis.

A carefully crafted prompt asks the Orchestrator Model to analyze the user request’s intent, classify the primary task_category (e.g., “coding”, “logic”, “image_generation”, “web_automation”, “system_command”), determine necessary capabilities (e.g., requires_vision=True, requires_tool_use=True), and assess if the task is simple (single skill likely sufficient) or complex (requiring multiple steps or specialized sub-agent reasoning).

The AgentOrchestrator parses the Orchestrator Model’s structured response.

UI Feedback: The agent emits signals via AC_CommInterface to display [Reasoning: Step 1 – Analyzing task intent…] in the UI_ChatPanel.
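
A sketch of what this structured-analysis exchange might look like (the prompt wording, JSON field names, and helper are illustrative, not the final contract):

```python
import json

# Illustrative orchestrator prompt; the model is asked to reply with strict JSON.
ANALYSIS_PROMPT = """Analyze the user request below and reply with JSON only:
{{
  "task_category": "<coding|logic|image_generation|web_automation|system_command|general>",
  "requires_vision": <true|false>,
  "requires_tool_use": <true|false>,
  "complexity": "<simple|complex>"
}}
User request: {user_request}"""

def parse_analysis(raw_llm_output: str) -> dict:
    """Parse the orchestrator model's JSON reply, tolerating surrounding prose."""
    start, end = raw_llm_output.find("{"), raw_llm_output.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("Orchestrator reply contained no JSON object")
    return json.loads(raw_llm_output[start:end + 1])

# Usage: prompt = ANALYSIS_PROMPT.format(user_request="Write a PowerShell backup script")
```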

Orchestration – Model & Strategy Selection (AgentOrchestrator -> ModelSelector -> ReasoningController):

Based on the task analysis (category, required capabilities), the AgentOrchestrator queries the AC_ModelSelector.

ModelSelector Logic:

Consults config/model_config.py::SKILLS_MATRIX (using Qualitative ratings like “High”, “Medium”, “Low”).

Maps qualitative ratings internally to numerical scores for ranking (e.g., High=3, Medium=2, Low=1).

Checks real-time resource availability (VRAM/RAM) against model requirements via AC_ResourceGovernor. Filters out models that won’t fit.

Scores the remaining eligible models based on their ratings for the required primary skill (and secondary skills if needed).

Applies user preference (speed vs accuracy from UserConfigManager) as a tie-breaker (e.g., prioritizing lower params/VRAM for speed, higher relevant skill score for accuracy).

Selects the top-scoring model(s) suitable for the task.

Determines the appropriate temperature from config/model_config.py::TEMPERATURES for the selected model and task category.

Model Loading: If the selected execution model is not currently loaded in LM Studio, ModelSelector initiates an asynchronous load request via AC_AsyncTaskManager and LMS_Client, respecting ResourceGovernor limits and potentially unloading an LRU model if necessary. The flow waits for the model to be ready (or handles load failure).

The AgentOrchestrator receives the selected execution model_name and temperature.

It then determines the appropriate ReasoningStrategy (e.g., CodeReasoning, VisualReasoning, GeneralReasoning) via the AC_ReasoningController based on the task category.

UI Feedback: Emits signals to display [Reasoning: Step 2 – Determined Task: {category}…], [Reasoning: Step 3 – Selecting optimal execution model…], [Model Selected: {model_name} for {category} task], [Reasoning: Step 4 – Using {StrategyName} strategy…].

Orchestration – Planning & Delegation (AgentOrchestrator -> SkillDispatcher / ConductorAgent):

Simple Task: If the task analysis indicated a simple, single-skill task, the AgentOrchestrator directly calls AC_SkillDispatcher.dispatch(skill_name, inputs, selected_model, temperature).

Complex Task: If the task is complex or requires multi-step logic, the AgentOrchestrator delegates the entire execution planning and management to the AC_ConductorAgent.handle_complex_task(task_analysis, initial_inputs). The ConductorAgent then performs sub-task decomposition, sub-agent assignment (potentially involving further model selections per sub-task), execution management, and result aggregation.

Task Execution (Skills Layer / Sub-Agents -> Backend Clients -> External Services/OS):

The dispatched Skill or Sub-Agent executes the core logic.

It uses the selected_model and temperature provided, making calls to LMS_Client.invoke for necessary LLM inference.

It interacts with the necessary Backend Clients (e.g., WebDriverClient for web tasks, OS_Client_PSExecutor for PowerShell, A1111_Client for image gen, Sandbox_Client for code execution).

UI Feedback: Skills and reasoning strategies emit intermediate [Reasoning: Step X – Performing action Y…] steps via the stream_callback provided by the ReasoningController/Core. The UI ChatPanel displays these.

MoA – Cross-Checking (Optional, AgentOrchestrator / ConductorAgent):

After receiving the initial result from the execution step, the orchestrator checks if MoA cross-checking is enabled and if resources permit (ResourceGovernor).

If yes, it selects a suitable cross-checking model (from model_config.py) via ModelSelector.

It invokes the cross-checker model via LMS_Client with a prompt asking it to verify the initial result.

UI Feedback: [Reasoning: Step N – Cross-checking output with {checker_model}…].

MoA – Refinement (Optional, AgentOrchestrator / ConductorAgent):

Checks if refinement is enabled and resources permit.

Selects a lightweight refinement model (from model_config.py) via ModelSelector.

Invokes the refiner model via LMS_Client with a prompt asking it to polish the (potentially verified) output.

The refined text replaces or augments the previous result.

UI Feedback: [Reasoning: Step N+1 – Refining output with {refiner_model}…].

Final Output & State Update (Core -> UI):

The AgentOrchestrator receives the final (potentially refined) result dict from the Skill/ConductorAgent.

It determines the final agent emotion based on the result status (success/error) using StateManager.

It logs the interaction details to the MemoryManager (SQLite).

It sends the final response content, message ID, and emotion via AC_CommInterface signals to the UI_Controller.

The UI_Controller updates the UI_ChatPanel (appending the final Agent: message with feedback buttons) and the UI_FaceWidget.

Agent state returns to idle.

8.3 Role of the Orchestrating Model (phi-4-mini default)

phi-4-mini-instruct-reasoning-ita-sft-cold-start-claude-3.7-distillation-v2 (configurable in settings.yaml) acts as the initial request interpreter and planner. Its strengths in reasoning and efficiency make it ideal for analyzing user intent, classifying the task type, identifying required capabilities (vision, logic, tool use), and assessing task complexity before engaging potentially larger, more specialized execution models. It standardizes the input for the ModelSelector and ReasoningController.

8.4 Dynamic Model Selection (ModelSelector Logic Explained)

The ModelSelector is key to leveraging the diverse LLM fleet effectively:

Input: Task Category (from Orchestrator), specific requirements (vision, tool use, logic), user preference (speed/accuracy).

Process:

Filter SKILLS_MATRIX models by MODEL_CATEGORIES relevant to the task.

Filter further by specific requirements (e.g., remove models where vision != High if requires_vision=True).

Filter by available resources (query ResourceGovernor for VRAM/RAM, compare against model’s matrix values).

Score remaining models: Map qualitative ratings (“High”=3, “Medium”=2, “Low”=1, “Yes”=1, “No”=0) to points for the primary skill category and potentially secondary relevant skills.

Apply preference tie-breaker: If multiple models have the top score, select the one with lower params/vram if preference is speed, or higher primary skill/reasoning/logic score if preference is accuracy.

Load/Confirm Loaded: Ensure the chosen model is loaded in LM Studio via LMS_Client, potentially unloading another model via LRU if needed and resources allow.

Determine Temperature: Look up appropriate temperature in TEMPERATURES dict for the selected model and task category.

Output: (selected_model_name: str, temperature: float).
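
The scoring core of this process can be sketched as follows (the toy matrix and field names are illustrative; the real data lives in config/model_config.py::SKILLS_MATRIX):

```python
RATING_POINTS = {"High": 3, "Medium": 2, "Low": 1, "Yes": 1, "No": 0}

def select_model(matrix: dict, category: str, free_vram_gb: float,
                 prefer_speed: bool) -> str:
    """Pick the best-rated model for `category` that fits in the free VRAM."""
    eligible = {name: info for name, info in matrix.items()
                if category in info["categories"] and info["vram_gb"] <= free_vram_gb}
    if not eligible:
        raise RuntimeError(f"No model fits for {category!r} in {free_vram_gb} GB")

    def rank(item):
        name, info = item
        score = RATING_POINTS.get(info["skills"].get(category, "Low"), 0)
        # Tie-breaker: smaller footprint if speed is preferred, larger otherwise.
        size_bias = -info["vram_gb"] if prefer_speed else info["vram_gb"]
        return (score, size_bias)

    return max(eligible.items(), key=rank)[0]

# Toy example:
matrix = {
    "deepseek-coder-6.7b": {"categories": ["coding"], "vram_gb": 7.0,
                            "skills": {"coding": "High"}},
    "qwen2.5-0.5b": {"categories": ["coding", "general"], "vram_gb": 1.0,
                     "skills": {"coding": "Low", "general": "Medium"}},
}
print(select_model(matrix, "coding", free_vram_gb=10.0, prefer_speed=False))
```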

8.5 MoA (Mixture-of-Agents) Cross-Checking/Refinement Logic

This optional layer enhances output quality:

  • Trigger: Enabled via settings.yaml and sufficient resources confirmed by ResourceGovernor.
  • Cross-Checking Model: Defined in model_config.py (e.g., qwen2.5-7b-instruct-1). Selected via ModelSelector. Prompt focuses on verification. Result appended to output (e.g., [Verification: Accurate]).
  • Refinement Model: Defined in model_config.py (e.g., qwen2.5-0.5b-instruct). Selected via ModelSelector. Prompt focuses on clarity/conciseness, taking original request and verified output as input. Replaces previous output.
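
A minimal sketch of the two-pass MoA flow (prompt wording is illustrative, and `invoke` stands in for the LMS_Client inference call):

```python
async def moa_pass(invoke, request: str, draft: str,
                   checker_model: str, refiner_model: str) -> str:
    """Optionally verify, then polish, a draft answer using two extra models."""
    verdict = await invoke(model=checker_model, prompt=(
        "Verify the following answer to the request. "
        "Reply 'Accurate' or list concrete errors.\n"
        f"Request: {request}\nAnswer: {draft}"))
    refined = await invoke(model=refiner_model, prompt=(
        "Rewrite this answer for clarity and conciseness, preserving meaning.\n"
        f"Request: {request}\nAnswer: {draft}\nVerification notes: {verdict}"))
    return f"{refined}\n[Verification: {verdict}]"
```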

8.6 Benefits Summary

Resource Efficiency: Dynamic selection respects hardware limits (128GB RAM, 16GB VRAM), enabling use of larger models when feasible and preventing crashes. Preloading/unloading optimizes availability.

Optimality: Best model chosen for each specific task’s demands.

Flexibility: Easily handles diverse tasks (coding, vision, logic, automation, generation) and adapts to new models via SKILLS_MATRIX.

Intelligence: Uses an LLM for nuanced task understanding and planning.

Quality: MoA layer provides optional verification and polishing.


9. Component Deep Dive (Full Specifications)

This section provides detailed specifications for each major logical component group within the LegacyOracle system, outlining their purpose, key responsibilities, primary interfaces (classes, methods, signals/slots, API endpoints where applicable), core implementation logic notes, configuration dependencies, and interactions with other components.

(Note: Full Python code is not provided for brevity, but class structures, method signatures with type hints, and detailed logic descriptions are included to guide implementation.)


9.1 UI (AI Studio GUI – ui/)

Purpose: Provides the primary user interface built with PyQt6 for interacting with the LegacyOracle agent, managing tasks, viewing outputs, configuring settings, and monitoring status. Adheres to specified aesthetics (Backgrounds: #f0f0f0, #ffffff; Terminal Text: #00FF00; Face: #ffdc96).

Communication Protocol: Uses Qt’s Signals and Slots mechanism via UI_Controller and AC_CommInterface for non-blocking communication between the UI thread and the asynchronous Agent Core.


9.1.1 MainWindow (ui/main_window.py)

Purpose: Main application window, orchestrates all UI panels and top-level interactions.

Responsibilities: Initialize main window (QMainWindow), set title (“LegacyOracle Super Agent v6.6.2”) and geometry. Instantiate and manage the central QTabWidget containing all main panels. Instantiate UI_Controller and pass necessary references. Initialize NativeIntegration for System Tray functionality. Display SetupWizard on first run or if config is invalid. Handle application close event (closeEvent) to gracefully trigger agent shutdown via UI_Controller. Load main stylesheet (.qss). Apply main background color (#f0f0f0).

Key Interfaces:

__init__(self, agent_comm_interface: AC_CommInterface)

closeEvent(self, event): Intercept close, signal agent to shutdown.

show_initial_setup(): Displays the SetupWizard.

Implementation Notes: Uses standard PyQt6 layout managers (QVBoxLayout, QHBoxLayout, QTabWidget). Central widget holds the main QTabWidget.

Config: None directly (receives theme via stylesheet).

Deps: PyQt6, UI_Controller, SetupWizard, NativeIntegration.
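
A sketch of the close-event handling described above (the shutdown signal name on the comm interface is an assumption):

```python
from PyQt6.QtWidgets import QMainWindow

class MainWindow(QMainWindow):
    def __init__(self, agent_comm_interface):
        super().__init__()
        self.comm = agent_comm_interface
        self.setWindowTitle("LegacyOracle Super Agent v6.6.2")

    def closeEvent(self, event):
        # Ask the core to shut down gracefully instead of killing the event loop.
        self.comm.shutdownRequested.emit()  # hypothetical signal name
        event.accept()
```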


9.1.2 UI_Controller (ui/controller.py)

Purpose: Mediator between UI elements and the Agent Core (AC_CommInterface). Handles UI logic and state synchronization.

Responsibilities: Instantiate all UI panel widgets (ChatPanel, TerminalPanel, etc.). Establish all Signal/Slot connections: connect UI widget signals (button clicks, text edits, combo box changes) to controller slots or emit corresponding signals to AC_CommInterface; connect signals received from AC_CommInterface to slots within the controller or directly to UI widget update methods. Manage application-wide UI state (e.g., enabling/disabling controls based on agent status). Initiate agent actions based on UI events.

Key Interfaces:

__init__(self, main_window: MainWindow, agent_comm_interface: AC_CommInterface)

connect_signals(): Called after initialization to set up all connections.

Slots for UI Element Signals: @Slot() on_send_button_clicked(), @Slot(str) on_reasoning_strategy_changed(strategy_name), @Slot() on_record_macro_clicked(), etc.

Slots for Agent Core Signals: @Slot(str, str) update_agent_state(state, emotion), @Slot(str) update_terminal_log(log_entry), @Slot(str) update_chat_reasoning(step_text), @Slot(str, int) update_chat_response(response_text, message_id), @Slot(float, float, float) update_system_vitals(cpu, ram, vram), @Slot(str, str, str) show_proactive_suggestion(suggestion_id, title, message), etc.

Signals to Agent Core (via AC_CommInterface): sendMessageRequest = Signal(str), setPreferenceRequest = Signal(str, object), executeTaskRequest = Signal(str, dict), etc.

Implementation Notes: Central hub for UI logic. Uses signals/slots extensively. Must ensure interactions with the agent core interface are non-blocking (e.g., triggering async agent methods).

Config: None directly.

Deps: PyQt6, AC_CommInterface, All UI panel widgets (ChatPanel, TerminalPanel, etc.).


9.1.3 ChatPanel (ui/widgets/chat_panel.py)

  • Purpose: Displays conversation history, agent reasoning stream, and allows user feedback.
  • Responsibilities: Use QTextBrowser for display, enabling rich text (HTML subset). Append user messages (User: ...). Append reasoning steps ([Reasoning: ...]) distinctly. Append final agent responses (Agent: ...) followed by dynamically created “👍” / “👎” QPushButton widgets. Connect feedback button signals to feedback_provided signal. Ensure panel scrolls to the bottom automatically. Render code blocks using pygments (requires generating HTML <pre>...</pre> blocks). Apply white background (#ffffff).
  • Key Interfaces:
    • @Slot(str) append_user_message(text: str)
    • @Slot(str) append_reasoning_step(step_text: str)
    • @Slot(str, int) append_agent_response(response_text: str, message_id: int)
    • feedback_provided = Signal(int, bool, Optional[str]) # message_id, is_positive, optional_comment
  • Implementation Notes: Store message IDs associated with feedback buttons. Use QTextBrowser.append() which handles basic HTML. For syntax highlighting, generate HTML using pygments.highlight and HtmlFormatter, then append.
  • Config: Background color (#ffffff).
  • Deps: PyQt6, pygments (optional for highlighting).
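
For the syntax-highlighting requirement above, a minimal sketch of the HTML generation step (the helper name is illustrative; the pygments calls are standard API):

# ui/widgets/chat_panel.py (excerpt) - illustrative highlighting helper
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name, guess_lexer

def render_code_block(code: str, language: str | None = None) -> str:
    """Return an HTML fragment suitable for QTextBrowser.append()."""
    try:
        lexer = get_lexer_by_name(language) if language else guess_lexer(code)
    except Exception:
        lexer = get_lexer_by_name("text")  # fall back to plain text
    # noclasses=True inlines all styles, since QTextBrowser does not load external CSS
    formatter = HtmlFormatter(noclasses=True)
    return highlight(code, lexer, formatter)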

9.1.4 TerminalPanel (ui/widgets/terminal_panel.py)

  • Purpose: Displays timestamped system logs and internal agent messages.
  • Responsibilities: Use QTextBrowser (read-only). Apply white background (#ffffff) and green text (#00FF00) via stylesheet. Use a monospace font (e.g., ‘Consolas’, ‘Courier New’). Append formatted log entries received via signal. Limit the document block count to prevent excessive memory usage (e.g., keep last 1000 lines).
  • Key Interfaces:
    • @Slot(str) append_log_message(formatted_log_entry: str)
  • Implementation Notes: Set setMaximumBlockCount on the QTextDocument. Styles applied via setStyleSheet.
  • Config: Colors (#ffffff, #00FF00).
  • Deps: PyQt6.

9.1.5 AnimatedFaceWidget & VisualPersonaController (ui/widgets/animated_face.py, ui/visual_persona_controller.py)

  • Purpose: Provide non-verbal feedback on agent status.
  • Responsibilities:
    • AnimatedFaceWidget: QWidget subclass. Overrides paintEvent to draw face elements (circle, eyes, mouth) using QPainter. Uses internal QTimer for simple blinking animation. Draws different features based on self.state and self.emotion properties. Uses skin tone #ffdc96.
    • VisualPersonaController: Manages face state (idle, thinking, speaking, error, listening) and emotion (neutral, positive, negative, surprised, confused). Provides update_status(state, emotion) method called by UI_Controller. Maps status to face widget properties. (Future: Integrates with TTS client’s phoneme output and lip-sync library like Rhubarb Lip Sync to generate detailed mouth shape commands for AnimatedFaceWidget).
  • Key Interfaces:
    • AnimatedFaceWidget.set_state(state: str)
    • AnimatedFaceWidget.set_emotion(emotion: str)
    • VisualPersonaController.update_status(state: str, emotion: str)
  • Implementation Notes: Keep drawing logic in paintEvent efficient. Use simple geometric shapes. Controller acts as a state machine for the face.
  • Config: Face color (#ffdc96).
  • Deps: PyQt6 (QtGui, QtCore, QtWidgets).
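
A minimal sketch of the paintEvent/blink loop described above (geometry values are arbitrary placeholders; the real widget maps state/emotion to richer features):

# ui/widgets/animated_face.py (excerpt) - simplified drawing/blink sketch
from PyQt6.QtCore import QTimer
from PyQt6.QtGui import QColor, QPainter
from PyQt6.QtWidgets import QWidget

class AnimatedFaceWidget(QWidget):
    def __init__(self, parent=None):
        super().__init__(parent)
        self.state, self.emotion = "idle", "neutral"
        self._eyes_open = True
        self._blink_timer = QTimer(self)
        self._blink_timer.timeout.connect(self._toggle_blink)
        self._blink_timer.start(2500)  # crude blink cadence

    def _toggle_blink(self):
        self._eyes_open = not self._eyes_open
        self.update()  # schedule a repaint

    def paintEvent(self, event):
        p = QPainter(self)
        p.setRenderHint(QPainter.RenderHint.Antialiasing)
        p.setBrush(QColor("#ffdc96"))            # skin tone per spec
        p.drawEllipse(10, 10, 100, 100)          # face outline
        p.setBrush(QColor("black"))
        eye_h = 12 if self._eyes_open else 2     # collapse eyes while "blinking"
        p.drawEllipse(35, 45, 12, eye_h)
        p.drawEllipse(73, 45, 12, eye_h)
        p.drawArc(35, 60, 50, 30, 0, -180 * 16)  # lower arc as a simple mouth
        p.end()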

9.1.6 ContextPanel (ui/widgets/context_panel.py)

  • Purpose: Allow user to manage information provided as context to the agent.
  • Responsibilities: Display list/tree of contextual items (e.g., loaded file summaries, previous results, workspace tasks defined by user). Use QListWidget or QTreeView with checkable items (Qt.ItemIsUserCheckable). Emit signal when check state changes, providing list of active context item IDs. Display estimated token count for selected items (calculated by agent core).
  • Key Interfaces:
    • @Slot(list) update_context_items(items: List[Tuple[str, str]]) # (item_id, display_name) tuples
    • @Slot(int) update_token_count(count: int)
    • context_selection_changed = Signal(List[str]) # List of selected item IDs
  • Implementation Notes: Model-view pattern might be better for complex context. Token count updated asynchronously by agent.
  • Config: None directly.
  • Deps: PyQt6.

9.1.7 SettingsPanel (ui/widgets/settings_panel.py)

  • Purpose: Allow user configuration of agent behavior and preferences.
  • Responsibilities: Provide UI widgets (QComboBox, QCheckBox, QSpinBox, QLineEdit, QPushButton) for ALL settings defined in settings.yaml (defaults) and user_config.yaml (overrides). Includes: LM Studio URL, Service URLs, Paths, Default Reasoning Strategy, Manual Model Override, Model Selection Preference, Proactive Enabled/Frequency/Thresholds, Privilege Mode Toggle, Allowed Elevated Actions (display only?), Theme selection, Language selection, Skill Priorities, Workflow Management trigger. Load current values from UserConfigManager (via UI_Controller) on display. Emit signals on value change to update UserConfigManager. Provide “Save” button to persist changes.
  • Key Interfaces:
    • load_settings(settings: dict, user_settings: dict)
    • setting_changed = Signal(str, object) # key, value
    • save_settings_requested = Signal()
  • Implementation Notes: Group settings logically using QGroupBox or tabs. Use input validation where necessary.
  • Config: Reads setting keys/types indirectly via data passed from ConfigManager/UserConfigManager.
  • Deps: PyQt6.

9.1.8 StatusPanel (ui/widgets/status_panel.py)

  • Purpose: Display real-time status information.
  • Responsibilities: Use QLabel widgets to display Agent State (Idle, Thinking, etc.), Active Model (from ModelSelector), CPU %, RAM %, VRAM % (from ResourceGovernor/SystemMonitorSkill), Context Token Count/Limit. Update labels based on signals received from UI_Controller.
  • Key Interfaces:
    • @Slot(str) update_agent_state_display(state: str)
    • @Slot(str) update_active_model_display(model_name: str)
    • @Slot(float, float, float) update_vitals_display(cpu: float, ram: float, vram: float)
    • @Slot(int, int) update_context_token_display(used: int, limit: int)
  • Implementation Notes: Keep updates efficient. Can be integrated into main window’s QStatusBar.
  • Config: None directly.
  • Deps: PyQt6.

9.1.9 GenerativeTabs & MediaPlayers (ui/widgets/generative_tabs.py, ui/widgets/media_players.py)

  • Purpose: Provide dedicated interfaces for triggering and viewing generative tasks.
  • Responsibilities:
    • GenerativeTabs: QTabWidget. Contains separate child widgets for each generative type (Image-A1111, Image-Amuse, Video-ROCm, Audio-ROCm, SWF-ByteCraft).
    • Each Tab Widget: Provides relevant parameter inputs (QLineEdit for prompt, QSpinBox for steps/duration, QComboBox for samplers/styles). Has a “Generate” button triggering the corresponding agent skill via UI_Controller. Displays a progress bar/status label updated via signals. Includes an instance of the appropriate MediaPlayer widget.
    • MediaPlayers: Separate QWidget subclasses. ImageViewer uses QLabel with QPixmap. VideoPlayer uses QVideoWidget+QMediaPlayer. AudioPlayer uses QMediaPlayer. SwfPlayer uses QWebEngineView loading ruffle.js and the target SWF file path. Provide methods like load_media(path).
  • Key Interfaces:
    • generate_request = Signal(str, dict) # generative_type, parameters
    • @Slot(str, str, float) update_generation_progress(task_id, status, progress)
    • @Slot(str, str) display_generated_media(task_id, output_path)
  • Implementation Notes: Use standard media playback features of PyQt6. Ruffle integration requires careful handling of local file URLs in QWebEngineView.
  • Config: None directly (parameters might have defaults from settings.yaml).
  • Deps: PyQt6, PyQt6-WebEngine.

9.1.10 TaskMgmtPanel, RecordMacroDialog, Task Definition Editor/Loader UI

  • Components: ui/widgets/task_mgmt_panel.py, ui/widgets/record_macro_dialog.py, potentially ui/widgets/task_editor.py (for manual editing).
  • Purpose: Provide UI elements for managing, creating (via recording or loading JSON), selecting, and executing custom GUI and Web Automation tasks.
  • Responsibilities:
    • TaskMgmtPanel: Displays buttons (“Record GUI Macro”, “Record Web Task”, “Load Task JSON”, “Execute Task”) and a QComboBox (task_selector) populated with task names from AC_TaskManager. Emits signals to UI_Controller when buttons are clicked or a task is selected for execution.
    • RecordMacroDialog: Handles the GUI macro recording workflow (as detailed in v6.5). Prompts for app path/task name, uses pynput listeners (via TAF_MacroRecorder logic) to capture clicks/keys/timing, displays recorded steps, prompts for input/output variable mapping and save_dir, constructs TaskDefinition, and returns it upon saving. Includes safety pauses.
    • Task Editor/Loader UI: (Could be part of TaskMgmtPanel or separate dialog triggered by “Load Task JSON”). Allows users to select a .json file containing a TaskDefinition or WebTaskDefinition. Validates the JSON structure. Provides an interface (potentially a QTextEdit or structured form) to view/edit task definitions manually (Advanced User feature). Interacts with AC_TaskManager to save loaded/edited tasks.
  • Key Interfaces:
    • TaskMgmtPanel.record_gui_macro_requested = Signal()
    • TaskMgmtPanel.record_web_task_requested = Signal()
    • TaskMgmtPanel.load_task_requested = Signal()
    • TaskMgmtPanel.execute_task_requested = Signal(str) # task_name
    • TaskMgmtPanel.update_task_list(task_names: List[str]) # Slot to update dropdown
    • RecordMacroDialog.exec() -> Optional[dict] # Returns TaskDefinition dict if saved
    • RecordWebDialog.exec() -> Optional[dict] # Returns WebTaskDefinition dict if saved
    • TaskEditor.load_task(task_def: dict)
    • TaskEditor.get_task() -> dict
  • Implementation Notes: TaskMgmtPanel gets task list updates from AC_TaskManager via UI_Controller. Recording dialogs manage external process launch (subprocess) and input listeners (pynput, potentially browser extension comms for web). Task saving signals UI_Controller to call AC_TaskManager.add_task. Execution signals UI_Controller to call AC_Agent.execute_automation_task.
  • Config: Reads task definitions from TaskManager (which loads from JSON/DB).
  • Deps: PyQt6, pynput (for Macro Recorder), core.automation.* (Task Definitions, TaskManager).

9.1.11 RecordWebDialog (ui/widgets/record_web_dialog.py)

  • Purpose: Guides the user through recording a sequence of web browser interactions for automation.
  • Responsibilities: Prompts for Task Name and Start URL. Initiates the recording process (method depends on implementation choice). Displays recorded steps (Action, Selector, Value, Delay). Prompts for input/output variable mapping (e.g., mapping a typed password to a credential key). Prompts for credential key (login_credential_key) if login steps detected. Constructs WebTaskDefinition and returns it.
  • Implementation Strategy (Choose One or Hybrid):
    • A) Guided Manual Definition (Simpler): UI prompts user sequentially: “Navigate to URL?”, “Identify the CSS Selector for the element to click/type into?”, “What value to type/select?”. Requires user familiarity with selectors. Uses WebDriverClient directly to test selectors perhaps.
    • B) Browser Extension (Advanced/Best UX): Requires building a separate browser extension (e.g., Chrome Extension). The extension records user clicks/typing, identifies robust selectors, and sends action data (via Native Messaging or WebSockets) to the RecordWebDialog running in the Python UI. The dialog receives and displays these steps.
    • C) Playwright Codegen Adaptation (Potential): Explore if Playwright’s codegen feature can be invoked programmatically via subprocess and its output parsed to generate the steps. This might be complex to integrate smoothly.
  • Key Interfaces:
    • exec() -> Optional[dict] # Returns WebTaskDefinition dict if saved
  • Implementation Notes: Strategy B is most user-friendly but involves significant extra work (browser extension development). Strategy A is feasible but less intuitive for non-developers. Recording logic resides in core/automation/web_automation/recorder.py.
  • Config: None directly.
  • Deps: PyQt6, core.automation.web_automation.task_definition.

9.1.12 WorkflowEditor UI (ui/widgets/workflow_editor.py)

  • Purpose: Allows users to define and manage multi-step automation flows.
  • Responsibilities: Provide a UI (e.g., using QListWidget or a more graphical node-based editor) to:
    • Create new workflows (prompt for name).
    • Add steps to a workflow, selecting the type (GUI Task, Web Task, Built-in Skill Call).
    • Select the specific task/skill for each step from dropdowns populated by TaskManager and available skills.
    • Define input mapping for each step (e.g., use output {“image_path”: …} from Step 1 as input image_path for Step 2). Use placeholders like {step1.data.image_path}.
    • Define overall workflow inputs.
    • (Future) Add conditional logic (if step status == “success”…) and loops.
    • Save workflow definitions (signals UI_Controller to call AC_WorkflowEngine.save_flow).
    • Load existing workflows for editing/execution.
  • Key Interfaces:
    • load_workflow(flow_definition: dict)
    • get_workflow() -> dict
    • save_workflow_requested = Signal(dict)
  • Implementation Notes: Needs careful design for input/output mapping UI. Storing flow definitions as JSON in config/flows.json or DB table flows.
  • Config: Reads task names from TaskManager.
  • Deps: PyQt6, core.WorkflowEngine, core.TaskManager.

9.1.13 ViewWindowPanel (ui/widgets/view_window.py)

  • Purpose: Display visual output from skills, primarily the VisionSkill.
  • Responsibilities: Contains a QLabel configured to display QPixmap. Includes QScrollArea for large images. Provides a slot update_visual_context(context: dict) connected to signals from the agent core. When context contains image data (e.g., context['data']['screenshot_path'] or a base64 string), load the image into the QLabel via QPixmap. Display associated OCR text and detected object info in separate labels within the panel. Clear the display when appropriate.
  • Key Interfaces:
    • @Slot(dict) update_visual_context(context: dict)
  • Implementation Notes: Handle image loading/scaling efficiently. Clear previous content on new updates.
  • Config: None directly.
  • Deps: PyQt6.

9.1.14 NativeIntegration (ui/native_integration.py)

  • Purpose: Handle interactions with native Windows UI elements like System Tray and Notifications.
  • Responsibilities:
    • Use pystray library to create and manage the System Tray icon (icon.png). Define menu items (e.g., “Show/Hide Agent”, “Toggle Proactive”, “Quit”). Connect menu item actions to signals emitted to UI_Controller. Run the pystray icon loop in a separate thread to avoid blocking the main Qt loop.
    • Use win11toast (or winrt-notifications) library to display native Windows toast notifications. Provide a simple method show_notification(title: str, message: str) called by UI_Controller when triggered by agent core (e.g., ProactiveManager).
  • Key Interfaces:
    • __init__(self, show_action_callback, hide_action_callback, quit_action_callback, …)
    • run_tray_icon() # Starts the pystray loop in thread
    • stop_tray_icon()
    • show_notification(title: str, message: str, **kwargs)
  • Implementation Notes: Requires careful thread management for pystray. Ensure icon.png exists.
  • Config: Icon path.
  • Deps: pystray, Pillow (for loading icon), win11toast (or chosen notification lib).
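
A minimal sketch of the tray/notification wiring (callback handling and menu layout are assumptions; pystray and win11toast calls follow their documented APIs):

# ui/native_integration.py (excerpt) - illustrative tray/notification sketch
import threading

import pystray
from PIL import Image
from win11toast import toast

class NativeIntegration:
    def __init__(self, show_cb, quit_cb, icon_path: str = "icon.png"):
        menu = pystray.Menu(
            pystray.MenuItem("Show/Hide Agent", lambda icon, item: show_cb()),
            pystray.MenuItem("Quit", lambda icon, item: quit_cb()),
        )
        self._icon = pystray.Icon("LegacyOracle", Image.open(icon_path), "LegacyOracle", menu)
        self._thread = threading.Thread(target=self._icon.run, daemon=True)

    def run_tray_icon(self):
        self._thread.start()  # pystray's loop must not block the Qt main loop

    def stop_tray_icon(self):
        self._icon.stop()

    def show_notification(self, title: str, message: str, **kwargs):
        toast(title, message)  # native Windows toast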

9.1.15 PluginManagerUI (ui/widgets/plugin_manager_ui.py)

  • Purpose: Allow users to view and manage installed/available plugins.
  • Responsibilities: Display a list (QListWidget) of plugins discovered by AC_PluginManager. Show plugin name, version, author, description. Provide buttons/checkboxes to enable/disable loaded plugins. (Future: Add “Install from URL/File”, “Configure Plugin” buttons). Interact with AC_PluginManager via UI_Controller.
  • Key Interfaces:
    • @Slot(list) update_plugin_list(plugins_info: List[dict])
    • plugin_toggled = Signal(str, bool) # plugin_name, enabled_state
  • Implementation Notes: Loads data received via signal. Does not handle plugin installation itself in this version.
  • Config: None directly.
  • Deps: PyQt6.

9.1.16 LanguageSelector (ui/widgets/language_selector.py)

  • Purpose: Allow user to change the application’s display language.
  • Responsibilities: QComboBox populated with available languages (e.g., “en”, “fr”, “es” – based on existing .mo files in locale/). Emit signal language_changed = Signal(str) when selection changes. UI_Controller connects this to AC_LocalizationManager.set_language and triggers a UI text refresh.
  • Key Interfaces:
    • language_changed = Signal(str) # language_code (e.g., “fr”)
    • set_available_languages(languages: List[str])
    • set_current_language(language_code: str)
  • Implementation Notes: Needs mechanism to trigger re-translation of all visible UI text upon language change.
  • Config: Reads available languages perhaps from directory structure or settings.yaml.
  • Deps: PyQt6.

9.1.17 SetupWizard (ui/setup_wizard.py)

  • Purpose: Guide new users through essential initial configuration.
  • Responsibilities: Multi-page QWizard. Pages for: Welcome, Environment Check (optional), Service URL Configuration (LM Studio, A1111, ROCm Services with “Test Connection” buttons), Key Path Configuration (Outputs, DB, External Tools like Amuse/ByteCraft), Initial Preferences (Language, Theme, Proactive Opt-in), Summary. On finish, returns collected configuration data to main application for saving via ConfigManager/UserConfigManager.
  • Key Interfaces:
    • exec() -> int (Standard QDialog return code)
    • get_collected_config() -> dict
  • Implementation Notes: Uses QWizardPage. Connection tests use OpsManager health checks via agent core. Path validation ensures directories/files exist.
  • Config: Reads default values from settings.yaml.example.
  • Deps: PyQt6, core.OpsManager (indirectly for tests).

9.1.18 UI Communication Protocol (Signals/Slots Summary)

Data Types: Use basic Python types (str, int, float, bool, dict, list) for signal/slot arguments where possible for simplicity. For complex data, pass dictionaries.

Standard: Use Qt’s typed Signals and Slots. UI_Controller connects signals from specific widgets (e.g., QPushButton.clicked) to its slots. Controller slots process UI events and emit standardized signals to AC_CommInterface. AC_CommInterface emits signals representing agent state changes, results, logs, etc., which are connected to slots in UI_Controller that update the appropriate UI widgets.

Threading: Use Qt.QueuedConnection for signals crossing the UI/Agent thread boundary to ensure thread safety.

Key Signals (Examples):

UI -> Core: sendMessageRequest(str), executeTaskRequest(str, dict), saveSettingsRequest(dict), toggleProactiveRequest(bool), recordMacroRequest(str, str), provideFeedbackRequest(int, bool, Optional[str]).

Core -> UI: agentStateChanged(str, str), newLogMessage(str), reasoningStep(str), modelSelectionUpdate(str, str), finalAgentResponse(str, int), systemVitalsUpdate(float, float, float), proactiveSuggestion(str, str, str), taskStatusUpdate(str, str, Optional[float], str), taskCompleted(str, dict), visualContextUpdate(dict).
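
A minimal PyQt6 sketch of this protocol (signal names follow the lists above; only one signal per side is shown, and the explicit queued connection is the key detail):

# Illustrative wiring only - the real classes carry the full signal sets listed above
from PyQt6.QtCore import QObject, Qt, pyqtSignal, pyqtSlot

class AC_CommInterface(QObject):
    agentStateChanged = pyqtSignal(str, str)  # state, emotion

class UI_Controller(QObject):
    sendMessageRequest = pyqtSignal(str)  # emitted toward the agent core

    def __init__(self, comm: AC_CommInterface):
        super().__init__()
        # Queued connection: safe delivery across the agent/UI thread boundary
        comm.agentStateChanged.connect(
            self.update_agent_state, Qt.ConnectionType.QueuedConnection
        )

    @pyqtSlot(str, str)
    def update_agent_state(self, state: str, emotion: str) -> None:
        ...  # forward to StatusPanel / AnimatedFaceWidget update methods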


(The remaining Section 9 component groups (9.2-9.11: Agent Core, Skills Layer, Sub-Agents, Reasoning Models, Backend Clients, External Services, and the Task/Web/Flow Automation and Perception frameworks) retain the specifications detailed in v6.5/v6.6.1. In particular, the Flow Automation Framework allows users to chain GUI/Web/Skill tasks and interacts with AC_WorkflowEngine via UI_Controller.)


10. Configuration & Data Management

This section details the structure, content, and management of configuration files and persistent data storage used by the LegacyOracle Super Agent.

10.1 Configuration Files

Configuration is split into static definitions (model_config.py), system defaults/paths (settings.yaml), user overrides (user_config.yaml), and potentially user-created task definitions (JSON files or DB).

10.1.1 config/model_config.py (Static Model Capabilities)

Purpose: Defines the known capabilities, resource requirements, and default parameters of the language models available via LM Studio. This file is maintained by the developers based on model evaluations and LM Studio availability. It is crucial for the ModelSelector.

Content: Contains Python dictionaries and lists. Uses Qualitative Ratings (“High”, “Medium”, “Low”, “Yes”, “No”) for skills, which are mapped internally by ModelSelector to numerical scores for ranking (e.g., High=3, Medium=2, Low=1 or using float scores like 0.9, 0.6, 0.3). Includes VRAM/RAM estimates in GB (approximated for common quantizations loaded in LM Studio).

# config/model_config.py
# Defines static capabilities of known models for the ModelSelector.
# Maintained by developers, not typically edited by end-users.
MATRIX_VERSION = "1.3" # Version of this matrix structure/data
# Skill Ratings: High, Medium, Low. tool_use: Yes/No. vision: High/Medium/Low.
# VRAM/RAM in GB: Approximate loaded values for common quantizations (adjust based on actual measurements).
# context_window: Max tokens the model supports.
SKILLS_MATRIX = {
# --- Reasoning & Logic Focused ---
"phi-4-mini-instruct-reasoning-ita-sft-cold-start-claude-3.7-distillation-v2":
{"params": 3.8, "vram": 4.1, "reasoning": "High", "coding": "Medium", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 2048},
"math_phi4_reason": # Assuming this is phi-4 fine-tuned for math/logic
{"params": 15, "vram": 8.9, "reasoning": "High", "coding": "Low", "nlu": "Low", "generation": "Low", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 16384}, # Q4_K_M VRAM
"nousresearch/hermes-3-llama-3_2-3b": # Example - Small but potentially strong logic/reasoning
{"params": 3, "vram": 3.6, "reasoning": "High", "coding": "Medium", "nlu": "Medium", "generation": "High", "vision": "Low", "tool_use": "Yes", "logic": "High", "context_window": 4096}, # Q8_0 VRAM
"mmngaaxcept-phi-4-deepseek-rlk-rl1-ezo": # Custom Phi-4 fine-tune
{"params": 15, "vram": 8.4, "reasoning": "High", "coding": "Medium", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 16384}, # IQ4_NL VRAM
"phi-3-mini-instruct": # Example rating for standard Phi-3 Mini
{"params": 3.8, "vram": 4.1, "reasoning": "High", "coding": "Medium", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 4096}, # Assuming Q8_0 VRAM and 4k context variant
"phi-4": # Example rating for base Phi-4 (if available)
{"params": 15, "vram": 15.6, "reasoning": "High", "coding": "Medium", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 16384}, # Assuming Q8_0 VRAM
"phi-4-14b-reasoning-2000steps": # Specific fine-tune
{"params": 14.7,"vram": 8.9, "reasoning": "High", "coding": "Low", "nlu": "Low", "generation": "Low", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 16384}, # Q4_K_M VRAM
"exaone-deep-32b":
{"params": 32, "vram": 19.3, "reasoning": "High", "coding": "Medium", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 32768}, # Q4_K_M VRAM
# --- Coding Focused ---
"deepseek-coder-v2-lite-instruct":
{"params": 16, "vram": 14.1, "reasoning": "Medium", "coding": "High", "nlu": "Medium", "generation": "High", "vision": "Low", "tool_use": "No", "logic": "Medium", "context_window": 16384}, # Q6_K VRAM
"qwen-32b": # Assuming Qwen1.5-32B-Chat GGUF Q3_K_M VRAM estimate
{"params": 32, "vram": 16.0, "reasoning": "High", "coding": "High", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "Yes", "logic": "High", "context_window": 32768},
"qwen2.5-coder-32b-instruct": # Qwen2.5 specific coder model
{"params": 32.5,"vram": 19.9, "reasoning": "Medium", "coding": "High", "nlu": "Medium", "generation": "High", "vision": "Low", "tool_use": "No", "logic": "Medium", "context_window": 32768}, # Q4_K_M VRAM
"qwen2.5-coder-14b-instruct":
{"params": 14.7,"vram": 15.7, "reasoning": "Medium", "coding": "High", "nlu": "Medium", "generation": "High", "vision": "Low", "tool_use": "No", "logic": "Medium", "context_window": 32768}, # Q8_0 VRAM
"exaone-deep-7.8b":
{"params": 7.8, "vram": 4.8, "reasoning": "High", "coding": "High", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 8192}, # Q4_K_M VRAM
"deepseek-r1-distill-qwen-7b": # Custom fine-tune/distillation name
{"params": 7, "vram": 8.1, "reasoning": "High", "coding": "High", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 128000}, # Q8_0 VRAM
"deepseek-r1-distill-llama-8b": # Custom fine-tune/distillation name
{"params": 8, "vram": 8.5, "reasoning": "High", "coding": "High", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 128000}, # Q8_0 VRAM
# --- General Purpose / Balanced ---
"openelm-in-7b-v0_1": # Assuming openELM Instruct
{"params": 7, "vram": 7.2, "reasoning": "Medium", "coding": "Medium", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "Yes", "logic": "Medium", "context_window": 8192}, # Q8_0 VRAM estimate
"mistral-small-3.1-24b-instruct-2503": # Mistral likely needs offloading on 16GB VRAM even w/ quant
{"params": 24, "vram": 13.5, "reasoning": "High", "coding": "Medium", "nlu": "High", "generation": "High", "vision": "High", "tool_use": "Yes", "logic": "High", "context_window": 32768}, # IQ4_NL VRAM estimate
"qwen2.5-7b-instruct-1": # Assuming Qwen 1.5 7B Instruct
{"params": 7, "vram": 8.1, "reasoning": "Medium", "coding": "Medium", "nlu": "High", "generation": "High", "vision": "Low", "tool_use": "Yes", "logic": "Medium", "context_window": 199999}, # Q8_0 VRAM, large context specified
"granite-3.2-8b-instruct": # Likely IBM Granite Instruct
{"params": 8, "vram": 8.7, "reasoning": "High", "coding": "Medium", "nlu": "Medium", "generation": "High", "vision": "Low", "tool_use": "No", "logic": "High", "context_window": 8192}, # Q8_0 VRAM
"gemma-3-27b-it": # Needs offloading on 16GB VRAM
{"params": 27, "vram": 17.4, "reasoning": "High", "coding": "Medium", "nlu": "High", "generation": "High", "vision": "High", "tool_use": "Yes", "logic": "High", "context_window": 8192}, # Q4_K_M VRAM
"gemma-3-12b-it": # Borderline on 16GB VRAM for Q8
{"params": 12, "vram": 13.4, "reasoning": "High", "coding": "Medium", "nlu": "High", "generation": "High", "vision": "High", "tool_use": "Yes", "logic": "High", "context_window": 8192}, # Q8_0 VRAM
# --- Vision Focused ---
"granite-vision-3.2-2b":
{"params": 2, "vram": 3.6, "reasoning": "Low", "coding": "Low", "nlu": "Low", "generation": "Medium", "vision": "High", "tool_use": "No", "logic": "Low", "context_window": 4096}, # Q8_0 VRAM
# --- Other/Specialized ---
"qwen2.5-0.5b-instruct":
{"params": 0.5, "vram": 0.4, "reasoning": "Low", "coding": "Medium", "nlu": "Medium", "generation": "Medium", "vision": "Low", "tool_use": "No", "logic": "Low", "context_window": 32768}, # Q4_K_M VRAM
"bytecraft": # Primarily Generation/Tool focus assumed from context
{"params": 8.4, "vram": 8.9, "reasoning": "Low", "coding": "Low", "nlu": "Low", "generation": "High", "vision": "Low", "tool_use": "Yes", "logic": "Low", "context_window": 8192}, # Q8_0 VRAM estimate
}
# List models capable of processing images (Vision skill >= High)
MULTIMODAL_MODELS = [m for m, s in SKILLS_MATRIX.items() if s.get("vision", "Low") == "High"]
# List models supporting tool use formatting / function calling
TOOL_USE_MODELS = [m for m, s in SKILLS_MATRIX.items() if s.get("tool_use", "No") == "Yes"]
# List models strong in logical reasoning
LOGIC_MODELS = [m for m, s in SKILLS_MATRIX.items() if s.get("logic", "Low") == "High"]
# List models strong in coding
CODING_MODELS = [m for m, s in SKILLS_MATRIX.items() if s.get("coding", "Low") == "High"]
# Default temperatures per model for different task types
TEMPERATURES = { model: {
"general": 0.7,
"coding": 0.2,
"reasoning": 0.4,
"vision": 0.6, # Vision description often needs some creativity
"math": 0.3,
"logic": 0.3, # Logic usually needs precision
"generation": 0.8, # Creative text generation
"nlu": 0.6, # Natural language understanding
"self_critique": 0.5,
"creative": 0.9, # Highly creative tasks
"summarization": 0.5
} for model in SKILLS_MATRIX
}
# Example Override: Make coding models slightly more creative if needed
# for model in CODING_MODELS:
#     if model in TEMPERATURES: TEMPERATURES[model]["coding"] = 0.3
# Mapping of logical task categories to lists of suitable model names
MODEL_CATEGORIES = {
"general": [m for m, s in SKILLS_MATRIX.items() if s.get("nlu", "Low") in ["High", "Medium"]],
"coding": CODING_MODELS,
"math": LOGIC_MODELS, # Assume strong logic models are good for math unless specific math models exist
"reasoning": [m for m, s in SKILLS_MATRIX.items() if s.get("reasoning", "Low") == "High" or s.get("logic", "Low") == "High"],
"logic": LOGIC_MODELS,
"vision": MULTIMODAL_MODELS,
"image_generation": MULTIMODAL_MODELS, # For interpreting prompt for external service
"swf_generation": MULTIMODAL_MODELS, # For interpreting prompt for ByteCraft
"generation": [m for m, s in SKILLS_MATRIX.items() if s.get("generation", "Low") in ["High", "Medium"]],
"tool_use": TOOL_USE_MODELS,
"orchestration": ["phi-4-mini-instruct-reasoning-ita-sft-cold-start-claude-3.7-distillation-v2"], # Explicit default
"refinement": ["qwen2.5-0.5b-instruct"], # Explicit default
"cross_check": ["qwen2.5-7b-instruct-1"], # Explicit default
"system_command": TOOL_USE_MODELS, # Usually requires structured output/tool format
"automation": TOOL_USE_MODELS, # Often involves planning/tool use
"knowledge_acquisition": [m for m, s in SKILLS_MATRIX.items() if s.get("nlu", "Low") == "High" or s.get("reasoning", "Low") == "High"], # Needs good understanding/summarization
"few_shot": [m for m, s in SKILLS_MATRIX.items() if s.get("nlu", "Low") == "High"], # Good NLU helps understand examples
"self_critique": [m for m, s in SKILLS_MATRIX.items() if s.get("reasoning", "Low") == "High"],
}
# Default orchestrator model ID used by Agent Core
DEFAULT_ORCHESTRATOR_MODEL = "phi-4-mini-instruct-reasoning-ita-sft-cold-start-claude-3.7-distillation-v2"
# Helper function for ModelSelector (optional - the mapping logic can also live inline)
def get_qualitative_to_score_map():
    return {"High": 3, "Medium": 2, "Low": 1, "Yes": 1, "No": 0}
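
To illustrate how these ratings drive selection, a minimal sketch of the ranking step ModelSelector might perform (the function below is illustrative and not part of model_config.py):

# Illustrative ModelSelector ranking sketch (not part of config/model_config.py)
from config.model_config import MODEL_CATEGORIES, SKILLS_MATRIX, get_qualitative_to_score_map

def rank_models(category: str, skill_key: str, vram_budget_gb: float) -> list[str]:
    """Rank candidate models for a task category by skill rating within a VRAM budget."""
    scores = get_qualitative_to_score_map()
    candidates = [
        name for name in MODEL_CATEGORIES.get(category, [])
        if SKILLS_MATRIX[name]["vram"] <= vram_budget_gb
    ]
    # Higher skill score first; smaller parameter count as a tie-break (favors speed)
    return sorted(
        candidates,
        key=lambda name: (-scores.get(SKILLS_MATRIX[name].get(skill_key, "Low"), 0),
                          SKILLS_MATRIX[name]["params"]),
    )

# Example: best coding model fitting in ~9 GB of free VRAM
# rank_models("coding", "coding", 9.0)[0]  # e.g. "deepseek-r1-distill-qwen-7b"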

10.1.2 config/settings.yaml (System Defaults & Paths)

Purpose: Stores deployment-specific configuration: default behaviors, API endpoints, paths, and security settings. Configured once during setup and rarely changed unless the environment changes.

# config/settings.yaml - Example Structure v1.0
# --- Service URLs ---
service_urls:
  lm_studio: http://localhost:1234    # Verify LM Studio server address
  a1111: http://localhost:7860        # Verify A1111 API address
  rocm_video: http://localhost:8001   # Verify Video service address
  rocm_audio: http://localhost:8002   # Verify Audio service address
  # external_search_api: null        # Example: Add if using SerpAPI etc.
# --- File System Paths (Use absolute paths or relative from project root) ---
paths:
  outputs_base: ./outputs              # Base directory for generated files
  logs_dir: ./logs                     # Directory for agent log files
  database_file: ./data/agent_data.db  # Path to the main SQLite database
  cache_dir: ./data/cache              # Directory for diskcache
  plugins_dir: ./plugins               # Directory for custom plugins
  locale_dir: ./ui/locale              # Directory for gettext .mo files
  # --- Model/Tool Specific Paths ---
  onnx_detection_model: ./models/detection/yolov8n.onnx    # Path to YOLO model
  bytecraft_script: ./external/bytecraft/run_bytecraft.py  # Path to ByteCraft runner
  amuse_python_exe: ./external/amuse_implementation/.venv/Scripts/python.exe  # Path to Amuse venv python
  amuse_script: ./external/amuse_implementation/amuse_generator.py  # Path to Amuse generator script
  # webdriver_path:  # Optional: Explicit path if playwright install fails or a specific driver is needed
# --- Agent Core Settings ---
agent_settings:
  orchestrator_model_id: "phi-4-mini-instruct-reasoning-ita-sft-cold-start-claude-3.7-distillation-v2"  # Default orchestrator
  default_reasoning_strategy: GeneralReasoning  # Initial strategy
  model_selection_preference: accuracy          # Default preference: accuracy | speed | balance
  log_level: INFO                               # Logging level: DEBUG | INFO | WARNING | ERROR
  matrix_version_check: "1.3"                   # Expected model_config.py version
# --- Proactive Features ---
proactive_settings:
  enabled: true                         # Master toggle for proactive features
  default_frequency_hours: 2            # How often to run pattern analysis
  suggestion_confidence_threshold: 0.7  # Minimum confidence for showing a suggestion
  # Add specific proactive module toggles if needed:
  # enable_error_suggestions: true
  # enable_workflow_suggestions: true
# --- Security & Permissions ---
security:
  default_privilege_mode: standard  # Default mode: standard | privileged
  allow_unsigned_plugins: false     # Require plugins to be signed (future feature)
  # Actions requiring an elevation prompt (Format: SkillClassName:method_name).
  # These actions trigger UAC if the agent is in standard mode.
  acl_requires_elevation:
    - OSControlSkill:run_shell_command_admin
    - SystemAdminSkill:install_package_admin  # Example specific method
    - AutomationSkill:run_elevated_script     # Example
    - OSControlSkill:set_system_time          # Example
  # keyring service name used to store credentials
  keyring_service_name: "LegacyOracleAgent"
# --- Resource Governor ---
resource_governor:
  monitor_interval_seconds: 10               # How often to check resources
  max_system_cpu_threshold: 85.0             # % CPU usage before throttling background tasks
  max_system_ram_threshold: 90.0             # % RAM usage before throttling background tasks
  min_battery_percent_for_heavy_tasks: 30.0  # Prevent heavy tasks (Gen AI) on low battery
  pause_on_battery_saver: true               # Pause heavy tasks if Windows Battery Saver is on
# --- Operations Manager ---
operations_manager:
  health_check_interval_seconds: 300  # Check services every 5 mins
  max_retries_on_failure: 3           # How many times to retry a failed API call/service check
  # Recovery commands (use null if no command; use docker/nssm/taskkill as appropriate)
  recovery_commands:
    a1111: null                                      # Cannot reliably restart A1111 externally
    rocm_video: "docker restart rocm_video_service"  # If using docker-compose
    rocm_audio: "docker restart rocm_audio_service"  # If using docker-compose
    # Or an nssm example: "nssm restart RocmVideoService"
# --- Model Optimizer ---
model_optimizer:
  cache_enabled: true
  cache_ttl_seconds: 3600       # Cache LLM responses for 1 hour
  cache_max_size_gb: 2          # Max cache size on disk
  auto_quantize_on_load: false  # Experimental: Attempt quantization via GAIA/Optimum when a model loads
  quantization_tool: "gaia"     # Preferred tool: gaia | optimum | torch | none
  pruning_tool: "none"          # Pruning disabled by default: gaia | torch | none
  onnx_execution_provider: "DmlExecutionProvider"  # DmlExecutionProvider | ROCmExecutionProvider | CPUExecutionProvider
# --- Localization ---
localization:
  default_language: "en"  # Default language code (maps to locale/en/...)
# --- Skill Specific Settings (Examples) ---
web_search:
  engine: "duckduckgo"  # duckduckgo | google (requires api key setup) | brave
  max_results: 5
  # google_api_key: null
  # google_cse_id: null
# Add other skill-specific configs here as needed

10.1.3 config/user_config.yaml (User Overrides & Preferences)

Purpose: Stores user-specific overrides and preferences, managed via the UI.

# config/user_config.yaml - Example Structure v1.0
# User Preferences (Overrides settings.yaml where applicable)
ui:
  theme: dark     # Overrides default if user changes: light | dark | path/to/custom.qss
  language: "en"  # User's preferred language code, e.g., "fr", "es"
  font_size: 10   # Optional: UI font size override
agent:
  reasoning_strategy_override: null  # null | CodeReasoning | GeneralReasoning etc.
  model_selection_preference: speed  # Overrides default: speed | accuracy | balance
  persona: default                   # Preferred agent persona: default | helpful_assistant | concise_expert | witty_sage
proactive:
  enabled: true          # User override for proactive features
  frequency_hours: 4     # User override for suggestion frequency
  disabled_suggestions:  # Suggestion types the user does not want
    - organize_desktop
    - system_cleanup_reminder
skills:
  # Skill priorities (higher number = higher priority during dispatch tie-breaks)
  skill_priorities:
    CodingSkill: 10  # User really wants good code
    LogicSkill: 9
    WebAutomationSkill: 8
    # ... other skills can be added here by the user, potentially via the UI
automation:
  # User-defined simple workflows (managed via the UI WorkflowEditor)
  workflows:
    morning_brief:
      - description: "Get latest tech news"
        type: "skill_call"  # Or 'gui_macro', 'web_task'
        target: "WebSearchSkill:search"
        inputs: {"query": "Latest tech news headlines"}
        output_var: "news_results"  # Variable storing this step's output for later steps
      - description: "Check personal email"
        type: "skill_call"
        target: "AutomationSkill:check_email"
        inputs: {"account": "personal"}
      - description: "Summarize news"
        type: "skill_call"
        target: "NLQuerySkill:summarize"
        inputs: {"text": "{news_results.summary}"}  # Uses output from step 1
        output_var: "news_summary"
security:
  # User can restrict specific elevated actions further than the defaults
  user_restricted_elevated_actions:
    - SystemAdminSkill:format_drive  # Example of blocking a potentially dangerous action
# --- Add other user-configurable sections as needed ---

10.1.4 config/tasks.json / web_tasks.json (Task Definitions)

Purpose: Stores user-created GUI and Web automation task definitions if the SQLite DB approach is not used. JSON format as specified in v6.5.

10.2 Data Storage (agent_data.db SQLite Schema, Cache Strategy)

SQLite Database (data/agent_data.db): Single file for persistent agent data. Managed by core/MemoryManager.

-- data/agent_data.sql - Full Schemas v1.0
-- Ensure UTF-8 encoding is used by the connection
PRAGMA encoding = "UTF-8";
-- Enable Foreign Key support
PRAGMA foreign_keys = ON;
-- Set initial database version
PRAGMA user_version = 1;

-- Chat History Table
CREATE TABLE IF NOT EXISTS chat_history (
    message_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id        TEXT NOT NULL DEFAULT 'default_session',  -- Allows grouping conversations
    timestamp         TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),  -- ISO 8601 UTC Zulu ('now' in strftime is already UTC)
    role              TEXT NOT NULL CHECK(role IN ('user', 'agent', 'system', 'tool', 'error')),  -- Role of the message sender
    content           TEXT NOT NULL,  -- The main message content
    reasoning_steps   TEXT,           -- JSON array of strings '["Reasoning: step1", "Reasoning: step2", ...]'
    selected_model    TEXT,           -- Model ID used by agent for this response
    selected_strategy TEXT,           -- Reasoning strategy used
    metadata          TEXT            -- JSON blob for extra context (e.g., {"file_path": "/path/to/image.png", "tool_call_id": "xyz"})
);
CREATE INDEX IF NOT EXISTS idx_chat_timestamp ON chat_history (timestamp);
CREATE INDEX IF NOT EXISTS idx_chat_session ON chat_history (session_id);

-- User Feedback on Agent Responses Table
CREATE TABLE IF NOT EXISTS user_feedback (
    feedback_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id    INTEGER,  -- Links to chat_history.message_id (nullable so ON DELETE SET NULL can apply)
    is_positive   INTEGER NOT NULL CHECK(is_positive IN (0, 1)),  -- 0 = Thumbs Down, 1 = Thumbs Up
    feedback_text TEXT,     -- Optional user comment
    timestamp     TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    FOREIGN KEY (message_id) REFERENCES chat_history(message_id) ON DELETE SET NULL  -- Keeps feedback even if the chat message is deleted
);
CREATE INDEX IF NOT EXISTS idx_feedback_message ON user_feedback (message_id);

-- Log of Agent/User Activities for Proactive Analysis Table
CREATE TABLE IF NOT EXISTS activity_log (
    log_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp  TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    event_type TEXT NOT NULL,  -- e.g., 'window_focus', 'command_run', 'idle_start', 'file_created', 'suggestion_shown', 'suggestion_accepted', 'skill_executed', 'plugin_executed', 'task_executed', 'flow_executed'
    details    TEXT            -- JSON blob with event-specific details (e.g., {"window_title": "...", "process_name": "...", "command": "...", "skill": "...", "status": "...", "duration_ms": 123})
);
CREATE INDEX IF NOT EXISTS idx_activity_timestamp ON activity_log (timestamp);
CREATE INDEX IF NOT EXISTS idx_activity_event_type ON activity_log (event_type);

-- Stored GUI Macros / Web Automations (Unified Table)
CREATE TABLE IF NOT EXISTS automation_tasks (
    name                 TEXT PRIMARY KEY,  -- User-defined unique name for the task
    task_type            TEXT NOT NULL CHECK(task_type IN ('gui', 'web')),  -- Differentiates task type
    description          TEXT,
    -- GUI specific
    software_path        TEXT,  -- Path to executable for GUI tasks
    save_dir             TEXT,  -- Directory to watch for GUI task output files
    -- Web specific
    start_url            TEXT,  -- Starting URL for web tasks
    login_credential_key TEXT,  -- Key name in OS Credential Store (e.g., "SunoAI") for web logins
    -- Common
    steps      TEXT NOT NULL,   -- JSON representation of List[AutomationStep] or List[WebAutomationStep]
    inputs     TEXT NOT NULL,   -- JSON dict describing needed input variables: {"var_name": "description"}
    outputs    TEXT NOT NULL,   -- JSON dict describing expected output variables: {"var_name": "description"}
    created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    updated_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);

-- Access Control List for Privileged Actions Table
CREATE TABLE IF NOT EXISTS acl_rules (
    action_identifier         TEXT PRIMARY KEY,  -- Format: 'SkillName:method_name' or 'Component:action'
    description               TEXT,              -- Explanation of the action
    requires_elevation        INTEGER NOT NULL DEFAULT 0 CHECK(requires_elevation IN (0, 1)),  -- 0=No, 1=Yes (Default: No)
    default_user_confirmation INTEGER NOT NULL DEFAULT 1 CHECK(default_user_confirmation IN (0, 1, 2))  -- 0=Never ask, 1=Ask always, 2=Ask once per session
    -- Add user override reference if implementing RBAC later
);

-- Stored Knowledge from KnowledgeAcquisitionSkill Table
CREATE TABLE IF NOT EXISTS knowledge_base (
    topic_hash      TEXT PRIMARY KEY,  -- SHA256 hash of the normalized topic string
    topic           TEXT NOT NULL,     -- The original topic queried
    summary         TEXT NOT NULL,     -- LLM-generated summary
    key_facts       TEXT,              -- JSON list of extracted key points/facts
    source_urls     TEXT,              -- JSON list of source URLs used
    embedding       BLOB,              -- Optional: Store vector embedding
    relevance_score REAL,              -- Optional: Agent's assessment of source quality/relevance
    last_accessed   TEXT,              -- Timestamp of last retrieval
    last_updated    TEXT NOT NULL,     -- Timestamp of last update/creation
    created_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);
CREATE INDEX IF NOT EXISTS idx_kb_topic ON knowledge_base (topic);
CREATE INDEX IF NOT EXISTS idx_kb_last_updated ON knowledge_base (last_updated);

-- Optional Full-Text Search Index: Requires FTS5 module enabled in the SQLite build
-- CREATE VIRTUAL TABLE IF NOT EXISTS knowledge_fts USING fts5(
--     topic, summary, key_facts,
--     content=knowledge_base, content_rowid=rowid, tokenize="porter unicode61"
-- );
-- -- Triggers to keep the FTS table synchronized with knowledge_base
-- CREATE TRIGGER IF NOT EXISTS knowledge_ai AFTER INSERT ON knowledge_base BEGIN
--     INSERT INTO knowledge_fts (rowid, topic, summary, key_facts) VALUES (new.rowid, new.topic, new.summary, new.key_facts);
-- END;
-- CREATE TRIGGER IF NOT EXISTS knowledge_ad AFTER DELETE ON knowledge_base BEGIN
--     DELETE FROM knowledge_fts WHERE rowid=old.rowid;
-- END;
-- CREATE TRIGGER IF NOT EXISTS knowledge_au AFTER UPDATE ON knowledge_base BEGIN
--     UPDATE knowledge_fts SET topic=new.topic, summary=new.summary, key_facts=new.key_facts WHERE rowid=old.rowid;
-- END;

-- Log of Self-Reflection Cycles Table
CREATE TABLE IF NOT EXISTS reflection_log (
    reflection_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp             TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    "trigger"             TEXT NOT NULL CHECK("trigger" IN ('scheduled', 'error_threshold', 'user_request')),  -- Quoted: TRIGGER is an SQLite keyword
    period_start          TEXT,  -- Start timestamp of data analyzed
    period_end            TEXT,  -- End timestamp of data analyzed
    data_analyzed_summary TEXT,  -- Description of data reviewed
    llm_analysis_prompt   TEXT,  -- The prompt sent to the SelfCritique model
    llm_identified_issues TEXT,  -- Issues identified by the LLM (potentially JSON list)
    llm_proposed_actions  TEXT,  -- Suggestions from the LLM (potentially JSON list)
    agent_action_taken    TEXT,  -- What the agent core actually did
    action_status         TEXT CHECK(action_status IN ('success', 'failed', 'pending_review', 'logged_only')),
    notes                 TEXT   -- Any additional context or observations
);

-- Log for Learning Agent Performance Tracking Table
CREATE TABLE IF NOT EXISTS performance_log (
    log_id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp         TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    task_id           TEXT UNIQUE,  -- Optional: Unique ID for tracking task instances across retries/steps
    task_category     TEXT,
    skill_used        TEXT,
    sub_agent_used    TEXT,
    model_used        TEXT NOT NULL,
    strategy_used     TEXT,
    success           INTEGER NOT NULL CHECK(success IN (0, 1)),  -- 0=Fail, 1=Success
    duration_ms       INTEGER,  -- Execution time in milliseconds
    prompt_tokens     INTEGER,  -- Optional: Token usage if available from LLM API
    completion_tokens INTEGER,  -- Optional
    resource_vram_gb  REAL,     -- Approximate VRAM usage snapshot
    resource_ram_gb   REAL,     -- Approximate system RAM snapshot during task
    user_feedback_id  INTEGER,  -- Optional: Link to user_feedback table
    error_code        TEXT,     -- Standardized error code if applicable
    error_details     TEXT      -- Detailed error message if success=0
);
CREATE INDEX IF NOT EXISTS idx_perf_timestamp ON performance_log (timestamp);
CREATE INDEX IF NOT EXISTS idx_perf_model ON performance_log (model_used);
CREATE INDEX IF NOT EXISTS idx_perf_task ON performance_log (task_category);

Cache Strategy: Utilize diskcache.Cache('./data/cache/llm_cache') for persistent LLM response caching.

Key Generation: cache_key = hashlib.sha256(f"{model_name}-{temperature}-{max_tokens}-{top_p}-{prompt}".encode()).hexdigest() (Include all relevant parameters influencing the output).

Configuration: Read cache_enabled, cache_ttl_seconds, cache_max_size_gb from settings.yaml:model_optimizer.

Management: Wrap LLM calls within ModelOptimizer or LMS_Client. Check cache using get_cached_response(cache_key); if hit, return cached data. If miss, execute LLM call, then store result using cache_response(cache_key, response_data).
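
A minimal sketch of this wrapper (the lms_client.complete call is a stand-in for the real LMS_Client API):

# core/model_optimizer.py (excerpt) - illustrative LLM response caching
import hashlib
from diskcache import Cache

llm_cache = Cache("./data/cache/llm_cache", size_limit=2 * 1024**3)  # 2 GB per settings.yaml

def make_cache_key(model: str, temperature: float, max_tokens: int, top_p: float, prompt: str) -> str:
    raw = f"{model}-{temperature}-{max_tokens}-{top_p}-{prompt}"
    return hashlib.sha256(raw.encode()).hexdigest()

async def cached_completion(lms_client, model: str, prompt: str, temperature: float,
                            max_tokens: int, top_p: float, ttl: int = 3600):
    key = make_cache_key(model, temperature, max_tokens, top_p, prompt)
    hit = llm_cache.get(key)
    if hit is not None:
        return hit  # cache hit: skip the LLM call entirely
    response = await lms_client.complete(  # stand-in for the real client method
        model=model, prompt=prompt, temperature=temperature,
        max_tokens=max_tokens, top_p=top_p,
    )
    llm_cache.set(key, response, expire=ttl)  # TTL from settings.yaml:model_optimizer
    return response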

10.3 Versioning Strategy Details

  • Application Version: Use Semantic Versioning (MAJOR.MINOR.PATCH, e.g., “1.0.0-alpha”). Store in core/__init__.py (__version__ = "..."). Display in UI “About” dialog.
  • Skills Matrix Version: Use a simple version string (e.g., "1.3") stored as MATRIX_VERSION in config/model_config.py. The agent reads settings.yaml:agent_settings:matrix_version_check on startup and logs a warning if versions don’t match, indicating potential incompatibility.
  • Configuration File Versions: Use a comment # Version: X.Y at the top of settings.yaml and user_config.yaml. The ConfigManager and UserConfigManager should check this version on load against an expected version constant. Log warning or potentially attempt migration for minor version changes; raise error for major mismatches.
  • Database Schema Version: Use SQLite’s built-in PRAGMA user_version. The MemoryManager checks this on initialization. If the DB version is lower than the code’s expected version, apply sequential SQL migration scripts stored in scripts/migrations/ (e.g., 001_initial.sql, 002_add_knowledge_table.sql). Update user_version after successful migration.
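
As a concrete illustration of the schema-version check, a minimal sketch (the expected-version constant is an assumption; error handling and backups are elided):

# core/memory_manager.py (excerpt) - illustrative sequential migration sketch
import sqlite3
from pathlib import Path

EXPECTED_DB_VERSION = 2                      # bump when adding scripts/migrations/NNN_*.sql
MIGRATIONS_DIR = Path("scripts/migrations")  # 001_initial.sql, 002_add_knowledge_table.sql, ...

def migrate(db_path: str = "data/agent_data.db") -> None:
    conn = sqlite3.connect(db_path)
    try:
        current = conn.execute("PRAGMA user_version").fetchone()[0]
        for version in range(current + 1, EXPECTED_DB_VERSION + 1):
            scripts = sorted(MIGRATIONS_DIR.glob(f"{version:03d}_*.sql"))
            if not scripts:
                raise RuntimeError(f"Missing migration script for schema version {version}")
            conn.executescript(scripts[0].read_text(encoding="utf-8"))
            conn.execute(f"PRAGMA user_version = {version}")  # PRAGMA takes a literal, not a parameter
            conn.commit()
    finally:
        conn.close()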

11. Key Workflows (Detailed Component Interactions & Data Flow)

(Provide 5-7 fully detailed E2E workflow descriptions)

  • Example: Web Automation Login & Action (Suno Login & Generate)
    1. User Input (UI): Types /run_flow SunoLoginAndGenerate --topic "Synthwave track" into ChatPanel or triggers via WorkflowEditor.
    2. UI Controller: Emits signal executeWorkflowRequest("SunoLoginAndGenerate", {"topic": "Synthwave track"}).
    3. Agent Core (AgentOrchestrator -> WorkflowEngine): Receives request. Calls AC_WorkflowEngine.execute_workflow("SunoLoginAndGenerate", {"topic": "Synthwave track"}).
    4. WorkflowEngine – Step 1 (Login):
      a. Retrieves “SunoLogin” WebTaskDefinition from AC_TaskManager.
      b. Dispatches via AC_SkillDispatcher to WebAutomationSkill (instantiated with TaskDef).
      c. WebAutomationSkill requests credentials via AC_SecurityManager.get_credential("SunoAI") (uses CredentialManagerClient / keyring).
      d. WebAutomationSkill calls TAF_Engine.execute_task(...) (Web variant).
      e. WebAutomationEngine uses WebDriverClient to perform login steps.
      f. Returns {"status": "success", "data": {"session_cookie": "...", "final_url": "..."}, "message": "Login successful"}. Engine stores context (session_cookie).
    5. WorkflowEngine – Step 2 (Generate):
      a. Retrieves “SunoGenerate” WebTaskDefinition.
      b. Dispatches to WebAutomationSkill. Task inputs merge workflow context (session_cookie implicitly used by WebDriver) and step inputs ("topic").
      c. WebAutomationSkill -> WebAutomationEngine -> WebDriverClient interacts with Suno generate page (fills prompt {topic}, clicks generate, waits, scrapes result URL).
      d. Returns {"status": "success", "data": {"track_url": "https://suno.ai/..."}, "message": "Music generation initiated/complete."}.
    6. WorkflowEngine: Aggregates final output ({"track_url": "..."}). Returns result dict to AgentOrchestrator.
    7. Agent Core (AgentOrchestrator): Formats response.
    8. Agent Core -> UI: Sends signal finalAgentResponse("Suno generation complete. Track URL: ...", message_id).
    9. UI Controller: Updates ChatPanel.
    • Error Handling: If any step fails, WorkflowEngine halts, returns error dict with failed_step info. ErrorHandler logs details. UI displays user-friendly error.

(Provide similar detailed flows for Vision->Proactive, Multi-Agent, Knowledge Acq, Elevated OS Task, Amuse Gen, Dynamic Model Selection)
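
The {stepN.data.key} input mapping used by these flows can be resolved with a small helper; a sketch under the context shape shown above (regex and helper name are assumptions):

# core/workflow_engine.py (excerpt) - illustrative placeholder resolution
import re
from typing import Any, Dict

_PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)\}")

def resolve_placeholders(value: str, context: Dict[str, Any]) -> str:
    """Replace tokens like {step1.data.image_path} with values from the workflow context."""
    def _lookup(match: re.Match) -> str:
        node: Any = context
        for part in match.group(1).split("."):
            node = node[part]  # a KeyError here surfaces as a workflow mapping error
        return str(node)
    return _PLACEHOLDER.sub(_lookup, value)

# Example: context produced by earlier steps
ctx = {"step1": {"data": {"image_path": "outputs/img_001.png"}}}
print(resolve_placeholders("Upscale {step1.data.image_path}", ctx))
# -> "Upscale outputs/img_001.png"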


12. Cross-Cutting Concerns (Detailed Implementation Strategies)

This section addresses critical aspects that span multiple components, outlining specific strategies to ensure the system is secure, performant, robust, maintainable, globally usable, and handles concurrent operations effectively.

12.1 Security

Security is paramount due to the agent’s deep integration with the OS, ability to execute code/automation, and potential handling of sensitive data.

  • Privilege Management & UAC (SecurityManager, OS_Client_PSExecutor, acl_rules):
    • Least Privilege Principle: The agent core process runs as a standard user by default (settings.yaml:security:default_privilege_mode: standard).
    • Action Control: Specific sensitive actions (defined by identifiers like SkillName:method_name in data/agent_data.db::acl_rules or initially settings.yaml:security:acl_requires_elevation) are flagged as requiring elevation.
    • UAC Prompting: Before executing a flagged action, the responsible skill (e.g., OSControlSkill) calls SecurityManager.request_elevation_if_needed(action_identifier).
    • SecurityManager checks if elevation is required and if the agent is not already running in privileged mode. If both are true, it uses OS_Client_PSExecutor to execute a minimal PowerShell command via Start-Process -Verb RunAs. This triggers the standard Windows UAC prompt for the user. The agent cannot bypass UAC.
    • Outcome Handling: The SecurityManager must handle the result – if the user denies UAC or the elevation fails, an error (status: “error”, message: “Elevation denied/failed”) is returned to the skill, halting the privileged action. If successful (either elevated or not required), it returns True.
    • Auditing: All elevation requests, UAC prompts triggered, and outcomes (granted/denied/failed) must be logged with details by SecurityManager via the central ErrorHandler.
    • Privileged Mode Toggle (UI): Allows the user (if they have admin rights on their user account) to attempt launching the entire agent process with elevated privileges via a restart triggered by OSControlSkill and Start-Process -Verb RunAs. This bypasses per-action UAC prompts but requires initial elevation and runs the whole agent with higher risk. This mode should be clearly indicated in the UI and used cautiously.
  • Code Execution Sandboxing (CodeInterpreterSkill, Sandbox_Client, Docker):
    • Isolation: Untrusted code generated by LLMs or provided by the user must be executed within a strongly isolated environment. Docker containers are the recommended approach.
    • Sandbox_Client Implementation: Uses the docker Python SDK. It dynamically creates ephemeral containers (e.g., from a python:alpine image) for each execution request. (A minimal sketch appears at the end of this subsection.)
    • Container Configuration: Containers run with:
      • No network access by default (--network none), unless explicitly required and configured for a specific task.
      • Read-only volume mounts for required inputs, or code copied directly into the container.
      • Strict resource limits (CPU shares, memory limits, execution timeout) set via the Docker SDK.
      • Running as a non-root user inside the container (--user nobody).
    • Input/Output: Code is passed in; stdout/stderr are captured via Docker SDK logs/attach streams.
    • Cleanup: Containers are automatically removed (--rm) after execution.
    • Error Handling: Catches Docker errors (image not found, container failed to start, resource limits exceeded, timeouts).
  • Credential Management (CredentialManagerClient, keyring, WebAutomationSkill, OSControlSkill):
    • Storage: User credentials for web logins or other services are never stored in config files or plain text. The CredentialManagerClient uses the keyring library to securely store/retrieve credentials from the native Windows Credential Manager.
    • Access: Skills requiring credentials (e.g., WebAutomationSkill for login steps defined in WebTaskDefinition via login_credential_key, OSControlSkill for network resources) request them via the CredentialManagerClient using a service key (e.g., “LegacyOracle_SunoAI”).
    • User Setup: The user must initially add credentials to the Windows Credential Manager either manually or via a secure UI flow within the SettingsPanel that calls keyring.set_password. The agent only reads credentials, it should ideally not prompt for them directly during runtime actions.
  • Input Sanitization:
    • ALL inputs originating from the user or LLMs that are used to construct file paths, shell commands (subprocess, PowerShell), API calls, or database queries must be rigorously sanitized.
    • File Paths: Use os.path.normpath, check for directory traversal (../), validate against allowed base directories defined in settings.yaml.
    • Shell Commands: Avoid using shell=True in subprocess where possible. If unavoidable, use shlex.quote on all arguments. For PowerShell via OS_Client_PSExecutor, use parameterized scripts or ensure commands/arguments are properly escaped.
    • API Payloads: Validate data types and structure before sending JSON payloads to external services.
    • Database Queries: Use parameterized queries (e.g., cursor.execute("SELECT * FROM users WHERE name=?", (user_input,))) to prevent SQL injection.
  • Macro Recording Safety (RecordMacroDialog, TAF_MacroRecorder):
    • User Awareness: Clearly indicate when recording is active (e.g., flashing icon, persistent notification).
    • Pause on Sensitive Fields: Implement heuristics in the pynput listener (TAF_MacroRecorder). If the active window title or focused element properties (obtained via OS_Client_Context/pywin32/uiautomation if feasible) suggest a password field, automatically pause recording and notify the user. Resume requires explicit user action. Provide a manual pause/resume button.
    • Review Before Saving: Allow users to review and potentially edit the recorded steps (especially type actions) in the RecordMacroDialog before saving the TaskDefinition.
  • Plugin Security (PluginManager, SecurityManager):
    • Loading: By default (settings.yaml:security:allow_unsigned_plugins: false), only load plugins from trusted sources or those with valid digital signatures (future feature).
    • Permissions: Implement a basic permission system for plugins. Define required permissions in the plugin’s registration (register_plugin). PluginManager checks these against allowed permissions for plugins (defined in settings.yaml or potentially user-managed). Skills called by plugins might be restricted based on the plugin’s permissions.
    • Sandboxing (Future): Explore running plugins in separate processes or restricted environments for stronger isolation.

12.2 Performance

  • Asynchronous Operations (asyncio): The entire Agent Core, Skills, and Client layers are built on asyncio. All potentially blocking I/O operations (network requests via httpx, subprocess calls, file I/O via aiofiles, service polling, UI communication) must use async/await.
  • Resource Governor (ResourceGovernor):
    • Periodically queries SystemMonitorSkill for CPU/RAM usage (using psutil) and, where reliable access is possible, GPU VRAM/utilization (via vendor-specific APIs through OS_Client_Monitor; otherwise estimated from the models ModelSelector has loaded). Also checks power status (OS_Client_Advanced).
    • Maintains internal state of currently allocated agent resources (e.g., estimated VRAM for loaded models).
    • Provides methods like can_schedule_heavy_task() -> bool based on thresholds in settings.yaml (CPU, RAM, Battery, Power Saver state).
    • Provides can_allocate_vram(needed_gb) and can_allocate_ram(needed_gb) for ModelSelector.
    • Can signal AsyncTaskManager to pause or defer low-priority background tasks (e.g., reflection, performance logging, non-critical generation polling) when system load is high.
  • Model Selection & Management (ModelSelector, LMS_Client):
    • Dynamic selection prevents loading unnecessarily large models.
    • Preloading (preload_models) frequently used or user-preferred models (if resources allow) reduces first-use latency.
    • LRU (Least Recently Used) unloading (unload_lru_model) frees up VRAM when needed for loading a different model, coordinated with ResourceGovernor.
  • Caching (ModelOptimizer, diskcache):
    • LLM responses (for non-volatile prompts) are cached using diskcache, keyed by a hash of model name, key parameters, and prompt text. ModelOptimizer (or the LMS_Client wrapper) checks the cache before invoking the LLM. Cache TTL and max size are configurable. A sketch follows.
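
A minimal sketch of this cache, assuming diskcache is installed; the cache path, size limit, default TTL, and the lms_client.invoke signature are illustrative:

    # Illustrative LLM response cache -- paths, limits, and client API are assumptions
    import hashlib
    import json
    from diskcache import Cache

    cache = Cache("data/cache/llm", size_limit=2**30)  # max size configurable (1 GiB here)

    def cache_key(model: str, params: dict, prompt: str) -> str:
        payload = json.dumps({"model": model, "params": params, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def cached_invoke(lms_client, model: str, params: dict, prompt: str, ttl: int = 3600):
        key = cache_key(model, params, prompt)
        hit = cache.get(key)
        if hit is not None:
            return hit                        # cache hit: skip the LLM call entirely
        response = await lms_client.invoke(model, prompt, **params)
        cache.set(key, response, expire=ttl)  # TTL configurable
        return response
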
  • Model Optimization (ModelOptimizer, AMD GAIA):
    • Provide optional mechanisms (triggered manually or via config auto_quantize_on_load) to quantize models using gaia-toolbox (preferred for AMD) or optimum. This trades some precision for potentially faster inference and lower VRAM. Requires careful testing per model. Pruning is more experimental.
    • Configure ONNX Runtime clients (OCR_Client, ObjectDetect_Client) to use GAIA-recommended Execution Providers (DmlExecutionProvider or ROCmExecutionProvider) based on settings.yaml:model_optimizer:onnx_execution_provider.
  • UI Responsiveness (UI_Controller, asyncqt):
    • All long-running agent tasks occur outside the main UI thread.
    • Use asyncqt or QThread with signals/slots to bridge the asyncio event loop and the Qt event loop.
    • Batch frequent UI updates (e.g., log messages, minor status changes) using QTimer to avoid overwhelming the event loop. Limit update frequency for system vitals display.

12.3 Error Handling

  • Central Error Handler (ErrorHandler):
    • Provides a unified handle(exception, context: str, severity: str = 'ERROR') method.
    • Logs the exception details (traceback) and context to the file logger (logs/agent.log).
    • Formats a user-friendly error message based on exception type and context.
    • Sends the formatted message via signal to UI_TerminalPanel (for detailed logs) and potentially UI_ChatPanel or UI_NativeIntegration (for critical user-facing errors).
    • Returns a standardized error dictionary: {"status": "error", "message": "User-friendly message", "error_details": "Technical details"}.
  • Standardized Skill/Client Returns: All Skill.execute and Client.method calls should return the standard dictionary format ({"status": "success/error", …}). Skills and Core logic check the status key.
  • Client Retries (BaseClient, OpsManager): Implement retry logic (e.g., exponential backoff up to max_retries_on_failure from settings.yaml) within base client classes or OpsManager for network-related errors when calling external services. A backoff sketch follows at the end of this section.
  • Service Health Checks & Recovery (OpsManager): Periodically check /health endpoints. On failure, attempt recovery command defined in settings.yaml via OS_Client_PSExecutor/Docker_Client. Log recovery attempts and outcomes. Inform user via UI notification if recovery fails persistently.
  • Specific Exception Handling: Catch specific exceptions where appropriate (e.g., FileNotFoundError in FileSystemSkill, ElementNotFound in WebDriverClient, docker.errors.APIError in SandboxClient, keyring.errors.PasswordDeleteError in CredentialManagerClient). Provide more informative error messages based on the specific exception.
  • Timeouts: Implement network timeouts (httpx) for all external API calls. Implement execution timeouts for CodeInterpreterSkill (via Docker/subprocess) and potentially long-running subprocess calls (ByteCraft, Amuse).
  • Graceful Degradation: If a service (e.g., Video Gen) is down, OpsManager flags it. SkillDispatcher or Agent Core should prevent dispatching tasks requiring that service and inform the user (“Video generation is currently unavailable”).
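
The retry logic described above could look roughly like the following; the BaseClient shape and backoff constants are assumptions, with max_retries_on_failure coming from settings.yaml:

    # Illustrative retry-with-backoff for a base client -- shape and constants assumed
    import asyncio
    import httpx

    class BaseClient:
        def __init__(self, base_url: str, max_retries_on_failure: int = 3):
            self.base_url = base_url
            self.max_retries = max_retries_on_failure

        async def get_with_retry(self, path: str) -> httpx.Response:
            delay = 1.0
            last_exc: Exception | None = None
            for attempt in range(self.max_retries + 1):
                try:
                    async with httpx.AsyncClient(timeout=30.0) as client:
                        response = await client.get(f"{self.base_url}{path}")
                        response.raise_for_status()
                        return response
                except (httpx.RequestError, httpx.HTTPStatusError) as e:
                    last_exc = e
                    if attempt < self.max_retries:
                        await asyncio.sleep(delay)  # exponential backoff: 1s, 2s, 4s, ...
                        delay *= 2
            raise last_exc
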

12.4 Maintainability

  • Modularity (OpenManus): Strict adherence to skill-based architecture. Skills should be self-contained and interact via defined interfaces/clients.
  • Plugin System (PluginManager): Allows adding new skills/features without modifying core code. Define a clear Plugin API contract (registration function, required BaseSkill inheritance, manifest file for permissions/dependencies). Implement basic version checking for plugins.
  • Dependency Management (DependencyManager, requirements*.txt): Use separate requirements.txt files for Core Agent, ROCm Services, Amuse Implementation. DependencyManager provides checks (pipdeptree) run manually or via CI. Use pinned versions where stability is critical, allow flexible versions where appropriate.
  • Versioning (Versioning Strategy): Maintain clear versions for Application, Configs, DB Schema, and Skills Matrix to manage compatibility.
  • Configuration (ConfigManager, YAML): Centralize configuration in settings.yaml and user_config.yaml. Avoid hardcoding values. ConfigManager provides validated access.
  • Code Style & Quality: Enforce consistent style using black and flake8. Require docstrings for all public classes/methods. Use type hinting extensively.
  • Logging (ErrorHandler, logging): Comprehensive logging with different levels (DEBUG, INFO, WARNING, ERROR) directed to both file (logs/agent.log) and UI_TerminalPanel. Use contextual information in log messages.
  • Documentation: Maintain this specification document, add inline code comments, generate API docs (e.g., using Sphinx), create user guides/tutorials.

12.5 Globalization

  • LocalizationManager (core/localization_manager.py): Central component for handling translations.
  • gettext Workflow:
    1. Wrap all user-facing strings in the UI (ui/) and potentially agent responses (where applicable) with a translation function (e.g., _("Your String")).
    2. Use pygettext or similar tools to extract translatable strings into a .pot template file.
    3. Create language-specific .po files (e.g., locale/fr/LC_MESSAGES/legacy_oracle.po) from the template.
    4. Translate strings in .po files.
    5. Compile .po files to binary .mo files using msgfmt (can be part of build/setup script). Place .mo files in ui/locale/<lang_code>/LC_MESSAGES/.
    6. LocalizationManager.set_language(lang_code) loads the appropriate .mo file and installs the translation globally (gettext.install).
  • UI Implementation: All UI text elements (QLabel, QPushButton, etc.) should use the translation function _(…). The UI_LanguageSelector allows the user to call LocalizationManager.set_language, which should trigger a UI refresh to apply new translations. Use Qt layouts that adapt to different string lengths. A set_language sketch follows at the end of this section.
  • Date/Time/Number Formatting: Use locale-aware formatting where necessary (though less critical for this type of application initially).
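
A minimal sketch of set_language following the workflow above; the locale directory and domain match steps 3 and 5, while the class shape is illustrative:

    # core/localization_manager.py -- illustrative sketch of the gettext flow
    import gettext
    import os

    class LocalizationManager:
        LOCALE_DIR = os.path.join("ui", "locale")
        DOMAIN = "legacy_oracle"

        def set_language(self, lang_code: str) -> None:
            """Load the compiled .mo catalog for lang_code and install _() globally."""
            translation = gettext.translation(
                self.DOMAIN,
                localedir=self.LOCALE_DIR,
                languages=[lang_code],
                fallback=True,  # fall back to untranslated strings if no catalog exists
            )
            translation.install()  # makes _() available as a builtin
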

12.6 Concurrency

  • asyncio Core: The Agent Core’s main loop and most skill/client operations are built on asyncio for efficient handling of I/O-bound tasks (network calls, file access, subprocess communication). Use async/await syntax consistently.
  • UI Thread Separation: The PyQt6 UI runs in the main thread. Interactions between the UI thread and the async Agent Core must be thread-safe. Use asyncqt or a similar library to bridge the Qt event loop and the asyncio event loop, or use QThread with Qt Signals/Slots for communication. Never call blocking agent methods directly from the UI thread, and never update UI widgets directly from agent core threads/tasks.
  • Background Tasks (AsyncTaskManager, Scheduler): Use asyncio.create_task for fire-and-forget background operations managed by AsyncTaskManager. Use APScheduler’s AsyncIOScheduler for recurring scheduled tasks integrated with the main asyncio loop.
  • CPU-Bound Tasks: If any skill or component performs heavy CPU-bound computation (unlikely for most core logic, but possible for complex analysis or local model processing not hosted in a separate service), run it in a separate process using multiprocessing or concurrent.futures.ProcessPoolExecutor to avoid blocking the main asyncio event loop (see the sketch at the end of this section).
  • Shared State: Access to shared state (e.g., StateManager, MemoryManager data structures if accessed concurrently) must be protected using asyncio.Lock or appropriate thread-safe mechanisms if mixing threads. SQLite access should ideally be handled via a single connection manager or connection pool suitable for async use (or synchronous access offloaded to a thread pool).
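
For example, offloading a CPU-bound computation might look like this; heavy_analysis is a hypothetical stand-in for real work:

    # Illustrative offload of CPU-bound work -- heavy_analysis is hypothetical
    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def heavy_analysis(data: list[int]) -> int:
        return sum(x * x for x in data)  # placeholder CPU-bound computation

    async def run_heavy_task(data: list[int]) -> int:
        loop = asyncio.get_running_loop()
        with ProcessPoolExecutor() as pool:
            # Runs in a separate process, keeping the asyncio event loop responsive.
            # On Windows, the entry point must be guarded by `if __name__ == "__main__":`.
            return await loop.run_in_executor(pool, heavy_analysis, data)
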

13. Testing Strategy (Detailed Plans)

A comprehensive testing strategy is crucial for ensuring the quality, reliability, security, and performance of the complex LegacyOracle Super Agent. This strategy employs multiple levels of testing, utilizing appropriate tools and methodologies.

Overall Approach: Agile Testing integrated within development sprints. Automated tests run via CI pipeline. Manual exploratory testing complements automated efforts.

Primary Testing Frameworks & Tools:

  • Python Unit/Integration: pytest, pytest-asyncio, unittest.mock (or pytest-mock).
  • UI Testing: pytest-qt.
  • API Testing (Services): pytest with httpx, or tools like Postman/Insomnia for manual checks.
  • Code Coverage: pytest-cov (integrated with pytest).
  • Static Analysis/Linting: flake8, black, mypy.
  • Security Scanners: bandit (static analysis), potentially dependency vulnerability scanners (safety). Manual review required.
  • Performance: cProfile, memory-profiler, psutil for system monitoring during tests, specialized benchmarking tools if needed.
  • CI/CD: GitHub Actions or GitLab CI.

Target Code Coverage: Aim for > 80% unit test coverage for core logic and critical skills/clients.


13.1 Unit Testing

  • Goal: Verify the correctness of individual functions, methods, and classes in isolation.
  • Scope: All core managers (core/), non-IO parts of skills (skills/), utility functions, client logic (parsing, data transformation), reasoning strategies (prompt generation).
  • Tools: pytest, pytest-asyncio, unittest.mock/pytest-mock.
  • Environment: Standard Python environment. External dependencies (APIs, OS calls, file system, DB, services) are mocked.
  • Key Scenarios & Validation Points:
    • ConfigManager: Test loading valid/invalid YAML, retrieving default/nested values, handling missing keys.
    • ModelSelector: Test scoring logic with mock SKILLS_MATRIX data for different task categories/preferences. Test resource filtering logic with mock ResourceGovernor states. Test temperature lookup.
    • ResourceGovernor: Test can_allocate_* logic with various usage levels and limits. Test allocate/free methods update internal state correctly.
    • MemoryManager: Test CRUD operations for each table using an in-memory SQLite DB (:memory:). Test data formatting (JSON parsing/dumping). Test schema migration check.
    • Reasoning Strategies: Verify correct prompt generation based on input query/context. Test streaming callback invocation.
    • TaskManager: Test loading/saving tasks from JSON/DB, retrieving tasks by name.
    • WorkflowEngine: Test parsing flow definitions, sequential execution logic, data passing between steps (using mock skills), default error handling (halt on failure).
    • Individual Skills (Logic): Test internal logic, data processing, and construction of requests for backend clients (mock the clients). Example: KnowledgeAcqSkill – test logic for checking memory before searching, test summarization prompt generation.
    • Backend Clients (Logic): Test request payload construction, response parsing, error mapping for different API status codes/outputs (mock httpx/subprocess/etc.). Test retry logic if implemented.
    • Automation Definitions (TaskDefinition, etc.): Test dataclass initialization and validation.
  • Coverage: Aim for high coverage (>85%) on core logic components and utility functions. An illustrative test follows.
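
As one illustration of the mocking style above, the test below exercises the standardized skill return contract; the skill and client are hypothetical stand-ins:

    # Illustrative pytest/pytest-asyncio unit test -- skill and client are hypothetical
    import pytest
    from unittest.mock import AsyncMock

    @pytest.mark.asyncio
    async def test_skill_returns_error_dict_on_client_failure():
        mock_client = AsyncMock()
        mock_client.fetch.side_effect = RuntimeError("backend down")

        # Trivial skill wrapper demonstrating the standardized return format
        async def execute(query: str) -> dict:
            try:
                result = await mock_client.fetch(query)
                return {"status": "success", "result": result}
            except Exception as e:
                return {"status": "error", "message": str(e)}

        outcome = await execute("anything")
        assert outcome["status"] == "error"
        assert "backend down" in outcome["message"]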

13.2 Integration Testing

  • Goal: Verify interactions and data flow between integrated components within the agent’s core process.
  • Scope: Core <-> Skills, Skills <-> Clients (with mocked backends), Core Managers interactions (e.g., ModelSelector <-> ResourceGovernor), ConductorAgent <-> Sub-Agents (with mocked sub-agent logic). UI Controller <-> Agent Comm Interface.
  • Tools: pytest, pytest-asyncio, unittest.mock/pytest-mock, potentially pytest-qt for UI<->Core interface tests.
  • Environment: Core Python environment. External APIs/Services are mocked at the client level or stubbed with tools like pytest-httpx (for httpx-based clients). File system/DB interactions might use temporary files/in-memory DBs.
  • Key Scenarios & Validation Points:
    • Agent Request Processing: Test full flow from AgentOrchestrator.process_task -> ModelSelector -> ReasoningController -> SkillDispatcher -> Skill Execution (mocked client call) -> Response formatting -> UI Signal emission.
    • Model Loading/Unloading Flow: Test ModelSelector interaction with ResourceGovernor and mocked LMS_Client load/unload calls under different resource scenarios.
    • Dynamic Skill/Task Dispatch: Verify SkillDispatcher routes correctly to built-in skills, plugin skills (mocked), and AutomationSkill instances based on task name lookups in TaskManager.
    • Service Polling (AsyncTaskManager): Test polling logic for generative services using mocked client status checks (check_status), verifying correct state transitions (pending->running->complete/error) and result handling. Test timeouts.
    • Proactive Suggestion Flow: Test ActivityLogger -> PatternAnalyser (with mock data) -> ProactiveManager -> Notifier -> UI Signal emission / Native Notification call (mocked).
    • Multi-Agent Delegation: Test ConductorAgent task decomposition (mocked LLM planning call), delegation to mocked SubAgents via asyncio.Queue, and result aggregation.
    • Privilege Management Flow: Test skill requesting elevation -> SecurityManager check -> Mocked UAC prompt (grant/deny) -> Skill execution proceeds or fails accordingly.
    • UI <-> Core Communication: Use pytest-qt to test signal emissions from AC_CommInterface and slot execution in a mocked UI_Controller, verifying data integrity. Test requests sent from mocked UI_Controller to AC_CommInterface.
  • Coverage: Focus on critical interaction points between major components.

13.3 Service Testing

  • Goal: Verify the functionality, API contract, and basic performance of external dependency services (A1111, ROCm FastAPI wrappers) independently.
  • Scope: The REST APIs provided by A1111, ROCm Video Service, ROCm Audio Service.
  • Tools: pytest with httpx, Postman, Insomnia, curl.
  • Environment: Requires the specific service (e.g., ROCm Video Service) to be running in its dedicated environment (Conda/Docker) with necessary hardware access (GPU). LM Studio server also needs to be running for tests involving LLM calls within services (if any).
  • Key Scenarios & Validation Points:
    • A1111 /sdapi/v1/txt2img: Test with valid/invalid prompts, different samplers, steps, dimensions. Verify successful image generation (check for base64 data), correct parameters returned, expected error codes for bad requests.
    • A1111 /sdapi/v1/progress: Test endpoint accessibility during generation (if A1111 configured for async).
    • ROCm Video/Audio /generate: Test with valid/invalid prompts, different parameters (duration, seed). Verify successful task queuing (202 Accepted) and task_id returned. Test input validation (e.g., invalid duration).
    • ROCm Video/Audio /status/{task_id}: Test polling for pending, running, completed, and failed states. Verify correct output_path or error_message is returned upon completion/failure. Test response for invalid task_id (404 Not Found).
    • ROCm Video/Audio /health: Verify endpoint returns {"status": "ok"} and status code 200.
    • LM Studio API: Basic tests confirming /v1/models lists loaded models and /v1/chat/completions responds correctly to simple prompts with the expected loaded model. Test model load/unload endpoints.
  • Note: These tests validate the services themselves, ensuring they meet the API contracts expected by the LegacyOracle clients. Example checks follow.
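
For instance, contract checks against the ROCm Video Service might look like the following; the port matches the spec, but the /generate payload fields are assumptions:

    # Illustrative service-level tests with httpx -- payload fields are assumptions
    import httpx

    BASE = "http://localhost:8001"  # ROCm Video Service

    def test_health_endpoint():
        r = httpx.get(f"{BASE}/health", timeout=5.0)
        assert r.status_code == 200
        assert r.json() == {"status": "ok"}

    def test_generate_queues_task():
        payload = {"prompt": "a sunrise over mountains", "duration": 4}
        r = httpx.post(f"{BASE}/generate", json=payload, timeout=30.0)
        assert r.status_code == 202      # task accepted for async processing
        assert "task_id" in r.json()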

13.4 End-to-End (E2E) Testing

  • Goal: Verify complete user workflows across the entire integrated system (UI -> Core -> Skills -> Clients -> Services -> OS -> UI).
  • Scope: Critical user scenarios involving multiple components and external interactions.
  • Tools: Primarily manual testing based on detailed test cases. Potential for partial automation using UI testing frameworks (pytest-qt driving UI actions) combined with service state validation.
  • Environment: Requires the entire system to be running: Core Agent + UI, LM Studio (with models), A1111 Service, ROCm Video Service, ROCm Audio Service, Docker Daemon (if used). Target Windows 11 machine.
  • Key Scenarios & Validation Points:
    • Full Chat Interaction: User asks complex question -> Orchestrator analyzes -> ModelSelector picks Model A -> ReasoningController streams steps -> Model A generates response -> MoA (optional) uses Model B/C -> Final response displayed correctly with correct face emotion.
    • Image Generation (A1111 & Amuse): User triggers generation via UI tab -> Request flows through Core/Skill/Client -> A1111/Amuse generates image -> Output path returned -> Image displayed correctly in UI Gen Tab / View Window. Test different prompts/parameters.
    • Video/Audio Generation: User triggers via UI -> Request -> Skill Client -> FastAPI Service /generate -> Task ID returned -> Core polls /status via Skill Client -> UI Progress updates -> Service completes -> Path returned -> Media plays correctly in UI Media Player. Test error states (service down, generation failure).
    • GUI Macro Record & Execute: User records Notepad macro via RecordMacroDialog -> Saves Task -> Selects task in TaskMgmtPanel -> Clicks Execute -> Verify AutomationSkill runs TaskAutomationEngine and Notepad is automated correctly.
    • Web Automation Record & Execute: User records Suno login -> Saves Task -> Executes Task -> Verify agent uses WebDriverClient (launching browser) and CredentialManagerClient (keyring) to log in successfully.
    • Workflow Execution: Define and execute a multi-step workflow (e.g., Web Search -> Summarize -> Save to File). Verify data passes correctly between steps and WorkflowEngine handles execution. Test flow halting on step failure.
    • Elevated OS Action: User triggers system cleanup -> Agent uses SecurityManager -> User manually confirms UAC prompt on screen -> Agent executes cleanup via OSControlSkill/OS_Client_PSExecutor. Verify success/failure is reported correctly.
    • Vision -> Proactive Suggestion: Manually create scenario VisionSkill should detect (e.g., open specific error dialog) -> Verify ProactiveManager triggers -> Verify Native Notification or UI Overlay appears with correct suggestion.
    • Knowledge Acquisition: Ask about unknown topic -> Verify agent searches web (WebSearchSkill), summarizes (NLQuerySkill), answers user, and stores summary (MemoryManager/KnowledgeAcqSkill). Ask again later -> Verify agent retrieves answer from memory.
  • Focus: Validating the integration and data flow across all layers for primary use cases.

13.5 UI Testing

  • Goal: Verify GUI functionality, layout, responsiveness, and state changes.
  • Scope: All UI components (ui/), including main window, panels, dialogs, widgets, custom drawing (AnimatedFaceWidget).
  • Tools: pytest-qt. Manual testing for look-and-feel and usability.
  • Environment: Core Python environment with PyQt6 installed. Agent Core can be mocked or run minimally.
  • Key Scenarios & Validation Points:
    • Widget Interactions: Test button clicks trigger correct signals/actions, combo box selections update state, text inputs work, sliders function.
    • State Updates: Verify UI elements update correctly based on signals from UI_Controller (e.g., face animation changes, status bar updates, terminal logs appear, chat streams correctly, media viewers load content).
    • Layout & Resizing: Test UI layout adapts correctly to different window sizes and resolutions. Check for overlapping or hidden elements. Test tab switching.
    • Dialogs: Test opening, interaction, and closing of dialogs (RecordMacro, RecordWeb, Settings, SetupWizard).
    • Responsiveness: Ensure UI does not freeze during agent processing (requires proper async integration).
    • Theme & Localization: Test switching themes and languages, verifying all relevant text and styles update correctly.
  • Focus: Ensuring the UI is functional, visually correct, and interacts properly with the underlying controller/agent interface.

13.6 Security Testing

  • Goal: Identify and mitigate potential security vulnerabilities.
  • Scope: Components handling external input, file system access, code execution, OS interaction, credential management, plugins.
  • Tools: Static analysis (bandit), dependency checkers (safety), manual code review, potentially penetration testing tools/techniques (if applicable).
  • Environment: Test environment mirroring deployment setup.
  • Key Scenarios & Validation Points:
    • Input Validation/Sanitization: Test injecting malicious strings/paths/commands into UI inputs, API calls, file paths used by skills (FileSystemSkill, OSControlSkill, clients using subprocess). Verify inputs are rejected or properly sanitized/escaped.
    • Privilege Escalation: Attempt to trigger elevated actions (acl_requires_elevation) without UAC confirmation (should fail). Verify SecurityManager correctly enforces ACLs and privilege mode. Test run_as_admin functionality in OS_Client_PSExecutor only proceeds after successful UAC.
    • Code Interpreter Sandbox (CodeInterpreterSkill, Sandbox_Client): Attempt to execute malicious code (e.g., infinite loops, file system access outside permitted volumes, network calls if disabled, attempts to escape Docker container). Verify resource limits (timeout, memory) are enforced. Verify container isolation.
    • Credential Security (CredentialManagerClient, keyring): Verify credentials are not stored in plain text config/logs. Verify keyring interacts correctly with Windows Credential Manager. Attempt unauthorized access to stored credentials (should fail). Verify macro recorder pauses on password fields.
    • Plugin Security (PluginManager): Test loading unsigned plugins when allow_unsigned_plugins is false (should fail). Review plugin loading mechanism for potential code injection vulnerabilities. Test plugin permission model (if implemented).
    • Service Security: Review FastAPI service code for common web vulnerabilities (if exposed beyond localhost). Check dependencies for known CVEs.
    • Directory Traversal: Test file paths provided to FileSystemSkill or other components handling paths.
  • Focus: Preventing unauthorized access, privilege escalation, code execution vulnerabilities, and data leakage. Requires a security-conscious mindset during development and dedicated testing phases.

13.7 Performance Testing

  • Goal: Measure and optimize system responsiveness, resource usage, and throughput.
  • Scope: Core agent loop, LLM inference latency, generative task duration, UI responsiveness, resource consumption (CPU, RAM, VRAM).
  • Tools: Python cProfile, timeit, memory-profiler, psutil, system Task Manager / perfmon, potentially specialized load testing tools for services (e.g., locust). pytest-benchmark.
  • Environment: Target hardware (AMD RX 7800 XT, 128GB RAM). Controlled environment with services running.
  • Key Scenarios & Validation Points:
    • Agent Response Latency: Measure time from user input send to final response display for various query types (simple, complex, skill-based). Identify bottlenecks.
    • Model Selection/Load Time: Measure time taken by ModelSelector and LMS_Client.load_model. Test impact of preloading.
    • Generative Task Duration: Measure end-to-end time for Image, Video, Audio, SWF generation tasks. Measure polling intervals and /status response times.
    • Resource Consumption: Monitor peak and average CPU, RAM, VRAM usage during idle, chat, complex task execution, and generative processes using psutil and Task Manager/GPU utilities. Verify usage stays within acceptable limits and ResourceGovernor functions correctly.
    • UI Responsiveness: Manually assess UI responsiveness (lag, freezes) during heavy background processing. Use profiling tools if needed.
    • Throughput (Services): (Optional) Load test FastAPI/A1111 services to determine concurrent request handling capacity.
    • GAIA Optimization Benchmarks: Compare inference time and resource usage for ONNX models (OCR, Object Detection) using standard ONNX Runtime vs. GAIA-optimized ONNX Runtime provider. Compare model size/speed after optimization with gaia-toolbox vs optimum.
  • Focus: Ensuring the agent performs efficiently on target hardware, remains responsive, and manages resources effectively. Identify and address performance bottlenecks.


14. Deployment & Operations

This section details the procedures for deploying, running, managing, and updating the LegacyOracle Super Agent and its associated services on the target Windows 11 environment. Two primary deployment methods are outlined: using Docker Compose (Recommended for consistency and isolation) and Manual Setup.

14.1 docker-compose.yaml Example (Recommended Deployment)

Using Docker Compose simplifies the management of the agent and its dependency services (excluding LM Studio, A1111, and potentially Amuse/ByteCraft, which often require native installs for hardware access or specific setup). This example assumes Dockerfiles exist for the agent and the ROCm services.

Prerequisites for Docker Compose: Docker Desktop for Windows installed and running (with WSL2 backend enabled), docker-compose command available. Sufficient resources allocated to Docker Desktop.

    # D:\legacy_oracle\docker-compose.yaml
    # Use 'docker-compose up -d' to start, 'docker-compose down' to stop.
    # Ensure prerequisite native services (LM Studio, A1111) are running on the host.
    version: '3.8'

    services:
      # --- LEGACY ORACLE AGENT CORE (+UI if using VNC/RDP/X11 Forwarding) ---
      legacy_oracle_agent:
        build:
          context: .
          dockerfile: Dockerfile.agent      # Assumes Dockerfile in project root
        container_name: legacy_oracle_agent
        # --- Volumes (Map essential directories) ---
        volumes:
          - ./config:/app/config:rw         # Agent needs to write user_config potentially
          - ./data:/app/data:rw             # Persistent DB, Cache
          - ./logs:/app/logs:rw
          - ./outputs:/app/outputs:rw
          - ./plugins:/app/plugins:rw       # Mount plugins dir
          - ./external:/app/external:ro     # Read-only access to external tool code if needed by clients
          # - ./models:/app/models:ro       # If local ONNX models used by agent container directly
        # --- Environment Variables (Critical for configuration) ---
        environment:
          - PYTHONPATH=/app
          - PYTHONUNBUFFERED=1              # For better container logging
          - LOG_LEVEL=INFO
          # --- Service URLs (host.docker.internal reaches services on the Windows host) ---
          # Ensure these match the actual ports your services are listening on!
          - LM_STUDIO_URL=http://host.docker.internal:1234
          - A1111_URL=http://host.docker.internal:7860
          - ROCM_VIDEO_URL=http://rocm_video_service:8001   # Use service name if containerized
          - ROCM_AUDIO_URL=http://rocm_audio_service:8002   # Use service name if containerized
          # --- Paths inside the container (Adjust if needed based on Dockerfile) ---
          - DATABASE_PATH=/app/data/agent_data.db
          - CACHE_PATH=/app/data/cache
          - LOGS_PATH=/app/logs
          # Add any necessary API keys securely (use .env file with compose)
          # - SOME_API_KEY=${SOME_API_KEY_FROM_ENV}
        # --- Ports ---
        # Expose ports ONLY if necessary (e.g., for remote UI access - NOT RECOMMENDED for Alpha)
        # ports:
        #   - "5901:5901"                   # Example VNC port if running UI inside container
        # --- Dependencies (If services defined below) ---
        depends_on:
          rocm_video_service:
            condition: service_healthy      # Wait for service health check
          rocm_audio_service:
            condition: service_healthy
        restart: unless-stopped
        # --- GPU Access (Required for ROCm services, potentially agent if doing local ONNX/Torch inference) ---
        # This configuration is highly dependent on the host OS and Docker version.
        # For Linux host with NVIDIA GPU & NVIDIA Container Toolkit:
        # deploy:
        #   resources:
        #     reservations:
        #       devices:
        #         - driver: nvidia
        #           count: 1                # or specific GPU ID
        #           capabilities: [gpu]
        # For Windows with WSL2 & compatible hardware/drivers, GPU access might be automatic
        # or require specific Docker Desktop settings / WSL configurations. Verify documentation.
        # Ensure containers have access to necessary ROCm libraries/drivers if running ROCm tasks inside.

      # --- ROCm Video Service ---
      rocm_video_service:
        build:
          context: ./external/rocm-video-service   # Path to service code & Dockerfile.rocm_video
          dockerfile: Dockerfile.rocm_video
        container_name: rocm_video_service
        ports:
          - "8001:8001"                     # Expose port 8001 on the host
        volumes:
          - ./outputs/videos:/app/outputs   # Map output directory
          # Mount any necessary model directories if models aren't baked into the image
          # - /path/to/video/models:/app/models:ro
        environment:
          - PYTHONUNBUFFERED=1
          # Add any env vars needed by the service (e.g., model paths inside container)
        # --- GPU Access for ROCm ---
        # MUST be configured correctly for ROCm passthrough in Docker (complex setup)
        # Example (Conceptual - Consult Docker/ROCm docs for Windows):
        devices:
          - /dev/kfd:/dev/kfd               # ROCm device nodes
          - /dev/dri:/dev/dri
        group_add:
          - video
          - render
        ipc: host
        cap_add:
          - SYS_PTRACE
        security_opt:
          - seccomp=unconfined
        restart: unless-stopped
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 60s                 # Give time for models to load

      # --- ROCm Audio Service ---
      rocm_audio_service:
        build:
          context: ./external/rocm-audio-service   # Path to service code & Dockerfile.rocm_audio
          dockerfile: Dockerfile.rocm_audio
        container_name: rocm_audio_service
        ports:
          - "8002:8002"
        volumes:
          - ./outputs/audio:/app/outputs
          # Mount model directories if needed
          # - /path/to/audio/models:/app/models:ro
        environment:
          - PYTHONUNBUFFERED=1
        # --- GPU Access for ROCm (Similar to Video Service) ---
        devices:
          - /dev/kfd:/dev/kfd
          - /dev/dri:/dev/dri
        group_add:
          - video
          - render
        ipc: host
        cap_add:
          - SYS_PTRACE
        security_opt:
          - seccomp=unconfined
        restart: unless-stopped
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8002/health"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 60s

    # --- Services typically run NATIVELY on Host (Not in Compose) ---
    # - LM Studio: Relies heavily on direct hardware access/UI. Run natively.
    # - Automatic1111: Often easier setup/GPU config natively. Run `webui-user.bat --api ...`.
    # - Amuse Software: Native Windows application. Run natively.
    # - ByteCraft: CLI tool, potentially run natively or via agent's container if dependencies allow.

    # Optional: Define networks if needed for specific isolation/communication
    # networks:
    #   agent_network:
    #     driver: bridge
Notes on Docker Compose:

  • Requires Dockerfile.agent, Dockerfile.rocm_video, and Dockerfile.rocm_audio to be created, handling Python setup, dependency installation (requirements_*.txt), and entry points (CMD ["python", "main.py"] or CMD ["uvicorn", …]).
  • GPU Passthrough (Critical & Complex): Correctly configuring GPU access (especially ROCm) within Docker on Windows/WSL2 is non-trivial and depends heavily on the specific Docker Desktop version, WSL setup, and driver compatibility. Consult official Docker and AMD ROCm documentation. The example above shows Linux-style directives which may need adaptation. If passthrough is problematic, running ROCm services natively might be necessary.
  • Host Access: Uses host.docker.internal for the agent container to access services running directly on the Windows host (like LM Studio, A1111). This requires Docker Desktop networking to be configured correctly.
  • Volumes: Ensure paths are correct and provide necessary permissions for read/write access from containers.

14.2 Manual Setup Guide (Exhaustive Step-by-Step)

This guide assumes no Docker Compose is used. All services and the agent run directly on the Windows host. Requires careful management of Python environments.

  1. Prerequisites: Complete ALL steps in Section 6: Environment Setup, including installing Python, Git, Docker Desktop (can be used for CodeInterpreter sandbox even if not for deployment), GPU drivers, ROCm, LM Studio (running server), A1111 (running server), ByteCraft, Amuse Implementation (env activated), OCR/Detection setup, WebDrivers, and Core Agent Python environment (.venv).
  2. Start LM Studio Server: Launch LM Studio -> Local Server -> Start Server (ensure correct models loaded). Note URL (http://localhost:1234).
  3. Start Automatic1111 Service: Open Command Prompt -> Navigate to D:\legacy_oracle\external\stable-diffusion-webui -> Activate its venv (.\.venv\Scripts\activate) -> Run webui-user.bat (ensure the --api flag is set inside). Note URL (http://localhost:7860). Keep this terminal open.
  4. Start ROCm Video Service: Open Anaconda Prompt (with ROCm paths) -> Activate video env (conda activate rocm_video_env) -> Navigate to D:\legacy_oracle\external\rocm-video-service -> Run uvicorn main:app --host 0.0.0.0 --port 8001. Note URL (http://localhost:8001). Keep this terminal open.
  5. Start ROCm Audio Service: Open Anaconda Prompt (with ROCm paths) -> Activate audio env (conda activate rocm_audio_env) -> Navigate to D:\legacy_oracle\external\rocm-audio-service -> Run uvicorn main:app --host 0.0.0.0 --port 8002. Note URL (http://localhost:8002). Keep this terminal open.
  6. Start Amuse Software (If Required): Manually launch D:\Program Files\Amuse\Amuse.exe if the implementation script requires the main application to be running.
  7. Verify Configuration: Double-check D:\legacy_oracle\config\settings.yaml to ensure all service_urls and paths match the running services and installation locations.
  8. Run Core Agent: Open new Command Prompt -> Navigate to project root D:\legacy_oracle -> Activate core agent venv (.\.venv\Scripts\activate) -> Run python main.py.
  9. Interact: The AI Studio GUI should launch. Perform basic tests (chat, generate image via A1111/Amuse, video/audio gen) to verify connections. Check logs/agent.log and the various service terminals for errors.

14.3 Service Management (Windows Native using NSSM)

NSSM (Non-Sucking Service Manager) is a helpful tool for running applications as Windows services. Download nssm.exe from nssm.cc.

Example: Running ROCm Video Service via NSSM:

  1. Open Command Prompt as Administrator.
  2. Navigate to NSSM directory: cd C:\path\to\nssm
  3. Install service, launching uvicorn as a module from the Conda env Python (adjust paths to your environment): nssm install RocmVideoService "C:\Users\YourUser\miniconda3\envs\rocm_video_env\python.exe" "-m uvicorn main:app --host 0.0.0.0 --port 8001"
  4. Set working directory: nssm set RocmVideoService AppDirectory "D:\legacy_oracle\external\rocm-video-service"
  5. (Optional) Set dependencies, user account, logging, etc., using nssm set.
  6. Start service: nssm start RocmVideoService (or use Windows Services GUI services.msc).
  7. Check status: nssm status RocmVideoService.
  8. Stop service: nssm stop RocmVideoService.
  9. Remove service: nssm remove RocmVideoService confirm.

Example: Running LegacyOracle Agent via NSSM:

  1. Open Command Prompt as Administrator.
  2. Navigate to NSSM directory.
  3. Install service: nssm install LegacyOracleAgent "D:\legacy_oracle\.venv\Scripts\python.exe" "D:\legacy_oracle\main.py"
  4. Set working directory: nssm set LegacyOracleAgent AppDirectory "D:\legacy_oracle"
  5. Set dependencies (ensure LLM/Gen services start first if critical): nssm set LegacyOracleAgent DependOnService RocmVideoService RocmAudioService (add A1111 here if it is also run as a service).
  6. Configure restart options (recommended): nssm set LegacyOracleAgent AppRestartDelay 5000 (a 5-second restart delay).
  7. Start service: nssm start LegacyOracleAgent.

Note: Running GUI applications (like the PyQt6 AI Studio GUI) directly as Windows services can be problematic regarding session interaction. Running the core agent logic headless as a service and having a separate GUI executable connect to it (e.g., via a local API/WebSocket) is a more robust approach for service deployment, but adds complexity. For Alpha, running python main.py manually or via Task Scheduler might be sufficient.

14.4 Health Checks & Monitoring (OpsManager Implementation Details)

  • FastAPI Service Health Endpoint: Add a simple /health endpoint to external/rocm-video-service/main.py and external/rocm-audio-service/main.py:

    # In FastAPI app (main.py)
    from fastapi import FastAPI

    app = FastAPI()

    # ... other endpoints ...

    @app.get("/health")
    async def health_check():
        # Add more sophisticated checks if needed (e.g., model loaded, GPU accessible)
        return {"status": "ok"}
  • OpsManager.check_service_health Implementation:

    # core/ops_manager.py (excerpt)
    import asyncio
    import httpx

    class OpsManager:
        # ... __init__ sets self.service_urls, self.recovery_commands, self.logger ...

        async def check_service_health(self, service_name: str) -> bool:
            url = self.service_urls.get(service_name)
            health_url = None
            if service_name == 'a1111' and url:
                # A1111 might not have a dedicated /health; check the base URL or /docs
                health_url = f"{url}/docs"  # or just url
            elif service_name in ['rocm_video', 'rocm_audio'] and url:
                health_url = f"{url}/health"
            elif service_name == 'lm_studio' and url:
                # Check if the model endpoint responds
                health_url = f"{url}/v1/models"
            # Add checks for other essential services if needed
            if not health_url:
                self.logger.warning(f"No health check URL configured for service: {service_name}")
                return True  # Assume ok if not configured for check
            try:
                async with httpx.AsyncClient(timeout=5.0) as client:  # short timeout for health checks
                    response = await client.get(health_url)
                    is_healthy = 200 <= response.status_code < 300
                    if not is_healthy:
                        self.logger.warning(
                            f"Health check failed for {service_name} at {health_url}: "
                            f"Status {response.status_code}"
                        )
                    return is_healthy
            except (httpx.RequestError, asyncio.TimeoutError) as e:
                self.logger.error(f"Health check failed for {service_name} at {health_url}: {e}")
                return False
  • OpsManager.attempt_service_recovery Implementation:

    # core/ops_manager.py (excerpt, continued)
    import asyncio

    class OpsManager:
        # ... continued from above ...

        async def attempt_service_recovery(self, service_name: str) -> bool:
            if not await self.check_service_health(service_name):
                command_info = self.recovery_commands.get(service_name)
                if not command_info:
                    self.logger.warning(f"No recovery command configured for failed service: {service_name}")
                    return False
                self.logger.info(f"Attempting recovery for service: {service_name} using command: {command_info}")
                try:
                    # Use OS_Client_PSExecutor or Docker_Client based on command type.
                    # Example using subprocess directly for simplicity:
                    process = await asyncio.create_subprocess_shell(
                        command_info,  # a string like "docker restart X" or "nssm restart Y"
                        stdout=asyncio.subprocess.PIPE,
                        stderr=asyncio.subprocess.PIPE,
                    )
                    stdout, stderr = await process.communicate()
                    if process.returncode == 0:
                        self.logger.info(f"Recovery command executed successfully for {service_name}.")
                        await asyncio.sleep(15)  # wait for the service to potentially restart
                        recovered = await self.check_service_health(service_name)
                        self.logger.info(f"Service {service_name} recovery status after attempt: {recovered}")
                        return recovered
                    else:
                        self.logger.error(f"Recovery command failed for {service_name}. Stderr: {stderr.decode()}")
                        return False
                except Exception as e:
                    self.logger.error(f"Exception during recovery attempt for {service_name}: {e}")
                    return False
            return True  # Service was already healthy
  • Scheduled Health Check: AC_Scheduler calls OpsManager.check_all_services() periodically (e.g., every 5 minutes). check_all_services iterates through configured services, calls check_service_health, and if failed, calls attempt_service_recovery.

14.5 First-Run Setup Wizard (UI_SetupWizard Detailed Flow)

  1. Page 1: Welcome: Simple welcome message. “Next” button.
  2. Page 2: Environment Check (Optional but Recommended): Run basic checks via scripts/setup_check.py (called via subprocess): Python version, Git found, Docker running, ROCm detected (rocminfo), Tesseract found. Display results (Green check / Red X). Warn user about missing critical components. “Next” button enabled only if critical checks pass.
  3. Page 3: Service URLs: Display QLineEdit fields pre-populated with default URLs from settings.yaml.example for LM Studio, A1111, ROCm Video, ROCm Audio. Allow user to edit. Add “Test Connection” buttons next to each URL which trigger OpsManager.check_service_health. Display check status. “Next” button enabled only if core services (LM Studio mandatory, others optional but recommended) are reachable.
  4. Page 4: Key File Paths: Display QLineEdit fields with “Browse” buttons for critical paths from settings.yaml (Database, Outputs, Cache, Plugins, ONNX Model, ByteCraft Script, Amuse Python Exe, Amuse Script). Pre-populate with defaults relative to project root. Validate paths exist. “Next” button.
  5. Page 5: Initial Preferences: Allow user to set initial user_config.yaml values: Language (QComboBox), UI Theme (QComboBox), Proactive Features (QCheckBox), Model Selection Preference (QComboBox). “Next” button.
  6. Page 6: Summary & Finish: Display summary of configured settings. “Finish” button.
  7. On Finish: SetupWizard returns collected data. ConfigManager writes values to config/settings.yaml and UserConfigManager writes to config/user_config.yaml. Agent initialization proceeds.

14.6 Updating Procedures

  • Agent Self-Update (AutonomousTasksSkill):
    1. Scheduled job calls AutonomousTasksSkill.run_self_update_check().
    2. Skill uses GitClient (git fetch, git status) to check remote repository (defined in settings.yaml) for new commits on the current branch.
    3. If updates found, notify user via UI/Notification: “Updates available for LegacyOracle. Apply now? [Yes/No]”.
    4. If user confirms (or if configured for auto-update):
      a. Attempt GitClient.pull() to get updates.
      b. Handle potential merge conflicts (notify user, abort update).
      c. Check if requirements_agent.txt changed. If yes, run DependencyManager.update_dependencies().
      d. Check if the DB schema version changed (PRAGMA user_version vs. the code constant). If yes, run migrations via MemoryManager (see the sketch after this list).
      e. Signal the agent core to restart gracefully (e.g., using os.execv or triggering an external service manager such as nssm restart LegacyOracleAgent).
  • External Services (A1111, ROCm Wrappers, etc.): Manual Process. These are independent applications/services. Users must update them manually by following their respective update procedures (e.g., git pull in A1111 directory and re-running setup, rebuilding Docker images for ROCm services).
  • Dependencies (DependencyManager): The agent can check for outdated Python dependencies using DependencyManager.check_updates() (run scheduled or manually). It can log findings or notify the user. Automatically applying updates (update_dependencies) is risky and should likely require explicit user confirmation via the UI due to potential breakage.
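
The schema check in step 4d could be as simple as the following; the version constant is illustrative:

    # Illustrative DB schema version check using SQLite's user_version pragma
    import sqlite3

    CODE_SCHEMA_VERSION = 3  # illustrative constant maintained alongside migrations

    def needs_migration(db_path: str) -> bool:
        conn = sqlite3.connect(db_path)
        try:
            (db_version,) = conn.execute("PRAGMA user_version").fetchone()
            return db_version < CODE_SCHEMA_VERSION
        finally:
            conn.close()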

15. Development Plan / Phasing (Detailed Task Breakdown)

This section outlines a phased approach for developing LegacyOracle v6.6.2. The phases are designed to build functionality incrementally, allowing for testing and adaptation. Estimated efforts are in Person-Days (PD) and assume a small, focused development team (2-3 members) or equivalent solo effort. These estimates are approximate and subject to change based on unforeseen complexities. An Agile methodology (Scrum or Kanban) is recommended.


Phase 0: Setup & Foundation (Sprint 0 / Estimated: 1-2 Weeks / 5-10 PD)

  • Goal: Establish the core development environment, project structure, version control, basic CI/CD pipeline, and essential configuration loading. Ensure all prerequisite external software (LM Studio, Docker, etc.) is installed and minimally functional.
  • Tasks:
    • T0.1: Initialize Git Repository (GitHub/GitLab), establish branching strategy (e.g., Gitflow: main, develop, feature/*, release/*, hotfix/*). (0.5 PD)
    • T0.2: Define and create initial Project Structure (folders for core, skills, ui, clients, config, data, logs, outputs, tests, external, scripts, models, locale, plugins) as per Section 7. (0.5 PD)
    • T0.3: Setup Core Python Virtual Environment (.venv) using Python 3.10.12. Create initial requirements_agent.txt with core libs (PyQt6, asyncio, httpx, PyYAML). (0.5 PD)
    • T0.4: Implement core.ConfigManager to load/validate initial config/settings.yaml.example. Define basic settings.yaml structure. (1 PD)
    • T0.5: Implement core.ErrorHandler with basic file logging (logging module) to logs/agent.log. (0.5 PD)
    • T0.6: Implement basic core.LocalizationManager structure (load default ‘en’ locale). Create initial locale/en/LC_MESSAGES/legacy_oracle.po. (0.5 PD)
    • T0.7: Setup basic CI pipeline (e.g., GitHub Actions): Trigger on push/PR to develop, run linters (flake8, black), run pytest (initially with few tests). (1 PD)
    • T0.8: Verify External Service Setup: Manually confirm LM Studio API is running, A1111 API is running, ROCm installed correctly (rocminfo). Document verification steps. (1 PD)
  • Milestone: Code repository established, basic project structure exists, core environment setup, essential managers (Config, Error, Locale) implemented, basic CI running, external services verified.

Phase 1: Core Agent Loop & Basic UI Shell (Sprints 1-2 / Estimated: 2-3 Weeks / 10-15 PD)

  • Goal: Implement the minimal viable agent loop connected to a basic, non-functional UI shell. Demonstrate basic request-response flow with a mocked LLM initially.
  • Tasks:
    • T1.1: Implement core.agent.AgentOrchestrator main asyncio event loop structure. (1 PD)
    • T1.2: Implement core.StateManager class with basic states (idle, thinking, speaking, error) and emotions (neutral, positive, negative). (0.5 PD)
    • T1.3: Implement basic core.AC_CommInterface (using QObject for signals/slots) defining key signals/slots for UI<->Core communication. (1 PD)
    • T1.4: Implement basic core.SkillDispatcher (register/lookup dummy skills). (0.5 PD)
    • T1.5: Implement clients.LMS_Client basic async invoke method (initially can return hardcoded response or error). Integrate with AgentOrchestrator. (1 PD)
    • T1.6: Implement ui.main_window.MainWindow structure with QTabWidget and empty placeholder panels. (1 PD)
    • T1.7: Implement ui.controller.UI_Controller, instantiate panels, setup basic signal/slot connections to AC_CommInterface. (1.5 PD)
    • T1.8: Implement ui.widgets.ChatPanel (basic display/input) and ui.widgets.TerminalPanel (basic display). (1 PD)
    • T1.9: Implement ui.widgets.AnimatedFaceWidget with static state drawing (idle/thinking/speaking) driven by basic VisualPersonaController. Connect state updates via signals. (1.5 PD)
    • T1.10: Connect basic UI Input -> Agent -> Mocked LLM -> Agent -> UI Output flow. Ensure basic async communication works. (1.5 PD)
    • T1.11: Integrate ErrorHandler logging with UI_TerminalPanel display via signals. (0.5 PD)
    • T1.12: Address critical bugs inherited from any previous codebase state. (1+ PD – Contingency)
  • Milestone: User can type a message in the UI, the agent simulates processing (changes face state), a mocked/simple LLM response appears in the chat, and logs appear in the terminal. Core loop runs.

Phase 2: Model Selection, Reasoning Stream, Basic Memory (Sprints 3-5 / Estimated: 3-4 Weeks / 15-20 PD)

  • Goal: Implement dynamic model selection, stream reasoning steps to the UI, and establish basic chat history persistence.
  • Tasks:
    • T2.1: Implement core.ResourceGovernor (methods for checking/tracking VRAM/RAM based on psutil and SKILLS_MATRIX estimates). (1 PD)
    • T2.2: Define full config/model_config.py (SKILLS_MATRIX with Qualitative ratings, TEMPERATURES, etc.). (1 PD)
    • T2.3: Implement core.ModelSelector logic (parsing matrix, scoring based on qualitative ratings via internal map, resource filtering via ResourceGovernor, preference tie-breaking). Implement get_temperature_for_model. (3 PD)
    • T2.4: Enhance clients.LMS_Client to handle actual async model loading (/v1/models/load), unloading (/v1/models/unload), and error handling (model not found, load failed). (2 PD)
    • T2.5: Integrate ModelSelector call into AgentOrchestrator’s process_task flow before LLM invocation. Integrate LMS_Client load/unload calls managed by ModelSelector. (1.5 PD)
    • T2.6: Implement reasoning_models/BaseReasoningStrategy requiring stream_callback. Implement GeneralReasoning and CodeReasoning strategies with basic LLM prompts and streaming logic. (2 PD)
    • T2.7: Implement core.ReasoningController (load strategies, select based on basic task category, execute strategy passing stream_callback). (1 PD)
    • T2.8: Implement AC_CommInterface signals and UI_Controller slots for handling reasoningStep(str) and modelSelectionUpdate(str, str). (1 PD)
    • T2.9: Implement UI_ChatPanel logic to display [Reasoning: …] and [Model Selected: …] messages distinctly. (1 PD)
    • T2.10: Implement UI_SettingsPanel widgets (QComboBox) for selecting Reasoning Strategy and Manual Model Override. Connect signals to UI_Controller/AC_CommInterface. (1.5 PD)
    • T2.11: Enhance VisualPersonaController and AnimatedFaceWidget to handle dynamic emotion updates based on agent response analysis (simple keyword check initially). (1 PD)
    • T2.12: Implement core.MemoryManager with SQLite backend (agent_data.db). Implement chat_history table schema and methods (store_chat_message, get_recent_history). Integrate history storage into AgentOrchestrator. (2 PD)
  • Milestone: Agent dynamically selects and loads appropriate LM Studio models based on task type. Reasoning steps are streamed live to the chat UI. Basic chat history is stored. UI allows selecting reasoning strategy/model override. Face shows basic emotion.

Phase 3: Foundational Skills & OS Awareness (Sprints 6-8 / Estimated: 3-4 Weeks / 15-20 PD)

  • Goal: Implement core skills for interacting with the web, file system, OS, and managing knowledge. Integrate system monitoring into the UI.
  • Tasks:
    • T3.1: Implement clients.SearchAPI_Client (using duckduckgo-search). Implement skills.WebSearchSkill. (1.5 PD)
    • T3.2: Implement skills.FileSystemSkill (read, write, list, create dir, delete – with path validation and safety checks). (1.5 PD)
    • T3.3: Implement skills.NLQuerySkill (uses LMS_Client for summarization, Q&A on provided text). (1 PD)
    • T3.4: Implement skills.KnowledgeAcqSkill (basic workflow: check memory -> WebSearchSkill -> NLQuerySkill summarize -> MemoryManager store). (2 PD)
    • T3.5: Implement core.MemoryManager knowledge_base table and methods (store_knowledge, retrieve_knowledge). (1 PD)
    • T3.6: Implement clients.OS_Client_Monitor (using psutil, basic wmi for GPU if possible – research required). Implement skills.SystemMonitorSkill. (2 PD)
    • T3.7: Implement clients.OS_Client_Context (using pywin32 for active window, pyperclip for clipboard). Implement skills.UserContextSkill. (2 PD)
    • T3.8: Integrate SystemMonitorSkill data into UI_StatusPanel via UI_Controller. (1 PD)
    • T3.9: Implement full core.LocalizationManager (gettext) integration. Implement UI_LanguageSelector. Ensure UI text refresh on language change. Create initial .po/.mo files for English. (2 PD)
    • T3.10: Implement Unit tests for new skills and clients. Implement Integration tests for Knowledge Acquisition workflow. (2 PD)
  • Milestone: Agent can search web, interact with files, summarize text, answer questions using acquired knowledge, monitor basic system stats, and be localized (initially English).

Phase 4: Generative Services & Advanced Skills (Sprints 9-12 / Estimated: 4-6 Weeks / 20-30 PD)

  • Goal: Integrate all external generative services (A1111, ROCm, Amuse, ByteCraft) and the code interpreter skill. Build corresponding UI tabs.
  • Tasks:
    • T4.1: Set up external services robustly: A1111 (API mode), ROCm FastAPI Wrappers (Video/Audio – implement basic service logic and Dockerfiles). Verify manual interaction via API. (3 PD)
    • T4.2: Define detailed API contracts (JSON Schemas) for ROCm services (/generate, /status). (1 PD)
    • T4.3: Implement Backend Clients: GenService_Image_Client (A1111), GenService_Video_Client, GenService_Audio_Client (using httpx, handling async polling via task IDs). (3 PD)
    • T4.4: Implement API Client Skills: ImageGenClientSkill, VideoGenClientSkill, AudioGenClientSkill. (1 PD)
    • T4.5: Implement core.AsyncTaskManager to reliably poll /status endpoints and manage background generation tasks. (2 PD)
    • T4.6: Implement clients.Sandbox_Client using Docker SDK for secure code execution. Define container setup (image, network isolation, timeouts, resource limits). (2 PD)
    • T4.7: Implement skills.CodeInterpreterSkill. (1 PD)
    • T4.8: Implement clients.BC_CLI_Client (subprocess wrapper). Implement skills.ByteCraftSkill. (2 PD)
    • T4.9: Implement clients.AmuseClient (subprocess wrapper for dedicated env). Implement skills.AmuseSkill. (2 PD)
    • T4.10: Implement UI GenerativeTabs with parameter inputs for each service. (3 PD)
    • T4.11: Implement UI MediaPlayers (Ruffle via QWebEngineView, Image via QLabel, Video/Audio via QMediaPlayer). Integrate into Generative Tabs and View Window. (3 PD)
    • T4.12: Implement core.ModelOptimizer basic caching logic (using diskcache). Integrate cache check into LMS_Client. (1 PD)
    • T4.13: Write Integration tests for each generative workflow (UI Trigger -> Core -> Skill -> Client -> Mocked/Live Service -> UI Display). (4 PD)
  • Milestone: Agent can generate images (A1111, Amuse), video (ROCm), audio (ROCm), SWF (ByteCraft) triggered from UI. Results are displayed. Agent can execute Python code in a sandbox. LLM caching active.
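
To make the T4.5 polling contract concrete, here is a minimal sketch. Endpoint paths follow the /generate and /status contract from T4.2; the JSON field names (task_id, state, result, detail) are assumptions pending the final schemas:

    import asyncio
    import httpx

    # Hypothetical polling helper for the ROCm generative services (T4.5).
    # Assumes POST /generate returns {"task_id": ...} and GET /status/{id}
    # returns {"state": "pending" | "done" | "error", ...}.
    async def generate_and_wait(base_url: str, payload: dict,
                                poll_interval: float = 2.0) -> dict:
        async with httpx.AsyncClient(base_url=base_url, timeout=30.0) as client:
            resp = await client.post("/generate", json=payload)
            resp.raise_for_status()
            task_id = resp.json()["task_id"]
            while True:  # the real AsyncTaskManager would also enforce an overall deadline
                status = (await client.get(f"/status/{task_id}")).json()
                if status["state"] == "done":
                    return status["result"]
                if status["state"] == "error":
                    raise RuntimeError(status.get("detail", "generation failed"))
                await asyncio.sleep(poll_interval)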

Phase 5: OS Control, Proactivity, Security (Sprints 13-15 / Estimated: 4-5 Weeks / 20-25 PD)

  • Goal: Implement secure elevated actions, proactive suggestions, native OS integrations, and user configuration management.
  • Tasks:
    • T5.1: Implement core.SecurityManager (Privilege mode state, ACL checks from DB/config, UAC interface logic). (3 PD)
    • T5.2: Implement clients.OS_Client_PSExecutor (handling standard vs. elevated Start-Process -Verb RunAs via subprocess; sketched below). (1 PD)
    • T5.3: Implement skills.OSControlSkill (safe actions: send_native_notification, query_process_list; UAC-triggered actions: run_shell_command_admin). Integrate SecurityManager checks. (4 PD)
    • T5.4: Implement clients.CredentialManagerClient using keyring. (1 PD)
    • T5.5: Implement core.ProactiveManager (ActivityLogger writing to agent_data.db::activity_log, PatternAnalyser with initial rule-based logic, Notifier triggering native notifications). (4 PD)
    • T5.6: Implement ui.NativeIntegration (pystray for tray icon/menu, win11toast for notifications). Connect UI actions/agent signals. (2 PD)
    • T5.7: Implement basic UI Overlay (UI_OverlayWidget/Controller) or notification panel to display proactive suggestions. (2 PD)
    • T5.8: Enhance core.ResourceGovernor to monitor power state (OS_Client_Advanced) and apply throttling rules. (1 PD)
    • T5.9: Implement core.UserConfigManager (Load/Save user_config.yaml). (1 PD)
    • T5.10: Implement core.OpsManager (basic health checks for configured services using clients). (2 PD)
    • T5.11: Perform initial Security Review of privilege handling, credential storage, input sanitization. Write specific security tests. (2 PD)
    • T5.12: Update UI_SettingsPanel to include Privilege Mode toggle, Proactive settings, Theme/Language connected to UserConfigManager. (1 PD)
  • Milestone: Agent can perform basic elevated actions with UAC confirmation. Proactive suggestions appear as native notifications. System Tray icon functional. User preferences are saved/loaded. Basic service health monitored.
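
The standard-vs-elevated split in T5.2/T5.3 hinges on PowerShell's Start-Process -Verb RunAs, which is what raises the UAC consent prompt. A minimal sketch, with SecurityManager ACL checks and robust argument quoting deliberately omitted:

    import subprocess

    # Hypothetical core of clients.OS_Client_PSExecutor (T5.2): run a command
    # normally, or relaunch it via Start-Process -Verb RunAs to trigger UAC.
    def run_powershell(command: str, elevated: bool = False) -> subprocess.CompletedProcess:
        if elevated:
            # -Verb RunAs requests elevation; the user must confirm via UAC.
            # Note: the elevated process runs detached, so its output is not captured here.
            ps = f'Start-Process powershell -Verb RunAs -ArgumentList \'-Command "{command}"\''
        else:
            ps = command
        return subprocess.run(
            ["powershell", "-NoProfile", "-Command", ps],
            capture_output=True, text=True, timeout=60,
        )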

Phase 6: Autonomy, Multi-Agent, Learning Agent, Automation Frameworks (Sprints 16-18 / Estimated: 5-7 Weeks / 25-35 PD)

  • Goal: Implement core autonomous loops, multi-agent coordination, performance logging, and the full GUI/Web automation frameworks.
  • Tasks:
    • T6.1: Implement clients.GitClient. Implement skills.AutonomousTasksSkill self-update check logic. (2 PD)
    • T6.2: Implement reasoning_models.SelfCritiqueReasoning strategy. (1 PD)
    • T6.3: Enhance core.MemoryManager to support reflection_log and performance_log tables. (1 PD)
    • T6.4: Implement skills.AutonomousTasksSkill reflection cycle (triggers critique strategy, logs results). (2 PD)
    • T6.5: Implement core.ConductorAgent (basic task decomposition logic – rule-based or simple LLM call, asyncio.Queue based IPC). (3 PD)
    • T6.6: Implement agents.BaseSubAgent. Implement initial CodeMasterAgent and VisionAnalystAgent structures, handling basic requests via Conductor. (4 PD)
    • T6.7: Implement core.LearningAgent (receiving performance data via signals/calls, logging to performance_log DB table). (2 PD)
    • T6.8: Implement core.RLAgent basic structure (class, interface methods – no complex training logic yet). (1 PD)
    • T6.9: Implement GUI Macro Framework: TaskDefinition, AutomationStep, TAF_Engine (using InputSimClient), RecordMacroDialog (using pynput), MacroExecutionSkill, TaskManager (JSON/DB load/save). (6 PD)
    • T6.10: Implement Web Automation Framework: WebTaskDefinition, WebAutomationStep, WAF_Engine (using WebDriverClient), RecordWebDialog (guided definition or basic recorder), WebAutomationSkill. Integrate CredentialManagerClient. (6 PD)
    • T6.11: Implement core.WorkflowEngine (sequential execution, basic input/output mapping; sketched below). Implement basic UI_WorkflowEditor. (3 PD)
    • T6.12: Integration tests for multi-agent delegation and workflow execution. (3 PD)
  • Milestone: Agent performs basic self-reflection & update checks. Simple multi-agent delegation works. Performance is logged. Users can record, define, and execute both GUI and Web automation tasks and basic workflows.
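
T6.11's sequential execution with basic input/output mapping reduces to a short loop. The step schema and the "$name" reference convention here are illustrative assumptions:

    # Hypothetical core.WorkflowEngine loop (T6.11): run steps in order, map each
    # step's outputs into later inputs, and halt on the first failure.
    class WorkflowEngine:
        def __init__(self, skill_dispatcher):
            self.dispatcher = skill_dispatcher  # core.SkillDispatcher

        async def run(self, steps: list[dict]) -> dict:
            context: dict = {}
            for step in steps:
                # Resolve "$name" references against outputs accumulated so far.
                inputs = {
                    k: context.get(v[1:], v) if isinstance(v, str) and v.startswith("$") else v
                    for k, v in step.get("inputs", {}).items()
                }
                result = await self.dispatcher.dispatch(step["skill"], **inputs)
                if result.get("status") != "success":
                    return {"status": "error", "failed_step": step["skill"], "detail": result}
                context.update(result.get("outputs", {}))
            return {"status": "success", "outputs": context}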

Phase 7: Advanced Features, Optimization, Release Prep (Sprints 19+ / Estimated: 5+ Weeks / 25+ PD)

  • Goal: Implement remaining advanced features, optimize performance, conduct thorough testing, finalize documentation, and prepare for Alpha release.
  • Tasks:
    • T7.1: Implement full skills.VisionSkill (OCR via OCR_Client, Object Detection via ObjectDetect_Client, integration with ViewWindowPanel). (5 PD)
    • T7.2: Implement interactive Live Chat Overlay UI and integrate with ProactiveManager/VisionSkill. (4 PD)
    • T7.3: Enhance skills.OSControlSkill with UI Automation features (uiautomation via OS_Client_Advanced). (3 PD)
    • T7.4: Implement skills.FewShotSkill. (1 PD)
    • T7.5: Implement skills.LogicSkill & LogicReasoningStrategy. (2 PD)
    • T7.6: Implement LearningAgent empirical matrix update logic (refining ModelSelector scores based on performance_log). (1 PD)
    • T7.7: Implement ModelOptimizer GAIA integration for quantization/pruning (optional). Benchmark performance. (3 PD)
    • T7.8: Implement OpsManager service recovery and load balancing logic. (2 PD)
    • T7.9: Implement PluginManager (loading, basic execution) & example plugin. Implement UI_PluginManagerUI. (3 PD)
    • T7.10: Implement DependencyManager checks. (1 PD)
    • T7.11: Comprehensive Testing: Execute full E2E test plan, Security audit, Performance benchmarking & optimization. (5+ PD)
    • T7.12: Finalize Documentation: User Guide, Developer Guide, Tutorials, API Docs, Glossary. (3 PD)
    • T7.13: Implement UI_SetupWizard fully. (2 PD)
    • T7.14: Final Bug Fixing, UI Polish, Performance Tuning. (Variable)
    • T7.15: Create Release Package (Installer or Docker Compose bundle). (1 PD)
  • Milestone: All planned features implemented. System is stable, performant, well-tested, and documented. Ready for Alpha release.

(Total Estimated Effort: ~140 – 200+ Person-Days)


16. Developer Integration Steps (Checklist)

This checklist outlines the primary steps a developer or team should follow to implement the LegacyOracle Super Agent based on this v6.6.2 specification. It assumes the developer is starting from scratch or integrating these features into a compatible baseline. Follow these steps sequentially, aligning with the Phased Development Plan (Section 15).

Phase 0: Environment & Project Foundation

  • 1. Understand the Vision & Architecture: Thoroughly read Sections 1-4 of this document to grasp the project goals, architectural style, and overall component layout.
  • 2. Set up Hardware & OS: Ensure the development machine meets the specifications in Section 6.1 & 6.2 (Win 11 Pro, Target RAM/GPU).
  • 3. Install Core Software: Install Python 3.10+, Git, and Docker Desktop per Section 6.3. Verify installations.
  • 4. Install GPU Drivers & ROCm: CRITICAL STEP. Install latest AMD Adrenalin drivers and the correct ROCm version for Windows per AMD documentation (Section 6.4). Verify with rocminfo.
  • 5. Set up Version Control: Initialize Git repository (git init), create main and develop branches, configure .gitignore. Establish contribution workflow (e.g., feature branches off develop).
  • 6. Create Project Structure: Create the top-level directories (core, skills, ui, clients, config, data, logs, outputs, tests, external, scripts, models, locale, plugins) as defined in Section 7.
  • 7. Set up Core Python Environment: Create and activate the main virtual environment (.venv). Create requirements_agent.txt (initially with core libs like PyQt6, asyncio, httpx, PyYAML). Install initial dependencies (pip install -r requirements_agent.txt). (Section 6.12)
  • 8. Implement Basic Config Loading: Implement core.ConfigManager (sketched below) to load config/settings.yaml.example. Create the example file with initial service URLs and paths. (Section 9.2.13)
  • 9. Implement Basic Logging/Error Handling: Setup logging configuration. Implement basic core.ErrorHandler. (Section 9.2.14)
  • 10. Setup Basic CI: Configure GitHub Actions/GitLab CI for automatic linting (flake8, black) and running pytest on push/PR to develop. (Section 13)
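
A minimal ConfigManager for step 8 might look like the sketch below; falling back to the .example file and the dotted-key accessor are convenience assumptions, not mandated by the spec:

    from pathlib import Path

    import yaml

    # Hypothetical minimal core.ConfigManager (step 8): load settings.yaml,
    # falling back to the checked-in example file on first run.
    class ConfigManager:
        def __init__(self, config_dir: str = "config"):
            path = Path(config_dir) / "settings.yaml"
            if not path.exists():
                path = Path(config_dir) / "settings.yaml.example"
            with path.open(encoding="utf-8") as fh:
                self._settings = yaml.safe_load(fh) or {}

        def get(self, dotted_key: str, default=None):
            # get("service_urls.lm_studio") walks the nested mappings.
            node = self._settings
            for part in dotted_key.split("."):
                if not isinstance(node, dict) or part not in node:
                    return default
                node = node[part]
            return node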

Phase 1: Core Agent Loop & Basic UI Shell

  • 11. Implement Core Agent Loop: Create core.agent.AgentOrchestrator with main asyncio loop structure. (Section 9.2.1)
  • 12. Implement State Management: Create core.StateManager (basic states/emotions). (Section 9.2.2)
  • 13. Implement UI <-> Core Communication: Create core.AC_CommInterface and ui.controller.UI_Controller. Establish basic Signal/Slot connections using asyncqt or threads. (Sections 9.1.1, 9.1.18)
  • 14. Implement Basic UI Shell: Create ui.main_window.MainWindow with QTabWidget. Implement basic ChatPanel, TerminalPanel, and static AnimatedFaceWidget. (Sections 9.1.1, 9.1.2, 9.1.3, 9.1.4)
  • 15. Implement Basic LMS Client: Create clients.LMS_Client with only the async invoke method (can initially mock the HTTP call; sketched below). (Section 9.6)
  • 16. Connect Basic Workflow: Ensure UI input -> Agent -> Mocked LLM -> Agent -> UI output works. Ensure logs appear in TerminalPanel. (Section 11 – Basic Chat Workflow)
  • 17. Address Inherited Bugs: Fix any critical blocking bugs from the previous state if this is an upgrade.
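
For step 15, the mocked client only needs to expose the async invoke surface that step 21 later fills in with a real HTTP call. A sketch; the return shape is an assumption:

    import asyncio

    # Hypothetical first-pass clients.LMS_Client (step 15): same async surface as
    # the real client, but returns canned text instead of calling LM Studio.
    class LMS_Client:
        def __init__(self, base_url: str = "http://localhost:1234", mock: bool = True):
            self.base_url = base_url
            self.mock = mock

        async def invoke(self, model: str, messages: list[dict],
                         temperature: float = 0.7) -> dict:
            if self.mock:
                await asyncio.sleep(0.1)  # simulate latency so async UI paths get exercised
                return {"role": "assistant",
                        "content": f"[mock:{model}] echo: {messages[-1]['content']}"}
            raise NotImplementedError("Real HTTP call is added in step 21 (httpx/litellm).")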

Phase 2: Model Selection, Reasoning Stream, Basic Memory

  • 18. Implement Resource Governor: Create core.ResourceGovernor with VRAM/RAM check logic (using psutil). (Section 9.2.10)
  • 19. Finalize model_config.py: Create/verify the full SKILLS_MATRIX with Qualitative ratings and resource estimates. (Section 10.1)
  • 20. Implement Model Selector: Create core.ModelSelector implementing the full dynamic selection logic (scoring, resource filter, preference; sketched below). (Section 9.2.5)
  • 21. Enhance LMS Client: Add async load_model, async unload_model methods, calling LM Studio API. Add error handling. (Section 9.6)
  • 22. Integrate Model Selection: Update AgentOrchestrator to call ModelSelector and use the selected model/temp. Handle model loading via ModelSelector. (Section 9.2.1, 9.2.5)
  • 23. Implement Reasoning Framework: Create reasoning_models.BaseReasoningStrategy (with stream_callback). Implement GeneralReasoning, CodeReasoning strategies. (Section 9.5)
  • 24. Implement Reasoning Controller: Create core.ReasoningController for strategy selection and execution, managing the streaming callback. (Section 9.2.4)
  • 25. Implement UI Reasoning Stream: Update UI_ChatPanel to display [Reasoning: …] steps received via signals. (Section 9.1.2)
  • 26. Implement UI Settings (Model/Strategy): Add widgets to SettingsPanel for selecting strategy/model override. Connect signals. (Section 9.1.6)
  • 27. Enhance UI Face: Implement dynamic emotion updates in AnimatedFaceWidget based on agent state signals. (Section 9.1.4)
  • 28. Implement Memory Manager (Chat): Create core.MemoryManager, implement SQLite backend, chat_history table schema, and methods. Integrate history storage/retrieval. (Sections 9.2.3, 10.2)
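
Steps 19–20 turn qualitative ratings into numbers, filter by available VRAM, and break ties by user preference. A compressed sketch of that selection pass (the score mapping and matrix field names are assumptions consistent with Section 10.1's qualitative-ratings approach):

    # Hypothetical core.ModelSelector scoring pass (step 20): score candidates on
    # the task-relevant skill, drop models that will not fit in free VRAM, then
    # let the user preference break ties.
    RATING_SCORES = {"High": 3, "Medium": 2, "Low": 1, "Yes": 3, "No": 0}

    def select_model(skills_matrix: dict, task_skill: str, free_vram_gb: float,
                     preference: str = "balance") -> str | None:
        candidates = []
        for model, caps in skills_matrix.items():
            if caps.get("vram_gb", 0) > free_vram_gb:
                continue  # resource filter: model would not fit
            score = RATING_SCORES.get(caps.get(task_skill, "No"), 0)
            # "speed" favors smaller models; other preferences favor larger ones.
            tiebreak = -caps.get("vram_gb", 0) if preference == "speed" else caps.get("vram_gb", 0)
            candidates.append((score, tiebreak, model))
        if not candidates:
            return None  # caller may defer the task or ask ResourceGovernor to free VRAM
        return max(candidates)[2]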

Phase 3: Foundational Skills & OS Awareness

  • 29. Implement Core Skills: Create skills.FileSystemSkill, skills.WebSearchSkill (with SearchAPI_Client), skills.NLQuerySkill (using LMS_Client). (Section 9.3)
  • 30. Implement Knowledge Acquisition: Create skills.KnowledgeAcqSkill orchestrating Search->Summarize->Store. Implement knowledge_base table/methods in MemoryManager. (Sections 9.3.15, 9.2.3, 10.2)
  • 31. Implement OS Monitoring: Create clients.OS_Client_Monitor (WMI/psutil). Create skills.SystemMonitorSkill. Integrate vitals display into UI_StatusPanel. (Sections 9.6, 9.3.12, 9.1.7)
  • 32. Implement User Context: Create clients.OS_Client_Context (pywin32/pyperclip). Create skills.UserContextSkill. (Sections 9.6, 9.3.13)
  • 33. Implement Localization: Implement full core.LocalizationManager (gettext; sketched below). Create locale/ structure and initial .po/.mo files. Implement UI_LanguageSelector. Integrate _() calls in UI. (Sections 9.2.20, 9.1.16, 12.5)
  • 34. Write Tests: Add unit/integration tests for newly implemented skills and managers.
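
Step 33's LocalizationManager is essentially a thin wrapper over Python's gettext. A sketch assuming the locale/ layout from Section 7 and a single "legacyoracle" message domain (the domain name is an assumption):

    import gettext

    # Hypothetical core.LocalizationManager (step 33): load compiled .mo catalogs
    # from locale/<lang>/LC_MESSAGES/ and expose _() for UI strings.
    class LocalizationManager:
        def __init__(self, locale_dir: str = "locale", domain: str = "legacyoracle"):
            self.locale_dir = locale_dir
            self.domain = domain
            self.set_language("en")

        def set_language(self, lang: str) -> None:
            # fallback=True keeps raw English msgids working before catalogs exist.
            self._trans = gettext.translation(
                self.domain, self.locale_dir, languages=[lang], fallback=True
            )

        def _(self, message: str) -> str:
            return self._trans.gettext(message)

On a language change, UI_LanguageSelector would call set_language() and the UI would re-apply _() to every visible string, per step 33's refresh requirement.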

Phase 4: Generative Services & Advanced Skills

  • 35. Setup External Services: Ensure A1111 and ROCm FastAPI services are running and configured correctly per Section 6 & 9.7 API specs.
  • 36. Implement ROCm Service Code: Write the FastAPI application code (main.py) and underlying generation scripts (run_rocm_*.py) for Video and Audio services. Create requirements_*.txt for them. (Section 9.7)
  • 37. Implement Gen Service API Clients: Create clients.A1111_Client, clients.GenService_Video_Client, clients.GenService_Audio_Client (using httpx). (Section 9.6)
  • 38. Implement API Client Skills: Create skills.clients.ImageGenClientSkill, VideoGenClientSkill, AudioGenClientSkill. (Section 9.3.2)
  • 39. Implement Async Task Manager: Create core.AsyncTaskManager for polling /status endpoints. Integrate with API Client Skills. (Section 9.2.9)
  • 40. Implement Code Interpreter: Create clients.Sandbox_Client (Docker SDK; sketched below). Create skills.CodeInterpreterSkill. Configure sandbox security. (Sections 9.6, 9.3.6)
  • 41. Implement ByteCraft: Create clients.BC_CLI_Client. Create skills.ByteCraftSkill. Configure path in settings.yaml. (Sections 9.6, 9.3.8)
  • 42. Implement Amuse: Create clients.AmuseClient. Create skills.AmuseSkill. Configure paths in settings.yaml. (Sections 9.6, 9.3.11)
  • 43. Implement UI Generative Tabs: Create/populate GenerativeTabs widget with inputs for each service. (Section 9.1.8)
  • 44. Implement UI Media Players: Create/integrate ImageViewer, VideoPlayer, AudioPlayer, SwfPlayer (Ruffle via QWebEngineView). (Section 9.1.8)
  • 45. Implement Caching: Implement core.ModelOptimizer basic caching logic (diskcache). Integrate into LMS_Client. (Section 9.2.22)
  • 46. Write Tests: Integration tests for all generative workflows (UI -> Service -> UI).
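
Step 40's sandbox reduces to a locked-down, disposable container run. A sketch using the Docker SDK; the image name and resource limits are placeholder assumptions to be pinned in settings.yaml:

    import docker

    # Hypothetical clients.Sandbox_Client (step 40): execute a Python snippet in
    # a network-isolated, resource-capped, throwaway container.
    class SandboxClient:
        def __init__(self, image: str = "python:3.10-slim"):
            self.client = docker.from_env()
            self.image = image

        def run_code(self, code: str, timeout: int = 30) -> dict:
            container = self.client.containers.run(
                self.image,
                command=["python", "-c", code],
                detach=True,
                network_disabled=True,    # no outbound network for untrusted code
                mem_limit="256m",
                nano_cpus=1_000_000_000,  # roughly one CPU core
            )
            try:
                result = container.wait(timeout=timeout)
                logs = container.logs().decode("utf-8", errors="replace")
                return {"exit_code": result.get("StatusCode", -1), "output": logs}
            finally:
                container.remove(force=True)  # never leave sandbox containers behind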

Phase 5: OS Control, Proactivity, Security

  • 47. Implement Security Manager: Create core.SecurityManager. Implement privilege mode logic, ACL checks (read from DB/config), UAC interface via OS_Client_PSExecutor. (Section 9.2.12)
  • 48. Implement PowerShell Executor: Create clients.OS_Client_PSExecutor handling standard/elevated execution via subprocess. (Section 9.6)
  • 49. Implement OS Control Skill (Basic): Create skills.OSControlSkill. Implement safe actions (notifications via client) and basic elevated command execution (using SecurityManager). (Section 9.3.14)
  • 50. Implement Credential Manager Client: Create clients.CredentialManagerClient using keyring (sketched below). (Section 9.6)
  • 51. Implement Proactive Manager: Create core.ProactiveManager, ActivityLogger (writes to DB), PatternAnalyser (rule-based), Notifier (uses NativeIntegration). (Section 9.2.6)
  • 52. Implement Native Integration: Create ui.NativeIntegration (pystray, win11toast). (Section 9.1.14)
  • 53. Integrate Proactive UI: Implement basic UI Overlay or notification panel in UI_MainWindow/UI_Controller to display suggestions. (Section 9.1)
  • 54. Enhance Resource Governor: Add power state awareness (OS_Client_Advanced). (Section 9.2.10)
  • 55. Implement User Config Manager: Create core.UserConfigManager. Integrate with SettingsPanel for loading/saving user prefs. (Sections 9.2.17, 9.1.6)
  • 56. Implement Ops Manager (Basic): Create core.OpsManager. Implement health check logic (check_service_health). (Section 9.2.18)
  • 57. Security Review & Testing: Perform manual review of privilege handling, implement specific tests. (Section 13.6)
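
Step 50's client is deliberately thin, since keyring already delegates to Windows Credential Manager on this platform. A sketch; the service-name prefix convention is an assumption:

    import keyring

    # Hypothetical clients.CredentialManagerClient (step 50): namespace entries
    # under one prefix so agent-owned credentials are easy to audit and purge.
    class CredentialManagerClient:
        PREFIX = "LegacyOracle"

        def store(self, name: str, username: str, secret: str) -> None:
            keyring.set_password(f"{self.PREFIX}:{name}", username, secret)

        def retrieve(self, name: str, username: str) -> str | None:
            return keyring.get_password(f"{self.PREFIX}:{name}", username)

        def delete(self, name: str, username: str) -> None:
            keyring.delete_password(f"{self.PREFIX}:{name}", username)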

Phase 6: Autonomy, Multi-Agent, Learning Agent, Automation Frameworks

  • 58. Implement Autonomous Tasks Skill: Create skills.AutonomousTasksSkill. Implement Self-Update check (GitClient). Implement Reflection cycle trigger (calls SelfCritiqueReasoning). (Section 9.3.9)
  • 59. Implement Self-Critique Strategy: Create reasoning_models.SelfCritiqueReasoning. (Section 9.5)
  • 60. Enhance Memory Manager: Implement reflection_log and performance_log tables/methods. (Section 9.2.3)
  • 61. Implement Conductor Agent: Create core.ConductorAgent with basic task decomposition (rules/LLM) and sub-agent delegation logic using asyncio.Queue. (Section 9.2.11)
  • 62. Implement Sub-Agents: Create agents.BaseSubAgent. Implement initial CodeMasterAgent and VisionAnalystAgent. Define communication protocol. (Section 9.4)
  • 63. Implement Learning Agent (Logging): Create core.LearningAgent. Implement log_performance method writing to DB. Integrate calls from skills/core. (Section 9.2.15)
  • 64. Implement RL Agent Structure: Create core.RLAgent class structure (no training logic yet). (Section 9.2.16)
  • 65. Implement GUI Macro Framework: Create core.automation.gui_macro components (TaskDefinition, AutomationStep, TAF_Engine). Implement skills.MacroExecutionSkill. Implement core.TaskManager (JSON/DB). Implement ui.widgets.RecordMacroDialog (with pynput; recorder sketched below). Integrate into UI_TaskMgmtPanel. (Section 9.8)
  • 66. Implement Web Automation Framework: Create core.automation.web_automation components (WebTaskDefinition, WebAutomationStep, WAF_Engine). Implement skills.WebAutomationSkill. Enhance TaskManager. Implement ui.widgets.RecordWebDialog (guided approach first). Integrate into UI_TaskMgmtPanel. Integrate CredentialManagerClient. (Section 9.9)
  • 67. Implement Workflow Engine: Create core.WorkflowEngine (sequential execution, basic I/O mapping). Implement basic UI_WorkflowEditor. (Section 9.10, 9.1.12)
  • 68. Write Tests: Integration tests for multi-agent delegation, workflow execution, GUI/Web macro record/play.
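
The recorder half of step 65 can start life as two pynput listeners appending step dicts; RecordMacroDialog would wrap this and save the result as a TaskDefinition. A sketch, with the step schema as an assumption consistent with Section 9.8 (Esc ends the recording):

    import time

    from pynput import keyboard, mouse

    # Hypothetical macro recorder core (step 65): capture clicks and keystrokes
    # with relative timestamps until the user presses Esc.
    class MacroRecorder:
        def __init__(self):
            self.steps: list[dict] = []
            self._start = time.monotonic()

        def _on_click(self, x, y, button, pressed):
            if pressed:
                self.steps.append({"action": "click", "x": x, "y": y,
                                   "button": button.name,
                                   "t": time.monotonic() - self._start})

        def _on_press(self, key):
            if key == keyboard.Key.esc:
                return False  # stops the keyboard listener, ending the recording
            self.steps.append({"action": "key", "key": str(key),
                               "t": time.monotonic() - self._start})

        def record(self) -> list[dict]:
            with mouse.Listener(on_click=self._on_click), \
                 keyboard.Listener(on_press=self._on_press) as kl:
                kl.join()  # block until Esc stops the keyboard listener
            return self.steps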

Phase 7: Advanced Features, Optimization, Release Prep

  • 69. Implement Full Vision Skill: Implement OCR (OCR_Client), Object Detection (ObjectDetect_Client). Integrate results. Feed visual data to UI_ViewWindowPanel. (Sections 9.3.7, 9.1.13, 9.6)
  • 70. Implement Live Chat Overlay: Create UI_OverlayWidget/Controller. Integrate with ProactiveManager/VisionSkill. (Section 9.1)
  • 71. Implement OS Control UI Automation: Enhance OSControlSkill using uiautomation via OS_Client_Advanced. (Sections 9.3.14, 9.6)
  • 72. Implement FewShotSkill: Create skills.FewShotSkill. (Section 9.3.19)
  • 73. Implement Logic Skill & Strategy: Create skills.LogicSkill and reasoning_models.LogicReasoning. (Sections 9.3.20, 9.5)
  • 74. Implement Learning Agent Matrix Updates: Add logic to LearningAgent to analyze performance_log and adjust internal matrix scores used by ModelSelector. (Section 9.2.15)
  • 75. Implement Model Optimizer (Advanced): Integrate GAIA tools (gaia-toolbox) via ModelOptimizer. Add Quant/Prune methods. Benchmark vs optimum. (Section 9.2.22)
  • 76. Implement Ops Manager (Advanced): Add service recovery and load balancing logic. (Section 9.2.18)
  • 77. Implement Plugin Manager: Implement core.PluginManager (load/execute; loader sketched below). Create example plugin. Implement UI_PluginManagerUI. (Sections 9.2.19, 9.1.15)
  • 78. Implement Dependency Manager: Create core.DependencyManager. Add UI check/update trigger. (Section 9.2.21)
  • 79. Comprehensive Testing: Execute full E2E, Security, Performance test plans. Address all findings. (Section 13)
  • 80. Finalize Documentation: Update all sections of this spec. Write User Guide, Tutorials. (Sections 19, 20)
  • 81. Implement Setup Wizard: Create UI_SetupWizard. (Section 9.1.17)
  • 82. Final Polish: Address all remaining bugs, optimize UI responsiveness, refine agent persona/prompts.
  • 83. Release Packaging: Create installer or finalize Docker Compose bundle for Alpha release.
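
For step 77, dynamic loading from the plugins/ directory can be done with importlib. A sketch; the register() convention (each plugin module exporting a register() function that returns named BaseSkill instances) is an assumption, not a spec requirement:

    import importlib.util
    from pathlib import Path

    # Hypothetical core.PluginManager loader (step 77): import every module in
    # plugins/ and collect the skills each plugin chooses to register.
    def load_plugins(plugin_dir: str = "plugins") -> dict:
        skills = {}
        for py in Path(plugin_dir).glob("*.py"):
            spec = importlib.util.spec_from_file_location(py.stem, py)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            register = getattr(module, "register", None)
            if callable(register):
                skills.update(register())  # plugin returns {"skill_name": instance}
        return skills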

17. Future Considerations (Post v6.6.2 – Consolidated List)

While the v6.6.2 specification outlines a highly capable Alpha release, the potential for LegacyOracle extends much further. This section details potential enhancements and directions for future development phases beyond the initial Alpha scope, building upon the established architecture.

  1. Advanced Reinforcement Learning (RL) & Training Pipelines:
    • Concept: Move beyond the basic structure of the RLAgent. Implement sophisticated RL algorithms (e.g., PPO, SAC from stable-baselines3 or custom implementations) to dynamically optimize complex decision-making processes. This could include model selection, skill chaining in workflows, parameter tuning for generative tasks, or even UI interaction strategies.
    • Implementation: Requires defining robust state representations, reward functions (potentially derived from LearningAgent logs and user feedback), and setting up offline or online training pipelines. This is computationally intensive and requires deep RL expertise.
  2. Federated Learning / Privacy-Preserving Collaboration:
    • Concept: If deployed to multiple users, explore federated learning techniques to allow agents to learn from collective usage patterns without sharing raw user data centrally. This could improve proactive suggestions, error prediction, or workflow optimization across the user base while maintaining privacy.
    • Implementation: Requires a complex federated learning framework, secure aggregation server, and careful design to prevent data leakage. High research and infrastructure overhead.
  3. Dynamic Plugin Marketplace & Sandboxing:
    • Concept: Expand the PluginManager into a full marketplace where users can discover, install, and manage community-developed or official plugins directly from the Coplot GUI.
    • Implementation: Requires a backend repository/API for plugin distribution, robust plugin validation/signing, and significantly enhanced sandboxing for executing untrusted plugin code (e.g., running plugins in dedicated processes with restricted permissions, using WebAssembly runtimes, or advanced OS-level sandboxing). Define strict plugin permission models.
  4. Cloud Synchronization Options (User Opt-in):
    • Concept: Allow users to optionally synchronize specific data (e.g., configuration (user_config.yaml), custom tasks/workflows, knowledge base entries, but likely not sensitive activity logs) across multiple devices using a secure cloud backend.
    • Implementation: Requires designing a secure cloud API and storage solution, implementing synchronization logic in the agent core, managing user accounts/authentication, and ensuring user control and transparency over synced data.
  5. Advanced Model Fine-Tuning Workflows (Local):
    • Concept: Integrate user-friendly workflows (potentially within the UI) to enable local fine-tuning of compatible open-weight LLMs (loaded via LM Studio or run separately) using the user’s own interaction data (chat history, feedback, corrected outputs) after explicit consent and anonymization.
    • Implementation: Requires integrating fine-tuning libraries (like axolotl, unsloth, trl), managing datasets derived from agent logs, providing UI controls for the process, handling resource requirements, and managing resulting fine-tuned model checkpoints. Significant complexity and resource needs.
  6. Holographic / AR / VR Interface Concepts:
    • Concept: Explore next-generation interfaces beyond the Coplot GUI. Imagine interacting with a 3D holographic version of the LegacyOracle avatar in an Augmented Reality or Virtual Reality environment, potentially overlaying information directly onto the user’s real-world or virtual workspace.
    • Implementation: Requires integration with specific AR/VR SDKs (e.g., Unity, Unreal Engine via API bridge, OpenXR), 3D modeling for the avatar, and rethinking the entire UI/UX paradigm. Highly experimental.
  7. Webcam-based Face/Emotion Tracking for User State:
    • Concept: With explicit user consent, use the webcam and local computer vision models (running via ONNX Runtime/GAIA) to analyze the user’s facial expression or attention level (e.g., looking at screen vs. away). Use this as additional context for the ProactiveManager or to adapt the agent’s persona/communication style (StateManager).
    • Implementation: Requires integrating webcam access libraries (opencv-python), local facial recognition/expression analysis models, and robust privacy controls/indicators. Significant privacy concerns must be addressed.
  8. High-Quality Local STT/TTS Integration:
    • Concept: Replace basic TTS (pyttsx3) and add Speech-to-Text (STT) for full voice interaction. Utilize high-performance, high-quality local engines.
    • Implementation: Integrate local STT libraries like Whisper.cpp (via bindings or subprocess) for transcription and local TTS engines like Piper or Coqui TTS (potentially as separate FastAPI services similar to ROCm generation if dependencies are complex) for more natural-sounding voice output. Requires significant effort in audio processing, wake-word detection, and managing audio streams.
  9. More Sophisticated UI Automation (OSControlSkill):
    • Concept: Move beyond coordinate-based or basic uiautomation property matching. Implement more intelligent interaction with application GUIs, potentially using visual understanding (VisionSkill) combined with accessibility APIs to handle dynamic layouts, custom controls, or applications without standard APIs. Could involve techniques like visual element search or LLM-driven interaction planning based on screenshots.
    • Implementation: Requires advanced computer vision techniques, potentially fine-tuning models for UI element recognition, and complex error handling for dynamic interfaces.
  10. Advanced VectorDB Integration & Reasoning over Knowledge Base:
    • Concept: Fully leverage a local Vector Database (ChromaDB, FAISS) integrated with MemoryManager. Store not just summaries but embeddings of chat history, documents, potentially visual context descriptions. Implement Retrieval-Augmented Generation (RAG) techniques within ReasoningController or KnowledgeAcqSkill to allow the agent to reason over its entire stored knowledge base semantically, providing more contextually relevant and informed answers.
    • Implementation: Requires setting up and managing the vector DB, implementing embedding generation pipelines (using local sentence-transformer models), and modifying reasoning prompts/strategies to incorporate retrieved context effectively.
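
A minimal sketch of the retrieval half using ChromaDB (the collection name, persistence path, and reliance on ChromaDB's default embedding model are all assumptions):

    import chromadb

    # Hypothetical RAG retrieval layer for MemoryManager/KnowledgeAcqSkill: store
    # knowledge snippets, then pull the top-k semantic matches to prepend to the
    # reasoning prompt as context.
    client = chromadb.PersistentClient(path="data/vector_db")
    kb = client.get_or_create_collection("knowledge_base")

    def remember(snippet_id: str, text: str, topic: str) -> None:
        kb.add(ids=[snippet_id], documents=[text], metadatas=[{"topic": topic}])

    def recall(query: str, k: int = 3) -> list[str]:
        hits = kb.query(query_texts=[query], n_results=k)
        return hits["documents"][0]  # top-k documents for the single query

    # A reasoning strategy would then inject the snippets, e.g.:
    # prompt = "Context:\n" + "\n---\n".join(recall(question)) + f"\n\nQuestion: {question}"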


18. Glossary (Comprehensive Definitions)

This glossary defines key terms, components, technologies, and concepts used within the LegacyOracle Super Agent v6.6.2 specification.

  • A1111 (Automatic1111 WebUI): A popular open-source web interface for Stable Diffusion based image generation. Used in this project as an external Service accessed via its HTTP API for image generation, typically leveraging the DirectML backend on AMD hardware.
  • ACL (Access Control List): A list of permissions determining which users or processes are granted access to objects or specific actions. Used by the SecurityManager (acl_rules table/config) to define which agent actions require elevated (Administrator) privileges.
  • Agent Core: The central processing and orchestration part of LegacyOracle, implemented in Python using OpenManus principles. Includes managers for state, memory, configuration, skills, tasks, etc.
  • Agent Orchestrator (AgentOrchestrator): The main class within the Agent Core (core/agent.py) responsible for handling the primary interaction loop, receiving input, coordinating task analysis and delegation, managing overall state, and formatting final output for the UI.
  • AMD GAIA: An ecosystem and toolkit from AMD focused on optimizing the development and deployment of Generative AI applications specifically on AMD hardware platforms (CPUs, GPUs, Accelerators). Includes gaia-toolbox and potentially optimized runtimes.
  • Amuse: Specific image generation software targeted for integration. Assumed to be a native Windows application installed locally.
  • Amuse Implementation: The separate Python scripts and dedicated environment (external/amuse_implementation/) designed to control the main Amuse software, likely via CLI calls or other interfacing methods. Accessed by the AmuseSkill via the AmuseClient.
  • Animated Face (AnimatedFaceWidget): The custom PyQt6 widget in the Coplot GUI responsible for displaying a 2D animated face using QPainter. Its appearance changes based on signals from the VisualPersonaController reflecting the agent’s state and emotion.
  • API (Application Programming Interface): A set of rules and protocols allowing different software components or services to communicate with each other (e.g., REST APIs for Web Services, Python Class APIs for internal modules).
  • API Client Skills (skills/clients/): Skills whose primary function is to interact with an external service API (A1111, ROCm Video/Audio). They use corresponding Backend Clients (GenService_*_Client).
  • APScheduler: Python library used by the Scheduler component for scheduling background tasks (e.g., health checks, reflection cycles, proactive analysis) based on time intervals or cron expressions.
  • Architectural Style: The high-level design philosophy guiding the system’s structure (Hybrid: Modular Core + SOA + Multi-Agent + OS Integration + Plugins + Automation).
  • Async Task Manager (AsyncTaskManager): Core component responsible for managing long-running asynchronous tasks, particularly polling the /status endpoints of external generative services. Uses asyncio.
  • Asynchronous Programming (asyncio): Python’s framework for writing concurrent code using async/await syntax, crucial for preventing blocking operations (like network calls or long computations) from freezing the agent or UI.
  • Autonomy: The agent’s capability to perform actions (scheduled tasks, self-update, reflection) without direct, immediate user commands. Managed by AutonomousTasksSkill and Scheduler.
  • AutomationStep (core/automation/gui_macro/task_definition.py): Dataclass defining a single step in a GUI automation task (e.g., click at x,y; type text; press key). Used by TaskAutomationEngine.
  • Automatic1111 WebUI: See A1111.
  • Backend Client (clients/): Python class acting as a wrapper around external interactions (APIs, CLIs, OS functions, libraries like keyring or pywin32). Provides a clean, abstracted interface for Skills to use, handling details like HTTP requests, subprocess management, or specific API calls.
  • BaseSkill (skills/base_skill.py): Abstract Base Class defining the standard interface (async execute(**kwargs) -> dict) and return format that all skills (built-in and plugin) must implement.
  • BaseSubAgent (agents/base_sub_agent.py): Abstract Base Class defining the standard interface (e.g., async handle_task(task_data, context)) for specialized sub-agents managed by the ConductorAgent.
  • ByteCraft: A specific tool assumed to generate SWF (Flash) files via a Command Line Interface (CLI). Interacted with via ByteCraftSkill and BC_CLI_Client.
  • Caching (ModelOptimizer, diskcache): Storing the results of expensive computations (like LLM inference) to quickly retrieve them if the same input occurs again. Implemented using the diskcache library managed by ModelOptimizer.
  • Client: See Backend Client.
  • CLI (Command Line Interface): A text-based interface for interacting with software (e.g., Git CLI, Docker CLI, ByteCraft CLI, Amuse Implementation script). Used by various Backend Clients via subprocess.
  • Code Interpreter (CodeInterpreterSkill, Sandbox_Client): Skill responsible for securely executing code snippets (e.g., Python generated by an LLM) within an isolated Sandbox (Docker recommended).
  • Component Deep Dive: Section 9 of this document, providing detailed specifications for each software module.
  • Conductor Agent (ConductorAgent): Core component responsible for orchestrating complex tasks by decomposing them and delegating sub-tasks to specialized Sub-Agents. Manages communication and aggregates results.
  • Configuration Files (config/): Files storing settings and static data (settings.yaml, user_config.yaml, model_config.py). Managed by ConfigManager and UserConfigManager.
  • Config Manager (ConfigManager): Core component responsible for loading, validating, and providing access to system-wide settings from settings.yaml.
  • Consolidated Specification: This document (v6.6.2), intended as the single, complete source of truth.
  • Continual Learning: Advanced AI concept where a model learns incrementally over time without forgetting previously learned information. A goal for the LearningAgent.
  • Coplot GUI: The specific name for LegacyOracle’s advanced graphical user interface, built with PyQt6.
  • Core Agentic Flow: The sequence of steps the agent takes to process user input, select models, execute tasks, and generate output (detailed in Section 8).
  • Credential Manager Client (CredentialManagerClient, keyring): Backend Client using the keyring library to securely interact with the native OS credential store (Windows Credential Manager).
  • Cross-Cutting Concerns: Aspects of software design that affect multiple components (Security, Performance, Error Handling, etc.). Detailed in Section 12.
  • Data Storage (data/): Directory containing persistent data like the SQLite database (agent_data.db) and file cache (cache/).
  • Dependency Manager (DependencyManager): Core component responsible for checking installed Python package versions against requirements_agent.txt using pipdeptree.
  • Deployment: The process of installing and configuring LegacyOracle and its services on a target machine.
  • DirectML: Microsoft DirectX 12 API for hardware acceleration on various GPUs (including AMD), used as a backend by ONNX Runtime and Automatic1111.
  • Docker: Containerization platform used for sandboxing (CodeInterpreterSkill) and recommended for deploying external services (A1111, ROCm wrappers).
  • Docker Compose: Tool for defining and running multi-container Docker applications via a docker-compose.yaml file. Recommended deployment method.
  • Dynamic Model Selection: The core process, managed by ModelSelector, of choosing the best LLM from LM Studio at runtime based on task requirements and system resources.
  • E2E (End-to-End) Testing: Testing complete user workflows across the entire system.
  • Environment Setup: The mandatory process (Section 6) of installing all prerequisites (software, drivers, libraries, models, services).
  • Episodic Memory: The agent’s memory of past interactions and events, stored in the SQLite database (chat_history, activity_log) via MemoryManager.
  • Error Handler (ErrorHandler): Central component for catching, logging, and reporting errors from any part of the system.
  • Evolution History: Summary of the project’s architectural development phases (Section 2).
  • External Services: Applications or APIs running separately from the core agent process (LM Studio, A1111, ROCm Services).
  • FastAPI: A Python web framework used to create the API wrappers for the ROCm Video and Audio generation services.
  • Few-Shot Learning (FewShotSkill): AI technique allowing models to perform new tasks based on only a small number of examples provided in the prompt.
  • Flow Automation Framework (WorkflowEngine): Core component responsible for executing user-defined multi-step automation sequences combining various skills and tasks.
  • GAIA (AMD GAIA): AMD’s ecosystem/toolkit for optimizing AI on AMD hardware.
  • GAIA Toolbox (gaia-toolbox): Specific Python tools within GAIA for model optimization (quantization, pruning). Used optionally by ModelOptimizer.
  • Generative Services: The external services dedicated to multimedia generation (Image: A1111/Amuse; Video/Audio: ROCm FastAPI wrappers).
  • gettext: Standard library/toolchain for internationalization (i18n) and localization (l10n). Used by LocalizationManager.
  • Git: Version control system used for managing the codebase and potentially for agent self-updating.
  • Globalization (LocalizationManager): The process of designing the software to support multiple languages and regional conventions.
  • Glossary: This section (Section 18), defining key terms.
  • GUI Macro: An automation sequence specifically targeting Graphical User Interface interactions. Defined via TaskDefinition, executed by TaskAutomationEngine.
  • Handover Specification: This document, intended to provide all necessary information for development.
  • Health Checks: Periodic requests sent by OpsManager to external services to verify they are running and responsive.
  • Hybrid Architecture: The architectural style combining multiple patterns (Modular, SOA, Multi-Agent, etc.).
  • httpx: Asynchronous HTTP client library used for interacting with REST APIs (LM Studio, A1111, ROCm Services).
  • i18n / l10n: Abbreviations for Internationalization and Localization.
  • Input Simulation Client (Input_Sim_Client): Backend Client wrapping pyautogui, keyboard, and pynput for simulating mouse/keyboard actions. Used by MacroExecutionSkill / TaskAutomationEngine.
  • Integration Testing: Testing the interaction and data flow between connected components.
  • Keyring: Python library providing a cross-platform interface to native credential stores (like Windows Credential Manager). Used by CredentialManagerClient.
  • Knowledge Acquisition (KnowledgeAcqSkill): Skill focused on finding information (via Web Search), processing it (via NLQuery), and storing it (MemoryManager).
  • Knowledge Base (agent_data.db::knowledge_base): Table in the SQLite database storing structured summaries and facts learned by the agent.
  • Learning Agent (LearningAgent): Core component responsible for analyzing agent performance (from performance_log), updating the empirical skills matrix used by ModelSelector, implementing continual learning concepts, and providing reward signals for the RLAgent.
  • LEGOPROMIND: Conceptual name for the agent’s overall self-improvement system (encompassing Reflection, Learning Agent, Toolsmithing goals).
  • litellm: Python library providing a unified interface to call various LLM APIs, including the OpenAI-compatible endpoint provided by LM Studio. Used by LMS_Client.
  • Live Chat Overlay (UI_OverlayWidget): A semi-transparent, always-on-top UI element for displaying context-aware suggestions or brief agent interactions without switching windows.
  • LM Studio: Desktop application for discovering, downloading, and running local LLMs, providing an OpenAI-compatible API server at http://localhost:1234. The primary source of LLM inference for LegacyOracle.
  • Load Balancing: Distributing incoming requests or tasks across multiple resources (e.g., different LLMs or service instances) to improve throughput and prevent overload. Basic logic implemented in OpsManager.
  • Local-First: Design principle prioritizing running computation and storing data on the user’s machine rather than relying on cloud services.
  • Localization Manager (LocalizationManager): Core component handling language translation using gettext and locale files.
  • Logic Skill (LogicSkill): Skill specialized in tasks requiring formal logical reasoning, puzzle-solving, or argument evaluation, using an appropriate LLM selected via ModelSelector.
  • Maintainability: Design considerations ensuring the software is easy to modify, debug, and update over time (modularity, testing, documentation, dependency management).
  • Manus-like Skill (OSControlSkill): Refers to the goal of achieving deep, sophisticated control over the Windows OS and its applications, beyond simple commands. Implemented via OSControlSkill using advanced clients.
  • Mixture-of-Agents (MoA): Technique involving multiple LLMs/agents collaborating on a task, typically for quality improvement (e.g., one executes, one verifies, one refines). Implemented optionally in the core agentic flow.
  • Model Category (MODEL_CATEGORIES): Defined in model_config.py, grouping models suitable for broad task types (e.g., coding, vision, logic). Used by ModelSelector for initial filtering.
  • Model Optimizer (ModelOptimizer): Core component responsible for managing LLM/model output caching (diskcache) and potentially applying optimizations like quantization or pruning (using optimum or gaia-toolbox).
  • Model Selection Preference: User setting (speed, accuracy, balance) influencing tie-breaking logic in ModelSelector.
  • Model Selector (ModelSelector): Core component implementing the dynamic LLM selection logic based on task, skills matrix, resources, and preferences.
  • Modular Core (OpenManus): Architectural principle implemented by organizing core logic into distinct managers (StateManager, MemoryManager, etc.) and functionalities into swappable Skills.
  • Multimodal: Capability of handling multiple types of data, primarily text and images (VisionSkill).
  • Multi-Agent System: Architecture using a ConductorAgent to orchestrate multiple specialized SubAgents.
  • Native Integration: Utilizing platform-specific APIs (Windows APIs via pywin32, wmi, ctypes) for deep interaction with the OS.
  • Native Notifications (NativeIntegration, win11toast): Displaying notifications using the standard Windows Action Center.
  • NSSM (Non-Sucking Service Manager): Utility for running applications (like Python scripts or FastAPI services) as persistent Windows services.
  • OCR (Optical Character Recognition): Extracting text from images. Performed by VisionSkill using OCR_Client (Tesseract/PaddleOCR).
  • ONNX Runtime: High-performance inference engine for ONNX (Open Neural Network Exchange) models. Used by ObjectDetect_Client and potentially OCR_Client, leveraging DirectML or ROCm execution providers (possibly via GAIA optimizations).
  • ONNX Runtime Execution Provider: The specific backend (CPU, CUDA, DirectML, ROCm, TensorRT, etc.) used by ONNX Runtime to execute model inference on hardware. Configurable in settings.yaml.
  • OpenManus: A conceptual framework/philosophy for building modular, skill-based AI agents, adopted by LegacyOracle’s core design.
  • Operations Manager (OpsManager): Core component responsible for monitoring external service health, attempting automated recovery, and basic load balancing.
  • Orchestrating Model: The initial LLM (default phi-4-mini) used by the AgentOrchestrator to analyze user input and plan task execution.
  • OS Control Skill (OSControlSkill – “Manus-like”): Advanced skill for interacting with the Windows OS, including running commands (standard/elevated), managing processes, interacting with UI elements (uiautomation), managing power settings, and using credentials. Requires SecurityManager mediation for privileged actions.
  • Perception (“View Window”): The agent’s ability to process visual screen information (VisionSkill) and the associated UI panel (ViewWindowPanel) for displaying this.
  • Performance Testing: Measuring system speed, latency, resource usage, and throughput under various conditions.
  • Persona: The agent’s configurable personality and communication style, managed by StateManager and potentially influenced by UserConfigManager.
  • Phased Development Plan: The staged approach (Section 15) for implementing LegacyOracle features incrementally.
  • Plugin (PluginManager, plugins/): A mechanism for extending agent functionality via dynamically loaded Python modules/skills placed in the plugins/ directory.
  • PowerShell: Windows command-line shell and scripting language used extensively by OS interaction skills via OS_Client_PSExecutor.
  • Proactive Manager (ProactiveManager): Core component responsible for analyzing user activity (ActivityLogger) and system context to generate helpful suggestions (Notifier).
  • Project Structure: The defined organization of directories and files within the LegacyOracle codebase (Section 7).
  • Pruning: Model optimization technique removing less important weights/connections to reduce size and potentially speed up inference. Handled by ModelOptimizer.
  • PyAutoGUI / Pynput / Keyboard: Python libraries used by Input_Sim_Client and Macro Recorder for simulating/recording low-level mouse and keyboard events for GUI automation.
  • PyQt6 / PySide6: Python bindings for the Qt cross-platform UI framework. Used for building the Coplot GUI. (Specification standardized on PyQt6, but PySide6 is very similar).
  • Pytest: Python testing framework used for unit, integration, and potentially UI testing.
  • Qualitative Ratings: Using descriptive terms (“High”, “Medium”, “Low”, “Yes”, “No”) in the SKILLS_MATRIX for readability. Mapped internally to scores by ModelSelector.
  • Quantization: Model optimization technique reducing the numerical precision of model weights (e.g., FP32 to INT8) to significantly decrease size and VRAM usage, often improving speed on compatible hardware. Handled by ModelOptimizer.
  • Reasoning Controller (ReasoningController): Core component responsible for selecting the appropriate ReasoningStrategy based on the task and managing its execution, including handling the streaming callback for UI updates.
  • Reasoning Strategy (reasoning_models/): A pluggable module defining a specific approach or prompting technique for the agent to “think” or solve problems (e.g., CodeReasoning, VisualReasoning, SelfCritiqueReasoning). Must support streaming intermediate steps.
  • Reflection: The autonomous process where the agent analyzes its past performance (performance_log, reflection_log) using SelfCritiqueReasoning to identify areas for improvement. Managed by AutonomousTasksSkill.
  • Reinforcement Learning (RLAgent): Machine learning paradigm where an agent learns optimal behavior by receiving rewards or penalties for its actions. Planned for future optimization of ModelSelector or WorkflowEngine.
  • Resource Governor (ResourceGovernor): Core component responsible for monitoring system resources (CPU, RAM, VRAM, Power) and enforcing limits by throttling or deferring agent tasks.
  • ROCm (Radeon Open Compute): AMD’s open software platform for GPU computing, enabling GPU acceleration for tasks like AI training and inference on compatible AMD Radeon/Instinct hardware. Used by the Video/Audio generative services.
  • Roo Code: Specific VS Code extension targeted for integration via RooCodeSkill and VSC_Client.
  • Ruffle: An open-source Flash Player emulator written in Rust, compiled to WebAssembly. Used via QWebEngineView in the MediaPlayers widget to display SWF files generated by ByteCraft.
  • Sandboxing: Technique for running code (CodeInterpreterSkill) or potentially plugins in a restricted, isolated environment (ideally Docker) to prevent malicious actions. Managed by Sandbox_Client.
  • Schema: Formal definition of the structure for data (JSON Schema for APIs, SQL Schema for DB tables, YAML structure for config).
  • Scheduler (Scheduler, APScheduler): Core component managing the scheduling and execution of recurring background tasks (health checks, proactive analysis, reflection).
  • Security Manager (SecurityManager): Core component responsible for managing privilege modes, checking ACLs, and interfacing with the OS for UAC prompts.
  • Self-Improvement (LEGOPROMIND): The overarching concept of the agent learning and enhancing itself through reflection, performance analysis, and potentially autonomous tool creation.
  • Self-Updating: The agent’s ability to automatically update its own codebase from a Git repository. Managed by AutonomousTasksSkill.
  • Selenium / Playwright: Browser automation frameworks used by WebDriverClient for the WebAutomationSkill. Playwright is generally preferred for modern features.
  • Sentient OS / Sentient Software Sovereign: Vision concept for LegacyOracle as a deeply integrated, aware, and autonomous AI layer within Windows.
  • Service: An independently running application providing functionality via an API (e.g., A1111 Service, ROCm Video Service, LM Studio Server).
  • Service-Oriented Architecture (SOA): Architectural style where functionalities are provided by loosely coupled, communicating services. Used here for heavy generative tasks.
  • Skill (skills/): A modular component implementing a specific capability of the agent (e.g., searching the web, controlling the OS, generating images), following OpenManus principles.
  • Skill Dispatcher (SkillDispatcher): Core component responsible for receiving task requests and routing them to the correct registered Skill, Plugin instance, or Automation Task executor. Considers user priorities.
  • Skills Matrix (model_config.py::SKILLS_MATRIX): Central configuration defining the evaluated capabilities (Reasoning, Coding, Vision, Logic, etc.) and resource needs (VRAM, RAM) of available LLMs. Used by ModelSelector.
  • SQLite: File-based relational database engine used for local persistent storage (agent_data.db).
  • State Manager (StateManager): Core component tracking and managing the agent’s current operational state (idle, thinking, speaking, error, etc.) and emotional persona (neutral, positive, negative).
  • Streaming Reasoning: The technique of sending intermediate “thought” steps from the ReasoningStrategy/LLM to the UI_ChatPanel in real-time, providing transparency into the agent’s process.
  • Sub-Agent (agents/subagents/): A specialized agent instance focused on a particular domain (e.g., CodeMaster, VisionAnalyst) managed by the ConductorAgent to handle parts of complex tasks.
  • System Tray (NativeIntegration, pystray): Icon residing in the Windows notification area allowing basic agent control (Show/Hide, Quit).
  • Task Automation Framework: The set of components (TaskDefinition, AutomationStep, TaskAutomationEngine, TaskManager, MacroExecutionSkill, RecordMacroDialog) enabling the creation and execution of custom GUI automation sequences.
  • Task Definition (GUI/Web): A structured representation (JSON or DB record) defining the steps, inputs, outputs, and target application/URL for a custom automation task. Managed by TaskManager.
  • Technology Stack: The complete list of programming languages, libraries, frameworks, and external tools used in the project (Section 5).
  • Temperature (LLM): A parameter controlling the randomness of an LLM’s output. Lower values (e.g., 0.2) make output more deterministic and focused; higher values (e.g., 0.9) increase creativity and diversity. Determined by ModelSelector based on task type.
  • Terminal Panel (UI_TerminalPanel): The UI widget displaying system logs and agent messages with green text on a white/black background.
  • Testing Strategy: The overall approach to verifying the correctness, reliability, performance, and security of the system (Section 13).
  • Tool Use (Skill Dimension): An LLM’s capability to understand when an external tool/skill is needed, generate the correct API call/command format for that tool, and interpret its response. Assessed in the SKILLS_MATRIX.
  • Tutorials / Examples: Step-by-step guides for users and developers (Section 20 Outline).
  • UAC (User Account Control): Windows security feature prompting the user for consent before allowing actions requiring administrator privileges. Triggered via SecurityManager.
  • UI Automation (uiautomation library, OSControlSkill): Technique using Windows Accessibility APIs to interact with GUI elements more robustly than coordinate-based methods. Implemented in OS_Client_Advanced.
  • Unit Testing: Testing individual functions or classes in isolation, mocking dependencies.
  • User Configuration (UserConfigManager, user_config.yaml): Settings specific to the user’s preferences, overriding system defaults where applicable.
  • Vector Database (chromadb, faiss): Database optimized for storing and searching high-dimensional vectors (embeddings). Optional integration with MemoryManager for semantic search of knowledge/history.
  • Versioning Strategy: Plan for managing versions of the application, configurations, and database schema (Section 10.3).
  • View Window (ViewWindowPanel, VisionSkill): UI panel for displaying visual output from the VisionSkill or generative tasks.
  • Vision Skill (VisionSkill): Skill responsible for processing visual input from the screen (capture, OCR, object detection) and providing interpretations.
  • Visual Persona Controller (VisualPersonaController): UI component managing the state and animation logic for the AnimatedFaceWidget.
  • VRAM (Video RAM): Dedicated memory on the GPU. A critical resource constraint for loading and running LLMs and generative models. Monitored by ResourceGovernor.
  • Watchdog: Python library for monitoring file system events. Used by TaskAutomationEngine (GUI) to detect output files.
  • WebDriver (WebDriverClient, Selenium/Playwright): Interface/protocol for automating web browsers. Used by WebAutomationSkill.
  • Web Automation Framework: Components enabling creation/execution of web automation tasks (WebTaskDefinition, WebAutomationStep, WebAutomationEngine, WebRecorder, WebAutomationSkill).
  • WMI (Windows Management Instrumentation): Windows interface for accessing system management information and operations. Used by OS_Client_Monitor, OS_Client_EnvAwareness.
  • Workflow Engine (WorkflowEngine): Core component responsible for executing user-defined multi-step automation flows combining skills, GUI macros, and web tasks. Handles basic data passing and error propagation (halts on step failure by default).
  • WSL (Windows Subsystem for Linux): Allows running Linux distributions on Windows. Interacted with via PowerShell commands executed by OS_Client_PSExecutor.
  • YOLO (You Only Look Once): A popular real-time object detection model architecture. ONNX version used by ObjectDetect_Client.


19. API Documentation (Consolidated & Detailed Specs)

This section details the Application Programming Interfaces (APIs) and data contracts used by LegacyOracle, both for interacting with external services and for internal communication between core components.

19.1 External Service APIs

These are the APIs provided by the external services that LegacyOracle interacts with. Clients within LegacyOracle (clients/) must adhere to these specifications.

19.1.1 LM Studio API (OpenAI Compatible)

  • Base URL: Configured in settings.yaml:service_urls:lm_studio (Default: http://localhost:1234)
  • Authentication: Typically none required for local LM Studio server, but litellm or httpx clients might require setting a dummy API key (e.g., “lm-studio”).
  • Key Endpoints Used:
    • List Loaded Models:
      • Endpoint: GET /v1/models
      • Success Response (200 OK, JSON):

            {
              "object": "list",
              "data": [
                {
                  "id": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q8_0.gguf",  // Example model ID/filename
                  "object": "model",
                  "created": 1677649963,  // Example timestamp
                  "owned_by": "user",
                  "permission": []
                }
                // ... potentially other loaded models
              ]
            }
    • Load Model:
      • Endpoint: POST /v1/models/load
      • Request Body (JSON):

            {
              "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q8_0.gguf"  // Filename known to LM Studio
            }
      • Success Response (200 OK): Empty body or {"success": true}.
      • Error Response (e.g., 404, 500): JSON with error details.
    • Unload Model:
      • Endpoint: POST /v1/models/unload
      • Request Body (JSON):

            {
              "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q8_0.gguf"
            }
      • Success Response (200 OK): Empty body or {"success": true}.
    • Chat Completions (Primary Interaction):
      • Endpoint: POST /v1/chat/completions
      • Request Body (JSON – OpenAI Schema):

            {
              "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q8_0.gguf",  // ID of a LOADED model
              "messages": [
                {"role": "system", "content": "You are LegacyOracle, a witty AI."},
                {"role": "user", "content": "Explain the concept of asynchronous programming."}
                // Prepend chat history messages here as needed, respecting context limits
              ],
              "temperature": 0.7,  // As determined by ModelSelector
              "max_tokens": 2048,  // Configurable, sensible default
              "stream": false      // Set to true by the Agent Core for the reasoning stream
              // Optional: "top_p", "presence_penalty", "frequency_penalty", "stop"
            }
      • Success Response (Non-Streaming, 200 OK, JSON):

            {
              "id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxxx",
              "object": "chat.completion",
              "created": 1677652288,
              "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q8_0.gguf",
              "choices": [
                {
                  "index": 0,
                  "message": {
                    "role": "assistant",
                    "content": "Asynchronous programming allows a program to..."  // Actual response
                  },
                  "finish_reason": "stop"  // "stop", "length", "function_call", etc.
                }
              ],
              "usage": {  // Token usage reported by the model
                "prompt_tokens": 56,
                "completion_tokens": 150,
                "total_tokens": 206
              }
            }
      • Success Response (Streaming, 200 OK, Server-Sent Events, text/event-stream):

            data: {"id": "chatcmpl-...", "object": "chat.completion.chunk", "created": ..., "model": "...", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}
            data: {"id": "chatcmpl-...", "object": "chat.completion.chunk", "created": ..., "model": "...", "choices": [{"index": 0, "delta": {"content": "Async"}, "finish_reason": null}]}
            data: {"id": "chatcmpl-...", "object": "chat.completion.chunk", "created": ..., "model": "...", "choices": [{"index": 0, "delta": {"content": "hronous programming"}, "finish_reason": null}]}
            data: {"id": "chatcmpl-...", "object": "chat.completion.chunk", "created": ..., "model": "...", "choices": [{"index": 0, "delta": {"content": " allows..."}, "finish_reason": null}]}
            ... more chunks ...
            data: {"id": "chatcmpl-...", "object": "chat.completion.chunk", "created": ..., "model": "...", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
            data: [DONE]
    • Endpoint: POST /v1/completions (Legacy – Use Chat Completions if possible)
      • (Similar structure but uses prompt string instead of messages array)
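
  • Client Usage Sketch (illustrative): A minimal non-streaming call using httpx, the async client library named above. The base URL and model ID here are assumptions for the sketch; the real backend client obtains them from ConfigManager and ModelSelector.

        # Minimal sketch: non-streaming chat completion against a local LM Studio server.
        # Assumptions: default base URL; MODEL_ID is a hypothetical already-loaded model.
        import asyncio
        import httpx

        BASE_URL = "http://localhost:1234"  # settings.yaml:service_urls:lm_studio
        MODEL_ID = "mistral-7b-instruct-v0.2.Q8_0.gguf"  # hypothetical loaded model

        async def chat_once(prompt: str) -> str:
            payload = {
                "model": MODEL_ID,
                "messages": [
                    {"role": "system", "content": "You are LegacyOracle, a witty AI."},
                    {"role": "user", "content": prompt},
                ],
                "temperature": 0.7,
                "max_tokens": 2048,
                "stream": False,
            }
            async with httpx.AsyncClient(timeout=120.0) as client:
                resp = await client.post(f"{BASE_URL}/v1/chat/completions", json=payload)
                resp.raise_for_status()  # surface 4xx/5xx as exceptions
                return resp.json()["choices"][0]["message"]["content"]

        # Example: print(asyncio.run(chat_once("Explain asynchronous programming.")))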

19.1.2 Automatic1111 WebUI API (Stable Diffusion)

  • Base URL: Configured in settings.yaml:service_urls:a1111 (Default: http://localhost:7860)
  • Authentication: Typically none by default.
  • Key Endpoints Used:
    • Text-to-Image:
      • Endpoint: POST /sdapi/v1/txt2img
      • Request Body (JSON – key parameters):

            {
              "prompt": "string (required) - Positive prompt.",
              "negative_prompt": "string (optional) - Negative prompt.",
              "styles": "array of strings (optional) - List of style names.",
              "seed": "integer (optional, default: -1) - Seed, -1 for random.",
              "subseed": "integer (optional, default: -1)",
              "subseed_strength": "float (optional, default: 0)",
              "seed_resize_from_h": "integer (optional, default: -1)",
              "seed_resize_from_w": "integer (optional, default: -1)",
              "sampler_name": "string (optional) - e.g., 'Euler a', 'DPM++ 2M Karras'; check /sdapi/v1/samplers for options.",
              "batch_size": "integer (optional, default: 1)",
              "n_iter": "integer (optional, default: 1)",
              "steps": "integer (optional, default: 50) - Number of sampling steps.",
              "cfg_scale": "float (optional, default: 7.0) - Classifier-Free Guidance scale.",
              "width": "integer (optional, default: 512) - Must be a multiple of 64.",
              "height": "integer (optional, default: 512) - Must be a multiple of 64.",
              "restore_faces": "boolean (optional, default: false)",
              "tiling": "boolean (optional, default: false)",
              "do_not_save_samples": "boolean (optional, default: false)",
              "do_not_save_grid": "boolean (optional, default: false)",
              "eta": "float (optional) - Depends on sampler.",
              "s_churn": "float (optional, default: 0)",
              "s_tmax": "float (optional)",
              "s_tmin": "float (optional, default: 0)",
              "s_noise": "float (optional, default: 1)",
              "override_settings": "object (optional) - Dictionary to temporarily override A1111 settings.",
              "override_settings_restore_afterwards": "boolean (optional, default: true)",
              "script_args": "array (optional) - Arguments for the selected script.",
              "script_name": "string (optional) - Name of script to run.",
              "send_images": "boolean (optional, default: true) - Include base64 images in response.",
              "save_images": "boolean (optional, default: false) - Save images on the server.",
              "alwayson_scripts": "object (optional) - Settings for AlwaysOn scripts."
            }
      • Success Response (200 OK, JSON):

            {
              "images": [
                "string (base64-encoded PNG image data)"  // One image per batch_size * n_iter
              ],
              "parameters": {  // Parameters used for the generation
                "prompt": "...",
                "negative_prompt": "...",
                "steps": 25
                // ... other parameters ...
              },
              "info": "string (JSON string containing detailed generation info, including seed, sampler, etc.)"
            }

      • Error Response (e.g., 422 Validation Error, 500 Internal Server Error): JSON with {"detail": ...} or a similar error structure.
    • Get Progress:
      • Endpoint: GET /sdapi/v1/progress?skip_current_image=false
      • Query Parameter: skip_current_image (boolean, optional) – If true, doesn’t send preview image.
      • Success Response (200 OK, JSON):

            {
              "progress": "float (0.0 to 1.0)",
              "eta_relative": "float (estimated remaining time fraction)",
              "state": {
                "skipped": false,
                "interrupted": false,
                "job": "txt2img",
                "job_count": 1,
                "job_timestamp": "2023-10-27T10:30:00.123456",
                "job_no": 1,
                "sampling_step": 15,
                "sampling_steps": 25
              },
              "current_image": "string (base64-encoded PNG preview image) | null",
              "textinfo": "string | null (status message from A1111)"
            }
    • Get Options/Settings: GET /sdapi/v1/options
    • Set Options/Settings: POST /sdapi/v1/options
    • Get Samplers: GET /sdapi/v1/samplers
    • Get SD Models: GET /sdapi/v1/sd-models
    • (Other endpoints available – see A1111 API Docs at /docs)
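
  • Client Usage Sketch (illustrative): A minimal synchronous txt2img call using httpx, decoding the first base64 image from the response. The base URL and parameter values are assumptions for the sketch.

        # Minimal sketch: txt2img against a local A1111 instance.
        # Assumes send_images=true (the default), so images come back base64-encoded.
        import base64
        import httpx

        A1111_URL = "http://localhost:7860"  # settings.yaml:service_urls:a1111

        def generate_image(prompt: str, out_path: str = "output.png") -> str:
            payload = {
                "prompt": prompt,
                "negative_prompt": "blurry, low quality",
                "steps": 25,
                "cfg_scale": 7.0,
                "width": 512,
                "height": 512,
                "sampler_name": "Euler a",
                "seed": -1,
            }
            resp = httpx.post(f"{A1111_URL}/sdapi/v1/txt2img", json=payload, timeout=300.0)
            resp.raise_for_status()
            image_b64 = resp.json()["images"][0]  # first image of the batch
            with open(out_path, "wb") as f:
                f.write(base64.b64decode(image_b64))
            return out_path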

19.1.3 ROCm Video FastAPI Service

  • Base URL: Configured in settings.yaml:service_urls:rocm_video (Default: http://localhost:8001)
  • Authentication: None assumed for local deployment.
  • API Contract:
    • Generate Video:
      • Endpoint: POST /generate
      • Request Body (JSON Schema):

            {
              "type": "object",
              "properties": {
                "prompt": {"type": "string", "description": "Text description for video."},
                "num_frames": {"type": "integer", "default": 16, "minimum": 1},
                "seed": {"type": "integer", "default": -1},
                "style_hint": {"type": ["string", "null"], "default": null},
                "width": {"type": "integer", "default": 512, "minimum": 64},
                "height": {"type": "integer", "default": 512, "minimum": 64},
                "guidance_scale": {"type": "number", "format": "float", "default": 7.5},
                "num_inference_steps": {"type": "integer", "default": 25, "minimum": 1},
                "output_filename": {"type": ["string", "null"], "description": "Optional: desired output filename (without extension)", "default": null}
              },
              "required": ["prompt"]
            }

      • Success Response (202 Accepted, JSON Schema):

            {
              "type": "object",
              "properties": {
                "task_id": {"type": "string", "format": "uuid", "description": "Unique ID for the generation task."}
              },
              "required": ["task_id"]
            }

      • Error Response (400/422/500, JSON Schema): {"type": "object", "properties": {"detail": {"type": "string"}}, "required": ["detail"]}
    • Check Task Status:
      • Endpoint: GET /status/{task_id}
      • Path Parameter: task_id (string, UUID format, required).
      • Success Response (200 OK, JSON Schema):

            {
              "type": "object",
              "properties": {
                "task_id": {"type": "string", "format": "uuid"},
                "status": {"type": "string", "enum": ["pending", "processing", "complete", "failed"]},
                "progress": {"type": ["number", "null"], "format": "float", "minimum": 0.0, "maximum": 1.0},
                "output_path": {"type": ["string", "null"], "description": "Absolute path to the generated video (.mp4/.gif) if status is 'complete'."},
                "error_message": {"type": ["string", "null"], "description": "Error details if status is 'failed'."}
              },
              "required": ["task_id", "status"]
            }

      • Error Response (404 Not Found, JSON Schema): {"type": "object", "properties": {"detail": {"type": "string", "example": "Task ID not found."}}}
    • Health Check:
      • Endpoint: GET /health
      • Success Response (200 OK, JSON Schema): {"type": "object", "properties": {"status": {"type": "string", "enum": ["ok"]}}}

19.1.4 ROCm Audio FastAPI Service

  • Base URL: Configured in settings.yaml:service_urls:rocm_audio (Default: http://localhost:8002)
  • Authentication: None assumed.
  • API Contract:
    • Generate Audio:
      • Endpoint: POST /generate
      • Request Body (JSON Schema):

            {
              "type": "object",
              "properties": {
                "prompt": {"type": "string", "description": "Text description for audio/music."},
                "duration_seconds": {"type": "integer", "default": 10, "minimum": 1},
                "seed": {"type": "integer", "default": -1},
                "model_id": {"type": ["string", "null"], "default": null, "description": "e.g., 'facebook/musicgen-small'"},
                "temperature": {"type": "number", "format": "float", "default": 0.7},
                "top_p": {"type": "number", "format": "float", "default": 0.9},
                "guidance_scale": {"type": "number", "format": "float", "default": 3.0},
                "output_format": {"type": "string", "enum": ["wav", "mp3"], "default": "wav"},
                "output_filename": {"type": ["string", "null"], "description": "Optional: desired output filename (without extension)", "default": null}
              },
              "required": ["prompt"]
            }

      • Success Response (202 Accepted, JSON Schema): {"type": "object", "properties": {"task_id": {"type": "string", "format": "uuid"}}, "required": ["task_id"]}
      • Error Response (400/422/500, JSON Schema): {"type": "object", "properties": {"detail": {"type": "string"}}}
    • Check Task Status:
      • Endpoint: GET /status/{task_id}
      • Path Parameter: task_id (string, UUID format, required).
      • Success Response (200 OK, JSON Schema):

            {
              "type": "object",
              "properties": {
                "task_id": {"type": "string", "format": "uuid"},
                "status": {"type": "string", "enum": ["pending", "processing", "complete", "failed"]},
                "output_path": {"type": ["string", "null"], "description": "Absolute path to generated audio (.wav/.mp3) if status is 'complete'."},
                "error_message": {"type": ["string", "null"]}
              },
              "required": ["task_id", "status"]
            }

      • Error Response (404 Not Found, JSON Schema): {"type": "object", "properties": {"detail": {"type": "string", "example": "Task ID not found."}}}
    • Health Check:
      • Endpoint: GET /health
      • Success Response (200 OK, JSON Schema): {"type": "object", "properties": {"status": {"type": "string", "enum": ["ok"]}}}
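
  • Client Usage Sketch (illustrative): Both ROCm services share the same asynchronous submit-and-poll contract. A minimal sketch against the schemas above; the base URLs are assumed defaults (the real clients read them from settings.yaml).

        # Minimal sketch: submit a generation request, then poll /status/{task_id}
        # until completion. Works for both the video (:8001) and audio (:8002) services.
        import asyncio
        import httpx

        async def generate_and_wait(base_url: str, payload: dict, poll_secs: float = 2.0) -> str:
            async with httpx.AsyncClient(timeout=60.0) as client:
                resp = await client.post(f"{base_url}/generate", json=payload)
                resp.raise_for_status()
                task_id = resp.json()["task_id"]
                while True:
                    status = (await client.get(f"{base_url}/status/{task_id}")).json()
                    if status["status"] == "complete":
                        return status["output_path"]  # absolute path to the artifact
                    if status["status"] == "failed":
                        raise RuntimeError(status.get("error_message") or "generation failed")
                    await asyncio.sleep(poll_secs)

        # Example: asyncio.run(generate_and_wait("http://localhost:8001", {"prompt": "Time-lapse of clouds"}))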

19.2 Internal API Contracts & Interfaces

  • BaseSkill.execute Return Dictionary JSON Schema:

        {
          "type": "object",
          "properties": {
            "status": {
              "type": "string",
              "enum": ["success", "error", "pending", "needs_confirmation", "needs_user_input"],
              "description": "Outcome of the skill execution."
            },
            "data": {
              "type": ["object", "string", "array", "null"],
              "description": "Skill-specific output data (e.g., file path, text, list, dict)."
            },
            "message": {
              "type": "string",
              "description": "User-friendly status or result message for UI display."
            },
            "error_details": {
              "type": ["string", "null"],
              "description": "Detailed error message/traceback if status is 'error'."
            },
            "task_id": {
              "type": ["string", "null"],
              "description": "Identifier for asynchronous tasks requiring polling (if status is 'pending')."
            },
            "confidence": {
              "type": ["number", "null"],
              "format": "float",
              "minimum": 0.0,
              "maximum": 1.0,
              "description": "Optional confidence score (0.0-1.0) for the result."
            },
            "required_confirmation": {
              "type": ["string", "null"],
              "description": "Message asking for user confirmation if status is 'needs_confirmation'."
            },
            "required_input_prompt": {
              "type": ["string", "null"],
              "description": "Message prompting user for additional input if status is 'needs_user_input'."
            }
          },
          "required": ["status", "message"]
        }
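    • Example (illustrative): result dictionaries conforming to the schema above; the values are hypothetical.

          # A successful synchronous result and a pending asynchronous one.
          success_result = {
              "status": "success",
              "data": {"file_path": "D:/outputs/image_001.png"},  # hypothetical path
              "message": "Image generated successfully.",
              "confidence": 0.92,
          }
          pending_result = {
              "status": "pending",
              "message": "Video generation queued.",
              "task_id": "0f8c3a1e-hypothetical-uuid",  # polled via AsyncTaskManager
          }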
  • Agent Core (AC_CommInterface) <-> UI (UI_Controller) Communication Protocol (Qt Signals/Slots):
    Signal names and argument types (Python type hints) are enumerated below. Signals emitted from agent-core threads or asyncio tasks to the main UI thread must be connected with Qt.QueuedConnection to ensure thread safety; connections within the UI thread may be Direct. A minimal wiring sketch follows the slot list below.
    • Signals emitted by AC_CommInterface:
      • agentStateChanged = Signal(str, str) # state, emotion
      • newLogMessage = Signal(str) # formatted_log_string
      • reasoningStep = Signal(str) # step_text
      • modelSelectionUpdate = Signal(str, str) # model_name, task_category
      • finalAgentResponse = Signal(str, int) # response_text, message_id
      • systemVitalsUpdate = Signal(float, float, float) # cpu, ram, vram
      • proactiveSuggestion = Signal(str, str, str) # suggestion_id, title, message
      • taskStatusUpdate = Signal(str, str, object, str) # task_id, status, progress (float or None), message
      • taskCompleted = Signal(str, dict) # task_id, result_dict (BaseSkill format)
      • pluginListUpdated = Signal(list) # List of plugin info dicts
      • availableLanguages = Signal(list) # List of language codes ["en", "fr"]
      • visualContextUpdate = Signal(dict) # Dictionary containing image path/data, OCR text, object list
      • requestUserInput = Signal(str, str) # prompt_id, prompt_message
      • requestUserConfirmation = Signal(str, str) # confirmation_id, confirmation_message
    • Slots in AC_CommInterface (connected from UI_Controller signals):
      • @Slot(str) process_user_message
      • @Slot(str, object) set_user_preference
      • @Slot(bool) set_proactive_enabled
      • @Slot(bool) set_privilege_mode
      • @Slot(str, dict) execute_automation_task
      • @Slot(str, dict) execute_workflow
      • @Slot(dict) save_automation_task
      • @Slot(dict) save_workflow
      • @Slot(str, str) record_gui_macro_start
      • @Slot() record_gui_macro_stop
      • @Slot(str, str) record_web_task_start
      • @Slot() record_web_task_stop
      • @Slot(int, bool, str) store_user_feedback
      • @Slot() request_plugin_list
      • @Slot(str, str, dict) execute_plugin_action
      • @Slot(str) provide_user_input # Response to requestUserInput signal
      • @Slot(str, bool) provide_user_confirmation # Response to requestUserConfirmation
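    • Wiring Sketch (illustrative): the list above uses PySide6-style Signal/Slot names; under PyQt6 the equivalents are pyqtSignal/pyqtSlot. Queued delivery requires a running Qt event loop.

          from PySide6.QtCore import QObject, Qt, Signal, Slot

          class AC_CommInterface(QObject):
              agentStateChanged = Signal(str, str)   # state, emotion
              finalAgentResponse = Signal(str, int)  # response_text, message_id

              @Slot(str)
              def process_user_message(self, text: str) -> None:
                  # ... dispatch into the agent core's asyncio loop ...
                  self.agentStateChanged.emit("thinking", "curious")

          class UI_Controller(QObject):
              @Slot(str, str)
              def on_agent_state_changed(self, state: str, emotion: str) -> None:
                  print(f"Agent is {state} ({emotion})")

          comm, ui = AC_CommInterface(), UI_Controller()
          # QueuedConnection marshals emissions from core threads onto the UI thread.
          comm.agentStateChanged.connect(
              ui.on_agent_state_changed, Qt.ConnectionType.QueuedConnection
          )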
  • Conductor <-> Sub-Agent Message JSON Schema (via asyncio.Queue):

        {
          "type": "object",
          "properties": {
            "message_id": {"type": "string", "format": "uuid", "description": "Unique ID for this specific message."},
            "conversation_id": {"type": "string", "format": "uuid", "description": "ID for the overall complex task."},
            "task_id": {"type": "string", "description": "Unique ID for this sub-task execution instance."},
            "type": {"type": "string", "enum": ["request", "response", "status_update", "error"], "description": "Message type."},
            "source": {"type": "string", "description": "Sender ('ConductorAgent' or Sub-Agent name like 'CodeMasterAgent')."},
            "target": {"type": "string", "description": "Recipient ('ConductorAgent' or Sub-Agent name)."},
            "payload": {
              "type": "object",
              "description": "Task-specific data or results. Required properties depend on 'type'.",
              "properties": {
                "goal": {"type": "string", "description": "Specific instruction for the sub-agent (in request)."},
                "inputs": {"type": "object", "description": "Inputs for the sub-task (in request)."},
                "context": {"type": "object", "description": "Relevant context from Conductor (memory snippets, previous steps)."},
                "model_suggestion": {"type": "string", "description": "Recommended execution model from Conductor."},
                "temperature": {"type": "number", "format": "float"},
                "status": {"type": "string", "enum": ["success", "error", "processing", "pending_confirmation"]},
                "result_data": {"type": ["object", "string", "array", "null"], "description": "Output data from the sub-task (in response)."},
                "error_message": {"type": "string", "description": "Error details if status is 'error'."},
                "progress": {"type": "number", "format": "float", "description": "Progress update (0.0-1.0) for long tasks (in status_update)."}
              }
            },
            "timestamp": {"type": "string", "format": "date-time"}
          },
          "required": ["message_id", "conversation_id", "task_id", "type", "source", "target", "payload", "timestamp"]
        }
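    • Example message (illustrative): a Conductor-to-sub-agent request conforming to the schema above; the IDs and model name are hypothetical.

          request_message = {
              "message_id": "3f1a9b2c-hypothetical-uuid",
              "conversation_id": "9c2b7d4e-hypothetical-uuid",
              "task_id": "task-001",
              "type": "request",
              "source": "ConductorAgent",
              "target": "CodeMasterAgent",
              "payload": {
                  "goal": "Write a Python function that reverses a string.",
                  "inputs": {},
                  "context": {},
                  "model_suggestion": "coder-model-id",  # hypothetical; chosen by ModelSelector
                  "temperature": 0.2,
              },
              "timestamp": "2023-10-27T10:30:00Z",
          }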
  • Plugin API Interface Definition (plugins/):
    • Each plugin resides in its own subdirectory within plugins/.
    • Must contain __init__.py with a register_plugin() function returning a dictionary:

          def register_plugin():
              return {
                  "name": "string (Unique Plugin Name)",
                  "version": "string (SemVer)",
                  "author": "string",
                  "description": "string",
                  "skill_class": MyPluginSkill,  # The class implementing BaseSkill
                  "required_permissions": ["file_read", "network_access"]  # Optional list of permissions needed
              }
    • Must contain a skill class inheriting skills.base_skill.BaseSkill.
    • PluginManager loads, validates, and registers the skill_class. Execution permissions checked by SecurityManager based on plugin declaration and potentially user configuration.
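    • Discovery Sketch (illustrative): one way the PluginManager could enumerate plugin packages, assuming plugins/ is importable as a package and each plugin exposes register_plugin() as above. Validation and SecurityManager permission checks are omitted.

          import importlib
          import pkgutil

          def discover_plugins(package_name: str = "plugins") -> dict:
              registry = {}
              package = importlib.import_module(package_name)  # plugins/ must be a package
              for mod_info in pkgutil.iter_modules(package.__path__):
                  module = importlib.import_module(f"{package_name}.{mod_info.name}")
                  register = getattr(module, "register_plugin", None)
                  if callable(register):
                      info = register()  # {"name": ..., "skill_class": ..., ...}
                      registry[info["name"]] = info
              return registry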

20. Tutorials / Examples (Outline)

This section outlines key tutorials to guide both end-users and developers in setting up, using, and extending the LegacyOracle Super Agent. Each tutorial should provide clear, step-by-step instructions with relevant commands and UI actions.

20.1 User Tutorial: Setup & First Run Guide

  • Goal: Guide a new user through the initial installation and configuration process.
  • Audience: End-users.
  • Prerequisites: Hardware meets requirements (Section 6.1), OS is Windows 11 Pro (Section 6.2), Admin rights available.
  • Steps:
    1. Download & Extract: Download the LegacyOracle release package (e.g., .zip or installer) and extract/install it to the desired location (e.g., D:\legacy_oracle).
    2. Install Core Dependencies (if not bundled): Explain how to install Python 3.10+, Git, and Docker Desktop if they aren’t already present (referencing Section 6.3).
    3. Install GPU Drivers/ROCm: Emphasize the critical importance of installing the correct AMD Adrenalin drivers and the specific ROCm for Windows version required by the external services (referencing Section 6.4). Link to official AMD resources. Include the rocminfo verification step.
    4. Setup External Services (Simplified Options):
      • Provide clear, step-by-step instructions (or link to detailed guides) for setting up LM Studio (Section 6.5), Automatic1111 (Section 6.6), and optionally ROCm Video/Audio Services (Section 6.7 – mention Conda environments). Explain how to start these services and note their default URLs (http://localhost:1234, http://localhost:7860, etc.). Emphasize these need to be running for the agent to use their features.
      • Explain setup for Amuse (Section 6.9) and ByteCraft (Section 6.8) if the user intends to use these specific tools.
    5. Run LegacyOracle for the First Time: Navigate to the legacy_oracle directory in Command Prompt and run python main.py (or use a provided shortcut/installer).
    6. First-Run Setup Wizard (UI_SetupWizard): Walk through each page of the wizard:
      • Welcome screen.
      • Environment Check results (explain any warnings).
      • Verify/Enter Service URLs (LM Studio, A1111, etc.) – Use “Test Connection” buttons.
      • Verify/Set Key File Paths (Outputs, DB, External Tools).
      • Set Initial Preferences (Language, Theme, Proactive Opt-in).
      • Summary & Finish. Explain that settings are saved to config/settings.yaml and config/user_config.yaml.
    7. Basic Interaction: Guide the user to type “Hello” in the chat panel and observe the agent’s response and face animation.
    8. Troubleshooting: Link to common setup issues (service not running, incorrect paths, driver problems).

20.2 User Tutorial: Basic Chat & Core Skills Usage

  • Goal: Teach users how to interact with the agent via chat for common tasks.
  • Audience: End-users.
  • Prerequisites: Agent running after first setup.
  • Steps:
    1. Asking General Questions: Type natural language questions (e.g., “What’s the weather like?”, “Explain quantum computing.”) into the ChatPanel. Observe the reasoning stream ([Reasoning: …]) and the final answer.
    2. Using Web Search: Ask questions requiring current information (e.g., “What are the latest AI news headlines?”). Explain that the agent might display [Reasoning: Using WebSearchSkill…] and provide summarized results with sources.
    3. File System Interaction: Give commands like:
      • “List files in my Downloads folder.” (/run_skill FileSystemSkill list_directory --path "C:\Users\YourUser\Downloads")
      • “Summarize the document ‘report.txt’ on my Desktop.” (/run_skill NLQuerySkill summarize_file --path "C:\Users\YourUser\Desktop\report.txt")
      • “Create a new folder named ‘Projects’ on my Desktop.” (/run_skill FileSystemSkill create_directory --path "…")
    4. System Monitoring: Ask “What’s my current CPU usage?” or “How much RAM is free?”. Show how the StatusPanel also displays this. (/run_skill SystemMonitorSkill get_resource_stats)
    5. Knowledge Acquisition: Ask about a topic multiple times. Explain how the agent might use its KnowledgeBase for faster subsequent answers.
    6. Using Feedback: Explain the “👍” / “👎” buttons next to agent responses for improving future interactions.

20.3 User Tutorial: Generating Images (A1111 & Amuse)

  • Goal: Guide users on using the integrated image generation capabilities.
  • Audience: End-users (potentially creatives).
  • Prerequisites: Agent running, A1111 service running, Amuse software/implementation set up correctly.
  • Steps:
    1. Navigate to Generative Tab: Click the “Generative” tab in the main UI, then the “Image” sub-tab.
    2. Select Backend: Choose “A1111” or “Amuse” from a backend selection dropdown within the Image tab.
    3. Enter Parameters (A1111 Example):
      • Prompt: “Photorealistic portrait of an astronaut on Mars, cinematic lighting”
      • Negative Prompt: “cartoon, drawing, illustration, text, watermark”
      • Adjust sliders/spinboxes for Steps, CFG Scale, Width, Height.
      • Select Sampler from dropdown.
      • Enter Seed (-1 for random).
    4. Generate: Click the “Generate Image” button.
    5. Monitor Progress: Observe the status label/progress bar within the tab and potentially the main StatusPanel. Check A1111 console if needed.
    6. View Result: Once complete, the generated image appears in the Media Viewer area within the tab and/or the main ViewWindowPanel. Note the output file path displayed.
    7. Using Amuse: Repeat steps 2-6, selecting the “Amuse” backend and providing relevant parameters (Prompt, Resolution, Seed).
    8. Chat Commands: Demonstrate triggering generation via chat:
      • /generate_image_a1111 prompt="Synthwave sunset" --steps 30 --cfg 7.5
      • /generate_image_amuse prompt="Impressionist flower garden" --resolution 1024x768

20.4 User Tutorial: Generating Video & Audio (ROCm Services)

  • Goal: Guide users on using the video and audio generation features.
  • Audience: End-users.
  • Prerequisites: Agent running, ROCm Video & Audio FastAPI services running correctly.
  • Steps:
    1. Navigate to Tabs: Go to Generative Tab -> Video or Generative Tab -> Audio.
    2. Enter Parameters (Video Example):
      • Prompt: “Drone footage flying over a futuristic city”
      • Duration (seconds): 10
      • Style Hint: “Cinematic, neon lights”
      • Adjust Seed, Width/Height, Steps if available.
    3. Generate: Click “Generate Video”.
    4. Monitor & View: Observe progress status. Once complete, the video path is shown, and the video loads in the embedded QMediaPlayer. Use Play/Pause/Seek controls.
    5. Audio Generation: Repeat steps 2-4 in the Audio tab, providing a prompt (e.g., “Calm ambient background music for focus”) and duration. Play the resulting .wav file.
    6. Chat Commands: Demonstrate chat triggers:
      • /generate_video prompt="Time-lapse of clouds" --duration 15
      • /generate_audio prompt="Epic orchestral battle theme" --duration 60

20.5 User Tutorial: Creating & Running GUI Macros (Record & Execute)

  • Goal: Teach users how to automate repetitive GUI tasks without coding.
  • Audience: Non-technical users, Power users.
  • Prerequisites: Agent running, target application (e.g., Notepad, Photoshop) installed.
  • Steps:
    1. Open Task Panel: Navigate to the “Tasks” or “Automation” tab in the AI Studio GUI.
    2. Start Recording: Click “Record GUI Macro”.
    3. Configure Recording:
      • Enter Task Name (e.g., “NotepadHelloWorld”).
      • Click “Browse” and select the target application executable (e.g., C:\Windows\System32\notepad.exe).
      • Click “Start Recording”. Acknowledge the informational pop-up (“Recording started…”).
    4. Perform Actions: Switch to the target application (Notepad will launch). Perform the sequence:
      • Type “Hello, LegacyOracle!”.
      • Click File -> Save As…
      • Type a filename (e.g., D:\temp\greeting.txt). Note: The recorder should ideally detect this path input.
      • Click Save.
      • Click File -> Exit.
    5. Stop Recording: Press the configured stop hotkey (e.g., Ctrl+Shift+R; hotkey support needs implementation) or switch back to the LegacyOracle UI and click “Stop Recording”.
    6. Define Inputs/Outputs: A dialog appears showing recorded steps.
      • Identify the “Type” step where D:\temp\greeting.txt was entered. Click “Map to Variable”.
      • Define Input Variable: Name=output_filepath, Description=”Full path where the file should be saved”. Default Value=D:\temp\greeting.txt.
      • Define Outputs: (None needed for this simple example, but could define {"file_saved": true}).
      • Define Save Dir: (Not needed if not watching for output file).
      • Click “Save Task”.
    7. Execute Macro:
      • Select “NotepadHelloWorld” from the Task Selector dropdown in the Task Management Panel.
      • Click “Execute Task”.
      • (Optional) An input dialog might pop up asking for the output_filepath if no default was set or if designed to prompt. Enter a new path (e.g., D:\temp\another.txt).
      • Observe Notepad launching and the recorded actions being replayed automatically. Verify the new file is created.
    8. Explain Limitations: Briefly mention that coordinate-based macros can be brittle if the UI changes.

20.6 User Tutorial: Creating & Running Web Automations (Record/Define & Execute)

  • Goal: Teach users how to automate website interactions.
  • Audience: Users needing web task automation.
  • Prerequisites: Agent running, target website accessible, Playwright setup complete (Browsers installed).
  • Steps (Guided Definition Method – Assuming no extension initially):
    1. Open Task Panel: Go to Automation/Tasks tab.
    2. Start Definition: Click “Record Web Task” (or “Define Web Task”).
    3. Configure:
      • Task Name: “GoogleSearchTopic”
      • Start URL: https://www.google.com
      • Define Input: search_query (Description: “Topic to search on Google”)
      • Define Output: first_result_text (Description: “Text content of the first search result link”)
      • Define Credential Key: (None needed for Google search)
    4. Add Steps (Guided UI):
      • Click “Add Step”: Type = type, Selector = textarea[name=q] (Find using browser dev tools), Value = {search_query} (use input variable). Delay = 0.5. Add Step.
      • Click “Add Step”: Type = keypress, Selector = textarea[name=q], Value = Enter. Delay = 2.0 (Wait for results). Add Step.
      • Click “Add Step”: Type = scrape, Selector = #search .g a h3 (Selector for first result title – verify!), Attribute = textContent. Map Output to first_result_text. Delay = 0.5. Add Step.
    5. Save Task: Click “Save Web Task”.
    6. Execute Task:
      • Select “GoogleSearchTopic” from the Task Selector dropdown.
      • Click “Execute Task”.
      • An input dialog prompts for search_query. Enter “LegacyOracle AI”. Click OK.
      • Observe a browser launching (via Playwright), navigating, typing, searching, and closing.
      • Check the task output label in the UI for the scraped text of the first result.

20.7 User Tutorial: Creating & Running Complex Workflows

  • Goal: Teach users how to chain multiple tasks (GUI, Web, Skills) together.
  • Audience: Power users.
  • Prerequisites: Agent running, relevant tasks (“GoogleSearchTopic”, “SaveToNotepad” – a GUI macro) already defined.
  • Steps:
    1. Open Workflow Editor: Navigate to Workflow tab/panel in UI. Click “Create New Workflow”.
    2. Name Workflow: “SearchAndSave”.
    3. Add Step 1 (Web Search):
      • Click “Add Step”. Select Type: Web Task. Select Task: GoogleSearchTopic.
      • Define Input Mapping: search_query -> Workflow Input Variable topic.
      • Define Output Variable Name: step1_output (will contain {"first_result_text": "…"}).
    4. Add Step 2 (Save to Notepad):
      • Click “Add Step”. Select Type: GUI Task. Select Task: SaveToNotepad (Assumes this macro types text into an open Notepad and saves).
      • Define Input Mapping: text_to_save -> {step1_output.first_result_text} (Accessing output from previous step), output_filepath -> Workflow Input Variable save_location.
    5. Define Workflow Inputs: Specify topic and save_location as required inputs for the entire workflow.
    6. Save Workflow: Click “Save Workflow”.
    7. Execute Workflow:
      • Select “SearchAndSave” from a Workflow execution dropdown/list.
      • Click “Execute Workflow”.
      • UI prompts for topic (“AI Agents”) and save_location (“D:\temp\ai_results.txt”).
      • Observe browser automating search, then Notepad automating saving the first result text to the specified file.
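
For developers, the saved workflow is conceptually a declarative structure like the sketch below. This shape is illustrative only; the authoritative persisted schema is defined alongside tasks.json/web_tasks.json in Section 10.1.

      # Hypothetical in-memory shape of the "SearchAndSave" workflow.
      search_and_save = {
          "name": "SearchAndSave",
          "inputs": ["topic", "save_location"],
          "steps": [
              {
                  "type": "web_task",
                  "task": "GoogleSearchTopic",
                  "input_mapping": {"search_query": "{topic}"},
                  "output_var": "step1_output",
              },
              {
                  "type": "gui_task",
                  "task": "SaveToNotepad",
                  "input_mapping": {
                      "text_to_save": "{step1_output.first_result_text}",
                      "output_filepath": "{save_location}",
                  },
              },
          ],
      }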

20.8 User Tutorial: Configuring Proactivity & User Preferences

  • Goal: Show users how to customize agent behavior.
  • Audience: All users.
  • Prerequisites: Agent running.
  • Steps:
    1. Open Settings: Navigate to the “Settings” tab/panel in the AI Studio GUI.
    2. Proactive Features: Locate the “Proactive Settings” group. Toggle the master “Enable Proactive Features” checkbox. Adjust the “Proactive Suggestion Frequency (hours)” spinbox. Disable specific suggestion types if listed (future feature).
    3. Model Preference: Find “Agent Settings” group. Select “Speed”, “Accuracy”, or “Balance” from the “Model Selection Preference” dropdown.
    4. Theme: Find “UI Settings”. Select “Light”, “Dark”, or “Custom” from the “Theme” dropdown.
    5. Language: Find “Localization”. Select desired language from the “Language” dropdown. (Requires restart or dynamic UI reload).
    6. Privilege Mode: Find “Security”. Toggle “Enable Privileged Mode”. Explain that this requires restarting the agent with Admin rights and bypasses most per-action UAC prompts (use with caution).
    7. Save Settings: Click “Save Settings” or note that changes might apply automatically.

20.9 Developer Tutorial: Building a Simple Plugin

  • Goal: Guide developers on extending the agent with custom skills via the plugin system.
  • Audience: Developers.
  • Prerequisites: Core Agent setup complete, understanding of Python and BaseSkill interface.
  • Steps:
    1. Create Plugin Directory: Create a folder inside legacy_oracle/plugins/, e.g., plugins/MyEchoPlugin.
    2. Create __init__.py: Inside plugins/MyEchoPlugin/, create __init__.py with the registration function:

          # plugins/MyEchoPlugin/__init__.py
          from .skill import MyEchoSkill  # Import the skill class

          PLUGIN_NAME = "MyEcho"
          PLUGIN_VERSION = "1.0"
          PLUGIN_AUTHOR = "Developer Name"
          PLUGIN_DESCRIPTION = "A simple plugin that echoes input."

          def register_plugin():
              """Called by PluginManager to register the skill."""
              return {
                  "name": PLUGIN_NAME,
                  "version": PLUGIN_VERSION,
                  "author": PLUGIN_AUTHOR,
                  "description": PLUGIN_DESCRIPTION,
                  "skill_class": MyEchoSkill  # The actual Skill class
              }
    3. Create skill.py: Inside plugins/MyEchoPlugin/, create skill.py implementing the skill logic:

          # plugins/MyEchoPlugin/skill.py
          from typing import Dict
          import asyncio

          # IMPORTANT: Adjust import path based on actual project structure
          from core.skills.base_skill import BaseSkill  # Assuming BaseSkill is accessible

          class MyEchoSkill(BaseSkill):
              async def execute(self, inputs: Dict, **kwargs) -> Dict:
                  """Echoes back the provided 'text' input."""
                  text_to_echo = inputs.get("text", "Nothing provided to echo!")
                  await asyncio.sleep(0.1)  # Simulate some async work
                  result_data = {"echoed_text": text_to_echo}
                  return self.standard_response(
                      status="success",
                      data=result_data,
                      message="Echoed successfully!"
                  )
    4. Restart Agent: Stop and restart the main LegacyOracle agent.
    5. Verify Loading: Check the agent logs or the PluginManagerUI to confirm “MyEchoPlugin” was loaded successfully.
    6. Trigger Plugin: Use a chat command designed to trigger plugins (e.g., /run_plugin MyEcho text="Hello World") or ensure SkillDispatcher routes a specific intent to it. Observe the echoed response.

20.10 User/Developer Tutorial: Troubleshooting Common Issues

  • Goal: Provide guidance on diagnosing and resolving common problems.
  • Audience: All users, Developers.
  • Issues & Solutions:
    • Service Not Running (A1111, ROCm, LM Studio):
      • Symptom: Generative tasks fail immediately, connection errors in log.
      • Solution: Manually check if the service is running (check terminal window, Docker Desktop, services.msc). Restart the required service. Verify URL in settings.yaml. Check service logs for specific errors. Use OpsManager health checks via UI (if implemented).
    • Model Load Failure (LM Studio):
      • Symptom: Agent reports “Cannot load model X”, tasks needing that model fail.
      • Solution: Open LM Studio UI. Check if model file exists and is not corrupted. Ensure sufficient VRAM is free (unload other models in LM Studio UI). Check LM Studio server logs. Verify model ID in model_config.py matches LM Studio.
    • Permission Denied (Files/OS):
      • Symptom: Skills report errors accessing files or running commands.
      • Solution: Check file/folder permissions for the user running the agent. For elevated actions, ensure the user confirmed the UAC prompt. Check SecurityManager ACLs. Try running agent process as Administrator (use privileged mode toggle cautiously).
    • Web Automation Fails:
      • Symptom: Browser launches but doesn’t complete actions, errors like “Element not found”.
      • Solution: Website UI likely changed. Re-record the web task or manually update selectors in web_tasks.json/DB. Ensure correct WebDriver version is installed (playwright install). Increase delays in task definition steps. Check browser console for errors.
    • GUI Macro Fails:
      • Symptom: Mouse clicks wrong place, types wrong text.
      • Solution: Screen resolution or application layout likely changed. Re-record the macro. Increase delays between steps. Consider using DynamicAutomationSkill with image recognition for more robustness (future feature).
    • Dependency Errors on Startup:
      • Symptom: Agent fails to start with ImportError or similar.
      • Solution: Ensure correct Python environment is activated. Run pip install -r requirements_agent.txt again. Check for conflicts using pipdeptree. Ensure ROCm/A1111/Amuse dedicated environments are set up correctly if issues originate there. Run DependencyManager check.
    • Check Logs: Always check logs/agent.log and the UI TerminalPanel for detailed error messages.


21. Final Handover Statement

This v6.6.2 specification provides the definitive, comprehensive, and self-contained blueprint for the LegacyOracle Super Agent project. It integrates all architectural decisions, feature requirements, specific implementation details (including configuration files, schemas, APIs, setup commands, and the finalized skills matrix), operational considerations, and development guidance. It is intended to provide the development team with a complete and actionable guide for building this state-of-the-art, Windows-native AI assistant. Success requires meticulous attention to these specifications, rigorous testing, and adherence to the security and performance principles outlined.

(Total Estimated Effort: ~140-200+ Person-Days, highlighting significant complexity)


16. Developer Integration Steps (Checklist)

  1. ✅ Review this full v6.6.2 Specification Document.
  2. ✅ Set up the complete Development Environment (Section 6).
  3. ✅ Clone repository & verify project structure (Section 7).
  4. ✅ Understand the Core Agentic Flow (Section 8).
  5. ✅ Implement/Update Core Agent components per specs (Section 9.2).
  6. ✅ Implement/Update UI components per specs (Section 9.1).
  7. ✅ Implement/Update Skills and Backend Clients per specs (Sections 9.3, 9.6).
  8. ✅ Implement/Update External Services (FastAPI wrappers) per specs (Section 9.7).
  9. ✅ Implement Automation Frameworks (GUI/Web) per specs (Sections 9.8, 9.9).
  10. ✅ Configure settings.yaml and model_config.py accurately (Section 10.1).
  11. ✅ Implement Database schemas and interactions (Section 10.2).
  12. ✅ Implement Testing Strategy progressively (Section 13).
  13. ✅ Follow Phased Development Plan (Section 15).
  14. ✅ Integrate AMD GAIA optimizations where specified (Section 9.2, 9.6).

17. Future Considerations (Post v6.6.2 – Consolidated List)

  • Advanced RL Algorithms & Training Pipelines
  • Federated Learning / Privacy-Preserving Collaboration
  • Dynamic Plugin Marketplace & Sandboxing
  • Cloud Synchronization Options (User Opt-in)
  • Advanced Model Fine-Tuning Workflows (Local)
  • Holographic / AR / VR Interface Concepts
  • Webcam-based Face/Emotion Tracking for User State
  • High-Quality Local STT/TTS Integration (Whisper.cpp, Piper)
  • More Sophisticated UI Automation (Handling complex dynamic UIs)
  • Advanced VectorDB Integration & Reasoning over Knowledge Base

18. Glossary (Comprehensive Definitions)

(Define ALL terms introduced:

  • ACL (Access Control List): A list of permissions attached to an object; used here for defining which agent actions require elevation.
  • Agent Core: The central orchestrating part of LegacyOracle, implementing OpenManus principles.
  • Agent Orchestrator (AgentOrchestrator): The main class coordinating the agent’s overall execution flow.
  • A1111 (Automatic1111 WebUI): A popular web interface for Stable Diffusion image generation, used here as an external service via its API.
  • Amuse: A specific image generation software targeted for integration.
  • Amuse Implementation: The separate Python scripts and environment used to control the main Amuse software.
  • AMD GAIA: AMD’s ecosystem/toolkit for optimizing AI on AMD hardware.
  • Animated Face (AnimatedFaceWidget): The code-drawn 2D face in the UI reflecting agent state/emotion.
  • API (Application Programming Interface): A contract allowing software components to communicate (e.g., REST APIs for web services).
  • APScheduler: Python library for scheduling background tasks.
  • Architecture Style: The high-level design pattern (Hybrid: Modular Core + SOA + Multi-Agent + OS Integration + Plugins + Automation).
  • Async Task Manager (AsyncTaskManager): Core component managing background asyncio tasks, especially service polling.
  • Asynchronous Programming (asyncio): A programming paradigm allowing concurrent execution of I/O-bound tasks without blocking.
  • Autonomy: The agent’s ability to perform tasks (scheduling, self-update, reflection) without direct user command.
  • Backend Client (clients/): Python wrappers abstracting communication with external services, APIs, or CLIs.
  • ByteCraft: A specific tool targeted for SWF generation via its CLI.
  • Cache Strategy: The method used for storing and retrieving previously computed results (e.g., LLM responses) to save time/resources. Managed by ModelOptimizer.
  • Conda: A package and environment management system, used here primarily for isolating complex ROCm dependencies.
  • Conductor Agent (ConductorAgent): Core component responsible for orchestrating specialized Sub-Agents for complex tasks.
  • Configuration Files (config/): YAML and Python files storing settings and static data (settings.yaml, user_config.yaml, model_config.py, tasks.json).
  • Config Manager (ConfigManager): Core component responsible for loading, validating, and providing access to settings.yaml.
  • Consolidated Specification: This document, combining all details into one.
  • Continual Learning: AI techniques allowing models to learn over time without forgetting previous knowledge. Part of LearningAgent.
  • AI Studio GUI: The name for LegacyOracle’s advanced graphical user interface.
  • Cross-Cutting Concerns: Aspects affecting multiple components (Security, Performance, Errors, etc.).
  • Data Storage (data/): Location for persistent data (SQLite DB, cache, logs).
  • Dependency Manager (DependencyManager): Core component for checking Python package dependencies.
  • Deployment: The process of setting up and running the LegacyOracle system.
  • DirectML: Microsoft DirectX 12 API for hardware-accelerated machine learning on Windows, used by A1111 on AMD GPUs.
  • Docker: Containerization platform used for sandboxing (CodeInterpreterSkill) and optionally deploying services.
  • Docker Compose: Tool for defining and running multi-container Docker applications.
  • Dynamic Model Selection: The process of choosing the best LLM at runtime based on task requirements and resources, implemented by ModelSelector.
  • E2E (End-to-End) Testing: Testing complete workflows from user input to final output.
  • Environment Setup: The mandatory process of installing all prerequisites.
  • Episodic Memory: Storing records of past interactions and events (in SQLite via MemoryManager).
  • Error Handler (ErrorHandler): Central component for logging and reporting errors.
  • Evolution History: Summary of the project’s architectural development stages.
  • External Services: Processes running separately from the core agent (A1111, ROCm Services, LM Studio).
  • FastAPI: Python web framework used to wrap ROCm generation scripts as services.
  • Few-Shot Learning (FewShotSkill): AI technique enabling models to learn new tasks from very few examples.
  • Flow Automation Framework (WorkflowEngine): Core component for executing multi-step automation sequences combining various skills/tasks.
  • GAIA Toolbox (gaia-toolbox): AMD’s Python tools for optimizing models for AMD hardware. Used by ModelOptimizer.
  • Generative Services: External services dedicated to Image, Video, Audio generation.
  • Globalization (LocalizationManager, gettext): Support for multiple languages and locales.
  • Git: Version control system used for code management and potentially self-updating.
  • GUI Macro: An automated sequence of graphical user interface interactions (clicks, typing). Managed by the Task Automation Framework.
  • Handover Specification: This document, intended for developers.
  • Health Checks: Periodic checks performed by OpsManager to verify service availability.
  • Hybrid Architecture: The combination of multiple architectural styles.
  • httpx: Asynchronous HTTP client library used by backend clients.
  • i18n/l10n: Internationalization and Localization.
  • Integration Testing: Testing interactions between different components.
  • Keyring: Python library for accessing native OS credential stores (like Windows Credential Manager). Used by CredentialManagerClient.
  • Knowledge Acquisition (KnowledgeAcqSkill): Skill for finding, processing, and storing external information.
  • Knowledge Base: Structured storage (SQLite/VectorDB) for learned information.
  • LEGOPROMIND: Conceptual framework for the agent’s self-improvement capabilities.
  • litellm: Library providing a unified interface to various LLM APIs, including LM Studio’s OpenAI-compatible endpoint.
  • Live Chat Overlay (UI_OverlayWidget): A floating, transparent UI element for proactive suggestions/interactions.
  • LLM (Large Language Model): The core AI models providing reasoning and language capabilities (run via LM Studio).
  • LM Studio: Desktop application for running local LLMs and providing an OpenAI-compatible API.
  • Load Balancing: Distributing tasks across available resources (models/services) to prevent overload. Handled by OpsManager.
  • Localization Manager (LocalizationManager): Core component managing language translation using gettext.
  • Logic Skill (LogicSkill): Skill dedicated to solving logical reasoning problems.
  • Maintainability: Ease of modifying, extending, and debugging the system.
  • Manus-like Skill (OSControlSkill): Refers to the goal of deep, sophisticated OS control and interaction.
  • Mixture-of-Agents (MoA): Technique using multiple models/agents to improve output quality (Cross-Checking, Refinement).
  • Model Category (MODEL_CATEGORIES): Grouping of LLMs suitable for specific task types (e.g., “coding”, “logic”).
  • Model Optimizer (ModelOptimizer): Core component managing caching, quantization, and pruning.
  • Model Selection Preference: User setting (“speed” vs “accuracy”) influencing ModelSelector tie-breaking.
  • Model Selector (ModelSelector): Core component implementing dynamic LLM selection.
  • Modular Core (OpenManus): Architecture principle emphasizing self-contained, reusable components (Skills, Managers).
  • Multimodal: Ability to process multiple types of data (text, image, potentially audio).
  • Multi-Agent System: Architecture using a central orchestrator (ConductorAgent) and specialized SubAgents.
  • Native Integration: Direct interaction with operating system APIs and features.
  • Native Notifications: Using Windows Action Center notifications (win11toast).
  • NSSM (Non-Sucking Service Manager): Utility for running applications as Windows services.
  • OCR (Optical Character Recognition): Extracting text from images (VisionSkill using OCR_Client).
  • ONNX Runtime: Inference engine for running optimized ONNX models (used by ObjectDetect_Client, potentially with GAIA EPs).
  • ONNX Runtime Execution Provider: Backend used by ONNX Runtime for hardware acceleration (e.g., DmlExecutionProvider, ROCmExecutionProvider, CPUExecutionProvider).
  • OpenManus: Conceptual framework emphasizing modular skills and agent orchestration (implemented by Agent Core).
  • Operations Manager (OpsManager): Core component monitoring service health and managing recovery/load balancing.
  • Orchestrating Model: The initial LLM (phi-4-mini default) used to analyze user requests and plan execution.
  • Perception (“View Window”): The agent’s ability to “see” the screen (VisionSkill) and the corresponding UI panel (ViewWindowPanel).
  • Performance Optimization: Techniques to improve speed and reduce resource usage.
  • Persona: Adaptable communication style/behavior of the agent.
  • Phased Development Plan: Staged approach to implementing the project.
  • Plugin (PluginManager, plugins/): Mechanism for extending agent functionality with third-party code.
  • PowerShell: Windows command-line shell and scripting language, used for OS interaction.
  • Proactive Manager (ProactiveManager): Core component analyzing user activity and triggering suggestions/automations.
  • Project Structure: The organization of files and directories in the codebase.
  • PyAutoGUI / Pynput / Keyboard: Python libraries for simulating mouse and keyboard input (GUI Automation, Macro Recording).
  • PyQt6 / PySide6: Python bindings for the Qt UI framework used for AI Studio GUI.
  • Pytest: Python testing framework.
  • Qualitative Ratings: Using terms like “High”, “Medium”, “Low” in the SKILLS_MATRIX.
  • Quantization: Model optimization technique reducing precision (e.g., to INT8) to decrease size and potentially speed up inference. Handled by ModelOptimizer.
  • Reasoning Controller (ReasoningController): Core component selecting and managing Reasoning Strategies.
  • Reasoning Strategy (reasoning_models/): Pluggable modules defining different approaches to problem-solving/thinking (Code, General, Visual, Logic, etc.). Includes logic for streaming intermediate steps.
  • Reflection: The agent’s process of analyzing its past performance to identify areas for improvement (AutonomousTasksSkill, SelfCritiqueReasoning).
  • Reinforcement Learning (RLAgent): AI training paradigm where agents learn through trial-and-error based on rewards. Used here potentially to optimize model selection or workflows.
  • Resource Governor (ResourceGovernor): Core component monitoring system resources (CPU, RAM, VRAM, Power) and throttling agent activities.
  • ROCm (Radeon Open Compute): AMD’s open platform for GPU computing, used here for accelerating Video/Audio generation services.
  • Roo Code: Specific VS Code extension targeted for integration via RooCodeSkill and VSC_Client.
  • Ruffle: Flash Player emulator used in MediaPlayers widget to display SWF files generated by ByteCraft.
  • Sandboxing: Running code (esp. from CodeInterpreterSkill or potentially ToolsmithSkill) in an isolated environment (Docker preferred) for security.
  • Scheduler (Scheduler, APScheduler): Manages execution of background/scheduled tasks.
  • Schema: Formal definition of data structure (e.g., JSON Schema, SQLite Schema).
  • Security Manager (SecurityManager): Core component handling privilege checks, ACLs, and UAC interactions.
  • Selenium / Playwright: Libraries/frameworks for automating web browsers (WebDriverClient).
  • Self-Improvement (LEGOPROMIND): The overall concept encompassing reflection, learning, and potential self-modification.
  • Self-Updating: Agent’s ability to update its own codebase via Git (AutonomousTasksSkill).
  • Sentient OS / Sentient Software Sovereign: Vision concept for the highly integrated, autonomous agent.
  • Service-Oriented Architecture (SOA): Design pattern where functionalities are provided by independent, communicating services. Used here for generative tasks.
  • Skill (skills/): Modular component encapsulating a specific agent capability, following OpenManus principles.
  • Skill Dispatcher (SkillDispatcher): Core component routing requests to the appropriate Skill, Plugin, or Automation Task instance.
  • Skills Matrix (model_config.py::SKILLS_MATRIX): Configuration data rating LLM capabilities across different skill dimensions. Used by ModelSelector.
  • SQLite: Relational database engine used for local data persistence (agent_data.db).
  • State Manager (StateManager): Core component tracking the agent’s current operational state and emotional persona.
  • Streaming Reasoning: Displaying intermediate thought processes from the LLM/Reasoning Strategy in the UI (ChatPanel) in real-time.
  • Sub-Agent (agents/subagents/): Specialized agent instance managed by the ConductorAgent for specific sub-tasks.
  • System Tray (NativeIntegration, pystray): Icon in the Windows notification area for agent control.
  • Task Automation Framework: Components enabling creation/execution of GUI and Web automation tasks (TaskManager, TaskDefinition, Engines, Recorders).
  • Technology Stack: List of all software libraries, frameworks, and tools used.
  • Temperature (LLM): Parameter controlling the randomness/creativity of LLM output. Determined by ModelSelector.
  • Terminal Panel (UI_TerminalPanel): UI widget displaying agent’s internal logs with green text.
  • Testing Strategy: Plan outlining different types of testing (Unit, Integration, E2E, etc.).
  • Tool Use (Skill Dimension): LLM’s capability to correctly format requests for and interpret results from external tools/skills (distinct from direct OS/Web automation).
  • UAC (User Account Control): Windows security feature requiring user confirmation for actions needing administrator privileges. Interfaced via SecurityManager.
  • UI Automation (uiautomation library, OSControlSkill): Technique for controlling GUI applications via accessibility APIs (more robust than coordinate-based).
  • User Configuration (UserConfigManager, user_config.yaml): Settings specific to the user’s preferences.
  • Vector Database (chromadb, faiss): Database optimized for storing and searching vector embeddings, useful for semantic memory/knowledge retrieval (Optional integration in MemoryManager).
  • Versioning Strategy: Approach for managing versions of the application, configurations, and database.
  • View Window (ViewWindowPanel, VisionSkill): UI panel displaying visual output/perception data.
  • Vision Skill (VisionSkill): Skill responsible for screen perception (capture, OCR, object detection).
  • Visual Persona Controller (VisualPersonaController): UI component managing the state and animation logic of the AnimatedFaceWidget.
  • VRAM (Video RAM): Dedicated memory on the GPU, critical constraint for loading LLMs. Monitored by ResourceGovernor.
  • Watchdog: Python library for monitoring file system events (used in GUI TaskAutomationEngine).
  • WebDriver (WebDriverClient, Selenium/Playwright): Interface for controlling web browsers programmatically.
  • Web Automation Framework: Components enabling creation/execution of web automation tasks.
  • WMI (Windows Management Instrumentation): Windows API for querying system information and management tasks. Used by OS_Client_Monitor, OS_Client_EnvAwareness.
  • Workflow Engine (WorkflowEngine): Core component executing complex multi-step sequences involving various skills and logic.
  • WSL (Windows Subsystem for Linux): Allows running Linux environments on Windows. Interacted with via OSControlSkill/OS_Client_PSExecutor.
  • YOLO (You Only Look Once): Real-time object detection system. ONNX model used by ObjectDetect_Client.

19. API Documentation (Consolidated & Detailed Specs)

19.1 External Service APIs

  • ROCm Video FastAPI Service:
    • Endpoint: POST /generate
    • Request Body (JSON): { "prompt": "string (required) - Text description for video.", "num_frames": "integer (optional, default: 16) - Number of frames to generate.", "seed": "integer (optional, default: -1) - Random seed (-1 for random).", "style_hint": "string (optional) - Text hint for visual style.", "width": "integer (optional, default: 512) - Output video width.", "height": "integer (optional, default: 512) - Output video height.", "guidance_scale": "float (optional, default: 7.5) - CFG scale.", "num_inference_steps": "integer (optional, default: 25) - Diffusion steps." } IGNORE_WHEN_COPYING_START content_copy download Use code with caution.JsonIGNORE_WHEN_COPYING_END
    • Success Response (202 Accepted, JSON): { "task_id": "string (UUID) - Identifier for the generation task." } IGNORE_WHEN_COPYING_START content_copy download Use code with caution.JsonIGNORE_WHEN_COPYING_END
    • Error Response (400 Bad Request / 500 Internal Server Error, JSON): { "detail": "string - Description of the error." } IGNORE_WHEN_COPYING_START content_copy download Use code with caution.JsonIGNORE_WHEN_COPYING_END
    • Endpoint: GET /status/{task_id}
    • Path Parameter: task_id (string, required) – The UUID returned by /generate.
    • Success Response (200 OK, JSON): { "task_id": "string", "status": "string ('pending'|'processing'|'complete'|'failed')", "progress": "float | null (0.0-1.0, null if pending/failed/not supported)", "output_path": "string | null - Absolute path to the generated video file (e.g., .mp4) upon completion.", "error_message": "string | null - Details if status is 'failed'." } IGNORE_WHEN_COPYING_START content_copy download Use code with caution.JsonIGNORE_WHEN_COPYING_END
    • Error Response (404 Not Found, JSON): { "detail": "Task ID not found." } IGNORE_WHEN_COPYING_START content_copy download Use code with caution.JsonIGNORE_WHEN_COPYING_END
    • Endpoint: GET /health
    • Success Response (200 OK, JSON): {“status”: “ok”}
  • ROCm Audio FastAPI Service:
    • Endpoint: POST /generate
    • Request Body (JSON):
      {
        "prompt": "string (required) - Text description for audio/music.",
        "duration_seconds": "integer (optional, default: 10) - Desired audio duration.",
        "seed": "integer (optional, default: -1) - Random seed.",
        "model_id": "string (optional) - Specific MusicGen model variant (e.g., 'facebook/musicgen-small').",
        "temperature": "float (optional, default: 0.7) - Sampling temperature.",
        "top_p": "float (optional, default: 0.9) - Nucleus sampling probability.",
        "guidance_scale": "float (optional, default: 3.0) - Classifier-free guidance scale."
      }
    • Success Response (202 Accepted, JSON): { "task_id": "string (UUID)" }
    • Error Response (400 Bad Request / 500 Internal Server Error, JSON): { "detail": "string" }
    • Endpoint: GET /status/{task_id}
    • Path Parameter: task_id (string, required).
    • Success Response (200 OK, JSON):
      {
        "task_id": "string",
        "status": "string ('pending'|'processing'|'complete'|'failed')",
        "output_path": "string | null - Absolute path to generated audio (.wav, .mp3) upon completion.",
        "error_message": "string | null"
      }
    • Error Response (404 Not Found, JSON): { "detail": "Task ID not found." }
    • Endpoint: GET /health
    • Success Response (200 OK, JSON): { "status": "ok" }
      (This service follows the same submit-and-poll pattern as the video service; see the client sketch at the end of Section 19.1.)
  • Automatic1111 API: (key endpoints only; see the A1111 API docs served at /docs for full details)
    • Endpoint: POST /sdapi/v1/txt2img
    • Request Body (JSON - Example):
      {
        "prompt": "a photorealistic cat wearing a wizard hat",
        "negative_prompt": "blurry, low quality, deformed",
        "steps": 25,
        "cfg_scale": 7.0,
        "width": 768,
        "height": 512,
        "sampler_name": "Euler a",
        "seed": -1,
        "batch_size": 1
        // ... other A1111 parameters
      }
    • Success Response (200 OK, JSON):
      {
        "images": [ "string (base64-encoded PNG image)" ],  // more images if batch_size > 1
        "parameters": { /* parameters used */ },
        "info": "string (JSON string of generation info)"
      }
    • Endpoint: GET /sdapi/v1/progress (Used for monitoring progress if A1111 runs tasks asynchronously – check A1111 setup)
    • Success Response (200 OK, JSON):
      {
        "progress": "float (0.0-1.0)",
        "eta_relative": "float (estimated time remaining relative to total)",
        "state": {
          "skipped": "bool", "interrupted": "bool", "job": "string",
          "job_count": "int", "job_timestamp": "string", "job_no": "int",
          "sampling_step": "int", "sampling_steps": "int"
        },
        "current_image": "string | null (base64-encoded preview image)",
        "textinfo": "string | null (status text)"
      }
    • Endpoint: GET /health (or a similar heartbeat if available; otherwise treat reachability of the base URL as the health check). See the client sketch at the end of Section 19.1 for an example txt2img call.
  • LM Studio API: (OpenAI Compatible)
    • Endpoint: GET /v1/models
    • Success Response (200 OK, JSON):
      {
        "object": "list",
        "data": [
          {
            "id": "model-name-loaded.gguf",  // the identifier used in other calls
            "object": "model",
            "owned_by": "user",
            "permission": []
          }
          // potentially other loaded models
        ]
      }
    • Endpoint: POST /v1/models/load
    • Request Body (JSON): { "model": "string (filename or path known to LM Studio)" }
    • Success Response (200 OK): Empty body or simple confirmation.
    • Endpoint: POST /v1/models/unload
    • Request Body (JSON): { "model": "string (model ID from /v1/models)" }
    • Success Response (200 OK): Empty body or confirmation.
    • Endpoint: POST /v1/chat/completions (Primary interaction endpoint)
    • Request Body (JSON - OpenAI Schema):
      {
        "model": "string (model ID from /v1/models)",
        "messages": [
          { "role": "system", "content": "You are LegacyOracle..." },
          { "role": "user", "content": "User's prompt" }
          // include chat history as needed
        ],
        "temperature": "float (optional; default from LM Studio UI or 0.7)",
        "max_tokens": "integer (optional; default from LM Studio UI or e.g. 2048)",
        "stream": "boolean (optional, default: false) - set true for streaming"
        // ... other OpenAI parameters (top_p, presence_penalty, etc.)
      }
    • Success Response (Non-Streaming, 200 OK, JSON - standard OpenAI chat.completion object):
      {
        "id": "string (completion ID)",
        "object": "chat.completion",
        "created": "integer (Unix timestamp)",
        "model": "string (model ID)",
        "choices": [
          {
            "index": 0,
            "message": { "role": "assistant", "content": "string - the model's reply" },
            "finish_reason": "string ('stop' | 'length')"
          }
        ],
        "usage": { "prompt_tokens": "integer", "completion_tokens": "integer", "total_tokens": "integer" }
      }
    • Success Response (Streaming, 200 OK, Server-Sent Events): a stream of data: {...} chunks following the OpenAI SSE format, terminated by data: [DONE]. Each chunk carries a delta fragment of the response.
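
For concreteness, the following Python sketch exercises the endpoints above. It is illustrative only: the base URLs are placeholders for the service URLs configured in settings.yaml, error handling is minimal, and the model name passed to LM Studio must match an ID returned by /v1/models.

import base64
import time

import requests

# Placeholder URLs - substitute the values configured in settings.yaml.
VIDEO_URL = "http://localhost:8001"      # ROCm video FastAPI service (assumed port)
AUDIO_URL = "http://localhost:8002"      # ROCm audio FastAPI service (assumed port)
A1111_URL = "http://localhost:7860"      # Automatic1111 WebUI launched with --api
LMSTUDIO_URL = "http://localhost:1234"   # LM Studio local server (default port)

def submit_and_poll(base_url: str, payload: dict, interval: float = 2.0) -> dict:
    """Shared submit-and-poll pattern used by the video and audio services."""
    task_id = requests.post(f"{base_url}/generate", json=payload, timeout=30).json()["task_id"]
    while True:
        status = requests.get(f"{base_url}/status/{task_id}", timeout=30).json()
        if status["status"] in ("complete", "failed"):
            return status  # carries 'output_path' on success, 'error_message' on failure
        time.sleep(interval)

def a1111_txt2img(prompt: str) -> bytes:
    """Synchronous txt2img call; returns the first generated image as PNG bytes."""
    body = {"prompt": prompt, "steps": 25, "width": 768, "height": 512}
    resp = requests.post(f"{A1111_URL}/sdapi/v1/txt2img", json=body, timeout=300)
    resp.raise_for_status()
    return base64.b64decode(resp.json()["images"][0])

def lmstudio_chat(prompt: str, model: str) -> str:
    """Non-streaming chat completion against the OpenAI-compatible endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    resp = requests.post(f"{LMSTUDIO_URL}/v1/chat/completions", json=body, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    video = submit_and_poll(VIDEO_URL, {"prompt": "waves at sunset", "num_frames": 16})
    audio = submit_and_poll(AUDIO_URL, {"prompt": "calm piano", "duration_seconds": 10})
    print(video.get("output_path"), audio.get("output_path"))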

19.2 Internal API Contracts & Interfaces

BaseSkill.execute Return Dictionary JSON Schema:

{
  "type": "object",
  "properties": {
    "status": {
      "type": "string",
      "enum": ["success", "error", "pending", "needs_confirmation", "needs_user_input"],
      "description": "Outcome of the skill execution."
    },
    "data": {
      "type": ["object", "string", "array", "null"],
      "description": "Skill-specific output data (e.g., file path, text, list, dict)."
    },
    "message": {
      "type": "string",
      "description": "User-friendly status or result message for UI display."
    },
    "error_details": {
      "type": ["string", "null"],
      "description": "Detailed error message/traceback if status is 'error'."
    },
    "task_id": {
      "type": ["string", "null"],
      "description": "Identifier for asynchronous tasks requiring polling (if status is 'pending')."
    },
    "confidence": {
      "type": ["number", "null"],
      "format": "float",
      "minimum": 0.0,
      "maximum": 1.0,
      "description": "Optional confidence score (0.0-1.0) for the result."
    },
    "required_confirmation": {
      "type": ["string", "null"],
      "description": "Message asking for user confirmation if status is 'needs_confirmation'."
    },
    "required_input_prompt": {
      "type": ["string", "null"],
      "description": "Message prompting user for additional input if status is 'needs_user_input'."
    }
  },
  "required": ["status", "message"]
}
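
As a concrete instance, here is a return value that conforms to this schema, as a hypothetical file-writing skill might produce it (the skill, paths, and values are illustrative):

# Illustrative return value from a hypothetical file-writing skill's execute
# call, conforming to the BaseSkill schema above.
result = {
    "status": "success",
    "data": {"file_path": "./outputs/report.txt", "bytes_written": 2048},
    "message": "Report saved to ./outputs/report.txt.",
    "error_details": None,
    "task_id": None,            # only set for asynchronous ('pending') results
    "confidence": 0.95,
    "required_confirmation": None,
    "required_input_prompt": None,
}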

Agent Core (AC_CommInterface) <-> UI (UI_Controller) Communication Protocol (Qt Signals/Slots):
(Define specific signal names, argument types (using Python type hints), and connection types (Direct/Queued). Ensure thread safety using Qt.QueuedConnection for signals emitted from agent core threads/async tasks to the main UI thread.)

  • Signals emitted by AC_CommInterface:
    • agentStateChanged = Signal(str, str) # state, emotion
    • newLogMessage = Signal(str) # formatted_log_string
    • reasoningStep = Signal(str) # step_text
    • modelSelectionUpdate = Signal(str, str) # model_name, task_category
    • finalAgentResponse = Signal(str, int) # response_text, message_id
    • systemVitalsUpdate = Signal(float, float, float) # cpu, ram, vram
    • proactiveSuggestion = Signal(str, str, str) # suggestion_id, title, message
    • taskStatusUpdate = Signal(str, str, object, str) # task_id, status, progress (float or None), message
    • taskCompleted = Signal(str, dict) # task_id, result_dict (BaseSkill format)
    • pluginListUpdated = Signal(list) # List of plugin info dicts
    • availableLanguages = Signal(list) # List of language codes ["en", "fr"]
    • visualContextUpdate = Signal(dict) # Dictionary containing image path/data, OCR text, object list
    • requestUserInput = Signal(str, str) # prompt_id, prompt_message
    • requestUserConfirmation = Signal(str, str) # confirmation_id, confirmation_message
  • Slots in AC_CommInterface (connected from UI_Controller signals):
    • @Slot(str) process_user_message
    • @Slot(str, object) set_user_preference
    • @Slot(bool) set_proactive_enabled
    • @Slot(bool) set_privilege_mode
    • @Slot(str, dict) execute_automation_task
    • @Slot(str, dict) execute_workflow
    • @Slot(dict) save_automation_task
    • @Slot(dict) save_workflow
    • @Slot(str, str) record_gui_macro_start
    • @Slot() record_gui_macro_stop
    • @Slot(str, str) record_web_task_start
    • @Slot() record_web_task_stop
    • @Slot(int, bool, str) store_user_feedback
    • @Slot() request_plugin_list
    • @Slot(str, str, dict) execute_plugin_action
    • @Slot(str) provide_user_input # Response to requestUserInput signal
    • @Slot(str, bool) provide_user_confirmation # Response to requestUserConfirmation
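
A minimal sketch of this wiring, assuming PyQt6 (which spells the Signal/Slot names above as pyqtSignal/pyqtSlot); the UI-side slot name is illustrative:

import sys

from PyQt6.QtCore import QCoreApplication, QObject, Qt, pyqtSignal, pyqtSlot

app = QCoreApplication(sys.argv)

class AC_CommInterface(QObject):
    agentStateChanged = pyqtSignal(str, str)  # state, emotion

class UI_Controller(QObject):
    @pyqtSlot(str, str)
    def on_agent_state_changed(self, state: str, emotion: str) -> None:
        print(f"Agent is now {state} ({emotion})")

comm = AC_CommInterface()
ui = UI_Controller()
# QueuedConnection marshals the emission onto the receiver's event loop,
# which is what makes emissions from agent-core threads safe for the UI thread.
comm.agentStateChanged.connect(
    ui.on_agent_state_changed,
    Qt.ConnectionType.QueuedConnection,
)
comm.agentStateChanged.emit("thinking", "focused")
app.processEvents()  # delivers the queued emission; the real app runs app.exec()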

Conductor <-> Sub-Agent Message JSON Schema (via asyncio.Queue):

{
  "type": "object",
  "properties": {
    "message_id": { "type": "string", "format": "uuid", "description": "Unique ID for this specific message." },
    "conversation_id": { "type": "string", "format": "uuid", "description": "ID for the overall complex task." },
    "task_id": { "type": "string", "description": "Unique ID for this sub-task execution instance." },
    "type": { "type": "string", "enum": ["request", "response", "status_update", "error"], "description": "Message type." },
    "source": { "type": "string", "description": "Sender ('ConductorAgent' or Sub-Agent name like 'CodeMasterAgent')." },
    "target": { "type": "string", "description": "Recipient ('ConductorAgent' or Sub-Agent name)." },
    "payload": {
      "type": "object",
      "description": "Task-specific data or results.",
      "properties": {
        "goal": { "type": "string", "description": "Specific instruction for the sub-agent (in request)." },
        "inputs": { "type": "object", "description": "Inputs for the sub-task (in request)." },
        "context": { "type": "object", "description": "Relevant context from Conductor (memory snippets, previous steps)." },
        "model_suggestion": { "type": "string", "description": "Recommended execution model from Conductor." },
        "temperature": { "type": "number", "format": "float" },
        "status": { "type": "string", "enum": ["success", "error", "processing", "pending_confirmation"] },
        "result_data": { "type": ["object", "string", "array", "null"], "description": "Output data from the sub-task (in response)." },
        "error_message": { "type": "string", "description": "Error details if status is 'error'." },
        "progress": { "type": "number", "format": "float", "description": "Progress update (0.0-1.0) for long tasks (in status_update)." }
      }
      // Required payload properties depend on the message 'type'.
    },
    "timestamp": { "type": "string", "format": "date-time" }
  },
  "required": ["message_id", "conversation_id", "task_id", "type", "source", "target", "payload", "timestamp"]
}
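
For illustration, a 'request' message matching this schema, enqueued and dequeued via asyncio.Queue as the Conductor and a sub-agent would exchange it (the IDs, goal, and sub-agent name are illustrative):

import asyncio
import uuid
from datetime import datetime, timezone

# Illustrative 'request' message from ConductorAgent to CodeMasterAgent.
request_msg = {
    "message_id": str(uuid.uuid4()),
    "conversation_id": str(uuid.uuid4()),
    "task_id": str(uuid.uuid4()),
    "type": "request",
    "source": "ConductorAgent",
    "target": "CodeMasterAgent",
    "payload": {
        "goal": "Write a unit test for utils.parse_config",
        "inputs": {"file": "utils/parse_config.py"},
        "context": {"previous_steps": []},
        "model_suggestion": "phi-4-mini",
        "temperature": 0.2,
    },
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put(request_msg)   # Conductor side enqueues the request
    received = await queue.get()   # Sub-agent side dequeues and acts on it
    print(received["payload"]["goal"])

asyncio.run(main())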

Plugin API Interface Definition: Plugins are Python packages placed in the plugins/ directory. Each plugin must contain an __init__.py with a registration function and implement a class inheriting from skills.base_skill.BaseSkill.

def register_plugin():
    return {
        "name": "string (Unique Plugin Name)",
        "version": "string (SemVer)",
        "author": "string",
        "description": "string",
        "skill_class": MyPluginSkill,  # The class implementing BaseSkill
        "required_permissions": ["file_read", "network_access"],  # Optional list of permissions needed
    }
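
A minimal skill class completing the registration example above. The exact BaseSkill constructor and execute signature are defined in skills/base_skill.py (Section 9.3); an async execute(action, **kwargs) returning the Section 19.2 result dictionary is assumed here:

# plugins/MyEchoPlugin/skill.py - minimal sketch; the BaseSkill interface
# is assumed from Section 9.3 and must be matched in a real plugin.
from skills.base_skill import BaseSkill

class MyPluginSkill(BaseSkill):
    """Echoes its arguments back using the Section 19.2 result dictionary."""

    async def execute(self, action: str, **kwargs) -> dict:
        if action != "echo":
            return {
                "status": "error",
                "message": f"Unknown action '{action}'.",
                "error_details": "MyPluginSkill supports only the 'echo' action.",
            }
        return {
            "status": "success",
            "data": kwargs,
            "message": f"Echoed {len(kwargs)} argument(s).",
        }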

20. Tutorials / Examples (Outline)

Setup & First Run:

Verify prerequisites (Python, Git, Docker, GPU Drivers, ROCm).

Clone repository.

Run docker-compose up -d (for services), or follow the detailed manual service setup (Section 14.2).

Setup Core Agent Python environment (pip install -r requirements_agent.txt).

Run python main.py.

Complete the First-Run Setup Wizard (Set LM Studio URL, basic prefs).

Send a test message (“Hello”).

Basic Chat & Core Skills:

Ask general knowledge questions (triggers GeneralReasoning).

Ask for system stats (/run_skill SystemMonitorSkill get_summary).

Ask for file listing (/run_skill FileSystemSkill list_directory --path "C:\Users").

Ask for web search (/run_skill WebSearchSkill search --query "Latest AI news").

Generating Images (A1111 & Amuse):

Go to Generative Tab -> Image.

Select “A1111” backend. Enter prompt, adjust params, click Generate. View result.

Select “Amuse” backend. Ensure Amuse setup is correct (Section 6.9). Enter prompt, click Generate. View result.

Try via chat: /generate_image_a1111 prompt="Cyberpunk cat"

Try via chat: /generate_image_amuse prompt="Impressionist landscape"

Generating Video & Audio (ROCm Services):

Go to Generative Tab -> Video. Enter prompt, duration, click Generate. Monitor progress, play result.

Go to Generative Tab -> Audio. Enter prompt, duration, click Generate. Monitor progress, play result.

Try via chat commands.

Creating & Running GUI Macros (Record & Execute):

Open Task Management Panel. Click “Record GUI Macro”.

Select Notepad (notepad.exe), name it “TypeHello”.

Click “Start Recording”.

Switch to Notepad, type “Hello World!”, Save the file (e.g., to Desktop).

Click “Stop Recording”.

In the details dialog, define inputs/outputs if needed (e.g., map the typed filename to output_file). Save the task.

Select “TypeHello” in dropdown, click “Execute Task”. Observe Notepad automation.

Creating & Running Web Automations (Record/Define & Execute):

Open Task Management Panel. Click “Record Web Task” (or “Load Task JSON”).

If Recording: Select browser, name “GoogleSearch”, enter https://google.com. Click Start. In browser, type “LegacyOracle” in search bar, press Enter. Stop Recording. Define outputs (e.g., scrape results count selector=#result-stats). Save.

If Loading: Load a predefined web_tasks.json containing the steps.

Select “GoogleSearch” in dropdown, click Execute. Observe browser automation.
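
A saved task for this tutorial might look roughly like the following. This is an illustrative shape only: the authoritative web_tasks.json schema is defined in Section 10.1, and the selector for Google's search box is a guess that must be verified against the live page.

{
  "name": "GoogleSearch",
  "browser": "chrome",
  "start_url": "https://google.com",
  "steps": [
    { "action": "type", "selector": "textarea[name='q']", "value": "LegacyOracle" },
    { "action": "press_key", "key": "Enter" },
    { "action": "wait_for", "selector": "#result-stats" }
  ],
  "outputs": [
    { "name": "results_count", "action": "scrape_text", "selector": "#result-stats" }
  ]
}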

Creating & Running Complex Workflows:

Open Workflow Editor UI.

Name workflow “WebSearchAndSave”.

Add Step 1: Task="WebSearchSkill", Inputs={"query": "{topic}"} (uses an input variable).

Add Step 2: Task="FileSystemSkill", Method="write_file", Inputs={"file_path": "./outputs/search_{topic}.txt", "content": "{step1.output.results_summary}"} (uses output from the previous step).

Save Workflow.

Trigger via chat: /run_workflow WebSearchAndSave --topic "AI Agents"
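
The {topic} and {step1.output.results_summary} placeholders imply simple string templating over an accumulating step context. A minimal sketch of how such resolution could work follows; this is not the actual WorkflowEngine code (see Section 9.10), just the idea:

import re

def resolve_placeholders(template: str, context: dict) -> str:
    """Replace {dotted.path} tokens with values looked up in a nested dict.
    A sketch of the templating the tutorial implies, not the real engine."""
    def lookup(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\{([\w.]+)\}", lookup, template)

# Example: context accumulated after Step 1 has run.
context = {
    "topic": "AI Agents",
    "step1": {"output": {"results_summary": "Top results for 'AI Agents'..."}},
}
print(resolve_placeholders("./outputs/search_{topic}.txt", context))
print(resolve_placeholders("{step1.output.results_summary}", context))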

Configuring Proactivity & User Preferences:

Open Settings Panel.

Toggle “Enable Proactive Features”. Adjust frequency.

Change UI Theme dropdown.

Set Model Selection Preference.

Set Skill Priorities. Save settings. Observe changes.

(Dev) Building a Simple Plugin:

Create directory plugins/MyEchoPlugin.

Create plugins/MyEchoPlugin/__init__.py with registration function pointing to skill class.

Create plugins/MyEchoPlugin/skill.py inheriting BaseSkill with an execute method that returns the input data.

Restart Agent. Check Plugin Manager UI.

Trigger via chat: /run_plugin MyEchoPlugin echo --text "Hello Plugin"

Troubleshooting Common Issues:

Service Not Running: Check Docker/Uvicorn logs. Use OpsManager health checks. Verify URLs in settings.yaml.
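
A quick manual sweep of the Section 19.1 health endpoints can narrow the failure down; the URLs below are placeholders for the values configured in settings.yaml:

import requests

# Placeholder URLs - use the service URLs from settings.yaml.
SERVICES = {
    "rocm_video": "http://localhost:8001/health",
    "rocm_audio": "http://localhost:8002/health",
    "a1111": "http://localhost:7860/",              # reachability check only
    "lm_studio": "http://localhost:1234/v1/models",
}

for name, url in SERVICES.items():
    try:
        resp = requests.get(url, timeout=5)
        print(f"{name}: HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: DOWN ({exc.__class__.__name__})")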

Model Load Failure: Check LM Studio logs. Verify model files exist. Check VRAM via ResourceGovernor/Task Manager.

Permission Denied (OS/File): Check agent user permissions. Use SecurityManager privileged mode toggle + UAC.

Web Automation Fails: Update WebDrivers. Check selectors match current website structure. Increase delays.

Dependency Conflicts: Verify Python environments (Core Agent, ROCm Services, Amuse) are separate and correct dependencies are installed in each.


