Small and Large LLM Cooperation

A framework for integrative cooperation between edge-deployed and centralized LLMs

Feb 2025

The Symbiotic Dance of AI Web Agents:

Big and Small Models Working Together

In the evolving landscape of AI web agents, researchers have developed "AgentSymbiotic," a framework that creates an iterative, symbiotic relationship between large and small language models (LLMs) for enhanced web navigation. This approach challenges the conventional decoupled paradigm, in which large LLMs generate trajectory data that is later used only for retrieval or distillation.


The researchers identified a fundamental complementarity between large and small LLMs that forms the basis of their system. Large LLMs (like Claude-3.5 or GPT-4o) demonstrate superior exploitation capabilities, making precise decisions in well-understood scenarios. Small LLMs (like Llama-3-8B), meanwhile, excel at exploration: their faster inference speeds and higher stochasticity let them discover diverse trajectories and novel solutions. In practice, this means a small LLM can visit a larger set of state-action pairs within a given computational budget, and that exploratory behavior surfaces valuable edge cases and alternative pathways that large LLMs might miss.
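To make the budget argument concrete, here is a small illustrative calculation. The latencies and trajectory lengths below are assumptions for the sketch, not figures from the paper:

```python
# Illustrative only: how many web-navigation rollouts fit in a fixed time budget
# when the small model answers much faster than the large one.
# All numbers are made-up placeholders, not measurements from the paper.

BUDGET_SECONDS = 600          # total time budget for exploration
STEPS_PER_TRAJECTORY = 20     # actions needed to finish one navigation task

LATENCY_PER_STEP = {
    "large_llm": 6.0,   # seconds per action (assumed)
    "small_llm": 0.5,   # seconds per action (assumed)
}

for model, step_latency in LATENCY_PER_STEP.items():
    trajectories = BUDGET_SECONDS // (step_latency * STEPS_PER_TRAJECTORY)
    print(f"{model}: ~{int(trajectories)} full trajectories within the budget")
```

Under these assumed numbers the small model completes roughly 60 rollouts in the time the large model completes 5, which is why it covers more of the state-action space per unit of compute.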


The framework implements a four-step iterative cycle:

  1. Trajectory Generation: Large LLMs use retrieval-augmented generation (RAG) to interact with web environments, generating high-quality navigation trajectories. These models learn from both successful and failed trajectories across multiple rounds of self-interaction.
  2. Trajectory Distillation: A multi-LLM debate mechanism evaluates generated trajectories, with two key innovations:
    • Speculative Data Synthesis: This mitigates off-policy bias by having the small LLM propose actions that are evaluated against multiple candidates from the large LLM, creating a student-teacher filtering mechanism that evolves dynamically as the small LLM improves.
    • Multi-Task Learning: The small LLM is trained to jointly predict both actions and intermediate reasoning steps, preserving the critical reasoning capabilities often lost during distillation.
  3. Small LLM Exploration: Distilled small LLMs explore the environment more extensively, uncovering diverse trajectories that feed back into the knowledge base.
  4. Symbiotic Improvement: The comprehensive trajectory database enhances the large LLM's RAG process, creating a virtuous cycle where each model continuously improves the other.
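A minimal sketch of how these four steps could be wired together is shown below. Everything here is an assumption made for illustration: the `large_llm`, `small_llm`, `web_env`, `distill`, `rollout`, and `retrieve` names stand in for the paper's actual components, and the speculative filter is simplified to keeping a step when the student's proposal matches one of the teacher's candidate actions.

```python
# Illustrative sketch of an AgentSymbiotic-style loop; the component interfaces
# are assumptions, not the authors' actual API.
from typing import Callable, Dict, List


def run_symbiotic_cycle(
    large_llm: Callable[[str, List[str]], str],   # (observation, retrieved examples) -> action
    small_llm: Callable[[str], str],              # observation -> action
    web_env,                                      # environment exposing reset() and step(action)
    distill: Callable[[List[Dict]], Callable[[str], str]],  # trains a new small model
    rounds: int = 3,
):
    knowledge_base: List[Dict] = []  # trajectory store backing the large LLM's RAG step

    for _ in range(rounds):
        # 1. Trajectory generation: the large LLM acts with retrieval over stored trajectories.
        teacher_policy = lambda obs: large_llm(obs, retrieve(knowledge_base, obs))
        teacher_traj = rollout(teacher_policy, web_env)
        knowledge_base.append(teacher_traj)

        # 2. Trajectory distillation with speculative data synthesis: keep the steps where
        #    the student's own proposal matches one of the teacher's candidate actions, so
        #    the training data stays close to the student's (on-policy) distribution.
        #    (The paper's multi-task learning, jointly predicting actions and reasoning,
        #    is hidden inside distill() here.)
        filtered = [
            step for step in teacher_traj["steps"]
            if small_llm(step["observation"]) in step["teacher_candidates"]
        ]
        small_llm = distill([{"steps": filtered}] if filtered else [teacher_traj])

        # 3. Small-LLM exploration: cheap rollouts uncover diverse trajectories and edge cases.
        for _ in range(10):
            knowledge_base.append(rollout(small_llm, web_env))

        # 4. Symbiotic improvement: the enriched knowledge base feeds the large LLM's
        #    retrieval at the start of the next round.
    return small_llm, knowledge_base


def rollout(policy: Callable[[str], str], env, max_steps: int = 20) -> Dict:
    """Run one episode and record the (observation, action) pairs the policy produced."""
    obs, steps = env.reset(), []
    for _ in range(max_steps):
        action = policy(obs)
        # A real system would store several sampled teacher candidates here.
        steps.append({"observation": obs, "action": action, "teacher_candidates": [action]})
        obs, done = env.step(action)
        if done:
            break
    return {"steps": steps}


def retrieve(knowledge_base: List[Dict], obs: str, k: int = 3) -> List[str]:
    """Toy retrieval: surface actions from the most recently stored trajectories."""
    return [s["action"] for traj in knowledge_base[-k:] for s in traj["steps"]]
```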


Privacy Preservation

For real-world applications handling sensitive data, the researchers developed a hybrid mode that automatically detects privacy-sensitive interactions and routes them to local small LLMs rather than cloud-based large LLMs. Empirical analysis showed that in domains like e-commerce, up to 61.2% of interactions contained privacy-sensitive information, highlighting the importance of this feature.
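A hybrid routing policy of this kind can be sketched very simply. The keyword-and-pattern detector below is a deliberately naive stand-in (the paper's actual detection mechanism is not described here), and `local_small_llm` / `cloud_large_llm` are assumed names:

```python
import re

# Naive illustration of hybrid routing: privacy-sensitive interactions stay on a
# local small model, everything else may go to the cloud-hosted large model.
SENSITIVE_PATTERNS = [
    r"\b\d{13,16}\b",                 # card-number-like digit runs
    r"password|passcode|cvv|ssn",     # credential keywords
    r"[\w.+-]+@[\w-]+\.[\w.]+",       # email addresses
]


def is_privacy_sensitive(text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in SENSITIVE_PATTERNS)


def route(observation: str, local_small_llm, cloud_large_llm) -> str:
    """Send sensitive interactions to the local model, the rest to the large model."""
    if is_privacy_sensitive(observation):
        return local_small_llm(observation)   # never leaves the device
    return cloud_large_llm(observation)
```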


Benchmarks

On benchmark tests for web navigation, the large model achieved a 52% success rate (improving on the previous result of 45%), while the 8B-parameter small model reached an impressive 49%, closing much of the performance gap with models a hundred times larger.


The implications extend beyond better web navigation. This research demonstrates that seemingly opposite approaches in AI (methodical analysis versus exploratory discovery) can create powerful synergies when thoughtfully combined. As this technology matures, we may soon find ourselves with AI assistants that handle mundane online tasks with remarkable capability while preserving privacy. The digital future may not belong to the biggest AI models, but to systems that skillfully orchestrate collaboration between diverse models, each contributing what it does best.


Daice Labs Inc.

Brookline, MA, USA
