OpenWild

A New Conversational Dataset for Modern Language Models

Luis Cosio luisalfonsocosioizcapa[at]gmail[dot]com
AMIAS - Alianza Mexicana para la IA Soberana

Abstract

OpenWild is an open conversational dataset that captures real-world interactions with current-generation language models. The project is directly inspired by WildChat, a corpus of one million user-ChatGPT conversations collected by the Allen Institute for AI. Since WildChat's collection period ended in April 2024, the language model landscape has shifted substantially. New model families, longer context windows, multimodal capabilities, and fundamentally different alignment strategies mean that existing datasets no longer reflect how people actually use these systems. OpenWild addresses this gap by collecting consented conversations through chat.amias.mx, an open interface operated by the Alianza Mexicana para la IA Soberana (AMIAS).

TL;DR

WildChat gave us the first large-scale picture of how people interact with LLMs in the wild. But models have changed dramatically since that data was collected. OpenWild is building a new, open conversational dataset that reflects the current generation of language models, collected with consent through a free chat interface.

1. The Dataset Gap

Research on language model behavior depends on naturalistic data. Synthetic benchmarks measure capability, but they do not capture how users actually prompt, probe, and rely on these systems in unstructured settings. The difference matters: deployment risks, jailbreak patterns, and failure modes emerge from real usage, not lab conditions.

The best existing resource for this kind of analysis is WildChat. But its collection window closed before the current wave of models shipped. The result is a growing blind spot: we have detailed data on how people used GPT-3.5 and early GPT-4, but very little on how they interact with the systems that are actually deployed today.

2. WildChat and Its Contribution

WildChat, published by Zhao et al. at the Allen Institute for AI, is a corpus of approximately one million conversations collected between April 2023 and April 2024. Users accessed free GPT-3.5 Turbo and GPT-4 endpoints hosted on Hugging Face Spaces, with informed consent for data collection.

The dataset enabled research on user behavior, language diversity, toxic prompt patterns, and model response characteristics. It also produced WildLlama, a model fine-tuned to predict both user and assistant turns. WildChat remains a valuable resource, but its temporal scope is now a limitation.

3. Why Models Behave Differently Now

The language model landscape in 2026 bears little resemblance to the one WildChat captured. Several shifts make the older data insufficient for understanding current behavior.

Figure 1. Model capability timeline
Major model releases and capability milestones since WildChat's collection period.

New model families. Claude Opus 4.6, GPT-5.2, Gemini 3 Deep Think, DeepSeek-V3.2, and Llama 4 have all shipped since WildChat's collection ended. Each introduced architectures and training approaches that produce qualitatively different conversational behavior. Users adapt their prompting strategies accordingly.

Alignment evolution. RLHF, constitutional AI, and refusal training have all evolved significantly. Model families that were comparatively permissive in mid-2023 now refuse more, and newer families draw their refusal boundaries in different places. This shifts the distribution of both user strategies and model responses.

Extended capabilities. Long context windows, tool use, code execution, and multimodal inputs have changed the nature of conversations. Multi-turn interactions are longer and more complex than they were during the WildChat collection period.

Figure 2. Distribution shift in conversation characteristics
Estimated changes in key interaction metrics between the WildChat era and current models.

4. What Is OpenWild

OpenWild is a dataset collection effort that provides free access to current-generation language models through chat.amias.mx, in exchange for consented conversation logging. The project is operated by AMIAS (Alianza Mexicana para la IA Soberana) and follows the same core methodology that made WildChat successful: offer a useful service, collect naturalistic data, release it openly for research.

The key difference is temporal. OpenWild captures interactions with models that are deployed today, reflecting current alignment behavior, capability boundaries, and user expectations.

5. Design Principles

Consent first. Every user sees a clear notice that conversations are collected for research. No personal information is requested or required to use the service.

Model diversity. Unlike WildChat's exclusive use of OpenAI models, OpenWild aims to capture interactions across multiple providers and model families, reflecting the fragmented deployment landscape.

Open release. The dataset will be released under a permissive license for research use, following WildChat's precedent.

Regional perspective. Operated from Mexico by AMIAS, the project naturally captures a Latin American user base that is underrepresented in existing English-centric datasets.

Figure 3. Data collection pipeline
Architecture of the OpenWild collection and anonymization pipeline.

6. Collection and Privacy

Conversations are logged with metadata including model identifier, timestamp, and language. All data passes through an anonymization pipeline before inclusion in any public release. IP addresses are discarded. Content flagged as containing personal information is reviewed and redacted.
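As a minimal sketch of the anonymization step described above, the snippet below drops network identifiers and redacts obvious PII patterns before a record is released. The record fields and regexes are illustrative assumptions, not the actual OpenWild schema or pipeline:

```python
import re

# Hypothetical record shape; field names are illustrative only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def anonymize(record: dict) -> dict:
    """Copy only release-safe fields and redact obvious PII in turn text."""
    clean = {
        "model": record["model"],          # model identifier
        "timestamp": record["timestamp"],  # ISO-8601 string
        "language": record["language"],    # detected language code
        "turns": [],
    }
    for turn in record["turns"]:
        text = EMAIL_RE.sub("[EMAIL]", turn["text"])
        text = PHONE_RE.sub("[PHONE]", text)
        clean["turns"].append({"role": turn["role"], "text": text})
    # The IP address is never copied into the released record.
    return clean
```

Regex-based redaction is only a first pass; flagged content still goes through the human review mentioned above.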

The platform runs on LibreChat, an open-source conversational interface, ensuring transparency in the collection infrastructure itself.

7. Research Applications

A current-generation conversational dataset enables several lines of research that cannot be conducted with older data.

  • Alignment drift analysis: How have refusal patterns and safety boundaries changed across model generations?
  • User adaptation: How do prompting strategies evolve as models become more capable and more constrained?
  • Cross-model comparison: Do users interact differently with Claude, GPT, Gemini, and open-weight models?
  • Multilingual behavior: How do safety and capability patterns vary across languages, particularly in underserved ones?
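To make the cross-model comparison concrete, here is a toy analysis one could run over released records. The record shape and the string-matching refusal heuristic are assumptions for illustration, not OpenWild's schema or methodology:

```python
from collections import defaultdict

# Naive heuristic: an assistant turn opening with one of these phrases
# is counted as a refusal. Real analyses would use a trained classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable")

def refusal_rate_by_model(records):
    """Fraction of assistant turns flagged as refusals, per model."""
    counts = defaultdict(lambda: [0, 0])  # model -> [refusals, total]
    for rec in records:
        for turn in rec["turns"]:
            if turn["role"] != "assistant":
                continue
            counts[rec["model"]][1] += 1
            if turn["text"].lower().startswith(REFUSAL_MARKERS):
                counts[rec["model"]][0] += 1
    return {m: r / t for m, (r, t) in counts.items() if t}
```

The same per-model grouping generalizes to turn counts, conversation lengths, or language distributions.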

8. Limitations and Future Work

OpenWild is in its early collection phase. The dataset does not yet have the scale of WildChat, and the user base is currently concentrated in Latin America. We plan to expand model coverage, increase geographic diversity through partnerships, and release periodic dataset snapshots as the corpus grows.

Long-term, the project aims to establish a continuously maintained, open conversational dataset that tracks the evolving relationship between people and language models.

References

  1. Zhao et al., "WildChat: 1M ChatGPT Interaction Logs in the Wild," arXiv:2405.01470, 2024.
  2. Ouyang et al., "Training Language Models to Follow Instructions with Human Feedback," NeurIPS, 2022.
  3. Bai et al., "Constitutional AI: Harmlessness from AI Feedback," arXiv:2212.08073, 2022.