HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive Review 2024
Lossless uncensored MoE model with extended context and multimodal support
Starting at: $0.00 (free download)
Billing: One-time download
Refund: N/A (open-weight model)
Our Take
A highly capable, unrestricted variant of the Qwen3.6-35B-A3B architecture, optimized for local deployment and specialized workflows requiring unfiltered outputs.
Is It Worth It?
Yes, for developers and researchers who require an open-weight, uncensored MoE model with extensive quantization options and strong reasoning capabilities.
Best Suited For
Local AI deployment, uncensored content generation, agentic coding workflows, and long-context reasoning tasks.
What We Loved
- ✓ Completely removes safety refusal filters
- ✓ Wide range of GGUF quantizations for flexible hardware deployment
- ✓ Strong coding and reasoning capabilities for its size
- ✓ Native multimodal and long-context support
- ✓ Free to download and self-host
What Bothered Us
- ✗ Requires substantial VRAM for higher-precision formats
- ✗ Lacks built-in content moderation, so external safeguards are needed (see the sketch after this list)
- ✗ No official vendor support or SLA
- ✗ Aggressive variant may produce unverified or harmful outputs without careful prompting
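Because the release ships with no moderation layer, any deployment needs its own output check. The sketch below is purely illustrative: BLOCKLIST and generate_reply are hypothetical stand-ins, not part of this release, and a production setup would use a real moderation classifier or API instead.

```python
# Minimal illustrative post-generation safeguard. BLOCKLIST and generate_reply
# are hypothetical stand-ins, not part of this release; swap in a real
# moderation classifier or API for production use.
BLOCKLIST = {"example banned phrase"}  # placeholder terms, not a real policy

def generate_reply(prompt: str) -> str:
    # Stand-in for a call to the locally hosted model (llama.cpp, vLLM, etc.).
    return f"(model output for: {prompt!r})"

def safe_generate(prompt: str) -> str:
    reply = generate_reply(prompt)
    if any(term in reply.lower() for term in BLOCKLIST):
        return "[response withheld by external safeguard]"
    return reply

print(safe_generate("Hello"))
```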
How It Performed
Output Quality
Maintains the strong reasoning, coding, and multimodal understanding of the base Qwen3.6 architecture across various quantization levels.
Intelligence
Competitive for its size, leveraging a 35B-parameter MoE design with ~3B active parameters, a 262k context window, and native tool-calling.
Speed Test
Inference speed depends heavily on hardware and quantization; MoE architecture typically offers faster token generation than dense models of similar parameter counts.
This model serves as a direct, lossless modification of the Qwen3.6-35B-A3B base, focusing on removing alignment filters while preserving original capabilities. It supports text, image, and video inputs, and features a 'thinking mode' that maintains chain-of-thought reasoning across extended sessions. The release includes a comprehensive set of GGUF quantizations, ranging from Q8-KP (~44 GB) down to IQ2-M (~11 GB), enabling deployment across a wide spectrum of hardware configurations. While the removal of safety filters provides maximum flexibility for developers, it also necessitates careful prompt engineering and output validation. The model integrates well with popular local inference frameworks and maintains competitive performance in coding and reasoning benchmarks relative to its parameter count.
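For local GGUF deployment, a minimal sketch using llama-cpp-python is shown below. The model filename is an assumption based on the quantization names above, and the context size is deliberately set well below the 262k maximum to keep memory use modest.

```python
# Minimal local-inference sketch using llama-cpp-python
# (pip install llama-cpp-python). The GGUF filename is illustrative;
# use whichever quantization fits your hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2-M.gguf",
    n_ctx=32768,      # a fraction of the 262k maximum; raise if memory allows
    n_gpu_layers=-1,  # offload all layers to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the MoE architecture in one line."}],
    temperature=0.6,
)
print(out["choices"][0]["message"]["content"])
```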
Ideal for developers building local AI agents, researchers studying unfiltered model behavior, and users requiring long-context multimodal processing without vendor-imposed restrictions. It is less suitable for enterprise environments requiring strict content moderation or compliance guarantees.
Competes with other uncensored community releases like Llama 3.2-Uncensored and Mixtral-8x7B-Uncensored, as well as the standard safety-aligned Qwen3.6-35B-A3B. Its MoE architecture provides a favorable balance of performance and resource efficiency compared to dense 35B models.
Frequently Asked Questions
What does "Uncensored" mean in the model name?
It indicates that all built-in safety refusal mechanisms have been removed, allowing the model to respond to prompts that would typically be blocked by aligned versions.
How much VRAM does it require?
Requirements vary by quantization. The Q8-KP version requires approximately 44 GB of VRAM, while the IQ2-M version can run on around 11 GB, making it accessible to consumer-grade GPUs.
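As a rough sizing aid, the two documented sizes above can drive a simple picker. This sketch uses only those two endpoints and a hypothetical 2 GB headroom allowance for KV cache and runtime overhead.

```python
# Rough VRAM-based quantization picker using only the two sizes quoted above.
QUANT_SIZES_GB = {"Q8-KP": 44, "IQ2-M": 11}  # documented endpoints only

def pick_quant(vram_gb: float, headroom_gb: float = 2.0) -> str | None:
    """Return the largest documented quant that fits, leaving headroom for KV cache."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s + headroom_gb <= vram_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(24))  # -> 'IQ2-M' on a 24 GB consumer GPU
```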
Does it support image and video inputs?
Yes, it retains the multimodal capabilities of the base Qwen3.6 architecture, allowing it to process text, images, and video natively.
How much does it cost to run?
The model itself is open-weight and intended for self-hosting. However, cloud providers offering the base Qwen3.6 architecture typically charge around $0.78 per 1M input tokens and $3.90 per 1M output tokens.
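For comparison against self-hosting, the quoted hosted-API rates work out as follows (illustrative arithmetic only):

```python
# Cost estimate at the hosted-API rates quoted above ($ per 1M tokens).
INPUT_RATE, OUTPUT_RATE = 0.78, 3.90

def api_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# e.g. a 50k-token prompt with a 2k-token reply:
print(f"${api_cost(50_000, 2_000):.4f}")  # -> $0.0468
```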
Is official support available?
No, this is a community-released modification. Support is handled through community channels like Hugging Face discussions and Discord.
Which inference frameworks does it work with?
It is optimized for and compatible with Transformers, vLLM, SGLang, KTransformers, LM Studio, and Ollama.
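A minimal text-only loading sketch via Transformers follows, assuming the standard AutoModelForCausalLM path works for this release; image and video inputs would require the release's own processor classes, which are not shown here.

```python
# Text-only loading sketch via Hugging Face Transformers. Assumes the standard
# causal-LM path applies to this release; multimodal use needs its processors.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Explain MoE routing in two sentences."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tok.decode(output[0], skip_special_tokens=True))
```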
What sampling settings are recommended?
For thinking mode, use temperature=1.0, top_p=0.95, top_k=20. For coding or precise tasks, lower the temperature to 0.6 and set presence_penalty to 0.
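Expressed as vLLM SamplingParams objects, the two recommended presets look like this; carrying top_p and top_k over to the coding preset is an assumption, since only temperature and presence_penalty are specified for it above.

```python
from vllm import SamplingParams

# Thinking mode: higher temperature keeps chain-of-thought exploration diverse.
thinking = SamplingParams(temperature=1.0, top_p=0.95, top_k=20)

# Coding / precise tasks: lower temperature, no presence penalty.
# top_p/top_k carried over from the thinking preset as an assumption.
coding = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, presence_penalty=0.0)
```

Either object is then passed to `llm.generate(prompts, sampling_params)` when serving the model with vLLM.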