GeohazardGPT: Specialized LLM for Geohazard Analysis

GeohazardGPT is the first large language model developed specifically for geohazard analysis and engineering. It is built on the Qwen3-8B base model and fine-tuned with LoRA, has 8 billion parameters, and supports a 32K-token context length. Training used a domain corpus of 883 million tokens covering 12 major geological hazard categories, together with approximately 100K instruction-response pairs.

Key Capabilities

Factual QA: Provides precise recall of geohazard definitions, geomaterial properties, and code requirements.
Open-ended Explanation: Interprets hazard mechanisms, failure processes, and impact analyses.
Engineering Recommendation: Suggests stabilization measures, mitigation strategies, and monitoring plans for specific site conditions.
Report Summarization: Extracts key findings from investigation reports, case studies, and technical specifications.

RAG Integration for Enhanced Performance

For standards-based engineering questions, GeohazardGPT is designed to run inside a retrieval-augmented generation (RAG) pipeline. The pipeline uses Qwen3-Embedding to retrieve candidate clauses from a vector database of technical specifications and Qwen3-Reranker-4B to re-rank them, so that the model's responses are grounded in authoritative engineering standards. With this integration, GeohazardGPT reportedly achieves performance comparable to much larger models on professional engineering examination tasks.
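The retrieve-then-re-rank flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual pipeline: the bag-of-words `embed` function stands in for Qwen3-Embedding, the term-overlap `rerank_score` stands in for Qwen3-Reranker-4B, the in-memory list stands in for the vector database, and the clause ids and texts (`DEMO-1` and so on) are invented placeholders rather than quotations from any real standard.

```python
import math
from collections import Counter

# Hypothetical clause store; ids and texts are illustrative placeholders only.
CLAUSES = [
    {"id": "DEMO-1", "text": "slope stability analysis shall report the factor of safety for each load case"},
    {"id": "DEMO-2", "text": "monitoring plans shall specify instrument type and reading frequency"},
    {"id": "DEMO-3", "text": "drainage measures shall be inspected after heavy rainfall"},
    {"id": "DEMO-4", "text": "slope stability factor of safety shall be checked under rainfall conditions"},
]

def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, clauses, k=3):
    """Stage 1: rank every clause by embedding similarity and keep the top k."""
    q = embed(query)
    ranked = sorted(clauses, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]

def rerank_score(query, text):
    """Toy stand-in for a reranker: fraction of query terms found in the clause."""
    q_terms = set(query.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & set(text.lower().split())) / len(q_terms)

def rerank(query, candidates, n=2):
    """Stage 2: re-score the retrieved candidates and keep the best n."""
    ranked = sorted(candidates, key=lambda c: rerank_score(query, c["text"]), reverse=True)
    return ranked[:n]

def build_prompt(question, clauses):
    """Stage 3: place the selected clauses in the prompt so the answer can cite them."""
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in clauses)
    return (
        "Answer using only the clauses below and cite their ids.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

question = "factor of safety for slope stability under rainfall"
top_clauses = rerank(question, retrieve(question, CLAUSES))
prompt = build_prompt(question, top_clauses)
print(prompt)
```

In a real deployment the two scoring functions would be replaced by calls to the embedding and reranker models, and `prompt` would be passed to GeohazardGPT for generation. The two-stage split is kept because a cheap similarity search narrows the corpus before the more expensive reranker scores each remaining candidate.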

Intended Use

This model is intended for geotechnical engineers, geohazard researchers, and practitioners who need technically accurate, domain-grounded responses. It supports knowledge-intensive workflows in geohazard assessment and geotechnical engineering practice. Model outputs should complement, not replace, professional field investigation and expert judgment.
