
GEOAssist V2.0: Open-source Geological AI App. Extract geoscience entities from your PDFs and create Geoscience Knowledge Graphs (GeoKG). Surface insights, find patterns, validate structure and support discovery. This weekend I added a feature that automatically extracts geoscience data and associations from your PDFs using Large Language Models (LLMs).
You can run GEOAssist locally on a single PDF or on thousands, whether downloaded by GEOAssist or files you already have, so your data never leaves your firewall.
Knowledge Graphs formalise information as interconnected structures. You can also view the extracted entities and their associations in tabular form, which helps you spot unusual associations that may lead to new lines of thinking. Because the GeoKG can be very large, it can be exported via an RDF option for use in other applications.
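To make the RDF idea concrete, here is a minimal sketch of how extracted associations could be written out as N-Triples, one of the plain-text RDF formats. It is not GEOAssist's actual export code: the `example.org` namespace, the `to_uri` and `to_ntriples` helpers, the predicate names and the sample associations are all assumptions made up for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace; a real project would choose its own base URI.
BASE = "http://example.org/geokg/"


def to_uri(kind: str, label: str) -> str:
    """Build a URI such as <http://example.org/geokg/mineral/chalcopyrite>."""
    slug = quote(label.strip().lower().replace(" ", "_"))
    return f"<{BASE}{kind}/{slug}>"


def to_ntriples(associations) -> str:
    """Serialise (subject, predicate, object) associations as N-Triples lines."""
    lines = []
    for (s_kind, s_label), predicate, (o_kind, o_label) in associations:
        pred_uri = f"<{BASE}rel/{quote(predicate)}>"
        lines.append(
            f"{to_uri(s_kind, s_label)} {pred_uri} {to_uri(o_kind, o_label)} ."
        )
    return "\n".join(lines)


# Made-up associations for illustration only, not extracted from any PDF.
associations = [
    (("mineral", "Chalcopyrite"), "hostedIn", ("lithology", "Quartz Diorite")),
    (("lithology", "Quartz Diorite"), "hasAge", ("chronostratigraphy", "Late Cretaceous")),
]

ntriples = to_ntriples(associations)
print(ntriples)
```

Each line is one subject-predicate-object statement, so a very large graph can be streamed to a file line by line and loaded into any RDF-aware tool.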
For example, graph algorithms can be applied to a GeoKG, and the GeoKG can also be used as input to Graph Neural Networks (GNN). These can help to find similar formations based on shared connections; to run link prediction that identifies missing mineral-rock type links or new plausible mineral associations; to mine patterns, such as geological configurations that commonly precede resource-rich areas or unusual patterns not previously documented; or perhaps to discover novel geological connections, e.g. a link between tectonics and a mineral not previously associated with it.
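As a rough illustration of "similar formations based on shared connections" and of link prediction, the sketch below scores formation pairs by Jaccard similarity and suggests entities that one formation lacks but a similar one has. This is a simple neighbourhood heuristic, not a GNN and not GEOAssist's own code; the toy graph, the function names and the 0.5 threshold are assumptions.

```python
from itertools import combinations

# Made-up toy graph: each formation maps to the set of entities linked to it.
links = {
    "Formation A": {"sandstone", "pyrite", "rift basin"},
    "Formation B": {"sandstone", "pyrite", "galena", "rift basin"},
    "Formation C": {"limestone", "fluorite"},
}


def jaccard(a: set, b: set) -> float:
    """Shared connections divided by all connections of either node."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(links):
    """Return (score, formation, formation) tuples, most similar first."""
    pairs = combinations(sorted(links), 2)
    scored = [(jaccard(links[x], links[y]), x, y) for x, y in pairs]
    return sorted(scored, reverse=True)


def suggest_links(links, min_score=0.5):
    """Suggest entities a formation lacks but a similar formation has."""
    suggestions = []
    for score, x, y in rank_similar(links):
        if score < min_score:
            continue
        for src, dst in ((x, y), (y, x)):
            for entity in sorted(links[src] - links[dst]):
                suggestions.append((dst, entity, score))
    return suggestions


ranked = rank_similar(links)
candidates = suggest_links(links)
print(ranked[0])
print(candidates)
```

A suggested link is only a candidate for a geologist to check: it says that two formations share most of their connections, not that the missing association is real.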
The GeoKG option uses:
1. Geographical Location
2. Chronostratigraphy (Geol Age)
3. Lithology (Rock Type)
4. Minerals
5. Tectonics
6. Ore Body Feature
However, you can add or change these and extract anything your use case needs. That might be research such as deep time, palaeontology and stratigraphy; urban planning and geotechnical engineering; geohazard mitigation and disaster preparedness; or natural resource sectors such as water (hydrogeology) and the energy transition, including economic mining for critical minerals, geothermal, natural hydrogen, oil & gas exploration, carbon capture and storage, and underground storage such as radioactive waste. All of these support the UN Sustainable Development Goals (SDG).
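One way to picture the configurable categories is as a plain list that drives both the LLM prompt and the parsing of its reply. The sketch below is an assumption about how this could look, not the code in the repository: `build_prompt`, `parse_response`, the prompt wording, the "Fossil Taxa" category and the stand-in reply are all hypothetical, and no real LLM is called.

```python
import json

# The six default categories listed above; edit this list for your use case.
ENTITY_TYPES = [
    "Geographical Location",
    "Chronostratigraphy",
    "Lithology",
    "Minerals",
    "Tectonics",
    "Ore Body Feature",
]


def build_prompt(text: str, entity_types=ENTITY_TYPES) -> str:
    """Assemble a hypothetical extraction prompt for an LLM."""
    categories = "\n".join(f"- {t}" for t in entity_types)
    return (
        "Extract geoscience entities from the text below.\n"
        "Return a JSON object whose keys are these categories and whose "
        "values are lists of strings:\n"
        f"{categories}\n\nText:\n{text}"
    )


def parse_response(raw: str, entity_types=ENTITY_TYPES) -> dict:
    """Keep only the expected categories; ignore anything else in the reply."""
    data = json.loads(raw)
    return {t: [str(v) for v in data.get(t, [])] for t in entity_types}


# Adding a custom category, e.g. for a palaeontology use case.
custom_types = ENTITY_TYPES + ["Fossil Taxa"]
prompt = build_prompt("Trilobites occur in Cambrian shale.", custom_types)

# A hand-written stand-in for a model reply, not real LLM output.
fake_reply = '{"Lithology": ["shale"], "Fossil Taxa": ["trilobites"], "Other": ["x"]}'
entities = parse_response(fake_reply, custom_types)
print(entities)
```

Filtering the reply against the configured list means an unexpected key from the model is dropped rather than silently added to the graph.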
Out-of-the-box foundation LLMs have been trained on vast amounts of geological content, so they have some ‘understanding’ of the terminology without the need for fine-tuning. Hopefully releasing all this code can help build equitable and sustainable geodata science and AI capabilities, and help spark new ideas!
I have updated the GEOAssist code from V1.0 to V2.0 on GitHub: https://github.com/PCleverleyGeol/GeoAssist—An-open-source-autonomous-research-agent-for-geoscience-data-and-literature.-
