Background 📚
Link prediction is the problem of predicting whether an edge exists between two entities in a network. Many problems in biology can be formulated as link prediction: drug repurposing is (drug)-(disease) link prediction, protein network reconstruction is (protein)-(protein) link prediction, drug safety prediction is (drug)-(side effect) link prediction, and so on. Indeed, graph neural networks for link prediction have already demonstrated impressive advances in genetic diagnosis (Alsentzer et al., medRxiv, 2024; Middleton et al., Science, 2024), microbe-drug association (Long et al., Bioinformatics, 2020), drug-drug interaction prediction (Zitnik et al., Bioinformatics, 2018), therapeutic target nomination (Li et al., Nature Methods, 2024), and zero-shot drug repurposing (Huang et al., Nature Medicine, 2024). However, existing methods predict only single edges between two nodes, whereas clinically and scientifically relevant questions often require more complex graph queries that involve multiple edges, nodes, and other variables. In the era of precision medicine, complex graph queries would enable contextualized or personalized predictions. For example: "Which combinations of drugs (D1, D2, ..., Dn) will simultaneously modulate multiple proteins (P1, P2, ..., Pn) associated with a disease J in a patient with genetic variant V1?"
About GLIMPSE 🤖
To enable complex logical queries on large-scale biomedical networks, we build upon “Embedding Logical Queries on Knowledge Graphs” (Hamilton et al., NeurIPS, 2018), which introduces graph query embeddings (GQE), a method for making predictions about first-order conjunctive logical queries on graph relations. We apply GQE to graph foundation models for biomedicine (e.g., TxGNN, recently published in Nature Medicine) trained on heterogeneous knowledge graphs (e.g., PrimeKG, see Chandak et al., Scientific Data, 2023).
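As a rough illustration of the idea behind GQE (not the implementation used in GLIMPSE), the sketch below embeds a two-branch conjunctive query using untrained random vectors. It assumes a matrix-based projection per relation and an elementwise-minimum intersection, which is a simplification of the learned operators described by Hamilton et al. The entity names, relation names, and embedding dimension are hypothetical, so the resulting ranking carries no biological meaning.

```python
import math
import random

DIM = 8
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def random_vector():
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def random_matrix():
    return [random_vector() for _ in range(DIM)]


def project(query, relation_matrix):
    """Projection: follow one edge type by multiplying by its relation matrix."""
    return [sum(w * x for w, x in zip(row, query)) for row in relation_matrix]


def intersect(branches):
    """Intersection: a permutation-invariant elementwise minimum over branches.

    Assumption: the published operator also applies learned layers; they are
    omitted here to keep the sketch short.
    """
    return [min(values) for values in zip(*branches)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical, untrained parameters; a real model would learn these.
entity = {
    name: random_vector()
    for name in ["protein:P1", "disease:J", "drug:D1", "drug:D2", "drug:D3"]
}
relation = {name: random_matrix() for name in ["targeted_by", "treated_by"]}


def rank_drugs():
    """Query: which drug ?D targets protein P1 AND treats disease J?"""
    branch_a = project(entity["protein:P1"], relation["targeted_by"])
    branch_b = project(entity["disease:J"], relation["treated_by"])
    query = intersect([branch_a, branch_b])
    candidates = [name for name in entity if name.startswith("drug:")]
    # Candidates are scored by cosine similarity to the query embedding.
    return sorted(candidates, key=lambda n: cosine(query, entity[n]), reverse=True)
```

Calling `rank_drugs()` returns the three candidate drugs ordered by similarity to the query embedding. The point of the sketch is only that a multi-edge query reduces to a single vector that can be compared against every node.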
We developed GLIMPSE: Graph-based Logical Inference for Multi-query Prediction and Scientific Exploration, an end-to-end system for complex logical queries on biomedical graphs with applications in contextualized prediction and precision medicine. GLIMPSE is powered by a biomedical knowledge graph with 129,375 nodes and 4,050,249 edges describing biomedical relationships. GLIMPSE includes the following features:
- Smart search across all nodes based on OpenAI text embeddings of node name and type.
- Graph-based retrieval augmented generation to understand and answer general graph queries.
- Smart delegation to a custom GQE-based advanced graph AI model to answer complex precision medicine queries.
- Automatic construction of structured directed acyclic graphs to represent conjunctive logical queries derived from human user-provided free text for GQE prediction.
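The last feature can be pictured with a small data structure. The following is a minimal sketch, assuming a conjunctive query is stored as relation-labeled edges from known anchor entities through intermediate variables to a single target variable. The class names, relation labels, and the `?variable` naming convention are hypothetical and are not taken from the GLIMPSE code.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class QueryEdge:
    source: str
    relation: str
    target: str


@dataclass
class ConjunctiveQuery:
    anchors: set[str]   # known entities, e.g. "disease:J"
    target: str         # the variable to predict, e.g. "?drug"
    edges: list[QueryEdge] = field(default_factory=list)

    def evaluation_order(self) -> list[str]:
        """Return nodes so that every node follows all of its predecessors.

        Raises graphlib.CycleError if the edges do not form a DAG.
        """
        predecessors: dict[str, set[str]] = {}
        for edge in self.edges:
            predecessors.setdefault(edge.source, set())
            predecessors.setdefault(edge.target, set()).add(edge.source)
        return list(TopologicalSorter(predecessors).static_order())

    def is_acyclic(self) -> bool:
        try:
            self.evaluation_order()
        except CycleError:
            return False
        return True


def example_query() -> ConjunctiveQuery:
    """'Which drug targets a protein associated with disease J
    and affected by variant V1?' (hypothetical relation labels)."""
    return ConjunctiveQuery(
        anchors={"disease:J", "variant:V1"},
        target="?drug",
        edges=[
            QueryEdge("disease:J", "associated_with", "?protein"),
            QueryEdge("variant:V1", "affects", "?protein"),
            QueryEdge("?protein", "targeted_by", "?drug"),
        ],
    )
```

In this sketch, `example_query().evaluation_order()` lists both anchors before `?protein` and ends with `?drug`, which is the order in which an embedding model would apply its projection and intersection steps.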
Implementation Details 💻
We developed a combined front-end and back-end that allows the user to interact with an LLM and with a GQE model through a web interface. Through our interface, a human user can chat with an LLM that has access to the information encoded in the knowledge graph, as well as request additional computation to be performed by external models that specialize in a given biomedical task. The framework is designed to be modular, so that other graphs and external models can be added in the future.

Our storage is handled by a Postgres instance hosted on DigitalOcean. This instance holds more than 10M rows representing the structure of the graph. We designed a storage schema for knowledge graphs from the ground up to allow for efficient manipulation of the data in a relational database. Together with the new schema, we created a new binary format for easy export and sharing of graphs created by scientists. We developed libraries in both Python and TypeScript for loading, saving, and exporting knowledge graphs.
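To make the relational layout concrete, here is one plausible shape for a node table and an edge table. This is a sketch under stated assumptions rather than the actual GLIMPSE schema: the table and column names are hypothetical, the rows are placeholders, and Python's built-in `sqlite3` module stands in for Postgres so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema: one row per node, one row per typed, directed edge.
SCHEMA = """
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE edges (
    source_id INTEGER NOT NULL REFERENCES nodes(id),
    target_id INTEGER NOT NULL REFERENCES nodes(id),
    relation  TEXT NOT NULL,
    PRIMARY KEY (source_id, relation, target_id)
);
-- The primary key covers lookups by source; this index covers lookups by target.
CREATE INDEX edges_by_target ON edges (target_id, relation);
"""


def build_demo_graph() -> sqlite3.Connection:
    """Create an in-memory database holding a tiny placeholder graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO nodes (id, name, type) VALUES (?, ?, ?)",
        [
            (1, "drug A", "drug"),
            (2, "protein P1", "protein"),
            (3, "protein P2", "protein"),
            (4, "disease J", "disease"),
        ],
    )
    conn.executemany(
        "INSERT INTO edges (source_id, target_id, relation) VALUES (?, ?, ?)",
        [(1, 2, "targets"), (1, 3, "targets"), (1, 4, "treats")],
    )
    return conn


def neighbors(conn: sqlite3.Connection, node_name: str, relation: str) -> list[str]:
    """Names of nodes reachable from `node_name` over one edge of `relation`."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM edges AS e
        JOIN nodes AS s ON s.id = e.source_id
        JOIN nodes AS t ON t.id = e.target_id
        WHERE s.name = ? AND e.relation = ?
        ORDER BY t.name
        """,
        (node_name, relation),
    ).fetchall()
    return [row[0] for row in rows]
```

With the placeholder data, `neighbors(build_demo_graph(), "drug A", "targets")` returns the two protein nodes. Keeping edges in a single narrow table like this means one-hop traversals become indexed joins, which is one reason a relational layout can handle graphs with millions of rows.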

Impact 💊
Our compound AI system makes progress towards AI-driven precision medicine. Example applications of GLIMPSE include:
- Genetic and environmental interactions: "Given a combination of genetic mutations (G1, G2, …, Gn) and environmental exposures (E1, E2, …, En), what is the likelihood that a patient will develop asthma?"
- Combination therapeutics: "Will a combination of two drugs (D1, D2) target multiple proteins (P1, P2) associated with a genetic mutation (G1) in Alzheimer’s disease?"
- Cell type-specific pathology: "Which immune cells (C1, C2) will interact with a viral protein (V1) to trigger an inflammatory response in patients with a specific genetic marker (G1)?"
- Universal treatments in patients with comorbidities: "Which drugs can simultaneously target diseases (D1, D2, …, Dn) in patients with symptoms (S1, S2, S3)?"
- Overcoming treatment resistance: "For a patient with ovarian cancer associated with specific mutations (G1, G2, G3) and resistance to treatments (T1, T2, …, Tn), what are potential new targets or drug candidates?"
We plan to continue working on GLIMPSE to enable many more complex multi-node and multi-edge queries. GLIMPSE will provide a glimpse into the future of AI-guided personalized medicine.