Detection of UNSAT Cores

In this project, I focused on improving the detection of UNSAT cores within 3-SAT formulas, which are boolean equations where each clause contains three variables. My goal was to develop a method to identify these cores—the specific variables or contradictions that prevent an equation from being true—more quickly and efficiently than previous methods. To implement this, I generated training data containing UNSAT cores, visualized that data on a bipartite graph, and trained a Graphical Neural Network (GNN) to observe and improve how it interprets these clauses.During the research phase, I explored the complexities of the SAT algorithm and previous detection methods using resources like ScienceDirect and arXiv. Although I initially attempted to use a "Neural SAT" GitHub repository as a foundation, I found that much of the code was outdated and would not run locally. As a result, I had to build significant portions of the project from scratch, including developing my own scripts for data generation. Ultimately, my work combined visualization techniques and machine learning to better understand and solve contradictions within complex boolean logic.

Step 1: Overview of Detection of UNSAT Cores This project revolved around two main arguments: the Satisfiability Boolean Algorithm and UNSAT cores. The first part is an argument that tries to make the entire equation true, and is only satisfied then. On the latter side, UNSAT cores are the variables that prevent the equation from being true. Only one contradiction can lead to a negative outcome of the equation.

Based on this knowledge, the project was made to find a way to detect UNSAT cores faster and more efficiently in the 3-SAT formula. This type of SAT is one in which each clause contains three variables. The implementation of this project was as follows: generating training data that had UNSAT cores, visualizing the training data, and finally, training a Graphical Neural Network on a bipartite graph and modifying the way it interprets clauses through training the Neural Net and observing the outcome. ---

Step 2: Methodology

In order to conduct this project, I had to garner knowledge about UNSAT cores and the previous detection methods, as well as learn more about the complexities of the SAT algorithm. My first source of knowledge was a GitHub repository about Neural SAT, which is a different type of model to interpret the SAT problem. This source provided me with the basic code that I could implement to see this model run. However, when I ran the code, it would not work locally because all the code in the repository was outdated. After understanding this, I had to make everything from scratch and not use sample code as a basis for my project. **Papers read:** - [Information Sciences — SAT Survey](https://www.sciencedirect.com/science/article/pii/S0020025525008904) — provided basic knowledge of SAT, the math behind it, and discussed two different types of models: NeuroSAT and the Student-Teacher Model. - [arXiv 2509.25411](https://arxiv.org/abs/2509.25411) — further discussed SAT models and UNSAT cores. - [NeuroCore (arXiv 1903.04671)](https://arxiv.org/pdf/1903.04671) — the authors built a framework designed to predict which variables are most likely part of an unsatisfiable core, similar to what I am doing. Instead of querying a neural network for every branching decision, they use "periodic refocusing" — every 100 seconds, the solver's Variable Activity Score gets restated to better point the AI to the most critical parts of the problem. To understand the technical architecture, I watched a video by Daniel Selman explaining the message-passing architecture and how a neural network can learn to solve a problem just by being told if the formula is "SAT" or "UNSAT." ---

Step 3: Implementation The implementation followed this structure: 1. Generating the data 2. Turning data into a bipartite graph for the solver to understand 3. Implementing the solver to detect UNSAT cores --- ### GENERATE_DATA.py ```python import random import json from pysat.solvers import Glucose3 def generate_random_3sat(n_vars, n_clauses): clauses = [] for _ in range(n_clauses): vars_sample = random.sample(range(1, n_vars + 1), 3) clause = [v if random.random() > 0.5 else -v for v in vars_sample] clauses.append(clause) return clauses def get_unsat_core(clauses, n_vars): with Glucose3() as solver: selectors = [] for i, clause in enumerate(clauses): selector = n_vars + i + 1 selectors.append(selector) solver.add_clause(clause + [-selector]) is_sat = solver.solve(assumptions=selectors) if is_sat: return None core_selectors = solver.get_core() core_indices = [sel - n_vars - 1 for sel in core_selectors] return core_indices def create_dataset(num_samples=100, n_vars=20, ratio=4.26): n_clauses = int(n_vars * ratio) dataset = [] print(f"Generating {num_samples} unsat samples...") while len(dataset) < num_samples: clauses = generate_random_3sat(n_vars, n_clauses) core = get_unsat_core(clauses, n_vars) if core: dataset.append({ "n_vars": n_vars, "clauses": clauses, "core_indices": core }) with open("unsat_data.json", "w") as f: json.dump(dataset, f) print("\nDataset saved to unsat_data.json") if name == "main": create_dataset(num_samples=500, n_vars=30) The generate_random_3sat function generates a clause with three variables. Each variable is randomly negated using the Bernoulli principle. The get_unsat_core function combats a major problem: SAT solvers report failure without listing the contradictions that caused it. To solve this, selectors with unique IDs are created and negated, then the solver assumes all selectors are true. When a contradiction arises, get_core() extracts which selector had the contradictory clause. create_dataset uses a ratio of 4.26 — a sweet spot at the borderline of hard and impossible — to generate the hardest possible data. Only UNSAT cores are saved to unsat_data.json. --- DATA_UTILS.py

def json_to_graphs(json_file): with open(json_file, 'r') as f: raw_data = json.load(f) graph_list = [] for item in raw_data: clauses = item['clauses'] n_vars = item['n_vars'] n_clauses = len(clauses) core_indices = set(item['core_indices']) edge_index = [] edge_attr = [] for c_idx, clause in enumerate(clauses): for lit in clause: v_idx = abs(lit) - 1 + n_clauses polarity = 1.0 if lit > 0 else -1.0 edge_index.append([c_idx, v_idx]) edge_index.append([v_idx, c_idx]) edge_attr.append([polarity]) edge_attr.append([polarity]) edge_index = torch.tensor(edge_index, dtype=torch.long).t().contiguous() edge_attr = torch.tensor(edge_attr, dtype=torch.float) x_clauses = torch.tensor([[1.0, 0.0]] * n_clauses) x_vars = torch.tensor([[0.0, 1.0]] * n_vars) x = torch.cat([x_clauses, x_vars], dim=0) y = torch.zeros(n_clauses + n_vars, dtype=torch.float) train_mask = torch.zeros(n_clauses + n_vars, dtype=torch.bool) for i in range(n_clauses): train_mask[i] = True if i in core_indices: y[i] = 1.0 data = Data(x=x, edge_index=edge_index, edge_attr=edge_attr, y=y, train_mask=train_mask) data.n_clauses = n_clauses data.n_vars = n_vars graph_list.append(data) return graph_list Neural networks cannot read JSON — they need information as a bipartite graph. Two types of nodes are created: one for every clause and one for every variable. Variables are offset because computers start at 0: v_idx = abs(lit) - 1 + n_clauses

Each edge carries a polarity: 1.0 if the literal is positive, -1.0 if negated. Connections are bidirectional so information flows both ways during the model's reasoning: edge_index.append([c_idx, v_idx]) edge_index.append([v_idx, c_idx]) Each node gets a label: [1, 0] for a clause and [0, 1] for a variable, helping the model keep identities straight. --- GNN_TRAINING.py

import torch import torch.nn.functional as F from torch_geometric.nn import GCNConv from torch_geometric.loader import DataLoader from data_utils import json_to_graphs class ClauseGNN(torch.nn.Module): def __init__(self, hidden_channels): super(ClauseGNN, self).__init__() self.conv1 = GCNConv(2, hidden_channels) self.conv2 = GCNConv(hidden_channels, hidden_channels) self.conv3 = GCNConv(hidden_channels, hidden_channels) self.lin = torch.nn.Linear(hidden_channels, 1) def forward(self, x, edge_index, batch): x = self.conv1(x, edge_index) x = x.relu() x = self.conv2(x, edge_index) x = x.relu() x = self.conv3(x, edge_index) out = self.lin(x) return torch.sigmoid(out).view(-1) def train(): full_dataset = json_to_graphs("unsat_data.json") train_loader = DataLoader(full_dataset, batch_size=32, shuffle=True) device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu') model = ClauseGNN(hidden_channels=64).to(device) optimizer = torch.optim.Adam(model.parameters(), lr=0.01) criterion = torch.nn.BCELoss() model.train() for epoch in range(1, 150): total_loss = 0 for data in train_loader: data = data.to(device) optimizer.zero_grad() out = model(data.x, data.edge_index, data.batch) loss = criterion(out[data.train_mask], data.y[data.train_mask]) loss.backward() optimizer.step() total_loss += loss.item() if epoch % 10 == 0: print(f"Epoch {epoch:03d}, Loss: {total_loss / len(train_loader):.4f}") torch.save(model.state_dict(), "clause_gnn_model.pth") print("Training Complete. Model saved!") if __name__ == "__main__": train()

Initial problem: Loss was at 0.0460 but the model wasn't learning — it was predicting everything as non-core with 95% accuracy, since core clauses were so rare. Fix 1: Added pos_weight to BCEWithLogitsLoss to penalize missing a core clause 10× more than a false alarm. This caused the loss to spike to 2.814 — the penalty was now so high the model collapsed to predicting everything as 0. Fix 2: Changed the learning rate and switched from GCNConv to GraphConv, which supports edge weights and attributes, allowing polarity to actually flow through the network. Bugs found: - The dataset file was saved as unsat.json but loaded as unsat_data.json - Edge attributes (polarity) were stored but never passed to the model, so it couldn't distinguish clause types Data problem: The core clause ratio was only 0.008 — less than 1% of all clauses were labeled as core. Changes made to generate_data.py: ┌────────────────┬────────┬───────┐ │ Parameter │ Before │ After │ ├────────────────┼────────┼───────┤ │ Variables │ 30 │ 20 │ ├────────────────┼────────┼───────┤ │ Ratio │ 4.26 │ 4.3 │ ├────────────────┼────────┼───────┤ │ Samples │ 500 │ 2000 │ ├────────────────┼────────┼───────┤ │ Core ratio │ 0.008 │ 0.015 │ ├────────────────┼────────┼───────┤ │ Min/max filter │ None │ Added │ └────────────────┴────────┴───────┘ Architecture changes: - Switched GCNConv → GraphConv for edge attribute support - Added a 4th convolutional layer for deeper message passing - Reduced hidden channels from 128 → 64 (128 caused overfitting, training loss dropped to 0.14 as the model memorized 400 samples instead of learning patterns) - Added 80/20 train/test split - Slowed learning rate from 0.01 → 0.001 - Added learning rate scheduler: halves LR if no improvement for 10 cycles - Added early stopping: halts after 50 consecutive cycles with no improvement - Enriched node features to include literal polarity counts and variable frequency, replacing simple [1,0] / [0,1] labels After all changes, the model plateaued at ~44% F1. Analysis revealed the fundamental issue: random 3-SAT near the phase transition ratio has no learnable structure — contradictions are spread across the entire graph with no consistent pattern for the GNN to find. The breakthrough: Instead of generating random formulas and extracting cores after the fact, cores were constructed first and the dataset was padded with random clauses. ┌────────────────────────┬────────┬─────────┐ │ Metric │ Before │ After │ ├────────────────────────┼────────┼─────────┤ │ F1 Score │ 44% │ 92.21% │ ├────────────────────────┼────────┼─────────┤ │ Recall │ — │ 96.5% │ ├────────────────────────┼────────┼─────────┤ │ Perfect recall samples │ — │ 365/400 │ ├────────────────────────┼────────┼─────────┤ │ Loss │ ~0.44 │ 0.014 │ └────────────────────────┴────────┴─────────┘ Generalization tests: - Different variable sizes → 90.02% F1 - Cores larger than 4 clauses → 96% F1 --- Results This project demonstrated that a GNN can successfully extract UNSAT cores from a SAT problem. The single biggest improvement across the entire project came from changing the data structure — switching from random UNSAT formulas to planted cores — which took the model from 44% to 92% F1 and proved that the quality of training data matters more than model architecture.