graphdot.kernel package¶
class graphdot.kernel.Tang2019MolecularKernel(stopping_probability=0.01, starting_probability=1.0, element_prior=0.2, edge_length_scale=0.05, **kwargs)[source]¶
Bases: object
A marginalized graph kernel for 3D molecular structures, as described in: Tang, Y. H., & de Jong, W. A. (2019). Prediction of atomization energy using graph kernel and active learning. The Journal of Chemical Physics, 150(4), 044107. The kernel can be used directly together with Graph.from_ase() to operate on molecular structures.
Parameters: - stopping_probability (float in (0, 1)) – The probability for the random walk to stop during each step.
- starting_probability (float) – The probability for the random walk to start from any node. See the p kwarg of graphdot.kernel.marginalized.MarginalizedGraphKernel.
- element_prior (float in (0, 1)) – The baseline similarity between distinct elements; an element always has a similarity of 1 to itself.
- edge_length_scale (float in (0, inf)) – Length scale of the Gaussian kernel on edge length. A rule of thumb is that the similarity decays smoothly from 1 to nearly 0 at around three times the length scale.
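The rule of thumb above can be checked numerically. The sketch below is plain NumPy, not graphdot's implementation, and assumes the standard unit-amplitude squared-exponential form exp(-d**2 / (2 * l**2)) for the edge kernel:

```python
import numpy as np

def gaussian_edge_kernel(d, length_scale):
    """Unit-amplitude squared-exponential kernel on edge length d."""
    return np.exp(-d**2 / (2 * length_scale**2))

ell = 0.05  # the default edge_length_scale
for multiple in (1, 2, 3):
    k = gaussian_edge_kernel(multiple * ell, ell)
    print(f"d = {multiple} * length_scale -> similarity {k:.4f}")
# d = 1 * length_scale -> similarity 0.6065
# d = 2 * length_scale -> similarity 0.1353
# d = 3 * length_scale -> similarity 0.0111
```

At three length scales the similarity has dropped to about 0.01, which is the "nearly 0" in the rule of thumb.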
- __call__(X, Y=None, **kwargs)[source]¶ Same call signature as graphdot.kernel.marginalized.MarginalizedGraphKernel.__call__().
- bounds¶
- diag(X, **kwargs)[source]¶ Same call signature as graphdot.kernel.marginalized.MarginalizedGraphKernel.diag().
- hyperparameter_bounds¶
- hyperparameters¶
- theta¶
class graphdot.kernel.KernelOverMetric(distance, expr, x, **hyperparameters)[source]¶
Bases: object
- bounds¶
- hyperparameters¶
- theta¶
class graphdot.kernel.MarginalizedGraphKernel(node_kernel, edge_kernel, p=1.0, q=0.01, q_bounds=(0.0001, 0.9999), eps=0.01, ftol=1e-08, gtol=1e-06, dtype=<class 'float'>, backend='auto')[source]¶
Bases: object
Implements the random-walk-based graph similarity kernel proposed in: Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning (ICML-03) (pp. 321-328).
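As a concrete illustration of the quantity this kernel computes, the sketch below evaluates a deliberately simplified case in plain NumPy (not graphdot's GPU implementation): all node and edge kernels equal 1, the walk starts at a node pair with uniform probability (graphdot's default p=1.0 instead weights every node equally without normalizing), stops with probability q at each step, and otherwise moves to a uniformly random neighbor in each graph. Marginalizing over all walk lengths then reduces to one linear solve on the product graph:

```python
import numpy as np

def simple_marginalized_kernel(A1, A2, q=0.05):
    """Marginalized graph kernel (Kashima et al., 2003), simplified to
    unlabeled graphs: all node/edge kernels are 1, uniform start."""
    # Row-stochastic transition matrices: move to a random neighbor.
    T1 = A1 / A1.sum(axis=1, keepdims=True)
    T2 = A2 / A2.sum(axis=1, keepdims=True)
    W = np.kron(T1, T2)  # simultaneous random walk on the product graph
    n = W.shape[0]
    # R(u, u') = q^2 + (1 - q)^2 * sum_{v, v'} T1(v|u) T2(v'|u') R(v, v'),
    # i.e. the linear system (I - (1 - q)^2 W) R = q^2 * 1.
    R = np.linalg.solve(np.eye(n) - (1 - q)**2 * W, np.full(n, q**2))
    p = np.full(n, 1.0 / n)  # uniform starting distribution over pairs
    return p @ R

# Two triangle graphs (3-cycles): by symmetry R is constant, so the
# kernel has the closed form q**2 / (1 - (1 - q)**2) = q / (2 - q).
C3 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
k = simple_marginalized_kernel(C3, C3, q=0.05)
print(k)  # 0.05 / 1.95 ≈ 0.025641
```

In graphdot, the node_kernel and edge_kernel microkernels reweight the entries of W before this solve, which is the generalized Laplacian equation the backend parameter refers to.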
Parameters: - node_kernel (microkernel) – A microkernel that computes the similarity between individual nodes.
- edge_kernel (microkernel) – A microkernel that computes the similarity between individual edges.
- p (positive number (default=1.0) or StartingProbability) – The starting probability of the random walk on each node. Must be either a positive number or a concrete subclass instance of StartingProbability.
- q (float in (0, 1)) – The probability for the random walk to stop during each step.
- q_bounds (pair of floats) – The lower and upper bounds within which the stopping probability can vary during hyperparameter optimization.
- eps (float) – The step size used for finite difference approximation of the gradient. Only used for nodal matrices (nodal=True).
- dtype (numpy dtype) – The data type of the kernel matrix to be returned.
- backend ('auto', 'cuda', or an instance of graphdot.kernel.marginalized.Backend) – The computing engine that solves the marginalized graph kernel’s generalized Laplacian equation.
- __call__(X, Y=None, eval_gradient=False, nodal=False, lmin=0, timing=False)[source]¶ Compute the pairwise similarity matrix between graphs.
Parameters: - X (list of N graphs) – The graphs must all have the same node and edge attributes.
- Y (None or list of M graphs) – The graphs must all have the same node and edge attributes.
- eval_gradient (Boolean) – If True, computes the gradient of the kernel matrix with respect to the hyperparameters and returns it alongside the kernel matrix.
- nodal (bool) – If True, returns node-wise similarities; otherwise, returns graph-wise similarities.
- lmin (0 or 1) – Number of steps to skip in each random walk path before the similarity is computed. (lmin + 1) corresponds to the starting value of l in the summation of Eq. 1 in Tang & de Jong, 2019 https://doi.org/10.1063/1.5078640 (or the first unnumbered equation in Section 3.3 of Kashima, Tsuda, and Inokuchi, 2003).
Returns: - kernel_matrix (ndarray) – If Y is None, returns a square matrix containing the pairwise similarities between the graphs in X; otherwise, returns a matrix containing the similarities across the graphs in X and Y.
- gradient (ndarray) – The gradient of the kernel matrix with respect to the kernel hyperparameters. Only returned if eval_gradient is True.
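The gradient returned by eval_gradient=True can be sanity-checked against the central-difference scheme that the eps constructor parameter controls for nodal matrices. The sketch below uses a scalar stand-in kernel (not an actual graphdot call) to show the check:

```python
import numpy as np

def kernel_value(q):
    # Stand-in for a kernel evaluation: closed-form marginalized kernel
    # between two identical regular graphs with uniform labels.
    return q / (2.0 - q)

def kernel_grad(q):
    # Analytic derivative of the stand-in kernel with respect to q.
    return 2.0 / (2.0 - q)**2

q, eps = 0.05, 0.01  # eps: the finite-difference step, as in __init__
fd = (kernel_value(q + eps) - kernel_value(q - eps)) / (2 * eps)
print(abs(fd - kernel_grad(q)))  # small: agreement to O(eps**2)
```

A central difference like this has O(eps**2) truncation error, which is why eps trades off truncation against round-off when approximating gradients.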
- active_theta_mask¶
- bounds¶ The logarithms of a reshaped X-by-2 array of kernel hyperparameter bounds, excluding those declared as ‘fixed’ or those with equal lower and upper bounds.
- diag(X, eval_gradient=False, nodal=False, lmin=0, active_theta_only=True, timing=False)[source]¶ Compute the self-similarities for a list of graphs.
Parameters: - X (list of N graphs) – The graphs must all have the same node attributes and edge attributes.
- eval_gradient (Boolean) – If True, computes the gradient of the kernel matrix with respect to the hyperparameters and returns it alongside the kernel matrix.
- nodal (bool) – If True, returns a vector containing the nodal self-similarities; if False, returns a vector containing the graphs’ overall self-similarities; if ‘block’, returns a list of square matrices which form a block-diagonal matrix, where each diagonal block represents the pairwise nodal similarities within a graph.
- lmin (0 or 1) – Number of steps to skip in each random walk path before the similarity is computed. (lmin + 1) corresponds to the starting value of l in the summation of Eq. 1 in Tang & de Jong, 2019 https://doi.org/10.1063/1.5078640 (or the first unnumbered equation in Section 3.3 of Kashima, Tsuda, and Inokuchi, 2003).
- active_theta_only (bool) – Whether or not to return only the gradients with respect to the non-fixed hyperparameters.
Returns: - diagonal (numpy.array or list of np.array(s)) – If nodal=True, a vector containing the nodal self-similarities; if nodal=False, a vector containing the graphs’ overall self-similarities; if nodal=‘block’, a list of square matrices, each being a pairwise nodal similarity matrix within a graph.
- gradient – The gradient of the kernel matrix with respect to the kernel hyperparameters. Only returned if eval_gradient is True.
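A common use of the self-similarity vector from diag (standard kernel-method practice, not specific to graphdot) is cosine-style normalization of the kernel matrix so that every graph has self-similarity 1:

```python
import numpy as np

def normalize_kernel(K, d):
    """Normalize a kernel matrix K given its self-similarity vector d
    (as returned by diag), so that the diagonal becomes all ones."""
    return K / np.sqrt(np.outer(d, d))

# Toy positive-definite similarity matrix standing in for a kernel output.
K = np.array([[4.0, 1.0],
              [1.0, 9.0]])
d = np.diag(K)
Kn = normalize_kernel(K, d)
print(Kn)  # diagonal is 1; off-diagonal is 1 / sqrt(4 * 9) = 1/6
```

This keeps relative similarities comparable across graphs of very different sizes, since larger graphs otherwise accumulate larger raw self-similarities.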
- flat_hyperparameters¶
- hyperparameter_bounds¶
- hyperparameters¶ A hierarchical representation of all the kernel hyperparameters.
- n_dims¶ Number of hyperparameters, including both optimizable and fixed ones.
- requires_vector_input¶
- theta¶ The logarithms of a flattened array of kernel hyperparameters, excluding those declared as ‘fixed’ or those with equal lower and upper bounds.
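Storing theta in log space follows the convention of scikit-learn's kernel API: the optimizer takes unconstrained steps on the log-transformed values, so positive hyperparameters stay positive and multiplicative scales become additive. A minimal sketch with hypothetical parameter values (plain NumPy, not graphdot's internals):

```python
import numpy as np

# Hypothetical positive hyperparameters, e.g. a stopping probability
# and a length scale; theta stores their logarithms.
params = np.array([0.05, 1.5])
theta = np.log(params)

# An optimizer can take arbitrary unconstrained steps in log space ...
theta_new = theta + np.array([0.1, -0.2])

# ... and exponentiating always recovers strictly positive values.
params_new = np.exp(theta_new)
print(params_new > 0)  # both True, regardless of the step taken
```

This is also why bounds above is stated in logarithms: the box constraints live in the same transformed space as theta.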
- trait_t¶ alias of Traits