QNLP
v1.0
Public Member Functions | |
def | __init__ (self, corpus_path="", mode=0, stop_words=True, encoder=enc.gray.GrayEncoder(), use_spacy=False) |
def | load_tokens (self, corpus_path, mode=0, stop_words=True, use_spacy=False) |
def | sort_basis_helper (self, token_type, num_elems) |
def | define_basis (self, num_basis={"verbs":8, "nouns":8}) |
def | sort_tokens_by_dist (self, tokens_type, graph_type=nx.DiGraph, dist_metric=lambda x, y: np.abs(x[:, np.newaxis] - y)) |
def | sort_basis_tokens_by_dist (self, tokens_type, graph_type=nx.DiGraph, dist_metric=lambda x, y: np.abs(x[:, np.newaxis] - y), num_basis=16, ham_cycle=True) |
def | assign_indexing (self, token_type) |
def | calc_diff_matrix (self) |
def | getPathLength (self, token_type) |
Data Fields | |
pc | |
tokens | |
encoder | |
distance_dictionary | |
encoded_tokens | |
ordered_tokens | |
Private Member Functions | |
def | _create_token_graph (self, token_dist_pairs, graph_type=nx.DiGraph) |
def | _get_ordered_tokens (self, token_graph: nx.DiGraph, ham_cycle=True) |
def | _tsp_token_solver (self, token_graph: nx.DiGraph) |
def | _calc_token_order_distance (self, token_order_list, token_type) |
Use vector space model of meaning to determine relative order of tokens in basis (see disco papers).
Plan:
- 1. Populate set of tokens of type t in corpus; label as T; O(n)
- 2. Choose the n most frequently occurring tokens from T, and define them as basis elements of type t; 2 <= n <= |T|; O(n)
- 3. Find relative (pairwise) distances between these tokens, and define as metric; n(n-1)/2 pairs -> O(n^2)
- 4. Sort tokens by distance metric; O(n*log(n))
- 5. Assign tokens integer ID using Gray code mapping of sorted index; O(n)
After the above steps the elements are readily available for encoding using their respective IDs. Tokens that appear relatively close together will be closer in the sorted list, and as a result their IDs differ by fewer bit flips. This difference can be used later in the Hamming distance calculation as a measure of similarity of meaning.
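The bit-flip property in step 5 can be illustrated with a minimal sketch. The helper names `gray_encode` and `hamming_distance` and the token list are hypothetical, and the binary-reflected Gray code used here is a stand-in: this page does not document how `enc.gray.GrayEncoder` is implemented.

```python
def gray_encode(index: int) -> int:
    """Binary-reflected Gray code of a non-negative integer (stand-in encoder)."""
    return index ^ (index >> 1)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Hypothetical tokens, assumed to be already sorted by the distance metric.
ordered = ["cat", "dog", "horse", "car"]

# Step 5: the integer ID of each token is the Gray code of its sorted index.
codes = {tok: gray_encode(i) for i, tok in enumerate(ordered)}

# Neighbours in the sorted list differ by exactly one bit flip.
adjacent_flips = [
    hamming_distance(codes[a], codes[b]) for a, b in zip(ordered, ordered[1:])
]

for tok, code in codes.items():
    print(f"{tok:>6} -> {code:02b}")
print("bit flips between neighbours:", adjacent_flips)
```

With a plain binary index the step from 1 (01) to 2 (10) would cost two bit flips; the Gray mapping keeps every neighbouring pair at one.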
Definition at line 108 of file VectorSpaceModel.py.
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.__init__ (self, corpus_path="", mode=0, stop_words=True, encoder=enc.gray.GrayEncoder(), use_spacy=False)
Definition at line 123 of file VectorSpaceModel.py.
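def QNLP.proc.VectorSpaceModel.VectorSpaceModel._calc_token_order_distance (self, token_order_list, token_type) [private]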
Definition at line 345 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel.distance_dictionary.
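def QNLP.proc.VectorSpaceModel.VectorSpaceModel._create_token_graph (self, token_dist_pairs, graph_type=nx.DiGraph) [private]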
Creates graph using the (basis) tokens as nodes, and the pairwise distances between them as weighted edges. Used to determine optimal ordering of token adjacency for later encoding.
Definition at line 239 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_tokens_by_dist(), and QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_tokens_by_dist().
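The graph construction described above can be sketched as follows. This is an illustration only: the function name `build_token_graph` is hypothetical, and the layout of `token_dist_pairs` (a dict keyed by token pairs) is an assumption, since the real argument format is not documented on this page.

```python
import networkx as nx


def build_token_graph(token_dist_pairs, graph_type=nx.DiGraph):
    """Build a graph with tokens as nodes and pairwise distances as edge weights.

    token_dist_pairs is assumed to map (token_a, token_b) -> distance.
    """
    graph = graph_type()
    for (tok_a, tok_b), distance in token_dist_pairs.items():
        graph.add_edge(tok_a, tok_b, weight=distance)
    return graph


# Hypothetical basis tokens and distances.
pairs = {("cat", "dog"): 1.0, ("dog", "horse"): 2.0, ("cat", "horse"): 4.0}
g = build_token_graph(pairs)

for u, v, data in g.edges(data=True):
    print(u, "->", v, data["weight"])
```

Passing `graph_type=nx.Graph` instead would give undirected edges, which may be the more natural choice when the distance is symmetric.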
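def QNLP.proc.VectorSpaceModel.VectorSpaceModel._get_ordered_tokens (self, token_graph: nx.DiGraph, ham_cycle=True) [private]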
Solves the Hamiltonian cycle problem to define the ordering of the basis tokens. If a cycle is not required, can solve the TSP problem instead (ham_cycle = False).
Definition at line 279 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel._tsp_token_solver().
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_tokens_by_dist(), and QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_tokens_by_dist().
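To make the ordering problem concrete, here is a minimal sketch that finds the shortest ordering by exhaustive search over permutations. It is a swapped-in technique for illustration, not the solver the class uses, and it is only feasible for a handful of tokens. The `closed` flag, the helper names, and the one-dimensional positions are all hypothetical; how `closed` corresponds to `ham_cycle` is not confirmed by this page.

```python
from itertools import permutations

# Hypothetical 1-D coordinates; the real class derives distances from the corpus.
positions = {"c": 3, "a": 0, "d": 6, "b": 1}


def dist(u, v):
    """Distance between two tokens under the toy coordinates above."""
    return abs(positions[u] - positions[v])


def route_length(order, closed):
    """Total distance along an ordering; add the return edge if closed."""
    total = sum(dist(u, v) for u, v in zip(order, order[1:]))
    if closed:
        total += dist(order[-1], order[0])
    return total


def best_order(tokens, closed=False):
    """Exhaustive search: O(n!) and therefore only suitable for small n."""
    return min(permutations(tokens), key=lambda order: route_length(order, closed))


path = best_order(list(positions), closed=False)
path_length = route_length(path, closed=False)
cycle = best_order(list(positions), closed=True)
cycle_length = route_length(cycle, closed=True)
print(path, path_length)
print(cycle, cycle_length)
```

For realistic basis sizes such as 16 tokens, exhaustive search is out of reach, which is presumably why a dedicated solver is used.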
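def QNLP.proc.VectorSpaceModel.VectorSpaceModel._tsp_token_solver (self, token_graph: nx.DiGraph) [private]
Uses or-tools to solve the TSP for token_graph. Adapted from the or-tools examples on TSP.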
Definition at line 296 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel._get_ordered_tokens().
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.assign_indexing (self, token_type)
5. Encode the ordered tokens using a code based on indexed location. Values close together will have fewer bit flips.
Definition at line 360 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel.encoded_tokens, QNLP.proc.VectorSpaceModel.VectorSpaceModel.encoder, and QNLP.proc.VectorSpaceModel.VectorSpaceModel.ordered_tokens.
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.calc_diff_matrix (self)
Definition at line 375 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel.tokens.
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.define_basis (self, num_basis={"verbs": 8, "nouns": 8})
2. Specify the number of basis elements in each space. The dict maps each token type to the number of basis elements to request.
Definition at line 156 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_helper(), and QNLP.proc.VectorSpaceModel.VectorSpaceModel.tokens.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_tokens_by_dist().
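As a rough sketch of this step, the following picks the most frequent tokens of each type from a per-type request dict. The helper `top_tokens` and the token lists are hypothetical, and ranking by occurrence count is taken from step 2 of the class plan; the actual internals of `define_basis` and `sort_basis_helper` are not shown on this page.

```python
from collections import Counter


def top_tokens(tokens, num_elems):
    """Return the num_elems most frequently occurring tokens."""
    counts = Counter(tokens)
    return [tok for tok, _ in counts.most_common(num_elems)]


# Hypothetical tokenised corpus, grouped by token type.
corpus_tokens = {
    "nouns": ["cat", "dog", "cat", "car", "dog", "cat"],
    "verbs": ["run", "sit", "run"],
}

# Same shape as the num_basis argument: token type -> number of basis elements.
num_basis = {"verbs": 1, "nouns": 2}

basis = {
    token_type: top_tokens(corpus_tokens[token_type], n)
    for token_type, n in num_basis.items()
}
print(basis)
```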
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.getPathLength (self, token_type)
Definition at line 384 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel.distance_dictionary, and QNLP.proc.VectorSpaceModel.VectorSpaceModel.ordered_tokens.
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.load_tokens (self, corpus_path, mode=0, stop_words=True, use_spacy=False)
Definition at line 134 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VSM_pc.pc, QNLP.proc.VectorSpaceModel.VectorSpaceModel.pc, and QNLP.proc.process_corpus.tokenize_corpus().
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_helper (self, token_type, num_elems)
Definition at line 144 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel.tokens.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.define_basis().
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_tokens_by_dist (self, tokens_type, graph_type=nx.DiGraph, dist_metric=lambda x, y: np.abs(x[:, np.newaxis] - y), num_basis=16, ham_cycle=True)
Definition at line 202 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel._create_token_graph(), QNLP.proc.VectorSpaceModel.VectorSpaceModel._get_ordered_tokens(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.define_basis(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.distance_dictionary, QNLP.proc.VectorSpaceModel.VectorSpaceModel.ordered_tokens, and QNLP.proc.VectorSpaceModel.VectorSpaceModel.tokens.
def QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_tokens_by_dist (self, tokens_type, graph_type=nx.DiGraph, dist_metric=lambda x, y: np.abs(x[:, np.newaxis] - y))
Definition at line 171 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VectorSpaceModel._create_token_graph(), QNLP.proc.VectorSpaceModel.VectorSpaceModel._get_ordered_tokens(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.distance_dictionary, QNLP.proc.VectorSpaceModel.VectorSpaceModel.ordered_tokens, and QNLP.proc.VectorSpaceModel.VectorSpaceModel.tokens.
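The default dist_metric, read as `lambda x, y: np.abs(x[:, np.newaxis] - y)`, relies on NumPy broadcasting to produce every pairwise absolute difference in one step. The sketch below shows what it returns for two small arrays; treating `x` and `y` as arrays of token positions is an assumption made for illustration, as is reducing the result with `min()`.

```python
import numpy as np

# The default metric from the signature above.
dist_metric = lambda x, y: np.abs(x[:, np.newaxis] - y)

# Hypothetical positions at which two different tokens occur.
x = np.array([0, 5, 9])
y = np.array([2, 6])

# x[:, np.newaxis] has shape (3, 1); subtracting y (shape (2,)) broadcasts to (3, 2),
# so d[i, j] == |x[i] - y[j]|.
d = dist_metric(x, y)
print(d)

# One possible scalar summary: the closest approach of the two tokens.
closest = int(d.min())
print(closest)
```

Any callable with the same two-array signature could be passed in its place, for example a squared difference.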
QNLP.proc.VectorSpaceModel.VectorSpaceModel.distance_dictionary
Definition at line 127 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel._calc_token_order_distance(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.getPathLength(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_tokens_by_dist(), and QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_tokens_by_dist().
QNLP.proc.VectorSpaceModel.VectorSpaceModel.encoded_tokens
Definition at line 128 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.assign_indexing().
QNLP.proc.VectorSpaceModel.VectorSpaceModel.encoder
Definition at line 126 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.assign_indexing().
QNLP.proc.VectorSpaceModel.VectorSpaceModel.ordered_tokens
Definition at line 129 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.assign_indexing(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.getPathLength(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_tokens_by_dist(), and QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_tokens_by_dist().
QNLP.proc.VectorSpaceModel.VectorSpaceModel.pc
Definition at line 124 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.load_tokens().
QNLP.proc.VectorSpaceModel.VectorSpaceModel.tokens
Definition at line 125 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.calc_diff_matrix(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.define_basis(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_helper(), QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_basis_tokens_by_dist(), and QNLP.proc.VectorSpaceModel.VectorSpaceModel.sort_tokens_by_dist().