QNLP
v1.0
Public Member Functions
def __init__ (self, fd=lambda x :[1.0/(i+1) for i in x])
def load_corpus (self, corpus_path)
def tokenise_corpus (self, corpus_text)
def word_occurrence (self, list corpus_list)
def define_basis_words (self, dict word_dict, int max_words)
def map_to_basis (self, dict corpus_list, list basis, basis_dist_cutoff=10, distance_func=None)
    Maps the words from the corpus into the chosen basis.
def nvn_distances (self, dict corpus_list_n, dict corpus_list_v, dist_cutoff=2, distance_func=None)
def map_to_bitstring (self, list basis)
def generate_state_mapping (self, bit_map, dat_map)
def latex_states (self, bit_map, dat_map, file_name="state")
Data Fields
distance_func
Implements precomputation for the DisCo(Cat) model to represent sentence meanings using category theory methods. See <PAPERS> for details.
Definition at line 95 of file DisCoCat.py.
def QNLP.proc.DisCoCat.DisCoCat.__init__ (self, fd=lambda x : [1.0/(i+1) for i in x])
Definition at line 100 of file DisCoCat.py.
def QNLP.proc.DisCoCat.DisCoCat.define_basis_words (self, dict word_dict, int max_words)
Chooses the max_words most common words from word_dict and returns them as a list for use as the basis.
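As an illustration of the selection described here, the following minimal sketch picks the most frequent words from an occurrence dictionary. The function name `choose_basis_words` and the sample data are hypothetical, and this is not the code from DisCoCat.py; it only assumes that word_dict maps tokens to counts.

```python
from collections import Counter


def choose_basis_words(word_dict, max_words):
    """Return the max_words most frequent tokens as a list (hypothetical helper)."""
    # Counter.most_common sorts by count, highest first.
    return [word for word, _count in Counter(word_dict).most_common(max_words)]


# Hypothetical occurrence counts, token -> number of occurrences.
word_dict = {"john": 1, "likes": 2, "mary": 3, "cats": 1}
basis = choose_basis_words(word_dict, 2)
print(basis)
```

Using `Counter.most_common` avoids a manual sort; how the actual method breaks ties between equally frequent words is not stated in this page.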
Definition at line 126 of file DisCoCat.py.
def QNLP.proc.DisCoCat.DisCoCat.generate_state_mapping (self, bit_map, dat_map)
Takes the basis bitstring map and the token-to-basis relationship, and returns a normalised set of states. The coefficients are determined by the distance_func lambda, applied to the distance between each token and its basis element.
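The normalisation step can be sketched as follows. This is an assumption-laden illustration, not the implementation in DisCoCat.py: it assumes bit_map maps basis tokens to bitstrings, dat_map maps each token to a dictionary of basis tokens and lists of distances, and that a coefficient is the sum of the scaled distances. The names `default_fd` and `normalised_states` are hypothetical; `default_fd` mirrors the default fd shown in the constructor signature.

```python
import math


def default_fd(distances):
    """Mirror of the constructor's default fd: 1/(d+1) for each distance d."""
    return [1.0 / (d + 1) for d in distances]


def normalised_states(bit_map, dat_map, distance_func=default_fd):
    """Return {token: {bitstring: coefficient}} with unit-norm coefficients."""
    states = {}
    for token, basis_dists in dat_map.items():
        # Assumed weighting: sum the scaled distances for each basis element.
        weights = {bit_map[b]: sum(distance_func(d)) for b, d in basis_dists.items()}
        # Normalise so the squared coefficients of each token sum to 1.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        states[token] = {bits: w / norm for bits, w in weights.items()}
    return states


# Hypothetical inputs for illustration only.
bit_map = {"mary": "0", "likes": "1"}
dat_map = {"john": {"mary": [2], "likes": [1]}}
states = normalised_states(bit_map, dat_map)
print(states)
```

With the default scaling, closer basis elements receive larger coefficients, so "likes" (distance 1) outweighs "mary" (distance 2) in the state for "john".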
Definition at line 238 of file DisCoCat.py.
References QNLP.proc.DisCoCat.DisCoCat.distance_func.
Referenced by QNLP.proc.DisCoCat.DisCoCat.latex_states().
def QNLP.proc.DisCoCat.DisCoCat.latex_states (self, bit_map, dat_map, file_name="state")
Writes the generated states to a LaTeX file. Given the bit_map and dat_map data structures above, file_name.tex is generated. Note that '_' in the output may need to be replaced with '\_' for use outside math mode.
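The underscore replacement mentioned above could be done with a small post-processing step such as the one below. The helper name `escape_underscores` is hypothetical and is not part of the documented class; the sketch assumes the text contains no underscores that are already escaped.

```python
def escape_underscores(text):
    """Replace every '_' with '\\_' so the text is safe outside math mode."""
    # Assumes no underscore in `text` is already escaped; otherwise it would
    # be escaped a second time.
    return text.replace("_", r"\_")


escaped = escape_underscores("basis_word")
print(escaped)
```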
Definition at line 268 of file DisCoCat.py.
References QNLP.proc.DisCoCat.DisCoCat.generate_state_mapping().
def QNLP.proc.DisCoCat.DisCoCat.load_corpus (self, corpus_path)
Definition at line 103 of file DisCoCat.py.
def QNLP.proc.DisCoCat.DisCoCat.map_to_basis (self, dict corpus_list, list basis, basis_dist_cutoff=10, distance_func=None)
Maps the words from the corpus into the chosen basis. Returns word_map dictionary, mapping corpus tokens -> basis states.
Keyword arguments:
corpus_list -- List of tokens representing corpus
basis -- List of basis tokens
basis_dist_cutoff -- Cut-off for token distance from basis for it to be significant
distance_func -- Function accepting distance between basis and token, and returning the resulting scaling. If 'None', defaults to 1/coeff for scaling param
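A minimal sketch of the distance cut-off idea is shown below. It is not the code from DisCoCat.py: it assumes the corpus is given as a dictionary from each token to the list of positions where it occurs, and it returns raw distances rather than scaled coefficients. The name `map_tokens_to_basis` and the sample data are hypothetical.

```python
def map_tokens_to_basis(positions, basis, basis_dist_cutoff=10):
    """Return {token: {basis_word: [distances]}} for non-basis tokens.

    `positions` is assumed to map every token to its positions in the corpus.
    """
    word_map = {}
    for token, token_pos in positions.items():
        if token in basis:
            continue  # basis words are not mapped onto themselves
        for basis_word in basis:
            # Keep only occurrences of the basis word within the cut-off.
            dists = [
                abs(p - q)
                for p in token_pos
                for q in positions.get(basis_word, [])
                if abs(p - q) <= basis_dist_cutoff
            ]
            if dists:
                word_map.setdefault(token, {})[basis_word] = sorted(dists)
    return word_map


# Hypothetical corpus "john likes mary mary likes cats", token -> positions.
positions = {"john": [0], "likes": [1, 4], "mary": [2, 3], "cats": [5]}
word_map = map_tokens_to_basis(positions, ["mary", "likes"], basis_dist_cutoff=2)
print(word_map)
```

A distance function such as the constructor's default fd could then be applied to each list of distances to obtain the scaling.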
Definition at line 148 of file DisCoCat.py.
References QNLP.proc.DisCoCat.DisCoCat.distance_func.
def QNLP.proc.DisCoCat.DisCoCat.map_to_bitstring (self, list basis)
Definition at line 227 of file DisCoCat.py.
def QNLP.proc.DisCoCat.DisCoCat.nvn_distances (self, dict corpus_list_n, dict corpus_list_v, dist_cutoff=2, distance_func=None)
Matches the NVN (noun-verb-noun) sentence structure by locating adjacent nouns and verbs, following the same procedure used to map corpus words onto the basis. With this, we can construct relationships between the verbs and their subject/object nouns.
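The adjacency search can be sketched as below. This is an illustration under stated assumptions, not the method's actual code: nouns and verbs are assumed to be given as dictionaries from token to corpus positions, and the sign convention (negative offset for a noun before the verb, positive for a noun after it) is a choice made for this sketch. The name `noun_verb_neighbours` is hypothetical.

```python
def noun_verb_neighbours(noun_positions, verb_positions, dist_cutoff=2):
    """Return {verb: {noun: [signed offsets]}} for nouns near each verb."""
    relations = {}
    for verb, v_pos in verb_positions.items():
        for noun, n_pos in noun_positions.items():
            # Signed offset: negative means the noun precedes the verb
            # (candidate subject), positive means it follows (candidate object).
            offsets = [
                n - v
                for v in v_pos
                for n in n_pos
                if 0 < abs(n - v) <= dist_cutoff
            ]
            if offsets:
                relations.setdefault(verb, {})[noun] = sorted(offsets)
    return relations


# Hypothetical sentence "john likes mary", token -> positions.
nouns = {"john": [0], "mary": [2]}
verbs = {"likes": [1]}
relations = noun_verb_neighbours(nouns, verbs)
print(relations)
```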
Definition at line 188 of file DisCoCat.py.
References QNLP.proc.DisCoCat.DisCoCat.distance_func.
def QNLP.proc.DisCoCat.DisCoCat.tokenise_corpus (self, corpus_text)
Definition at line 106 of file DisCoCat.py.
def QNLP.proc.DisCoCat.DisCoCat.word_occurrence (self, list corpus_list)
Counts word occurrence in a given corpus, presented as a tokenised word list. Returns a dictionary with keys as the tokens and values as the occurrences.
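The counting described here can be reproduced with the standard library, as in this minimal sketch. The helper name `count_word_occurrence` and the sample tokens are hypothetical, and the actual method may be implemented differently.

```python
from collections import Counter


def count_word_occurrence(corpus_list):
    """Return {token: count} for a tokenised word list (hypothetical helper)."""
    return dict(Counter(corpus_list))


tokens = ["john", "likes", "mary", "mary", "likes", "cats"]
occurrences = count_word_occurrence(tokens)
print(occurrences)
```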
Definition at line 111 of file DisCoCat.py.
QNLP.proc.DisCoCat.DisCoCat.distance_func
Definition at line 101 of file DisCoCat.py.
Referenced by QNLP.proc.DisCoCat.DisCoCat.generate_state_mapping(), QNLP.proc.DisCoCat.DisCoCat.map_to_basis(), and QNLP.proc.DisCoCat.DisCoCat.nvn_distances().