Public Member Functions
def	__init__ (self, fd=lambda x :[1.0/(i+1) for i in x])

def	load_corpus (self, corpus_path)

def	tokenise_corpus (self, corpus_text)

def	word_occurrence (self, list corpus_list)

def	define_basis_words (self, dict word_dict, int max_words)

def	map_to_basis (self, dict corpus_list, list basis, basis_dist_cutoff=10, distance_func=None)

def	nvn_distances (self, dict corpus_list_n, dict corpus_list_v, dist_cutoff=2, distance_func=None)

def	map_to_bitstring (self, list basis)

def	generate_state_mapping (self, bit_map, dat_map)

def	latex_states (self, bit_map, dat_map, file_name="state")

Data Fields
	distance_func

Detailed Description

Implements precomputation for the DisCo(Cat) model to represent sentence meanings
using category theory methods. See <PAPERS> for details.

Definition at line 95 of file DisCoCat.py.

Constructor & Destructor Documentation

◆ __init__()

def QNLP.proc.DisCoCat.DisCoCat.__init__	(	self,
		fd = `lambda x : [1.0/(i+1) for i in x]`
	)

Definition at line 100 of file DisCoCat.py.

     def __init__(self, fd = lambda x : [1.0/(i+1) for i in x]):
         self.distance_func = fd
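The constructor only stores `fd` as `distance_func`. The default maps each distance d in a list to 1/(d+1), so a word at distance 0 gets weight 1.0 and the weight decays with distance. A minimal sketch comparing the default with an alternative weighting; `exp_decay` and its `scale` parameter are hypothetical and not part of the module, shown only to illustrate the expected shape of `fd` (an iterable of distances in, one weight per distance out):

```python
import math

# Default used by DisCoCat.__init__: one weight 1/(d+1) per distance d
default_fd = lambda x: [1.0 / (i + 1) for i in x]

# Hypothetical alternative weighting with the same call shape as default_fd
def exp_decay(x, scale=2.0):
    return [math.exp(-d / scale) for d in x]

distances = [0, 1, 3]
print(default_fd(distances))  # [1.0, 0.5, 0.25]
print([round(w, 3) for w in exp_decay(distances)])
```

Under these assumptions, passing `fd=exp_decay` to the constructor would replace the default weighting for every method that reads `self.distance_func`.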
 

Member Function Documentation

◆ define_basis_words()

def QNLP.proc.DisCoCat.DisCoCat.define_basis_words	(		self,
		dict	word_dict,
		int	max_words
	)

Chooses the max_words most common words from word_dict and returns them
as a list of (word, count) tuples, most frequent first, for use as the basis.

Definition at line 126 of file DisCoCat.py.

     def define_basis_words(self, word_dict : dict, max_words : int):
         """
         Chooses the max_words most common words from word_dict and returns them
         as a list of (word, count) tuples, most frequent first, for use as the basis.
         """
         k = list(word_dict.keys())
         v = list(word_dict.values())
         res_list = []
 
         # Stop early if the vocabulary holds fewer than max_words words
         for i in range(min(max_words, len(v))):
             max_val = max(v)
             val_idx = v.index(max_val)
             res_list.append((k[val_idx],max_val))
             # Drop the chosen entry so the next pass finds the next-largest count
             k.pop(val_idx)
             v.pop(val_idx)
 
         return res_list
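The selection loop above repeatedly extracts the largest remaining count. An equivalent sketch using the standard library, assuming `word_dict` maps tokens to integer counts; `top_basis_words` is a hypothetical standalone name. `collections.Counter.most_common` returns the same list of `(word, count)` tuples, keeps first-seen order among equal counts, and returns every item when `max_words` exceeds the vocabulary size:

```python
from collections import Counter

def top_basis_words(word_dict: dict, max_words: int) -> list:
    """Return up to max_words (word, count) tuples, most frequent first."""
    # Counter(word_dict) preserves the dict's insertion order, which
    # most_common uses to order ties
    return Counter(word_dict).most_common(max_words)

counts = {"the": 5, "cat": 2, "sat": 2, "mat": 1}
print(top_basis_words(counts, 3))  # [('the', 5), ('cat', 2), ('sat', 2)]
```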
 

◆ generate_state_mapping()

def QNLP.proc.DisCoCat.DisCoCat.generate_state_mapping	(	self,
		bit_map,
		dat_map
	)

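Takes the basis bitstring map (bit_map, as returned by map_to_bitstring) and the token-to-basis relationship (dat_map, as returned by map_to_basis), and returns a dictionary mapping each token to a normalised list of (coefficient, basis state) tuples. Each coefficient is obtained by applying distance_func to the distances between the token and the corresponding basis element.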

Definition at line 238 of file DisCoCat.py.

     def generate_state_mapping(self, bit_map, dat_map):
         """
         Takes the basis bitstring map and the token-to-basis relationship, and returns a normalised set of states, with coefficients determined by the distance_func lambda applied to the distances between the token and each basis element.
         """
         # Mapping token to list of tuples: first the basis state coefficient, second the integer representation of the bitstring state
         state_encoding = {}
         for token, basis_dist_map in dat_map.items():
             # Tokens with no basis element within the distance cut-off (None in dat_map) have nothing to encode
             if not basis_dist_map:
                 continue
             local_coeffs = []
             local_states = []
             for basis_token, distance_list in basis_dist_map.items():
                 # Accept a single distance or several: apply the distance relation function, then sum the results for that basis word coefficient
                 local_coeffs.append( np.sum( self.distance_func(np.atleast_1d(distance_list)) ) )
                 local_states.append( bit_map[1][basis_token] )
 
             # Calc normalisation factor (L2 norm) over all the respective basis states for a given token
             norm_factor = np.linalg.norm(local_coeffs)
             state_encoding[token] = [ (coeff / norm_factor, state) for coeff, state in zip(local_coeffs, local_states) ]
         return state_encoding
 

References QNLP.proc.DisCoCat.DisCoCat.distance_func.

Referenced by QNLP.proc.DisCoCat.DisCoCat.latex_states().
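A worked sketch of the coefficient calculation for a single token, assuming the default distance function and hypothetical data: the token lies 0 and 2 positions from basis word A (two occurrences) and 1 position from basis word B, with A and B assigned the bitstring integers 1 and 2. Each coefficient is the sum of the weights for that basis word; the vector is then divided by its Euclidean norm so the squared amplitudes sum to 1:

```python
import numpy as np

distance_func = lambda x: [1.0 / (i + 1) for i in x]

# Hypothetical token-to-basis distances and basis bitstring integers
basis_dist_map = {"A": [0, 2], "B": [1]}
bit_values = {"A": 1, "B": 2}

# atleast_1d lets a single scalar distance be treated like a one-element list
coeffs = np.array([np.sum(distance_func(np.atleast_1d(d))) for d in basis_dist_map.values()])
states = [bit_values[b] for b in basis_dist_map]
coeffs /= np.linalg.norm(coeffs)  # L2 normalisation

encoding = [(round(float(c), 3), s) for c, s in zip(coeffs, states)]
print(encoding)  # [(0.936, 1), (0.351, 2)]
```

Here A contributes 1 + 1/3 and B contributes 1/2 before normalisation.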

◆ latex_states()

def QNLP.proc.DisCoCat.DisCoCat.latex_states	(	self,
		bit_map,
		dat_map,
		file_name = `"state"`
	)

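LaTeX file outputter for state generation. Given bit_map and dat_map (as passed to generate_state_mapping), writes file_name.tex listing each basis word's bitstring and the encoded state of each token. Beware: tokens containing '_' may need it replaced with '\_' for use outside math mode.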

Definition at line 268 of file DisCoCat.py.

     def latex_states(self, bit_map, dat_map, file_name = "state"):
         """
         LaTeX file outputter for state generation. Given the above data structures, file_name.tex is generated. Beware, as output may need to replace '_' with '\_' for non-math-mode usage.
         """
 
         mapping = self.generate_state_mapping(bit_map, dat_map)
         with open(file_name + ".tex", "w") as f:
             f.write("\\documentclass{article} \n \\usepackage{amsmath} \\usepackage{multicol} \n \\begin{document} \n")
             tex_string_format_bit = r'\vert {:0%db} \rangle'%(bit_map[0])
             f.write("\\section{Basis} \\begin{multicols}{2} \n \\noindent ")
             for b_key, b_val in bit_map[1].items():
                 f.write(b_key + " $\\rightarrow " + tex_string_format_bit.format(b_val) + "$\\\\ ")
             f.write("\\end{multicols}")
             f.write("\\noindent\\rule{\\textwidth}{1pt} \n")
             f.write("\\noindent\\rule{\\textwidth}{1pt} \n")
             f.write("\\section{Encoding} \n")
             for token, basis_map in mapping.items():
                 f.write(r"\begin{align}\vert \textrm{" + token + "} \\rangle &= \\\\ \n &" )
                 for i,b in enumerate(basis_map):
                     if( i != 0 ):
                         if(i%3 == 0):
                             f.write(r" \\ & ")
                     f.write("{0:.3f}".format(round(b[0],3)))
                     f.write(tex_string_format_bit.format(b[1]) )
                     if(i != len(basis_map) - 1):
                         f.write(r"+")
                     f.write(" \\nonumber ")
                 f.write(r"""\end{align}""")
                 f.write("\\noindent\\rule{\\textwidth}{1pt} \n")
             f.write(r"\end{document}")
 

References QNLP.proc.DisCoCat.DisCoCat.generate_state_mapping().
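The ket strings are built in two steps: `%` formatting inserts `bit_map[0]` as the zero-padded binary width, then `str.format` renders the integer in binary. A small sketch of the same formatting for a hypothetical 3-bit register, together with the underscore escaping mentioned in the description; `ket` and `escape_token` are hypothetical helpers, not part of the module:

```python
def ket(num_bits: int, value: int) -> str:
    # Same template as latex_states: '%d' fixes the width, then 'b' renders binary
    fmt = r'\vert {:0%db} \rangle' % num_bits
    return fmt.format(value)

def escape_token(token: str) -> str:
    # Underscores must be escaped outside math mode
    return token.replace("_", r"\_")

line = r"{} $\rightarrow {}$".format(escape_token("new_york"), ket(3, 5))
print(line)  # new\_york $\rightarrow \vert 101 \rangle$
```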

◆ load_corpus()

def QNLP.proc.DisCoCat.DisCoCat.load_corpus	(	self,
		corpus_path
	)

Definition at line 103 of file DisCoCat.py.

     def load_corpus(self, corpus_path):
         return pc.load_corpus(corpus_path)
     

◆ map_to_basis()

def QNLP.proc.DisCoCat.DisCoCat.map_to_basis	(		self,
		dict	corpus_list,
		list	basis,
			basis_dist_cutoff = `10`,
			distance_func = `None`
	)

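Maps the words from the corpus into the chosen basis.
Returns the word_map dictionary, mapping each corpus token to a {basis token : minimum distance}
dictionary covering every basis token within basis_dist_cutoff of it, or to None if there is none.
A basis token maps only to itself, with distance 0.

Keyword arguments:
corpus_list         --  Dictionary of corpus tokens; element [1] of each value is a NumPy
                        array of that token's positions in the corpus
basis               --  List of basis tokens (each must also be a key of corpus_list)
basis_dist_cutoff   --  Maximum token-to-basis distance for the relationship to be recorded
distance_func       --  Function accepting distances between basis and token, and
                        returning the resulting scaling. If 'None', defaults to
                        self.distance_func; it is not applied within this method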

Definition at line 148 of file DisCoCat.py.

     def map_to_basis(self, corpus_list : dict, basis : list, basis_dist_cutoff=10, distance_func=None):
         """
         Maps the words from the corpus into the chosen basis.
         Returns the word_map dictionary, mapping each corpus token to a
         {basis token : minimum distance} dictionary, or None if no basis token is close enough.
 
         Keyword arguments:
         corpus_list         --  Dictionary of corpus tokens; element [1] of each value is a NumPy array of token positions
         basis               --  List of basis tokens
         basis_dist_cutoff   --  Maximum token-to-basis distance for the relationship to be recorded
         distance_func       --  Function accepting distances between basis and token, and
                                 returning the resulting scaling. If 'None', defaults to
                                 self.distance_func (not applied within this method)
         """
 
         if distance_func is None:
             distance_func = self.distance_func #lambda x : [1.0/(i+1) for i in x]
 
         word_map = {}
 
         # map distance between basis words and other words in token list
         for word, locations in corpus_list.items():
             word_map[word] = None
             for b_val in basis:
                 # Basis elements are orthogonal: a basis word maps only to itself
                 if b_val == word:
                     word_map[b_val] = {b_val : 0}
                     break
                 # to add left-right ordering here, remove the abs and use sign of distance to indicate where words appear relative to one another.
                 min_dist = np.min(np.abs(locations[1][:, np.newaxis] - corpus_list[b_val][1]))
 
                 if min_dist <= basis_dist_cutoff:
                     # Create the per-token dictionary on first use, then record this basis word
                     if word_map[word] is None:
                         word_map[word] = {}
                     word_map[word][b_val] = min_dist
         return word_map
 

References QNLP.proc.DisCoCat.DisCoCat.distance_func.
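The core of map_to_basis is a broadcast subtraction: a column of the token's positions minus a row of the basis word's positions yields every pairwise offset, and the smallest absolute value is the closest approach. A standalone sketch with hypothetical positions:

```python
import numpy as np

token_positions = np.array([3, 10, 25])   # hypothetical positions of a corpus token
basis_positions = np.array([7, 22])       # hypothetical positions of a basis word

# (3, 1) minus (2,) broadcasts to a (3, 2) matrix of signed offsets
offsets = token_positions[:, np.newaxis] - basis_positions
min_dist = np.min(np.abs(offsets))
print(offsets.tolist())  # [[-4, -19], [3, -12], [18, 3]]
print(int(min_dist))     # 3

basis_dist_cutoff = 10
print(bool(min_dist <= basis_dist_cutoff))  # True
```

Keeping the sign of `offsets` instead of taking `np.abs` is what the inline comment refers to for adding left-right ordering.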

◆ map_to_bitstring()

def QNLP.proc.DisCoCat.DisCoCat.map_to_bitstring	(		self,
		list	basis
	)

Definition at line 227 of file DisCoCat.py.

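     def map_to_bitstring(self, basis : list):
         # |0...0> is reserved for initialisation, so the basis words take the values 1..len(basis)
         # and the register needs enough bits to hold len(basis) itself: ceil(log2(len(basis) + 1))
         upper_bound_bitstrings = len(basis).bit_length()
         bit_map = {}
         # basis holds (word, count) tuples, as returned by define_basis_words
         for bitstring, (k, _) in enumerate(basis, start=1):
             bit_map.update({k: bitstring})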
         return (upper_bound_bitstrings, bit_map)
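Because the all-zero state is reserved, n basis words occupy the integers 1..n, so the register needs enough bits to hold n itself, i.e. ceil(log2(n+1)) bits, which is exactly Python's `int.bit_length()`. A short sketch checking the width for a few basis sizes and building a word-to-integer map in the same way; `bits_needed` is a hypothetical helper and the basis entries are hypothetical `(word, count)` tuples:

```python
import math

def bits_needed(n_basis: int) -> int:
    # Values 1..n_basis must fit, since 0 (|0...0>) is reserved for initialisation
    return n_basis.bit_length()

for n in (1, 3, 4, 7, 8):
    # ceil(log2(n)) is one bit short whenever n is a power of two
    print(n, bits_needed(n), math.ceil(math.log2(n)))

basis = [("cat", 5), ("dog", 3), ("mat", 2), ("sat", 1)]
bit_map = {word: idx for idx, (word, _) in enumerate(basis, start=1)}
print(bits_needed(len(basis)), bit_map)  # 3 {'cat': 1, 'dog': 2, 'mat': 3, 'sat': 4}
```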
 

◆ nvn_distances()

def QNLP.proc.DisCoCat.DisCoCat.nvn_distances	(		self,
		dict	corpus_list_n,
		dict	corpus_list_v,
			dist_cutoff = `2`,
			distance_func = `None`
	)

This function matches the NVN (noun-verb-noun) sentence structure by locating adjacent
nouns and verbs, following the same procedure as used to map corpus words
onto the basis. With this, relationships can be constructed between the
verbs and their subject/object nouns.

Definition at line 188 of file DisCoCat.py.

     def nvn_distances(self, corpus_list_n : dict, corpus_list_v : dict, dist_cutoff=2, distance_func=None):
         """This function matches the NVN (noun-verb-noun) sentence structure by locating adjacent
         nouns and verbs, following the same procedure as used to map corpus words
         onto the basis. With this, relationships can be constructed between the
         verbs and their subject/object nouns."""
 
         if distance_func is None:
             distance_func = self.distance_func #lambda x : [1.0/(i+1) for i in x]
 
         word_map = {}
 
         # map distance between each verb and each noun
         for word_v, locations_v in corpus_list_v.items():
             for word_n, locations_n in corpus_list_n.items():
                 # Signed offsets between every noun position and every verb position
                 dists = locations_n[1][:, np.newaxis] - locations_v[1]
                 close = dists[np.abs(dists) <= dist_cutoff]
                 if close.size == 0:
                     continue
                 # Keep the sign of the closest offset to preserve ordering:
                 # negative if the noun precedes the verb, positive if it follows
                 min_dist = close[np.argmin(np.abs(close))]
                 word_map.setdefault(word_v, {})[word_n] = min_dist
         return word_map
 

References QNLP.proc.DisCoCat.DisCoCat.distance_func.
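The NVN matching relies on the sign of the noun-minus-verb offset; the source's inline comment distinguishes a negative from a positive offset. A minimal sketch on a hypothetical tokenised sentence, using the default dist_cutoff of 2. It assumes a negative offset (noun before the verb) marks a subject candidate and a positive one an object candidate, and it takes positions directly as lists rather than the per-token records whose element [1] holds positions; `nvn_pairs` is a hypothetical helper, not the module's implementation:

```python
import numpy as np

def nvn_pairs(noun_pos: dict, verb_pos: dict, dist_cutoff: int = 2) -> list:
    """Return (noun, verb, role) triples for nouns within dist_cutoff of a verb.

    Assumes 'subject' for nouns before the verb and 'object' for nouns after it.
    """
    pairs = []
    for verb, v_locs in verb_pos.items():
        for noun, n_locs in noun_pos.items():
            # Signed offsets: noun position minus verb position
            offsets = np.asarray(n_locs)[:, np.newaxis] - np.asarray(v_locs)
            for off in offsets[np.abs(offsets) <= dist_cutoff]:
                pairs.append((noun, verb, "subject" if off < 0 else "object"))
    return pairs

# "dogs chase cats": positions 0, 1, 2
print(nvn_pairs({"dogs": [0], "cats": [2]}, {"chase": [1]}))
# [('dogs', 'chase', 'subject'), ('cats', 'chase', 'object')]
```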

◆ tokenise_corpus()

def QNLP.proc.DisCoCat.DisCoCat.tokenise_corpus	(	self,
		corpus_text
	)

Definition at line 106 of file DisCoCat.py.

     def tokenise_corpus(self, corpus_text):
         return pc.tokenize_corpus(corpus_text)
 

◆ word_occurrence()

def QNLP.proc.DisCoCat.DisCoCat.word_occurrence	(		self,
		list	corpus_list
	)

Counts the occurrences of each word in a corpus given as a tokenised word list.
Returns a dictionary with the tokens as keys and their occurrence counts as values.

Definition at line 111 of file DisCoCat.py.

     def word_occurrence(self, corpus_list : list):
         """
         Counts word occurrence in a given corpus, presented as a tokenised word list.
         Returns a dictionary with keys as the tokens and values as the occurrences.
         """
         word_dict = {}
         for word in corpus_list:
             if word in word_dict:
                 word_dict[word] += 1
             else:
                 word_dict[word] = 1
         return word_dict
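The same tally can be produced with `collections.Counter`, which counts hashable items and preserves first-seen order. A minimal equivalent sketch; `count_words` is a hypothetical standalone name:

```python
from collections import Counter

def count_words(corpus_list: list) -> dict:
    # Counter tallies each token; dict() converts the result to a plain dictionary
    return dict(Counter(corpus_list))

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(count_words(tokens))  # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
```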
 

Field Documentation

◆ distance_func

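QNLP.proc.DisCoCat.DisCoCat.distance_func

Function applied to lists of token-to-basis distances to compute state coefficients; set from the constructor argument fd. The default maps each distance d to 1/(d+1).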

Definition at line 101 of file DisCoCat.py.

Referenced by QNLP.proc.DisCoCat.DisCoCat.generate_state_mapping(), QNLP.proc.DisCoCat.DisCoCat.map_to_basis(), and QNLP.proc.DisCoCat.DisCoCat.nvn_distances().

The documentation for this class was generated from the following file:

/Users/mlxd/Desktop/intel-qnlp-rc2/modules/py/pkgs/QNLP/proc/DisCoCat.py
