QNLP v1.0
Public Member Functions
def __init__ (self)
def tokenize_corpus (self, corpus, proc_mode=0, stop_words=True, use_spacy=False)
Data Fields
pc
Private Member Functions
def _get_token_position (self, tagged_tokens, token_type)
Definition at line 27 of file VectorSpaceModel.py.
def QNLP.proc.VectorSpaceModel.VSM_pc.__init__ (self)
Definition at line 28 of file VectorSpaceModel.py.
def QNLP.proc.VectorSpaceModel.VSM_pc._get_token_position (self, tagged_tokens, token_type) [private]
Tracks the positions at which a tagged element occurs in the tokenised corpus list, which is useful for comparing distances. If the key does not yet exist, a list holding the single position is added for it; otherwise, the existing list is extended with the new token position.
Definition at line 89 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VSM_pc.tokenize_corpus().
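The add-or-extend behaviour described above can be illustrated with a minimal sketch. It is not the source of VectorSpaceModel.py: the standalone function name get_token_positions, the assumption that tagged_tokens is a list of (token, tag) pairs, and the assumption that token_type is matched as a tag prefix are all hypothetical choices made for illustration.

```python
def get_token_positions(tagged_tokens, token_type):
    """Map each matching token to the list of indices where it occurs.

    Assumptions (not confirmed by this page): tagged_tokens is a list of
    (token, tag) pairs, and token_type is matched as a prefix of the tag.
    """
    positions = {}
    for idx, (token, tag) in enumerate(tagged_tokens):
        if not tag.startswith(token_type):
            continue
        if token not in positions:
            # Key seen for the first time: start a single-element list.
            positions[token] = [idx]
        else:
            # Key already present: extend the list with the new position.
            positions[token].extend([idx])
    return positions


# Hypothetical tagged corpus; the tags mimic Penn Treebank style labels.
tagged = [("john", "NN"), ("saw", "VBD"), ("mary", "NN"), ("john", "NN")]
noun_positions = get_token_positions(tagged, "NN")
```

Here "john" is first stored as a one-element list and later extended, while "saw" is skipped because its tag does not start with "NN".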
def QNLP.proc.VectorSpaceModel.VSM_pc.tokenize_corpus (self, corpus, proc_mode = 0, stop_words = True, use_spacy = False)
Rewrite of pc.tokenize_corpus that tracks the positions of basis words in the token list, to improve later pairwise distance calculations.
Definition at line 31 of file VectorSpaceModel.py.
References QNLP.proc.VectorSpaceModel.VSM_pc._get_token_position(), QNLP.proc.VectorSpaceModel.VSM_pc.pc, and QNLP.proc.process_corpus.remove_stopwords().
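To show why stored positions help with later pairwise distance calculations, here is a short sketch. This page does not specify the distance measure that QNLP uses, so the minimum absolute index difference below is an assumed stand-in, and the names min_pairwise_distance and basis_positions are hypothetical.

```python
def min_pairwise_distance(positions_a, positions_b):
    """Smallest absolute index gap between any two recorded positions.

    Assumed measure for illustration only; the measure actually used by
    QNLP is not stated on this page.
    """
    return min(abs(a - b) for a in positions_a for b in positions_b)


# Hypothetical position lists, shaped like the output of a position tracker.
basis_positions = {"john": [0, 3], "mary": [2]}
gap = min_pairwise_distance(basis_positions["john"], basis_positions["mary"])
```

Because the positions are recorded once during tokenisation, each comparison works on two short integer lists instead of rescanning the whole corpus.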
QNLP.proc.VectorSpaceModel.VSM_pc.pc
Definition at line 29 of file VectorSpaceModel.py.
Referenced by QNLP.proc.VectorSpaceModel.VectorSpaceModel.load_tokens(), and QNLP.proc.VectorSpaceModel.VSM_pc.tokenize_corpus().