QNLP  v1.0
QNLP.tagging.tag_file Namespace Reference

Functions

def remove_stopwords (text, sw)
 
def tokenize_corpus (corpus, proc_mode=0)
 

Variables

 sw
 
string corpus_text
 
def words
 
int nvn_space_size
 
def common_n
 
 tokens
 
 tags
 
list nouns
 
list verbs
 
 idx_i
 
 idx_j
 
 val
 
 tmp
 

Function Documentation

◆ remove_stopwords()

def QNLP.tagging.tag_file.remove_stopwords (   text,
  sw 
)

Definition at line 10 of file tag_file.py.

def remove_stopwords(text, sw):
    return [w for w in text if w not in sw]

'''
Pass the corpus as a string, which is subsequently broken into tokenised sentences.
'''

Referenced by QNLP.tagging.tag_file.tokenize_corpus().
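
As a minimal usage sketch, the filter can be exercised without NLTK. The token list and the stop-word set below are made-up inputs for illustration only; within this module the function is called with NLTK's English stop-word list.

```python
def remove_stopwords(text, sw):
    # Same one-line filter as in the listing: keep tokens not found in sw
    return [w for w in text if w not in sw]

# Hypothetical inputs, chosen only to illustrate the behaviour
tokens = ["the", "cat", "sat", "on", "the", "mat"]
stop_words = {"the", "on"}

filtered = remove_stopwords(tokens, stop_words)
print(filtered)  # ['cat', 'sat', 'mat']

# Membership is tested with exact string equality, so the match is case-sensitive
capitalised = remove_stopwords(["The", "cat"], stop_words)
print(capitalised)  # ['The', 'cat']
```

Any container that supports `in` works for `sw`. A set gives constant-time lookups, whereas a list is scanned once per token.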

◆ tokenize_corpus()

def QNLP.tagging.tag_file.tokenize_corpus (   corpus,
  proc_mode = 0 
)

Definition at line 16 of file tag_file.py.

def tokenize_corpus(corpus, proc_mode=0):
    # proc_mode: 0 leaves tokens unchanged, 's' stems them, 'l' lemmatizes them
    token_sents = nltk.sent_tokenize(corpus)  # Split on sentences
    token_words = []  # Individual words
    tags = []  # Words and respective tags

    for s in token_sents:
        tk = nltk.word_tokenize(s)
        tk = remove_stopwords(tk, stopwords.words('english'))
        token_words.extend(tk)
        tags.extend(nltk.pos_tag(tk))

    if proc_mode != 0:
        if proc_mode == 's':
            stemmer = nltk.SnowballStemmer('english', ignore_stopwords=True)
            token_words = [stemmer.stem(w) for w in token_words]
        elif proc_mode == 'l':
            wnl = nltk.WordNetLemmatizer()
            token_words = [wnl.lemmatize(w) for w in token_words]

        # Re-tag, since stemming or lemmatizing changed the tokens
        tags = nltk.pos_tag(token_words)

    # t is defined outside this listing
    nouns = [i[0] for i in tags if t.matchables(t.Noun, i[1])]
    verbs = [i[0] for i in tags if t.matchables(t.Verb, i[1])]

    count_nouns = Counter(nouns)
    count_verbs = Counter(verbs)
    return {'verbs': count_verbs, 'nouns': count_nouns, 'tk_sent': token_sents, 'tk_word': token_words}

References QNLP.tagging.tag_file.remove_stopwords().
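
The returned dictionary holds two `collections.Counter` objects (`'verbs'`, `'nouns'`) alongside the tokenised sentences and words. The sketch below shows one way such a result might be consumed. The `result` dictionary is a hand-written stand-in with illustrative values, not real output of `tokenize_corpus`, and `top_n` is a hypothetical helper that is not part of this module.

```python
from collections import Counter

# Hand-written stand-in with the same keys as the tokenize_corpus() return value;
# the words and counts are illustrative only
result = {
    "verbs": Counter({"chased": 2, "slept": 1}),
    "nouns": Counter({"cat": 3, "dog": 2, "mat": 1}),
    "tk_sent": ["The cat chased the dog.", "The cat slept."],
    "tk_word": ["cat", "chased", "dog", "cat", "slept"],
}

def top_n(counter, n):
    """Return the n most frequent words, most frequent first (hypothetical helper)."""
    return [word for word, _ in counter.most_common(n)]

top_nouns = top_n(result["nouns"], 2)
top_verbs = top_n(result["verbs"], 1)
print(top_nouns)  # ['cat', 'dog']
print(top_verbs)  # ['chased']
```

Because the counts are plain `Counter` objects, standard operations such as `most_common`, addition of two counters, or lookup of a missing word (which yields 0) are available without further conversion.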

Variable Documentation

◆ common_n

def QNLP.tagging.tag_file.common_n

Definition at line 60 of file tag_file.py.

◆ corpus_text

QNLP.tagging.tag_file.corpus_text

Definition at line 46 of file tag_file.py.

◆ idx_i

QNLP.tagging.tag_file.idx_i

Definition at line 80 of file tag_file.py.

◆ idx_j

QNLP.tagging.tag_file.idx_j

Definition at line 81 of file tag_file.py.

◆ nouns

list QNLP.tagging.tag_file.nouns

Definition at line 67 of file tag_file.py.

◆ nvn_space_size

int QNLP.tagging.tag_file.nvn_space_size

Definition at line 53 of file tag_file.py.

◆ sw

QNLP.tagging.tag_file.sw

Definition at line 8 of file tag_file.py.

◆ tags

QNLP.tagging.tag_file.tags

Definition at line 65 of file tag_file.py.

◆ tmp

QNLP.tagging.tag_file.tmp

◆ tokens

def QNLP.tagging.tag_file.tokens

Definition at line 63 of file tag_file.py.

◆ val

QNLP.tagging.tag_file.val

◆ verbs

list QNLP.tagging.tag_file.verbs

Definition at line 68 of file tag_file.py.

◆ words

def QNLP.tagging.tag_file.words

Definition at line 51 of file tag_file.py.