dictionary - How to increment tuple values and search a string in a loop in Python
I have this code:

    arfffile = []
    inputed = raw_input("enter evaluation name including file extension...")
    reader = open(inputed, 'r')
    verses = []
    for line in reader:
        verses.append(line)
    for line in verses:
        if line.split('@') == "@":
            verses.pop(line)
    numclusters = int(raw_input("enter number of clusters"))
    clusters = {}
    for i in range(1,numclusters+1):
        clusters["cluster"+str(i)] = 0
    print clusters

    # if a verse belongs to a cluster, increment the cluster count by 1 in the dictionary value.
    for verse in verses:
        for k in clusters:
            if k in verse:
                clusters[k] += 1
            else:
                print "not in"
    print clusters

    yeslist = []
    for verse in verses:
        for k in clusters:
            if k not in yeslist:
                yeslist.append((k,0))
            elif k in yeslist:
                print "already in" + k

    for verse in verses:
        for k in clusters:
            if k in verse and "yes" in verse:
                yeslist.append(yeslist.index(k), +1)

    # iterate through the dictionary and iterate through the lines
    # need to read in the file line by line,
    # if "yes" and cluster x, increment the cluster
    # need to work out the percentage of positive verses in each cluster.

An example of the ARFF file:
    @relation tester999.arff_clustered

    @attribute instance_number numeric
    @attribute allah numeric
    @attribute day numeric
    @attribute lord numeric
    @attribute people numeric
    @attribute earth numeric
    @attribute men numeric
    @attribute truth numeric
    @attribute verily numeric
    @attribute chapter numeric
    @attribute verse numeric
    @attribute class {yes,no}
    @attribute cluster {cluster1,cluster2,cluster3}

    @data
    0,1,0,0,0,0,0,0,0,1,1,no,cluster3
    1,1,0,0,0,0,0,0,0,1,2,no,cluster3
    2,0,0,0,0,0,0,0,0,1,3,no,cluster3
    3,0,1,0,0,0,1,0,0,1,4,no,cluster3
    4,0,0,0,0,0,0,0,0,1,5,no,cluster3
    5,0,0,0,0,0,0,0,0,1,6,no,cluster3
    6,0,0,0,0,0,0,0,0,1,7,no,cluster3
    7,0,0,0,0,0,0,0,0,2,1,no,cluster3
    8,1,0,0,0,0,0,0,0,2,2,no,cluster3
    9,0,0,0,0,0,0,0,0,2,3,no,cluster3
    10,0,0,0,0,0,0,0,0,2,4,no,cluster3
    11,0,0,1,0,0,0,0,0,2,5,no,cluster2

As it stands, the program reads in the data lines, e.g.
    0,1,0,0,0,0,0,0,0,1,1,no,cluster3

and I have created a dictionary that records how many clusters are in the data file. In this example there are 3: cluster1, cluster2 and cluster3. The code adds each cluster as a key, represented as a string, in the dictionary "clusters", with a count as its value.
I then iterate over the verses and count, for each line, which cluster it belongs to.
My next step is to try to count, for each cluster, the number of times a line with "yes" in it occurs. There are 10 lines in the data with "yes" in the string, so the code should be able to count the number of occurrences of this.
So far, the code I have for this is here:

    for verse in verses:
        for k in clusters:
            if k in verse and "yes" in verse:
                yeslist.append(yeslist.index(k), +1)

I'm basically creating a list of tuples called "yeslist" with values like [(cluster1, 0), (cluster2, 3)].
So for each line (represented as a string), I check whether there is a "yes" in it; if there is, I check which cluster it belongs to and increment that tuple's value by one.
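A note on the list-of-tuples idea: tuples are immutable, so a count stored in a tuple cannot be incremented in place; the whole tuple has to be replaced in the list. The sketch below shows one way to do that. The helper name `increment_pair` and the sample "yes" line are hypothetical, and it assumes the class and cluster are the last two comma-separated fields, as in the example file.

```python
def increment_pair(pairs, key):
    """Replace (key, n) with (key, n + 1) in the list `pairs`, in place."""
    for i, (name, count) in enumerate(pairs):
        if name == key:
            # tuples cannot be modified, so build a new one at the same index
            pairs[i] = (name, count + 1)
            return
    # key not present yet: start its count at 1
    pairs.append((key, 1))


yeslist = [("cluster1", 0), ("cluster2", 3)]

# hypothetical data line; the example file above only shows "no" rows
line = "11,0,0,1,0,0,0,0,0,2,5,yes,cluster2"
fields = line.split(",")

# assumption: second-to-last field is the class, last field is the cluster
if fields[-2] == "yes":
    increment_pair(yeslist, fields[-1])
```

Every update scans the list, which is why a dictionary keyed by cluster name is usually the simpler choice for this kind of counting.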
I'm having trouble thinking through the logic of how to do this... Can anyone help?
Thanks.
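    import collections

    inputed = raw_input("enter evaluation name including file extension...")
    reader = open(inputed, 'r')
    # keep only data rows: skip blank lines and ARFF header lines starting with '@'
    verses = [line.strip() for line in reader
              if line.strip() and not line.startswith('@')]
    reader.close()

    cluster_count = collections.defaultdict(int)  # cluster -> number of verses
    yes_count = collections.defaultdict(int)      # cluster -> number of "yes" verses

    # the last field of each row is the cluster, the second to last is the class
    verse_infos = [(fields[-1], fields[-2])
                   for fields in (verse.split(",") for verse in verses)]
    for cluster, label in verse_infos:
        cluster_count[cluster] += 1
        if label == 'yes':
            yes_count[cluster] += 1

You end up with two dictionaries: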
    cluster_count : keys = cluster#, values = count
    yes_count     : keys = cluster#, values = #yes

If you want a list of tuples:

    yes_tuples = sorted(yes_count.iteritems())
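The question also mentions working out the percentage of positive verses in each cluster. A minimal sketch of that step follows, assuming the same row layout as the example file (class second to last, cluster last). The function name `yes_percentages` and the sample rows marked "yes" are hypothetical, and the code avoids `raw_input` and `print` so it runs the same way under Python 2 and Python 3.

```python
import collections


def yes_percentages(lines):
    """Return {cluster: percentage of rows in that cluster whose class is 'yes'}."""
    cluster_count = collections.defaultdict(int)
    yes_count = collections.defaultdict(int)
    for line in lines:
        line = line.strip()
        # skip blank lines and ARFF header lines such as @relation, @attribute, @data
        if not line or line.startswith("@"):
            continue
        fields = line.split(",")
        cluster, label = fields[-1], fields[-2]
        cluster_count[cluster] += 1
        if label == "yes":
            yes_count[cluster] += 1
    # 100.0 forces float division on Python 2 as well
    return dict(
        (cluster, 100.0 * yes_count[cluster] / total)
        for cluster, total in cluster_count.items()
    )


# hypothetical rows; the "yes" labels are made up for illustration
sample = [
    "@data",
    "0,1,0,0,0,0,0,0,0,1,1,no,cluster3",
    "1,1,0,0,0,0,0,0,0,1,2,yes,cluster3",
    "11,0,0,1,0,0,0,0,0,2,5,yes,cluster2",
]
percentages = yes_percentages(sample)
```

Clusters that never appear in the data are simply absent from the result, so there is no division by zero to guard against.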