Something weird happens in this code:
fh = open('romeo.txt', 'r')
lst = list()
for line in fh:
    line = line.split()
    for word in line:
        lst.append(word)
for word in lst:
    numberofwords = lst.count(word)
    if numberofwords > 1:
        lst.remove(word)
lst.sort()
print len(lst)
print lst
romeo.txt is taken from http://www.pythonlearn.com/code/romeo.txt
Result:
27
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
As you can see, there are two 'the'. Why is that? I can run this part of code again:
for word in lst:
    numberofwords = lst.count(word)
    if numberofwords > 1:
        lst.remove(word)
After running this code a second time it deletes the remaining 'the', but why doesn't it work the first time?
Correct output:
26
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
In this loop:
for word in lst:
    numberofwords = lst.count(word)
    if numberofwords > 1:
        lst.remove(word)
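lst is modified while iterating over it. Don't do that: the for loop steps through the list by position, so each remove() shifts the remaining items one slot to the left and the loop skips the item that moved into the current position. A simple fix is to iterate over a copy of it: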
for word in lst[:]:
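To make the skipping visible, here is a minimal sketch on a small made-up list. The sample words and the helper names `dedupe_in_place` and `dedupe_over_copy` are illustrative assumptions, not part of the original question; the loop bodies are the same as in the question.

```python
def dedupe_in_place(items):
    """Buggy variant: removes from the same list the loop is walking."""
    items = list(items)  # private copy so the caller's list is untouched
    for word in items:
        if items.count(word) > 1:
            # Everything after the removed item shifts left by one,
            # so the next loop step jumps over one element.
            items.remove(word)
    return items


def dedupe_over_copy(items):
    """Fixed variant: walks a snapshot (items[:]) while editing the original."""
    items = list(items)
    for word in items[:]:
        if items.count(word) > 1:
            items.remove(word)
    return items


# Hypothetical sample with one word repeated four times.
sample = ['the', 'the', 'the', 'the', 'sun']
print(dedupe_in_place(sample))   # ['the', 'the', 'sun']
print(dedupe_over_copy(sample))  # ['the', 'sun']
```

In the buggy variant the loop runs out of positions before it has looked at every 'the', which is why a second pass over the list cleans up what the first pass missed.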
Python's built-in tools make this kind of task very easy. By using them, you can usually avoid the problems that come from explicit loops that modify the list they are iterating over:
with open('romeo.txt', 'r') as fh:
    # split() with no arguments splits on any whitespace, newlines included
    words = sorted(set(fh.read().split()))
print(len(words))
print(words)
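The set-based approach can be tried without the file. The sketch below assumes romeo.txt holds the four lines shown in `text`; that content is reproduced from memory and is an assumption, as is the variable name. It also shows one detail worth knowing: `split()` with no arguments splits on any run of whitespace, while `split(' ')` returns an empty string for a trailing or repeated separator.

```python
# Assumed contents of romeo.txt, held in memory for the example.
text = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
    "Who is already sick and pale with grief\n"
)

# set() drops duplicates, sorted() returns a new sorted list.
words = sorted(set(text.split()))
print(len(words))  # 26
print(words)

# Splitting on a literal space leaves an empty string behind
# when the text ends with a newline that was turned into a space.
with_blank = text.replace('\n', ' ').split(' ')
print('' in with_blank)  # True
```

Because a set keeps only one copy of each word, no counting or removing is needed, and nothing is modified while it is being iterated.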