I'm new to Python and I need to calculate the average number of characters per word in a list of strings,
using the following definitions and the helper function clean_up.
A token is a str that you get from calling the string method split on a line of a file.
A word is a non-empty token from the file that isn't completely made up of punctuation.
Find the "words" in a file by using str.split to find the tokens and then removing the punctuation from the words using the helper function clean_up.
A sentence is a sequence of characters that is terminated by (but doesn't include) the characters !, ?, . or the end of the file, excludes whitespace on either end, and is not empty.
This is a homework question from my college computer science class.
The clean_up function is:
def clean_up(s):
    punctuation = """!"',;:.-?)([]<>*#\n\\"""
    result = s.lower().strip(punctuation)
    return result
My code is:
def average_word_length(text):
    """ (list of str) -> float
    Precondition: text is non-empty. Each str in text ends with \n and at
    least one str in text contains more than just \n.
    Return the average length of all words in text. Surrounding punctuation
    is not counted as part of the words. 
    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']
    >>> average_word_length(text)
    5.142857142857143 
    """
    for ch in text:
        word = ch.split()
        clean = clean_up(ch)
        average = len(clean) / len(word)
    return average
I get 5.0, but I am really confused. Some help would be greatly appreciated :) PS: I'm using Python 3.
Let's clean up some of these functions with imports and generator expressions, shall we?
import string
def clean_up(s):
    # I'm assuming you REQUIRE this function as per your assignment
    # otherwise, just substitute str.strip(string.punctuation) anywhere
    # you'd otherwise call clean_up(str)
    return s.strip(string.punctuation)
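def average_word_length(text):
    # Sum the cleaned length of every token on every line
    total_length = sum(len(clean_up(word)) for sentence in text for word in sentence.split())
    # Note: this counts every token, including any made up only of punctuation
    num_words = sum(len(sentence.split()) for sentence in text)
    return total_length / num_words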
You may notice this actually condenses to a lengthy and unreadable one-liner:
average = sum(len(word.strip(string.punctuation)) for sentence in text for word in sentence.split()) / sum(len(sentence.split()) for sentence in text)
It's gross and disgusting, which is why you shouldn't do it ;). Readability counts and all that.
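For what it's worth, the 5.0 in the question seems to come from two things: clean_up is applied to the whole line rather than to each token, and average is overwritten on every pass, so only the last line survives. Here is a rough sketch that replays that loop and records each pass. The function name question_loop_trace is made up for illustration, and the trailing backslash in the punctuation set is an assumption, since the string in the question is cut off at that point.

```python
def clean_up(s):
    # Helper from the question. The trailing backslash is an assumption:
    # the punctuation string in the question is cut off at that point.
    punctuation = "!\"',;:.-?)([]<>*#\n\\"
    return s.lower().strip(punctuation)


def question_loop_trace(text):
    """Replay the loop from the question, recording every pass.

    Hypothetical helper, only here to show what the loop computes.
    """
    trace = []
    for line in text:
        tokens = line.split()
        # strip() only trims the two ends of the whole line, so inner
        # commas and the spaces between words are still counted.
        cleaned = clean_up(line)
        trace.append((len(cleaned), len(tokens), len(cleaned) / len(tokens)))
    return trace


text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']
for chars, tokens, ratio in question_loop_trace(text):
    print(chars, tokens, ratio)
```

The second pass works out to 20 characters over 4 tokens, which is the 5.0 the original function returns, because the first pass has already been thrown away by then.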
This is a short and sweet method to solve your problem that is still readable.
def clean_up(word, punctuation="!\"',;:.-?)([]<>*#\n\\"):
    return word.lower().strip(punctuation)  # you don't really need ".lower()"
def average_word_length(text):
    cleaned_words = [clean_up(w) for l in text for w in l.split()]
    return sum(map(len, cleaned_words))/len(cleaned_words)  # in Python 2, wrap one operand in float()
>>> average_word_length(['James Fennimore Cooper\n', 'Peter, Paul and Mary\n'])
5.142857142857143
The burden of meeting all those preconditions falls on you.
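One case the preconditions don't rule out is a token made up entirely of punctuation, such as a lone dash. By the definition in the question that token isn't a word, so it shouldn't be counted in the denominator. A minimal sketch of one way to handle it, assuming the same punctuation set as above (the name average_word_length_strict is made up for illustration):

```python
def clean_up(word, punctuation="!\"',;:.-?)([]<>*#\n\\"):
    return word.lower().strip(punctuation)


def average_word_length_strict(text):
    """Average word length, skipping tokens that are only punctuation.

    Hypothetical variant; not part of the assignment's required functions.
    """
    words = []
    for line in text:
        for token in line.split():
            cleaned = clean_up(token)
            if cleaned:  # an empty string means the token was all punctuation
                words.append(cleaned)
    return sum(len(word) for word in words) / len(words)


print(average_word_length_strict(['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']))
print(average_word_length_strict(['Hello - world\n']))
```

It still raises ZeroDivisionError if no line contains any word at all, so that case remains up to the caller.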