
To start this guide, download this zip file.

# Counting

Dictionaries make it easy to count items. For example, let’s say we want to count the number of times each vowel appears in a string. Here is what this program should do:

``````
% python vowel_counts.py 'Hello. How are you?'
{'a': 1, 'e': 2, 'i': 0, 'o': 3, 'u': 1}
``````

Notice that for the string `Hello. How are you?` we have created a dictionary that maps each vowel to the number of times it appears:

• a: 1
• e: 2
• i: 0
• o: 3
• u: 1

To see how we can do this, take a look at this function:

``````
def count(letters, text):
    # create an empty dictionary
    counts = {}

    # loop through all of the letters we are counting
    # and initialize their counts to zero
    for letter in letters:
        counts[letter] = 0

    # loop through all of the letters in the text
    # be sure to convert to lowercase
    for c in text.lower():
        # if this letter is one we are counting, add 1 to its count
        if c in counts:
            counts[c] += 1

    # return the dictionary
    return counts
``````

This function takes a string containing the letters to count and a string of text to search. For example, we could call this with:

``vowel_counts = count('aeiou', text)``

In this function we:

• create an empty dictionary
• loop through all of the letters we are counting and initialize their counts to zero
• loop through all of the letters in the text
• if the letter we are looking at is one of the ones we are counting, then add one to its count
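Because the letters to count are passed in as an argument, the same function should work for any group of letters, not just vowels. Here is a minimal sketch of that idea; the letters `'lst'` and the sample string are made up for illustration and are not part of the guide's files.

```python
def count(letters, text):
    # start every letter we are counting at zero
    counts = {}
    for letter in letters:
        counts[letter] = 0

    # tally each character of the lowercased text that we are counting
    for c in text.lower():
        if c in counts:
            counts[c] += 1

    return counts


# illustrative inputs (assumed, not from vowel_counts.py)
consonant_counts = count('lst', 'Twinkle, twinkle, little star')
print(consonant_counts)  # {'l': 4, 's': 1, 't': 5}
```

Characters that are not in `letters`, such as the commas and spaces, are skipped because they never appear as keys in the dictionary.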

Here is a program that uses this function, which you can find in `vowel_counts.py`:

``````
import sys

def count(letters, text):
    # create an empty dictionary
    counts = {}

    # loop through all of the letters we are counting
    # and initialize their counts to zero
    for letter in letters:
        counts[letter] = 0

    # loop through all of the letters in the text
    # be sure to convert to lowercase
    for c in text.lower():
        # if this letter is one we are counting, add 1 to its count
        if c in counts:
            counts[c] += 1

    # return the dictionary
    return counts

def main(text):
    # count how many times each vowel occurs in the text
    vowel_counts = count('aeiou', text)
    # print out the dictionary
    print(vowel_counts)

if __name__ == '__main__':
    main(sys.argv[1])
``````

We can test this program by giving it another string:

``````
% python vowel_counts.py "I am going to double major in Computer Science and Journalism"
{'a': 4, 'e': 4, 'i': 5, 'o': 6, 'u': 3}
``````

Looks like it works!

## States

To practice this, we are going to write a program that has a group of people enter their home state or country. After all of the places are entered, the program then prints out how many people are from each place. For example:

``````
% python place_count.py
State or Country: Delaware
State or Country: Montana
State or Country: Pakistan
State or Country: Iran
State or Country: Montana
State or Country: Pakistan
State or Country: India
State or Country: California
State or Country:
{'Delaware': 1, 'Montana': 2, 'Pakistan': 2, 'Iran': 1, 'India': 1, 'California': 1}
``````

Here is a function to compute the dictionary:

``````
def get_places():
    # create an empty dictionary
    places = {}
    while True:
        # get a place
        place = input('State or Country: ')
        # break if we are done
        if not place:
            break
        # if this place is not in the dictionary yet
        # then initialize this place to zero
        if place not in places:
            places[place] = 0
        # increment this place by one
        # this doesn't cause an error because we were sure
        # to initialize it to zero above
        places[place] += 1

    # return the dictionary
    return places
``````

Notice that this follows a pattern similar to the one we used to count vowels. The difference here is that we don’t know the keys for the dictionary in advance. When we count vowels, the keys are always the letters in `'aeiou'`. But for this problem, the keys are whatever states and countries people enter.

We can handle this problem by using this code:

``````
if place not in places:
    places[place] = 0
``````

Whenever we find a place that is not in the dictionary, we initialize its value to zero.
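To see why the check matters, here is a small sketch that applies the same pattern to a fixed list instead of reading from `input()`, and then shows what happens when the check is skipped. The helper name `count_items` and the sample list are assumptions made for this example.

```python
def count_items(items):
    # hypothetical helper: counts the items in a list
    # using the same pattern as get_places()
    counts = {}
    for item in items:
        # the first time we see this key, start it at zero
        if item not in counts:
            counts[item] = 0
        counts[item] += 1
    return counts


places = count_items(['Delaware', 'Montana', 'Pakistan', 'Montana'])
print(places)  # {'Delaware': 1, 'Montana': 2, 'Pakistan': 1}

# without the check, += on a key that is not in the dictionary fails
empty = {}
try:
    empty['Iran'] += 1
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

The `KeyError` happens because `empty['Iran'] += 1` has to look up the current value before it can add one, and there is no value yet.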

Here is a complete program using this function, which you can find in `places_count.py`:

``````
def get_places():
    # create an empty dictionary
    places = {}
    while True:
        # get a place
        place = input('State or Country: ')
        # break if we are done
        if not place:
            break
        # if this place is not in the dictionary yet
        # then initialize this place to zero
        if place not in places:
            places[place] = 0
        # increment this place by one
        # this doesn't cause an error because we were sure
        # to initialize it to zero above
        places[place] += 1

    # return the dictionary
    return places

def main():
    places = get_places()
    print(places)

if __name__ == '__main__':
    main()
``````

## Counting words

For this program, we are going to count the number of times each word occurs in a file. But we need to ignore both case and punctuation. This is important because if the file contains:

``````
Twinkle, twinkle, little star,
how I wonder, what you are!
Up above the world so high,
like a diamond in the sky.
Twinkle, twinkle, little star,
how I wonder what you are!
``````

Then we need “Twinkle” to be counted the same as “twinkle”, and we need to remove commas and exclamation points.

### Reading the file as a long string

When we want to count words in a file, we could read the file as a list of lines, like we usually do, and then split each line into words. However, a simpler thing to do is to read the file as one long string. Then you can split this long string into words all at once using `split()`.

Here is how to read a file as one long string:

``````
def readfile(filename):
    with open(filename) as file:
        return file.read()
``````

This function uses `file.read()` instead of `file.readlines()`:

• `file.read()` — read an entire file and return it as one long string:
``'Line one\nLine two\nLine three\n'``
• `file.readlines()` — read an entire file and return it as a list of strings, one per line in the file:
``['Line one\n', 'Line two\n', 'Line three\n']``
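The difference between the two calls can be checked with a short sketch. It writes a small temporary file first so that it runs on its own; the file name `sample.txt` and its contents are made up for this example.

```python
import os
import tempfile

# write a small sample file (name and contents are assumptions for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as file:
    file.write('Line one\nLine two\nLine three\n')

# read() returns the whole file as one long string
with open(path) as file:
    as_string = file.read()

# readlines() returns a list with one string per line
with open(path) as file:
    as_lines = file.readlines()

# split() with no arguments splits on any white space, including newlines
words = as_string.split()

print(repr(as_string))  # 'Line one\nLine two\nLine three\n'
print(as_lines)         # ['Line one\n', 'Line two\n', 'Line three\n']
print(words)            # ['Line', 'one', 'Line', 'two', 'Line', 'three']
```

Because `split()` treats newlines like any other white space, a single call on the long string gives all of the words in the file.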

### Removing punctuation

To remove punctuation, we can use `strip()`. Normally, `strip()` removes all leading and trailing whitespace. But if we give it a string as an argument, then it removes every leading and trailing character that appears in that string. Characters in the middle of the word are left alone.

For example, this will remove just exclamation points and question marks:

``word = word.strip('!?')``

And this will remove periods, commas, exclamation points, question marks, single quotes, and double quotes:

``word = word.strip('.,!?\'"')``

Notice that to include a single quote inside of the string we have to preface it with a backslash, like this: `\'`
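Here is a short sketch of how `strip()` behaves with that argument. The sample words are made up for illustration.

```python
# the characters we want to remove from the ends of each word
punctuation = '.,!?\'"'

# a trailing comma is removed
first = 'star,'.strip(punctuation)

# quotes and an exclamation point are removed from both ends
second = '"Hello!"'.strip(punctuation)

# the apostrophe in the middle of the word is kept,
# because strip() only looks at the ends of the string
third = "don't!".strip(punctuation)

print(first)   # star
print(second)  # Hello
print(third)   # don't
```

This is why `strip()` is a good fit for words: contractions such as "don't" keep their apostrophe, while punctuation attached to either end of the word is removed.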

### A function to count words

Here is a function that will count words in a long string (containing multiple lines):

``````
def count_words(content):
    """Count the number of each word in content.
    Ignore casing and punctuation."""
    # create an empty dictionary
    counts = {}
    # loop through all of the words, first converting to lowercase
    # and then splitting them using white space
    for word in content.lower().split():
        # strip any leading or trailing punctuation from the word
        word = word.strip('!?,."\'')
        # if the word is not in the dictionary,
        # initialize an entry to zero
        if word not in counts:
            counts[word] = 0
        # increment the count by one for this word
        counts[word] += 1
    # return the dictionary
    return counts
``````

The two important things to notice here are:

• we convert the content to lowercase using `lower()` before we split it into words using `split()`
• we remove all of the punctuation using `strip()`

Otherwise, this follows the same pattern as counting places.
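As a quick check, here is a sketch that calls the function on a short string instead of a file. The two-line sample is taken from the rhyme above; the variable names `sample` and `counts` are just for this example.

```python
def count_words(content):
    """Count the number of each word in content.
    Ignore casing and punctuation."""
    counts = {}
    # lowercase first, then split on white space
    for word in content.lower().split():
        # strip any leading or trailing punctuation from the word
        word = word.strip('!?,."\'')
        # start a new word at zero, then add one
        if word not in counts:
            counts[word] = 0
        counts[word] += 1
    return counts


sample = 'Twinkle, twinkle, little star,\nhow I wonder what you are!'
counts = count_words(sample)
print(counts['twinkle'])  # 2
print(counts['are'])      # 1
```

"Twinkle," and "twinkle," both end up under the key `'twinkle'`, because the text is lowercased before it is split and the trailing comma is stripped from each word.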

The file `count_words.py` contains a complete program:

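``````
import sys

def readfile(filename):
    # read the entire file and return it as one long string
    with open(filename) as file:
        return file.read()

def count_words(content):
    """Count the number of each word in content.
    Ignore casing and punctuation."""
    # create an empty dictionary
    counts = {}
    # loop through all of the words, first converting to lowercase
    # and then splitting them using white space
    for word in content.lower().split():
        # strip any leading or trailing punctuation from the word
        word = word.strip('!?,."\'')
        # if the word is not in the dictionary,
        # initialize an entry to zero
        if word not in counts:
            counts[word] = 0
        # increment the count by one for this word
        counts[word] += 1
    # return the dictionary
    return counts

def main(filename):
    # read the file
    content = readfile(filename)
    # the rest of this program is reconstructed, following the pattern
    # of the earlier programs: count the words and print the dictionary
    counts = count_words(content)
    print(counts)

if __name__ == '__main__':
    main(sys.argv[1])
``````

You can run this using the file `twinkle.txt`: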
``````python count_words.py twinkle.txt