Alberto Dorito February 2016

How can I remove punctuation from a string in Python?

I am trying to remove all punctuation from a string, but whenever I run my program nothing happens. This is my code:

#OPEN file (a christmas carol)
inputFile = open('H:\Documents\Computing\GCSE COMPUTING\Revision\Practice Prog/christmascarol.txt')
carolText = inputFile.read()



#CONVERT everything into lowercase
for line in carolText:
       carolTextlower = carolText.lower()

#REMOVE punctuation (Put a space instead of a hyphened word or apostrophe)
import string
exclude = set(string.punctuation)
noPunctu = carolTextlower.join(ch for ch in carolTextlower if ch not in exclude)
print(noPunctu)

When I run my program, nothing is printed.

Answers


Pranav Waila February 2016

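Try the following code, which removes each punctuation character in turn:

import string

# Raw string, so the backslashes in the Windows path are taken literally
with open(r'H:\Documents\Computing\GCSE COMPUTING\Revision\Practice Prog/christmascarol.txt') as inputFile:
    carolText = inputFile.read()

# Remove every punctuation character, one at a time
for c in string.punctuation:
    carolText = carolText.replace(c, "")

# A bare expression is only echoed in the interactive interpreter; a script needs print()
print(carolText)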


Pouria Hadjibagheri February 2016

Here is how you can open a file, replace a certain character in it, and write the result to a new file.

to_replace = '-'  # Hyphen
replace_by = ' '  # Space

# Reading the file to be modified.
with open('file.txt', 'r') as file:
    # Modifying the contents as the file is being read.
    new_file = [line.replace(to_replace, replace_by) for line in file]

# Writing all lines, modified or not, to a new file.
with open('file_modified.txt', 'w') as file:
    for item in new_file:
        # Each line already ends with its own newline, so don't add another.
        print(item, file=file, end='')


Martin Evans February 2016

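This can be done using Python's str.translate method. The code builds a table that maps every uppercase ASCII letter to its lowercase counterpart and every punctuation character to a space. The whole text is then translated in a single call, rather than with one replace per character, so it is very fast. The final replace shortens the double spaces this leaves behind; it makes only one pass, so longer runs of spaces are shortened but not fully collapsed.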

import string

def process_text(s):
    return s.translate(
        str.maketrans(
            string.punctuation + string.ascii_uppercase, 
            " " * len(string.punctuation) + string.ascii_lowercase)).replace("  ", " ")

with open(r'H:\Documents\Computing\GCSE COMPUTING\Revision\Practice Prog/christmascarol.txt') as inputFile:
    print(process_text(inputFile.read()))
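
To see what the translation table does, here is a minimal sketch on a short sample sentence. The sample text and the names table, spaced, normalized and deleted are illustrative and not part of the answer above. It also shows a variant, assuming you would rather delete punctuation than replace it with spaces, using the three-argument form of str.maketrans.

```python
import string

# Same table as in process_text: punctuation -> space, uppercase -> lowercase.
table = str.maketrans(
    string.punctuation + string.ascii_uppercase,
    " " * len(string.punctuation) + string.ascii_lowercase,
)

sample = "Marley was dead: to begin with."  # illustrative sample text

spaced = sample.translate(table)
# split() followed by join collapses any run of whitespace (including
# newlines) to one space, which is stricter than one replace("  ", " ") pass.
normalized = " ".join(spaced.split())
print(normalized)

# Variant: delete punctuation instead. The third argument to maketrans
# lists the characters that should be removed.
delete_table = str.maketrans("", "", string.punctuation)
deleted = sample.lower().translate(delete_table)
print(deleted)
```

Note that joining on split() also removes line breaks, so it only suits cases where the line structure of the text does not matter.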


PM 2Ring February 2016

Here's a repaired version of your code.

import string

#OPEN file (a christmas carol)
inputFile = open(r'H:\Documents\Computing\GCSE COMPUTING\Revision\Practice Prog/christmascarol.txt')
carolText = inputFile.read()
inputFile.close()

#CONVERT everything into lowercase
carolTextlower = carolText.lower()

#REMOVE punctuation 
exclude = set(string.punctuation)
noPunctu = ''.join(ch for ch in carolTextlower if ch not in exclude)
print(noPunctu)
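
The change that matters most here appears to be the separator passed to join: sep.join(items) places sep between every pair of items, so using the text itself as the separator inserts a full copy of the text between the kept characters. A minimal sketch with a made-up three-character string (text, wrong and right are illustrative names):

```python
import string

text = "hi!"  # made-up stand-in for the file contents
exclude = set(string.punctuation)

# Using the text itself as the separator inserts a full copy of it
# between each pair of kept characters.
wrong = text.join(ch for ch in text if ch not in exclude)

# An empty separator simply concatenates the kept characters.
right = "".join(ch for ch in text if ch not in exclude)

print(repr(wrong))
print(repr(right))

# Iterating over a string yields single characters, not lines.
print(list(text))
```

Because iterating over a string yields one character at a time, the original loop lowercased the entire text once per character. On a whole book, that loop together with the oversized join would plausibly make the program look as if nothing happens.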

The usual Python convention is to put import statements at the top of the script so they're easy to find.

Note that I used a raw string (indicated by the r before the opening quote mark) for the file name. It's not strictly necessary here, because none of the backslash sequences in this path happens to be a recognized escape, but it prevents backslash sequences in Windows paths from being interpreted as escape sequences. E.g., in 'H:\Documents\new\test.py' the \n would be interpreted as a newline character and the \t would be interpreted as a tab character.
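
The difference is easy to check in a few lines. This sketch uses a shortened, hypothetical path chosen only because it contains \n and \t:

```python
# Hypothetical short path, chosen so that it contains \n and \t.
plain = 'H:\new\test.py'   # \n becomes a newline, \t becomes a tab
raw = r'H:\new\test.py'    # backslashes are kept literally

# repr() shows the actual characters stored in each string.
print(repr(plain))
print(repr(raw))

# The plain string is shorter because each escape collapses to one character.
print(len(plain), len(raw))
```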

You really should close a file after you've finished reading (or writing) it. However, it's better to use the with keyword to open files: that ensures the file gets closed properly even if there's an error. For example:

filename = r'H:\Documents\Computing\GCSE COMPUTING\Revision\Practice Prog/christmascarol.txt'
with open(filename) as inputFile:
    carolText = inputFile.read()
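
A small self-contained check of that behaviour, as a sketch only: it uses a temporary file in place of the real path, and the sample text and the simulated error are assumptions made for illustration.

```python
import os
import tempfile

# Create a throwaway file standing in for christmascarol.txt.
fd, filename = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Marley was dead: to begin with.\n")

try:
    with open(filename) as inputFile:
        carolText = inputFile.read()
        # Simulate something going wrong while the file is still open.
        raise ValueError("simulated error while processing")
except ValueError:
    pass

# The with block closed the file even though an exception was raised.
print(inputFile.closed)
print(carolText)

# Clean up the temporary file.
os.remove(filename)
```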
