String Manipulation in Python: Basic Operations and Slicing

Real-world data is usually messy, and text data is no exception. String manipulation can help clean up data by removing unwanted characters, correcting formatting, and standardizing values.

If you eventually move on to more advanced data science work, knowing how to work with text data will be an essential tool in your data science toolbox.

Basics

Data Types

Text is one of the most commonly used data types. A piece of text is called a string, and in Python its type is abbreviated as str. Below is how to check the data type of a string:

my_string = "Go Shockers"
print(type(my_string))
<class 'str'>

In Python, single quotes work the same as double quotes. It usually does not matter which one you use, but I recommend picking one and staying consistent.
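Here is a small sketch showing that both quote styles produce the same string. Having both styles available is handy when the text itself contains a quote character: wrap it in the other kind and no escaping is needed.

```python
# Single and double quotes create identical strings
single = 'Go Shockers'
double = "Go Shockers"
print(single == double)  # True

# Use the other quote style to embed quotes without escaping them
quote = 'She said "Go Shockers" at the game.'
contraction = "It's game day."
print(quote)
print(contraction)
```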

If you are not sure a value will be a string, you can convert it to one explicitly with str():

my_string = str("Go Shockers")
print(type(my_string))
<class 'str'>
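Wrapping a value that is already a string in str() changes nothing. The function matters when the value has some other type, such as a number. The sketch below uses an illustrative integer to show the conversion.

```python
# str() converts values of other types into strings
year = 2024  # illustrative value
year_text = str(year)

print(type(year), type(year_text))  # <class 'int'> <class 'str'>
print(year_text + " season")        # 2024 season
```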

Concatenation of Strings

Concatenation is when you have two or more strings that you want to join into one. You can use + to concatenate them.

city = "Santiago"
country = "Chile"

# Combine the two strings to display the City and Country in the format City, Country
print(city + ", " + country)
Santiago, Chile
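The + operator only joins a string to other strings. Adding a number directly raises a TypeError, so convert the number with str() first. An f-string is a common alternative that handles the conversion for you. The visit count below is a made-up value for illustration.

```python
city = "Santiago"
visits = 3  # hypothetical value for illustration

# Adding a str and an int raises a TypeError
try:
    message = "Visits to " + city + ": " + visits
except TypeError as err:
    print("TypeError:", err)

# Convert the number with str() before concatenating
message = "Visits to " + city + ": " + str(visits)
print(message)  # Visits to Santiago: 3

# An f-string inserts the values and converts them automatically
print(f"Visits to {city}: {visits}")  # Visits to Santiago: 3
```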

Length of String

Use the len() function to return the length of a string:

print("The number of characters in the word Kansas:", len("Kansas"))
The number of characters in the word Kansas: 6
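Note that len() counts every character in the string, not just letters. Spaces and punctuation count too, and an empty string has length 0:

```python
# len() counts every character, including spaces and punctuation
print(len("Go Shockers"))   # 11
print(len("Kansas City!"))  # 12
print(len(""))              # 0
```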

If you want to grab a specific character by its position, put the character's index inside square brackets []:

state = "Kansas"

# To get the third character in the word Kansas
character3 = state[2]

print(character3)
n

We use 2 in the brackets because indexing for strings, lists, and other sequences in Python starts at 0, not 1.
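Indexes can also be negative, in which case they count back from the end of the string, so -1 is the last character. An index past the end of the string raises an IndexError. A short sketch:

```python
state = "Kansas"

# Index 0 is the first character; negative indexes count from the end
print(state[0])   # K
print(state[-1])  # s
print(state[-2])  # a

# "Kansas" has 6 characters (indexes 0 to 5), so index 6 is out of range
try:
    state[6]
except IndexError as err:
    print("IndexError:", err)
```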

Replace part of string

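Use the replace() method to replace part of a string with a different string. Strings in Python are immutable, which means replace() returns a new string and leaves the original unchanged, so assign the result to a variable to keep it. For example:

my_string = "April is the cruelest month."

# Replace April with February and store the result
my_string = my_string.replace('April', 'February')

print(my_string)
February is the cruelest month.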
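By default, replace() changes every occurrence of the old text. An optional third argument limits how many occurrences are replaced, counting from the left. The sentence below is an illustrative example.

```python
my_string = "April showers bring May flowers. April is rainy."

# Replace every occurrence (the default)
print(my_string.replace("April", "June"))
# June showers bring May flowers. June is rainy.

# Replace only the first occurrence
print(my_string.replace("April", "June", 1))
# June showers bring May flowers. April is rainy.

# The original is unchanged because replace() returns a new string
print(my_string)
```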

Other common methods

Some other common methods frequently used in text cleaning are lower(), upper(), and split().

  • .lower(): converts all letters to lowercase
  • .upper(): converts all letters to uppercase
  • .split(): splits a string into a list of smaller parts based on a delimiter. By default, it splits on white space.

Examples

# Use .lower() to change to all lower case
my_phrase = "Life is a Beach."
my_phrase_lower = my_phrase.lower()
print(my_phrase_lower)

# Use .upper() to convert to all uppercase
my_phrase_upper = my_phrase.upper()
print(my_phrase_upper)

# Use .split() to split into smaller parts based on white space
my_phrase_split = my_phrase.split()
print(my_phrase_split)
life is a beach.
LIFE IS A BEACH.
['Life', 'is', 'a', 'Beach.']
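Passing a delimiter to split() lets you break text on something other than white space, such as commas. These methods can also be chained. The sketch below lowercases a phrase before splitting it, so a later comparison does not depend on capitalization.

```python
# Pass a delimiter to split on commas instead of white space
states = "Kansas,Missouri,Iowa"
print(states.split(","))  # ['Kansas', 'Missouri', 'Iowa']

# Chain .lower() and .split() to standardize text before comparing it
raw = "Life is a Beach."
words = raw.lower().split()
print(words)              # ['life', 'is', 'a', 'beach.']
print("beach." in words)  # True
```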

Slicing a string

Slicing is an essential tool in Python that you will use often. The basic syntax is string[start:stop:step]:

  • start: the position where the slice begins. It defaults to the beginning of the string.
  • stop: the position where the slice ends. The slice stops right before this position, so the character at stop is not included. It defaults to the end of the string.
  • step: how many positions to move each time, which lets you skip characters. It defaults to 1.

Examples

word = "audio"

# To get the first two characters
slice1 = word[:2]
print(slice1)

# The last character
slice2 = word[-1]
print(slice2)

# Take every other letter (step of 2)
slice3 = word[::2]
print(slice3)
au
o
ado
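The examples above leave out start or stop. Here is a sketch that sets them explicitly, uses a negative start, and shows that a stop past the end of the string is clamped rather than raising an error (unlike indexing a single character):

```python
word = "audio"

# Start at index 1 and stop before index 4
print(word[1:4])    # udi

# A negative start counts from the end: the last three characters
print(word[-3:])    # dio

# A stop beyond the end is clamped to the string's length
print(word[1:100])  # udio

# Start, stop, and step together
print(word[0:5:2])  # ado
```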

Reverse string

To reverse a string, use the slice syntax. Below, we access the string with [] and set the step to -1 so the slice moves backward one character at a time. The default step of 1 moves forward one character at a time.

new_string = "Bubble"

new_string_reversed = new_string[::-1]

print("Original String:", new_string)
print("Reversed String:", new_string_reversed)
Original String: Bubble
Reversed String: elbbuB
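One small use of reversal is checking whether a word is a palindrome, meaning it reads the same forward and backward. The helper below, is_palindrome, is a hypothetical name for illustration. It lowercases the text first so that capitalization does not affect the result.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forward and backward, ignoring case."""
    normalized = text.lower()
    # Compare the string with its reversed copy
    return normalized == normalized[::-1]

print(is_palindrome("Level"))   # True
print(is_palindrome("Bubble"))  # False
```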