Python string substring, contains, find and index comparison

To check whether a string contains a given substring, Python offers several options: the in operator and the string methods __contains__, find and index. In this article we will look at a code example for each of them and then measure their running time. This way you will know which one is best for overall code performance.

Code Example 1 – Using in

def in_(haystack, needle):
    return needle in haystack

print(in_("Captain America is the first Avenger", "first"))

In this code example, we have defined a Python function in_ that accepts two parameters, haystack and needle, and checks whether needle occurs in haystack. Notice that the function is named in_ and not in. This is because in is a reserved keyword, so Python will not let us use it as a function name.
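The keyword claim can be checked with the standard library's keyword module. This is a minimal sketch; the variable names is_reserved and is_safe_name are only illustrative.

```python
import keyword

# "in" is a reserved keyword, so it cannot be used as a function name.
is_reserved = keyword.iskeyword("in")

# Appending an underscore gives a valid identifier that is not a keyword.
is_safe_name = "in_".isidentifier() and not keyword.iskeyword("in_")

print(is_reserved, is_safe_name)
```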

Code Example 2 – Using contains

def contains_(haystack, needle):
    return haystack.__contains__(needle)

print(contains_("Captain America is the first Avenger", "first"))

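Here we call the string's special method __contains__ directly. This is the method Python invokes behind the scenes when it evaluates needle in haystack, so the result is the same as in Code Example 1.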
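To see the link between the in operator and __contains__, it can help to define the method on a small class of our own. The Team class below is hypothetical and exists only for this illustration; it counts how often __contains__ is called.

```python
class Team:
    """Hypothetical container used only for this illustration."""

    def __init__(self, members):
        self.members = list(members)
        self.calls = 0  # counts how often __contains__ is invoked

    def __contains__(self, name):
        self.calls += 1
        return name in self.members


avengers = Team(["Captain America", "Iron Man"])

found = "Iron Man" in avengers    # the in operator calls Team.__contains__
missing = "Batman" in avengers    # same path, but the name is absent

print(found, missing, avengers.calls)
```

Each use of in on the object goes through __contains__, which is why calling the method by hand on a string gives the same answer as the operator.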

Code Example 3 – Using find

def find_(haystack, needle):
    return haystack.find(needle) != -1

print(find_("Captain America is the first Avenger", "first"))

Code Example 4 – Using index

def index_(haystack, needle):
    try:
        haystack.index(needle)
    except ValueError:
        return False
    else:
        return True

print(index_("Captain America is the first Avenger", "first"))
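The wrappers in Code Examples 3 and 4 hide what find and index actually return. The short sketch below shows the raw values for the same haystack; the variable names are only illustrative.

```python
haystack = "Captain America is the first Avenger"

# find returns the lowest index where the needle starts, or -1 when absent.
pos_found = haystack.find("first")
pos_missing = haystack.find("second")

# index returns the same position on success...
idx_found = haystack.index("first")

# ...but raises ValueError instead of returning -1.
try:
    haystack.index("second")
    raised = False
except ValueError:
    raised = True

print(pos_found, pos_missing, idx_found, raised)
```

Because -1 is truthy and 0 is falsy, the result of find has to be compared with -1 rather than used directly as a condition, which is what find_ does.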

Now let's compare all these methods and measure their running time –

import timeit
import json

def in_(haystack, needle):
    return needle in haystack
    
def contains_(haystack, needle):
    return haystack.__contains__(needle)
    
def find_(haystack, needle):
    return haystack.find(needle) != -1
    
def index_(haystack, needle):
    try:
        haystack.index(needle)
    except ValueError:
        return False
    else:
        return True
        
perf_dict = {
'in:True': min(timeit.repeat(lambda: in_('Captain America is the first Avenger', 'first'), number=1000)),
'in:False': min(timeit.repeat(lambda: in_('Captain America is the first Avenger', 'second'), number=1000)),
'__contains__:True': min(timeit.repeat(lambda: contains_('Captain America is the first Avenger', 'first'), number=1000)),
'__contains__:False': min(timeit.repeat(lambda: contains_('Captain America is the first Avenger', 'second'), number=1000)),
'find:True': min(timeit.repeat(lambda: find_('Captain America is the first Avenger', 'first'), number=1000)),
'find:False': min(timeit.repeat(lambda: find_('Captain America is the first Avenger', 'second'), number=1000)),
'index:True': min(timeit.repeat(lambda: index_('Captain America is the first Avenger', 'first'), number=1000)),
'index:False': min(timeit.repeat(lambda: index_('Captain America is the first Avenger', 'second'), number=1000)),
}

print(json.dumps(perf_dict, indent=2))

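To measure each function we use timeit.repeat, which repeats the timing several times and returns a list of results; we keep the minimum of each list. Each value is the total time in seconds for 1000 calls. When I ran the code, I got this output –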

{
  "in:True": 0.00028002727776765823,
  "in:False": 0.0002694167196750641,
  "__contains__:True": 0.0004137204959988594,
  "__contains__:False": 0.00040215253829956055,
  "find:True": 0.00045281555503606796,
  "find:False": 0.000452638603746891,
  "index:True": 0.0004562549293041229,
  "index:False": 0.000827358104288578
}
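For readers unfamiliar with timeit.repeat, here is a small sketch of what a single entry of the benchmark does. It assumes Python 3.7 or later, where the default for repeat is 5; the value is spelled out here for clarity, and per_call is an illustrative name.

```python
import timeit


def in_(haystack, needle):
    return needle in haystack


# Run 5 batches of 1000 calls each; one total time (in seconds) per batch.
timings = timeit.repeat(
    lambda: in_("Captain America is the first Avenger", "first"),
    repeat=5,
    number=1000,
)

best = min(timings)       # total seconds for the fastest batch of 1000 calls
per_call = best / 1000    # approximate seconds per single call

print(len(timings), best, per_call)
```

Taking the minimum rather than the average is the usual choice with timeit, since the slower batches mostly reflect interference from other processes rather than the code being measured.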

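From the benchmark, you can see that the in operator is the fastest: it is roughly 1.5 times faster than __contains__ and about 1.6 times faster than find and a successful index. When the needle is missing, index is about three times slower than in, because it also has to raise and catch a ValueError. The exact numbers will vary between machines and runs.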
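One plausible reason for the gap is that the in operator is handled by the interpreter directly, while find requires looking up a method and calling it. This is an interpretation, not something the benchmark itself proves. On CPython, the dis module gives a rough way to check it. The helper opnames is hypothetical, and the exact instruction names differ between CPython versions.

```python
import dis


def in_(haystack, needle):
    return needle in haystack


def find_(haystack, needle):
    return haystack.find(needle) != -1


def opnames(func):
    """Return the bytecode instruction names for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


in_ops = opnames(in_)
find_ops = opnames(find_)

# Instruction names vary by version, so only check whether any
# call-type instruction is present in each function.
in_has_call = any(name.startswith("CALL") for name in in_ops)
find_has_call = any(name.startswith("CALL") for name in find_ops)

print(in_ops)
print(find_ops)
print(in_has_call, find_has_call)
```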
