In Python, is there a concise way of comparing whether the contents of two text files are the same?

I don't care what the differences are. I just want to know whether the contents are different.


Asked by: Darcy441 | Posted: 24-09-2021






Answer 1

The low level way:

with open(filename1) as f1, open(filename2) as f2:
    if f1.read() == f2.read():
        ...  # the files have the same contents
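One detail worth knowing about the low-level version: in Python 3, open() in text mode translates \r\n and \r line endings to \n by default, so two files that differ only in their line endings compare as equal. Binary mode ("rb") compares the raw bytes instead. A minimal sketch of the difference; the helper name contents_equal and the temporary files are hypothetical, made up for illustration:

```python
import os
import tempfile


def contents_equal(path1, path2, mode="r"):
    # Hypothetical helper: read both files whole and compare them.
    # mode="r" compares decoded text (newlines normalized); mode="rb" compares raw bytes.
    with open(path1, mode) as f1, open(path2, mode) as f2:
        return f1.read() == f2.read()


with tempfile.TemporaryDirectory() as tmp:
    crlf_path = os.path.join(tmp, "crlf.txt")
    lf_path = os.path.join(tmp, "lf.txt")
    with open(crlf_path, "wb") as f:
        f.write(b"line one\r\nline two\r\n")
    with open(lf_path, "wb") as f:
        f.write(b"line one\nline two\n")

    text_equal = contents_equal(crlf_path, lf_path)         # newlines normalized
    bytes_equal = contents_equal(crlf_path, lf_path, "rb")  # raw bytes differ

print(text_equal, bytes_equal)  # True False
```

Which behaviour you want depends on whether "same contents" should ignore line-ending conventions.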

The high level way:

import filecmp
if filecmp.cmp(filename1, filename2, shallow=False):
   ...
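A self-contained sketch of the high-level way, using throwaway files in a temporary directory (the file names and contents are made up). With shallow=False, filecmp.cmp reads the files rather than trusting matching os.stat() signatures (file type, size, modification time):

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    # a.txt and b.txt are identical; c.txt has the same size but different bytes
    for name, data in [("a.txt", b"hello\n"), ("b.txt", b"hello\n"), ("c.txt", b"world\n")]:
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "wb") as f:
            f.write(data)

    same = filecmp.cmp(paths["a.txt"], paths["b.txt"], shallow=False)
    different = filecmp.cmp(paths["a.txt"], paths["c.txt"], shallow=False)

print(same, different)  # True False
```

Note that filecmp caches comparison results; since Python 3.4, filecmp.clear_cache() clears that cache, which can matter if a file is modified and re-compared within the filesystem's mtime resolution.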

Answered by: Lenny542 | Posted: 25-10-2021



Answer 2

If you're going for even basic efficiency, you probably want to check the file size first:

import os
if os.path.getsize(filename1) == os.path.getsize(filename2):
    with open(filename1, 'rb') as f1, open(filename2, 'rb') as f2:  # binary, to match byte sizes
        if f1.read() == f2.read():
            pass  # Files are the same.

This saves you reading every line of two files that aren't even the same size, and thus can't be the same.

(Even further than that, you could call out to a fast MD5sum of each file and compare those, but that's not "in Python", so I'll stop here.)

Answered by: Kellan795 | Posted: 25-10-2021



Answer 3

This is a functional-style file comparison function. It returns False immediately if the files have different sizes; otherwise, it reads both files in 4 KiB blocks and returns False as soon as it finds the first difference:

from __future__ import with_statement
import os
import itertools, functools, operator
try:
    izip= itertools.izip  # Python 2
except AttributeError:
    izip= zip  # Python 3

def filecmp(filename1, filename2):
    "Do the two files have exactly the same contents?"
    with open(filename1, "rb") as fp1, open(filename2, "rb") as fp2:
        if os.fstat(fp1.fileno()).st_size != os.fstat(fp2.fileno()).st_size:
            return False # different sizes ∴ not equal

        # set up one 4k-reader for each file
        fp1_reader= functools.partial(fp1.read, 4096)
        fp2_reader= functools.partial(fp2.read, 4096)

        # pair up 4k chunks from the two readers until they return b'' (EOF)
        cmp_pairs= izip(iter(fp1_reader, b''), iter(fp2_reader, b''))

        # yields True for each pair of chunks that differs
        inequalities= itertools.starmap(operator.ne, cmp_pairs)

        # voilà; any() stops at first True value
        return not any(inequalities)

if __name__ == "__main__":
    import sys
    print(filecmp(sys.argv[1], sys.argv[2]))

Just a different take :)

Answered by: Melissa999 | Posted: 25-10-2021



Answer 4

Since I can't comment on others' answers, I'll write my own.

If you use MD5, you should not just call md5.update(f.read()), since that reads the whole file into memory at once, which is a problem for large files. Feed the hash in chunks instead:

import hashlib

def get_file_md5(f, chunk_size=8192):
    # f must be a file object opened in binary mode ("rb")
    h = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()
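To use this for the original question, open both files in binary mode and compare the digests. A sketch, with a hypothetical helper same_md5 and made-up test data larger than one chunk. Keep in mind that hashing still reads both files completely, so for a one-off comparison it is not faster than comparing chunks directly; digests pay off when one of them can be stored and reused. MD5 collisions can also be constructed deliberately, so prefer SHA-256 if the files may come from an adversary.

```python
import hashlib
import os
import tempfile


def get_file_md5(f, chunk_size=8192):
    # Same as the answer: update the hash one chunk at a time.
    h = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()


def same_md5(path1, path2):
    # Hypothetical helper: binary mode, because hashlib needs bytes.
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        return get_file_md5(f1) == get_file_md5(f2)


payload = bytes(range(256)) * 400  # 102400 bytes, spans many chunks
with tempfile.TemporaryDirectory() as tmp:
    p1, p2, p3 = (os.path.join(tmp, n) for n in ("one.bin", "two.bin", "three.bin"))
    # three.bin differs from the others only in its last byte
    for path, data in [(p1, payload), (p2, payload), (p3, payload[:-1] + b"\x00")]:
        with open(path, "wb") as f:
            f.write(data)
    match = same_md5(p1, p2)
    mismatch = same_md5(p1, p3)

print(match, mismatch)  # True False
```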

Answered by: Dainton882 | Posted: 25-10-2021



Answer 5

I would compare MD5 hashes of the files' contents.

import hashlib

def checksum(path):
    md5 = hashlib.md5()
    # Open in binary mode: hashlib works on bytes, not str
    with open(path, 'rb') as f:
        md5.update(f.read())
    return md5.hexdigest()

def is_contents_same(path1, path2):
    return checksum(path1) == checksum(path2)

if not is_contents_same('foo.txt', 'bar.txt'):
    print('The contents are not the same!')

Answered by: Kristian708 | Posted: 25-10-2021



Answer 6


with open(filename1, "r") as f1, open(filename2, "r") as f2:
    print(f1.read() == f2.read())


Answered by: Oliver820 | Posted: 25-10-2021



Answer 7

For larger files you could compute an MD5 or SHA hash of the files.
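A sketch of a chunked digest where the algorithm is a parameter, so the same code covers MD5 or a SHA variant. The function name file_digest_hex, the 64 KiB chunk size and the sample data are illustrative assumptions; hashlib.new() is the standard-library way to construct a hash object from an algorithm name. (Python 3.11 and later also provide hashlib.file_digest for reading a file object into a hash.)

```python
import hashlib
import os
import tempfile


def file_digest_hex(path, algorithm="sha256", chunk_size=64 * 1024):
    # hashlib.new() accepts an algorithm name such as "md5" or "sha256".
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling f.read(chunk_size) until EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


data = b"same contents\n" * 1000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "wb") as f:
        f.write(data)
    sha = file_digest_hex(path)
    md5 = file_digest_hex(path, "md5")
```

Two files then have the same contents (up to the negligible chance of an accidental collision) when file_digest_hex returns the same string for both.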

Answered by: Alford145 | Posted: 25-10-2021



Answer 8

filename1 = "G:\\test1.TXT"
filename2 = "G:\\test2.TXT"

with open(filename1) as f1:
    with open(filename2) as f2:
        file1list = f1.read().splitlines()
        file2list = f2.read().splitlines()

        if len(file1list) == len(file2list):
            # Same number of lines: compare them one by one
            for index in range(len(file1list)):
                if file1list[index] == file2list[index]:
                    print(file1list[index] + " == " + file2list[index])
                else:
                    print(file1list[index] + " != " + file2list[index] + " Not equal")
        else:
            print("The files have a different number of lines")
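The same line-by-line idea can be written more compactly, and it can also report where files of different lengths diverge instead of stopping. A sketch, assuming a hypothetical helper differing_lines that returns 1-based line numbers; itertools.zip_longest pads the shorter file with None, so extra lines count as differences. An empty list means the files match line for line (ignoring line-ending style, since text mode and splitlines() both normalize it).

```python
import itertools
import os
import tempfile


def differing_lines(path1, path2):
    # Hypothetical helper: 1-based numbers of lines that differ.
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.read().splitlines()
        lines2 = f2.read().splitlines()
    pairs = itertools.zip_longest(lines1, lines2)  # pads with None
    return [n for n, (a, b) in enumerate(pairs, start=1) if a != b]


with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "test1.txt")
    p2 = os.path.join(tmp, "test2.txt")
    with open(p1, "w") as f:
        f.write("alpha\nbeta\ngamma\n")
    with open(p2, "w") as f:
        f.write("alpha\nBETA\ngamma\ndelta\n")
    result = differing_lines(p1, p2)  # line 2 changed, line 4 only in test2
    same = differing_lines(p1, p1)

print(result, same)  # [2, 4] []
```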

Answered by: Julian574 | Posted: 25-10-2021



Answer 9

Simple and efficient solution:

import os


def is_file_content_equal(
    file_path_1: str, file_path_2: str, buffer_size: int = 1024 * 8
) -> bool:
    """Check whether two files have identical contents.

    Arguments:
        file_path_1 (str): Path to the first file
        file_path_2 (str): Path to the second file
        buffer_size (int): Number of bytes to read from each file per iteration
    Returns:
        bool indicating whether the file contents are equal
    Example:
        >>> is_file_content_equal("filecomp.py", "filecomp copy.py")
        True
        >>> is_file_content_equal("filecomp.py", "diagram.dio")
        False
    """
    # First check sizes
    s1, s2 = os.path.getsize(file_path_1), os.path.getsize(file_path_2)
    if s1 != s2:
        return False
    # If the sizes are the same check the content
    with open(file_path_1, "rb") as fp1, open(file_path_2, "rb") as fp2:
        while True:
            b1 = fp1.read(buffer_size)
            b2 = fp2.read(buffer_size)
            if b1 != b2:
                return False
            # both reads returned b'' (EOF) without any difference,
            # so the files are equal
            if not b1:
                return True

Answered by: Blake738 | Posted: 25-10-2021



Similar questions

comparing contents of two files using python

I have a file named exclusionlist.txt, and its contents are like import os import re import subprocess ......and many more. I have another file named libraries.txt, and the contents of this file are import mymodule import empmodule,os import subprocess import datetime,logging,re .......and...


python - Getting contents of a URL and comparing locally

I'm looking to fetch a URL (it returns just one line, no HTML, just plain text) every 3000 seconds. I want to compare it against "previous.txt" and call a function if it's different, then write it to previous.txt. If it's the same, do nothing. Can someone point me to a place to start? I haven't used Python much before. Thanks.


python - comparing contents of 2 lists of lists

Here's the task I am having trouble with: Given 2 lists of lists, filter them down to only items that have nothing in common. E. g. if inner lists are identical, filter them out. If inner lists have at least one item in common, also filter them out. Note: There is only one level of nesting. The inner lists consist only of strings. I have a solution that works, but it's EXTREMELY messy. Looking for feedback...


Comparing contents of txt files in Python

I am trying to write a program that will open files, read the contents and compare them to other files opened. I need to show if they are Not close enough alike, Similar, or exact copies of each other. I am trying to use the filecmp module, but it's not working for me. Here's what I have so far: import filecmp #Opens selected files file1 = open('file1.txt') file2 = open('file2.txt') #Compares different fi...


list - Comparing tuple contents with int in python

a = [(0, "Hello"), (1,"My"), (3, "Is"), (2, "Name"), (4, "Jacob")] This is an example of a list, but when I try this it doesn't work: if time < a[3]: print ("You did it!") The problem is that apparently I can't compare a tuple with an int, but I only want to compare it to the first number in the tuple. How can I do this?


Comparing contents in two csv file line by line using python

Hi I need to do following steps while comparing two csv files using python: 0) open file 1 and file 2 1) read one line from file 1 2) read one line from file 2 3) compare contents in each of line and count number of same and different 4) if contents are different, write contents in file 1 and file 2 in an output file 5) go back to step 1) before reaching the end of file ...


python - Comparing the contents of two lists

I have the following list of lists: winValues = [[0,1,2],[3,4,5],[6,7,8],[0,3,6],[1,4,7],[2,5,8],[0,4,8],[2,4,6]] Let's say the player variables are: [6,3,2,4,2] How can I check if the player has a set of three winning numbers?


python - Comparing all Contents of a list to all lines within a text File

Below I have a list identifiers where values are appended within for loop logic not shown, this part is fine as it's quite basic. The next part is where I am quite confused as to what I am doing wrong, so I open a local text file and readLines here, I use a for loop to iterate through those lines. If any of the lines in the textfile match any of the lines in the identifiers list then I do not want to send an email (...


php - Comparing runtimes

Closed. This question needs to be more focused. It ...


python - Comparing List of Arguments to it self?

Kind of a weird question, but I need to have a list of strings, and I need to make sure that every string in that list is the same. E.g: a = ['foo', 'foo', 'boo'] #not valid b = ['foo', 'foo', 'foo'] #valid What's the best way to go about doing that? FYI, I don't know how many strings are going to be in the list. Also this is a super easy question, but I am just too tired to thi...


Comparing and updating array values in Python

I'm developing a Sirius XM radio desktop player in Python, in which I want the ability to display a table of all the channels and what is currently playing on each of them. This channel data is obtained from their website as a JSON string. I'm looking for the best data structure that would allow the cleanest way to compare and update the channel data. Arrays are problematic because I would want to be able...


python - comparing and sorting array

From two unequal arrays, I need to compare & delete based on the last value of an array. Example: m[0] and n[0] are read from a text file & saved as arrays, [0] - their column number in the text file. m[0] = [0.00, 1.15, 1.24, 1.35, 1.54, 2.32, 2.85, 3.10, 3.40, 3.80, 4.10, 4.21, 4.44] n[0] = [0.00, 1.12, 1.34, 1.45, 2.54, 3.12, 3.57]


Comparing elements in a list in Python's for -loop

What is wrong in the method end in the code? The method end returns always 1 although it should return 0 with the current data. # return 1 if the sum of four consecutive elements equal the sum over other sum of the other three sums # else return 0 # Eg the current sums "35 34 34 34" should return 0 data = "2|15|14|4|12|6|7|9|8|10|11|5...


Comparing Python nested lists

I have two nested lists, each nested list containing two strings e.g.: list 1 [('EFG', '[3,4,5]'), ('DEF', '[2,3,4]')] and list 2 [('DEF', '[2,3,4]'), ('FGH', '[4,5,6]')] I would like to compare the two lists and recover those nested lists which are identical with each other. In this case only ('DEF','[2,3,4]') would be returned. The lists could get long. Is there an effici...


plone - Comparing list item values to other items in other list in Python

I want to compare the values in one list to the values in a second list and return all those that are in the first list but not in the second i.e. list1 = ['one','two','three','four','five'] list2 = ['one','two','four'] would return 'three' and 'five'. I have only a little experience with python, so this may turn out to be a ridiculous and stupid way to attempt to solve it, but thi...


python - Comparing dicts and update a list of result

I have a list of dicts and I want to compare each dict in that list with a dict in a resulting list, add it to the result list if it's not there, and if it's there, update a counter associated with that dict. At first I wanted to use the solution described at Python : List of dict, if ...


datetime - Comparing a time delta in python

I have a variable which is <type 'datetime.timedelta'> and I would like to compare it against certain values. Let's say d produces this datetime.timedelta value 0:00:01.782000 I would like to compare it like this: #if d is greater than 1 minute if d>1:00: print "elapsed time is greater than 1 minute" I have tried converting...


Comparing two text files in python

I need to compare two files and redirect the different lines to a third file. I know that using the diff command I can get the difference. But is there any way of doing it in Python? Any sample code would be helpful.





