Find min value in a CSV and print every row that includes it in Python

Thanks so much in advance for any help. I'm trying to write a script that goes through a folder of CSV files, finds the minimum value in the second column, and prints every row that contains it. The CSV files the script looks through look like this:

TPN,12010,on this date,25,0.00005047619239909304377497309619
TPN,12011,on this date,23,0.00003797836224092152019127884704
TPN,12012,on this date,78,0.0001130474103447076420049393022
TPN,12020,on this date,27,0.00005671375308512314236202279053
TPN,12021,on this date,60,0.00009856619048244864701475864425

The script looks like this:

import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

identity = []
for filename in os.listdir (folder):
    with open(filename, 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1               
        datatype = int
        data = (datatype(row[column]) for row in incsv)   
        least_value = min(data)
        print least_value
        for row in incsv:
            if least_value in column[1]:
                identity.append(row)
            else:
                print "No match"
        print identity

The error I get is:

  File "findfirsttrigram.py", line 12, in <module>
    identity.append("a")
NameError: name 'identity' is not defined

I also tried doing it like this:

import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

for filename in os.listdir (folder):
    with open(filename, 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1               
        datatype = int
        data = (datatype(row[column]) for row in incsv)   
        least_value = min(data)
        print least_value
        for row in incsv:
            if least_value in row:
                print row
            else:
                print "No match"

But that didn't work either. It did not give me an error, but it also did not print "No match", so I have no idea where to start. Please help!!


Asked by: Freddie735 | Posted: 30-11-2021






Answer 1

You can do something like:

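import csv
import os

# 'folder' is the directory path defined in the question
for each_file in os.listdir(folder):
    # os.listdir returns bare names, so join them with the folder path
    with open(os.path.join(folder, each_file)) as f:
        m = min(int(line[1]) for line in csv.reader(f))
        f.seek(0)  # rewind so the file can be read a second time
        for line in csv.reader(f):
            if int(line[1]) == m:
                print line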

Answered by: Sawyer401 | Posted: 01-01-2022



Answer 2

The reason why your minimum value is not found is that you convert your column to an int when you are looking for a minimum value, but it still remains a string when you look at it as part of the row you have read. Try changing your code like this:

for row in incsv:
    if int(row[column])==least_value:
        print row
    else:
        print "No match"
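
To see the type mismatch concretely, here is a small sketch (written in Python 3 syntax, unlike the Python 2 code above; the sample row is one line from the question's data). csv.reader yields every field as a string, so an integer minimum is never "in" the row until the field is converted:

```python
import csv
import io

# One sample line from the question's data, wrapped in a file-like object.
sample = io.StringIO("TPN,12011,on this date,23,0.00003797836224092152019127884704\n")
row = next(csv.reader(sample))

least_value = 12011  # an int, as produced by min(int(...))

string_membership = least_value in row        # compares 12011 with '12011', etc.
converted_match = int(row[1]) == least_value  # compares int with int

print(row[1], type(row[1]).__name__)
print(string_membership, converted_match)
```

The membership test fails silently rather than raising an error, which is why the original loop never reported a match.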

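Regarding the other error: a with block does not create a new scope, so a list defined at module level, such as identity, is accessible inside it without a global declaration. The traceback points to a line (identity.append("a")) that does not appear in the code you posted, so the version that raised the NameError most likely used identity before it was assigned.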

Answered by: Chelsea496 | Posted: 01-01-2022



Answer 3

Ashalynd covered why the value test would fail. However, the reason your "No match" statement is never reached is that the csv reader can't iterate over the data twice without the file being rewound. Take a simple example like this.

with open(filename) as inf:
    incsv = csv.reader(inf)
    total_lines = 0
    for line in incsv:
        total_lines += 1
    print total_lines

    total_lines = 0
    for line in incsv:
        total_lines += 1
    print total_lines

Assuming there are 999 records it will output the following:

999
0

That's because at the end of the first iteration the file object's position is at the end of the file. You need to reset it to the start before iterating over the data again. Add inf.seek(0) before the second loop and the second example should be fine:

for filename in os.listdir(folder):
    # os.listdir returns bare names, so join them with the folder path
    with open(os.path.join(folder, filename), 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1
        datatype = int
        # The generator is lazy; nothing is read from the file yet
        data = (datatype(row[column]) for row in incsv)
        # min() consumes the generator, leaving the file position at the end
        least_value = min(data)
        print least_value
        # Reset the file's current position so it can be read again
        inf.seek(0)
        for row in incsv:
            # Compare against the value cast to the same type
            if least_value == datatype(row[column]):
                print row
            else:
                print "No match"
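
An alternative that avoids rewinding altogether is to read each file's rows into a list once and then work from the list. The following is a minimal sketch in Python 3 syntax, not a drop-in replacement for the Python 2 code above: the names rows_with_min and scan_folder are hypothetical, and it assumes every non-empty row has an integer in the chosen column and that each file fits in memory.

```python
import csv
import os


def rows_with_min(path, column=1):
    """Return (minimum, matching_rows) for one CSV file.

    Hypothetical helper: assumes every non-empty row holds an integer in `column`.
    """
    with open(path, newline="") as inf:
        # Materialise the rows so they can be scanned more than once.
        rows = [row for row in csv.reader(inf) if row]
    if not rows:
        return None, []
    least_value = min(int(row[column]) for row in rows)
    matches = [row for row in rows if int(row[column]) == least_value]
    return least_value, matches


def scan_folder(folder, column=1):
    """Print the minimum and its rows for every .csv file in `folder`."""
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith(".csv"):
            continue  # skip anything that is not a CSV file
        # os.listdir returns bare names, so join them with the folder.
        least_value, matches = rows_with_min(os.path.join(folder, filename), column)
        print(filename, least_value)
        for row in matches:
            print(row)
```

Calling scan_folder(folder) with the question's folder path would print each file's minimum followed by every row that carries it. Because the rows live in a list, no seek(0) is needed between the two passes.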

Answered by: Emma979 | Posted: 01-01-2022


