Python CSV - Need to Group and Calculate values based on one key
I have a simple three-column CSV file. Using Python, I need to group the rows by one key, then average the values of another key and return the results. The file is in standard CSV format, laid out like so:
ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40
So basically, I need to calculate the average rate (col[2]) for each unique zipcode (col[1]) in that file and return the results: the average rate for all records in 19003, in 19083, and so on.
I've looked at using the csv module and reading the file into a dictionary, then sorting the dict based on the unique values in the zipcode column, but I can't seem to make any progress.
Any help/suggestions appreciated.
Asked by: Brooke175 | Posted: 27-01-2022
Answer 1
I've documented some steps to help clarify things:
    import csv
    from collections import defaultdict

    # a dictionary whose value defaults to a list.
    data = defaultdict(list)

    # open the csv file and iterate over its rows. the enumerate()
    # function gives us an incrementing row number
    for i, row in enumerate(csv.reader(open('data.csv', 'rb'))):
        # skip the header line and any empty rows
        # we take advantage of the first row being indexed at 0
        # i=0 which evaluates as false, as does an empty row
        if not i or not row:
            continue
        # unpack the columns into local variables
        _, zipcode, level = row
        # for each zipcode, add the level to the list
        data[zipcode].append(float(level))

    # loop over each zipcode and its list of levels and calculate the average
    for zipcode, levels in data.iteritems():
        print zipcode, sum(levels) / float(len(levels))
Output:
19102 21.4
19003 29.415
19083 29.65
Answered by: Lily714 | Posted: 28-02-2022
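Answer 1 is written in Python 2 syntax (the print statement, dict.iteritems(), and a file opened in 'rb' mode). A rough Python 3 equivalent of the same defaultdict approach might look like the sketch below. The function name average_rate_by_zipcode and the inline SAMPLE string are illustrative assumptions, not part of the original answer; a real script would pass an open file object, for example open('data.csv', newline=''). The skipinitialspace=True option is also an addition, used here because the sample in the question has a space after each comma.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question (hypothetical stand-in for data.csv).
SAMPLE = """ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40
"""


def average_rate_by_zipcode(lines):
    """Return {zipcode: mean rate} for an iterable of CSV lines with a header row."""
    rates = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma in the sample
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:  # ignore empty lines
            continue
        _, zipcode, rate = row
        rates[zipcode].append(float(rate))
    # average each zipcode's list of rates
    return {zipcode: sum(values) / len(values) for zipcode, values in rates.items()}


averages = average_rate_by_zipcode(io.StringIO(SAMPLE))
for zipcode, average in averages.items():
    print(zipcode, round(average, 3))
```

Since Python 3.7 a dict keeps insertion order, so the zipcodes print in the order they first appear in the file rather than in the order shown in the Python 2 output above.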
Answer 2
Usually, when I have to do complicated processing, I use csv to load the rows into a table in a relational DB (sqlite is the fastest way), then I use standard SQL to extract the data and calculate the average values:
    import csv
    from StringIO import StringIO
    import sqlite3

    data = """1,19003,27.50
    2,19003,31.33
    3,19083,41.4
    4,19083,17.9
    5,19102,21.40
    """

    f = StringIO(data)
    reader = csv.reader(f)

    conn = sqlite3.connect(':memory:')
    c = conn.cursor()
    c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
    conn.commit()

    for e in reader:
        e[2] = float(e[2])
        c.execute("""insert into data
                     values (?,?,?)""", e)
    conn.commit()

    c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
    for row in c:
        print row
Answered by: Arthur446 | Posted: 28-02-2022
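Answer 2 is also Python 2 code (the StringIO module and the print statement). A minimal Python 3 sketch of the same in-memory SQLite idea follows. The function name average_rate_sql and the SAMPLE string are hypothetical; the table layout and the GROUP BY query are taken from the answer, while the ORDER BY clause and the use of executemany are additions made here so the result order is predictable and the inserts are batched.

```python
import csv
import io
import sqlite3

# Same rows as in the answer, without a header line.
SAMPLE = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""


def average_rate_sql(lines):
    """Load CSV rows into an in-memory SQLite table and average RATE per ZIPCODE."""
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('create table data (ID text, ZIPCODE text, RATE real)')
        # convert the rate column to float and skip empty rows
        rows = ((row[0], row[1], float(row[2])) for row in csv.reader(lines) if row)
        conn.executemany('insert into data values (?, ?, ?)', rows)
        conn.commit()
        cursor = conn.execute(
            'select ZIPCODE, avg(RATE) from data group by ZIPCODE order by ZIPCODE'
        )
        return cursor.fetchall()
    finally:
        conn.close()


results = average_rate_sql(io.StringIO(SAMPLE))
for row in results:
    print(row)
```

Each result row is a (zipcode, average) tuple. For a file on disk, the same function could be given open('data.csv', newline='') instead of the StringIO object, after skipping any header line.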