Fast filter method in Python

I want to filter two lists in a Python script using the fastest method available. I have been using the built-in filter() function for this, but it is quite slow and takes too much time because my lists are very big: I think more than 5 million items in each list, maybe more. I do not know how to go about this. If anybody has an idea, or could write a small function for it, please share.


Asked by: Elise626 | Posted: 05-10-2021






Answer 1

Maybe your lists are too large and do not fit in memory, and you experience thrashing. If the sources are in files, you do not need the whole list in memory all at once. Try using itertools, e.g.:

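# Python 2 only; in Python 3, itertools.ifilter is gone and the built-in filter() is already lazy.
from itertools import ifilter

def is_important(s):
    # Keep only lines longer than 10 characters (the length includes the trailing newline).
    return len(s) > 10

# Lines are read from the file one at a time as the iterator is consumed.
filtered_list = ifilter(is_important, open('mylist.txt'))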

Note that ifilter returns an iterator rather than a list, so it is memory efficient: items are produced one at a time as you consume them.
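
For readers on Python 3, where itertools.ifilter no longer exists and the built-in filter() is itself lazy, the same idea might look roughly like the sketch below. The helper name iter_important_lines and the in-memory io.StringIO stand-in for mylist.txt are assumptions made for illustration, not part of the original answer.

```python
import io

def is_important(s):
    # Same predicate as in the answer: keep lines longer than 10 characters
    # (the length includes the trailing newline).
    return len(s) > 10

def iter_important_lines(lines):
    # Hypothetical helper. In Python 3, filter() returns a lazy iterator,
    # so nothing is read from `lines` until the result is consumed.
    return filter(is_important, lines)

# Hypothetical stand-in for open('mylist.txt'); any iterable of strings works.
source = io.StringIO(
    "short\nthis line is long enough\ntiny\nanother long line here\n"
)
kept = [line.rstrip("\n") for line in iter_important_lines(source)]
print(kept)
```

With a real file, passing the open file object in place of source should behave the same way, reading one line at a time.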

Generator Tricks is a tutorial by David M. Beazley that teaches some interesting uses for generators.

Answered by: Brad112 | Posted: 06-11-2021



Answer 2

If you can avoid creating the lists in the first place, you'll be happier.

Rather than

aBigList = someListMakingFunction()
filter(lambda x: x > 10, aBigList)

You might want to look at your function that makes the list.

def someListMakingGenerator():
    for x in some_source:  # some_source is a placeholder for wherever your data comes from
        yield x

Then your filter doesn't need to hold a giant block of memory:

def myFilter(aGenerator):
    for x in aGenerator:
        if x > 10:
            yield x

By using generators, you don't keep much stuff in memory.
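
As a rough end-to-end sketch of this generator approach in Python 3: the names make_numbers and keep_greater_than are hypothetical, and range() stands in for whatever really produces the data.

```python
def make_numbers(n):
    # Hypothetical data source: yields values one at a time
    # instead of building a 5-million-item list up front.
    for x in range(n):
        yield x

def keep_greater_than(values, threshold):
    # Generator filter: passes through only the values above the threshold.
    for x in values:
        if x > threshold:
            yield x

pipeline = keep_greater_than(make_numbers(5_000_000), 10)

# Only as many items as requested are ever produced.
first_three = [next(pipeline) for _ in range(3)]
print(first_three)  # [11, 12, 13]
```

A generator expression such as (x for x in make_numbers(n) if x > 10) would be an equivalent, shorter way to write the filtering step.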

Answered by: Roland381 | Posted: 06-11-2021



Answer 3

I guess filter() is about as fast as you can get without coding the filtering function in C (and in that case, you had better code the whole filtering process in C).

Why don't you paste the function you are filtering on? That might lead to easier optimizations.

Read this about optimization in Python. And this about the Python/C API.

Answered by: Sydney608 | Posted: 06-11-2021



Answer 4

Before doing it in C, you could try numpy. Perhaps you can turn your filtering into number crunching.
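
A minimal sketch of what that could look like, assuming the data is numeric and fits in a NumPy array (the sample values and the threshold of 10 are made up for illustration):

```python
import numpy as np

values = np.arange(20)   # hypothetical sample data; a real array could hold millions of items
mask = values > 10       # element-wise comparison, done in compiled code rather than a Python loop
filtered = values[mask]  # boolean-mask indexing keeps the positions where mask is True
print(filtered.tolist())
```

Whether this helps depends on the data: it tends to pay off for numeric arrays, and much less so for lists of arbitrary Python objects.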

Answered by: David920 | Posted: 06-11-2021



Answer 5

filter() creates a new list (in Python 2), so if your original is very big you could end up using up to twice as much memory. If you only need to iterate over the results, rather than use them as a real random-access list, you are probably better off using itertools.ifilter instead, i.e.:

import itertools  # Python 2; in Python 3 use the built-in filter(), which is already lazy

for x in itertools.ifilter(condition_func, my_really_big_list):
    do_something_with(x)

Another speed tip is to use a Python built-in as the predicate rather than a function you write yourself. There is an itertools.ifilterfalse (itertools.filterfalse in Python 3) specifically for the case where you would otherwise need to introduce a lambda to negate your check. For example, "ifilter(lambda x: not x.isalpha(), l)" should be written "ifilterfalse(str.isalpha, l)".
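
In Python 3 spelling, the two forms might look like this (a small sketch; the sample list words is invented for illustration):

```python
from itertools import filterfalse

words = ["abc", "123", "x1", "hello", "!!"]  # hypothetical sample data

# Built-in method as the predicate: no lambda needed.
alpha = list(filter(str.isalpha, words))

# filterfalse keeps the items for which the predicate is false,
# so there is no need for "lambda x: not x.isalpha()".
non_alpha = list(filterfalse(str.isalpha, words))

print(alpha)      # ['abc', 'hello']
print(non_alpha)  # ['123', 'x1', '!!']
```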

Answered by: Jack997 | Posted: 06-11-2021



Answer 6

It may be useful to know that a list comprehension with the condition written inline is generally much faster than the same comprehension calling the corresponding lambda:

>>> import timeit
>>> timeit.Timer('[x for x in xrange(10) if (x**2 % 4) == 1]').timeit()
2.0544309616088867
>>> timeit.f = lambda x: (x**2 % 4) == 1
>>> timeit.Timer('[x for x in xrange(10) if f(x)]').timeit()
3.4280929565429688

(Not sure why I needed to put f in the timeit namespace, there. Haven't really used the module much.)
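
On the namespace question: timeit runs the statement in its own namespace, so names defined in the interactive session are not visible unless they are supplied explicitly. A Python 3 sketch of the same comparison, passing f in through the globals argument (available since Python 3.5), could look like this. The repeat count of 100,000 is an arbitrary choice, and the timings will vary by machine.

```python
import timeit

def f(x):
    # Same condition as in the answer, wrapped in a function call.
    return (x ** 2 % 4) == 1

# Condition written inline in the comprehension.
inline = timeit.timeit(
    '[x for x in range(10) if (x**2 % 4) == 1]',
    number=100_000,
)

# Same comprehension, but calling f; `globals` makes f visible to the timed code.
called = timeit.timeit(
    '[x for x in range(10) if f(x)]',
    globals={'f': f},
    number=100_000,
)

print(inline, called)
```

On older versions, a setup string such as 'from __main__ import f' serves the same purpose as the globals argument.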

Answered by: Kelsey191 | Posted: 06-11-2021







