Python: Retrieve items from a set
In general, Python sets don't seem to be designed for retrieving items by key. That's obviously what dictionaries are for. But is there any way that, given a key, you can retrieve an instance from a set which is equal to the key?
Again, I know this is exactly what dictionaries are for, but as far as I can see, there are legitimate reasons to want to do this with a set. Suppose you have a class defined something like:
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age
Now, suppose I am going to be creating a large number of Person objects, and each time I create a Person object I need to make sure it is not a duplicate of a previous Person object. A Person is considered a duplicate of another Person if they have the same firstname, regardless of other instance variables. So the obvious thing to do is insert all Person objects into a set, and define __hash__ and __eq__ methods so that Person objects are compared by their firstname.
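A minimal sketch of the set-based approach described above, assuming equality and hashing depend only on firstname. The sample names and ages are made up for illustration:

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

    def __eq__(self, other):
        # Two Person objects count as duplicates when their first names match.
        if not isinstance(other, Person):
            return NotImplemented
        return self.firstname == other.firstname

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.firstname)


people = set()
people.add(Person("Ada", "Lovelace", 36))
people.add(Person("Ada", "Byron", 20))  # duplicate by firstname, not stored

# Membership testing works, but it only answers yes or no.
duplicate_found = Person("Ada", "Unknown", 0) in people
```

The set can detect the duplicate, but `in` does not hand back the stored object, which is the gap the question is about.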
An alternative would be to create a dictionary of Person objects, and use a separately created firstname string as the key. The drawback here is that I'd be duplicating the firstname string. This isn't really a problem in most cases, but what if I have 10,000,000 Person objects? The redundant string storage could really start adding up in terms of memory usage.
But if two Person objects compare equal, I need to be able to retrieve the original object so that the additional instance variables (aside from firstname) can be merged in a way required by the business logic. Which brings me back to my problem: I need some way to retrieve instances from a set.
Is there any way to do this? Or is using a dictionary the only real option here?
Asked by: Kevin932 | Posted: 27-01-2022
Answer 1
I'd definitely use a dictionary here. Reusing the firstname instance variable as a dictionary key won't copy it; the dictionary will simply use the same object. I doubt a dictionary will use significantly more memory than a set.
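As a rough sketch of the dictionary approach, the code below keys each Person by its own firstname attribute and uses dict.setdefault to get back whichever object is already stored. The names registry and add_person and the merge rule (keep the larger age) are assumptions made for illustration; the real merge depends on the business logic.

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


registry = {}  # firstname -> first Person seen with that firstname


def add_person(person):
    """Return the Person stored under this firstname, inserting if new."""
    # setdefault stores `person` only when the key is absent; either way
    # it returns the object that is now stored under that key.
    return registry.setdefault(person.firstname, person)


first = add_person(Person("Ada", "Lovelace", 36))
second = Person("Ada", "Byron", 20)
original = add_person(second)

if original is not second:
    # Placeholder merge rule (an assumption): keep the larger age.
    original.age = max(original.age, second.age)
```

Both the duplicate check and the retrieval of the original happen in a single average O(1) lookup.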
To actually save memory, add a __slots__ attribute to your classes. This will prevent each of your 10,000,000 instances from having a __dict__ attribute, which will save much more memory than the potential overhead of a dict over a set.
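To illustrate what __slots__ changes, here is a small sketch comparing two otherwise identical classes. The class names PlainPerson and SlottedPerson are made up for this example:

```python
class PlainPerson:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


class SlottedPerson:
    # Only these attribute names are allowed; no per-instance __dict__.
    __slots__ = ("firstname", "lastname", "age")

    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


plain = PlainPerson("Ada", "Lovelace", 36)
slotted = SlottedPerson("Ada", "Lovelace", 36)

has_dict_plain = hasattr(plain, "__dict__")      # ordinary instances have one
has_dict_slotted = hasattr(slotted, "__dict__")  # slotted instances do not

try:
    slotted.nickname = "Countess"  # not listed in __slots__
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False
```

The trade-off is that slotted instances cannot grow new attributes at run time.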
Edit: Some numbers to back my claims. I defined a stupid example class storing pairs of random strings:
import random

def rand_str():
    # Random lowercase string of 3 to 15 characters.
    return "".join(chr(random.randrange(97, 123))
                   for i in range(random.randrange(3, 16)))

class A(object):
    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()
    def __hash__(self):
        return hash(self.x)
    def __eq__(self, other):
        return self.x == other.x
The amount of memory used by a set of 1,000,000 instances of this class
random.seed(42)
s = set(A() for i in range(1000000))
is on my machine 240 MB. If I add
__slots__ = ("x", "y")
to the class, this goes down to 112 MB. If I store the same data in a dictionary
def key_value():
    a = A()
    return a.x, a
random.seed(42)
d = dict(key_value() for i in range(1000000))
this uses 249 MB without __slots__ and 121 MB with __slots__.
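The answer does not say how these figures were measured. One possible way to reproduce the comparison is the standard-library tracemalloc module, sketched below. The helper traced_bytes, the class names, and the much smaller instance count N are assumptions for illustration, and absolute numbers will differ between Python versions and machines.

```python
import random
import tracemalloc


def rand_str():
    # Random lowercase string of 3 to 15 characters.
    return "".join(chr(random.randrange(97, 123))
                   for _ in range(random.randrange(3, 16)))


class Plain:
    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()


class Slotted:
    __slots__ = ("x", "y")

    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()


def traced_bytes(build):
    """Return the bytes still allocated after build() has run."""
    tracemalloc.start()
    container = build()  # keep a reference so nothing is freed early
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del container
    return current


N = 20_000  # hypothetical; far smaller than the 1,000,000 used above

random.seed(42)
plain_bytes = traced_bytes(lambda: set(Plain() for _ in range(N)))
random.seed(42)
slotted_bytes = traced_bytes(lambda: set(Slotted() for _ in range(N)))

print("without __slots__:", plain_bytes, "bytes")
print("with __slots__:   ", slotted_bytes, "bytes")
```

Seeding the generator identically before each run keeps the strings the same, so the difference comes from the instances themselves.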
Answer 2
Yes, you can do this: a set can be iterated over. But note that this is an O(n) operation, as opposed to the O(1) lookup of a dict.
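A minimal sketch of that linear scan, assuming Person defines __eq__ and __hash__ on firstname as in the question. The helper name find_equal is made up for this example:

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.firstname == other.firstname

    def __hash__(self):
        return hash(self.firstname)


def find_equal(items, key):
    """Return the element of `items` equal to `key`, or None if absent."""
    # Cheap membership test first, so misses do not pay for a full scan.
    if key not in items:
        return None
    # Linear scan: O(n) in the size of the set.
    return next(item for item in items if item == key)


stored = Person("Ada", "Lovelace", 36)
people = {stored, Person("Grace", "Hopper", 85)}

probe = Person("Ada", "", 0)
found = find_equal(people, probe)
missing = find_equal(people, Person("Linus", "", 0))
```

Here found is the originally stored object, not the probe, so its other attributes are available for merging.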
So, you have to trade off speed versus memory. This is a classic trade-off. I personally would optimize for speed here (i.e. use the dictionary), since memory won't run short so quickly with only 10,000,000 objects, and using dictionaries is really easy.
As for additional memory consumption for the firstname string: Since strings are immutable in Python, assigning the firstname attribute as a key will not create a new string, but just copy the reference.
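This can be checked with an identity test. In the sketch below the name is built at run time so the result does not depend on the interpreter sharing identical string literals; the sample data is made up:

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


# Build the string at run time rather than writing it as a literal.
name = "".join(["Gu", "ido"])
p = Person(name, "van Rossum", 60)  # hypothetical sample data

by_firstname = {p.firstname: p}

# The key held by the dict is the very same object as the attribute.
stored_key = next(iter(by_firstname))
same_object = stored_key is p.firstname
```

Since the key and the attribute are one object, the dictionary adds a reference per entry rather than a second copy of each string.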
Answer 3
I think you'll have the answer here:
Moving Beyond Factories in Python
Answered by: Lenny435 | Posted: 28-02-2022