How can I get all rows with keys provided in a list using SQLAlchemy?

I have a sequence of IDs I want to retrieve. It's simple:

session.query(Record).filter(Record.id.in_(seq)).all()

Is there a better way to do it?

Asked by: Freddie835 | Posted: 06-12-2021

Answer 1

Your code is absolutely fine.

IN is like a bunch of X=Y joined with OR and is pretty fast in contemporary databases.

However, if your list of IDs is long, you could make the query a bit more efficient by passing a sub-query returning the list of IDs.
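As a rough sketch of the sub-query variant, the IN clause can take a SELECT instead of a literal list. This assumes SQLAlchemy 1.4 or later and a hypothetical `Wanted` table holding the IDs of interest; neither that table nor its `record_id` column comes from the question.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Record(Base):
    __tablename__ = "record"
    id = Column(Integer, primary_key=True)
    data = Column(String)


class Wanted(Base):
    # Hypothetical table that stores the IDs we want to fetch.
    __tablename__ = "wanted"
    record_id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Record(id=i, data="r%d" % i) for i in range(1, 11)])
    session.add_all([Wanted(record_id=i) for i in (2, 4, 6)])
    session.commit()

    # The list of IDs never leaves the database: IN receives a SELECT.
    wanted_ids = select(Wanted.record_id)
    rows = (
        session.query(Record)
        .filter(Record.id.in_(wanted_ids))
        .order_by(Record.id)
        .all()
    )
    found = [r.id for r in rows]
```

Whether this actually beats a literal IN list depends on the database and on how the IDs get into the helper table, so it is worth measuring first.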

Answered by: First Name676 | Posted: 07-01-2022

Answer 2

The code as it stands is completely fine. However, I have been asked for a way to hedge between the two approaches: doing one big IN versus calling get() for individual IDs.

If you are really trying to avoid the SELECT, the best way to do that is to set up the objects you need in memory ahead of time. Say you're working on a large table of elements. Break the work into chunks (for example, order the full set of work by primary key, by date range, or whatever fits), then load everything for that chunk locally into a cache:

 all_ids = [<huge list of ids>]

 while all_ids:
     chunk = all_ids[0:1000]

     # bonus exercise!  Throw each chunk into a multiprocessing.pool()!
     all_ids = all_ids[1000:]

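     # assumes all_ids is sorted, so chunk[0]..chunk[-1] spans the chunk
     my_cache = dict(
           Session.query(Record.id, Record).filter(
               Record.id.between(chunk[0], chunk[-1])))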

     for id_ in chunk:
         my_obj = my_cache[id_]
         <work on my_obj>

That's the real-world use case.
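For reference, here is a minimal runnable sketch of the same chunking idea, assuming SQLAlchemy 1.4 or later. The `Record` model and the `iter_in_chunks` helper are hypothetical names chosen for illustration, and the IDs are assumed to be sorted and present in the table.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Record(Base):
    __tablename__ = "record"
    id = Column(Integer, primary_key=True)
    data = Column(String)


def iter_in_chunks(session, sorted_ids, chunk_size=1000):
    """Yield Record objects for sorted_ids, issuing one SELECT per chunk."""
    for start in range(0, len(sorted_ids), chunk_size):
        chunk = sorted_ids[start:start + chunk_size]
        # BETWEEN loads every row in the ID range, so this suits dense IDs.
        query = session.query(Record.id, Record).filter(
            Record.id.between(chunk[0], chunk[-1])
        )
        cache = {id_: obj for id_, obj in query}
        for id_ in chunk:
            yield cache[id_]  # raises KeyError if an ID is not in the table


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Record(id=i, data="r%d" % i) for i in range(1, 26)])
    session.commit()

    wanted = list(range(1, 26))
    seen = [obj.id for obj in iter_in_chunks(session, wanted, chunk_size=10)]
```

With 25 IDs and a chunk size of 10, this issues three SELECTs rather than one per object.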

To also illustrate some of the SQLAlchemy API, we can write a function that does a local lookup for the records already in the session and a single IN query for those that are not:

from sqlalchemy import inspect

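def get_all(session, cls, seq):
    mapper = inspect(cls)
    lookup = set()
    for ident in seq:
        key = mapper.identity_key_from_primary_key((ident, ))
        if key in session.identity_map:
            # already loaded: return it without touching the database
            yield session.identity_map[key]
        else:
            # not loaded yet: remember it for the single IN query below
            lookup.add(ident)
    if lookup:
        # assumes the primary key attribute is named "id"
        for obj in session.query(cls).filter(cls.id.in_(lookup)):
            yield obj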

Here is a demonstration:

from sqlalchemy import Column, Integer, create_engine, String
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base
import random

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    data = Column(String)

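e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)  # create the table before inserting

ids = range(1, 50)

s = Session(e)
s.add_all([A(id=i, data='a%d' % i) for i in ids])
s.commit()
s.close()  # empty the identity map so only queried rows are present

# load 10 random rows so that they are in the identity map
already_loaded = s.query(A).filter(A.id.in_(random.sample(ids, 10))).all()

assert len(s.identity_map) == 10

to_load = set(random.sample(ids, 25))
all_ = list(get_all(s, A, to_load))

assert set(x.id for x in all_) == to_load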

Answered by: Catherine111 | Posted: 07-01-2022

Answer 3

If you use composite primary keys, you can use tuple_, as in:

from sqlalchemy import tuple_
session.query(Record).filter(tuple_(Record.id1, Record.id2).in_(seq)).all()

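Note that the documentation at the time listed this as not available on SQLite (see doc). Later SQLAlchemy releases added SQLite support for IN with tuples, so check the documentation for your version.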

Answered by: Melissa836 | Posted: 07-01-2022

Answer 4

I'd recommend taking a look at the SQL it produces. You can just print str(query) to see it.

I'm not aware of an ideal way of doing it with standard SQL.
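As a small illustration of inspecting the generated SQL, the sketch below renders an IN query to a string without connecting to a database. It assumes SQLAlchemy 1.4 or later, and the `Record` model is a hypothetical stand-in for the one in the question.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Record(Base):
    __tablename__ = "record"
    id = Column(Integer, primary_key=True)
    data = Column(String)


session = Session()  # no engine is needed just to render the SQL
query = session.query(Record).filter(Record.id.in_([1, 2, 3]))

# str() compiles the query with a default dialect; nothing is executed.
sql = str(query)
print(sql)
```

The exact placeholder text shown for the IN list varies between SQLAlchemy versions, but the overall SELECT ... WHERE ... IN shape is visible either way.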

Answered by: Brad799 | Posted: 07-01-2022

Answer 5

There is one other way. If it's reasonable to expect that the objects in question are already loaded into the session (for example, you've accessed them before in the same transaction), you can instead do:

map(session.query(Record).get, seq)

In the case where those objects are already present, this will be much faster, since there won't be any queries to retrieve those objects. On the other hand, if more than a tiny number of those objects are not loaded, it will be much, much slower, since it will cause a query per missing instance instead of a single query for all objects.

This can be useful when you are doing joinedload() queries before reaching the above step, so you can be sure that they have been loaded already. In general, you should use the solution in the question by default, and only explore this solution when you have seen that you are querying for the same objects over and over.
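To make the trade-off concrete, here is a sketch that counts the statements sent to the database. It assumes SQLAlchemy 1.4 or later, where `Session.get()` is the current spelling of the legacy `Query.get()`. The `Record` model and the `statements` list are hypothetical names used for illustration.

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Record(Base):
    __tablename__ = "record"
    id = Column(Integer, primary_key=True)
    data = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    # Collect every SQL statement the engine sends to the database.
    statements.append(statement)


with Session(engine) as setup:
    setup.add_all([Record(id=i, data="r%d" % i) for i in range(1, 6)])
    setup.commit()

with Session(engine) as session:
    seq = [1, 2, 3]
    # One SELECT loads the objects; keep a reference so they stay in the
    # session's weak-referencing identity map.
    loaded = session.query(Record).filter(Record.id.in_(seq)).all()

    statements.clear()
    objs = [session.get(Record, id_) for id_ in seq]  # identity-map hits
    hits_sql = len(statements)

    session.get(Record, 5)  # not loaded yet, so this one queries
    miss_sql = len(statements)
```

Here `hits_sql` counts the statements issued for the three already-loaded objects and `miss_sql` includes the one for the object that was missing, which is the per-instance cost described above.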

Answered by: Rafael991 | Posted: 07-01-2022
