Getting random row through SQLAlchemy

How do I select one or more random rows from a table using SQLAlchemy?


Asked by: David697 | Posted: 05-10-2021






Answer 1

This is very much a database-specific issue.

I know that PostgreSQL, SQLite, MySQL, and Oracle have the ability to order by a random function, so you can use this in SQLAlchemy:

from  sqlalchemy.sql.expression import func, select

select.order_by(func.random()) # for PostgreSQL, SQLite

select.order_by(func.rand()) # for MySQL

select.order_by('dbms_random.value') # For Oracle

Next, you need to limit the query by the number of records you need (for example using .limit()).

Bear in mind that at least in PostgreSQL, selecting random record has severe perfomance issues; here is good article about it.

Answered by: Edgar471 | Posted: 06-11-2021



Answer 2

Here's four different variations, ordered from slowest to fastest. timeit results at the bottom:

from sqlalchemy.sql import func
from sqlalchemy.orm import load_only

def simple_random():
    return random.choice(model_name.query.all())

def load_only_random():
    return random.choice(model_name.query.options(load_only('id')).all())

def order_by_random():
    return model_name.query.order_by(func.random()).first()

def optimized_random():
    return model_name.query.options(load_only('id')).offset(
            func.floor(
                func.random() *
                db.session.query(func.count(model_name.id))
            )
        ).limit(1).all()

timeit results for 10,000 runs on my Macbook against a PostgreSQL table with 300 rows:

simple_random(): 
    90.09954111799925
load_only_random():
    65.94714171699889
order_by_random():
    23.17819356000109
optimized_random():
    19.87806927999918

You can easily see that using func.random() is far faster than returning all results to Python's random.choice().

Additionally, as the size of the table increases, the performance of order_by_random() will degrade significantly because an ORDER BY requires a full table scan versus the COUNT in optimized_random() can use an index.

Answered by: John339 | Posted: 06-11-2021



Answer 3

If you are using the orm and the table is not big (or you have its amount of rows cached) and you want it to be database independent the really simple approach is.

import random
rand = random.randrange(0, session.query(Table).count()) 
row = session.query(Table)[rand]

This is cheating slightly but thats why you use an orm.

Answered by: Julian684 | Posted: 06-11-2021



Answer 4

There is a simple way to pull a random row that IS database independent. Just use .offset() . No need to pull all rows:

import random
query = DBSession.query(Table)
rowCount = int(query.count())
randomRow = query.offset(int(rowCount*random.random())).first()

Where Table is your table (or you could put any query there). If you want a few rows, then you can just run this multiple times, and make sure that each row is not identical to the previous.

Answered by: Miranda759 | Posted: 06-11-2021



Answer 5

Some SQL DBMS, namely Microsoft SQL Server, DB2, and PostgreSQL have implemented the SQL:2003 TABLESAMPLE clause. Support was added to SQLAlchemy in version 1.1. It allows returning a sample of a table using different sampling methods – the standard requires SYSTEM and BERNOULLI, which return a desired approximate percentage of a table.

In SQLAlchemy FromClause.tablesample() and tablesample() are used to produce a TableSample construct:

# Approx. 1%, using SYSTEM method
sample1 = mytable.tablesample(1)

# Approx. 1%, using BERNOULLI method
sample2 = mytable.tablesample(func.bernoulli(1))

There's a slight gotcha when used with mapped classes: the produced TableSample object must be aliased in order to be used to query model objects:

sample = aliased(MyModel, tablesample(MyModel, 1))
res = session.query(sample).all()

Since many of the answers contain performance benchmarks, I'll include some simple tests here as well. Using a simple table in PostgreSQL with about a million rows and a single integer column, select (approx.) 1% sample:

In [24]: %%timeit
    ...: foo.select().\
    ...:     order_by(func.random()).\
    ...:     limit(select([func.round(func.count() * 0.01)]).
    ...:           select_from(foo).
    ...:           as_scalar()).\
    ...:     execute().\
    ...:     fetchall()
    ...: 
307 ms ± 5.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [25]: %timeit foo.tablesample(1).select().execute().fetchall()
6.36 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [26]: %timeit foo.tablesample(func.bernoulli(1)).select().execute().fetchall()
19.8 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Before rushing to use SYSTEM sampling method one should know that it samples pages, not individual tuples, so it might not be suitable for small tables, for example, and may not produce as random results, if the table is clustered.


If using a dialect that does not allow passing the sample percentage / number of rows and seed as parameters, and a driver that does not inline values, then either pass the values as literal SQL text if they are static, or inline them using a custom SQLA compiler extension:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql import TableSample

@compiles(TableSample)
def visit_tablesample(tablesample, self, asfrom=False, **kw):
    """ Compile `TableSample` with values inlined.
    """
    kw_literal_binds = {**kw, "literal_binds": True}
    text = "%s TABLESAMPLE %s" % (
        self.visit_alias(tablesample, asfrom=True, **kw),
        tablesample._get_method()._compiler_dispatch(self, **kw_literal_binds),
    )

    if tablesample.seed is not None:
        text += " REPEATABLE (%s)" % (
            tablesample.seed._compiler_dispatch(self, **kw_literal_binds)
        )

    return text

from sqlalchemy import table, literal, text

# Static percentage
print(table("tbl").tablesample(text("5 PERCENT")))
# Compiler inlined values
print(table("tbl").tablesample(5, seed=literal(42)))

Answered by: Wilson407 | Posted: 06-11-2021



Answer 6

This is my function to select random row(s) of a table:

from sqlalchemy.sql.expression import func

def random_find_rows(sample_num):
    if not sample_num:
        return []

    session = DBSession()
    return session.query(Table).order_by(func.random()).limit(sample_num).all()

Answered by: Victoria164 | Posted: 06-11-2021



Answer 7

This is the solution I use:

from random import randint

rows_query = session.query(Table)                # get all rows
if rows_query.count() > 0:                       # make sure there's at least 1 row
    rand_index = randint(0,rows_query.count()-1) # get random index to rows 
    rand_row   = rows_query.all()[rand_index]    # use random index to get random row

Answered by: John903 | Posted: 06-11-2021



Answer 8

this solution will select a single random row

This solution requires that the primary key is named id, it should be if its not already:

import random
max_model_id = YourModel.query.order_by(YourModel.id.desc())[0].id
random_id = random.randrange(0,max_model_id)
random_row = YourModel.query.get(random_id)
print random_row

Answered by: Leonardo791 | Posted: 06-11-2021



Answer 9

Use this simplest method this example on choosing a random question from the database:-

#first import the random module
import random

#then choose what ever Model you want inside random.choise() method
get_questions = random.choice(Question.query.all())

Answered by: Joyce609 | Posted: 06-11-2021



Answer 10

Theres a couple of ways through SQL, depending on which data base is being used.

(I think SQLAlchemy can use all these anyways)

mysql:

SELECT colum FROM table
ORDER BY RAND()
LIMIT 1

PostgreSQL:

SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1

MSSQL:

SELECT TOP 1 column FROM table
ORDER BY NEWID()

IBM DB2:

SELECT column, RAND() as IDX
FROM table
ORDER BY IDX FETCH FIRST 1 ROWS ONLY

Oracle:

SELECT column FROM
(SELECT column FROM table
ORDER BY dbms_random.value)
WHERE rownum = 1

However I don't know of any standard way

Answered by: Sienna748 | Posted: 06-11-2021



Similar questions

python - Random errors with SQLAlchemy

I'm using a setup with nginx, uwsgi and SQLAlchemy. I recently switched from SQLObject and I'm now seeing strange random errors with SQLAlchemy. For instance: sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically. or: sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column 'module.id'"


python - How can I use UUIDs in SQLAlchemy?

Is there a way to define a column (primary key) as a UUID in SQLAlchemy if using PostgreSQL (Postgres)?


python - simple update in sqlalchemy

UserTable is: id (INT) name (STR) last_login (DATETIME) Serving a web page request i have a user id in hand and I only wish to update the last_login field to 'now'. It seems to me that there are 2 ways: issue a direct SQL using db_engine (losing the mapper) OR query the user first a...


python - Nested Set Model and SQLAlchemy -- Adding New Nodes

How should new nodes be added with SQLAlchemy to a tree implemented using the Nested Set Model? class Category(Base): __tablename__ = 'categories' id = Column(Integer, primary_key=True) name = Column(String(128), nullable=False) lft = Column(Integer, nullable=False, unique=True) ...


python - sqlalchemy easy way to insert or update?

I have a sequence of new objects. They all look like similar to this: Foo(pk_col1=x, pk_col2=y, val='bar') Some of those are Foo that exist (i.e. only val differs from the row in the db) and should generate update queries. The others should generate inserts. I can think of a few ways of doing this, the best being: pk_cols = Foo.table.primary_key.keys() for f1 in foos: f2 = Foo.ge...


python - How to get sqlalchemy length of a string column

Consider this simple table definition (using SQLAlchemy-0.5.6) from sqlalchemy import * db = create_engine('sqlite:///tutorial.db') db.echo = False # Try changing this to True and see what happens metadata = MetaData(db) user = Table('user', metadata, Column('user_id', Integer, primary_key=True), Column('name', String(40)), Column('age', Integer), ...


python - SQLAlchemy filter query by related object

Using SQLAlchemy, I have a one to many relation with two tables - users and scores. I am trying to query the top 10 users sorted by their aggregate score over the past X amount of days. users: id user_name score scores: user score_amount created My current query is: top_use...


python - Group by hour in SQLAlchemy?

How do I group query results by the hour part of a datetime column in SQLAlchemy?


python - Mysql + SQLAlchemy + Pylons Issue

I'm trying to setup my models in Pylons using a MySQL backend and I keep getting the following: Traceback (most recent call last): File "/usr/local/bin/paster", line 8, in <module> load_entry_point('PasteScript==1.7.3', 'console_scripts', 'paster')() File "/usr/local/lib/python2.6/dist-packages/PasteScript-1.7.3-py2.6.egg/paste/script/command.py", line 84, in run invoke(command, comman...


python - SQLAlchemy INSERT IGNORE

How can I insert multiple data records into table ignoring duplicates. I am using SQLAlchemy. Thank you!


python - django and sqlalchemy

I'm fairly new to Django and have a basic question: I want to use an ORM that I can work with it for Django and other python projects, so the basic question is Django ORM agnostic and if so how can I use SQLAlchemy with it for example? If it's not, then what do you suggest for the above problem (using ORM objects that works with both Django and outside it)?


python - How can I use UUIDs in SQLAlchemy?

Is there a way to define a column (primary key) as a UUID in SQLAlchemy if using PostgreSQL (Postgres)?


python - How to add an automatic filter to a relation with SQLAlchemy?

I'm using SQLAlchemy 0.5rc, and I'd like to add an automatic filter to a relation, so that every time it tries to fetch records for that relation, it ignores the "remote" ones if they're flagged as "logically_deleted" (a boolean field of the child table) For example, if an object "parent" has a "children" relation that has 3 records, but one of them is logically deleted, when I query for "Parent" I'd like SQLA to f...


python - What is the sqlalchemy equivalent column type for 'money' and 'OID' in Postgres?

What is the sqlalchemy equivalent column type for 'money' and 'OID' column types in Postgres?


python - SQLAlchemy and empty columns

When I try to insert a new record into the database using SQLAlchemy and I don't fill out all values, it tries to insert them as "None" (instead of omitting them). It then complains about "can't be null" errors. Is there a way to have it just omit columns from the sql query if I also omitted them when declaring the instance?


python - SQLAlchemy DateTime timezone

SQLAlchemy's DateTime type allows for a timezone=True argument to save a non-naive datetime object to the database, and to return it as such. Is there any way to modify the timezone of the tzinfo that SQLAlchemy passes in so it could be, for instance, UTC? I realize that I could just use default=datetime.datetime.utcnow; however, this is a naive time that would happily ac...


python - How can I get all rows with keys provided in a list using SQLalchemy?

I have sequence of IDs I want to retrieve. It's simple: session.query(Record).filter(Record.id.in_(seq)).all() Is there a better way to do it?


python - How can I order objects according to some attribute of the child in sqlalchemy?

Here is the situation: I have a parent model say BlogPost. It has many Comments. What I want is the list of BlogPosts ordered by the creation date of its' Comments. I.e. the blog post which has the most newest comment should be on top of the list. Is this possible with SQLAlchemy?


python - SQLAlchemy - INSERT OR REPLACE equivalent

does anybody know what is the equivalent to SQL "INSERT OR REPLACE" clause in SQLAlchemy and its SQL expression language? Many thanks -- honzas


python - Defining a table with sqlalchemy with a mysql unix timestamp

Background, there are several ways to store dates in MySQ. As a string e.g. "09/09/2009". As integer using the function UNIX_TIMESTAMP() this is supposedly the traditional unix time representation (you know seconds since the epoch plus/minus leap seconds). As a MySQL TIMESTAMP, a mysql specific data type not the same than unix timestamps. As a MySQL Date field, another mysql spec...


python - How to generate a file with DDL in the engine's SQL dialect in SQLAlchemy?

Suppose I have an engine pointing at MySQL database: engine = create_engine('mysql://arthurdent:answer42@localhost/dtdb', echo=True) I can populate dtdb with tables, FKs, etc by: metadata.create_all(engine) Is there an easy way to generate the SQL file that contains all the DDL statements instead of actually applying these DDL sta...






Still can't find your answer? Check out these communities...



PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python



top