Coder Perfect

Using SQLAlchemy to get a random row


How can I use SQLAlchemy to select one or more random rows from a table?

Asked by cnu

Solution #1

This is primarily a database-related issue.

You can use this with SQLAlchemy because PostgreSQL, SQLite, MySQL, and Oracle all offer the ability to order by a random function:

from  sqlalchemy.sql.expression import func, select

select.order_by(func.random()) # for PostgreSQL, SQLite

select.order_by(func.rand()) # for MySQL

select.order_by('dbms_random.value') # For Oracle

The query must then be limited by the number of records you require (for example, by using.limit()).

Keep in mind that picking random records, at least in PostgreSQL, has serious performance concerns; here’s a nice essay on the subject.

Answered by Łukasz

Solution #2

If you’re using the orm and your table isn’t too big (or you’ve already cached the number of rows), and you want it to be database independent, there’s a fairly simple way to do it.

import random
rand = random.randrange(0, session.query(Table).count()) 
row = session.query(Table)[rand]

This is a little bit of cheating, but that’s why you use an orm.

Answered by David Raznick

Solution #3

There is a simple database-independent approach to pull a random record. Simply use the.offset() method. There’s no need to remove all the rows:

import random
query = DBSession.query(Table)
rowCount = int(query.count())
randomRow = query.offset(int(rowCount*random.random())).first()

What is the location of your table? (or you could put any query there). If you want a few rows, simply repeat the process, making sure that each row is distinct from the one before it.

Answered by GuySoft

Solution #4

From slowest to fastest, here are four alternative variants. At the bottom of the page, you’ll find the timeit results:

from sqlalchemy.sql import func
from sqlalchemy.orm import load_only

def simple_random():
    return random.choice(model_name.query.all())

def load_only_random():
    return random.choice(model_name.query.options(load_only('id')).all())

def order_by_random():
    return model_name.query.order_by(func.random()).first()

def optimized_random():
    return model_name.query.options(load_only('id')).offset(
                func.random() *

The following are the results of 10,000 timeit runs on my Macbook against a 300-row PostgreSQL table:


It’s clear that using func.random() is far faster than using Python’s random.choice() method to return all results ().

Furthermore, as the table size grows, the efficiency of order by random() degrades dramatically since ORDER BY requires a full database search whereas COUNT in optimized random() can use an index.

Answered by Jeff Widman

Solution #5

Some SQL DBMS, namely Microsoft SQL Server, DB2, and PostgreSQL have implemented the SQL:2003 TABLESAMPLE clause. Support was added to SQLAlchemy in version 1.1. It allows returning a sample of a table using different sampling methods – the standard requires SYSTEM and BERNOULLI, which return a desired approximate percentage of a table.

To create a TableSample construct in SQLAlchemy, use FromClause.tablesample() and tablesample():

# Approx. 1%, using SYSTEM method
sample1 = mytable.tablesample(1)

# Approx. 1%, using BERNOULLI method
sample2 = mytable.tablesample(func.bernoulli(1))

When using mapped classes, there is one caveat: the resulting TableSample object must be aliased before it can be used to query model objects:

sample = aliased(MyModel, tablesample(MyModel, 1))
res = session.query(sample).all()

Since many of the answers contain performance benchmarks, I’ll include some simple tests here as well. Using a simple table in PostgreSQL with about a million rows and a single integer column, select (approx.) 1% sample:

In [24]: %%timeit
    ...:     order_by(func.random()).\
    ...:     limit(select([func.round(func.count() * 0.01)]).
    ...:           select_from(foo).
    ...:           as_scalar()).\
    ...:     execute().\
    ...:     fetchall()
307 ms ± 5.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [25]: %timeit foo.tablesample(1).select().execute().fetchall()
6.36 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [26]: %timeit foo.tablesample(func.bernoulli(1)).select().execute().fetchall()
19.8 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Before rushing to utilise the SYSTEM sampling method, keep in mind that it samples pages rather than individual tuples, therefore it may not be appropriate for tiny tables and may not yield as random results if the table is clustered.

If you’re using a dialect that doesn’t let you supply the sample %, number of rows, or seed as parameters, and you’re using a driver that doesn’t let you inline values, pass the values as literal SQL text if they’re static, or use a custom SQLA compiler extension to inline them:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql import TableSample

def visit_tablesample(tablesample, self, asfrom=False, **kw):
    """ Compile `TableSample` with values inlined.
    kw_literal_binds = {**kw, "literal_binds": True}
    text = "%s TABLESAMPLE %s" % (
        self.visit_alias(tablesample, asfrom=True, **kw),
        tablesample._get_method()._compiler_dispatch(self, **kw_literal_binds),

    if tablesample.seed is not None:
        text += " REPEATABLE (%s)" % (
            tablesample.seed._compiler_dispatch(self, **kw_literal_binds)

    return text

from sqlalchemy import table, literal, text

# Static percentage
print(table("tbl").tablesample(text("5 PERCENT")))
# Compiler inlined values
print(table("tbl").tablesample(5, seed=literal(42)))

Answered by Ilja Everilä

Post is based on