How would one make Python objects persistent in a web-app?

I'm writing a reasonably complex web application. The Python backend runs an algorithm whose state depends on data stored in several interrelated database tables which does not change often, plus user specific data which does change often. The algorithm's per-user state undergoes many small changes as a user works with the application. This algorithm is used often during each user's work to make certain important decisions.

For performance reasons, re-initializing the state on every request from the (semi-normalized) database data quickly becomes non-feasible. It would be highly preferable, for example, to cache the state's Python object in some way so that it can simply be used and/or updated whenever necessary. However, since this is a web application, there several processes serving requests, so using a global variable is out of the question.

I've tried serializing the relevant object (via pickle) and saving the serialized data to the DB, and am now experimenting with caching the serialized data via memcached. However, this still has the significant overhead of serializing and deserializing the object often.

I've looked at shared memory solutions but the only relevant thing I've found is POSH. However POSH doesn't seem to be widely used and I don't feel easy integrating such an experimental component into my application.

I need some advice! This is my first shot at developing a web application, so I'm hoping this is a common enough issue that there are well-known solutions to such problems. At this point solutions which assume the Python back-end is running on a single server would be sufficient, but extra points for solutions which scale to multiple servers as well :)


  • I have this application working, currently live and with active users. I started out without doing any premature optimization, and then optimized as needed. I've done the measuring and testing to make sure the above mentioned issue is the actual bottleneck. I'm sure pretty sure I could squeeze more performance out of the current setup, but I wanted to ask if there's a better way.
  • The setup itself is still a work in progress; assume that the system's architecture can be whatever suites your solution.

Asked by: Aldus841 | Posted: 01-10-2021

Answer 1

Be cautious of premature optimization.

Addition: The "Python backend runs an algorithm whose state..." is the session in the web framework. That's it. Let the Django framework maintain session state in cache. Period.

"The algorithm's per-user state undergoes many small changes as a user works with the application." Most web frameworks offer a cached session object. Often it is very high performance. See Django's session documentation for this.

Advice. [Revised]

It appears you have something that works. Leverage to learn your framework, learn the tools, and learn what knobs you can turn without breaking a sweat. Specifically, using session state.

Second, fiddle with caching, session management, and things that are easy to adjust, and see if you have enough speed. Find out whether MySQL socket or named pipe is faster by trying them out. These are the no-programming optimizations.

Third, measure performance to find your actual bottleneck. Be prepared to provide (and defend) the measurements as fine-grained enough to be useful and stable enough to providing meaningful comparison of alternatives.

For example, show the performance difference between persistent sessions and cached sessions.

Answered by: Edgar196 | Posted: 02-11-2021

Answer 2

I think that the multiprocessing framework has what might be applicable here - namely the shared ctypes module.

Multiprocessing is fairly new to Python, so it might have some oddities. I am not quite sure whether the solution works with processes not spawned via multiprocessing.

Answered by: Catherine866 | Posted: 02-11-2021

Answer 3

I think you can give ZODB a shot.

"A major feature of ZODB is transparency. You do not need to write any code to explicitly read or write your objects to or from a database. You just put your persistent objects into a container that works just like a Python dictionary. Everything inside this dictionary is saved in the database. This dictionary is said to be the "root" of the database. It's like a magic bag; any Python object that you put inside it becomes persistent."

Initailly it was a integral part of Zope, but lately a standalone package is also available.

It has the following limitation:

"Actually there are a few restrictions on what you can store in the ZODB. You can store any objects that can be "pickled" into a standard, cross-platform serial format. Objects like lists, dictionaries, and numbers can be pickled. Objects like files, sockets, and Python code objects, cannot be stored in the database because they cannot be pickled."

I have read it but haven't given it a shot myself though.

Other possible thing could be a in-memory sqlite db, that may speed up the process a bit - being an in-memory db, but still you would have to do the serialization stuff and all. Note: In memory db is expensive on resources.

Here is a link:

Answered by: Julia303 | Posted: 02-11-2021

Answer 4

First of all your approach is not a common web development practice. Even multi threading is being used, web applications are designed to be able to run multi-processing environments, for both scalability and easier deployment .

If you need to just initialize a large object, and do not need to change later, you can do it easily by using a global variable that is initialized while your WSGI application is being created, or the module contains the object is being loaded etc, multi processing will do fine for you.

If you need to change the object and access it from every thread, you need to be sure your object is thread safe, use locks to ensure that. And use a single server context, a process. Any multi threading python server will serve you well, also FCGI is a good choice for this kind of design.

But, if multiple threads are accessing and changing your object the locks may have a really bad effect on your performance gain, which is likely to make all the benefits go away.

Answered by: Alfred882 | Posted: 02-11-2021

Answer 5

This is Durus, a persistent object system for applications written in the Python programming language. Durus offers an easy way to use and maintain a consistent collection of object instances used by one or more processes. Access and change of a persistent instances is managed through a cached Connection instance which includes commit() and abort() methods so that changes are transactional.

I've used it before in some research code, where I wanted to persist the results of certain computations. I eventually switched to pytables as it met my needs better.

Answered by: Melissa144 | Posted: 02-11-2021

Answer 6

Another option is to review the requirement for state, it sounds like if the serialisation is the bottle neck then the object is very large. Do you really need an object that large?

I know in the Stackoverflow podcast 27 the reddit guys discuss what they use for state, so that maybe useful to listen to.

Answered by: John864 | Posted: 02-11-2021

Similar questions

python - How to debug Web2py applications?

Is it possible? By debug I mean setting breakpoints, inspect values and advance step by step.

deployment - How to build and deploy Python web applications

I have a Python web application consisting of several Python packages. What is the best way of building and deploying this to the servers? Currently I'm deploying the packages with Capistrano, installing the packages into a virtualenv with bash, and configuring the servers with puppet, but I would like to go for a more Python based solution. I've been looking a bit into zc.buildout, but it's not clear for m...

Remote debugging of multi threaded Python Applications

How can I do remote debugging of a multi threaded Python application, running on an Embedded Linux based system, from Windows XP or Vista? So far I have only come across PyScripter based remote debugging. How does it perform?

design patterns - Open source examples of well designed Python applications

How will Python and Ruby applications be affected by .NET?

I'm curious about how .NET will affect Python and Ruby applications. Will applications written in IronPython/IronRuby be so specific to the .NET environment, that they will essentially become platform specific? If they don't use any of the .NET features, then what is the advantage of IronPython/IronRuby over their non .NET counterparts?

python - How can I get a list of the running applications with GTK?

How can I get a list of the running applications? I'm referring to the ones in the panel at the bottom of the screen.

django - Testing time sensitive applications in Python

I've written an auction system in Django. I want to write unit tests but the application is time sensitive (e.g. the amount advertisers are charged is a function of how long their ad has been active on a website). What's a good approach for testing this type of application? Here's one possible solution: a DateFactory clas...

python - Charts in django Web Applications

I want to Embed a chart in a Web Application developed using django. I have come across Google charts API, ReportLab, PyChart, MatPlotLib and ...

python - How do you deploy django applications for windows?

Closed. This question does not meet Stack Overflow guid...

Packaging Ruby or Python applications for distribution?

Are there any good options other than the JVM for packaging Python or Ruby applications for distribution to end-users? Specifically, I'm looking for ways to be able to write and test a web-based application written in either Ruby or Python, complete with a back-end database, that I can then wrap up in a convenient set of platform-independent packages (of some type) for deployment on Windows, Linux, OS X, and Free...

Still can't find your answer? Check out these communities...

PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python