python - Speeding up templates in GAE-Py by aggregating RPC calls

- June 15, 2010

Here's my problem:

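    from google.appengine.ext.db import Model, StringProperty, ReferenceProperty

    class City(Model):
        name = StringProperty()

    class Author(Model):
        name = StringProperty()
        city = ReferenceProperty(City)  # stored as a key, fetched on access

    class Post(Model):
        author = ReferenceProperty(Author)  # stored as a key, fetched on access
        content = StringProperty()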

The code isn't important... but this Django template is:

    {% for post in posts %}
    <div>{{post.content}}</div>
    <div>by {{post.author.name}} {{post.author.city.name}}</div>
    {% endfor %}

Now let's say I get the first 100 posts using Post.all().fetch(limit=100), and pass this list to the template - what happens?

It makes 200 more datastore gets - 100 to get each author, 100 to get each author's city.

This is perfectly understandable, actually, since the post only has a reference to the author, and the author only has a reference to the city. The __get__ accessor on the post.author and author.city properties transparently does a get and pulls the data back (see this question).
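To make the arithmetic concrete, here is a small simulation of that behaviour. It is only a sketch: FakeDatastore and LazyReference are hypothetical stand-ins written for this example, not App Engine classes, and they only mimic the "store a key, fetch on first access" idea described above.

```python
class FakeDatastore(object):
    """Hypothetical in-memory store that counts single-entity gets."""

    def __init__(self):
        self.entities = {}
        self.get_calls = 0

    def put(self, key, entity):
        self.entities[key] = entity

    def get(self, key):
        self.get_calls += 1
        return self.entities[key]


STORE = FakeDatastore()


class LazyReference(object):
    """Descriptor that keeps only a key and fetches the entity on first access."""

    def __init__(self, name):
        self.key_attr = '_%s_key' % name
        self.value_attr = '_%s_value' % name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if not hasattr(instance, self.value_attr):
            # First access on this instance: one get for this one reference.
            key = getattr(instance, self.key_attr)
            setattr(instance, self.value_attr, STORE.get(key))
        return getattr(instance, self.value_attr)

    def __set__(self, instance, key):
        setattr(instance, self.key_attr, key)


class City(object):
    def __init__(self, name):
        self.name = name


class Author(object):
    city = LazyReference('city')

    def __init__(self, name, city_key):
        self.name = name
        self.city = city_key


class Post(object):
    author = LazyReference('author')

    def __init__(self, content, author_key):
        self.content = content
        self.author = author_key


def build_posts(n):
    """Create n posts, each with its own author and city (no gets issued)."""
    posts = []
    for i in range(n):
        STORE.put('city%d' % i, City('City %d' % i))
        STORE.put('author%d' % i, Author('Author %d' % i, 'city%d' % i))
        posts.append(Post('Post %d' % i, 'author%d' % i))
    return posts


def render(posts):
    """Touch the same attributes the template touches."""
    return ['%s by %s %s' % (p.content, p.author.name, p.author.city.name)
            for p in posts]
```

Rendering 100 posts built this way leaves STORE.get_calls at 200: one get per author plus one per city, each issued separately from inside the attribute access.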

Some ways around this are:

1. Use Post.author.get_value_for_datastore(post) to collect the author keys (see link above), and then do a batch get to fetch them all. The trouble here is that we need to re-construct a template data object, which needs extra code and maintenance for each model and handler.
2. Write an accessor, say cached_author, that checks memcache for the author first and returns that. The problem here is that post.cached_author is going to be called 100 times, which could mean 100 memcache calls.
3. Hold a static key-to-object map (and refresh it maybe once every five minutes) if the data doesn't have to be very up to date. The cached_author accessor can then just refer to this map.
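As an illustration of the third option, a refreshing key-to-object map might look roughly like the sketch below. RefreshingMap, loader and clock are names made up for this example; in a real handler the loader would presumably be some bulk fetch of all authors, and a cached_author accessor would call get() with the author key.

```python
import time


class RefreshingMap(object):
    """Key -> object map that reloads itself in bulk once it is older than ttl."""

    def __init__(self, loader, ttl_seconds=300, clock=time.time):
        self._loader = loader    # hypothetical callable returning {key: object}
        self._ttl = ttl_seconds  # 300 s matches "once every five minutes"
        self._clock = clock      # injectable so the map can be tested
        self._data = {}
        self._loaded_at = None

    def get(self, key):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # One bulk load for the whole map instead of one call per key.
            self._data = self._loader()
            self._loaded_at = now
        # Unknown keys return None rather than raising.
        return self._data.get(key)
```

A map like this lives in instance memory, so each running instance would hold and refresh its own copy, and readers may see data up to ttl_seconds old.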

All these ideas need extra code and maintenance, and they're not very transparent. What if we could just do this:

    @prefetch
    def render_template(path, data):
        return template.render(path, data)

Turns out we can... hooks and Guido's instrumentation module both prove it. If the @prefetch method wraps a template render and captures which keys are requested, we can (at least to one level of depth) collect those keys, return mock objects, and do a batch get on them. This could be repeated for every depth level, until no new keys are being requested. The final render could intercept the gets and return the objects from a map.

This would change a total of 200 gets into 3, transparently and without any extra code. Not to mention it would greatly cut down the need for memcache and help in situations where memcache can't be used.
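The render, collect, batch-get, repeat loop can be simulated without App Engine to check that the idea converges. Everything in this sketch (Ref, Placeholder, Entity, PrefetchSession, FakeDatastore) is hypothetical and written for illustration; it does not hook real datastore RPCs, it only models the control flow described above.

```python
class Ref(object):
    """A reference field: holds just the key of another entity."""

    def __init__(self, key):
        self.key = key


class Placeholder(object):
    """Mock object returned for keys that are not loaded yet."""

    def __getattr__(self, name):
        return self  # any attribute chain keeps working

    def __str__(self):
        return ''


PLACEHOLDER = Placeholder()


class FakeDatastore(object):
    """In-memory store that counts batch gets."""

    def __init__(self, entities):
        self.entities = entities
        self.batch_calls = 0

    def get_multi(self, keys):
        self.batch_calls += 1
        return dict((k, self.entities[k]) for k in keys)


class Entity(object):
    """Wraps a dict of fields; Ref fields are resolved through the session."""

    def __init__(self, fields, session):
        self._fields = fields
        self._session = session

    def __getattr__(self, name):
        try:
            value = self._fields[name]
        except KeyError:
            raise AttributeError(name)
        if isinstance(value, Ref):
            return self._session.resolve(value.key)
        return value


class PrefetchSession(object):
    def __init__(self, store):
        self.store = store
        self.loaded = {}      # key -> Entity, filled by batch gets
        self.missing = set()  # keys requested during the current pass

    def wrap(self, fields):
        return Entity(fields, self)

    def resolve(self, key):
        if key in self.loaded:
            return self.loaded[key]
        self.missing.add(key)
        return PLACEHOLDER

    def render(self, render_func, data):
        """Re-render until a pass requests no unloaded keys."""
        while True:
            self.missing = set()
            output = render_func(data)
            if not self.missing:
                return output
            fetched = self.store.get_multi(sorted(self.missing))
            for key, fields in fetched.items():
                self.loaded[key] = Entity(fields, self)


def render_posts(posts):
    """Stand-in for the template: touches author.name and author.city.name."""
    return ['%s by %s %s' % (p.content, p.author.name, p.author.city.name)
            for p in posts]
```

With posts that reference authors that reference cities, this sketch issues two batch gets (one for all authors, one for all cities) on top of whatever query produced the posts. The price is that the render function runs once per depth level plus a final time, so it has to be cheap and free of side effects.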

Trouble is, I don't know how to do it (yet). Before I start trying, has anyone else done this? Or does anyone want to help? Or do you see a massive flaw in the plan?

I have been in a similar situation. Instead of ReferenceProperty, I had parent/child relationships, but the basics are the same. My current solution is not polished, but it is at least efficient enough for reports and things with 200-1,000 entities, each with several subsequent child entities that require fetching.

You can manually fetch the data in batches and set it on the entities yourself.

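    from google.appengine.ext import db

    # Given the posts, fetch all the data the template will need
    # with just two batch gets by key from the datastore.
    posts = get_the_posts()

    # Read the stored keys without triggering the lazy per-entity get.
    author_keys = [Post.author.get_value_for_datastore(x) for x in posts]
    authors = db.get(author_keys)

    city_keys = [Author.city.get_value_for_datastore(x) for x in authors]
    cities = db.get(city_keys)

    # Attach the fetched entities so the template finds them already loaded.
    for post, author, city in zip(posts, authors, cities):
        post.author = author
        author.city = city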

Now when you render the template, no additional queries or fetches will be done. It's rough around the edges, but I could not live without the pattern I just described.
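When many posts share an author, the same key ends up in the batch more than once. One possible refinement is to fetch each distinct key once and share the result, roughly as in this sketch. prefetch_reference and its parameters are names invented for this example; on App Engine you would presumably pass Post.author.get_value_for_datastore as get_key and db.get as batch_get.

```python
def prefetch_reference(entities, get_key, set_value, batch_get):
    """Fetch each distinct referenced key once and attach the results.

    get_key(entity)          -> the stored key for the reference (or None)
    set_value(entity, value) -> attaches the fetched object to the entity
    batch_get(keys)          -> list of objects in the same order as keys
    Returns the distinct fetched objects, so the call can be chained for
    the next level of references.
    """
    keys = [get_key(e) for e in entities]

    # Keep first-seen order while dropping duplicates and empty references.
    unique_keys = []
    seen = set()
    for key in keys:
        if key is not None and key not in seen:
            seen.add(key)
            unique_keys.append(key)

    fetched = dict(zip(unique_keys, batch_get(unique_keys)))

    # Entities that share a key end up sharing one object.
    for entity, key in zip(entities, keys):
        set_value(entity, fetched.get(key))

    return [fetched[k] for k in unique_keys]
```

Calling it once for posts and authors, then again for the returned authors and their cities, keeps the number of batch gets at two while shrinking each batch to its distinct keys.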

Also, you might consider validating that none of your entities are None, because db.get() will return None for a key that is bad. That is getting into basic data validation, though. Similarly, you need to retry db.get() if there is a timeout, etc.
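Both checks can be folded into one small wrapper around the batch get. The sketch below is an assumption-laden illustration: get_checked, TransientError and MissingEntityError are hypothetical names, and in a real app batch_get would presumably be db.get with retry_on set to whatever timeout exception the datastore raises.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a datastore timeout."""


class MissingEntityError(Exception):
    """Raised when a batch get returns None for one or more keys."""


def get_checked(batch_get, keys, attempts=3, delay=0.0,
                retry_on=(TransientError,)):
    """Batch-get keys, retrying transient failures and rejecting None results.

    Assumes attempts >= 1 and that batch_get returns one result per key,
    in order, with None for keys that do not exist.
    """
    for attempt in range(1, attempts + 1):
        try:
            entities = batch_get(keys)
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            time.sleep(delay)
            continue
        missing = [k for k, e in zip(keys, entities) if e is None]
        if missing:
            raise MissingEntityError('no entity for keys: %r' % (missing,))
        return entities
```

Whether a missing entity should raise, be skipped, or be replaced by a default is an application decision; raising is just the simplest choice for a sketch.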

(Finally, I don't think memcache will work as a primary solution. Maybe as a secondary layer to speed up datastore calls, but your app needs to work well even if memcache is empty. Also, memcache has several quotas of its own, such as memcache calls and total data transferred. Overusing memcache is a great way to kill your app dead.)
