Inner join query with distinct using Django ORM
Disclaimer
I initially wrote a completely wrong article. Sorry for that. Thanks to astopy who let me know. Here under is a revised version.
Revised Article
One point of the Django documentation that I did not get right at first is : how to get the ORM to generate a bit complex join queries. Yeah, sorry, I learnt SQL before ORMs, as many people, and I definitely think in terms of relational models and SQL joins.
For example, given the following models involving deliveries of products in a driver schedule, how do you get the list of products appearing in a schedule ? (I know that this sounds a lot like a Data Base 101 course).
class Product(models.Model):
name = models.CharField(max_length=150)
class Schedule(models.Model):
name = models.CharField(max_length=150)
class Delivery(models.Model):
.... some data fields here
product=models.ForeignKey(Product)
schedule=models.ForeignKey(Schedule)
The answer is deceptively simple, just write :
s = Schedule.objects.get(....)
Product.objects.filter(delivery__schedule = s).distinct()
which roughly will translate into the following SQL :
select distinct(product.*) from product
inner join delivery on product.id = delivery.product_id
inner join schedule on schedule.id = delivery.schedule_id
where schedule.id = some_id
The important thing to notice is that in the line
Product.objects.filter(delivery__schedule = s).distinct()
you can use delivery as the beginning of the lookup parameters, while there is no delivery field in the Product class. Django will understand.
What is maybe a bit misleading (even if it is quite logical if you think about it), is that there actually is a field named delivery_set created on the Product class (it is a RelatedManager), but you can not use it in the lookup parameters. It took me a long time to figure this out, so it might be the case for other people too....
That said, everything is explained in the Django doc here even if for once, I find the explanation a bit perfunctory
Exporting a Schedule from a Django Application to Google Calendar
Disclaimer
This blog post is not very original, it is essentially a mix of two previous blog posts, one from Andrew Turner and one from Derek Willis.
And now, the real content
For my pet project, ZoFa, I wanted to allow my users to publish their schedule, computed on the site, to their favourite calendar application, particulary to Google Calendar, which is the calendar application I am using myself.
To do this, there are two steps to accomplish : First,create a view generating an ical file (with extension .ics) I used the vobject library to do this as showed here.
(By the way, vobject is a part of the doomed Chandler project described in excellent book. Highly recommended.)
Having battled quite some time with this, I think that the following cod snippet could be of interest to some people. So, here is the code of my view :
def ical(request, worker_id):
worker=Worker.objects.get(pk=int(worker_id))
cal = vobject.iCalendar()
cal.add('method').value = 'PUBLISH' # IE/Outlook needs this
for shift in worker.shift_set.filter(published=True):
vevent = cal.add('vevent')
vevent.add('dtstart').value=shift.begin #begin and end are python datetime objects
vevent.add('dtend').value=shift.end
vevent.add('summary').value=shift.name
vevent.add('uid').value=str(shift.id)
icalstream = cal.serialize()
response = HttpResponse(icalstream, mimetype='text/calendar')
response['Filename'] = 'shifts.ics' # IE needs this
response['Content-Disposition'] = 'attachment; filename=shifts.ics'
return response
Finally, just add the url to your urls.py :
url(r'^ical/(?P<worker_id>\d+)/shifts.ics$','ical',name="shifts_ical")
It seemed necessary for Google to pick it up to add a file name at the end of the url, but I am not sure about this.
The second step needs that your user import the url of your calendar into Google Calendar. To do this in the current interface :
- open the "other calendars" little frame
- click on "add" (at the bottom)
- click "add by url" and paste the url.
Some concluding remarks :
- It can take some time before Google really requests your calendar and displays it
- The .ics file generated can also be used in Outlook, ICal or Thunderbird.
- Your url needs to be publicly available : there is no way to provide to Google Calendar a login and a password to your site.
The next step for me would be to automatize the addition to google calendar (with a nice "Add your schedule to Google Calendar" button) but I still have no idea about the feasability of this.
And now, some gratuitous cuteness...
djangocon.eu 2010 Impressions
Last week, after two years of using and being delighted at the Django web development framework, I travelled to Berlin to meet the Django community at Djangocon Eu in Berlin.
Here are my impressions:
The attendance
People were very friendly, open and quite talkative, at the opposite of what I feared for such a technical conference. From my previous life in research, I am used to people brought together to present their research material, while not that much interested about others work. This was definitely not the case here.
I did not make any survey, but I think you could put most people in 3 big categories: web freelancers, web agency employees, and students . I did not meet any other person working like me in a company using Django for internal projects, although I think this is quite common. Most original occupation I encountered : a Swiss army lieutenant. And only men except for one exception...
The talks
The level of the talks was not necessarily always excellent, but again, coming from my experience of boring scientific talks beginning with an equation, I was quite enthusiastic. That said, this type of conference is suffering from the widely different levels of skills in the attendance that makes possible to get beginner-intermediate level talks (like Honza Král talk about testing) next to quite advanced talks (like Armin Ronacher talk about wsgi). Everything for everybody I guess, but also something inadequate for each one too.
My highlights are not very original : Jacob Kaplan-Moss Keynote, and Russel Keith-Magee talk about the Django development process. I liked those two presentations because they allowed me to get a hint of the opinions of key personalities in the Django community, while enjoying their very good presentation skills. They definitely made me want to contribue and gave me the feeling that the core developers of Django are constituting a well balanced team, where no one is the real star attracting all the attention (on the opposite of the Drupal or Ruby on Rails communities for example). The egos seem to be well managed. Furthermore, their attention to the human side of the problems is really interesting. I heard at leat thrice the motto "We must avoid technical solutions to people problems" coming out of their mouth, which I found quite significant. In my opinion, this makes for a really attractive community.
One final note on the talks topic : I know that NoSQL is the big buzzword currently and I can not decide if this is the case for good reasons but I am not sure that the attendance did enjoy so much time spent talking about it.
The organization
The organization, venue and food were all really great. Kudos to the organizers. As I have read on Twitter, those little tables with electrical plugs and excellent quality wireless network were really a plus.
Fun facts
- Like in rock concerts, it seems cool to wear t-shirts of previous editions, the older the better.
- There is a trend to make slides with no more than 5 words per slide, where almost all the space is devoted to huge pictures, possibly unrelated to the current topic and usually taken from Flickr
- Most people were multitasking to death : at the same time talking to their neighbours, tweeting, reading web sites (generally the ones cited by the speaker, but not always), posting pictures to Flickr. It enforced to me the image of the web developper who needs to be able to swallow tons of information daily, and to switch context instantly.
- Twitter is the second venue of the conference, where questions and answers are exchanged during and after the talks.
- I think there were more smartphones than people.
Conclusion
I am very happy that I did the effort to go to Berlin to meet like-minded people in real life. I came back with lots of subjects I want to dig in (Hudson, Class based views, South, and so much more)
In summary, even if I definitely think that those words are hugely overused in this community : Djangocon was AWESOME!. Djangocon For The Win !
P.S.: Remember that English is not my mother tongue. I will gratefully commit all grammar and spelling patches...
My Take on GWT
The Google Web Toolkit is a set of tools that allows you to write, in Java, Rich Internet Applications, that is applications running in your browser while looking like desktop application with features like rapid GUI updates, drag & drop,... At its core, GWT is a compiler of Java to Javascript, but it also includes a full development environment, and generates very nice serialization mechanisms for your application to transfer objects from your server to the client browsers. In the latest 2.0 version, it also includes tools to profile the performances of your applications.
I have now been using the Google Web Toolkit at work for 13 months. I can consequently talk about it confidently, even if what I will say is already out of date, since I am still working on the 1.7 version, while the 2.0 version has been out for three monts, fixing a lot of my gripes with the toolkit.
The standard workflow for writing GWT apps is as follows :
- first, write a backend in Java using your libraries/framework of choice (backend in other languages are possible, but definitely will not feel as natural).
- second, write some servlet on top of your backend, implementing a GWT interface that will cause serialization code to be added at compile time. Now your servlet can be queried by a GWT compiled client.
- Third, write, still entirely in Java, your client designed to be ran in a browser, using classes reminding Swing for the interface (TextBoxes, FlexTable, ...). This java code will be compiled into javascript (which is at the same time genius and evil, IMHO).
- During the development, and during debugging, you can use a hosted mode, where your code is running into an embedded browser (an old version of Internet Explorer until the 2.0 version of GWT, if you are working on Windows) and compiled on demand. This means that you can modify your client code, hit reload, and see the changes immediately and that you can debug it using the standard Eclipse debugger view. Even if it is a bit slow, especially while debugging, since all code is compiled on demand, it works flawlessly.
-
On the client side, when you make call to the server, those calls are executed asynchronously, meaning that a call to your servlet method MyObject getMyObject() will return immediately without output, deferring the response of the server to a later date and to a method handleSuccess(MyObject myObject). This constraint is clearly coming from the browser constraints (not many threads are available, and to keep UI responsive, you can not monopolise a thread too long), and cannot really be avoided.
Now, from the trenches some remarks about this nice theory :
The development process is quite slow : I think that waiting too long between the moment I write some code and the moment I can see the results on the screen can make me loose a lot of momentum. With GWT, when your application reaches a reasonable size, if you write a piece of code spanning over your client and your server (adding a parameter to a backend public function), you will have to recompile you server code, launch the hosted mode (which takes about a minute on my pc), and suffer the sluggishness of your interface in hosted mode to get to the feature you want to test. I often find myself wondering what it was I wanted to test, after so much time ... Furthermore, once you are happy with your new feature, you still have to compile your project completely, and it can take a lot of time, since the compilation has to take place for every supported browser (a different compiled version is made available for each one, taking into account discrepancies in the javascript engines implementations) and it definitely consumes a lot of CPU.
The toolkit in development consumes a lot of resources : I think it is quite unpractical to use GWT without using the corresponding Eclipse plugin and consequently, without using Eclipse. In my case, Eclipse easily consumes 1GB of ram. When you launch the hosted mode of GWT for my app, about 800 megs of ram are eated again. As Windows XP, my database (MySQL), my server (a JBoss) takes about 1GB of ram together, and since on 32 bits windows, it is difficult to enjoy more than 3 GB of ram, programming in GWT, especially when you use the more memory demanding debugging mode, is a game of attentiveness: I always must carefully avoid to get to the point where my system is swapping, a state from which it can take many minutes to get back from. (A solution could be found by using more memory and a 64 bits OS, but until recently, the hosted mode of GWT was not running nicely in 64 bits mode and I do not know what is the current state of affaire about this).
Fortunately he abstraction of javascript is not leaky. I never needed to take a look at the generated javascript and was always able to use the Eclipse debugger. In my opinion, this is quite astonishing. That said, on the contrary, for the layout of your application, the abstraction is VERY leaky. To get beautiful interfaces, you will have to use CSS a lot and understand it very well. Furthermore, you will have to understand the DOM to master your app layout. For example, getting a nice resizing behaviour can be quite demanding : the only trick I know to ensure that a resize goes well is to empty a DOM parent of all its DOM children, wait that it takes its size according to the grand parents constraints, then put back all the children in the right size. This is not exactly abstracted from the DOM intricacies. That said, if you have better tricks for handling "windowing" apps, I would be glad to hear about it.
Having access to all the Java base language is great. For example, it is totally possible to throw an exception on the server side that will be caught on the client side. I use that for example to report errors during data imports and this is much better than handling special error return codes for my functions. Furthermore, being able to use complex data structures (TreeSet, for example) and objects built on the top of them and debugging them easily is a great advantage, and probably the main reason why my current project would not have been possible without GWT. I can not imagine myself handling very complicated structure in pure Javascript, with its broken object model. That said again, one downside is that not all the Java standard API classes are present in the GWT libraries usable on the client side. For example, Calendar is not available, which lead to major pains in my project.
Finally, do not be fooled by the fact that you are writing in Java: you are still very limited by browser imperfections and you should save your resources, as much as possible. I for example, discovered too late that displaying one hundred labels of text in a FlexTable takes MUCH too much time to render in Internet Explorer 8. That said, obviously, I should have known that hundred is a limit number that no computer should ever be forced to handle....
In conclusion, I can say that I have a love/hate relationship with GWT, but that I sincerely hope that when I will be able to use version 2, once we have refactored all legacy code, most of my gripes will disappear: I will have faster compilations, less resource consumption (thanks to the new development mode, where you can use your browser of choice ? ), and less leaky abstractions (thanks to the new Layout Panels ? ). That said, I think that GWT is very alone in its niche : allowing to build intricate web applications that will not depend on an external browser plugin to work, while allowing you to use a battle proven programming language and relying on its community. In fact, I think that GWT is alone in this niche which makes it the de facto king of the hill...
(well, there is Pyjamas a Python port of GWT but it lacks the support of a big company like Google).
Django ORM performance tricks
Lately, I have been trying to enhance the response time of http://zofa.be which is written using django. I have done this mainly for fun, since, with about 30 000 pageviews a year, ZoFa is not precisely of the size of YouTube and can be handled easily by my shared hosting on djangohosting.ch (when they are not DDOSed, but that is another story).
Here are my three most useful tips after this experience:
As for any optimization effort, you should first be able to mesure performance. For Django, the de facto standard tool seems to be the Django Debug Toolbar even if it is officially not part of the Django framework itself. This tool is so great that it hurts. I would do many things to get such a quality tool in my daily java development. For my special case, what was the most interesting was the report on the number of sql queries done for each page. To give you a hint of how good it is, check this video. (Notice the jazz soundtrack, the jazz references are a hallmark of the Django culture). The great feature of the Debug toolbar for optimizing your queries is that it allows you to see directly in your browser all the queries done to render the current page.
Use the select_related() queryset method, documented here. The explanation is much better than whatever I could write, so read it.
Using the select_related() method amounts to one idea : get all the objects you need in one sql query rather than doing one query per object (Nevertheless, beware loading too much objects using those tricks, you should load just enough). A similar idea is the following : when you need to access all the objects of a queryset, you should cast it first to a list, since it will avoid that each element of the queryset is loaded independently.
Put in an example, you should do this :
items=Items.objects.all()
items= list(items)
for item in items:
print str(item)
instead of this :
items=Items.objects.all()
for item in items:
print str(item)
The first snippet makes one query, while the second makes as many queries as there are items. Notice that if you pass a queryset to a template that will display all its elements, you should cast it to a list too, for similar reasons. I first read about this in the Django documentation, but unfortunately, I am unable to find it again right now.
One finale note : since the site is heavily customized (à la Facebook) for each user, I am not sure I would have so much interest in using caching (even if I do think that the template caching introduced in Django 1.2 could be very efficient). That said, if I ever get performance problems (which would in fact be a happy problem), that would be the first thing on the to do list.
Edit: I just discovered the page in the official Django documentation about Data Base Access Optimization. It makes this current post pretty useless...
Next posts
The next posts on this blog will talk about :
- How I Learned to Stop Worrying and Love Django
- optimizing query performances in Django
- my habits as a hobbyist programmer
- django-mingus, the django app powering this blog
- the workflow of development in GWT and how to make it better
- cursing at css
- research vs industry (as I have experienced both)
- my take on model checking and program verification
Introduction
Hi ! My name is Martin De Wulf.
I multitask mainly between the following personnalities :
- software engineer at Routing International
- father of twin girls
- cohabitant of a physician specialized in infectious diseases
- PhD in computer sciences
- hobbyist programmer
I am using the pretext of Project 52 to launch a personal blog which main subjects will be:
- me and what I think (obviously)
- programming (using GWT, Django, Jquery, Spring, and many more)
- my hobby project : ZoFa.be a website allowing people working in shifts to specify their day preferences, publish plannings (on the site and via email) and exchange shifts.
- maybe a bit of photography and parenthood tips too.
As I would like to address as many people as possible, I will write in English. It is not my mother tongue and please excuse me for my many mistakes. All corrections are welcome. I hope that writing more in English will make me improve, since writing hundreds of pages for my thesis was not enough.
To summarize, people who could be interested by this blog are either nerds or geeks, and maybe, parents of twins.