2010-2-11

07:28 mulka
still ponding how to do subtasks... haven't figured out an ideal solution yet. anyone want to brainstorm?
07:28 mulka
pondering*
07:56 mulka
anyone here?
09:11 asksol
mulka: subtasks?
09:11 mulka
yea... one task depends on a set of other tasks
09:18 mulka
asksol?
09:18 asksol
just launching them, or depends on their result?
09:19 mulka
depends on the result
09:20 asksol
well, that's when it gets funky
09:20 asksol
the best way is to use callbacks
09:20 asksol
callbacks all the way down
09:20 asksol
and never wait for the result
09:21 mulka
how would I do a callback with celery?
09:21 asksol
e.g. @task def refresh_feed(feed_url, callback=None): result = refresh_the_feed(); if callback: callback(result)
09:22 asksol
that result could be another task for example
09:22 asksol
s/result/callback
09:23 asksol
refresh_feed(url, callback=update_subscription_counts.delay)
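The callback pattern described above can be sketched in plain Python, without Celery. This is only an illustration: the function bodies, the feed URL, and the `received` list are placeholders assumed here. In a real setup `refresh_feed` would be a task and the callback would be another task's `delay` method, as in the message above.

```python
def refresh_feed(feed_url, callback=None):
    """Stand-in for a task: do the work, then hand the result on."""
    # Placeholder for the real feed refresh; the shape of `result` is assumed.
    result = {"url": feed_url, "entries": 3}
    if callback is not None:
        # Never wait for the result; pass it forward instead.
        callback(result)
    return result


received = []


def update_subscription_counts(result):
    """Stand-in for the follow-up task that consumes the result."""
    received.append(result["url"])


refresh_feed("http://example.com/feed", callback=update_subscription_counts)
```

The caller never blocks on a result: each step passes its output to the next one.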
09:25 asksol
siwu was even working on a task chain wrapper
09:25 mulka
didn't realize you could pass functions as arguments to tasks
09:26 asksol
well, you can, as long as you use pickle
09:26 mulka
does the code get passed across the wire, or does it just use the code that's on the worker server?
09:27 asksol
to be compatible with json as well, you could just pass the name of a task, and then in the code: from celery.registry import tasks; callback = tasks[callback]; callback.delay(result)
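A rough sketch of the name-based variant, again in plain Python. The `TASKS` dict and the `register` decorator are hypothetical stand-ins for the task registry mentioned above, and the message layout is assumed for illustration. The point is that only strings cross the wire, so JSON is enough.

```python
import json

TASKS = {}  # hypothetical stand-in for the task registry (name -> callable)


def register(func):
    """Record a function under its name so it can be looked up later."""
    TASKS[func.__name__] = func
    return func


counts = {}


@register
def update_subscription_counts(result):
    counts[result["url"]] = result["entries"]


@register
def refresh_feed(feed_url, callback=None):
    result = {"url": feed_url, "entries": 3}  # placeholder work
    if callback is not None:
        # `callback` is a task name, resolved on the worker side.
        TASKS[callback](result)
    return result


# The message holds only strings, so it survives a JSON round trip.
message = json.dumps({
    "task": "refresh_feed",
    "args": ["http://example.com/feed"],
    "kwargs": {"callback": "update_subscription_counts"},
})

decoded = json.loads(message)
TASKS[decoded["task"]](*decoded["args"], **decoded["kwargs"])
```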
09:27 asksol
it doesn't serialize the actual source code
09:27 asksol
if you think about it, that would be kinda crazy :)
09:28 asksol
>>> pickle.dumps(operator.add)
09:28 asksol
'coperator\nadd\np0\n.'
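A small check of the same point in current Python: pickle stores a module-level function as a reference (module name plus attribute name), not as code, so the receiving side must be able to import it. An anonymous function has no importable name, so the standard pickler refuses it.

```python
import operator
import pickle

# The payload names the function; it does not contain its code.
payload = pickle.dumps(operator.add)
restored = pickle.loads(payload)  # resolved by importing it again

# A lambda cannot be looked up by name, so pickling it fails.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```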
09:30 mulka
serializing source code has been done before, I think, but I don't think it's straightforward to implement
09:30 mulka
that's what picloud does, right?
09:30 asksol
no, it doesn't actually
09:31 asksol
or, it does that for lambda
09:31 asksol
lambdas
09:31 asksol
and for stuff defined in __main__
09:31 asksol
but you still need to upload any modules to your vm
09:32 asksol
imagine disassembling most of django and assembling it on the other side again
09:33 mulka
that would be kind of silly
09:34 asksol
I'd like to steal the picloud pickler, but it's not open source
09:35 mulka
let's say I have a task that depends on the results of two other tasks. how would I get the results from the other two tasks together in the same function?
09:39 asksol
well, that's where it gets really tricky
09:39 asksol
using TaskSet and waiting for it is the easiest
09:39 asksol
but it's not optimal
09:43 mulka
asksol: great work on Celery by the way! it's shaping up to be a really useful piece of software
09:54 asksol
thanks!
10:12 mulka
seems like a TaskSet itself should be able to have a callback passed to it
10:14 mulka
task chains could be interesting. you said siwu was working on something? anywhere I could get more info?
10:26 asksol
you'd have to ask him
10:27 asksol
it's not that easy to track when a taskset is done
10:27 asksol
but it was an idea yeah
10:28 asksol
to keep a counter, and the last task calls the callback when the counter hits the taskset size
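The counter idea can be sketched like this. It is not how Celery does it; the `TaskSetCounter` class is hypothetical, and it keeps its state in process memory behind a lock, whereas workers on separate machines would need a shared store. Threads stand in for the subtasks.

```python
import threading


class TaskSetCounter:
    """Collect results; whoever delivers the last one fires the callback."""

    def __init__(self, size, callback):
        self.size = size
        self.callback = callback
        self.results = []
        self._lock = threading.Lock()

    def task_done(self, result):
        with self._lock:
            self.results.append(result)
            finished = len(self.results) == self.size
        if finished:
            # Only the call that brings the count to `size` gets here.
            self.callback(list(self.results))


combined = []
counter = TaskSetCounter(2, combined.append)

# Two stand-in subtasks, each reporting its result when done.
threads = [
    threading.Thread(target=lambda n=n: counter.task_done(n * n))
    for n in (2, 3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The callback receives both results together, which is one way to feed a task that depends on two others without waiting on either.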
10:32 asksol
padt: killall usually doesn't work for me
10:32 asksol
maybe if you write killall python manage.py celeryd, or something like that
10:35 padt
asksol: ok. I just assumed it would do more or less the same thing as what's in the docs. If it doesn't work just kill the fixme
10:35 asksol
at least, that's the universal solution :)
10:35 asksol
works on an old sparcstation, works now
10:36 padt
heh
10:37 mat__
asksol: is there any way of setting a routing_key for HttpDispatchTask?
10:37 mulka
Oh... another question... what result store do you recommend for large results? Seems like a lot of them can't handle things bigger than 1MB.
10:37 asksol
mat__: HttpDispatchTask.apply_async(..., routing_key=)
10:38 asksol
mulka: what is large? for really large files maybe you should use HDFS :)
10:38 asksol
large results
10:39 asksol
that's for gigabyte large.
10:39 mulka
I think my results are only going to be a few MBs
10:40 mat__
asksol: HttpDispatchTask.apply_async(args=[TORNADO_URL, action_template, instance.follower_id], routing_key='activity.stream.follow') tried that, just having serious thinking issues about how to do kwargs
10:40 mat__
can you pass in a dict to kwargs?
10:40 asksol
yes, of course
10:40 asksol
no it's the special CeleryKeywordArgumentDictWriter
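Joking aside, the call shape being discussed looks roughly like this. The `apply_async` function below is a hypothetical stand-in that only records how it was called, so the example runs without Celery; the URL, template name, id, and the `verb` keyword are made-up placeholder values.

```python
def apply_async(args=None, kwargs=None, **options):
    """Hypothetical stand-in: record positional args, keyword args, options."""
    return {
        "args": list(args or ()),
        "kwargs": dict(kwargs or {}),
        "options": options,
    }


call = apply_async(
    args=["http://localhost:8888/notify", "follow.html", 42],  # placeholders
    kwargs={"verb": "follow"},  # an ordinary dict of keyword arguments
    routing_key="activity.stream.follow",
)
```

The task's own keyword arguments travel in the `kwargs` dict, while `routing_key` is an option for the dispatch itself.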
10:41 mat__
lol, too early for me
10:41 mat__
was up redeploying machines till 3 am, and started again at 9 am
10:41 mat__
brain dead
10:42 asksol
hehe, i understand
10:43 mat__
just broke all our stream system, doh!
10:43 asksol
ouch
10:44 asksol
mulka: the best is to not send large results
10:44 asksol
but I guess you know that
10:45 asksol
just trying to make you think of ways to avoid it
10:47 mat__
asksol: thanks a lot again, back online now
10:48 mat__
still can't find a way of testing real time streams
10:49 mulka
mat__: real time streams?
11:04 padt
I wrote some context thingy to test the twitter streaming API. might be what you need
11:05 padt
if the stream is HTTP, that is
13:53 padt
redditors have no sense of humour: http://www.reddit.com/r/Python/comments/b0g9u/c...
13:55 asksol
hehe
14:02 asksol
nice nice!
21:21 donspaulding
I'm trying to run celeryd as a Windows service. Am I in uncharted waters here?
21:22 donspaulding
scratch that, I'm *thinking* of running celeryd as a Windows service, and trying to gauge how much work will be involved.