Still no full-text search? Mystified by the priorities.


Jonathan Feinberg

Apr 8, 2009, 4:15:59 PM
to Google App Engine
Long ago I attracted a flame-fest when I expressed my opinion that
adding support for other programming languages should be given less
priority than fixing bugs and adding infrastructural features. Here we
are, months later, and the big announcements are

1) Java (my God, why?)

and

2) Cron jobs (...but I could already write cron jobs to hit a URL)

In the meantime, full-text search is not even on the roadmap.

I'm torn. As the creator of Wordle, I'm truly grateful to Google and
the GAE team for the use of an automatically-scaling app
infrastructure. It has been a pleasure to use. On the other hand, the
lack of search has been a huge problem for Wordle users, and I've got
no good options.

I acknowledge that search is my pet issue; I don't claim to represent
a community or interest group with these comments. Then again, I can't
think of a CRUD-style app that doesn't require or benefit from text
search. So, while I'd consider using GAE in the future for some
stateless utility micro-site, or maybe a static site, I won't use it
again for anything with user-created data. While I've begun to regret
having used it for Wordle, I admit that it's my own fault for not
having thought through the implications of having no full-text search
available.

Tom Brander

Apr 8, 2009, 4:44:30 PM
to Google App Engine
I concur; it certainly cuts out most blogging/wiki-type uses, which
of course represent a major segment and class of web apps. One would
think that the king of search could make this issue go away...

Gopal Patel

Apr 8, 2009, 5:28:21 PM
to Google App Engine
Even the simplest applications need search. It should have been there
long before they even thought of the J in Java. I don't mind Java or
Python; both are new to me. But it is better to stick to one and
provide more infrastructure features than to please a few people in
either group. The irritating point is that it's not even on the
roadmap yet, and they are thinking of XMPP... one of the ugliest
decisions from Google... damn.

Gopal Patel

Apr 8, 2009, 5:42:01 PM
to Google App Engine

Julian

Apr 9, 2009, 3:43:59 AM
to Google App Engine
Maybe I don't fully understand the problem, but what would prevent
anyone from coding the full-text search?

For example, Jonathan (very nice app by the way) you already have word
segmentation, why don't you build your own table of indexes that
references for each word a list of wordles? If there is write
contention, you could divide alphabetically, by languages, etc..
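
A minimal sketch of that idea, assuming plain in-memory dictionaries as a
stand-in for datastore entities. The names ShardedIndex, tokenize, and
shard_for are hypothetical, and sharding by first character is just one
possible way to "divide alphabetically":

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return WORD_RE.findall(text.lower())


def shard_for(word):
    """Pick a shard by first character, so writes spread across shards."""
    return word[0]


class ShardedIndex:
    """Inverted index: shard name -> word -> set of document ids."""

    def __init__(self):
        self.shards = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, text):
        # Each distinct word in the document gets one posting entry.
        for word in set(tokenize(text)):
            self.shards[shard_for(word)][word].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        postings = [self.shards[shard_for(w)].get(w, set()) for w in words]
        return set.intersection(*postings)


index = ShardedIndex()
index.add("wordle-1", "The quick brown fox")
index.add("wordle-2", "Quick thinking")
print(index.search("quick fox"))
```

On a real datastore each shard (or each word) would be its own entity, and
this sketch ignores ranking, stemming, and per-entity size limits.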

Julian

Surya G

Apr 9, 2009, 4:21:17 AM
to Google App Engine
Yeah, I completely agree. Why don't they allow us to tap into the
search prowess instead of adding questionable language support? I am
a Java programmer by day, but I was comfortable with Python in a few
days. I'm not sure there is even a demand for new languages from
serious users.

Adam Sah

Apr 9, 2009, 5:21:08 PM
to Google App Engine
In the meantime, you might consider Google Base. It's far from ideal
in many ways, but it works for many apps, and I've launched several
high-volume apps on it.
http://base.google.com/

For apps with requirements Base can't meet, including commercial apps,
I've heard good things about Lucene, and you could run this on a
commercial VPS (I've had good luck with linode.com) or hosting
facility, e.g. Amazon EC2.

adam
(Google engineer)

dalenewman

Apr 9, 2009, 5:41:46 PM
to Google App Engine
I concur also.

I expected to see "Search API" right next to "Datastore API" when I
started working with GAE.

It makes sense and fits the model; offer some for free and then charge
for more.

I would think a Search API that leveraged Google's search
infrastructure would be GAE's killer app.

However, maybe with Java support now, you can run Lucene. I imagine
there might be some problems writing the indexes to the filesystem,
though, which is really why you'd need search to be an
infrastructure-aware API like datastore or memcache.

Dale
www.bookdope.com

Ubaldo Huerta

Apr 9, 2009, 7:54:09 PM
to Google App Engine
I second the opinion that proper full-text search should have been on
the roadmap. My app will soon need full-text searching. I'm
considering dumping all the data in Google Base and "forwarding" the
searching to the Google Base API. I wonder whether I'd be violating
the Google Base terms of service or whether there is a cap on queries,
etc. I haven't checked.

Anyhow, from my perspective, the priorities are also dead wrong,
unless the current aim has shifted to getting corporations to start
outsourcing some of their stuff to the cloud. Hence Java support. I'm
happy, though, about the "cron"; it might come in handy for the
dumping business (not without irony).

Google: The (albeit monumental) task of changing the rules of the game
for web startups won't be completed until full text is available.

Portos

Apr 10, 2009, 3:20:19 PM
to Google App Engine
I agree 100% with this topic

Jonathan Feinberg

Apr 10, 2009, 7:35:06 PM
to Google App Engine
On Apr 9, 1:21 pm, Adam Sah <adam....@gmail.com> wrote:

> In the meantime, you might consider Google Base

You know, I was going to tell you to get real, but that's not so
crazy. I have two questions for you:

Does Google Base correctly handle unstructured text in its index, with
stemming and legitimate scoring, etc.?

Will there be a pragmatic way for me to get the text data from my
existing 720,000 data items into Google Base without blowing through
my CPU quotas?

Ubaldo Huerta

Apr 11, 2009, 4:14:12 PM
to Google App Engine
Jonathan

I see no simpler way than the Google Base thing. Of course, you
can get yourself a SQL instance in the Amazon cloud (or set up Lucene,
etc.), but if you are here, it is because you don't want to be a
sysadmin guy :-)

In terms of efficient use of resources and control, I would go with the
remote API (which is wonderful, btw). I've only used it indirectly, via
the bulk uploader.

http://code.google.com/appengine/articles/remote_api.html

Since there is no lightweight queuing in GAE, I think you (we :-) are
going to need to keep track of what has been processed (although
tracking could be done in the client side, just as well)
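
A minimal sketch of that bookkeeping, assuming the data can be walked in key
order and that a single "last key processed" value is enough of a checkpoint.
The function export_in_batches, the items dict, and the send_batch callback
are hypothetical stand-ins, not remote API or Google Base calls:

```python
def export_in_batches(items, send_batch, checkpoint=None, batch_size=100):
    """Walk items in key order, resuming after the checkpoint key.

    items: dict mapping key -> text (stand-in for stored entities)
    send_batch: callable receiving a list of (key, text) pairs
    Returns the last key processed, to be saved as the next checkpoint.
    """
    # Only keys strictly greater than the checkpoint are still pending.
    keys = sorted(k for k in items if checkpoint is None or k > checkpoint)
    for start in range(0, len(keys), batch_size):
        batch = [(k, items[k]) for k in keys[start:start + batch_size]]
        send_batch(batch)
        # Advance the checkpoint only after the batch has been handed off.
        checkpoint = batch[-1][0]
    return checkpoint


items = {n: "text %d" % n for n in range(1, 8)}
sent = []
last_key = export_in_batches(items, sent.append, batch_size=3)
print(last_key, len(sent))
```

A real run would persist the checkpoint after every batch rather than only at
the end, so that an interrupted export can pick up where it stopped.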

Off the top of my head, I'd think that ordering by primary key
should be the best way to query the data you'll dump to Google Base.
BTW, Google Base is very fast, and even lets you do geo queries. In
GAE, geo queries are a nightmare because of the one-inequality-operator
limit in queries, so I had to fight with the Mercator projection,
store tile IDs per zoom level, etc. That's why I thought of
outsourcing bounding-box queries to Google Base a while back. I didn't
end up doing it because it's yet another moving part, and the latency
would have ruined the user experience.
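
For readers wondering what storing tile IDs per zoom level can look like,
here is a rough sketch. It assumes the common Web Mercator tiling (2^zoom by
2^zoom tiles) and a "zoom/x/y" string format; both are assumptions for
illustration, not necessarily the scheme described above. The point is that a
stored tile ID can be matched with an equality filter instead of inequalities
on latitude and longitude:

```python
import math


def tile_id(lat, lon, zoom):
    """Return a "zoom/x/y" tile id for a point in the Web Mercator tiling."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon == 180 or extreme latitudes stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return "%d/%d/%d" % (zoom, x, y)


def tile_ids(lat, lon, zooms=range(0, 17)):
    """Tile ids for one point at several zoom levels, stored with the item."""
    return [tile_id(lat, lon, z) for z in zooms]


print(tile_ids(-45.0, 90.0, zooms=range(0, 3)))
```

A bounding-box query then becomes "find items whose tile list contains this
tile", at the cost of choosing a zoom level and checking neighbouring tiles.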

Anyhow, maybe there will be, at last, a legitimate use case for Google
Base :-)

It'd be nice to hear how it goes if you decide to look further or if
you figure out another way to support full-text search. BTW, did you
check whether the Base API ToS allow it?

It'd also be nice to hear the opinion of the GAE moderators.

-U

Tom Brander

Apr 29, 2009, 8:37:30 PM
to Google App Engine
I don't get Google's reticence to even comment on this issue and give
some guidance as to when and how we might be looking for a solution.

Thomas McKay - www.winebythebar.com

Apr 29, 2009, 8:58:31 PM
to Google App Engine
I concur. Awkward putting the GAE badge on my homepage and then adding
caveats to the search fields.

Joe Bowman

Apr 29, 2009, 11:52:59 PM
to Google App Engine
What about Yahoo! BOSS? You can restrict it to search a site, and
while not documented, it has functionality such as inurl and inpath which
you could use to push out the specific data you need. The one trick
would be to make sure Yahoo searches the proper path, but I'm sure
there are ways to get that in their crawler list.


Lee Olayvar

Apr 30, 2009, 3:45:33 AM
to google-a...@googlegroups.com
Fully agree. The fact that it's not out yet is surprising; the fact that it's not even on the roadmap is simply jaw-droppingly bizarre.
--
Lee Olayvar

pran__

Apr 30, 2009, 2:57:19 PM
to Google App Engine
+1, it would be great if a Search API could be provided. I have been
using SearchableModel for quite some time, and it fits my basic
needs, but I know a search engine that has made people's expectations
really high as soon as they see a search box :-)

--
Pranav Prakash

On Apr 30, 8:45 am, Lee Olayvar <leeolay...@gmail.com> wrote:
> Fully agree. The fact that it's not out yet is surprising; the fact that it's
> not even on the roadmap is simply jaw-droppingly bizarre.
>

Waldemar Kornewald

Apr 30, 2009, 8:27:24 PM
to Google App Engine
While SearchableModel itself is rather limited, the principle behind it
can be improved a lot. We'd be willing to sell our "search" app
(currently only for Django, but webapp support is planned). It comes with:
* ability to only index certain properties (instead of all string
properties as with SearchableModel)
* Porter Stemmers for English and German (you can search for 'cheap
cars' and find results with 'cheap car')
* word prefix search (match anything starting with ...)
* values index (allows for searching all values of a certain property;
e.g.: automatically generate a list of all tags of your blog posts and
make the tags themselves searchable for auto-completion)
* auto-completion via jQuery/AJAX for prefix search and values index
* easy to use views and templates for showing search results
* key-based pagination (only browsing entities without search
capability, though)
* some kind of "coarse-grained" sorting of results

In case you wondered, it does have all of App Engine's limitations:
* no result sorting
* no ranking
* no more than ca 5000 unique words per entity
But with the integrated Porter Stemmer you get much better search
results than with SearchableModel and you can make your website easier
to use by integrating auto-completion with just a few lines of code.
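
To illustrate the principle for readers (this is not the package's code), here
is a small sketch of a keyword index with stemming and word prefix search. It
assumes an in-memory dict instead of the datastore, and crude_stem is a
deliberately simplified stand-in, not the Porter algorithm. KeywordIndex and
the other names are hypothetical:

```python
import re
from collections import defaultdict


def crude_stem(word):
    """Strip a few plural endings. NOT the Porter algorithm, just a stand-in."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def terms(text):
    """Lowercase, split into words, and stem each word."""
    return {crude_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}


class KeywordIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # stemmed term -> entity ids

    def add(self, entity_id, text):
        for term in terms(text):
            self.postings[term].add(entity_id)

    def search(self, query):
        """Entities containing every (stemmed) query word."""
        wanted = terms(query)
        if not wanted:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in wanted))

    def prefix_search(self, prefix):
        """Entities with any indexed term starting with the prefix."""
        prefix = prefix.lower()
        hits = set()
        for term, ids in self.postings.items():
            if term.startswith(prefix):
                hits |= ids
        return hits


index = KeywordIndex()
index.add("ad-1", "Cheap car for sale")
index.add("ad-2", "Expensive cars")
print(index.search("cheap cars"), index.prefix_search("exp"))
```

As with the limitations listed above, the results come back as an unordered
set: there is no sorting or ranking in this sketch.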

We want to set up a demo site so you can see it in action. We plan to
publish everything in May.

Regarding the price: This package will be available for a one-time fee
and you can use it for an unlimited number of developers (i.e.: *no*
yearly per-developer license fee). Minor release upgrades will be
available for free, of course. Possibly, we might also provide a free
upgrade to a search app which is adapted to Google's full-text search
API when that gets released.

If you're interested and want to learn more please contact me at
wkornewald[at]gmail[dot]com

Bye,
Waldemar Kornewald

dalenewman

Apr 30, 2009, 8:27:37 PM
to Google App Engine
Looks like the Java community already has this search business all
figured out :-)

http://www.kimchy.org/searchable-google-appengine-with-compass/

I guess this Compass thing must use Lucene and store the Lucene
indexes in the GAE datastore. Reminder: this is a guess based on a
quick skim of the blog post (link above).

Only time will tell if this works out.

Dale

Waldemar Kornewald

Apr 30, 2009, 8:44:04 PM
to Google App Engine
I highly doubt that the index can be updated efficiently in a single
request. This might work for a handful of entities on the local
development server, but I'm sure it'll quickly break down if you have
a few hundred or a few thousand items. Otherwise, if it were that
trivial, Google could've provided that feature a long time ago.

I could imagine that if you used a script to remotely update the
search index you could actually get acceptable search performance
(i.e., on the already-built index), but I don't have any hard numbers
here and Shay Banon didn't know if the port would perform well on App
Engine, either.

Bye,
Waldemar Kornewald
--
Use Django on App Engine with app-engine-patch:
http://code.google.com/p/app-engine-patch/

Ian Lewis

May 1, 2009, 9:20:14 AM
to google-a...@googlegroups.com
This is also not currently working on deployed App Engine.
--
=======================================
BeProud Inc.  Ian Lewis
150-0012
Aios Hiroo Building 604, 1-11-2 Hiroo, Shibuya-ku, Tokyo
email: ianm...@beproud.jp
TEL:03-5795-2707
FAX:03-5795-2708
http://www.beproud.jp/
=======================================

Waldemar Kornewald

May 1, 2009, 8:36:56 PM
to Google App Engine
Hi everyone,
if you're potentially interested in buying our search package, as
described above in this thread (http://tinyurl.com/dxen3z), please
take part in this short survey, primarily to help us find a fair
price:

http://www.surveymonkey.com/s.aspx?sm=CzIohuPfdcTL8z484vcX4Q_3d_3d

While there is not yet a demo site, we hope that you can at least
provide an approximate estimate. Thanks a lot!

Bye,
Waldemar Kornewald