Discussion:
gnome "storage" project, smart windows
Matt Price
2005-12-10 23:12:32 UTC
hey folks,

some time ago there was an ambitious gnome project called storage:
http://www.gnome.org/~seth/storage/features.html

it seems to have died, as far as I can tell. Beagle is the nearest thing
we've got, I guess, though KDE is working on something called Tenor,
for KDE 4:
http://appeal.kde.org/wiki/Tenor

anyway, I love beagle, am just getting used to its awesome power, but
I would like to have slightly more robust interfaces, e.g. "smart
folders" à la MacOS, with beagle-derived search results updated every
time you open the "folder" (so it's not really a folder, but an alias
for a command, I guess).

Has anyone implemented anything of this sort? And does anyone
know whether "storage" still exists in development somewhere?

Thanks much,

Matt

-------------------------------------------
Matt Price matt.price at utoronto.ca
History Department, University of Toronto
(416) 978-2094
--------------------------------------------
David Hart
2005-12-11 00:10:52 UTC
Post by Matt Price
http://www.gnome.org/~seth/storage/features.html
it seems to have died, as far as I can tell. Beagle is the nearest thing
we've got, I guess, though KDE is working on something called Tenor,
http://appeal.kde.org/wiki/Tenor
anyway, I love beagle, am just getting used to its awesome power, but
I would like to have slightly more robust interfaces, e.g. "smart
folders" à la MacOS, with beagle-derived search results updated every
time you open the "folder" (so it's not really a folder, but an alias
for a command, I guess).
That's not the way I understand that Beagle works. AFAIU the kernel
notifies Beagle of changes in the filesystem and Beagle schedules
those changes for investigation. In practice, I've found that Beagle
often reacts to those changes in less than a second (you can see some
problems it was causing me with Mutt, the mail reader I use,
in a thread from last week).

As much as I love Beagle I've had to disable it for the time being.
I've had problems with the index that Beagle creates growing too much,
and worse, Mono (on which Beagle depends) has been grabbing memory (like
a memory leak).

I wish I could spend the time to help with bug reports but I just
can't afford it at the moment (I need to find some paid employment).

Thanks for the links above (which I'll investigate tomorrow).
--
David Hart <ubuntu at tonix.org>
Matt Price
2005-12-12 18:13:51 UTC
Post by David Hart
Post by Matt Price
anyway, I love beagle, am just getting used to its awesome power, but
I would like to have slightly more robust interfaces, e.g. "smart
folders" à la MacOS, with beagle-derived search results updated every
time you open the "folder" (so it's not really a folder, but an alias
for a command, I guess).
That's not the way I understand that Beagle works. AFAIU the kernel
notifies Beagle of changes in the filesystem and Beagle schedules
those changes for investigation. In practice, I've found that Beagle
often reacts to those changes in less than a second (you can see some
problems it was causing me with Mutt, the mail reader I use,
in a thread from last week).
that's my understanding too. The best search client for beagle
updates search results in real time. What I'd like is a way to access
common searches in a persistent way -- in effect like having a script
that launches best with a particular query, but from the interface
point of view it would be cool if the search results could look like
files in a folder. I don't see that this is incompatible in principle
with beagle, but I also don't know enough about the inner workings
thereof to be sure.
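
(For illustration, a rough sketch of that kind of persistent search,
assuming beagle-query is installed and prints one file:// URI per hit -
both that output format and the SmartFolders location are just
assumptions, not anything Beagle actually ships:)

# regenerate a folder of symlinks from a saved beagle query
query="syllabus"
folder="$HOME/SmartFolders/$query"
mkdir -p "$folder" && rm -f "$folder"/*
beagle-query "$query" | while read uri; do
    path="${uri#file://}"              # strip the URI scheme
    [[ -f "$path" ]] && ln -sf "$path" "$folder/"
done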
Post by David Hart
As much as I love Beagle I've had to disable it for the time being.
I've had problems with the index that Beagle creates growing too much,
and worse, Mono (on which Beagle depends) has been grabbing memory (like
a memory leak).
... not related to not having xattr enabled in fstab, is it? I find
beagle to be working pretty well at the moment...

matt
Post by David Hart
I wish I could spend the time to help with bug reports but I just
can't afford it at the moment (I need to find some paid employment).
Thanks for the links above (which I'll investigate tomorrow).
-------------------------------------------
Matt Price matt.price at utoronto.ca
History Department, University of Toronto
(416) 978-2094
--------------------------------------------
David Hart
2005-12-13 02:32:02 UTC
Post by Matt Price
that's my understanding too. The best search client for beagle
updates search results in real time. What I'd like is a way to access
common searches in a persistent way -- in effect like having a script
that launches best with a particular query, but from the interface
point of view it would be cool if the search results could look like
files in a folder. I don't see that this is incompatible in principle
with beagle, but I also don't know enough about the inner workings
thereof to be sure.
Ahh... I see what you mean now. Yes, a folder view of searches would be
cool.
Post by Matt Price
Post by David Hart
As much as I love Beagle I've had to disable it for the time being.
I've had problems with the index that Beagle creates growing too much,
and worse, Mono (on which Beagle depends) has been grabbing memory (like
a memory leak).
... not related to not having xattr enabled in fstab, is it? I find
beagle to be working pretty well at the moment...
If that's user_xattr then no, I have it enabled on the volumes that
Beagle was searching.
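
(For anyone checking their own setup: enabling it means adding
user_xattr to the mount options in /etc/fstab and remounting - the
device and mount point below are placeholders:)

# /etc/fstab - placeholder device and mount point
/dev/hda3   /home   ext3   defaults,user_xattr   0   2

# apply it without rebooting:
sudo mount -o remount /home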
--
David Hart <ubuntu at tonix.org>
'Forum Post
2005-12-13 04:24:55 UTC
"Searching" is the problem. It's a stupid idea for a machine which you
control. I cannot tell the folks at wired where to put their pages, but
I most certainly can do this on my own machine.

I've been working on a project that solves this from the ground up; it
works with legacy systems via "snooping" on selected folders and
squirreling things away as they appear. Files are stored in a fixed
hierarchy and their metadata stored in an easily accessed sqlite
database (of course it could be made "real" SQL if needed).

In an image containing about 50,000 files I can "search" and (for
example) build an mp3 playlist or an image collection in the time it
takes to type the constraints and hit enter. Because the original paths
are also stored as metadata this also works just dandy with system files
- for example, every debian package downloaded via synaptic gets
catalogued and then symbolically linked back into the "cache." This is
a zero-maintenance way to build a local repository.

Right now it's all command-line stuff. I was working on a python
interface but had a crash two days ago that cost me my desktop at the
time, and it's taken me until now to get this machine back up.

slocate has been a part of linux since about forever. It's stable,
fast, and mature - it's just stupid to redesign that functionality from
the ground up just so you can piggyback on the latest gee-whiz memory
gobbling design platform.
--
poptones
Matt Price
2005-12-13 08:54:58 UTC
Post by 'Forum Post
"Searching" is the problem. It's a stupid idea for a machine which you
control. I cannot tell the folks at wired where to put their pages, but
I most certainly can do this on my own machine.
but you never screw up? or get confused about which project you put
stuff in?
Post by 'Forum Post
I've been working on a project that solves this from the ground up; it
works with legacy systems via "snooping" on selected folders and
squirreling things away as they appear. Files are stored in a fixed
heirarchy and their metadata stored in an easily accessed sqlite
database (of course it could be made "real" sql if needed).
In an image containing about 50,000 files I can "search" and (for
example) buuild an mp3 playlist or an image collection in the time it
takes to type the constraints and hit enter. Because the original paths
are also stored as metadata this also works just dandy with systme files
- for example, every debian package downloaded via synaptic gets
catalogued and then symbolically linked back into the "cache." This is
a zero maintenance way to build a local repository.
hmm, well that sounds very interesting and I'd love to try it out.
How about, e.g., full-text searching of structured documents like
open-document files? I like that feature in beagle...
Post by 'Forum Post
slocate has been a part of linux since about forever. It's stable,
fast, and mature - it's just stupid to redesign that functionality from
the ground up just so you can piggyback on the latest gee-whiz memory
gobbling design platform.
hmm, to be honest I find locate utterly inadequate to the task of
searching, since it searches on names, not contents. maybe slocate is
different?
-------------------------------------------
Matt Price matt.price at utoronto.ca
History Department, University of Toronto
(416) 978-2094
--------------------------------------------
'Forum Post
2005-12-13 11:08:32 UTC
Post by 'Forum Post
"Searching" is the problem. It's a stupid idea for a machine which you
control. I cannot tell the folks at wired where to put their pages, but
I most certainly can do this on my own machine.
but you never screw up? or get confused about which project you put
stuff in?
All the time. I have a few well-named folders like "music" and "dvds"
and "tv" (all in /hollywood, of course) but I also have thousands of
image and music files that don't really fit anywhere in particular. I
also have about a dozen desktop backups and every one of these has
within it many duplicate files but also many unsorted files that were
in some "download" queue or tempdir at the time I backed them up.
Storing and sorting these will be the next step of the acid test.

The point I was making is that I should not have to concern myself with
where the file goes - so long as I describe it relatively adequately, or
am able to describe it in a "search", why should I have to worry about
where it goes on my drive? Having to mess with sorting stuff into
"folders", constantly worrying about what's updated and where this goes
now that I've added ten thousand whatevers and the old method doesn't
work anymore... and then still having to rely on a "search engine" to
interact with that stuff is just nuts - it's twice the work.
Post by 'Forum Post
hmm, well that sounds very interesting and I'd love to try it out.
How about, e.g., full-text searching of structured documents like
open-document files? I like that feature in beagle...
Of course, that's a fundamental need. You could not locate a doc by
description if you were not cataloging this stuff.

But having to screw with mono on a system, when the basis of a
completely adequate system is part of the existing operating system,
seems to me a great waste of resources on many levels. Rather than
create a user-centric "search engine" with primitive security and flaky
behavior, why not instead just build on what's already there and
stable? No, slocate doesn't index stuff by content - but the greater
point is this: even slocate works backwards.

If I save a file I cannot do so by magic. I cannot wish the file to
exist on the hard drive, nor can I retrieve it without the aid of the
operating system. Every time a file is stored on the disc it goes
through linux - nothing goes in or out without linux knowing about it.

So why does linux then have to go back and "search" for all this stuff?
Why isn't linux instead cataloging each file into a *quickly searchable*
database every time it stores that file? And why do I have to know
where that file goes?

The system has been built as it is and changing this from the ground up
is impractical. But adapting the system to perform this maintenance is
not at all impractical. You can even do this with *system* files like
those in the /etc and /var directories. Because linux has a perfectly
usable system that allows symbolic linking of resources, "active"
system files can be stored in a structure that does not fit the
/usr/var/etc paths but is still accessible in this manner.

So, for example, when I edit my /etc/fstab file why does it get
overwritten? Why doesn't linux just remember the old one and swap in
the new one? It knows I have edited the file and changed its contents -
why does this then have to be retroactively indexed?

I cannot rewrite the kernel - my talents simply are not up to the task.
But there are ways to model this behavior and that's what I'm working
with. A system like beagle can work just fine, but beagle's main
weakness is that it has terrible security and it seeks to overcome this
weakness by being built inside some "sandbox." It's an illusion of
safety that really just adds so much complexity it becomes brittle.

As an example, here's the routine that hashes and stores a file and
then catalogs basic information about it. This is a very simple
example that doesn't worry about magic numbers and plugin miners - it
just takes a given bit of metadata and a file (or collection of files)
and stores them away. It's nothing but a bash script - well-developed,
robust technology in a completely unoptimized, simple, readable and
maintainable script.

if [[ -f "$_file" ]]; then
    _hash=$(md5sum "$_file")
    _hash="${_hash:0:32}"              # keep only the 32 hex digits
    _FIL="${_file##*/}"                # basename of the source file
    # shard the store by hash-digit pairs, e.g. ab/cd/<rest-of-hash>
    fldr="${storagepath}${_hash:0:2}/${_hash:2:2}/${_hash:4:28}"
    # echo "Storing $_file at hash $fldr" >> ~/wtf.log

    if [[ -e "$fldr" ]]; then
        # already stored: log the duplicate and record the extra name
        echo "$_file $fldr" >> "${storagepath}dupes.list"
        # echo -e "folder exists\n" >> ~/wtf.log
        cmd="insert into _alias values('${_hash}','${_FIL}')"
        sqlite "${storagepath}meta.db" "$cmd"
    else
        mkdir -p "$fldr"               # -p creates the shard dirs too
        cp "$_file" "$fldr"
        _SIZ=$(stat -c "%s" "$_file")  # size in bytes
        _DATE=$(stat -c "%Y" "$_file") # mtime, seconds since the epoch
        cmd="insert into _files
            values('${_hash}','${_FIL}','${_FIL##*.}','${_SIZ}','${_DATE}',1,'archive','${_DATE}')"
        sqlite "${storagepath}meta.db" "$cmd"
        # echo -e "files info stored: $cmd \n" >> ~/wtf.log
        cmd="insert into _meta
            values('${_hash}','<keywords>${keywords}</keywords><originalpath>${_file}</originalpath>')"
        sqlite "${storagepath}meta.db" "$cmd"
        # echo -e "meta info stored: $cmd \n" >> ~/wtf.log
        # rm -f "$_file";
        chmod 444 "${fldr}/${_FIL}"    # archived copies are read-only
    fi
fi
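
(The script assumes its three tables already exist; nothing above shows
their column names, so the create statements below are only a guess
that lines up with the insert statements:)

# hypothetical schema inferred from the inserts above
sqlite "${storagepath}meta.db" "
create table _alias(hash text, name text);
create table _files(hash text, name text, ext text, size text,
                    mtime text, copies integer, class text, added text);
create table _meta(hash text, xml text);"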

In a structure of about 50,000 files the metadata folder is, at
present, less than 30MB. This can be backed up separately and could
also be exchanged with others. Since info about the files is stored
along with their unique hash (and of course md5 can be replaced with
any other) the system can quickly and easily decide if a given file is
present, for example in file sharing applications - just look up the
hash and then see if the files located there match. Because all this
isn't "built into the filesystem" it can be used on any filesystem and
can be maintained with existing, mature and human-friendly tools.
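
(Concretely, that presence check needs nothing beyond the hash itself;
$somefile below is a placeholder for the candidate file:)

# hash the candidate file, then test whether its shard folder exists
_hash=$(md5sum "$somefile"); _hash="${_hash:0:32}"
fldr="${storagepath}${_hash:0:2}/${_hash:2:2}/${_hash:4:28}"
[[ -e "$fldr" ]] && echo "already stored at $fldr"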

Simple example: when downloading files from usenet I no longer even
look at a dialog box; the system itself monitors my incoming usenet
folder and, when it sees a new file appear there, it locates the cached
text, copies the fields I have specified (posted by, group, date,
subject line, x-ref number), then hashes and stores the file and its
metadata. Putting together a playlist involves describing the music or
files I want without having to be concerned about the location of the
data.
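
(A playlist built that way might look like the sketch below; the column
names hash and xml are guesses, since the script above never names its
columns:)

# hypothetical column names; pulls keywords out of the stored metadata
sqlite "${storagepath}meta.db" \
    "select hash from _meta where xml like '%<keywords>%blues%';" |
while read _hash; do
    ls "${storagepath}${_hash:0:2}/${_hash:2:2}/${_hash:4:28}"/*
done > blues.m3u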

On the "hunter-gatherer" backend, if a "new and improved" version of a
file is posted the system can instantly tell simply by comparing the
posted filesizes to the files it already has. Because the metadata is
more comprehensive than just filenames (as in slocate) it can be
smarter about telling the difference between, say, Al Cooper and Alice
Cooper. But because it never -replaces- a file but only adds more,
mistakes are easily corrected. This would allow building "smart agents"
that can pool the array of resources available to a desktop machine (web
search, p2p, torrent, usenet, irc etc) in "tivo like" fashion. The more
data it collects the more it knows about your tastes and the better it
is able to find other relevant data for the owner. And because it uses
existing security models this can all be built to whatever level of
paranoia the eu happens to feel prudent.
--
poptones
Matt Price
2005-12-13 12:04:32 UTC
wow.

this is pretty rad stuff.

I agree with you about mono, which I would rather not have to deal
with. And I also agree it'd be great to have this stuff built into
the os -- I hear reiser4 has some similar capabilities, maybe?
would be cool if it did.

anyway gotta hop on a plane, but thx for this & keep me informed as you
move through this project! very cool indeed.

matt
-------------------------------------------
Matt Price matt.price at utoronto.ca
History Department, University of Toronto
(416) 978-2094
--------------------------------------------

Niran Babalola
2005-12-12 23:47:56 UTC
Smart folders will apparently be in the next version of Nautilus.
http://blogs.gnome.org/view/alexl/2005/12/07/0
--
Niran Babalola
http://niran.org
Matt Price
2005-12-13 01:06:06 UTC
Post by Niran Babalola
Smart folders will apparently be in the next version of Nautilus.
http://blogs.gnome.org/view/alexl/2005/12/07/0
wicked, thanks,
matt

-------------------------------------------
Matt Price matt.price at utoronto.ca
History Department, University of Toronto
(416) 978-2094
--------------------------------------------
Phillip Susi
2005-12-13 05:01:47 UTC
Hah, what a coincidence. I was just grumbling about how updatedb runs
every day when I boot up and spins the hell out of my hard drive for 5
minutes. I thought, "hey, doesn't the kernel have a mechanism these days
to notify a monitoring daemon about changes in the filesystem? Why
doesn't someone write a program that monitors for the changes and updates
the db rather than rescanning the entire filesystem each day?"

It sounds like that's kind of what this beagle thing does... I'll have
to check into it.
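
(The kernel mechanism is inotify, merged in 2.6.13; a minimal sketch of
that monitor idea, assuming the inotify-tools package is available, is
just:)

# log each create/delete/move as it happens instead of rescanning daily
inotifywait -m -r -e create,delete,move /home |
while read dir event file; do
    echo "$(date +%s) $event ${dir}${file}" >> ~/.filesystem-changes.log
done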