Radified Community Forums
http://radified.com/cgi-bin/yabb2/YaBB.pl
Rad Community Non-Technical Discussion Boards >> YaBB Forum Software + Rad Web Site >> Apache file-extention mojo
http://radified.com/cgi-bin/yabb2/YaBB.pl?num=1208670491

Message started by Rad on Apr 20th, 2008 at 12:48am

Title: Apache file-extention mojo
Post by Rad on Apr 20th, 2008 at 12:48am
Sent WiredTree tech support the following ticket today:


Quote:
Hi.

I'm sure you guys have seen how some sites reconfigure their file extensions in cool, customized ways, such as:

http://arstechnica.com/index.ars

http://arstechnica.com/news.ars

Heard this was done in Apache, a black art about which I know nothing.

I would like to configure my home page:

http://radified.com/index2.html

to be synonymous with: index.rad (cool, huh?)

but I have zillions of pages currently pointing to index2.html .. so that file would have to point to index.rad

Is this something you guys could help with?

Or something I need to research on my own?

Title: Re: Apache file-extention mojo
Post by Rad on Apr 20th, 2008 at 12:51am
Response:


Quote:
This would be something you would have to research on your own.  A point in the right direction would be using mod_rewrite to alias .rad to any extension in question.

You can rename all of your files to .rad and we can add that handler into Apache, but overall that's a huge pain.  Mod_rewrite will be much cleaner.

Magoo? Nigel? Help?

Title: Re: Apache file-extention mojo
Post by Rad on Apr 20th, 2008 at 1:45am
Q: What is the chance I screw something up and render the entire site useless?


Quote:
Just remove the mod_rewrite rules from the .htaccess file and you will be back to normal.

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 20th, 2008 at 7:33am
Well, you have lots of options here, depending on where you put the config and how you want to do it. By the way, the advice you were given about renaming or copying the files from .html to .rad is incorrect; it's hardly difficult (it's a simple command), but the real issue is long-term site maintenance; it's something that becomes a problem long-term as you edit pages.

There are lots of subtleties with mod_rewrite and whether the directives are present in a .htaccess file or globally in httpd.conf - if you've got AllowOverride already configured in the httpd.conf so that you can set mod_rewrite directives in the .htaccess files, then you just need the .htaccess file to contain these two lines:

RewriteEngine On
RewriteRule (.*)\.rad $1.html

Just as a side thing, the best advice I can give you for learning web configuration, and for experimenting with dynamic content, is to get a private copy of Apache on which you can experiment; it's your choice whether to grab a copy of something like XAMPP or run it on the same OS as your webhost uses. Most web development issues are platform-independent so it's much of a muchness, except that running the same configuration as your main host means that it's just a matter of copying things up from the test environment to publish them live.

Having a private test server doesn't matter much for static content, but if you have any interest in going into dynamic content (writing PHP scripts, for instance) then it's far better to develop on a local instance.

The other thing that I recommend is using a source-code control system like Subversion for the website as well, so that your test copy of Apache (especially if it's in a VM) is completely disposable, and you can recreate an entire website and its configuration just by checking out a working copy.

Title: Re: Apache file-extention mojo
Post by MrMagoo on Apr 20th, 2008 at 9:08am
I agree with Nigel that mod_rewrite is the way to go if you want to do this.  The only thing you have to think about is that you can't leave it on very long and still have the option to turn it off.  Once people start linking to index.rad, you will break their links if you turn mod_rewrite off.

I also agree that it's REALLY nice to have a development server to play around with.  You can test everything without messing up your actual website.  Most of this stuff isn't much fun to read about but it's fun to play with on the server.  Doesn't take anything special - any old 500 MHz computer with 512 MB RAM will run a command line server sweetly.

Title: Re: Apache file-extention mojo
Post by Rad on Apr 20th, 2008 at 10:09am
Hi. Thanks.


Nigel Bree wrote on Apr 20th, 2008 at 7:33am:
if you've got AllowOverride already configured in the httpd.conf

From my host support:


Quote:
It's always set to AllowOverride All on cPanel servers.


Nigel Bree wrote on Apr 20th, 2008 at 7:33am:
you have lots of options here

That's what I'm looking for. Do I know the options and their respective pros and cons?

Specifically, here's what I know so far .. is that I have a file in root directory named index2.html, which I would like to be/make synonymous with index.rad .. for purely cool, cosmetic reasons .. and as a learning experience.

I don't use any *.php files, except for one where I was .. playing/learning about the basics of PHP. PHP will come later, but I still have mastering more CSS on my plate.

Learning subversion sounds interesting for the future, but may (since we are still primarily static) increase my learning curve too much too quickly. (Especially when I still have lots of legal stuff I'm dealing with here .. of a non-website nature.)

I already have a copy of WAMP-Server installed on my laptop:

http://www.en.wampserver.com/

.. tho I have v1.7.1 installed, and I see that WS v2.0b has been released. I think my host is good about maintaining updated versions of the OS/software installed. Looks like I installed WAMP ~ a year ago, last May:

http://blogs.radified.com/2007/05/wamp5_root_password_mysql_database_phpmyadmin.html

Do you think that will be okay? (WAMP vs XAMPP, v1.71 vs 2.0b) Our server uses Linux - CentOS 5, Apache 2.2.8, MySQL 5.0.45-community, PHP 5.2.5.

Is there any other info you need to help me make a wise decision?

I have zillions of pages currently linking to index2.html. Would I have to keep *duplicate* copies of BOTH index2.html and index.rad maintained? That would suk. I am familiar with maintaining two different copies of the same page and don't like it at all.

I would like to be able to delete index2.html, and simply update index.rad .. tho I would always need a pointer from index2.html to index.rad, since I could never be sure I update every last file which contained a link to index2.html, and search engines all know index2.html (and not index.rad).

So, I'm trying to get a feel for the details of how this would work.


Quote:
the real issue is long-term site maintenance; it's something that becomes a problem long-term as you edit pages

Can you spell this out for me?

My other question is .. does this affect ALL *.html files? Or just the html files in the directory containing the .htaccess file? Ideally, it would (at first) affect JUST the index2.html file (maximum control)

Does it turn index2.html into index.rad or do I actually have to upload a file named index.rad?

That's enuf questions for now (but I have more). I apologize if they sound silly. But I am clueless.

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 20th, 2008 at 7:47pm
I'll reply in multiple messages since I have to grab moments of free time here and there, but I'll get to everything eventually.


Rad wrote on Apr 20th, 2008 at 10:09am:
Learning subversion sounds interesting for the future, but may (since we are still primarily static) increase my learning curve too much too quickly

Fair enough, and I agree, but let me just say that getting into the habit of using a change-management system is one of those things where the benefits really accrue over time. It's sorta like brushing your teeth, in that doing so each day doesn't make a huge difference but over time it adds up, even if you're just working solo.

The thing is that there's no harm in starting - when I got into the habit about 20 years ago with RCS, all it took to start was basically the same thing you have now with Subversion; install it, do an initial checkin, and largely you can forget about it since all you do is periodically check more stuff in and never look at it. The fact that you're capturing that change history means that when you do have the time to start learning more about the change-management tools, you'll have some valuable history already captured.

One particularly nice thing is that Subversion clients typically have all the tools to work locally with a repository, so (like RCS back in the day) there's no need to set up a server to get started. A particularly fun one to use on Windows is TortoiseSVN, which gives you all the tools added to the Windows desktop as a really unobtrusive context menu extension.

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 20th, 2008 at 9:52pm

Rad wrote on Apr 20th, 2008 at 10:09am:
Do you think that will be okay? (WAMP vs XAMPP, v1.71 vs 2.0b) Our server uses Linux - CentOS 5, Apache 2.2.8, MySQL 5.0.45-community, PHP 5.2.5.

It should be fine in terms of WAMP vs XAMPP, and for basic stuff there's not a huge difference between Apache 1.x and 2.x, although it's probably best to use something based around Apache 2.x since the world is basically switching to it.

This is where VMs shine; if you want to experiment with Apache2 on a regular CentOS distro (or any other really, there's basically no difference between Apache2 installs on any UNIX, with the notable exception of MacOSX which has radically different conventions for where things go and how services are configured) just download an appliance you can use with VMWare Player or create your own in something like VirtualBox. It's not either/or any more when running a whole 'nother OS in a sandbox is as easy as double-clicking a shortcut.

It's probably not a good use of your time installing any Linux distro on an old physical box to start with, unless you have a burning need for a server that is running 24/7 separate from your primary Windows desktop - use VMs instead. It's vastly more convenient, almost certainly way faster (given that your main desktop is probably multicore), and you can get preinstalled VM appliance versions of every distro imaginable - you can experiment with any or all of the distributions that way until you decide on one you particularly like. Even if you have a favourite client OS, at that point it's still usually better to stick with VMs - if you install your own with VirtualBox, for instance, the free version of that lets you take snapshots like the paid version of VMWare Workstation, and snapshots mean that you can play with absolutely anything at all and just insta-revert the snapshot if it doesn't work how you like.

[ This is what makes having an SVN repository on your primary machine (VM host) particularly helpful. I periodically need to do stuff that runs on a variety of Linux distros *and* MacOSX *and* different Windows editions *and* DOS *and* OpenSolaris, so I just rotate between them. Fire up one, sync from the repository, work for a while, commit back, fire up the next OS, rinse, repeat. Rather than dealing with the tedium of continuously patching them all since there's so much version churn, it's honestly easier to build them and keep them stable, and just build a new VM periodically as new major releases of various things drop out. ]

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 20th, 2008 at 11:54pm

Rad wrote on Apr 20th, 2008 at 10:09am:
I have zillions of pages currently linking to index2.html. Would I have to keep *duplicate* copies of BOTH index2.html and index.rad maintained? That would suk. I am familiar with maintaining two different copies of the same page and don't like it at all.

That's just one strategy, and it's more a temporary one. The thing with a re-organization like this is that it takes time; time to plan, time to make the changes, time to test, and then finally deploy. The problem isn't dealing with the before and after of the re-org, it's that it's hard to stop the world completely during that time. So a really good transition plan still lets you make content edits during the transition and not have them wind up either lost or fouling up the transition.

[ Incidentally, another plug for change-control. Subversion has the rare virtue amongst change-management systems of understanding things like file renames natively. If you have two branches, say a "live" branch in which index2.html exists as now, and an "experimental" one in which it has been renamed to index2.rad, you can maintain the "live" one indefinitely while you figure out the experimental one - as long as you tell Subversion about the renames, it will know that the two files with different names represent the same underlying thing and can help propagate edits between the two. ]

Now, in UNIX the traditional tool for this kind of process when applied to static content is the symbolic link; you'd migrate in stages by first (using a simple shell script) creating symlinks under the new name for all the existing files, so that during the transition time either name works. Then you can migrate the internal hyperlinks so that you consistently use the new name form, and finally you exchange the real file and the symlink (again, a simple shell script). At that point the symlinks can sit around as long as they need to.

What you're doing using mod_rewrite is essentially the same thing as a symlink, it's just that because mod_rewrite can work on patterns, it means that Apache is capable of doing the same thing via the RewriteRule patterns as a shell script would have done if you were doing it via the filesystem. Also since Apache is doing the work at the level of the URL namespace, it's a technique that works on non-UNIX platforms, and you don't have to worry about clearing out the individual redirects at the end either. The overall process isn't actually that different, it's just different mechanisms.
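For anyone who wants to see the first symlink stage concretely, here is a minimal sketch. It is written in Python rather than as the shell script mentioned above, and the function name, the default extensions and the choice of relative links are all assumptions made for illustration, not something from this thread.

```python
from pathlib import Path

def add_alias_symlinks(root: Path, old_ext: str = ".html", new_ext: str = ".rad") -> list[Path]:
    """Create name.rad -> name.html symlinks beside every .html file under root.

    Hypothetical helper: existing aliases are left alone, so it is safe to re-run.
    """
    created = []
    # sorted() materializes the listing before any new files are created.
    for target in sorted(root.rglob(f"*{old_ext}")):
        if target.is_symlink():
            continue  # only alias real files, not other links
        alias = target.with_suffix(new_ext)
        if alias.exists() or alias.is_symlink():
            continue  # something already has the new name
        # Relative link, so the tree still works if it is copied elsewhere.
        alias.symlink_to(target.name)
        created.append(alias)
    return created
```

Calling add_alias_symlinks(Path("htdocs")) on a test copy of a site (the directory name is just an example) would leave both index2.html and index2.rad readable during the transition. Whether Apache follows the links depends on the server's FollowSymLinks setting.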


Rad wrote on Apr 20th, 2008 at 10:09am:
I would like to be able to delete index2.html, and simply update index.rad .. tho I would always need a pointer from index2.html to index.rad, since I could never be sure I update every last file which contained a link to index2.html, and search engines all know index2.html (and not index.rad).

A word of warning about that; if you want to physically use .rad, remember that not only will you have to teach Apache that .rad has a mime-type of text/html (which is easy enough using the AddType configuration directive), but you'll face that same problem with any tool you ever edit or associate the content with in future.

With Windows at least it's not too painful to edit the file associations so you can associate .rad with some editor, but you'll also need to be confident that whatever editor you use is going to be happy with .rad as HTML content, and there could well be little annoyances that plague you from now until doomsday in all kinds of tools.

For instance, since I have a copy sitting on my desktop to try, I just tried KompoZer. It's got a hard-wired notion of what file extensions mean HTML (so the default open dialog won't show you any .rad files until you type in *rad, which you'd need to do every single time), and it really gets unhappy if you ask it to work with a file that has a non-HTML extension. Visual Studio lets you customize the editing experience you need, but you can't customise the File Open dialog filters so it knows that ".rad" is web content.

That's just a couple of tools, and maybe those don't matter to you, and maybe all the tools that do matter to you (or will do in the future) will be able to be taught to associate .rad with HTML content. Even so, it's worth bearing in mind that there's always a pretty big risk whenever you move away from any really well-established convention.
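The same "teach the tool" step shows up in scripting languages too. As a small illustration (my own example, not a tool mentioned in this thread), Python's standard mimetypes module can be told about .rad in roughly the way AddType tells Apache:

```python
import mimetypes

# Register .rad as HTML in this process's MIME registry, much as
# "AddType text/html .rad" would do for Apache.
mimetypes.add_type("text/html", ".rad")

content_type, _encoding = mimetypes.guess_type("index.rad")
print(content_type)
```

The registration only lasts for the running process, which is rather the point: every tool that touches the files needs its own version of this step.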

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 1:49am

Rad wrote on Apr 20th, 2008 at 10:09am:
My other question is .. does this affect ALL *.html files? Or just the html files in the directory containing the .htaccess file? Ideally, it would (at first) affect JUST the index2.html file (maximum control)

What tends to happen during Apache's URL-processing pipeline is that it traverses the URL component by component; at each stage, it checks the .htaccess file for the part of the URL it has resolved so far, and if that causes a rewrite it can affect the rest of the URL. Each .htaccess file can't generally change any part of the URL that's been processed in getting to the current directory, but it can affect anything to the right within the URL.

So, the .htaccess file in the root directory can apply rules that apply anywhere. For instance, it so happens that the regular expression I gave you turns any URL ending in .rad into a .html one, even in subdirectories, because the .* pattern in a regular expression means "any characters" and that will consume the entire rest of the URL in order to find a match - that is inside parentheses though, so the $1 on the right-hand side which says how the URL is rewritten puts them all back.

So, if we start with http://host/foo/bar/index.rad, by the time the .htaccess in the root gets to go, we're left with foo/bar/index.rad (Apache having already processed the host part) and that gets rewritten to foo/bar/index.html, because the .htaccess file in the root dir has the first chance at rewriting the URL and the example rule I gave you matched it. After doing that, it then resolves the "foo" and processes the .htaccess file in that directory, which can then add more rules, and then does the same thing with "foo/bar" (looking for .htaccess, processing any rules in it, etc), and finally if there's an actual file called foo/bar/index.html it'll serve that up.
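To see what that rule does without touching a server, the matching can be modelled in a few lines of Python. This is only a rough sketch: apply_rewrite_rule is a made-up helper, Python's re module stands in for Apache's regex engine, and it ignores flags, RewriteCond and the directory-by-directory processing described above.

```python
import re

def apply_rewrite_rule(pattern: str, substitution: str, path: str) -> str:
    """Rough model of one RewriteRule (hypothetical helper, not Apache code).

    If the pattern matches anywhere in the path, the whole path is replaced
    by the substitution, with $1..$9 filled in from the captured groups.
    """
    match = re.search(pattern, path)
    if match is None:
        return path  # rule does not apply; path passes through unchanged
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), substitution)

# The general rule from earlier in the thread: RewriteRule (.*)\.rad $1.html
for path in ["index.rad", "foo/bar/index.rad", "about.html"]:
    print(path, "->", apply_rewrite_rule(r"(.*)\.rad", "$1.html", path))
```

In this model foo/bar/index.rad comes out as foo/bar/index.html, because (.*) swallows the directory part and $1 puts it back, while about.html passes through untouched.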

It's all down to either wrangling the regular expressions that match URLs so they capture exactly the amount you want, and/or adding RewriteCond rules that look at other things to decide whether the rewrite occurs. If I changed the regular expression to being this:
 RewriteRule ([^/]*)\.rad $1.html
this means that the pattern only matches things that don't have slashes in them. However, this pattern will look throughout the whole URL to find a match, and so what it'll do is that it'll skip over everything in the incoming URL up to the last /, and then the pattern in the parens will match the last part of the path before the .rad - so in this case, the substitution will end up transforming foo/index.rad into index.html and strip out the path completely.

I can use the "anchor" character in the regular expression to insist that the pattern not go romping down the whole rest of the URL, like this:
 RewriteRule ^([^/]*)\.rad $1.html
where the leading ^ is a metacharacter that means "start of string", so this will only match files in the current directory and leave other URLs referring to subdirectories alone; this rule will match http://host/foo.rad and convert it to foo.html but it will leave http://host/foo/bar/foo.rad completely alone.

A RewriteRule that just did this:
 RewriteRule ^index\.rad index2.html
just matches that one file, since it has no wildcard characters in the pattern, and the use of the anchor ensures that this particular pattern only applies in the root and won't capture anything in any subdirectories.
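The three narrower patterns can be compared side by side with a small Python model. Again this is a sketch under assumptions: apply_rewrite_rule is a hypothetical helper, Python's re module approximates Apache's regex behaviour, the dots are written as \. so they only match a literal dot, and the paths are what a root .htaccess would see after the host part is stripped.

```python
import re

def apply_rewrite_rule(pattern: str, substitution: str, path: str) -> str:
    """Rough model of one RewriteRule (hypothetical helper, not Apache code)."""
    match = re.search(pattern, path)
    if match is None:
        return path  # no match: the path is left alone
    # A match replaces the whole path with the substitution, filling in $1 etc.
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), substitution)

rules = {
    "unanchored": (r"([^/]*)\.rad", "$1.html"),
    "anchored": (r"^([^/]*)\.rad", "$1.html"),
    "single file": (r"^index\.rad", "index2.html"),
}
paths = ["foo.rad", "foo/index.rad", "index.rad"]

for name, (pattern, substitution) in rules.items():
    for path in paths:
        print(f"{name:12} {path:15} -> {apply_rewrite_rule(pattern, substitution, path)}")
```

In this model the unanchored rule turns foo/index.rad into index.html (the path is lost), the anchored rule leaves it alone, and the single-file rule only touches index.rad. None of these patterns ends with $, so a name such as index.radio would also match; appending $ to the pattern would tighten that up.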

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 2:14am

Rad wrote on Apr 20th, 2008 at 10:09am:
Does it turn index2.html into index.rad or do I actually have to upload a file named index.rad?

The first example rule I gave you matches incoming URLs ending in .rad and converts them internally into .html, so index.rad got rewritten to index.html and then Apache (re-)consulted the filesystem to see if that existed. It would then serve up whatever was in index.html, and it would also rewrite xyzzy.rad into xyzzy.html (and so forth) because I used a quite general pattern. Note that if the rule ended up mapping into something that didn't exist, the default 404 ErrorDocument would tend to report the rewritten URL, not the original one (this is a quirk of .htaccess files, explained in the tech notes).

If you want to go the other way, just flip around the pattern and substitution. Doing it just for the index2 case gives us
 RewriteRule ^index2\.html index.rad
which will make Apache turn index2.html into index.rad, which Apache then tries to serve out of the filesystem.

Now, the final quirk is that Apache generally gives higher priority to static files than rewrite rules in .htaccess files; if we had a RewriteRule that converted index2.html into index.rad, but there was a file present called index2.html, Apache would find that and serve it up before it went looking for rewrite rules.

This is a strange quirk that results from the internal ordering of the operations in Apache and how far it's had to process the URL to convert it into a filesystem path before even getting to the stage of looking in a .htaccess file - see the technical notes for some other quirks resulting from this. Rewrite rules that have been hoisted out of .htaccess into the global httpd.conf don't have to live by the same rules, but then you have the different problem of usually needing to restart Apache if you tweak the httpd.conf since it caches what's in that basically forever (whereas it processes the .htaccess files only on-demand and doesn't cache their content very much - so you can change them at will but they get reprocessed a lot).

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 3:35am
All this is assuming, by the way, that you're going to keep the content much as it is now (static HTML), which I wouldn't. What would I do?

Well, your study of CSS gets you a long way toward that. Once you are doing all the fancy styling from CSS, rather than using table-heavy layout, you really don't need much more than a nice-to-edit syntax like Markdown to do the main content generation and most everything else is custom CSS. Honestly, HTML is an awful format for content - I'd pretty much always choose to use a much saner source syntax, and let a Wiki-type engine with a reasonably sane basic markup engine (e.g., DokuWiki if you want to use something off-the-shelf) handle converting the source syntax into HTML.

That way you get through-the-web editing, decent content change history (HTML makes for butt-ugly diffs), the ability to totally customize your internal URLs (because they use a separate, much simpler syntax than external references), and all kinds of good stuff even if you don't use the rest of the Wiki-type functions.

If you don't want to serve the content dynamically (if you're really scared of the resource usage, say), you can still get the benefits of using something like a wiki yourself - keep your wikified environment on a private server, and have it publish the entire thing to static pages that you sync to the public server.

Of course, there's the problem of getting your existing content into such a system, but depending on the system there are ways and means, and actually with many of these Wiki-type systems they rely on mod_rewrite to convert incoming URLs into CGI Queries so (as discussed above) any existing static content takes priority over dynamic content, meaning you can convert at your leisure.

For our internal DokuWiki I wrote some JavaScript that converted HTML to wikitext for page import, and for grins since I recently wanted to host some Open-Source projects on Google Code but their wiki system is pretty awful, I wrote some JavaScript that actually does the whole wiki->HTML conversion in the client browser, on the fly. Since it's JavaScript, it runs not just in a browser, but also in Windows Script Host so it can do bulk operations on entire directories of source files. Unfortunately I'm stuck since having done that much, it's been taking months to get it signed off as a personal project (since in theory I need explicit permission before I start in order to avoid it becoming Symantec's property).

Title: Re: Apache file-extention mojo
Post by Rad on Apr 21st, 2008 at 12:47pm

Nigel Bree wrote on Apr 20th, 2008 at 7:47pm:
but I'll get to everything eventually

Thanks, Nigel. I feel fortunate to benefit from your experience/expertise. Your messages are rich, so I'm reading them multiple times, to digest.


Nigel Bree wrote on Apr 20th, 2008 at 7:47pm:
The thing is that there's no harm in starting

You make a compelling argument. Downloading now. I like that the TortoiseSVN subtext contains the word "cool". My biggest hesitation is that I know nothing about SVN, which makes it seem complex, and therefore time-consuming to learn. You seem to suggest I can start now and learn as I go .. a concept I find more palatable.


Nigel Bree wrote on Apr 20th, 2008 at 7:47pm:
One particularly nice thing ... no need to set up a server to get started.

I like that. This seems to imply that I will be using a server "out there" somewhere, then? No? Update: in the help I found this:


Quote:
http://tortoisesvn.net/node/90

Tortoise SVN can create a repository for you, as long as it's a file based one. I am not sure if it can create a server based one.

If you are using Tortoise SVN in file based mode, just right click on a folder you want to turn into a repository and select "Create Repository here" from the Tortoise SVN menu.

The phrase "change management" brings up philosophical implications, when applied to human beings.

I will read & respond to the rest of your replies in separate posts.

Title: Re: Apache file-extention mojo
Post by Rad on Apr 21st, 2008 at 2:50pm

Nigel Bree wrote on Apr 20th, 2008 at 9:52pm:
best to use something based around Apache 2.x

My version of Apache on WAMP 1.7.1 installed last May = Apache 2.2.4
PHP 5.2.2 (this site uses 5.2.5), MySQL 5.0.37


Nigel Bree wrote on Apr 20th, 2008 at 9:52pm:
just download an appliance you can use with VMWare Player  

Had to look that up: http://www.vmware.com/products/player/

Do I need VMWare to use VMWare Player?

The VM world is another place where research is required to get up to speed, but I've noted how you and others (Pleo?) have praised it, so it's on my list of things to learn. As of right now however, I'm still foggy on how it works, exactly.

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 3:08pm
The way Subversion is set up, where the "repository" - the master copy of the history of everything you put into it - lives is identified by a URL, and that URL comes in all kinds of variations.

Subversion can work with dedicated Subversion servers (if you ever install Cygwin, for instance, you get the same command-line version that you get for UNIXes, which includes a server called "svnserve") that use a custom protocol and URLs of the form svn://host or svn+ssh://host

You can also export the repository over HTTP and HTTPS using an Apache2 plug-in called mod_dav_svn, and that means you work with things in regular http: or https: URLs - Google Code Projects use this, as does SourceForge and other hosts all over the web.

Or, you can just refer to the place the repository is stored on a local disk using a file:/// URL. And you can do all three, in fact.

To get started using TortoiseSVN, make an empty directory somewhere (say, C:\svn) and right-click on it. One of the context menu options is "Create repository here...", and that populates an empty directory with a blank Subversion database.

From now on, that means a URL like file:///c:/svn refers to the whole repository, and URLs like file:///c:/svn/stuff/working/ refer to a subdirectory within the repository.

To load stuff into a repository, right-click on a directory and choose "Import...", then enter a URL where you want it to go - the first part of the URL pointing at the repository, and the rest picking a (virtual) subdirectory within it, so file:///c:/svn/ and then tack on anything you like. There are all kinds of "proper" conventions for what to call things, but those are just conventions and Subversion doesn't actually care what you do that much.

Now if you right-click on an empty folder or the desktop, you can "check out" something, and pick some of the content you checked in. What happens is that it gets copied out for you to then start editing, and in addition the Subversion client remembers where you got it from, and what version you have.

When you have a working copy subdirectory that contains stuff which it knows is all up to date, Tortoise shows the subdirectory with a big green tick. If you edit a file, it changes to a red exclamation mark to let you know you should update the content at some point via a "check in", and if you add new files you can tell it to grab all the newly-added files and throw them in as well.

Whenever you make a change you can put a summary note describing what you did so that you can figure out what you were doing (pretty handy when you're looking at stuff in a year's time), and that's pretty much all you need to know for starters. You can have as many working copies of things as you like, and you can just delete them if they don't contain anything you want to keep.

As long as all you do is change and add things, that's all you need to know. There's plenty of useful things you can do just with that, but they'll come about once you start changing things - you can use the "diff" tool to let you know what you've changed in a file recently (since it was last checked in), and the "Show log" tool to view the history of all the edits you've made as you check them in, and you can wind things back to not just the last version, but any earlier one.

It's a trivial thing, but like I said it's just a habit. It's like making little postings to a personal blog as you make changes to the files.

Title: Re: Apache file-extention mojo
Post by Rad on Apr 21st, 2008 at 3:12pm

Nigel Bree wrote on Apr 20th, 2008 at 11:54pm:
The thing with a re-organization like this is that it takes time

Hmmm. I was thinking I could make one change and blammo > we're rocking with cool, rad web pages. At this point, I fail to see what will take time. (I'll continue reading.)


Nigel Bree wrote on Apr 20th, 2008 at 11:54pm:
so that during the transition time either name works

Okay. This makes sense. Either name is what I want, for reasons previously mentioned (pre-existing links and search engines).


Nigel Bree wrote on Apr 20th, 2008 at 11:54pm:
Now, in UNIX the traditional tool for this kind of process when applied to static content is the symbolic link; you'd migrate in stages by first (using a simple shell script) creating symlinks under the new name for all the existing files, so that during the transition time either name works. Then you can migrate the internal hyperlinks so that you consistently use the new name form, and finally you exchange the real file and the symlink (again, a simple shell script). At that point the symlinks can sit around as long as they need to.

This paragraph seems to be the meat-n-potatoes of what I'm after. I see what you're saying but don't understand the underlying technology to accomplish it.

I'd just like to start with one file (the home page). Is that possible, or does it open a site-wide can o' worms once I start?


Nigel Bree wrote on Apr 20th, 2008 at 11:54pm:
A word of warning about that; if you want to physically use .rad, remember that not only will you have to teach Apache that .rad has a mime-type of text/html (which is easy enough using the AddType configuration directive),

Uh, "warning," .. that didn't sound good. Maybe what I want to do (a better approach), would be to KEEP using index2.html and work all the Apache mojo from the Apache end. My aim/desire here is to maintain ONLY ONE FILE (not two). However that is accomplished doesn't matter. Whatever you would recommend as the more elegant and practical approach is fine by me.


Nigel Bree wrote on Apr 20th, 2008 at 11:54pm:
could well be little annoyances that plague you from now until doomsday

I already have plenty of little annoyances in my life. What would you recommend as the best option?

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 3:17pm

Rad wrote on Apr 21st, 2008 at 2:50pm:
Do I need VMWare to use VMWare Player?  

Nope. Player *is* VMWare, it's just a simple edition that doesn't let you make completely new virtual machines from scratch or do some of the sophisticated snapshot operations that the full version does.

Just download Player, install it, and pick up an existing virtual machine (called an "appliance" when set up for Player). Let's say you want to try Ubuntu: just download an appliance (unpack it if it's zipped up), run the virtual machine by double-clicking the VMX file, and you're off!

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 3:29pm

Rad wrote on Apr 21st, 2008 at 3:12pm:
At this point, I fail to see what will take time. (I'll continue reading.)

Well, these things tend to start with wanting to make just one quick change... but it never stops with just one, does it?

As soon as you have an alias like .rad, if you like it you'll want to start preferring it (so it's what appears in bookmarks and the like), which means rewriting all the internal hyperlinks in your pages so that's the "official" name you expose to the external world. That change is the one that takes time and effort to accomplish.

And so, the dominoes start to fall....

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 3:34pm

Rad wrote on Apr 21st, 2008 at 3:12pm:
I'd just like to start with one file (the home page).

OK, then, just put this into the root .htaccess file:
 RewriteEngine on
 RewriteRule ^index\.rad$ index2.html

Do that, and then you should be able to browse to index.rad and what Apache will serve you up is what's in index2.html in the filesystem.
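
One detail worth knowing about rules like this: the first argument to RewriteRule is a regular expression, so it matters whether the dot is escaped and whether the pattern is anchored at the end. The sketch below uses `grep -E` as a stand-in for Apache's regex engine (they are different engines, but they agree on patterns this simple), and `matches` is a hypothetical helper name.

```shell
#!/bin/sh
# matches PATTERN PATH -> exit status 0 if PATH matches the regex PATTERN.
# (Hypothetical helper; grep -E stands in for Apache's own regex engine.)
matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Compare a loose pattern with an escaped, end-anchored one.
for path in index.rad indexXrad index.radical; do
    for pat in '^index.rad' '^index\.rad$'; do
        if matches "$pat" "$path"; then result='match'; else result='no match'; fi
        printf '%-14s against %-14s : %s\n' "$pat" "$path" "$result"
    done
done
```

The loose form also catches names such as indexXrad and index.radical, while the escaped and anchored form matches index.rad only.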


Quote:
Maybe what I want to do (a better approach) would be to KEEP using index2.html and work all the Apache mojo from the Apache end.

That's definitely the best. If a file contains HTML, you should call it .html, meaning no surprises or confusion for you (or any HTML tools you use) in the future.

Title: Re: Apache file-extention mojo
Post by Rad on Apr 21st, 2008 at 3:37pm

wrote on Apr 21st, 2008 at 1:49am:
Apache's URL-processing pipeline is that it traverses the URL component by component

Mostly Greek. Read it over several times. Started seeing more light, but still over my head.

As a side note, just thinking out loud, it seems interesting that a browser can read a file named *.ars, or *.rad .. when those files are not natively supported by web browsers. Know what I mean? I mean, SOMETHING must be telling the browser that *.rad = *.html .. right?



Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 4:01pm
It just means that as it works through, say, a URL like http://radified.com/cgi-bin/yabb2/YaBB.pl?num=1208670491/15#18 Apache starts at the left-hand side and starts chewing through the URL one part at a time until it's eaten the whole thing or found something to serve up.

cgi-bin
 -> look for <Directory> sections under httpd.conf to find where the root is in the filesystem
 -> check what it is, directory or file under the root
 -> not a file, look in ~/.htaccess
 -> look for rules that apply to cgi-bin/yabb2/YaBB.pl?num=1208670491/15#18
 -> none, tick off cgi-bin
 -> look for <Directory> sections under httpd.conf to find where cgi-bin is in the filesystem

yabb2
 -> check what it is, directory or file under cgi-bin
 -> not a file, look in ~/cgi-bin/.htaccess
 -> look for rules that apply to yabb2/YaBB.pl?num=1208670491/15#18
 -> if none, tick off yabb2
 -> look for <Directory> sections under httpd.conf to find where cgi-bin/yabb2 is in the filesystem

YaBB.pl
 -> check what it is, directory or file under cgi-bin/yabb2
 -> it's a file, look in .htaccess for allow/deny
 -> it's a script, think about giving it to mod_perl
 -> give it to mod_perl, which then starts chewing on num=1208670491/15#18
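
The walk above can be approximated in a few lines of shell that list, in order, the .htaccess files consulted on the way down to the file. This is a simplified sketch, not Apache's actual algorithm: `htaccess_chain` is a made-up name, `/var/www` is a hypothetical document root, and it assumes every directory in the URL sits directly under that one root (a real cgi-bin is often mapped elsewhere).

```shell
#!/bin/sh
# htaccess_chain DOCROOT URLPATH  (hypothetical helper)
# Print the .htaccess files that would be checked, top-down, for URLPATH.
# The last path component is treated as the file being requested.
htaccess_chain() {
    dir=$1
    path=${2%%\?*}          # the query string is not part of the path
    path=${path#/}          # drop the leading slash
    echo "$dir/.htaccess"
    while [ "${path#*/}" != "$path" ]; do   # loop while a '/' remains
        dir=$dir/${path%%/*}                # descend one component
        path=${path#*/}
        echo "$dir/.htaccess"
    done
}

htaccess_chain /var/www '/cgi-bin/yabb2/YaBB.pl?num=1208670491'
```

For the example URL this prints three files: the one in the root, the one in cgi-bin, and the one in cgi-bin/yabb2.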


Edit: fix cut-n-paste error

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 4:10pm

Rad wrote on Apr 21st, 2008 at 3:37pm:
I mean, SOMETHING must be telling the browser that *.rad = *.html .. right?

Yup. That typically goes into the HTTP part of the transaction. There's an HTTP header called Content-Type which describes what kind of content the result of an HTTP GET is: text/plain, text/html, whatever.

If the content is generated by a script, it can write whatever it likes there, but for content coming from the filesystem, this is determined by a part of Apache that uses the file extensions - the mod_mime component, which uses the AddType directive, and there's one of those in effect for .html files.

There isn't a rule for .rad, but because of the way mod_rewrite works when the rewrite rules are in the .htaccess files, if mod_rewrite does anything to the URL it basically re-submits the rewritten URL to Apache to start over again. So on the second (or third, or however many tries it takes) attempt Apache eventually gets to the .html file in the filesystem, and only *then* decides it has to fake up the Content-Type: header. By that stage it's working with something called .html, so that's what it uses to look up in the mime-type registry, and Hey Presto! your browser (which doesn't care about file extensions itself) is being told "trust me, this is really HTML" by a Content-Type: of text/html
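
To see this for yourself, you can look at the response headers rather than the page. The sketch below pulls the media type out of a set of HTTP headers; `content_type` is a hypothetical helper name, and the sample response is hand-written for illustration, not captured from the real server.

```shell
#!/bin/sh
# content_type  (hypothetical helper)
# Read HTTP response headers on stdin and print the media type from the
# Content-Type header, without any "; charset=..." parameter.
content_type() {
    tr -d '\r' | awk -F': *' '
        tolower($1) == "content-type" { sub(/;.*/, "", $2); print $2; exit }'
}

# Hand-written sample headers (an assumption, not real server output).
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n' | content_type

# Against a live server, something like this should work:
#   curl -sI http://radified.com/index.rad | content_type
```

If the rewrite is doing its job, the live check should report text/html even though the URL ends in .rad.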

Title: Re: Apache file-extention mojo
Post by Rad on Apr 21st, 2008 at 5:38pm

wrote on Apr 21st, 2008 at 3:34pm:
OK, then, just put this into the root .htaccess file:
RewriteEngine on
RewriteRule ^index\.rad$ index2.html

Okay, I'm going to try this now.

I am wondering if I load page index2.html, will it still load page index2.html, or will it change to index.rad on-the-fly?

It would be cool if every time somebody loaded index2.html, they got index.rad .. and also (of course) if they loaded index.rad, they got index.rad

I love this stuff. I really appreciate your help. If you need web space or a rad email acct or a MTOS blog .. or something I can do to reciprocate, I am eager to do so. (Same offer applies to all who help make Radified so Rad.)

Title: Re: Apache file-extention mojo
Post by Rad on Apr 21st, 2008 at 5:42pm
Woohoo!!!

http://radified.com/index.rad

That is sooo cool.

Now, can we get it to convert index2.html to index.rad on the fly?

Or is that something we shouldn't want to do?

The index.rad updates as I update index2.html right? (Update, yeah it does. Makes sense. Might have to refresh tho.)

So I can start using index.rad here as my homepage link? .. from now on? .. no foreseeable problems?

I am stoked! (Wonder why these little things get me so excited.)

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 6:32pm

Rad wrote on Apr 21st, 2008 at 5:42pm:
Now, can we get it to convert index2.html to index.rad on the fly?

Let's just be clear what you mean by this, because there are actually three things.

If you want the underlying file in the filesystem called index.rad, that will be doable, but remember my warning about the struggle that will cause? If you flip the rewrite rule around, the underlying file type coming off the filesystem will be .rad, and then Apache won't be able to use the existing mime-type rules, and so the web browsers won't see HTML any more until we fix that, and as I said that's the start of a Sisyphean slippery slope (say it 5 times fast!) with trying to "fix up" the fact that you aren't putting HTML in HTML files.

So, this is one to try in a VM rather than live. But to do it, use this in .htaccess

 RewriteEngine On
 RewriteRule ^index2\.html$ index.rad
 AddType text/html .rad

and then rename the file on the filesystem using

 mv index2.html index.rad

Now, it's the index2.html file that doesn't exist and the .rad one that does.

The other alternative way of reading your question, by the way (which I've thus far pretended doesn't exist) is "can I change all the outgoing URLs served up without having to edit my HTML files"?

The scary fact is that you can. You shouldn't, but you can.

Apache allows arbitrary filter modules to get in between the source content and the actual network phase, and one such thing is called mod_proxy_html - it works by reparsing all the HTML that's in the original content, goes hunting for URLs in all kinds of out-of-the-way places, and then spits out rewritten HTML.

You really, really, *don't* want to do this in your case. It's possible, but this is seriously a tool of the ultimate last resort. I'd bet your web hosts are sane and don't have this module enabled, because there are nasty things that can be done with it if it's misconfigured.

Edit: three things. Let me move on to the third possibility next message, since it's actually useful to know.

Title: Re: Apache file-extention mojo
Post by Rad on Apr 21st, 2008 at 6:44pm
By "on-the-fly" I mean .. reader requests (clicks on link to) index2.html, and what they receive is > index.rad

We agreed that we want the file which actually sits in directories (both on my home computer and on the server) ..to have an *.html extension.

In this way, a user would receive index.rad whether they clicked on index.rad or index2.html (but both would look like content contained in index2.html)

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 21st, 2008 at 7:04pm
The third thing you can add to the mix is a capability of mod_alias, which is to send an HTTP-level response for incoming URLs to tell the requesting web browser to use a different URL. Unlike mod_rewrite, which is sneaky and invisible, this exists to be visible (and slow, because it requires end-to-end round trips).

The Redirect directive (which comes in various flavours) matches a URL and sends back a special HTTP status code that says "not here any more, but....". You can say whether the redirection is temporary, or permanent - and if permanent, things like web spiders (oh like say, search engines) take this as a big hint to reindex things under the new preferred URL.

The URL you send people to can't be relative, it has to be a full URL. Which on the one hand has its uses, since you can use it to redirect subdomains, but on the other hand is fragile because if someone comes via www.radified.com or just plain radified.com, it'll end up forcing the issue for them.

If you take the above example which has .rad as the file in the filesystem, and then add this to the .htaccess

 Redirect /index2.html http://www.radified.com/index.rad

then if you type http://radified.com/index2.html into a browser, the browser address bar will switch to http://www.radified.com/index.rad instead (at the latency cost of an extra server round-trip).
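
The redirect is visible in the raw response: a plain Redirect with no status argument sends a 302 (temporary), and the new address travels in the Location header. The sketch below extracts both; `redirect_info` is a hypothetical helper name and the sample response is hand-written for illustration.

```shell
#!/bin/sh
# redirect_info  (hypothetical helper)
# Read HTTP response headers on stdin; print the status code and the
# Location header value on one line.
redirect_info() {
    tr -d '\r' | awk '
        NR == 1                    { status = $2 }   # e.g. "HTTP/1.1 302 Found"
        tolower($1) == "location:" { target = $2 }
        END                        { print status, target }'
}

# Hand-written sample of what a temporary redirect looks like.
printf 'HTTP/1.1 302 Found\r\nLocation: http://www.radified.com/index.rad\r\n\r\n' | redirect_info

# Against a live server, something like this should work:
#   curl -sI http://radified.com/index2.html | redirect_info
```

A 301 in place of the 302 would indicate the permanent flavour, which is the one search engines treat as a hint to reindex.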

If you're thinking next "can I use mod_alias to generate redirects for index2.html to index.rad and then use mod_rewrite to map index.rad back to index2.html" .... well, at that point you're Crossing the Streams. Just don't go there.

It can be done, but you want to do it in a maintainable and robust way and that means you are best advised to take a whole 'nother step up the ladder to the next level on the Tower of Power, and use CGI-type handling instead to completely divorce the URLs from the filesystem, at least as far as Apache proper is aware of it.

Title: Re: Apache file-extention mojo
Post by Rad on Apr 21st, 2008 at 10:03pm
Nigel, thanks for all the help today. Starting to get tired. Just got dark here .. (which means it probably just got light there).

I will review (study) your other posts tomorrow. Feel like I learned a lot today.

You rock! (I gave you + karma points.)

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 22nd, 2008 at 7:12pm

Rad wrote on Apr 21st, 2008 at 10:03pm:
Nigel, thanks for all the help today. Starting to get tired. Just got dark here .. (which means it probably just got light there).

Now that we're past the equinox (daylight savings gone here, in place for you) it's actually only 5 hours different from NZST to PDT. It's 7 hours pre-equinox (southern summer) when we're in NZDT and you're in PST. So, you wrote that at 3pm my time - been in the office 28 hours straight at that point, gotta love the final stretch before a release.


Quote:
You rock! (I gave you + karma points.)

Karma evidently didn't agree. Because I was so tired heading home last night around 8pm I forgot one of the speed traps I know to avoid and got nailed with a speeding fine. Perfect finale to what was already a pretty lousy day.

Aaaaaaaanyway, lemme know when you're ready for the next phase.

What I'd suggest as the ideal approach going forward is that you use a script to effectively provide a mirror world for your existing site. That can give you a better way of doing what mod_proxy_html does, much simpler - effectively you can have two sites in parallel (and you could in fact do this with subdomains) where one has a DocumentRoot with the custom .htaccess that refers to a script that actually grabs the source documents from the other site's DocumentRoot, and not only presents them to the world using a custom URL (where documents are .rad) but can rewrite a few of the inter-document links.

That gives the benefit of having your actual physical site structure stay as-is, perfectly normal, with HTML in .html files and normal inter-document URLs that all the standard web tools - link checkers, structure validators, etc. - are happy with, and you have a script that transforms the documents as they pass through to make the alternate view of things more seamless.

The point of using a script rather than something like mod_proxy_html (which as I said is unlikely to be available to you anyway - reverse proxying is NOT something you want to mess with) is that the latter is complex, and opaque, and designed for dealing with nasty legacy websites. By building it yourself you might miss some of the exotic corner cases, but the result is probably more useful to you and you'll understand it better.
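
As a rough idea of the link-rewriting part of such a script, here is a minimal sketch using sed. It is deliberately naive and rests on assumptions: `rewrite_links` is a made-up name, it only looks at double-quoted href attributes, and it treats any href without a colon as a same-site link, leaving anything with a scheme (http: and so on) alone.

```shell
#!/bin/sh
# rewrite_links  (hypothetical filter)
# Read HTML on stdin and write it to stdout with same-site links changed
# from .html to .rad. Links containing a colon are not touched.
rewrite_links() {
    sed -E 's/(href="[^":]*)\.html(["#?])/\1.rad\2/g'
}

# The relative link is rewritten; the absolute one passes through unchanged.
printf '%s\n' '<a href="guides/ghost.html">Guide</a> <a href="http://example.com/a.html">Elsewhere</a>' | rewrite_links
```

The source files on disk keep their .html links; only the copy sent to the browser is altered. A real version would also need to handle single-quoted attributes, src attributes and the like, which is where the exotic corner cases come in.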

Title: Re: Apache file-extention mojo
Post by Rad on Apr 22nd, 2008 at 9:00pm

wrote on Apr 22nd, 2008 at 7:12pm:
it's actually only 5 hours different from NZST to PDT

Seems hard to believe. I'll have to analyze closer on the big, 12-foot tall globe at the bank next time I stop by.


wrote on Apr 22nd, 2008 at 7:12pm:
been in the office 28 hours straight  

Actually, I'm not surprised, cuz I know you're a true professional. They're lucky to have you.


wrote on Apr 22nd, 2008 at 7:12pm:
got nailed with a speeding fine

That totally suks. I *hate* speeding tiks. Such a waste of money. Do you have these new cameras at busy traffic lights there .. that take your picture? They set the yellow-light shorter in order to increase revenue.


wrote on Apr 22nd, 2008 at 7:12pm:
lemme know when you're ready for the next phase

Thanks. I appreciate your help more than I can say. I get the Bug tomorrow AM, and this is my weekend, so I'll be playing dad 'til Sunday PM.


wrote on Apr 22nd, 2008 at 7:12pm:
What I'd suggest as the ideal approach going forward

When Nigel shares his version of "ideal" I'm listening. I think Magoo mentioned something similar to this when we were discussing the possibility of running a mirror server down-under (to lower response times for those on the other side of the world).


wrote on Apr 22nd, 2008 at 7:12pm:
a better way of doing what mod_proxy_html does, much simpler  

Hmmm. My eyebrows are raised.


wrote on Apr 22nd, 2008 at 7:12pm:
but can rewrite a few of the inter-document links

Didn't know that could be done .. altho I should realize there's probably very little that CAN'T be done .. if one knows the necessary mojo (like you).


wrote on Apr 22nd, 2008 at 7:12pm:
is that you use a script to effectively provide a mirror world for your existing site

I would obviously need help with this.

As a side note, from reading some of your previous posts in this thread, one thing that stuck out in my mind was that I DIDN'T want to do anything that increased response time, as I feel a fast-loading page is important. For example, there was one option you spelled out where it would take *two* round-trips for a reader to get a page. That didn't sound appealing to me.

Title: Re: Apache file-extention mojo
Post by Nigel Bree on Apr 22nd, 2008 at 10:31pm

Rad wrote on Apr 22nd, 2008 at 9:00pm:
Seems hard to believe. I'll have to analyze closer on the big, 12-foot tall globe at the bank next time I stop by.

Heh. Or use Google Earth. A fun thing to do is turn on the display of sunlight and set it to spin through the day, so you get an animation which shows how the day/night and night/day transitions sweep across the globe, which is a nice way of getting a feeling for how we humans really experience night and day in different parts of the world during different seasons.

Right now, when the day-to-night transition here is passing across Auckland, the sun is also setting in Alaska. On a Mercator view of the Pacific, it's a diagonal slanted with the top on the right hand side. Not a huge tilt, since we're not that far off the equinox, but still (for extra grins, wind your computer clock to June)

The night-to-day transition is diagonal on a Mercator map too, but it runs the other way. Rotate the time slider in Google Earth, and when dawn is happening in New Zealand, it's happening in Siberia, not Alaska (and going forward to June, dawn in Japan is underway).


Quote:
Do you have these new cameras at busy traffic lights there .. that take your picture? They set the yellow-light shorter in order to increase revenue.

We've had cameras since the 90's that capture people going through red lights at notorious intersections in Auckland, mainly to capture the plates.

However, basically I have four lights to go to get on the motorway from the office, and that's about it. I live a full 100km north and there is precisely one traffic light between me and home (I don't think there are any traffic lights in any of the towns in the entire Kaipara district where I live). Fun fact about Kaipara - the harbour has ~3200km of coastline. Talk about fractal geometry....


Quote:
I get the Bug tomorrow AM, and this is my weekend, so I'll be playing dad 'til Sunday PM.

Heh, fair enough. Concentrate on what matters!
