PHP and regular expressions (Read 64661 times)
Nigel Bree
Ex Member




PHP and regular expressions
May 23rd, 2008 at 8:40pm
 
Good to see you're starting on PHP, Rad. The big question though, which isn't related to PHP as such, is how comfortable are you with regular expressions? Those are the things we were actually dealing with in the mod_rewrite rules, to transform the incoming URLs to something else, but it hadn't occurred to me to ask at the time if you were familiar with them.

If there's one skill associated with PHP that I would recommend you concentrate hard on, it would be those. Web pages are just big documents, and regular expressions are so useful for wrangling strings that in practice every web-related technology uses them heavily. Being really comfortable with both reading and writing them will give you a real leg up.
 
 
 

Rad
Radministrator
*****
Offline


Sufferin' succotash

Posts: 4090
Newport Beach, California


Re: PHP and regular expressions
Reply #1 - May 23rd, 2008 at 11:46pm
 
Hi Nigel. Regular Expressions are covered in chapter 9 .. which begins like so:

Quote:
Programmers build applications that are based on established rules regarding the
classification, parsing, storage, and display of information, whether that information
consists of gourmet recipes, store sales receipts, poetry, or some other collection of
data. This chapter introduces many of the PHP functions that you’ll undoubtedly use
on a regular basis when performing such tasks.
This chapter covers the following topics:

• Regular expressions: A brief introduction to regular expressions touches upon
the features and syntax of PHP’s two supported regular expression implementations:
POSIX and Perl. Following that is a complete introduction to PHP’s
respective function libraries.

• String manipulation: It’s conceivable that throughout your programming
career, you’ll somehow be required to modify every possible aspect of a string.
Many of the powerful PHP functions that can help you to do so are introduced
in this chapter.

• The PEAR Validate_US package: In this and subsequent chapters, various PEAR
packages are introduced that are relevant to the respective chapter’s subject
matter. This chapter introduces Validate_US, a PEAR package that is useful for
validating the syntax for items commonly used in applications of all types,
including phone numbers, Social Security numbers (SSNs), ZIP codes, and state
abbreviations. (If you’re not familiar with PEAR, it’s introduced in Chapter 11.)

CHAPTER 9 ■ STRINGS AND REGULAR EXPRESSIONS

Regular Expressions

Regular expressions provide the foundation for describing or matching data according
to defined syntax rules. A regular expression is nothing more than a pattern of characters
itself, matched against a certain parcel of text. This sequence may be a pattern with
which you are already familiar, such as the word dog, or it may be a pattern with
specific meaning in the context of the world of pattern matching, <(?)>.*<\ /.?>,
for example.

PHP is bundled with function libraries supporting both the POSIX and Perl regular
expression implementations. Each has its own unique style of syntax and is discussed
accordingly in later sections. Keep in mind that innumerable tutorials have been
written regarding this matter; you can find information on the Web and in various
books. Therefore, this chapter provides just a basic introduction to each, leaving it to
you to search out further information.

If you are not already familiar with the mechanics of general expressions, please
take some time to read through the short tutorial that makes up the remainder of this
section. If you are already a regular expression pro, feel free to skip past the tutorial to
the section “PHP’s Regular Expression Functions (POSIX Extended).”

Regular Expression Syntax (POSIX)

The structure of a POSIX regular expression is similar to that of a typical arithmetic
expression: various elements (operators) are combined to form a more complex expression.
The meaning of the combined regular expression elements is what makes them
so powerful. You can locate not only literal expressions, such as a specific word or
number, but also a multitude of semantically different but syntactically similar strings,
such as all HTML tags in a file.

■Note POSIX stands for Portable Operating System Interface for Unix, and is representative of a set of
standards originally intended for Unix-based operating systems. POSIX regular expression syntax is an
attempt to standardize how regular expressions are implemented in many programming languages.

The simplest regular expression is one that matches a single character, such as g,
which would match strings such as gog, haggle, and bag. You could combine several
letters together to form larger expressions, such as gan, which logically would match
any string containing gan: gang, organize, or Reagan, for example.
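
The quoted passage's point about literal patterns can be tried out directly. This is a minimal sketch in Python rather than PHP (PHP's preg_match would be the analogous call); the extra words "cat" and "grand" are added here as non-matching controls and are not from the book.

```python
import re

# A literal pattern matches if it occurs anywhere inside the subject string.
words = ["gog", "haggle", "bag", "cat"]
contains_g = [w for w in words if re.search(r"g", w)]
print(contains_g)

# A longer literal works the same way: the letters must appear together, in order.
names = ["gang", "organize", "Reagan", "grand"]
contains_gan = [w for w in names if re.search(r"gan", w)]
print(contains_gan)
```

Note that "grand" contains g, a and n but not the consecutive run "gan", so it is filtered out.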
 
 
Nigel Bree
Ex Member




Re: PHP and regular expressions
Reply #2 - May 24th, 2008 at 12:19am
 
I'll take that as meaning you aren't familiar with them yet, but these are definitely the most important part of PHP for you to study and seek to completely understand. PHP itself isn't particularly interesting or powerful by comparison (although it's certainly useful as a practical tool, and worth learning on that basis alone, its design is terrible and it is otherwise just as undistinguished as all the other "popular" languages of that kind).

In addition to being an important, practically useful thing, regular expressions are immensely important in other ways; when you get to the * operator, you will see it called the Kleene closure. Stephen Kleene was a logician in the early 20th century who, with that great mind Alonzo Church, laid the foundations of modern computing well before the machines themselves existed.
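
As a rough illustration of what the * operator (the Kleene closure) does, here is a small sketch in Python; the pattern ab*c and the sample strings are made up for the example. The star means "zero or more repetitions of the thing just before it".

```python
import re

# 'a', then zero or more 'b' characters, then 'c'.
pattern = re.compile(r"ab*c")

def matches(s):
    """True only if the whole string fits the pattern."""
    return pattern.fullmatch(s) is not None

results = {s: matches(s) for s in ["ac", "abc", "abbbbc", "adc"]}
print(results)
```

The "zero" case is the one that tends to surprise people: "ac" matches even though it contains no b at all.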

They defined and laid out the mathematics of the regular languages (including regular expressions) and the lambda calculus (Lisp, essentially) in the 1930's. If you can gain a firm grasp of regular expressions, and you then do the same with Lisp (JavaScript is essentially a very simplified Lisp), then you will have within your grasp the ability to easily understand every computer language, because all any computer language is (or does) is express the essence of these ideas in superficially different ways.
 
 
 
Rad
Radministrator
*****
Offline


Sufferin' succotash

Posts: 4090
Newport Beach, California


Re: PHP and regular expressions
Reply #3 - May 24th, 2008 at 6:20am
 
Quote:
I'll take that as meaning you aren't familiar with them yet

True.

Quote:
PHP itself isn't particularly interesting or powerful by comparison  

What *is* then, powerful & interesting by comparison?

Quote:
its design is terrible  

Can you provide a "for example"?

Quote:
laid the foundations of modern computing well before the machines themselves existed

On purpose? Or did it simply turn out this way? I don't see how they could've laid the foundations of modern computing if no computers existed.

Quote:
They defined and laid out the mathematics of the regular languages (including regular expressions) and the lambda calculus (Lisp, essentially)

Interesting. I never equated math with programming before, altho now, I can see how they would intersect. I've always been good in math. Math is my gig, so to speak. Had 2 semesters of calculus (both 'A's.).
 
 
MrMagoo
Übermensch
*****
Offline


Resident Linux Guru

Posts: 1026
Phoenix, AZ (USA)


Re: PHP and regular expressions
Reply #4 - May 24th, 2008 at 1:18pm
 
Yes, this sounds like it could turn into a very interesting discussion.  I've found PHP to be very useful (although I only know the basics.)  I'm thinking about learning a lot more about web programming soon for a side project.  Is there a language you suggest?  Is Rails as good as the hype?  Or do you just use Perl or some cgi scripting?  Or do you like the .NET languages?
 
 
Nigel Bree
Ex Member




Re: PHP and regular expressions
Reply #5 - May 24th, 2008 at 9:25pm
 
Rad wrote on May 24th, 2008 at 6:20am:
What *is* then, powerful & interesting by comparison?

Unfortunately, none of the languages that do fit that category are widely used for web purposes; there's a reason for that, of course. By being deliberately simplified for the purpose of little more than wrangling strings, PHP and its ilk are easier to learn (and safer for hosting firms to deploy) than full-blown programming languages, and you'll be productive sooner than you would be with them.

However, just bear in mind that all PHP really is, is a vehicle for taking in and pushing out strings. You'll move beyond it one day, so don't get too invested in it, that's all. It will serve its purpose for you now, but as you learn more you'll almost certainly outgrow it.

Rad wrote on May 24th, 2008 at 6:20am:
Can you provide a "for example"?

I could, but to be fair I'd be judging it as if it was a proper programming language, which isn't quite fair. Remember, PHP started as a very cut-down, deliberately simplified subset of Perl, which has then grown and mutated to serve pragmatic ends rather than to have much conceptual integrity.

You can do a lot with it, and do it well - as I've mentioned before, DokuWiki is an excellent example of good PHP code, one of the very few around, and you will learn a lot by studying it. However, as soon as your needs get very complex you'll be well outside the "sweet spot" for it as a tool.

PHP suffers most because all its original facilities were oriented around making simple things simple. Unfortunately, that has meant that trying to do anything complex in it is in turn very complex, and quite different. So, designs that start out small tend to quickly reach a point of collapse where they are unmaintainable messes - DokuWiki, as an example, hit just such a wall and only coped with this by essentially going into hibernation for a long time, and getting almost 50% rewritten (breaking all the existing user extensions to it).

A lot of PHP's bad reputation comes from this, because much of the non-trivial PHP out there is at or well past the point where its initial design no longer works.

Rad wrote on May 24th, 2008 at 6:20am:
On purpose? Or did it simply turn out this way? I don't see how they could've laid the foundations of modern computing if no computers existed.

What computers do - what they can possibly do - is subject to some mathematical laws which are absolutely universal, and which these gentlemen had worked out in the context of what mathematicians called recursive function theory.

When real computers later arrived, Alan Turing was working out the mathematical limits of what they could do, which led to the abstract model of them called the Turing Machine. Turing and Church then discovered that these two things - Church's recursive systems of equations, and Turing's machine with the tape - were absolutely equivalent in every way (and many other systems have subsequently turned out to be equivalent to them too).

Similarly, what Turing had posed as the Halting Problem for his machines (whether the programs would ever complete or not) turned out via this correspondence to be roughly equivalent to Kurt Goedel's famous Incompleteness Theorem, which established that there are limits of what could be determined via proof.

This deep equivalence between computers and the behaviour of recursive functions (and abstract models of logic and proof) was unanticipated and quite surprising. But it's there. Quite what this really means, in a philosophical sense, is something that has generated all kinds of thought.

For instance, many years ago, a mathematician called John Conway posed a simple evolving system called The Game of Life. Popularized by Martin Gardner in his Mathematical Games column, it became something of a hit (that particular issue was one I read as a child, in fact; the Mathematical Games column helped set the course of my life).

However, it was no mere diversion. What Conway and his collaborators were trying to investigate was whether they could create a simple system which was what we call Turing-complete, and whether it could in essence function as a computer. And it turns out that it can. Computation is almost inevitable, in fact.
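
To give a feel for just how simple that system is, here is a sketch of one generation of Life in Python. The rules used are the standard ones (a cell is alive next turn if it has exactly three live neighbours, or two if it is already alive); the function name step and the "blinker" starting pattern are choices made for this illustration.

```python
from collections import Counter

def step(live):
    """Return the next generation for a set of live (x, y) cells."""
    # For every cell adjacent to a live cell, count its live neighbours.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, 1), (1, 1), (2, 1)}  # three cells in a horizontal row
next_gen = step(blinker)
print(sorted(next_gen))
```

The horizontal row flips to a vertical one, and a second call to step flips it back. Everything more elaborate in Life, including the constructions that compute, is built from that one rule.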
 
 
 

Nigel Bree
Ex Member




Re: PHP and regular expressions
Reply #6 - May 24th, 2008 at 9:54pm
 
MrMagoo wrote on May 24th, 2008 at 1:18pm:
Or do you like the .NET languages?

That's a complex thing. The .NET runtime (which you can study in the SSCLI aka Rotor) I would regard as a work of art. It really is a thing of beauty, and there are parts of the framework (such as the regex implementation) which use it in exquisite ways.

The languages, not so much, although they are definitely evolving in the right way.

When Java came out, the marketing hype was that it would be an amalgam of Smalltalk (great) and Common Lisp (great) but be easier for C++ programmers (well, ok). Unfortunately, the last thing was the only part that was really true, because they had thrown away all the great parts of Smalltalk and Common Lisp and were left with something that was really pretty lame, and which spawned a truly incompetent developer culture around it over the next decade.

.NET, in contrast, largely does meet the reality of allowing folks to combine the best of the best - it's reflective like Smalltalk, and if you want to you can do Lispy things in it too. Which leaves me on the fence about it as a whole, because there's so much to like - C# 3.0 captures a lot of good language design, and the run-time is a great design, and they really did achieve a lot making all the specialized .NET languages interoperate smoothly - but there are also some things around it that aren't so good. The libraries (the .NET Framework) have a lot of the "Java culture" about them, for instance, and for GUI work are tied far too closely to Windows.

Right now, nothing is all things to all people, especially because of the split between the client side (JavaScript being the only game in town and, being a Lisp, naturally good) and the server side. And now you have things like Google App Engine giving a huge boost to whatever languages they decide to support in future. Whether (and how) Microsoft respond to this will be interesting.
 
 
 
Rad
Radministrator
*****
Offline


Sufferin' succotash

Posts: 4090
Newport Beach, California


Re: PHP and regular expressions
Reply #7 - May 24th, 2008 at 10:32pm
 
Your posts are content-rich .. so much so that I have to re-read them ..  which I'm doing now.

If you ever need a web host, by the way, for anything .. I'd be glad to host you .. for free. I have plenty of available space.

I've made a similar offer to Magoo & NightOwl & Dan Goodall and others here .. but nobody has taken me up on the offer.

Just a side note .. as you obviously are well-read, and have interesting things to say. I have never done this b4, so it might take me a while to set it up, but it doesn't seem too complicated. All you would need is a domain name.

I would be interested to hear a single-paragraph description of all the popular programming languages .. as you see them .. compared and contrasted.

Looks like the most recent DokuWiki was released a few weeks ago:  http://www.splitbrain.org/projects/dokuwiki
 
 
Nigel Bree
Ex Member




Re: PHP and regular expressions
Reply #8 - May 25th, 2008 at 2:31am
 
Rad wrote on May 24th, 2008 at 10:32pm:
I would be interested to hear a single-paragraph description of all the popular programming languages .. as you see them .. compared and contrasted.

It wouldn't be so interesting, really. I don't have the ability to write well enough to make it interesting.

Other people with similar knowledge to mine do it better and funnier, like Steve Yegge - this part of that little rant still gives me the giggles every time:
Quote:
There are "better" languages than Perl — hell, there are lots of them, if you define "better" as "not being insane". Lisp, Smalltalk, Python, gosh, I could probably name 20 or 30 languages that are "better" than Perl, inasmuch as they don't look like that Sperm Whale that exploded in the streets of Taiwan over the summer. Whale guts everywhere, covering cars, motorcycles, pedestrians. That's Perl. It's charming, really.

But Perl has many, many things going for it that, until recently, no other language had, and they compensated for its exo-intestinal qualities. You can make all sorts of useful things out of exploded whale, including perfume. It's quite useful. And so is Perl.

You can't help but admire a phrase like "exo-intestinal qualities". The man can write.

Thanks for the offer of some web hosting, but I really don't have the time to do much of anything with that. The next year at work is going to be crushingly busy, albeit in a good way.
 
 
 
Nigel Bree
Ex Member




Re: PHP and regular expressions
Reply #9 - May 25th, 2008 at 3:16am
 
Oops, corrected the Life link to go to the right one here. My bad, forgot there was something else with a similar name.

Anyway, about programming, you're definitely going to do well with it. It's fun.

PHP *is* a good choice to start, but you won't stop at one language. Just for web work you'll want to become familiar with JavaScript at some point, for instance, and probably you'll end up picking up either Perl or Python too just in the nature of things. Whether or not they are "good" in an abstract sense they are handy to know, and as you learn techniques from more languages it expands your repertoire for the others.

Do you have a clear idea of what you want to achieve with your first PHP project, yet? One of the things that we set aside after looking at the index.rad/.htaccess thing was making a script to make the "index.html" -> "index.rad" translation happen so that we can have two alternative views of the site content from a single set of source documents.
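
For a sense of how small the regular-expression part of such a script could be, here is a hedged sketch in Python rather than PHP. It assumes the translation means rewriting a URL path that ends in index.html so that it ends in index.rad; the function name to_rad and the sample paths are invented for the example, and the real script might work quite differently.

```python
import re

# Match "index.html" only as a whole file name: it must start the string or
# follow a "/", and must be followed by the end of the string or a "?".
INDEX_HTML = re.compile(r"(?:^|(?<=/))index\.html(?=$|\?)")

def to_rad(url):
    """Rewrite a trailing index.html to index.rad, leaving the rest alone."""
    return INDEX_HTML.sub("index.rad", url)

print(to_rad("/guides/index.html"))
print(to_rad("/guides/index.html?page=2"))
print(to_rad("/guides/myindex.html"))
```

The backslash before the dot matters: an unescaped "." matches any character, so "indexXhtml" would also be rewritten without it.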
 
 
 
Nigel Bree
Ex Member




Re: PHP and regular expressions
Reply #10 - May 25th, 2008 at 3:38am
 
By the way, the business about Conway having created Life to explore computability was something I never knew; last year he was here in New Zealand and he gave a public lecture where he talked about that. It's always good to keep an eye out for opportunities like that to hear great men speak - keep an eye out on the websites of the universities near you!

Anyway, in addition to computability, Conway and his colleagues were also interested in von Neumann machines. If you read the science fiction of the recently deceased Arthur C. Clarke you may recall that in the book of 2001, the mysterious black monoliths were self-replicating probes, i.e. von Neumann machines, the same ones Conway was looking into.

Now, the modern computer CPUs we use aren't things with tapes like Turing Machines are - instead, they are built on a thing called the von Neumann Architecture, invented by the very same John von Neumann who had earlier proposed the self-replicating machines that Conway was exploring.
 
 
 

Rad
Radministrator
*****
Offline


Sufferin' succotash

Posts: 4090
Newport Beach, California


Re: PHP and regular expressions
Reply #11 - May 25th, 2008 at 10:42am
 
This is good: http://steve.yegge.googlepages.com/tour-de-babel

(I am still there.) Thanks.

And von Neumann was part of the Manhattan Project. Wow.
 
 
Nigel Bree
Ex Member




Re: PHP and regular expressions
Reply #12 - May 25th, 2008 at 4:27pm
 
Pretty much every one of Steve's articles is gold. For instance, I commented earlier that PHP code starts to run into real problems as it grows; now, this isn't unique to PHP or indeed any language. It's largely a problem with PHP because people start using it because it's easy (which is a good thing, in the main) but then their small thing grows large.

As for von Neumann, yup, involved not just with the Manhattan Project but with guys like Paul Dirac on quantum mechanics.

Oh yeah, another thing which turns out to be able to run programs... Noam Chomsky, who spent much of the 20th century studying (human) language and communication, and devising grammars to describe human languages ... those grammars are equivalent to Turing machines too.

Since I've mentioned a linguist, this leads on to the Sapir-Whorf Hypothesis, which can close the loop on the discussion somewhat.

In computer languages, this definitely holds. The computer languages you know and learn constrain the way you think about solving problems with them. It's absolutely the case that people (the vast majority, anyway) work to solve problems using the terms and style of thought that go with the language they use.

For regular expressions, this isn't a problem because they are part of the universal mathematical background (and producing and consuming strings is part of how computers deal with human beings, so they are universally useful).

For computer languages, their power in solving problems is a seductive trap which can end up with one believing that their way of solving problems is equally universal, when other computer languages (are consciously designed to) embody different modes of thought yielding different solutions.

Hence, why you'll never stay still and won't stop with PHP. Even if you could get everything you need done in it, learning a different programming language will free your mind (and you'll get better in PHP as a result too, by being able to "step outside the box" somewhat).
 
 
 
Rad
Radministrator
*****
Offline


Sufferin' succotash

Posts: 4090
Newport Beach, California


Re: PHP and regular expressions
Reply #13 - May 25th, 2008 at 9:50pm
 
So much to think about.

Regarding programming languages, it's hard not to be drawn to Ruby (& Ruby on Rails) as a result of reading Steve's piece:

Quote:
If languages are bicycles, then Awk is a pink kiddie bike with a white basket and streamers coming off the handlebars, Perl is a beach cruiser (remember how cool they were? Gosh.) and Ruby is a $7,500 titanium mountain bike.

Quote:
For the most part, Ruby took Perl's string processing and Unix integration as-is, meaning the syntax is identical, and so right there, before anything else happens, you already have the Best of Perl. And that's a great start, especially if you don't take the Rest of Perl.

But then Matz took the best of list processing from Lisp, and the best of OO from Smalltalk and other languages, and the best of iterators from CLU, and pretty much the best of everything from everyone.

He never did mention PHP, tho.
 
 
Nigel Bree
Ex Member




Re: PHP and regular expressions
Reply #14 - May 25th, 2008 at 10:29pm
 
Rad wrote on May 25th, 2008 at 9:50pm:
He never did mention PHP, tho.

True. In general though, pretty much everything that applies to Perl also applies to PHP; it's a little different because Larry Wall isn't involved and PHP's purpose is much more focused because it's a subsetted variant of Perl built purely for pumping out web pages rather than being a general Unix scripting tool.

It's less insane, but there are still whale guts everywhere. And of course, it's still eminently useful despite that.

Ruby's nice, no doubt about it. Try the interactive web tutorial. However, in the context of Web development there's a lot of .NET-style confusion between Ruby the language (which is nice but like Python effectively defined by a single implementation) and Rails the web framework which Ruby developers use.

Most people who talk about Ruby are really talking about Rails.

The thing about Rails is that although it happened on Ruby, it's really not that tied to it. Smalltalk and Lisp-like languages have done all this stuff for decades. The same things that the Rails folks are all ga-ga about are just "oh, so you're finally catching on to where we were in the 1970's".

Of course, there's a marketing problem with Smalltalk and Lisp, which is that, despite both being better than everything else which has tried to replace them, they are a) old, and therefore not shiny and new and possessing "hotness", and b) a little weird to modern eyes which have only ever known the Windows PC.

So, although by all accounts Seaside, a popular Smalltalk library, does most of what Rails does and does it much better besides, it can't become the flavour of the month, because it's "yuck Smalltalk, icky" to most modern developers (who have never looked at, let alone learned, a language designed before they were born).

Similarly the good parts of Javascript are taken straight from Lisp, but if that fact hadn't been disguised it'd never have got anywhere. In fact, it only got taken seriously because of the "Java" prefix, despite Javascript actually being Lisp and not garbage like Java ... which is inspired marketing, because it delivers on real authentic Lisp power but because Java was "hot" at the time, it traded on that hotness and got taken up by people who would never in a billion years go near a Lisp.

Most of language choice in the industry is just marketing, and naive faith that "new = better".

The reality is that Lisp and Smalltalk have never been exceeded as designs. Why .NET deserves to win is that it packages up Smalltalk, in particular, in a way that modern folks don't have a clue that it's where all the concepts they are using come from.

The big new thing in C# for instance, is LINQ. That's just a repackaging of a concept - List Comprehensions - that has been ho-hum in the functional programming world for decades. But LINQ, see, it's Shiny! And New! And a generation of developers will never realise that it's the same stuff the old-timers have been trying to tell them about for decades.
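
For anyone who hasn't met a list comprehension, here is a minimal sketch in Python, which has had them for years. The posts data is made up purely for illustration; the point is that filtering and selecting happen in one expression, which is roughly the shape of a LINQ "from ... where ... select" query.

```python
# Hypothetical sample data, invented for this example.
posts = [
    {"author": "Rad", "words": 120},
    {"author": "Nigel", "words": 480},
    {"author": "MrMagoo", "words": 60},
]

# Read it as: "the author of p, for each p in posts, where p has over 100 words".
long_authors = [p["author"] for p in posts if p["words"] > 100]
print(long_authors)
```

The same idea goes back to set-builder notation in mathematics, which is why it looks familiar to anyone who has done some.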
 
 