
I saw the Pragmatic Wetware book by Andy Hunt.  “Hmm”, I thought, “a book on the human brain and how it works.  Interesting, but I have too much else to read and study.” Wrong.

Apparently I had missed the pragmatic part of that title.  As I am averaging three to five technical books a month right now, an investment in becoming more efficient in my learning and study would surely pay off.  So, I got the book, and I am glad I did.

Just as you want to be an efficient coder, if you are doing a lot of studying, you want to be an efficient learner.  I will talk about a few of the things I found over a few blog posts, but I highly recommend purchasing this book and putting it into practice.

In the Dreyfus model, there are five levels of progression toward mastery, in terms of problem solving and the mental models you form:

  • Novice.  They have little experience to rely on, but they can follow context-free rules (recipes) to guide them (think ISP help-desks and pre-canned scripts).
  • Advanced Beginner.  They can break away from context-free rules a little bit and try things on their own, but they do not have a big picture understanding.
  • Competent.  They have developed conceptual domain models and can work with them and solve problems.
  • Proficient.  They can reflect, improve, and learn from the experience of others.  They can apply maxims (as opposed to just recipes) to problems.
  • Expert.  They work from intuition rather than from reason alone, and are primary sources of information.

Becoming an expert is more than just collecting a body of skills and knowledge; it is about moving to intuition-based problem solving based upon those experiences.

We have a great cookbook, and each week my wife and I pick out a new recipe from it to try.  I take the recipe, and I follow it carefully (X amount of flour, Y amount of seasoning Z, etc.).  A great chef (such as my grandmother) did not follow a recipe – her cooking was based on intuition gained through experience (a pinch of this, adjust it this much because of the humidity, etc.).  I once tried to ask my grandmother how to make something, and she couldn’t really articulate her responses beyond “add a little bit of this” and “cook it until it’s ready but not too long”.  This is the typical mark of an expert.

What does this mean in terms of coding?  Well, it means becoming an expert is a little less about collecting skills and more about applying skills intuitively given a particular context.  Any fool can apply a parade of various design patterns (I recall being one of those fools 13 years ago), but an expert will intuitively apply the right pattern given a particular context.  That seriously shifted my paradigm when I read it.

Personally, I have a backlog of skills I am studying and investing in to increase my knowledge and skills portfolio.  However, I plan to spend a lot more time going forward on context.  One of the things I will soon be doing is personally reviewing my entire technical career to determine what I did and did not do right given the particular contexts I was working in.  Am I repeating any poor patterns, what should I continue, what should I drop?  I expect it to be a worthwhile exercise.

I always wanted to read the Anti-Patterns book, but never got to it.  I stumbled across the Anti-Patterns list on Wikipedia at http://en.wikipedia.org/wiki/Anti-patterns and found it to be a worthwhile read.  Obviously, a big part of learning what to do is learning what not to do as well.

I got to the end of it, and discovered a link to Software Development Philosophies at http://en.wikipedia.org/wiki/List_of_software_development_philosophies.  Now, I can Kanban with the best of them!

A worthwhile read – check it out!

Regular expressions were not my strong suit, but I felt it was important to master them (or at least reach competency).  So, on a flight back from Walt Disney World last summer, I studied them from one of my Ruby books and summarized them in detail in Evernote (to refer back to them).  I thought I understood them well enough that I could refer back to my Evernote reference material and quickly put one together – I was wrong.

While writing a Rails app I have been working on for fun, I found that it took longer to put a working regular expression together than I expected.  I had studied, summarized and even written a few short programs for regular expressions in Ruby.  However, what I hadn’t done was go in the other direction – take a domain problem and map it back to a regular expression.  That’s when I came up with “Regular Expression Pushups”.

When I was younger, I used to be able to do a massive number of push-ups.  Rather than continuing to increase the count, I tried to do them more quickly.  Similarly, I reviewed my notes and created thirty-three problems that would use the underlying techniques.  I would do these as quickly as possible, and later change the questions a little to see if I could do them even faster.  The idea was to better form the “regular expression neural networks” in my brain.

I’ve gone through one round and have seen a huge improvement.  I expect the second round will go a lot faster than the first (how could it not?)!

Here are the ones I put together:

  1. Find out if a certain string exists as a substring within a document, and if so, where
  2. Replace the contents of a string within a document with something else
  3. Find three key parts of a document and pull them out
  4. Find the word “foo” in a sentence but it cannot be “foobar”.
  5. Find a string that is a whole word only
  6. Find a string using a case-insensitive match
  7. Find the string foo or bar
  8. Match foo and goo (and so forth) but not boo without using these as words in your formula
  9. Find all strings that end in “oo” (3 characters, and any number of characters)
  10. Find any string that ends in “oo” but boo is not valid (REPEATS #8?  Or subtle difference?  I think I reversed them)
  11. See if a word starts with “foo”.  Additionally, see if one does not start with “foo”.
  12. See if a string ends with “bar”.  Additionally, see if one does not end with “bar”.
  13. Find a substring that begins with “foo” and ends with “bar”.
  14. Find the first and last word in a sentence
  15. Find the first character that is not a number or digit in a string
  16. Find the first number in a string (and last number)
  17. What is the text before the phrase “in the middle”?  What is the text that follows?
  18. Find “abc” or “abcabc” and so forth in the sentence
  19. Find the string “abc” or “abcdef”
  20. Find the string “abc” or “abc1” or “abc11” and so forth
  21. Find all alphabetic words that end with the first “3”
  22. Find the word that starts with an alphabetic character and ends with 17
  23. Find the telephone number in the format 571-217-9451
  24. Find a number (without commas) that is at least 5 digits
  25. Find a number between 4 to 9 digits
  26. Find the word that matches a sequence of 5 instances where there is one to 3 numbers and a single character
  27. Find the string “abc” at the end of the string where there is a newline character
  28. Given a dollar figure, return the portion without the cents.
  29. Given string “#@%# 123bar 23bar 342 siojbar”, find the first word that does not have “bar” in it
  30. Find all the instances of numbers greater than 5 digits
  31. Find the number followed by a space and the word “bang” in “123 howdy 456 wow 789 bang”
  32. Do a greedy match (from “abc!def!ghi!” get the whole thing for .+!)
  33. Do a non-greedy match (from “abc!def!ghi!” get “abc!” for .+?!)
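Since the other code on this page is Java, here is a sketch of how a few of these pushups (#1, #4, #6, #16, #23, #32 and #33) might be answered with java.util.regex. The original exercises were done in Ruby, so treat these as illustrative translations, not the author's answers; the class and method names are made up. For #4 it uses a negative lookahead, which rejects “foobar” but, unlike a \bfoo\b whole-word match, would still accept “foobaz”.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPushups {
    // #1: index of the first occurrence of a literal substring, or -1 if absent
    static int indexOf(String doc, String literal) {
        Matcher m = Pattern.compile(Pattern.quote(literal)).matcher(doc);
        return m.find() ? m.start() : -1;
    }

    // #4: "foo" as long as it is not immediately followed by "bar" (negative lookahead)
    static boolean hasFooNotFoobar(String s) {
        return Pattern.compile("foo(?!bar)").matcher(s).find();
    }

    // #6: case-insensitive match of a literal word
    static boolean containsIgnoreCase(String s, String word) {
        return Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE).matcher(s).find();
    }

    // #16: first run of digits, or null if there is none
    static String firstNumber(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        return m.find() ? m.group() : null;
    }

    // #23: telephone number in the form 571-217-9451
    static String phone(String s) {
        Matcher m = Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b").matcher(s);
        return m.find() ? m.group() : null;
    }

    // #32 and #33: first match of any pattern, used to compare greedy and reluctant quantifiers
    static String firstMatch(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("a foo b", "foo"));              // 2
        System.out.println(hasFooNotFoobar("foobar only"));         // false
        System.out.println(containsIgnoreCase("Hello FOO", "foo")); // true
        System.out.println(firstNumber("abc 42 def 7"));            // 42
        System.out.println(phone("call 571-217-9451 now"));         // 571-217-9451
        System.out.println(firstMatch(".+!", "abc!def!ghi!"));      // abc!def!ghi!
        System.out.println(firstMatch(".+?!", "abc!def!ghi!"));     // abc!
    }
}
```

The greedy .+ first consumes the whole string and then backtracks to the last “!”, while the reluctant .+? stops at the first “!” that lets the match succeed.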

I wrote a simple program for work a week ago – a webMethods Java service that deletes a directory and recursively deletes all of its files and sub-directories.  I put together some unit tests, and it ran great.  Then, I was told that when it was invoked from another service (which creates the directory for zip/unzip), half the time the directory would not be deleted (though its children usually were) unless the program was run in debug mode.  Of course, when the program ran stand-alone, it was flawless.

Suspecting a race condition, I played with some delays (for analysis only) between the child and parent deletions as well as between this service and the service invoking it.  No luck with that.  Perhaps the invoking service was taking time to release the directory resource, so I tried a 15-second delay to rule that out – no luck there either.

It didn’t make sense for such a simple program.  My guess was that another process was sometimes not releasing the resource, and I got stuck going down that path for a bit.  After a while, I decided to create a mind map of what was going on and what I was observing, to see what would be revealed.  As suggested in the Pragmatic Wetware book by Andy Hunt, which I am finishing up, after a little bit of time the R-mode of my brain took over and I found a number of things I could try.

One of those things was checking whether the directory had any children even after deleting all of them.  This seemed silly because the directory appeared to be empty (the files were gone), and I almost skipped trying it.  Much to my surprise, it was not empty.  Playing with WinSCP, I discovered unexpected .nfs files showing up as files were deleted.  Furthermore, deleting the normal children mysteriously caused these files to be created, and deleting these files caused other .nfs files to suddenly spawn into the directory.  Thus, the directories were no longer empty and could not be deleted.

I drove home thinking about this debugging incident and how to make it better and more efficient.  Here is what came to mind:

  • The use of mind maps was certainly effective and something I want to continue
  • It’s important to challenge your assumptions and not get locked into them early.  Yes, it could be that another process did not release the directory, but there are other possibilities as well.  Using a mind map earlier would have helped, but it would have been even more helpful to assess how locked into my assumptions I was
  • Debugging is a technical and creative endeavor.  Studying the L-mode facts about the situation and then employing R-mode techniques earlier on would have helped
  • You don’t need to stay locked on the problem until it is solved.  I should have moved on to other work and let the R-mode side of my brain work on the problem in the background
  • Sitting down and trying to think of all the “evil” ways that the system could be messing me up was also helpful
  • Continue playing and trying things that should not happen – I am really glad I did that!

In the end, I coded my program to delete the children before the parent directory because File.delete does not remove non-empty directories.  Because I saw that the children were gone, the obvious possibility that the directory still was not empty eluded me for a bit.  That is what I am thinking about for the future.
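A minimal sketch of the children-first delete in plain Java, using java.nio.file rather than the webMethods service or the File.delete call described above; the class and method names are hypothetical. The design choice worth noting is that when the final delete fails, it lists whatever is still in the directory, which is exactly the check that exposed the .nfs files. Files.delete reports a non-empty directory as DirectoryNotEmptyException, but the JDK documents that exception as optional, so some platforms may throw a plain IOException instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveDelete {
    // Names of the entries currently in dir, sorted for a readable error message
    static List<String> listNames(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
    }

    // Deletes the children depth-first, then the directory itself
    static void deleteTree(Path dir) throws IOException {
        List<Path> children;
        try (Stream<Path> s = Files.list(dir)) {
            children = s.collect(Collectors.toList()); // snapshot before deleting anything
        }
        for (Path child : children) {
            if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                deleteTree(child);   // recurse into real subdirectories only
            } else {
                Files.delete(child); // regular files and symbolic links
            }
        }
        try {
            Files.delete(dir);
        } catch (DirectoryNotEmptyException e) {
            // Something reappeared after the children were removed (e.g. .nfs placeholder files)
            throw new IOException(dir + " is still not empty: " + listNames(dir), e);
        }
    }

    // Builds a small temporary tree, deletes it, and reports whether it is gone
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("deltest");
            Path sub = Files.createDirectories(root.resolve("a").resolve("b"));
            Files.write(sub.resolve("x.txt"), "data".getBytes());
            Files.write(root.resolve("y.txt"), "data".getBytes());
            deleteTree(root);
            return !Files.exists(root);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```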

One of the things I have appreciated about my “new job” is getting to work with a technically astute large-scale mission-critical enterprise architecture. Part of this includes data warehousing and business intelligence, two areas I have had interest in since reading “Super Crunchers” by Ian Ayres (a very interesting look at data mining to find extremely useful and actionable information).

OLTP vs. OLAP

When I was first introduced to data warehousing, my mental model was of a super complex, massive farm of database servers.  The reality is that data warehouses sprang up because analytical workloads have different needs from transactional operations:

  • Online Transaction Processing (OLTP). This mirrors how a database would be used in a typical web-based production environment, such as placing orders with Amazon: a large number of short-lived transactions against a write-oriented database that is probably barely keeping up with its capacity needs.  With limited capacity, do you really want to be executing in-depth analysis queries against your production transaction-oriented database?  Even if you did, how could you tune it to be as fast as possible from both a read and a write perspective?  You cannot.
  • Online Analytical Processing (OLAP). Instead of doing write-oriented update transactions, OLAP focuses on read-oriented queries and statistical analysis.  For this, you want your database to be optimized for read operations, as these queries can take a while to execute, and the organization will want to be able to identify trends or other issues from this near real-time data.  For example, think about how Amazon mines your purchase patterns to make suggestions, or to increase or decrease the price of books and other offerings.  This is actionable business intelligence leading to greater agility and returns.

Thus, your data warehouse is typically a separate read-oriented database containing near real-time information (typically anywhere from less than an hour to a day stale) over a much longer time horizon.

ETL

One of the inputs can be your OLTP database, but a data warehouse is typically composed of numerous data feeds.

This is where Extract, Transform, and Load (ETL) typically comes in:

  • Extract.  Extract the data from multiple data sources
  • Transform.  Transform and clean the data. Different data sources can have different representations of the same conceptual entity.  Furthermore, they can contain data errors (e.g., data entry input errors) and other related problems that need to be addressed when trying to put together an integrated picture from many different data sources
  • Load.  Load the transformed data into the data warehouse
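To make the three steps concrete, here is a toy ETL pass in Java. Everything in it is hypothetical: two in-memory “sources” that describe the same kind of sale differently (a state code with a dollar amount versus a full state name with cents), a transform that maps both onto one row type and rejects a misspelled state and a negative amount, and a list standing in for the warehouse table. A real pipeline would read from databases or files and load through the warehouse's own tooling; this only illustrates the split between the steps.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EtlSketch {
    // Hypothetical warehouse row: two-letter state code and amount in cents
    static final class Sale {
        final String state;
        final long cents;
        Sale(String state, long cents) { this.state = state; this.cents = cents; }
    }

    static final Map<String, String> STATE_CODES = Map.of("Louisiana", "LA", "Texas", "TX");

    // Extract: two hypothetical sources with different representations of a sale
    static List<String> extractOrdersDb()  { return List.of("LA,19.99", "TX,5.00"); }                        // code,dollars
    static List<String> extractStoreFeed() { return List.of("Louisiana|1000", "Lousiana|250", "Texas|-1"); } // name|cents

    // Transform: map both formats onto Sale; null means the row could not be cleaned
    static Sale fromOrdersDb(String line) {
        String[] f = line.split(",");
        return new Sale(f[0], new BigDecimal(f[1]).movePointRight(2).longValueExact());
    }

    static Sale fromStoreFeed(String line) {
        String[] f = line.split("\\|");
        String code = STATE_CODES.get(f[0]); // misspelled names (data entry errors) are unknown
        long cents = Long.parseLong(f[1]);
        return (code == null || cents < 0) ? null : new Sale(code, cents);
    }

    // Load: append cleaned rows to the warehouse table, collecting rejects for review
    static List<Sale> run(List<String> rejects) {
        List<Sale> warehouse = new ArrayList<>();
        for (String line : extractOrdersDb()) {
            warehouse.add(fromOrdersDb(line));
        }
        for (String line : extractStoreFeed()) {
            Sale s = fromStoreFeed(line);
            if (s == null) rejects.add(line); else warehouse.add(s);
        }
        return warehouse;
    }

    public static void main(String[] args) {
        List<String> rejects = new ArrayList<>();
        List<Sale> loaded = run(rejects);
        System.out.println(loaded.size() + " loaded, rejected " + rejects);
        // 3 loaded, rejected [Lousiana|250, Texas|-1]
    }
}
```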

Business Intelligence (BI) and Data Mining

Business Intelligence is the use of the data in the data warehouse to derive actionable business-level information.  This can include analysis along a number of different dimensions (e.g., sales per region, sales over time, trending) as well as forecasting the future based upon the past.  Analysts can use ad hoc queries to see what is going on and to run “what-if” scenarios.

Data Mining can be employed to identify trends and other relationships within the data that would not be so readily obvious.

Star Schema

For update-oriented operations, such as a web site that handles orders, you want a normalized schema for greater efficiency.  However, for the analysis queries performed in a data warehouse, you typically want a mix between a normalized schema and a star schema.  So what is a star schema?

I am going to use an example from the book “Oracle Essentials”, which I just finished reading.  Here is a typical query (which shows the advantages of a star schema):

Show me how many widgets (a product type) were sold by a store chain (a sales channel) in Louisiana (a geography) over the past 3 months (a time)

This query involves many dimensions:

  • product type
  • sales channel
  • geography
  • time

In the star schema, you would have a central fact table (representing sales transactions) with four connected dimension tables (e.g., Product, Channel, Geography and Time).

For efficiency, data within these dimensions is usually hierarchical (e.g., for the time dimension, day rolls up into week, which rolls up into month, which rolls up into quarter, which rolls up into year).  If your query is looking for a particular quarter, it can be executed against a precomputed summary at that level as opposed to all the more granular data related to weeks and days.  These precomputed aggregates are referred to as summary tables.
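Here is a small in-memory sketch of the example query against a star schema, with hypothetical dimension keys and a handful of made-up fact rows. The fact table holds only foreign keys and one measure (units sold), and each dimension lookup answers one part of the question. The last method rolls the month level of the time dimension up to quarters, producing the kind of summary a quarterly query could read instead of the monthly rows. In a real warehouse these would be SQL tables joined on the keys.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StarSchemaSketch {
    // Fact row: foreign keys into each dimension plus one measure (units sold)
    static final class SaleFact {
        final int productId, channelId, geoId, timeId, units;
        SaleFact(int productId, int channelId, int geoId, int timeId, int units) {
            this.productId = productId;
            this.channelId = channelId;
            this.geoId = geoId;
            this.timeId = timeId;
            this.units = units;
        }
    }

    // Dimension tables keyed by surrogate id (hypothetical data)
    static final Map<Integer, String> PRODUCT_TYPE = Map.of(1, "widget", 2, "gadget");
    static final Map<Integer, String> CHANNEL = Map.of(1, "store chain", 2, "web");
    static final Map<Integer, String> GEOGRAPHY = Map.of(1, "Louisiana", 2, "Texas");
    static final Map<Integer, String> TIME_MONTH = Map.of(1, "2009-07", 2, "2009-08", 3, "2009-09", 4, "2009-10");

    static final List<SaleFact> SALES = List.of(
            new SaleFact(1, 1, 1, 1, 10),
            new SaleFact(1, 1, 1, 3, 5),
            new SaleFact(1, 2, 1, 2, 7),   // web channel
            new SaleFact(2, 1, 1, 2, 3),   // gadget
            new SaleFact(1, 1, 2, 2, 4),   // Texas
            new SaleFact(1, 1, 1, 4, 6));  // outside the three months queried below

    // Widgets sold by a store chain in Louisiana during the given months
    static int widgetsByStoreChainInLouisiana(List<String> months) {
        int total = 0;
        for (SaleFact f : SALES) {
            if (PRODUCT_TYPE.get(f.productId).equals("widget")
                    && CHANNEL.get(f.channelId).equals("store chain")
                    && GEOGRAPHY.get(f.geoId).equals("Louisiana")
                    && months.contains(TIME_MONTH.get(f.timeId))) {
                total += f.units;
            }
        }
        return total;
    }

    // Time hierarchy: month "yyyy-mm" rolls up into quarter "yyyy-Qn"
    static String quarterOf(String month) {
        int m = Integer.parseInt(month.substring(5));
        return month.substring(0, 4) + "-Q" + ((m - 1) / 3 + 1);
    }

    // Summary table: units per quarter, computed once so quarterly queries skip the monthly rows
    static Map<String, Integer> unitsByQuarter() {
        Map<String, Integer> summary = new TreeMap<>();
        for (SaleFact f : SALES) {
            summary.merge(quarterOf(TIME_MONTH.get(f.timeId)), f.units, Integer::sum);
        }
        return summary;
    }

    public static void main(String[] args) {
        System.out.println(widgetsByStoreChainInLouisiana(List.of("2009-07", "2009-08", "2009-09"))); // 15
        System.out.println(unitsByQuarter()); // {2009-Q3=29, 2009-Q4=6}
    }
}
```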

Conclusion

Once again, I recommend reading “Super Crunchers” by Ian Ayres for more real world uses of data mining.

After summarizing the JMS Tutorial, I wrote a number of code samples to try to cover the essentials and experiment to make sure things worked the way I thought.  This included at least the following:

  • Using queues and topics
  • Implementing a durable topic subscriber
  • Using local transactions and experimenting with rolling back
  • Using non-transacted mode and experimenting with what happens when messages are not acknowledged
  • Sending multiple messages with different priorities
  • Sending messages with a time-to-live and making sure they actually expire
  • Using receive in blocking mode and in blocking mode with a timeout (synchronous), and using listeners (asynchronous)
  • Playing with persistent and non-persistent modes and restarting ActiveMQ to see how it is handled
  • Having multiple consumers for topics vs. queues and seeing if all or some of them get the messages
  • Playing with the request/reply pattern

It is that last item that took a little longer to wrap my head around as I was mixing up the various destinations.

Invoking a method and getting the return value back is one example of the request/reply pattern.  Invoking an RPC via web services or even old-style CORBA is another example.  All of these are synchronous.  As JMS is inherently asynchronous, you can use it to create asynchronous requests and replies.

One approach would be to set up a queue that the requester can send the request message to, and another queue that the replier can use to send the reply message back to the requester.  However, this does not scale.  What if you need to add another requester?  Another reply queue would need to be set up administratively, and the replier logic would need to be updated to differentiate between the two requesters.

Additionally, what if one requester sends three requests to the replier before any of the replies come back?  There needs to be some way to correlate the replies back to the original requests.

JMS offers the ability to create temporary queues and topics.  Here is how it works:

  1. The requester creates the temporary queue, and adds the queue info to the request message
  2. The requester sends the request message to the queue used by the replier
  3. The requester waits for incoming reply messages on the temporary queue it created
  4. The replier receives the request message on the normal queue it is listening on, processes it, creates the reply message, and uses the message’s temporary queue information to know what queue to send the reply message to
  5. The replier includes the original id of the request message so that the requester can correlate this reply with a particular request, in case the requester has made other requests to this replier
  6. The replier sends the reply message to the temporary queue
  7. The requester retrieves the reply from the temporary queue, and uses the original request id in the reply to correlate the reply with its request
  8. Once done using the system, the requester can then delete the temporary queue
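The steps above can be simulated without a broker. This sketch uses java.util.concurrent queues as stand-ins for JMS destinations and a hypothetical Msg class whose id, replyTo and correlationId fields play the roles of JMSMessageID, JMSReplyTo and JMSCorrelationID; it is not the JMS API. The requester sends several requests before reading any reply, then matches each reply to its request through the echoed id.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RequestReplySketch {
    // Hypothetical stand-in for a JMS message carrying the headers the pattern relies on
    static final class Msg {
        final String id = UUID.randomUUID().toString(); // plays the role of JMSMessageID
        final String body;
        final BlockingQueue<Msg> replyTo;               // plays the role of JMSReplyTo
        final String correlationId;                     // plays the role of JMSCorrelationID
        Msg(String body, BlockingQueue<Msg> replyTo, String correlationId) {
            this.body = body;
            this.replyTo = replyTo;
            this.correlationId = correlationId;
        }
    }

    // Sends every body as a request and returns request body -> reply body
    static Map<String, String> exchange(List<String> bodies) {
        BlockingQueue<Msg> requestQueue = new LinkedBlockingQueue<>(); // the replier's normal queue
        Thread replier = new Thread(() -> {
            try {
                for (int i = 0; i < bodies.size(); i++) {
                    Msg req = requestQueue.take();
                    // Reply on the queue named in the request, echoing the request's id
                    req.replyTo.put(new Msg("processed: " + req.body, null, req.id));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        replier.start();

        BlockingQueue<Msg> tempQueue = new LinkedBlockingQueue<>(); // the requester's "temporary queue"
        Map<String, String> pending = new HashMap<>();              // request id -> request body
        Map<String, String> results = new HashMap<>();
        try {
            // Several requests are outstanding before any reply is read
            for (String body : bodies) {
                Msg req = new Msg(body, tempQueue, null);
                pending.put(req.id, body);
                requestQueue.put(req);
            }
            // Correlate each reply with its request via the correlation id
            while (!pending.isEmpty()) {
                Msg reply = tempQueue.take();
                results.put(pending.remove(reply.correlationId), reply.body);
            }
            replier.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(exchange(List.of("a", "b", "c")).get("b")); // processed: b
    }
}
```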

Here is part of the requester code:


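// Assumes an open Session "session" and the replier's Destination "replierDestination"
MessageProducer requester = session.createProducer(replierDestination);
TextMessage requestMessage = session.createTextMessage();
requestMessage.setText("This is the request");
// Temporary queue scoped to this connection; only this connection can consume from it
TemporaryQueue temporaryQueue = session.createTemporaryQueue();
MessageConsumer responseConsumer = session.createConsumer(temporaryQueue);
// Tell the replier where to send its reply
requestMessage.setJMSReplyTo(temporaryQueue);
requester.send(requestMessage);   // send() fills in requestMessage's JMSMessageID
// Block until the reply arrives on the temporary queue
TextMessage reply = (TextMessage) responseConsumer.receive();
String correlationId = reply.getJMSCorrelationID();
boolean isReplyToThisRequest = correlationId.equals(requestMessage.getJMSMessageID());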

Here is part of the replier code:


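// Assumes an open Session "session" and a MessageConsumer "replier" on the replier's queue
TextMessage request = (TextMessage) replier.receive();
// Reply to whatever destination the requester put in JMSReplyTo (its temporary queue)
MessageProducer producer = session.createProducer(request.getJMSReplyTo());
TextMessage response = session.createTextMessage("processed message: " + request.getText());
// Echo the request's message ID so the requester can correlate the reply
response.setJMSCorrelationID(request.getJMSMessageID());
producer.send(response);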

My new job uses JMS, finally giving me a good reason to really play with this technology.   After reviewing the coding techniques, I needed a JMS Provider to work against that would be quick and easy to set up – Apache ActiveMQ.  It was quick and easy:

  1. Went to http://activemq.apache.org/activemq-530-release.html and downloaded the Unix version for my Mac.
  2. Unzipped the archive and made sure the files were executable.
  3. Made sure the activemq file under bin/macosx was executable
  4. From the install directory, executed:  bin/macosx/activemq start
  5. Went to the admin UI to set up topics and queues:  http://localhost:8161/admin/
  6. Stopped the server with:  bin/macosx/activemq stop

How easy was that?!  Now, on to the JMS coding, where I ran into a few speed bumps along the way.

First, if you are going to do JMS development outside of a J2EE container, you need to include jms.jar (which you can download standalone from Sun).

Second, this exception popped up while I was trying to get my JNDI context:

javax.naming.NoInitialContextException: Need to specify class name in environment or system property, or as an applet parameter, or in an application resource file:  java.naming.factory.initial
at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:645)

Googling this, I found that Sun’s forum states that when this happens, it is most likely due to installing the JDK before uninstalling the previous JDK.  On Windows this is not as big of a deal, but uninstalling and then reinstalling the JDK is a little more challenging on the Mac.  Fortunately, a little more research turned up another workaround:


import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;

Properties props = new Properties();
props.setProperty(Context.INITIAL_CONTEXT_FACTORY, "org.apache.activemq.jndi.ActiveMQInitialContextFactory");
props.setProperty(Context.PROVIDER_URL, "tcp://localhost:61616");
Context jndiContext = new InitialContext(props);

Third, it couldn’t find a number of the classes it needed.  I included the following three jar files from the ActiveMQ installation:

  • activemq-core-5.3.0.jar
  • commons-logging-1.1.jar
  • geronimo-j2ee-management_1.0_spec-1.0.jar

Fourth, what name should I use for the connection factory for jndiContext.lookup?  “ConnectionFactory” ends up working well.

Fifth, it’s time to set up the topics and queues.  I turned to the Admin UI and was able to set up the physical queues and topics, but saw no way to set up the JNDI logical queues and topics, and trying to create destinations without them did not work.  However, it turns out that ActiveMQ makes this really easy with dynamicQueues and dynamicTopics:


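// ActiveMQ's JNDI context resolves dynamicQueues/<name> and dynamicTopics/<name> without prior configuration
Destination queueDestination = (Destination) jndiContext.lookup("dynamicQueues/SyncQueue");
Destination topicDestination = (Destination) jndiContext.lookup("dynamicTopics/SyncTopic");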

After these issues, all the other work I did with JMS was pretty straightforward.

Okay, I know these are not all strictly alternatives to each other, but I used to find the combination of options for SOAP messages intimidating.  Why so many options to essentially accomplish the same thing?  The following are the basic options:

  • rpc/literal
  • rpc/encoded
  • document/literal
  • document/encoded

Since encoded is not part of the WS-I (Web Service Interoperability) standard, that just leaves rpc/literal and document/literal.   So, you can just use the former for request/response RPC type calls, and the latter for passing business documents, right?  Wrong!

In actuality, the rpc vs. document naming is misleading, as you can make rpc-style calls with either underlying representation.

The default and most commonly used is document/literal.  The underlying WSDL will be more complicated, but WSDL is not supposed to be for humans (so they say).  The rpc style is limited to very simple XSD types such as String and Integer, and the resulting WSDL will not even have a types section to define and constrain the parameters.

Far more complicated typing is allowed with document.  This is because the document style also comes with an XSD that can be used to validate the incoming SOAP messages; if you use rpc you will not have an XSD to validate against.  However, the rpc style’s SOAP messages are easier to look at and understand.

The wrapped variant (used with document), which is a de facto standard, helps to address this.  It re-arranges the SOAP message some so that it is easier to understand from the programmer’s view and looks more like rpc.  It clearly identifies the service operations and the names of each parameter.  The downside is that the client code you have to write is a little more complicated to put together.

In the end, document/literal/wrapped is the default in Java’s web services.

I have my java web service up and running, and I am deploying it to Tomcat.  However, when Tomcat starts up, the deployment is failing with the following exception:

java.lang.ClassNotFoundException: com.sun.xml.ws.transport.http.servlet.WSServletContextListener

I am told that Tomcat should have everything it needs, but that is clearly referring to Tomcat 5 and not Tomcat 6.

According to Techie Gyan, the following is the problem:

“The second change which tells about the cause of the error above is that tomcat 6.0 supports JAX-WS 2.1 and not JAX-WS 2.0 and java 6 supports 2.0 only (till some version, now it started supporting the newer version as well).”

My approach to fixing the problem is different, but it works:

  1. Go to https://jax-ws.dev.java.net/
  2. Download the latest version (2.2 at the time of this writing)
  3. Unzip the file and place the jars in Tomcat’s lib directory
  4. Restart Tomcat

Now I have no issues accessing my web service from Tomcat.

In my last blog post on this book, I discussed some of the Capacity best practices.  I am now going to briefly touch on the remainder of the book:

  • Networking. The section talks about best practices in a data center.  This includes different networks for different functions (e.g., production, admin) and different NIC cards on different machines to segment traffic.  It also discusses the usage of Virtual IPs.
  • Security.  Here, the discussion centers on the principle of “Least Privilege”.  Strategies for not having to run things as root are discussed, as well as how to deal with outbound passwords so that they cannot be compromised.
  • Availability. He begins with a strategy on how to document required availability to avoid ambiguity.  Strategies for load balancing and clustering are discussed, as well as how to appropriately use reverse proxies (looks like I need to play with Squid).
  • Administration. There are a number of strategies discussed here that will make administrators’ lives a lot easier.  This includes making the QA environment more closely resemble the production environment.  I particularly liked his discussion around zero, one, or many.  If you have 20 servers in a farm, having more than one in QA (though 20 isn’t needed) makes a big difference in terms of the issues discovered.  He discussed strategies for dealing with configuration files, and how to facilitate clean start-ups and shut-downs.
  • Design Summary. This chapter delves into a number of general design considerations for production, including making your application as easy to operate as possible.
  • Transparency. This chapter goes deep into strategies to reveal as much as possible about the internal operations of the servers and of the system as a whole.  After going through some of his Black Friday failure strategies, I would want to have as many of these as possible!

I appreciate good computer science books.  This is, seriously, one of the best books I have ever read.  It has made a HUGE difference in my understanding and capabilities in this area.  I strongly recommend it.
