Archive for June, 2010

How to become an all-round software developer

June 20, 2010

(Image: Marius Watz, Stockspace)

In a world where many people can write code, it is not always easy to see who is the right person to hire. From the employee’s perspective, being a well-rounded software developer can help you compete successfully in the marketplace and weather the ups and downs of the tech world.

Becoming a well-rounded software developer takes time and effort; there is a lot to cover. At its foundation, it requires the right skills, attitude, and motivation. Next, it requires making the right choices about education, experience, and the technologies you are exposed to. Then it requires investing long hours in building technical skills. The final piece is working on soft skills such as communication and a good understanding of the business objectives and the customers’ needs.

I’ve compiled the list below to help you choose your path to becoming a great resource in today’s marketplace, and also to remind myself what I am looking for in a candidate when I’m hiring.

  • Education – bachelor’s or master’s degree in computer science from a decent college
  • Experience:
    • Significant contributor to one or more of the kinds of projects listed below:
      • Building scalable enterprise solutions – high throughput
      • Building high-traffic websites
      • Building custom UI controls (rich client)
      • Building real-time applications – low latency
    • Supporting large-scale implementations of one of the above kinds of applications
  • Technology
    • Programming language – object-oriented design and programming, reflection, exception handling, libraries (file system, logging, tracing), data structures, debugging, multithreading.
    • Databases – transactions (isolation levels), queries that span multiple tables, concurrency (database contention). Experience with more than one database type.
    • Performance analysis – code profiling, memory and GC monitoring, query analyzer tools
    • Application server – configuration, deployment, performance tuning
    • Deep understanding of the underlying OS(s) (these days, some understanding of virtualization technology is needed as well)
    • Web development – MVC, JavaScript, CSS, HTML, session management, AJAX, template languages, SSL
    • Security – understanding of IT compliance requirements.
      • Website – configuration, encryption, preventing cross-site scripting and SQL injection, input validation, etc.
      • Back end – designing it so that only a few IT personnel need to perform operations on the server.
    • I18N and L10N – Unicode, the advantages of UTF-8, date and currency formats, resource bundles, the cost of localization and how to minimize it, database considerations.
    • Unit testing tools, ORM tools, interoperability
    • Design patterns – at the minimum: singleton, decorator, publisher/subscriber (observer)
    • Architectural patterns – at the minimum: pipes and filters, layers, MVC, n-tier
    • Algorithms – at the minimum: sorting, searching/traversing (BFS, DFS), automata, recursion
    • Scalability – load balancing (horizontal), thread and object pooling (vertical), queues and remoting technologies (distributed), and caching.
  • Mentality
    • Being humble and curious; how else can you learn?
    • When something does not make sense to you, you know that it is an opportunity to learn something new.
    • You care about TPS (transactions per second) and/or the number of concurrent users so much that you want to frame those performance reports.
    • The answers you provide to customers, tech support, consultants, and peers are always as accurate as you can make them. That means you have researched and double-checked an answer before providing it.
    • Commitment
      • It is OK to be behind schedule, as long as you know it and have alerted your manager early enough that something can still be done about it.
      • It is a given that there is not enough time during the work week to become an all-round software developer.
      • You know that when the commitment is driven by the business there is no “work week”
    • You can recognize a good idea when you see it, but you don’t need to be the one who came up with it.
    • Business acumen – a good balance between doing what is right for engineering and what is right for the business.
    • You can’t live without: source control, requirements analysis, some sort of development process, your own toolkit, Google, several technical newsgroups, and blogs.
    • Thinking about testability and supportability at design time.
    • When using a new library, framework, or API, it is not a black box for you – you look under the hood.
    • When you need to fix something in somebody else’s code, rewriting it is not the only/first option that comes to mind.
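The checklist names publisher/subscriber (observer) among the minimum design patterns. As an illustration only, here is a minimal Python sketch of that pattern; the `Publisher` class and its method names are hypothetical, not taken from any particular library or framework.

```python
from typing import Callable, List

# A subscriber is any callable that accepts an event (a string, for simplicity).
Subscriber = Callable[[str], None]


class Publisher:
    """Minimal publisher/subscriber (observer) sketch: subscribers register
    callbacks and are notified, in registration order, of each published event."""

    def __init__(self) -> None:
        self._subscribers: List[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        self._subscribers.append(callback)

    def unsubscribe(self, callback: Subscriber) -> None:
        self._subscribers.remove(callback)

    def publish(self, event: str) -> None:
        # Iterate over a copy so a callback can unsubscribe while being notified.
        for callback in list(self._subscribers):
            callback(event)


if __name__ == "__main__":
    audit_log: List[str] = []
    orders = Publisher()
    orders.subscribe(audit_log.append)                 # the audit log observes orders
    orders.subscribe(lambda e: print("email:", e))     # so does a (pretend) mailer
    orders.publish("order-42 created")
    print(audit_log)  # ['order-42 created']
```

The publisher knows nothing about what its subscribers do, which is the point of the pattern: new observers can be added without touching the publishing code.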

I have probably missed a few items, and some technologies will change over time, but I hope this list helps you stay on track to becoming a well-rounded software developer.

Now, if you are one, I would love to chat; see the About page for contact information.

Picture credit: Ansomia

Google’s search engine is the 21st-century infrastructure

June 11, 2010

Search is infrastructure

When we think about infrastructure on a large scale, we think about roads, train tracks, ports, and utilities, all things that are essential to the smooth running of our economy. Online search has become so essential to our lives today that I think we should add it to the traditional list of the world’s infrastructure.

Building and maintaining a search engine is so expensive and labor intensive that it requires the same kind of planning and upkeep that, say, the Golden Gate Bridge does.

I see two similarities between traditional infrastructure and search engines. The first is that a search engine is a mission-critical system. The second is that the cost of building and maintaining a good search engine is enormous, just as the costs are for ports, railroad tracks, and the electrical grid.

Mission critical system

Can you imagine a week without Google? Think for a moment how many times a day you use a search engine for a task. Life would be much harder without it. We use search engines to find a place, a person, or a job. The same is true when looking for information about a disease, a company, or a product. Modern search engines also help us find directions, contact info, stock quotes, and innumerable other things. I can’t think of a day when I haven’t used a search engine (mostly Google, but others too). Metaphorically, search engines take us from one place to another (like planes, trains, and boats), and if they are well designed and maintained, they can save us an enormous amount of time and energy. If not, they can be a big waste of time!

The mighty task

The web is big and expanding. In February 2007, the Netcraft Web Server Survey found 108,810,358 distinct websites (not pages). By March 2009, only two years later, the number had more than doubled, to 224,749,695. The number of web pages would be a more precise measure than the number of websites, but I think the numbers above tell us enough about the size of the web.
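A quick check of the "more than doubled" claim, using only the two Netcraft figures above. The annualized rate assumes smooth compound growth over the roughly 25 months between the two surveys, which is a simplification:

$$
\frac{224{,}749{,}695}{108{,}810{,}358} \approx 2.07,
\qquad
2.07^{12/25} \approx 1.42
$$

That is, roughly 42% more websites each year over that period.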

New blogs are popping up every day, and some blogs post multiple times a day. With the recent introduction of microblogging services like Twitter and other personal lifestreaming tools, content is growing even faster. The information is also dynamic: websites go down and pages are constantly modified. Blogs allow people to leave comments over time. Content is much more than text and can include video, audio, and images.

Search consists of many steps. It usually starts with crawling, that is, getting the data. This is a mighty task that requires building an army of web crawlers to spider the web. It requires a crawling plan that uses sophisticated algorithms to look for new content and also to keep already stored pages up to date. It demands an immense amount of storage space and heavy computational resources.
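
At its core, the discovery side of crawling is a graph traversal over links. The sketch below is a toy breadth-first crawl, assuming a hypothetical in-memory `LINK_GRAPH` in place of real HTTP fetching and link extraction; it does not model politeness, robots.txt, or the recrawl scheduling needed to keep stored pages up to date, and it is not how any particular search engine works.

```python
from collections import deque
from typing import Dict, List

# Hypothetical stand-in for the web: each URL maps to the links found on that page.
# A real crawler would download each page over HTTP and extract its links.
LINK_GRAPH: Dict[str, List[str]] = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
}


def crawl(seeds: List[str], links: Dict[str, List[str]], max_pages: int) -> List[str]:
    """Breadth-first crawl from the seed URLs, fetching each URL at most once
    and stopping after max_pages fetches. Returns URLs in fetch order."""
    frontier = deque(seeds)   # discovered but not yet fetched
    seen = set(seeds)         # everything ever queued, so nothing is queued twice
    fetched: List[str] = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        fetched.append(url)   # placeholder for "download and store the page"
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


if __name__ == "__main__":
    print(crawl(["a.example/"], LINK_GRAPH, max_pages=10))
    # ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
```

The `max_pages` budget stands in for the much harder real-world question of which pages are worth fetching first.
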
The other tasks include indexing, linguistic processing, and ranking (for relevance and popularity). (If you are interested in learning how Google scales this process by breaking tasks down even further, read the following blog post about Google Architecture.)
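Indexing usually means building an inverted index, which maps each term to the documents that contain it. The sketch below is a toy version: the tokenizer is a crude stand-in for linguistic processing, and ranking by raw term counts is an illustrative assumption, far simpler than the relevance and popularity signals real engines use. All function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

Index = Dict[str, Dict[str, int]]  # term -> {doc_id: occurrences in that doc}


def tokenize(text: str) -> List[str]:
    # Crude stand-in for linguistic processing: lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Index:
    """Build an inverted index from {doc_id: text}."""
    index: Index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index: Index, query: str) -> List[str]:
    """Return ids of docs containing every query term, highest total count first
    (ties broken by doc id)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    scores = {d: sum(p[d] for p in postings) for d in matches}
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = {
        "d1": "Search is infrastructure.",
        "d2": "A search engine is expensive; search is essential.",
        "d3": "Power stations are infrastructure too.",
    }
    index = build_index(docs)
    print(search(index, "search"))  # ['d2', 'd1']
```

Because queries only touch the posting lists of their own terms, lookups stay cheap even as the document collection grows; the expensive part is building and updating the index.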

An exact comparison is impossible, but building and maintaining a large-scale search engine seems to be as hard as building a new power station, and it probably costs as much too.

Living with Monopoly

The purpose of this section is to get you thinking about my analogy and what it might mean.

The Monopoly question – do we need more than one search engine?

In some ways, the search engine industry might fit the definition of what is known as a “natural monopoly” (Wikipedia):

  1. “…it is the assertion about an industry, that multiple firms providing a good or service is less efficient (more costly to a nation or economy) than would be the case if a single firm provided a good or service.”
  2. “It is said that this is the result of high fixed costs of entering an industry which causes long run average costs to decline as output expands”
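The second point can be made concrete with a textbook simplification, which is an assumption for illustration and not data about search engines: a fixed cost $F$ (for example, crawling and data-center infrastructure) plus a constant cost $c$ per unit of output $q$.

$$
C(q) = F + c\,q, \qquad AC(q) = \frac{C(q)}{q} = \frac{F}{q} + c
$$

Average cost falls as output grows. If $n$ firms each build the same infrastructure and split a total output $Q$, total cost is $nF + cQ$ instead of $F + cQ$ for a single firm, so duplication adds $(n-1)F$.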

Google could be defined as a natural monopoly. It now has more than a 70% market share.
The first definition raises the question: why do we need more than one search engine provider? The second could explain why only one provider may survive.

Why don’t we need more than this one?

I’m personally not concerned about Google’s monopoly power to set rates. As a consumer, I don’t feel any pricing power :) but maybe the companies that pay for ads do.

I do have a couple of concerns. The first is the cost, to the country and to the world, of maintaining a search engine or of duplicating that effort on a large scale.
The second is that because search is such an important, globally critical system, more stakeholders around the globe should be paying attention.

High energy cost

Here is an excerpt from Data Center Energy Forecast – Executive Summary – July 29, 2008.

“As of 2006, the electricity use attributable to the nation’s servers and data centers is estimated at about 61 billion kilowatt-hours (kWh), or 1.5 percent of total U.S. electricity consumption. Between 2000 and 2006 electricity use more than doubled, amounting to about $4.5 billion in electricity costs. This amount was more than the electricity consumed by color televisions in the U.S. It was equivalent to the electricity consumed by 5.8 million average U.S. households (which represent 5% of the U.S. housing stock). And it was similar to the amount of electricity used by the entire U.S. transportation manufacturing industry (including the manufacture of automobiles, aircraft, trucks, and ships)”
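The figures in the excerpt can be cross-checked with simple division. This reads the $4.5 billion as the 2006 annual electricity bill (the excerpt's wording is ambiguous on that point), and all inputs are the rounded values quoted above, so the results are approximate:

$$
\frac{61\ \text{billion kWh}}{0.015} \approx 4{,}070\ \text{billion kWh (implied total U.S. consumption)}
$$

$$
\frac{\$4.5\ \text{billion}}{61\ \text{billion kWh}} \approx \$0.074\ \text{per kWh}
$$

$$
\frac{61\ \text{billion kWh}}{5.8\ \text{million households}} \approx 10{,}500\ \text{kWh per household per year}
$$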

Google is making an effort to reduce its data centers’ energy bills. My concern is that having multiple Google-size search engine companies around seems as wasteful as pulling multiple power lines to every home. I also think that the energy consumption should be distributed across the globe, since the search engine serves the entire world and not only one country.

What will happen if Google goes belly up?

I know that this seems radical and almost unimaginable at this point, but what if one day advertisers find somewhere other than SERPs (search engine results pages) to buy ad space? Our lives are so dependent on Internet search technology that if no one could pay the cost of maintaining a search engine, it would have a direct impact on the world economy.

Maybe we need a different solution?

To reiterate:
- Search is a very large task
- Search is costly
- Search has become essential to the modern economy
- Google is effective, but it is a monopoly
Today it is so mission critical that we need to watch it closely, or maybe even break it up.

Regulations

One way to deal with a mission-critical natural monopoly is to turn it into some sort of government-granted monopoly. In this case, the regulator would not be a single government but some sort of world organization that could enforce regulations and demands such as:

  • More energy efficient data centers
  • Better storage solutions
  • Crawling that covers more ground, including the deep web
  • Accounting governance and building cash reserves.

I know that this might sound like a radical idea. Please remember that the purpose of this article is not to support a return to a controlled market but to make us aware of the cost, power, and dependencies associated with search engines.

Explore alternative search technologies (similar to exploring alternative energy sources)

In addition to possible regulations, there are other ways to address the functions that a natural monopoly like Google currently serves:

  • Split the search task into parts such as crawling, storage, and indexing, and distribute them across multiple vendors.
  • Create better crawling algorithms – Cuil claimed to have found more efficient and scalable ways to crawl the web (the point is not Cuil itself but the idea).
  • Real-time search (conversational search) – if you believe that real-time search is the future, then you already know that there may be no need to deploy such huge crawling tasks to find great content. Let the crowd do the job.
  • P2P – distribute the crawling, indexing, ranking, and storage across many search users. This approach mitigates the single-point-of-failure risk and leverages existing unused computational resources.
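Distributing the crawl across many peers or vendors requires some agreed rule for who is responsible for each URL. One possible rule, chosen here for illustration and not prescribed by this post or by any specific P2P search project, is rendezvous (highest-random-weight) hashing: every peer can compute a URL's owner locally, and when a peer leaves, only the URLs it owned move elsewhere. Peer names and URLs below are hypothetical.

```python
import hashlib
from typing import List


def _weight(peer: str, url: str) -> int:
    # Deterministic pseudo-random weight for a (peer, url) pair.
    digest = hashlib.sha256(f"{peer}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(url: str, peers: List[str]) -> str:
    """Rendezvous hashing: the peer with the highest weight for this URL is
    responsible for crawling, indexing, and storing it."""
    if not peers:
        raise ValueError("no peers available")
    return max(peers, key=lambda peer: _weight(peer, url))


if __name__ == "__main__":
    peers = ["peer-1", "peer-2", "peer-3"]          # hypothetical peer ids
    urls = [f"http://site{i}.example/" for i in range(1000)]
    before = {u: owner(u, peers) for u in urls}
    after = {u: owner(u, ["peer-1", "peer-3"]) for u in urls}   # peer-2 leaves
    moved = sum(before[u] != after[u] for u in urls)
    # Only the URLs that peer-2 owned are reassigned; everything else stays put.
    print(moved == sum(o == "peer-2" for o in before.values()))  # True
```

A plain `hash(url) % number_of_peers` would also spread the work, but it reshuffles most URLs whenever a peer joins or leaves, which is costly when the peers are ordinary users' machines coming and going.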

Summary

The new president of the United States, Barack Obama, is leading his 21st-century New Deal in the hope that big investment in the country’s infrastructure will spur economic growth and prosperity. Online search has become a mission-critical task in our lives. It has an impact on the world economy and on energy consumption, and I think it should not be overlooked. To the traditional infrastructure list of transportation, telecommunication, and energy, we should add the 21st-century infrastructure: online search engines.
In the same way that nations monitor the condition of their infrastructure, they should be looking at search engine implementations and technologies.

A few points I’d like you to take from this post:

  • A search engine is more than software
  • The tasks of building and maintaining a new search engine on a large scale have an impact on society
  • Search is a global objective
  • We are heavily dependent on this technology
  • Google is a monopoly – for better or worse.

Do you share my opinion that search engines have an impact on the world economy?
Do you agree with me that Google is a mission critical system today?
Should we be worried if someone duplicates the task of keeping a large portion of the web crawled, stored, and indexed?

**This blog post was previously published as my guest post on AltSearchEngine.com. That post is no longer available, so I decided to publish it here again.

Picture credit: my favorite artist, Ron Shoshani
