Welcome to NALIN.

It's all about Technology and Programming.


This blog covers trending technologies, popular programming tutorials, and solutions to programming problems.


Do you want to contribute to this blog with your knowledge and skills? If yes, then contact us here


Thank you !!!

Web Scraping in Java

In this article, I will briefly discuss the most popular tools used for web scraping in Java, which one I prefer, and why.

When it comes to web scraping, everybody says Python is the best language for it. Since I do not have any experience in Python, I will not comment on this. However, web scraping in Java is not as difficult as people think. In fact, anyone already familiar with regular expressions will have little difficulty doing it in Java, since regular expressions work largely the same way regardless of the programming language you choose. Also, if you already have experience with web scraping in other languages, you will soon be able to do it in Java too. All it takes is knowledge of basic syntax and loops. Besides plain regular expressions, the tools that dominate the web scraping domain in Java are the Jsoup and Jaunt libraries.
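As a rough illustration of the regular-expression approach, here is a minimal sketch that pulls the href values out of anchor tags in an HTML string using only the standard library. The class name RegexLinkScraper, the method extractLinks and the sample HTML are made up for this example, and the pattern is deliberately simple: it assumes double-quoted href attributes and is not a substitute for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jsoup or Jaunt.
class RegexLinkScraper {

    // Matches href="..." inside <a ...> tags (double-quoted values only).
    private static final Pattern HREF =
            Pattern.compile("<a\\s+[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href values of all anchor tags found in the given HTML. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {           // one iteration per matching anchor tag
            links.add(matcher.group(1));   // group 1 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/index\">Index</a> "
                + "<a class=\"x\" href=\"/hello\">Hello</a></p>";
        System.out.println(extractLinks(html)); // prints [/index, /hello]
    }
}
```

For anything beyond simple, predictable markup, a parser-based library is usually the more robust choice.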

Jsoup is a free and open source Java library that enables you to scrape and parse HTML from websites, files, or even strings. DOM traversal is extremely simple in Jsoup. Form data submission is very easy for GET requests, but it can be a little tedious for POST requests, especially if there are a lot of data fields. Jsoup uses CSS selectors to select and extract data.

Just like Jsoup, Jaunt is a Java library that allows you to scrape and parse HTML from websites, files, and strings. It is also free, but not open source. The free version comes with a license that has to be renewed every month, meaning you will have to download a new version of Jaunt every month. You will also need to replace the old jar files with the new ones in your previous projects for them to work again. If you do not want this, there is also a paid version of Jaunt. Functionality-wise, Jaunt can do almost everything that Jsoup can, and more. Jaunt can parse JSON and XML as well, and it also supports REST APIs. This is one of the major reasons why many people prefer Jaunt. For selection and extraction of data, Jaunt has its own syntax.

While Jaunt is more powerful than Jsoup, I prefer to stick with Jsoup. Jaunt's syntax is more readable than CSS selectors, but hey, we're programmers: we are used to reading code, and CSS selectors just look better to us. Having to download new jar files every month and replace them in every project you have ever done is just not feasible. Yes, Jsoup cannot parse JSON (XML it can handle through its XML parser mode), but we can always combine Jsoup and regular expressions for those cases.

These are my thoughts on web scraping in Java. If you would like to share your opinion, feel free to leave a comment.

Generating HTML output in Servlet

If you just need to handle a handful of request URIs in your EE web module, then it might be easier to generate your own HTML response in your Servlet code instead of using a full-blown template library. As part of my examples, I tried out a very simple Java DSL that generates HTML output when writing your own Servlet. The code looks like this:

package nalin.servlet.web;

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/index")
public class IndexServlet extends HtmlWriterServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        HtmlWriter html = createHtmlWriter(req, resp);
        String message = getMessage(req, html);
        
        html.header()
            .h(1, "Welcome to Servlet Example")
            .p("Let's explore Java Servlet 3.x Features.")
            .p(message)
            .ul(
                html.link("Index", "/index"),
                html.link("Hello", "/hello"),
                html.link("Sys Props", "/sys-props")
            )
            .footer();
    } 

}

I wrote a base HtmlWriterServlet class that provides a method for obtaining an instance of an HtmlWriter builder. The benefit of wrapping HTML generation in a builder is that the code is easier to read and it helps generate correct, well-formed tags. For example, the "ul" and "table" methods accept a Java List or Map object and generate the correct HTML tags.

Here is another example, showing how I generate a table view of the Java System Properties with a few lines of code:

package nalin.servlet.web;

import java.io.IOException;
import java.util.TreeMap;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/sys-props")
public class SysPropsServlet extends HtmlWriterServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        HtmlWriter html = createHtmlWriter(req, resp);
        TreeMap sysProps = new TreeMap(System.getProperties());
        html.header()
            .h(1, "Java System Properties")
            .table(sysProps)
            .footer();
    }

}

The simple HtmlWriter class provides a few HTML builder methods, and it can help generate HTML links with relative context paths. You can easily extend it to generate more HTML, such as form tags.
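The HtmlWriter source itself is not shown here, so the following is only a rough sketch of how such a chaining builder might be put together. SimpleHtmlBuilder and its methods are hypothetical stand-ins, not the actual HtmlWriter class; the sketch writes into a StringBuilder rather than a servlet response so that it runs without a container.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for the HtmlWriter described above. */
class SimpleHtmlBuilder {
    private final StringBuilder out = new StringBuilder();

    /** Escapes the characters that are unsafe in HTML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    SimpleHtmlBuilder h(int level, String text) {
        out.append("<h").append(level).append('>').append(escape(text))
           .append("</h").append(level).append('>');
        return this; // returning this is what makes the calls chainable
    }

    SimpleHtmlBuilder p(String text) {
        out.append("<p>").append(escape(text)).append("</p>");
        return this;
    }

    /** Renders a map as a two-column table, one row per entry. */
    SimpleHtmlBuilder table(Map<?, ?> rows) {
        out.append("<table>");
        for (Map.Entry<?, ?> entry : rows.entrySet()) {
            out.append("<tr><td>").append(escape(String.valueOf(entry.getKey())))
               .append("</td><td>").append(escape(String.valueOf(entry.getValue())))
               .append("</td></tr>");
        }
        out.append("</table>");
        return this;
    }

    String build() {
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put("a", "1");
        // prints <h1>Demo</h1><table><tr><td>a</td><td>1</td></tr></table>
        System.out.println(new SimpleHtmlBuilder().h(1, "Demo").table(props).build());
    }
}
```

Because every tag is opened and closed inside a single method, callers cannot forget a closing tag, which is the main benefit of the builder approach.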

Also, note that the ServletResponse object gives you full control over writing custom responses, so you are not restricted to returning HTML. You can write binary output such as PDF or even MP3 files. For binary data you write to the response's output stream instead of its Writer, and you set the corresponding content MIME type and length of what will be returned.

How do Servlets and JSP create sessions?

In a Servlet, you can get the Session object by calling "httpServletRequest.getSession(true)". The "true" flag creates the session if it doesn't already exist; otherwise the existing session is returned.

Now, if you want to check whether a session exists (without creating one if it doesn't), you need to pass in "false" and then check for "null".

HttpSession session = httpServletRequest.getSession(false);
if (session == null) {
  // do something without creating session object.
}

Now comes the tricky part. If you run the above code and then dispatch the request to render a JSP page, you will quickly find out that the container still creates a new Session object! It turns out that, by default, a JSP page creates a new Session object if you do not have one. To disable this, you need to set it explicitly at the top of the JSP page:

<%@ page session="false" %>

Only with this directive will you actually prevent the creation of an unnecessary Session object when you use JSP for output. This is something to watch out for when debugging session-based applications.

Angular 2 with Angular CLI

Angular 2 is an open source framework for building mobile and desktop applications. Rather than a successor of AngularJS 1.x, Angular 2 can be considered an entirely new framework built on learnings from AngularJS 1.x.

The Angular CLI

One of the easiest ways to start a new Angular 2 application is to use the brand new Angular command-line interface (CLI) that allows you to:
  • generate boilerplate code for new Angular 2 applications
  • add features (components, directives, services, pipes, etc) to existing Angular 2 applications
To install Angular CLI, run:
$ npm install -g angular-cli
which will install the ng command globally on your system.

To verify whether your installation completed successfully, you can run:
$ ng version
which should display the version you have installed.

Generating Test Application

Now that we have Angular CLI installed, we can use it to generate our Test application:
$ ng new test-app

Result:
installing ng2
  create .editorconfig
  create README.md
  create src/app/app.component.css
  create src/app/app.component.html
  create src/app/app.component.spec.ts
  create src/app/app.component.ts
  create src/app/environment.ts
  create src/app/index.ts
  create src/app/shared/index.ts
  create src/favicon.ico
  create src/index.html
  create src/main.ts
  create src/system-config.ts
  create src/tsconfig.json
  create src/typings.d.ts
  create angular-cli-build.js
  create angular-cli.json
  create config/environment.dev.ts
  create config/environment.js
  create config/environment.prod.ts
  create config/karma-test-shim.js
  create config/karma.conf.js
  create config/protractor.conf.js
  create e2e/app.e2e-spec.ts
  create e2e/app.po.ts
  create e2e/tsconfig.json
  create e2e/typings.d.ts
  create .gitignore
  create package.json
  create public/.npmignore
  create tslint.json
  create typings.json
Successfully initialized git.
Installed packages for tooling via npm.

You can now:

# enter new directory the CLI created for you
$ cd test-app

# start the development server
$ ng serve

which will start a local development server that you can navigate to in your browser on http://localhost:4200/.

The application will automatically reload when a source file has changed.

Angular Ingredients

Angular CLI already generated the entire Angular 2 application boilerplate for us when we used the ng new command. But it doesn’t stop there. It can also help us add ingredients to our existing Angular application using the ng generate command:

# Generate a new component
$ ng generate component my-new-component

# Generate a new directive
$ ng generate directive my-new-directive

# Generate a new pipe
$ ng generate pipe my-new-pipe

# Generate a new service
$ ng generate service my-new-service

# Generate a new class
$ ng generate class my-new-class

# Generate a new interface
$ ng generate interface my-new-interface

# Generate a new enum
$ ng generate enum my-new-enum

Five reasons: Why is Java the best programming language?

Java is one of the best programming languages for development. Its popularity and usage are still increasing after more than two decades, which is a long time for any programming language. Only a few programming languages seem hard to replace, and Java is one of them. Here are 5 reasons why Java is the best programming language.

1. Object Orientation

Java is an object-oriented programming language that supports the core principles and mechanisms of the paradigm: data abstraction, encapsulation, polymorphism (through overloading and overriding), and inheritance. This makes it as expressive as C++ in that respect. Since C++ began as an object-oriented extension of C, Java offers the same advantage over plain C.

2. Rich API

Another big reason to learn Java is its rich API. Java provides APIs for almost everything you need in development, such as I/O, networking, utilities, XML parsing, and database connections. Whatever is left is covered by open source libraries like Apache Commons, Google Guava, and others.

3. Great collection of Open Source libraries

Big organisations like Apache, Google, and others have contributed a lot of great libraries, which make Java development easier, faster, and more cost effective.

4. Platform Independent and Free

In the 1990s, this was the main reason for Java’s popularity. The idea of platform independence is great, and Java’s tagline “write once, run anywhere” was enticing enough to attract lots of new development in Java. This is still one of the reasons Java ranks among the best programming languages: many Java applications are developed in a Windows environment and run on UNIX platforms.

Java has been free from the start, i.e. you don’t need to pay anything to create a Java application. Being free also helped Java become popular among individual programmers and large organisations alike.

5. Wonderful Community and Documentation

There is a Java community to help beginner, advanced, and even expert Java programmers. Java actually promotes the habit of taking from and giving back to the community. Lots of programmers who use open source also give back, for example as testers. Expert programmers provide advice for free at various Java forums and on StackOverflow. This is simply amazing and gives a lot of confidence to a newbie in Java.

Javadoc makes learning easy and provides an excellent reference while coding in Java. With the advent of IDEs, you don’t even need to look up Javadoc explicitly in a browser; you can get all the information in your IDE window itself.


Java is everywhere: it’s on the desktop, it’s on mobile, it’s on a card, almost everywhere, and so are Java programmers.
Integrated Development Environments (IDEs) like Eclipse and NetBeans have made Java development much easier, faster, and more fluent. It’s easy to search, refactor, and read code using IDEs.

Design Pattern: The Pipeline

Today I’ll look into the Pipeline pattern, a design pattern inspired by the original Chain of Responsibility pattern from the GoF.

The Chain Of Responsibility

Basically, the Chain of Responsibility defines the following actors:
  1. Command: the object to be processed
  2. Handler: an object handling interface. There can be many handlers in the chain. This interface defines 2 methods:
     • setSuccessor(Handler successor): defines the next successor in the chain to pass the command to
     • handle(Command command): handles the command
The goal of this pattern is to decouple the Command object from its processing chain. A new Handler can be easily added to the chain without breaking any existing code.


The disadvantage of this design is the number of operations needed when a new handler is added into the pipeline. You need to break an existing successor/predecessor link, insert the new handler then relink it to the chain. It is very similar to a linked list. If a successor link is badly set accidentally, the whole chain is broken.

Dependency injection within a container (JEE or Spring) does not make this any easier. Three operations are still required:
  • unlinking an existing handler
  • relinking the new handler with its successor
  • relinking the existing predecessor to the new handler
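To make the relinking cost concrete, here is a minimal sketch of the chain described above. The Handler interface follows the two methods listed earlier; everything else (ChainDemo, NamedHandler, the trace list, and the handler names) is hypothetical and only serves the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ChainDemo {
    /** The object travelling through the chain; here it just records who handled it. */
    static class Command {
        final List<String> trace = new ArrayList<>();
    }

    interface Handler {
        void setSuccessor(Handler successor);
        void handle(Command command);
    }

    /** Hypothetical handler that records its name, then forwards to its successor. */
    static class NamedHandler implements Handler {
        private final String name;
        private Handler successor;

        NamedHandler(String name) {
            this.name = name;
        }

        @Override
        public void setSuccessor(Handler successor) {
            this.successor = successor;
        }

        @Override
        public void handle(Command command) {
            command.trace.add(name);
            if (successor != null) {
                successor.handle(command); // pass the command down the chain
            }
        }
    }

    /** Builds validate -> persist, then inserts encode between the two. */
    static List<String> run() {
        NamedHandler validate = new NamedHandler("validate");
        NamedHandler persist = new NamedHandler("persist");
        validate.setSuccessor(persist);

        // Inserting a handler means rewiring the links by hand.
        NamedHandler encode = new NamedHandler("encode");
        encode.setSuccessor(persist);  // link the new handler to its successor
        validate.setSuccessor(encode); // relink the predecessor, dropping validate -> persist

        Command command = new Command();
        validate.handle(command);
        return command.trace;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [validate, encode, persist]
    }
}
```

If either setSuccessor call in the insertion step is forgotten or points at the wrong handler, part of the chain is silently skipped, which is the fragility described above.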

The Pipeline

As an example, take a Twitter-like application. When a user posts a new tweet, it needs to be processed by several services:
  • contentValidationService: to limit the tweet size to N characters, including the URL shortening
  • xssEncodingService: encode the tweet content to prevent XSS (cross-site scripting) attacks
  • tweetService: simply persist the tweet in the repository
  • userlineService: put the tweet in the current user line
  • timelineService: put the tweet in the current user timeline
  • taglineService: put the tweet in the appropriate taglines if it contains tags
  • mentionlineService: put the tweet in the timeline of any mentioned users
  • contactsService: spread the tweet to the timeline of all followers of current user
As you can see, the tweet processing services closely resemble a pipeline of processors. More importantly, the processing order really matters. xssEncodingService should be placed before tweetService (which is responsible for persistence in the repository). Similarly, tweetService needs to be placed before any line-related service because, in a multi-threaded environment, the tweet needs to be persisted before any user can read it.

The Pipeline pattern meets those requirements but is more flexible than the Chain Of Responsibility pattern. It defines the following actors:
  • Command: the command object to be processed. A tweet in our example
  • PipelineManager: the central class of the pattern. It defines the following attribute and method:
    • handlers: list (ordered) of Handlers
    • doPipeline(Command object): executes the processing pipeline
  • Handler: an interface defining the process(Command object) method
Below is the formal UML class diagram:
The major difference with the Chain Of Responsibility pattern is the introduction of the PipelineManager actor. The flexibility of the Pipeline pattern comes from the fact that at any time, a new Handler can be injected into the pipeline through the PipelineManager.
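The actors above can be sketched in a few lines of Java. This is only an illustration under assumptions: the Tweet class, the three lambda handlers, the add method, and the 140-character limit are made up for the example, and the handlers are far simpler than real services would be.

```java
import java.util.ArrayList;
import java.util.List;

class PipelineDemo {
    static final int MAX_LENGTH = 140; // assumed limit, standing in for "N characters"

    /** Command: a tweet, reduced here to its text plus a log of what happened to it. */
    static class Tweet {
        String content;
        final List<String> log = new ArrayList<>();

        Tweet(String content) {
            this.content = content;
        }
    }

    interface Handler {
        void process(Tweet tweet);
    }

    /** Holds the ordered handlers and runs them one after another. */
    static class PipelineManager {
        private final List<Handler> handlers = new ArrayList<>();

        PipelineManager add(Handler handler) {
            handlers.add(handler); // a List keeps insertion order
            return this;
        }

        void doPipeline(Tweet tweet) {
            for (Handler handler : handlers) {
                handler.process(tweet);
            }
        }
    }

    /** Runs a tweet through validation, encoding, and a pretend persistence step. */
    static Tweet run(String text) {
        PipelineManager manager = new PipelineManager()
            .add(t -> {
                if (t.content.length() > MAX_LENGTH) {
                    throw new IllegalArgumentException("tweet too long");
                }
            })
            .add(t -> t.content = t.content.replace("<", "&lt;").replace(">", "&gt;"))
            .add(t -> t.log.add("persisted: " + t.content));

        Tweet tweet = new Tweet(text);
        manager.doPipeline(tweet);
        return tweet;
    }

    public static void main(String[] args) {
        System.out.println(run("hi <b>").log); // prints [persisted: hi &lt;b&gt;]
    }
}
```

Unlike the chain, no handler knows about its neighbours here: ordering lives in one place, the manager's list, so inserting a handler is a single list operation.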

Below is a sample Spring XML configuration illustrating dependency injection.

<bean id="tweetPipelineManager" class="service.pipeline.tweet.TweetPipelineManager">
    <property name="tweetHandlers">
        <list>
            <ref bean="tweetContentValidationService"/>
            <ref bean="xssEncodingService"/>
            <ref bean="tweetService"/>
            <ref bean="userlineService"/>
            <ref bean="timelineService"/>
            <ref bean="taglineService"/>
            <ref bean="mentionlineService"/>
            <ref bean="contactsService"/>
            <ref bean="retweetService"/>
        </list>
    </property>
</bean>

Adding a new tweetHandler is as easy as adding a new entry in the list of handlers. Furthermore, since the handlers collection is a List, the order is preserved.

HTML templating with jQuery

In this short article, we’ll see some techniques to generate DOM elements and sections using jQuery and its chaining feature.

Dynamic DOM elements creation

There are 2 techniques to create dynamic DOM elements.
  1. Using the createElement() method on the standard Document object:
    var myDiv = document.createElement("div");
    var mySpan = document.createElement("span");
    // createTextNode() is a method of document, not of the element
    mySpan.appendChild(document.createTextNode(displayedText));
    myDiv.appendChild(mySpan);

  2. Using jQuery:
    // '<div>' and '<span>' create new elements; 'div' and 'span' would select existing ones
    var myDiv = $('<div>').append($('<span>').text(displayedText));
The second method is clearly shorter and cleaner. However, this technique is suitable only for occasional dynamic DOM creation. If you need to create repeated items (like table rows or list items) with dynamic data, it will quickly become a nightmare.

HTML templating with jQuery

The idea

Here is a very common use case: you are designing a user search page. There is a search form with various parameters. Below the search section is the result section where you display all found users in a table.

In the old style architecture, a click on the “Search” button submits the form to the server, which then executes the query, builds the result table, and renders the response page. All the work is done server-side.

With a RESTful architecture, the server only takes care of the DB querying and returns a list of JSON objects as results. The client side is responsible for formatting the data. This is where “HTML templating” comes into play.

The idea is to create a template row representing a result and hide it at the end of the page:

<div>
    <form id="searchForm">
        ...
        <button type="button" onclick="doSearch()">Search</button>
        ...
    </form>
    <section id="resultArea">
        <table id="resultTable">
            <thead>...</thead>
            <tbody></tbody>
        </table>
    </section>
</div>    
<div id="templates" style="display:none;">
    <!-- a <tr> is only valid inside a table, so the row template gets its own wrapper -->
    <table>
        <tr id="rowTemplate">
            <td class="name">Name</td>
            <td class="age">Age</td>
            <td class="skills">Java, C#, Python</td>
            <td class="xp">5 years</td>
        </tr>
    </table>

    <span id="someOtherTemplate">
        ...
        ...
    </span>
</div>

As shown above, all the DOM element templates are nested inside a div container with style=”display:none;” so it will not be visible on the page.

The same div can contain several template elements, like our template row for search result or any other templates. You need to give them a unique id so they can be easily fetched using jQuery selector.

Implementation

Now let’s see how we can format a result row upon reception of the JSON data from the server. Let’s assume that the returned data is a list of JSON objects, each representing one user’s details:

{
  "name": "John Skit",
  "age": "29",
  "skills": "Java, Groovy, HTML5",
  "xp": "7 years"
}

function doSearch()
{
    $.ajax({
        type: 'GET',
        url:  '/search',
        dataType: 'json',
        success: function(data)
        {
            // "data" is the whole list; "user" is the current element
            $.each(data, function(index,user)
            {
                $('#rowTemplate').clone().attr('id','')
                .find('.name').html(user.name).end()
                .find('.age').html(user.age).end()
                .find('.skills').html(user.skills).end()
                .find('.xp').html(user.xp).end()
                .appendTo('#resultTable tbody');
            });
        }
    });
}

In the ajax success function, upon reception of the JSON data set, we iterate through the “data” list. For each user JSON we:
  1. clone the #rowTemplate
  2. clear its id
  3. find each row column by its class
  4. fill in its HTML content with JSON data
  5. append the whole row to the body of #resultTable
That’s pretty simple. The job is done with a jQuery chain, using the end() method to go back to the row element after each find().

Please notice that at step 2, we need to clear the id of the cloned element; otherwise we’ll end up having several rows with the same id in the DOM, which is big trouble!

Of course, the example is quite straightforward because the logic is minimal. If you have complex logic with if/else if blocks involved, all the jQuery chaining beauty is gone!

function doSearch()
{
    $.ajax({
        type: 'GET',
        url:  '/search',
        dataType: 'json',
        success: function(data)
        {
            var row;
            // "data" is the whole list; "user" is the current element
            $.each(data, function(index,user)
            {
                row =$('#rowTemplate').clone().attr('id','');
 
                row.find('.name').html(user.name).end()
                .find('.age').html(user.age);
 
                // Age discrimination
                if(user.age > 50) 
                {
                    row.find('.age').addClass('oldDuffer'); 
                }
                 
                row.find('.skills').html(user.skills).end()
                .find('.xp').html(user.xp);
 
                // XP highlighting: xp looks like "7 years", so parse the leading number
                if(parseInt(user.xp, 10) < 3) 
                {
                    row.find('.xp').addClass('junior'); 
                }
 
                row.appendTo('#resultTable tbody');
            });
        }
    });
}