WYSIWYG

http://kufli.blogspot.com
http://github.com/karthik20522

Monday, December 9, 2013

Image processing Benchmarks


For this benchmark, the following widely used image processing libraries were considered:

- imagemagick [http://www.imagemagick.org/script/index.php]
- graphicsmagick [http://www.graphicsmagick.org/]
- epeg [https://github.com/mattes/epeg]
- opencv [http://opencv.org/]
- vips

Test environment:
Memory: 5.8 GB
Processor: Intel Xeon CPU W3530 @ 2.80GHz x 4 cores
OS: Ubuntu 13.04 / 64 bit
Graphics: Gallium 0.4 on AMD Redwood

Original Image - 350KB - 3168x3168 pixels | Resized to 640x480
imagemagick x 3.69 ops/sec ±2.27% (23 runs sampled)
gm x 5.03 ops/sec ±0.68% (29 runs sampled)
opencv x 19.18 ops/sec ±1.27% (49 runs sampled)
epeg x 35.49 ops/sec ±1.16% (60 runs sampled)
vips x 40.62 ops/sec ±5.01% (69 runs sampled)

Original Image - 1 MB - 3000x2000 | Resized to 640x480
imagemagick x 4.97 ops/sec ±2.35% (29 runs sampled)
gm x 5.00 ops/sec ±0.54% (29 runs sampled)
opencv x 15.15 ops/sec ±1.36% (41 runs sampled)
epeg x 27.47 ops/sec ±0.98% (69 runs sampled)
vips x 36.26 ops/sec ±6.05% (89 runs sampled)

Original Image - 15MB - 5382x6254 pixels | Resized to 640x480
imagemagick x 0.87 ops/sec ±1.20% (9 runs sampled)
gm x 0.87 ops/sec ±0.66% (9 runs sampled)
vips x 1.74 ops/sec ±0.43% (13 runs sampled)
opencv x 1.88 ops/sec ±4.09% (9 runs sampled)
epeg x 3.87 ops/sec ±0.78% (14 runs sampled)

From the above results, VIPS appears to be the fastest of the lot, followed by epeg and opencv. One thing to consider, though, is features provided versus performance: libraries such as VIPS and epeg are optimized towards image resizing and cropping, while opencv, graphicsmagick and imagemagick provide a slew of image processing and analysis features.

Code snippet for benchmarking: https://gist.github.com/karthik20522/7605083


Monday, November 18, 2013

Speedier upload using Nodejs and Resumable.js

[updated] View source code at https://github.com/karthik20522/MultiPortUpload

Resumable.js is by far one of the best file upload plugins that I have used, followed by Plupload. Resumable.js provides offline-mode features: if a user gets disconnected while uploading, it automatically resumes when they are back online. Similar to Plupload, it has chunking options. Node.js, on the other hand, provides non-blocking I/O, which is perfect for upload purposes.

There is no real upload speed difference between the upload plugins (resumable.js, Plupload etc.) except for a few features here and there. Recently I developed a proof of concept for speedier uploads using existing plugins and systems. Part of the research was to emulate file accelerators, where multiple ports are used to upload files, thus making uploading quicker.

Using the same concept, I modified resumable.js to accept multiple URLs as an array and upload individual chunks to the different URLs in round-robin style. On the backend I spawned Node.js on multiple ports. Resumable.js, however, only uploads multiple chunks in parallel, not multiple files. This limitation was overcome with a simple code change, and following are the test results for various scenarios.

Note: in resumable.js, the simultaneousUploads option was set to 3

                    Single server,        Multiple servers,     Multiple servers +
                    single file upload    single file upload    multiple file upload
1 file (109 MB)     54 secs               56 secs               56 secs
59 files (109 MB)   152 secs              156 secs              17 secs


Single server, single file upload – default configuration of resumable.js and a single Node.js server accepting files/chunks
Multiple servers, single file upload – resumable.js modified to take multiple URLs, with Node.js configured to listen on different ports (3000, 3001, 3002); when uploading chunks, resumable.js uploads to the different ports in parallel
Multiple servers + multiple file upload – resumable.js modified to upload multiple files and multiple chunks in parallel instead of one file at a time

But the above test results are for only 3 simultaneous connections. Modern browsers can handle more than 3 connections; following is the number of connections per server supported by current browsers. The theory is that browsers make parallel connections when different domains are used, and uploading in parallel makes use of the user's full bandwidth for faster uploads.

Browser        Connections
IE 6,7         2
IE 8           6
Firefox 2      2
Firefox 3      6
Firefox 4      6 (12?)
Safari 4       6
Opera          4
Chrome 6       7


Let’s test the above scenario with 10 simultaneous connections:

                    Single server,        Multiple servers,     Multiple servers +
                    single file upload    single file upload    multiple file upload
1 file (109 MB)     27 secs               18 secs               18 secs
59 files (109 MB)   156 secs              158 secs              14 secs


The server was using almost the entire bandwidth on a multi-file upload: ~1 Gbps (986 Mbps)!

As you can clearly see from the above results, having different upload endpoints (ports/host-names) allows the browser to make parallel connections, as it treats each endpoint as a new host.

Advantages:
  • Customizable, in-house development
  • As fast as the user's bandwidth allows
  • Uses the Resumable.js plugin, so offline support comes for free. Win-win for everyone!
Disadvantages:
  • HTML5 only, i.e. no support for IE 9 and below!
  • The server software needs to be reliable enough to handle heavy data and IO operations
Note: the maximum chunk size for the above tests was set to 1MB. There is a bit of code which estimates the user's Internet speed and determines the chunk size accordingly; I am doing this basically by downloading a JPEG file and calculating the time taken to download it. This chunkSize calculation is just a POC.


Wednesday, October 23, 2013

Smart Thumbnail Cropping

Scaling an image down to a thumbnail size is a common practice when hosting images on websites, to reduce page load time, save bandwidth and so on. But very little has been done in optimizing those thumbnails from a human viewability point of view. Human viewability, what? Take a large image where the background covers the major part of the frame, shrink it down to a thumbnail size (say 192 px), and notice that the details of the image are subdued by the background.

To solve this problem of smart cropping, I am using a variation of feature descriptors and image processing tricks to extract only the most feature-rich part of the image, preserving the aspect ratio while cropping. Following are the test results of the algorithm used:
[Samples 1-3: each shows the original thumbnail, the feature-extraction output and the resulting cropped thumbnail]


So what's in the pipeline:
  • Open source the image processing code
  • Build an HTTP handler (ASP.NET HTTP handler) for dynamic cropping
Thoughts?


Friday, October 11, 2013

Event Viewer - Image Search

"Event Viewer" is an yet another attempt to visualize images similar to my {re}Search Timeline project.
Demo at : http://karthik20522.github.io/EventViewer



The whole point of this proof-of-concept project is to visualize images from the perspective of events rather than just displaying a grid of images. For example, a search on the GettyImages.com website basically displays a list of images in a tabular fashion, which provides no sense of association between the individual images being displayed. But having them grouped together as part of an event provides a sense of association and correlation between images.

Displaying images is always a tricky business. A dominant-color filter technique could provide an alternate way of scanning through images, as a user might be more interested in images of a particular color than in the fine details of an image.

From a technology standpoint, there was nothing special about building this project:
  • ASP.NET MVC 4 - Razor
  • Amazon SQS - for event scraping from GettyImages
  • Connect API for event and image detail lookup
  • MongoDB
  • Dominant Color Extraction
Source code at: https://github.com/karthik20522/EventViewer


Tuesday, October 1, 2013

Development Stack and stuff

For the past year or two, I have been dabbling with different technologies and frameworks, looking for an ideal combination of frontend, backend and development tools. Following are what I tend to use and recommend for both personal and consultancy projects.
I usually customize most of these open-source projects as per my needs.

Project Management
Database
  • MongoDB
CMS
Bulletin Board
Configuration Management
Logging and Analysis
Search
Source Control
AWS – Amazon Web Services
  • SQS – Simple Queue Service
  • SES – Simple Email Service
Hosting
Image processing
Development Tools/Frameworks/Languages
Misc Libraries and websites


Friday, September 6, 2013

Spray.io REST service - Http Error Codes - Lesson 8

View the lessons list at https://github.com/karthik20522/SprayLearning

In the previous post in this series on the Spray.io REST API, I talked about simple error handling and rejections. What about errors in the context of RESTful API best practices? From the perspective of the developer consuming your Web API, everything is a black box. When developing their applications, developers depend on well-designed errors when they are troubleshooting and resolving issues with your API.

There are over 70 HTTP status codes, but of course we don't need to use all of them. Check out the good Wikipedia entry listing all the HTTP status codes. In Spray, we can basically implement all the status codes in the complete magnet, depending on whether we are dealing with an Exception or a Rejection.
Check out these error/HTTP status code conventions from various REST APIs:


Thursday, September 5, 2013

Useful Git Alias and Hooks

Git by itself is fairly raw, and its console output is sometimes hard to read. Following are some of the git aliases that I tend to use on a daily basis.

git lg

This is a replacement for git log. Plain git log doesn't show useful information such as branch names, nor is it pretty looking! Yes, pretty looking, as in colors and such. The following git alias solves this mundane problem.
Following is a comparison screenshot of git lg vs git log

Using git log:


Using git lg alias:

git unpushed

Every now and then I want to see all the commits that haven't been pushed yet. This is quite a pain to dig out from the git console. Problem solved with the following alias:

git undo

git out of the box doesn't provide an undo option to revert the previous commit. Pain! The following alias intends to solve that problem:


Pre-push hooks

One of the most important hooks that I use is a pre-push hook that executes unit tests before pushing to the "master" branch. This is important as it can prevent changes with broken tests from being pushed to master. The hook script follows:


Monday, August 19, 2013

Spray.io REST service - Exception, Rejection and Timeout Handling - Lesson 7

View the lessons list at https://github.com/karthik20522/SprayLearning

Handling exceptions within the application and returning a valid HTTP response with a message is probably the way to go for building readable REST APIs. In spray, exceptions thrown during route execution bubble up through the route structure to the next enclosing handleExceptions directive. If you'd like to customize the way certain exceptions are handled, simply bring a custom ExceptionHandler into the implicit scope of the runRoute wrapper. For example:
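A minimal, illustrative sketch of what such a handler could look like; the exception-to-status-code mappings and the trait name are assumptions, not this project's actual handler:

    import spray.http.StatusCodes._
    import spray.routing._
    import spray.util.LoggingContext

    trait ServiceExceptionHandling {
      this: HttpService =>

      // brought into implicit scope of runRoute; maps exceptions to HTTP responses
      implicit def serviceExceptionHandler(implicit log: LoggingContext) = ExceptionHandler {
        case e: NoSuchElementException =>
          requestUri { uri =>
            log.warning("Resource not found for {}: {}", uri, e.getMessage)
            complete(NotFound, "The requested resource could not be found")
          }
        case e: IllegalArgumentException =>
          complete(BadRequest, e.getMessage)
        case e: Exception =>
          complete(InternalServerError, "Something went wrong; we are looking into it")
      }
    }

Mixing this trait into the service actor (or otherwise having the implicit in scope where runRoute is called) is enough for spray to pick it up.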

More information on Handling Exceptions can be found at spray-routing/key-concepts/exception-handling/

How about handling rejections? We can handle them in a similar fashion to exceptions. In this example I have created a separate trait with the rejection handler. One issue I came across was a conflict between shapeless' "::" syntax and the "::" (List cons) syntax used in rejection handler patterns.
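A minimal sketch of such a rejection handler; the rejection cases shown are illustrative rather than the project's actual ones:

    import spray.http.StatusCodes._
    import spray.routing._

    trait ServiceRejectionHandling {
      this: HttpService =>

      // the "::" below is the plain List cons used to match the head of the rejection
      // list; when shapeless' HList "::" is imported in the same file the two can
      // clash, which is the conflict mentioned above
      implicit val serviceRejectionHandler = RejectionHandler {
        case MissingQueryParamRejection(param) :: _ =>
          complete(BadRequest, s"Missing required query parameter '$param'")
        case ValidationRejection(message, _) :: _ =>
          complete(BadRequest, message)
      }
    }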
More information on Handling Rejections at spray-routing/key-concepts/rejections/

Timeout Handling: spray-routing itself does not perform any timeout checking; it relies on the underlying spray-can layer to watch for request timeouts. The timeout value is defined in the config file (application.conf). More information on timeout handling at spray-routing/key-concepts/timeout-handling/
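The relevant setting is spray.can.server.request-timeout in application.conf. If you want a custom response when a timeout fires, the service actor can additionally handle spray-can's Timedout message; a sketch, reusing the CustomerService trait and customerRoutes from this series (the response wording is illustrative):

    import akka.actor.Actor
    import spray.http._

    // CustomerService / customerRoutes are the service trait and routes used in this series
    class CustomerServiceActor extends Actor with CustomerService {
      def actorRefFactory = context

      // handle spray-can's Timedout message before falling through to the routes
      def receive = handleTimeouts orElse runRoute(customerRoutes)

      def handleTimeouts: Receive = {
        case Timedout(request: HttpRequest) =>
          sender ! HttpResponse(StatusCodes.RequestTimeout, "The request could not be handled in time.")
      }
    }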


Sunday, August 18, 2013

Spray.io REST service - API Versioning - Lesson 6

View the lessons list at https://github.com/karthik20522/SprayLearning

Before we get into spray, I would recommend reading about the different ways of versioning an API and the associated best practices, e.g. best-practices-for-api-versioning or versioning-rest-api.

To summarize the stackoverflow discussion, there are 3 ways to do versioning.
  • Header based - using X-API-Version
  • URL based - http://{uri}/v1/getCustomer
  • Content negotiation via Accept headers - application/vnd.example.v1+json (mediatype)
For this tutorial, I would be implementing the first two.

1) Header based - using X-API-Version. Here I am building a Directive that extracts the version from the request header, defaulting to 1 if the header does not exist. Two keywords come into play: "extract" and "provide", both part of spray's BasicDirectives. "extract" basically allows you to extract a single value from the request context, and "provide" allows you to inject a value into the Directive. But wait, we can make this trait even smaller by getting rid of "provide" altogether, along the lines of the sketch below. Note: there is more than one way to write the same operation in scala/spray; bane of my existence!
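A minimal, illustrative version using only "extract" (the header name comes from the post; the trait name and the rest are assumptions):

    import spray.routing._

    trait ApiVersioning {
      this: HttpService =>

      // reads X-API-Version from the request headers, defaulting to 1 when absent
      def versioning: Directive1[Int] = extract { ctx =>
        ctx.request.headers
          .find(_.name.equalsIgnoreCase("X-API-Version"))
          .map(_.value.toInt) // a real implementation should guard against non-numeric values
          .getOrElse(1)
      }
    }

With the trait mixed in, a route can then be wrapped as versioning { version => ... }.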
More info at: spray/routing/directives/BasicDirectives.scala

Now that we have defined the directive, we just need to mix it into the service trait and call versioning to extract the version number from the X-API-Version header field.

2) URL based - http://{uri}/v1/getCustomer
Here we are basically performing a regex on the incoming request.uri and extracting the version out of the "v*" prefix. Spray also provides quite a few path directives and matchers, one of them being PathMatcher. More info in the spray-routing documentation on path directives and PathMatchers.
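An equivalent effect can also be had with spray's built-in PathMatchers instead of a hand-rolled regex; a sketch (the route body is illustrative):

    import spray.routing._

    trait VersionedRoutes {
      this: HttpService =>

      // matches /v1/getCustomer, /v2/getCustomer, ... extracting the number after "v"
      val versionedRoute =
        pathPrefix("v" ~ IntNumber) { apiVersion =>
          path("getCustomer") {
            get {
              complete(s"customer lookup using API version $apiVersion")
            }
          }
        }
    }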


Saturday, August 17, 2013

Spray.io REST service - Authentication - Lesson 5

View the lessons list at https://github.com/karthik20522/SprayLearning

"Directives" are small building blocks of which you can construct arbitrarily complex route structures. A directive does one or more of the following:
  • Transform the incoming RequestContext before passing it on to its inner route
  • Filter the RequestContext according to some logic, i.e. only pass on certain requests and reject all others
  • Apply some logic to the incoming RequestContext and create objects that are made available to inner routes as "extractions"
  • Complete the request
More detailed information on directives can be found at http://spray.io/documentation/1.2-M8/ or at spray-routing/predefined-directives-alphabetically/. In future examples I will be using directives, so a basic understanding is useful. For this example I will be using the authenticate directive to validate the request with a username and password passed as part of the headers.

So, for authentication, a new UserAuthentication trait is created which has a function that returns a ContextAuthenticator. The authenticate directive expects either a ContextAuthenticator or a Future[Authentication[T]]. At its core, a ContextAuthenticator basically produces a Future[Authentication[T]].

The authenticate directive basically takes in a ContextAuthenticator and works with a Future of Authentication[T], which is an Either of a Rejection or an object of type T. What this means is that either a Rejection is returned (Left), such as an AuthenticationFailedRejection when credentials are missing or invalid, or, when successful, an object is returned (Right).

The UserAuthentication trait has two functions, one that returns a ContextAuthenticator and another that returns a Future. In this example, I am reading the username and password from the application.conf file and validating against them.
Note that I am importing "scala.concurrent.ExecutionContext.Implicits.global" because futures require an ExecutionContext; here I am simply using Scala's global one (alternatively, the actor system's dispatcher could be used).
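A minimal sketch of how such a trait might look, assuming the credentials arrive as plain "username"/"password" request headers and are stored under illustrative auth.* keys in application.conf:

    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global
    import com.typesafe.config.ConfigFactory
    import spray.routing.AuthenticationFailedRejection
    import spray.routing.authentication._

    trait UserAuthentication {
      private val conf = ConfigFactory.load()

      case class AuthInfo(username: String)

      // the function handed to the authenticate directive
      def authenticateUser: ContextAuthenticator[AuthInfo] = { ctx =>
        val user = ctx.request.headers.find(_.name.equalsIgnoreCase("username")).map(_.value)
        val pass = ctx.request.headers.find(_.name.equalsIgnoreCase("password")).map(_.value)
        validate(user, pass)
      }

      // the second function, returning the Future[Authentication[T]]
      private def validate(user: Option[String], pass: Option[String]): Future[Authentication[AuthInfo]] =
        Future {
          val ok = user.exists(_ == conf.getString("auth.username")) &&
                   pass.exists(_ == conf.getString("auth.password"))
          if (ok) Right(AuthInfo(user.get))
          // rejection constructor shown as of spray 1.2 final; earlier milestones took a realm String
          else Left(AuthenticationFailedRejection(AuthenticationFailedRejection.CredentialsRejected, Nil))
        }
    }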

Now that we have the authentication set up, we need to mix in the UserAuthentication trait using "with" and use the "authenticate" directive, passing in the "authenticateUser" function that we defined.


Further readings on this topic:
Spray/routing/authentication/HttpAuthenticator.scala
Spray/routing/SecurityDirectivesSpec.scala
https://groups.google.com/forum/#!topic/spray-user/5DBEZUXbjtw
Spray/routing/RejectionHandler.scala


Friday, August 16, 2013

Spray.io REST service - MongoDB - Lesson 4

View the lessons list at https://github.com/karthik20522/SprayLearning

Now that we have the Spray service set up and the routes defined, we can hook up the database to add and fetch customer information. For the database I am using MongoDB with Casbah. Casbah is a Scala toolkit for MongoDB; it calls itself a "toolkit" rather than a "driver", as it is a layer on top of the official mongo-java-driver for better integration with Scala.

To get Casbah set up, we need to add casbah and its related dependencies to the Build.scala file.
Note that we have slf4j and scala logging in the dependencies. Without slf4j you would get a "Failed to load class org.slf4j.impl.StaticLoggerBinder" error.

In my example, I have created a MongoFactory that has 3 functions: getConnection, getCollection and closeConnection.
Now that we have our factory, the next step is building the data access class for inserting and fetching data. The following code covers 2 operations:
  • saveCustomer - which returns back the GUID after inserting into MongoDB
  • findCustomer - find customer by GUID
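A minimal sketch of how the factory and these two operations might look (database, collection and field names are illustrative):

    import java.util.UUID
    import com.mongodb.casbah.Imports._

    object MongoFactory {
      private val host = "localhost"
      private val port = 27017

      def getConnection: MongoClient = MongoClient(host, port)
      def getCollection(conn: MongoClient): MongoCollection = conn("customerdb")("customers")
      def closeConnection(conn: MongoClient): Unit = conn.close()
    }

    object CustomerDAO {
      // inserts the customer and returns the generated GUID
      def saveCustomer(name: String, age: Int): String = {
        val conn = MongoFactory.getConnection
        val coll = MongoFactory.getCollection(conn)
        val id   = UUID.randomUUID().toString
        coll += MongoDBObject("_id" -> id, "name" -> name, "age" -> age)
        MongoFactory.closeConnection(conn)
        id
      }

      // finds a customer by GUID
      def findCustomer(id: String): Option[DBObject] = {
        val conn   = MongoFactory.getConnection
        val coll   = MongoFactory.getCollection(conn)
        val result = coll.findOne(MongoDBObject("_id" -> id))
        MongoFactory.closeConnection(conn)
        result
      }
    }

Opening and closing a connection per call keeps the sketch simple; a real implementation would typically reuse a single MongoClient.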
Now it is just a matter of integrating this into the service routes. More information and resources on Casbah:


Thursday, August 15, 2013

Spray.io REST service - Json Serialization, De-serialization - Lesson 3

View the lessons list at https://github.com/karthik20522/SprayLearning

For a REST based interface, JSON request and response is the norm. There are two ways to set the response format in Spray.

1) Using respondWithMediaType: with this method, each route/action has to be explicitly set to the json media type. More info at https://github.com/spray/spray/wiki/Misc-Directives
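A minimal sketch (the route and payload are illustrative):

    import spray.http.MediaTypes._
    import spray.routing.HttpService

    trait MediaTypeExample extends HttpService {
      // only this action is forced to respond with application/json
      val getCustomerRoute =
        path("getCustomer") {
          get {
            respondWithMediaType(`application/json`) {
              complete("""{ "id": "1234", "name": "John Doe" }""")
            }
          }
        }
    }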

2) Globally overriding the default format.

For all JSON-related operations, I am using Json4s as the json library. In Build.scala, add the following json4s dependency. Note: you will need to reload the project in sbt and regenerate the Eclipse project files, as Eclipse does not refresh itself when new dependencies are added.

In the CustomerServiceActor we will need to set the default format to json4sFormats by adding the implicit formatter. For example's sake, let's update the getCustomer action to return a mocked customer; the expected response should be a JSON-formatted customer object.
To post a customer object and deserialize it into the customer type, we can use spray's entity directive [http://spray.io/documentation/1.1-M8/spray-httpx/unmarshalling/] and cast the posted value to a JObject. Further reading on json4s at https://github.com/json4s/json4s
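A minimal sketch of the wiring, assuming a simple Customer case class. The post casts the posted value to a JObject; the sketch below binds straight to the case class instead, which spray's Json4sSupport also allows:

    import org.json4s.{DefaultFormats, Formats}
    import spray.httpx.Json4sSupport
    import spray.routing.HttpService

    case class Customer(id: String, name: String, age: Int)

    trait CustomerJsonRoutes extends HttpService with Json4sSupport {
      // the implicit formats required by Json4sSupport
      implicit def json4sFormats: Formats = DefaultFormats

      val jsonRoutes =
        path("getCustomer") {
          get {
            complete(Customer("8a4fd1", "John Doe", 35)) // serialized to JSON automatically
          }
        } ~
        path("addCustomer") {
          post {
            entity(as[Customer]) { customer => // posted JSON body deserialized here
              complete(customer)               // echoed back as JSON
            }
          }
        }
    }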


Spray.io REST service - Controller, Actions, Multiple controllers - Lesson 2

View the lessons list at https://github.com/karthik20522/SprayLearning

Once you have the Spray.io project set up (http://kufli.blogspot.com/2013/08/sprayio-rest-web-api-basic-setup.html), you can find the boot and controller files under the "src/main/scala/com/example" folder path. There are two files of interest: Boot.scala and MyService.scala. Boot.scala is basically the place to define the controllers (single or multiple) and the HTTP binding. The actions of these controllers are coded in the MyService.scala file.

In the following example, I am renaming MyService.scala to the more meaningful CustomerService.scala. Before we start working on the project, I would like to generate the Eclipse project files so the code can be edited in Eclipse rather than Sublime or any other text editor.

To get it set up in Eclipse, you can run the "eclipse" command within the sbt console. Note that plugins.sbt should contain the sbteclipse plugin. View the initial project setup at http://kufli.blogspot.com/2013/08/sprayio-rest-web-api-basic-setup.html

The following screenshot shows the structure of the project within Eclipse.



CustomerService.scala class:

This is basically the controller class, with the actions/routes defined in it. I personally like to separate the service behavior from the service actor, as we want to be able to test it independently without having to spin up an actor.
customerRoutes is the placeholder where all the routes/actions are defined and implemented. An example of how such routes might look, along with the URLs for accessing them, follows:
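(Both the action names and port 8080 from the spray template are assumptions, not the project's actual routes.)

    import spray.routing.HttpService

    trait CustomerService extends HttpService {
      val customerRoutes =
        path("getCustomer" / Segment) { customerId =>
          get {
            complete(s"customer details for id $customerId") // GET http://localhost:8080/getCustomer/1234
          }
        } ~
        path("addCustomer") {
          post {
            complete("customer added") // POST http://localhost:8080/addCustomer
          }
        }
    }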

How about multiple controllers? It is as easy as creating a new trait extending HttpService and piping it into the existing route. For example, if we want to create an Ajax service for all search-related operations, a snippet is sketched below. The whole project can be downloaded at https://github.com/karthik20522/SprayLearning
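(Again illustrative: the SearchService content is hypothetical, and CustomerService refers to the trait sketched above.)

    import akka.actor.Actor
    import spray.routing.HttpService

    trait SearchService extends HttpService {
      val searchRoutes =
        pathPrefix("search") {
          path("events" / Segment) { term =>
            get { complete(s"search results for $term") }
          }
        }
    }

    // the service actor mixes in both controllers and pipes their routes together with ~
    class ServicesActor extends Actor with CustomerService with SearchService {
      def actorRefFactory = context
      def receive = runRoute(customerRoutes ~ searchRoutes)
    }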


Spray.io REST Web API - Basic setup

What is Spray? "spray is an open-source toolkit for building REST/HTTP-based integration layers on top of Scala and Akka. Being asynchronous, actor-based, fast, lightweight, modular and testable it's a great way to connect your Scala applications to the world". Basically, spray is a web framework for building REST-based services, similar to ASP.NET Web API. Spray can be used for hosting both REST-based services and web pages (static/dynamic).

To get spray up and running you can download the SprayTemplate project from Github.com/spray
Note: there are multiple branches of the template with different spray, akka and server versions. For this example I am using the on_spray-can_1.2 branch, which uses spray-can (the server), Scala 2.10, Akka 2.2 and Spray 1.2.

Folder Setup:



There are two important files that one must be aware of.

1) build.sbt: this file contains basic information about your project such as the name, version and scalaVersion. In addition, there is libraryDependencies, which lists the external/internal libraries that the project uses. For example, if we decide to use the json4s library or a MongoDB library, this is where we would add them.
If you notice, in libraryDependencies we have "io.spray" % "spray-can" % "1.2-M8", which can be read as "groupId" % "artifactName" % "version". But what about the double "%%"? That basically tells sbt to pick the artifact built for the Scala version currently in use.

These dependencies can be located in the Maven repository at mvnrepository.com. An example dependency: http://mvnrepository.com/artifact/org.json4s/json4s-native_2.10/3.2.5
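For reference, a representative dependency section built from the versions mentioned in this series (spray 1.2-M8, Akka 2.2, json4s 3.2.5); treat it as illustrative rather than the template's exact contents:

    // spray artifacts are published to the spray repository, not Maven Central
    resolvers += "spray repo" at "http://repo.spray.io"

    libraryDependencies ++= Seq(
      "io.spray"          %  "spray-can"     % "1.2-M8",
      "io.spray"          %  "spray-routing" % "1.2-M8",
      "io.spray"          %  "spray-testkit" % "1.2-M8",
      "com.typesafe.akka" %% "akka-actor"    % "2.2.0",
      "org.json4s"        %% "json4s-native" % "3.2.5"
    )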

2) plugins.sbt: this file contains the plugins that sbt uses. Note that plugins.sbt requires an extra blank line between the plugin entries. An example plugins.sbt looks like the following:
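(An illustrative version; the plugin versions here are examples, not necessarily the template's exact ones.)

    addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0")

    addSbtPlugin("io.spray" % "sbt-revolver" % "0.6.2")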



To run the project, in the project folder run the following commands:

The above project can be downloaded at https://github.com/karthik20522/SprayLearning


Friday, August 9, 2013

Beers in my Belly - XX


Sculpin IPA

Jenlain ambree


Friday, July 26, 2013

Spray.io Quick Resources

Spray Authentication Routing References
  • http://www.gtan.com/akka_doc/scala/routing.html
  • http://spray.io/documentation/spray-routing/
  • https://github.com/spray/spray/wiki/Authentication-Authorization
  • https://github.com/spray/spray/wiki/Configuration
  • https://github.com/spray/spray/wiki

Spray Test
  • http://spray.io/documentation/spray-testkit/
  • http://etorreborre.github.io/specs2/guide/org.specs2.guide.Runners.html#Via+SBT

Spray Logback/Loggin
  • http://doc.akka.io/docs/akka/2.1.0/scala/logging.html
  • http://tantrajnana.blogspot.com/2012/10/using-c3p0-with-logback.html
  • http://doc.akka.io/docs/akka/snapshot/java/logging.html


Friday, July 12, 2013

Beers in my Belly - XIX


Bretagne Celtic

Kronenbourg


Thursday, July 11, 2013

Swaggered Development

The whole notion of an interface-based development model is to decouple backend developers from frontend developers by exposing the operations and models. While models represent the entity fields and their properties, such as required status, max length and other validation characteristics, operations represent the RESTful endpoints or API operations for the client to consume.
Swagger is an API documentation service that helps define the operations and models in a form that is both machine readable and human readable. Swagger exposes a JSON schema (draft-3 representation) of the operations and models for machine readability, and a UI that presents the operations, requests, responses, summaries, error codes etc. for human readability.

By exposing the endpoint information and the request and response parameters, you have effectively provided insight into your operations. On any development project consisting of frontend and backend developers, it's always a waiting game for the frontend devs, as they depend on the API and its documentation to consume and develop against. Providing that insight up front allows consuming developers to start mocking the REST service responses immediately.

Much said, let's build a simple Swagger doc using Swagger.NET and Web API. Swagger.Net and SwaggerUI can be downloaded as NuGet packages.


Once these two packages are installed, make sure XML documentation is enabled in the project properties of your solution.


To get your API endpoints swaggerified, add XML comments, remarks and parameter information to your action's XML documentation, as follows:
Upon compilation, a resource list file (JSON) is created for machine readability, and SwaggerUI provides a very neat API explorer and documentation interface for human readability, similar to the one below.


In part 2 of Swaggered Development, I shall show how to generate class files from resource files and also validation.


Tuesday, July 9, 2013

Log analysis using Logstash, ElasticSearch and Kibana

Introduction:
Logstash is a free tool for managing events and logs. It has three primary components: an input module for collecting logs from various sources [http://logstash.net/docs/1.1.13/], a filter module for tweaking and parsing the data, and finally an output module to store or pass along the parsed data to other systems [http://logstash.net/docs/1.1.13/].
ElasticSearch is an awesome distributed, RESTful, free, Lucene-powered search engine/server. Unlike Solr, ES is very simple to use and maintain, and as with Solr, indexing is near real-time.
Kibana is a presentation layer that sits on top of Elasticsearch to analyze and make sense of the logs that logstash throws into Elasticsearch; it is a highly scalable interface for Logstash and ElasticSearch that allows you to efficiently search, graph, analyze and otherwise make sense of a mountain of logs.
The Logstash + ElasticSearch + Kibana combination can be compared to an open-sourced Splunk, but on a smaller scale.

Setup:
Setting up Logstash is as easy as downloading the JAR file [http://logstash.net/], configuring the input and output sources, and running the java command. In this example, I will be monitoring a log file and writing it to an Elasticsearch server for users to analyze the data using Kibana.


Elasticsearch: download the zip package from the site [http://www.elasticsearch.org/download/] and run the elasticsearch.bat file.
Note: make sure JAVA_HOME is set up correctly for Logstash and Elasticsearch to work.



Kibana: download the kibana files from Github [https://github.com/elasticsearch/kibana] and either run it as a standalone app or make it part of ElasticSearch plugins. You can do this by copying the kibana files to the ElasticSearch plugins / sites directory.



  • Open config.js in your favorite editor
  • Set elasticsearch: 'http://localhost:9200' to point to your ElasticSearch server

Use case:
In general, most of these log analyzers talk about analyzing website traffic, similar to the videos that Kibana has on its website [http://kibana.org/about.html]. This is great, but in the real world logs and events are more than just website traffic: information flow checkpoints, performance data and so on.
In our case, let's assume we have some data that is being passed from one system to another and we are logging it to a file. A simple representation of this information flow is as follows:



So basically there are 4 systems or states that the data is passed through: Ingest, Digest, Process and Exit. At each of these systems an event is logged to track the data flow, basically checkpoints. These events are logged in the dataLog.log file, as mentioned in the logstash configuration above.

Once logstash is up and running, it basically tails the files and copies the logged events to Elasticsearch as JSON objects. Elasticsearch indexes all the fields, and Kibana is now ready to access the data. Following are some of the cases that can be analyzed using Kibana:

Show all data flowing thru the system

Filter by Id


Get All Error'd


Advanced Filter using Lucene Syntax



The above reports/analyses are just a few examples of what can be achieved using Kibana + Elasticsearch. With Kibana you can design your own custom dashboards with configurable panels that can be grouped by role. Charts and panels are fully interactive, with features like drill-down, range selection and customization. And with Elasticsearch, handling rapid data growth is as easy as adding more ES servers to the cluster.
