Introduction to Web Scraping with HttpWebRequest using ASP.NET MVC 3

Before we begin, let me quickly highlight the points that I will be covering in this article:

1. What is Web Scraping ?

2. Difference between Web Crawling and Web Scraping ?

3. Web Scraping using ASP.NET MVC

4. Summary / Further Reading.

So, let's get started…

1. What is Web Scraping ? (Wiki)

Web scraping is a technique for extracting information from a website. This is done by writing programs that process the HTML pages of the target website and extract information from them.

2. Difference between Web Crawling and Web Scraping ?

“Crawling” refers to automatically retrieving web pages and following links to find still more web pages.

“Scraping” means parsing those pages to extract pieces of information in a structured way. It also refers to creating a programmatic interface, an API, that interacts with a site through an HTML interface meant for humans.

In short, “Crawling implies indexing, whereas scraping implies copying the content.”
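To make the difference concrete, here is a minimal sketch that runs both steps on the same HTML string. The class and method names are my own, and regular expressions are used only to keep the example short; for real pages a proper HTML parser is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CrawlVersusScrape
{
    // Crawling step: collect the href targets so that more pages can be visited.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Regex.Matches(html, "<a\\s[^>]*href=\"([^\"]*)\"", RegexOptions.IgnoreCase))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    // Scraping step: pull one specific piece of data out of the page (here, its title).
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, "<title>(.*?)</title>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
    }
}
```

A crawler would feed the result of ExtractLinks back into its queue of pages to visit, while a scraper stops once it has the values it came for.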

3. Web Scraping using ASP.NET MVC

Below is what I wrote to scrape data from my WordPress website. I have added comments wherever applicable to make the code easier to read.

Below are the constants that you need to define. ‘UserName’ and ‘Pwd’ are the login details for my WordPress account, ‘Url’ stands for the login page URL, and ‘ProfileUrl’ is the address of the page where the profile details are shown.

const string Url = "http://yassershaikh.com/wp-login.php";  
const string UserName = "guest";  
const string Pwd = ".netrocks!!"; // and this is not my real password :P
const string ProfileUrl = "http://yassershaikh.com/wp-admin/profile.php";  


public ActionResult Index()  
{  
    string postData = Crawler.PreparePostData(UserName, Pwd, Url);  
    byte[] data = Crawler.GetEncodedData(postData);

    string cookieValue = Crawler.GetCookie(Url, data);

    var model = Crawler.GetUserProfile(ProfileUrl, cookieValue);

    return View(model);  
}  

I created a static class called “Crawler”. Here’s the code for it.

// preparing post data
public static string PreparePostData(string userName, string pwd, string url)
{
    // values are URL-encoded so that characters such as '&', '=' or ':' cannot break the form body
    var postData = new StringBuilder();
    postData.Append("log=" + Uri.EscapeDataString(userName));
    postData.Append("&");
    postData.Append("pwd=" + Uri.EscapeDataString(pwd));
    postData.Append("&");
    postData.Append("wp-submit=Log+In");
    postData.Append("&");
    postData.Append("redirect_to=" + Uri.EscapeDataString(url));
    postData.Append("&");
    postData.Append("testcookie=1");

    return postData.ToString();
}
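If you need to post more fields than this, building the body from a list of name/value pairs keeps the encoding in one place. This is only a sketch; FormEncoder and BuildFormBody are names I made up for the example, and it assumes the server accepts %20 for spaces (which form-encoded endpoints generally do).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormEncoder
{
    // Joins the fields as name=value pairs separated by '&',
    // URL-encoding every name and value on the way.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

For the login above, the list would hold log, pwd, wp-submit, redirect_to and testcookie, in that order.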

public static byte[] GetEncodedData(string postData)  
{  
    var encoding = new ASCIIEncoding();  
    byte[] data = encoding.GetBytes(postData);  
    return data;  
}

public static string GetCookie(string url, byte[] data)  
{  
    var webRequest = (HttpWebRequest)WebRequest.Create(url);  
    webRequest.Method = "POST";  
    webRequest.ContentType = "application/x-www-form-urlencoded";  
    webRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2";  
    webRequest.AllowAutoRedirect = false;

    // the using block closes the request stream even if Write throws
    using (Stream requestStream = webRequest.GetRequestStream())
    {
        requestStream.Write(data, 0, data.Length);
    }

    var webResponse = (HttpWebResponse)webRequest.GetResponse();

    string cookievalue = string.Empty;  
    if (webResponse.Headers != null && webResponse.Headers["Set-Cookie"] != null)  
    {  
        cookievalue = webResponse.Headers["Set-Cookie"];

        // Modify CookieValue  
        cookievalue = GenerateActualCookieValue(cookievalue);  
    }

    return cookievalue;  
}

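public static string GenerateActualCookieValue(string cookievalue)
{
    // split the combined Set-Cookie header into its individual parts
    var separators = new char[] { ';', ',' };
    var oldCookieValues = cookievalue.Split(separators);

    // note: these positions (and the hard-coded wp-settings-time value) depend on the order
    // of the cookies in the login response, so check them against your own response
    string newCookie = oldCookieValues[2] + ";" + oldCookieValues[0] + ";" + oldCookieValues[8] + ";" + "wp-settings-time-2=1345705901";
    return newCookie;
}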

public static List<string> GetUserProfile(string profileUrl, string cookieValue)  
{  
    var webRequest = (HttpWebRequest)WebRequest.Create(profileUrl);

    webRequest.Method = "GET";  
    webRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2";  
    webRequest.AllowAutoRedirect = false;

    webRequest.Headers.Add("Cookie", cookieValue);

    var responseCsv = (HttpWebResponse)webRequest.GetResponse();  
    Stream response = responseCsv.GetResponseStream();

    var htmlDocument = new HtmlDocument();  
    htmlDocument.Load(response);

    var responseList = new List<string>();

    // reading all input tags in the page  
    var inputs = htmlDocument.DocumentNode.Descendants("input");

    foreach (var input in inputs)  
    {  
        if (input.Attributes != null)  
        {  
            if (input.Attributes["id"] != null && input.Attributes["value"] != null)  
            {  
                responseList.Add(input.Attributes["id"].Value + " = " + input.Attributes["value"].Value);  
            }  
        }  
    }

    return responseList;  
}  
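Picking cookies out of the Set-Cookie header by position is fragile. An alternative, sketched below, is to let HttpWebRequest manage cookies through a shared CookieContainer. This is not what the code above does; CookieAwareRequests is a hypothetical helper name, and I am assuming the login response sets the cookies the profile page needs.

```csharp
using System.Net;

public static class CookieAwareRequests
{
    // Every request created with the same container shares its cookies:
    // cookies received on the login response are sent again on later requests.
    public static HttpWebRequest Create(string url, string method, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.AllowAutoRedirect = false;
        request.CookieContainer = cookies;
        return request;
    }
}
```

You would create one CookieContainer, pass it to the POST request for the login page, and then pass the same instance to the GET request for the profile page, with no manual header handling in between.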

4. Summary / Further Reading.

How to check if an uploaded file is an image or not in ASP.NET MVC 3

Before reading this article, you might want to know how to code for uploading a file using ASP.NET MVC 3. For this, you can check my previous post.

Ok, so in this article I’m going to show how to determine if the uploaded file is an image or not. Since we are using HttpPostedFileBase, I highly recommend that you read this MSDN link, specifically these two properties of the HttpPostedFileBase class.

  • ContentType
  • FileName

Below is a screenshot in debug mode, where I have posted an image file as an argument to the action “GetFiles”. Check out the two properties we talked about earlier.

Here the ContentType property reads “image/png” and the FileName property has the entire file path with the extension.

So, keeping these two properties in mind, I have made this method, which you can use (or even extend) in your project to find out if the file posted is an image or not.

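private bool IsImage(HttpPostedFileBase file)
{
    if (file == null)
    {
        return false;
    }

    // image MIME types start with "image/", for example "image/png"
    if (file.ContentType != null && file.ContentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
    {
        return true;
    }

    string[] formats = new string[] { ".jpg", ".png", ".gif", ".jpeg" }; // add more if you like...

    foreach (var item in formats)
    {
        // match the extension at the end of the name, ignoring case
        if (file.FileName.EndsWith(item, StringComparison.OrdinalIgnoreCase))
        {
            return true;
        }
    }

    return false;
}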
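Both ContentType and FileName are sent by the browser, so one way to extend the method is to also look at the first few bytes of the file itself. The sketch below checks the well-known PNG, JPEG and GIF signatures; ImageSniffer and LooksLikeImage are hypothetical names, and it assumes a single Read call returns the header bytes, which holds for in-memory and file streams.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ImageSniffer
{
    private static readonly byte[][] Signatures =
    {
        new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, // PNG
        new byte[] { 0xFF, 0xD8, 0xFF },                               // JPEG
        new byte[] { 0x47, 0x49, 0x46, 0x38 }                          // GIF ("GIF8")
    };

    // Returns true when the stream starts with one of the known image signatures.
    public static bool LooksLikeImage(Stream content)
    {
        var header = new byte[8];
        int read = content.Read(header, 0, header.Length);

        // put the stream back where it was so the file can still be saved afterwards
        if (content.CanSeek)
        {
            content.Seek(-read, SeekOrigin.Current);
        }

        return Signatures.Any(sig =>
            read >= sig.Length && header.Take(sig.Length).SequenceEqual(sig));
    }
}
```

Inside IsImage you could call it as ImageSniffer.LooksLikeImage(file.InputStream) and require it to agree with the content type check.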

Cheers !

“The remote server returned an error: (417) Expectation Failed” error while data crawling – [SOLVED]

While working with HttpWebRequest for data crawling, I came across a very weird error:

The remote server returned an error: (417) Expectation Failed

Solution : The System.Net.ServicePointManager class has a static property named Expect100Continue. After setting this value to false, the error stopped.

So, here is what you should set to false:

System.Net.ServicePointManager.Expect100Continue = false;
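The static property changes the default for the whole application. If you would rather limit the change to the host you are crawling, the same flag also exists on the ServicePoint of a request. A small sketch, with ExpectHeader and CreateWithoutExpect being names I picked for the example:

```csharp
using System.Net;

public static class ExpectHeader
{
    // Creates a request whose ServicePoint will not send "Expect: 100-continue".
    // A ServicePoint is shared by requests to the same host, so other hosts are unaffected.
    public static HttpWebRequest CreateWithoutExpect(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.ServicePoint.Expect100Continue = false;
        return request;
    }
}
```

The request is then used as usual (set Method, write the post data, call GetResponse).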

MSDN Link

Using resultsFormatter in jQuery Tokeninput with ASP.NET MVC 3 Razor

This is a continuation of my previous post, where I explained how to set up and use jQuery Tokeninput. In this article, I will show you how to use the ‘resultsFormatter’ function.

resultsFormatter is a function that returns an interpolated HTML string for each result. Use this when you want to include images or multiline formatted results.

jQuery

$("#selectCity").tokenInput("@Url.Action("SearchWithCity")", {
    propertyToSearch: "name",
    resultsFormatter: function(item) {
        return "<li>" + item.name + " - <i>" + item.city + "</i>" + "</li>";
    }
});

and here is my controller action ‘SearchWithCity’…

[HttpGet]
public JsonResult SearchWithCity(string q)
{
    var searchResults = Helper.SearchContactByCity(q);
    var jsonResult = searchResults.Select(results => new { id = results.Id, name = results.Name, city = results.City });
    return Json(jsonResult, JsonRequestBehavior.AllowGet);
}

When you run this code, below is the output you should get.

It is as simple as this. You can include any HTML tags to customize your search results.

Using jQuery Tokeninput with ASP.NET MVC 3 Razor

We all know the need for having an autocomplete textbox in any web application. jQuery Tokeninput is one such jQuery plugin, which allows your users to select multiple items from a predefined list, using autocompletion as they type to find each item.

Read Full Documentation from here.

Now let's integrate this plugin with our ASP.NET MVC 3 application.

Step 1 : Download !

Before reading any further, please download the plugin from here. Skip this step if you have already downloaded the plugin.

Step 2 : Unzip and Import !

Unzip the file downloaded and add the javascript file “jquery.tokeninput.js” to your solution as shown below.

Next up, import this script into your view. I have added mine as shown below, and yes, you also need to import jquery-1.5.1.min.js.

<script src="../../Scripts/jquery-1.5.1.min.js" type="text/javascript"></script>
<script src="../../Scripts/js/loopj/jquery.tokeninput.js" type="text/javascript"></script>    

Step 3 : How to write jQuery for this ?

First, we need to create a text field. Below is the text field that I have created for this example; note that it has an id of “nameBox”.

This id will later be used in the jQuery Tokeninput call.

<p>
    Data from token input => <input type="text" id="nameBox" />
</p>

Here is the jQuery call, which uses the id of the textbox we have defined above.

<script type="text/javascript">
    $("#nameBox").tokenInput("@Url.Action("SearchWithName")");
</script>

and here is how you should write your action.

[HttpGet]
public JsonResult SearchWithName(string q)
{
    var searchResults = Helper.SearchContactByName(q);
    var jsonResult = searchResults.Select(results => new { id = results.Id, name = results.Name, city = results.City });
    return Json(jsonResult, JsonRequestBehavior.AllowGet);
}
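The Helper class is not shown in this post. As a rough idea of what SearchContactByName could look like, here is a sketch that searches an in-memory list. The Contact class, the sample data and the ContactSearch name are all made up for illustration; a real application would most likely query a database instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Contact
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
}

public static class ContactSearch
{
    // Hypothetical sample data standing in for a real data source.
    private static readonly List<Contact> Contacts = new List<Contact>
    {
        new Contact { Id = 1, Name = "Yasser", City = "Mumbai" },
        new Contact { Id = 2, Name = "Mohsin", City = "Pune" },
        new Contact { Id = 3, Name = "Mohan", City = "Delhi" }
    };

    // Returns the contacts whose name contains the typed text, ignoring case.
    public static IEnumerable<Contact> ByName(string q)
    {
        if (string.IsNullOrWhiteSpace(q))
        {
            return Enumerable.Empty<Contact>();
        }

        return Contacts.Where(c => c.Name.IndexOf(q, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

The action then only has to project each match to an object with id and name properties before returning it as JSON.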

Important Note (Source) : Your script should output JSON search results in the following format ONLY:

[
    {"id":"856","name":"House"},
    {"id":"1035","name":"Desperate Housewives"},
    ...
]

Now, let's run our application. You should get an output similar to the one shown below.

Output

Cheers !

Update : I have added a new post here that shows how to use **resultsFormatter** with jQuery Tokeninput.

Update 2: We can change the default query parameter name ‘q’ to something else, for example ‘searchTerm’ as used below :

this.$("#abcTextbox").tokenInput("/Sports/GetAbc", {
    queryParam: "searchTerm"
});

Update 3: We can send extra parameters in the URL using Tokeninput, as shown below :

this.$("#abcTextbox").tokenInput("/Sports/GetAbc?someParam1=cricket&someParam2=yasser", {
    queryParam: "q"
});

What are HTML Helpers ? How to use HTML Helpers in ASP.NET MVC 3 using Razor ?

Why HTML Helper ?

It's a common need to generate the “same” block of HTML and Razor code in several different views. Writing this code over and over again can be tedious and error prone.

MVC Framework provides HTML Helpers to solve this problem.

How to use HTML Helpers in ASP.NET MVC 3 using Razor ?

Step 1 : Create a class “Helper.cs” as shown below. Make sure that the class and all its methods are declared as static.

Helper.cs

public static class Helper  
{  
    public static MvcHtmlString Greet(this HtmlHelper html)  
    {  
        string message = DateTime.Now.Hour < 12 ? "Good Morning." : "Good Afternoon.";

        return new MvcHtmlString(message);  
    }  
}  

The first parameter of an HTML helper method will always be an HtmlHelper object marked with the this keyword, which makes it an extension method.

Step 2 : To import the HTML helper into all your views, you have to include a reference in web.config of the Views folder.

<system.web.webPages.razor>  
    <host factoryType="System.Web.Mvc.MvcWebRazorHostFactory, System.Web.Mvc, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" />  
    <pages pageBaseType="System.Web.Mvc.WebViewPage">
        <namespaces>
            <add namespace="System.Web.Mvc" />
            <add namespace="System.Web.Mvc.Ajax" />
            <add namespace="System.Web.Mvc.Html" />
            <add namespace="System.Web.Routing" />
            <add namespace="MvcApplication1.Utilities"/>
        </namespaces>
    </pages>
</system.web.webPages.razor>  

Step 3 : Now go to your view, and you will see the HTML helper method that we have just created, as in the screenshot below…

Index.cshtml

<p>  
@Html.Greet()  
</p>  

Step 4 :

Output

How to add a folder in Visual Studio 2010 Solution Explorer ?

It is a very common requirement to add a folder and all files in it, under a solution in Visual Studio 2010. So here’s a small guide on how to do it.

Procedure to be followed :
1. Select the “Show All Files” option.

2. Right click on the folder you want to include in the solution

3. Select the “Include in project” option. And the folder is included in your solution

That’s it ! Hope this helps :)

How to upload and display an image using Html.BeginForm in Razor View MVC 3

Uploading an image and then displaying it is a very common requirement for any web application. Recently I was required to allow a user to add a profile picture and later on display it on his/her dashboard.

So here is what I had done, hope this helps you too.

Razor Code

@using (Html.BeginForm("SaveSettings", "Blog", FormMethod.Post, new { enctype="multipart/form-data"}))
{
    @Html.TextBoxFor(m => m.BlahBlah)

    // and more...

    <input type="file" name="file" id="file" />
    <input type="submit" name="submitButton" value="Save" />
}

The important thing to note here is we have added an html property called enctype as enctype="multipart/form-data".

Next up, the action method for this form. The action accepts two input parameters :

  • UserModel model : has all data entered in the form.
  • HttpPostedFileBase file : here is where you will have your uploaded image posted to.

Code

[HttpPost]
public ActionResult Index(UserModel model, HttpPostedFileBase file)
{
    // file is null when the form is submitted without choosing a file
    if (file != null && file.ContentLength > 0)
    {
        // code for saving the image file to a physical location.
        var fileName = Path.GetFileName(file.FileName);
        var path = Path.Combine(Server.MapPath("~/Uploads/Profile"), fileName);
        file.SaveAs(path);

        // prepare a relative path to be stored in the database and used to display later on.
        // built with "/" because Path.Combine would insert a backslash into the URL
        path = Url.Content("~/Uploads/Profile/" + fileName);
        // save to db
    }

    return RedirectToAction("Index");
}
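One thing to keep in mind with the code above is that two users uploading a file with the same name will overwrite each other. A possible way around it, sketched here with the made-up names UploadNaming, BuildStoredName and BuildRelativeUrl, is to store the file under a generated name and keep only the original extension.

```csharp
using System;
using System.IO;

public static class UploadNaming
{
    // Generates a unique name for storage, keeping only the (lower-cased) extension
    // of the name sent by the browser.
    public static string BuildStoredName(string clientFileName)
    {
        string extension = Path.GetExtension(clientFileName) ?? string.Empty;
        return Guid.NewGuid().ToString("N") + extension.ToLowerInvariant();
    }

    // Builds the relative URL to store in the database, always using forward slashes.
    public static string BuildRelativeUrl(string folder, string storedName)
    {
        return folder.TrimEnd('/') + "/" + storedName;
    }
}
```

In the action you would save the file under BuildStoredName(file.FileName) and store BuildRelativeUrl("~/Uploads/Profile", storedName) in the database.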

You may sometimes also need to upload multiple images. That isn’t difficult either; below is the code for it, using IEnumerable. Note that every file input shares the same name as the action parameter.

Razor Code

@using (Html.BeginForm("SaveSettings", "Blog", FormMethod.Post, new { enctype="multipart/form-data"}))
{
    <input type="file" name="files" id="file1" />
    <input type="file" name="files" id="file2" />
    <input type="submit" name="submitButton" value="Save" />
}

and here is how you will be reading the files posted.

[HttpPost]
public ActionResult Index(IEnumerable<HttpPostedFileBase> files) {
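    foreach (var file in files) {
        // inputs that were left empty may be posted as null entries, so skip them
        if (file == null || file.ContentLength == 0) continue;

        // process each uploaded file here...
    }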
    return RedirectToAction("Index");
}

Now that you have uploaded the image and saved the image path, the next task is to display the image.

Displaying the uploaded image

<img width="200px" height="150px" src="@Url.Content(Model.ImageFilePath)"/>

Cheers !

**References**

http://haacked.com/archive/2010/07/16/uploading-files-with-aspnetmvc.aspx

http://stackoverflow.com/questions/7321383/displaying-an-uploaded-image-in-mvc-3-razor

http://stackoverflow.com/a/5248365/1182982

How to add / import namespace to Razor View in MVC 3

There are two ways in which you can add a namespace to a Razor view in ASP.NET MVC 3.

Method 1
The first way is to use the @using statement in .cshtml files, which imports a namespace to the current file only.

Example :-

@using Namespace1
@using Namespace2.SomeClass

Method 2
The second way is to define these namespaces in the web.config of the Views directory of your project, which makes them available to all views.

Example:-

<system.web.webPages.razor>
  <pages pageBaseType="System.Web.Mvc.WebViewPage">
    <namespaces>
      <add namespace="System.Web.Mvc" />
      <add namespace="System.Web.Mvc.Ajax" />
      <add namespace="System.Web.Mvc.Html" />
      <add namespace="System.Web.Routing" />
      <!-- you can add more here... -->
    </namespaces>
  </pages>
</system.web.webPages.razor>

or you can add your custom namespace like this too :

<add namespace="Custom.Yasser" />
<add namespace="Custom.Mohsin" />

How to pass values from Controller to View using ViewBag in MVC 3

Hello, in this small article I will show how to pass a value from a Controller to a View using ViewBag.

Below is the code.

Controller code

public class HomeController : Controller  
{  
    //  
    // GET: /Home/
    public ActionResult Index()  
    {  
        ViewBag.Greet = DateTime.Now.Hour < 12 ? "Good Morning" : "Good Afternoon";  
        return View();  
    }
}  

Razor code

@{  
    Layout = null;  
}

<!DOCTYPE html>

<html>  
    <head>  
    <title>Index</title>  
    </head>  
    <body>  
        <div>  
        @ViewBag.Greet, World !  
        </div>  
    </body>  
</html>

Output

Thanks !