ONDotNet.com    
 Published on ONDotNet.com (http://www.ondotnet.com/)


O'Reilly Book Excerpts: C# Cookbook

Cooking with C#, Part 2

by Stephen Teilhet and Jay Hilyard

Editor's note: Stephen Teilhet and Jay Hilyard, authors of the recently released C# Cookbook, hand-selected these recipes to excerpt on ONDotNet to give you a real glimpse at the kinds of solutions you'll find in the book. Like all the recipes in this latest release in O'Reilly's cookbook series, the solutions here get straight to the heart of the problem, like how to use the GetHTMLFromURL method to grab the HTML you want from a URL. And in case you missed them, check out the first batch of recipes the authors chose for publishing here.

Recipe 13.8: Obtaining the HTML from a URL

Problem

You need to get the HTML returned from a web server in order to examine it for items of interest. For example, you could examine the returned HTML for links to other pages or for headlines from a news site.

Solution

We can use the web communication methods set up in Recipe 13.5 and Recipe 13.6 to make the HTTP request and verify the response; then we can get at the HTML via the GetResponseStream method of the HttpWebResponse object:

public static string GetHTMLFromURL(string url)
{
    // reject a null or empty URL before making the request
    if(url == null || url.Length == 0)
        throw new ArgumentException("Invalid URL","url");

    string html = "";
    HttpWebRequest request = GenerateGetOrPostRequest(url,"GET",null);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse( );
    try
    {
        if(VerifyResponse(response)== ResponseCategories.Success)
        {
            // get the response stream.
            Stream responseStream = response.GetResponseStream( );
            // use a stream reader that understands UTF8
            StreamReader reader = new StreamReader(responseStream,Encoding.UTF8);

            try
            {
                html = reader.ReadToEnd( );
            }
            finally
            {
                // close the reader
                reader.Close( );
            }
        }
    }
    finally
    {
        response.Close( );
    }
    return html;
}
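
With the HTML in a string, the Problem's example of looking for links to other pages could be sketched roughly as follows. This is only an illustration: the HtmlLinkScanner class and its FindLinks method are hypothetical names that are not part of the recipe, and a regular expression is a rough tool that does not truly parse HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlLinkScanner
{
    // Hypothetical helper (not part of the recipe): collects the values of
    // href="..." or href='...' attributes found in the HTML text.
    public static List<string> FindLinks(string html)
    {
        List<string> links = new List<string>();
        if (html == null || html.Length == 0)
            return links;

        // match href = "value" or href = 'value', ignoring case
        Regex hrefPattern = new Regex(
            "href\\s*=\\s*[\"'](?<url>[^\"']+)[\"']",
            RegexOptions.IgnoreCase);

        foreach (Match m in hrefPattern.Matches(html))
        {
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }
}
```

A caller might pass the result of GetHTMLFromURL straight to FindLinks and then loop over the returned list.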

Discussion

The GetHTMLFromURL method gets a web page using the GenerateGetOrPostRequest and GetResponse methods and verifies the response using the VerifyResponse method. Once it has a valid response, it reads the HTML that was returned.

The GetResponseStream method of the HttpWebResponse returns the body of the response as a System.IO.Stream object. To read the data, we instantiate a StreamReader with the response stream and the UTF8 property of the Encoding class so that UTF-8-encoded text is read correctly from the stream. We then call ReadToEnd on the StreamReader, which puts all of the content in the string variable called html, and return it. Note that a page sent in a different encoding may not decode correctly with this fixed choice.
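
The recipe always decodes the response as UTF-8. A minimal sketch of choosing the decoder from a character set name instead is shown below; one possible source for that name is the CharacterSet property of the HttpWebResponse. The ResponseText class and its ReadAll method are hypothetical names introduced only for this illustration, and falling back to UTF-8 is an assumption rather than something the recipe prescribes.

```csharp
using System;
using System.IO;
using System.Text;

public static class ResponseText
{
    // Hypothetical helper: reads the whole stream as text, decoding with the
    // named character set and falling back to UTF-8 when the name is missing
    // or not recognized.
    public static string ReadAll(Stream body, string charset)
    {
        Encoding encoding = Encoding.UTF8;
        if (charset != null && charset.Length > 0)
        {
            try
            {
                encoding = Encoding.GetEncoding(charset);
            }
            catch (ArgumentException)
            {
                // unknown character set name: keep the UTF-8 default
            }
        }

        // disposing the reader also closes the underlying stream
        using (StreamReader reader = new StreamReader(body, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside GetHTMLFromURL, such a helper could be called as ResponseText.ReadAll(response.GetResponseStream( ), response.CharacterSet) in place of the hard-coded StreamReader.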

See Also

See the "HttpWebResponse.GetResponseStream Method," "Stream Class," and "StreamReader Class" topics in the MSDN documentation.

Recipe 15.8: Synchronizing the Reading and Writing of a Resource Efficiently

Problem

You have a resource that is shared by multiple threads. You need to provide exclusive access to this resource while a thread is writing to it. However, you do not want the overhead of exclusive access when multiple threads are only reading from it. Any number of threads should be able to read the resource at the same time, but a write must not occur while any thread is reading, and no thread may read while a write is in progress.

Solution

Use the ReaderWriterLock class from the Framework Class Library (FCL). The ReaderWriterLock is optimized for scenarios where data changes infrequently but must be protected, when it is updated, from access by other threads. To illustrate, the GradeBoard class represents a board where an instructor posts the grades students received in a class. Many students can read the grade board, but only the instructor can post a grade (write) to the grade board. Students will not, however, be able to read from the board while the instructor is updating it:

using System;
using System.Threading;

class GradeBoard
{
    // make a static ReaderWriterLock to allow all student threads to check
    // grades and the instructor thread to post grades
    static ReaderWriterLock readerWriter = new ReaderWriterLock( );

    // the grade to be posted
    static char studentsGrade = ' ';

    static void Main( )
    {
        // create students
        Thread[] students = new Thread[5];
        for(int i=0;i<students.Length;i++)
        {
            students[i] = new Thread(new ThreadStart(StudentThreadProc));
            students[i].Name = "Student " + i.ToString( );
            // start the student looking for a grade
            students[i].Start( );
        }

        // make those students "wait" for their grades by pausing the instructor
        Thread.Sleep(5000);

        // create instructor to post grade
        Thread instructor = new Thread(new ThreadStart(InstructorThreadProc));
        instructor.Name = "Instructor";
        // start instructor
        instructor.Start( );

        // wait for instructor to finish
        instructor.Join( );

        // wait for students to get grades
        for(int i=0;i<students.Length;i++)
        {
            students[i].Join( );
        }
    }

    static char ReadGrade( )
    {
        // wait up to ten seconds for the read lock
        // (an ApplicationException is thrown if the wait times out)
        readerWriter.AcquireReaderLock(10000);
        try
        {
            // now we can read safely
            return studentsGrade;
        }
        finally
        {
            // Ensure that the lock is released.
            readerWriter.ReleaseReaderLock( );
        }
    }

    static void PostGrade(char grade)
    {
        // wait up to ten seconds for the write lock
        // (an ApplicationException is thrown if the wait times out)
        readerWriter.AcquireWriterLock(10000);
        try
        {
            // now we can post the grade safely
            studentsGrade = grade;
            Console.WriteLine("Posting Grade...");
        }
        finally
        {
            // Ensure that the lock is released.
            readerWriter.ReleaseWriterLock( );
        }
    }

    static void StudentThreadProc( )
    {
        bool isGradeFound = false;
        char grade = ' ';
        while(!isGradeFound)
        {
            grade = ReadGrade( );
            if(grade != ' ')
            {
                isGradeFound = true;
                Console.WriteLine("Student Found Grade...");
            }
            else // check back later
                Thread.Sleep(1000);
        }
    }

    static void InstructorThreadProc( )
    {
        // everyone likes an easy grader :)
        PostGrade('A');
    }
}

Discussion

In the example, the ReaderWriterLock protects access to the grade resource of the GradeBoard class. Any number of students can read their grades concurrently using the ReadGrade method. When the instructor calls the PostGrade method, the write lock is granted only once no reader holds the lock, and while it is held no one but the instructor can access the grade. The instructor updates the grade and releases the lock, and any pending student read requests are allowed to resume. Each student repeatedly reads the grade board, checks whether a grade has been posted, and waits before making another request. Once the grade is posted, each student finds it, and the thread for that student terminates.
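
ReadGrade and PostGrade both pass a ten-second timeout, and AcquireReaderLock and AcquireWriterLock throw an ApplicationException if that time expires before the lock is granted. The sketch below shows one way a reader might handle the timeout rather than let the exception escape. The TimedGradeBoard class and its TryReadGrade and PostGradeSlowly methods are hypothetical names for this illustration only; PostGradeSlowly holds the write lock for a while purely so that the effect on readers can be observed.

```csharp
using System;
using System.Threading;

public class TimedGradeBoard
{
    private readonly ReaderWriterLock readerWriter = new ReaderWriterLock();
    private char studentsGrade = ' ';

    // Hypothetical variant of ReadGrade: returns false instead of throwing
    // when the read lock is not granted within timeoutMs milliseconds.
    public bool TryReadGrade(int timeoutMs, out char grade)
    {
        grade = ' ';
        try
        {
            readerWriter.AcquireReaderLock(timeoutMs);
        }
        catch (ApplicationException)
        {
            // the timeout expired before the lock was granted
            return false;
        }

        try
        {
            grade = studentsGrade;
            return true;
        }
        finally
        {
            // Ensure that the lock is released.
            readerWriter.ReleaseReaderLock();
        }
    }

    // Posts the grade, then keeps holding the write lock for holdMs
    // milliseconds so that waiting readers can be observed.
    public void PostGradeSlowly(char grade, int holdMs)
    {
        readerWriter.AcquireWriterLock(Timeout.Infinite);
        try
        {
            studentsGrade = grade;
            Thread.Sleep(holdMs);
        }
        finally
        {
            readerWriter.ReleaseWriterLock();
        }
    }
}
```

A student thread could call TryReadGrade in its polling loop and simply try again later when it returns false.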

The Main method calls Join on the instructor and student threads to wait until those threads finish before it returns. If it did not do this, Main could return while the threads were still running. Join can throw a ThreadInterruptedException if the waiting thread is interrupted; the code shown here does not catch it. The threads are named using the Name property to ease debugging.
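
The Main method shown in the Solution calls Join without a try/catch block. A minimal sketch of guarding those calls against a ThreadInterruptedException follows. The ThreadWaiter class and its JoinAll method are hypothetical names, and returning false on interruption is just one possible policy.

```csharp
using System;
using System.Threading;

public static class ThreadWaiter
{
    // Hypothetical helper: waits for every (already started) thread in the
    // array to finish. Returns false if the waiting thread was interrupted
    // during one of the Join calls.
    public static bool JoinAll(Thread[] threads)
    {
        try
        {
            foreach (Thread t in threads)
            {
                t.Join();
            }
            return true;
        }
        catch (ThreadInterruptedException)
        {
            // Thread.Interrupt was called on the thread doing the waiting
            return false;
        }
    }
}
```

In the GradeBoard example, Main could call ThreadWaiter.JoinAll(students) in place of the final for loop.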

See Also

See the "ReaderWriterLock Class" and "Thread Class" topics in the MSDN documentation.

Copyright © 2009 O'Reilly Media, Inc.