Main Stories
Slash Boxes

Slash Open Source Project

Slashcode Log In

Log In

[ Create a new account ]

Article Poll

Poll I found this article to be
Very Helpful
Not Helpful
Not Very Helpful
[ Results | Polls ]
Comments:0 | Votes:0

Efficient implementation of nested comments

posted by Krow on 03:31 PM November 9th, 2000   Printer-friendly   Email story
Laubzega writes "Recently I've been working on implementation of nested comments for my website (, but it's polish-only). While this site gets maybe 1% of what Slashdot has to withstand in terms of hits, I wanted to create an efficient, scalable solution. And that's where the fun started.

I looked into SlashCode and while I don't grok Perl at all, I figured that it's querying all article's comments at once and then does the rest of processing in Perl (which I was unfortunately unable to follow).

For my site I created something completely different: I send an SQL query to the DB for every nesting level. This means about 5-8 queries for a typical page of my site (the peak was on a page with around 170 comments, where it needed 10 queries). Yes, the DB has more to do, but the results are such that I only have to layout them recursively, without any sorting or searching whatsoever.

My questions: has anyone done any research into efficient database layouts and clever SQL-queries (wrt to nested comments) that would extract as much as possible without forcing to much work on PHP/Perl/Java side? Also, what is the better solution - extract all comments in one go and then process them or rather have multiple queries returning better formed data sets (assuming fast database like MySQL)? Thanks."

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login
Loading... please wait.
  • I can see you're fairly proud of your methodology, but it really is better to take the time to form a SQL statement that does everything already. In my development of thatware [], that's what I've done and with one statement using WHERE and ORDER clauses, I call all the comments from the database for a story already formatted with the proper nesting. I just have one simple if statement to decide if it needs to go into a separate loop that nests the posts below each other for proper visual threading.
  • It would take a slightly more complex set of operations to calculate the "next" comment id before inserting a comment, but that's not as often an occurence as reading the comment table.

    Perhaps a comment could have two IDs? Arguably, this would make the tables not normalized, but it would certainly make the system more efficient.

  • I've been thinking of doing something like that (it's probably a common solution I think), but you need to pad the cids, if not you'll get things in the wrong order.



    So at the moment I'll be converting it to hex and padding stuff so that it'll be 8 bytes per cid (assuming a max of 2 billion or so comments overall). So it'll be 9 bytes per depth/level since I'm going to use another byte for the del
  • No, this isn't exactly what I did. Assume you have following comment list:

    level1 level2 level3

    Now, b
  • I'm glad you asked this question because it raises issues that we should all think about regarding Slash 1.09 and Bender. From Slash 0.9 to 1.09, a relatively large percentage of data model changes were made to work around performance problems that only affect sites that:
    1. have very high traffic, like Slashdot
    2. run on mySQL
    The data model changes I am referring to are things like:
    • the division of the user table into several separate tables
    • the creation of the new stories table, which serves as a cache for data which is already stored in the stories table
    The two changes I cite as examples both came in with Slash 0.9 because they were not present in Version 0.3 which my company uses for [].

    These changes in particular work around performance issues with versions of mySQL that may not persist forever and do not necessarily affect the target databases contemplated for the Bender release.

    My point is, you should not take the data manipulation code in Slash and analyze it, thinking that it is laid out for maximum efficiency in your application.

    I looked at [] in order to see how it works. Although I don't speak Polish at all, I will try to give some advice, which may or may not be worth two zlotys.

    If you are using mySQL, my experience is that the database handles simple SQL queries at relatively high net data rates. This leads me to believe that your approach of issuing several queries to compose the comment section of the page may be a good one.

    Databases with more sophisticated query optimizers (like Oracle, Sybase, DB/2, etc.) would probably benefit from doing a smaller number of SQL queries with more sophiticated SQL expressions. This is because the vendors have spent the money to make their query optimizers support the kind of queries that are required for applications like data warehousing.

    Enterprise SQL databases like these do not compare well to mySQL on response time for establishing and tearing down database connections, and responding to simple queries. The reason is simply that one system cannot be optimized to do everything well.

    In general, however, I think it is safe to say that the SQL code in many places in Slash could be optimized. But, the changes that would be indicated would depend on exactly what your configuration was and how your users use the site.

    Dave Aiello
    Chatham Township Data Corporation



    Dave Aiello
    Chatham Township Data Corporation []

  • by Anonymous Coward
    I'm currently writing the comment system for []. (The code will be posted eventually, but I don't feel that it's ready yet.)

    I've written the comment system in a way similar to yours. That is, an SQL query is done for each nesting level and for each thread. Here's a snippet of code [] (It's PHP). I wish that Slash would allow <PRE>.

    I'm pretty sure that this isn't the most efficient way of doing it, and would really appreciate some pointers. Would it be a good idea to create a separate comments table for each article? Currently, all of the comments are thrown into the same table with an article_id colum.

  • I've had an idea that would greatly simplify the task of ordering threads in an application like this. What if you assigned "comment ids" that had the parent thread id in them.

    For example, the first post (First Post!!) could be comment id 1. The first reply to it is 1.1 . The first reply to that is 1.1.1. The next reply at that level is 1.1.2 , and so on.

    It would take a slightly more complex set of operations to calculate the "next" comment id before inserting a comment, but that's not as often an occurence as reading the comment table.

    If you implemented something like this, all you could have to do is

    SELECT * from comments ORDER BY comment_id

    and the threads could come out in the correct order. It would be simple to determine the thread level by simply counting the number of decimal points in the comment_id.

    Anyone have any comments as to the practicality of this?
  • You can use PRE in Slash. However, since PRE allows users to totally screw up formatting, we have it turned off. You can turn it on my adding the PRE tag to in the right place.

    Note that we have added the "Code" post mode. It'd be nice to be able to embed Code snippets, too. One thing I like about the Infopop Ultimate Bulletin Board [] is the UBB Code []. We could have another post mode that allows you to do something pseudo-code like this, that Slash would then format into HTML.
  • by Anonymous Coward
    Can you give some details about how you did this, perhaps an example of the sql-statement? I tried to find that info on your site, but perhaps I am to dumb.
  • Hey, that's a very clever and simple idea. I'll implement it asap and see if it delivers. Thanks.