Analyzing Performance: A Comparative Study of Two Approaches for Processing Large Datasets
Adam C. |

In the realm of data processing, the efficiency of code is paramount, especially when dealing with large datasets. This blog post aims to compare the performance of two code snippets designed to count and organize data from a sizable dataset. The snippets, albeit achieving the same result, implement different strategies to handle the data. We will dissect each approach, evaluating their strengths and weaknesses.

Photo by Scott Graham on Unsplash

Snippet 1: Functional Approach

The first snippet employs a functional programming paradigm, utilizing methods like flatMap and reduce to succinctly process the data. The primary steps involve flattening nested structures, aggregating quantities, and sorting the results. The final output is a well-organized array of publications with corresponding quantities.

if (data && data.publicationSentInquiries) {
  const midRows = data.publicationSentInquiries;
  rows = midRows.flatMap((item) =>
    item.inquiryPublications.map((pub) => ({
      inquiryId: item.id,
      publicationName: pub.name,
      publicationQty: parseInt(pub.qty, 10),
    }))
  );

  const publications = rows.reduce((acc, item) => {
    const key = item.publicationName;
    acc[key] = (acc[key] || 0) + item.publicationQty;
    return acc;
  }, {});

  sortedPublications = Object.entries(publications)
    .map(([publicationName, publicationQty]) => ({
      publicationName,
      publicationQty,
    }))
    .sort((a, b) => b.publicationQty - a.publicationQty);

  const totalSent = sortedPublications.reduce(
    (total, pub) => total + pub.publicationQty,
    0
  );

  sortedPublications.push({
    publicationName: "Total Sent",
    publicationQty: totalSent,
  });

  // Now, sortedPublications contains the desired result
}

Snippet 2: Iterative Approach

Contrastingly, the second snippet follows an iterative approach, utilizing forEach loops and conventional data manipulation techniques. It builds an array (rows) by iterating through the input data, incrementally accumulating publication quantities, and eventually sorting the results using an external library (orderBy).

let rows = [];
if (data && data.publicationSentInquiries) {
  const midRows = data.publicationSentInquiries;
  midRows.forEach((item) => {
    const publications = item.inquiryPublications;
    const pubRows = publications.map((pub) => ({
      inquiryId: item.id,
      publicationName: pub.name,
      publicationQty: pub.qty,
    }));
    rows = rows.concat(pubRows);
  });
}

const publicationRows = [];
const publications = {};
rows.forEach((item) => {
  if (publications[item.publicationName]) {
    const currentQty = publications[item.publicationName];
    publications[item.publicationName] =
      currentQty + parseInt(item.publicationQty, 10);
  } else {
    publications[item.publicationName] = parseInt(item.publicationQty, 10);
  }
});

const keys = Object.keys(publications);

let totalSent = 0;
keys.forEach((key) => {
  totalSent += publications[key];
  publicationRows.push({
    publicationName: key,
    publicationQty: publications[key],
  });
});

const sortedPublications = orderBy(
  publicationRows,
  ["publicationQty"],
  ["desc"]
);

sortedPublications.push({
  publicationName: "Total Sent",
  publicationQty: totalSent,
});

Performance Analysis:

Code Readability and Maintainability:

  • Snippet 1 leverages functional programming, resulting in concise and expressive code. The chaining of methods enhances readability and reduces the likelihood of errors.
  • Snippet 2, while functional, relies on traditional iteration methods, potentially making the code less intuitive for developers unfamiliar with such practices.

Execution Time:

  • Snippet 1 leverages built-in array methods, which are generally optimized for performance. The use of functional paradigms can lead to better execution times, especially when processing large datasets.
  • Snippet 2, though functional, may exhibit slightly higher execution times due to the manual iteration and reliance on an external library for sorting.

Memory Usage:

  • Snippet 1 employs methods like flatMap and reduce to manipulate data without creating intermediary arrays, potentially reducing memory overhead.
  • Snippet 2 builds arrays incrementally, possibly leading to higher memory consumption, especially when dealing with substantial datasets.

Flexibility:

  • Snippet 1 is modular and can be easily adapted to different data structures or additional processing steps.
  • Snippet 2, though effective, may require more effort to modify or extend due to its procedural nature.

Conclusion:

While both code snippets achieve the desired outcome, the choice between them depends on specific requirements. Snippet 1, with its functional approach, excels in readability and potential performance gains. Snippet 2, though slightly more traditional, might be preferable in situations where familiarity with conventional iteration methods is crucial.

Developers should carefully consider factors such as code readability, execution time, memory usage, and adaptability when choosing an approach. Ultimately, the decision should align with the project's objectives and the team's coding preferences.