Overview

Collaborative filtering for Better Carbon Calculators

This in-class exercise was given towards the end of an introductory course on Artificial Intelligence, as part of a regular computer science and engineering program's assessment for ABET accreditation, on the criterion concerned with lifelong learning. One week before the in-class exercise, students were told that they would have an open-book, open-note exercise on the relationships between:

(a) a paper, Ross, J., Shantharam, N., and Tomlinson, B. (2010) "Collaborative Filtering and Carbon Footprint Calculation". In: Proceedings of the 2010 International Symposium on Sustainable Systems and Technology (ISSST). Washington D.C.: IEEE (retrieved from http://www.ics.uci.edu/~jwross/pubs/RossShantharamTomlinson-BetterCarbon-ISSST2010.pdf) [1]

(b) a previously undiscussed section on clustering from the course textbook (Poole, D. & Mackworth, A., Artificial Intelligence: Foundations of Computational Agents. Cambridge University Press). (The exercise could be easily adapted to other treatments of clustering, and even adapted to other forms of unsupervised learning).

No further detail was given on what the exercise would cover, other than the relationships between these two readings. In addition, students were given a hardcopy of the Ross et al. paper and told that they could use it during the exercise, and that they would turn in their marked-up hardcopy with their answers to the exercise, along with any notes that they had prepared before walking into class. They were told that the quality of their notes and markup could not hurt their exercise score, and might in fact help it.

This exercise is open notes, open book, and open internet. Also, refer to your (marked up) copy of “Collaborative Filtering and Carbon Footprint Calculation” by Ross, Shantharam, and Tomlinson.

1. Consider Section 4 of the Ross et al. paper (pp. 2-4), which gives an overview of the “Better Carbon Calculator”. Consider Section 11.1 on Clustering from the Poole and Mackworth textbook.

(a) Write a short paragraph that Poole and Mackworth could add to Example 11.1 as a third instance of clustering, in this case relating to better carbon calculation. It’s OK to cut, paste, and adapt text from the Ross et al. paper; just quote any text from that paper that you use verbatim (significant paraphrasing requires no further acknowledgement in this context, since your use of the ideas in the Ross et al. paper is taken as a given in this exercise).

Instructor comments: A fully correct answer would be ‘A better carbon calculator groups users into clusters based on similar responses to geographical and carbon-lifestyle questions, to better predict unknown choices by subsequent users’ (or, expanding this latter point: a user who does not answer all questions can be placed in the best-fitting cluster based on known responses, with cluster average values predicted as the user’s unknown responses).

This question is intended to assess student ability to understand and express the gist of the relationship between an application (carbon calculator) and a process abstraction (clustering) that can implement the application. An ability to recognize such associations is critical for lifelong learning and professional development.

Ideally, an answer made unambiguous reference to (i) known variables that represented the basis of ascertaining cluster membership, and (ii) cluster membership as a basis of predicting values for unobserved variables (unknown user responses).
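The two elements of that ideal answer can be illustrated with a minimal sketch. It assumes clusters have already been formed and are summarized by their average responses; the variable names (`miles_driven`, `household_size`, `electricity_kwh`), the numbers, and the use of Euclidean distance are all hypothetical choices made for illustration and are not taken from the Ross et al. paper.

```python
import math

def nearest_cluster(known, centroids):
    """Index of the centroid closest to the user, using answered questions only."""
    def distance(centroid):
        return math.sqrt(sum((known[q] - centroid[q]) ** 2 for q in known))
    return min(range(len(centroids)), key=lambda i: distance(centroids[i]))

def fill_unknowns(known, centroids):
    """Keep the user's own answers; fill the rest with the cluster averages."""
    best = centroids[nearest_cluster(known, centroids)]
    return {q: known.get(q, best[q]) for q in best}

# Hypothetical cluster averages (illustrative values only).
centroids = [
    {"miles_driven": 12000.0, "household_size": 4.0, "electricity_kwh": 900.0},
    {"miles_driven": 3000.0, "household_size": 1.0, "electricity_kwh": 300.0},
]

# A user who answered two of the three questions.
user = {"miles_driven": 2500.0, "household_size": 2.0}
profile = fill_unknowns(user, centroids)
```

Here the known variables (i) determine cluster membership, and the cluster average (ii) supplies the prediction for the unanswered `electricity_kwh` question.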

(b) Figure 2 of Ross et al shows 10 of the variables that are used in user profiles by the Better Carbon Calculator. Note that two of these variables are NOT continuous (i.e., Vehicle fuel type, Vehicle size). However, assume for the moment that all variables used by the Better Carbon Calculator are continuous. Briefly describe what (else) you would have to change about the Better Carbon Calculator (as described), in order to use the K-means algorithm AS IS (i.e., with no changes from the algorithm described in the text) to organize user profiles. You may want to answer (c) and (d) before finishing your answer to this.

Instructor comments: (b) and (d) ask students to consider more deeply the association between the application (Better Carbon Calculator) and the process abstraction (clustering), each from a different angle. For purposes of the ABET rubric, (b) and (d) are assessed together. A fully correct answer would list at least three points beyond recognizing that (i) the Better Carbon Calculator (BCC) would be restricted to continuous variables, and (ii) different similarity measures are used. Additional points can be drawn from: (iii) the BCC variables would have to be scaled similarly; (iv) the BCC would have to be limited to a fixed and known number of clusters, and/or k-means extended to handle an indeterminate number of clusters; (v) the BCC allows data to arrive continually, and this would have to be restricted or somehow managed if K-means as described (which assumes the data is available en masse) were to be used.

Other points are relevant too (e.g., the distinction between hard and soft clustering).

The additional points (beyond the different similarity functions) follow a discounting function.
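Points such as similar scaling, a fixed number of clusters, and data available en masse can be made concrete with a small sketch of k-means on standardized, continuous profiles. This is an illustrative implementation under stated assumptions, not the textbook's code: the seeding with the first k rows, the z-score scaling, and the example data are all hypothetical.

```python
import math

def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by 0.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def k_means(rows, k, iterations=100):
    """Plain k-means with squared Euclidean distance; k is fixed in advance."""
    centroids = [list(r) for r in rows[:k]]  # naive seeding (an assumption)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(row, centroids[j])))
            for row in rows
        ]
        if new_assignment == assignment:  # no profile changed cluster: converged
            break
        assignment = new_assignment
        for j in range(k):
            members = [row for row, a in zip(rows, assignment) if a == j]
            if members:  # leave the centroid of an empty cluster where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical profiles: [annual miles driven, household size].
profiles = [[12000, 4], [3000, 1], [11500, 5], [2500, 2]]
assignment, centroids = k_means(standardize(profiles), k=2)
```

Without the `standardize` step, the miles column would dominate the distance simply because its numbers are larger, which is the concern behind the scaling point.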

(c) How is “similarity” assessed by the Better Carbon Calculator? (You just need to give the name of the similarity function, but if you looked it up on the Web, for example, and want to give a more detailed definition, then do so and cite the source.)

Instructor comments: At 0.5 pt each, (a) and (d) assess an ability/motivation to look up facts, and are supporting questions for (b) and (e).

Cosine similarity (0.5 pt)
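For reference, the standard definition of cosine similarity between two profile vectors is their dot product divided by the product of their lengths. The sketch below is a generic implementation of that definition; returning 0.0 for an all-zero vector is a convention assumed here, and nothing in it is taken from the paper's implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (same length assumed)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: similarity with a zero vector is 0
    return dot / (norm_u * norm_v)
```

Because only the angle matters, two profiles whose answers differ by a constant factor (for example `[1, 2, 3]` and `[2, 4, 6]`) are treated as maximally similar, unlike under Euclidean distance.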

(d) This question is ‘the inverse’ of (b). How would you have to change/adapt the k-means algorithm described in the text to organize the user profiles of the Better Carbon Calculator, with no changes (as best you can tell) to the Better Carbon Calculator as it is described in the paper? I would be happy if you simply stated some issues that would have to be addressed in doing this adaptation. But if you have thoughts on how the issues would be resolved, then by all means elaborate on them.

Instructor comments: See comments for (b)
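As one illustration of the kind of adaptation a student might propose (not the instructor's expected answer, and not the paper's method), the sketch below assigns each arriving profile to the centroid with the highest cosine similarity and then nudges that centroid towards the profile with a running-mean update, so that data need not be available en masse. The class name `OnlineClusters`, the seed centroids, and the example profiles are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

class OnlineClusters:
    """Sequential clustering: profiles are processed one at a time as they arrive."""

    def __init__(self, seeds):
        self.centroids = [list(s) for s in seeds]
        self.counts = [1] * len(seeds)  # each seed counts as one observation

    def add(self, profile):
        # Assign by highest cosine similarity instead of smallest Euclidean distance.
        j = max(range(len(self.centroids)),
                key=lambda i: cosine(profile, self.centroids[i]))
        # Running-mean update of the winning centroid.
        self.counts[j] += 1
        n = self.counts[j]
        self.centroids[j] = [c + (x - c) / n
                             for c, x in zip(self.centroids[j], profile)]
        return j

clusters = OnlineClusters(seeds=[[1.0, 0.0], [0.0, 1.0]])
first = clusters.add([2.0, 0.2])   # points mostly along the first axis
second = clusters.add([0.1, 3.0])  # points mostly along the second axis
```

This sketch still fixes the number of clusters in advance and assumes continuous variables, so the remaining issues raised in the comments for (b) would need separate treatment.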

(e) In a short paper like Ross et al., it is often difficult to elaborate on everything that a reader would require for full understanding. List terms, concepts, findings, etc. that were mentioned or addressed in the paper for which you feel you would have to look elsewhere (e.g., to the authors themselves) for a fuller understanding or to otherwise get the answers. Your exercise grade will not be affected by your answer to the following: Did you actually pursue any such clarifications already? Tell me about it if so.

Instructor comments: (e) and (f) are intended to assess the ability of students to critically assess material that they read, and to highlight that uncertainty in their understanding of this material is an invitation to follow up. (e) and (f) are counted together, with credit given for each point raised in (e) and/or (f) that seemed a legitimate issue to the instructor (using a discounted scale of 1.5, 1.0, 0.5, 0.5). These points could be drawn from the following:

(i) inadequate description of experimental design, (ii) variance measures not reported for experimental results, (iii) implementation details of the BCC (it is the lack of great detail which makes the paper ideal in some ways for this assessment, allowing students to imagine a possible implementation), but other points are possible.

In cases where parts (e)+(f) fell short of the 3.5 pts possible under the scale above, I examined the marked-up hardcopy to see if there was commentary there that would justify additional points. Up to 0.5 point is given if the student indicates that they went beyond the source documents given one week in advance of the exercise. This is indicative of “recognition” of the need for lifelong learning.

(f) Are there any issues that give you pause in believing that the Better Carbon Calculator would be advantageous relative to other calculators, or at all? An issue may be something you identified in (e) as something that you needed more clarification about, or it could be something that the paper itself identified, or something else.

Instructor comments: See comments for (e)

Sources

[1] Ross, J., Shantharam, N., and Tomlinson, B. (2010) "Collaborative Filtering and Carbon Footprint Calculation". In: Proceedings of the 2010 International Symposium on Sustainable Systems and Technology (ISSST). Washington D.C.: IEEE (retrieved from http://www.ics.uci.edu/~jwross/pubs/RossShantharamTomlinson-BetterCarbon-ISSST2010.pdf).
