Databases present unique challenges at the intersection of information access, data extraction, compilations, and copyright law, particularly magnified by the emergence of big data and advancements in artificial intelligence.
The 1976 Copyright Act lacks specific definitions for "data" or "databases," leading to ambiguity in their legal treatment.
Compilations: Defined in Section 101 of the Copyright Act as works formed by the collection and assembling of pre-existing materials or data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship. Originality necessitates creativity in the selection, coordination, or arrangement of data.
Feist Publications, Inc. v. Rural Telephone Service Co. (1991): This landmark case established that the "sweat of the brow" doctrine does not confer copyright protection; originality requires creativity and not merely effort or labor.
The scope and nature of choices made in the arrangement of elements are critical factors in determining a database's originality and, consequently, its copyright protection.
Although the originality rule seems straightforward, its application is often ambiguous and contentious in practice.
Scenario: A client commissions a painting, titled "The Next Rembrandt," created using artificial intelligence (AI) by analyzing and replicating elements from Rembrandt's existing works. Assume all original Rembrandt paintings are still protected by copyright.
Question 1: Does the AI-generated painting qualify as an original work of art?
If the AI-generated work does not copy protected elements from the original Rembrandt paintings and exhibits a modicum of creativity in its composition, it may be deemed an original expression.
A thorough investigation is essential to ascertain that the AI-generated painting is not substantially similar to any specific copyrighted Rembrandt works.
Question 2: Does the AI-generated painting meet the criteria of a compilation, considering it was created using an AI program trained on a database of all identified Rembrandt paintings?
If the AI-generated work involves arranging or combining elements from authorized Rembrandt paintings, it could be classified as a compilation.
Copyright protection for such a work would be thin: it would reach only unauthorized reproduction of the work as a whole, that is, its selection and arrangement of pre-existing works, and not the pre-existing works themselves.
Factual data within databases, if consisting purely of facts and not pre-existing copyrighted works, generally does not qualify for copyright protection.
Feist underscores a tension between the unprotected status of facts within a database and the potential copyright protection afforded to the compilation itself.
Addressing and resolving this distinction presents ongoing challenges in copyright law.
Section 101: Focuses on the protectability of compilations, emphasizing that the selection, arrangement, and coordination of data must qualify as original to warrant copyright protection.
Section 103: Clarifies that an author of a compilation can only protect their original contributions to the database.
This framework results in a "thin copyright," wherein the use of unprotected data cannot be legally restricted.
Compilations typically incorporate pre-existing materials; incorporating copyright-protected works into a database does not grant additional rights over those pre-existing materials.
Many AI-generated databases rely on others' works for machine learning purposes, potentially leading to copyright infringement.
Further consideration of the database created to train the AI system for producing "The Next Rembrandt."
Question: Does the database itself possess originality?
If the database creation involved no choices in terms of content inclusion, because the intention was simply to create a painting in the style of Rembrandt, the resulting database may not be original.
Taxonomies are significant; absent choices in what should be included, the database may not be eligible for even minimal copyright protection.
Databases play a crucial role in big data applications, such as those used for targeted advertising. The copyrightable nature of these compilations becomes particularly complex when they consist primarily of factual data.
Section 101's definition of compilations raises intricate questions about originality, especially when the compilation comprises facts.
Experian's Case: Experian compiled personal data (e.g., name, address, purchasing history) for use in marketing campaigns.
Value as a Criterion: The Court clarified that value is not the determinant for copyright protection; originality is paramount.
Feist Standard: The standard emphasizes creativity in the selection, arrangement, and presentation of data.
Data Compilation: Experian compiled data from multiple reliable sources, verified those sources, and checked for inconsistencies.
The court found Experian's compiled ConsumerView Database (CVD) protectable.
Data Removal: Experian removed data for individuals deemed not valuable for marketing (e.g., those in prison or the very elderly).
The court emphasized the need to select actual data and not accept all data presented.
The number of sources used and the determination of which data to include for specific purposes were critical.
The court cited the directory of businesses of interest to the Chinese-American community in Key Publications as a favorable example.
Experian made choices in data selection rather than using all data from a single source.
Rejection of "Sweat of the Brow": The "sweat of the brow" doctrine—industrious collection—was rejected as a basis for originality in Feist.
Culling Process: The question arises whether Experian's culling process simply represents another version of the rejected "sweat of the brow" doctrine.
Creative Choices: It is questionable whether Experian's choices regarding inclusion in a consumer marketing database constitute true creativity, particularly considering the low level of choice involved (removing erroneous names, inmates, and the elderly).
Treatment of Culling: A critical question is whether the treatment of the culling process in Experian aligns with Maclean Hunter.
Threshold of Originality: The fundamental question is whether any level of choice can elevate a database to the threshold of originality.
Question 1: If the database used to create the painting were composed of all known Rembrandt paintings, would it qualify for copyright protection under Experian's reasoning?
The key consideration is whether there is sufficient selectivity to demonstrate creativity.
Question 2: If the painting itself is considered a compilation, would it meet the originality standard for compilations?
Database Creation: The creation of the database might qualify for originality if choices were made in determining which works are authentic Rembrandts. This includes whether to include or exclude works attributed to, but not confirmed as, authentic.
The portrait should be protectable as a compilation because choices were made to create a work not substantially similar to pre-existing works.
Since it does not resemble any single painting, the AI likely drew elements from various paintings in the database.
Whether it ultimately qualifies as copyright protectable due to its AI authorship is a separate question.
Creating a new AI-generated Fred Flintstone cartoon raises serious issues because the cartoons forming the database are copyright protected.
Using copyrighted works without permission to create a database for an AI program can violate the copyright holder's rights.
If the database violates copyright in the pre-existing works, the new Fred Flintstone cartoon also violates those rights.
Taxonomies or classification schemes aren't automatically excluded from eligibility as a system or process under section 102(b).
Explore when taxonomies are protectable and the impact of that protection on the ability to use the components used to create the taxonomy.
If a compilation is created through the exercise of opinion or judgment in selecting items, it's likely original.
If different people would create the same database, it shouldn't be protectable.
Why isn't creating a telephone directory of all residents in a given location an original taxonomy?
In American Dental Association, the court recognized that classification systems can be creative and subject to protection.
The creativity of the taxonomy marks the expression even after the fundamental scheme has been devised.
Even short descriptions and classification numbers qualify as original works of authorship.
In ATC Distribution Group, Inc. v. Whatever It Takes Transmissions & Parts, Inc. (Whatever It Takes), the court addressed protection for a parts numbering system.
The court cited American Dental's language but held that original and creative ideas are not copyrightable.
How can an idea be original and creative yet not copyrightable?
Can this rationale be reconciled with Feist and Maclean Hunter?
To support its determination, the court in Whatever It Takes emphasized the randomness of the numbering system.
Consider the database created for the AI-generated work.
Is there anything original in the choice of what to include?
Are you limited by the fact that if you choose to do a work about Rembrandt, you necessarily will include all works identified as Rembrandt's?
Concluding thoughts about compilations, collective works, and the exclusion of ideas, facts, processes, and systems under Section 102.
A collective work includes periodical issues, anthologies, and similar works in which a number of contributions are assembled into a collective whole.
Collective works are defined as compilations under Section 101, so the same tests apply for originality.
Both are protected only to the extent that their selection, arrangement, or coordination is creative.
Facts are not protectable, but copyrighted works included in a compilation retain their own individual protection.
Just because a work is part of a database doesn't make it freely available for use.
Owners of works lawfully included in a collective work retain their own rights in those works.
The editor of a collective work can only claim originality in selection, arrangement, and presentation.
Maclean Hunter is surprising due to the conclusion that the value of a car is not a fact but a constructed value.
Rejection of copyright protection for the Bikram yoga sequence in Bikram's Yoga College of India, L.P. v. Evolation Yoga, LLC.
The court found Bikram Yoga to be an unprotectable healing art system.
The rejection of copyright for the sequence of exercises, whose order had been set by the creator, was surprising.
Aesthetic choices should have separated this yoga regime from others, but the court did not agree.
The case shows how slippery the determination that something is a process can be, even though illustrations of that process may remain protectable.
Statements insisting on the medical benefits of Bikram Yoga made it difficult to argue for copyright protection.
The book stated that Bikram's 26 exercises move fresh blood to 100% of the body, restoring systems to a healthy order.
The sequence was designed to scientifically warm and stretch muscles, ligaments, and tendons in a specific order.
The existence of numerous other yoga sequences did not make the selection of the Bikram sequence protectable.
The court rejected any argument that the creator's intent to incorporate aesthetic elements should have impacted its decision.
The beauty of a process does not permit one who describes it to gain copyright to exclude others from practicing it.
The sequence is considered unprotectable as a process, primarily reflecting function rather than expression.
The Copyright Office issued a rule in 2012 stating that it would not register a system or process for exercise routines resulting in health improvements.
However, copyright protection would be available for photographs, depictions, and other illustrations of these routines.
Feist recognized that compilations could be protected even if composed of unprotected materials (facts and ideas), as long as they showed creativity in selection, arrangement, and coordination.
There has been a push to apply compilation analysis to a wider variety of works.
Such analysis can provide a basis for seeking copyright protection for works composed of wholly unprotected elements.
However, applying this to non-fact-based works could reduce the potential scope of protection.
No matter how much work goes into a compilation, it still receives only a thin copyright.
When determining protectable expression of a photo, courts sometimes use an analysis that comes close to that used for compilations.
The question the court was trying to decide was how much similarity the two photos share.
To determine copyright protection under a compilation-style analysis, start with the idea-expression distinction: what idea was the photographer trying to convey?
According to the court, there were three possible ideas: (1) a businessman contemplating suicide by jumping from a building; (2) a businessman contemplating suicide by jumping from a building, seen from the vantage point of the businessman with his shoes set against the street far below; or (3) something even more general, such as a sense of desperation produced by urban professional life.
The idea that is being expressed will have a strong impact on the extent to which the similarities between the two photos are seen to result from the common idea (one that would not be copyright protectable).
This is an example of the difficulty of the idea-expression compilation analysis.