
A Comparison of NoSQL and Relational Database Management Systems (RDBMS)

  • December 2020

Musa Garba at Kaduna State University



A relational model of data for large shared data banks


Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at ...

Published in Communications of the ACM

IBM Scientific Center, Houston, TX

Association for Computing Machinery

New York, NY, United States

Author tags

  • composition
  • consistency
  • data integrity
  • data organization
  • data structure
  • derivability
  • hierarchies of data
  • networks of data
  • predicate calculus
  • retrieval language


Two-Bit History

Computing through the ages


Important Papers: Codd and the Relational Model

29 Dec 2017

It’s hard to believe today, but the relational database was once the cool new kid on the block. In 2017, the relational model competes with all sorts of cutting-edge NoSQL technologies that make relational database systems seem old-fashioned and boring. Yet, 50 years ago, none of the dominant database systems were relational. Nobody had thought to structure their data that way. When the relational model did come along, it was a radical new idea that revolutionized the database world and spawned a multi-billion dollar industry.

The relational model was introduced in 1970. Edgar F. Codd, a researcher at IBM, published a paper called “A Relational Model of Data for Large Shared Data Banks.” The paper was a rewrite of a paper he had circulated internally at IBM a year earlier. The paper is unassuming; Codd does not announce in his abstract that he has discovered a brilliant new approach to storing data. He only claims to have employed a novel tool (the mathematical notion of a “relation”) to address some of the inadequacies of the prevailing database models.

In 1970, there were two schools of thought about how to structure a database: the hierarchical model and the network model. The hierarchical model was used by IBM’s Information Management System (IMS), the dominant database system at the time. The network model had been specified by a standards committee called CODASYL (which also, random tidbit, specified COBOL) and implemented by several other database system vendors. The two models were not really that different; both could be called “navigational” models. They persisted tree or graph data structures to disk using pointers to preserve the links between the data. Retrieving a record stored toward the bottom of the tree would involve first navigating through all of its ancestor records. These databases were fast (IMS is still used by many financial institutions partly for this reason) but inflexible. Woe unto those database administrators who suddenly found themselves needing to query records from the bottom of the tree without having an obvious place to start at the top.

Codd saw this inflexibility as a symptom of a larger problem. Programs using a hierarchical or network database had to know about how the stored data was structured. Programs had to know this because they were responsible for navigating down this structure to find the information they needed. This was so true that when Charles Bachman, a major pioneer of the network model, received a Turing Award for his work in 1973, he gave a speech titled “The Programmer as Navigator.” Of course, if programs were saddled with this responsibility, then they would immediately break if the structure of the database ever changed. In the introduction to his 1970 paper, Codd motivates the search for a better model by arguing that we need “data independence,” which he defines as “the independence of application programs and terminal activities from growth in data types and changes in data representation.” The relational model, he argues, “appears to be superior in several respects to the graph or network model presently in vogue,” partly because, among other benefits, the relational model “provides a means of describing data with its natural structure only.” By this he meant that programs could safely ignore any artificial structures (like trees) imposed upon the data for storage and retrieval purposes only.

To further illustrate the problem with the navigational models, Codd devotes the first section of his paper to an example data set involving machine parts and assembly projects. This dataset, he says, could be represented in existing systems in at least five different ways. Any program \(P\) that is developed assuming one of five structures will fail when run against at least three of the other structures. The program \(P\) could instead try to figure out ahead of time which of the structures it might be dealing with, but it would be difficult to do so in this specific case and practically impossible in the general case. So, as long as the program needs to know about how the data is structured, we cannot switch to an alternative structure without breaking the program. This is a real bummer because (and this is from the abstract) “changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.”

Codd then introduces his relational model. This model would be refined and expanded in subsequent papers: In 1971, Codd wrote about ALPHA, a SQL-like query language he created; in another 1971 paper, he introduced the first three normal forms we know and love today; and in 1972, he further developed relational algebra and relational calculus, the mathematically rigorous underpinnings of the relational model. But Codd’s 1970 paper contains the kernel of the relational idea:

The term relation is used here in its accepted mathematical sense. Given sets \(S_1, S_2, ..., S_n\) (not necessarily distinct), \(R\) is a relation on these \(n\) sets if it is a set of \(n\)-tuples each of which has its first element from \(S_1\), its second element from \(S_2\), and so on. We shall refer to \(S_j\) as the \(j\)th domain of \(R\). As defined above, \(R\) is said to have degree \(n\). Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree \(n\) \(n\)-ary.
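To make the definition concrete, here is a minimal Python sketch of a degree-3 relation. The domains and tuples are invented for illustration (they are not taken from Codd's paper), and `is_relation` is a hypothetical helper name.

```python
from itertools import product

# Hypothetical domains, chosen only for illustration.
S1 = {"bolt", "nut"}      # part names
S2 = {"alpha", "beta"}    # project names
S3 = {5, 10}              # quantities

# A relation R on S1, S2, S3 is a set of 3-tuples whose j-th element
# comes from the j-th domain, i.e. a subset of the Cartesian product.
R = {("bolt", "alpha", 10), ("nut", "alpha", 5), ("bolt", "beta", 5)}

def is_relation(r, *domains):
    """Return True if every tuple in r has one element per domain, drawn from it."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in r
    )

is_valid = is_relation(R, S1, S2, S3)
is_subset = R <= set(product(S1, S2, S3))
```

Because `R` is a Python set, duplicate tuples collapse automatically and the tuples carry no order, which matches the mathematical sense of the word.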

Today, we call a relation a table, and a domain an attribute or a column. The word “table” actually appears nowhere in the paper, though Codd’s visual representations of relations (which he calls “arrays”) do resemble tables. Codd defines several more terms, some of which we continue to use and others we have replaced. He explains primary and foreign keys, as well as what he calls the “active domain,” which is the set of all distinct values that actually appear in a given domain or column. He then spends some time distinguishing between a “simple” and a “nonsimple” domain. A simple domain contains “atomic” or “nondecomposable” values, like integers. A nonsimple domain has relations as elements. The example Codd gives here is that of an employee with a salary history. The salary history is not one salary but a collection of salaries each associated with a date. So a salary history cannot be represented by a single number or string.

It’s not obvious how one could store a nonsimple domain in a multi-dimensional array, AKA a table. The temptation might be to denote the nonsimple relationship using some kind of pointer, but then we would be repeating the mistakes of the navigational models. Instead, Codd introduces normalization, which at least in the 1970 paper involves nothing more than turning nonsimple domains into simple ones. This is done by expanding the child relation so that it includes the primary key of the parent. Each tuple of the child relation references its parent using simple domains, eliminating the need for a nonsimple domain in the parent. Normalization means no pointers, sidestepping all the problems they cause in the navigational models.
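The salary-history example can be sketched with Python's built-in `sqlite3` module. The table names, column names, and values below are assumptions made up for illustration; Codd's paper predates SQL and uses its own notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- The child relation carries the parent's primary key (emp_id),
    -- so the parent needs no nested "salary history" value.
    CREATE TABLE salary_history (
        emp_id    INTEGER NOT NULL REFERENCES employee(emp_id),
        effective TEXT NOT NULL,       -- ISO date (hypothetical values)
        salary    INTEGER NOT NULL,
        PRIMARY KEY (emp_id, effective)
    );
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO salary_history VALUES (?, ?, ?)",
    [(1, "1968-01-01", 9000), (1, "1969-01-01", 9500)],
)

# Every column holds an atomic value; the history is recovered by key, not by pointer.
history = conn.execute(
    "SELECT effective, salary FROM salary_history WHERE emp_id = ? ORDER BY effective",
    (1,),
).fetchall()
```

Each salary is a separate tuple in the child relation, identified by the employee's key plus a date, so nothing in `employee` has to hold a collection.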

At this point, anyone reading Codd’s paper would have several questions, such as “Okay, how would I actually query such a system?” Codd mentions the possibility of creating a universal sublanguage for querying relational databases from other programs, but declines to define such a language in this particular paper. He does explain, in mathematical terms, many of the fundamental operations such a language would have to support, like joins, “projection” (SELECT in SQL), and “restriction” (WHERE). The amazing thing about Codd’s 1970 paper is that, really, all the ideas are there: we’ve been writing SELECT statements and joins for almost half a century now.
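The three operations named above can be sketched in plain Python over lists of dictionaries. The function names `restrict`, `project`, and `natural_join` and the sample data are assumptions for illustration, not Codd's notation; the join shown matches rows on whatever column names the two inputs share.

```python
def restrict(rows, predicate):
    """Keep only the rows satisfying predicate (the WHERE clause in SQL)."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Keep only the named columns and drop duplicate rows (the SELECT column list)."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out

def natural_join(left, right):
    """Combine rows from both inputs that agree on every shared column name."""
    out = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

# Hypothetical parts-and-projects data.
parts = [{"part_id": 1, "name": "bolt"}, {"part_id": 2, "name": "nut"}]
usage = [
    {"part_id": 1, "project": "alpha"},
    {"part_id": 1, "project": "beta"},
    {"part_id": 2, "project": "alpha"},
]

joined = natural_join(parts, usage)
alpha_parts = project(restrict(joined, lambda r: r["project"] == "alpha"), ["name"])
```

Note that the caller never says how `parts` or `usage` are stored or linked; the operations work purely on values, which is the data independence the paper argues for.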

Codd wraps up the paper by discussing ways in which a normalized relational database, on top of its other benefits, can reduce redundancy and improve consistency in data storage. Altogether, the paper is only 11 pages long and not that difficult of a read. I encourage you to look through it yourself. It would be another ten years before Codd’s ideas were properly implemented in a functioning system, but, when they finally were, those systems were so obviously better than previous systems that they took the world by storm.


  • Adv Radiat Oncol
  • v.5(6); Nov-Dec 2020

Artificial Intelligence Research: The Utility and Design of a Relational Database System

Although many researchers talk about a “patient database,” they typically are not referring to a database at all, but instead to a spreadsheet of curated facts about a cohort of patients. This article describes relational database systems and how they differ from spreadsheets. At their core, spreadsheets are only capable of describing one-to-one (1:1) relationships. However, this article demonstrates that clinical medical data encapsulate numerous one-to-many relationships. Consequently, spreadsheets are very inefficient relative to relational database systems, which gracefully manage such data. Databases provide other advantages, in that the data fields are “typed” (that is, they contain specific kinds of data). This prevents users from entering spurious data during data import. Because each record contains a “key,” it becomes impossible to add duplicate information (ie, add the same patient twice). Databases store data in very efficient ways, minimizing space and memory requirements on the host system. Likewise, databases can be queried or manipulated using a highly complex language called SQL. Consequently, it becomes trivial to cull large amounts of data from a vast number of data fields on very precise subsets of patients. Databases can be quite large (terabytes or more in size), yet still are highly efficient to query. Consequently, with the explosion of data available in electronic health records and other data sources, databases become increasingly important to contain or order these data. Ultimately, this will enable the clinical researcher to perform artificial intelligence analyses across vast amounts of clinical data in a way heretofore impossible. This article provides initial guidance in terms of creating a relational database system.

Introduction

This issue of Advances in Radiation Oncology presents a series of articles around applications of artificial intelligence (AI) in our field. One of the potential benefits of AI is that it can pore through large amounts of data to discover patterns not evident to clinicians. However, this vast volume of data cannot be accommodated within a single spreadsheet (which is how most clinical researchers work when conducting standard multivariable analyses). In fact, many researchers erroneously describe spreadsheets as databases. This article will demonstrate both what a relational database system is and how it is superior to a spreadsheet. It will also discuss considerations when implementing a relational database system (RDBS) for your own research purposes, using an actual lung cancer radiation therapy database as an example. I have also provided some excellent Wikipedia references that contain abundant additional information, beyond what can be encapsulated in a single article. (These, in turn, reference computer science literature for the very intrepid reader, but such references might extend beyond the level of understanding of all but the most technically inclined.)

One might question why a database system is necessary for AI research. This article will demonstrate that a database enables creation of a multidimensional structure to cleanly and accurately contain these data. To perform AI analysis requires efficient storage of hundreds or thousands of data points on a single patient or even on a single course of radiation therapy. There is a famous illustration of the “data science hierarchy of needs” (Fig 1). To perform an AI analysis, one must first create an RDBS to serve as the storage mechanism. This creation of a system to store structured data entails a major part of the bottom row of the pyramid. To create a database, then, will set the reader down the path toward conducting AI research at their own institution.

Fig 1. The data science hierarchy of needs. Used with permission of Monica Rogati (aipyramid.com). For details, see text. Abbreviation: AI = artificial intelligence.

Origin of Relational Databases

The concept of an RDBS was first described in a seminal article in 1970.[1] The theoretic construct was that all data could be defined or represented as a series of relations with or to other data. The article was quantitative in that it used relational algebra and tuple relational calculus to prove its points.[2] IBM used this theoretic framework to design what became the initial SQL (pronounced “see-quell” or “ess-cue-ell”) language, which they used to manipulate and retrieve data from IBM’s original RDBS.[3] Since that time, the American National Standards Institute and the International Organization for Standardization have deemed SQL to be the standard language in relational database communication.[2] Today, there are a wide variety of commercial and open-source relational database systems available for use. These vary in their features and relative strengths or weaknesses, but, fundamentally, they all operate using the principles defined in the Codd article.[1] The SQL language is well defined and is used to write code to query (or update) the data within an RDBS.

Fundamental Disadvantage of Spreadsheets

Spreadsheets are designed to incorporate and analyze one-to-one (1:1) relationships (Fig 2a). Each patient has a single birth date and a single death date. However, medical records are rife with “one-to-many” relationships (Fig 2b). A patient might receive multiple different courses of radiation therapy treatment, as in the example provided, or might have multiple chemotherapy administrations. To accommodate these data, a spreadsheet quickly balloons in size (Fig 2c). Not only is this inefficient (duplication of data), but it also makes maintenance of the spreadsheet extremely cumbersome and prone to error. For instance, in this example, when patient “12345” passes away, the “DeathDate” needs to be updated in 5 rows of the spreadsheet (because she had 2 courses of radiation therapy, one of which was accompanied by 4 cycles of chemotherapy). It is not difficult to imagine that a researcher could neglect to update the “DeathDate” in each place, introducing errors. To further expound upon the issue, imagine a patient who takes numerous medications or has variable numbers of comorbid illnesses; the rows required to encapsulate 1 patient explode. To use a data science term, the dimensionality of the data balloons. To reiterate, spreadsheets are designed to encapsulate only 1:1 relationships (2-dimensional data), whereas patient data are multidimensional.
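The update problem can be illustrated with a short Python sketch of the flattened layout. The row structure, the date, and the `record_death` helper are hypothetical; only the count of 5 rows for patient “12345” comes from the text.

```python
# Hypothetical flattened rows for patient 12345: one row per chemotherapy
# cycle of course 1 (4 rows) plus one row for course 2.
rows = [
    {"MRN": "12345", "Course": 1, "Cycle": c, "DeathDate": None}
    for c in (1, 2, 3, 4)
] + [{"MRN": "12345", "Course": 2, "Cycle": None, "DeathDate": None}]

def record_death(sheet, mrn, date, skip_index=None):
    """Write DeathDate into every row for mrn; skip_index simulates a missed row."""
    for i, row in enumerate(sheet):
        if row["MRN"] == mrn and i != skip_index:
            row["DeathDate"] = date

# A researcher updates four of the five rows and overlooks the last one.
record_death(rows, "12345", "2020-11-01", skip_index=4)
distinct_death_dates = {row["DeathDate"] for row in rows}
```

After the partial update, `distinct_death_dates` holds two different values for one patient, and nothing in the spreadsheet flags the contradiction.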

Fig 2. (a) Spreadsheets are useful where there is a one-to-one correspondence of data. For instance, each unique medical record number (MRN) represents a single patient, with a single birth/death date and a single first and last name. (b) Spreadsheets “break down” when describing 1-to-many correspondences. In this example, 2 patients have a total of 5 courses of radiation therapy treatment between them. (c) To accommodate all the data in our simple example, a spreadsheet needs to store redundant data (colored in red). The data storage requirements quickly balloon. Furthermore, as additional traits and factors are added to the spreadsheet, it becomes impossible to follow, as one patient will require untold numbers of rows to capture all relevant data concepts. Stated another way, the data are multidimensional. Maintenance and updating of fields become error-prone (see text). Abbreviations: DOB = date of birth; Lname = last name; Fname = first name; LUL = left upper lobe; MRN = medical record number; RLL = right lower lobe; RUL = right upper lobe; SBRT = stereotactic body radiotherapy.

Fundamental Advantage of Relational Databases

RDBS gracefully manage one-to-many relationships. They can do so because a database is composed of numerous different tables, which are explicitly linked (Fig 3). Every table must also contain a key, which is a unique, required identifier for each row of data. Relationships between the tables are defined when creating the database tables or fields. In the “Demographics” table, medical record number, “MRN,” is the key. For the “TreatmentCourse” and “Chemotherapy” tables, the keys are “TreatmentCourse” and “ChemotherapyID,” respectively. Note that “TreatmentCourse 1” in the “TreatmentCourse” table pertains to breast radiation therapy treatment. This, in turn, is linked to 4 cycles of chemotherapy in the “Chemotherapy” table, each of which is uniquely identified in that table.

Fig 3. In a relational database, data are stored in multiple tables, which are joined via defined variables. In this fictitious example, note that the patient only has one “DeathDate” to update. Furthermore, each course of treatment (“TreatmentCourse”) can have multiple chemotherapy cycles associated with it. Note, too, that medical record number (MRN) only appears in 2 of the 3 tables (it is not needed in the “Chemotherapy” table). If the researcher wishes to retrieve the MRN, it can be obtained via a SQL query, linking back to one of the tables that contains it. Abbreviations: DOB = date of birth.

When comparing Fig 3 (a database) to Fig 2c (a spreadsheet), note that Fig 3 contains the same information as Fig 2c without the addition of repetitious information (colored red in Fig 2c). Alternately, in a spreadsheet, the researcher could manually aggregate and summarize the chemotherapy delivered into a single cell in a single row of the spreadsheet (ie, “flatten” the data, to use a data science term), but then some data would be lost. Using the chemotherapy administrations as an example, if one were to “flatten” the data down to a single spreadsheet cell stating “4 cycles of Adriamycin/Cytoxan,” one loses the dates of administration. If one summarizes the data as “4 cycles of Adriamycin/Cytoxan: <date1>, <date2>, <date3>, <date4>,” the dates and the chemotherapy occupy the same cell and the data are retained but are no longer discrete; one loses the ability to filter the spreadsheet by chemotherapy kind or by date.

Conversely, a SQL database cleanly encapsulates these multidimensional data (Fig 3). Each table is 2-dimensional in structure. But because it can contain multiple rows of data on 1 patient (chemotherapy administrations, to follow the same example), a multidimensional structure is created, as 4 chemotherapy cycles link to one of 2 courses of radiation therapy (“TreatmentCourse” table) for 1 patient (“Demographics” table). Now, imagine a clinical database with millions of rows of data spread across hundreds of tables, as in the real-life example described below. Clearly, a spreadsheet would not be adequate.
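The three-table layout described for Fig 3 can be approximated with Python's built-in `sqlite3` module. The table and key names follow the text; the “Site” and “AdminDate” columns, the second course's site, and all dates are assumptions added for illustration and are not the article's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Demographics (
        MRN       TEXT PRIMARY KEY,
        DeathDate TEXT
    );
    CREATE TABLE TreatmentCourse (
        TreatmentCourse INTEGER PRIMARY KEY,
        MRN             TEXT NOT NULL REFERENCES Demographics(MRN),
        Site            TEXT                -- hypothetical column
    );
    CREATE TABLE Chemotherapy (
        ChemotherapyID  INTEGER PRIMARY KEY,
        TreatmentCourse INTEGER NOT NULL REFERENCES TreatmentCourse(TreatmentCourse),
        AdminDate       TEXT                -- hypothetical column
    );
""")
conn.execute("INSERT INTO Demographics VALUES ('12345', NULL)")
conn.executemany("INSERT INTO TreatmentCourse VALUES (?, ?, ?)",
                 [(1, "12345", "breast"), (2, "12345", "lung")])
conn.executemany("INSERT INTO Chemotherapy VALUES (?, ?, ?)",
                 [(i, 1, f"2020-0{i}-01") for i in (1, 2, 3, 4)])

# The death date lives in exactly one row, so a single UPDATE suffices.
updated = conn.execute(
    "UPDATE Demographics SET DeathDate = '2020-11-01' WHERE MRN = '12345'"
).rowcount

# Chemotherapy has no MRN column; a join through TreatmentCourse recovers it.
cycles = conn.execute("""
    SELECT d.MRN, c.ChemotherapyID, d.DeathDate
    FROM Chemotherapy AS c
    JOIN TreatmentCourse AS t ON t.TreatmentCourse = c.TreatmentCourse
    JOIN Demographics    AS d ON d.MRN = t.MRN
    ORDER BY c.ChemotherapyID
""").fetchall()
```

One row changes in `Demographics`, yet every joined chemotherapy row reports the new death date, which is the contrast with the 5-row spreadsheet update.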

Additional Advantages of Relational Databases

  • 1. Each row of data in a table has a unique identifier (a key). Consequently, one cannot accidentally add a row of data into a database table twice.

Fig 4. Note that each field in this database table is specifically designed. It has a “type” (kind) and a “size” (length). When importing data from numerous external sources, these definitions can prevent erroneous imports (for details, see text). Note that the field “MRN” is the key for this table. All the data in this table refer back to “MRN” via a one-to-one relationship. MRN can be used as the key because no 2 patients have the same MRN. Abbreviations: DOB = date of birth; MRN = medical record number.

  • 3. Not only must the data types correspond, but the data lengths must be observed. If the database design states that a field is a decimal with 3 places to the right of the decimal point, then a fourth decimal place would be truncated at import. Alternately, the database could declare an error, which might also imply that the field contains erroneous data.
  • 4. A key from one table can be linked “backward” to a key from another table (termed a “foreign key”). As an example (Fig 3), the database can be designed such that the MRN from “TreatmentCourse” must refer to an MRN already contained within the “Demographics” table. If one tried to import data into “TreatmentCourse” and it used an MRN not listed in “Demographics,” the import would fail. Such a situation might imply, for instance, that the MRN was incorrect in the external data source (or in the database). Or perhaps it relates to a patient who received prostate radiation therapy (but you have a breast cancer database).
  • 5. Foreign key relationships also work in the opposite direction: If one realizes that a patient is represented in the database who should not be, one can delete the patient from the “Demographics” table and, provided the foreign keys are defined with cascading deletes, the database will delete all data about that patient from all the other data tables automatically.
  • 6. RDBS are specifically optimized to manage vast amounts of data. Large spreadsheets (containing thousands of rows and hundreds of columns) are extremely slow and memory intensive. However, one can query across or manipulate many gigabytes of data in fractions of a second in many RDBS, as the data stores are highly optimized and efficient from both a computational and memory utilization perspective.
  • 7. RDBS are much more secure than spreadsheets. An institution’s IT team might allow one to access some tables within an institution’s data warehouse, but not others. One’s access could be restricted to defined subsets of patients. One might have “read” access to these data, but not “write” access (or “write” access to only some subset of fields). Databases might likewise be set up such that only users from specific IP addresses or computers may access them. The login systems set up by IT departments for these purposes typically use state-of-the-art encryption algorithms, 2-factor authentication, and the like. In contrast, an Excel spreadsheet can be “locked” such that only some fields are editable. But it is not possible to restrict data access by user. Furthermore, this restriction is to “write” access only, not to “read” access. It is true that one can “hide” columns in a spreadsheet and then lock it, to prevent a given user from viewing them, but the spreadsheet maintainer must do this manually before distributing the spreadsheet (time-consuming and prone to error).
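The key-related safeguards in the list above (duplicate rejection, foreign key checks, and deletion that propagates to linked tables) can be sketched with Python's `sqlite3` module. This is an illustration under assumptions: the two-table schema is a reduced, hypothetical version of the one in the text, SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is set, and the propagating delete depends on the `ON DELETE CASCADE` clause being declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Demographics (MRN TEXT PRIMARY KEY);
    CREATE TABLE TreatmentCourse (
        TreatmentCourse INTEGER PRIMARY KEY,
        MRN TEXT NOT NULL REFERENCES Demographics(MRN) ON DELETE CASCADE
    );
""")
conn.execute("INSERT INTO Demographics VALUES ('12345')")
conn.execute("INSERT INTO TreatmentCourse VALUES (1, '12345')")

def try_sql(sql):
    """Run one statement; return the constraint error message, or None on success."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

# Adding the same patient twice violates the primary key.
duplicate_error = try_sql("INSERT INTO Demographics VALUES ('12345')")

# A course for an MRN missing from Demographics violates the foreign key.
orphan_error = try_sql("INSERT INTO TreatmentCourse VALUES (2, '99999')")

# Deleting the patient removes the linked course rows via ON DELETE CASCADE.
conn.execute("DELETE FROM Demographics WHERE MRN = '12345'")
remaining_courses = conn.execute("SELECT COUNT(*) FROM TreatmentCourse").fetchone()[0]
```

Other database systems enable foreign key enforcement by default and word their error messages differently, so the details here are specific to SQLite.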

Benefits of SQL

As described earlier, SQL is a defined, standardized language for composing queries within an RDBS, or to manipulate and update these data. Some database systems provide “extensions” to the SQL standard, to provide some additional and specific functionality (details available in the vendors’ literature). It is beyond the scope of this article to teach SQL coding. However, many excellent online resources are available for the interested reader. Functionally, SQL allows one to search for any number of variables or combinations of variables across any number of tables, simultaneously. This can be extremely powerful and useful, both for retrieving and for manipulating and updating data. Queries can be saved for reuse or modification later. As stated above, these queries typically produce output in fractions of a second, even across vast pools of data.

Our institution has a database of patients who have received radiation therapy to the lung, whether for primary cancer or metastatic disease to the lung.[4] The database and some of its details of implementation are described below, but first, some “real-life” examples of what such an RDBS can do (not possible when using a spreadsheet):

  • Real-life example 1: Find patients who might be candidates for a certain lung cancer clinical trial. For this particular study, they must have previously received lung SBRT, have nonmetastatic disease, no evidence of recurrence, be alive (obviously), be at least 2 years out from the end of the prior SBRT treatment, and must have been seen in follow-up within the past 1.5 years. By constructing an appropriate SQL query, 135 patients were found (out of more than 4600 in the database) to pass along to the PI for closer inspection.
  • Real-life example 2: It takes only a few minutes to set up very complex queries. If one has a basic facility with SQL, one can design a query such as: “Find all patients with stage II or III lung cancer treated with concurrent chemoradiotherapy who developed neutropenia during treatment, who are female, 70 years of age or older, and who take any antihypertensive medication (defined in a certain list).” Ultimately, such queries are only limited by one’s imagination (and the richness or completeness of the data coming from the source systems).
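A query along the lines of real-life example 1 might be sketched as follows. This is only an illustration under stated assumptions: the tables (`patients`, `treatments`), their columns, and the demo rows are hypothetical and do not reflect the actual schema of the database described in this article. It uses Python's built-in `sqlite3` module rather than a server-based system so that it can be run anywhere.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patients (
            patient_id     INTEGER PRIMARY KEY,
            is_alive       INTEGER,  -- 1 = alive, 0 = deceased
            has_metastasis INTEGER,
            has_recurrence INTEGER,
            last_followup  TEXT      -- ISO date, YYYY-MM-DD
        );
        CREATE TABLE treatments (
            patient_id INTEGER REFERENCES patients(patient_id),
            technique  TEXT,
            site       TEXT,
            end_date   TEXT          -- ISO date, YYYY-MM-DD
        );
    """)
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 0, 0, "2024-01-15"),  # meets every criterion
            (2, 1, 0, 0, "2021-03-01"),  # last follow-up too long ago
            (3, 0, 0, 0, "2024-02-01"),  # deceased
            (4, 1, 1, 0, "2024-03-01"),  # metastatic disease
            (5, 1, 0, 0, "2024-05-01"),  # SBRT ended less than 2 years ago
        ],
    )
    conn.executemany(
        "INSERT INTO treatments VALUES (?, ?, ?, ?)",
        [
            (1, "SBRT", "lung", "2021-05-10"),
            (2, "SBRT", "lung", "2020-01-10"),
            (3, "SBRT", "lung", "2020-06-01"),
            (4, "SBRT", "lung", "2021-01-01"),
            (5, "SBRT", "lung", "2023-09-01"),
        ],
    )
    return conn


def find_trial_candidates(conn, as_of):
    """Return patient IDs meeting the example eligibility criteria as of a given ISO date."""
    query = """
        SELECT DISTINCT p.patient_id
        FROM patients AS p
        JOIN treatments AS t ON t.patient_id = p.patient_id
        WHERE t.technique = 'SBRT'
          AND t.site = 'lung'
          AND p.is_alive = 1
          AND p.has_metastasis = 0
          AND p.has_recurrence = 0
          AND t.end_date <= date(:as_of, '-2 years')         -- at least 2 years out
          AND p.last_followup >= date(:as_of, '-18 months')  -- seen within 1.5 years
        ORDER BY p.patient_id
    """
    return [row[0] for row in conn.execute(query, {"as_of": as_of})]


if __name__ == "__main__":
    demo = build_demo_db()
    print(find_trial_candidates(demo, "2024-06-30"))
```

The reference date is passed in as a parameter instead of using the current date, so that the same query gives the same answer each time it is rerun on the same data.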

It is true that one can “filter” data in Excel to rapidly find subsets. However, this filtering is limited to “true or false” matching. In example 2 above, it would be impossible to discover the patients who developed neutropenia while undergoing radiation therapy unless one had a “neutropenia (yes or no)” column. One cannot perform the comparison “where date of neutropenia > RadiationStartDate and < RadiationEndDate” to filter the data without writing code in Visual Basic, which is likely beyond the ability of most users.
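In SQL, by contrast, the date comparison described above is a single condition in a join. The following is a minimal sketch, assuming two hypothetical tables (`radiation_courses` and `lab_events`) with made-up rows; the names are illustrative and are not taken from the database described in this article.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE radiation_courses (
            patient_id INTEGER,
            start_date TEXT,  -- ISO date, YYYY-MM-DD
            end_date   TEXT
        );
        CREATE TABLE lab_events (
            patient_id INTEGER,
            event      TEXT,
            event_date TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO radiation_courses VALUES (?, ?, ?)",
        [(1, "2022-01-10", "2022-02-20"), (2, "2022-03-01", "2022-04-10")],
    )
    conn.executemany(
        "INSERT INTO lab_events VALUES (?, ?, ?)",
        [
            (1, "neutropenia", "2022-02-01"),  # during patient 1's course
            (2, "neutropenia", "2022-05-01"),  # after patient 2's course ended
            (2, "anemia", "2022-03-15"),       # during the course, but a different event
        ],
    )
    return conn


def neutropenia_during_radiation(conn):
    """Return IDs of patients with a neutropenia event strictly between course start and end."""
    query = """
        SELECT DISTINCT rc.patient_id
        FROM radiation_courses AS rc
        JOIN lab_events AS le ON le.patient_id = rc.patient_id
        WHERE le.event = 'neutropenia'
          AND le.event_date > rc.start_date
          AND le.event_date < rc.end_date
        ORDER BY rc.patient_id
    """
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    print(neutropenia_during_radiation(build_demo_db()))
```

Storing dates as ISO-formatted text (YYYY-MM-DD) is what lets the plain `>` and `<` comparisons work here; a system with a native date type would not need that convention.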

Disadvantage of SQL

With SQL, it is possible to create highly complex queries; it is a rich and powerful language. However, such queries can be quite complicated and opaque to a nontechnical person. Some systems do provide graphical tools to help build SQL queries, but, even so, all but the simplest SQL queries will be beyond the technical skills of some users.

Database Implementation

Databases may, and often do, contain many thousands of tables and millions of rows of data. (In other words, they can contain data far in excess of the requirements of any one radiation oncologist or even any one radiation oncology department). In fact, some systems allow even single tables to contain terabytes or even petabytes of data. 5 Consequently, there are numerous systems available to accommodate any researcher’s needs. Some of the very best are open source (free). Software is available across a wide variety of operating systems. Wikipedia provides an outstanding overview of the topic. 5

To implement a database system, it is first necessary to have a discussion with the IT Department at your institution. There is no single solution for creating a data repository that holds true for researchers across all institutions. The solution can vary, depending on the resources at your institution and the level of access the IT Department has into the underlying patient data source systems (often defined in the institutional contracts signed with the individual vendors). Some large centers have elaborate data warehouse systems. Smaller centers might provide access to data from individual source systems but might not have compiled them into a single data warehouse. Some IT departments might have adequate resources to provide output data from their data warehouse to individual researchers, when needed. Others might not. Some might provide a dedicated research server on which the researcher can construct a database. Other researchers might need to rely instead upon existing servers within their department. I do not recommend that one set up a database system on a free-standing laptop or desktop machine, as there are Health Insurance Portability and Accountability Act concerns (the computer could be stolen). Data should be backed up across a secured network electronically.

Creation of a Lung Cancer Radiation Therapy Database

I began my own database several years ago. My need grew out of a sense of frustration regarding lack of access to clinical data. At the time, at my institution, it was a difficult (and somewhat mysterious) process to procure data from the data warehouse. However, data from Mosaiq (Elekta AB, Stockholm, Sweden), which is our department’s record and verify system, were available. These data formed the nucleus of the original database. Basic demographic information and radiation therapy prescriptions, dates of treatment, dose delivered, tumor stage, and the like, were exported using custom software. Research IT provided a Linux server, on which I implemented the database. I chose to use MariaDB (MariaDB Foundation, DE), as it is a powerful, well-regarded, commercially supported, free, open-source database system whose SQL functions are congruent with those of Oracle (which is a database system I had used previously). Because my institution could not support an implementation of Oracle, MariaDB was an excellent alternative. MariaDB does include a Windows GUI tool for administering its databases (creating tables, writing SQL code, importing or exporting data, and the like). However, I had previously used a commercial database administration product called Navicat (Premium Cybertech Ltd, Hong Kong), which provides similar functionality, so I elected to purchase that. I also imported all data I had captured in spreadsheets for previous research projects. More recently, I have gained access to data from our data warehouse and so have created numerous additional tables to store the information. At present, the database contains more than 3 million rows of data on approximately 4800 patients, spread across more than 170 tables.

Incorporation of Data from Other Institutional Data Systems

To import data from outside source systems requires a multistep process, referred to as “ETL” (“Extraction, Transformation, and Loading”) in the data science literature. 6 The issues go far beyond the physical importation of data into the database; importing spreadsheets of data is a trivial task. There are numerous issues in ETL that are critical to consider when designing a database and importing data into it. Furthermore, many of these issues are not inherently obvious. In fact, a large proportion of the time required to create a database and fill it with clinically useable data derives from the ETL involved. The oft-reproduced “Data Science Hierarchy of Needs” illustrates this fact ( Fig 1 ). Most of the discussion in this article addresses aspects of the bottom-most layer of the pyramid. ETL comprises the majority of the next 2 layers of the pyramid and is the topic of another article.
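To make the three ETL stages concrete, a toy pipeline might look like the sketch below. Everything in it is an assumption made for illustration: the export format, the column names (`mrn`, `stage`, `rt_start`), the cleaning rules, and the target table `rt_courses` are hypothetical, and a real pipeline would need to handle many more issues than this one does.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export from a source system: stray whitespace,
# inconsistent capitalization, mixed date formats, and one row with no identifier.
RAW_EXPORT = """mrn,stage,rt_start
 1001 ,IIa,03/15/2021
1002,iiib,2021-04-02
,II,05/01/2021
"""


def extract(text):
    """Extraction: read the raw export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_date(value):
    """Convert a date in either assumed format to ISO (YYYY-MM-DD), or None if unparseable."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(rows):
    """Transformation: trim whitespace, normalize stage and dates, drop rows without an ID."""
    cleaned = []
    for row in rows:
        mrn = row["mrn"].strip()
        if not mrn:
            continue  # a row with no identifier cannot be linked to a patient
        cleaned.append((mrn, row["stage"].strip().upper(), parse_date(row["rt_start"])))
    return cleaned


def load(conn, rows):
    """Loading: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rt_courses "
        "(mrn TEXT PRIMARY KEY, stage TEXT, rt_start TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO rt_courses VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_EXPORT)))
    for record in conn.execute("SELECT * FROM rt_courses ORDER BY mrn"):
        print(record)
```

Even in this toy case, most of the code is in the transformation step, which is consistent with the observation above that the physical import is the trivial part.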

Research data are not available at this time.

Sources of support: none.

Disclosures: Dr Dilling reports personal fees and nonfinancial support from NCCN, personal fees from Varian, personal fees and nonfinancial support from Harborside Press, nonfinancial support from Astra Zeneca, all outside the submitted work.

  • DOI: 10.1117/12.2681360
  • Corpus ID: 258884495

Research on distributed relational database based on MySQL

  • Lanying Shi , Kai Wang , +3 authors Yiquan Jiang
  • Published in Other Conferences 23 May 2023
  • Computer Science, Engineering

COMMENTS

  1. Relational data paradigms: What do we learn by taking the materiality

    Thus, despite the relational database's continued dominance in many contexts, modern databases' specific material forms can vary dramatically. For instance, though all relational databases organize data into sets of interlinked tables, the specific file types and querying languages vary depending on the software platform being used.

  2. Database management system performance comparisons: A systematic

    Several database management system performance comparisons have been conducted and published as both vendor white-papers as well as in scientific fora. The approaches and reporting in such studies have been criticized in previous literature. In this study, we systematically surveyed 117 DBMS performance comparison studies.

  3. (PDF) Design and Analysis of a Relational Database for Behavioral

    Paper — Design and Analysis of a Relational Database for Behavioral Experiments Data Processing Fig. 5. Comparison of time needed to iterate over all records (in seconds) for 10 and 80 mi l-

  4. Recommendations for Evolving Relational Databases

    Setting the Context. Before getting into the meta-model and approach explanations, let us set the context in which the approach is designed. Database Schema: The concept of database schema commonly refers to the way data are organized in a database (through tables and referential integrity constraints for relational databases). However, RDBMSs also allow one to define behavior inside the ...

  5. PDF Architecture of a Database System

    Architecture of a Database System

  6. Relational and NoSQL Databases: The Appropriate Database Model Choice

    Abstract: For over four decades, relational database management systems (RDBMS) have been the primary model for data storage, retrieval and management. However, due to the continuous information growth in current organizations and the increasing needs for scalability and performance, especially while handling the very large amounts of data generated by various new generation real time ...

  7. PDF Relational Deep Learning: Graph Representation Learning on Relational

    that make much better use of the rich predictive signal in relational data. This paper lays the ground for future work by making the following main sections: • Blueprint. Relational Deep Learning, an end-to-end learnable approach that utilizes the predictive signals available in relational data, and supports temporal predictions ...

  8. The Role Concept for Relational Database Management Systems

    In this paper we outline research towards a role-concept-enabled relational database system. We describe a definition of this concept based on existing results and discuss open research questions ...

  9. Scalable linear algebra on a relational database system

    Proceedings of the IEEE 33rd International Conference on Data Engineering, 2017, 523-534. algebra and distributed relational algebra, the foundation of modern database systems, meaning that it is easy to use a database for scalable linear algebra. Relational database systems are highly performant, reaping the benefits of decades of research ...

  10. PDF SchemaDB: Structures in Relational Datasets

    5 CONCLUSION. Database schema data sets are needed for various ML applications, including to automate and scale the synthesis of databases for use in cyber deception. SchemaDB is intended to enable such research, as well as to provide a standardised example for other potential data set providers.

  11. (PDF) A Comparison of NoSQL and Relational Database ...

    This research compares the performance of relational and non-relational databases namely Oracle, and MongoDB by executing complex queries on a large set of data that is available in document-based ...

  12. PDF Relational Cloud: A Database-as-a-Service for the Cloud

    This paper introduces a new transactional "database-as-a-service" (DBaaS) called Relational Cloud. A DBaaS promises to move much of the operational burden of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database users to the service operator, offering lower overall costs to users.

  13. Relational Database Management System Research Papers

    The mapping between the ontology and the mathematics of a relational database is the assignment of a category in the ontology to each of the structures described by the mathematics. Assigned these categories, these structures become the tables, rows, columns, domains, primary keys, foreign keys and non-keys of a relational database.

  14. PDF Spark SQL: Relational Data Processing in Spark

    Spark SQL: Relational Data Processing in Spark

  15. Relational Database Research Papers

    In this paper, the performance evaluation of MySQL and MongoDB is performed where MySQL is an example of relational database and MongoDB is an example of non relational databases. A relational database is a data structure that allows you to connect information from different 'tables', or different types of data buckets. Non-relational database ...

  16. A relational model of data for large shared data banks

    In Section 1, inadequacies of these models are discussed. A model based on n -ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency ...

  17. Important Papers: Codd and the Relational Model

    The relational model was introduced in 1970. Edgar F. Codd, a researcher at IBM, published a paper called "A Relational Model of Data for Large Shared Data Banks." The paper was a rewrite of a paper he had circulated internally at IBM a year earlier. The paper is unassuming; Codd does not announce in his abstract that he has discovered a ...

  18. Artificial Intelligence Research: The Utility and Design of a

    Origin of Relational Databases. The concept of a RDBS was first described in a seminal article in 1970. 1 The theoretic construct was that all data could be defined or represented as a series of relations with or to other data. The article was quantitative in that it used relational algebra and tuple relational calculus to prove its points. 2 IBM used this theoretic framework to design what ...

  19. The Basics of Relational Databases Using MySQL

    Going beyond a simple database table, a relational database fits more complicated systems by relating information from two or more database tables. This paper will use MySQL to develop a basic appreciation of relational databases including user administration, database design, and SQL syntax. It will lead the reader in downloading and ...

  20. Seminal Papers in Data Science: A Relational Model for Large Shared

    Even with the rising popularity of NoSQL, most companies are still using some form of SQL-based relational database management system. While SQL (then called SEQUEL) was first introduced by IBM's Donald D. Chamberlin and Raymond F. Boyce in 1974, their work built on the ideas of Edgar F. Codd. Codd was another IBM computer scientist who proposed a ...

  21. PDF A Relational Model of Data for Large Shared Data Banks

    A Relational Model of Data for Large Shared Data Banks E. F. CODD IBM Research Laboratory, San Jose, California Future users of large data banks must be protected from having to know how the data is organized in the machine (the ... access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations ...

  22. A Relational Database Management System Approach for Data Integration

    In this paper we introduce a data integration system by implementing a function into the context of PostgreSQL. The aim of this work is to collect files to process from two different data sources (a platform of Physical Testing Software (PTS) and another one of Physical Simulation Software (PSS)), in order to retrieve specific records through a query and integrate them. Both these platforms ...

  23. Research on distributed relational database based on MySQL

    A distributed database based on MySQL transformation can meet the requirements of distributed transaction processing, ensure data backup, high availability and high performance, and improve efficiency. With the rapid development of big data, cloud computing, artificial intelligence and other technologies, the expansion of database ...