Database Normalization as a By-product of Minimum Message Length Inference

Dowe, David L.; Zaidi, Nayyar Abbas

doi:10.1007/978-3-642-17432-2_9

Cited by 3 publications (2 citation statements); references 8 publications (8 reference statements).


“…MML has been used for a variety of problems, including clustering and mixture modeling [ 29 , 30 ] ([ 19 ] Section 6.8), clustering of protein dihedral angles [ 31 ], decision graphs (as an extension of decision trees, allowing for disjunctions, or “or”) [ 32 ] (Section 7.2.4 [ 19 ]) and multi-way joins in decision graphs with dynamic attributes [ 33 ], causal Bayesian nets (or Bayesian networks, or causal nets) ([ 19 ] Section 7.4) and Bayesian nets with decision trees in their (leaf) nodes [ 34 , 35 ], inference of probabilistic finite state automata (or probabilistic finite state machines, PFSAs, PFSMs) ([ 19 ] Section 7.1) and hierarchical PFSAs [ 36 ], and (given sufficient data and time, and based to whatever degree on the above-mentioned inference of Bayesian nets) automation of database normalization [ 37 ], etc.…”

Section: Minimum Message Length (mentioning)


Minimum Message Length in Hybrid ARMA and LSTM Model Forecasting

Fang, Dowe, Peiris, et al., 2021, Entropy (self-citation)


Modeling and analysis of time series are important in applications including economics, engineering, environmental science and social science. Selecting the best time series model with accurate parameters in forecasting is a challenging objective for scientists and academic researchers. Hybrid models combining neural networks and traditional Autoregressive Moving Average (ARMA) models are being used to improve the accuracy of modeling and forecasting time series. Most existing time series models are selected by information-theoretic approaches such as AIC, BIC, and HQ. This paper revisits a model selection technique based on Minimum Message Length (MML) and investigates its use in hybrid time series analysis. MML is a Bayesian information-theoretic approach and has been used in selecting the best ARMA model. We utilize the long short-term memory (LSTM) approach to construct a hybrid ARMA-LSTM model and show that MML performs better than AIC, BIC, and HQ in selecting the model, both in traditional ARMA models (without LSTM) and in hybrid ARMA-LSTM models. These results held on simulated data and on both real-world datasets that we considered. We also develop a simple MML ARIMA model.
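For reference, the three conventional criteria that the abstract compares MML against can be computed from a fitted model's maximized log-likelihood ln L, its number of estimated parameters k and the sample size n: AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L, HQ = 2k ln ln n - 2 ln L (lower is better). The sketch below assumes the log-likelihoods have already been obtained from some ARMA fitting routine; the log-likelihood values, the parameter count k = p + q + 1 and the helper names `information_criteria` and `select_order` are hypothetical. It does not implement MML.

```python
import math


def information_criteria(log_likelihood: float, k: int, n: int) -> dict:
    """Return AIC, BIC and HQ for one fitted model (lower is better).

    log_likelihood: maximized log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations
    """
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    hq = 2 * k * math.log(math.log(n)) - 2 * log_likelihood
    return {"AIC": aic, "BIC": bic, "HQ": hq}


def select_order(candidates: dict, n: int, criterion: str) -> str:
    """Return the candidate label with the smallest value of `criterion`.

    candidates maps a label such as "ARMA(1,1)" to (log_likelihood, k).
    """
    return min(
        candidates,
        key=lambda name: information_criteria(*candidates[name], n)[criterion],
    )


# Hypothetical fits on n = 200 observations (illustrative numbers only,
# not results from the paper); k = p + q + 1 counts the noise variance too.
fits = {
    "ARMA(1,0)": (-310.0, 2),
    "ARMA(1,1)": (-300.0, 3),
    "ARMA(3,2)": (-296.0, 6),
}

for crit in ("AIC", "BIC", "HQ"):
    print(crit, select_order(fits, n=200, criterion=crit))
# prints: AIC ARMA(3,2), then BIC ARMA(1,1), then HQ ARMA(1,1)
```

With these made-up values, AIC's lighter per-parameter penalty favours the larger ARMA(3,2) fit, while the ln n and ln ln n penalties of BIC and HQ favour ARMA(1,1). MML's message-length criterion is a different calculation and is not reproduced here.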



“…Database normalization is the process wherein a database is transformed in order to ensure that it adheres to specific design standards that reduce data redundancy, improve data integrity, ensure the ability to perform structured queries and more importantly allow the database to be extended without the need for substantial restructuring [Date, 2002]. It is conducted mostly by humans, but it is possible to be achieved via machine learning [Dowe and Zaidi, 2010]. Typically, there are 4 levels of Database normalization (more do exist, but they are less common).…”

Section: Database Design Principles (mentioning)


Development of a functional model for the description of protein interaction networks

Gioutlakis (Γιουτλάκης)


Understanding the relationship between an organism's genotype and phenotype is one of the main challenges facing the life sciences today. One of the most important steps toward this goal is mapping the protein interaction network (PIN) of each species, and of humans in particular. For this reason, tens of thousands of scientific experiments have been carried out to date, each recording parts of these networks, and their results are collected in primary protein interaction databases. However, these databases show very little overlap with one another, describe their data using terms that are incompatible across databases and, most importantly, describe the recorded interactions at different reference levels of genetic information. Because of the non-linear structure of genetic information, conversions between these levels are non-invertible and the resulting networks are non-isomorphic, so they contain ambiguities and false positives. The aim of this work is to develop a new method for synthesizing multi-level data, which we call ontological synthesis, that can be applied to problems such as those above. Using this method, we built the meta-database for the human protein interaction network, PICKLE 2.0 (Protein InteraCtion KnowLedgebasE). To make its construction and updating easy and fast, we also developed an automated algorithm based on new data structures designed to provide optimizations specific to the characteristics of biological data. PICKLE is available at http://www.pickle.gr/.
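The normalization passage quoted from this thesis describes normalization as reducing data redundancy. As a hedged, toy illustration (not the MML procedure of Dowe and Zaidi), the sketch below checks functional dependencies in a small hypothetical table, decomposes the table on them, verifies that the decomposition is lossless by re-joining, and counts stored values as a crude proxy for redundancy. The table contents and the helper names `holds_fd`, `project`, `natural_join` and `stored_values` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def holds_fd(rows: List[Row], lhs: Tuple[str, ...], rhs: str) -> bool:
    """True if each combination of lhs values determines a single rhs value."""
    seen: Dict[Tuple[str, ...], str] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True


def project(rows: List[Row], attrs: Tuple[str, ...]) -> List[Row]:
    """Relational projection: keep only attrs and drop duplicate rows (order kept)."""
    out: List[Row] = []
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Join two tables on a single shared attribute."""
    index: Dict[str, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[on], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[on], [])]


def stored_values(tables: List[List[Row]]) -> int:
    """Crude redundancy proxy: total attribute values stored (NOT an MML message length)."""
    return sum(len(t) * len(t[0]) for t in tables if t)


# Hypothetical denormalized table: each department's building and manager
# are repeated on every employee row.
employees: List[Row] = [
    {"emp": "Ann", "dept": "Sales", "building": "B1", "manager": "Eve"},
    {"emp": "Bob", "dept": "Sales", "building": "B1", "manager": "Eve"},
    {"emp": "Cat", "dept": "Sales", "building": "B1", "manager": "Eve"},
    {"emp": "Dan", "dept": "Sales", "building": "B1", "manager": "Eve"},
    {"emp": "Fay", "dept": "R&D", "building": "B2", "manager": "Gus"},
]

# dept -> building and dept -> manager hold in this data, so those attributes
# can move into a separate relation keyed by dept.
assert holds_fd(employees, ("dept",), "building")
assert holds_fd(employees, ("dept",), "manager")

emp_dept = project(employees, ("emp", "dept"))
dept_info = project(employees, ("dept", "building", "manager"))

# Re-joining on dept recovers the original rows, so the decomposition is lossless.
assert natural_join(emp_dept, dept_info, "dept") == employees

print(stored_values([employees]), stored_values([emp_dept, dept_info]))  # prints: 20 16
```

The saving grows with the number of employees per department, since the building and manager are then stored once per department rather than once per employee.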
