Patent Big Data Analysis by R

Patent Big Data Analysis by R: Technology Management (TM)
Massive results from researching and developing technologies are accumulated in various forms such as patents, papers, and news articles. To enhance technological competitiveness, most firms have tried to analyze these results efficiently. However, the quantity of the accumulated results is so large that they are difficult to analyze. Nevertheless, we have to analyze the big data containing the results of developed technologies. To solve this problem, we use the R data language as an approach to technology analysis (TA). TA is to analyze the developed results of a target technology using qualitative and quantitative methods, such as statistics and the Delphi survey. In this paper, we study a quantitative approach based on statistics, and analyze patent data with the R data language. A patent document contains complete information on the developed technology, because the patent system protects the inventor’s right for a limited period. So we propose a method of patent big data analysis by the R data language for technology management (TM). To illustrate how to apply our research to a real field, we perform a case study. This study contributes to R&D planning and new product development by TM through TA.

Keywords: R data language, patent big data, technology analysis, technology management
Technology management (TM) is a set of activities including technology analysis, technological innovation, and valuation. The aim of TM is to perform R&D planning efficiently and effectively, so TM is important for improving the competitiveness of a company. Technology analysis (TA) is to analyze the results of researched and developed technologies such as patents and papers. Using the TA results, we can forecast future technology or find relationships between technologies for R&D planning or new product development. There are two major approaches to TA: qualitative and quantitative methods. Delphi is a representative method for qualitative TA, and a popular approach in quantitative TA is patent analysis. Delphi depends on the experts’ experience, so it is a subjective TA approach. In comparison, patent analysis is to analyze patent documents using statistics and machine learning algorithms, so this approach is more objective than the qualitative TA approach. In this paper, we focus on patent analysis as an objective and quantitative TA approach. In addition, we introduce R data science for more efficient patent data analysis. Our R data science consists of the R project and data science. The R project is free and open software for statistical computing and visualization. Data science is the study of data, including big data, and covers organization, storage, collection, and analysis. We analyze patent data using data science methodologies and use the R project as analytical software. So we can manage technology efficiently and effectively using the TA results.
The next section introduces our research backgrounds, which are the R project software, data science, and management of technology. In Section 3, we propose an R data science methodology for efficient and effective TM. A case study illustrating how our study is used in a practical domain is described in Section 4. The last section describes conclusions and future works related to the proposed research.

R Data Language for Patent Big Data Analysis
R is a data language for statistical computing and visualization. R was originally developed based on the S project at Bell Labs. It is an object-oriented programming language and has diverse functions for data analysis, so R is good software for data science. The R project is comprised of two modules, which are base and packages. When we first install the R project from the R project site, the R base module is set up. R base includes the functions for basic statistics and graphics, such as descriptive statistics, linear models, clustering, and some plots. But it has limitations for advanced analyses such as support vector machines, evolutionary computation, fuzzy clustering, etc. To solve this problem, R provides the package module. R packages extend the ability of R computing and graphics. For example, to use the functions for support vector machines, we additionally install the ‘e1071’ package on R base. In addition, the ‘tm’ package has many text mining functions for data preprocessing and text data analysis. This package is very useful for big data analysis, because most big data contain text data. Thus R is an efficient and effective language for data science. Figure 1 shows the two modules of the R data language.
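As a minimal sketch of the two modules, assuming only a standard R installation, the first part below uses functions that ship with R base, and the second part shows how an add-on package such as ‘e1071’ could be used when it is already installed. The built-in `cars` data set and all variable names are illustrative choices and are not part of the original study.

```r
# Base module: descriptive statistics, a linear model, and clustering
# are available without installing anything.
data(cars)                                   # built-in example data (speed, dist)
speed_summary <- summary(cars$speed)         # min, quartiles, median, mean
fit <- lm(dist ~ speed, data = cars)         # simple linear regression
set.seed(1)                                  # k-means uses random starts
groups <- kmeans(cars, centers = 2)$cluster  # k-means clustering

# Package module: advanced methods live in add-on packages.
# The package is only used when it is already installed.
has_e1071 <- requireNamespace("e1071", quietly = TRUE)
if (has_e1071) {
  svm_fit <- e1071::svm(dist ~ speed, data = cars)  # support vector machine
} else {
  message("Run install.packages('e1071') to make svm() available")
}
```

The `requireNamespace()` guard is only a convenience for this sketch, so that the base part still runs on a machine where the package has not been installed.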

The two modules of R make R an efficient data language. This combination is useful for applying many methods of data science. In this paper, we focus on combining the R system and data science. Data science (DS) is the study of data. DS includes all areas concerning data, which are data collection, transformation, architecture, storage, analysis, visualization, and preparation. Therefore DS is interdisciplinary and needs many skills in every area, from science to engineering, or from mathematics to business. Data transformation and analysis are important issues in DS fields, because much data is unstructured and not numeric, and we should find novel patterns using data analysis. In this paper, we focus on data preprocessing for building structured data and data analysis for finding meaningful patterns in the TM field. Figure 2 shows the DS composition.

First, data collection is to gather data from every source such as mobile, social networks, web, documents, etc. This includes statistical sampling for big data. That is, when we have difficulty analyzing big data, we should perform sampling from the big data population. Second, data storage is to save and manage big data using database systems and cloud computing. Next, we define and construct the data architecture, such as the data model and data structure. Based on the data structure, we can perform data analysis with statistics and machine learning algorithms. For example, using a graph data structure, we build diverse network models such as social network analysis (SNA) and Bayesian network models. In general, we take data analysis into account more than the other areas of DS, because we use the results of data analysis directly for business and management of technology. In this paper, we apply DS to efficient and effective TM.

TM is to manage the technological resources of a nation or company in an interdisciplinary way. Generally, the technological resources include intellectual property (IP) such as patents and papers, as well as technology innovation such as technological road-mapping and financing. TM is to connect technology and management. Generally, the technology area deals with engineering fields such as mechanical technology, bio and chemical technology, information and communication, medical technology, nanotechnology, etc. Technology experts pay more attention to technology than to management. But they need a management mindset for efficient and effective R&D planning or technological innovation. In comparison, management experts show more concern for management than for technology, because management includes business areas such as finance, accounting, marketing, strategy, new product development, etc. To improve the competitiveness of a company, TM should be performed well. Figure 3 represents the structure of TM.

There are two approaches to the TM of a company. The first is a subjective approach based on qualitative methods such as Delphi. The other is a quantitative approach based on objective methods such as statistical patent analysis. In this paper, we propose a patent analysis approach for efficient and effective TM. We combine R and DS for patent analysis. The next figure shows the proposed process.

First, we collect technological resources for patent analysis. We mainly search patent document data as the technological resource, because a patent document includes detailed results of developed technology, such as title, abstract, inventors’ names, application date, claims, drawings, figures, citations, family patents, International Patent Classification (IPC) codes, and so on. So a patent document contains non-numeric data such as text. This is not structured data, because structured data takes the form of a database (DB) table made up of rows and columns for observations and variables respectively, and each element of the table is numeric. To apply statistics and machine learning algorithms to patent data, we should transform patent documents into structured data. In this paper, we use the R system for data transformation. The ‘tm’ package is a popular R package for building structured data. Once we get the structured data, we can use diverse analysis methods. The R system includes many functions for data analysis in its base module and packages. First of all, we perform a descriptive analysis and draw a time series plot. These are necessary to know the basic characteristics of the collected data. We get these results using the functions provided by R base. We use the ‘sna’ package for building an advanced statistical model to get the technological relations between technologies. Knowing the technological network is important to TM. We additionally use the ‘xlsx’ package to read Excel-type data, because most collected patent data are Excel files. Table 1 shows the connection between R and data science.
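To make the idea of structured data concrete, the following sketch turns a few short texts into a table whose rows are documents, whose columns are terms, and whose cells are numeric frequencies. It uses only R base, and the three abstracts are hypothetical strings written for illustration, not patents from the study.

```r
# Hypothetical patent abstracts (illustrative text, not from the case study)
abstracts <- c(
  p1 = "patent analysis system using a patent database",
  p2 = "method for patent search and document analysis",
  p3 = "database system for document classification"
)

# Tokenize: lower-case, split on non-letters, drop empty strings
tokens <- lapply(strsplit(tolower(abstracts), "[^a-z]+"),
                 function(w) w[nchar(w) > 0])

# Vocabulary: sorted unique terms over all documents
terms <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every document
counts <- vapply(tokens,
                 function(w) as.integer(table(factor(w, levels = terms))),
                 integer(length(terms)))

# Rows are patents (observations), columns are terms (variables),
# each cell is the frequency of the term in the patent
dtm <- t(counts)
dimnames(dtm) <- list(names(abstracts), terms)

dtm[, c("patent", "analysis", "database")]
```

A real analysis would normally also remove stop words such as "a" and "for", a step that is left out here to keep the sketch short.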

The ‘tm’ package for data transformation provides several functions such as ‘Corpus’ and ‘DocumentTermMatrix’. For descriptive statistics, the R base module provides functions such as ‘summary’ and ‘var’ for computing the mean and variance, and we use the ‘sna’ package for SNA and graphs. Using the ‘cluster’ package, we can perform cluster analysis for patent or technology grouping. In addition, the ‘xlsx’ and ‘NLP’ packages are used for Excel file handling and natural language processing respectively. Using the results of patent data analysis, we can also make efficient and effective R&D plans in the deployment and application stage. We perform a case study in the next section.
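The descriptive step can be sketched with R base alone. The application years below are hypothetical values chosen for illustration and are not the KIPRIS data of the case study; the variable names are likewise assumptions.

```r
# Hypothetical application years for a small set of patents
# (illustrative values only, not the data of the case study)
app_year <- c(2003, 2005, 2005, 2007, 2008, 2008, 2008, 2009, 2009, 2010)

# Number of applications per year, including years with zero filings
years  <- seq(min(app_year), max(app_year))
counts <- as.integer(table(factor(app_year, levels = years)))

summary(counts)             # min, quartiles, median, and mean
mean_count <- mean(counts)
var_count  <- var(counts)   # summary() does not report the variance

# Time series plot of yearly applications
yearly <- ts(counts, start = min(years))
plot(yearly, type = "b",
     xlab = "Application year", ylab = "Number of patents")
```

Counting over a factor with explicit levels keeps years without any application in the series, which matters for a time series plot.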

A Case Study
To verify the performance of the proposed methodology, we perform a case study using real technological resources. For this case study, we collected patent documents related to patent analysis technology. We collected the patents from the Korea Intellectual Property Rights Information Service (KIPRIS), one of the patent databases. The searched patent data were from the United States and China. The total number of patents was 86, including 33 U.S. patents and 53 Chinese patents. The next figure shows the numbers of applied patents of the U.S. and China.

The two countries actively developed technologies related to patent analysis in the 2000s, and also filed patent applications for their developed technologies in the same period. In particular, in 2009, China applied for 22 patents related to patent analysis. This was a very surprising result. Next, we performed data transformation and data analysis using the retrieved patent document data. First, we had to preprocess the collected patent data, because the data were not structured. According to the proposed method, we transformed the searched patent data into structured data. Using the ‘tm’ package of R, we built the document-term matrix. This consisted of rows and columns which were patents and terms respectively. Each element of the matrix is the occurrence frequency of a term in each patent document. The following R code shows the transformation of patent data into structured data, the patent-term matrix.

library(tm)    # loading tm package
library(xlsx)  # loading xlsx package
data <- read.xlsx("input.xlsx", 1, header = TRUE)  # read the first sheet of the Excel file
data.cor <- Corpus(VectorSource(data))             # build a corpus from the loaded data
data.dtm <- DocumentTermMatrix(data.cor)           # build the document-term matrix
The ‘library’ function loads an R package into the R system. The ‘xlsx’ package can read an Excel file into the R system with the ‘read.xlsx’ function. The ‘Corpus’ and ‘DocumentTermMatrix’ functions of the ‘tm’ package were used for data transformation. We also extracted keywords from the patent documents. Table 2 shows the top 20 keywords of the US and China.
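Keyword extraction of this kind can be sketched as ranking terms by total frequency and then comparing the two ranked lists. The frequencies below are hypothetical numbers for illustration, and `top_k` is a helper introduced only for this sketch; the source does not state how its top keywords were selected.

```r
# Hypothetical term frequencies per country (illustrative numbers only).
# With a document-term matrix, such a vector could be obtained from the
# column sums of the matrix.
us_freq <- c(patent = 40, analysis = 25, search = 12, claim = 9, system = 7)
cn_freq <- c(patent = 55, analysis = 30, citation = 14, system = 10, software = 8)

# Hypothetical helper: names of the k most frequent terms
top_k <- function(freq, k) {
  ranked <- names(sort(freq, decreasing = TRUE))
  ranked[seq_len(min(k, length(ranked)))]
}

us_top <- top_k(us_freq, 4)
cn_top <- top_k(cn_freq, 4)

common    <- intersect(us_top, cn_top)  # keywords shared by both countries
us_unique <- setdiff(us_top, cn_top)    # keywords appearing only for the US
cn_unique <- setdiff(cn_top, us_top)    # keywords appearing only for China
```

Here `common` plays the role of the shared keywords and `us_unique` and `cn_unique` the role of the country-specific keywords.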

The keywords in bold are the unique keywords of each country. The common keywords, such as ‘patent’, ‘analysis’, ‘show’, ‘family’, and so on, represent general technologies for patent analysis. The unique keywords of the US, which are ‘claim’, ‘document’, ‘search’, ‘introduce’, ‘server’, ‘report’, ‘catching’, ‘text’, and ‘intellectual’, indicate the technologies of text analysis and reporting for patent analysis. In comparison, China has the technologies of statistical citation analysis and software for patent analysis, as shown by its unique keywords ‘classification’, ‘citation’, ‘include’, ‘ranking’, ‘computer’, ‘relate’, ‘statistics’, ‘client’, and ‘software’. In this paper, we built more advanced models using SNA. We use the ‘gplot’ function of the ‘sna’ package to get the SNA graph as follows.

library(sna)  # loading sna package
gplot(, …)    # drawing SNA graph
To use the ‘sna’ package, first of all, we should load this package into the R system. The first argument of ‘gplot’ is the input data including keywords, and the next arguments are the options of the SNA graph. Figure 6 shows the SNA graph of China using the top ten keywords.
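The source does not state how the input data for the graph were constructed. One common choice, assumed here purely for illustration, is a keyword co-occurrence network: two keywords are linked when they appear in the same patent. The small binary matrix below is hypothetical, and the plotting call runs only when the ‘sna’ package is installed.

```r
# Hypothetical binary patent-by-keyword matrix (1 = keyword occurs in patent)
x <- matrix(c(1, 1, 0, 0,
              1, 0, 1, 0,
              1, 1, 1, 0,
              0, 0, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("p", 1:4),
                            c("patent", "analysis", "system", "search")))

# Keyword co-occurrence counts: entry (i, j) is the number of patents
# that contain both keyword i and keyword j
cooc <- crossprod(x)
diag(cooc) <- 0                 # no self-loops

# Adjacency matrix: link two keywords if they co-occur at least once
adj <- (cooc > 0) * 1

# Draw the graph only when the 'sna' package is installed
if (requireNamespace("sna", quietly = TRUE)) {
  sna::gplot(adj, gmode = "graph", displaylabels = TRUE,
             label = colnames(adj))
}
```

In this toy matrix the keyword "search" never co-occurs with the others, so it would be drawn as an isolated node.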

The SNA graph has two components. Component 1 includes only the ‘method’ keyword, and component 2 includes all other keywords except ‘method’. The ‘patent’ keyword is located in the center. There are ‘database’ and ‘system’ between ‘patent’ and ‘analysis’. So we found that database system technology is important to patent analysis in China. The next figure shows the SNA graph of the US using the top ten keywords.

Unlike the SNA graph of China, in the SNA graph of the US, the keywords ‘patent’ and ‘analysis’ are connected to each other directly. The ‘search’ keyword forms a component by itself. In addition, the keywords ‘show’ and ‘data’ affect ‘patent’ through ‘system’. We found that ‘document’, ‘system’, and ‘method’ affect ‘patent’ directly. Thus, we found that document system technologies are important to patent analysis. Figure 8 shows the SNA graph of China using all keywords.

Unlike the SNA graph of China using the top ten keywords, this SNA graph has only one component. All keywords are connected to each other. In addition, the ‘patent’ keyword is connected to many keywords. The ‘method’ keyword is connected to other keywords through the ‘applicant’ and ‘client’ keywords. Figure 9 shows the SNA graph of the US using all keywords.

In this SNA graph, the ‘search’ keyword forms a separate component by itself. This result is the same as in the SNA graph using the top ten keywords. Unlike the case of China, the two SNA graphs of the US, using the top ten and all keywords, are similar to each other. In particular, the ‘server’ keyword as well as ‘patent’ is located in the center. Using the results of this case study, we can build more efficient and effective R&D planning for the technology of patent analysis.
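The two properties read off the graphs above, namely which keywords form separate components and which keyword sits in the center, can also be checked numerically. The sketch below uses R base only; the adjacency matrix is hypothetical, `find_components` is a helper written for this illustration, and using the number of direct links as a measure of centrality is an assumption, not the measure stated in the source.

```r
# Hypothetical keyword adjacency matrix with one isolated keyword
keywords <- c("patent", "analysis", "system", "search")
adj <- matrix(0, 4, 4, dimnames = list(keywords, keywords))
adj["patent", "analysis"] <- adj["analysis", "patent"] <- 1
adj["patent", "system"]   <- adj["system", "patent"]   <- 1

# Label connected components with a breadth-first search
find_components <- function(a) {
  comp <- setNames(rep(NA_integer_, nrow(a)), rownames(a))
  current <- 0L
  for (start in seq_len(nrow(a))) {
    if (!is.na(comp[start])) next      # already labelled
    current <- current + 1L
    comp[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node  <- queue[1]
      queue <- queue[-1]
      # unlabelled neighbours of the current node
      nb <- which(a[node, ] > 0 & is.na(comp))
      comp[nb] <- current
      queue <- c(queue, nb)
    }
  }
  comp
}

comp   <- find_components(adj)   # component label for every keyword
degree <- rowSums(adj)           # number of direct links per keyword
```

With this toy input, "search" receives its own component label and "patent" has the most direct links.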

In this paper, we proposed a TM methodology using R data science. We combined the R system and data science for efficient and effective TM. R is open and free software for statistical computing and visualization. In addition, the use of R packages can greatly extend the computational ability of R. This is the strength of the R system. Data science is the study of data, including big data. We used R as a data language. Thus, using R data science, we can perform diverse quantitative analyses. In our research, we applied our R data science methodology to TM. We used the ‘tm’ package for data transformation, and the ‘sna’ and ‘xlsx’ packages for patent data analysis. To show how our approach applies to a real domain, we performed a case study. The target technology was patent analysis, so we collected the patent documents related to patent analysis. We used the R data science approach for this case study. In our future works, we will consider more diverse R packages for patent data transformation and analysis, and apply them to more efficient and effective TM.
